MyArxiv
Robotics
Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals
Recent advancements in video generation have enabled the development of ``world models'' capable of simulating potential futures for robotics and planning. However, specifying precise goals for these models remains a challenge; text instructions are often too abstract to capture physical nuances, while target images are frequently infeasible to specify for dynamic tasks. To address this, we introduce Goal Force, a novel framework that allows users to define goals via explicit force vectors and intermediate dynamics, mirroring how humans conceptualize physical tasks. We train a video generation model on a curated dataset of synthetic causal primitives-such as elastic collisions and falling dominos-teaching it to propagate forces through time and space. Despite being trained on simple physics data, our model exhibits remarkable zero-shot generalization to complex, real-world scenarios, including tool manipulation and multi-object causal chains. Our results suggest that by grounding video generation in fundamental physical interactions, models can emerge as implicit neural physics simulators, enabling precise, physics-aware planning without reliance on external engines. We release all datasets, code, model weights, and interactive video demos at our project page.
comment: Code and interactive demos at https://goal-force.github.io/
DexterCap: An Affordable and Automated System for Capturing Dexterous Hand-Object Manipulation
Capturing fine-grained hand-object interactions is challenging due to severe self-occlusion from closely spaced fingers and the subtlety of in-hand manipulation motions. Existing optical motion capture systems rely on expensive camera setups and extensive manual post-processing, while low-cost vision-based methods often suffer from reduced accuracy and reliability under occlusion. To address these challenges, we present DexterCap, a low-cost optical capture system for dexterous in-hand manipulation. DexterCap uses dense, character-coded marker patches to achieve robust tracking under severe self-occlusion, together with an automated reconstruction pipeline that requires minimal manual effort. With DexterCap, we introduce DexterHand, a dataset of fine-grained hand-object interactions covering diverse manipulation behaviors and objects, from simple primitives to complex articulated objects such as a Rubik's Cube. We release the dataset and code to support future research on dexterous hand-object interaction.
comment: 12 pages, 12 figures
Intelligent Singularity Avoidance in UR10 Robotic Arm Path Planning Using Hybrid Fuzzy Logic and Reinforcement Learning
This paper presents a comprehensive approach to singularity detection and avoidance in UR10 robotic arm path planning through the integration of fuzzy logic safety systems and reinforcement learning algorithms. The proposed system addresses critical challenges in robotic manipulation where singularities can cause loss of control and potential equipment damage. Our hybrid approach combines real-time singularity detection using manipulability measures, condition number analysis, and fuzzy logic decision-making with a stable reinforcement learning framework for adaptive path planning. Experimental results demonstrate a 90% success rate in reaching target positions while maintaining safe distances from singular configurations. The system integrates PyBullet simulation for training data collection and URSim connectivity for real-world deployment.
comment: Published in TANET 2025 (Paper No. T0404)
SceneFoundry: Generating Interactive Infinite 3D Worlds
The ability to automatically generate large-scale, interactive, and physically realistic 3D environments is crucial for advancing robotic learning and embodied intelligence. However, existing generative approaches often fail to capture the functional complexity of real-world interiors, particularly those containing articulated objects with movable parts essential for manipulation and navigation. This paper presents SceneFoundry, a language-guided diffusion framework that generates apartment-scale 3D worlds with functionally articulated furniture and semantically diverse layouts for robotic training. From natural language prompts, an LLM module controls floor layout generation, while diffusion-based posterior sampling efficiently populates the scene with articulated assets from large-scale 3D repositories. To ensure physical usability, SceneFoundry employs differentiable guidance functions to regulate object quantity, prevent articulation collisions, and maintain sufficient walkable space for robotic navigation. Extensive experiments demonstrate that our framework generates structurally valid, semantically coherent, and functionally interactive environments across diverse scene types and conditions, enabling scalable embodied AI research.
comment: 15 pages
Modular Autonomy with Conversational Interaction: An LLM-driven Framework for Decision Making in Autonomous Driving
Recent advancements in Large Language Models (LLMs) offer new opportunities to create natural language interfaces for Autonomous Driving Systems (ADSs), moving beyond rigid inputs. This paper addresses the challenge of mapping the complexity of human language to the structured action space of modular ADS software. We propose a framework that integrates an LLM-based interaction layer with Autoware, a widely used open-source software. This system enables passengers to issue high-level commands, from querying status information to modifying driving behavior. Our methodology is grounded in three key components: a taxonomization of interaction categories, an application-centric Domain Specific Language (DSL) for command translation, and a safety-preserving validation layer. A two-stage LLM architecture ensures high transparency by providing feedback based on the definitive execution status. Evaluation confirms the system's timing efficiency and translation robustness. Simulation successfully validated command execution across all five interaction categories. This work provides a foundation for extensible, DSL-assisted interaction in modular and safety-conscious autonomy stacks.
comment: Submitted to the IEEE Intelligent Vehicles Symposium (IV 2026), Detroit, MI, United States
InsSo3D: Inertial Navigation System and 3D Sonar SLAM for turbid environment inspection
This paper presents InsSo3D, an accurate and efficient method for large-scale 3D Simultaneous Localisation and Mapping (SLAM) using a 3D Sonar and an Inertial Navigation System (INS). Unlike traditional sonar, which produces 2D images containing range and azimuth information but lacks elevation information, 3D Sonar produces a 3D point cloud, which therefore does not suffer from elevation ambiguity. We introduce a robust and modern SLAM framework adapted to the 3D Sonar data using INS as prior, detecting loop closure and performing pose graph optimisation. We evaluated InsSo3D performance inside a test tank with access to ground truth data and in an outdoor flooded quarry. Comparisons to reference trajectories and maps obtained from an underwater motion tracking system and visual Structure From Motion (SFM) demonstrate that InsSo3D efficiently corrects odometry drift. The average trajectory error is below 21cm during a 50-minute-long mission, producing a map of 10m by 20m with a 9cm average reconstruction error, enabling safe inspection of natural or artificial underwater structures even in murky water conditions.
FlyPose: Towards Robust Human Pose Estimation From Aerial Views WACV
Unmanned Aerial Vehicles (UAVs) are increasingly deployed in close proximity to humans for applications such as parcel delivery, traffic monitoring, disaster response and infrastructure inspections. Ensuring safe and reliable operation in these human-populated environments demands accurate perception of human poses and actions from an aerial viewpoint. This perspective challenges existing methods with low resolution, steep viewing angles and (self-)occlusion, especially if the application demands realtime feasibile models. We train and deploy FlyPose, a lightweight top-down human pose estimation pipeline for aerial imagery. Through multi-dataset training, we achieve an average improvement of 6.8 mAP in person detection across the test-sets of Manipal-UAV, VisDrone, HIT-UAV as well as our custom dataset. For 2D human pose estimation we report an improvement of 16.3 mAP on the challenging UAV-Human dataset. FlyPose runs with an inference latency of ~20 milliseconds including preprocessing on a Jetson Orin AGX Developer Kit and is deployed onboard a quadrotor UAV during flight experiments. We also publish FlyPose-104, a small but challenging aerial human pose estimation dataset, that includes manual annotations from difficult aerial perspectives: https://github.com/farooqhassaan/FlyPose.
comment: 11 pages, 9 figures, IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026
Motion Compensation for Real Time Ultrasound Scanning in Robotically Assisted Prostate Biopsy Procedures ICRA 2026
Prostate cancer is one of the most common types of cancer in men. Its diagnosis by biopsy requires a high level of expertise and precision from the surgeon, so the results are highly operator-dependent. The aim of this work is to develop a robotic system for assisted ultrasound (US) examination of the prostate, a prebiopsy step that could reduce the dexterity requirements and enable faster, more accurate and more available prostate biopsy. We developed and validated a laboratory setup with a collaborative robotic arm that can autonomously scan a prostate phantom and attached the phantom to a medical robotic arm that mimics the patient's movements. The scanning robot keeps the relative position of the US probe and the prostate constant, ensuring a consistent and robust approach to reconstructing the prostate. To reconstruct the prostate, each slice is segmented to generate a series of prostate contours converted into a 3D point cloud used for biopsy planning. The average scan time of the prostate was 30 s, and the average 3D reconstruction of the prostate took 3 s. We performed four motion scenarios: the phantom was scanned in a stationary state (S), with horizontal motion (H), with vertical motion (V), and with a combination of the two (C). System validation is performed by registering the prostate point cloud reconstructions acquired during different motions (H, V, C) with those obtained in the stationary state. ICP registration with a threshold of 0.8 mm yields mean 83.2\% fitness and 0.35 mm RMSE for S-H registration, 84.1\% fitness and 0.37 mm RMSE for S-V registration and 79.4\% fitness and 0.37 mm RMSE for S-C registration. Due to the elastic and soft material properties of the prostate phantom, the maximum robot tracking error was 3 mm, which can be sufficient for prostate biopsy according to medical literature. The maximum delay in motion compensation was 0.5 s.
comment: Submitted for ICRA 2026
EvoQRE: Modeling Bounded Rationality in Safety-Critical Traffic Simulation via Evolutionary Quantal Response Equilibrium
Existing traffic simulation frameworks for autonomous vehicles typically rely on imitation learning or game-theoretic approaches that solve for Nash or coarse correlated equilibria, implicitly assuming perfectly rational agents. However, human drivers exhibit bounded rationality, making approximately optimal decisions under cognitive and perceptual constraints. We propose EvoQRE, a principled framework for modeling safety-critical traffic interactions as general-sum Markov games solved via Quantal Response Equilibrium (QRE) and evolutionary game dynamics. EvoQRE integrates a pre-trained generative world model with entropy-regularized replicator dynamics, capturing stochastic human behavior while maintaining equilibrium structure. We provide rigorous theoretical results, proving that the proposed dynamics converge to Logit-QRE under a two-timescale stochastic approximation with an explicit convergence rate of O(log k / k^{1/3}) under weak monotonicity assumptions. We further extend QRE to continuous action spaces using mixture-based and energy-based policy representations. Experiments on the Waymo Open Motion Dataset and nuPlan benchmark demonstrate that EvoQRE achieves state-of-the-art realism, improved safety metrics, and controllable generation of diverse safety-critical scenarios through interpretable rationality parameters.
comment: 11 pages, 5 figures
Mobile Robot Localization Using a Novel Whisker-Like Sensor
Whisker-like touch sensors offer unique advantages for short-range perception in environments where visual and long-range sensing are unreliable, such as confined, cluttered, or low-visibility settings. This paper presents a framework for estimating contact points and robot localization in a known planar environment using a single whisker sensor. We develop a family of virtual sensor models. Each model maps robot configurations to sensor observations and enables structured reasoning through the concept of preimages - the set of robot states consistent with a given observation. The notion of virtual sensor models serves as an abstraction to reason about state uncertainty without dependence on physical implementation. By combining sensor observations with a motion model, we estimate the contact point. Iterative estimation then enables reconstruction of obstacle boundaries. Furthermore, intersecting states inferred from current observations with forward-projected states from previous steps allow accurate robot localization without relying on vision or external systems. The framework supports both deterministic and possibilistic formulations and is validated through simulation and physical experiments using a low-cost, 3D printed, Hall-effect-based whisker sensor. Results demonstrate accurate contact estimation and localization with errors under 7 mm, demonstrating the potential of whisker-based sensing as a lightweight, adaptable complement to vision-based navigation.
Learning specifications for reactive synthesis with safety constraints
This paper presents a novel approach to learning from demonstration that enables robots to autonomously execute complex tasks in dynamic environments. We model latent tasks as probabilistic formal languages and introduce a tailored reactive synthesis framework that balances robot costs with user task preferences. Our methodology focuses on safety-constrained learning and inferring formal task specifications as Probabilistic Deterministic Finite Automata (PDFA). We adapt existing evidence-driven state merging algorithms and incorporate safety requirements throughout the learning process to ensure that the learned PDFA always complies with safety constraints. Furthermore, we introduce a multi-objective reactive synthesis algorithm that generates deterministic strategies that are guaranteed to satisfy the PDFA task while optimizing the trade-offs between user preferences and robot costs, resulting in a Pareto front of optimal solutions. Our approach models the interaction as a two-player game between the robot and the environment, accounting for dynamic changes. We present a computationally-tractable value iteration algorithm to generate the Pareto front and the corresponding deterministic strategies. Comprehensive experimental results demonstrate the effectiveness of our algorithms across various robots and tasks, showing that the learned PDFA never includes unsafe behaviors and that synthesized strategies consistently achieve the task while meeting both the robot cost and user-preference requirements.
Safety Not Found (404): Hidden Risks of LLM-Based Robotics Decision Making
One mistake by an AI system in a safety-critical setting can cost lives. As Large Language Models (LLMs) become integral to robotics decision-making, the physical dimension of risk grows; a single wrong instruction can directly endanger human safety. This paper addresses the urgent need to systematically evaluate LLM performance in scenarios where even minor errors are catastrophic. Through a qualitative evaluation of a fire evacuation scenario, we identified critical failure cases in LLM-based decision-making. Based on these, we designed seven tasks for quantitative assessment, categorized into: Complete Information, Incomplete Information, and Safety-Oriented Spatial Reasoning (SOSR). Complete information tasks utilize ASCII maps to minimize interpretation ambiguity and isolate spatial reasoning from visual processing. Incomplete information tasks require models to infer missing context, testing for spatial continuity versus hallucinations. SOSR tasks use natural language to evaluate safe decision-making in life-threatening contexts. We benchmark various LLMs and Vision-Language Models (VLMs) across these tasks. Beyond aggregate performance, we analyze the implications of a 1% failure rate, highlighting how "rare" errors escalate into catastrophic outcomes. Results reveal serious vulnerabilities: several models achieved a 0% success rate in ASCII navigation, while in a simulated fire drill, models instructed robots to move toward hazardous areas instead of emergency exits. Our findings lead to a sobering conclusion: current LLMs are not ready for direct deployment in safety-critical systems. A 99% accuracy rate is dangerously misleading in robotics, as it implies one out of every hundred executions could result in catastrophic harm. We demonstrate that even state-of-the-art models cannot guarantee safety, and absolute reliance on them creates unacceptable risks.
TOSC: Task-Oriented Shape Completion for Open-World Dexterous Grasp Generation from Partial Point Clouds AAAI 2026
Task-oriented dexterous grasping remains challenging in robotic manipulations of open-world objects under severe partial observation, where significant missing data invalidates generic shape completion. In this paper, to overcome this limitation, we study Task-Oriented Shape Completion, a new task that focuses on completing the potential contact regions rather than the entire shape. We argue that shape completion for grasping should be explicitly guided by the downstream manipulation task. To achieve this, we first generate multiple task-oriented shape completion candidates by leveraging the zero-shot capabilities of object functional understanding from several pre-trained foundation models. A 3D discriminative autoencoder is then proposed to evaluate the plausibility of each generated candidate and optimize the most plausible one from a global perspective. A conditional flow-matching model named FlowGrasp is developed to generate task-oriented dexterous grasps from the optimized shape. Our method achieves state-of-the-art performance in task-oriented dexterous grasping and task-oriented shape completion, improving the Grasp Displacement and the Chamfer Distance over the state-of-the-art by 16.17\% and 55.26%, respectively. In particular, it shows good capabilities in grasping objects with severe missing data. It also demonstrates good generality in handling open-set categories and tasks.
comment: Accepted to AAAI 2026
Assembling Solar Panels by Dual Robot Arms Towards Full Autonomous Lunar Base Construction
Since the successful Apollo program, humanity is once again aiming to return to the Moon for scientific discovery, resource mining, and inhabitation. Upcoming decades focus on building a lunar outpost, with robotic systems playing a crucial role to safely and efficiently establish essential infrastructure such as solar power generating towers. Similar to the construction of the International Space Station (ISS), shipping necessary components via modules and assembling them in situ should be a practical scenario. In this context, this paper focuses on the integration of vision, control, and hardware systems within an autonomous sequence for a dual-arm robot system. We explore a perception and control pipeline specifically designed for assembling solar panel modules, one of the benchmark tasks. Ad hoc hardware was designed and tested in real-world experiments. A mock-up of modular solar panels and active-passive connectors are employed, with the control of this grappling fixture integrated into the proposed pipeline. The successful implementation of our method demonstrates that the two robot manipulators can effectively connect arbitrarily placed panels, highlighting the seamless integration of vision, control, and hardware systems in complex space applications.
comment: This is the authors' version of a paper accepted for publication in IEEE/SICE International Symposium on System Integration (SII), 2025, (c) IEEE
Low-Latency Event-Based Velocimetry for Quadrotor Control in a Narrow Pipe
Autonomous quadrotor flight in confined spaces such as pipes and tunnels presents significant challenges due to unsteady, self-induced aerodynamic disturbances. Very recent advances have enabled flight in such conditions, but they either rely on constant motion through the pipe to mitigate airflow recirculation effects or suffer from limited stability during hovering. In this work, we present the first closed-loop control system for quadrotors for hovering in narrow pipes that leverages real-time flow field measurements. We develop a low-latency, event-based smoke velocimetry method that estimates local airflow at high temporal resolution. This flow information is used by a disturbance estimator based on a recurrent convolutional neural network, which infers force and torque disturbances in real time. The estimated disturbances are integrated into a learning-based controller trained via reinforcement learning. The flow-feedback control proves particularly effective during lateral translation maneuvers in the pipe cross-section. There, the real-time disturbance information enables the controller to effectively counteract transient aerodynamic effects, thereby preventing collisions with the pipe wall. To the best of our knowledge, this work represents the first demonstration of an aerial robot with closed-loop control informed by real-time flow field measurements. This opens new directions for research on flight in aerodynamically complex environments. In addition, our work also sheds light on the characteristic flow structures that emerge during flight in narrow, circular pipes, providing new insights at the intersection of robotics and fluid dynamics.
comment: 19 pages
Closing the Reality Gap: Zero-Shot Sim-to-Real Deployment for Dexterous Force-Based Grasping and Manipulation
Human-like dexterous hands with multiple fingers offer human-level manipulation capabilities, but training control policies that can directly deploy on real hardware remains difficult due to contact-rich physics and imperfect actuation. We close this gap with a practical sim-to-real reinforcement learning (RL) framework that utilizes dense tactile feedback combined with joint torque sensing to explicitly regulate physical interactions. To enable effective sim-to-real transfer, we introduce (i) a computationally fast tactile simulation that computes distances between dense virtual tactile units and the object via parallel forward kinematics, providing high-rate, high-resolution touch signals needed by RL; (ii) a current-to-torque calibration that eliminates the need for torque sensors on dexterous hands by mapping motor current to joint torque; and (iii) actuator dynamics modeling to bridge the actuation gaps with randomization of non-ideal effects such as backlash, torque-speed saturation. Using an asymmetric actor-critic PPO pipeline trained entirely in simulation, our policies deploy directly to a five-finger hand. The resulting policies demonstrated two essential skills: (1) command-based, controllable grasp force tracking, and (2) reorientation of objects in the hand, both of which were robustly executed without fine-tuning on the robot. By combining tactile and torque in the observation space with effective sensing/actuation modeling, our system provides a practical solution to achieve reliable dexterous manipulation. To our knowledge, this is the first demonstration of controllable grasping on a multi-finger dexterous hand trained entirely in simulation and transferred zero-shot on real hardware.
iTeach: Interactive Teaching for Robot Perception using Mixed Reality
Robots deployed in the wild often encounter objects and scenes that break pre-trained perception models, yet adapting these models typically requires slow offline data collection, labeling, and retraining. We introduce iTeach, a human-in-the-loop system that enables robots to improve perception continuously as they explore new environments. A human sees the robot's predictions from its own viewpoint, corrects failures in real time, and the informed data drives iterative fine-tuning until performance is satisfactory. A mixed reality headset provides the interface, overlaying predictions in the user's view and enabling lightweight annotation via eye gaze and voice. Instead of tedious frame-by-frame labeling, a human guides the robot to scenes of choice and records short videos while interacting with objects. The human labels only the final frame, and a video segmentation model propagates labels across the sequence, converting seconds of input into dense supervision. The refined model is deployed immediately, closing the loop between human feedback and robot learning. We demonstrate iTeach on Unseen Object Instance Segmentation (UOIS), achieving consistent improvements over a pre-trained MSMFormer baseline on both our collected dataset and the SceneReplica benchmark, where it leads to higher grasping success, followed by a real-world demonstration of grasping unseen objects with a Fetch robot. By combining human judgment, efficient annotation, and on-the-fly refinement, iTeach provides a practical path toward perception systems that generalize robustly in diverse real-world conditions. Project page at https://irvlutd.github.io/iTeach
Dense 3D Displacement Estimation for Landslide Monitoring via Fusion of TLS Point Clouds and Embedded RGB Images
Landslide monitoring is essential for understanding geohazards and mitigating associated risks. Existing point cloud-based methods, however, typically rely on either geometric or radiometric information and often yield sparse or non-3D displacement estimates. In this paper, we propose a hierarchical partitioning-based coarse-to-fine approach that integrates 3D point clouds and co-registered RGB images to estimate dense 3D displacement vector fields. Patch-level matches are constructed using both 3D geometry and 2D image features, refined via geometric consistency checks, and followed by rigid transformation estimation per match. Experimental results on two real-world landslide datasets demonstrate that the proposed method produces 3D displacement estimates with high spatial coverage (79% and 97%) and accuracy. Deviations in displacement magnitude with respect to external measurements (total station or GNSS observations) are 0.15 m and 0.25 m on the two datasets, respectively, and only 0.07 m and 0.20 m compared to manually derived references, all below the mean scan resolutions (0.08 m and 0.30 m). Compared with the state-of-the-art method F2S3, the proposed approach improves spatial coverage while maintaining comparable accuracy. The proposed approach offers a practical and adaptable solution for TLS-based landslide monitoring and is extensible to other types of point clouds and monitoring tasks. The example data and source code are publicly available at https://github.com/gseg-ethz/fusion4landslide.
comment: Published in the International Journal of Applied Earth Observation and Geoinformation. 25 pages, 19 figures
Anomaly detection for generic failure monitoring in robotic assembly, screwing and manipulation
Out-of-distribution states in robot manipulation often lead to unpredictable robot behavior or task failure, limiting success rates and increasing risk of damage. Anomaly detection (AD) can identify deviations from expected patterns in data, which can be used to trigger failsafe behaviors and recovery strategies. Prior work has applied data-driven AD on time series data for specific robotic tasks, however the transferability of an AD approach between different robot control strategies and task types has not been shown. Leveraging time series data, such as force/torque signals, allows to directly capture robot-environment interactions, crucial for manipulation and online failure detection. Their broad availability, high sampling rates, and low dimensionality enable high temporal resolution and efficient processing. As robotic tasks can have widely signal characteristics and requirements, AD methods which can be applied in the same way to a wide range of tasks is needed, ideally with good data efficiency. We examine three industrial tasks, each presenting several anomalies. Test scenarios in robotic cabling, screwing, and sanding are built, and multi-modal time series data is gathered. Several autoencoder-based methods are compared, and we evaluate the generalization across different tasks and control methods (diffusion policy-, position-, and impedance-controlled). This allows us to validate the integration of AD in complex tasks involving tighter tolerances and variation from both the robot and its environment. Additionally, we evaluate data efficiency, detection latency, and task characteristics which support robust detection. The results indicate reliable detection with AUROC above 0.96 in failures in the cabling and screwing task, such as incorrect or misaligned parts. In the polishing task, only severe failures were reliably detected, while more subtle failures remained undetected.
Non-Prehensile Tool-Object Manipulation by Integrating LLM-Based Planning and Manoeuvrability-Driven Controls
The ability to wield tools was once considered exclusive to human intelligence, but it's now known that many other animals, like crows, possess this capability. Yet, robotic systems still fall short of matching biological dexterity. In this paper, we investigate the use of Large Language Models (LLMs), tool affordances, and object manoeuvrability for non-prehensile tool-based manipulation tasks. Our novel method leverages LLMs based on scene information and natural language instructions to enable symbolic task planning for tool-object manipulation. This approach allows the system to convert a human language sentence into a sequence of feasible motion functions. We have developed a novel manoeuvrability-driven controller using a new tool affordance model derived from visual feedback. This controller helps guide the robot's tool utilization and manipulation actions, even within confined areas, using a stepping incremental approach. The proposed methodology is evaluated with experiments to prove its effectiveness under various manipulation scenarios.
AURASeg: Attention Guided Upsampling with Residual Boundary-Assistive Refinement for Drivable-Area Segmentation
Free space ground segmentation is essential to navigate autonomous robots, recognize drivable zones, and traverse efficiently. Fine-grained features remain challenging for existing segmentation models, particularly for robots in indoor and structured environments. These difficulties arise from ineffective multi-scale processing, suboptimal boundary refinement, and limited feature representation. To address this, we propose Attention-Guided Upsampling with Residual Boundary-Assistive Refinement (AURASeg), a ground-plane semantic segmentation framework designed to improve border precision while preserving strong region accuracy. Built on a ResNet-50 backbone, AURASeg introduces (i) a Residual Border Refinement Module (RBRM) that enhances edge delineation through boundary-assistive feature refinement, and (ii) Attention Progressive Upsampling Decoder (APUD) blocks that progressively fuse multi-level features during decoding. Additionally, we integrate a (iii) lightweight ASPPLite module to capture multi-scale context with minimal overhead. Extensive experiments on CARL-D, the Ground Mobile Robot Perception (GMRP) dataset, and a custom Gazebo indoor dataset show that AURASeg consistently outperforms strong baselines, with notable gains in boundary metrics. Finally, we demonstrate real-time deployment on a Kobuki TurtleBot, validating practical usability. The code is available at https://github.com/Narendhiranv04/AURASeg
comment: 6 pages, 4 figures, 4 tables
GR-Dexter Technical Report
Vision-language-action (VLA) models have enabled language-conditioned, long-horizon robot manipulation, but most existing systems are limited to grippers. Scaling VLA policies to bimanual robots with high degree-of-freedom (DoF) dexterous hands remains challenging due to the expanded action space, frequent hand-object occlusions, and the cost of collecting real-robot data. We present GR-Dexter, a holistic hardware-model-data framework for VLA-based generalist manipulation on a bimanual dexterous-hand robot. Our approach combines the design of a compact 21-DoF robotic hand, an intuitive bimanual teleoperation system for real-robot data collection, and a training recipe that leverages teleoperated robot trajectories together with large-scale vision-language and carefully curated cross-embodiment datasets. Across real-world evaluations spanning long-horizon everyday manipulation and generalizable pick-and-place, GR-Dexter achieves strong in-domain performance and improved robustness to unseen objects and unseen instructions. We hope GR-Dexter serves as a practical step toward generalist dexterous-hand robotic manipulation.
SLAM&Render: A Benchmark for the Intersection Between Neural Rendering, Gaussian Splatting and SLAM
Models and methods originally developed for Novel View Synthesis and Scene Rendering, such as Neural Radiance Fields (NeRF) and Gaussian Splatting, are increasingly being adopted as representations in Simultaneous Localization and Mapping (SLAM). However, existing datasets fail to include the specific challenges of both fields, such as sequential operations and, in many settings, multi-modality in SLAM or generalization across viewpoints and illumination conditions in neural rendering. Additionally, the data are often collected using sensors which are handheld or mounted on drones or mobile robots, which complicates the accurate reproduction of sensor motions. To bridge these gaps, we introduce SLAM&Render, a novel dataset designed to benchmark methods in the intersection between SLAM, Novel View Rendering and Gaussian Splatting. Recorded with a robot manipulator, it uniquely includes 40 sequences with time-synchronized RGB-D images, IMU readings, robot kinematic data, and ground-truth pose streams. By releasing robot kinematic data, the dataset also enables the assessment of recent integrations of SLAM paradigms within robotic applications. The dataset features five setups with consumer and industrial objects under four controlled lighting conditions, each with separate training and test trajectories. All sequences are static with different levels of object rearrangements and occlusions. Our experimental results, obtained with several baselines from the literature, validate SLAM&Render as a relevant benchmark for this emerging research area.
comment: 9 pages, 8 figures, submitted to The International Journal of Robotics Research (IJRR)
Hierarchical GNN-Based Multi-Agent Learning for Dynamic Queue-Jump Lane and Emergency Vehicle Corridor Formation
Emergency vehicles require rapid passage through congested traffic, yet existing strategies fail to adapt to dynamic conditions. We propose a novel hierarchical graph neural network (GNN)-based multi-agent reinforcement learning framework to coordinate connected vehicles for emergency corridor formation. Our approach uses a high-level planner for global strategy and low-level controllers for trajectory execution, utilizing graph attention networks to scale with variable agent counts. Trained via Multi-Agent Proximal Policy Optimization (MAPPO), the system reduces emergency vehicle travel time by 28.3% compared to baselines and 44.6% compared to uncoordinated traffic in simulations. The design achieves near-zero collision rates (0.3%) while maintaining 81% of background traffic efficiency. Ablation and generalization studies confirm the framework's robustness across diverse scenarios. These results demonstrate the effectiveness of combining GNNs with hierarchical learning for intelligent transportation systems.
comment: 16 Pages, 5 Figures, 9 Tables, submitted to IEEE TITS
Volume-Consistent Kneading-Based Deformation Manufacturing for Material-Efficient Shaping
Conventional subtractive manufacturing inevitably involves material loss during geometric realization, while additive manufacturing still suffers from limitations in surface quality, process continuity, and productivity when fabricating complex geometries. To address these challenges, this paper proposes a volume-consistent kneading-based forming method for plastic materials, enabling continuous and controllable three-dimensional deformation under mass conservation. An integrated kneading-based manufacturing system is developed, in which geometry-aware kneading command generation, layer-wise kneading execution, and in-process point-cloud scanning are tightly coupled to form a closed-loop workflow of scanning, forming, and feedback compensation. Target geometries are analyzed through layer-wise point-cloud processing and classified into enveloping and non-enveloping types. Accordingly, an Envelope Shaping First strategy and a Similar Gradient Method are adopted to ensure stable material flow and continuous deformation. An RMSE-based compensation scheme is further introduced to correct systematic geometric deviations induced by elastic rebound and material redistribution. Experimental validation on five representative geometries demonstrates high geometric fidelity, with material utilization consistently exceeding 98%. The results indicate that kneading-based forming provides a promising alternative manufacturing paradigm for low-waste, customizable production.
comment: 39 pages, 31 figures
On Steerability Factors for Growing Vine Robots
Vine robots extend their tubular bodies by everting material from the tip, enabling navigation in complex environments with a minimalist soft body. Despite their promise for field applications, especially in the urban search and rescue domain, performance is constrained by the weight of attached sensors or tools, as well as other design and control choices. This work investigates how tip load, pressure, length, diameter, and fabrication method shape vine robot steerability--the ability to maneuver with controlled curvature--for robots that steer with series pouch motor-style pneumatic actuators. We conduct two groups of experiments: (1) studying tip load, chamber pressure, length, and diameter in a robot supporting itself against gravity, and (2) studying fabrication method and ratio of actuator to chamber pressure in a robot supported on the ground. Results show that steerability decreases with increasing tip load, is best at moderate chamber pressure, increases with length, and is largely unaffected by diameter. Robots with actuators attached on their exterior begin curving at low pressure ratios, but curvature saturates at high pressure ratios; those with actuators integrated into the robot body require higher pressure ratios to begin curving but achieve higher curvature overall. We demonstrate that robots optimized with these principles outperform those with ad hoc parameters in a mobility task that involves maximizing upward and horizontal curvatures.
PartDexTOG: Generating Dexterous Task-Oriented Grasping via Language-driven Part Analysis
Task-oriented grasping is a crucial yet challenging task in robotic manipulation. Despite the recent progress, few existing methods address task-oriented grasping with dexterous hands. Dexterous hands provide better precision and versatility, enabling robots to perform task-oriented grasping more effectively. In this paper, we argue that part analysis can enhance dexterous grasping by providing detailed information about the object's functionality. We propose PartDexTOG, a method that generates dexterous task-oriented grasps via language-driven part analysis. Taking a 3D object and a manipulation task represented by language as input, the method first generates the category-level and part-level grasp descriptions w.r.t the manipulation task by LLMs. Then, a category-part conditional diffusion model is developed to generate a dexterous grasp for each part, respectively, based on the generated descriptions. To select the most plausible combination of grasp and corresponding part from the generated ones, we propose a measure of geometric consistency between grasp and part. We show that our method greatly benefits from the open-world knowledge reasoning on object parts by LLMs, which naturally facilitates the learning of grasp generation on objects with different geometry and for different manipulation tasks. Our method ranks top on the OakInk-shape dataset over all previous methods, improving the Penetration Volume, the Grasp Displace, and the P-FID over the state-of-the-art by $3.58\%$, $2.87\%$, and $41.43\%$, respectively. Notably, it demonstrates good generality in handling novel categories and tasks.
Model-free Adaptive Output Feedback Vibration Suppression in a Cantilever Beam
This paper presents a model-free adaptive control approach to suppress vibrations in a cantilevered beam excited by an unknown disturbance. The cantilevered beam under harmonic excitation is modeled using a lumped parameter approach. Based on retrospective cost optimization, a sampled-data adaptive controller is developed to suppress vibrations caused by external disturbances. Both displacement and acceleration measurements are considered for feedback. Since acceleration measurements are more sensitive to spillover, which excites higher frequency modes, a filter is developed to extract key displacement information from the acceleration data and enhance suppression performance. The vibration suppression performance is compared using both displacement and acceleration measurements.
comment: 16 pages, 14 figures, to be presented at Scitech 2026, uploaded new version that corrects some mistakes in the paper
A Photorealistic Dataset and Vision-Based Algorithm for Anomaly Detection During Proximity Operations in Lunar Orbit
NASA's forthcoming Lunar Gateway space station, which will be uncrewed most of the time, will need to operate with an unprecedented level of autonomy. One key challenge is enabling the Canadarm3, the Gateway's external robotic system, to detect hazards in its environment using its onboard inspection cameras. This task is complicated by the extreme and variable lighting conditions in space. In this paper, we introduce the visual anomaly detection and localization task for the space domain and establish a benchmark based on a synthetic dataset called ALLO (Anomaly Localization in Lunar Orbit). We show that state-of-the-art visual anomaly detection methods often fail in the space domain, motivating the need for new approaches. To address this, we propose MRAD (Model Reference Anomaly Detection), a statistical algorithm that leverages the known pose of the Canadarm3 and a CAD model of the Gateway to generate reference images of the expected scene appearance. Anomalies are then identified as deviations from this model-generated reference. On the ALLO dataset, MRAD surpasses state-of-the-art anomaly detection algorithms, achieving an AP score of 62.9% at the pixel level and an AUROC score of 75.0% at the image level. Given the low tolerance for risk in space operations and the lack of domain-specific data, we emphasize the need for novel, robust, and accurate anomaly detection methods to handle the challenging visual conditions found in lunar orbit and beyond.
comment: In IEEE Robotics and Automation Letters (RA-L) and presented at the IEEE International Conference on Robotics and Automation, 1-5 June 2026, Vienna, Austria
Multiagent Systems
Can We Predict Before Executing Machine Learning Agents?
Autonomous machine learning agents have revolutionized scientific discovery, yet they remain constrained by a Generate-Execute-Feedback paradigm. Previous approaches suffer from a severe Execution Bottleneck, as hypothesis evaluation relies strictly on expensive physical execution. To bypass these physical constraints, we internalize execution priors to substitute costly runtime checks with instantaneous predictive reasoning, drawing inspiration from World Models. In this work, we formalize the task of Data-centric Solution Preference and construct a comprehensive corpus of 18,438 pairwise comparisons. We demonstrate that LLMs exhibit significant predictive capabilities when primed with a Verified Data Analysis Report, achieving 61.5% accuracy and robust confidence calibration. Finally, we instantiate this framework in FOREAGENT, an agent that employs a Predict-then-Verify loop, achieving a 6x acceleration in convergence while surpassing execution-based baselines by +6%. Our code and dataset will be publicly available soon at https://github.com/zjunlp/predict-before-execute.
comment: Work in progress
Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency
As Large Language Models (LLMs) are increasingly deployed in real-world settings, correctness alone is insufficient. Reliable deployment requires maintaining truthful beliefs under contextual perturbations. Existing evaluations largely rely on point-wise confidence like Self-Consistency, which can mask brittle belief. We show that even facts answered with perfect self-consistency can rapidly collapse under mild contextual interference. To address this gap, we propose Neighbor-Consistency Belief (NCB), a structural measure of belief robustness that evaluates response coherence across a conceptual neighborhood. To validate the efficiency of NCB, we introduce a new cognitive stress-testing protocol that probes outputs stability under contextual interference. Experiments across multiple LLMs show that the performance of high-NCB data is relatively more resistant to interference. Finally, we present Structure-Aware Training (SAT), which optimizes context-invariant belief structure and reduces long-tail knowledge brittleness by approximately 30%. Code will be available at https://github.com/zjunlp/belief.
comment: Work in progress
EvoQRE: Modeling Bounded Rationality in Safety-Critical Traffic Simulation via Evolutionary Quantal Response Equilibrium
Existing traffic simulation frameworks for autonomous vehicles typically rely on imitation learning or game-theoretic approaches that solve for Nash or coarse correlated equilibria, implicitly assuming perfectly rational agents. However, human drivers exhibit bounded rationality, making approximately optimal decisions under cognitive and perceptual constraints. We propose EvoQRE, a principled framework for modeling safety-critical traffic interactions as general-sum Markov games solved via Quantal Response Equilibrium (QRE) and evolutionary game dynamics. EvoQRE integrates a pre-trained generative world model with entropy-regularized replicator dynamics, capturing stochastic human behavior while maintaining equilibrium structure. We provide rigorous theoretical results, proving that the proposed dynamics converge to Logit-QRE under a two-timescale stochastic approximation with an explicit convergence rate of O(log k / k^{1/3}) under weak monotonicity assumptions. We further extend QRE to continuous action spaces using mixture-based and energy-based policy representations. Experiments on the Waymo Open Motion Dataset and nuPlan benchmark demonstrate that EvoQRE achieves state-of-the-art realism, improved safety metrics, and controllable generation of diverse safety-critical scenarios through interpretable rationality parameters.
comment: 11 pages, 5 figures
Conformity Dynamics in LLM Multi-Agent Systems: The Roles of Topology and Self-Social Weighting
Large Language Models (LLMs) are increasingly instantiated as interacting agents in multi-agent systems (MAS), where collective decisions emerge through social interaction rather than independent reasoning. A fundamental yet underexplored mechanism in this process is conformity, the tendency of agents to align their judgments with prevailing group opinions. This paper presents a systematic study of how network topology shapes conformity dynamics in LLM-based MAS through a misinformation detection task. We introduce a confidence-normalized pooling rule that controls the trade-off between self-reliance and social influence, enabling comparisons between two canonical decision paradigms: Centralized Aggregation and Distributed Consensus. Experimental results demonstrate that network topology critically governs both the efficiency and robustness of collective judgments. Centralized structures enable immediate decisions but are sensitive to hub competence and exhibit same-model alignment biases. In contrast, distributed structures promote more robust consensus, while increased network connectivity speeds up convergence but also heightens the risk of wrong-but-sure cascades, in which agents converge on incorrect decisions with high confidence. These findings characterize the conformity dynamics in LLM-based MAS, clarifying how network topology and self-social weighting jointly shape the efficiency, robustness, and failure modes of collective decision-making.
comment: Under Review
Crisis-Bench: Benchmarking Strategic Ambiguity and Reputation Management in Large Language Models
Standard safety alignment optimizes Large Language Models (LLMs) for universal helpfulness and honesty, effectively instilling a rigid "Boy Scout" morality. While robust for general-purpose assistants, this one-size-fits-all ethical framework imposes a "transparency tax" on professional domains requiring strategic ambiguity and information withholding, such as public relations, negotiation, and crisis management. To measure this gap between general safety and professional utility, we introduce Crisis-Bench, a multi-agent Partially Observable Markov Decision Process (POMDP) that evaluates LLMs in high-stakes corporate crises. Spanning 80 diverse storylines across 8 industries, Crisis-Bench tasks an LLM-based Public Relations (PR) Agent with navigating a dynamic 7-day corporate crisis simulation while managing strictly separated Private and Public narrative states to enforce rigorous information asymmetry. Unlike traditional benchmarks that rely on static ground truths, we introduce the Adjudicator-Market Loop: a novel evaluation metric where public sentiment is adjudicated and translated into a simulated stock price, creating a realistic economic incentive structure. Our results expose a critical dichotomy: while some models capitulate to ethical concerns, others demonstrate the capacity for Machiavellian, legitimate strategic withholding in order to stabilize the simulated stock price. Crisis-Bench provides the first quantitative framework for assessing "Reputation Management" capabilities, arguing for a shift from rigid moral absolutism to context-aware professional alignment.
How Exploration Breaks Cooperation in Shared-Policy Multi-Agent Reinforcement Learning
Multi-agent reinforcement learning in dynamic social dilemmas commonly relies on parameter sharing to enable scalability. We show that in shared-policy Deep Q-Network learning, standard exploration can induce a robust and systematic collapse of cooperation even in environments where fully cooperative equilibria are stable and payoff dominant. Through controlled experiments, we demonstrate that shared DQN converges to stable but persistently low-cooperation regimes. This collapse is not caused by reward misalignment, noise, or insufficient training, but by a representational failure arising from partial observability combined with parameter coupling across heterogeneous agent states. Exploration-driven updates bias the shared representation toward locally dominant defection responses, which then propagate across agents and suppress cooperative learning. We confirm that the failure persists across network sizes, exploration schedules, and payoff structures, and disappears when parameter sharing is removed or when agents maintain independent representations. These results identify a fundamental failure mode of shared-policy MARL and establish structural conditions under which scalable learning architectures can systematically undermine cooperation. Our findings provide concrete guidance for the design of multi-agent learning systems in social and economic environments where collective behavior is critical.
comment: 38 pages, 9 figures
Memory Poisoning Attack and Defense on Memory Based LLM-Agents
Large language model agents equipped with persistent memory are vulnerable to memory poisoning attacks, where adversaries inject malicious instructions through query only interactions that corrupt the agents long term memory and influence future responses. Recent work demonstrated that the MINJA (Memory Injection Attack) achieves over 95 % injection success rate and 70 % attack success rate under idealized conditions. However, the robustness of these attacks in realistic deployments and effective defensive mechanisms remain understudied. This work addresses these gaps through systematic empirical evaluation of memory poisoning attacks and defenses in Electronic Health Record (EHR) agents. We investigate attack robustness by varying three critical dimensions: initial memory state, number of indication prompts, and retrieval parameters. Our experiments on GPT-4o-mini, Gemini-2.0-Flash and Llama-3.1-8B-Instruct models using MIMIC-III clinical data reveal that realistic conditions with pre-existing legitimate memories dramatically reduce attack effectiveness. We then propose and evaluate two novel defense mechanisms: (1) Input/Output Moderation using composite trust scoring across multiple orthogonal signals, and (2) Memory Sanitization with trust-aware retrieval employing temporal decay and pattern-based filtering. Our defense evaluation reveals that effective memory sanitization requires careful trust threshold calibration to prevent both overly conservative rejection (blocking all entries) and insufficient filtering (missing subtle attacks), establishing important baselines for future adaptive defense mechanisms. These findings provide crucial insights for securing memory-augmented LLM agents in production environments.
EvidFuse: Writing-Time Evidence Learning for Consistent Text-Chart Data Reporting
Data-driven reports communicate decision-relevant insights by tightly interleaving narrative text with charts grounded in underlying tables. However, current LLM-based systems typically generate narratives and visualizations in staged pipelines, following either a text-first-graph-second or a graph-first-text-second paradigm. These designs often lead to chart-text inconsistency and insight freezing, where the intermediate evidence space becomes fixed and the model can no longer retrieve or construct new visual evidence as the narrative evolves, resulting in shallow and predefined analysis. To address the limitations, we propose \textbf{EvidFuse}, a training-free multi-agent framework that enables writing-time text-chart interleaved generation for data-driven reports. EvidFuse decouples visualization analysis from long-form drafting via two collaborating components: a \textbf{Data-Augmented Analysis Agent}, equipped with Exploratory Data Analysis (EDA)-derived knowledge and access to raw tables, and a \textbf{Real-Time Evidence Construction Writer} that plans an outline and drafts the report while intermittently issuing fine-grained analysis requests. This design allows visual evidence to be constructed and incorporated exactly when the narrative requires it, directly constraining subsequent claims and enabling on-demand expansion of the evidence space. Experiments demonstrate that EvidFuse attains the top rank in both LLM-as-a-judge and human evaluations on chart quality, chart-text alignment, and report-level usefulness.
Simulating Multi-Stakeholder Decision-Making with Generative Agents in Urban Planning
Reaching consensus in urban planning is a complex process often hindered by prolonged negotiations, trade-offs, power dynamics, and competing stakeholder interests, resulting in inefficiencies and inequities. Advances in large language models (LLMs), with their increasing capabilities in knowledge transfer, reasoning, and planning, have enabled the development of multi-generative agent systems, offering a promising approach to simulating discussions and interactions among diverse stakeholders on contentious topics. However, applying such systems also carries significant societal and ethical risks, including misrepresentation, privacy concerns, and biases stemming from opinion convergence among agents, hallucinations caused by insufficient or biased prompts, and the inherent limitations of foundation models. To evaluate the influence of these factors, we incorporate varying levels of real-world survey data and demographic detail to test agents' performance under two decision-making value frameworks: altruism-driven and interest-driven, using a real-world urban rezoning challenge. This approach evaluates the influence of demographic factors such as race, gender, and age on collective decision-making in the design of multi-generative agent systems. Our experimental results reveal that integrating demographic and life-value data enhances the diversity and stability of agent outputs. In addition, communication among generated agents improves the quality of collective reasoning. These findings provide a predictive framework for decision-makers to anticipate stakeholder reactions, including concerns, objections, and support. By enabling iterative refinement of proposals before public release, the simulated approach fosters more equitable and cost-effective decisions in urban planning.
Adversarial Yet Cooperative: Multi-Perspective Reasoning in Retrieved-Augmented Language Models
Recent advances in synergizing large reasoning models (LRMs) with retrieval-augmented generation (RAG) have shown promising results, yet two critical challenges remain: (1) reasoning models typically operate from a single, unchallenged perspective, limiting their ability to conduct deep, self-correcting reasoning over external documents, and (2) existing training paradigms rely excessively on outcome-oriented rewards, which provide insufficient signal for shaping the complex, multi-step reasoning process. To address these issues, we propose an Reasoner-Verifier framework named Adversarial Reasoning RAG (ARR). The Reasoner and Verifier engage in reasoning on retrieved evidence and critiquing each other's logic while being guided by process-aware advantage that requires no external scoring model. This reward combines explicit observational signals with internal model uncertainty to jointly optimize reasoning fidelity and verification rigor. Experiments on multiple benchmarks demonstrate the effectiveness of our method.
Benchmarking LLM-based Agents for Single-cell Omics Analysis
The surge in multimodal single-cell omics data exposes limitations in traditional, manually defined analysis workflows. AI agents offer a paradigm shift, enabling adaptive planning, executable code generation, traceable decisions, and real-time knowledge fusion. However, the lack of a comprehensive benchmark critically hinders progress. We introduce a novel benchmarking evaluation system to rigorously assess agent capabilities in single-cell omics analysis. This system comprises: a unified platform compatible with diverse agent frameworks and LLMs; multidimensional metrics assessing cognitive program synthesis, collaboration, execution efficiency, bioinformatics knowledge integration, and task completion quality; and 50 diverse real-world single-cell omics analysis tasks spanning multi-omics, species, and sequencing technologies. Our evaluation reveals that Grok-3-beta achieves state-of-the-art performance among tested agent frameworks. Multi-agent frameworks significantly enhance collaboration and execution efficiency over single-agent approaches through specialized role division. Attribution analyses of agent capabilities identify that high-quality code generation is crucial for task success, and self-reflection has the most significant overall impact, followed by retrieval-augmented generation (RAG) and planning. This work highlights persistent challenges in code generation, long-context handling, and context-aware knowledge retrieval, providing a critical empirical foundation and best practices for developing robust AI agents in computational biology.
comment: 6 main figures; 13 supplementary figures
Systems and Control (EESS)
Resilient UAV Data Mule via Adaptive Sensor Association under Timing Constraints
Unmanned aerial vehicles (UAVs) can be critical for time-sensitive data collection missions, yet existing research often relies on simulations that fail to capture real-world complexities. Many studies assume ideal wireless conditions or focus only on path planning, neglecting the challenge of making real-time decisions in dynamic environments. To bridge this gap, we address the problem of adaptive sensor selection for a data-gathering UAV, considering both the buffered data at each sensor and realistic propagation conditions. We introduce the Hover-based Greedy Adaptive Download (HGAD) strategy, designed to maximize data transfer by intelligently hovering over sensors during periods of peak signal quality. We validate HGAD using both a digital twin (DT) and a real-world (RW) testbed at the NSF-funded AERPAW platform. Our experiments show that HGAD significantly improves download stability and successfully meets per-sensor data targets. When compared with the traditional Greedy approach that simply follows the strongest signal, HGAD is shown to outperform in the cumulative data download. This work demonstrates the importance of integrating signal-to-noise ratio (SNR)-aware and buffer-aware scheduling with DT and RW signal traces to design resilient UAV data-mule strategies for realistic deployments.
comment: 13 pages
Generalized Spectral Clustering of Low-Inertia Power Networks
Large-scale integration of distributed energy resources has led to a rapid increase in the number of controllable devices and a significant change in system dynamics. This has necessitating the shift towards more distributed and scalable control strategies to manage the increasing system complexity. In this work, we address the problem of partitioning a low-inertia power network into dynamically coherent subsystems to facilitate the utilization of distributed control schemes. We show that an embedding of the power network using the spectrum of the linearized synchronization dynamics matrix results in a natural decomposition of the network. We establish the connection between our approach and the broader framework of spectral clustering using the Laplacian matrix of the admittance network. The proposed method is demonstrated on the IEEE 30-bus test system, and numerical simulations show that the resulting clusters using our approach are dynamically coherent. We consider the robustness of the clusters identified in the network by analyzing the sensitivity of the small eigenvalues and their corresponding eigenspaces, which determines the coherency structure of the oscillator dynamics, to variations in the steady-state operating points of the network.
Modeling and Bifurcation Analysis of Longitudinal Dynamics of an Air-Breathing Hypersonic Vehicle
A nonlinear model of an Air-Breathing Hypersonic Vehicle (ABHV) longitudinal dynamics characterized by coupling of aerodynamic and propulsive terms is presented in this paper. The model is verified using modal analysis carried out around a design operating condition with results available in the literature. Further, parametric dynamic behavior is computed for the model as steady states with local stability with respect to its control inputs, elevator and fuel-equivalence ratio in four different cases using a numerical continuation algorithm. Detailed analysis of the qualitative longitudinal dynamics of the model is carried out based on bifurcation theory methodology. Numerical simulation results are presented to verify bifurcation analysis results.
comment: 22 pages, 14 figures
Explicit Reward Mechanisms for Local Flexibility in Renewable Energy Communities
Incentivizing flexible consumption of end-users is key to maximizing the value of local exchanges within Renewable Energy Communities. If centralized coordination for flexible resources planning raises concerns regarding data privacy and fair benefits distribution, state-of-the-art approaches (e.g., bi-level, ADMM) often face computational complexity and convexity challenges, limiting the precision of embedded flexible models. This work proposes an iterative resolution procedure to solve the decentralized flexibility planning with a central operator as a coordinator within a community. The community operator asks for upward or downward flexibility depending on the global needs, while members can individually react with an offer for flexible capacity. This approach ensures individual optimality while converging towards a global optimum, as validated on a 20-member domestic case study for which the gap in terms of collective bill is not more than 3.5% between the decentralized and centralized coordination schemes.
GenCtrl -- A Formal Controllability Toolkit for Generative Models
As generative models become ubiquitous, there is a critical need for fine-grained control over the generation process. Yet, while controlled generation methods from prompting to fine-tuning proliferate, a fundamental question remains unanswered: are these models truly controllable in the first place? In this work, we provide a theoretical framework to formally answer this question. Framing human-model interaction as a control process, we propose a novel algorithm to estimate the controllable sets of models in a dialogue setting. Notably, we provide formal guarantees on the estimation error as a function of sample complexity: we derive probably-approximately correct bounds for controllable set estimates that are distribution-free, employ no assumptions except for output boundedness, and work for any black-box nonlinear control system (i.e., any generative model). We empirically demonstrate the theoretical framework on different tasks in controlling dialogue processes, for both language models and text-to-image generation. Our results show that model controllability is surprisingly fragile and highly dependent on the experimental setting. This highlights the need for rigorous controllability analysis, shifting the focus from simply attempting control to first understanding its fundamental limits.
LLM-DMD: Large Language Model-based Power System Dynamic Model Discovery
Current model structural discovery methods for power system dynamics impose rigid priors on the basis functions and variable sets of dynamic models while often neglecting algebraic constraints, thereby limiting the formulation of high-fidelity models required for precise simulation and analysis. This letter presents a novel large language model (LLM)-based framework for dynamic model discovery (LLM-DMD) which integrates the reasoning and code synthesis capabilities of LLMs to discover dynamic equations and enforce algebraic constraints through two sequential loops: the differential-equation loop that identifies state dynamics and associated variables, and the algebraic-equation loop that formulates algebraic constraints on the identified algebraic variables. In each loop, executable skeletons of power system dynamic equations are generated by the LLM-based agent and evaluated via gradient-based optimizer. Candidate models are stored in an island-based archive to guide future iterations, and evaluation stagnation activates a variable extension mechanism that augments the model with missing algebraic or input variables, such as stator currents to refine the model. Validation on synchronous generator benchmarks of the IEEE 39-bus system demonstrates the superiority of LLM-DMD in complete dynamic model discovery.
Discrete Homogeneity and Quantizer Design for Nonlinear Homogeneous Control Systems
This paper proposes a framework for analysis of generalized homogeneous control systems under state quantization. In particular, it addresses the challenge of maintaining finite/fixed-time stability of nonlinear systems in the presence of quantized measurements. To analyze the behavior of quantized control system, we introduce a new type of discrete homogeneity, where the dilation is defined by a discrete group. The converse Lyapunov function theorem is established for homogeneous systems with respect to discrete dilations. By extending the notion of sector-boundedness to a homogeneous vector space, we derive a generalized homogeneous sector-boundedness condition that guarantees finite/fixed-time stability of nonlinear control system under quantized measurements. A geometry-aware homogeneous static vector quantizer is then designed using generalized homogeneous coordinates, enabling an efficient quantization scheme. The resulting homogeneous control system with the proposed quantizer is proven to be homogeneous with respect to discrete dilation and globally finite-time, nearly fixed-time, or exponentially stable, depending on the homogeneity degree. Numerical examples validate the effectiveness of the proposed approach.
SIaD-Tool: A Comprehensive Frequency-Domain Tool for Small-Signal Stability and Interaction Assessment in Modern Power Systems
This paper presents SIaD-Tool, an open-source frequency-domain (FD) scanning solution for stability and interaction assessment in modern power systems. The tool enables multi-sequence identification in the abc, dq0, and 0pn frames and supports both series voltage and parallel current perturbation strategies. A novel perturbation scheme allows direct scanning in the target frame, simplifying the analysis of coupling effects and mirrored frequencies. SIaD-Tool is implemented on a multi-platform architecture, including MATLAB/Simulink and Python-PSCAD/EMTDC. Beyond system identification, it integrates automated stability evaluation through four standardized methods: Generalized Nyquist Criterion (GNC), modal impedance analysis, phase margin assessment, and passivity checks. Validation is carried out via extensive case studies involving passive elements, grid-following and grid-forming converters, offshore wind power plants, and the IEEE 9-bus system. Results confirm high accuracy, scalability, and robustness in detecting critical modes, interaction frequencies, oscillatory behavior, and stability margins.
comment: IEEE Transactions on Power Delivery
How Carbon Border Adjustment Mechanism is Energizing the EU Carbon Market and Industrial Transformation
The global carbon market is fragmented and characterized by limited pricing transparency and empirical evidence, creating challenges for investors and policymakers in identifying carbon management opportunities. The European Union is among several regions that have implemented emissions pricing through an Emissions Trading System (EU ETS). While the EU ETS has contributed to emissions reductions, it has also raised concerns related to international competitiveness and carbon leakage, particularly given the strong integration of EU industries into global value chains. To address these challenges, the European Commission proposed the Carbon Border Adjustment Mechanism (CBAM) in 2021. CBAM is designed to operate alongside the EU ETS by applying a carbon price to selected imported goods, thereby aligning carbon costs between domestic and foreign producers. It will gradually replace existing carbon leakage mitigation measures, including the allocation of free allowances under the EU ETS. The initial scope of CBAM covers electricity, cement, fertilizer, aluminium, iron, and steel. As climate policies intensify under the Paris Agreement, CBAM-like mechanisms are expected to play an increasingly important role in managing carbon-related trade risks and supporting the transition to net zero emissions.
comment: 17 Pages; 4 Figures
Pivot-Only Azimuthal Control and Attitude Estimation of Balloon-borne Payloads SC
This paper presents an attitude estimation and yaw-rate control framework for balloon-borne payloads using pivot-only actuation, motivated by the Taurus experiment. Taurus is a long-duration balloon instrument designed for rapid azimuthal scanning at approximately 30 deg/s using a motorized pivot at the flight-train connection, without a reaction wheel. We model the gondola as a rigid body subject to realistic disturbances and sensing limitations, and implement a Multiplicative Extended Kalman Filter (MEKF) that estimates attitude and gyroscope bias by fusing inertial and vector-camera measurements. A simple PI controller uses the estimated states to regulate yaw rate. Numerical simulations incorporating representative disturbance and measurement noise levels are used to evaluate closed-loop control performance and MEKF behavior under flight-like conditions. Experimental tests on the Taurus gondola validate the pivot-only approach, demonstrating stable high-rate tracking under realistic hardware constraints. The close agreement between simulation and experiment indicates that the simplified rigid-body model captures the dominant dynamics relevant for controller design and integrated estimation-and-control development.
comment: AIAA SCITECH 2026 Forum
Selection-Induced Contraction of Innovation Statistics in Gated Kalman Filters
Validation gating is a fundamental component of classical Kalman-based tracking systems. Only measurements whose normalized innovation squared (NIS) falls below a prescribed threshold are considered for state update. While this procedure is statistically motivated by the chi-square distribution, it implicitly replaces the unconditional innovation process with a conditionally observed one, restricted to the validation event. This paper shows that innovation statistics computed after gating converge to gate-conditioned rather than nominal quantities. Under classical linear--Gaussian assumptions, we derive exact expressions for the first- and second-order moments of the innovation conditioned on ellipsoidal gating, and show that gating induces a deterministic, dimension-dependent contraction of the innovation covariance. The analysis is extended to NN association, which is shown to act as an additional statistical selection operator. We prove that selecting the minimum-norm innovation among multiple in-gate measurements introduces an unavoidable energy contraction, implying that nominal innovation statistics cannot be preserved under nontrivial gating and association. Closed-form results in the two-dimensional case quantify the combined effects and illustrate their practical significance.
comment: 9 pages, preprint
Communication-Efficient Stochastic Distributed Learning
We address distributed learning problems, both nonconvex and convex, over undirected networks. In particular, we design a novel algorithm based on the distributed Alternating Direction Method of Multipliers (ADMM) to address the challenges of high communication costs, and large datasets. Our design tackles these challenges i) by enabling the agents to perform multiple local training steps between each round of communications; and ii) by allowing the agents to employ stochastic gradients while carrying out local computations. We show that the proposed algorithm converges to a neighborhood of a stationary point, for nonconvex problems, and of an optimal point, for convex problems. We also propose a variant of the algorithm to incorporate variance reduction thus achieving exact convergence. We show that the resulting algorithm indeed converges to a stationary (or optimal) point, and moreover that local training accelerates convergence. We thoroughly compare the proposed algorithms with the state of the art, both theoretically and through numerical results.
Orthogonal-by-construction augmentation of physics-based input-output models
This paper proposes a novel orthogonal-by-construction parametrization for augmenting physics-based input-output models with a learning component in an additive sense. The parametrization allows to jointly optimize the parameters of the physics-based model and the learning component. Unlike the commonly applied additive (parallel) augmentation structure, the proposed formulation eliminates overlap in representation of the system dynamics, thereby preserving the uniqueness of the estimated physical parameters, ultimately leading to enhanced model interpretability. By theoretical analysis, we show that, under mild conditions, the method is statistically consistent and guarantees recovery of the true physical parameters. With further analysis regarding the asymptotic covariance matrix of the identified parameters, we also prove that the proposed structure provides a clear separation between the physics-based and learning components of the augmentation structure. The effectiveness of the proposed approach is demonstrated through simulation studies, showing accurate reproduction of the data-generating dynamics without sacrificing consistent estimation of the physical parameters.
comment: Submitted for publication
Hierarchical GNN-Based Multi-Agent Learning for Dynamic Queue-Jump Lane and Emergency Vehicle Corridor Formation
Emergency vehicles require rapid passage through congested traffic, yet existing strategies fail to adapt to dynamic conditions. We propose a novel hierarchical graph neural network (GNN)-based multi-agent reinforcement learning framework to coordinate connected vehicles for emergency corridor formation. Our approach uses a high-level planner for global strategy and low-level controllers for trajectory execution, utilizing graph attention networks to scale with variable agent counts. Trained via Multi-Agent Proximal Policy Optimization (MAPPO), the system reduces emergency vehicle travel time by 28.3% compared to baselines and 44.6% compared to uncoordinated traffic in simulations. The design achieves near-zero collision rates (0.3%) while maintaining 81% of background traffic efficiency. Ablation and generalization studies confirm the framework's robustness across diverse scenarios. These results demonstrate the effectiveness of combining GNNs with hierarchical learning for intelligent transportation systems.
comment: 16 Pages, 5 Figures, 9 Tables, submitted to IEEE TITS
Model-free Adaptive Output Feedback Vibration Suppression in a Cantilever Beam
This paper presents a model-free adaptive control approach to suppress vibrations in a cantilevered beam excited by an unknown disturbance. The cantilevered beam under harmonic excitation is modeled using a lumped parameter approach. Based on retrospective cost optimization, a sampled-data adaptive controller is developed to suppress vibrations caused by external disturbances. Both displacement and acceleration measurements are considered for feedback. Since acceleration measurements are more sensitive to spillover, which excites higher frequency modes, a filter is developed to extract key displacement information from the acceleration data and enhance suppression performance. The vibration suppression performance is compared using both displacement and acceleration measurements.
comment: 16 pages, 14 figures, to be presented at Scitech 2026, uploaded new version that corrects some mistakes in the paper
Battery State of Health Estimation and Incremental Capacity Analysis under Dynamic Charging Profile Using Neural Networks
Incremental capacity analysis (ICA) and differential voltage analysis (DVA) are two effective approaches for battery degradation monitoring. One limiting factor for their real-world application is that they require constant-current (CC) charging profiles. This research removes this limitation and proposes an approach that extends ICA/DVA-based degradation monitoring from CC charging to dynamic charging profiles. A novel concept of virtual incremental capacity (VIC) and virtual differential voltage (VDV) is proposed. Then, two related convolutional neural networks (CNNs), called U-Net and Conv-Net, are proposed to construct VIC/VDV curves and estimate the state of health (SOH) from dynamic charging profiles across any state-of-charge (SOC) range that satisfies some constraints. Finally, two CNNs called Mobile U-Net and Mobile-Net are proposed as replacements for the U-Net and Conv-Net, respectively, to reduce the computational footprint and memory requirements, while keeping similar performance. Using an extensive experimental dataset of battery modules, the proposed CNNs are demonstrated to provide accurate VIC/VDV curves and enable ICA/DVA-based battery degradation monitoring under various fast-charging protocols and different SOC ranges.
comment: Addressed a lot of reviewer comments; Modified title and addressed reviewer comments
Robotics
LaST$_{0}$: Latent Spatio-Temporal Chain-of-Thought for Robotic Vision-Language-Action Model
Vision-Language-Action (VLA) models have recently demonstrated strong generalization capabilities in robotic manipulation. Some existing VLA approaches attempt to improve action accuracy by explicitly generating linguistic reasoning traces or future visual observations before action execution. However, explicit reasoning typically incurs non-negligible inference latency, which constrains the temporal resolution required for robotic manipulation. Moreover, such reasoning is confined to the linguistic space, imposing a representational bottleneck that struggles to faithfully capture ineffable physical attributes. To mitigate these limitations, we propose LaST$_0$, a framework that enables efficient reasoning before acting through a Latent Spatio-Temporal Chain-of-Thought (CoT), capturing fine-grained physical and robotic dynamics that are often difficult to verbalize. Specifically, we introduce a token-efficient latent CoT space that models future visual dynamics, 3D structural information, and robot proprioceptive states, and further extends these representations across time to enable temporally consistent implicit reasoning trajectories. Furthermore, LaST$_0$ adopts a dual-system architecture implemented via a Mixture-of-Transformers design, where a reasoning expert conducts low-frequency latent inference and an acting expert generates high-frequency actions conditioned on robotics-oriented latent representations. To facilitate coordination, LaST$_0$ is trained with heterogeneous operation frequencies, enabling adaptive switching between reasoning and action inference rates during deployment. Across ten simulated and six real-world manipulation tasks, LaST$_0$ improves mean success rates by 8% and 13% over prior VLA methods, respectively, while achieving substantially faster inference. Project website: https://sites.google.com/view/last0
Generate, Transfer, Adapt: Learning Functional Dexterous Grasping from a Single Human Demonstration
Functional grasping with dexterous robotic hands is a key capability for enabling tool use and complex manipulation, yet progress has been constrained by two persistent bottlenecks: the scarcity of large-scale datasets and the absence of integrated semantic and geometric reasoning in learned models. In this work, we present CorDex, a framework that robustly learns dexterous functional grasps of novel objects from synthetic data generated from just a single human demonstration. At the core of our approach is a correspondence-based data engine that generates diverse, high-quality training data in simulation. Based on the human demonstration, our data engine generates diverse object instances of the same category, transfers the expert grasp to the generated objects through correspondence estimation, and adapts the grasp through optimization. Building on the generated data, we introduce a multimodal prediction network that integrates visual and geometric information. By devising a local-global fusion module and an importance-aware sampling mechanism, we enable robust and computationally efficient prediction of functional dexterous grasps. Through extensive experiments across various object categories, we demonstrate that CorDex generalizes well to unseen object instances and significantly outperforms state-of-the-art baselines.
comment: Project Page: https://cordex-manipulation.github.io/
RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation
The diversity, quantity, and quality of manipulation data are critical for training effective robot policies. However, due to hardware and physical setup constraints, collecting large-scale real-world manipulation data remains difficult to scale across diverse environments. Recent work uses text-prompt conditioned image diffusion models to augment manipulation data by altering the backgrounds and tabletop objects in the visual observations. However, these approaches often overlook the practical need for multi-view and temporally coherent observations required by state-of-the-art policy models. Further, text prompts alone cannot reliably specify the scene setup. To provide the diffusion model with explicit visual guidance, we introduce visual identity prompting, which supplies exemplar images as conditioning inputs to guide the generation of the desired scene setup. To this end, we also build a scalable pipeline to curate a visual identity pool from large robotics datasets. Using our augmented manipulation data to train downstream vision-language-action and visuomotor policy models yields consistent performance gains in both simulation and real-robot settings.
Driving on Registers
We present DrivoR, a simple and efficient transformer-based architecture for end-to-end autonomous driving. Our approach builds on pretrained Vision Transformers (ViTs) and introduces camera-aware register tokens that compress multi-camera features into a compact scene representation, significantly reducing downstream computation without sacrificing accuracy. These tokens drive two lightweight transformer decoders that generate and then score candidate trajectories. The scoring decoder learns to mimic an oracle and predicts interpretable sub-scores representing aspects such as safety, comfort, and efficiency, enabling behavior-conditioned driving at inference. Despite its minimal design, DrivoR outperforms or matches strong contemporary baselines across NAVSIM-v1, NAVSIM-v2, and the photorealistic closed-loop HUGSIM benchmark. Our results show that a pure-transformer architecture, combined with targeted token compression, is sufficient for accurate, efficient, and adaptive end-to-end driving. Code and checkpoints will be made available via the project page.
Compensation Effect Amplification Control (CEAC): A movement-based approach for coordinated position and velocity control of the elbow of upper-limb prostheses
Despite advances in upper-limb (UL) prosthetic design, achieving intuitive control of intermediate joints - such as the wrist and elbow - remains challenging, particularly for continuous and velocity-modulated movements. We introduce a novel movement-based control paradigm entitled Compensation Effect Amplification Control (CEAC) that leverages users' trunk flexion and extension as input for controlling prosthetic elbow velocity. Considering that the trunk can be both a functional and compensatory joint when performing upper-limb actions, CEAC amplifies the natural coupling between trunk and prosthesis while introducing a controlled delay that allows users to modulate both the position and velocity of the prosthetic joint. We evaluated CEAC in a generic drawing task performed by twelve able-bodied participants using a supernumerary prosthesis with an active elbow. Additionally a multiple-target-reaching task was performed by a subset of ten participants. Results demonstrate task performances comparable to those obtained with natural arm movements, even when gesture velocity or drawing size were varied, while maintaining ergonomic trunk postures. Analysis revealed that CEAC effectively restores joint coordinated action, distributes movement effort between trunk and elbow, enabling intuitive trajectory control without requiring extreme compensatory movements. Overall, CEAC offers a promising control strategy for intermediate joints of UL prostheses, particularly in tasks requiring continuous and precise coordination.
DAVOS: An Autonomous Vehicle Operating System in the Vehicle Computing Era
Vehicle computing represents a fundamental shift in how autonomous vehicles are designed and deployed, transforming them from isolated transportation systems into mobile computing platforms that support both safety-critical, real-time driving and data-centric services. In this setting, vehicles simultaneously support real-time driving pipelines and a growing set of data-driven applications, placing increased responsibility on the vehicle operating system to coordinate computation, data movement, storage, and access. These demands highlight recurring system considerations related to predictable execution, data and execution protection, efficient handling of high-rate sensor data, and long-term system evolvability, commonly summarized as Safety, Security, Efficiency, and Extensibility (SSEE). Existing vehicle operating systems and runtimes address these concerns in isolation, resulting in fragmented software stacks that limit coordination between autonomy workloads and vehicle data services. This paper presents DAVOS, the Delaware Autonomous Vehicle Operating System, a unified vehicle operating system architecture designed for the vehicle computing context. DAVOS provides a cohesive operating system foundation that supports both real-time autonomy and extensible vehicle computing within a single system framework.
The RoboSense Challenge: Sense Anything, Navigate Anywhere, Adapt Across Platforms IROS 2025
Autonomous systems are increasingly deployed in open and dynamic environments -- from city streets to aerial and indoor spaces -- where perception models must remain reliable under sensor noise, environmental variation, and platform shifts. However, even state-of-the-art methods often degrade under unseen conditions, highlighting the need for robust and generalizable robot sensing. The RoboSense 2025 Challenge is designed to advance robustness and adaptability in robot perception across diverse sensing scenarios. It unifies five complementary research tracks spanning language-grounded decision making, socially compliant navigation, sensor configuration generalization, cross-view and cross-modal correspondence, and cross-platform 3D perception. Together, these tasks form a comprehensive benchmark for evaluating real-world sensing reliability under domain shifts, sensor failures, and platform discrepancies. RoboSense 2025 provides standardized datasets, baseline models, and unified evaluation protocols, enabling large-scale and reproducible comparison of robust perception methods. The challenge attracted 143 teams from 85 institutions across 16 countries, reflecting broad community engagement. By consolidating insights from 23 winning solutions, this report highlights emerging methodological trends, shared design principles, and open challenges across all tracks, marking a step toward building robots that can sense reliably, act robustly, and adapt across platforms in real-world environments.
comment: Official IROS 2025 RoboSense Challenge Report; 51 pages, 37 figures, 5 tables; Competition Website at https://robosense2025.github.io/
When to Act: Calibrated Confidence for Reliable Human Intention Prediction in Assistive Robotics
Assistive devices must determine both what a user intends to do and how reliable that prediction is before providing support. We introduce a safety-critical triggering framework based on calibrated probabilities for multimodal next-action prediction in Activities of Daily Living. Raw model confidence often fails to reflect true correctness, posing a safety risk. Post-hoc calibration aligns predicted confidence with empirical reliability and reduces miscalibration by about an order of magnitude without affecting accuracy. The calibrated confidence drives a simple ACT/HOLD rule that acts only when reliability is high and withholds assistance otherwise. This turns the confidence threshold into a quantitative safety parameter for assisted actions and enables verifiable behavior in an assistive control loop.
SKATER: Synthesized Kinematics for Advanced Traversing Efficiency on a Humanoid Robot via Roller Skate Swizzles
Although recent years have seen significant progress of humanoid robots in walking and running, the frequent foot strikes with ground during these locomotion gaits inevitably generate high instantaneous impact forces, which leads to exacerbated joint wear and poor energy utilization. Roller skating, as a sport with substantial biomechanical value, can achieve fast and continuous sliding through rational utilization of body inertia, featuring minimal kinetic energy loss. Therefore, this study proposes a novel humanoid robot with each foot equipped with a row of four passive wheels for roller skating. A deep reinforcement learning control framework is also developed for the swizzle gait with the reward function design based on the intrinsic characteristics of roller skating. The learned policy is first analyzed in simulation and then deployed on the physical robot to demonstrate the smoothness and efficiency of the swizzle gait over traditional bipedal walking gait in terms of Impact Intensity and Cost of Transport during locomotion. A reduction of $75.86\%$ and $63.34\%$ of these two metrics indicate roller skating as a superior locomotion mode for enhanced energy efficiency and joint longevity.
Zero Wrench Control via Wrench Disturbance Observer for Learning-free Peg-in-hole Assembly
This paper proposes a Dynamic Wrench Disturbance Observer (DW-DOB) designed to achieve highly sensitive zero-wrench control in contact-rich manipulation. By embedding task-space inertia into the observer nominal model, DW-DOB cleanly separates intrinsic dynamic reactions from true external wrenches. This preserves sensitivity to small forces and moments while ensuring robust regulation of contact wrenches. A passivity-based analysis further demonstrates that DW-DOB guarantees stable interactions under dynamic conditions, addressing the shortcomings of conventional observers that fail to compensate for inertial effects. Peg-in-hole experiments at industrial tolerances (H7/h6) validate the approach, yielding deeper and more compliant insertions with minimal residual wrenches and outperforming a conventional wrench disturbance observer and a PD baseline. These results highlight DW-DOB as a practical learning-free solution for high-precision zero-wrench control in contact-rich tasks.
SeqWalker: Sequential-Horizon Vision-and-Language Navigation with Hierarchical Planning
Sequential-Horizon Vision-and-Language Navigation (SH-VLN) presents a challenging scenario where agents should sequentially execute multi-task navigation guided by complex, long-horizon language instructions. Current vision-and-language navigation models exhibit significant performance degradation with such multi-task instructions, as information overload impairs the agent's ability to attend to observationally relevant details. To address this problem, we propose SeqWalker, a navigation model built on a hierarchical planning framework. Our SeqWalker features: i) A High-Level Planner that dynamically selects global instructions into contextually relevant sub-instructions based on the agent's current visual observations, thus reducing cognitive load; ii) A Low-Level Planner incorporating an Exploration-Verification strategy that leverages the inherent logical structure of instructions for trajectory error correction. To evaluate SH-VLN performance, we also extend the IVLN dataset and establish a new benchmark. Extensive experiments are performed to demonstrate the superiority of the proposed SeqWalker.
Nightmare Dreamer: Dreaming About Unsafe States And Planning Ahead
Reinforcement Learning (RL) has shown remarkable success in real-world applications, particularly in robotics control. However, RL adoption remains limited due to insufficient safety guarantees. We introduce Nightmare Dreamer, a model-based Safe RL algorithm that addresses safety concerns by leveraging a learned world model to predict potential safety violations and plan actions accordingly. Nightmare Dreamer achieves nearly zero safety violations while maximizing rewards. Nightmare Dreamer outperforms model-free baselines on Safety Gymnasium tasks using only image observations, achieving nearly a 20x improvement in efficiency.
comment: RSS'25: Multi-Objective Optimization and Planning in Robotics Workshop: 5 pages, 8 figures
Optimizing Path Planning using Deep Reinforcement Learning for UGVs in Precision Agriculture
This study focuses on optimizing path planning for unmanned ground vehicles (UGVs) in precision agriculture using deep reinforcement learning (DRL) techniques in continuous action spaces. The research begins with a review of traditional grid-based methods, such as A* and Dijkstra's algorithms, and discusses their limitations in dynamic agricultural environments, highlighting the need for adaptive learning strategies. The study then explores DRL approaches, including Deep Q-Networks (DQN), which demonstrate improved adaptability and performance in two-dimensional simulations. Enhancements such as Double Q-Networks and Dueling Networks are evaluated to further improve decision-making. Building on these results, the focus shifts to continuous action space models, specifically Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3), which are tested in increasingly complex environments. Experiments conducted in a three-dimensional environment using ROS and Gazebo demonstrate the effectiveness of continuous DRL algorithms in navigating dynamic agricultural scenarios. Notably, the pretrained TD3 agent achieves a 95 percent success rate in dynamic environments, demonstrating the robustness of the proposed approach in handling moving obstacles while ensuring safety for both crops and the robot.
Model of Spatial Human-Agent Interaction with Consideration for Others
Communication robots often need to initiate conversations with people in public spaces. At the same time, such robots must not disturb pedestrians. To handle these two requirements, an agent needs to estimate the communication desires of others based on their behavior and then adjust its own communication activities accordingly. In this study, we construct a computational spatial interaction model that considers others. Consideration is expressed as a quantitative parameter: the amount of adjustment of one's internal state to the estimated internal state of the other. To validate the model, we experimented with a human and a virtual robot interacting in a VR environment. The results show that when the participant moves to the target, a virtual robot with a low consideration value inhibits the participant's movement, while a robot with a higher consideration value did not inhibit the participant's movement. When the participant approached the robot, the robot also exhibited approaching behavior, regardless of the consideration value, thus decreasing the participant's movement. These results appear to verify the proposed model's ability to clarify interactions with consideration for others.
UniBiDex: A Unified Teleoperation Framework for Robotic Bimanual Dexterous Manipulation
We present UniBiDex a unified teleoperation framework for robotic bimanual dexterous manipulation that supports both VRbased and leaderfollower input modalities UniBiDex enables realtime contactrich dualarm teleoperation by integrating heterogeneous input devices into a shared control stack with consistent kinematic treatment and safety guarantees The framework employs nullspace control to optimize bimanual configurations ensuring smooth collisionfree and singularityaware motion across tasks We validate UniBiDex on a longhorizon kitchentidying task involving five sequential manipulation subtasks demonstrating higher task success rates smoother trajectories and improved robustness compared to strong baselines By releasing all hardware and software components as opensource we aim to lower the barrier to collecting largescale highquality human demonstration datasets and accelerate progress in robot learning.
Discrete Fourier Transform-based Point Cloud Compression for Efficient SLAM in Featureless Terrain
Simultaneous Localization and Mapping (SLAM) is an essential technology for the efficiency and reliability of unmanned robotic exploration missions. While the onboard computational capability and communication bandwidth are critically limited, the point cloud data handled by SLAM is large in size, attracting attention to data compression methods. To address such a problem, in this paper, we propose a new method for compressing point cloud maps by exploiting the Discrete Fourier Transform (DFT). The proposed technique converts the Digital Elevation Model (DEM) to the frequency-domain 2D image and omits its high-frequency components, focusing on the exploration of gradual terrains such as planets and deserts. Unlike terrains with detailed structures such as artificial environments, high-frequency components contribute little to the representation of gradual terrains. Thus, this method is effective in compressing data size without significant degradation of the point cloud. We evaluated the method in terms of compression rate and accuracy using camera sequences of two terrains with different elevation profiles.
comment: Author's version of a manuscript accepted at the 11th International Conference on Automation, Robotics, and Applications (ICARA). (c) IEEE
Data-Driven Terramechanics Approach Towards a Realistic Real-Time Simulator for Lunar Rovers
High-fidelity simulators for the lunar surface provide a digital environment for extensive testing of rover operations and mission planning. However, current simulators focus on either visual realism or physical accuracy, which limits their capability to replicate lunar conditions comprehensively. This work addresses that gap by combining high visual fidelity with realistic terrain interaction for a realistic representation of rovers on the lunar surface. Because direct simulation of wheel-soil interactions is computationally expensive, a data-driven approach was adopted, using regression models for slip and sinkage from data collected in both full-rover and single-wheel experiments and simulations. The resulting regression-based terramechanics model accurately reproduced steady-state and dynamic slip, as well as sinkage behavior, on flat terrain and slopes up to 20 degrees, with validation against field test results. Additionally, improvements were made to enhance the realism of terrain deformation and wheel trace visualization. This method supports real-time applications that require physically plausible terrain response alongside high visual fidelity.
comment: Author's version of a manuscript accepted at the International Conference on Space Robotics 2025 (iSpaRo 2025). (c) IEEE
Design and Development of Modular Limbs for Reconfigurable Robots on the Moon
In this paper, we present the development of 4-DOF robot limbs, which we call Moonbots, designed to connect in various configurations with each other and wheel modules, enabling adaptation to different environments and tasks. These modular components are intended primarily for robotic systems in space exploration and construction on the Moon in our Moonshot project. Such modular robots add flexibility and versatility for space missions where resources are constrained. Each module is driven by a common actuator characterized by a high torque-to-speed ratio, supporting both precise control and dynamic motion when required. This unified actuator design simplifies development and maintenance across the different module types. The paper describes the hardware implementation, the mechanical design of the modules, and the overall software architecture used to control and coordinate them. Additionally, we evaluate the control performance of the actuator under various load conditions to characterize its suitability for modular robot applications. To demonstrate the adaptability of the system, we introduce nine functional configurations assembled from the same set of modules: 4DOF-limb, 8DOF-limb, vehicle, dragon, minimal, quadruped, cargo, cargo-minimal, and bike. These configurations reflect different locomotion strategies and task-specific behaviors, offering a practical foundation for further research in reconfigurable robotic systems.
comment: Author's version of a manuscript accepted at the International Conference on Space Robotics 2025 (iSpaRo 2025). (c) IEEE
Multiagent Reinforcement Learning with Neighbor Action Estimation
Multiagent reinforcement learning, as a prominent intelligent paradigm, enables collaborative decision-making within complex systems. However, existing approaches often rely on explicit action exchange between agents to evaluate action value functions, which is frequently impractical in real-world engineering environments due to communication constraints, latency, energy consumption, and reliability requirements. From an artificial intelligence perspective, this paper proposes an enhanced multiagent reinforcement learning framework that employs action estimation neural networks to infer agent behaviors. By integrating a lightweight action estimation module, each agent infers neighboring agents' behaviors using only locally observable information, enabling collaborative policy learning without explicit action sharing. This approach is fully compatible with standard TD3 algorithms and scalable to larger multiagent systems. At the engineering application level, this framework has been implemented and validated in dual-arm robotic manipulation tasks: two robotic arms collaboratively lift objects. Experimental results demonstrate that this approach significantly enhances the robustness and deployment feasibility of real-world robotic systems while reducing dependence on information infrastructure. Overall, this research advances the development of decentralized multiagent artificial intelligence systems while enabling AI to operate effectively in dynamic, information-constrained real-world environments.
Fast Continuum Robot Shape and External Load State Estimation on SE(3) ICRA 2026
Previous on-manifold approaches to continuum robot state estimation have typically adopted simplified Cosserat rod models, which cannot directly account for actuation inputs or external loads. We introduce a general framework that incorporates uncertainty models for actuation (e.g., tendon tensions), applied forces and moments, process noise, boundary conditions, and arbitrary backbone measurements. By adding temporal priors across time steps, our method additionally performs joint estimation in both the spatial (arclength) and temporal domains, enabling full \textit{spacetime} state estimation. Discretizing the arclength domain yields a factor graph representation of the continuum robot model, which can be exploited for fast batch sparse nonlinear optimization in the style of SLAM. The framework is general and applies to a broad class of continuum robots; as illustrative cases, we show (i) tendon-driven robots in simulation, where we demonstrate real-time kinematics with uncertainty, tip force sensing from position feedback, and distributed load estimation from backbone strain, and (ii) a surgical concentric tube robot in experiment, where we validate accurate kinematics and tip force estimation, highlighting potential for surgical palpation.
comment: Public preprint for ICRA 2026
Inverting Non-Injective Functions with Twin Neural Network Regression
Non-injective functions are not invertible. However, non-injective functions can be restricted to sub-domains on which they are locally injective and surjective and thus invertible if the dimensionality between input and output spaces are the same. Further, even if the dimensionalities do not match it is often possible to choose a preferred solution from many possible solutions. Twin neural network regression is naturally capable of incorporating these properties to invert non-injective functions. Twin neural network regression is trained to predict adjustments to well known input variables $\mathbf{x}^{\text{anchor}}$ to obtain an estimate for an unknown $\mathbf{x}^{\text{new}}$ under a change of the target variable from $\mathbf{y}^{\text{anchor}}$ to $\mathbf{y}^{\text{new}}$. In combination with k-nearest neighbor search, I propose a deterministic framework that finds input parameters to a given target variable of non-injective functions. The method is demonstrated by inverting non-injective functions describing toy problems and robot arm control that are a) defined by data or b) known as mathematical formula.
PRISM: Protocol Refinement through Intelligent Simulation Modeling SC
Automating experimental protocol design and execution remains as a fundamental bottleneck in realizing self-driving laboratories. We introduce PRISM (Protocol Refinement through Intelligent Simulation Modeling), a framework that automates the design, validation, and execution of experimental protocols on a laboratory platform composed of off-the-shelf robotic instruments. PRISM uses a set of language-model-based agents that work together to generate and refine experimental steps. The process begins with automatically gathering relevant procedures from web-based sources describing experimental workflows. These are converted into structured experimental steps (e.g., liquid handling steps, deck layout and other related operations) through a planning, critique, and validation loop. The finalized steps are translated into the Argonne MADSci protocol format, which provides a unified interface for coordinating multiple robotic instruments (Opentrons OT-2 liquid handler, PF400 arm, Azenta plate sealer and peeler) without requiring human intervention between steps. To evaluate protocol-generation performance, we benchmarked both single reasoning models and multi-agent workflow across constrained and open-ended prompting paradigms. The resulting protocols were validated in a digital-twin environment built in NVIDIA Omniverse to detect physical or sequencing errors before execution. Using Luna qPCR amplification and Cell Painting as case studies, we demonstrate PRISM as a practical end-to-end workflow that bridges language-based protocol generation, simulation-based validation, and automated robotic execution.
comment: 43 pages, 8 figures, submitted to RSC Digital Discovery. Equal contribution: B. Hsu, P.V. Setty, R.M. Butler. Corresponding author: A. Ramanathan
Intent at a Glance: Gaze-Guided Robotic Manipulation via Foundation Models
Designing intuitive interfaces for robotic control remains a central challenge in enabling effective human-robot interaction, particularly in assistive care settings. Eye gaze offers a fast, non-intrusive, and intent-rich input modality, making it an attractive channel for conveying user goals. In this work, we present GAMMA (Gaze Assisted Manipulation for Modular Autonomy), a system that leverages ego-centric gaze tracking and a vision-language model to infer user intent and autonomously execute robotic manipulation tasks. By contextualizing gaze fixations within the scene, the system maps visual attention to high-level semantic understanding, enabling skill selection and parameterization without task-specific training. We evaluate GAMMA on a range of table-top manipulation tasks and compare it against baseline gaze-based control without reasoning. Results demonstrate that GAMMA provides robust, intuitive, and generalizable control, highlighting the potential of combining foundation models and gaze for natural and scalable robot autonomy. Project website: https://gamma0.vercel.app/
comment: Accepted to 2025 RSS Robot Planning in the Era of Foundation Models (FM4RoboPlan) Workshop
$π_0$: A Vision-Language-Action Flow Model for General Robot Control
Robot learning holds tremendous promise to unlock the full potential of flexible, general, and dexterous robot systems, as well as to address some of the deepest questions in artificial intelligence. However, bringing robot learning to the level of generality required for effective real-world systems faces major obstacles in terms of data, generalization, and robustness. In this paper, we discuss how generalist robot policies (i.e., robot foundation models) can address these challenges, and how we can design effective generalist robot policies for complex and highly dexterous tasks. We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge. We then discuss how this model can be trained on a large and diverse dataset from multiple dexterous robot platforms, including single-arm robots, dual-arm robots, and mobile manipulators. We evaluate our model in terms of its ability to perform tasks in zero shot after pre-training, follow language instructions from people and from a high-level VLM policy, and its ability to acquire new skills via fine-tuning. Our results cover a wide variety of tasks, such as laundry folding, table cleaning, and assembling boxes.
comment: See project website for videos: https://physicalintelligence.company/blog/pi0 Published in RSS 2025
Uncertainty-Aware Robotic World Model Makes Offline Model-Based Reinforcement Learning Work on Real Robots
Reinforcement Learning (RL) has achieved impressive results in robotics, yet high-performing pipelines remain highly task-specific, with little reuse of prior data. Offline Model-based RL (MBRL) offers greater data efficiency by training policies entirely from existing datasets, but suffers from compounding errors and distribution shift in long-horizon rollouts. Although existing methods have shown success in controlled simulation benchmarks, robustly applying them to the noisy, biased, and partially observed datasets typical of real-world robotics remains challenging. We present a principled pipeline for making offline MBRL effective on physical robots. Our RWM-U extends autoregressive world models with epistemic uncertainty estimation, enabling temporally consistent multi-step rollouts with uncertainty effectively propagated over long horizons. We combine RWM-U with MOPO-PPO, which adapts uncertainty-penalized policy optimization to the stable, on-policy PPO framework for real-world control. We evaluate our approach on diverse manipulation and locomotion tasks in simulation and on real quadruped and humanoid, training policies entirely from offline datasets. The resulting policies consistently outperform model-free and uncertainty-unaware model-based baselines, and fusing real-world data in model learning further yields robust policies that surpass online model-free baselines trained solely in simulation.
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
We propose ReinFlow, a simple yet effective online reinforcement learning (RL) framework that fine-tunes a family of flow matching policies for continuous robotic control. Derived from rigorous RL theory, ReinFlow injects learnable noise into a flow policy's deterministic path, converting the flow into a discrete-time Markov Process for exact and straightforward likelihood computation. This conversion facilitates exploration and ensures training stability, enabling ReinFlow to fine-tune diverse flow model variants, including Rectified Flow [35] and Shortcut Models [19], particularly at very few or even one denoising step. We benchmark ReinFlow in representative locomotion and manipulation tasks, including long-horizon planning with visual input and sparse reward. The episode reward of Rectified Flow policies obtained an average net growth of 135.36% after fine-tuning in challenging legged locomotion tasks while saving denoising steps and 82.63% of wall time compared to state-of-the-art diffusion RL fine-tuning method DPPO [43]. The success rate of the Shortcut Model policies in state and visual manipulation tasks achieved an average net increase of 40.34% after fine-tuning with ReinFlow at four or even one denoising step, whose performance is comparable to fine-tuned DDIM policies while saving computation time for an average of 23.20%. Project webpage: https://reinflow.github.io/
comment: 38 pages
ImagineNav++: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
Visual navigation is a fundamental capability for autonomous home-assistance robots, enabling long-horizon tasks such as object search. While recent methods have leveraged Large Language Models (LLMs) to incorporate commonsense reasoning and improve exploration efficiency, their planning remains constrained by textual representations, which cannot adequately capture spatial occupancy or scene geometry--critical factors for navigation decisions. We explore whether Vision-Language Models (VLMs) can achieve mapless visual navigation using only onboard RGB/RGB-D streams, unlocking their potential for spatial perception and planning. We achieve this through an imagination-powered navigation framework, ImagineNav++, which imagines future observation images from candidate robot views and translates navigation planning into a simple best-view image selection problem for VLMs. First, a future-view imagination module distills human navigation preferences to generate semantically meaningful viewpoints with high exploration potential. These imagined views then serve as visual prompts for the VLM to identify the most informative viewpoint. To maintain spatial consistency, we develop a selective foveation memory mechanism, which hierarchically integrates keyframe observations via a sparse-to-dense framework, constructing a compact yet comprehensive memory for long-term spatial reasoning. This approach transforms goal-oriented navigation into a series of tractable point-goal navigation tasks. Extensive experiments on open-vocabulary object and instance navigation benchmarks show that ImagineNav++ achieves SOTA performance in mapless settings, even surpassing most map-based methods, highlighting the importance of scene imagination and memory in VLM-based spatial reasoning.
comment: 17 pages, 10 figures. arXiv admin note: text overlap with arXiv:2410.09874
Cognitive-Hierarchy Guided End-to-End Planning for Autonomous Driving
While end-to-end autonomous driving has advanced significantly, prevailing methods remain fundamentally misaligned with human cognitive principles in both perception and planning. In this paper, we propose CogAD, a novel end-to-end autonomous driving model that emulates the hierarchical cognition mechanisms of human drivers. CogAD implements dual hierarchical mechanisms: global-to-local context processing for human-like perception and intent-conditioned multi-mode trajectory generation for cognitively-inspired planning. The proposed method demonstrates three principal advantages: comprehensive environmental understanding through hierarchical perception, robust planning exploration enabled by multi-level planning, and diverse yet reasonable multi-modal trajectory generation facilitated by dual-level uncertainty modeling. Extensive experiments on nuScenes and Bench2Drive demonstrate that CogAD achieves state-of-the-art performance in end-to-end planning, exhibiting particular superiority in long-tail scenarios and robust generalization to complex real-world driving conditions.
Solving Robotics Tasks with Prior Demonstration via Exploration-Efficient Deep Reinforcement Learning
This paper proposes an exploration-efficient Deep Reinforcement Learning with Reference policy (DRLR) framework for learning robotics tasks that incorporates demonstrations. The DRLR framework is developed based on an algorithm called Imitation Bootstrapped Reinforcement Learning (IBRL). We propose to improve IBRL by modifying the action selection module. The proposed action selection module provides a calibrated Q-value, which mitigates the bootstrapping error that otherwise leads to inefficient exploration. Furthermore, to prevent the RL policy from converging to a sub-optimal policy, SAC is used as the RL policy instead of TD3. The effectiveness of our method in mitigating bootstrapping error and preventing overfitting is empirically validated by learning two robotics tasks: bucket loading and open drawer, which require extensive interactions with the environment. Simulation results also demonstrate the robustness of the DRLR framework across tasks with both low and high state-action dimensions, and varying demonstration qualities. To evaluate the developed framework on a real-world industrial robotics task, the bucket loading task is deployed on a real wheel loader. The sim2real results validate the successful deployment of the DRLR framework.
comment: This paper has been accepted for Journal publication in Frontiers in Robotics and AI
Properties of Lyapunov Subcenter Manifolds in Conservative Mechanical Systems
Multi-body mechanical systems have rich internal dynamics, whose solutions can be exploited as efficient control targets. Yet, solutions non-trivially depend on system parameters, obscuring feasible properties for use as target trajectories. For periodic regulation tasks in robotics applications, we investigate properties of nonlinear normal modes (NNMs) collected in Lyapunov subcenter manifolds (LSMs) of conservative mechanical systems. Using a time-symmetry of conservative mechanical systems, we show that mild non-resonance conditions guarantee LSMs to be Eigenmanifolds, in which NNMs are guaranteed to oscillate between two points of zero velocity. We also prove the existence of a unique generator, which is a connected, 1D manifold that collects these points of zero velocity for a given Eigenmanifold. Furthermore, we show that an additional spatial symmetry provides LSMs with yet stronger properties of Rosenberg manifolds. Here all brake trajectories pass through a unique equilibrium configuration, which can be favorable for control applications. These theoretical results are numerically confirmed on two mechanical systems: a double pendulum and a 5-link pendulum.
comment: 18 pages, 27 figures, submitted to Automatica
SynDroneVision: A Synthetic Dataset for Image-Based Drone Detection
Developing robust drone detection systems is often constrained by the limited availability of large-scale annotated training data and the high costs associated with real-world data collection. However, leveraging synthetic data generated via game engine-based simulations provides a promising and cost-effective solution to overcome this issue. Therefore, we present SynDroneVision, a synthetic dataset specifically designed for RGB-based drone detection in surveillance applications. Featuring diverse backgrounds, lighting conditions, and drone models, SynDroneVision offers a comprehensive training foundation for deep learning algorithms. To evaluate the dataset's effectiveness, we perform a comparative analysis across a selection of recent YOLO detection models. Our findings demonstrate that SynDroneVision is a valuable resource for real-world data enrichment, achieving notable enhancements in model performance and robustness, while significantly reducing the time and costs of real-world data acquisition. SynDroneVision will be publicly released upon paper acceptance.
RoboReward: General-Purpose Vision-Language Reward Models for Robotics
A well-designed reward is critical for effective reinforcement learning-based policy improvement. In real-world robotics, obtaining such rewards typically requires either labor-intensive human labeling or brittle, handcrafted objectives. Vision-language models (VLMs) have shown promise as automatic reward models, yet their effectiveness on real robot tasks is poorly understood. In this work, we aim to close this gap by introducing (1) RoboReward, a robotics reward dataset and benchmark built on large-scale real-robot corpora from Open X-Embodiment (OXE) and RoboArena, and (2) vision-language reward models trained on this dataset (RoboReward 4B/8B). Because OXE is success-heavy and lacks failure examples, we propose a negative examples data augmentation pipeline that generates calibrated negative and near-misses via counterfactual relabeling of successful episodes and temporal clipping to create partial-progress outcomes from the same videos. Using this framework, we build a large training and evaluation dataset spanning diverse tasks and embodiments to test whether state-of-the-art VLMs can reliably provide rewards for robot learning. Our evaluation of open and proprietary VLMs finds that no model excels across tasks, highlighting substantial room for improvement. We then train general-purpose 4B- and 8B-parameter models that outperform much larger VLMs in assigning rewards for short-horizon robotic tasks. Finally, we deploy the 8B model in real-robot reinforcement learning and find that it improves policy learning over Gemini Robotics-ER 1.5 while narrowing the gap to RL training with human-provided rewards. We release the full dataset, trained reward models, and evaluation suite on our website to advance the development of general-purpose reward models in robotics: https://crfm.stanford.edu/helm/robo-reward-bench (project website).
Human-Aided Trajectory Planning for Automated Vehicles through Teleoperation and Arbitration Graphs
Teleoperation enables remote human support of automated vehicles in scenarios where the automation is not able to find an appropriate solution. Remote assistance concepts, where operators provide discrete inputs to aid specific automation modules like planning, is gaining interest due to its reduced workload on the human remote operator and improved safety. However, these concepts are challenging to implement and maintain due to their deep integration and interaction with the automated driving system. In this paper, we propose a solution to facilitate the implementation of remote assistance concepts that intervene on planning level and extend the operational design domain of the vehicle at runtime. Using arbitration graphs, a modular decision-making framework, we integrate remote assistance into an existing automated driving system without modifying the original software components. Our simulative implementation demonstrates this approach in two use cases, allowing operators to adjust planner constraints and enable trajectory generation beyond nominal operational design domains.
comment: 7 pages, 8 figures, presented at IEEE Intelligent Vehicles Symposium 2025, video demonstration available at https://www.youtube.com/watch?v=fVSO-YOeGMk
Bio-Skin: A Cost-Effective Thermostatic Tactile Sensor with Multi-Modal Force and Temperature Detection IROS2025
Tactile sensors can significantly enhance the perception of humanoid robotics systems by providing contact information that facilitates human-like interactions. However, existing commercial tactile sensors focus on improving the resolution and sensitivity of single-modal detection with high-cost components and densely integrated design, incurring complex manufacturing processes and unaffordable prices. In this work, we present Bio-Skin, a cost-effective multi-modal tactile sensor that utilizes single-axis Hall-effect sensors for planar normal force measurement and bar-shape piezo resistors for 2D shear force measurement. A thermistor coupling with a heating wire is integrated into a silicone body to achieve temperature sensation and thermostatic function analogous to human skin. We also present a cross-reference framework to validate the two modalities of the force sensing signal, improving the sensing fidelity in a complex electromagnetic environment. Bio-Skin has a multi-layer design, and each layer is manufactured sequentially and subsequently integrated, thereby offering a fast production pathway. After calibration, Bio-Skin demonstrates performance metrics-including signal-to-range ratio, sampling rate, and measurement range-comparable to current commercial products, with one-tenth of the cost. The sensor's real-world performance is evaluated using an Allegro hand in object grasping tasks, while its temperature regulation functionality was assessed in a material detection task.
comment: This work has been published by IROS2025
Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems
The rapid advancement of autonomous systems, including self-driving vehicles and drones, has intensified the need to forge true Spatial Intelligence from multi-modal onboard sensor data. While foundation models excel in single-modal contexts, integrating their capabilities across diverse sensors like cameras and LiDAR to create a unified understanding remains a formidable challenge. This paper presents a comprehensive framework for multi-modal pre-training, identifying the core set of techniques driving progress toward this goal. We dissect the interplay between foundational sensor characteristics and learning strategies, evaluating the role of platform-specific datasets in enabling these advancements. Our central contribution is the formulation of a unified taxonomy for pre-training paradigms: ranging from single-modality baselines to sophisticated unified frameworks that learn holistic representations for advanced tasks like 3D object detection and semantic occupancy prediction. Furthermore, we investigate the integration of textual inputs and occupancy representations to facilitate open-world perception and planning. Finally, we identify critical bottlenecks, such as computational efficiency and model scalability, and propose a roadmap toward general-purpose multi-modal foundation models capable of achieving robust Spatial Intelligence for real-world deployment.
comment: Survey; 40 pages, 7 figures, 9 tables; GitHub Repo at https://github.com/worldbench/awesome-spatial-intelligence
FoldNet: Learning Generalizable Closed-Loop Policy for Garment Folding via Keypoint-Driven Asset and Demonstration Synthesis
Due to the deformability of garments, generating a large amount of high-quality data for robotic garment manipulation tasks is highly challenging. In this paper, we present a synthetic garment dataset that can be used for robotic garment folding. We begin by constructing geometric garment templates based on keypoints and applying generative models to generate realistic texture patterns. Leveraging these keypoint annotations, we generate folding demonstrations in simulation and train folding policies via closed-loop imitation learning. To improve robustness, we propose KG-DAgger, which uses a keypoint-based strategy to generate demonstration data for recovering from failures. KG-DAgger significantly improves the model performance, boosting the real-world success rate by 25\%. After training with 15K trajectories (about 2M image-action pairs), the model achieves a 75\% success rate in the real world. Experiments in both simulation and real-world settings validate the effectiveness of our proposed framework.
What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?
A long-standing challenge in AI is to develop agents capable of solving a wide range of physical tasks and generalizing to new, unseen tasks and environments. A popular recent approach involves training a world model from state-action trajectories and subsequently use it with a planning algorithm to solve new tasks. Planning is commonly performed in the input space, but a recent family of methods has introduced planning algorithms that optimize in the learned representation space of the world model, with the promise that abstracting irrelevant details yields more efficient planning. In this work, we characterize models from this family as JEPA-WMs and investigate the technical choices that make algorithms from this class work. We propose a comprehensive study of several key components with the objective of finding the optimal approach within the family. We conducted experiments using both simulated environments and real-world robotic data, and studied how the model architecture, the training objective, and the planning algorithm affect planning success. We combine our findings to propose a model that outperforms two established baselines, DINO-WM and V-JEPA-2-AC, in both navigation and manipulation tasks. Code, data and checkpoints are available at https://github.com/facebookresearch/jepa-wms.
comment: V2 of the article: - Added AdaLN-zero - Added table comparing JEPA-WMs with baselines with std translating per-seed variability only, no variability across epochs - Reordered figures in main body of the paper
Grasp the Graph (GtG) 2.0: Ensemble of Graph Neural Networks for High-Precision Grasp Pose Detection in Clutter
Grasp pose detection in cluttered, real-world environments remains a significant challenge due to noisy and incomplete sensory data combined with complex object geometries. This paper introduces Grasp the Graph 2.0 (GtG 2.0) method, a lightweight yet highly effective hypothesis-and-test robotics grasping framework which leverages an ensemble of Graph Neural Networks for efficient geometric reasoning from point cloud data. Building on the success of GtG 1.0, which demonstrated the potential of Graph Neural Networks for grasp detection but was limited by assumptions of complete, noise-free point clouds and 4-Dof grasping, GtG 2.0 employs a conventional Grasp Pose Generator to efficiently produce 7-Dof grasp candidates. Candidates are assessed with an ensemble Graph Neural Network model which includes points within the gripper jaws (inside points) and surrounding contextual points (outside points). This improved representation boosts grasp detection performance over previous methods using the same generator. GtG 2.0 shows up to a 35% improvement in Average Precision on the GraspNet-1Billion benchmark compared to hypothesis-and-test and Graph Neural Network-based methods, ranking it among the top three frameworks. Experiments with a 3-Dof Delta Parallel robot and Kinect-v1 camera show a success rate of 91% and a clutter completion rate of 100%, demonstrating its flexibility and reliability.
comment: 20 pages
Symbolic Planning and Multi-Agent Path Finding in Extremely Dense Environments with Unassigned Agents AAAI
We introduce the Block Rearrangement Problem (BRaP), a challenging component of large warehouse management which involves rearranging storage blocks within dense grids to achieve a goal state. We formally define the BRaP as a graph search problem. Building on intuitions from sliding puzzle problems, we propose five search-based solution algorithms, leveraging joint configuration space search, classical planning, multi-agent pathfinding, and expert heuristics. We evaluate the five approaches empirically for plan quality and scalability. Despite the exponential relation between search space size and block number, our methods demonstrate efficiency in creating rearrangement plans for deeply buried blocks in up to 80x80 grids.
comment: AAAI Conference on Artificial Intelligence (AAAI-26)
Multiagent Systems
Nalar: An agent serving framework
LLM-driven agentic applications increasingly automate complex, multi-step tasks, but serving them efficiently remains challenging due to heterogeneous components, dynamic and model-driven control flow, long-running state, and unpredictable latencies. Nalar is a ground-up agent-serving framework that cleanly separates workflow specification from execution while providing the runtime visibility and control needed for robust performance. Nalar preserves full Python expressiveness, using lightweight auto-generated stubs that turn agent and tool invocations into futures carrying dependency and context metadata. A managed state layer decouples logical state from physical placement, enabling safe reuse, migration, and consistent retry behavior. A two-level control architecture combines global policy computation with local event-driven enforcement to support adaptive routing, scheduling, and resource management across evolving workflows. Together, these mechanisms allow Nalar to deliver scalable, efficient, and policy-driven serving of heterogeneous agentic applications without burdening developers with orchestration logic. Across three agentic workloads, Nalar cuts tail latency by 34--74\%, achieves up to $2.9\times$ speedups, sustains 80 RPS where baselines fail, and scales to 130K futures with sub-500 ms control overhead.
FinDeepForecast: A Live Multi-Agent System for Benchmarking Deep Research Agents in Financial Forecasting
Deep Research (DR) Agents powered by advanced Large Language Models (LLMs) have fundamentally shifted the paradigm for completing complex research tasks. Yet, a comprehensive and live evaluation of their forecasting performance on real-world, research-oriented tasks in high-stakes domains (e.g., finance) remains underexplored. We introduce FinDeepForecast, the first live, end-to-end multi-agent system for automatically evaluating DR agents by continuously generating research-oriented financial forecasting tasks. This system is equipped with a dual-track taxonomy, enabling the dynamic generation of recurrent and non-recurrent forecasting tasks at both corporate and macro levels. With this system, we generate FinDeepForecastBench, a weekly evaluation benchmark over a ten-week horizon, encompassing 8 global economies and 1,314 listed companies, and evaluate 13 representative methods. Extensive experiments show that, while DR agents consistently outperform strong baselines, their performance still falls short of genuine forward-looking financial reasoning. We expect the proposed FinDeepForecast system to consistently facilitate future advancements of DR agents in research-oriented financial forecasting tasks. The benchmark and leaderboard are publicly available on the OpenFinArena Platform.
From Idea to Co-Creation: A Planner-Actor-Critic Framework for Agent Augmented 3D Modeling
We present a framework that extends the Actor-Critic architecture to creative 3D modeling through multi-agent self-reflection and human-in-the-loop supervision. While existing approaches rely on single-prompt agents that directly execute modeling commands via tools like Blender MCP, our approach introduces a Planner-Actor-Critic architecture. In this design, the Planner coordinates modeling steps, the Actor executes them, and the Critic provides iterative feedback, while human users act as supervisors and advisors throughout the process. Through systematic comparison between single-prompt modeling and our reflective multi-agent approach, we demonstrate improvements in geometric accuracy, aesthetic quality, and task completion rates across diverse 3D modeling scenarios. Our evaluation reveals that critic-guided reflection, combined with human supervisory input, reduces modeling errors and increases complexity and quality of the result compared to direct single-prompt execution. This work establishes that structured agent self-reflection, when augmented by human oversight and advisory guidance, produces higher-quality 3D models while maintaining efficient workflow integration through real-time Blender synchronization.
Defense Against Indirect Prompt Injection via Tool Result Parsing
As LLM agents transition from digital assistants to physical controllers in autonomous systems and robotics, they face an escalating threat from indirect prompt injection. By embedding adversarial instructions into the results of tool calls, attackers can hijack the agent's decision-making process to execute unauthorized actions. This vulnerability poses a significant risk as agents gain more direct control over physical environments. Existing defense mechanisms against Indirect Prompt Injection (IPI) generally fall into two categories. The first involves training dedicated detection models; however, this approach entails high computational overhead for both training and inference, and requires frequent updates to keep pace with evolving attack vectors. Alternatively, prompt-based methods leverage the inherent capabilities of LLMs to detect or ignore malicious instructions via prompt engineering. Despite their flexibility, most current prompt-based defenses suffer from high Attack Success Rates (ASR), demonstrating limited robustness against sophisticated injection attacks. In this paper, we propose a novel method that provides LLMs with precise data via tool result parsing while effectively filtering out injected malicious code. Our approach achieves competitive Utility under Attack (UA) while maintaining the lowest Attack Success Rate (ASR) to date, significantly outperforming existing methods. Code is available at GitHub.
comment: 20 pages, 3 figures, 5 tables
When Single-Agent with Skills Replace Multi-Agent Systems and When They Fail
Multi-agent AI systems have proven effective for complex reasoning. These systems are compounded by specialized agents, which collaborate through explicit communication, but incur substantial computational overhead. A natural question arises: can we achieve similar modularity benefits with a single agent that selects from a library of skills? We explore this question by viewing skills as internalized agent behaviors. From this perspective, a multi-agent system can be compiled into an equivalent single-agent system, trading inter-agent communication for skill selection. Our preliminary experiments suggest this approach can substantially reduce token usage and latency while maintaining competitive accuracy on reasoning benchmarks. However, this efficiency raises a deeper question that has received little attention: how does skill selection scale as libraries grow? Drawing on principles from cognitive science, we propose that LLM skill selection exhibits bounded capacity analogous to human decision-making. We investigate the scaling behavior of skill selection and observe a striking pattern. Rather than degrading gradually, selection accuracy remains stable up to a critical library size, then drops sharply, indicating a phase transition reminiscent of capacity limits in human cognition. Furthermore, we find evidence that semantic confusability among similar skills, rather than library size alone, plays a central role in this degradation. This perspective suggests that hierarchical organization, which has long helped humans manage complex choices, may similarly benefit AI systems. Our initial results with hierarchical routing support this hypothesis. This work opens new questions about the fundamental limits of semantic-based skill selection in LLMs and offers a cognitive-grounded framework and practical guidelines for designing scalable skill-based agents.
comment: 25 pages, technical report
Adversarial Yet Cooperative: Multi-Perspective Reasoning in Retrieved-Augmented Language Models
Recent advances in synergizing large reasoning models (LRMs) with retrieval-augmented generation (RAG) have shown promising results, yet two critical challenges remain: (1) reasoning models typically operate from a single, unchallenged perspective, limiting their ability to conduct deep, self-correcting reasoning over external documents, and (2) existing training paradigms rely excessively on outcome-oriented rewards, which provide insufficient signal for shaping the complex, multi-step reasoning process. To address these issues, we propose an Reasoner-Verifier framework named Adversarial Reasoning RAG (ARR). The Reasoner and Verifier engage in reasoning on retrieved evidence and critiquing each other's logic while being guided by process-aware advantage that requires no external scoring model. This reward combines explicit observational signals with internal model uncertainty to jointly optimize reasoning fidelity and verification rigor. Experiments on multiple benchmarks demonstrate the effectiveness of our method.
Autonomous Agents on Blockchains: Standards, Execution Models, and Trust Boundaries
Advances in large language models have enabled agentic AI systems that can reason, plan, and interact with external tools to execute multi-step workflows, while public blockchains have evolved into a programmable substrate for value transfer, access control, and verifiable state transitions. Their convergence introduces a high-stakes systems challenge: designing standard, interoperable, and secure interfaces that allow agents to observe on-chain state, formulate transaction intents, and authorize execution without exposing users, protocols, or organizations to unacceptable security, governance, or economic risks. This survey systematizes the emerging landscape of agent-blockchain interoperability through a systematic literature review, identifying 317 relevant works from an initial pool of over 3000 records. We contribute a five-part taxonomy of integration patterns spanning read-only analytics, simulation and intent generation, delegated execution, autonomous signing, and multi-agent workflows; a threat model tailored to agent-driven transaction pipelines that captures risks ranging from prompt injection and policy misuse to key compromise, adversarial execution dynamics, and multi-agent collusion; and a comparative capability matrix analyzing more than 20 representative systems across 13 dimensions, including custody models, permissioning, policy enforcement, observability, and recovery. Building on the gaps revealed by this analysis, we outline a research roadmap centered on two interface abstractions: a Transaction Intent Schema for portable and unambiguous goal specification, and a Policy Decision Record for auditable, verifiable policy enforcement across execution environments. We conclude by proposing a reproducible evaluation suite and benchmarks for assessing the safety, reliability, and economic robustness of agent-mediated on-chain execution.
A Closed-Loop Multi-Agent System Driven by LLMs for Meal-Level Personalized Nutrition Management
Personalized nutrition management aims to tailor dietary guidance to an individual's intake and phenotype, but most existing systems handle food logging, nutrient analysis and recommendation separately. We present a next-generation mobile nutrition assistant that combines image based meal logging with an LLM driven multi agent controller to provide meal level closed loop support. The system coordinates vision, dialogue and state management agents to estimate nutrients from photos and update a daily intake budget. It then adapts the next meal plan to user preferences and dietary constraints. Experiments with SNAPMe meal images and simulated users show competitive nutrient estimation, personalized menus and efficient task plans. These findings demonstrate the feasibility of multi agent LLM control for personalized nutrition and reveal open challenges in micronutrient estimation from images and in large scale real world studies.
comment: 6 pages, 6 figures, 6 tables, Conference: Robotics, Automation, and Artificial Intelligence 2025
On the Transition to an Auction-based Intelligent Parking Assignment System
Finding a free parking space in a city has become a challenging task over the past decades. A recently proposed auction-based parking assignment can alleviate cruising for parking and also set a market-driven, demand-responsive parking price. However, the wide acceptance of such a system is far from certain. To evaluate the merits of auction-based parking assignment, we assume that drivers have access to a smartphone-based reservation system prior to its mandatory introduction and thus have the opportunity to test and experience its merits voluntarily. We set our experiment as Eclipse SUMO simulations with different rates of participants and non-participants to check how different market penetration levels affect the traffic flow, the performance of the auction-based assignment system, and the financial outcomes. The results show that the auction-based system improves traffic flow with increasing penetration rates, allowing participants to park gradually closer to their preferred parking lots. However, it comes with a price; the system also increases parking expenditures for participants. Interestingly, non-participating drivers will face even higher parking prices. Consequently, they will be motivated to use the new system.
Interactive Distillation for Cooperative Multi-Agent Reinforcement Learning
Knowledge distillation (KD) has the potential to accelerate MARL by employing a centralized teacher for decentralized students but faces key bottlenecks. Specifically, there are (1) challenges in synthesizing high-performing teaching policies in complex domains, (2) difficulties when teachers must reason in out-of-distribution (OOD) states, and (3) mismatches between the decentralized students' and the centralized teacher's observation spaces. To address these limitations, we propose HINT (Hierarchical INteractive Teacher-based transfer), a novel KD framework for MARL in a centralized training, decentralized execution setup. By leveraging hierarchical RL, HINT provides a scalable, high-performing teacher. Our key innovation, pseudo off-policy RL, enables the teacher policy to be updated using both teacher and student experience, thereby improving OOD adaptation. HINT also applies performance-based filtering to retain only outcome-relevant guidance, reducing observation mismatches. We evaluate HINT on challenging cooperative domains (e.g., FireCommander for resource allocation, MARINE for tactical combat). Across these benchmarks, HINT outperforms baselines, achieving improvements of 60% to 165% in success rate.
PRISM: Protocol Refinement through Intelligent Simulation Modeling SC
Automating experimental protocol design and execution remains as a fundamental bottleneck in realizing self-driving laboratories. We introduce PRISM (Protocol Refinement through Intelligent Simulation Modeling), a framework that automates the design, validation, and execution of experimental protocols on a laboratory platform composed of off-the-shelf robotic instruments. PRISM uses a set of language-model-based agents that work together to generate and refine experimental steps. The process begins with automatically gathering relevant procedures from web-based sources describing experimental workflows. These are converted into structured experimental steps (e.g., liquid handling steps, deck layout and other related operations) through a planning, critique, and validation loop. The finalized steps are translated into the Argonne MADSci protocol format, which provides a unified interface for coordinating multiple robotic instruments (Opentrons OT-2 liquid handler, PF400 arm, Azenta plate sealer and peeler) without requiring human intervention between steps. To evaluate protocol-generation performance, we benchmarked both single reasoning models and multi-agent workflow across constrained and open-ended prompting paradigms. The resulting protocols were validated in a digital-twin environment built in NVIDIA Omniverse to detect physical or sequencing errors before execution. Using Luna qPCR amplification and Cell Painting as case studies, we demonstrate PRISM as a practical end-to-end workflow that bridges language-based protocol generation, simulation-based validation, and automated robotic execution.
comment: 43 pages, 8 figures, submitted to RSC Digital Discovery. Equal contribution: B. Hsu, P.V. Setty, R.M. Butler. Corresponding author: A. Ramanathan
A Novel Convex Layers Strategy for Circular Formation in Multi-Agent Systems
This article considers the problem of conflict-free distribution of point-sized agents on a circular periphery encompassing all agents. The two key elements of the proposed policy include the construction of a set of convex layers (nested convex polygons) using the initial positions of the agents, and a novel search space region for each of the agents. The search space for an agent on a convex layer is defined as the region enclosed between the lines passing through the agent's position and normal to its supporting edges. Guaranteeing collision-free paths, a goal assignment policy designates a unique goal position within the search space of an agent at the initial time itself, requiring no further computation thereafter. In contrast to the existing literature, this work presents a one-shot, collision-free solution to the circular distribution problem by utilizing only the initial positions of the agents. Illustrative examples and extensive Monte-Carlo studies considering various practical attributes demonstrate the effectiveness of the proposed method.
comment: Accepted to IEEE Transactions on Systems, Man, and Cybernetics: Systems (Early Access) 15 pages, 18 figures
Symbolic Planning and Multi-Agent Path Finding in Extremely Dense Environments with Unassigned Agents AAAI
We introduce the Block Rearrangement Problem (BRaP), a challenging component of large warehouse management which involves rearranging storage blocks within dense grids to achieve a goal state. We formally define the BRaP as a graph search problem. Building on intuitions from sliding puzzle problems, we propose five search-based solution algorithms, leveraging joint configuration space search, classical planning, multi-agent pathfinding, and expert heuristics. We evaluate the five approaches empirically for plan quality and scalability. Despite the exponential relation between search space size and block number, our methods demonstrate efficiency in creating rearrangement plans for deeply buried blocks in up to 80x80 grids.
comment: AAAI Conference on Artificial Intelligence (AAAI-26)
Systems and Control (EESS)
Online Bayesian Learning of Agent Behavior in Differential Games
This work introduces an online Bayesian game-theoretic method for behavior identification in multi-agent dynamical systems. By casting Hamilton-Jacobi-Bellman optimality conditions as linear-in-parameter residuals, the method enables fast sequential Bayesian updates, uncertainty-aware inference, and robust prediction from limited, noisy data-without history stacks. The approach accommodates nonlinear dynamics and nonquadratic value functions through basis expansions, providing flexible models. Experiments, including linear-quadratic and nonlinear shared-control scenarios, demonstrate accurate prediction with quantified uncertainty, highlighting the method's relevance for adaptive interaction and real-time decision making.
Driver-Intention Prediction with Deep Learning: Real-Time Brain-to-Vehicle Communication
Brain-computer interfaces (BCIs) allow direct communication between the brain and electronics without the need for speech or physical movement. Such interfaces can be particularly beneficial in applications requiring rapid response times, such as driving, where a vehicle's advanced driving assistance systems could benefit from immediate understanding of a driver's intentions. This study presents a novel method for predicting a driver's intention to steer using electroencephalography (EEG) signals through deep learning. A driving simulator created a controlled environment in which participants imagined controlling a vehicle during various driving scenarios, including left and right turns, as well as straight driving. A convolutional neural network (CNN) classified the detected EEG data with minimal pre-processing. Our model achieved an accuracy of 83.7% in distinguishing between the three steering intentions and demonstrated the ability of CNNs to process raw EEG data effectively. The classification accuracy was highest for right-turn segments, which suggests a potential spatial bias in brain activity. This study lays the foundation for more intuitive brain-to-vehicle communication systems.
comment: 6 pages, 7 figures
Effect of Dispatch Decisions on Small-Signal Stability of Converter-Dominated Power Systems
Small-signal stability of modern converter-dominated power systems has been the subject of extensive research, particularly from the perspective of device-level control design for grid-forming (GFM) and grid-following (GFL) converters. However, the influence of power flow variables on system stability has received limited attention. Conventional small-signal stability analyses are typically conducted at a specific operating point, emphasizing the selection of control or system design parameters while neglecting the sensitivity of stability characteristics to operating conditions. This paper seeks to bridge this gap by systematically investigating the impact of dispatch decisions on the small-signal stability of converter-based power systems. Our findings are first illustrated on a three-bus system and then validated on the standard IEEE 39-bus test system to demonstrate scalability. Across the test systems, we find that high-voltage capacitive operation of GFL converters limits its active power injection, whereas inductive operation permits higher injections, and it is generally preferable for the GFM converter to supply more active power.
ZIVR: An Incremental Variance Reduction Technique For Zeroth-Order Composite Problems
This paper investigates zeroth-order (ZO) finite-sum composite optimization. Recently, variance reduction techniques have been applied to ZO methods to mitigate the non-vanishing variance of 2-point estimators in constrained/composite optimization, yielding improved convergence rates. However, existing ZO variance reduction methods typically involve batch sampling of size at least $Θ(n)$ or $Θ(d)$, which can be computationally prohibitive for large-scale problems. In this work, we propose a general variance reduction framework, Zeroth-Order Incremental Variance Reduction (ZIVR), which supports flexible implementations$\unicode{x2014}$including a pure 2-point zeroth-order algorithm that eliminates the need for large batch sampling. Furthermore, we establish comprehensive convergence guarantees for ZIVR across strongly-convex, convex, and non-convex settings that match their first-order counterparts. Numerical experiments validate the effectiveness of our proposed algorithm.
Safe Reinforcement Learning Beyond Baseline Control: A Hierarchical Framework for Space Triangle Tethered Formation System
Triangular tethered formation system (TTFS) provide a promising platform for deep space exploration and distributed sensing due to its intrinsic spatial-orientation stability and capability of adjusting distances among node satellites through deployment and retrieval of tethers. However, due to the coupled tether-satellite dynamics and disturbance sensitivity of TTFS, traditional control methods struggle to achieve a balanced trade-off among configuration accuracy requirements, tension constraints, and energy efficiency consumption throughout the deployment process.In this paper, a novel model-reference reinforcement learning control framework is proposed for TTFS. By integrating baseline model-based control with a Soft Actor-Critic (SAC) compensator, the proposed method simultaneously achieves high-precision tracking, fuel efficiency, and compliance with tension limits. A hierarchical training scheme is developed to address the convergence difficulties arising from strongly coupled states in centralized training, while tailored reward functions, reset conditions, and normalization criteria are designed to accelerate training convergence. Closed-loop stability of the overall control law is rigorously proven using Lyapunov methods. Simulation results demonstrate that the proposed controller reduces steady-state tracking errors by over 96% for tethers and 99% for node satellites, while cutting fuel consumption by two orders of magnitude compared with the baseline method. These results validate the effectiveness and stability of the proposed approach for TTFS deployment control.
comment: This work has been submitted to the IEEE for possible publication
Towards Sustainable 6G: A Holistic View of Trade-offs and Enablers
The sixth generation of mobile networks (6G) can play a central role in shaping a sustainable future, the most compelling contemporary challenge. Connecting the unconnected, reducing carbon emissions of vertical sectors, and allowing heterogeneous types of intelligence (including humans) to safely and constructively interact in complex environments, are only a few of the several challenges that can be supported by 6G. However, this requires a careful design that balances positive and negative impacts of 6G, towards a sustainable and sustainability-enabling technology. This paper presents a holistic view that translates the complex interplay between the 6G enabling effects and the sustainability of 6G by design, into concrete trade-offs and research questions. Starting from today's challenges for society and associated key values, we unfold the dilemma into a set of technical trade-offs, whose solutions span from technological innovations to standardization actions towards applicability.
comment: This work has been submitted to IEEE Communications Magazine
Matrix-Valued Passivity Indices: Foundations, Properties, and Stability Implications
The passivity index, a quantitative measure of a system's passivity deficiency or excess, has been widely used in stability analysis and control. Existing studies mostly rely on scalar forms of indices, which are restrictive for multi-input, multi-output (MIMO) systems. This paper extends the classical scalar indices to a systematic matrix-valued framework, referred to as passivity matrices. A broad range of classical results in passivity theory can be naturally generalized in this framework. We first show that, under the matrix representation, passivity indices essentially correspond to the curvature of the dissipativity functional under a second-variation interpretation. This result reveals that the intrinsic geometric structure of passivity consists of its directions and intensities, which a scalar index cannot fully capture. For linear time-invariant (LTI) systems, we examine the structural properties of passivity matrices with respect to the Loewner partial order and propose two principled criteria for selecting representative matrices. Compared with conventional scalar indices, the matrix-valued indices capture the passivity coupling among different input-output channels in MIMO systems and provide a more comprehensive description of system passivity. This richer information leads to lower passivation effort and less conservative stability assessment.
comment: 12 pages, 4 figures
CircuitLM: A Multi-Agent LLM-Aided Design Framework for Generating Circuit Schematics from Natural Language Prompts
Generating accurate circuit schematics from high-level natural language descriptions remains a persistent challenge in electronics design, as large language models (LLMs) frequently hallucinate in granular details, violate electrical constraints, and produce non-machine-readable outputs. We present CircuitLM, a novel multi-agent LLM-aided circuit design pipeline that translates user prompts into structured, visually interpretable CircuitJSON schematics through five sequential stages: (i) LLM-based component identification, (ii) canonical pinout retrieval, (iii) chain-of-thought reasoning by an electronics expert agent, (iv) JSON schematic synthesis, and (v) force-directed SVG visualization. Anchored by a curated, embedding-powered component knowledge base. While LLMs often violate electrical constraints, CircuitLM bridges this gap by grounding generation in a verified and dynamically extensible component database, initially comprising 50 components. To ensure safety, we incorporate a hybrid evaluation framework, namely Dual-Metric Circuit Validation (DMCV), validated against human-expert assessments, which achieves high fidelity in microcontroller-centric designs. We evaluate the system on 100 diverse embedded-systems prompts across six LLMs and introduce DMCV to assess both structural and electrical validity. This work bridges natural language input to deployable hardware designs, enabling reliable circuit prototyping by non-experts. Our code and data will be made public upon acceptance.
comment: Under review, 13 pages, 11 figures, 2 tables
Definition and Formulation of Inertia Service Incorporating Inverter-Based Resources
Increasing concerns over the scarcity of inertia have motivated the procurement of inertia as an ancillary service (AS). Despite numerous academic and practical efforts, there remains a lack of consensus regarding the definition and treatment of inertia service in market operations, particularly the specification of inertia variables and the separation between synchronous inertia (SI) from synchronous generators and virtual inertia (VI) from inverter-based resources. To address these issues, this paper proposes a power-oriented (P-oriented) definition based on inertial response, which establishes conceptual consistency between SI and VI and makes the inertia service commensurable with other ASs. This definition explicitly incorporates both positive and negative inertial responses during frequency drop events. We then formulate a security-constrained economic dispatch framework based on this P-oriented definition and demonstrate its practical effectiveness through simulations. Case studies on a modified IEEE 30-bus system show that the proposed bidirectional service definition ensures price signals that reflect the economic value of inertial provision.
comment: 10 pages, 5 figures
Experimental Demonstration of a Decentralized Electromagnetic Formation Flying Control Using Alternating Magnetic Field Forces
Electromagnetic formation flying (EMFF) is challenging due to the complex coupling between the electromagnetic fields generated by each satellite in the formation. To address this challenge, this article uses alternating magnetic field forces (AMFF) to decouple the electromagnetic forces between each pair of satellites. Each satellite's electromagnetic actuation system is driven by a sum of amplitude-modulated sinusoids, where amplitudes are controlled to achieve desired forces between each pair of satellites. The main contribution of this article is a 3-satellite experimental demonstration of decentralized closed-loop EMFF using AMFF. To our knowledge, this is the first demonstration of AMFF with at least 3 satellites in open or closed loop. This is noteworthy because the coupling challenges of EMFF are only present with more than 2 satellites, and thus, a formation of at least 3 is necessary to evaluate the effectiveness of AMFF. The experiments are conducted on a ground-based testbed consisting of 3 electromagnetically actuated satellites on linear air tracks. The closed-loop experimental results are compared with behavior from numerical simulations.
comment: Preprint submitted to Aerospace Science and Technology (Elsevier)
Data-Based Analysis of Relative Degree and Zero Dynamics in Linear Systems
Data-driven control offers a powerful alternative to traditional model-based methods, particularly when accurate system models are unavailable or prohibitively complex. While existing data-driven control methods primarily aim to construct controllers directly from measured data, our approach uses the available data to assess fundamental system-theoretic properties. This allows the informed selection of suitable control strategies without explicit model identification. We provide data-based conditions characterizing the (vector) relative degree and the stability of the zero dynamics, which are critical for ensuring proper performance of modern controllers. Our results cover both single- and multi-input/output settings of discrete-time linear systems. We further show how a continuous-time system can be reconstructed from three sampling discretizations obtained via Zero-order Hold at suitable sampling times, thus allowing the extension of the results to the combined data collected from these discretizations. All results can be applied directly to observed data sets using the proposed algorithms.
comment: 43 pages, submitted to MCSS
Noise-Tolerant Hybrid Approach for Data-Driven Predictive Control
This paper focuses on a key challenge in hybrid data-driven predictive control: the effect of measurement noise on Hankel matrices. While noise is handled in direct and indirect methods, hybrid approaches often overlook its impact during trajectory estimation. We propose a Noise-Tolerant Data-Driven Predictive Control (NTDPC) framework that integrates singular value decomposition to separate system dynamics from noise within reduced-order Hankel matrices. This enables accurate prediction with shorter data horizons and lower computational effort. A sensitivity index is introduced to support horizon selection under different noise levels. Simulation results indicate improved robustness and efficiency compared to existing hybrid methods.
Structural Methods for handling mode changes in multimode DAE systems
Hybrid systems are an important concept in Cyber-Physical Systems modeling, for which multiphysics modeling from first principles and the reuse of models from libraries are key. To achieve this, DAEs must be used to specify the dynamics in each discrete state (or mode in our context). This led to the development of DAE-based equational languages supporting multiple modes, of which Modelica is a popular standard. Mode switching can be time- or state-based. Impulsive behaviors can occur at mode changes. While mode changes are well understood in particular physics (e.g., contact mechanics), this is not the case in physics-agnostic paradigms such as Modelica. This situation causes difficulties for the compilation of programs, often requiring users to manually smooth out mode changes. In this paper, we propose a novel approach for the hot restart at mode changes in such paradigms. We propose a mathematical meaning for hot restarts (such a mathematical meaning does not exist in general), as well as a combined structural and impulse analysis for mode changes, generating the hot restart even in the presence of impulses. Our algorithm detects at compile time if the mode change is insufficiently specified, in which case it returns diagnostics information to the user.
comment: 53 pages, 3 figures
Repeater Swarm-Assisted Cellular Systems: Interaction Stability and Performance Analysis
We consider a cellular massive MIMO system where swarms of wireless repeaters are deployed to improve coverage. These repeaters are full-duplex relays with small form factors that receive and instantaneously retransmit signals. They can be deployed in a plug-and-play manner at low cost, while being transparent to the network--conceptually they are active channel scatterers with amplification capabilities. Two fundamental questions need to be addressed in repeater deployments: (I) How can we prevent destructive effects of positive feedback caused by inter-repeater interaction (i.e., each repeater receives and amplifies signals from others)? (ii) How much performance improvement can be achieved given that repeaters also inject noise and may introduce more interference? To answer these questions, we first derive a generalized Nyquist stability criterion for the repeater swarm system, and provide an easy-to-check stability condition. Then, we study the uplink performance and develop an efficient iterative algorithm that jointly optimizes the repeater gains, user transmit powers, and receive combining weights to maximize the weighted sum rate while ensuring system stability. Numerical results corroborate our theoretical findings and show that the repeaters can significantly improve the system performance, both in sub-6 GHz and millimeter-wave bands. The results also warrant careful deployment to fully realize the benefits of repeaters, for example, by ensuring a high probability of line-of-sight links between repeaters and the base station.
comment: 16 pages, 13 figures. Accepted for publication in IEEE Transactions on Wireless Communications
An Improved Dual-Attention Transformer-LSTM for Small-Sample Prediction of Modal Frequency and Actual Anchor Radius in Micro Hemispherical Resonator Design
The high-temperature glassblowing-fabricated micro hemispherical resonator (MHR) exhibits high symmetry and high Q-value for precision inertial navigation. However, MHR design entails a comprehensive evaluation of multiple possible configurations and demands extremely time-consuming simulation of key parameters combination. To address this problem, this paper proposed a rapid prediction method of modal frequency and actual anchor radius of designed MHR using an improved Transformer-LSTM (Long Short-Term Memory) model for rapid design sizing. High-temperature-induced softening deformation at the anchor point reduces the actual anchor radius below the designed value. By varying key parameters such as resonator height, anchor radius and edge thickness, finite element glassblowing simulation and modal analyse were conducted to obtain the first six modal frequencies and actual anchor radius. To address regression prediction challenges with limited data, dual multi-head self-attention (MHSA) mechanisms replaced the transformer's standard Feed Forward Network, to improve hidden information capture for high-accuracy predictions of modal frequencies and anchor radius. By checking fabricating feasibility of anchor radius and allowing rapid modal characteristics evaluation without interference, ablation and comparative experiments validated the method's superiority, as an effective support of MHR design. Design optimization experiments demonstrate a prediction accuracy of 96.35%, with computational time reduced to 1/48,000 of traditional finite element methods, significantly improving design efficiency. This study offers a new paradigm for intelligent Micro-Electro-Mechanical System (MEMS) device design under complex process conditions.
comment: Due to the fact that the results of this article are from simulation experiments and there is a certain gap with the actual experimental results, this article has not been corrected. Therefore, the authors Yang and Tu have not given final consent to this submitted version, nor have they authorized the submitter to publish this public preprint
Reliable Grid Forecasting: State Space Models for Safety-Critical Energy Systems
Accurate grid load forecasting is safety-critical: under-predictions risk supply shortfalls, while symmetric error metrics mask this operational asymmetry. We introduce a grid-specific evaluation framework (Asymmetric MAPE, Under-Prediction Rate, and Reserve Margin) that directly measures operational risk rather than statistical accuracy alone. Using this framework, we conduct a systematic evaluation of Mamba-based State Space Models for California grid forecasting on a weather-aligned CA ISO-TAC dataset spanning Nov 2023 to Nov 2025 (84,498 hourly records across 5 transmission areas). Our analysis reveals that standard accuracy metrics are poor proxies for operational safety: models with identical MAPE can require vastly different reserve margins. We demonstrate that forecast errors are weakly but statistically significantly associated with temperature (r = 0.16), motivating weather-aware modeling rather than loss function modification alone. The S-Mamba model achieves the lowest 99.5th-percentile reserve margin (14.12 percent) compared to 16.66 percent for iTransformer, demonstrating superior forecast reliability under a 99.5th-percentile tail-risk reserve proxy.
comment: 25 pages, 7 figures, 8 tables
The factorization of matrices into products of positive definite factors
Positive-definite matrices materialize as state transition matrices of linear time-invariant gradient flows, and the composition of such materializes as the state transition after successive steps where the driving potential is suitably adjusted. Thus, factoring an arbitrary matrix (with positive determinant) into a product of positive-definite ones provides the needed schedule for a time-varying potential to have a desired effect. The present work provides a detailed analysis of this factorization problem by lifting it into a sequence of Monge-Kantorovich transportation steps on Gaussian distributions and studying the induced holonomy of the optimal transportation problem. From this vantage point we determine the minimal number of positive-definite factors that have a desired effect on the spectrum of the product, e.g., ensure specified eigenvalues or being a rotation matrix. Our approach is computational and allows to identify the needed number of factors as well as trade off their conditioning number with their actual number.
comment: 10 pages, 1 figure
Robotics
Choreographing a World of Dynamic Objects
Dynamic objects in our physical 4D (3D + time) world are constantly evolving, deforming, and interacting with other objects, leading to diverse 4D scene dynamics. In this paper, we present a universal generative pipeline, CHORD, for CHOReographing Dynamic objects and scenes and synthesizing this type of phenomena. Traditional rule-based graphics pipelines to create these dynamics are based on category-specific heuristics, yet are labor-intensive and not scalable. Recent learning-based methods typically demand large-scale datasets, which may not cover all object categories in interest. Our approach instead inherits the universality from the video generative models by proposing a distillation-based pipeline to extract the rich Lagrangian motion information hidden in the Eulerian representations of 2D videos. Our method is universal, versatile, and category-agnostic. We demonstrate its effectiveness by conducting experiments to generate a diverse range of multi-body 4D dynamics, show its advantage compared to existing methods, and demonstrate its applicability in generating robotics manipulation policies. Project page: https://yanzhelyu.github.io/chord
Embedding Autonomous Agents in Resource-Constrained Robotic Platforms
Many embedded devices operate under resource constraints and in dynamic environments, requiring local decision-making capabilities. Enabling devices to make independent decisions in such environments can improve the responsiveness of the system and reduce the dependence on constant external control. In this work, we integrate an autonomous agent, programmed using AgentSpeak, with a small two-wheeled robot that explores a maze using its own decision-making and sensor data. Experimental results show that the agent successfully solved the maze in 59 seconds using 287 reasoning cycles, with decision phases taking less than one millisecond. These results indicate that the reasoning process is efficient enough for real-time execution on resource-constrained hardware. This integration demonstrates how high-level agent-based control can be applied to resource-constrained embedded systems for autonomous operation.
comment: This is an open-access, author-archived version of a manuscript published in European Conference on Multi-Agent Systems 2025
Hierarchical GNN-Based Multi-Agent Learning for Dynamic Queue-Jump Lane and Emergency Vehicle Corridor Formation
Emergency vehicles require rapid passage through congested traffic, yet existing strategies fail to adapt to dynamic conditions. We propose a novel hierarchical graph neural network (GNN)-based multi-agent reinforcement learning framework to coordinate connected vehicles for emergency corridor formation. Our approach uses a high-level planner for global strategy and low-level controllers for trajectory execution, utilizing graph attention networks to scale with variable agent counts. Trained via Multi-Agent Proximal Policy Optimization (MAPPO), the system reduces emergency vehicle travel time by 28.3% compared to baselines and 44.6% compared to uncoordinated traffic in simulations. The design achieves near-zero collision rates (0.3%) while maintaining 81% of background traffic efficiency. Ablation and generalization studies confirm the framework's robustness across diverse scenarios. These results demonstrate the effectiveness of combining GNNs with hierarchical learning for intelligent transportation systems.
comment: 16 Pages, 5 Figures, 9 Tables, submitted to IEEE TITS
Wow, wo, val! A Comprehensive Embodied World Model Evaluation Turing Test
As world models gain momentum in Embodied AI, an increasing number of works explore using video foundation models as predictive world models for downstream embodied tasks like 3D prediction or interactive generation. However, before exploring these downstream tasks, video foundation models still have two critical questions unanswered: (1) whether their generative generalization is sufficient to maintain perceptual fidelity in the eyes of human observers, and (2) whether they are robust enough to serve as a universal prior for real-world embodied agents. To provide a standardized framework for answering these questions, we introduce the Embodied Turing Test benchmark: WoW-World-Eval (Wow,wo,val). Building upon 609 robot manipulation data, Wow-wo-val examines five core abilities, including perception, planning, prediction, generalization, and execution. We propose a comprehensive evaluation protocol with 22 metrics to assess the models' generation ability, which achieves a high Pearson Correlation between the overall score and human preference (>0.93) and establishes a reliable foundation for the Human Turing Test. On Wow-wo-val, models achieve only 17.27 on long-horizon planning and at best 68.02 on physical consistency, indicating limited spatiotemporal consistency and physical reasoning. For the Inverse Dynamic Model Turing Test, we first use an IDM to evaluate the video foundation models' execution accuracy in the real world. However, most models collapse to $\approx$ 0% success, while WoW maintains a 40.74% success rate. These findings point to a noticeable gap between the generated videos and the real world, highlighting the urgency and necessity of benchmarking World Model in Embodied AI.
CLAP: Contrastive Latent Action Pretraining for Learning Vision-Language-Action Models from Human Videos
Generalist Vision-Language-Action models are currently hindered by the scarcity of robotic data compared to the abundance of human video demonstrations. Existing Latent Action Models attempt to leverage video data but often suffer from visual entanglement, capturing noise rather than manipulation skills. To address this, we propose Contrastive Latent Action Pretraining (CLAP), a framework that aligns the visual latent space from videos with a proprioceptive latent space from robot trajectories. By employing contrastive learning, CLAP maps video transitions onto a quantized, physically executable codebook. Building on this representation, we introduce a dual-formulation VLA framework offering both CLAP-NTP, an autoregressive model excelling at instruction following and object generalization, and CLAP-RF, a Rectified Flow-based policy designed for high-frequency, precise manipulation. Furthermore, we propose a Knowledge Matching (KM) regularization strategy to mitigate catastrophic forgetting during fine-tuning. Extensive experiments demonstrate that CLAP significantly outperforms strong baselines, enabling the effective transfer of skills from human videos to robotic execution. Project page: https://lin-shan.com/CLAP/.
comment: Project page: https://lin-shan.com/CLAP/
Stable Language Guidance for Vision-Language-Action Models
Vision-Language-Action (VLA) models have demonstrated impressive capabilities in generalized robotic control; however, they remain notoriously brittle to linguistic perturbations. We identify a critical ``modality collapse'' phenomenon where strong visual priors overwhelm sparse linguistic signals, causing agents to overfit to specific instruction phrasings while ignoring the underlying semantic intent. To address this, we propose \textbf{Residual Semantic Steering (RSS)}, a probabilistic framework that disentangles physical affordance from semantic execution. RSS introduces two theoretical innovations: (1) \textbf{Monte Carlo Syntactic Integration}, which approximates the true semantic posterior via dense, LLM-driven distributional expansion, and (2) \textbf{Residual Affordance Steering}, a dual-stream decoding mechanism that explicitly isolates the causal influence of language by subtracting the visual affordance prior. Theoretical analysis suggests that RSS effectively maximizes the mutual information between action and intent while suppressing visual distractors. Empirical results across diverse manipulation benchmarks demonstrate that RSS achieves state-of-the-art robustness, maintaining performance even under adversarial linguistic perturbations.
CoINS: Counterfactual Interactive Navigation via Skill-Aware VLM
Recent Vision-Language Models (VLMs) have demonstrated significant potential in robotic planning. However, they typically function as semantic reasoners, lacking an intrinsic understanding of the specific robot's physical capabilities. This limitation is particularly critical in interactive navigation, where robots must actively modify cluttered environments to create traversable paths. Existing VLM-based navigators are predominantly confined to passive obstacle avoidance, failing to reason about when and how to interact with objects to clear blocked paths. To bridge this gap, we propose Counterfactual Interactive Navigation via Skill-aware VLM (CoINS), a hierarchical framework that integrates skill-aware reasoning and robust low-level execution. Specifically, we fine-tune a VLM, named InterNav-VLM, which incorporates skill affordance and concrete constraint parameters into the input context and grounds them into a metric-scale environmental representation. By internalizing the logic of counterfactual reasoning through fine-tuning on the proposed InterNav dataset, the model learns to implicitly evaluate the causal effects of object removal on navigation connectivity, thereby determining interaction necessity and target selection. To execute the generated high-level plans, we develop a comprehensive skill library through reinforcement learning, specifically introducing traversability-oriented strategies to manipulate diverse objects for path clearance. A systematic benchmark in Isaac Sim is proposed to evaluate both the reasoning and execution aspects of interactive navigation. Extensive simulations and real-world experiments demonstrate that CoINS significantly outperforms representative baselines, achieving a 17\% higher overall success rate and over 80\% improvement in complex long-horizon scenarios compared to the best-performing baseline
comment: 17 pages, 13 figures
An Event-Based Opto-Tactile Skin
This paper presents a neuromorphic, event-driven tactile sensing system for soft, large-area skin, based on the Dynamic Vision Sensors (DVS) integrated with a flexible silicone optical waveguide skin. Instead of repetitively scanning embedded photoreceivers, this design uses a stereo vision setup comprising two DVS cameras looking sideways through the skin. Such a design produces events as changes in brightness are detected, and estimates press positions on the 2D skin surface through triangulation, utilizing Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to find the center of mass of contact events resulting from pressing actions. The system is evaluated over a 4620 mm2 probed area of the skin using a meander raster scan. Across 95 % of the presses visible to both cameras, the press localization achieved a Root-Mean-Squared Error (RMSE) of 4.66 mm. The results highlight the potential of this approach for wide-area flexible and responsive tactile sensors in soft robotics and interactive environments. Moreover, we examined how the system performs when the amount of event data is strongly reduced. Using stochastic down-sampling, the event stream was reduced to 1/1024 of its original size. Under this extreme reduction, the average localization error increased only slightly (from 4.66 mm to 9.33 mm), and the system still produced valid press localizations for 85 % of the trials. This reduction in pass rate is expected, as some presses no longer produce enough events to form a reliable cluster for triangulation. These results show that the sensing approach remains functional even with very sparse event data, which is promising for reducing power consumption and computational load in future implementations. The system exhibits a detection latency distribution with a characteristic width of 31 ms.
comment: Accepted for publication in Frontiers in Neuromorphic Engineering. 23 pages, 9 figures
Towards Safe Autonomous Driving: A Real-Time Motion Planning Algorithm on Embedded Hardware
Ensuring the functional safety of Autonomous Vehicles (AVs) requires motion planning modules that not only operate within strict real-time constraints but also maintain controllability in case of system faults. Existing safeguarding concepts, such as Online Verification (OV), provide safety layers that detect infeasible planning outputs. However, they lack an active mechanism to ensure safe operation in the event that the main planner fails. This paper presents a first step toward an active safety extension for fail-operational Autonomous Driving (AD). We deploy a lightweight sampling-based trajectory planner on an automotive-grade, embedded platform running a Real-Time Operating System (RTOS). The planner continuously computes trajectories under constrained computational resources, forming the foundation for future emergency planning architectures. Experimental results demonstrate deterministic timing behavior with bounded latency and minimal jitter, validating the feasibility of trajectory planning on safety-certifiable hardware. The study highlights both the potential and the remaining challenges of integrating active fallback mechanisms as an integral part of next-generation safeguarding frameworks. The code is available at: https://github.com/TUM-AVS/real-time-motion-planning
comment: 7 pages, submitted to the IEEE Intelligent Vehicles Symposium (IV 2026), Detroit, MI, United States
Bayesian Monocular Depth Refinement via Neural Radiance Fields
Monocular depth estimation has applications in many fields, such as autonomous navigation and extended reality, making it an essential computer vision task. However, current methods often produce smooth depth maps that lack the fine geometric detail needed for accurate scene understanding. We propose MDENeRF, an iterative framework that refines monocular depth estimates using depth information from Neural Radiance Fields (NeRFs). MDENeRF consists of three components: (1) an initial monocular estimate for global structure, (2) a NeRF trained on perturbed viewpoints, with per-pixel uncertainty, and (3) Bayesian fusion of the noisy monocular and NeRF depths. We derive NeRF uncertainty from the volume rendering process to iteratively inject high-frequency fine details. Meanwhile, our monocular prior maintains global structure. We demonstrate superior performance on key metrics and experiments using indoor scenes from the SUN RGB-D dataset.
comment: IEEE 8th International Conference on Algorithms, Computing and Artificial Intelligence (ACAI 2025). Oral presentation; Best Presenter Award
Integrating Sample Inheritance into Bayesian Optimization for Evolutionary Robotics
In evolutionary robotics, robot morphologies are designed automatically using evolutionary algorithms. This creates a body-brain optimization problem, where both morphology and control must be optimized together. A common approach is to include controller optimization for each morphology, but starting from scratch for every new body may require a high controller learning budget. We address this by using Bayesian optimization for controller optimization, exploiting its sample efficiency and strong exploration capabilities, and using sample inheritance as a form of Lamarckian inheritance. Under a deliberately low controller learning budget for each morphology, we investigate two types of sample inheritance: (1) transferring all the parent's samples to the offspring to be used as prior without evaluating them, and (2) reevaluating the parent's best samples on the offspring. Both are compared to a baseline without inheritance. Our results show that reevaluation performs best, with prior-based inheritance also outperforming no inheritance. Analysis reveals that while the learning budget is too low for a single morphology, generational inheritance compensates for this by accumulating learned adaptations across generations. Furthermore, inheritance mainly benefits offspring morphologies that are similar to their parents. Finally, we demonstrate the critical role of the environment, with more challenging environments resulting in more stable walking gaits. Our findings highlight that inheritance mechanisms can boost performance in evolutionary robotics without needing large learning budgets, offering an efficient path toward more capable robot design.
Generational Replacement and Learning for High-Performing and Diverse Populations in Evolvable Robots
Evolutionary Robotics offers the possibility to design robots to solve a specific task automatically by optimizing their morphology and control together. However, this co-optimization of body and control is challenging, because controllers need some time to adapt to the evolving morphology - which may make it difficult for new and promising designs to enter the evolving population. A solution to this is to add intra-life learning, defined as an additional controller optimization loop, to each individual in the evolving population. A related problem is the lack of diversity often seen in evolving populations as evolution narrows the search down to a few promising designs too quickly. This problem can be mitigated by implementing full generational replacement, where offspring robots replace the whole population. This solution for increasing diversity usually comes at the cost of lower performance compared to using elitism. In this work, we show that combining such generational replacement with intra-life learning can increase diversity while retaining performance. We also highlight the importance of performance metrics when studying learning in morphologically evolving robots, showing that evaluating according to function evaluations versus according to generations of evolution can give different conclusions.
PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation
Humans anticipate, from a glance and a contemplated action of their bodies, how the 3D world will respond, a capability that is equally vital for robotic manipulation. We introduce PointWorld, a large pre-trained 3D world model that unifies state and action in a shared 3D space as 3D point flows: given one or few RGB-D images and a sequence of low-level robot action commands, PointWorld forecasts per-pixel displacements in 3D that respond to the given actions. By representing actions as 3D point flows instead of embodiment-specific action spaces (e.g., joint positions), this formulation directly conditions on physical geometries of robots while seamlessly integrating learning across embodiments. To train our 3D world model, we curate a large-scale dataset spanning real and simulated robotic manipulation in open-world environments, enabled by recent advances in 3D vision and simulated environments, totaling about 2M trajectories and 500 hours across a single-arm Franka and a bimanual humanoid. Through rigorous, large-scale empirical studies of backbones, action representations, learning objectives, partial observability, data mixtures, domain transfers, and scaling, we distill design principles for large-scale 3D world modeling. With a real-time (0.1s) inference speed, PointWorld can be efficiently integrated in the model-predictive control (MPC) framework for manipulation. We demonstrate that a single pre-trained checkpoint enables a real-world Franka robot to perform rigid-body pushing, deformable and articulated object manipulation, and tool use, without requiring any demonstrations or post-training and all from a single image captured in-the-wild. Project website at https://point-world.github.io/.
Dual-Attention Heterogeneous GNN for Multi-robot Collaborative Area Search via Deep Reinforcement Learning
In multi-robot collaborative area search, a key challenge is to dynamically balance the two objectives of exploring unknown areas and covering specific targets to be rescued. Existing methods are often constrained by homogeneous graph representations, thus failing to model and balance these distinct tasks. To address this problem, we propose a Dual-Attention Heterogeneous Graph Neural Network (DA-HGNN) trained using deep reinforcement learning. Our method constructs a heterogeneous graph that incorporates three entity types: robot nodes, frontier nodes, and interesting nodes, as well as their historical states. The dual-attention mechanism comprises the relational-aware attention and type-aware attention operations. The relational-aware attention captures the complex spatio-temporal relationships among robots and candidate goals. Building on this relational-aware heterogeneous graph, the type-aware attention separately computes the relevance between robots and each goal type (frontiers vs. points of interest), thereby decoupling the exploration and coverage from the unified tasks. Extensive experiments conducted in interactive 3D scenarios within the iGibson simulator, leveraging the Gibson and MatterPort3D datasets, validate the superior scalability and generalization capability of the proposed approach.
Systematic Evaluation of Depth Backbones and Semantic Cues for Monocular Pseudo-LiDAR 3D Detection
Monocular 3D object detection offers a low-cost alternative to LiDAR, yet remains less accurate due to the difficulty of estimating metric depth from a single image. We systematically evaluate how depth backbones and feature engineering affect a monocular Pseudo-LiDAR pipeline on the KITTI validation split. Specifically, we compare NeWCRFs (supervised metric depth) against Depth Anything V2 Metric-Outdoor (Base) under an identical pseudo-LiDAR generation and PointRCNN detection protocol. NeWCRFs yields stronger downstream 3D detection, achieving 10.50\% AP$_{3D}$ at IoU$=0.7$ on the Moderate split using grayscale intensity (Exp~2). We further test point-cloud augmentations using appearance cues (grayscale intensity) and semantic cues (instance segmentation confidence). Contrary to the expectation that semantics would substantially close the gap, these features provide only marginal gains, and mask-based sampling can degrade performance by removing contextual geometry. Finally, we report a depth-accuracy-versus-distance diagnostic using ground-truth 2D boxes (including Ped/Cyc), highlighting that coarse depth correctness does not fully predict strict 3D IoU. Overall, under an off-the-shelf LiDAR detector, depth-backbone choice and geometric fidelity dominate performance, outweighing secondary feature injection.
comment: 7 pages, 4 figures
Locomotion Beyond Feet
Most locomotion methods for humanoid robots focus on leg-based gaits, yet natural bipeds frequently rely on hands, knees, and elbows to establish additional contacts for stability and support in complex environments. This paper introduces Locomotion Beyond Feet, a comprehensive system for whole-body humanoid locomotion across extremely challenging terrains, including low-clearance spaces under chairs, knee-high walls, knee-high platforms, and steep ascending and descending stairs. Our approach addresses two key challenges: contact-rich motion planning and generalization across diverse terrains. To this end, we combine physics-grounded keyframe animation with reinforcement learning. Keyframes encode human knowledge of motor skills, are embodiment-specific, and can be readily validated in simulation or on hardware, while reinforcement learning transforms these references into robust, physically accurate motions. We further employ a hierarchical framework consisting of terrain-specific motion-tracking policies, failure recovery mechanisms, and a vision-based skill planner. Real-world experiments demonstrate that Locomotion Beyond Feet achieves robust whole-body locomotion and generalizes across obstacle sizes, obstacle instances, and terrain sequences.
comment: Project website: https://locomotion-beyond-feet.github.io/
From Score to Sound: An End-to-End MIDI-to-Motion Pipeline for Robotic Cello Performance
Robot musicians require precise control to obtain proper note accuracy, sound quality, and musical expression. Performance of string instruments, such as violin and cello, presents a significant challenge due to the precise control required over bow angle and pressure to produce the desired sound. While prior robotic cellists focus on accurate bowing trajectories, these works often rely on expensive motion capture techniques, and fail to sightread music in a human-like way. We propose a novel end-to-end MIDI score to robotic motion pipeline which converts musical input directly into collision-aware bowing motions for a UR5e robot cellist. Through use of Universal Robot Freedrive feature, our robotic musician can achieve human-like sound without the need for motion capture. Additionally, this work records live joint data via Real-Time Data Exchange (RTDE) as the robot plays, providing labeled robotic playing data from a collection of five standard pieces to the research community. To demonstrate the effectiveness of our method in comparison to human performers, we introduce the Musical Turing Test, in which a collection of 132 human participants evaluate our robot's performance against a human baseline. Human reference recordings are also released, enabling direct comparison for future studies. This evaluation technique establishes the first benchmark for robotic cello performance. Finally, we outline a residual reinforcement learning methodology to improve upon baseline robotic controls, highlighting future opportunities for improved string-crossing efficiency and sound quality.
A Reinforcement Learning-Based Model for Mapping and Goal-Directed Navigation Using Multiscale Place Fields
Autonomous navigation in complex and partially observable environments remains a central challenge in robotics. Several bio-inspired models of mapping and navigation based on place cells in the mammalian hippocampus have been proposed. This paper introduces a new robust model that employs parallel layers of place fields at multiple spatial scales, a replay-based reward mechanism, and dynamic scale fusion. Simulations show that the model improves path efficiency and accelerates learning compared to single-scale baselines, highlighting the value of multiscale spatial representations for adaptive robot navigation.
comment: 11 pages, 8 figures. Submitted to IEEE Transactions on Cognitive and Developmental Systems
A Vision-Language-Action Model with Visual Prompt for OFF-Road Autonomous Driving
Efficient trajectory planning in off-road terrains presents a formidable challenge for autonomous vehicles, often necessitating complex multi-step pipelines. However, traditional approaches exhibit limited adaptability in dynamic environments. To address these limitations, this paper proposes OFF-EMMA, a novel end-to-end multimodal framework designed to overcome the deficiencies of insufficient spatial perception and unstable reasoning in visual-language-action (VLA) models for off-road autonomous driving scenarios. The framework explicitly annotates input images through the design of a visual prompt block and introduces a chain-of-thought with self-consistency (COT-SC) reasoning strategy to enhance the accuracy and robustness of trajectory planning. The visual prompt block utilizes semantic segmentation masks as visual prompts, enhancing the spatial understanding ability of pre-trained visual-language models for complex terrains. The COT- SC strategy effectively mitigates the error impact of outliers on planning performance through a multi-path reasoning mechanism. Experimental results on the RELLIS-3D off-road dataset demonstrate that OFF-EMMA significantly outperforms existing methods, reducing the average L2 error of the Qwen backbone model by 13.3% and decreasing the failure rate from 16.52% to 6.56%.
Transformer-based Multi-agent Reinforcement Learning for Separation Assurance in Structured and Unstructured Airspaces
Conventional optimization-based metering depends on strict adherence to precomputed schedules, which limits the flexibility required for the stochastic operations of Advanced Air Mobility (AAM). In contrast, multi-agent reinforcement learning (MARL) offers a decentralized, adaptive framework that can better handle uncertainty, required for safe aircraft separation assurance. Despite this advantage, current MARL approaches often overfit to specific airspace structures, limiting their adaptability to new configurations. To improve generalization, we recast the MARL problem in a relative polar state space and train a transformer encoder model across diverse traffic patterns and intersection angles. The learned model provides speed advisories to resolve conflicts while maintaining aircraft near their desired cruising speeds. In our experiments, we evaluated encoder depths of 1, 2, and 3 layers in both structured and unstructured airspaces, and found that a single encoder configuration outperformed deeper variants, yielding near-zero near mid-air collision rates and shorter loss-of-separation infringements than the deeper configurations. Additionally, we showed that the same configuration outperforms a baseline model designed purely with attention. Together, our results suggest that the newly formulated state representation, novel design of neural network architecture, and proposed training strategy provide an adaptable and scalable decentralized solution for aircraft separation assurance in both structured and unstructured airspaces.
comment: 9 pages, 4 figures, 4 tables. Presented at SESAR Innovation Days 2025
Enhanced-FQL($λ$), an Efficient and Interpretable RL with novel Fuzzy Eligibility Traces and Segmented Experience Replay
This paper introduces a fuzzy reinforcement learning framework, Enhanced-FQL($λ$), that integrates novel Fuzzified Eligibility Traces (FET) and Segmented Experience Replay (SER) into fuzzy Q-learning with Fuzzified Bellman Equation (FBE) for continuous control tasks. The proposed approach employs an interpretable fuzzy rule base instead of complex neural architectures, while maintaining competitive performance through two key innovations: a fuzzified Bellman equation with eligibility traces for stable multi-step credit assignment, and a memory-efficient segment-based experience replay mechanism for enhanced sample efficiency. Theoretical analysis proves the proposed method convergence under standard assumptions. Extensive evaluations in continuous control domains demonstrate that Enhanced-FQL($λ$) achieves superior sample efficiency and reduced variance compared to n-step fuzzy TD and fuzzy SARSA($λ$) baselines, while maintaining substantially lower computational complexity than deep RL alternatives such as DDPG. The framework's inherent interpretability, combined with its computational efficiency and theoretical convergence guarantees, makes it particularly suitable for safety-critical applications where transparency and resource constraints are essential.
comment: Submitted to ECC26 conference
UNIC: Learning Unified Multimodal Extrinsic Contact Estimation
Contact-rich manipulation requires reliable estimation of extrinsic contacts-the interactions between a grasped object and its environment which provide essential contextual information for planning, control, and policy learning. However, existing approaches often rely on restrictive assumptions, such as predefined contact types, fixed grasp configurations, or camera calibration, that hinder generalization to novel objects and deployment in unstructured environments. In this paper, we present UNIC, a unified multimodal framework for extrinsic contact estimation that operates without any prior knowledge or camera calibration. UNIC directly encodes visual observations in the camera frame and integrates them with proprioceptive and tactile modalities in a fully data-driven manner. It introduces a unified contact representation based on scene affordance maps that captures diverse contact formations and employs a multimodal fusion mechanism with random masking, enabling robust multimodal representation learning. Extensive experiments demonstrate that UNIC performs reliably. It achieves a 9.6 mm average Chamfer distance error on unseen contact locations, performs well on unseen objects, remains robust under missing modalities, and adapts to dynamic camera viewpoints. These results establish extrinsic contact estimation as a practical and versatile capability for contact-rich manipulation.
Autonomous Reasoning for Spacecraft Control: A Large Language Model Framework with Group Relative Policy Optimization
This paper presents a learning-based guidance-and-control approach that couples a reasoning-enabled Large Language Model (LLM) with Group Relative Policy Optimization (GRPO). A two-stage procedure consisting of Supervised Fine-Tuning (SFT) to learn formatting and control primitives, followed by GRPO for interaction-driven policy improvement, trains controllers for each environment. The framework is demonstrated on four control problems spanning a gradient of dynamical complexity, from canonical linear systems through nonlinear oscillatory dynamics to three-dimensional spacecraft attitude control with gyroscopic coupling and thrust constraints. Results demonstrate that an LLM with explicit reasoning, optimized via GRPO, can synthesize feasible stabilizing policies under consistent training settings across both linear and nonlinear systems. The two-stage training methodology enables models to generate control sequences while providing human-readable explanations of their decision-making process. This work establishes a foundation for applying GRPO-based reasoning to autonomous control systems, with potential applications in aerospace and other safety-critical domains.
Online Action-Stacking Improves Reinforcement Learning Performance for Air Traffic Control
We introduce online action-stacking, an inference-time wrapper for reinforcement learning policies that produces realistic air traffic control commands while allowing training on a much smaller discrete action space. Policies are trained with simple incremental heading or level adjustments, together with an action-damping penalty that reduces instruction frequency and leads agents to issue commands in short bursts. At inference, online action-stacking compiles these bursts of primitive actions into domain-appropriate compound clearances. Using Proximal Policy Optimisation and the BluebirdDT digital twin platform, we train agents to navigate aircraft along lateral routes, manage climb and descent to target flight levels, and perform two-aircraft collision avoidance under a minimum separation constraint. In our lateral navigation experiments, action stacking greatly reduces the number of issued instructions relative to a damped baseline and achieves comparable performance to a policy trained with a 37-dimensional action space, despite operating with only five actions. These results indicate that online action-stacking helps bridge a key gap between standard reinforcement learning formulations and operational ATC requirements, and provides a simple mechanism for scaling to more complex control scenarios.
Correcting Autonomous Driving Object Detection Misclassifications with Automated Commonsense Reasoning
Autonomous Vehicle (AV) technology has been heavily researched and sought after, yet there are no SAE Level 5 AVs available today in the marketplace. We contend that over-reliance on machine learning technology is the main reason. Use of automated commonsense reasoning technology, we believe, can help achieve SAE Level 5 autonomy. In this paper, we show how automated common-sense reasoning technology can be deployed in situations where there are not enough data samples available to train a deep learning-based AV model that can handle certain abnormal road scenarios. Specifically, we consider two situations where (i) a traffic signal is malfunctioning at an intersection and (ii) all the cars ahead are slowing down and steering away due to an unexpected obstruction (e.g., animals on the road). We show that in such situations, our commonsense reasoning-based solution accurately detects traffic light colors and obstacles not correctly captured by the AV's perception model. We also provide a pathway for efficiently invoking commonsense reasoning by measuring uncertainty in the computer vision model and using commonsense reasoning to handle uncertain scenarios. We describe our experiments conducted using the CARLA simulator and the results obtained. The main contribution of our research is to show that automated commonsense reasoning effectively corrects AV-based object detection misclassifications and that hybrid models provide an effective pathway to improving AV perception.
comment: In Proceedings ICLP 2025, arXiv:2601.00047
Supercomputing for High-speed Avoidance and Reactive Planning in Robots
This paper presents SHARP (Supercomputing for High-speed Avoidance and Reactive Planning), a proof-of-concept study demonstrating how high-performance computing (HPC) can enable millisecond-scale responsiveness in robotic control. While modern robots face increasing demands for reactivity in human-robot shared workspaces, onboard processors are constrained by size, power, and cost. Offloading to HPC offers massive parallelism for trajectory planning, but its feasibility for real-time robotics remains uncertain due to network latency and jitter. We evaluate SHARP in a stress-test scenario where a 7-DOF manipulator must dodge high-speed foam projectiles. Using a hash-distributed multi-goal A* search implemented with MPI on both local and remote HPC clusters, the system achieves mean planning latencies of 22.9 ms (local) and 30.0 ms (remote, ~300 km away), with avoidance success rates of 84% and 88%, respectively. These results show that when round-trip latency remains within the tens-of-milliseconds regime, HPC-side computation is no longer the bottleneck, enabling avoidance well below human reaction times. The SHARP results motivate hybrid control architectures: low-level reflexes remain onboard for safety, while bursty, high-throughput planning tasks are offloaded to HPC for scalability. By reporting per-stage timing and success rates, this study provides a reproducible template for assessing real-time feasibility of HPC-driven robotics. Collectively, SHARP reframes HPC offloading as a viable pathway toward dependable, reactive robots in dynamic environments.
comment: Error in the graph size calculation, recalculated and resubmitted
Uncertainty-Aware Robotic World Model Makes Offline Model-Based Reinforcement Learning Work on Real Robots
Reinforcement Learning (RL) has achieved impressive results in robotics, yet high-performing pipelines remain highly task-specific, with little reuse of prior data. Offline Model-based RL (MBRL) offers greater data efficiency by training policies entirely from existing datasets, but suffers from compounding errors and distribution shift in long-horizon rollouts. Although existing methods have shown success in controlled simulation benchmarks, robustly applying them to the noisy, biased, and partially observed datasets typical of real-world robotics remains challenging. We present a principled pipeline for making offline MBRL effective on physical robots. Our RWM-U extends autoregressive world models with epistemic uncertainty estimation, enabling temporally consistent multi-step rollouts with uncertainty effectively propagated over long horizons. We combine RWM-U with MOPO-PPO, which adapts uncertainty-penalized policy optimization to the stable, on-policy PPO framework for real-world control. We evaluate our approach on diverse manipulation and locomotion tasks in simulation and on real quadruped and humanoid, training policies entirely from offline datasets. The resulting policies consistently outperform model-free and uncertainty-unaware model-based baselines, and fusing real-world data in model learning further yields robust policies that surpass online model-free baselines trained solely in simulation.
Real-time Velocity Profile Optimization for Time-Optimal Maneuvering with Generic Acceleration Constraints
The computation of time-optimal velocity profiles along prescribed paths, subject to generic acceleration constraints, is a crucial problem in robot trajectory planning, with particular relevance to autonomous racing. However, the existing methods either support arbitrary acceleration constraints at high computational cost or use conservative box constraints for computational efficiency. We propose FBGA, a new \underline{F}orward-\underline{B}ackward algorithm with \underline{G}eneric \underline{A}cceleration constraints, which achieves both high accuracy and low computation time. FBGA operates forward and backward passes to maximize the velocity profile in short, discretized path segments, while satisfying user-defined performance limits. Tested on five racetracks and two vehicle classes, FBGA handles complex, non-convex acceleration constraints with custom formulations. Its maneuvers and lap times closely match optimal control baselines (within $0.11\%$-$0.36\%$), while being up to three orders of magnitude faster. FBGA maintains high accuracy even with coarse discretization, making it well-suited for online multi-query trajectory planning. Our open-source \texttt{C++} implementation is available at: https://anonymous.4open.science/r/FB_public_RAL.
The Combined Problem of Online Task Assignment and Lifelong Path Finding in Logistics Warehouses: Rule-Based Systems Matter
We study the combined problem of online task assignment and lifelong path finding, which is crucial for the logistics industries. However, most literature either (1) focuses on lifelong path finding assuming a given task assigner, or (2) studies the offline version of this problem where tasks are known in advance. We argue that, to maximize the system throughput, the online version that integrates these two components should be tackled directly. To this end, we introduce a formal framework of the combined problem and its solution concept. Then, we design a rule-based lifelong planner under a practical robot model that works well even in environments with severe local congestion. Upon that, we automate the search for the task assigner with respect to the underlying path planner. Simulation experiments conducted in warehouse scenarios at Meituan, one of the largest shopping platforms in China, demonstrate that (a)in terms of time efficiency, our system requires only 83.77% of the execution time needed for the currently deployed system at Meituan, outperforming other SOTA algorithms by 8.09%; (b)in terms of economic efficiency, ours can achieve the same throughput with only 60% of the agents currently in use. The code and demos are available at https://github.com/Fernadoo/Online-TAPF.
comment: In Proceedings ICLP 2025, arXiv:2601.00047
Tackling the Kidnapped Robot Problem via Sparse Feasible Hypothesis Sampling and Reliable Batched Multi-Stage Inference
This paper addresses the Kidnapped Robot Problem (KRP), a core localization challenge of relocalizing a robot in a known map without prior pose estimate when localization loss or at SLAM initialization. For this purpose, a passive 2-D global relocalization framework is proposed. It estimates the global pose efficiently and reliably from a single LiDAR scan and an occupancy grid map while the robot remains stationary, thereby enhancing the long-term autonomy of mobile robots. The proposed framework casts global relocalization as a non-convex problem and solves it via the multi-hypothesis scheme with batched multi-stage inference and early termination, balancing completeness and efficiency. The Rapidly-exploring Random Tree (RRT), under traversability constraints, asymptotically covers the reachable space to generate sparse, uniformly distributed feasible positional hypotheses, fundamentally reducing the sampling space. The hypotheses are preliminarily ordered by the proposed Scan Mean Absolute Difference (SMAD), a coarse beam-error level metric that facilitates the early termination by prioritizing high-likelihood candidates. The SMAD computation is optimized for non-panoramic scans. The Translation-Affinity Scan-to-Map Alignment Metric (TAM) is proposed for reliable orientation selection at hypothesized positions and accurate final pose evaluation to mitigate degradation in conventional likelihood-field metrics under translational uncertainty induced by sparse hypotheses, as well as non-panoramic LiDAR scan and environmental changes. Real-world experiments on a resource-constrained mobile robot with non-panoramic LiDAR scans show that the proposed framework achieves competitive performance in both global relocalization success rate and computational efficiency.
comment: 10 pages, 8 figures. This work has been submitted to the IEEE for possible publication
Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail
End-to-end architectures trained via imitation learning have advanced autonomous driving by scaling model size and data, yet performance remains brittle in safety-critical long-tail scenarios where supervision is sparse and causal understanding is limited. We introduce Alpamayo-R1 (AR1), a vision-language-action model (VLA) that integrates Chain of Causation reasoning with trajectory planning for complex driving scenarios. Our approach features three key innovations: (1) the Chain of Causation (CoC) dataset, built through a hybrid auto-labeling and human-in-the-loop pipeline producing decision-grounded, causally linked reasoning traces aligned with driving behaviors; (2) a modular VLA architecture combining Cosmos-Reason, a vision-language model pre-trained for Physical AI, with a diffusion-based trajectory decoder that generates dynamically feasible trajectories in real time; (3) a multi-stage training strategy using supervised fine-tuning to elicit reasoning and reinforcement learning (RL) to enforce reasoning-action consistency and optimize reasoning quality. AR1 achieves up to a 12% improvement in planning accuracy on challenging cases compared to a trajectory-only baseline, with a 35% reduction in close encounter rate in closed-loop simulation. RL post-training improves reasoning quality by 45% and reasoning-action consistency by 37%. Model scaling from 0.5B to 7B parameters shows consistent improvements. On-vehicle road tests confirm real-time performance (99 ms latency) and successful urban deployment. By bridging interpretable reasoning with precise control, AR1 demonstrates a practical path towards Level 4 autonomous driving. Model weights are available at https://huggingface.co/nvidia/Alpamayo-R1-10B with inference code at https://github.com/NVlabs/alpamayo.
From Human Intention to Action Prediction: Intention-Driven End-to-End Autonomous Driving
While end-to-end autonomous driving has achieved remarkable progress in geometric control, current systems remain constrained by a command-following paradigm that relies on simple navigational instructions. Transitioning to genuinely intelligent agents requires the capability to interpret and fulfill high-level, abstract human intentions. However, this advancement is hindered by the lack of dedicated benchmarks and semantic-aware evaluation metrics. In this paper, we formally define the task of Intention-Driven End-to-End Autonomous Driving and present Intention-Drive, a comprehensive benchmark designed to bridge this gap. We construct a large-scale dataset featuring complex natural language intentions paired with high-fidelity sensor data. To overcome the limitations of conventional trajectory-based metrics, we introduce the Imagined Future Alignment (IFA), a novel evaluation protocol leveraging generative world models to assess the semantic fulfillment of human goals beyond mere geometric accuracy. Furthermore, we explore the solution space by proposing two distinct paradigms: an end-to-end vision-language planner and a hierarchical agent-based framework. The experiments reveal a critical dichotomy where existing models exhibit satisfactory driving stability but struggle significantly with intention fulfillment. Notably, the proposed frameworks demonstrate superior alignment with human intentions.
Physics-Driven Data Generation for Contact-Rich Manipulation via Trajectory Optimization
We present a low-cost data generation pipeline that integrates physics-based simulation, human demonstrations, and model-based planning to efficiently generate large-scale, high-quality datasets for contact-rich robotic manipulation tasks. Starting with a small number of embodiment-flexible human demonstrations collected in a virtual reality simulation environment, the pipeline refines these demonstrations using optimization-based kinematic retargeting and trajectory optimization to adapt them across various robot embodiments and physical parameters. This process yields a diverse, physically consistent dataset that enables cross-embodiment data transfer, and offers the potential to reuse legacy datasets collected under different hardware configurations or physical parameters. We validate the pipeline's effectiveness by training diffusion policies from the generated datasets for challenging contact-rich manipulation tasks across multiple robot embodiments, including a floating Allegro hand and bimanual robot arms. The trained policies are deployed zero-shot on hardware for bimanual iiwa arms, achieving high success rates with minimal human input. Project website: https://lujieyang.github.io/physicsgen/.
OmniNav: A Unified Framework for Prospective Exploration and Visual-Language Navigation
Embodied navigation presents a core challenge for intelligent robots, requiring the comprehension of visual environments, natural language instructions, and autonomous exploration. Existing models often fall short in offering a unified solution across diverse navigation paradigms, resulting in low success rates and limited generalization. We introduce OmniNav, a unified framework addressing instruct-goal, object-goal, point-goal navigation, and frontier-based exploration within a single architecture. Our approach features a lightweight, low-latency policy that accurately predicts continuous-space waypoints (coordinates and orientations). This policy surpasses action-chunk methods in precision and supports real-world deployment at control frequencies up to 5 Hz. Architecturally, OmniNav employs a fast-slow system design: a fast module generates waypoints using short-horizon visual context and subtasks, while a slow module performs deliberative planning with long-horizon observations and candidate frontiers to select subsequent subgoals and subtasks. This collaboration enhances path efficiency and maintains trajectory coherence, particularly in exploration and memory-intensive scenarios. Crucially, we identify that the primary bottleneck isn't merely navigation policy learning, but a robust understanding of general instructions and objects. To boost generalization, OmniNav integrates large-scale, general-purpose training datasets, including those for image captioning and visual recognition, into a joint multi-task regimen. This significantly improves success rates and robustness. Extensive experiments confirm OmniNav's state-of-the-art performance across various navigation benchmarks, with real-world deployment further validating its efficacy. OmniNav provides practical insights for embodied navigation, charting a scalable path towards versatile, highly generalizable robotic intelligence.
Real-time Sampling-based Model Predictive Control based on Reverse Kullback-Leibler Divergence and Its Adaptive Acceleration
Sampling-based model predictive control (MPC) has the potential for use in a wide variety of robotic systems. However, its unstable updates and poor convergence render it unsuitable for real-time control of robotic systems. This study addresses this challenge with a novel approach from reverse Kullback-Leibler divergence, which has a mode-seeking property and is likely to find one of the locally optimal solutions early. Using this approach, a weighted maximum likelihood estimation with positive and negative weights is obtained and solved using the mirror descent (MD) algorithm. Negative weights eliminate unnecessary actions, but a practical implementation needs to be designed to avoid interference with positive and negative updates based on rejection sampling. In addition, Nesterov's acceleration method for the proposed MD is modified to improve heuristic step size adaptive to the noise estimated in update amounts. Real-time simulations show that the proposed method can solve a wider variety of tasks statistically than the conventional method. In addition, higher degrees-of-freedom tasks can be solved by the improved acceleration even with a CPU only. The real-world applicability of the proposed method is also demonstrated by optimizing the operability in a variable impedance control of a force-driven mobile robot. https://youtu.be/D8bFMzct1XM
comment: 18 pages, 16 figures
Adaptive Anomaly Recovery for Telemanipulation: A Diffusion Model Approach to Vision-Based Tracking
Dexterous telemanipulation critically relies on the continuous and stable tracking of the human operator's commands to ensure robust operation. Vison-based tracking methods are widely used but have low stability due to anomalies such as occlusions, inadequate lighting, and loss of sight. Traditional filtering, regression, and interpolation methods are commonly used to compensate for explicit information such as angles and positions. These approaches are restricted to low-dimensional data and often result in information loss compared to the original high-dimensional image and video data. Recent advances in diffusion-based approaches, which can operate on high-dimensional data, have achieved remarkable success in video reconstruction and generation. However, these methods have not been fully explored in continuous control tasks in robotics. This work introduces the Diffusion-Enhanced Telemanipulation (DET) framework, which incorporates the Frame-Difference Detection (FDD) technique to identify and segment anomalies in video streams. These anomalous clips are replaced after reconstruction using diffusion models, ensuring robust telemanipulation performance under challenging visual conditions. We validated this approach in various anomaly scenarios and compared it with the baseline methods. Experiments show that DET achieves an average RMSE reduction of 17.2% compared to the cubic spline and 51.1% compared to FFT-based interpolation for different occlusion durations.
comment: This work has been submitted to the IEEE for possible publication
Nonholonomic Robot Parking by Feedback -- Part I: Modular Strict CLF Designs
It has been known in the robotics literature since about 1995 that, in polar coordinates, the nonholonomic unicycle is asymptotically stabilizable by smooth feedback, even globally. We introduce a modular design framework that selects the forward velocity to decouple the radial coordinate, allowing the steering subsystem to be stabilized independently. Within this structure, we develop families of feedback laws using passivity, backstepping, and integrator forwarding. Each law is accompanied by a strict control Lyapunov function, including barrier variants that enforce angular constraints. These strict CLFs provide constructive class KL convergence estimates and enable eigenvalue assignment at the target equilibrium. The framework generalizes and extends prior modular and nonmodular approaches, while preparing the ground for inverse optimal and adaptive redesigns in the sequel paper.
comment: arXiv admin note: text overlap with arXiv:2509.25575
Deep Reinforcement Learning for Bipedal Locomotion: A Brief Survey
Bipedal robots are gaining global recognition due to their potential applications and advancements in artificial intelligence, particularly through Deep Reinforcement Learning (DRL). While DRL has significantly advanced bipedal locomotion, the development of a unified framework capable of handling a wide range of tasks remains an ongoing challenge. This survey systematically categorises, compares, and analyses existing DRL frameworks for bipedal locomotion, organising them into end-to-end and hierarchical control schemes. End-to-end frameworks are evaluated based on their learning approaches, while hierarchical frameworks are examined in terms of layered structures that integrate learning-based or traditional model-based methods. We provide a detailed evaluation of the composition, strengths, limitations, and capabilities of each framework. Additionally, this survey identifies key research gaps and proposes future directions aimed at creating a more integrated and efficient framework for bipedal locomotion, with wide-ranging applications in real-world environments.
comment: Published in Artificial Intelligence Review
Enhancing Sampling-based Planning with a Library of Paths
Path planning for 3D solid objects is a challenging problem, requiring a search in a six-dimensional configuration space, which is, nevertheless, essential in many robotic applications such as bin-picking and assembly. The commonly used sampling-based planners, such as Rapidly-exploring Random Trees, struggle with narrow passages where the sampling probability is low, increasing the time needed to find a solution. In scenarios like robotic bin-picking, various objects must be transported through the same environment. However, traditional planners start from scratch each time, losing valuable information gained during the planning process. We address this by using a library of past solutions, allowing the reuse of previous experiences even when planning for a new, previously unseen object. Paths for a set of objects are stored, and when planning for a new object, we find the most similar one in the library and use its paths as approximate solutions, adjusting for possible mutual transformations. The configuration space is then sampled along the approximate paths. Our method is tested in various narrow passage scenarios and compared with state-of-the-art methods from the OMPL library. Results show significant speed improvements (up to 85% decrease in the required time) of our method, often finding a solution in cases where the other planners fail. Our implementation of the proposed method is released as an open-source package.
Generative Models From and For Sampling-Based MPC: A Bootstrapped Approach For Adaptive Contact-Rich Manipulation
We present a generative predictive control (GPC) framework that amortizes sampling-based Model Predictive Control (SPC) by bootstrapping it with conditional flow-matching models trained on SPC control sequences collected in simulation. Unlike prior work relying on iterative refinement or gradient-based solvers, we show that meaningful proposal distributions can be learned directly from noisy SPC data, enabling more efficient and informed sampling during online planning. We further demonstrate, for the first time, the application of this approach to real-world contact-rich loco-manipulation with a quadruped robot. Extensive experiments in simulation and on hardware show that our method improves sample efficiency, reduces planning horizon requirements, and generalizes robustly across task variations.
comment: 9 pages, 4 figures
Multiagent Systems
Introducing The Maximum Common Bigraph Problem
Bigraph reactive systems offer a powerful and flexible mathematical framework for modelling both spatial and non-spatial relationships between agents, with practical applications in domains such as smart technologies, networks, sensor systems, and biology. While bigraphs theoretically support the identification of bisimilar agents, by simulating and comparing their corresponding minimal contextual transition systems, no known algorithm exists for computing the maximum shared structure between two bigraphs, an essential prerequisite for determining the set of possible transitions for a given agent state. In this work, we provide a definition of the maximum common bigraph problem, and present an adaptation of the McSplit maximum common induced subgraph algorithm to compute the maximum common bigraph between two bigraph states. Our approach opens a path toward supporting bisimulation checking in bigraph-based tools, which have been leveraged in other modelling paradigms for simplification, optimisation, and verification of models.
comment: In Proceedings GCM 2025, arXiv:2601.03249
When Numbers Start Talking: Implicit Numerical Coordination Among LLM-Based Agents
LLMs-based agents increasingly operate in multi-agent environments where strategic interaction and coordination are required. While existing work has largely focused on individual agents or on interacting agents sharing explicit communication, less is known about how interacting agents coordinate implicitly. In particular, agents may engage in covert communication, relying on indirect or non-linguistic signals embedded in their actions rather than on explicit messages. This paper presents a game-theoretic study of covert communication in LLM-driven multi-agent systems. We analyse interactions across four canonical game-theoretic settings under different communication regimes, including explicit, restricted, and absent communication. Considering heterogeneous agent personalities and both one-shot and repeated games, we characterise when covert signals emerge and how they shape coordination and strategic outcomes.
A Chromatographic Process Design and Optimization Platform Powered by Large Language Models: A Case Application on Extract of Ginkgo Biloba Leaf
Chromatographic separation technology has been widely applied in pharmaceutical, chemical, and food industries due to its high efficiency. However, traditional human-dependent chromatographic process development faces challenges such as reliance on expert experience, long development cycles, and labor intensity. ChromR, a large language model (LLM)-driven platform for chromatographic process design and optimization, is presented in this work. The platform integrates ChromLLM, a domain-specific LLM trained for chromatography, along with a multi-agent system and an automated chromatographic experimental device. The multi-agent system comprises four agents: domain knowledge answering, experimental design, experimental execution, and data analysis. ChromR enables automatic completion of the entire workflow-including initial process parameter recommendation, experimental design, automated execution, data analysis, and multi-objective optimization. By utilizing ChromR, dependency on expert knowledge is effectively reduced, while labor input and development time are significantly decreased. Chromatographic purification of the extract of Ginkgo biloba leaf (EGBL) was selected as a case study. ChromR successfully developed a chromatographic process within one week that meets multiple objectives, including fraction quality and production efficiency, reducing development time to approximately one-seventh of that required by the conventional paradigm. An intelligent, automated, and universally applicable new paradigm was established for chromatographic process development.
Online Decision-Making Under Uncertainty for Vehicle-to-Building Systems
Vehicle-to-building (V2B) systems integrate physical infrastructures, such as smart buildings and electric vehicles (EVs) connected to chargers at the building, with digital control mechanisms to manage energy use. By utilizing EVs as flexible energy reservoirs, buildings can dynamically charge and discharge them to optimize energy use and cut costs under time-variable pricing and demand charge policies. This setup leads to the V2B optimization problem, where buildings coordinate EV charging and discharging to minimize total electricity costs while meeting users' charging requirements. However, the V2B optimization problem is challenging because of: (1) fluctuating electricity pricing, which includes both energy charges ($/kWh) and demand charges ($/kW); (2) long planning horizons (typically over 30 days); (3) heterogeneous chargers with varying charging rates, controllability, and directionality (i.e., unidirectional or bidirectional); and (4) user-specific battery levels at departure to ensure user requirements are met. In contrast to existing approaches that often model this setting as a single-shot combinatorial optimization problem, we highlight critical limitations in prior work and instead model the V2B optimization problem as a Markov decision process (MDP), i.e., a stochastic control process. Solving the resulting MDP is challenging due to the large state and action spaces. To address the challenges of the large state space, we leverage online search, and we counter the action space by using domain-specific heuristics to prune unpromising actions. We validate our approach in collaboration with Nissan Advanced Technology Center - Silicon Valley. Using data from their EV testbed, we show that the proposed framework significantly outperforms state-of-the-art methods.
comment: 17 pages, 2 figures, 10 tables. Published in the Proceedings of the 16th ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS '25), May 06--09, 2025, Irvine, CA, USA
Transformer-based Multi-agent Reinforcement Learning for Separation Assurance in Structured and Unstructured Airspaces
Conventional optimization-based metering depends on strict adherence to precomputed schedules, which limits the flexibility required for the stochastic operations of Advanced Air Mobility (AAM). In contrast, multi-agent reinforcement learning (MARL) offers a decentralized, adaptive framework that can better handle uncertainty, required for safe aircraft separation assurance. Despite this advantage, current MARL approaches often overfit to specific airspace structures, limiting their adaptability to new configurations. To improve generalization, we recast the MARL problem in a relative polar state space and train a transformer encoder model across diverse traffic patterns and intersection angles. The learned model provides speed advisories to resolve conflicts while maintaining aircraft near their desired cruising speeds. In our experiments, we evaluated encoder depths of 1, 2, and 3 layers in both structured and unstructured airspaces, and found that a single encoder configuration outperformed deeper variants, yielding near-zero near mid-air collision rates and shorter loss-of-separation infringements than the deeper configurations. Additionally, we showed that the same configuration outperforms a baseline model designed purely with attention. Together, our results suggest that the newly formulated state representation, novel design of neural network architecture, and proposed training strategy provide an adaptable and scalable decentralized solution for aircraft separation assurance in both structured and unstructured airspaces.
comment: 9 pages, 4 figures, 4 tables. Presented at SESAR Innovation Days 2025
Human-in-the-Loop Testing of AI Agents for Air Traffic Control with a Regulated Assessment Framework
We present a rigorous, human-in-the-loop evaluation framework for assessing the performance of AI agents on the task of Air Traffic Control, grounded in a regulator-certified simulator-based curriculum used for training and testing real-world trainee controllers. By leveraging legally regulated assessments and involving expert human instructors in the evaluation process, our framework enables a more authentic and domain-accurate measurement of AI performance. This work addresses a critical gap in the existing literature: the frequent misalignment between academic representations of Air Traffic Control and the complexities of the actual operational environment. It also lays the foundations for effective future human-machine teaming paradigms by aligning machine performance with human assessment targets.
Online Action-Stacking Improves Reinforcement Learning Performance for Air Traffic Control
We introduce online action-stacking, an inference-time wrapper for reinforcement learning policies that produces realistic air traffic control commands while allowing training on a much smaller discrete action space. Policies are trained with simple incremental heading or level adjustments, together with an action-damping penalty that reduces instruction frequency and leads agents to issue commands in short bursts. At inference, online action-stacking compiles these bursts of primitive actions into domain-appropriate compound clearances. Using Proximal Policy Optimisation and the BluebirdDT digital twin platform, we train agents to navigate aircraft along lateral routes, manage climb and descent to target flight levels, and perform two-aircraft collision avoidance under a minimum separation constraint. In our lateral navigation experiments, action stacking greatly reduces the number of issued instructions relative to a damped baseline and achieves comparable performance to a policy trained with a 37-dimensional action space, despite operating with only five actions. These results indicate that online action-stacking helps bridge a key gap between standard reinforcement learning formulations and operational ATC requirements, and provides a simple mechanism for scaling to more complex control scenarios.
A Future Capabilities Agent for Tactical Air Traffic Control
Escalating air traffic demand is driving the adoption of automation to support air traffic controllers, but existing approaches face a trade-off between safety assurance and interpretability. Optimisation-based methods such as reinforcement learning offer strong performance but are difficult to verify and explain, while rules-based systems are transparent yet rarely check safety under uncertainty. This paper outlines Agent Mallard, a forward-planning, rules-based agent for tactical control in systemised airspace that embeds a stochastic digital twin directly into its conflict-resolution loop. Mallard operates on predefined GPS-guided routes, reducing continuous 4D vectoring to discrete choices over lanes and levels, and constructs hierarchical plans from an expert-informed library of deconfliction strategies. A depth-limited backtracking search uses causal attribution, topological plan splicing, and monotonic axis constraints to seek a complete safe plan for all aircraft, validating each candidate manoeuvre against uncertain execution scenarios (e.g., wind variation, pilot response, communication loss) before commitment. Preliminary walkthroughs with UK controllers and initial tests in the BluebirdDT airspace digital twin indicate that Mallard's behaviour aligns with expert reasoning and resolves conflicts in simplified scenarios. The architecture is intended to combine model-based safety assessment, interpretable decision logic, and tractable computational performance in future structured en-route environments.
Information Theoretic Optimal Surveillance for Epidemic Prevalence in Networks
Estimating the true prevalence of an epidemic outbreak is a key public health problem. This is challenging because surveillance is usually resource intensive and biased. In the network setting, prior work on cost sensitive disease surveillance has focused on choosing a subset of individuals (or nodes) to minimize objectives such as probability of outbreak detection. Such methods do not give insights into the outbreak size distribution which, despite being complex and multi-modal, is very useful in public health planning. We introduce TESTPREV, a problem of choosing a subset of nodes which maximizes the mutual information with disease prevalence, which directly provides information about the outbreak size distribution. We show that, under the independent cascade (IC) model, solutions computed by all prior disease surveillance approaches are highly sub-optimal for TESTPREV in general. We also show that TESTPREV is hard to even approximate. While this mutual information objective is computationally challenging for general networks, we show that it can be computed efficiently for various network classes. We present a greedy strategy, called GREEDYMI, that uses estimates of mutual information from cascade simulations and thus can be applied on any network and disease model. We find that GREEDYMI does better than natural baselines in terms of maximizing the mutual information as well as reducing the expected variance in outbreak size, under the IC model.
comment: 25 pages, 26 figures
Computing Universal Plans for Partially Observable Multi-Agent Routing Using Answer Set Programming
Multi-agent routing problems have gained significant attention recently due to their wide range of industrial applications, ranging from logistics warehouse automation to indoor service robots. Conventionally, they are modeled as classical planning problems. In this paper, we argue that it can be beneficial to formulate them as universal planning problems, particularly when the agents are autonomous entities and may encounter unforeseen situations. We therefore propose universal plans, also known as policies, as the solution concept, and implement a system based on Answer Set Programming (ASP) to compute them. Given an arbitrary two-dimensional map and a profile of goals for a group of partially observable agents, the system translates the problem configuration into logic programs and finds a feasible universal plan for each agent, mapping its observations to actions while ensuring that there are no collisions with other agents. We use the system to conduct experiments and obtain findings regarding the types of goal profiles and environments that lead to feasible policies, as well as how feasibility may depend on the agents' sensors. We also demonstrate how users can customize action preferences to compute more efficient policies, even (near-)optimal ones. The code is available at https://github.com/Fernadoo/MAPF_ASP.
comment: In Proceedings ICLP 2025, arXiv:2601.00047
The Combined Problem of Online Task Assignment and Lifelong Path Finding in Logistics Warehouses: Rule-Based Systems Matter
We study the combined problem of online task assignment and lifelong path finding, which is crucial for the logistics industries. However, most literature either (1) focuses on lifelong path finding assuming a given task assigner, or (2) studies the offline version of this problem where tasks are known in advance. We argue that, to maximize the system throughput, the online version that integrates these two components should be tackled directly. To this end, we introduce a formal framework of the combined problem and its solution concept. Then, we design a rule-based lifelong planner under a practical robot model that works well even in environments with severe local congestion. Upon that, we automate the search for the task assigner with respect to the underlying path planner. Simulation experiments conducted in warehouse scenarios at Meituan, one of the largest shopping platforms in China, demonstrate that (a)in terms of time efficiency, our system requires only 83.77% of the execution time needed for the currently deployed system at Meituan, outperforming other SOTA algorithms by 8.09%; (b)in terms of economic efficiency, ours can achieve the same throughput with only 60% of the agents currently in use. The code and demos are available at https://github.com/Fernadoo/Online-TAPF.
comment: In Proceedings ICLP 2025, arXiv:2601.00047
Web Fraud Attacks Against LLM-Driven Multi-Agent Systems
With the proliferation of LLM-driven multi-agent systems (MAS), the security of Web links has become a critical concern. Once MAS is induced to trust a malicious link, attackers can use it as a springboard to expand the attack surface. In this paper, we propose Web Fraud Attacks, a novel type of attack manipulating unique structures of web links to deceive MAS. We design 12 representative attack variants that encompass various methods, such as homoglyph deception, sub-directory nesting, and parameter obfuscation. Through extensive experiments on these attack vectors, we demonstrate that Web fraud attacks not only exhibit significant destructive potential across different MAS architectures but also possess a distinct advantage in evasion: they circumvent the need for complex input design, lowering the threshold for attacks significantly. These results underscore the importance of addressing Web fraud attacks, providing new insights into MAS safety. Our code is available at https://github.com/JiangYingEr/Web-Fraud-Attack-in-MAS.
SPIO: Ensemble and Selective Strategies via LLM-Based Multi-Agent Planning in Automated Data Science
Large Language Models (LLMs) have enabled dynamic reasoning in automated data analytics, yet recent multi-agent systems remain limited by rigid, single-path workflows that restrict strategic exploration and often lead to suboptimal outcomes. To overcome these limitations, we propose SPIO (Sequential Plan Integration and Optimization), a framework that replaces rigid workflows with adaptive, multi-path planning across four core modules: data preprocessing, feature engineering, model selection, and hyperparameter tuning. In each module, specialized agents generate diverse candidate strategies, which are cascaded and refined by an optimization agent. SPIO offers two operating modes: SPIO-S for selecting a single optimal pipeline, and SPIO-E for ensembling top-k pipelines to maximize robustness. Extensive evaluations on Kaggle and OpenML benchmarks show that SPIO consistently outperforms state-of-the-art baselines, achieving an average performance gain of 5.6%. By explicitly exploring and integrating multiple solution paths, SPIO delivers a more flexible, accurate, and reliable foundation for automated data science.
comment: Under Review
V-Agent: An Interactive Video Search System Using Vision-Language Models CIKM 2025
We introduce V-Agent, a novel multi-agent platform designed for advanced video search and interactive user-system conversations. By fine-tuning a vision-language model (VLM) with a small video preference dataset and enhancing it with a retrieval vector from an image-text retrieval model, we overcome the limitations of traditional text-based retrieval systems in multimodal scenarios. The VLM-based retrieval model independently embeds video frames and audio transcriptions from an automatic speech recognition (ASR) module into a shared multimodal representation space, enabling V-Agent to interpret both visual and spoken content for context-aware video search. This system consists of three agents-a routing agent, a search agent, and a chat agent-that work collaboratively to address user intents by refining search outputs and communicating with users. The search agent utilizes the VLM-based retrieval model together with an additional re-ranking module to further enhance video retrieval quality. Our proposed framework demonstrates state-of-the-art zero-shot performance on the MultiVENT 2.0 benchmark, highlighting its potential for both academic research and real-world applications. The retrieval model and demo videos are available at https://huggingface.co/NCSOFT/multimodal-embedding.
comment: CIKM 2025 MMGENSR Workshop
FinPos: A Position-Aware Trading Agent System for Real Financial Markets
The exceptional potential of large language models (LLMs) in handling text information has garnered significant attention in the field of financial trading. However, most existing trading agents operate under intraday, independent unit-based trading tasks, where decisions are made as isolated directional actions, and thus lack awareness of continuous position management. Therefore, we propose a position-aware trading task designed to simulate a more realistic market. To address this task, we propose FinPos, a position-aware trading agent system designed to explicitly model and manage continuous positions. FinPos enhances position awareness through three key mechanisms: (1) professional-level interpretation of heterogeneous market information; (2) a dual-agent decision structure that separates directional reasoning from risk-aware position adjustment; and (3) multi-timescale reward signals, allowing the agent to internalize position awareness through experiential feedback rather than static instructions alone. Extensive experiments demonstrate that FinPos surpasses state-of-the-art trading agents in the position-aware trading task, which closely mirrors real market conditions. More importantly, our findings reveal that LLM-centered agent systems exhibit a vast, largely unexplored potential in long-term market decision-making.
comment: LLM Applications, LLM Agents, Financial Technology
Enabling Agents to Communicate Entirely in Latent Space
While natural language is the de facto communication medium for LLM-based agents, it presents a fundamental constraint. The process of downsampling rich, internal latent states into discrete tokens inherently limits the depth and nuance of information that can be transmitted, thereby hindering collaborative problem-solving. Inspired by telepathy, which bypasses symbolic language in communication, we propose Interlat (Inter-agent Latent Space Communication), a paradigm that leverages the continuous last hidden states of an LLM as a representation of its thought for direct communication (termed latent communication). An additional learned compression process further compresses latent communication via latent space reasoning. Experiments demonstrate that Interlat outperforms both fine-tuned chain-of-thought (CoT) prompting and single-agent baselines, even across heterogeneous models, promoting more exploratory behavior and enabling genuine utilization of latent information. Further compression not only substantially accelerates inference by up to 24 times but also maintains competitive performance through an efficient information-preserving mechanism. We position this work as a feasibility study of entirely latent space inter-agent communication, and our results highlight its potential, offering valuable insights for future research.
comment: Work in progess
Computational Foundations for Strategic Coopetition: Formalizing Trust and Reputation Dynamics
Modern socio-technical systems increasingly involve multi-stakeholder environments where actors simultaneously cooperate and compete. These coopetitive relationships exhibit dynamic trust evolution based on observed behavior over repeated interactions. While conceptual modeling languages like i* represent trust relationships qualitatively, they lack computational mechanisms for analyzing how trust changes with behavioral evidence. Conversely, computational trust models from multi-agent systems provide algorithmic updating but lack grounding in conceptual models that capture strategic dependencies covering mixed motives of actors. This technical report bridges this gap by developing a computational trust model that extends game-theoretic foundations for strategic coopetition with dynamic trust evolution. Building on companion work that achieved 58/60 validation (96.7%) for logarithmic specifications, we introduce trust as a two-layer system with immediate trust responding to current behavior and reputation tracking violation history. Trust evolves through asymmetric updating where cooperation builds trust gradually while violations erode it sharply, creating hysteresis effects and trust ceilings that constrain relationship recovery. We develop a structured translation framework enabling practitioners to instantiate computational trust models from i* dependency networks encompassing mixed motives of actors. Comprehensive experimental validation across 78,125 parameter configurations establishes robust emergence of negativity bias, hysteresis effects, and cumulative damage amplification. Empirical validation using the Renault-Nissan Alliance case study (1999-2025) achieves 49/60 validation points (81.7%), successfully reproducing documented trust evolution across five distinct relationship phases including crisis and recovery periods.
comment: 57 pages, 20 figures. Second technical report in research program; should be read with foundational companion arXiv:2510.18802. Adapts and extends trustworthiness and reputation material from Pant (2021) doctoral dissertation, University of Toronto. Validation source code: https://github.com/vikpant/strategic-coopetition/tree/master/TR_validation/TR2_trust
When Identity Skews Debate: Anonymization for Bias-Reduced Multi-Agent Reasoning
Multi-agent debate (MAD) aims to improve large language model (LLM) reasoning by letting multiple agents exchange answers and then aggregate their opinions. Yet recent studies reveal that agents are not neutral: they are prone to identity-driven sycophancy and self-bias, uncritically adopting a peer's view or stubbornly adhering to their own prior output, undermining the reliability of debate. In this work, we present the first principled framework that joins sycophancy and self-bias to mitigate and quantify identity bias in MAD. First, we formalize the debate dynamics as an identity-weighted Bayesian update process. Second, we propose response anonymization: by removing identity markers from prompts, agents cannot distinguish "self" from "peer", which forces equal weights on agent identity, thereby reducing bias and improving trustworthiness. Third, we define the Identity Bias Coefficient (IBC), a principled bias metric that measures an agent's tendency to follow its peer versus itself. Empirical studies across multiple models and benchmarks confirm that identity bias is widespread, with sycophancy far more common than self-bias. Our findings highlight the need to ensure that MAD systems reason based on content rather than identity. Code is released in https://github.com/deeplearning-wisc/MAD-identity-bias.
Noncooperative Consensus via a Trading-based Auction
Noncooperative multi-agent systems often face coordination challenges due to conflicting preferences among agents. In particular, when agents act in their own self-interest, they may prefer different choices among multiple feasible outcomes, leading to suboptimal outcomes or even safety concerns. We propose an algorithm named trading auction for consensus (TACo), a decentralized approach that enables noncooperative agents to reach consensus without communicating directly or disclosing private valuations. TACo facilitates coordination through a structured trading-based auction, where agents iteratively select choices of interest and provably reach an agreement within an a priori bounded number of steps. A series of numerical experiments validate that the termination guarantees of TACo hold in practice, and show that TACo achieves a median performance that minimizes the total cost across all agents, while allocating resources significantly more fairly than baseline approaches.
Agent+P: Guiding UI Agents via Symbolic Planning
Large Language Model (LLM)-based UI agents show great promise for UI automation but often hallucinate in long-horizon tasks due to their lack of understanding of the global UI transition structure. To address this, we introduce AGENT+P, a novel framework that leverages symbolic planning to guide LLM-based UI agents. Specifically, we model an app's UI transition structure as a UI Transition Graph (UTG), which allows us to reformulate the UI automation task as a pathfinding problem on the UTG. This further enables an off-the-shelf symbolic planner to generate a provably correct and optimal high-level plan, preventing the agent from redundant exploration and guiding the agent to achieve the automation goals. AGENT+P is designed as a plug-and-play framework to enhance existing UI agents. Evaluation on the AndroidWorld benchmark demonstrates that AGENT+P improves the success rates of state-of-the-art UI agents by up to 14.31% and reduces the action steps by 37.70%.
Systems and Control (EESS)
Solar Panel-based Visible Light Communication for Batteryless Systems
This paper presents a batteryless wireless communication node for the Internet of Things, powered entirely by ambient light and capable of receiving data through visible light communication. A solar panel serves dual functions as an energy harvester and an optical antenna, capturing modulated signals from LED light sources. A lightweight analog front-end filters and digitizes the signals for an 8-bit low-power processor, which manages the system's operational states based on stored energy levels. The main processor is selectively activated to minimize energy consumption. Data reception is synchronized with the harvester's open-circuit phase, reducing interference and improving signal quality. The prototype reliably decodes 32-bit VLC frames at 800\,Herz, consuming less than 2.8\,mJ, and maintains sleep-mode power below 30\,uW.
comment: This is an open-access, author-archived version of a manuscript published in ApplePies 2025 Conference
Hierarchical GNN-Based Multi-Agent Learning for Dynamic Queue-Jump Lane and Emergency Vehicle Corridor Formation
Emergency vehicles require rapid passage through congested traffic, yet existing strategies fail to adapt to dynamic conditions. We propose a novel hierarchical graph neural network (GNN)-based multi-agent reinforcement learning framework to coordinate connected vehicles for emergency corridor formation. Our approach uses a high-level planner for global strategy and low-level controllers for trajectory execution, utilizing graph attention networks to scale with variable agent counts. Trained via Multi-Agent Proximal Policy Optimization (MAPPO), the system reduces emergency vehicle travel time by 28.3% compared to baselines and 44.6% compared to uncoordinated traffic in simulations. The design achieves near-zero collision rates (0.3%) while maintaining 81% of background traffic efficiency. Ablation and generalization studies confirm the framework's robustness across diverse scenarios. These results demonstrate the effectiveness of combining GNNs with hierarchical learning for intelligent transportation systems.
comment: 16 Pages, 5 Figures, 9 Tables, submitted to IEEE TITS
A Load Impedance Emulation Active Interface for Piezoelectric Vibration Energy Harvesters
A single stage active AC/DC interface able to emulate the optimal load impedance of a Resonant Piezoelectric Vibration Energy Harvester (RPVEH) is proposed. As theoretically shown, unlike an electronic interface that emulates an optimal load generator, an interface that emulates an optimal load impedance does not require adaptation to the acceleration of input vibrations. This allows the use of a very simple control, avoiding the implementation of Maximum Power Point Tracking (MPPT) algorithms that require lossy microcontrollers. Thus, the proposed interface is equipped with a simple analog controller allowing the RPVEH to work in its Maximum Power Point (MPP) in both steady-state and variable conditions of vibrations, without recurring to multivariable perturbative approaches, as it happens for the most of single stage AC/DC interfaces proposed in the literature. The absence of perturbative techniques allows a significant improvement of both stationary and dynamic performances. Experimental tests of a prototype of the proposed interface confirm the theoretical findings and the predicted behavior.
Stigmergic optimal transport
Efficient navigation in swarms often relies on the emergence of decentralized approaches that minimize traversal time or energy. Stigmergy, where agents modify a shared environment that then modifies their behavior, is a classic mechanism that can encode this strategy. We develop a theoretical framework for stigmergic transport by casting it as a stochastic optimal control problem: agents (collectively) lay and (individually) follow trails while minimizing expected traversal time. Simulations and analysis reveal two emergent behaviors: path straightening in homogeneous environments and path refraction at material interfaces, both consistent with experimental observations of insect trails. While reminiscent of Fermat's principle, our results show how local, noisy agent+field interactions can give rise to geodesic trajectories in heterogeneous environments, without centralized coordination or global knowledge, relying instead on an embodied slow fast dynamical mechanism.
On-Device Deep Reinforcement Learning for Decentralized Task Offloading Performance trade-offs in the training process
Allowing less capable devices to offload computational tasks to more powerful devices or servers enables the development of new applications that may not run correctly on the device itself. Deciding where and why to run each of those applications is a complex task. Therefore, different approaches have been adopted to make offloading decisions. In this work, we propose a decentralized Deep Reinforcement Learning (DRL) agent to address the selection of computing locations. Unlike most existing work, we analyze it in a real testbed composed of various edge devices running the agent to determine where to execute each task. These devices are connected to a Multi-Access Edge Computing (MEC) server and a Cloud server through 5G communications. We evaluate not only the agent's performance in meeting task requirements but also the implications of running this type of agent locally, assessing the trade-offs of training locally versus remotely in terms of latency and energy consumption.
comment: Submitted to IEEE Transactions on Cognitive Communications and Networking
Posterior error bounds for prior-driven balancing in linear Gaussian inverse problems
In large-scale Bayesian inverse problems, it is often necessary to apply approximate forward models to reduce the cost of forward model evaluations, while controlling approximation quality. In the context of Bayesian inverse problems with linear forward models, Gaussian priors, and Gaussian noise, we use perturbation theory for inverses to bound the error in the approximate posterior mean and posterior covariance resulting from a linear approximate forward model. We then focus on the smoothing problem of inferring the initial condition of linear time-invariant dynamical systems, using finitely many partial state observations. For such problems, and for a specific model order reduction method based on balanced truncation, we show that the impulse response of a certain prior-driven system is closely related to the prior-preconditioned Hessian of the inverse problem. This reveals a novel connection between systems theory and inverse problems. We exploit this connection to prove the first a priori error bounds for system-theoretic model order reduction methods applied to smoothing problems. The bounds control the approximation error of the posterior mean and covariance in terms of the truncated Hankel singular values of the underlying system.
Exact Continuous Reformulations of Logic Constraints in Nonlinear Optimization and Optimal Control Problems
Many nonlinear optimal control and optimization problems involve constraints that combine continuous dynamics with discrete logic conditions. Standard approaches typically rely on mixed-integer programming, which introduces scalability challenges and requires specialized solvers. This paper presents an exact reformulation of broad classes of logical constraints as binary-variable-free expressions whose differentiability properties coincide with those of the underlying predicates, enabling their direct integration into nonlinear programming models. Our approach rewrites arbitrary logical propositions into conjunctive normal form, converts them into equivalent max--min constraints, and applies a smoothing procedure that preserves the exact feasible set. The method is evaluated on two benchmark problems, a quadrotor trajectory optimization with obstacle avoidance and a hybrid two-tank system with temporal logic constraints, and is shown to obtain optimal solutions more consistently and efficiently than existing binary variable elimination techniques.
comment: 8 pages, 11 figures, submitted for publication to Automatica
Smooth Sampling-Based Model Predictive Control Using Deterministic Samples
Sampling-based model predictive control (MPC) is effective for nonlinear systems but often produces non-smooth control inputs due to random sampling. To address this issue, we extend the model predictive path integral (MPPI) framework with deterministic sampling and improvements from cross-entropy method (CEM)--MPC, such as iterative optimization, proposing deterministic sampling MPPI (dsMPPI). This combination leverages the exponential weighting of MPPI alongside the efficiency of deterministic samples. Experiments demonstrate that dsMPPI achieves smoother trajectories compared to state-of-the-art methods.
A Systems-Engineered ESP32 DAQ Architecture and FAIR Data Workflow for Small-Scale Wind Turbine Performance Measurement in Tropical Environments
Small-scale wind turbine research in resource-constrained academic settings frequently produces unreliable or unpublishable datasets due to ad-hoc instrumentation, inadequate time synchronization, storage failures, and weak data governance. This paper presents a systematic data acquisition (DAQ) methodology and ESP32-based reference implementation design for field characterization of small wind turbines (100~W--5~kW), emphasizing tropical/coastal deployment constraints typical of Low- and Middle-Income Countries (LMIC). We integrate (i)~a student-adapted V-model with requirements traceability, (ii)~hardware selection strategies for high-humidity and salt-spray environments, (iii)~an embedded firmware architecture featuring interrupt-driven rotor speed measurement, state-machine fault handling, and NTP-based time synchronization, (iv)~a local-first hybrid storage design combining SD-card persistence with optional MQTT cloud telemetry, and (v)~a data-management workflow adapting CRISP-DM and FAIR principles with explicit quality dimensions and publication templates. A detailed helical vertical-axis wind turbine (VAWT) design scenario for coastal Sri Lanka illustrates the complete methodology, targeting $>90\%$ data completeness over six-month campaigns. The methodology is accompanied by open-source firmware, hardware templates, and data-publication workflow artifacts released via GitHub and Zenodo.
Unified and Efficient Analysis of Machining Chatter and Surface Location Error
Although machining chatter can be suppressed by the choice of stable cutting parameters through means of stability lobe diagram (SLD), surface roughness still remains due to the forced vibration, which limits surface quality, especially in the surface finish. Better cutting parameters can be achieved considering surface location error (SLE) together with SLD. This paper proposes an innovative modeling framework of the machining dynamic system that enables efficient computation of the chatter stability and SLE. The framework mainly embodies two techniques, namely semi-discretization method (SDM) and lifting method. The machining dynamics system is mathematically expressed as an angle-varying delay differential equation (DDE). The SDM approximates the angle-varying and delayed terms to ordinary terms using zero-phase interpolations and governs the discrete angle-varying dynamics system. Then, the system is merged over the tooth passing angle using the lifted approach to establish an explicit dynamic system in the compact state-space form. Based on the compact state-space model, the chatter stability and SLE prediction are easily and efficiently conducted. Simulation results show the improved efficiency of the proposed method over other well-known methods.
comment: 13 pages, 9 figures, and 1 table
Multi-agent Optimization of Non-cooperative Multimodal Mobility Systems
While multimodal mobility systems have the potential to bring many benefits to travelers, drivers, the environment, and traffic congestion, such systems typically involve multiple non-cooperative decision-makers who may selfishly optimize their own objectives without considering the overall system benefits. This paper aims to investigate market-based interactions of travelers and ride-sourcing drivers in the context of multimodal mobility systems. We propose a unified mathematical modeling framework to capture the decentralized travelers and drivers' decision-making process and balance the network's demand and supply by equilibrium pricing. Such a model allows analyses of the impact of decentralized decision-making on multimodal mobility efficiencies. The proposed formulation can be further convexified to efficiently compute the equilibrium ride-sourcing prices. We conduct numerical experiments on different settings of transportation networks to gain policy insights. We find that travelers prefer ride-sourcing and multimodal transportation more than the driving option when they are more sensitive to prices. We also find that travelers may need to be subsidized to use multimodal transportation when there is fewer transit hubs in the network or, ride-sourcing drivers become too sensitive to the prices. However, we find that more transit hubs in the network increases the total empty VMT of ride-sourcing drivers by increasing the total relocation time. The proposed model can be used by policymakers and platform operators to design pricing and subsidy schemes that align individual decision-making with system-level efficiency and evaluate the trade-offs between accessibility and environmental impacts in multimodal transportation networks.
Output Consensus on Periodic References for Constrained Multi-agent Systems Under a Switching Network
This work addresses the output consensus problem of constrained heterogeneous multi-agent systems under a switching network with potential communication delay, where outputs are periodic and characterized by a linear exosystem. Since periodic references have more complex dynamics, it is more challenging to track periodic references and achieve consensus on them. In this paper, a model predictive control method incorporating an artificial reference and a modified cost is proposed to track periodic references, which maintains recursive feasibility even when reference switches. Moreover, consensus protocols are proposed to achieve consensus on periodic references in different scenarios, in which global information such as the set of globally admissible references and the global time index are not involved. Theoretical analysis proves that constrained output consensus is asymptotically achieved with the proposed algorithm as the references of each agent converge and agents track their references while maintaining constraint satisfaction. Finally, numerical examples are provided to verify the effectiveness of the proposed algorithm.
Derivation of the Thermal Conductivity in a Latent Thermal Energy Storage Unit for Use in Simplified System Models
Latent Thermal Energy Storages (LTES) can store thermal energy in a narrow temperature range. Therefore, they are favorable for integration into Rankine-based Carnot Batteries. For the design of such systems, simulations based on accurate models are desirable. However, physical phenomena such as natural convection in LTES units cannot be modeled directly in transient system models. Simplified models are required. Therefore, the objective of this work is to derive simplified LTES unit models for use in system models. In transient simulations the state of charge of the LTES influences its temperature profile. The temperature profile depends on the geometry of the LTES unit. Therefore, the geometry must be considered to model the transient behavior of an LTES unit. The LTES unit under investigation has a shell and tube heat exchanger structure. The phase change material (PCM) is located between the hexagonal fins and in the space between the finned tubes. Aluminum fins are used. They have a high thermal conductivity and thus compensate for the low thermal conductivity of the sodium nitrate used as PCM. The interaction between fins and PCM is complex. Therefore, a numerical approach can be used to gain insight into the behavior of the LTES unit. To transfer the results of a complex model to a simplified model where fins and PCM are not considered individually, the effective thermal conductivity of a single finned tube can be used to approximate the performance of the LTES unit. In this study, a model of a section with a single finned tube is developed using the COMSOL software. The effective thermal conductivity of the system is determined by varying the effective thermal conductivity in a simplified model and comparing the results with reference cases based on a complex modeling approach. The results can serve as model input for simplified system models of Carnot Batteries, among others.
Accounting for Optimal Control in the Sizing of Isolated Hybrid Renewable Energy Systems Using Imitation Learning
Decarbonization of isolated or off-grid energy systems through phase-in of large shares of intermittent solar or wind generation requires co-installation of energy storage or continued use of existing fossil dispatchable power sources to balance supply and demand. The effective CO2 emission reduction depends on the relative capacity of the energy storage and renewable sources, the stochasticity of the renewable generation, and the optimal control or dispatch of the isolated energy system. While the operations of the energy storage and dispatchable sources may impact the optimal sizing of the system, it is challenging to account for the effect of finite horizon, optimal control at the stage of system sizing. Here, we present a flexible and computationally efficient sizing framework for energy storage and renewable capacity in isolated energy systems, accounting for uncertainty in the renewable generation and the optimal feedback control. To this end, we implement an imitation learning approach to stochastic neural model predictive control (MPC) which allows us to relate the battery storage and wind peak capacities to the emissions reduction and investment costs while accounting for finite horizon, optimal control. Through this approach, decision makers can evaluate the effective emission reduction and costs of different storage and wind capacities at any price point while accounting for uncertainty in the renewable generation with limited foresight. We evaluate the proposed sizing framework on a case study of an offshore energy system with a gas turbine, a wind farm and a battery energy storage system (BESS). In this case, we find a nonlinear, nontrivial relationship between the investment costs and reduction in gas usage relative to the wind and BESS capacities, emphasizing the complexity and importance of accounting for optimal control in the design of isolated energy systems.
comment: 11 pages, 9 figures
DSP-Based Sub-Switching-Period Current-Limiting Control for Grid-Tied Inverter under Grid Faults
This paper presents a sub-switching period current-limiting control for a grid-tied inverter to prevent transient overcurrents during grid faults and enable seamless fault ride-through (FRT). Sudden grid-voltage disturbances, such as voltage sags or phase jumps, can induce large transient currents within a switching period, particularly at low switching frequencies. Upon disturbance detection, the proposed method immediately modifies the pulse-width modulation carrier, enabling continuous regulation of the inverter output current within a time much shorter than a switching period without interrupting current flow. The proposed method can be implemented on commonly used digital signal processors without requiring specialized analog or digital circuits or high-speed computing devices. Experimental results from a 2-level, 3-phase inverter switching at 3.6 kHz validate the effectiveness of the proposed method under symmetric and asymmetric voltage sags and phase jumps.
comment: 8 pages
Spider web-inspired sensing and computation with fiber network physical reservoirs
Physical reservoir computing leverages the intrinsic dynamics of mechanical systems to perform computation through their natural responses to input signals. Here, we study a compliant fiber network inspired by orb-weaving spider webs and investigate how its mechanical design and operating conditions shape its computational capability. Using Cosserat rod-based simulations, we identify how network topology, geometry, actuation, and axial tension impact the nonlinear computation and memory capacity of the network. We further evaluate several readout reduction strategies to assess how computational performance varies with the number and placement of measured outputs. We then experimentally validate these results using a physical fiber-network prototype. Overall, results provide insights and guidance on design, actuation, and sensing choices to enable fiber networks for mechano-intelligent computation. They demonstrate the ability of structured compliant fibers networks to serve as physical reservoirs capable of nonlinear transformation and input-history retention.
comment: 13 pages, 6 figures
Cyberattack Detection in Virtualized Microgrids Using LightGBM and Knowledge-Distilled Classifiers
Modern microgrids depend on distributed sensing and communication interfaces, making them increasingly vulnerable to cyber physical disturbances that threaten operational continuity and equipment safety. In this work, a complete virtual microgrid was designed and implemented in MATLAB/Simulink, integrating heterogeneous renewable sources and secondary controller layers. A structured cyberattack framework was developed using MGLib to inject adversarial signals directly into the secondary control pathways. Multiple attack classes were emulated, including ramp, sinusoidal, additive, coordinated stealth, and denial of service behaviors. The virtual environment was used to generate labeled datasets under both normal and attack conditions. The datasets trained Light Gradient Boosting Machine (LightGBM) models to perform two functions: detecting the presence of an intrusion (binary) and distinguishing among attack types (multiclass). The multiclass model attained 99.72% accuracy and a 99.62% F1 score, while the binary model attained 94.8% accuracy and a 94.3% F1 score. A knowledge-distillation step reduced the size of the multiclass model, allowing faster predictions with only a small drop in performance. Real-time tests showed a processing delay of about 54 to 67 ms per 1000 samples, demonstrating suitability for CPU-based edge deployment in microgrid controllers. The results confirm that lightweight machine learning based intrusion detection methods can provide fast, accurate, and efficient cyberattack detection without relying on complex deep learning models. Key contributions include: (1) development of a complete MATLAB-based virtual microgrid, (2) structured attack injection at the control layer, (3) creation of multiclass labeled datasets, and (4) design of low-cost AI models suitable for practical microgrid cybersecurity.
comment: 12 pages
Adaptive Model-Based Reinforcement Learning for Orbit Feedback Control in NSLS-II Storage Ring
The National Synchrotron Light Source II (NSLS-II) uses highly stable electron beam to produce high-quality X-ray beams with high brightness and low-emittance synchrotron radiation. The traditional algorithm to stabilize the beam applies singular value decomposition (SVD) on the orbit response matrix to remove noise and extract actions. Supervised learning has been studied on NSLS-II storage ring stabilization and other accelerator facilities recently. Several problems, for example, machine status drifting, environment noise, and non-linear accelerator dynamics, remain unresolved in the SVD-based and supervised learning algorithms. To address these problems, we propose an adaptive training framework based on model-based reinforcement learning. This framework consists of two types of optimizations: trajectory optimization attempts to minimize the expected total reward in a differentiable environment, and online model optimization learns non-linear machine dynamics through the agent-environment interaction. Through online training, this framework tracks the internal status drifting in the electron beam ring. Simulation and real in-facility experiments on NSLS-II reveal that our method stabilizes the beam position and minimizes the alignment error, defined as the root mean square (RMS) error between adjusted beam positions and the reference position, down to ~1$μ$m.
comment: Accepted by the 20th International Conference on Accelerator and Large Experimental Physics Control Systems (ICALEPCS 2025)
Online Decision-Making Under Uncertainty for Vehicle-to-Building Systems
Vehicle-to-building (V2B) systems integrate physical infrastructures, such as smart buildings and electric vehicles (EVs) connected to chargers at the building, with digital control mechanisms to manage energy use. By utilizing EVs as flexible energy reservoirs, buildings can dynamically charge and discharge them to optimize energy use and cut costs under time-variable pricing and demand charge policies. This setup leads to the V2B optimization problem, where buildings coordinate EV charging and discharging to minimize total electricity costs while meeting users' charging requirements. However, the V2B optimization problem is challenging because of: (1) fluctuating electricity pricing, which includes both energy charges ($/kWh) and demand charges ($/kW); (2) long planning horizons (typically over 30 days); (3) heterogeneous chargers with varying charging rates, controllability, and directionality (i.e., unidirectional or bidirectional); and (4) user-specific battery levels at departure to ensure user requirements are met. In contrast to existing approaches that often model this setting as a single-shot combinatorial optimization problem, we highlight critical limitations in prior work and instead model the V2B optimization problem as a Markov decision process (MDP), i.e., a stochastic control process. Solving the resulting MDP is challenging due to the large state and action spaces. To address the challenges of the large state space, we leverage online search, and we counter the action space by using domain-specific heuristics to prune unpromising actions. We validate our approach in collaboration with Nissan Advanced Technology Center - Silicon Valley. Using data from their EV testbed, we show that the proposed framework significantly outperforms state-of-the-art methods.
comment: 17 pages, 2 figures, 10 tables. Published in the Proceedings of the 16th ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS '25), May 06--09, 2025, Irvine, CA, USA
Enhanced-FQL($λ$), an Efficient and Interpretable RL with novel Fuzzy Eligibility Traces and Segmented Experience Replay
This paper introduces a fuzzy reinforcement learning framework, Enhanced-FQL($λ$), that integrates novel Fuzzified Eligibility Traces (FET) and Segmented Experience Replay (SER) into fuzzy Q-learning with Fuzzified Bellman Equation (FBE) for continuous control tasks. The proposed approach employs an interpretable fuzzy rule base instead of complex neural architectures, while maintaining competitive performance through two key innovations: a fuzzified Bellman equation with eligibility traces for stable multi-step credit assignment, and a memory-efficient segment-based experience replay mechanism for enhanced sample efficiency. Theoretical analysis proves the proposed method convergence under standard assumptions. Extensive evaluations in continuous control domains demonstrate that Enhanced-FQL($λ$) achieves superior sample efficiency and reduced variance compared to n-step fuzzy TD and fuzzy SARSA($λ$) baselines, while maintaining substantially lower computational complexity than deep RL alternatives such as DDPG. The framework's inherent interpretability, combined with its computational efficiency and theoretical convergence guarantees, makes it particularly suitable for safety-critical applications where transparency and resource constraints are essential.
comment: Submitted to ECC26 conference
Ultra-sensitive graphene-based electro-optic sensors for optically-multiplexed neural recording
Large-scale neural recording with high spatio-temporal resolution is essential for understanding information processing in brain, yet current neural interfaces fall far short of comprehensively capturing brain activity due to extremely high neuronal density and limited scalability. Although recent advances have miniaturized neural probes and increased channel density, fundamental design constraints still prevent dramatic scaling of simultaneously recorded channels. To address this limitation, we introduce a novel electro-optic sensor that directly converts ultra-low-amplitude neural electrical signals into optical signals with high signal-to-noise ratio. By leveraging the ultra-high bandwidth and intrinsic multiplexing capability of light, this approach offers a scalable path toward massively parallel neural recording beyond the limits of traditional electrical interfaces. The sensor integrates an on-chip photonic microresonator with a graphene layer, enabling direct detection of neural signals without genetically encoded optical indicators or tissue modification, making it suitable for human translation. Neural signals are locally transduced into amplified optical modulations and transmitted through on-chip waveguides, enabling interference-free recording without bulky electromagnetic shielding. Arrays of wavelength-selective sensors can be multiplexed on a single bus waveguide using wavelength-division multiplexing (WDM), greatly improving scalability while maintaining a minimal footprint to reduce tissue damage. We demonstrate detection of evoked neural signals as small as 25 $μ$V with 3 dB SNR from mouse brain tissue and show multiplexed recording from 10 sensors on a single waveguide. These results establish a proof-of-concept for optically multiplexed neural recording and point toward scalable, high-density neural interfaces for neurological research and clinical applications.
Latent Representations for Control Design with Provable Stability and Safety Guarantees
We initiate a formal study on the use of low-dimensional latent representations of dynamical systems for verifiable control synthesis. Our main goal is to enable the application of verification techniques -- such as Lyapunov or barrier functions -- that might otherwise be computationally prohibitive when applied directly to the full state representation. Towards this goal, we first provide dynamics-aware approximate conjugacy conditions which formalize the notion of reconstruction error necessary for systems analysis. We then utilize our conjugacy conditions to transfer the stability and invariance guarantees of a latent certificate function (e.g., a Lyapunov or barrier function) for a latent space controller back to the original system. Importantly, our analysis contains several important implications for learning latent spaces and dynamics, by highlighting the necessary geometric properties which need to be preserved by the latent space, in addition to providing concrete loss functions for dynamics reconstruction that are directly related to control design. We conclude by demonstrating the applicability of our theory to two case studies: (1) stabilization of a cartpole system, and (2) collision avoidance for a two vehicle system.
comment: 14 pages, 3 figures. Presented at CDC 2025. Expanded version
Spatial exponential decay of perturbations in optimal control of general evolution equations
We analyze the robustness of optimally controlled evolution equations with respect to spatially localized perturbations. We prove that if the involved operators are domain-uniformly stabilizable and detectable, then these localized perturbations only have a local effect on the optimal solution. We characterize this domain-uniform stabilizability and detectability for the transport equation with constant transport velocity, showing that even for unitary semigroups, optimality implies exponential damping. We extend this result to the case of a space-dependent transport velocity. Finally we leverage the results for the transport equation to characterize domain-uniform stabilizability of the wave equation. Numerical examples in one space dimension complement the theoretical results.
comment: 53 pages, 5 figures
Real-time Velocity Profile Optimization for Time-Optimal Maneuvering with Generic Acceleration Constraints
The computation of time-optimal velocity profiles along prescribed paths, subject to generic acceleration constraints, is a crucial problem in robot trajectory planning, with particular relevance to autonomous racing. However, the existing methods either support arbitrary acceleration constraints at high computational cost or use conservative box constraints for computational efficiency. We propose FBGA, a new \underline{F}orward-\underline{B}ackward algorithm with \underline{G}eneric \underline{A}cceleration constraints, which achieves both high accuracy and low computation time. FBGA operates forward and backward passes to maximize the velocity profile in short, discretized path segments, while satisfying user-defined performance limits. Tested on five racetracks and two vehicle classes, FBGA handles complex, non-convex acceleration constraints with custom formulations. Its maneuvers and lap times closely match optimal control baselines (within $0.11\%$-$0.36\%$), while being up to three orders of magnitude faster. FBGA maintains high accuracy even with coarse discretization, making it well-suited for online multi-query trajectory planning. Our open-source \texttt{C++} implementation is available at: https://anonymous.4open.science/r/FB_public_RAL.
Economic zone data-enabled predictive control for connected open water systems
The real-time operation of open water systems is essential for ensuring operational safety, satisfying operational requirements, and optimizing energy usage. However, existing rule-based control strategies rely heavily on human experience, while model-based approaches depend on accurate hydrodynamic models, which limit their applicability to water systems with complex dynamics and uncertain disturbances. In this work, we develop a fully data-driven, zone-based control framework with adaptive control target zone selection for safe and energy-efficient operation of connected open water systems. Specifically, we propose a mixed-integer economic zone data-enabled predictive control (DeePC) approach that aims to maintain the water levels of the branches within the desired water-level zone while reducing real-time operational energy consumption. The DeePC-based approach enables direct use of input-output data for predictive control, eliminating the need for explicit dynamic modeling. To handle multiple control objectives with different priorities, we employ lexicographic optimization and reformulate the traditional DeePC cost function to incorporate zone tracking and energy consumption minimization objectives. Additionally, Bayesian optimization is utilized to determine the control target zone, which enables an effective trade-off between zone tracking and energy consumption in the presence of external disturbances. Comprehensive simulations and comparative analyses demonstrate the effectiveness of the proposed method. The proposed method maintains water levels within the desired water-level zone for 97.04% of the operating time, with an average energy consumption of 33.5 kWh per 0.5 hour. Compared to rule-based control method, the proposed method lowers zone-violation frequency by 74.96% and the average energy consumption by 22.44%.
On Game based Distributed Approach for General Multi-agent Optimal Coverage with Application to UAV Networks
This paper focuses on the optimal coverage problem (OCP) for multi-agent systems with a decentralized optimization mechanism. A game based distributed decision-making method for the multi-agent OCP is proposed to address the high computational costs arising from the large scale of the multi-agent system and to ensure that the game's equilibrium achieves the global performance objective's maximum value. In particular, a distributed algorithm that needs only local information is developed and proved to converge to near-optimal global coverage. Finally, the proposed method is applied to maximize the coverage area of the UAV network for a target region. The simulation results show that our method can require much less computational time than other typical distributed algorithms in related work, while achieving a faster convergence rate. Comparison with centralized optimization also demonstrates that the proposed method has approximate optimization results and high computation efficiency.
comment: 11 pages,11 figures
Physics-Driven Data Generation for Contact-Rich Manipulation via Trajectory Optimization
We present a low-cost data generation pipeline that integrates physics-based simulation, human demonstrations, and model-based planning to efficiently generate large-scale, high-quality datasets for contact-rich robotic manipulation tasks. Starting with a small number of embodiment-flexible human demonstrations collected in a virtual reality simulation environment, the pipeline refines these demonstrations using optimization-based kinematic retargeting and trajectory optimization to adapt them across various robot embodiments and physical parameters. This process yields a diverse, physically consistent dataset that enables cross-embodiment data transfer, and offers the potential to reuse legacy datasets collected under different hardware configurations or physical parameters. We validate the pipeline's effectiveness by training diffusion policies from the generated datasets for challenging contact-rich manipulation tasks across multiple robot embodiments, including a floating Allegro hand and bimanual robot arms. The trained policies are deployed zero-shot on hardware for bimanual iiwa arms, achieving high success rates with minimal human input. Project website: https://lujieyang.github.io/physicsgen/.
From inconsistency to decision: explainable operation and maintenance of battery energy storage systems
Battery Energy Storage Systems (BESSs) are increasingly critical to power-system stability, yet their operation and maintenance remain dominated by reactive, expert-dependent diagnostics. While cell-level inconsistencies provide early warning signals of degradation and safety risks, the lack of scalable and interpretable decision-support frameworks prevents these signals from being effectively translated into operational actions. Here we introduce an inconsistency-driven operation and maintenance paradigm for large-scale BESSs that systematically transforms routine monitoring data into explainable, decision-oriented guidance. The proposed framework integrates multi-dimensional inconsistency evaluation with large language model-based semantic reasoning to bridge the gap between quantitative diagnostics and practical maintenance decisions. Using eight months of field data from an in-service battery system comprising 3,564 cells, we demonstrate how electrical, thermal, and aging-related inconsistencies can be distilled into structured operational records and converted into actionable maintenance insights through a multi-agent framework. The proposed approach enables accurate and explainable responses to real-world operation and maintenance queries, reducing response time and operational cost by over 80% compared with conventional expert-driven practices. These results establish a scalable pathway for intelligent operation and maintenance of battery energy storage systems, with direct implications for reliability, safety, and cost-effective integration of energy storage into modern power systems.
comment: 13 pages, 5 figures
Nonholonomic Robot Parking by Feedback -- Part I: Modular Strict CLF Designs
It has been known in the robotics literature since about 1995 that, in polar coordinates, the nonholonomic unicycle is asymptotically stabilizable by smooth feedback, even globally. We introduce a modular design framework that selects the forward velocity to decouple the radial coordinate, allowing the steering subsystem to be stabilized independently. Within this structure, we develop families of feedback laws using passivity, backstepping, and integrator forwarding. Each law is accompanied by a strict control Lyapunov function, including barrier variants that enforce angular constraints. These strict CLFs provide constructive class KL convergence estimates and enable eigenvalue assignment at the target equilibrium. The framework generalizes and extends prior modular and nonmodular approaches, while preparing the ground for inverse optimal and adaptive redesigns in the sequel paper.
comment: arXiv admin note: text overlap with arXiv:2509.25575
Improving Bayesian Optimization for Portfolio Management with an Adaptive Scheduling
Existing black-box portfolio management systems are prevalent in the financial industry due to commercial and safety constraints, though their performance can fluctuate dramatically with changing market regimes. Evaluating these non-transparent systems is computationally expensive, as fixed budgets limit the number of possible observations. Therefore, achieving stable and sample-efficient optimization for these systems has become a critical challenge. This work presents a novel Bayesian optimization framework (TPE-AS) that improves search stability and efficiency for black-box portfolio models under these limited observation budgets. Standard Bayesian optimization, which solely maximizes expected return, can yield erratic search trajectories and misalign the surrogate model with the true objective, thereby wasting the limited evaluation budget. To mitigate these issues, we propose a weighted Lagrangian estimator that leverages an adaptive schedule and importance sampling. This estimator dynamically balances exploration and exploitation by incorporating both the maximization of model performance and the minimization of the variance of model observations. It guides the search from broad, performance-seeking exploration towards stable and desirable regions as the optimization progresses. Extensive experiments and ablation studies, which establish our proposed method as the primary approach and other configurations as baselines, demonstrate its effectiveness across four backtest settings with three distinct black-box portfolio management models.
comment: 5 pages, 2 figures; version of record. ICAAI 2025, 9th International Conference on Advances in Artificial Intelligence (ICAAI 2025), November 14-16, 2025, Manchester, United Kingdom. ACM, New York, NY, USA, 5 pages
Robotics
Nonlinear Spectral Modeling and Control of Soft-Robotic Muscles from Data
Artificial muscles are essential for compliant musculoskeletal robotics but complicate control due to nonlinear multiphysics dynamics. Hydraulically amplified electrostatic (HASEL) actuators, a class of soft artificial muscles, offer high performance but exhibit memory effects and hysteresis. Here we present a data-driven reduction and control strategy grounded in spectral submanifold (SSM) theory. In the adiabatic regime, where inputs vary slowly relative to intrinsic transients, trajectories rapidly converge to a low-dimensional slow manifold. We learn an explicit input-to-output map on this manifold from forced-response trajectories alone, avoiding decay experiments that can trigger hysteresis. We deploy the SSM-based model for real-time control of an antagonistic HASEL-clutch joint. This approach yields a substantial reduction in tracking error compared to feedback-only and feedforward-only baselines under identical settings. This record-and-control workflow enables rapid characterization and high-performance control of soft muscles and muscle-driven joints without detailed physics-based modeling.
A High-Fidelity Digital Twin for Robotic Manipulation Based on 3D Gaussian Splatting
Developing high-fidelity, interactive digital twins is crucial for enabling closed-loop motion planning and reliable real-world robot execution, which are essential to advancing sim-to-real transfer. However, existing approaches often suffer from slow reconstruction, limited visual fidelity, and difficulties in converting photorealistic models into planning-ready collision geometry. We present a practical framework that constructs high-quality digital twins within minutes from sparse RGB inputs. Our system employs 3D Gaussian Splatting (3DGS) for fast, photorealistic reconstruction as a unified scene representation. We enhance 3DGS with visibility-aware semantic fusion for accurate 3D labelling and introduce an efficient, filter-based geometry conversion method to produce collision-ready models seamlessly integrated with a Unity-ROS2-MoveIt physics engine. In experiments with a Franka Emika Panda robot performing pick-and-place tasks, we demonstrate that this enhanced geometric accuracy effectively supports robust manipulation in real-world trials. These results demonstrate that 3DGS-based digital twins, enriched with semantic and geometric consistency, offer a fast, reliable, and scalable path from perception to manipulation in unstructured environments.
comment: Under review of Journal of Robot Learning
Limited Linguistic Diversity in Embodied AI Datasets
Language plays a critical role in Vision-Language-Action (VLA) models, yet the linguistic characteristics of the datasets used to train and evaluate these systems remain poorly documented. In this work, we present a systematic dataset audit of several widely used VLA corpora, aiming to characterize what kinds of instructions these datasets actually contain and how much linguistic variety they provide. We quantify instruction language along complementary dimensions-including lexical variety, duplication and overlap, semantic similarity, and syntactic complexity. Our analysis shows that many datasets rely on highly repetitive, template-like commands with limited structural variation, yielding a narrow distribution of instruction forms. We position these findings as descriptive documentation of the language signal available in current VLA training and evaluation data, intended to support more detailed dataset reporting, more principled dataset selection, and targeted curation or augmentation strategies that broaden language coverage.
Dual-quaternion learning control for autonomous vehicle trajectory tracking with safety guarantees
We propose a learning-based trajectory tracking controller for autonomous robotic platforms whose motion can be described kinematically on $\mathrm{SE}(3)$. The controller is formulated in the dual quaternion framework and operates at the velocity level, assuming direct command of angular and linear velocities, as is standard in many aerial vehicles and omnidirectional mobile robots. Gaussian Process (GP) regression is integrated into a geometric feedback law to learn and compensate online for unknown, state-dependent disturbances and modeling imperfections affecting both attitude and position, while preserving the algebraic structure and coupling properties inherent to rigid-body motion. The proposed approach does not rely on explicit parametric models of the unknown effects, making it well-suited for robotic systems subject to sensor-induced disturbances, unmodeled actuation couplings, and environmental uncertainties. A Lyapunov-based analysis establishes probabilistic ultimate boundedness of the pose tracking error under bounded GP uncertainty, providing formal stability guarantees for the learning-based controller. Simulation results demonstrate accurate and smooth trajectory tracking in the presence of realistic, localized disturbances, including correlated rotational and translational effects arising from magnetometer perturbations. These results illustrate the potential of combining geometric modeling and probabilistic learning to achieve robust, data-efficient pose control for autonomous robotic systems.
HEXAR: a Hierarchical Explainability Architecture for Robots
As robotic systems become increasingly complex, the need for explainable decision-making becomes critical. Existing explainability approaches in robotics typically either focus on individual modules, which can be difficult to query from the perspective of high-level behaviour, or employ monolithic approaches, which do not exploit the modularity of robotic architectures. We present HEXAR (Hierarchical EXplainability Architecture for Robots), a novel framework that provides a plug-in, hierarchical approach to generate explanations about robotic systems. HEXAR consists of specialised component explainers using diverse explanation techniques (e.g., LLM-based reasoning, causal models, feature importance, etc) tailored to specific robot modules, orchestrated by an explainer selector that chooses the most appropriate one for a given query. We implement and evaluate HEXAR on a TIAGo robot performing assistive tasks in a home environment, comparing it against end-to-end and aggregated baseline approaches across 180 scenario-query variations. We observe that HEXAR significantly outperforms baselines in root cause identification, incorrect information exclusion, and runtime, offering a promising direction for transparent autonomous systems.
comment: 8 pages, 6 figures
A Fast Semidefinite Convex Relaxation for Optimal Control Problems With Spatio-Temporal Constraints
Solving optimal control problems (OCPs) of autonomous agents operating under spatial and temporal constraints fast and accurately is essential in applications ranging from eco-driving of autonomous vehicles to quadrotor navigation. However, the nonlinear programs approximating the OCPs are inherently nonconvex due to the coupling between the dynamics and the event timing, and therefore, they are challenging to solve. Most approaches address this challenge by predefining waypoint times or just using nonconvex trajectory optimization, which simplifies the problem but often yields suboptimal solutions. To significantly improve the numerical properties, we propose a formulation with a time-scaling direct multiple shooting scheme that partitions the prediction horizon into segments aligned with characteristic time constraints. Moreover, we develop a fast semidefinite-programming-based convex relaxation that exploits the sparsity pattern of the lifted formulation. Comprehensive simulation studies demonstrate the solution optimality and computational efficiency. Furthermore, real-world experiments on a quadrotor waypoint flight task with constrained open time windows validate the practical applicability of the approach in complex environments.
comment: 9 pages, 6 figures
SOP: A Scalable Online Post-Training System for Vision-Language-Action Models
Vision-language-action (VLA) models achieve strong generalization through large-scale pre-training, but real-world deployment requires expert-level task proficiency in addition to broad generality. Existing post-training approaches for VLA models are typically offline, single-robot, or task-specific, limiting effective on-policy adaptation and scalable learning from real-world interaction. We introduce a Scalable Online Post-training (SOP) system that enables online, distributed, multi-task post-training of generalist VLA models directly in the physical world. SOP tightly couples execution and learning through a closed-loop architecture in which a fleet of robots continuously streams on-policy experience and human intervention signals to a centralized cloud learner, and asynchronously receives updated policies. This design supports prompt on-policy correction, scales experience collection through parallel deployment, and preserves generality during adaptation. SOP is agnostic to the choice of post-training algorithm; we instantiate it with both interactive imitation learning (HG-DAgger) and reinforcement learning (RECAP). Across a range of real-world manipulation tasks including cloth folding, box assembly, and grocery restocking, we show that SOP substantially improves the performance of large pretrained VLA models while maintaining a single shared policy across tasks. Effective post-training can be achieved within hours of real-world interaction, and performance scales near-linearly with the number of robots in the fleet. These results suggest that tightly coupling online learning with fleet-scale deployment is instrumental to enabling efficient, reliable, and scalable post-training of generalist robot policies in the physical world.
PiDR: Physics-Informed Inertial Dead Reckoning for Autonomous Platforms
A fundamental requirement for full autonomy is the ability to sustain accurate navigation in the absence of external data, such as GNSS signals or visual information. In these challenging environments, the platform must rely exclusively on inertial sensors, leading to pure inertial navigation. However, the inherent noise and other error terms of the inertial sensors in such real-world scenarios will cause the navigation solution to drift over time. Although conventional deep-learning models have emerged as a possible approach to inertial navigation, they are inherently black-box in nature. Furthermore, they struggle to learn effectively with limited supervised sensor data and often fail to preserve physical principles. To address these limitations, we propose PiDR, a physics-informed inertial dead-reckoning framework for autonomous platforms in situations of pure inertial navigation. PiDR offers transparency by explicitly integrating inertial navigation principles into the network training process through the physics-informed residual component. PiDR plays a crucial role in mitigating abrupt trajectory deviations even under limited or sparse supervision. We evaluated PiDR on real-world datasets collected by a mobile robot and an autonomous underwater vehicle. We obtained more than 29% positioning improvement in both datasets, demonstrating the ability of PiDR to generalize different platforms operating in various environments and dynamics. Thus, PiDR offers a robust, lightweight, yet effective architecture and can be deployed on resource-constrained platforms, enabling real-time pure inertial navigation in adverse scenarios.
comment: 11 pages and 7 figures
Validating Generalist Robots with Situation Calculus and STL Falsification
Generalist robots are becoming a reality, capable of interpreting natural language instructions and executing diverse operations. However, their validation remains challenging because each task induces its own operational context and correctness specification, exceeding the assumptions of traditional validation methods. We propose a two-layer validation framework that combines abstract reasoning with concrete system falsification. At the abstract layer, situation calculus models the world and derives weakest preconditions, enabling constraint-aware combinatorial testing to systematically generate diverse, semantically valid world-task configurations with controllable coverage strength. At the concrete layer, these configurations are instantiated for simulation-based falsification with STL monitoring. Experiments on tabletop manipulation tasks show that our framework effectively uncovers failure cases in the NVIDIA GR00T controller, demonstrating its promise for validating general-purpose robot autonomy.
A Bi-directional Adaptive Framework for Agile UAV Landing
Autonomous landing on mobile platforms is crucial for extending quadcopter operational flexibility, yet conventional methods are often too inefficient for highly dynamic scenarios. The core limitation lies in the prevalent ``track-then-descend'' paradigm, which treats the platform as a passive target and forces the quadcopter to perform complex, sequential maneuvers. This paper challenges that paradigm by introducing a bi-directional cooperative landing framework that redefines the roles of the vehicle and the platform. The essential innovation is transforming the problem from a single-agent tracking challenge into a coupled system optimization. Our key insight is that the mobile platform is not merely a target, but an active agent in the landing process. It proactively tilts its surface to create an optimal, stable terminal attitude for the approaching quadcopter. This active cooperation fundamentally breaks the sequential model by parallelizing the alignment and descent phases. Concurrently, the quadcopter's planning pipeline focuses on generating a time-optimal and dynamically feasible trajectory that minimizes energy consumption. This bi-directional coordination allows the system to execute the recovery in an agile manner, characterized by aggressive trajectory tracking and rapid state synchronization within transient windows. The framework's effectiveness, validated in dynamic scenarios, significantly improves the efficiency, precision, and robustness of autonomous quadrotor recovery in complex and time-constrained missions.
comment: This work has been submitted to the IEEE Robotics and Automation Letters (RA-L) for possible publication
Learning to Act Robustly with View-Invariant Latent Actions
Vision-based robotic policies often struggle with even minor viewpoint changes, underscoring the need for view-invariant visual representations. This challenge becomes more pronounced in real-world settings, where viewpoint variability is unavoidable and can significantly disrupt policy performance. Existing methods typically learn invariance from multi-view observations at the scene level, but such approaches rely on visual appearance and fail to incorporate the physical dynamics essential for robust generalization. We propose View-Invariant Latent Action (VILA), which models a latent action capturing transition patterns across trajectories to learn view-invariant representations grounded in physical dynamics. VILA aligns these latent actions across viewpoints using an action-guided objective based on ground-truth action sequences. Experiments in both simulation and the real world show that VILA-based policies generalize effectively to unseen viewpoints and transfer well to new tasks, establishing VILA as a strong pretraining framework that improves robustness and downstream learning performance.
comment: Website: https://joon-stack.github.io/VILA/
Parameter-Robust MPPI for Safe Online Learning of Unknown Parameters
Robots deployed in dynamic environments must remain safe even when key physical parameters are uncertain or change over time. We propose Parameter-Robust Model Predictive Path Integral (PRMPPI) control, a framework that integrates online parameter learning with probabilistic safety constraints. PRMPPI maintains a particle-based belief over parameters via Stein Variational Gradient Descent, evaluates safety constraints using Conformal Prediction, and optimizes both a nominal performance-driven and a safety-focused backup trajectory in parallel. This yields a controller that is cautious at first, improves performance as parameters are learned, and ensures safety throughout. Simulation and hardware experiments demonstrate higher success rates, lower tracking error, and more accurate parameter estimates than baselines.
LOST-3DSG: Lightweight Open-Vocabulary 3D Scene Graphs with Semantic Tracking in Dynamic Environments
Tracking objects that move within dynamic environments is a core challenge in robotics. Recent research has advanced this topic significantly; however, many existing approaches remain inefficient due to their reliance on heavy foundation models. To address this limitation, we propose LOST-3DSG, a lightweight open-vocabulary 3D scene graph designed to track dynamic objects in real-world environments. Our method adopts a semantic approach to entity tracking based on word2vec and sentence embeddings, enabling an open-vocabulary representation while avoiding the necessity of storing dense CLIP visual features. As a result, LOST-3DSG achieves superior performance compared to approaches that rely on high-dimensional visual embeddings. We evaluate our method through qualitative and quantitative experiments conducted in a real 3D environment using a TIAGo robot. The results demonstrate the effectiveness and efficiency of LOST-3DSG in dynamic object tracking. Code and supplementary material are publicly available on the project website at https://lab-rococo-sapienza.github.io/lost-3dsg/.
Warm-Starting Collision-Free Model Predictive Control With Object-Centric Diffusion
Acting in cluttered environments requires predicting and avoiding collisions while still achieving precise control. Conventional optimization-based controllers can enforce physical constraints, but they struggle to produce feasible solutions quickly when many obstacles are present. Diffusion models can generate diverse trajectories around obstacles, yet prior approaches lacked a general and efficient way to condition them on scene structure. In this paper, we show that combining diffusion-based warm-starting conditioned with a latent object-centric representation of the scene and with a collision-aware model predictive controller (MPC) yields reliable and efficient motion generation under strict time limits. Our approach conditions a diffusion transformer on the system state, task, and surroundings, using an object-centric slot attention mechanism to provide a compact obstacle representation suitable for control. The sampled trajectories are refined by an optimal control problem that enforces rigid-body dynamics and signed-distance collision constraints, producing feasible motions in real time. On benchmark tasks, this hybrid method achieved markedly higher success rates and lower latency than sampling-based planners or either component alone. Real-robot experiments with a torque-controlled Panda confirm reliable and safe execution with MPC.
comment: An open-source implementation is provided https://cozy-fairy-0e0139.netlify.app/
Soft Responsive Materials Enhance Humanoid Safety
Humanoid robots are envisioned as general-purpose platforms in human-centered environments, yet their deployment is limited by vulnerability to falls and the risks posed by rigid metal-plastic structures to people and surroundings. We introduce a soft-rigid co-design framework that leverages non-Newtonian fluid-based soft responsive materials to enhance humanoid safety. The material remains compliant during normal interaction but rapidly stiffens under impact, absorbing and dissipating fall-induced forces. Physics-based simulations guide protector placement and thickness and enable learning of active fall policies. Applied to a 42 kg life-size humanoid, the protector markedly reduces peak impact and allows repeated falls without hardware damage, including drops from 3 m and tumbles down long staircases. Across diverse scenarios, the approach improves robot robustness and environmental safety. By uniting responsive materials, structural co-design, and learning-based control, this work advances interact-safe, industry-ready humanoid robots.
comment: 40 pages, 11 figures
Reinforcement Learning for Follow-the-Leader Robotic Endoscopic Navigation via Synthetic Data
Autonomous navigation is crucial for both medical and industrial endoscopic robots, enabling safe and efficient exploration of narrow tubular environments without continuous human intervention, where avoiding contact with the inner walls has been a longstanding challenge for prior approaches. We present a follow-the-leader endoscopic robot based on a flexible continuum structure designed to minimize contact between the endoscope body and intestinal walls, thereby reducing patient discomfort. To achieve this objective, we propose a vision-based deep reinforcement learning framework guided by monocular depth estimation. A realistic intestinal simulation environment was constructed in \textit{NVIDIA Omniverse} to train and evaluate autonomous navigation strategies. Furthermore, thousands of synthetic intraluminal images were generated using NVIDIA Replicator to fine-tune the Depth Anything model, enabling dense three-dimensional perception of the intestinal environment with a single monocular camera. Subsequently, we introduce a geometry-aware reward and penalty mechanism to enable accurate lumen tracking. Compared with the original Depth Anything model, our method improves $δ_{1}$ depth accuracy by 39.2% and reduces the navigation J-index by 0.67 relative to the second-best method, demonstrating the robustness and effectiveness of the proposed approach.
Closing the Reality Gap: Zero-Shot Sim-to-Real Deployment for Dexterous Force-Based Grasping and Manipulation
Human-like dexterous hands with multiple fingers offer human-level manipulation capabilities, but training control policies that can directly deploy on real hardware remains difficult due to contact-rich physics and imperfect actuation. We close this gap with a practical sim-to-real reinforcement learning (RL) framework that utilizes dense tactile feedback combined with joint torque sensing to explicitly regulate physical interactions. To enable effective sim-to-real transfer, we introduce (i) a computationally fast tactile simulation that computes distances between dense virtual tactile units and the object via parallel forward kinematics, providing high-rate, high-resolution touch signals needed by RL; (ii) a current-to-torque calibration that eliminates the need for torque sensors on dexterous hands by mapping motor current to joint torque; and (iii) actuator dynamics modeling to bridge the actuation gaps with randomization of non-ideal effects such as backlash, torque-speed saturation. Using an asymmetric actor-critic PPO pipeline trained entirely in simulation, our policies deploy directly to a five-finger hand. The resulting policies demonstrated two essential skills: (1) command-based, controllable grasp force tracking, and (2) reorientation of objects in the hand, both of which were robustly executed without fine-tuning on the robot. By combining tactile and torque in the observation space with effective sensing/actuation modeling, our system provides a practical solution to achieve reliable dexterous manipulation. To our knowledge, this is the first demonstration of controllable grasping on a multi-finger dexterous hand trained entirely in simulation and transferred zero-shot on real hardware.
M-SEVIQ: A Multi-band Stereo Event Visual-Inertial Quadruped-based Dataset for Perception under Rapid Motion and Challenging Illumination
Agile locomotion in legged robots poses significant challenges for visual perception. Traditional frame-based cameras often fail in these scenarios for producing blurred images, particularly under low-light conditions. In contrast, event cameras capture changes in brightness asynchronously, offering low latency, high temporal resolution, and high dynamic range. These advantages make them suitable for robust perception during rapid motion and under challenging illumination. However, existing event camera datasets exhibit limitations in stereo configurations and multi-band sensing domains under various illumination conditions. To address this gap, we present M-SEVIQ, a multi-band stereo event visual and inertial quadruped dataset collected using a Unitree Go2 equipped with stereo event cameras, a frame-based camera, an inertial measurement unit (IMU), and joint encoders. This dataset contains more than 30 real-world sequences captured across different velocity levels, illumination wavelengths, and lighting conditions. In addition, comprehensive calibration data, including intrinsic, extrinsic, and temporal alignments, are provided to facilitate accurate sensor fusion and benchmarking. Our M-SEVIQ can be used to support research in agile robot perception, sensor fusion, semantic segmentation and multi-modal vision in challenging environments.
comment: 6 pages, 7 figures
Advancing Assistive Robotics: Multi-Modal Navigation and Biophysical Monitoring for Next-Generation Wheelchairs
Assistive electric-powered wheelchairs (EPWs) have become essential mobility aids for people with disabilities such as amyotrophic lateral sclerosis (ALS), post-stroke hemiplegia, and dementia-related mobility impairment. This work presents a novel multi-modal EPW control system designed to prioritize patient needs while allowing seamless switching between control modes. Four complementary interfaces, namely joystick, speech, hand gesture, and electrooculography (EOG), are integrated with a continuous vital sign monitoring framework measuring heart rate variability, oxygen saturation (SpO2), and skin temperature. This combination enables greater patient independence while allowing caregivers to maintain real-time supervision and early intervention capability. Two-point calibration of the biophysical sensors against clinical reference devices resulted in root mean square errors of at most 2 bpm for heart rate, 0.5 degree Celsius for skin temperature, and 1 percent for SpO2. Experimental evaluation involved twenty participants with mobility impairments executing a total of 500 indoor navigation commands. The achieved command recognition accuracies were 99 percent for joystick control, 97 percent plus or minus 2 percent for speech, and 95 percent plus or minus 3 percent for hand gesture, with an average closed-loop latency of 20 plus or minus 0.5 milliseconds. Caregivers receive real-time alerts through an Android application following encrypted cloud transmission of physiological data. By integrating multi-modal mobility control with cloud-enabled health monitoring and reporting latency and energy budgets, the proposed prototype addresses key challenges in assistive robotics, contributes toward compliance with ISO 7176-31 and IEC 80601-2-78 safety standards, and establishes a foundation for future adaptive machine learning enhancements.
Unified Meta-Representation and Feedback Calibration for General Disturbance Estimation
Precise control in modern robotic applications is always an open issue due to unknown time-varying disturbances. Existing meta-learning-based approaches require a shared representation of environmental structures, which lack flexibility for realistic non-structural disturbances. Besides, representation error and the distribution shifts can lead to heavy degradation in prediction accuracy. This work presents a generalizable disturbance estimation framework that builds on meta-learning and feedback-calibrated online adaptation. By extracting features from a finite time window of past observations, a unified representation that effectively captures general non-structural disturbances can be learned without predefined structural assumptions. The online adaptation process is subsequently calibrated by a state-feedback mechanism to attenuate the learning residual originating from the representation and generalizability limitations. Theoretical analysis shows that simultaneous convergence of both the online learning error and the disturbance estimation error can be achieved. Through the unified meta-representation, our framework effectively estimates multiple rapidly changing disturbances, as demonstrated by quadrotor flight experiments. See the project page for video, supplementary material and code: https://nonstructural-metalearn.github.io.
comment: 8 pages, 10 figures
Towards Zero-Shot Point Cloud Registration Across Diverse Scales, Scenes, and Sensor Setups ICCV 2025
Some deep learning-based point cloud registration methods struggle with zero-shot generalization, often requiring dataset-specific hyperparameter tuning or retraining for new environments. We identify three critical limitations: (a) fixed user-defined parameters (e.g., voxel size, search radius) that fail to generalize across varying scales, (b) learned keypoint detectors exhibit poor cross-domain transferability, and (c) absolute coordinates amplify scale mismatches between datasets. To address these three issues, we present BUFFER-X, a training-free registration framework that achieves zero-shot generalization through: (a) geometric bootstrapping for automatic hyperparameter estimation, (b) distribution-aware farthest point sampling to replace learned detectors, and (c) patch-level coordinate normalization to ensure scale consistency. Our approach employs hierarchical multi-scale matching to extract correspondences across local, middle, and global receptive fields, enabling robust registration in diverse environments. For efficiency-critical applications, we introduce BUFFER-X-Lite, which reduces total computation time by 43% (relative to BUFFER-X) through early exit strategies and fast pose solvers while preserving accuracy. We evaluate on a comprehensive benchmark comprising 12 datasets spanning object-scale, indoor, and outdoor scenes, including cross-sensor registration between heterogeneous LiDAR configurations. Results demonstrate that our approach generalizes effectively without manual tuning or prior knowledge of test domains. Code: https://github.com/MIT-SPARK/BUFFER-X.
comment: 18 pages, 15 figures. Extended version of our ICCV 2025 highlight paper [arXiv:2503.07940]. arXiv admin note: substantial text overlap with arXiv:2503.07940
Optimizing Control-Friendly Trajectories with Self-Supervised Residual Learning
Real-world physics can only be analytically modeled with a certain level of precision for modern intricate robotic systems. As a result, tracking aggressive trajectories accurately could be challenging due to the existence of residual physics during controller synthesis. This paper presents a self-supervised residual learning and trajectory optimization framework to address the aforementioned challenges. At first, unknown dynamic effects on the closed-loop model are learned and treated as residuals of the nominal dynamics, jointly forming a hybrid model. We show that learning with analytic gradients can be achieved using only trajectory-level data while enjoying accurate long-horizon prediction with an arbitrary integration step size. Subsequently, a trajectory optimizer is developed to compute the optimal reference trajectory with the residual physics along it minimized. It ends up with trajectories that are friendly to the following control level. The agile flight of quadrotors illustrates that by utilizing the hybrid dynamics, the proposed optimizer outputs aggressive motions that can be precisely tracked.
comment: 10 pages, 9 figures
Loop Closure using AnyLoc Visual Place Recognition in DPV-SLAM
Loop closure is crucial for maintaining the accuracy and consistency of visual SLAM. We propose a method to improve loop closure performance in DPV-SLAM. Our approach integrates AnyLoc, a learning-based visual place recognition technique, as a replacement for the classical Bag of Visual Words (BoVW) loop detection method. In contrast to BoVW, which relies on handcrafted features, AnyLoc utilizes deep feature representations, enabling more robust image retrieval across diverse viewpoints and lighting conditions. Furthermore, we propose an adaptive mechanism that dynamically adjusts similarity threshold based on environmental conditions, removing the need for manual tuning. Experiments on both indoor and outdoor datasets demonstrate that our method significantly outperforms the original DPV-SLAM in terms of loop closure accuracy and robustness. The proposed method offers a practical and scalable solution for enhancing loop closure performance in modern SLAM systems.
comment: Accepted at IEEE/SICE International Symposium on System Integration(SII) 2026. 6 pages, 14 figures
Analysis of Various Manipulator Configurations Based on Multi-Objective Black-Box Optimization
Various 6-degree-of-freedom (DOF) and 7-DOF manipulators have been developed to date. Over a long history, their joint configurations and link length ratios have been determined empirically. In recent years, the development of robotic foundation models has become increasingly active, leading to the continuous proposal of various manipulators to support these models. However, none of these manipulators share exactly the same structure, as the order of joints and the ratio of link lengths differ among robots. Therefore, in order to discuss the optimal structure of a manipulator, we performed multi-objective optimization from the perspectives of end-effector reachability and joint torque. We analyze where existing manipulator structures stand within the sampling results of the optimization and provide insights for future manipulator design.
comment: Accepted to Advanced Robotics, website: https://haraduka.github.io/bbo-manip-design
Learning to Nudge: A Scalable Barrier Function Framework for Safe Robot Interaction in Dense Clutter
Robots operating in everyday environments must navigate and manipulate within densely cluttered spaces, where physical contact with surrounding objects is unavoidable. Traditional safety frameworks treat contact as unsafe, restricting robots to collision avoidance and limiting their ability to function in dense, everyday settings. As the number of objects grows, model-based approaches for safe manipulation become computationally intractable; meanwhile, learned methods typically tie safety to the task at hand, making them hard to transfer to new tasks without retraining. In this work we introduce Dense Contact Barrier Functions(DCBF). Our approach bypasses the computational complexity of explicitly modeling multi-object dynamics by instead learning a composable, object-centric function that implicitly captures the safety constraints arising from physical interactions. Trained offline on interactions with a few objects, the learned DCBFcomposes across arbitrary object sets at runtime, producing a single global safety filter that scales linearly and transfers across tasks without retraining. We validate our approach through simulated experiments in dense clutter, demonstrating its ability to enable collision-free navigation and safe, contact-rich interaction in suitable settings.
Effective Online 3D Bin Packing with Lookahead Parcels Using Monte Carlo Tree Search
Online 3D Bin Packing (3D-BP) with robotic arms is crucial for reducing transportation and labor costs in modern logistics. While Deep Reinforcement Learning (DRL) has shown strong performance, it often fails to adapt to real-world short-term distribution shifts, which arise as different batches of goods arrive sequentially, causing performance drops. We argue that the short-term lookahead information available in modern logistics systems is key to mitigating this issue, especially during distribution shifts. We formulate online 3D-BP with lookahead parcels as a Model Predictive Control (MPC) problem and adapt the Monte Carlo Tree Search (MCTS) framework to solve it. Our framework employs a dynamic exploration prior that automatically balances a learned RL policy and a robust random policy based on the lookahead characteristics. Additionally, we design an auxiliary reward to penalize long-term spatial waste from individual placements. Extensive experiments on real-world datasets show that our method consistently outperforms state-of-the-art baselines, achieving over 10\% gains under distributional shifts, 4\% average improvement in online deployment, and up to more than 8\% in the best case--demonstrating the effectiveness of our framework.
Making Infeasible Tasks Feasible: Planning to Reconfigure Disconnected 3D Environments with Movable Objects
Several planners have been developed to compute dynamically feasible, collision-free robot paths from an initial to a goal configuration. A key assumption in these works is that the goal region is reachable; an assumption that often fails in practice when environments are disconnected. Motivated by this limitation, we consider known 3D environments comprising objects, also called blocks, that form distinct navigable support surfaces (planes), and that are either non-movable (e.g., tables) or movable (e.g., boxes). These surfaces may be mutually disconnected due to height differences, holes, or lateral separations. Our focus is on tasks where the robot must reach a goal region residing on an elevated plane that is unreachable. Rather than declaring such tasks infeasible, an effective strategy is to enable the robot to interact with the environment, rearranging movable objects to create new traversable connections; a problem known as Navigation Among Movable Objects (NAMO). Existing NAMO planners typically address 2D environments, where obstacles are pushed aside to clear a path. These methods cannot directly handle the considered 3D setting; in such cases, obstacles must be placed strategically to bridge these physical disconnections. We address this challenge by developing BRiDGE (Block-based Reconfiguration in Disconnected 3D Geometric Environments), a sampling-based planner that incrementally builds trees over robot and object configurations to compute feasible plans specifying which objects to move, where to place them, and in what order, while accounting for a limited number of movable objects. To accelerate planning, we introduce non-uniform sampling strategies. We show that our method is probabilistically complete and we provide extensive numerical and hardware experiments validating its effectiveness.
FIRE-VLM: A Vision-Language-Driven Reinforcement Learning Framework for UAV Wildfire Tracking in a Physics-Grounded Fire Digital Twin
Wildfire monitoring demands autonomous systems capable of reasoning under extreme visual degradation, rapidly evolving physical dynamics, and scarce real-world training data. Existing UAV navigation approaches rely on simplified simulators and supervised perception pipelines, and lack embodied agents interacting with physically realistic fire environments. We introduce FIRE-VLM, the first end-to-end vision-language model (VLM) guided reinforcement learning (RL) framework trained entirely within a high-fidelity, physics-grounded wildfire digital twin. Built from USGS Digital Elevation Model (DEM) terrain, LANDFIRE fuel inventories, and semi-physical fire-spread solvers, this twin captures terrain-induced runs, wind-driven acceleration, smoke plume occlusion, and dynamic fuel consumption. Within this environment, a PPO agent with dual-view UAV sensing is guided by a CLIP-style VLM. Wildfire-specific semantic alignment scores, derived from a single prompt describing active fire and smoke plumes, are integrated as potential-based reward shaping signals. Our contributions are: (1) a GIS-to-simulation pipeline for constructing wildfire digital twins; (2) a VLM-guided RL agent for UAV firefront tracking; and (3) a wildfire-aware reward design that combines physical terms with VLM semantics. Across five digital-twin evaluation tasks, our VLM-guided policy reduces time-to-detection by up to 6 times, increases time-in-FOV, and is, to our knowledge, the first RL-based UAV wildfire monitoring system demonstrated in kilometer-scale, physics-grounded digital-twin fires.
Cost-Effective Radar Sensors for Field-Based Water Level Monitoring with Sub-Centimeter Accuracy
Water level monitoring is critical for flood management, water resource allocation, and ecological assessment, yet traditional methods remain costly and limited in coverage. This work explores radar-based sensing as a low-cost alternative for water level estimation, leveraging its non-contact nature and robustness to environmental conditions. Commercial radar sensors are evaluated in real-world field tests, applying statistical filtering techniques to improve accuracy. Results show that a single radar sensor can achieve centimeter-scale precision with minimal calibration, making it a practical solution for autonomous water monitoring using drones and robotic platforms.
comment: 10 pages, 6 figures. Preliminary results presented as a poster at an academic conference
Towards Zero-Knowledge Task Planning via a Language-based Approach
In this work, we introduce and formalize the Zero-Knowledge Task Planning (ZKTP) problem, i.e., formulating a sequence of actions to achieve some goal without task-specific knowledge. Additionally, we present a first investigation and approach for ZKTP that leverages a large language model (LLM) to decompose natural language instructions into subtasks and generate behavior trees (BTs) for execution. If errors arise during task execution, the approach also uses an LLM to adjust the BTs on-the-fly in a refinement loop. Experimental validation in the AI2-THOR simulator demonstrate our approach's effectiveness in improving overall task performance compared to alternative approaches that leverage task-specific knowledge. Our work demonstrates the potential of LLMs to effectively address several aspects of the ZKTP problem, providing a robust framework for automated behavior generation with no task-specific setup.
Modeling and Control for UAV with Off-center Slung Load
Unmanned aerial vehicle (UAV) with slung load system is a classic air transportation system. In practical applications, the suspension point of the slung load does not always align with the center of mass (CoM) of the UAV due to mission requirements or mechanical interference. This offset creates coupling in the system's nonlinear dynamics which leads to a complicated motion control problem. In existing research, modeling of the system are performed about the UAV's CoM. In this work we use the point of suspension instead. Based on the new model, a cascade control strategy is developed. In the middle-loop controller, the acceleration of the suspension point is used to regulate the swing angle of the slung load without the need for considering the coupling between the slung load and the UAV. Using the off-center reference frame, an inner-loop controller is designed to track the UAV's attitude without the need of simplification on the coupling effects. We prove local exponential stability of the closed-loop using Lyapunov approach. Finally, simulations and experiments are conducted to validate the proposed control system.
Lunar Rover Cargo Transport: Mission Concept and Field Test
In future operations on the lunar surface, automated vehicles will be required to transport cargo between known locations. Such vehicles must be able to navigate precisely in safe regions to avoid natural hazards, human-constructed infrastructure, and dangerous dark shadows. Rovers must be able to park their cargo autonomously within a small tolerance to achieve a successful pickup and delivery. In this field test, Lidar Teach and Repeat provides an ideal autonomy solution for transporting cargo in this way. A one-tonne path-to-flight rover was driven in a semi-autonomous remote-control mode to create a network of safe paths. Once the route was taught, the rover immediately repeated the entire network of paths autonomously while carrying cargo. The closed-loop performance is accurate enough to align the vehicle to the cargo and pick it up. This field report describes a two-week deployment at the Canadian Space Agency's Analogue Terrain, culminating in a simulated lunar operation to evaluate the system's capabilities. Successful cargo collection and delivery were demonstrated in harsh environmental conditions.
comment: 15 Pages, 13 Figures, to appear in IEEE Transactions on Field Robotics
Revisiting Continuous-Time Trajectory Estimation via Gaussian Processes and the Magnus Expansion
Continuous-time state estimation has been shown to be an effective means of (i) handling asynchronous and high-rate measurements, (ii) introducing smoothness to the estimate, (iii) post hoc querying the estimate at times other than those of the measurements, and (iv) addressing certain observability issues related to scanning-while-moving sensors. A popular means of representing the trajectory in continuous time is via a Gaussian process (GP) prior, with the prior's mean and covariance functions generated by a linear time-varying (LTV) stochastic differential equation (SDE) driven by white noise. When the state comprises elements of Lie groups, previous works have resorted to a patchwork of local GPs each with a linear time-invariant SDE kernel, which while effective in practice, lacks theoretical elegance. Here we revisit the full LTV GP approach to continuous-time trajectory estimation, deriving a global GP prior on Lie groups via the Magnus expansion, which offers a more elegant and general solution. We provide a numerical comparison between the two approaches and discuss their relative merits.
comment: 21 pages, 12 figures
CageDroneRF: A Large-Scale RF Benchmark and Toolkit for Drone Perception
We present CageDroneRF (CDRF), a large-scale benchmark for Radio-Frequency (RF) drone detection and identification built from real-world captures and systematically generated synthetic variants. CDRF addresses the scarcity and limited diversity of existing RF datasets by coupling extensive raw recordings with a principled augmentation pipeline that (i) precisely controls Signal-to-Noise Ratio (SNR), (ii) injects interfering emitters, and (iii) applies frequency shifts with label-consistent bounding-box transformations for detection. This dataset spans a wide range of contemporary drone models, many unavailable in current public datasets, and acquisition conditions, derived from data collected at the Rowan University campus and within a controlled RF-cage facility. CDRF is released with interoperable open-source tools for data generation, preprocessing, augmentation, and evaluation that also operate on existing public benchmarks. CDRF enables standardized benchmarking for classification, open-set recognition, and object detection, supporting rigorous comparisons and reproducible pipelines. By releasing this comprehensive benchmark and tooling, CDRF aims to accelerate progress toward robust, generalizable RF perception models.
Characterizing the Robustness of Black-Box LLM Planners Under Perturbed Observations with Adaptive Stress Testing
Large language models (LLMs) have recently demonstrated success in decision-making tasks including planning, control, and prediction, but their tendency to hallucinate unsafe and undesired outputs poses risks. This unwanted behavior is further exacerbated in environments where sensors are noisy or unreliable. Characterizing the behavior of LLM planners to varied observations is necessary to proactively avoid failures in safety-critical scenarios. We specifically investigate the response of LLMs along two different perturbation dimensions. Like prior works, one dimension generates semantically similar prompts with varied phrasing by randomizing order of details, modifying access to few-shot examples, etc. Unique to our work, the second dimension simulates access to varied sensors and noise to mimic raw sensor or detection algorithm failures. An initial case study in which perturbations are manually applied show that both dimensions lead LLMs to hallucinate in a multi-agent driving environment. However, manually covering the entire perturbation space for several scenarios is infeasible. As such, we propose a novel method for efficiently searching the space of prompt perturbations using adaptive stress testing (AST) with Monte-Carlo tree search (MCTS). Our AST formulation enables discovery of scenarios, sensor configurations, and prompt phrasing that cause language models to act with high uncertainty or even crash. By generating MCTS prompt perturbation trees across diverse scenarios, we show through extensive experiments that offline analyses can be used to proactively understand potential failures that may arise at runtime.
comment: 30 pages, 24 figures, 6 tables
Indicating Robot Vision Capabilities with Augmented Reality
Research indicates that humans can mistakenly assume that robots and humans have the same field of view, possessing an inaccurate mental model of robots. This misperception may lead to failures during human-robot collaboration tasks where robots might be asked to complete impossible tasks about out-of-view objects. The issue is more severe when robots do not have a chance to scan the scene to update their world model while focusing on assigned tasks. To help align humans' mental models of robots' vision capabilities, we propose four field-of-view indicators in augmented reality and conducted a human-subjects experiment (N=41) to evaluate them in a collaborative assembly task regarding accuracy, confidence, task efficiency, and workload. These indicators span a spectrum of positions: two at robot's eye and head space -- deepening eye socket and adding blocks to two sides of the eyes (i.e., egocentric), and two anchoring in the robot's task space -- adding extended blocks from the sides of eyes to the table and placing blocks directly on the tables (i.e., allocentric). Results showed that, when placed directly in the task space, the allocentric indicator yields the highest accuracy, although with a delay in interpreting the robot's field of view. When placed at the robot's eyes, the egocentric indicator of deeper eye sockets, possible for physical alteration, also increased accuracy. In all indicators, participants' confidence was high while cognitive load remained low. Finally, we contribute six guidelines for practitioners to apply our augmented reality indicators or physical alterations to align humans' mental models with robots' vision capabilities.
FICO: Finite-Horizon Closed-Loop Factorization for Unified Multi-Agent Path Finding
Multi-Agent Path Finding is a fundamental problem in robotics and AI, yet most existing formulations treat planning and execution separately and address variants of the problem in an ad hoc manner. This paper presents a system-level framework for MAPF that integrates planning and execution, generalizes across variants, and explicitly models uncertainties. At its core is the MAPF system, a formal model that casts MAPF as a control design problem encompassing classical and uncertainty-aware formulations. To solve it, we introduce Finite-Horizon Closed-Loop Factorization (FICO), a factorization-based algorithm inspired by receding-horizon control that exploits compositional structure for efficient closed-loop operation. FICO enables real-time responses -- commencing execution within milliseconds -- while scaling to thousands of agents and adapting seamlessly to execution-time uncertainties. Extensive case studies demonstrate that it reduces computation time by up to two orders of magnitude compared with open-loop baselines, while delivering significantly higher throughput under stochastic delays and agent arrivals. These results establish a principled foundation for analyzing and advancing MAPF through system-level modeling, factorization, and closed-loop design.
Evaluating Gemini Robotics Policies in a Veo World Simulator
Generative world models hold significant potential for simulating interactions with visuomotor policies in varied environments. Frontier video models can enable generation of realistic observations and environment interactions in a scalable and general manner. However, the use of video models in robotics has been limited primarily to in-distribution evaluations, i.e., scenarios that are similar to ones used to train the policy or fine-tune the base video model. In this report, we demonstrate that video models can be used for the entire spectrum of policy evaluation use cases in robotics: from assessing nominal performance to out-of-distribution (OOD) generalization, and probing physical and semantic safety. We introduce a generative evaluation system built upon a frontier video foundation model (Veo). The system is optimized to support robot action conditioning and multi-view consistency, while integrating generative image-editing and multi-view completion to synthesize realistic variations of real-world scenes along multiple axes of generalization. We demonstrate that the system preserves the base capabilities of the video model to enable accurate simulation of scenes that have been edited to include novel interaction objects, novel visual backgrounds, and novel distractor objects. This fidelity enables accurately predicting the relative performance of different policies in both nominal and OOD conditions, determining the relative impact of different axes of generalization on policy performance, and performing red teaming of policies to expose behaviors that violate physical or semantic safety constraints. We validate these capabilities through 1600+ real-world evaluations of eight Gemini Robotics policy checkpoints and five tasks for a bimanual manipulator.
DDBot: Differentiable Physics-based Digging Robot for Unknown Granular Materials
Automating the manipulation of granular materials poses significant challenges due to complex contact dynamics, unpredictable material properties, and intricate system states. Existing approaches often fail to achieve efficiency and accuracy in such tasks. To fill the research gap, this article studies the small-scale and high-precision granular material digging task with unknown physical properties. A key scientific problem addressed is the feasibility of applying first-order gradient-based optimization to complex differentiable granular material simulation and overcoming associated numerical instability. A new framework, named differentiable digging robot (DDBot), is proposed to manipulate granular materials, including sand and soil. Specifically, we equip DDBot with a differentiable physics-based simulator, tailored for granular material manipulation, powered by GPU-accelerated parallel computing and automatic differentiation. DDBot can perform efficient differentiable system identification and high-precision digging skill optimization for unknown granular materials, which is enabled by a differentiable skill-to-action mapping, a task-oriented demonstration method, gradient clipping and line search-based gradient descent. Experimental results show that DDBot can efficiently (converge within 5 to 20 minutes) identify unknown granular material dynamics and optimize digging skills, with high-precision results in zero-shot real-world deployments, highlighting its practicality. Benchmark results against state-of-the-art baselines also confirm the robustness and efficiency of DDBot in such digging tasks.
comment: Published as a regular paper by the IEEE Transactions on Robotics
An Informative Planning Framework for Target Tracking and Active Mapping in Dynamic Environments with ASVs
Mobile robot platforms are increasingly being used to automate information gathering tasks such as environmental monitoring. Efficient target tracking in dynamic environments is critical for applications such as search and rescue and pollutant cleanups. In this letter, we study active mapping of floating targets that drift due to environmental disturbances such as wind and currents. This is a challenging problem as it involves predicting both spatial and temporal variations in the map due to changing conditions. We introduce an integrated framework combining dynamic occupancy grid mapping and an informative planning approach to actively map and track freely drifting targets with an autonomous surface vehicle. A key component of our adaptive planning approach is a spatiotemporal prediction network that predicts target position distributions over time. We further propose a planning objective for target tracking that leverages these predictions. Simulation experiments show that this planning objective improves target tracking performance compared to existing methods that consider only entropy reduction as the planning objective. Finally, we validate our approach in field tests, showcasing its ability to track targets in real-world monitoring scenarios.
comment: Accepted for publication in Robotics and Automation Letters (RA-L)
Augmented Reality for RObots (ARRO): Pointing Visuomotor Policies Towards Visual Robustness
Visuomotor policies trained on human expert demonstrations have recently shown strong performance across a wide range of robotic manipulation tasks. However, these policies remain highly sensitive to domain shifts stemming from background or robot embodiment changes, which limits their generalization capabilities. In this paper, we present ARRO, a novel visual representation that leverages zero-shot open-vocabulary segmentation and object detection models to efficiently mask out task-irrelevant regions of the scene in real time without requiring additional training, modeling of the setup, or camera calibration. By filtering visual distractors and overlaying virtual guides during both training and inference, ARRO improves robustness to scene variations and reduces the need for additional data collection. We extensively evaluate ARRO with Diffusion Policy on a range of tabletop manipulation tasks in both simulation and real-world environments, and further demonstrate its compatibility and effectiveness with generalist robot policies, such as Octo, OpenVLA and Pi Zero. Across all settings in our evaluation, ARRO yields consistent performance gains, allows for selective masking to choose between different objects, and shows robustness even to challenging segmentation conditions. Videos showcasing our results are available at: https://augmented-reality-for-robots.github.io/
RobotDiffuse: Diffusion-Based Motion Planning for Redundant Manipulators with the ROP Obstacle Avoidance Dataset
Redundant manipulators, with their higher Degrees of Freedom (DoFs), offer enhanced kinematic performance and versatility, making them suitable for applications like manufacturing, surgical robotics, and human-robot collaboration. However, motion planning for these manipulators is challenging due to increased DoFs and complex, dynamic environments. While traditional motion planning algorithms struggle with high-dimensional spaces, deep learning-based methods often face instability and inefficiency in complex tasks. This paper introduces RobotDiffuse, a diffusion model-based approach for motion planning in redundant manipulators. By integrating physical constraints with a point cloud encoder and replacing the U-Net structure with an encoder-only transformer, RobotDiffuse improves the model's ability to capture temporal dependencies and generate smoother, more coherent motion plans. We validate the approach using a complex simulator and release a new dataset, Robot-obtalcles-panda (ROP), with 35M robot poses and 0.14M obstacle avoidance scenarios. The highest overall score obtained in the experiment demonstrates the effectiveness of RobotDiffuse and the promise of diffusion models for motion planning tasks. The dataset can be accessed at https://github.com/ACRoboT-buaa/RobotDiffuse.
VLN-MME: Diagnosing MLLMs as Language-guided Visual Navigation agents
Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities across a wide range of vision-language tasks. However, their performance as embodied agents, which requires multi-round dialogue spatial reasoning and sequential action prediction, needs further exploration. Our work investigates this potential in the context of Vision-and-Language Navigation (VLN) by introducing a unified and extensible evaluation framework to probe MLLMs as zero-shot agents by bridging traditional navigation datasets into a standardized benchmark, named VLN-MME. We simplify the evaluation with a highly modular and accessible design. This flexibility streamlines experiments, enabling structured comparisons and component-level ablations across diverse MLLM architectures, agent designs, and navigation tasks. Crucially, enabled by our framework, we observe that enhancing our baseline agent with Chain-of-Thought (CoT) reasoning and self-reflection leads to an unexpected performance decrease. This suggests MLLMs exhibit poor context awareness in embodied navigation tasks; although they can follow instructions and structure their output, their 3D spatial reasoning fidelity is low. VLN-MME lays the groundwork for systematic evaluation of general-purpose MLLMs in embodied navigation settings and reveals limitations in their sequential decision-making capabilities. We believe these findings offer crucial guidance for MLLM post-training as embodied agents.
Steering Flexible Linear Objects in Planar Environments by Two Robot Hands Using Euler's Elastica Solutions
The manipulation of flexible objects such as cables, wires and fresh food items by robot hands forms a special challenge in robot grasp mechanics. This paper considers the steering of flexible linear objects in planar environments by two robot hands. The flexible linear object, modeled as an elastic non-stretchable rod, is manipulated by varying the gripping endpoint positions while keeping equal endpoint tangents. The flexible linear object shape has a closed form solution in terms of the grasp endpoint positions and tangents, called Euler's elastica. This paper obtains the elastica solutions under the optimal control framework, then uses the elastica solutions to obtain closed-form criteria for non self-intersection, stability and obstacle avoidance of the flexible linear object. The new tools are incorporated into a planning scheme for steering flexible linear objects in planar environments populated by sparsely spaced obstacles. The scheme is fully implemented and demonstrated with detailed examples.
Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation
We present Chain-of-Action (CoA), a novel visuo-motor policy paradigm built upon Trajectory Autoregressive Modeling. Unlike conventional approaches that predict next step action(s) forward, CoA generates an entire trajectory by explicit backward reasoning with task-specific goals through an action-level Chain-of-Thought (CoT) process. This process is unified within a single autoregressive structure: (1) the first token corresponds to a stable keyframe action that encodes the task-specific goals; and (2) subsequent action tokens are generated autoregressively, conditioned on the initial keyframe and previously predicted actions. This backward action reasoning enforces a global-to-local structure, allowing each local action to be tightly constrained by the final goal. To further realize the action reasoning structure, CoA incorporates four complementary designs: continuous action token representation; dynamic stopping for variable-length trajectory generation; reverse temporal ensemble; and multi-token prediction to balance action chunk modeling with global structure. As a result, CoA gives strong spatial generalization capabilities while preserving the flexibility and simplicity of a visuo-motor policy. Empirically, we observe CoA achieves the state-of-the-art performance across 60 RLBench tasks and 8 real-world manipulation tasks.
Efficient Swept Volume-Based Trajectory Generation for Arbitrary-Shaped Ground Robot Navigation
Navigating an arbitrary-shaped ground robot safely in cluttered environments remains a challenging problem. The existing trajectory planners that account for the robot's physical geometry severely suffer from the intractable runtime. To achieve both computational efficiency and Continuous Collision Avoidance (CCA) of arbitrary-shaped ground robot planning, we proposed a novel coarse-to-fine navigation framework that significantly accelerates planning. In the first stage, a sampling-based method selectively generates distinct topological paths that guarantee a minimum inflated margin. In the second stage, a geometry-aware front-end strategy is designed to discretize these topologies into full-state robot motion sequences while concurrently partitioning the paths into SE(2) sub-problems and simpler R2 sub-problems for back-end optimization. In the final stage, an SVSDF-based optimizer generates trajectories tailored to these sub-problems and seamlessly splices them into a continuous final motion plan. Extensive benchmark comparisons show that the proposed method is one to several orders of magnitude faster than the cutting-edge methods in runtime while maintaining a high planning success rate and ensuring CCA.
RoboMIND 2.0: A Multimodal, Bimanual Mobile Manipulation Dataset for Generalizable Embodied Intelligence
While data-driven imitation learning has revolutionized robotic manipulation, current approaches remain constrained by the scarcity of large-scale, diverse real-world demonstrations. Consequently, the ability of existing models to generalize across long-horizon bimanual tasks and mobile manipulation in unstructured environments remains limited. To bridge this gap, we present RoboMIND 2.0, a comprehensive real-world dataset comprising over 310K dual-arm manipulation trajectories collected across six distinct robot embodiments and 739 complex tasks. Crucially, to support research in contact-rich and spatially extended tasks, the dataset incorporates 12K tactile-enhanced episodes and 20K mobile manipulation trajectories. Complementing this physical data, we construct high-fidelity digital twins of our real-world environments, releasing an additional 20K-trajectory simulated dataset to facilitate robust sim-to-real transfer. To fully exploit the potential of RoboMIND 2.0, we propose MIND-2 system, a hierarchical dual-system frame-work optimized via offline reinforcement learning. MIND-2 integrates a high-level semantic planner (MIND-2-VLM) to decompose abstract natural language instructions into grounded subgoals, coupled with a low-level Vision-Language-Action executor (MIND-2-VLA), which generates precise, proprioception-aware motor actions.
RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics
Spatial tracing, as a fundamental embodied interaction ability for robots, is inherently challenging as it requires multi-step metric-grounded reasoning compounded with complex spatial referring and real-world metric measurement. However, existing methods struggle with this compositional task. To this end, we propose RoboTracer, a 3D-aware VLM that first achieves both 3D spatial referring and measuring via a universal spatial encoder and a regression-supervised decoder to enhance scale awareness during supervised fine-tuning (SFT). Moreover, RoboTracer advances multi-step metric-grounded reasoning via reinforcement fine-tuning (RFT) with metric-sensitive process rewards, supervising key intermediate perceptual cues to accurately generate spatial traces. To support SFT and RFT training, we introduce TraceSpatial, a large-scale dataset of 30M QA pairs, spanning outdoor/indoor/tabletop scenes and supporting complex reasoning processes (up to 9 steps). We further present TraceSpatial-Bench, a challenging benchmark filling the gap to evaluate spatial tracing. Experimental results show that RoboTracer surpasses baselines in spatial understanding, measuring, and referring, with an average success rate of 79.1%, and also achieves SOTA performance on TraceSpatial-Bench by a large margin, exceeding Gemini-2.5-Pro by 36% accuracy. Notably, RoboTracer can be integrated with various control policies to execute long-horizon, dynamic tasks across diverse robots (UR5, G1 humanoid) in cluttered real-world scenes. See the project page at https://zhoues.github.io/RoboTracer.
comment: Project page: https://zhoues.github.io/RoboTracer
AdaVLN: Towards Visual Language Navigation in Continuous Indoor Environments with Moving Humans
Visual Language Navigation is a task that challenges robots to navigate in realistic environments based on natural language instructions. While previous research has largely focused on static settings, real-world navigation must often contend with dynamic human obstacles. Hence, we propose an extension to the task, termed Adaptive Visual Language Navigation (AdaVLN), which seeks to narrow this gap. AdaVLN requires robots to navigate complex 3D indoor environments populated with dynamically moving human obstacles, adding a layer of complexity to navigation tasks that mimic the real-world. To support exploration of this task, we also present AdaVLN simulator and AdaR2R datasets. The AdaVLN simulator enables easy inclusion of fully animated human models directly into common datasets like Matterport3D. We also introduce a "freeze-time" mechanism for both the navigation task and simulator, which pauses world state updates during agent inference, enabling fair comparisons and experimental reproducibility across different hardware. We evaluate several baseline models on this task, analyze the unique challenges introduced by AdaVLN, and demonstrate its potential to bridge the sim-to-real gap in VLN research.
DarkEQA: Benchmarking Vision-Language Models for Embodied Question Answering in Low-Light Indoor Environments
Vision Language Models (VLMs) are increasingly adopted as central reasoning modules for embodied agents. Existing benchmarks evaluate their capabilities under ideal, well-lit conditions, yet robust 24/7 operation demands performance under a wide range of visual degradations, including low-light conditions at night or in dark environments--a core necessity that has been largely overlooked. To address this underexplored challenge, we present DarkEQA, an open-source benchmark for evaluating EQA-relevant perceptual primitives under multi-level low-light conditions. DarkEQA isolates the perception bottleneck by evaluating question answering from egocentric observations under controlled degradations, enabling attributable robustness analysis. A key design feature of DarkEQA is its physical fidelity: visual degradations are modeled in linear RAW space, simulating physics-based illumination drop and sensor noise followed by an ISP-inspired rendering pipeline. We demonstrate the utility of DarkEQA by evaluating a wide range of state-of-the-art VLMs and Low-Light Image Enhancement (LLIE) models. Our analysis systematically reveals VLMs' limitations when operating under these challenging visual conditions. Project website: https://darkeqa-benchmark.github.io/
comment: Submitted to IEEE Robotics and Automation Letters (RA-L)
Multiagent Systems
InfiAgent: An Infinite-Horizon Framework for General-Purpose Autonomous Agents
LLM agents can reason and use tools, but they often break down on long-horizon tasks due to unbounded context growth and accumulated errors. Common remedies such as context compression or retrieval-augmented prompting introduce trade-offs between information fidelity and reasoning stability. We present InfiAgent, a general-purpose framework that keeps the agent's reasoning context strictly bounded regardless of task duration by externalizing persistent state into a file-centric state abstraction. At each step, the agent reconstructs context from a workspace state snapshot plus a fixed window of recent actions. Experiments on DeepResearch and an 80-paper literature review task show that, without task-specific fine-tuning, InfiAgent with a 20B open-source model is competitive with larger proprietary systems and maintains substantially higher long-horizon coverage than context-centric baselines. These results support explicit state externalization as a practical foundation for stable long-horizon agents. Github Repo:https://github.com/ChenglinPoly/infiAgent
Software-Defined Agentic Serving
As multi-agent LLM pipelines grow in complexity, existing serving paradigms fail to adapt to the dynamic serving conditions. We argue that agentic serving systems should be programmable and system-aware, unlike existing serving which statically encode the parameters. In this work, we propose a new SDN-inspired agentic serving framework that helps control the key attributes of communication based on runtime state. This architecture enables serving-efficient, responsive agent systems and paves the way for high-level intent-driven agentic serving.
Computationally Efficient Estimation of Localized Treatment Effects in High-Dimensional Design Spaces using Gaussian Process Regression
Population-scale agent-based simulations of the opioid epidemic help evaluate intervention strategies and overdose outcomes in heterogeneous communities and provide estimates of localized treatment effects, which support the design of locally-tailored policies for precision public health. However, it is prohibitively costly to run simulations of all treatment conditions in all communities because the number of possible treatments grows exponentially with the number of interventions and levels at which they are applied. To address this need efficiently, we develop a metamodel framework, whereby treatment outcomes are modeled using a response function whose coefficients are learned through Gaussian process regression (GPR) on locally-contextualized covariates. We apply this framework to efficiently estimate treatment effects on overdose deaths in Pennsylvania counties. In contrast to classical designs such as fractional factorial design or Latin hypercube sampling, our approach leverages spatial correlations and posterior uncertainty to sequentially sample the most informative counties and treatment conditions. Using a calibrated agent-based opioid epidemic model, informed by county-level overdose mortality and baseline dispensing rate data for different treatments, we obtained county-level estimates of treatment effects on overdose deaths per 100,000 population for all treatment conditions in Pennsylvania, achieving approximately 5% average relative error using one-tenth the number of simulation runs required for exhaustive evaluation. Our bi-level framework provides a computationally efficient approach to decision support for policy makers, enabling rapid evaluation of alternative resource-allocation strategies to mitigate the opioid epidemic in local communities. The same analytical framework can be applied to guide precision public health interventions in other epidemic settings.
comment: repository link: https://github.com/abdulrahmanfci/gpr-metamodel/
ReCCur: A Recursive Corner-Case Curation Framework for Robust Vision-Language Understanding in Open and Edge Scenarios
Corner cases are rare or extreme scenarios that drive real-world failures, but they are difficult to curate at scale: web data are noisy, labels are brittle, and edge deployments preclude large retraining. We present ReCCur (Recursive Corner-Case Curation), a low-compute framework that converts noisy web imagery into auditable fine-grained labels via a multi-agent recursive pipeline. First, large-scale data acquisition and filtering expands a domain vocabulary with a vision-language model (VLM), crawls the web, and enforces tri-modal (image, description, keyword) consistency with light human spot checks to yield refined candidates. Next, mixture-of-experts knowledge distillation uses complementary encoders (e.g., CLIP, DINOv2, BEiT) for kNN voting with dual-confidence activation and uncertainty sampling, converging to a high-precision set. Finally, region-evidence VLM adversarial labeling pairs a proposer (multi-granularity regions and semantic cues) with a validator (global and local chained consistency) to produce explainable labels and close the loop. On realistic corner-case scenarios (e.g., flooded-car inspection), ReCCur runs on consumer-grade GPUs, steadily improves purity and separability, and requires minimal human supervision, providing a practical substrate for downstream training and evaluation under resource constraints. Code and dataset will be released.
MixTTE: Multi-Level Mixture-of-Experts for Scalable and Adaptive Travel Time Estimation KDD 2026
Accurate Travel Time Estimation (TTE) is critical for ride-hailing platforms, where errors directly impact user experience and operational efficiency. While existing production systems excel at holistic route-level dependency modeling, they struggle to capture city-scale traffic dynamics and long-tail scenarios, leading to unreliable predictions in large urban networks. In this paper, we propose \model, a scalable and adaptive framework that synergistically integrates link-level modeling with industrial route-level TTE systems. Specifically, we propose a spatio-temporal external attention module to capture global traffic dynamic dependencies across million-scale road networks efficiently. Moreover, we construct a stabilized graph mixture-of-experts network to handle heterogeneous traffic patterns while maintaining inference efficiency. Furthermore, an asynchronous incremental learning strategy is tailored to enable real-time and stable adaptation to dynamic traffic distribution shifts. Experiments on real-world datasets validate MixTTE significantly reduces prediction errors compared to seven baselines. MixTTE has been deployed in DiDi, substantially improving the accuracy and stability of the TTE service.
comment: Accepted to KDD 2026
Image, Word and Thought: A More Challenging Language Task for the Iterated Learning Model
The iterated learning model simulates the transmission of language from generation to generation in order to explore how the constraints imposed by language transmission facilitate the emergence of language structure. Despite each modelled language learner starting from a blank slate, the presence of a bottleneck limiting the number of utterances to which the learner is exposed can lead to the emergence of language that lacks ambiguity, is governed by grammatical rules, and is consistent over successive generations, that is, one that is expressive, compositional and stable. The recent introduction of a more computationally tractable and ecologically valid semi supervised iterated learning model, combining supervised and unsupervised learning within an autoencoder architecture, has enabled exploration of language transmission dynamics for much larger meaning-signal spaces. Here, for the first time, the model has been successfully applied to a language learning task involving the communication of much more complex meanings: seven-segment display images. Agents in this model are able to learn and transmit a language that is expressive: distinct codes are employed for all 128 glyphs; compositional: signal components consistently map to meaning components, and stable: the language does not change from generation to generation.
comment: This is an extended version of a paper accepted for EvoLang2026, it includes additional details of the numerical experiments
EvoRoute: Experience-Driven Self-Routing LLM Agent Systems
Complex agentic AI systems, powered by a coordinated ensemble of Large Language Models (LLMs), tool and memory modules, have demonstrated remarkable capabilities on intricate, multi-turn tasks. However, this success is shadowed by prohibitive economic costs and severe latency, exposing a critical, yet underexplored, trade-off. We formalize this challenge as the \textbf{Agent System Trilemma}: the inherent tension among achieving state-of-the-art performance, minimizing monetary cost, and ensuring rapid task completion. To dismantle this trilemma, we introduce EvoRoute, a self-evolving model routing paradigm that transcends static, pre-defined model assignments. Leveraging an ever-expanding knowledge base of prior experience, EvoRoute dynamically selects Pareto-optimal LLM backbones at each step, balancing accuracy, efficiency, and resource use, while continually refining its own selection policy through environment feedback. Experiments on challenging agentic benchmarks such as GAIA and BrowseComp+ demonstrate that EvoRoute, when integrated into off-the-shelf agentic systems, not only sustains or enhances system performance but also reduces execution cost by up to $80\%$ and latency by over $70\%$.
Sensor to Pixels: Decentralized Swarm Gathering via Image-Based Reinforcement Learning
This study highlights the potential of image-based reinforcement learning methods for addressing swarm-related tasks. In multi-agent reinforcement learning, effective policy learning depends on how agents sense, interpret, and process inputs. Traditional approaches often rely on handcrafted feature extraction or raw vector-based representations, which limit the scalability and efficiency of learned policies concerning input order and size. In this work we propose an image-based reinforcement learning method for decentralized control of a multi-agent system, where observations are encoded as structured visual inputs that can be processed by Neural Networks, extracting its spatial features and producing novel decentralized motion control rules. We evaluate our approach on a multi-agent convergence task of agents with limited-range and bearing-only sensing that aim to keep the swarm cohesive during the aggregation. The algorithm's performance is evaluated against two benchmarks: an analytical solution proposed by Bellaiche and Bruckstein, which ensures convergence but progresses slowly, and VariAntNet, a neural network-based framework that converges much faster but shows medium success rates in hard constellations. Our method achieves high convergence, with a pace nearly matching that of VariAntNet. In some scenarios, it serves as the only practical alternative.
LLM-Enabled Multi-Agent Systems: Empirical Evaluation and Insights into Emerging Design Patterns & Paradigms
This paper formalises the literature on emerging design patterns and paradigms for Large Language Model (LLM)-enabled multi-agent systems (MAS), evaluating their practical utility across various domains. We define key architectural components, including agent orchestration, communication mechanisms, and control-flow strategies, and demonstrate how these enable rapid development of modular, domain-adaptive solutions. Three real-world case studies are tested in controlled, containerised pilots in telecommunications security, national heritage asset management, and utilities customer service automation. Initial empirical results show that, for these case studies, prototypes were delivered within two weeks and pilot-ready solutions within one month, suggesting reduced development overhead compared to conventional approaches and improved user accessibility. However, findings also reinforce limitations documented in the literature, including variability in LLM behaviour that leads to challenges in transitioning from prototype to production maturity. We conclude by outlining critical research directions for improving reliability, scalability, and governance in MAS architectures and the further work needed to mature MAS design patterns to mitigate the inherent challenges.
PC2P: Multi-Agent Path Finding via Personalized-Enhanced Communication and Crowd Perception IROS 2025
Distributed Multi-Agent Path Finding (MAPF) integrated with Multi-Agent Reinforcement Learning (MARL) has emerged as a prominent research focus, enabling real-time cooperative decision-making in partially observable environments through inter-agent communication. However, due to insufficient collaborative and perceptual capabilities, existing methods are inadequate for scaling across diverse environmental conditions. To address these challenges, we propose PC2P, a novel distributed MAPF method derived from a Q-learning-based MARL framework. Initially, we introduce a personalized-enhanced communication mechanism based on dynamic graph topology, which ascertains the core aspects of ``who" and ``what" in interactive process through three-stage operations: selection, generation, and aggregation. Concurrently, we incorporate local crowd perception to enrich agents' heuristic observation, thereby strengthening the model's guidance for effective actions via the integration of static spatial constraints and dynamic occupancy changes. To resolve extreme deadlock issues, we propose a region-based deadlock-breaking strategy that leverages expert guidance to implement efficient coordination within confined areas. Experimental results demonstrate that PC2P achieves superior performance compared to state-of-the-art distributed MAPF methods in varied environments. Ablation studies further confirm the effectiveness of each module for overall performance.
comment: 8 pages,7 figures,3 tables,Accepted to IROS 2025
AI Agents as Policymakers in Simulated Epidemics
AI agents are increasingly deployed as quasi-autonomous systems for specialized tasks, yet their potential as computational models of decision-making remains underexplored. We develop a generative AI agent to study repetitive policy decisions during an epidemic, embedding the agent, prompted to act as a city mayor, within a simulated SEIR environment. Each week, the agent receives updated epidemiological information, evaluates the evolving situation, and sets business restriction levels. The agent is equipped with a dynamic memory that weights past events by recency and is evaluated in both single- and ensemble-agent settings across environments of varying complexity. Across scenarios, the agent exhibits human-like reactive behavior, tightening restrictions in response to rising cases and relaxing them as risk declines. Crucially, providing the agent with brief systems-level knowledge of epidemic dynamics, highlighting feedbacks between disease spread and behavioral responses, substantially improves decision quality and stability. The results illustrate how theory-informed prompting can shape emergent policy behavior in AI agents. These findings demonstrate that generative AI agents, when situated in structured environments and guided by minimal domain theory, can serve as powerful computational models for studying decision-making and policy design in complex social systems.
comment: 24 pages, 5 figures
AgentArch: A Comprehensive Benchmark to Evaluate Agent Architectures in Enterprise
While individual components of agentic architectures have been studied in isolation, there remains limited empirical understanding of how different design dimensions interact within complex multi-agent systems. This study aims to address these gaps by providing a comprehensive enterprise-specific benchmark evaluating 18 distinct agentic configurations across state-of-the-art large language models. We examine four critical agentic system dimensions: orchestration strategy, agent prompt implementation (ReAct versus function calling), memory architecture, and thinking tool integration. Our benchmark reveals significant model-specific architectural preferences that challenge the prevalent one-size-fits-all paradigm in agentic AI systems. It also reveals significant weaknesses in overall agentic performance on enterprise tasks with the highest scoring models achieving a maximum of only 35.3\% success on the more complex task and 70.8\% on the simpler task. We hope these findings inform the design of future agentic systems by enabling more empirically backed decisions regarding architectural components and model selection.
It's Not All Black and White: Degree of Truthfulness for Risk-Avoiding Agents
The classic notion of \emph{truthfulness} requires that no agent has a profitable manipulation -- an untruthful report that, for \emph{some} combination of reports of the other agents, increases her utility. This strong notion implicitly assumes that the manipulating agent either knows what all other agents are going to report, or is willing to take the risk and act as-if she knows their reports. Without knowledge of the others' reports, most manipulations are \emph{risky} -- they might decrease the manipulator's utility for some other combinations of reports by the other agents. Accordingly, a recent paper (Bu, Song and Tao, ``On the existence of truthful fair cake cutting mechanisms'', Artificial Intelligence 319 (2023), 103904) suggests a relaxed notion, which we refer to as \emph{risk-avoiding truthfulness (RAT)}, which requires only that no agent can gain from a \emph{safe} manipulation -- one that is sometimes beneficial and never harmful. Truthfulness and RAT are two extremes: the former considers manipulators with complete knowledge of others, whereas the latter considers manipulators with no knowledge at all. In reality, agents often know about some -- but not all -- of the other agents. This paper introduces the \emph{RAT-degree} of a mechanism, defined as the smallest number of agents whose reports, if known, may allow another agent to safely manipulate, or $n$ if there is no such number. This notion interpolates between classic truthfulness (degree $n$) and RAT (degree at least $1$): a mechanism with a higher RAT-degree is harder to manipulate safely. To illustrate the generality and applicability of this concept, we analyze the RAT-degree of prominent mechanisms across various social choice settings, including auctions, indivisible goods allocations, cake-cutting, voting, and two-sided matching.
comment: Accepted to EC 2025
Scene-Aware Vectorized Memory Multi-Agent Framework with Cross-Modal Differentiated Quantization VLMs for Visually Impaired Assistance
Visually impaired individuals face significant challenges in environmental perception. Traditional assistive technologies often lack adaptive intelligence, focusing on individual components rather than integrated systems. While Vision-Language Models (VLMs) offer a promising path to richer, integrated understanding, their deployment is severely limited by substantial computational requirements, demanding dozens of gigabytes of memory. To address these gaps in computational efficiency and integrated design, this study proposes a dual technological innovation framework: a cross-modal differentiated quantization framework for VLMs and a scene-aware vectorized memory multi-agent system. The quantization framework implements differentiated strategies, reducing memory from 38GB to 11.3GB. The multi-agent system uses vectorized memory and perception-memory-reasoning workflows to provide environmental information beyond the current view, achieving 2.83-3.52s latency to initial speech output. Experiments show the quantized 19B-parameter model only experiences a 2.05% performance drop on MMBench and maintains 63.7 accuracy on OCR-VQA (original: 64.9), outperforming smaller models with equivalent memory. This research advances computational efficiency and assistive technology, offering comprehensive assistance in scene perception, text recognition, and navigation.
comment: 28 pages,9 figures
Neural Power-Optimal Magnetorquer Solution for Multi-Agent Formation and Attitude Control
This paper presents a learning-based current calculation model to achieve power-optimal magnetic-field interaction for multi-agent formation and attitude control. In aerospace engineering, electromagnetic coils are referred to as magnetorquer (MTQ) coils and used as satellite attitude actuators in Earth's orbit and for long-term formation and attitude control. This study derives a unique, continuous, and power-optimal current solution via sequential convex programming and approximates it using a multilayer perceptron model. The effectiveness of our strategy was demonstrated through numerical simulations and experimental trials on the formation and attitude control.
comment: Submitted to IEEE Robotics and Automation Letters
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents
Agentic reinforcement learning (RL) holds great promise for the development of autonomous agents under complex GUI tasks, but its scalability remains severely hampered by the verification of task completion. Existing task verification is treated as a passive, post-hoc process: a verifier (i.e., rule-based scoring script, reward or critic model, and LLM-as-a-Judge) analyzes the agent's entire interaction trajectory to determine if the agent succeeds. Such processing of verbose context that contains irrelevant, noisy history poses challenges to the verification protocols and therefore leads to prohibitive cost and low reliability. To overcome this bottleneck, we propose SmartSnap, a paradigm shift from this passive, post-hoc verification to proactive, in-situ self-verification by the agent itself. We introduce the Self-Verifying Agent, a new type of agent designed with dual missions: to not only complete a task but also to prove its accomplishment with curated snapshot evidences. Guided by our proposed 3C Principles (Completeness, Conciseness, and Creativity), the agent leverages its accessibility to the online environment to perform self-verification on a minimal, decisive set of snapshots. Such evidences are provided as the sole materials for a general LLM-as-a-Judge verifier to determine their validity and relevance. Experiments on mobile tasks across model families and scales demonstrate that our SmartSnap paradigm allows training LLM-driven agents in a scalable manner, bringing performance gains up to 26.08% and 16.66% respectively to 8B and 30B models. The synergizing between solution finding and evidence seeking facilitates the cultivation of efficient, self-verifying agents with competitive performance against DeepSeek V3.1 and Qwen3-235B-A22B. Code is available at: https://github.com/TencentYoutuResearch/SmartSnap
Systems and Control (EESS)
Nonlinear Spectral Modeling and Control of Soft-Robotic Muscles from Data
Artificial muscles are essential for compliant musculoskeletal robotics but complicate control due to nonlinear multiphysics dynamics. Hydraulically amplified electrostatic (HASEL) actuators, a class of soft artificial muscles, offer high performance but exhibit memory effects and hysteresis. Here we present a data-driven reduction and control strategy grounded in spectral submanifold (SSM) theory. In the adiabatic regime, where inputs vary slowly relative to intrinsic transients, trajectories rapidly converge to a low-dimensional slow manifold. We learn an explicit input-to-output map on this manifold from forced-response trajectories alone, avoiding decay experiments that can trigger hysteresis. We deploy the SSM-based model for real-time control of an antagonistic HASEL-clutch joint. This approach yields a substantial reduction in tracking error compared to feedback-only and feedforward-only baselines under identical settings. This record-and-control workflow enables rapid characterization and high-performance control of soft muscles and muscle-driven joints without detailed physics-based modeling.
Conditioning Aircraft Trajectory Prediction on Meteorological Data with a Physics-Informed Machine Learning Approach
Accurate aircraft trajectory prediction (TP) in air traffic management systems is confounded by a number of epistemic uncertainties, dominated by uncertain meteorological conditions and operator specific procedures. Handling this uncertainty necessitates the use of probabilistic, machine learned models for generating trajectories. However, the trustworthiness of such models is limited if generated trajectories are not physically plausible. For this reason we propose a physics-informed approach in which aircraft thrust and airspeed are learned from data and are used to condition the existing Base of Aircraft Data (BADA) model, which is physics-based and enforces energy-based constraints on generated trajectories. A set of informative features are identified and used to condition a probabilistic model of aircraft thrust and airspeed, with the proposed scheme demonstrating a 20% improvement in skilfulness across a set of six metrics, compared against a baseline probabilistic model that ignores contextual information such as meteorological conditions.
comment: Accepted to AIAA SciTech 2026 Forum
Time-Varying Kinematics Control for Magnetically-Actuated Satellite Swarm without Additional Actuator
Electromagnetic Formation Flight is a technology that uses electromagnetic forces and torques to control multiple satellites without conventional fuel-based propulsion. In this paper, the controllability of the system is discussed based on the conservation of the entire system's angular momentum, which constitutes a nonholonomic constraint. This paper designs a new controller for multiple satellites without an additional attitude actuator.
comment: This manuscript is a revised author's preprint based on a paper originally presented at the Workshop on JAXA, "Astrodynamics and Flight Mechanics,'' Sagamihara, Japan, C-6, 2020, and has not been peer-reviewed
Finite Memory Belief Approximation for Optimal Control in Partially Observable Markov Decision Processes
We study finite memory belief approximation for partially observable (PO) stochastic optimal control (SOC) problems. While belief states are sufficient for SOC in partially observable Markov decision processes (POMDPs), they are generally infinite-dimensional and impractical. We interpret truncated input-output (IO) histories as inducing a belief approximation and develop a metric-based theory that directly relates information loss to control performance. Using the Wasserstein metric, we derive policy-conditional performance bounds that quantify value degradation induced by finite memory along typical closed-loop trajectories. Our analysis proceeds via a fixed-policy comparison: we evaluate two cost functionals under the same closed-loop execution and isolate the effect of replacing the true belief by its finite memory approximation inside the belief-level cost. For linear quadratic Gaussian (LQG) systems, we provide closed-form belief mismatch evaluation and empirically validate the predicted mechanism, demonstrating that belief mismatch decays approximately exponentially with memory length and that the induced performance mismatch scales accordingly. Together, these results provide a metric-aware characterization of what finite memory belief approximation can and cannot achieve in PO settings.
comment: 6 pages, 3 figures
Dual-quaternion learning control for autonomous vehicle trajectory tracking with safety guarantees
We propose a learning-based trajectory tracking controller for autonomous robotic platforms whose motion can be described kinematically on $\mathrm{SE}(3)$. The controller is formulated in the dual quaternion framework and operates at the velocity level, assuming direct command of angular and linear velocities, as is standard in many aerial vehicles and omnidirectional mobile robots. Gaussian Process (GP) regression is integrated into a geometric feedback law to learn and compensate online for unknown, state-dependent disturbances and modeling imperfections affecting both attitude and position, while preserving the algebraic structure and coupling properties inherent to rigid-body motion. The proposed approach does not rely on explicit parametric models of the unknown effects, making it well-suited for robotic systems subject to sensor-induced disturbances, unmodeled actuation couplings, and environmental uncertainties. A Lyapunov-based analysis establishes probabilistic ultimate boundedness of the pose tracking error under bounded GP uncertainty, providing formal stability guarantees for the learning-based controller. Simulation results demonstrate accurate and smooth trajectory tracking in the presence of realistic, localized disturbances, including correlated rotational and translational effects arising from magnetometer perturbations. These results illustrate the potential of combining geometric modeling and probabilistic learning to achieve robust, data-efficient pose control for autonomous robotic systems.
From inconsistency to decision: explainable operation and maintenance of battery energy storage systems
Battery Energy Storage Systems (BESSs) are increasingly critical to power-system stability, yet their operation and maintenance remain dominated by reactive, expert-dependent diagnostics. While cell-level inconsistencies provide early warning signals of degradation and safety risks, the lack of scalable and interpretable decision-support frameworks prevents these signals from being effectively translated into operational actions. Here we introduce an inconsistency-driven operation and maintenance paradigm for large-scale BESSs that systematically transforms routine monitoring data into explainable, decision-oriented guidance. The proposed framework integrates multi-dimensional inconsistency evaluation with large language model-based semantic reasoning to bridge the gap between quantitative diagnostics and practical maintenance decisions. Using eight months of field data from an in-service battery system comprising 3,564 cells, we demonstrate how electrical, thermal, and aging-related inconsistencies can be distilled into structured operational records and converted into actionable maintenance insights through a multi-agent framework. The proposed approach enables accurate and explainable responses to real-world operation and maintenance queries, reducing response time and operational cost by over 80% compared with conventional expert-driven practices. These results establish a scalable pathway for intelligent operation and maintenance of battery energy storage systems, with direct implications for reliability, safety, and cost-effective integration of energy storage into modern power systems.
comment: 13 pages, 5 figures
Closed-Loop Transmission Power Control for Reliable and Low-Power BLE Communication in Dynamic IoT Settings
Reliable and energy-efficient Bluetooth Low Energy (BLE) communication is crucial for Internet of Things (IoT) applications in dynamic environments. However, the Received Signal Strength Indicator (RSSI) and data throughput in BLE are highly susceptible to environmental variability, which degrades communication performance. In this work, we systematically analyze the interdependence among RSSI, throughput, transmission power (TXP), and the peripheral device system power consumption under diverse real-world conditions. We observe that adjusting the TXP effectively influences both RSSI and throughput. We propose a robust closed-loop TXP control framework based on Proportional-Integral-Derivative (PID) controllers. Two initial control strategies are investigated: an RSSI-based approach and a throughput-based approach, each exhibiting distinct advantages and limitations. The RSSI-based method provides rapid responsiveness to signal fluctuations but lacks direct correlation with data throughput, whereas the throughput-based method offers more accurate feedback on effective throughput at the cost of slower response. To address these limitations, a hybrid RSSI-throughput control strategy is developed, combining the responsiveness of RSSI feedback with the accuracy of throughput measurements. Experimental results demonstrate that the proposed hybrid approach maintains data throughput close to the target level with minimal variance, even under rapidly changing environmental conditions.
comment: 13 pages, 18 figures, published in IEEE Internet of Things Journal
Post-Earthquake Restoration of Electricity-Gas Distribution Systems with Damage Information Collection and Repair Vehicle Routing
Extreme events such as earthquakes pose significant threats to integrated electricity-gas distribution systems (IEGDS) by causing widespread damage. Existing restoration approaches typically assume full awareness of damage, which may not be true if monitoring and communication infrastructures are impaired. In such circumstances, field inspection is necessary. This paper presents a novel adaptive restoration framework for IEGDS, considering dynamic damage assessment and repair. The restoration problem is formulated as a partially observable Markov decision process (POMDP), capturing the gradually revealed contingency and the evolving impact of field crew actions. To address the computational challenges of POMDPs in real-time applications, an advanced belief tree search (BTS) algorithm is introduced. This algorithm enables crew members to continuously update their actions based on evolving belief states, leveraging comprehensive simulations to evaluate potential future trajectories and identify optimal inspection and repair strategies. Based on the BTS algorithm, a unified real-time decision-making framework is developed for IEGDS restoration. Case studies on two distinct IEGDS systems demonstrate the effectiveness and scalability of the proposed method. The results indicate that the proposed approach achieves an outage cost comparable to the ideal solution, and reduces the total outage cost by more than 15% compared to strategies based on stochastic programming and heuristic methods.
comment: Accepted for publication in CSEE Journal of Power and Energy Systems
Site-Specific and Frequency-Dependent Channel Characterization and MIMO Performance in FR3
Next-generation wireless systems aim to enable on-demand connectivity through dynamic spectrum utilization. Motivated by this vision, this paper investigates the propagation characteristics and MIMO performance of the upper mid-band, spanning approximately 7-24 GHz and unofficially referred to as FR3. Using site-specific ray-tracing (RT) simulations based on the Sionna framework, we analyze indoor and outdoor environments at representative frequencies across FR1, FR3, and FR2, including 3.5, 7, 10, 14, 20, 24, and 28 GHz, under both single-antenna and multi-antenna configurations. The results show that FR3 exhibits intermediate propagation behavior between sub-6 GHz and millimeter-wave bands while sustaining effective spatial multiplexing and favorable spectral efficiency. Furthermore, large-array analysis indicates that performance gains in FR3 are closely tied to antenna scaling, highlighting the importance of large-size or large-aperture MIMO architectures for practical deployments.
comment: Submitted to the IEEE conference for potential publication
A Mathematical Formalization of Self-Determining Agency
Defining agency is an extremely important challenge for cognitive science and artificial intelligence. Physics generally describes mechanical happenings, but there remains an unbridgeable gap between them and the acts of agents. To discuss the morality and responsibility of agents, it is necessary to model acts; whether such responsible acts can be fully explained by physical determinism has been debated. Although we have already proposed a physical "agent determinism" model that appears to go beyond mere mechanical happenings, we have not yet established a strict mathematical formalism to eliminate ambiguity. Here, we explain why a physical system can follow coarse-graining agent-level determination without violating physical laws by formulating supervenient causation. Generally, supervenience including coarse graining does not change without a change in its lower base; therefore, a single supervenience alone cannot define supervenient causation. We define supervenient causation as the causal efficacy from the supervenience level to its lower base level. Although an algebraic expression composed of the multiple supervenient functions does supervenes on the base, a sequence of indices that determines the algebraic expression does not supervene on the base. Therefore, the sequence can possess unique dynamical laws that are independent of the lower base level. This independent dynamics creates the possibility for temporally preceding changes at the supervenience level to cause changes at the lower base level. Such a dual-laws system is considered useful for modeling self-determining agents such as humans.
Distributionally Robust Game for Proof-of-Work Blockchain Mining Under Resource Uncertainties
Blockchain plays a crucial role in ensuring the security and integrity of decentralized systems, with the proof-of-work (PoW) mechanism being fundamental for achieving distributed consensus. As PoW blockchains see broader adoption, an increasingly diverse set of miners with varying computing capabilities participate in the network. In this paper, we consider the PoW blockchain mining, where the miners are associated with resource uncertainties. To characterize the uncertainty computing resources at different mining participants, we establish an ambiguous set representing uncertainty of resource distributions. Then, the networked mining is formulated as a non-cooperative game, where distributionally robust performance is calculated for each individual miner to tackle the resource uncertainties. We prove the existence of the equilibrium of the distributionally robust mining game. To derive the equilibrium, we propose the conditional value-at-risk (CVaR)-based reinterpretation of the best response of each miner. We then solve the individual strategy with alternating optimization, which facilitates the iteration among miners towards the game equilibrium. Furthermore, we consider the case that the ambiguity of resource distribution reduces to Gaussian distribution and the case that another uncertainties vanish, and then characterize the properties of the equilibrium therein along with a distributed algorithm to achieve the equilibrium. Simulation results show that the proposed approaches effectively converge to the equilibrium, and effectively tackle the uncertainties in blockchain mining to achieve a robust performance guarantee.
comment: Accepted @ IEEE TIFS
Hierarchical Preemptive Holistic Collaborative Systems for Embodied Multi-Agent Systems: Framework, Hybrid Stability, and Scalability Analysis
The coordination of Embodied Multi-Agent Systems in constrained physical environments requires a rigorous balance between safety, scalability, and efficiency. Traditional decentralized approaches, e.g., reactive collision avoidance, are prone to local minima or reciprocal yielding standoffs due to the lack of future intent awareness. In contrast, centralized planning suffers from intractable computational complexity and single-point-of-failure vulnerabilities. To address these limitations, we propose the Hierarchical Preemptive Holistic Collaborative (Prollect) framework, which generalizes the Preemptive Holistic Collaborative System (PHCS) by decomposing the global coordination problem into topologically connected subspace optimizations. We formalize the system as a Hybrid Automaton and introduce a three-stage receding horizon mechanism (frozen execution, preliminary planning, proactive look-ahead windows) with explicit padding to prevent races between coordination dissemination and intent updates. Notably, we design a robust timing protocol with a mandatory Idle Buffer that acts as a dwell-time constraint to eliminate Zeno behaviors and ensure computational stability under jitter. Furthermore, we formalize a Shadow Agent protocol to guarantee seamless trajectory consistency across subspace boundaries, which we treat as an Input-to-State Stability (ISS) problem.
Scaling Laws of Machine Learning for Optimal Power Flow
Optimal power flow (OPF) is one of the fundamental tasks for power system operations. While machine learning (ML) approaches such as deep neural networks (DNNs) have been widely studied to enhance OPF solution speed and performance, their practical deployment faces two critical scaling questions: What is the minimum training data volume required for reliable results? How should ML models' complexity balance accuracy with real-time computational limits? Existing studies evaluate discrete scenarios without quantifying these scaling relationships, leading to trial-and-error-based ML development in real-world applications. This work presents the first systematic scaling study for ML-based OPF across two dimensions: data scale (0.1K-40K training samples) and compute scale (multiple NN architectures with varying FLOPs). Our results reveal consistent power-law relationships on both DNNs and physics-informed NNs (PINNs) between each resource dimension and three core performance metrics: prediction error (MAE), constraint violations and speed. We find that for ACOPF, the accuracy metric scales with dataset size and training compute. These scaling laws enable predictable and principled ML pipeline design for OPF. We further identify the divergence between prediction accuracy and constraint feasibility and characterize the compute-optimal frontier. This work provides quantitative guidance for ML-OPF design and deployments.
comment: 5 pages
Topology-Aware Spatio-Temporal Graph Transformer for Predicting Smart Grid Failures
Smart grid infrastructure needs improved resilience and preventive maintenance through more accurate predictions. Current methodologies lack accurate representation of spatio-temporal-causal interdependencies and class imbalance in failure prediction tasks. This study introduces a Topology-Aware Spatio-Temporal Graph Transformer (ST-GT) architecture that overcomes existing limitations by using three main innovations: (1) directly incorporating physical transmission network topology into the transformer attention mechanism to identify spatial failure propagation patterns; (2) unified processing of static topological descriptors and (3) temporal Phasor Measurement Units (PMU) sequences in an end-to-end framework. The ST-GT model exhibited outstanding performance in five-fold cross-validation across 10 substations, attaining perfect recall (1.000 $\pm$ 0.001) and an F1-score of 0.858 $\pm$ 0.009, markedly surpassing XGBoost baselines (0.683 accuracy/F1). Perfect recall guarantees that no critical failures are overlooked, which is essential for grid safety; however, it may lead to an increase in false alarm rates. This framework integrates temporal dynamics modeling with spatial graph awareness for critical infrastructure monitoring. It offers interpretable insights into failure propagation pathways and enhances maintenance strategies. Future research focuses on developing cost-weighted loss functions for precision-recall trade-off enhancement, implementing real-time monitoring systems with uncertainty quantification, and creating cost-sensitive frameworks balancing false alarm expenditures with failure consequences. The methodology success suggests its potential for wider application in critical infrastructure areas requiring spatial temporal failure prediction.
comment: 6 pages, 4 figures, RAMS 2026, conference
Developing a Quantitative Resiliency Approach
Resiliency has garnered attention in the management of critical infrastructure as a metric of system performance, but there are significant roadblocks to its implementation in a realistic decision-making framework. Contrasted to risk and reliability, which have robust quantification approaches and undergird many regulatory approaches to system safety (e.g., "risk-informed decision-making"), resiliency is a diffuse, qualitatively-understood characteristic, often treated differently or distinctly. However, in the emerging context of highly-complex, highly-interdependent critical systems, the idea of reliability (as the probability of non-failure) may not be an appropriate metric of system health. As a result, focus is shifting towards resiliency-centered approaches that value the response to failure as much as the avoidance of failure. Supporting this approach requires a robustly-defined, quantitative understanding of resiliency. In this paper, we explore the foundations of reliability and resiliency engineering, and propose an approach to resiliency-informed decision-making bolstered by a quantitative understanding of resiliency.
comment: 18 pages
Policy Synthesis for Interval MDPs via Polyhedral Lyapunov Functions
Decision-making under uncertainty is central to many safety-critical applications, where decisions must be guided by probabilistic modeling formalisms. This paper introduces a novel approach to policy synthesis in multi-objective interval Markov decision processes using polyhedral Lyapunov functions. Unlike previous Lyapunov-based methods that mainly rely on quadratic functions, our method utilizes polyhedral functions to enhance accuracy in managing uncertainties within value iteration of dynamic programming. We reformulate the value iteration algorithm as a switched affine system with interval uncertainties and apply control-theoretic stability principles to synthesize policies that guide the system toward a desired target set. By constructing an invariant set of attraction, we ensure that the synthesized policies provide convergence guarantees while minimizing the impact of transition uncertainty in the underlying model. Our methodology removes the need for computationally intensive Pareto curve computations by directly determining a policy that brings objectives within a specified range of their target values. We validate our approach through numerical case studies, including a recycling robot and an electric vehicle battery, demonstrating its effectiveness in achieving policy synthesis under uncertainty.
Provable Acceleration of Distributed Optimization with Local Updates
In conventional distributed optimization, each agent performs a single local update between two communication rounds with its neighbors to synchronize solutions. Inspired by the success of using multiple local updates in federated learning, incorporating local updates into distributed optimization has recently attracted increasing attention. However, unlike federated learning, where multiple local updates can accelerate learning by improving gradient estimation under mini-batch settings, it remains unclear whether similar benefits hold in distributed optimization when gradients are exact. Moreover, existing theoretical results typically require reducing the step size when multiple local updates are employed, which can entirely offset any potential benefit of these additional local updates and obscure their true impact on convergence. In this paper, we focus on the classic DIGing algorithm and leverage the tight performance bounds provided by Performance Estimation Problems (PEP) to show that incorporating local updates can indeed accelerate distributed optimization. To the best of our knowledge, this is the first rigorous demonstration of such acceleration for a broad class of objective functions. Our analysis further reveals that, under an appropriate step size, performing only two local updates is sufficient to achieve the maximal possible improvement, and that additional local updates provide no further gains. Because more updates increase computational cost, these findings offer practical guidance for efficient implementation. Extensive experiments on both synthetic and real-world datasets corroborate the theoretical findings.
Sensor to Pixels: Decentralized Swarm Gathering via Image-Based Reinforcement Learning
This study highlights the potential of image-based reinforcement learning methods for addressing swarm-related tasks. In multi-agent reinforcement learning, effective policy learning depends on how agents sense, interpret, and process inputs. Traditional approaches often rely on handcrafted feature extraction or raw vector-based representations, which limit the scalability and efficiency of learned policies concerning input order and size. In this work we propose an image-based reinforcement learning method for decentralized control of a multi-agent system, where observations are encoded as structured visual inputs that can be processed by Neural Networks, extracting its spatial features and producing novel decentralized motion control rules. We evaluate our approach on a multi-agent convergence task of agents with limited-range and bearing-only sensing that aim to keep the swarm cohesive during the aggregation. The algorithm's performance is evaluated against two benchmarks: an analytical solution proposed by Bellaiche and Bruckstein, which ensures convergence but progresses slowly, and VariAntNet, a neural network-based framework that converges much faster but shows medium success rates in hard constellations. Our method achieves high convergence, with a pace nearly matching that of VariAntNet. In some scenarios, it serves as the only practical alternative.
Modeling and Control for UAV with Off-center Slung Load
Unmanned aerial vehicle (UAV) with slung load system is a classic air transportation system. In practical applications, the suspension point of the slung load does not always align with the center of mass (CoM) of the UAV due to mission requirements or mechanical interference. This offset creates coupling in the system's nonlinear dynamics which leads to a complicated motion control problem. In existing research, modeling of the system are performed about the UAV's CoM. In this work we use the point of suspension instead. Based on the new model, a cascade control strategy is developed. In the middle-loop controller, the acceleration of the suspension point is used to regulate the swing angle of the slung load without the need for considering the coupling between the slung load and the UAV. Using the off-center reference frame, an inner-loop controller is designed to track the UAV's attitude without the need of simplification on the coupling effects. We prove local exponential stability of the closed-loop using Lyapunov approach. Finally, simulations and experiments are conducted to validate the proposed control system.
Revisiting Continuous-Time Trajectory Estimation via Gaussian Processes and the Magnus Expansion
Continuous-time state estimation has been shown to be an effective means of (i) handling asynchronous and high-rate measurements, (ii) introducing smoothness to the estimate, (iii) post hoc querying the estimate at times other than those of the measurements, and (iv) addressing certain observability issues related to scanning-while-moving sensors. A popular means of representing the trajectory in continuous time is via a Gaussian process (GP) prior, with the prior's mean and covariance functions generated by a linear time-varying (LTV) stochastic differential equation (SDE) driven by white noise. When the state comprises elements of Lie groups, previous works have resorted to a patchwork of local GPs each with a linear time-invariant SDE kernel, which while effective in practice, lacks theoretical elegance. Here we revisit the full LTV GP approach to continuous-time trajectory estimation, deriving a global GP prior on Lie groups via the Magnus expansion, which offers a more elegant and general solution. We provide a numerical comparison between the two approaches and discuss their relative merits.
comment: 21 pages, 12 figures
Orthogonal-by-construction augmentation of physics-based input-output models
This paper proposes a novel orthogonal-by-construction parametrization for augmenting physics-based input-output models with a learning component in an additive sense. The parametrization allows to jointly optimize the parameters of the physics-based model and the learning component. Unlike the commonly applied additive (parallel) augmentation structure, the proposed formulation eliminates overlap in representation of the system dynamics, thereby preserving the uniqueness of the estimated physical parameters, ultimately leading to enhanced model interpretability. By theoretical analysis, we show that, under mild conditions, the method is statistically consistent and guarantees recovery of the true physical parameters. With further analysis regarding the asymptotic covariance matrix of the identified parameters, we also prove that the proposed structure provides a clear separation between the physics-based and learning components of the augmentation structure. The effectiveness of the proposed approach is demonstrated through simulation studies, showing accurate reproduction of the data-generating dynamics without sacrificing consistent estimation of the physical parameters.
comment: Submitted for publication
AdGT: Decentralized Gradient Tracking with Tuning-free Per-Agent Stepsize
In decentralized optimization, the choice of stepsize plays a critical role in algorithm performance. A common approach is to use a shared stepsize across all agents to ensure convergence. However, selecting an optimal stepsize often requires careful tuning, which can be time-consuming and may lead to slow convergence, especially when there is significant variation in the smoothness (L-smoothness) of local objective functions across agents. Individually tuning stepsizes per agent is also impractical, particularly in large-scale networks. To address these limitations, we propose AdGT, an adaptive gradient tracking method that enables each agent to adjust its stepsize based on the smoothness of its local objective. We prove that AdGT achieves linear convergence to the global optimal solution. Through numerical experiments, we compare AdGT with fixed-stepsize gradient tracking methods and demonstrate its superior performance. Additionally, we compare AdGT with adaptive gradient descent (AdGD) in a centralized setting and observe that fully adaptive stepsizes offer greater benefits in decentralized networks than in centralized ones.
Low-Cost Architecture and Efficient Pattern Synthesis for Polarimetric Phased Array Based on Polarization Coding Reconfigurable Elements
Polarimetric phased arrays (PPAs) enhance radar target detection and anti-jamming capabilities, but their conventional dual transmit/receive (T/R) channel architecture leads to high cost and system complexity. To address these limitations, this paper proposes a polarization-coding reconfigurable phased array (PCRPA) and associated pattern synthesis techniques, which reduce the channel count while preserving key performance. In the PCRPA, each antenna element connects to a single T/R channel and is equipped with a two-level RF switch, enabling real-time control of its polarization state and subarray grouping. By optimizing both the element polarization codes and the excitation weights, the array can synthesize arbitrarily polarized and dual-polarized beams. Simulation results show that the proposed approach achieves suppressed cross-polarization and comparable sidelobe levels compared to conventional PPAs across a wide scan range, with performance improvements being more pronounced in larger arrays. The inherent channel reduction does, however, incur a trade-off in terms of radiated power and directivity. Experimental validation using an $8\times 8$ X-band array antenna confirms the feasibility and effectiveness of the proposed system. The PCRPA architecture and the accompanying synthesis methods offer a cost-effective solution for large-scale PPA systems, maintaining sidelobe and polarization control with significantly reduced hardware complexity.
Optimal Control of an Epidemic with Intervention Design
This paper investigates the optimal control of an epidemic governed by a SEIR model with operational delays in vaccination and non pharmaceutical interventions. We address the mathematical challenge of imposing hard healthcare capacity constraints (e.g., ICU limits) over an infinite time horizon. To rigorously bridge the gap between theoretical constraints and numerical tractability, we employ a variational framework based on Moreau--Yosida regularization and establish the connection between finite- and infinite-horizon solutions via $Γ$-convergence. The necessary conditions for optimality are derived using the Pontryagin Maximum Principle, allowing for the characterization of singular regimes where the optimal strategy maintains the infection level precisely at the capacity boundary. Numerical simulations illustrate these theoretical findings, quantifying the shadow prices of infection and costs associated with intervention delays.
comment: For code and computational details in Python, please refer to \url{https://github.com/BehroozMoosavi/Codes/blob/main/Epidemic\%20With\%20Intervention/Epidemic.ipynb}
Adaptive Real-Time Scheduling Algorithms for Embedded Systems
Embedded systems are becoming more in demand to work in dynamic and uncertain environments, and being confined to the strong requirements of real-time. Conventional static scheduling models usually cannot cope with runtime modification in workload, resource availability, or system updates. This brief survey covers the area of feedback-based control (e.g., Feedback Control Scheduling) and interdependence between tasks (e.g., Symbiotic Scheduling of Periodic Tasks) models. It also borders on predictive methods and power management, combining methods based on Dynamic Voltage and Frequency Scaling (DVFS). In this paper, key mechanisms are briefly summarized, influencing trade-offs relating to adaptivity/predictability, typical metrics of evaluation, and ongoing problems, especially in situations where safety is a critical factor, giving a succinct and easy-to-understand introduction to researchers and practitioners who have to cope with the changing environment of adaptive real-time systems.
Reliable Grid Forecasting: State Space Models for Safety-Critical Energy Systems
Accurate grid load forecasting is safety-critical: under-predictions risk supply shortfalls, while symmetric error metrics mask this operational asymmetry. We introduce a grid-specific evaluation framework (Asymmetric MAPE, Under-Prediction Rate, and Reserve Margin) that directly measures operational risk rather than statistical accuracy alone. Using this framework, we conduct a systematic evaluation of Mamba-based State Space Models for California grid forecasting on a weather-aligned CA ISO-TAC dataset spanning Nov 2023 to Nov 2025 (84,498 hourly records across 5 transmission areas). Our analysis reveals that standard accuracy metrics are poor proxies for operational safety: models with identical MAPE can require vastly different reserve margins. We demonstrate that forecast errors are weakly but statistically significantly associated with temperature (r = 0.16), motivating weather-aware modeling rather than loss function modification alone. The S-Mamba model achieves the lowest 99.5th-percentile reserve margin (14.12 percent) compared to 16.66 percent for iTransformer, demonstrating superior forecast reliability under a 99.5th-percentile tail-risk reserve proxy.
comment: 25 pages, 7 figures, 8 tables
Steady-State Spread Bounds for Graph Diffusion via Laplacian Regularisation in Networked Systems
We study how far a diffusion process on a graph can deviate from a designed starting pattern when the pattern is generated via Laplacian regularisation. Under standard stability conditions for undirected, entrywise nonnegative graphs, we give a closed-form, instance-specific upper bound on the steady-state spread, measured as the relative change between the final and initial profiles. The bound separates two effects: (i) an irreducible term determined by the graph's maximum node degree, and (ii) a design-controlled term that shrinks as the regularisation strength increases (with an inverse square-root law). This leads to a design rule: given any target limit on spread, one can choose a sufficient regularisation strength in closed form. Although one motivating application is array beamforming -- where the initial pattern is the squared magnitude of the beamformer weights -- the result applies to any scenario that first enforces Laplacian smoothness and then evolves by linear diffusion on a graph. Overall, the guarantee is non-asymptotic, easy to compute, and certifies the maximum steady-state deviation.
Robotics
CycleVLA: Proactive Self-Correcting Vision-Language-Action Models via Subtask Backtracking and Minimum Bayes Risk Decoding
Current work on robot failure detection and correction typically operate in a post hoc manner, analyzing errors and applying corrections only after failures occur. This work introduces CycleVLA, a system that equips Vision-Language-Action models (VLAs) with proactive self-correction, the capability to anticipate incipient failures and recover before they fully manifest during execution. CycleVLA achieves this by integrating a progress-aware VLA that flags critical subtask transition points where failures most frequently occur, a VLM-based failure predictor and planner that triggers subtask backtracking upon predicted failure, and a test-time scaling strategy based on Minimum Bayes Risk (MBR) decoding to improve retry success after backtracking. Extensive experiments show that CycleVLA improves performance for both well-trained and under-trained VLAs, and that MBR serves as an effective zero-shot test-time scaling strategy for VLAs. Project Page: https://dannymcy.github.io/cyclevla/
comment: Project Page: https://dannymcy.github.io/cyclevla/
Differential Barometric Altimetry for Submeter Vertical Localization and Floor Recognition Indoors
Accurate altitude estimation and reliable floor recognition are critical for mobile robot localization and navigation within complex multi-storey environments. In this paper, we present a robust, low-cost vertical estimation framework leveraging differential barometric sensing integrated within a fully ROS-compliant software package. Our system simultaneously publishes real-time altitude data from both a stationary base station and a mobile sensor, enabling precise and drift-free vertical localization. Empirical evaluations conducted in challenging scenarios -- such as fully enclosed stairwells and elevators, demonstrate that our proposed barometric pipeline achieves sub-meter vertical accuracy (RMSE: 0.29 m) and perfect (100%) floor-level identification. In contrast, our results confirm that standalone height estimates, obtained solely from visual- or LiDAR-based SLAM odometry, are insufficient for reliable vertical localization. The proposed ROS-compatible barometric module thus provides a practical and cost-effective solution for robust vertical awareness in real-world robotic deployments. The implementation of our method is released as open source at https://github.com/witsir/differential-barometric.
SingingBot: An Avatar-Driven System for Robotic Face Singing Performance
Equipping robotic faces with singing capabilities is crucial for empathetic Human-Robot Interaction. However, existing robotic face driving research primarily focuses on conversations or mimicking static expressions, struggling to meet the high demands for continuous emotional expression and coherence in singing. To address this, we propose a novel avatar-driven framework for appealing robotic singing. We first leverage portrait video generation models embedded with extensive human priors to synthesize vivid singing avatars, providing reliable expression and emotion guidance. Subsequently, these facial features are transferred to the robot via semantic-oriented mapping functions that span a wide expression space. Furthermore, to quantitatively evaluate the emotional richness of robotic singing, we propose the Emotion Dynamic Range metric to measure the emotional breadth within the Valence-Arousal space, revealing that a broad emotional spectrum is crucial for appealing performances. Comprehensive experiments prove that our method achieves rich emotional expressions while maintaining lip-audio synchronization, significantly outperforming existing approaches.
Vision-Based Early Fault Diagnosis and Self-Recovery for Strawberry Harvesting Robots
Strawberry harvesting robots faced persistent challenges such as low integration of visual perception, fruit-gripper misalignment, empty grasping, and strawberry slippage from the gripper due to insufficient gripping force, all of which compromised harvesting stability and efficiency in orchard environments. To overcome these issues, this paper proposed a visual fault diagnosis and self-recovery framework that integrated multi-task perception with corrective control strategies. At the core of this framework was SRR-Net, an end-to-end multi-task perception model that simultaneously performed strawberry detection, segmentation, and ripeness estimation, thereby unifying visual perception with fault diagnosis. Based on this integrated perception, a relative error compensation method based on the simultaneous target-gripper detection was designed to address positional misalignment, correcting deviations when error exceeded the tolerance threshold. To mitigate empty grasping and fruit-slippage faults, an early abort strategy was implemented. A micro-optical camera embedded in the end-effector provided real-time visual feedback, enabling grasp detection during the deflating stage and strawberry slip prediction during snap-off through MobileNet V3-Small classifier and a time-series LSTM classifier. Experiments demonstrated that SRR-Net maintained high perception accuracy. For detection, it achieved a precision of 0.895 and recall of 0.813 on strawberries, and 0.972/0.958 on hands. In segmentation, it yielded a precision of 0.887 and recall of 0.747 for strawberries, and 0.974/0.947 for hands. For ripeness estimation, SRR-Net attained a mean absolute error of 0.035, while simultaneously supporting multi-task perception and sustaining a competitive inference speed of 163.35 FPS.
Realistic adversarial scenario generation via human-like pedestrian model for autonomous vehicle control parameter optimisation
Autonomous vehicles (AVs) are rapidly advancing and are expected to play a central role in future mobility. Ensuring their safe deployment requires reliable interaction with other road users, not least pedestrians. Direct testing on public roads is costly and unsafe for rare but critical interactions, making simulation a practical alternative. Within simulation-based testing, adversarial scenarios are widely used to probe safety limits, but many prioritise difficulty over realism, producing exaggerated behaviours which may result in AV controllers that are overly conservative. We propose an alternative method, instead using a cognitively inspired pedestrian model featuring both inter-individual and intra-individual variability to generate behaviourally plausible adversarial scenarios. We provide a proof of concept demonstration of this method's potential for AV control optimisation, in closed-loop testing and tuning of an AV controller. Our results show that replacing the rule-based CARLA pedestrian with the human-like model yields more realistic gap acceptance patterns and smoother vehicle decelerations. Unsafe interactions occur only for certain pedestrian individuals and conditions, underscoring the importance of human variability in AV testing. Adversarial scenarios generated by this model can be used to optimise AV control towards safer and more efficient behaviour. Overall, this work illustrates how incorporating human-like road user models into simulation-based adversarial testing can enhance the credibility of AV evaluation and provide a practical basis to behaviourally informed controller optimisation.
Genie Sim 3.0 : A High-Fidelity Comprehensive Simulation Platform for Humanoid Robot
The development of robust and generalizable robot learning models is critically contingent upon the availability of large-scale, diverse training data and reliable evaluation benchmarks. Collecting data in the physical world poses prohibitive costs and scalability challenges, and prevailing simulation benchmarks frequently suffer from fragmentation, narrow scope, or insufficient fidelity to enable effective sim-to-real transfer. To address these challenges, we introduce Genie Sim 3.0, a unified simulation platform for robotic manipulation. We present Genie Sim Generator, a large language model (LLM)-powered tool that constructs high-fidelity scenes from natural language instructions. Its principal strength resides in rapid and multi-dimensional generalization, facilitating the synthesis of diverse environments to support scalable data collection and robust policy evaluation. We introduce the first benchmark that pioneers the application of LLM for automated evaluation. It leverages LLM to mass-generate evaluation scenarios and employs Vision-Language Model (VLM) to establish an automated assessment pipeline. We also release an open-source dataset comprising more than 10,000 hours of synthetic data across over 200 tasks. Through systematic experimentation, we validate the robust zero-shot sim-to-real transfer capability of our open-source dataset, demonstrating that synthetic data can server as an effective substitute for real-world data under controlled conditions for scalable policy training. For code and dataset details, please refer to: https://github.com/AgibotTech/genie_sim.
VIT-Ped: Visionary Intention Transformer for Pedestrian Behavior Analysis
Pedestrian Intention prediction is one of the key technologies in the transition from level 3 to level 4 autonomous driving. To understand pedestrian crossing behaviour, several elements and features should be taken into consideration to make the roads of tomorrow safer for everybody. We introduce a transformer / video vision transformer based algorithm of different sizes which uses different data modalities .We evaluated our algorithms on popular pedestrian behaviour dataset, JAAD, and have reached SOTA performance and passed the SOTA in metrics like Accuracy, AUC and F1-score. The advantages brought by different model design choices are investigated via extensive ablation studies.
Deep Robust Koopman Learning from Noisy Data
Koopman operator theory has emerged as a leading data-driven approach that relies on a judicious choice of observable functions to realize global linear representations of nonlinear systems in the lifted observable space. However, real-world data is often noisy, making it difficult to obtain an accurate and unbiased approximation of the Koopman operator. The Koopman operator generated from noisy datasets is typically corrupted by noise-induced bias that severely degrades prediction and downstream tracking performance. In order to address this drawback, this paper proposes a novel autoencoder-based neural architecture to jointly learn the appropriate lifting functions and the reduced-bias Koopman operator from noisy data. The architecture initially learns the Koopman basis functions that are consistent for both the forward and backward temporal dynamics of the system. Subsequently, by utilizing the learned forward and backward temporal dynamics, the Koopman operator is synthesized with a reduced bias making the method more robust to noise compared to existing techniques. Theoretical analysis is used to demonstrate significant bias reduction in the presence of training noise. Dynamics prediction and tracking control simulations are conducted for multiple serial manipulator arms, including performance comparisons with leading alternative designs, to demonstrate its robustness under various noise levels. Experimental studies with the Franka FR3 7-DoF manipulator arm are further used to demonstrate the effectiveness of the proposed approach in a practical setting.
What you reward is what you learn: Comparing rewards for online speech policy optimization in public HRI
Designing policies that are both efficient and acceptable for conversational service robots in open and diverse environments is non-trivial. Unlike fixed, hand-tuned parameters, online learning can adapt to non-stationary conditions. In this paper, we study how to adapt a social robot's speech policy in the wild. During a 12-day in-situ deployment with over 1,400 public encounters, we cast online policy optimization as a multi-armed bandit problem and use Thompson sampling to select among six actions defined by speech rate (slow/normal/fast) and verbosity (concise/detailed). We compare three complementary binary rewards--Ru (user rating), Rc (conversation closure), and Rt (>=2 turns)--and show that each induces distinct arm distributions and interaction behaviors. We complement the online results with offline evaluations that analyze contextual factors (e.g., crowd level, group size) using video-annotated data. Taken together, we distill ready-to-use design lessons for deploying online optimization of speech policies in real public HRI settings.
Learning Diffusion Policy from Primitive Skills for Robot Manipulation AAAI2026
Diffusion policies (DP) have recently shown great promise for generating actions in robotic manipulation. However, existing approaches often rely on global instructions to produce short-term control signals, which can result in misalignment in action generation. We conjecture that the primitive skills, referred to as fine-grained, short-horizon manipulations, such as ``move up'' and ``open the gripper'', provide a more intuitive and effective interface for robot learning. To bridge this gap, we propose SDP, a skill-conditioned DP that integrates interpretable skill learning with conditional action planning. SDP abstracts eight reusable primitive skills across tasks and employs a vision-language model to extract discrete representations from visual observations and language instructions. Based on them, a lightweight router network is designed to assign a desired primitive skill for each state, which helps construct a single-skill policy to generate skill-aligned actions. By decomposing complex tasks into a sequence of primitive skills and selecting a single-skill policy, SDP ensures skill-consistent behavior across diverse tasks. Extensive experiments on two challenging simulation benchmarks and real-world robot deployments demonstrate that SDP consistently outperforms SOTA methods, providing a new paradigm for skill-based robot learning with diffusion policies.
comment: Accepted to AAAI2026
From Metrics to Meaning: Insights from a Mixed-Methods Field Experiment on Retail Robot Deployment
We report a mixed-methods field experiment of a conversational service robot deployed under everyday staffing discretion in a live bedding store. Over 12 days we alternated three conditions--Baseline (no robot), Robot-only, and Robot+Fixture--and video-annotated the service funnel from passersby to purchase. An explanatory sequential design then used six post-experiment staff interviews to interpret the quantitative patterns. Quantitatively, the robot increased stopping per passerby (highest with the fixture), yet clerk-led downstream steps per stopper--clerk approach, store entry, assisted experience, and purchase--decreased. Interviews explained this divergence: clerks avoided interrupting ongoing robot-customer talk, struggled with ambiguous timing amid conversational latency, and noted child-centered attraction that often satisfied curiosity at the doorway. The fixture amplified visibility but also anchored encounters at the threshold, creating a well-defined micro-space where needs could ``close'' without moving inside. We synthesize these strands into an integrative account from the initial show of interest on the part of a customer to their entering the store and derive actionable guidance. The results advance the understanding of interactions between customers, staff members, and the robot and offer practical recommendations for deploying service robots in high-touch retail.
CausalNav: A Long-term Embodied Navigation System for Autonomous Mobile Robots in Dynamic Outdoor Scenarios
Autonomous language-guided navigation in large-scale outdoor environments remains a key challenge in mobile robotics, due to difficulties in semantic reasoning, dynamic conditions, and long-term stability. We propose CausalNav, the first scene graph-based semantic navigation framework tailored for dynamic outdoor environments. We construct a multi-level semantic scene graph using LLMs, referred to as the Embodied Graph, that hierarchically integrates coarse-grained map data with fine-grained object entities. The constructed graph serves as a retrievable knowledge base for Retrieval-Augmented Generation (RAG), enabling semantic navigation and long-range planning under open-vocabulary queries. By fusing real-time perception with offline map data, the Embodied Graph supports robust navigation across varying spatial granularities in dynamic outdoor environments. Dynamic objects are explicitly handled in both the scene graph construction and hierarchical planning modules. The Embodied Graph is continuously updated within a temporal window to reflect environmental changes and support real-time semantic navigation. Extensive experiments in both simulation and real-world settings demonstrate superior robustness and efficiency.
comment: Accepted by IEEE Robotics and Automation Letters (RA-L)
DisCo-FLoc: Using Dual-Level Visual-Geometric Contrasts to Disambiguate Depth-Aware Visual Floorplan Localization
Since floorplan data is readily available, long-term persistent, and robust to changes in visual appearance, visual Floorplan Localization (FLoc) has garnered significant attention. Existing methods either ingeniously match geometric priors or utilize sparse semantics to reduce FLoc uncertainty. However, they still suffer from ambiguous FLoc caused by repetitive structures within minimalist floorplans. Moreover, expensive but limited semantic annotations restrict their applicability. To address these issues, we propose DisCo-FLoc, which utilizes dual-level visual-geometric Contrasts to Disambiguate depth-aware visual Floc, without requiring additional semantic labels. Our solution begins with a ray regression predictor tailored for ray-casting-based FLoc, predicting a series of FLoc candidates using depth estimation expertise. In addition, a novel contrastive learning method with position-level and orientation-level constraints is proposed to strictly match depth-aware visual features with the corresponding geometric structures in the floorplan. Such matches can effectively eliminate FLoc ambiguity and select the optimal imaging pose from FLoc candidates. Exhaustive comparative studies on two standard visual Floc benchmarks demonstrate that our method outperforms the state-of-the-art semantic-based method, achieving significant improvements in both robustness and accuracy.
comment: 7 pages, 4 figures
AlignDrive: Aligned Lateral-Longitudinal Planning for End-to-End Autonomous Driving
End-to-end autonomous driving has rapidly progressed, enabling joint perception and planning in complex environments. In the planning stage, state-of-the-art (SOTA) end-to-end autonomous driving models decouple planning into parallel lateral and longitudinal predictions. While effective, this parallel design can lead to i) coordination failures between the planned path and speed, and ii) underutilization of the drive path as a prior for longitudinal planning, thus redundantly encoding static information. To address this, we propose a novel cascaded framework that explicitly conditions longitudinal planning on the drive path, enabling coordinated and collision-aware lateral and longitudinal planning. Specifically, we introduce a path-conditioned formulation that explicitly incorporates the drive path into longitudinal planning. Building on this, the model predicts longitudinal displacements along the drive path rather than full 2D trajectory waypoints. This design simplifies longitudinal reasoning and more tightly couples it with lateral planning. Additionally, we introduce a planning-oriented data augmentation strategy that simulates rare safety-critical events, such as vehicle cut-ins, by adding agents and relabeling longitudinal targets to avoid collision. Evaluated on the challenging Bench2Drive benchmark, our method sets a new SOTA, achieving a driving score of 89.07 and a success rate of 73.18%, demonstrating significantly improved coordination and safety
comment: underreview
Simulations and Advancements in MRI-Guided Power-Driven Ferric Tools for Wireless Therapeutic Interventions
Designing a robotic system that functions effectively within the specific environment of a Magnetic Resonance Imaging (MRI) scanner requires solving numerous technical issues, such as maintaining the robot's precision and stability under strong magnetic fields. This research focuses on enhancing MRI's role in medical imaging, especially in its application to guide intravascular interventions using robot-assisted devices. A newly developed computational system is introduced, designed for seamless integration with the MRI scanner, including a computational unit and user interface. This system processes MR images to delineate the vascular network, establishing virtual paths and boundaries within vessels to prevent procedural damage. Key findings reveal the system's capability to create tailored magnetic field gradient patterns for device control, considering the vessel's geometry and safety norms, and adapting to different blood flow characteristics for finer navigation. Additionally, the system's modeling aspect assesses the safety and feasibility of navigating pre-set vascular paths. Conclusively, this system, based on the Qt framework and C/C++, with specialized software modules, represents a major step forward in merging imaging technology with robotic aid, significantly enhancing precision and safety in intravascular procedures.
comment: 10 pages, 7 figures
Explicit World Models for Reliable Human-Robot Collaboration AAAI-26
This paper addresses the topic of robustness under sensing noise, ambiguous instructions, and human-robot interaction. We take a radically different tack to the issue of reliable embodied AI: instead of focusing on formal verification methods aimed at achieving model predictability and robustness, we emphasise the dynamic, ambiguous and subjective nature of human-robot interactions that requires embodied AI systems to perceive, interpret, and respond to human intentions in a manner that is consistent, comprehensible and aligned with human expectations. We argue that when embodied agents operate in human environments that are inherently social, multimodal, and fluid, reliability is contextually determined and only has meaning in relation to the goals and expectations of humans involved in the interaction. This calls for a fundamentally different approach to achieving reliable embodied AI that is centred on building and updating an accessible "explicit world model" representing the common ground between human and AI, that is used to align robot behaviours with human expectations.
comment: Accepted to AAAI-26 Bridge Program B10: Making Embodied AI Reliable with Testing and Formal Verification
Real-Time Lane Detection via Efficient Feature Alignment and Covariance Optimization for Low-Power Embedded Systems
Real-time lane detection in embedded systems encounters significant challenges due to subtle and sparse visual signals in RGB images, often constrained by limited computational resources and power consumption. Although deep learning models for lane detection categorized into segmentation-based, anchor-based, and curve-based methods there remains a scarcity of universally applicable optimization techniques tailored for low-power embedded environments. To overcome this, we propose an innovative Covariance Distribution Optimization (CDO) module specifically designed for efficient, real-time applications. The CDO module aligns lane feature distributions closely with ground-truth labels, significantly enhancing detection accuracy without increasing computational complexity. Evaluations were conducted on six diverse models across all three method categories, including two optimized for real-time applications and four state-of-the-art (SOTA) models, tested comprehensively on three major datasets: CULane, TuSimple, and LLAMAS. Experimental results demonstrate accuracy improvements ranging from 0.01% to 1.5%. The proposed CDO module is characterized by ease of integration into existing systems without structural modifications and utilizes existing model parameters to facilitate ongoing training, thus offering substantial benefits in performance, power efficiency, and operational flexibility in embedded systems.
AMC26: High-performance DOb for robust position control
This paper presents a new HPDOb that significantly improves disturbance estimation accuracy and robustness in motion control systems, surpassing the capabilities of conventional DObs. The proposed observer is analysed and synthesised in the discrete-time domain, providing a realistic representation of their dynamic behaviour and enabling enhanced controller design for practical applications. The core contribution of the HPDOb is a novel synthesis method that incorporates higher-order truncation error dynamics into disturbance estimation. Unlike conventional DObs, which are limited to zero-order truncation error, the HPDOb achieves first-order truncation error, yielding markedly improved estimation accuracy and robustness against disturbances in motion control systems. Simulation and experiments verify the stability and performance of HPDOb.
Learning and Optimizing the Efficacy of Spatio-Temporal Task Allocation under Temporal and Resource Constraints
Complex multi-robot missions often require heterogeneous teams to jointly optimize task allocation, scheduling, and path planning to improve team performance under strict constraints. We formalize these complexities into a new class of problems, dubbed Spatio-Temporal Efficacy-optimized Allocation for Multi-robot systems (STEAM). STEAM builds upon trait-based frameworks that model robots using their capabilities (e.g., payload and speed), but goes beyond the typical binary success-failure model by explicitly modeling the efficacy of allocations as trait-efficacy maps. These maps encode how the aggregated capabilities assigned to a task determine performance. Further, STEAM accommodates spatio-temporal constraints, including a user-specified time budget (i.e., maximum makespan). To solve STEAM problems, we contribute a novel algorithm named Efficacy-optimized Incremental Task Allocation Graph Search (E-ITAGS) that simultaneously optimizes task performance and respects time budgets by interleaving task allocation, scheduling, and path planning. Motivated by the fact that trait-efficacy maps are difficult, if not impossible, to specify, E-ITAGS efficiently learns them using a realizability-aware active learning module. Our approach is realizability-aware since it explicitly accounts for the fact that not all combinations of traits are realizable by the robots available during learning. Further, we derive experimentally-validated bounds on E-ITAGS' suboptimality with respect to efficacy. Detailed numerical simulations and experiments using an emergency response domain demonstrate that E-ITAGS generates allocations of higher efficacy compared to baselines, while respecting resource and spatio-temporal constraints. We also show that our active learning approach is sample efficient and establishes a principled tradeoff between data and computational efficiency.
comment: The journal extension version of our conference paper: arXiv:2404.07902, which has been accepted by ISRR 2024
InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation
Prevalent Vision-Language-Action (VLA) models are typically built upon Multimodal Large Language Models (MLLMs) and demonstrate exceptional proficiency in semantic understanding, but they inherently lack the capability to deduce physical world dynamics. Consequently, recent approaches have shifted toward World Models, typically formulated via video prediction; however, these methods often suffer from a lack of semantic grounding and exhibit brittleness when handling prediction errors. To synergize semantic understanding with dynamic predictive capabilities, we present InternVLA-A1. This model employs a unified Mixture-of-Transformers architecture, coordinating three experts for scene understanding, visual foresight generation, and action execution. These components interact seamlessly through a unified masked self-attention mechanism. Building upon InternVL3 and Qwen3-VL, we instantiate InternVLA-A1 at 2B and 3B parameter scales. We pre-train these models on hybrid synthetic-real datasets spanning InternData-A1 and Agibot-World, covering over 533M frames. This hybrid training strategy effectively harnesses the diversity of synthetic simulation data while minimizing the sim-to-real gap. We evaluated InternVLA-A1 across 12 real-world robotic tasks and simulation benchmark. It significantly outperforms leading models like pi0 and GR00T N1.5, achieving a 14.5\% improvement in daily tasks and a 40\%-73.3\% boost in dynamic settings, such as conveyor belt sorting.
comment: Homepage: https://internrobotics.github.io/internvla-a1.github.io/
LIMOncello: Iterated Error-State Kalman Filter on the SGal(3) Manifold for Fast LiDAR-Inertial Odometry
This work introduces LIMOncello, a tightly coupled LiDAR-Inertial Odometry system that models 6-DoF motion on the $\mathrm{SGal}(3)$ manifold within an iterated error-state Kalman filter backend. Compared to state representations defined on $\mathrm{SO}(3)\times\mathbb{R}^6$, the use of $\mathrm{SGal}(3)$ provides a coherent and numerically stable discrete-time propagation model that helps limit drift in low-observability conditions. LIMOncello also includes a lightweight incremental i-Octree mapping backend that enables faster updates and substantially lower memory usage than incremental kd-tree style map structures, without relying on locality-restricted search heuristics. Experiments on multiple real-world datasets show that LIMOncello achieves competitive accuracy while improving robustness in geometrically sparse environments. The system maintains real-time performance with stable memory growth and is released as an extensible open-source implementation at https://github.com/CPerezRuiz335/LIMOncello.
FORTE: Tactile Force and Slip Sensing on Compliant Fingers for Delicate Manipulation
Handling fragile objects remains a major challenge for robotic manipulation. Tactile sensing and soft robotics can improve delicate object handling, but typically involve high integration complexity or slow response times. We address these issues through FORTE, an easy-to-fabricate tactile sensing system. FORTE uses 3D-printed fin-ray grippers with internal air channels to provide low-latency force and slip feedback. This feedback allows us to apply just enough force to grasp objects without damaging them. We accurately estimate grasping forces from 0-8 N with an average error of 0.2 N, and detect slip events within 100 ms of occurring. FORTE can grasp a wide range of slippery, fragile, and deformable objects, including raspberries and potato chips with 92% success and achieves 93% accuracy in detecting slip events. These results highlight FORTE's potential as a robust solution for delicate robotic manipulation. Project page: https://merge-lab.github.io/FORTE/
Volume-Consistent Kneading-Based Deformation Manufacturing for Material-Efficient Shaping
Conventional subtractive manufacturing inevitably involves material loss during geometric realization, while additive manufacturing still suffers from limitations in surface quality, process continuity, and productivity when fabricating complex geometries. To address these challenges, this paper proposes a volume-consistent kneading-based forming method for plastic materials, enabling continuous and controllable three-dimensional deformation under mass conservation. An integrated kneading-based manufacturing system is developed, in which geometry-aware kneading command generation, layer-wise kneading execution, and in-process point-cloud scanning are tightly coupled to form a closed-loop workflow of scanning, forming, and feedback compensation. Target geometries are analyzed through layer-wise point-cloud processing and classified into enveloping and non-enveloping types. Accordingly, an Envelope Shaping First strategy and a Similar Gradient Method are adopted to ensure stable material flow and continuous deformation. An RMSE-based compensation scheme is further introduced to correct systematic geometric deviations induced by elastic rebound and material redistribution. Experimental validation on five representative geometries demonstrates high geometric fidelity, with material utilization consistently exceeding 98%. The results indicate that kneading-based forming provides a promising alternative manufacturing paradigm for low-waste, customizable production.
comment: 39 pages, 31 figures
Compositional Diffusion with Guided Search for Long-Horizon Planning
Generative models have emerged as powerful tools for planning, with compositional approaches offering particular promise for modeling long-horizon task distributions by composing together local, modular generative models. This compositional paradigm spans diverse domains, from multi-step manipulation planning to panoramic image synthesis to long video generation. However, compositional generative models face a critical challenge: when local distributions are multimodal, existing composition methods average incompatible modes, producing plans that are neither locally feasible nor globally coherent. We propose Compositional Diffusion with Guided Search (CDGS), which addresses this mode averaging problem by embedding search directly within the diffusion denoising process. Our method explores diverse combinations of local modes through population-based sampling, prunes infeasible candidates using likelihood-based filtering, and enforces global consistency through iterative resampling between overlapping segments. CDGS matches oracle performance on seven robot manipulation tasks, outperforming baselines that lack compositionality or require long-horizon training data. The approach generalizes across domains, enabling coherent text-guided panoramic images and long videos through effective local-to-global message passing. More details: https://cdgsearch.github.io/
comment: 38 pages, 18 figures
Interconnection and Damping Assignment Passivity-Based Control using Sparse Neural ODEs
Interconnection and Damping Assignment Passivity-Based Control (IDA-PBC) is a nonlinear control technique that assigns a port-Hamiltonian (pH) structure to a controlled system using a state-feedback law. While IDA-PBC has been extensively studied and applied to many systems, its practical implementation often remains confined to academic examples and, almost exclusively, to stabilization tasks. The main limitation of IDA-PBC stems from the complexity of analytically solving a set of partial differential equations (PDEs), referred to as the matching conditions, which enforce the pH structure of the closed-loop system. However, this is extremely challenging, especially for complex physical systems and tasks. In this work, we propose a novel numerical approach for designing IDA-PBC controllers without solving the matching PDEs exactly. We cast the IDA-PBC problem as the learning of a neural ordinary differential equation. In particular, we rely on sparse dictionary learning to parametrize the desired closed-loop system as a sparse linear combination of nonlinear state-dependent functions. Optimization of the controller parameters is achieved by solving a multi-objective optimization problem whose cost function is composed of a generic task-dependent cost and a matching condition-dependent cost. Our numerical results show that the proposed method enables (i) IDA-PBC to be applicable to complex tasks beyond stabilization, such as the discovery of periodic oscillatory behaviors, (ii) the derivation of closed-form expressions of the controlled system, including residual terms in case of approximate matching, and (iii) stability analysis of the learned controller.
Foundation models on the bridge: Semantic hazard detection and safety maneuvers for maritime autonomy with vision-language models
The draft IMO MASS Code requires autonomous and remotely supervised maritime vessels to detect departures from their operational design domain, enter a predefined fallback that notifies the operator, permit immediate human override, and avoid changing the voyage plan without approval. Meeting these obligations in the alert-to-takeover gap calls for a short-horizon, human-overridable fallback maneuver. Classical maritime autonomy stacks struggle when the correct action depends on meaning (e.g., diver-down flag means people in the water, fire close by means hazard). We argue (i) that vision-language models (VLMs) provide semantic awareness for such out-of-distribution situations, and (ii) that a fast-slow anomaly pipeline with a short-horizon, human-overridable fallback maneuver makes this practical in the handover window. We introduce Semantic Lookout, a camera-only, candidate-constrained VLM fallback maneuver selector that selects one cautious action (or station-keeping) from water-valid, world-anchored trajectories under continuous human authority. On 40 harbor scenes we measure per-call scene understanding and latency, alignment with human consensus (model majority-of-three voting), short-horizon risk-relief on fire hazard scenes, and an on-water alert->fallback maneuver->operator handover. Sub-10 s models retain most of the awareness of slower state-of-the-art models. The fallback maneuver selector outperforms geometry-only baselines and increases standoff distance on fire scenes. A field run verifies end-to-end operation. These results support VLMs as semantic fallback maneuver selectors compatible with the draft IMO MASS Code, within practical latency budgets, and motivate future work on domain-adapted, hybrid autonomy that pairs foundation-model semantics with multi-sensor bird's-eye-view perception and short-horizon replanning. Website: kimachristensen.github.io/bridge_policy
comment: 17 pages without bibliography or appendix. The main paper has 16 figures. Paper webpage can be found at https://kimachristensen.github.io/bridge_policy/
MOON: Multi-Objective Optimization-Driven Object-Goal Navigation Using a Variable-Horizon Set-Orienteering Planner
This paper proposes MOON (Multi-Objective Optimization-driven Object-goal Navigation), a novel framework designed for efficient navigation in large-scale, complex indoor environments. While existing methods often rely on local heuristics, they frequently fail to address the strategic trade-offs between competing objectives in vast areas. To overcome this, we formulate the task as a multi-objective optimization problem (MOO) that balances frontier-based exploration with the exploitation of observed landmarks. Our prototype integrates three key pillars: (1) QOM [IROS05] for discriminative landmark encoding; (2) StructNav [RSS23] to enhance the navigation pipeline; and (3) a variable-horizon Set Orienteering Problem (SOP) formulation for globally coherent planning. To further support the framework's scalability, we provide a detailed theoretical foundation for the budget-constrained SOP formulation and the data-driven mode-switching strategy that enables long-horizon resource allocation. Additionally, we introduce a high-speed neural planner that distills the expert solver into a transformer-based model, reducing decision latency by a factor of nearly 10 while maintaining high planning quality.
comment: 9 pages, 7 figures, technical report
Multi-Robot Data-Free Continual Communicative Learning (CCL) from Black-Box Visual Place Recognition Models
In emerging multi-robot societies, heterogeneous agents must continually extract and integrate local knowledge from one another through communication, even when their internal models are completely opaque. Existing approaches to continual or collaborative learning for visual place recognition (VPR) largely assume white-box access to model parameters or shared training datasets, which is unrealistic when robots encounter unknown peers in the wild. This paper introduces \emph{Continual Communicative Learning (CCL)}, a data-free multi-robot framework in which a traveler robot (student) continually improves its VPR capability by communicating with black-box teacher models via a constrained query--response channel. We repurpose Membership Inference Attacks (MIA), originally developed as privacy attacks on machine learning models, as a constructive communication primitive to reconstruct pseudo-training sets from black-box VPR teachers without accessing their parameters or raw data. To overcome the intrinsic communication bottleneck caused by the low sampling efficiency of black-box MIA, we propose a prior-based query strategy that leverages the student's own VPR prior to focus queries on informative regions of the embedding space, thereby reducing the knowledge transfer (KT) cost. Experimental results on a standard multi-session VPR benchmark demonstrate that the proposed CCL framework yields substantial performance gains for low-performing robots under modest communication budgets, highlighting CCL as a promising building block for scalable and fault-tolerant multi-robot systems.
comment: 6 pages, 4 figures, technical report
AdaVLN: Towards Visual Language Navigation in Continuous Indoor Environments with Moving Humans
Visual Language Navigation is a task that challenges robots to navigate in realistic environments based on natural language instructions. While previous research has largely focused on static settings, real-world navigation must often contend with dynamic human obstacles. Hence, we propose an extension to the task, termed Adaptive Visual Language Navigation (AdaVLN), which seeks to narrow this gap. AdaVLN requires robots to navigate complex 3D indoor environments populated with dynamically moving human obstacles, adding a layer of complexity to navigation tasks that mimic the real-world. To support exploration of this task, we also present AdaVLN simulator and AdaR2R datasets. The AdaVLN simulator enables easy inclusion of fully animated human models directly into common datasets like Matterport3D. We also introduce a "freeze-time" mechanism for both the navigation task and simulator, which pauses world state updates during agent inference, enabling fair comparisons and experimental reproducibility across different hardware. We evaluate several baseline models on this task, analyze the unique challenges introduced by AdaVLN, and demonstrate its potential to bridge the sim-to-real gap in VLN research.
RoboBPP: Benchmarking Robotic Online Bin Packing with Physics-based Simulation
Physical feasibility in 3D bin packing is a key requirement in modern industrial logistics and robotic automation. With the growing adoption of industrial automation, online bin packing has gained increasing attention. However, inconsistencies in problem settings, test datasets, and evaluation metrics have hindered progress in the field, and there is a lack of a comprehensive benchmarking system. Direct testing on real hardware is costly, and building a realistic simulation environment is also challenging. To address these limitations, we introduce RoboBPP, a benchmarking system designed for robotic online bin packing. RoboBPP integrates a physics-based simulator to assess physical feasibility. In our simulation environment, we introduce a robotic arm and boxes at real-world scales to replicate real industrial packing workflows. By simulating conditions that arise in real industrial applications, we ensure that evaluated algorithms are practically deployable. In addition, prior studies often rely on synthetic datasets whose distributions differ from real-world industrial data. To address this issue, we collect three datasets from real industrial workflows, including assembly-line production, logistics packing, and furniture manufacturing. The benchmark comprises three carefully designed test settings and extends existing evaluation metrics with new metrics for structural stability and operational safety. We design a scoring system and derive a range of insights from the evaluation results. RoboBPP is fully open-source and is equipped with visualization tools and an online leaderboard, providing a reproducible and extensible foundation for future research and industrial applications (https://robot-bin-packing-benchmark.github.io).
comment: Under review at the International Journal of Robotics Research (IJRR)
RNBF: Real-Time RGB-D Based Neural Barrier Functions for Safe Robotic Navigation
Autonomous safe navigation in unstructured and novel environments poses significant challenges, especially when environment information can only be provided through low-cost vision sensors. Although safe reactive approaches have been proposed to ensure robot safety in complex environments, many base their theory off the assumption that the robot has prior knowledge on obstacle locations and geometries. In this paper, we present a real-time, vision-based framework that constructs continuous, first-order differentiable Signed Distance Fields (SDFs) of unknown environments entirely online, without any pre-training, and is fully compatible with established SDF-based reactive controllers. To achieve robust performance under practical sensing conditions, our approach explicitly accounts for noise in affordable RGB-D cameras, refining the neural SDF representation online for smoother geometry and stable gradient estimates. We validate the proposed method in simulation and real-world experiments using a Fetch robot. Videos and supplementary material are available at https://satyajeetburla.github.io/rnbf/.
SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling
Data scarcity remains a fundamental barrier to achieving fully autonomous surgical robots. While large scale vision language action (VLA) models have shown impressive generalization in household and industrial manipulation by leveraging paired video action data from diverse domains, surgical robotics suffers from the paucity of datasets that include both visual observations and accurate robot kinematics. In contrast, vast corpora of surgical videos exist, but they lack corresponding action labels, preventing direct application of imitation learning or VLA training. In this work, we aim to alleviate this problem by learning policy models from SurgWorld, a world model designed for surgical physical AI. We curated the Surgical Action Text Alignment (SATA) dataset with detailed action description specifically for surgical robots. Then we built SurgeWorld based on the most advanced physical AI world model and SATA. It's able to generate diverse, generalizable and realistic surgery videos. We are also the first to use an inverse dynamics model to infer pseudokinematics from synthetic surgical videos, producing synthetic paired video action data. We demonstrate that a surgical VLA policy trained with these augmented data significantly outperforms models trained only on real demonstrations on a real surgical robot platform. Our approach offers a scalable path toward autonomous surgical skill acquisition by leveraging the abundance of unlabeled surgical video and generative world modeling, thus opening the door to generalizable and data efficient surgical robot policies.
Openpi Comet: Competition Solution For 2025 BEHAVIOR Challenge
The 2025 BEHAVIOR Challenge is designed to rigorously track progress toward solving long-horizon tasks by physical agents in simulated environments. BEHAVIOR-1K focuses on everyday household tasks that people most want robots to assist with and these tasks introduce long-horizon mobile manipulation challenges in realistic settings, bridging the gap between current research and real-world, human-centric applications. This report presents our solution to the 2025 BEHAVIOR Challenge in a very close 2nd place and substantially outperforms the rest of the submissions. Building on $π_{0.5}$, we focus on systematically building our solution by studying the effects of training techniques and data. Through careful ablation studies, we reveal the scaling benefits in both the pre-training and post-training phases, leading to a validation Q-score of 0.345, significantly surpassing previous state-of-the-art performance. We summarize our practical lessons and design recommendations that we hope will provide actionable insights for the broader embodied AI community when adapting powerful foundation models to complex embodied scenarios. Project page: https://github.com/mli0603/openpi-comet
comment: Post-challenge bug fix
Multiagent Systems
Finite-State Decentralized Policy-Based Control With Guaranteed Ground Coverage
We propose a finite-state, decentralized decision and control framework for multi-agent ground coverage. The approach decomposes the problem into two coupled components: (i) the structural design of a deep neural network (DNN) induced by the reference configuration of the agents, and (ii) policy-based decentralized coverage control. Agents are classified as anchors and followers, yielding a generic and scalable communication architecture in which each follower interacts with exactly three in-neighbors from the preceding layer, forming an enclosing triangular communication structure. The DNN training weights implicitly encode the spatial configuration of the agent team, thereby providing a geometric representation of the environmental target set. Within this architecture, we formulate a computationally efficient decentralized Markov decision process (MDP) whose components are time-invariant except for a time-varying cost function defined by the deviation from the centroid of the target set contained within each agent communication triangle. By introducing the concept of Anyway Output Controllability (AOC), we assume each agent is AOC and establish decentralized convergence to a desired configuration that optimally represents the environmental target.
ARIES: A Scalable Multi-Agent Orchestration Framework for Real-Time Epidemiological Surveillance and Outbreak Monitoring
Global health surveillance is currently facing a challenge of Knowledge Gaps. While general-purpose AI has proliferated, it remains fundamentally unsuited for the high-stakes epidemiological domain due to chronic hallucinations and an inability to navigate specialized data silos. This paper introduces ARIES (Agentic Retrieval Intelligence for Epidemiological Surveillance), a specialized, autonomous multi-agent framework designed to move beyond static, disease-specific dashboards toward a dynamic intelligence ecosystem. Built on a hierarchical command structure, ARIES utilizes GPTs to orchestrate a scalable swarm of sub-agents capable of autonomously querying World Health Organization (WHO), Center for Disease Control and Prevention (CDC), and peer-reviewed research papers. By automating the extraction and logical synthesis of surveillance data, ARIES provides a specialized reasoning that identifies emergent threats and signal divergence in near real-time. This modular architecture proves that a task-specific agentic swarm can outperform generic models, offering a robust, extensible for next-generation outbreak response and global health intelligence.
comment: 6 pages, 14 figures, 1 table
Modellierung und Simulation der Dynamik von Fussgängerströmen
This work presents a microscopic model to describe pedestrian flows based on the social force theory. The aim of this study is twofold: (1) developing a realistic model that can be used as a tool for designing pedestrian-friendly infrastructure, and (2) verifying a social science theory using a model with sufficient data. The investigation of the pedestrian model shows that despite simple individual behavior patterns, complex spatial and temporal structures emerge through the interactions in pedestrian flows. Collective behavior emerges from individuals following two basic rules: (1) moving directly towards their goal at a certain speed, and (2) maintaining a distance to other pedestrians and obstacles. This self-organized collective behavior manifests itself as trails that are formed by pedestrians moving in one direction. Furthermore, strong dependencies of the properties of pedestrian flows on geometric forms of buildings are shown, and the influence of geometric changes on performance characteristics is investigated. An example demonstrates how efficiency can be increased by reducing walkable areas. This work also presents an evolutionary algorithm for optimizing building layouts based on the social force model. Additionally, a decision-making model is integrated to describe alternative goal selection, and adaptation and learning capabilities are included to improve pedestrian avoidance behavior and decision strategies based on accumulated experience. A method for determining load distributions in individual sections of a path system considering subjective selection criteria is also developed. Finally, a model that describes the self-organization of path systems with minimal detours is presented, similar to natural transport networks where total length and material costs are optimized.
comment: In german language. Submitted as doctoral dissertation at the University of Stuttgart
Stigmergic Swarming Agents for Fast Subgraph Isomorphism AAMAS 2026
Maximum partial subgraph isomorphism compares two graphs (nodes joined by edges) to find a largest common subgraph. A common use case, for graphs with labeled nodes, seeks to find instances of a \textit{query} graph with $q$ nodes in a (typically larger) \textit{data} graph with $d$ nodes. The problem is NP-complete, and naïve solutions are exponential in $q + d$. The fastest current heuristic has complexity $O(d^2)$. This paper outlines ASSIST (Approximate Swarming Subgraph Isomorphism through Stigmergy), inspired by the ant colony optimization approach to the traveling salesperson. After peering (identifying matching individual nodes in query and data) in time $O(q\cdot log(d))$, the time required for ASSIST's iterative subgraph search, the combinatorially complex part of the problem, is linear in query size and constant in data size. ASSIST can be extended to support matching problems (such as temporally ordered edges, inexact matches, and missing nodes or edges in the data graph) that frustrate other heuristics.
comment: Accepted as full paper for AAMAS 2026 (May 2026)
μACP: A Formal Calculus for Expressive, Resource-Constrained Agent Communication AAMAS 2026
Agent communication remains a foundational problem in multi-agent systems: protocols such as FIPA-ACL guarantee semantic richness but are intractable for constrained environments, while lightweight IoT protocols achieve efficiency at the expense of expressiveness. This paper presents $μ$ACP, a formal calculus for expressive agent communication under explicit resource bounds. We formalize the Resource-Constrained Agent Communication (RCAC) model, prove that a minimal four-verb basis \textit{\{PING, TELL, ASK, OBSERVE\}} is suffices to encode finite-state FIPA protocols, and establish tight information-theoretic bounds on message complexity. We further show that $μ$ACP can implement standard consensus under partial synchrony and crash faults, yielding a constructive coordination framework for edge-native agents. Formal verification in TLA$^{+}$ (model checking) and Coq (mechanized invariants) establishes safety and boundedness, and supports liveness under modeled assumptions. Large-scale system simulations confirm ACP achieves a median end-to-end message latency of 34 ms (95th percentile 104 ms) at scale, outperforming prior agent and IoT protocols under severe resource constraints. The main contribution is a unified calculus that reconciles semantic expressiveness with provable efficiency, providing a rigorous foundation for the next generation of resource-constrained multi-agent systems.
comment: Accepted as a Full paper at AAMAS 2026
Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles ECAI 2025
For solving zero-sum games involving non-transitivity, a useful approach is to maintain a policy population to approximate the Nash Equilibrium (NE). Previous studies have shown that the Policy Space Response Oracles (PSRO) algorithm is an effective framework for solving such games. However, current methods initialize a new policy from scratch or inherit a single historical policy in Best Response (BR), missing the opportunity to leverage past policies to generate a better BR. In this paper, we propose Fusion-PSRO, which employs Nash Policy Fusion to initialize a new policy for BR training. Nash Policy Fusion serves as an implicit guiding policy that starts exploration on the current Meta-NE, thus providing a closer approximation to BR. Moreover, it insightfully captures a weighted moving average of past policies, dynamically adjusting these weights based on the Meta-NE in each iteration. This cumulative process further enhances the policy population. Empirical results on classic benchmarks show that Fusion-PSRO achieves lower exploitability, thereby mitigating the shortcomings of previous research on policy initialization in BR.
comment: Accepted by ECAI 2025
CEE: An Inference-Time Jailbreak Defense for Embodied Intelligence via Subspace Concept Rotation
Large language models (LLMs) are widely used for task understanding and action planning in embodied intelligence (EI) systems, but their adoption substantially increases vulnerability to jailbreak attacks. While recent work explores inference-time defenses, existing methods rely on static interventions on intermediate representations, which often degrade generation quality and impair adherence to task instructions, reducing system usability in EI settings. We propose a dynamic defense framework. For each EI inference request, we dynamically construct a task-specific safety-semantic subspace, project its hidden state to the most relevant direction, and apply SLERP rotation for adaptive safety control. At comparable defense success rates, our method preserves generation quality, improves usability, reduces tuning cost, and strengthens robustness in EI scenarios.
When Identity Skews Debate: Anonymization for Bias-Reduced Multi-Agent Reasoning
Multi-agent debate (MAD) aims to improve large language model (LLM) reasoning by letting multiple agents exchange answers and then aggregate their opinions. Yet recent studies reveal that agents are not neutral: they are prone to identity-driven sycophancy and self-bias, uncritically adopting a peer's view or stubbornly adhering to their own prior output, undermining the reliability of debate. In this work, we present the first principled framework that joins sycophancy and self-bias to mitigate and quantify identity bias in MAD. First, we formalize the debate dynamics as an identity-weighted Bayesian update process. Second, we propose response anonymization: by removing identity markers from prompts, agents cannot distinguish "self" from "peer", which forces equal weights on agent identity, thereby reducing bias and improving trustworthiness. Third, we define the Identity Bias Coefficient (IBC), a principled bias metric that measures an agent's tendency to follow its peer versus itself. Empirical studies across multiple models and benchmarks confirm that identity bias is widespread, with sycophancy far more common than self-bias. Our findings highlight the need to ensure that MAD systems reason based on content rather than identity. Code is released in https://github.com/deeplearning-wisc/MAD-identity-bias.
Large Language Models can Achieve Social Balance
Large Language Models (LLMs) can be deployed in situations where they process positive/negative interactions with other agents. We study how this is done under the sociological framework of social balance, which explains the emergence of one faction or multiple antagonistic ones among agents. Across different LLM models, we find that balance depends on the (i) type of interaction, (ii) update mechanism, and (iii) population size. Across (i)-(iii), we characterize the frequency at which social balance is achieved, the justifications for the social dynamics, and the diversity and stability of interactions. Finally, we explain how our findings inform the deployment of agentic systems.
Systems and Control (EESS)
Multi-mode Fault Diagnosis Datasets of Three-phase Asynchronous Motor Under Variable Working Conditions
Three-phase asynchronous motor are fundamental components in industrial systems, and their failure can lead to significant operational downtime and economic losses. Vibration and current signals are effective indicators for monitoring motor health and diagnosing faults. However, motors in real applications often operate under variable conditions such as fluctuating speeds and loads, which complicate the fault diagnosis process. This paper presents a comprehensive dataset collected from a three-phase asynchronous motor under various fault types and severities, operating under diverse speed and load conditions. The dataset includes both single faults and mechanical-electrical compound faults, such as rotor unbalance, stator winding short circuits, bearing faults, and their combinations. Data were acquired under both steady and transitional conditions, with signals including triaxial vibration, three-phase currents, torque, and key-phase signals. This dataset supports the development and validation of robust fault diagnosis methods for electric motors under realistic operating conditions.
comment: 12 pages, 9 figures
Machine Learning Guided Cooling Optimization for Data Centers
Effective data center cooling is crucial for reliable operation; however, cooling systems often exhibit inefficiencies that result in excessive energy consumption. This paper presents a three-stage, physics-guided machine learning framework for identifying and reducing cooling energy waste in high-performance computing facilities. Using one year of 10-minute resolution operational data from the Frontier exascale supercomputer, we first train a monotonicity-constrained gradient boosting surrogate that predicts facility accessory power from coolant flow rates, temperatures, and server power. The surrogate achieves a mean absolute error of 0.026 MW and predicts power usage effectiveness within 0.01 of measured values for 98.7% of test samples. In the second stage, the surrogate serves as a physics-consistent baseline to quantify excess cooling energy, revealing approximately 85 MWh of annual inefficiency concentrated in specific months, hours, and operating regimes. The third stage evaluates guardrail-constrained counterfactual adjustments to supply temperature and subloop flows, demonstrating that up to 96% of identified excess can be recovered through small, safe setpoint changes while respecting thermal limits and operational constraints. The framework yields interpretable recommendations, supports counterfactual analyses such as flow reduction during low-load periods and redistribution of thermal duty across cooling loops, and provides a practical pathway toward quantifiable reductions in accessory power. The developed framework is readily compatible with model predictive control and can be extended to other liquid-cooled data centers with different configurations and cooling requirements.
comment: 10 pages, 11 figures
Characterizing All Locally Exponentially Stabilizing Controllers as a Linear Feedback Plus Learnable Nonlinear Youla Dynamics
We derive a state-space characterization of all dynamic state-feedback controllers that make an equilibrium of a nonlinear input-affine continuous-time system locally exponentially stable. Specirically, any controller obtained as the sum of a linear state-feedback $u=Kx$, with $K$ stabilizing the linearized system, and the output of internal locally exponentially stable controller dynamics is itself locally exponentially stabilizing. Conversely, every dynamic state-feedback controller that locally exponentially stabilizes the equilibrium admits such a decomposition. The result can be viewed as a state-space nonlinear Youla-type parametrization specialized to local, rather than global, and exponential, rather than asymptotic, closed-loop stability. The residual locally exponentially stable controller dynamics can be implemented with stable recurrent neural networks and trained as neural ODEs to achieve high closed-loop performance in nonlinear control tasks.
Optimal Dispatch of Electricity and Water in Renewable-Integrated Desalination Plants
We develop a mathematical framework for the optimal dispatch of flexible water desalination plants (WDPs) as hybrid generator-load resources. WDPs integrate thermal generation, membrane-based controllable loads, and renewable energy sources, offering unique operational flexibility for power system operations. They can simultaneously participate in two markets: selling desalinated water to a water utility, and bidirectionally transacting electricity with the grid based on their net electricity demand. We formulate the dispatch decision problem of a profit-maximizing WDP, capturing operational, technological, and market-based coupling between water and electricity flows. The threshold-based structure we derive provides computationally tractable coordination suitable for large-scale deployment, offering operational insights into how thermal generation and membrane-based loads complementarily provide continuous bidirectional flexibility. The thresholds are analytically characterized in closed form as explicit functions of technology and tariff parameters. We examine how small changes in the exogenous tariff and technology parameters affect the WDP's profit. Extensive simulations illustrate the optimal WDP's operation, profit, and water-electricity exchange, demonstrating significant improvements relative to benchmark algorithms.
comment: 14 pages, 7 figures, 1 table
Finite-State Decentralized Policy-Based Control With Guaranteed Ground Coverage
We propose a finite-state, decentralized decision and control framework for multi-agent ground coverage. The approach decomposes the problem into two coupled components: (i) the structural design of a deep neural network (DNN) induced by the reference configuration of the agents, and (ii) policy-based decentralized coverage control. Agents are classified as anchors and followers, yielding a generic and scalable communication architecture in which each follower interacts with exactly three in-neighbors from the preceding layer, forming an enclosing triangular communication structure. The DNN training weights implicitly encode the spatial configuration of the agent team, thereby providing a geometric representation of the environmental target set. Within this architecture, we formulate a computationally efficient decentralized Markov decision process (MDP) whose components are time-invariant except for a time-varying cost function defined by the deviation from the centroid of the target set contained within each agent communication triangle. By introducing the concept of Anyway Output Controllability (AOC), we assume each agent is AOC and establish decentralized convergence to a desired configuration that optimally represents the environmental target.
Ageing Monitoring for Commercial Microcontrollers Based on Timing Windows
Microcontrollers are increasingly present in embedded deployments and dependable applications, for which malfunctions due to hardware ageing can have severe impact. The lack of deployable techniques for ageing monitoring on these devices has spread the application of guard bands to prevent timing errors due to degradation. Applying this static technique can limit performance and lead to sudden failures as devices age. In this paper, we follow a software-based self-testing approach to design monitoring of hardware degradation for microcontrollers. Deployable in the field, our technique leverages timing windows of variable lengths to determine the maximum operational frequency of the devices. We empirically validate the method on real hardware and find that it consistently detects temperature-induced degradations in maximum operating frequency of up to 13.79 % across devices for 60 °C temperature increase.
Asymptotic Behavior of an Unforced Duhem-Type Hysteretic Oscillator
The article describes fundamental analytical properties of an unforced mechanical oscillator with a Duhem-type viscoelastoplastic hysteretic element. These properties include global existence of solutions, uniqueness of solutions, and convergence of each solution to an equilibrium point.
comment: 8 pages
Policy Optimization with Differentiable MPC: Convergence Analysis under Uncertainty
Model-based policy optimization is a well-established framework for designing reliable and high-performance controllers across a wide range of control applications. Recently, this approach has been extended to model predictive control policies, where explicit dynamical models are embedded within the control law. However, the performance of the resulting controllers, and the convergence of the associated optimization algorithms, critically depends on the accuracy of the models. In this paper, we demonstrate that combining gradient-based policy optimization with recursive system identification ensures convergence to an optimal controller design and showcase our finding in several control examples.
Distributed Federated Learning by Alternating Periods of Training
Federated learning is a privacy-focused approach towards machine learning where models are trained on client devices with locally available data and aggregated at a central server. However, the dependence on a single central server is challenging in the case of a large number of clients and even poses the risk of a single point of failure. To address these critical limitations of scalability and fault-tolerance, we present a distributed approach to federated learning comprising multiple servers with inter-server communication capabilities. While providing a fully decentralized approach, the designed framework retains the core federated learning structure where each server is associated with a disjoint set of clients with server-client communication capabilities. We propose a novel DFL (Distributed Federated Learning) algorithm which uses alternating periods of local training on the client data followed by global training among servers. We show that the DFL algorithm, under a suitable choice of parameters, ensures that all the servers converge to a common model value within a small tolerance of the ideal model, thus exhibiting effective integration of local and global training models. Finally, we illustrate our theoretical claims through numerical simulations.
A Survey on Applications of Quantum Computing for Unit Commitment
Unit Commitment (UC) is a core optimization problem in power system operation and electricity market scheduling. It determines the optimal on/off status and dispatch of generating units while satisfying system, operational, and market constraints. Traditionally, UC has been solved using mixed-integer programming, dynamic programming, or metaheuristic methods, all of which face scalability challenges as systems grow in size and uncertainty. Recent advances in quantum computing, spanning quantum annealing, variational algorithms, and hybrid quantum classical optimization, have opened new opportunities to accelerate UC solution processes by exploiting quantum parallelism and entanglement. This paper presents a comprehensive survey of existing research on the applications of quantum computing for solving the UC problem. The reviewed works are categorized based on the employed quantum paradigms, including annealing-based, variational hybrid, quantum machine learning, and quantum-inspired methods. Key modeling strategies, hardware implementations, and computational trade-offs are discussed, highlighting the current progress, limitations, and potential future directions for large-scale quantum-enabled UC.
EdgeSSVEP: A Fully Embedded SSVEP BCI Platform for Low-Power Real-Time Applications
Brain-Computer Interfaces (BCIs) enable users to interact with machines directly via neural activity, yet their real-world deployment is often hindered by bulky and powerhungry hardware. We present EdgeSSVEP, a fully embedded microcontroller-based Steady-State Visually Evoked Potential (SSVEP) BCI platform that performs real-time EEG acquisition, zero-phase filtering, and on-device classification within a lowpower 240 MHz MCU operating at only 222 mW. The system incorporates an 8-channel EEG front end, supports 5-second stimulus durations, and executes the entire SSVEP decoding pipeline locally, eliminating dependence on PC-based processing. EdgeSSVEP was evaluated using six stimulus frequencies (7, 8, 9, 11, 7.5, and 8.5 Hz) with 10 participants. The device achieved 99.17% classification accuracy and 27.33 bits/min Information Transfer Rate (ITR), while consuming substantially less power than conventional desktop-based systems. The system integrates motion sensing to support artifact detection and improve robustness and signal stability in practical environments. For development and debugging, the system also provides optional TCP data streaming to external clients. Overall, EdgeSSVEP offers a scalable, energy-efficient, and secure embedded BCI platform suitable for assistive communication and neurofeedback applications, with potential extensions to accelerometer-based artifact mitigation and broader real-world deployments.
Simulations and Advancements in MRI-Guided Power-Driven Ferric Tools for Wireless Therapeutic Interventions
Designing a robotic system that functions effectively within the specific environment of a Magnetic Resonance Imaging (MRI) scanner requires solving numerous technical issues, such as maintaining the robot's precision and stability under strong magnetic fields. This research focuses on enhancing MRI's role in medical imaging, especially in its application to guide intravascular interventions using robot-assisted devices. A newly developed computational system is introduced, designed for seamless integration with the MRI scanner, including a computational unit and user interface. This system processes MR images to delineate the vascular network, establishing virtual paths and boundaries within vessels to prevent procedural damage. Key findings reveal the system's capability to create tailored magnetic field gradient patterns for device control, considering the vessel's geometry and safety norms, and adapting to different blood flow characteristics for finer navigation. Additionally, the system's modeling aspect assesses the safety and feasibility of navigating pre-set vascular paths. Conclusively, this system, based on the Qt framework and C/C++, with specialized software modules, represents a major step forward in merging imaging technology with robotic aid, significantly enhancing precision and safety in intravascular procedures.
comment: 10 pages, 7 figures
Extremum Seeking Control for Wave-PDE Actuation with Distributed Effects
This paper deals with the gradient-based extremum seeking control (ESC) with actuation dynamics governed by distributed wave partial differential equations (PDEs). To achieve the control objective of real-time optimization for this class of infinite-dimensional systems, we first solve the trajectory generation problem to re-design the additive perturbation signal of the ESC system. Then, we develop a boundary control law through the backstepping method to compensate for the wave PDE with distributed effects, which ensures the exponential stability of the average closed-loop system by means of a Lyapunov-based analysis. At last, by employing the averaging theory for infinite-dimensional systems, we prove that the closed-loop trajectories converge to a small neighborhood surrounding the optimal point. Numerical simulations are presented to illustrate the effectiveness of the proposed method.
comment: 10 pages, 4 figures
AI Social Responsibility as Reachability: Execution-Level Semantics for the Social Responsibility Stack
Artificial intelligence systems are increasingly embedded as persistent, closed-loop components within cyber-physical, social, and institutional processes. Rather than producing isolated outputs, such systems operate continuously under feedback, adaptation, and scale, reshaping physical flows, human behavior, and institutional practice over time. In these settings, socially unacceptable outcomes rarely arise from singular faults or explicit policy violations. Instead, they emerge through cumulative execution trajectories enabled by repetition, concurrency, and feedback. This paper advances the formal foundation of the Social Responsibility Stack (SRS) by making its central requirement explicit: responsibility is fundamentally a reachability property of system execution. A system is responsible iff its execution semantics prevent entry into inadmissible global configurations, regardless of local performance gains or optimization objectives. Responsibility failures are therefore not objective-level errors, but execution-level failures of trajectory control. To operationalize this perspective, we introduce Petri nets as an execution-level formalism for responsible autonomous systems. We show how SRS value commitments correspond to forbidden markings, safeguards to structural constraints on transition firing, auditing to monitoring of reachability pressure, and governance to legitimate modification of execution structure. Embedding Petri-net reachability within the SRS architecture internalizes responsibility as a structural invariant rather than an external objective or post-hoc mechanism. These results establish the Social Responsibility Stack as an executable responsibility architecture and position reachability-based execution semantics as a necessary foundation for responsible autonomy in feedback-rich cyber-physical and socio-technical systems.
O-DSS: An Open Dynamic Spectrum Sharing Framework for Cellular-Radar Coexistence in Mid-band Frequencies
The growing demand for mid-band spectrum necessitates efficient Dynamic Spectrum Sharing (DSS) to ensure coexistence between cellular networks and incumbent radar systems. Existing Spectrum Access System (SAS) frameworks rely on fixed Environmental Sensing Capability (ESC) sensors, which are latency-prone and inflexible. This paper introduces O-DSS, an O-RAN-compliant, Machine Learning (ML)-driven DSS framework that enables real-time cellular-radar coexistence in mid-band frequencies with shipborne and fast-moving airborne radars. O-DSS integrates radar detection from low-overhead Key Performance Metrics (KPMs) with spectrogram-based localization to drive fine-grained RAN control, including PRB blanking and radar-aware MCS adaptation. Deployed as a modular xApp, O-DSS achieves 60~ms detection and 700~ms evacuation latencies, outperforming existing baselines. Evaluations across simulations and Over-The-Air (OTA) testbeds show that O-DSS ensures robust incumbent protection while maintaining cellular performance by achieving radar detection of $\geq 99\%$ at SINR $\geq -4$~dB and localization recall of $\geq 95\%$ at SINR $\geq 8$~dB.
comment: Paper has been accepted to INFOCOM 2026
AMC26: High-performance DOb for robust position control
This paper presents a new HPDOb that significantly improves disturbance estimation accuracy and robustness in motion control systems, surpassing the capabilities of conventional DObs. The proposed observer is analysed and synthesised in the discrete-time domain, providing a realistic representation of their dynamic behaviour and enabling enhanced controller design for practical applications. The core contribution of the HPDOb is a novel synthesis method that incorporates higher-order truncation error dynamics into disturbance estimation. Unlike conventional DObs, which are limited to zero-order truncation error, the HPDOb achieves first-order truncation error, yielding markedly improved estimation accuracy and robustness against disturbances in motion control systems. Simulation and experiments verify the stability and performance of HPDOb.
AMC26: VSSEA robust position control
This paper presents robust position control strategies for the novel VSSEA. By employing a constructed state-space model, two control schemes are developed in a unified framework: a state-feedback controller and a sliding mode controller, both integrated with a second-order DOb. The proposed framework achieves high-performance motion control by precisely estimating and compensating for internal and external disturbances, while preserving the nominal dynamic response. Simulation results demonstrate that pole-placement-based controllers are highly sensitive to disturbances, whereas LQR-based controllers offer improved robustness at the expense of slower dynamics. By incorporating DOb, robustness is significantly enhanced without degrading time response, and the LQR controller can be tuned solely for performance optimization. Experimental results confirm that the proposed robust position controllers can be implemented in real world applications. These results highlight the effectiveness of the proposed approach and lay the foundation for future investigations on robust stability and performance under different stiffness settings.
Improving Action Smoothness for a Cascaded Online Learning Flight Control System
This paper aims to improve the action smoothness of a cascaded online learning flight control system. Although the cascaded structure is widely used in flight control design, its stability can be compromised by oscillatory control actions, which poses challenges for practical engineering applications. To address this issue, we introduce an online temporal smoothness technique and a low-pass filter to reduce the amplitude and frequency of the control actions. Fast Fourier Transform (FFT) is used to analyze policy performance in the frequency domain. Simulation results demonstrate the improvements achieved by the two proposed techniques.
Dynamic Input Mapping Inversion to Eliminate Algebraic Loops in Hydraulic Actuator Control
The application of nonlinear control schemes to electro-hydraulic actuators often requires several alterations in the design of the controllers during their implementation. This is to overcome challenges that frequently arise in such control algorithms owing to model nonlinearities. Moreover, advanced control solutions for this type of systems often introduce input algebraic loops that pose significant design and tuning difficulties. Conventional methods to avoid such loops introduce chatter, which considerably degrade tracking performance and has oil degradation and wear as side effects. This study presents a nonlinear control architecture for hydraulic actuators that comprises low-complexity modules that facilitate robust high performance in tracking and avoids the drawbacks of chatter. The salient feature is a dynamic input-mapping inversion module that avoids algebraic loops in the control input and is followed by dedicated position control. The stability of the closed-loop system is analyzed using arguments from Lyapunov theory for cascaded non-autonomous nonlinear systems. The effectiveness of the proposed solution is evaluated on a high-fidelity simulator of a wind turbine pitch system, and validated on a full-scale laboratory setup that includes a hydraulic pitch system and blade bearing. Appropriate quantitative metrics are used to evaluate the closed-loop system performance in comparison to a state-of-the-art nonlinear design.
comment: Author version of an article published in IEEE Transactions on Control Systems Technology. The final version is available via DOI
Pricing Short-Circuit Current via a Primal-Dual Formulation for Preserving Integrality Constraints SC
Synchronous Generators (SGs) currently provide important levels of Short-Circuit Current (SCC), a critical ancillary service that ensures line protections trip during short-circuit faults. Given the ongoing replacement of SGs by power-electronics-based generation, which have a hard limit for current injection, it has become relevant to optimize the procurement of SCC provided by remaining SGs. Pricing this service is however challenging due to the integrality constraints in Unit Commitment (UC). Existing methods, e.g., dispatchable pricing, restricted pricing and marginal unit pricing, attempt to address this issue but exhibit limitations in handling binary variables, resulting in SCC prices that either fail to cover the operating costs of units or lack interpretability. To overcome these pitfalls, we propose a primal-dual formulation of the SCC-constrained dispatch that preserves the binary nature of UC while effectively computing shadow prices of SCC services. Using a modified IEEE 30-bus system, a comparison is carried out between the proposed approach and the state-of-the-art pricing schemes, highlighting the advantages of the primal-dual method in preserving UC integrality for SCC pricing.
comment: In the primal-dual pricing method used in the paper, its dual problem is actually dual to dispatchable pricing (except that the dispatchable method discards the nonlinear terms in the SCC constraint). Therefore, there is no essential difference between the two pricing methods
Distributed Koopman Operator Learning for Perception and Safe Navigation
This paper presents a unified and scalable framework for predictive and safe autonomous navigation in dynamic transportation environments by integrating model predictive control (MPC) with distributed Koopman operator learning. High-dimensional sensory data are employed to model and forecast the motion of surrounding dynamic obstacles. A consensus-based distributed Koopman learning algorithm enables multiple computational agents or sensing units to collaboratively estimate the Koopman operator without centralized data aggregation, thereby supporting large-scale and communication-efficient learning across a networked system. The learned operator predicts future spatial densities of obstacles, which are subsequently represented through Gaussian mixture models. Their confidence ellipses are approximated by convex polytopes and embedded as linear constraints in the MPC formulation to guarantee safe and collision-free navigation. The proposed approach not only ensures obstacle avoidance but also scales efficiently with the number of sensing or computational nodes, aligning with cooperative perception principles in autonomous navigation applications. Theoretical convergence guarantees and predictive constraint formulations are established, and extensive simulations demonstrate reliable, safe, and computationally efficient navigation performance in complex environments.
AdGT: Decentralized Gradient Tracking with Tuning-free Per-Agent Stepsize
In decentralized optimization, the choice of stepsize plays a critical role in algorithm performance. A common approach is to use a shared stepsize across all agents to ensure convergence. However, selecting an optimal stepsize often requires careful tuning, which can be time-consuming and may lead to slow convergence, especially when there is significant variation in the smoothness (L-smoothness) of local objective functions across agents. Individually tuning stepsizes per agent is also impractical, particularly in large-scale networks. To address these limitations, we propose AdGT, an adaptive gradient tracking method that enables each agent to adjust its stepsize based on the smoothness of its local objective. We prove that AdGT achieves linear convergence to the global optimal solution. Through numerical experiments, we compare AdGT with fixed-stepsize gradient tracking methods and demonstrate its superior performance. Additionally, we compare AdGT with adaptive gradient descent (AdGD) in a centralized setting and observe that fully adaptive stepsizes offer greater benefits in decentralized networks than in centralized ones.
Convergence Filters for Efficient Economic MPC of Non-dissipative Systems
This note presents a novel and efficient Economic Model Predictive Control (EMPC) scheme specifically designed for non-dissipative systems subject to state and input constraints. To address the stability challenge of EMPC for constrained non-dissipative systems, a new concept of convergence filters is introduced. Three alternative convergence filters are designed accordingly to be incorporated into the receding horizon optimization problem of EMPC. To improve online computational efficiency, the variable horizon approach without explicit terminal state constraints is adopted. This design allows for a flexible trade-off among convergence speed, economic performance, and computational burden via simple parameter adjustment. Moreover, sufficient conditions are rigorously derived to guarantee recursive feasibility and stability. The advantages of the proposed EMPC are validated through simulations on a classical non-dissipative continuous stirred-tank reactor.
comment: Revised version with updated author list and minor text corrections
Sufficient and Necessary Barrier-like Conditions for Safety and Reach-avoid Verification of Stochastic Discrete-time Systems
This paper investigates necessary and sufficient barrier-like conditions for infinite-horizon safety and reach-avoid verification of stochastic discrete-time systems, derived via a relaxation of the Bellman equations. Unlike prior approaches that primarily focus on sufficient conditions, our work rigorously establishes both necessity and sufficiency for infinite-horizon properties. Safety verification concerns certifying that, starting from a given initial state, the system remains within a safe set at all future time steps with probability at least equal to a specified threshold. For this purpose, we formulate a necessary and sufficient barrier-like condition that captures this infinite-time safety property. In contrast, reach-avoid verification generalizes safety verification by also incorporating reachability. Specifically, it aims to ensure that the probability of the system, starting from a given initial state, eventually reaching a target set while remaining within the safe set until the first hit of the target is no less than a prescribed bound. Under suitable assumptions, we establish two necessary and sufficient barrier-like conditions for this reach-avoid specification.
comment: This paper has been accepted for publication in the journal Automatica
Robust Control Design and Analysis Based on Lifting Linearization of Nonlinear Systems Under Uncertain Initial Conditions
This paper presents a robust control synthesis and analysis framework for nonlinear systems with uncertain initial conditions. First, a deep learning-based lifting approach is proposed to approximate nonlinear dynamical systems with linear parameter-varying (LPV) state-space models in higher-dimensional spaces while simultaneously characterizing the uncertain initial states within the lifted state space. Then, convex synthesis conditions are provided to generate full-state feedback nonstationary LPV (NSLPV) controllers for the lifted LPV system. A performance measure similar to the l2-induced norm is used to provide robust performance guarantees in the presence of exogenous disturbances and uncertain initial conditions. The paper also includes results for synthesizing full-state feedback linear time-invariant controllers and output feedback NSLPV controllers. Additionally, a robustness analysis approach based on integral quadratic constraint (IQC) theory is developed to analyze and tune the synthesized controllers while accounting for noise associated with state measurements. This analysis approach characterizes model parameters and disturbance inputs using IQCs to reduce conservatism. Finally, the effectiveness of the proposed framework is demonstrated through two illustrative examples.
comment: Published in the International Journal of Robust and Nonlinear Control. This version incorporates minor revisions from peer review, including a slightly modified title
Myopically Verifiable Probabilistic Certificates for Safe Control and Learning
This paper addresses the design of safety certificates for stochastic systems, with a focus on ensuring long-term safety through fast real-time control. In stochastic environments, set invariance-based methods that restrict the probability of risk events in infinitesimal time intervals may exhibit significant long-term risks due to cumulative uncertainties/risks. On the other hand, reachability-based approaches that account for the long-term future may require prohibitive computation in real-time decision making. To overcome this challenge involving stringent long-term safety vs. computation tradeoffs, we first introduce a novel technique termed 'probabilistic invariance'. This technique characterizes the invariance conditions of the probability of interest. When the target probability is defined using long-term trajectories, this technique can be used to design myopic conditions/controllers with assured long-term safe probability. Then, we integrate this technique into safe control and learning. The proposed control methods efficiently assure long-term safety using neural networks or model predictive controllers with short outlook horizons. The proposed learning methods can be used to guarantee long-term safety during and after training. Finally, we demonstrate the performance of the proposed techniques in numerical simulations.
Robotics
VisuoTactile 6D Pose Estimation of an In-Hand Object using Vision and Tactile Sensor Data ICRA 2022
Knowledge of the 6D pose of an object can benefit in-hand object manipulation. In-hand 6D object pose estimation is challenging because of heavy occlusion produced by the robot's grippers, which can have an adverse effect on methods that rely on vision data only. Many robots are equipped with tactile sensors at their fingertips that could be used to complement vision data. In this paper, we present a method that uses both tactile and vision data to estimate the pose of an object grasped in a robot's hand. To address challenges like lack of standard representation for tactile data and sensor fusion, we propose the use of point clouds to represent object surfaces in contact with the tactile sensor and present a network architecture based on pixel-wise dense fusion. We also extend NVIDIA's Deep Learning Dataset Synthesizer to produce synthetic photo-realistic vision data and corresponding tactile point clouds. Results suggest that using tactile data in addition to vision data improves the 6D pose estimate, and our network generalizes successfully from synthetic training to real physical robots.
comment: Accepted for publication in IEEE Robotics and Automation Letters (RA-L), January 2022. Presented at ICRA 2022. This is the author's version of the manuscript
DemoBot: Efficient Learning of Bimanual Manipulation with Dexterous Hands From Third-Person Human Videos
This work presents DemoBot, a learning framework that enables a dual-arm, multi-finger robotic system to acquire complex manipulation skills from a single unannotated RGB-D video demonstration. The method extracts structured motion trajectories of both hands and objects from raw video data. These trajectories serve as motion priors for a novel reinforcement learning (RL) pipeline that learns to refine them through contact-rich interactions, thereby eliminating the need to learn from scratch. To address the challenge of learning long-horizon manipulation skills, we introduce: (1) Temporal-segment based RL to enforce temporal alignment of the current state with demonstrations; (2) Success-Gated Reset strategy to balance the refinement of readily acquired skills and the exploration of subsequent task stages; and (3) Event-Driven Reward curriculum with adaptive thresholding to guide the RL learning of high-precision manipulation. The novel video processing and RL framework successfully achieved long-horizon synchronous and asynchronous bimanual assembly tasks, offering a scalable approach for direct skill acquisition from human videos.
Action-Sketcher: From Reasoning to Action via Visual Sketches for Long-Horizon Robotic Manipulation
Long-horizon robotic manipulation is increasingly important for real-world deployment, requiring spatial disambiguation in complex layouts and temporal resilience under dynamic interaction. However, existing end-to-end and hierarchical Vision-Language-Action (VLA) policies often rely on text-only cues while keeping plan intent latent, which undermines referential grounding in cluttered or underspecified scenes, impedes effective task decomposition of long-horizon goals with close-loop interaction, and limits causal explanation by obscuring the rationale behind action choices. To address these issues, we first introduce Visual Sketch, an implausible visual intermediate that renders points, boxes, arrows, and typed relations in the robot's current views to externalize spatial intent, connect language to scene geometry. Building on Visual Sketch, we present Action-Sketcher, a VLA framework that operates in a cyclic See-Think-Sketch-Act workflow coordinated by adaptive token-gated strategy for reasoning triggers, sketch revision, and action issuance, thereby supporting reactive corrections and human interaction while preserving real-time action prediction. To enable scalable training and evaluation, we curate diverse corpus with interleaved images, text, Visual Sketch supervision, and action sequences, and train Action-Sketcher with a multi-stage curriculum recipe that combines interleaved sequence alignment for modality unification, language-to-sketch consistency for precise linguistic grounding, and imitation learning augmented with sketch-to-action reinforcement for robustness. Extensive experiments on cluttered scenes and multi-object tasks, in simulation and on real-world tasks, show improved long-horizon success, stronger robustness to dynamic scene changes, and enhanced interpretability via editable sketches and step-wise plans. Project website: https://action-sketcher.github.io
comment: 26 pages, 14 figures
HanoiWorld : A Joint Embedding Predictive Architecture BasedWorld Model for Autonomous Vehicle Controller
Current attempts of Reinforcement Learning for Autonomous Controller are data-demanding while the results are under-performed, unstable, and unable to grasp and anchor on the concept of safety, and over-concentrating on noise features due to the nature of pixel reconstruction. While current Self-Supervised Learningapproachs that learning on high-dimensional representations by leveraging the JointEmbedding Predictive Architecture (JEPA) are interesting and an effective alternative, as the idea mimics the natural ability of the human brain in acquiring new skill usingimagination and minimal samples of observations. This study introduces Hanoi-World, a JEPA-based world model that using recurrent neural network (RNN) formaking longterm horizontal planning with effective inference time. Experimentsconducted on the Highway-Env package with difference enviroment showcase the effective capability of making a driving plan while safety-awareness, with considerablecollision rate in comparison with SOTA baselines
AIMS: An Adaptive Integration of Multi-Sensor Measurements for Quadrupedal Robot Localization
This paper addresses the problem of accurate localization for quadrupedal robots operating in narrow tunnel-like environments. Due to the long and homogeneous characteristics of such scenarios, LiDAR measurements often provide weak geometric constraints, making traditional sensor fusion methods susceptible to accumulated motion estimation errors. To address these challenges, we propose AIMS, an adaptive LiDAR-IMU-leg odometry fusion method for robust quadrupedal robot localization in degenerate environments. The proposed method is formulated within an error-state Kalman filtering framework, where LiDAR and leg odometry measurements are integrated with IMU-based state prediction, and measurement noise covariance matrices are adaptively adjusted based on online degeneracy-aware reliability assessment. Experimental results obtained in narrow corridor environments demonstrate that the proposed method improves localization accuracy and robustness compared with state-of-the-art approaches.
DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving
Video generation models, as one form of world models, have emerged as one of the most exciting frontiers in AI, promising agents the ability to imagine the future by modeling the temporal evolution of complex scenes. In autonomous driving, this vision gives rise to driving world models: generative simulators that imagine ego and agent futures, enabling scalable simulation, safe testing of corner cases, and rich synthetic data generation. Yet, despite fast-growing research activity, the field lacks a rigorous benchmark to measure progress and guide priorities. Existing evaluations remain limited: generic video metrics overlook safety-critical imaging factors; trajectory plausibility is rarely quantified; temporal and agent-level consistency is neglected; and controllability with respect to ego conditioning is ignored. Moreover, current datasets fail to cover the diversity of conditions required for real-world deployment. To address these gaps, we present DrivingGen, the first comprehensive benchmark for generative driving world models. DrivingGen combines a diverse evaluation dataset curated from both driving datasets and internet-scale video sources, spanning varied weather, time of day, geographic regions, and complex maneuvers, with a suite of new metrics that jointly assess visual realism, trajectory plausibility, temporal coherence, and controllability. Benchmarking 14 state-of-the-art models reveals clear trade-offs: general models look better but break physics, while driving-specific ones capture motion realistically but lag in visual quality. DrivingGen offers a unified evaluation framework to foster reliable, controllable, and deployable driving world models, enabling scalable simulation, planning, and data-driven decision-making.
comment: 10 pages, 4 figures; Project Website: https://drivinggen-bench.github.io/
Online Estimation and Manipulation of Articulated Objects
From refrigerators to kitchen drawers, humans interact with articulated objects effortlessly every day while completing household chores. For automating these tasks, service robots must be capable of manipulating arbitrary articulated objects. Recent deep learning methods have been shown to predict valuable priors on the affordance of articulated objects from vision. In contrast, many other works estimate object articulations by observing the articulation motion, but this requires the robot to already be capable of manipulating the object. In this article, we propose a novel approach combining these methods by using a factor graph for online estimation of articulation which fuses learned visual priors and proprioceptive sensing during interaction into an analytical model of articulation based on Screw Theory. With our method, a robotic system makes an initial prediction of articulation from vision before touching the object, and then quickly updates the estimate from kinematic and force sensing during manipulation. We evaluate our method extensively in both simulations and real-world robotic manipulation experiments. We demonstrate several closed-loop estimation and manipulation experiments in which the robot was capable of opening previously unseen drawers. In real hardware experiments, the robot achieved a 75% success rate for autonomous opening of unknown articulated objects.
comment: This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this article is published in Autonomous Robots, and is available online at [Link will be updated when available]
Sampling Strategy Design for Model Predictive Path Integral Control on Legged Robot Locomotion
Model Predictive Path Integral (MPPI) control has emerged as a powerful sampling-based optimal control method for complex, nonlinear, and high-dimensional systems. However, directly applying MPPI to legged robotic systems presents several challenges. This paper systematically investigates the role of sampling strategy design within the MPPI framework for legged robot locomotion. Based upon the idea of structured control parameterization, we explore and compare multiple sampling strategies within the framework, including both unstructured and spline-based approaches. Through extensive simulations on a quadruped robot platform, we evaluate how different sampling strategies affect control smoothness, task performance, robustness, and sample efficiency. The results provide new insights into the practical implications of sampling design for deploying MPPI on complex legged systems.
How Robot Dogs See the Unseeable: Improving Visual Interpretability via Peering for Exploratory Robots
In vegetated environments, such as forests, exploratory robots play a vital role in navigating complex, cluttered environments where human access is limited and traditional equipment struggles. Visual occlusion from obstacles, such as foliage, can severely obstruct a robot's sensors, impairing scene understanding. We show that "peering", a characteristic side-to-side movement used by insects to overcome their visual limitations, can also allow robots to markedly improve visual reasoning under partial occlusion. This is accomplished by applying core signal processing principles, specifically optical synthetic aperture sensing, together with the vision reasoning capabilities of modern large multimodal models. Peering enables real-time, high-resolution, and wavelength-independent perception, which is crucial for vision-based scene understanding across a wide range of applications. The approach is low-cost and immediately deployable on any camera-equipped robot. We investigated different peering motions and occlusion masking strategies, demonstrating that, unlike peering, state-of-the-art multi-view 3D vision techniques fail in these conditions due to their high susceptibility to occlusion. Our experiments were carried out on an industrial-grade quadrupedal robot. However, the ability to peer is not limited to such platforms, but potentially also applicable to bipedal, hexapod, wheeled, or crawling platforms. Robots that can effectively see through partial occlusion will gain superior perception abilities - including enhanced scene understanding, situational awareness, camouflage breaking, and advanced navigation in complex environments.
General Dynamic Goal Recognition using Goal-Conditioned and Meta Reinforcement Learning AAMAS 2026
Understanding an agent's goal through its behavior is a common AI problem called Goal Recognition (GR). This task becomes particularly challenging in dynamic environments where goals are numerous and ever-changing. We introduce the General Dynamic Goal Recognition (GDGR) problem, a broader definition of GR aimed at real-time adaptation of GR systems. This paper presents two novel approaches to tackle GDGR: (1) GC-AURA, generalizing to new goals using Model-Free Goal-Conditioned Reinforcement Learning, and (2) Meta-AURA, adapting to novel environments with Meta-Reinforcement Learning. We evaluate these methods across diverse environments, demonstrating their ability to achieve rapid adaptation and high GR accuracy under dynamic and noisy conditions. This work is a significant step forward in enabling GR in dynamic and unpredictable real-world environments.
comment: Accepted for publication at AAMAS 2026
Standing Tall: Sim to Real Fall Classification and Lead Time Prediction for Bipedal Robots
This paper extends a previously proposed fall prediction algorithm to a real-time (online) setting, with implementations in both hardware and simulation. The system is validated on the full-sized bipedal robot Digit, where the real-time version achieves performance comparable to the offline implementation while maintaining a zero false positive rate, an average lead time (defined as the difference between the true and predicted fall time) of 1.1s (well above the required minimum of 0.2s), and a maximum lead time error of just 0.03s. It also achieves a high recovery rate of 0.97, demonstrating its effectiveness in real-world deployment. In addition to the real-time implementation, this work identifies key limitations of the original algorithm, particularly under omnidirectional faults, and introduces a fine-tuned strategy to improve robustness. The enhanced algorithm shows measurable improvements across all evaluated metrics, including a 0.05 reduction in average false positive rate and a 1.19s decrease in the maximum error of the average predicted lead time.
comment: This work has been submitted to the IEEE for possible publication
Phase-based Nonlinear Model Predictive Control for Humanoid Walking Stabilization with Single and Double Support Time Adjustments
The contact sequence of humanoid walking consists of single and double support phases (SSP and DSP), and their coordination through proper duration and dynamic transition based on the robot's state is crucial for maintaining walking stability. Numerous studies have investigated phase duration optimization as an effective means of improving walking stability. This paper presents a phase-based Nonlinear Model Predictive Control (NMPC) framework that jointly optimizes Zero Moment Point (ZMP) modulation, step location, SSP duration (step timing), and DSP duration within a single formulation. Specifically, the proposed framework reformulates the nonlinear DCM (Divergent Component of Motion) error dynamics into a phase-consistent representation and incorporates them as dynamic constraints within the NMPC. The proposed framework also guarantees ZMP input continuity during contact-phase transitions and disables footstep updates during the DSP, thereby enabling dynamically reliable balancing control regardless of whether the robot is in SSP or DSP. The effectiveness of the proposed method is validated through extensive simulation and hardware experiments, demonstrating improved balance performance under external disturbances.
comment: 15 pages, 9 figures
H2R: A Human-to-Robot Data Augmentation for Robot Pre-training from Videos
Large-scale pre-training using videos has proven effective for robot learning. However, the models pre-trained on such data can be suboptimal for robot learning due to the significant visual gap between human hands and those of different robots. To remedy this, we propose H2R, a simple data augmentation technique that detects human hand keypoints, synthesizes robot motions in simulation, and composites rendered robots into egocentric videos. This process explicitly bridges the visual gap between human and robot embodiments during pre-training. We apply H2R to augment large-scale egocentric human video datasets such as Ego4D and SSv2, replacing human hands with simulated robotic arms to generate robot-centric training data. Based on this, we construct and release a family of 1M-scale datasets covering multiple robot embodiments (UR5 with gripper/Leaphand, Franka) and data sources (SSv2, Ego4D). To verify the effectiveness of the augmentation pipeline, we introduce a CLIP-based image-text similarity metric that quantitatively evaluates the semantic fidelity of robot-rendered frames to the original human actions. We validate H2R across three simulation benchmarks: Robomimic, RLBench and PushT and real-world manipulation tasks with a UR5 robot equipped with Gripper and Leaphand end-effectors. H2R consistently improves downstream success rates, yielding gains of 5.0%-10.2% in simulation and 6.7%-23.3% in real-world tasks across various visual encoders and policy learning methods. These results indicate that H2R improves the generalization ability of robotic policies by mitigating the visual discrepancies between human and robot domains.
Multimodal Classification Network Guided Trajectory Planning for Four-Wheel Independent Steering Autonomous Parking Considering Obstacle Attributes
Four-wheel Independent Steering (4WIS) vehicles have attracted increasing attention for their superior maneuverability. Human drivers typically choose to cross or drive over the low-profile obstacles (e.g., plastic bags) to efficiently navigate through narrow spaces, while existing planners neglect obstacle attributes, leading to suboptimal efficiency or planning failures. To address this issue, we propose a novel multimodal trajectory planning framework that employs a neural network for scene perception, combines 4WIS hybrid A* search to generate a warm start, and utilizes an optimal control problem (OCP) for trajectory optimization. Specifically, a multimodal perception network fusing visual information and vehicle states is employed to capture semantic and contextual scene understanding, enabling the planner to adapt the strategy according to scene complexity (hard or easy task). For hard tasks, guided points are introduced to decompose complex tasks into local subtasks, improving the search efficiency. The multiple steering modes of 4WIS vehicles, Ackermann, diagonal, and zero-turn, are also incorporated as kinematically feasible motion primitives. Moreover, a hierarchical obstacle handling strategy, which categorizes obstacles as "non-traversable", "crossable", and "drive-over", is incorporated into the node expansion process, explicitly linking obstacle attributes to planning actions to enable efficient decisions. Furthermore, to address dynamic obstacles with motion uncertainty, we introduce a probabilistic risk field model, constructing risk-aware driving corridors that serve as linear collision constraints in OCP. Experimental results demonstrate the proposed framework's effectiveness in generating safe, efficient, and smooth trajectories for 4WIS vehicles, especially in constrained environments.
Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future
Autonomous driving has long relied on modular "Perception-Decision-Action" pipelines, where hand-crafted interfaces and rule-based components often break down in complex or long-tailed scenarios. Their cascaded design further propagates perception errors, degrading downstream planning and control. Vision-Action (VA) models address some limitations by learning direct mappings from visual inputs to actions, but they remain opaque, sensitive to distribution shifts, and lack structured reasoning or instruction-following capabilities. Recent progress in Large Language Models (LLMs) and multimodal learning has motivated the emergence of Vision-Language-Action (VLA) frameworks, which integrate perception with language-grounded decision making. By unifying visual understanding, linguistic reasoning, and actionable outputs, VLAs offer a pathway toward more interpretable, generalizable, and human-aligned driving policies. This work provides a structured characterization of the emerging VLA landscape for autonomous driving. We trace the evolution from early VA approaches to modern VLA frameworks and organize existing methods into two principal paradigms: End-to-End VLA, which integrates perception, reasoning, and planning within a single model, and Dual-System VLA, which separates slow deliberation (via VLMs) from fast, safety-critical execution (via planners). Within these paradigms, we further distinguish subclasses such as textual vs. numerical action generators and explicit vs. implicit guidance mechanisms. We also summarize representative datasets and benchmarks for evaluating VLA-based driving systems and highlight key challenges and open directions, including robustness, interpretability, and instruction fidelity. Overall, this work aims to establish a coherent foundation for advancing human-compatible autonomous driving systems.
comment: Survey; 47 pages, 7 figures, 9 tables; GitHub Repo at https://github.com/worldbench/awesome-vla-for-ad
VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs
In most existing embodied navigation tasks, instructions are well-defined and unambiguous, such as instruction following and object searching. Under this idealized setting, agents are required solely to produce effective navigation outputs conditioned on vision and language inputs. However, real-world navigation instructions are often vague and ambiguous, requiring the agent to resolve uncertainty and infer user intent through active dialog. To address this gap, we propose Interactive Instance Goal Navigation (IIGN), a task that requires agents not only to generate navigation actions but also to produce language outputs via active dialog, thereby aligning more closely with practical settings. IIGN extends Instance Goal Navigation (IGN) by allowing agents to freely consult an oracle in natural language while navigating. Building on this task, we present the Vision Language-Language Navigation (VL-LN) benchmark, which provides a large-scale, automatically generated dataset and a comprehensive evaluation protocol for training and assessing dialog-enabled navigation models. VL-LN comprises over 41k long-horizon dialog-augmented trajectories for training and an automatic evaluation protocol with an oracle capable of responding to agent queries. Using this benchmark, we train a navigation model equipped with dialog capabilities and show that it achieves significant improvements over the baselines. Extensive experiments and analyses further demonstrate the effectiveness and reliability of VL-LN for advancing research on dialog-enabled embodied navigation. Code and dataset: https://0309hws.github.io/VL-LN.github.io/
Evaluation of Impression Difference of a Domestic Mobile Manipulator with Autonomous and/or Remote Control in Fetch-and-Carry Tasks
A single service robot can present two distinct agencies: its onboard autonomy and an operator-mediated agency, yet users experience them through one physical body. We formalize this dual-agency structure as a User-Robot-Operator triad in an autonomous remote-control setting that integrates teleoperation with autonomous execution and human-in-the-loop remote assistance. Prior to the recent surge of language-based and multimodal interfaces, we developed and evaluated an early-stage prototype in 2020 that combined natural-language text chat with a sketch-based interface enabling freehand on-image annotation over the robot's live camera view to support remote intervention. We evaluated three modes - remote control via teleoperation, autonomous control, and autonomous remote control (a hybrid mode representing different levels of autonomy) - in controlled fetch-and-carry mobile manipulation tasks using a domestic mobile manipulator, the Human Support Robot (HSR), on a World Robot Summit 2020 rule-compliant test field. The results show systematic mode-dependent differences in user-rated affinity and perceived security, indicating that switching or blending agency within one robot measurably shapes human impressions in Human-Robot Interaction (HRI). These findings provide empirical guidance for designing human-in-the-loop mobile manipulation in domestic physical tasks.
comment: Published in Advanced Robotics (2020). v2 updates Abstract/Comments (metadata only); paper content unchanged. Please cite: Advanced Robotics 34(20):1291-1308, 2020. https://doi.org/10.1080/01691864.2020.1780152
Do You Have Freestyle? Expressive Humanoid Locomotion via Audio Control
Humans intuitively move to sound, but current humanoid robots lack expressive improvisational capabilities, confined to predefined motions or sparse commands. Generating motion from audio and then retargeting it to robots relies on explicit motion reconstruction, leading to cascaded errors, high latency, and disjointed acoustic-actuation mapping. We propose RoboPerform, the first unified audio-to-locomotion framework that can directly generate music-driven dance and speech-driven co-speech gestures from audio. Guided by the core principle of "motion = content + style", the framework treats audio as implicit style signals and eliminates the need for explicit motion reconstruction. RoboPerform integrates a ResMoE teacher policy for adapting to diverse motion patterns and a diffusion-based student policy for audio style injection. This retargeting-free design ensures low latency and high fidelity. Experimental validation shows that RoboPerform achieves promising results in physical plausibility and audio alignment, successfully transforming robots into responsive performers capable of reacting to audio.
RoboMirror: Understand Before You Imitate for Video to Humanoid Locomotion
Humans learn locomotion through visual observation, interpreting visual content first before imitating actions. However, state-of-the-art humanoid locomotion systems rely on either curated motion capture trajectories or sparse text commands, leaving a critical gap between visual understanding and control. Text-to-motion methods suffer from semantic sparsity and staged pipeline errors, while video-based approaches only perform mechanical pose mimicry without genuine visual understanding. We propose RoboMirror, the first retargeting-free video-to-locomotion framework embodying "understand before you imitate". Leveraging VLMs, it distills raw egocentric/third-person videos into visual motion intents, which directly condition a diffusion-based policy to generate physically plausible, semantically aligned locomotion without explicit pose reconstruction or retargeting. Extensive experiments validate the effectiveness of RoboMirror, it enables telepresence via egocentric videos, drastically reduces third-person control latency by 80%, and achieves a 3.7% higher task success rate than baselines. By reframing humanoid control around video understanding, we bridge the visual understanding and action gap.
ManiBox: Enhancing Embodied Spatial Generalization via Scalable Simulation Data Generations
Embodied agents require robust spatial intelligence to execute precise real-world manipulations. However, this remains a significant challenge, as current methods often struggle to accurately position objects in space. Collecting extensive data can help address this issue by enhancing the agent's spatial understanding. Nonetheless, obtaining such data with real robots is prohibitively expensive, and relying on simulation data frequently leads to visual generalization gaps during real-world deployment. To tackle these challenges, we propose ManiBox, a novel bounding-box-guided framework. By decoupling perception from policy generalization, ManiBox effectively reduces the Sim2Real gap, leverages Internet-scale data, and scales our policy data collection in simulation. Specifically, within ManiBox, the RL teacher policy efficiently generates scalable simulation data. The student policy is distilled from this data and takes bounding boxes as input, which is proven sufficient for determining objects' spatial positions, thus enabling zero-shot transfer to real robots. Comprehensive evaluations in both simulated and real-world environments demonstrate that ManiBox exhibits strong spatial generalization and adaptability across various manipulation tasks and settings. Furthermore, our empirical study provides preliminary verification of spatial scaling laws, i.e., the amount of data required for spatial generalization scales with spatial volume following a power-law relationship. At a given spatial volume level, the success rate of manipulation tasks follows Michaelis-Menten kinetics with respect to data volume, exhibiting a saturation effect as data increases. Our videos and code are available at https://thkkk.github.io/manibox
LSRE: Latent Semantic Rule Encoding for Real-Time Semantic Risk Detection in Autonomous Driving
Real-world autonomous driving must adhere to complex human social rules that extend beyond legally codified traffic regulations. Many of these semantic constraints, such as yielding to emergency vehicles, complying with traffic officers' gestures, or stopping for school buses, are intuitive for humans yet difficult to encode explicitly. Although large vision-language models (VLMs) can interpret such semantics, their inference cost makes them impractical for real-time deployment. This work proposes LSRE, a Latent Semantic Rule Encoding framework that converts sparsely sampled VLM judgments into decision boundaries within the latent space of a recurrent world model. By encoding language-defined safety semantics into a lightweight latent classifier, LSRE enables real-time semantic risk assessment at 10 Hz without per-frame VLM queries. Experiments on six semantic-failure scenarios in CARLA demonstrate that LSRE attains semantic risk detection accuracy comparable to a large VLM baseline, while providing substantially earlier hazard anticipation and maintaining low computational latency. LSRE further generalizes to rarely seen semantic-similar test cases, indicating that language-guided latent classification offers an effective and deployable mechanism for semantic safety monitoring in autonomous driving.
Multiagent Systems
Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage
As large language models (LLMs) transition to autonomous agents synthesizing real-time information, their reasoning capabilities introduce an unexpected attack surface. This paper introduces a novel threat where colluding agents steer victim beliefs using only truthful evidence fragments distributed through public channels, without relying on covert communications, backdoors, or falsified documents. By exploiting LLMs' overthinking tendency, we formalize the first cognitive collusion attack and propose Generative Montage: a Writer-Editor-Director framework that constructs deceptive narratives through adversarial debate and coordinated posting of evidence fragments, causing victims to internalize and propagate fabricated conclusions. To study this risk, we develop CoPHEME, a dataset derived from real-world rumor events, and simulate attacks across diverse LLM families. Our results show pervasive vulnerability across 14 LLM families: attack success rates reach 74.4% for proprietary models and 70.6% for open-weights models. Counterintuitively, stronger reasoning capabilities increase susceptibility, with reasoning-specialized models showing higher attack success than base models or prompts. Furthermore, these false beliefs then cascade to downstream judges, achieving over 60% deception rates, highlighting a socio-technical vulnerability in how LLM-based agents interact with dynamic information environments. Our implementation and data are available at: https://github.com/CharlesJW222/Lying_with_Truth/tree/main.
comment: Under Review
Learning Resilient Elections with Adversarial GNNs
In the face of adverse motives, it is indispensable to achieve a consensus. Elections have been the canonical way by which modern democracy has operated since the 17th century. Nowadays, they regulate markets, provide an engine for modern recommender systems or peer-to-peer networks, and remain the main approach to represent democracy. However, a desirable universal voting rule that satisfies all hypothetical scenarios is still a challenging topic, and the design of these systems is at the forefront of mechanism design research. Automated mechanism design is a promising approach, and recent works have demonstrated that set-invariant architectures are uniquely suited to modelling electoral systems. However, various concerns prevent the direct application to real-world settings, such as robustness to strategic voting. In this paper, we generalise the expressive capability of learned voting rules, and combine improvements in neural network architecture with adversarial training to improve the resilience of voting rules while maximizing social welfare. We evaluate the effectiveness of our methods on both synthetic and real-world datasets. Our method resolves critical limitations of prior work regarding learning voting rules by representing elections using bipartite graphs, and learning such voting rules using graph neural networks. We believe this opens new frontiers for applying machine learning to real-world elections.
CONSENT: A Negotiation Framework for Leveraging User Flexibility in Vehicle-to-Building Charging under Uncertainty AAMAS 2026
The growth of Electric Vehicles (EVs) creates a conflict in vehicle-to-building (V2B) settings between building operators, who face high energy costs from uncoordinated charging, and drivers, who prioritize convenience and a full charge. To resolve this, we propose a negotiation-based framework that, by design, guarantees voluntary participation, strategy-proofness, and budget feasibility. It transforms EV charging into a strategic resource by offering drivers a range of incentive-backed options for modest flexibility in their departure time or requested state of charge (SoC). Our framework is calibrated with user survey data and validated using real operational data from a commercial building and an EV manufacturer. Simulations show that our negotiation protocol creates a mutually beneficial outcome: lowering the building operator's costs by over 3.5\% compared to an optimized, non-negotiating smart charging policy, while simultaneously reducing user charging expenses by 22\% below the utility's retail energy rate. By aligning operator and EV user objectives, our framework provides a strategic bridge between energy and mobility systems, transforming EV charging from a source of operational friction into a platform for collaboration and shared savings.
comment: Submitted to AAMAS 2026. 25 pages, 13 figures, 14 tables
On the Emergence of Linear Behavior in Large-Scale Dynamical Systems via Spatial Averaging
Various natural and engineered systems, from urban traffic flow to the human brain, can be described by large-scale networked dynamical systems. These systems are similar in being comprised of a large number of microscopic subsystems, each with complex nonlinear dynamics and interactions, that collectively give rise to different forms of macroscopic dynamics. Despite significant research, why and how various forms of macroscopic dynamics emerge from underlying micro-dynamics remains largely unknown. In this work we focus on linearity as one of the most fundamental aspects of system dynamics. By extending the theory of mixing sequences, we show that \textit{in a broad class of autonomous nonlinear networked systems, the dynamics of the average of all subsystems' states becomes asymptotically linear as the number of subsystems grows to infinity, provided that, in addition to technical assumptions, pairwise correlations between subsystems decay to 0 as their pairwise distance grows to infinity}. We prove this result when the latter distance is between subsystems' linear indices or spatial locations, and provide extensions to linear time-invariant (LTI) limit dynamics, finite-sample analysis of rates of convergence, and networks of spatially-embedded subsystems with random locations. To our knowledge, this work is the first rigorous analysis of macroscopic linearity in large-scale heterogeneous networked dynamical systems, and provides a solid foundation for further theoretical and empirical analyses in various domains of science and engineering.
A Multi-Memory Segment System for Generating High-Quality Long-Term Memory Content in Agents
In the current field of agent memory, extensive explorations have been conducted in the area of memory retrieval, yet few studies have focused on exploring the memory content. Most research simply stores summarized versions of historical dialogues, as exemplified by methods like A-MEM and MemoryBank. However, when humans form long-term memories, the process involves multi-dimensional and multi-component generation, rather than merely creating simple summaries. The low-quality memory content generated by existing methods can adversely affect recall performance and response quality. In order to better construct high-quality long-term memory content, we have designed a multi-memory segment system (MMS) inspired by cognitive psychology theory. The system processes short-term memory into multiple long-term memory segments, and constructs retrieval memory units and contextual memory units based on these segments, with a one-to-one correspondence between the two. During the retrieval phase, MMS will match the most relevant retrieval memory units based on the user's query. Then, the corresponding contextual memory units is obtained as the context for the response stage to enhance knowledge, thereby effectively utilizing historical data. We conducted experiments on the LoCoMo dataset and further performed ablation experiments, experiments on the robustness regarding the number of input memories, and overhead experiments, which demonstrated the effectiveness and practical value of our method.
Systems and Control (EESS)
Host-Aware Control of Gene Expression using Data-Enabled Predictive Control
Cybergenetic gene expression control in bacteria enables applications in engineering biology, drug development, and biomanufacturing. AI-based controllers offer new possibilities for real-time, single-cell-level regulation but typically require large datasets and re-training for new systems. Data-enabled Predictive Control (DeePC) offers better sample efficiency without prior modelling. We apply DeePC to a system with two inputs, optogenetic control and media concentration, and two outputs, expression of gene of interest and host growth rate. Using basis functions to address nonlinearities, we demonstrate that DeePC remains robust to parameter variations and performs among the best control strategies while using the least data.
Cross-Directional Modelling and Control of Slot-Die Battery Electrode Coating
As global battery demand increases, real-time process control becomes increasingly important for battery electrode manufacturing, yet slot-die lines are still mostly manually operated in open loop. This paper develops a physics-based modelling-and-control pipeline for film-thickness regulation. Computational fluid dynamics (CFD) simulations provide the data from which a low-order cross-directional model is identified and calibrated. Numerical simulations demonstrate close agreement between the CFD and the cross-directional model, which is used to design a controller that can be used in both real-time, automated feedback operation and manual feedforward operation during line commissioning.
Full-Time-Scale Power Management Strategy for Hybrid AC/DC/DS Microgrid with Dynamic Concatenation and Autonomous Frequency / Voltage Restorations
Hybrid AC/DC microgrids with distributed energy storage (DS) improve power reliability in remote areas. Existing power management methods either focus on steady-state power sharing or transient inertia support, but rarely combine both. They also often ignore frequency and voltage deviations caused by droop control, which can harm sensitive loads. To overcome these issues, this paper proposes a full-time-scale (FTS) power management strategy that unifies transient inertia sharing and steady-state power allocation through a novel dynamic concatenator. It also introduces autonomous frequency/voltage restoration to eliminate steady-state deviations in each subgrid. Additionally, a global equivalent circuit model (GECM) is developed to simplify system analysis and design. Experiments confirm that the approach maintains nominal frequency and voltage in steady state while enabling seamless transition between transient inertia support and proportional power sharing across all time scales.
CONSENT: A Negotiation Framework for Leveraging User Flexibility in Vehicle-to-Building Charging under Uncertainty AAMAS 2026
The growth of Electric Vehicles (EVs) creates a conflict in vehicle-to-building (V2B) settings between building operators, who face high energy costs from uncoordinated charging, and drivers, who prioritize convenience and a full charge. To resolve this, we propose a negotiation-based framework that, by design, guarantees voluntary participation, strategy-proofness, and budget feasibility. It transforms EV charging into a strategic resource by offering drivers a range of incentive-backed options for modest flexibility in their departure time or requested state of charge (SoC). Our framework is calibrated with user survey data and validated using real operational data from a commercial building and an EV manufacturer. Simulations show that our negotiation protocol creates a mutually beneficial outcome: lowering the building operator's costs by over 3.5\% compared to an optimized, non-negotiating smart charging policy, while simultaneously reducing user charging expenses by 22\% below the utility's retail energy rate. By aligning operator and EV user objectives, our framework provides a strategic bridge between energy and mobility systems, transforming EV charging from a source of operational friction into a platform for collaboration and shared savings.
comment: Submitted to AAMAS 2026. 25 pages, 13 figures, 14 tables
Lyapunov Functions can Exactly Quantify Rate Performance of Nonlinear Differential Equations
Pointwise-in-time stability notions for Ordinary Differential Equations (ODEs) provide quantitative metrics for system performance by establishing bounds on the rate of decay of the system state in terms of initial condition -- allowing stability to be quantified by e.g. the maximum provable decay rate. Such bounds may be obtained by finding suitable Lyapunov functions using, e.g. Sum-of-Squares (SOS) optimization. While Lyapunov tests have been proposed for numerous pointwise-in-time stability notions, including exponential, rational, and finite-time stability, it is unclear whether these characterizations are able to provide accurate bounds on system performance. In this paper, we start by proposing a generalized notion of rate performance -- with exponential, rational, and finite-time decay rates being special cases. Then, for any such notion and rate, we associate a Lyapunov condition which is shown to be necessary and sufficient for a system to achieve that rate. Finally, we show how the proposed conditions can be enforced using SOS programming in the case of exponential, rational, and finite-time stability. Numerical examples in each case demonstrate that the corresponding SOS test can achieve tight bounds on the rate performance with accurate inner bounds on the associated regions of performance.
Context-Aware Information Transfer via Digital Semantic Communication in UAV-Based Networks
In smart cities, bandwidth-constrained Unmanned Aerial Vehicles (UAVs) often fail to relay mission-critical data in time, compromising real-time decision-making. This highlights the need for faster and more efficient transmission of only the most relevant information. To address this, we propose DSC-UAV model, leveraging a context-adaptive Digital Semantic Communication (DSC) framework. This model redefines aerial data transmission through three core components: prompt-aware encoding, dynamic UAV-enabled relaying, and user mobility-optimized reinforcement learning. Ground users transmit context-driven visual content. Images are encoded via Vision Transformer combined with a prompt-text encoder to generate semantic features based on the desired context (generic or object-specific). These features are then quantized and transmitted over a UAV network that dynamically relays the data. Joint trajectory and resource allocation are optimized using Truncated Quantile Critic (TQC)-aided reinforcement learning technique, which offers greater stability and precision over standard SAC and TD3 due to its resistance to overestimation bias. Simulations demonstrate significant performance improvement, up to 22\% gain in semantic-structural similarity and 14\% reduction in Age of Information (AoI) compared to digital and prior UAV-semantic communication baselines. By integrating mobility control with context-driven visual abstraction, DSC-UAV advances resilient, information-centric surveillance for next-generation UAV networks in bandwidth-constrained environments.
comment: 10 Pages, 6 figures
Reliable Grid Forecasting: State Space Models for Safety-Critical Energy Systems
Accurate grid load forecasting is safety-critical: under-predictions risk supply shortfalls, while symmetric error metrics mask this operational asymmetry. We introduce a grid-specific evaluation framework--Asymmetric MAPE, Under-Prediction Rate, and Reserve Margin--that directly measures operational risk rather than statistical accuracy alone. Using this framework, we conduct a systematic evaluation of Mamba-based State Space Models for California grid forecasting on a weather-aligned CAISO TAC-area dataset spanning Nov 2023--Nov 2025 (84,498 hourly records across 5 transmission areas). Our analysis reveals that standard accuracy metrics are poor proxies for operational safety: models with identical MAPE can require vastly different reserve margins. We demonstrate that forecast errors are weakly but significantly associated with temperature (r = 0.16, p < 10^{-16}), motivating weather-aware modeling rather than loss function modification alone. The S-Mamba model achieves the lowest Reserve_{99.5}% margin (14.12%) compared to 16.66% for iTransformer, demonstrating superior forecast reliability under a 99.5th-percentile tail-risk reserve proxy.
comment: 24 pages, 8 figures, 8 tables
Sampling Strategy Design for Model Predictive Path Integral Control on Legged Robot Locomotion
Model Predictive Path Integral (MPPI) control has emerged as a powerful sampling-based optimal control method for complex, nonlinear, and high-dimensional systems. However, directly applying MPPI to legged robotic systems presents several challenges. This paper systematically investigates the role of sampling strategy design within the MPPI framework for legged robot locomotion. Based upon the idea of structured control parameterization, we explore and compare multiple sampling strategies within the framework, including both unstructured and spline-based approaches. Through extensive simulations on a quadruped robot platform, we evaluate how different sampling strategies affect control smoothness, task performance, robustness, and sample efficiency. The results provide new insights into the practical implications of sampling design for deploying MPPI on complex legged systems.
Neural-network-based Self-triggered Observed Platoon Control for Autonomous Vehicles
This paper investigates autonomous vehicle (AV) platoon control under uncertain dynamics and intermittent communication, which remains a critical challenge in intelligent transportation systems. To address these issues, this paper proposes an adaptive consensus tracking control framework for nonlinear multi-agent systems (MASs). The proposed approach integrates backstepping design, a nonlinear sampled-data observer, radial basis function neural networks, and a self-triggered communication mechanism. The radial basis function neural networks approximate unknown nonlinearities and time-varying disturbances, thereby enhancing system robustness. A distributed observer estimates neighboring states based on limited and intermittent measurements, thereby reducing dependence on continuous communication. Moreover, self-triggered mechanism is developed to determine triggering instants, guaranteeing a strictly positive minimum inter-event time and preventing Zeno behavior. The theoretical analysis proves that all closed-loop signals are uniformly ultimately bounded (UUB), and tracking errors converge to a compact set. Simulation results demonstrate that the proposed approach achieves high robustness, adaptability, and communication efficiency, making it suitable for real-world networked vehicle systems.
On the Emergence of Linear Behavior in Large-Scale Dynamical Systems via Spatial Averaging
Various natural and engineered systems, from urban traffic flow to the human brain, can be described by large-scale networked dynamical systems. These systems are similar in being comprised of a large number of microscopic subsystems, each with complex nonlinear dynamics and interactions, that collectively give rise to different forms of macroscopic dynamics. Despite significant research, why and how various forms of macroscopic dynamics emerge from underlying micro-dynamics remains largely unknown. In this work we focus on linearity as one of the most fundamental aspects of system dynamics. By extending the theory of mixing sequences, we show that \textit{in a broad class of autonomous nonlinear networked systems, the dynamics of the average of all subsystems' states becomes asymptotically linear as the number of subsystems grows to infinity, provided that, in addition to technical assumptions, pairwise correlations between subsystems decay to 0 as their pairwise distance grows to infinity}. We prove this result when the latter distance is between subsystems' linear indices or spatial locations, and provide extensions to linear time-invariant (LTI) limit dynamics, finite-sample analysis of rates of convergence, and networks of spatially-embedded subsystems with random locations. To our knowledge, this work is the first rigorous analysis of macroscopic linearity in large-scale heterogeneous networked dynamical systems, and provides a solid foundation for further theoretical and empirical analyses in various domains of science and engineering.
A Multi-Scale Attention-Based Attack Diagnosis Mechanism for Parallel Cyber-Physical Attacks in Power Grids
Parallel cyber--physical attacks (PCPA) can simultaneously damage physical transmission lines and disrupt measurement data transmission in power grids, severely impairing system situational awareness and attack diagnosis. This paper investigates the attack diagnosis problem for linearized AC/DC power flow models under PCPA, where physical attacks include not only line disconnections but also admittance modifications, such as those caused by compromised distributed flexible AC transmission system (D-FACTS) devices. To address this challenge, we propose a learning-assisted attack diagnosis framework based on meta--mixed-integer programming (MMIP), which integrates a convolutional graph cross-attention attack localization (CGCA-AL) model. First, sufficient conditions for measurement reconstruction are derived, enabling the recovery of unknown measurements in attacked areas using available measurements and network topology information. Based on these conditions, the attack diagnosis problem is formulated as an MMIP model. The proposed CGCA-AL employs a multi-scale attention mechanism to predict a probability distribution over potential physical attack locations, which is incorporated into the MMIP as informative objective coefficients. By solving the resulting MMIP, both the locations and magnitudes of physical attacks are optimally estimated, and system states are subsequently reconstructed. Simulation results on IEEE 30-bus and IEEE 118-bus test systems demonstrate the effectiveness, robustness, and scalability of the proposed attack diagnosis framework under complex PCPA scenarios.
comment: 10 pages, 3 figures, 5 tables, journal
Auditing Human Decision-Making in High-Stakes Environments via Prescriptive AI: A Stress-Test on Real-Time Tactical Management AAAI
High-stakes decision-making is often compromised by cognitive biases and outcome dependency. Current AI models typically mimic historical human behavior, inheriting these biases and limiting their utility for normative improvement. Here, we introduce a Prescriptive AI framework designed to audit, rather than automate, human judgment in real-time environments. By decoupling decision quality from stochastic outcomes, we quantify "decision latency" and status quo bias in elite soccer management - a high-pressure adversarial domain. Analyzing 2018 FIFA World Cup data, our system exposes critical risk states, such as performance collapse following salient positive events (e.g., an assist), which human experts systematically overlook due to outcome bias. These findings demonstrate that interpretable auditing systems can reveal structural flaws in human reasoning that predictive models obscure. This approach establishes a paradigm for Human-AI interaction prioritizing epistemic accountability over predictive mimicry in safety-critical domains.
comment: Preprint; suitable for AI, decision sciences, and prescriptive analytics. Short versions published in Wharton Sports Analytics Journal Fall 2025 (AI Feature Spotlight) and accepted to AAAI Bridge on LM Reasoning 2026
Hardware Acceleration for Neural Networks: A Comprehensive Survey
Neural networks have become a dominant computational workload across cloud and edge platforms, but rapid growth in model size and deployment diversity has exposed hardware bottlenecks increasingly dominated by memory movement, communication, and irregular operators rather than peak arithmetic throughput. This survey reviews the technology landscape for hardware acceleration of deep learning, spanning GPUs and tensor-core architectures; domain-specific accelerators (e.g., TPUs/NPUs); FPGA-based designs; ASIC inference engines; and emerging LLM-serving accelerators such as LPUs (language processing units), alongside in-/near-memory computing and neuromorphic/analog approaches. We organize the space using a unified taxonomy across (i) workloads (CNNs, RNNs, GNNs, and Transformers/LLMs), (ii) execution settings (training vs.\ inference; datacenter vs.\ edge), and (iii) optimization levers (reduced precision, sparsity and pruning, operator fusion, compilation and scheduling, and memory-system/interconnect design). We synthesize key architectural ideas including systolic arrays, vector and SIMD engines, specialized attention and softmax kernels, quantization-aware datapaths, and high-bandwidth memory, and we discuss how software stacks and compilers bridge model semantics to hardware. Finally, we highlight open challenges -- including efficient long-context LLM inference (KV-cache management), robust support for dynamic and sparse workloads, energy- and security-aware deployment, and fair benchmarking -- and point to promising directions for the next generation of neural acceleration.
comment: error in section 5.1
Robotics
PyBatchRender: A Python Library for Batched 3D Rendering at Up to One Million FPS
Reinforcement learning from pixels is often bottlenecked by the performance and complexity of 3D rendered environments. Researchers face a trade-off between high-speed, low-level engines and slower, more accessible Python frameworks. To address this, we introduce PyBatchRender, a Python library for high-throughput, batched 3D rendering that achieves over 1 million FPS on simple scenes. Built on the Panda3D game engine, it utilizes its mature ecosystem while enhancing performance through optimized batched rendering for up to 1000X speedups. Designed as a physics-agnostic renderer for reinforcement learning from pixels, PyBatchRender offers greater flexibility than dedicated libraries, simpler setup than typical game-engine wrappers, and speeds rivaling state-of-the-art C++ engines like Madrona. Users can create custom scenes entirely in Python with tens of lines of code, enabling rapid prototyping for scalable AI training. Open-source and easy to integrate, it serves to democratize high-performance 3D simulation for researchers and developers. The library is available at https://github.com/dolphin-in-a-coma/PyBatchRender.
SAHA: Supervised Autonomous HArvester for selective forest thinning
Forestry plays a vital role in our society, creating significant ecological, economic, and recreational value. Efficient forest management involves labor-intensive and complex operations. One essential task for maintaining forest health and productivity is selective thinning, which requires skilled operators to remove specific trees to create optimal growing conditions for the remaining ones. In this work, we present a solution based on a small-scale robotic harvester (SAHA) designed for executing this task with supervised autonomy. We build on a 4.5-ton harvester platform and implement key hardware modifications for perception and automatic control. We implement learning- and model-based approaches for precise control of hydraulic actuators, accurate navigation through cluttered environments, robust state estimation, and reliable semantic estimation of terrain traversability. Integrating state-of-the-art techniques in perception, planning, and control, our robotic harvester can autonomously navigate forest environments and reach targeted trees for selective thinning. We present experimental results from extensive field trials over kilometer-long autonomous missions in northern European forests, demonstrating the harvester's ability to operate in real forests. We analyze the performance and provide the lessons learned for advancing robotic forest management.
Bridging Language Gaps: Utilizing Interactive Robots to Teach Cantonese in Real-Life Contexts for Newly-Arrived Children
Hong Kong's education system is notably multicultural, including local, non-Chinese-speaking, and newly arrived students (NAS) (Mandarine Chinese-speaking). NAS can guess the meaning of vocabulary but cannot speak out, presenting unique challenges for them, particularly language barriers and cultural differences. These challenges hinder their academic success and social integration, leading to feelings of isolation and demotivation. Current resources often fail to address the emotional well-being of these students and predominantly focus on English language acquisition, leaving a gap in support for learning Cantonese and navigating the local cultural landscape. This study explores the effectiveness of an interactive robot, Boon Boon, in teaching Cantonese through real-life contexts to enhance NAS children learning engagement and motivation. The research questions are: (1) How does interactive robot-empowered scenario learning influence the learning engagement and motivation of NAS in learning Cantonese? and (2) What is the impact of a robot-empowered scenario learning system on the Cantonese language proficiency of NAS? Fourteen children are invited to participate in a four-day learning program with Boon Boon. The preliminary result indicated that Boon Boon drove students' attention to learning and academic achievement. Future research will focus on long-term assessments of robot-empowered learning's effectiveness and explore the scalability of this approach across diverse educational settings and cultural backgrounds.
LiveBo: Empowering Non-Chinese Speaking Students through AI-Driven Real-Life Scenarios in Cantonese
Language learning is a multifaceted process. Insufficient vocabulary can hinder communication and lead to demotivation. For non-Chinese speaking (NCS) students, learning Traditional Chinese (Cantonese) poses distinct challenges, particularly due to the complexity of converting spoken and written forms. To address this issue, this study examines the effectiveness of real-life scenario simulations integrated with interactive social robots in enhancing NCS student engagement and language acquisition. The research employs a quasi-experimental design involving NCS students who interact with an AI-driven, robot-assisted language learning system, LiveBo. The study aims to assess the impact of this innovative approach on active participation and motivation. Data are collected through proficiency tests, questionnaires and semi-structured interviews. Findings indicate that NCS students experience positive improvements in behavioural and emotional engagement, motivation and learning outcomes, highlighting the potential of integrating novel technologies in language education. We plan to compare with the control group in the future. This study highlights the significance of interactive and immersive learning experiences in promoting motivation and enhancing language acquisition among NCS students.
MotiBo: The Impact of Interactive Digital Storytelling Robots on Student Motivation through Self-Determination Theory
Creativity is increasingly recognized as an important skill in education, and storytelling can enhance motivation and engagement among students. However, conventional storytelling methods often lack the interactive elements necessary to engage students. To this end, this study examines the impact of an interactive digital storytelling system incorporating a human-like robot on student engagement and creativity. The study aims to compare engagement levels across three modalities: paper-based, PowerPoint, and robot-assisted storytelling, MotiBo. Utilizing a quasi-experimental design, this work involves three groups of students who interact with the storytelling system over a five-day learning. Findings reveal that students using MotiBo exhibit statistically significant improvement in behavioural and cognitive engagement compared to those using traditional methods. These results suggest that the integration of novel technologies can effectively enhance the learning experience, ultimately promoting creativity and self-learning ability in educational settings. Future research will investigate the long-term effects of these technologies on learning outcomes and explore their potential for broader applications in diverse educational contexts.
Real-Time LiDAR Point Cloud Densification for Low-Latency Spatial Data Transmission
To realize low-latency spatial transmission system for immersive telepresence, there are two major problems: capturing dynamic 3D scene densely and processing them in real time. LiDAR sensors capture 3D in real time, but produce sparce point clouds. Therefore, this paper presents a high-speed LiDAR point cloud densification method to generate dense 3D scene with minimal latency, addressing the need for on-the-fly depth completion while maintaining real-time performance. Our approach combines multiple LiDAR inputs with high-resolution color images and applies a joint bilateral filtering strategy implemented through a convolutional neural network architecture. Experiments demonstrate that the proposed method produces dense depth maps at full HD resolution in real time (30 fps), which is over 15x faster than a recent training-based depth completion approach. The resulting dense point clouds exhibit accurate geometry without multiview inconsistencies or ghosting artifacts.
EduSim-LLM: An Educational Platform Integrating Large Language Models and Robotic Simulation for Beginners
In recent years, the rapid development of Large Language Models (LLMs) has significantly enhanced natural language understanding and human-computer interaction, creating new opportunities in the field of robotics. However, the integration of natural language understanding into robotic control is an important challenge in the rapid development of human-robot interaction and intelligent automation industries. This challenge hinders intuitive human control over complex robotic systems, limiting their educational and practical accessibility. To address this, we present the EduSim-LLM, an educational platform that integrates LLMs with robot simulation and constructs a language-drive control model that translates natural language instructions into executable robot behavior sequences in CoppeliaSim. We design two human-robot interaction models: direct control and autonomous control, conduct systematic simulations based on multiple language models, and evaluate multi-robot collaboration, motion planning, and manipulation capabilities. Experiential results show that LLMs can reliably convert natural language into structured robot actions; after applying prompt-engineering templates instruction-parsing accuracy improves significantly; as task complexity increases, overall accuracy rate exceeds 88.9% in the highest complexity tests.
DST-Calib: A Dual-Path, Self-Supervised, Target-Free LiDAR-Camera Extrinsic Calibration Network
LiDAR-camera extrinsic calibration is essential for multi-modal data fusion in robotic perception systems. However, existing approaches typically rely on handcrafted calibration targets (e.g., checkerboards) or specific, static scene types, limiting their adaptability and deployment in real-world autonomous and robotic applications. This article presents the first self-supervised LiDAR-camera extrinsic calibration network that operates in an online fashion and eliminates the need for specific calibration targets. We first identify a significant generalization degradation problem in prior methods, caused by the conventional single-sided data augmentation strategy. To overcome this limitation, we propose a novel double-sided data augmentation technique that generates multi-perspective camera views using estimated depth maps, thereby enhancing robustness and diversity during training. Built upon this augmentation strategy, we design a dual-path, self-supervised calibration framework that reduces the dependence on high-precision ground truth labels and supports fully adaptive online calibration. Furthermore, to improve cross-modal feature association, we replace the traditional dual-branch feature extraction design with a difference map construction process that explicitly correlates LiDAR and camera features. This not only enhances calibration accuracy but also reduces model complexity. Extensive experiments conducted on five public benchmark datasets, as well as our own recorded dataset, demonstrate that the proposed method significantly outperforms existing approaches in terms of generalizability.
ORION: Option-Regularized Deep Reinforcement Learning for Cooperative Multi-Agent Online Navigation
Existing methods for multi-agent navigation typically assume fully known environments, offering limited support for partially known scenarios such as warehouses or factory floors. There, agents may need to plan trajectories that balance their own path optimality with their ability to collect and share information about the environment that can help their teammates reach their own goals. To these ends, we propose ORION, a novel deep reinforcement learning framework for cooperative multi-agent online navigation in partially known environments. Starting from an imperfect prior map, ORION trains agents to make decentralized decisions, coordinate to reach their individual targets, and actively reduce map uncertainty by sharing online observations in a closed perception-action loop. We first design a shared graph encoder that fuses prior map with online perception into a unified representation, providing robust state embeddings under dynamic map discrepancies. At the core of ORION is an option-critic framework that learns to reason about a set of high-level cooperative modes that translate into sequences of low-level actions, allowing agents to switch between individual navigation and team-level exploration adaptively. We further introduce a dual-stage cooperation strategy that enables agents to assist teammates under map uncertainty, thereby reducing the overall makespan. Across extensive maze-like maps and large-scale warehouse environments, our simulation results show that ORION achieves high-quality, real-time decentralized cooperation over varying team sizes, outperforming state-of-the-art classical and learning-based baselines. Finally, we validate ORION on physical robot teams, demonstrating its robustness and practicality for real-world cooperative navigation.
VISO: Robust Underwater Visual-Inertial-Sonar SLAM with Photometric Rendering for Dense 3D Reconstruction
Visual challenges in underwater environments significantly hinder the accuracy of vision-based localisation and the high-fidelity dense reconstruction. In this paper, we propose VISO, a robust underwater SLAM system that fuses a stereo camera, an inertial measurement unit (IMU), and a 3D sonar to achieve accurate 6-DoF localisation and enable efficient dense 3D reconstruction with high photometric fidelity. We introduce a coarse-to-fine online calibration approach for extrinsic parameters estimation between the 3D sonar and the camera. Additionally, a photometric rendering strategy is proposed for the 3D sonar point cloud to enrich the sonar map with visual information. Extensive experiments in a laboratory tank and an open lake demonstrate that VISO surpasses current state-of-the-art underwater and visual-based SLAM algorithms in terms of localisation robustness and accuracy, while also exhibiting real-time dense 3D reconstruction performance comparable to the offline dense mapping method.
Latent Space Reinforcement Learning for Multi-Robot Exploration
Autonomous mapping of unknown environments is a critical challenge, particularly in scenarios where time is limited. Multi-agent systems can enhance efficiency through collaboration, but the scalability of motion-planning algorithms remains a key limitation. Reinforcement learning has been explored as a solution, but existing approaches are constrained by the limited input size required for effective learning, restricting their applicability to discrete environments. This work addresses that limitation by leveraging autoencoders to perform dimensionality reduction, compressing high-fidelity occupancy maps into latent state vectors while preserving essential spatial information. Additionally, we introduce a novel procedural generation algorithm based on Perlin noise, designed to generate topologically complex training environments that simulate asteroid fields, caves and forests. These environments are used for training the autoencoder and the navigation algorithm using a hierarchical deep reinforcement learning framework for decentralized coordination. We introduce a weighted consensus mechanism that modulates reliance on shared data via a tuneable trust parameter, ensuring robustness to accumulation of errors. Experimental results demonstrate that the proposed system scales effectively with number of agents and generalizes well to unfamiliar, structurally distinct environments and is resilient in communication-constrained settings.
Towards reliable subsea object recovery: a simulation study of an auv with a suction-actuated end effector
Autonomous object recovery in the hadal zone is challenging due to extreme hydrostatic pressure, limited visibility and currents, and the need for precise manipulation at full ocean depth. Field experimentation in such environments is costly, high-risk, and constrained by limited vehicle availability, making early validation of autonomous behaviors difficult. This paper presents a simulation-based study of a complete autonomous subsea object recovery mission using a Hadal Small Vehicle (HSV) equipped with a three-degree-of-freedom robotic arm and a suction-actuated end effector. The Stonefish simulator is used to model realistic vehicle dynamics, hydrodynamic disturbances, sensing, and interaction with a target object under hadal-like conditions. The control framework combines a world-frame PID controller for vehicle navigation and stabilization with an inverse-kinematics-based manipulator controller augmented by acceleration feed-forward, enabling coordinated vehicle - manipulator operation. In simulation, the HSV autonomously descends from the sea surface to 6,000 m, performs structured seafloor coverage, detects a target object, and executes a suction-based recovery. The results demonstrate that high-fidelity simulation provides an effective and low-risk means of evaluating autonomous deep-sea intervention behaviors prior to field deployment.
Scalable Data-Driven Reachability Analysis and Control via Koopman Operators with Conformal Coverage Guarantees
We propose a scalable reachability-based framework for probabilistic, data-driven safety verification of unknown nonlinear dynamics. We use Koopman theory with a neural network (NN) lifting function to learn an approximate linear representation of the dynamics and design linear controllers in this space to enable closed-loop tracking of a reference trajectory distribution. Closed-loop reachable sets are efficiently computed in the lifted space and mapped back to the original state space via NN verification tools. To capture model mismatch between the Koopman dynamics and the true system, we apply conformal prediction to produce statistically-valid error bounds that inflate the reachable sets to ensure the true trajectories are contained with a user-specified probability. These bounds generalize across references, enabling reuse without recomputation. Results on high-dimensional MuJoCo tasks (11D Hopper, 28D Swimmer) and 12D quadcopters show improved reachable set coverage rate, computational efficiency, and conservativeness over existing methods.
comment: Under review, 28 pages, 12 figures
Topological Mapping and Navigation using a Monocular Camera based on AnyLoc
This paper proposes a method for topological mapping and navigation using a monocular camera. Based on AnyLoc, keyframes are converted into descriptors to construct topological relationships, enabling loop detection and map building. Unlike metric maps, topological maps simplify path planning and navigation by representing environments with key nodes instead of precise coordinates. Actions for visual navigation are determined by comparing segmented images with the image associated with target nodes. The system relies solely on a monocular camera, ensuring fast map building and navigation using key nodes. Experiments show effective loop detection and navigation in real and simulation environments without pre-training. Compared to a ResNet-based method, this approach improves success rates by 60.2% on average while reducing time and space costs, offering a lightweight solution for robot and human navigation in various scenarios.
comment: Published in Proc. IEEE CASE 2025. 7 pages, 11 figures
No Minima, No Collisions: Combining Modulation and Control Barrier Function Strategies for Feasible Dynamic Collision Avoidance
Control Barrier Function Quadratic Programs (CBF-QPs) have become a central tool for real-time safety-critical control due to their applicability to general control-affine systems and their ability to enforce constraints through optimization. Yet, they often generate trajectories with undesirable local minima that prevent convergence to goals. On the other hand, Modulation of Dynamical Systems (Mod-DS) methods (including normal, reference, and on-manifold variants) reshape nominal vector fields geometrically and achieve obstacle avoidance with few or even no local minima. However, Mod-DS provides no straightforward mechanism for handling input constraints and remains largely restricted to fully actuated systems. In this paper, we revisit the theoretical foundations of both approaches and show that, despite their seemingly different constructions, the normal Mod-DS is a special case of the CBF-QP, and the reference Mod-DS is linked to the CBF-QP through a single shared equation. These connections motivate our Modulated CBF-QP (MCBF-QP) framework, which introduces reference and on-manifold modulation variants that reduce or fully eliminate the spurious equilibria inherent to CBF-QPs for general control-affine systems operating in dynamic, cluttered environments. We validate the proposed controllers in simulated hospital settings and in real-world experiments on fully actuated Ridgeback robots and underactuated Fetch platforms. Across all evaluations, Modulated CBF-QPs consistently outperform standard CBF-QPs on every performance metric.
Nonlinear Oscillatory Response of Automated Vehicle Car-following: Theoretical Analysis with Traffic State and Control Input Limits
This paper presents a framework grounded in the theory of describing function (DF) and incremental-input DF to theoretically analyze the nonlinear oscillatory response of automated vehicles (AVs) car-following (CF) amidst traffic oscillations, considering the limits of traffic state and control input. While prevailing approaches largely ignore these limits (i.e., saturation of acceleration/deceleration and speed) and focus on linear string stability analysis, this framework establishes a basis for theoretically analyzing the frequency response of AV systems with nonlinearities imposed by these limits. To this end, trajectories of CF pairs are decomposed into nominal and oscillatory trajectories, subsequently, the controlled AV system is repositioned within the oscillatory trajectory coordinates. Built on this base, DFs are employed to approximate the frequency responses of nonlinear saturation components by using their first harmonic output, thereby capturing the associated amplification ratio and phase shift. Considering the closed-loop nature of AV control systems, where system states and control input mutually influence each other, amplification ratios and phase shifts are balanced within the loop to ensure consistency. This balancing process may render multiple solutions, hence the incremental-input DF is further applied to identify the reasonable ones. The proposed method is validated by estimations from Simulink, and further comparisons with prevailing methods are conducted. Results confirm the alignment of our framework with Simulink results and exhibit its superior accuracy in analysis compared to the prevailing methods. Furthermore, the framework proves valuable in string stability analysis, especially when conventional linear methods offer misleading insights.
Stochastic Online Optimization for Cyber-Physical and Robotic Systems
We propose a novel gradient-based online optimization framework for solving stochastic programming problems that frequently arise in the context of cyber-physical and robotic systems. Our problem formulation accommodates constraints that model the evolution of a cyber-physical system, which has, in general, a continuous state and action space, is nonlinear, and where the state is only partially observed. We also incorporate an approximate model of the dynamics as prior knowledge into the learning process and show that even rough estimates of the dynamics can significantly improve the convergence of our algorithms. Our online optimization framework encompasses both gradient descent and quasi-Newton methods, and we provide a unified convergence analysis of our algorithms in a non-convex setting. We also characterize the impact of modeling errors in the system dynamics on the convergence rate of the algorithms. Finally, we evaluate our algorithms in simulations of a flexible beam, a four-legged walking robot, and in real-world experiments with a ping-pong playing robot.
comment: 46 pages, 16 figures
SPARC: Spine with Prismatic and Revolute Compliance for Quadruped Robots
We present SPARC, a compact, open-source 3-DoF sagittal-plane spine module that combines revolute (pitch) and prismatic (axial) motion with programmable task-space impedance for quadruped robots. The system integrates three torque-controlled actuators, a 1~kHz control board, and protected power electronics in a 1.26~kg package. A floating-base impedance controller with dynamics compensation renders tunable spring-damper behavior along horizontal, vertical, and rotational axes. Benchtop experiments validate the approach: quasi-static tests demonstrate linear force-displacement characteristics with commanded horizontal stiffness spanning 300--700~N/m ($\leq 1.5\%$ error, $R^2 \geq 0.992$), while dynamic trials confirm predictable damping behavior across multiple settings. Furthermore, simulation studies reveal that adapting spine stiffness based on terrain slope and stride length significantly reduces the cost of transport. SPARC provides an open-source platform for systematic studies of spine compliance in legged locomotion, with complete hardware and firmware resources available for community use. Github repository: https://github.com/YueWang996/sparc.git
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics NeurIPS 2025
Spatial referring is a fundamental capability of embodied robots to interact with the 3D physical world. However, even with the powerful pretrained vision language models (VLMs), recent approaches are still not qualified to accurately understand the complex 3D scenes and dynamically reason about the instruction-indicated locations for interaction. To this end, we propose RoboRefer, a 3D-aware VLM that can first achieve precise spatial understanding by integrating a disentangled but dedicated depth encoder via supervised fine-tuning (SFT). Moreover, RoboRefer advances generalized multi-step spatial reasoning via reinforcement fine-tuning (RFT), with metric-sensitive process reward functions tailored for spatial referring tasks. To support SFT and RFT training, we introduce RefSpatial, a large-scale dataset of 20M QA pairs (2x prior), covering 31 spatial relations (vs. 15 prior) and supporting complex reasoning processes (up to 5 steps). In addition, we introduce RefSpatial-Bench, a challenging benchmark filling the gap in evaluating spatial referring with multi-step reasoning. Experiments show that SFT-trained RoboRefer achieves state-of-the-art spatial understanding, with an average success rate of 89.6%. RFT-trained RoboRefer further outperforms all other baselines by a large margin, even surpassing Gemini-2.5-Pro by 17.4% in average accuracy on RefSpatial-Bench. Notably, RoboRefer can be integrated with various control policies to execute long-horizon, dynamic tasks across diverse robots (e,g., UR5, G1 humanoid) in cluttered real-world scenes.
comment: Accepted by NeurIPS 2025. Project page: https://zhoues.github.io/RoboRefer/
P2U-SLAM: A Monocular Wide-FoV SLAM System Based on Point Uncertainty and Pose Uncertainty
This paper presents P2U-SLAM, a visual Simultaneous Localization And Mapping (SLAM) system with a wide Field of View (FoV) camera, which utilizes pose uncertainty and point uncertainty. While the wide FoV enables considerable repetitive observations of historical map points for matching cross-view features, the data properties of the historical map points and the poses of historical keyframes have changed during the optimization process. The neglect of data property changes results in the lack of partial information matrices in optimization, increasing the risk of long-term positioning performance degradation. The purpose of our research is to mitigate the risks posed by wide-FoV visual input to the SLAM system. Based on the conditional probability model, this work reveals the definite impacts of the above data properties changes on the optimization process, concretizes these impacts as point uncertainty and pose uncertainty, and gives their specific mathematical form. P2U-SLAM embeds point uncertainty into the tracking module and pose uncertainty into the local mapping module respectively, and updates these uncertainties after each optimization operation including local mapping, map merging, and loop closing. We present an exhaustive evaluation on 27 sequences from two popular public datasets with wide-FoV visual input. P2U-SLAM shows excellent performance compared with other state-of-the-art methods. The source code will be made publicly available at https://github.com/BambValley/P2U-SLAM.
comment: Accepted to IEEE Transactions on Intelligent Transportation Systems (T-ITS). The source code will be made publicly available at https://github.com/BambValley/P2U-SLAM
Stairway to Success: An Online Floor-Aware Zero-Shot Object-Goal Navigation Framework via LLM-Driven Coarse-to-Fine Exploration
Deployable service and delivery robots struggle to navigate multi-floor buildings to reach object goals, as existing systems fail due to single-floor assumptions and requirements for offline, globally consistent maps. Multi-floor environments pose unique challenges including cross-floor transitions and vertical spatial reasoning, especially navigating unknown buildings. Object-Goal Navigation benchmarks like HM3D and MP3D also capture this multi-floor reality, yet current methods lack support for online, floor-aware navigation. To bridge this gap, we propose \textbf{\textit{ASCENT}}, an online framework for Zero-Shot Object-Goal Navigation that enables robots to operate without pre-built maps or retraining on new object categories. It introduces: (1) a \textbf{Multi-Floor Abstraction} module that dynamically constructs hierarchical representations with stair-aware obstacle mapping and cross-floor topology modeling, and (2) a \textbf{Coarse-to-Fine Reasoning} module that combines frontier ranking with LLM-driven contextual analysis for multi-floor navigation decisions. We evaluate on HM3D and MP3D benchmarks, outperforming state-of-the-art zero-shot approaches, and demonstrate real-world deployment on a quadruped robot.
comment: Accepted to IEEE Robotics and Automation Letters (RAL). Project Page at https://zeying-gong.github.io/projects/ascent
Aerial World Model for Long-horizon Visual Generation and Navigation in 3D Space
Unmanned aerial vehicles (UAVs) have emerged as powerful embodied agents. One of the core abilities is autonomous navigation in large-scale three-dimensional environments. Existing navigation policies, however, are typically optimized for low-level objectives such as obstacle avoidance and trajectory smoothness, lacking the ability to incorporate high-level semantics into planning. To bridge this gap, we propose ANWM, an aerial navigation world model that predicts future visual observations conditioned on past frames and actions, thereby enabling agents to rank candidate trajectories by their semantic plausibility and navigational utility. ANWM is trained on 4-DoF UAV trajectories and introduces a physics-inspired module: Future Frame Projection (FFP), which projects past frames into future viewpoints to provide coarse geometric priors. This module mitigates representational uncertainty in long-distance visual generation and captures the mapping between 3D trajectories and egocentric observations. Empirical results demonstrate that ANWM significantly outperforms existing world models in long-distance visual forecasting and improves UAV navigation success rates in large-scale environments.
Affordance-Guided Coarse-to-Fine Exploration for Base Placement in Open-Vocabulary Mobile Manipulation AAAI 2026
In open-vocabulary mobile manipulation (OVMM), task success often hinges on the selection of an appropriate base placement for the robot. Existing approaches typically navigate to proximity-based regions without considering affordances, resulting in frequent manipulation failures. We propose Affordance-Guided Coarse-to-Fine Exploration, a zero-shot framework for base placement that integrates semantic understanding from vision-language models (VLMs) with geometric feasibility through an iterative optimization process. Our method constructs cross-modal representations, namely Affordance RGB and Obstacle Map+, to align semantics with spatial context. This enables reasoning that extends beyond the egocentric limitations of RGB perception. To ensure interaction is guided by task-relevant affordances, we leverage coarse semantic priors from VLMs to guide the search toward task-relevant regions and refine placements with geometric constraints, thereby reducing the risk of convergence to local optima. Evaluated on five diverse open-vocabulary mobile manipulation tasks, our system achieves an 85% success rate, significantly outperforming classical geometric planners and VLM-based methods. This demonstrates the promise of affordance-aware and multimodal reasoning for generalizable, instruction-conditioned planning in OVMM.
comment: Accepted to AAAI 2026
Multiagent Systems
Warp-Cortex: An Asynchronous, Memory-Efficient Architecture for Million-Agent Cognitive Scaling on Consumer Hardware
Current multi-agent Large Language Model (LLM) frameworks suffer from linear memory scaling, rendering "System 2" parallel reasoning impractical on consumer hardware. We present Warp Cortex, an asynchronous architecture that theoretically enables million-agent cognitive scaling by decoupling agent logic from physical memory. Through Singleton Weight Sharing and a novel Topological Synapse--inspired by hybrid landmarking techniques from Topological Data Analysis (TDA)--we reduce memory complexity from O(N * L) to O(1) for weights and O(N * k) for context, where k << L. By treating the KV-cache as a point cloud in latent space, we apply witness-complex-inspired sparsification to preserve persistent homological features of the context manifold. On a single NVIDIA RTX 4090, we empirically demonstrate 100 concurrent agents at 2.2 GB total VRAM, with theoretical capacity exceeding 1,000 agents before compute latency becomes the bottleneck. We further introduce Referential Injection, a non-intrusive KV-cache update mechanism that allows asynchronous sub-agents to influence primary generation without stream disruption.
Latent Space Reinforcement Learning for Multi-Robot Exploration
Autonomous mapping of unknown environments is a critical challenge, particularly in scenarios where time is limited. Multi-agent systems can enhance efficiency through collaboration, but the scalability of motion-planning algorithms remains a key limitation. Reinforcement learning has been explored as a solution, but existing approaches are constrained by the limited input size required for effective learning, restricting their applicability to discrete environments. This work addresses that limitation by leveraging autoencoders to perform dimensionality reduction, compressing high-fidelity occupancy maps into latent state vectors while preserving essential spatial information. Additionally, we introduce a novel procedural generation algorithm based on Perlin noise, designed to generate topologically complex training environments that simulate asteroid fields, caves and forests. These environments are used for training the autoencoder and the navigation algorithm using a hierarchical deep reinforcement learning framework for decentralized coordination. We introduce a weighted consensus mechanism that modulates reliance on shared data via a tuneable trust parameter, ensuring robustness to accumulation of errors. Experimental results demonstrate that the proposed system scales effectively with number of agents and generalizes well to unfamiliar, structurally distinct environments and is resilient in communication-constrained settings.
Harm in AI-Driven Societies: An Audit of Toxicity Adoption on Chirper.ai
Large Language Models (LLMs) are increasingly embedded in autonomous agents that participate in online social ecosystems, where interactions are sequential, cumulative, and only partially controlled. While prior work has documented the generation of toxic content by LLMs, far less is known about how exposure to harmful content shapes agent behavior over time, particularly in environments composed entirely of interacting AI agents. In this work, we study toxicity adoption of LLM-driven agents on Chirper.ai, a fully AI-driven social platform. Specifically, we model interactions in terms of stimuli (posts) and responses (comments), and by operationalizing exposure through observable interactions rather than inferred recommendation mechanisms. We conduct a large-scale empirical analysis of agent behavior, examining how response toxicity relates to stimulus toxicity, how repeated exposure affects the likelihood of toxic responses, and whether toxic behavior can be predicted from exposure alone. Our findings show that while toxic responses are more likely following toxic stimuli, a substantial fraction of toxicity emerges spontaneously, independent of exposure. At the same time, cumulative toxic exposure significantly increases the probability of toxic responding. We further introduce two influence metrics, the Influence-Driven Response Rate and the Spontaneous Response Rate, revealing a strong trade-off between induced and spontaneous toxicity. Finally, we show that the number of toxic stimuli alone enables accurate prediction of whether an agent will eventually produce toxic content. These results highlight exposure as a critical risk factor in the deployment of LLM agents and suggest that monitoring encountered content may provide a lightweight yet effective mechanism for auditing and mitigating harmful behavior in the wild.
Where do We Poop? City-Wide Simulation of Defecation Behavior for Wastewater-Based Epidemiology
Wastewater surveillance, which regularly examines the pathogen biomarkers in wastewater samples, is a valuable tool for monitoring infectious diseases circulating in communities. Yet, most wastewater-based epidemiology methods, which use wastewater surveillance results for disease inferences, implicitly assume that individuals excrete only at their residential locations and that the population contribute to wastewater samples are static. These simplifying assumptions ignore daily mobility, social interactions, and heterogeneous toilet use behavior patterns, which can lead to biased interpretation of wastewater results, especially at upstream sampling locations such as neighborhoods, institutions, or buildings. Here, we introduce an agent-based geospatial simulation framework: Building on an established Patterns of Life model, we simulate daily human activities, mobility, and social contacts within a realistic urban environment and extend this agent-based framework with a physiologically motivated defecation cycle and toilet usage patterns. We couple this behavioral model with an infectious disease model to simulate transmissions through spatial and social interactions. When a defecation occurs for an infected agent, we use a pathogen shedding model to determine the amount of pathogen shed in the feces. Such a framework, integrating population mobility, disease transmission, toilet use behavior, and pathogen shedding models, is capable to simulate the Spatial-temporal dynamics of wastewater signals for a city. Using a case study of 10,000 simulated agents in Fulton County, Georgia, we examine how varying infection rates alter epidemic trajectories, pathogen loads in wastewater, and the spatial distribution of contamination across time.
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows
We introduce a finance & accounting benchmark (Finch) for evaluating AI agents on real-world, enterprise-grade professional workflows -- interleaving data entry, structuring, formatting, web search, cross-file retrieval, calculation, modeling, validation, translation, visualization, and reporting. Finch is sourced from authentic enterprise workspaces at Enron (15,000 spreadsheets and 500,000 emails from 150 employees) and other financial institutions, preserving in-the-wild messiness across multimodal artifacts (text, tables, formulas, charts, code, and images) and spanning diverse domains such as budgeting, trading, and asset management. We propose a workflow construction process that combines LLM-assisted discovery with expert annotation: (1) LLM-assisted, expert-verified derivation of workflows from real-world email threads and version histories of spreadsheet files, and (2) meticulous expert annotation for workflows, requiring over 700 hours of domain-expert effort. This yields 172 composite workflows with 384 tasks, involving 1,710 spreadsheets with 27 million cells, along with PDFs and other artifacts, capturing the intrinsically messy, long-horizon, knowledge-intensive, and collaborative nature of real-world enterprise work. We conduct both human and automated evaluations of frontier AI systems including GPT 5.1, Claude Sonnet 4.5, Gemini 3 Pro, Grok 4, and Qwen 3 Max, and GPT 5.1 Pro spends 16.8 minutes per workflow yet passes only 38.4% of workflows, while Claude Sonnet 4.5 passes just 25.0%. Comprehensive case studies further surface the challenges that real-world enterprise workflows pose for AI agents.
Systems and Control (EESS)
Simple yet Effective Anti-windup Techniques for Amplitude and Rate Saturation: An Autonomous Underwater Vehicle Case Study
Actuator amplitude and rate saturation (A\&RSat), together with their consequent windup problem, have long been recognised as challenges in control systems. Anti-windup (AW) solutions have been developed over the past decades, which can generally be categorised into two main groups: classical and modern anti-windup (CAW and MAW) approaches. Classical methods have provided simple and effective results, mainly addressing amplitude saturation. In contrast, modern approaches offer powerful and theoretically sound solutions capable of handling both amplitude and rate saturations. However, MAW's derivation process often imposes restrictive conditions and can be complex to apply in practical engineering problems. Nevertheless, the literature has paid limited attention (if not entirely ignored) to the potential of simple yet effective CAW schemes that can operate in the presence of both A\&RSat elements. This paper revisits this issue and proposes modifications to two well-known controllers: PID and LQI. The obtained results, benchmarked on the REMUS AUV yaw control problem and compared with constrained MPC, indicate that these classical techniques can still provide simple yet effective solutions with comparable performance, at least for SISO systems. These findings may stimulate further research into solutions that achieve comparable performance with only one (or a limited number of) additional tuning parameters and straightforward implementation.
An Energy-Efficient Smart Bus Transport Management System with Blind-Spot Collision Detection Ability
Public bus transport systems in developing countries often suffer from a lack of real-time location updates and for users, making commuting inconvenient and unreliable for passengers. Furthermore, stopping at undesired locations rather than designated bus stops creates safety risks and contributes to roadblocks, often causing traffic congestion. Additionally, issues such as blind spots, along with a lack of following traffic laws, increase the chances of accidents. In this work, we address these challenges by proposing a smart public bus system along with intelligent bus stops that enhance safety, efficiency, and sustainability. Our approach includes a deep learning-based blind-spot warning system to help drivers avoid accidents with automated bus-stop detection to accurately identify bus stops, improving transit efficiency. We also introduce IoT-based solar-powered smart bus stops that show real-time passenger counts, along with an RFID-based card system to track where passengers board and exit. A smart door system ensures safer and more organised boarding, while real-time bus tracking keeps passengers informed. To connect all these features, we use an HTTP-based server for seamless communication between the interconnected network systems. Our proposed system demonstrated approximately 99% efficiency in real-time blind spot detection while stopping precisely at the bus stops. Furthermore, the server showed real-time location updates both to the users and at the bus stops, enhancing commuting efficiency. The proposed energy-efficient bus stop demonstrated 12.71kWh energy saving, promoting sustainable architecture. Full implementation and source code are available at: https://github.com/sadman-adib/MoveMe-IoT
comment: 29 pages, 11 figures
Multi-Objective Operational Optimization of Energy Storage Systems in User-Side Microgrids
An operational optimization strategy for microgrid energy storage systems (ESSs) is developed to address practical user-oriented application requirements, and its effectiveness is validated using real-world operational data. First, a fundamental ESS model is established to characterize system dynamics and operational constraints, providing a theoretical basis for optimization. Subsequently, a multi-objective operational optimization framework is formulated to simultaneously minimize electricity cost, reduce carbon emissions, and enhance renewable energy utilization. To ensure computational efficiency and scalability, the commercial optimization solver Gurobi is employed. The proposed strategy is evaluated using actual microgrid operational data, demonstrating that the developed ESS model accurately represents real system constraints. Compared with existing user operational strategies, the proposed approach achieves an average reduction of 13.47% in electricity cost. Moreover, by dynamically adjusting the weighting factors of the multi-objective formulation, the strategy enables flexible operational modes and significantly improves adaptability to varying operating scenarios. In addition, the proposed framework provides decision support for user-side microgrids participating in surplus electricity feed-in policies. The main contribution of this work lies in its user-centric optimization design, which enhances operational flexibility and scenario adaptability through multi-objective weight allocation, offering a practical and scalable solution for real-world microgrid ESS operation.
comment: 9 pages, 9 figures, journal
Reinforcement Learning Based Whittle Index Policy for Scheduling Wireless Sensors SC
Wireless Sensor nodes used in remote monitoring applications typically transmit data in a timely manner, often optimising for the Age of Information (AoI). However, this approach focuses solely on keeping updates at the sink fresh without considering the actual value of each update. This method is inefficient for sensor networks, which have limited resources. Transmitting data indiscriminately without evaluating its significance not only increases energy consumption but also raises storage requirements. To address this challenge, we propose a reinforcement learning-based scheduling strategy that prioritises sensor transmissions based on the Age of Incorrect Information (AoII) using an edge mining technique. This ensures that the most valuable updates are received while reducing energy consumption. We frame the scheduling problem as a Restless Multi-Armed Bandit (RMAB) and introduce a Whittle Index-based Q-learning (WIQL) policy to dynamically select the most informative sensors. Additionally, we employ an edge mining technique, where raw sensor data is processed locally before transmission, enhancing state estimation at the sink. Experimental results demonstrate that WIQL achieves near-optimal performance while significantly reducing the number of transmitted packets by up to 70%. This reinforcement learning-based approach provides a scalable and adaptive solution for efficient data scheduling in resource-constrained WSNs.
comment: 6 pages, 3 figures. To appear in the proceedings of 30th IEEE Symposium on Computers and Communications (ISCC) 2-5 July, Bologna Italy
Transient Power Allocation Control Scheme for Hybrid Hydrogen Electrolyzer-Supercapacitor System with Autonomous Inertia Response
This paper proposes a hybrid hydrogen electrolyzer-supercapacitor system (HESS) with a novel control strategy for renewable-dominant power grids. The HESS consists of alkaline electrolyzers (AEL), proton exchange membrane electrolyzers (PEMEL), and supercapacitors (SC). The interfacing inverters between HESS and power grid are regulated by an inertia emulation control strategy. From HESS, AEL is with conventional DC power control, whereas PEMEL and SC are designed with the proposed dynamic inertia control and capacitive inertia control, respectively. Benefitting from the coordination of three controls, within the HESS, high-frequency transient power components are autonomously handled by SC, stable frequency power components are regulated by PEMEL, and low-frequency steady-state power is addressed by AEL, characterized by low operational gains and longer lifetimes. SC delivers transient power, significantly alleviating energy losses on electrolyzers and achieving adequate inertia recovery capabilities while requiring no additional communication. Implementing SOC recovery control enables the SC to withstand more than three times more stability discharge cycles compared to an SC without SOC recovery. Furthermore, a large-signal mathematical model based on mixed potential theory is established, providing clear stability boundaries for system parameters. Dynamic analyses theoretically verify system feasibility, and extensive hardware-in-the-loop experimental results fully validate the proposed HESS along with the corresponding transient power allocation controls.
Tube-based robust nonlinear model predictive control of anaerobic co-digestion
To match the growing demand for bio-methane production, anaerobic digesters need to embrace the co-digestion of different feedstocks; in addition, to improve the techno-economic performance, an optimal and time-varying adaptation of the input diet is required. These operation modes constitute a very hard challenge for the limited instrumentation and control equipment typically installed aboard full-scale plants. A model-based predictive approach may be able to handle such control problem, but the identification of reliable predictive models is limited by the low information content typical of the data available from full-scale plants' operations, which entail high parametric uncertainty. In this work, the application of a tube-based robust nonlinear model predictive control (NMPC) is proposed to regulate bio-methane production over a period of diet change in time, while warranting safe operation and dealing with uncertainties. In view of its upcoming validation on a true small pilot-scale plant, the NMPC capabilities are assessed via numerical simulations designed to resemble as much as possible the experimental setup, along with some practical final considerations.
comment: This paper has been accepted for presentation at the IEEE 64th Conference on Decision and Control (CDC), 2025. In press in the proceedings of the conference
Compensating Star-Trackers Misalignments with Adaptive Multi-Model Estimation
This paper presents an adaptive multi-model framework for jointly estimating spacecraft attitude and star-tracker misalignments in GPS-denied deep-space CubeSat missions. A Multiplicative Extended Kalman Filter (MEKF) estimates attitude, angular velocity, and gyro bias, while a Bayesian Multiple-Model Adaptive Estimation (MMAE) layer operates on a discrete grid of body-to-sensor misalignment hypotheses. In the single-misalignment case, the MEKF processes gyroscope measurements and TRIAD-based attitude observations, and the MMAE updates a three-dimensional grid over the misalignment vector. For a dual-misalignment configuration, the same MEKF dynamics are retained, and the MMAE bank is driven directly by stacked line-of-sight measurements from two star trackers, forming a six-dimensional grid over the two misalignment quaternions without augmenting the continuous-state dimension. A novel diversity metric, $Ψ$, is introduced to trigger adaptive refinement of the misalignment grid around a weighted-mean estimate, thereby preventing premature collapse of the model probabilities and concentrating computation in the most likely region of the parameter space. Monte Carlo simulations show arcsecond-level misalignment estimation and sub-degree attitude errors for both estimation problems, with estimation errors remaining well-bounded, proving robustness and consistency. These results indicate that the proposed MEKF--MMAE architecture enables accurate, autonomous, and computationally efficient in-flight calibration for resource-constrained spacecraft, and establishes dual star-tracker misalignment estimation as a practical option for deep-space CubeSat missions.
comment: 35 pages, 12 figures. arXiv admin note: substantial text overlap with arXiv:2507.19838
Time Series Based CO2 Emission Forecasting and Energy Mix Analysis for Net Zero Transitions: A Multi Country Study
This study examines long-term CO$_2$ emission trajectories across five major economies: Nigeria, the United States, China, Brazil, and Russia, by integrating national energy-mix characteristics with time-series forecasting models. Annual emissions from 2000-2023 were analyzed alongside energy production data to classify countries into fossil-dependent, transition-phase, or renewable-accelerated profiles. Three forecasting models (ARIMA, SARIMA, and Holt-Winters exponential smoothing) were evaluated using MAE, RMSE, MAPE, and R$^2$ metrics. Results show that Holt-Winters provided the most accurate forecasts for Nigeria, the United States, China, and Brazil, while SARIMA performed best for Russia due to its relatively stable emissions. Long-term projections from 2024 to 2060 indicate divergent decarbonization pathways. Brazil aligns most closely with a low-emission future owing to its renewable-dominant energy system, whereas Nigeria continues on an upward emissions trajectory driven by fossil dependence. The United States and China maintain moderate declines but require accelerated mitigation to reach their respective net-zero commitments. Russia's emissions remain largely flat under current conditions. These findings highlight the strong influence of energy structures on national decarbonization prospects and underscore the need for targeted energy policy reforms to align with global climate objectives.
comment: Accepted at the Global Journal of Engineering and Technology Advances
Scalable Data-Driven Reachability Analysis and Control via Koopman Operators with Conformal Coverage Guarantees
We propose a scalable reachability-based framework for probabilistic, data-driven safety verification of unknown nonlinear dynamics. We use Koopman theory with a neural network (NN) lifting function to learn an approximate linear representation of the dynamics and design linear controllers in this space to enable closed-loop tracking of a reference trajectory distribution. Closed-loop reachable sets are efficiently computed in the lifted space and mapped back to the original state space via NN verification tools. To capture model mismatch between the Koopman dynamics and the true system, we apply conformal prediction to produce statistically-valid error bounds that inflate the reachable sets to ensure the true trajectories are contained with a user-specified probability. These bounds generalize across references, enabling reuse without recomputation. Results on high-dimensional MuJoCo tasks (11D Hopper, 28D Swimmer) and 12D quadcopters show improved reachable set coverage rate, computational efficiency, and conservativeness over existing methods.
comment: Under review, 28 pages, 12 figures
Tiny Machine Learning for Real-Time Aquaculture Monitoring: A Case Study in Morocco
Aquaculture, the farming of aquatic organisms, is a rapidly growing industry facing challenges such as water quality fluctuations, disease outbreaks, and inefficient feed management. Traditional monitoring methods often rely on manual labor and are time consuming, leading to potential delays in addressing issues. This paper proposes the integration of low-power edge devices using Tiny Machine Learning (TinyML) into aquaculture systems to enable real-time automated monitoring and control, such as collecting data and triggering alarms, and reducing labor requirements. The system provides real-time data on the required parameters such as pH levels, temperature, dissolved oxygen, and ammonia levels to control water quality, nutrient levels, and environmental conditions enabling better maintenance, efficient resource utilization, and optimal management of the enclosed aquaculture space. The system enables alerts in case of anomaly detection. The data collected by the sensors over time can serve for important decision-making regarding optimizing water treatment processes, feed distribution, feed pattern analysis and improve feed efficiency, reducing operational costs. This research explores the feasibility of developing TinyML-based solutions for aquaculture monitoring, considering factors such as sensor selection, algorithm design, hardware constraints, and ethical considerations. By demonstrating the potential benefits of TinyML in aquaculture, our aim is to contribute to the development of more sustainable and efficient farming practices.
comment: Published in IEEE GCAIoT 2024
Spatially-Coupled Network RNA Velocities: A Control-Theoretic Perspective
RNA velocity is an important model that combines cellular spliced and unspliced RNA counts to infer dynamical properties of various regulatory functions. Despite its wide applicability and many variants used in practice, the model has not been adequately designed to directly account for both intracellular gene regulatory network interactions and spatial intercellular communications. Here, we propose a new RNA velocity approach that jointly and directly captures two new network structures: an intracellular gene regulatory network (GRN) and an intercellular interaction network that captures interactions between (neighboring) cells, with relevance to spatial transcriptomics. We theoretically analyze this two-level network system through the lens of control and consensus theory. In particular, we investigate network equilibria, stability, cellular network consensus, and optimal control approaches for targeted drug intervention.
comment: 5 figures
Improving Variational Autoencoder using Random Fourier Transformation: An Aviation Safety Anomaly Detection Case-Study
In this study, we focus on the training process and inference improvements of deep neural networks (DNNs), specifically Autoencoders (AEs) and Variational Autoencoders (VAEs), using Random Fourier Transformation (RFT). We further explore the role of RFT in model training behavior using Frequency Principle (F-Principle) analysis and show that models with RFT turn to learn low frequency and high frequency at the same time, whereas conventional DNNs start from low frequency and gradually learn (if successful) high-frequency features. We focus on reconstruction-based anomaly detection using autoencoder and variational autoencoder and investigate the RFT's role. We also introduced a trainable variant of RFT that uses the existing computation graph to train the expansion of RFT instead of it being random. We showcase our findings with two low-dimensional synthetic datasets for data representation, and an aviation safety dataset, called Dashlink, for high-dimensional reconstruction-based anomaly detection. The results indicate the superiority of models with Fourier transformation compared to the conventional counterpart and remain inconclusive regarding the benefits of using trainable Fourier transformation in contrast to the Random variant.
No Minima, No Collisions: Combining Modulation and Control Barrier Function Strategies for Feasible Dynamic Collision Avoidance
Control Barrier Function Quadratic Programs (CBF-QPs) have become a central tool for real-time safety-critical control due to their applicability to general control-affine systems and their ability to enforce constraints through optimization. Yet, they often generate trajectories with undesirable local minima that prevent convergence to goals. On the other hand, Modulation of Dynamical Systems (Mod-DS) methods (including normal, reference, and on-manifold variants) reshape nominal vector fields geometrically and achieve obstacle avoidance with few or even no local minima. However, Mod-DS provides no straightforward mechanism for handling input constraints and remains largely restricted to fully actuated systems. In this paper, we revisit the theoretical foundations of both approaches and show that, despite their seemingly different constructions, the normal Mod-DS is a special case of the CBF-QP, and the reference Mod-DS is linked to the CBF-QP through a single shared equation. These connections motivate our Modulated CBF-QP (MCBF-QP) framework, which introduces reference and on-manifold modulation variants that reduce or fully eliminate the spurious equilibria inherent to CBF-QPs for general control-affine systems operating in dynamic, cluttered environments. We validate the proposed controllers in simulated hospital settings and in real-world experiments on fully actuated Ridgeback robots and underactuated Fetch platforms. Across all evaluations, Modulated CBF-QPs consistently outperform standard CBF-QPs on every performance metric.
From Optimization to Control: Quasi Policy Iteration
Recent control algorithms for Markov decision processes (MDPs) have been designed using an implicit analogy with well-established optimization algorithms. In this paper, we adopt the quasi-Newton method (QNM) from convex optimization to introduce a novel control algorithm coined as quasi-policy iteration (QPI). In particular, QPI is based on a novel approximation of the ``Hessian'' matrix in the policy iteration algorithm, which exploits two linear structural constraints specific to MDPs and allows for the incorporation of prior information on the transition probability kernel. While the proposed algorithm has the same computational complexity as value iteration, it exhibits an empirical convergence behavior similar to that of QNM with a low sensitivity to the discount factor.
Nonlinear Oscillatory Response of Automated Vehicle Car-following: Theoretical Analysis with Traffic State and Control Input Limits
This paper presents a framework grounded in the theory of describing function (DF) and incremental-input DF to theoretically analyze the nonlinear oscillatory response of automated vehicles (AVs) car-following (CF) amidst traffic oscillations, considering the limits of traffic state and control input. While prevailing approaches largely ignore these limits (i.e., saturation of acceleration/deceleration and speed) and focus on linear string stability analysis, this framework establishes a basis for theoretically analyzing the frequency response of AV systems with nonlinearities imposed by these limits. To this end, trajectories of CF pairs are decomposed into nominal and oscillatory trajectories, subsequently, the controlled AV system is repositioned within the oscillatory trajectory coordinates. Built on this base, DFs are employed to approximate the frequency responses of nonlinear saturation components by using their first harmonic output, thereby capturing the associated amplification ratio and phase shift. Considering the closed-loop nature of AV control systems, where system states and control input mutually influence each other, amplification ratios and phase shifts are balanced within the loop to ensure consistency. This balancing process may render multiple solutions, hence the incremental-input DF is further applied to identify the reasonable ones. The proposed method is validated by estimations from Simulink, and further comparisons with prevailing methods are conducted. Results confirm the alignment of our framework with Simulink results and exhibit its superior accuracy in analysis compared to the prevailing methods. Furthermore, the framework proves valuable in string stability analysis, especially when conventional linear methods offer misleading insights.
The Frequency Response of Networks as Open Systems
Many biological, technological, and social systems can be effectively described as networks of interacting subsystems. Typically, these networks are not isolated objects, but interact with their environment through both signals and information that is received by specific nodes with an input function or released to the environment by other nodes with an output function. An important question is whether the structure of different networks, together with the particular selection of input and output nodes, is such that it favors the passing or blocking of such signals. For a given network and a given choice of the input and output nodes, the H2-norm provides a natural and general quantification of the extent to which input signals, whether deterministic or stochastic, periodic or arbitrary, are amplified. We analyze a diverse set of empirical networks and find that many naturally occurring systems, such as food webs, signaling pathways, and gene regulatory circuits, are structurally organized to enhance the passing of signals; in contrast, the structure of engineered systems like power grids appears to be intentionally designed to suppress signal propagation.
comment: Accepted for publication in Nature Communications
Towards a Multi-Layer Defence Framework for Securing Near-Real-Time Operations in Open RAN
Securing the near-real-time (near-RT) control operations in Open Radio Access Networks (Open RAN) is increasingly critical, yet remains insufficiently addressed, as new runtime threats target the control loop while the system is operational. In this paper, we propose a multi-layer defence framework designed to enhance the security of near-RT RAN Intelligent Controller (RIC) operations. We classify operational-time threats into three categories, message-level, data-level, and control logic-level, and design and implement a dedicated detection and mitigation component for each: a signature-based E2 message inspection module performing structural and semantic validation of signalling exchanges, a telemetry poisoning detector based on temporal anomaly scoring using an LSTM network, and a runtime xApp attestation mechanism based on execution-time hash challenge-response. The framework is evaluated on an O-RAN testbed comprising FlexRIC and a commercial RAN emulator, demonstrating effective detection rates, low latency overheads, and practical integration feasibility. Results indicate that the proposed safeguards can operate within near-RT time constraints while significantly improving protection against runtime attacks, introducing less than 80 ms overhead for a network with 500 User Equipment (UEs). Overall, this work lays the foundation for deployable, layered, and policy-driven runtime security architectures for the near-RT RIC control loop in Open RAN, and provides an extensible framework into which future mitigation policies and threat-specific modules can be integrated.
comment: Accepted version. Published in IEEE Open Journal of the Communications Society (OJCOMS). DOI: 10.1109/OJCOMS.2025.3650736
SPARC: Spine with Prismatic and Revolute Compliance for Quadruped Robots
We present SPARC, a compact, open-source 3-DoF sagittal-plane spine module that combines revolute (pitch) and prismatic (axial) motion with programmable task-space impedance for quadruped robots. The system integrates three torque-controlled actuators, a 1~kHz control board, and protected power electronics in a 1.26~kg package. A floating-base impedance controller with dynamics compensation renders tunable spring-damper behavior along horizontal, vertical, and rotational axes. Benchtop experiments validate the approach: quasi-static tests demonstrate linear force-displacement characteristics with commanded horizontal stiffness spanning 300--700~N/m ($\leq 1.5\%$ error, $R^2 \geq 0.992$), while dynamic trials confirm predictable damping behavior across multiple settings. Furthermore, simulation studies reveal that adapting spine stiffness based on terrain slope and stride length significantly reduces the cost of transport. SPARC provides an open-source platform for systematic studies of spine compliance in legged locomotion, with complete hardware and firmware resources available for community use. Github repository: https://github.com/YueWang996/sparc.git
Robotics
Calling for Backup: How Children Navigate Successive Robot Communication Failures
How do children respond to repeated robot errors? While prior research has examined adult reactions to successive robot errors, children's responses remain largely unexplored. In this study, we explore children's reactions to robot social errors and performance errors. For the latter, this study reproduces the successive robot failure paradigm of Liu et al. with child participants (N=59, ages 8-10) to examine how young users respond to repeated robot conversational errors. Participants interacted with a robot that failed to understand their prompts three times in succession, with their behavioral responses video-recorded and analyzed. We found both similarities and differences compared to adult responses from the original study. Like adults, children adjusted their prompts, modified their verbal tone, and exhibited increasingly emotional non-verbal responses throughout successive errors. However, children demonstrated more disengagement behaviors, including temporarily ignoring the robot or actively seeking an adult. Errors did not affect participants' perception of the robot, suggesting more flexible conversational expectations in children. These findings inform the design of more effective and developmentally appropriate human-robot interaction systems for young users.
DefVINS: Visual-Inertial Odometry for Deformable Scenes
Deformable scenes violate the rigidity assumptions underpinning classical visual-inertial odometry (VIO), often leading to over-fitting to local non-rigid motion or severe drift when deformation dominates visual parallax. We introduce DefVINS, a visual-inertial odometry framework that explicitly separates a rigid, IMU-anchored state from a non--rigid warp represented by an embedded deformation graph. The system is initialized using a standard VIO procedure that fixes gravity, velocity, and IMU biases, after which non-rigid degrees of freedom are activated progressively as the estimation becomes well conditioned. An observability analysis is included to characterize how inertial measurements constrain the rigid motion and render otherwise unobservable modes identifiable in the presence of deformation. This analysis motivates the use of IMU anchoring and informs a conditioning-based activation strategy that prevents ill-posed updates under poor excitation. Ablation studies demonstrate the benefits of combining inertial constraints with observability-aware deformation activation, resulting in improved robustness under non-rigid environments.
comment: 4 figures, 3 tables. Submitted to RA-L
Bayesian Inverse Games with High-Dimensional Multi-Modal Observations
Many multi-agent interaction scenarios can be naturally modeled as noncooperative games, where each agent's decisions depend on others' future actions. However, deploying game-theoretic planners for autonomous decision-making requires a specification of all agents' objectives. To circumvent this practical difficulty, recent work develops maximum likelihood techniques for solving inverse games that can identify unknown agent objectives from interaction data. Unfortunately, these methods only infer point estimates and do not quantify estimator uncertainty; correspondingly, downstream planning decisions can overconfidently commit to unsafe actions. We present an approximate Bayesian inference approach for solving the inverse game problem, which can incorporate observation data from multiple modalities and be used to generate samples from the Bayesian posterior over the hidden agent objectives given limited sensor observations in real time. Concretely, the proposed Bayesian inverse game framework trains a structured variational autoencoder with an embedded differentiable Nash game solver on interaction datasets and does not require labels of agents' true objectives. Extensive experiments show that our framework successfully learns prior and posterior distributions, improves inference quality over maximum likelihood estimation-based inverse game approaches, and enables safer downstream decision-making without sacrificing efficiency. When trajectory information is uninformative or unavailable, multimodal inference further reduces uncertainty by exploiting additional observation modalities.
RoboReward: General-Purpose Vision-Language Reward Models for Robotics
A well-designed reward is critical for effective reinforcement learning-based policy improvement. In real-world robotic domains, obtaining such rewards typically requires either labor-intensive human labeling or brittle, handcrafted objectives. Vision-language models (VLMs) have shown promise as automatic reward models, yet their effectiveness on real robot tasks is poorly understood. In this work, we aim to close this gap by introducing (1) \textbf{RoboReward}, a robotics reward dataset and benchmark built on large-scale real-robot corpora from Open X-Embodiment (OXE) and RoboArena, and (2) vision-language reward models trained on this dataset (RoboReward 4B/8B). Because OXE is success-heavy and lacks failure examples, we propose a \emph{negative examples data augmentation} pipeline that generates calibrated \emph{negatives} and \emph{near-misses} via counterfactual relabeling of successful episodes and temporal clipping to create partial-progress outcomes from the same videos. Using this framework, we produce an extensive training and evaluation dataset that spans diverse tasks and embodiments and enables systematic evaluation of whether state-of-the-art VLMs can reliably provide rewards for robotics. Our evaluation of leading open-weight and proprietary VLMs reveals that no model excels across all tasks, underscoring substantial room for improvement. We then train general-purpose 4B- and 8B-parameter models that outperform much larger VLMs in assigning rewards for short-horizon robotic tasks. Finally, we deploy the 8B-parameter reward VLM in real-robot reinforcement learning and find that it improves policy learning over Gemini Robotics-ER 1.5, a frontier physical reasoning VLM trained on robotics data, by a large margin, while substantially narrowing the gap to RL training with human-provided rewards.
From 2D to 3D terrain-following area coverage path planning
An algorithm for 3D terrain-following area coverage path planning is presented. Multiple adjacent paths are generated that are (i) locally apart from each other by a distance equal to the working width of a machinery, while (ii) simultaneously floating at a projection distance equal to a specific working height above the terrain. The complexities of the algorithm in comparison to its 2D equivalent are highlighted. These include uniformly spaced elevation data generation using an Inverse Distance Weighting-approach and a local search. Area coverage path planning results for real-world 3D data within an agricultural context are presented to validate the algorithm.
comment: 6 pages, 10 figures, 1 table
Vision-based Goal-Reaching Control for Mobile Robots Using a Hierarchical Learning Framework
Reinforcement learning (RL) is effective in many robotic applications, but it requires extensive exploration of the state-action space, during which behaviors can be unsafe. This significantly limits its applicability to large robots with complex actuators operating on unstable terrain. Hence, to design a safe goal-reaching control framework for large-scale robots, this paper decomposes the whole system into a set of tightly coupled functional modules. 1) A real-time visual pose estimation approach is employed to provide accurate robot states to 2) an RL motion planner for goal-reaching tasks that explicitly respects robot specifications. The RL module generates real-time smooth motion commands for the actuator system, independent of its underlying dynamic complexity. 3) In the actuation mechanism, a supervised deep learning model is trained to capture the complex dynamics of the robot and provide this model to 4) a model-based robust adaptive controller that guarantees the wheels track the RL motion commands even on slip-prone terrain. 5) Finally, to reduce human intervention, a mathematical safety supervisor monitors the robot, stops it on unsafe faults, and autonomously guides it back to a safe inspection area. The proposed framework guarantees uniform exponential stability of the actuation system and safety of the whole operation. Experiments on a 6,000 kg robot in different scenarios confirm the effectiveness of the proposed framework.
NMPC-Augmented Visual Navigation and Safe Learning Control for Large-Scale Mobile Robots
A large-scale mobile robot (LSMR) is a high-order multibody system that often operates on loose, unconsolidated terrain, which reduces traction. This paper presents a comprehensive navigation and control framework for an LSMR that ensures stability and safety-defined performance, delivering robust operation on slip-prone terrain by jointly leveraging high-performance techniques. The proposed architecture comprises four main modules: (1) a visual pose-estimation module that fuses onboard sensors and stereo cameras to provide an accurate, low-latency robot pose, (2) a high-level nonlinear model predictive control that updates the wheel motion commands to correct robot drift from the robot reference pose on slip-prone terrain, (3) a low-level deep neural network control policy that approximates the complex behavior of the wheel-driven actuation mechanism in LSMRs, augmented with robust adaptive control to handle out-of-distribution disturbances, ensuring that the wheels accurately track the updated commands issued by high-level control module, and (4) a logarithmic safety module to monitor the entire robot stack and guarantees safe operation. The proposed low-level control framework guarantees uniform exponential stability of the actuation subsystem, while the safety module ensures the whole system-level safety during operation. Comparative experiments on a 6,000 kg LSMR actuated by two complex electro-hydrostatic drives, while synchronizing modules operating at different frequencies.
Priority-Aware Multi-Robot Coverage Path Planning
Multi-robot systems are widely used for coverage tasks that require efficient coordination across large environments. In Multi-Robot Coverage Path Planning (MCPP), the objective is typically to minimize the makespan by generating non-overlapping paths for full-area coverage. However, most existing methods assume uniform importance across regions, limiting their effectiveness in scenarios where some zones require faster attention. We introduce the Priority-Aware MCPP (PA-MCPP) problem, where a subset of the environment is designated as prioritized zones with associated weights. The goal is to minimize, in lexicographic order, the total priority-weighted latency of zone coverage and the overall makespan. To address this, we propose a scalable two-phase framework combining (1) greedy zone assignment with local search, spanning-tree-based path planning, and (2) Steiner-tree-guided residual coverage. Experiments across diverse scenarios demonstrate that our method significantly reduces priority-weighted latency compared to standard MCPP baselines, while maintaining competitive makespan. Sensitivity analyses further show that the method scales well with the number of robots and that zone coverage behavior can be effectively controlled by adjusting priority weights.
comment: IEEE Robotics and Automation Letters, 8 pages, 10 figures
LLM-Based Agentic Exploration for Robot Navigation & Manipulation with Skill Orchestration
This paper presents an end-to-end LLM-based agentic exploration system for an indoor shopping task, evaluated in both Gazebo simulation and a corresponding real-world corridor layout. The robot incrementally builds a lightweight semantic map by detecting signboards at junctions and storing direction-to-POI relations together with estimated junction poses, while AprilTags provide repeatable anchors for approach and alignment. Given a natural-language shopping request, an LLM produces a constrained discrete action at each junction (direction and whether to enter a store), and a ROS finite-state main controller executes the decision by gating modular motion primitives, including local-costmap-based obstacle avoidance, AprilTag approaching, store entry, and grasping. Qualitative results show that the integrated stack can perform end-to-end task execution from user instruction to multi-store navigation and object retrieval, while remaining modular and debuggable through its text-based map and logged decision history.
Optimal Transport-Based Decentralized Multi-Agent Distribution Matching
This paper presents a decentralized control framework for distribution matching in multi-agent systems (MAS), where agents collectively achieve a prescribed terminal spatial distribution. The problem is formulated using optimal transport (Wasserstein distance), which provides a principled measure of distributional discrepancy and serves as the basis for the control design. To avoid solving the global optimal transport problem directly, the distribution-matching objective is reformulated into a tractable per-agent decision process, enabling each agent to identify its desired terminal locations using only locally available information. A sequential weight-update rule is introduced to construct feasible local transport plans, and a memory-based correction mechanism is incorporated to maintain reliable operation under intermittent and range-limited communication. Convergence guarantees are established, showing cycle-wise improvement of a surrogate transport cost under both linear and nonlinear agent dynamics. Simulation results demonstrate that the proposed framework achieves effective and scalable distribution matching while operating fully in a decentralized manner.
Variable Elimination in Hybrid Factor Graphs for Discrete-Continuous Inference & Estimation
Many hybrid problems in robotics involve both continuous and discrete components, and modeling them together for estimation tasks has been a long standing and difficult problem. Hybrid Factor Graphs give us a mathematical framework to model these types of problems, however existing approaches for solving them are based on approximations. In this work, we propose an efficient Hybrid Factor Graph framework alongwith a variable elimination algorithm to produce a hybrid Bayes network, which can then be used for exact Maximum A Posteriori estimation and marginalization over both sets of variables. Our approach first develops a novel hybrid Gaussian factor which can connect to both discrete and continuous variables, and a hybrid conditional which can represent multiple continuous hypotheses conditioned on the discrete variables. Using these representations, we derive the process of hybrid variable elimination under the Conditional Linear Gaussian scheme, giving us exact posteriors as hybrid Bayes network. To bound the number of discrete hypotheses, we use a tree-structured representation of the factors coupled with a simple pruning and probabilistic assignment scheme, which allows for tractable inference. We demonstrate the applicability of our framework on a SLAM dataset with ambiguous measurements, where discrete choices for the most likely measurement have to be made. Our demonstrated results showcase the accuracy, generality, and simplicity of our hybrid factor graph framework.
Contractive Diffusion Policies: Robust Action Diffusion via Contractive Score-Based Sampling with Differential Equations ICLR 2026
Diffusion policies have emerged as powerful generative models for offline policy learning, whose sampling process can be rigorously characterized by a score function guiding a Stochastic Differential Equation (SDE). However, the same score-based SDE modeling that grants diffusion policies the flexibility to learn diverse behavior also incurs solver and score-matching errors, large data requirements, and inconsistencies in action generation. While less critical in image generation, these inaccuracies compound and lead to failure in continuous control settings. We introduce Contractive Diffusion Policies (CDPs) to induce contractive behavior in the diffusion sampling dynamics. Contraction pulls nearby flows closer to enhance robustness against solver and score-matching errors while reducing unwanted action variance. We develop an in-depth theoretical analysis along with a practical implementation recipe to incorporate CDPs into existing diffusion policy architectures with minimal modification and computational cost. We evaluate CDPs for offline learning by conducting extensive experiments in simulation and real-world settings. Across benchmarks, CDPs often outperform baseline policies, with pronounced benefits under data scarcity.
comment: Under review at ICLR 2026
Simulations of MRI Guided and Powered Ferric Applicators for Tetherless Delivery of Therapeutic Interventions
Magnetic Resonance Imaging (MRI) is a well-established modality for pre-operative planning and is also explored for intra-operative guidance of procedures such as intravascular interventions. Among the experimental robot-assisted technologies, the magnetic field gradients of the MRI scanner are used to power and maneuver ferromagnetic applicators for accessing sites in the patient's body via the vascular network. In this work, we propose a computational platform for preoperative planning and modeling of MRI-powered applicators inside blood vessels. This platform was implemented as a two-way data and command pipeline that links the MRI scanner, the computational core, and the operator. The platform first processes multi-slice MR data to extract the vascular bed and then fits a virtual corridor inside the vessel. This corridor serves as a virtual fixture (VF), a forbidden region for the applicators to avoid vessel perforation or collision. The geometric features of the vessel centerline, the VF, and MRI safety compliance (dB/dt, max available gradient) are then used to generate magnetic field gradient waveforms. Different blood flow profiles can be user-selected, and those parameters are used for modeling the applicator's maneuvering. The modeling module further generates cues about whether the selected vascular path can be safely maneuvered. Given future experimental studies that require a real-time operation, the platform was implemented on the Qt framework (C/C++) with software modules performing specific tasks running on dedicated threads: PID controller, generation of VF, generation of MR gradient waveforms.
comment: 9 pages, 8 figures, published in ICBBB 2022
From Perception to Symbolic Task Planning: Vision-Language Guided Human-Robot Collaborative Structured Assembly
Human-robot collaboration (HRC) in structured assembly requires reliable state estimation and adaptive task planning under noisy perception and human interventions. To address these challenges, we introduce a design-grounded human-aware planning framework for human-robot collaborative structured assembly. The framework comprises two coupled modules. Module I, Perception-to-Symbolic State (PSS), employs vision-language models (VLMs) based agents to align RGB-D observations with design specifications and domain knowledge, synthesizing verifiable symbolic assembly states. It outputs validated installed and uninstalled component sets for online state tracking. Module II, Human-Aware Planning and Replanning (HPR), performs task-level multi-robot assignment and updates the plan only when the observed state deviates from the expected execution outcome. It applies a minimal-change replanning rule to selectively revise task assignments and preserve plan stability even under human interventions. We validate the framework on a 27-component timber-frame assembly. The PSS module achieves 97% state synthesis accuracy, and the HPR module maintains feasible task progression across diverse HRC scenarios. Results indicate that integrating VLM-based perception with knowledge-driven planning improves robustness of state estimation and task planning under dynamic conditions.
Value Vision-Language-Action Planning & Search
Vision-Language-Action (VLA) models have emerged as powerful generalist policies for robotic manipulation, yet they remain fundamentally limited by their reliance on behavior cloning, leading to brittleness under distribution shift. While augmenting pretrained models with test-time search algorithms like Monte Carlo Tree Search (MCTS) can mitigate these failures, existing formulations rely solely on the VLA prior for guidance, lacking a grounded estimate of expected future return. Consequently, when the prior is inaccurate, the planner can only correct action selection via the exploration term, which requires extensive simulation to become effective. To address this limitation, we introduce Value Vision-Language-Action Planning and Search (V-VLAPS), a framework that augments MCTS with a lightweight, learnable value function. By training a simple multilayer perceptron (MLP) on the latent representations of a fixed VLA backbone (Octo), we provide the search with an explicit success signal that biases action selection toward high-value regions. We evaluate V-VLAPS on the LIBERO robotic manipulation suite, demonstrating that our value-guided search improves success rates by over 5 percentage points while reducing the average number of MCTS simulations by 5-15 percent compared to baselines that rely only on the VLA prior.
comment: 10 pages, 3 figures
Analyzing the Shopping Journey: Computing Shelf Browsing Visits in a Physical Retail Store
Motivated by recent challenges in the deployment of robots into customer-facing roles within retail, this work introduces a study of customer activity in physical stores as a step toward autonomous understanding of shopper intent. We introduce an algorithm that computes shoppers' ``shelf visits'' -- capturing their browsing behavior in the store. Shelf visits are extracted from trajectories obtained via machine vision-based 3D tracking and overhead cameras. We perform two independent calibrations of the shelf visit algorithm, using distinct sets of trajectories (consisting of 8138 and 15129 trajectories), collected in different stores and labeled by human reviewers. The calibrated models are then evaluated on trajectories held out of the calibration process both from the same store on which calibration was performed and from the other store. An analysis of the results shows that the algorithm can recognize customers' browsing activity when evaluated in an environment different from the one on which calibration was performed. We then use the model to analyze the customers' ``browsing patterns'' on a large set of trajectories and their relation to actual purchases in the stores. Finally, we discuss how shelf browsing information could be used for retail planning and in the domain of human-robot interaction scenarios.
Flattening Hierarchies with Policy Bootstrapping NeurIPS 2025
Offline goal-conditioned reinforcement learning (GCRL) is a promising approach for pretraining generalist policies on large datasets of reward-free trajectories, akin to the self-supervised objectives used to train foundation models for computer vision and natural language processing. However, scaling GCRL to longer horizons remains challenging due to the combination of sparse rewards and discounting, which obscures the comparative advantages of primitive actions with respect to distant goals. Hierarchical RL methods achieve strong empirical results on long-horizon goal-reaching tasks, but their reliance on modular, timescale-specific policies and subgoal generation introduces significant additional complexity and hinders scaling to high-dimensional goal spaces. In this work, we introduce an algorithm to train a flat (non-hierarchical) goal-conditioned policy by bootstrapping on subgoal-conditioned policies with advantage-weighted importance sampling. Our approach eliminates the need for a generative model over the (sub)goal space, which we find is key for scaling to high-dimensional control in large state spaces. We further show that existing hierarchical and bootstrapping-based approaches correspond to specific design choices within our derivation. Across a comprehensive suite of state- and pixel-based locomotion and manipulation benchmarks, our method matches or surpasses state-of-the-art offline GCRL algorithms and scales to complex, long-horizon tasks where prior approaches fail. Project page: https://johnlyzhou.github.io/saw/
comment: NeurIPS 2025 (Spotlight, top 3.2%)
Iterative Tuning of Nonlinear Model Predictive Control for Robotic Manufacturing Tasks
Manufacturing processes are often perturbed by drifts in the environment and wear in the system, requiring control re-tuning even in the presence of repetitive operations. This paper presents an iterative learning framework for automatic tuning of Nonlinear Model Predictive Control (NMPC) weighting matrices based on task-level performance feedback. Inspired by norm-optimal Iterative Learning Control (ILC), the proposed method adaptively adjusts NMPC weights Q and R across task repetitions to minimize key performance indicators (KPIs) related to tracking accuracy, control effort, and saturation. Unlike gradient-based approaches that require differentiating through the NMPC solver, we construct an empirical sensitivity matrix, enabling structured weight updates without analytic derivatives. The framework is validated through simulation on a UR10e robot performing carbon fiber winding on a tetrahedral core. Results demonstrate that the proposed approach converges to near-optimal tracking performance (RMSE within 0.3% of offline Bayesian Optimization (BO)) in just 4 online repetitions, compared to 100 offline evaluations required by BO algorithm. The method offers a practical solution for adaptive NMPC tuning in repetitive robotic tasks, combining the precision of carefully optimized controllers with the flexibility of online adaptation.
Digital Twin based Automatic Reconfiguration of Robotic Systems in Smart Environments SC2
Robotic systems have become integral to smart environments, enabling applications ranging from urban surveillance and automated agriculture to industrial automation. However, their effective operation in dynamic settings - such as smart cities and precision farming - is challenged by continuously evolving topographies and environmental conditions. Traditional control systems often struggle to adapt quickly, leading to inefficiencies or operational failures. To address this limitation, we propose a novel framework for autonomous and dynamic reconfiguration of robotic controllers using Digital Twin technology. Our approach leverages a virtual replica of the robot's operational environment to simulate and optimize movement trajectories in response to real-world changes. By recalculating paths and control parameters in the Digital Twin and deploying the updated code to the physical robot, our method ensures rapid and reliable adaptation without manual intervention. This work advances the integration of Digital Twins in robotics, offering a scalable solution for enhancing autonomy in smart, dynamic environments.
comment: Accepted for presentation to 11th IEEE International Smart Cities Conference (ISC2 2025)
Tackling the Kidnapped Robot Problem via Sparse Feasible Hypothesis Sampling and Reliable Batched Multi-Stage Inference
This paper addresses the Kidnapped Robot Problem (KRP), a core localization challenge of relocalizing a robot in a known map without prior pose estimate when localization loss or at SLAM initialization. For this purpose, a passive 2-D global relocalization framework is proposed. It estimates the global pose efficiently and reliably from a single LiDAR scan and an occupancy grid map while the robot remains stationary, thereby enhancing the long-term autonomy of mobile robots. The proposed framework casts global relocalization as a non-convex problem and solves it via the multi-hypothesis scheme with batched multi-stage inference and early termination, balancing completeness and efficiency. The Rapidly-exploring Random Tree (RRT), under traversability constraints, asymptotically covers the reachable space to generate sparse, uniformly distributed feasible positional hypotheses, fundamentally reducing the sampling space. The hypotheses are preliminarily ordered by the proposed Scan Mean Absolute Difference (SMAD), a coarse beam-error level metric that facilitates the early termination by prioritizing high-likelihood candidates. The SMAD computation is optimized for non-panoramic scans. The Translation-Affinity Scan-to-Map Alignment Metric (TAM) is proposed for reliable orientation selection at hypothesized positions and accurate final pose evaluation to mitigate degradation in conventional likelihood-field metrics under translational uncertainty induced by sparse hypotheses, as well as non-panoramic LiDAR scan and environmental changes. Real-world experiments on a resource-constrained mobile robot with non-panoramic LiDAR scans show that the proposed framework achieves competitive performance in both global relocalization success rate and computational efficiency.
comment: 10 pages, 8 figures. This work has been submitted to the IEEE for possible publication
NeRF-VIO: Map-Based Visual-Inertial Odometry with Initialization Leveraging Neural Radiance Fields
A prior map serves as a foundational reference for localization in context-aware applications such as augmented reality (AR). Providing valuable contextual information about the environment, the prior map is a vital tool for mitigating drift. In this paper, we propose a map-based visual-inertial localization algorithm (NeRF-VIO) with initialization using neural radiance fields (NeRF). Our algorithm utilizes a multilayer perceptron model and redefines the loss function as the geodesic distance on \(SE(3)\), ensuring the invariance of the initialization model under a frame change within \(\mathfrak{se}(3)\). The evaluation demonstrates that our model outperforms existing NeRF-based initialization solution in both accuracy and efficiency. By integrating a two-stage update mechanism within a multi-state constraint Kalman filter (MSCKF) framework, the state of NeRF-VIO is constrained by both captured images from an onboard camera and rendered images from a pre-trained NeRF model. The proposed algorithm is validated using a real-world AR dataset, the results indicate that our two-stage update pipeline outperforms MSCKF across all data sequences.
PLK-Calib: Single-shot and Target-less LiDAR-Camera Extrinsic Calibration using Plücker Lines
Accurate LiDAR-Camera (LC) calibration is challenging but crucial for autonomous systems and robotics. In this paper, we propose two single-shot and target-less algorithms to estimate the calibration parameters between LiDAR and camera using line features. The first algorithm constructs line-to-line constraints by defining points-to-line projection errors and minimizes the projection error. The second algorithm (PLK-Calib) utilizes the co-perpendicular and co-parallel geometric properties of lines in Plücker (PLK) coordinate, and decouples the rotation and translation into two constraints, enabling more accurate estimates. Our degenerate analysis and Monte Carlo simulation indicate that three nonparallel line pairs are the minimal requirements to estimate the extrinsic parameters. Furthermore, we collect an LC calibration dataset with varying extrinsic under three different scenarios and use it to evaluate the performance of our proposed algorithms.
Towards Balanced Behavior Cloning from Imbalanced Datasets
Robots should be able to learn complex behaviors from human demonstrations. In practice, these human-provided datasets are inevitably imbalanced: i.e., the human demonstrates some subtasks more frequently than others. State-of-the-art methods default to treating each element of the human's dataset as equally important. So if -- for instance -- the majority of the human's data focuses on reaching a goal, and only a few state-action pairs move to avoid an obstacle, the learning algorithm will place greater emphasis on goal reaching. More generally, misalignment between the relative amounts of data and the importance of that data causes fundamental problems for imitation learning approaches. In this paper we analyze and develop learning methods that automatically account for mixed datasets. We formally prove that imbalanced data leads to imbalanced policies when each state-action pair is weighted equally; these policies emulate the most represented behaviors, and not the human's complex, multi-task demonstrations. We next explore algorithms that rebalance offline datasets (i.e., reweight the importance of different state-action pairs) without human oversight. Reweighting the dataset can enhance the overall policy performance. However, there is no free lunch: each method for autonomously rebalancing brings its own pros and cons. We formulate these advantages and disadvantages, helping other researchers identify when each type of approach is most appropriate. We conclude by introducing a novel meta-gradient rebalancing algorithm that addresses the primary limitations behind existing approaches. Our experiments show that dataset rebalancing leads to better downstream learning, improving the performance of general imitation learning algorithms without requiring additional data collection. See our project website: https://collab.me.vt.edu/data_curation/.
Multiagent Systems
Priority-Aware Multi-Robot Coverage Path Planning
Multi-robot systems are widely used for coverage tasks that require efficient coordination across large environments. In Multi-Robot Coverage Path Planning (MCPP), the objective is typically to minimize the makespan by generating non-overlapping paths for full-area coverage. However, most existing methods assume uniform importance across regions, limiting their effectiveness in scenarios where some zones require faster attention. We introduce the Priority-Aware MCPP (PA-MCPP) problem, where a subset of the environment is designated as prioritized zones with associated weights. The goal is to minimize, in lexicographic order, the total priority-weighted latency of zone coverage and the overall makespan. To address this, we propose a scalable two-phase framework combining (1) greedy zone assignment with local search, spanning-tree-based path planning, and (2) Steiner-tree-guided residual coverage. Experiments across diverse scenarios demonstrate that our method significantly reduces priority-weighted latency compared to standard MCPP baselines, while maintaining competitive makespan. Sensitivity analyses further show that the method scales well with the number of robots and that zone coverage behavior can be effectively controlled by adjusting priority weights.
comment: IEEE Robotics and Automation Letters, 8 pages, 10 figures
Consistent Opponent Modeling in Imperfect-Information Games
The goal of agents in multi-agent environments is to maximize total reward against the opposing agents that are encountered. Following a game-theoretic solution concept, such as Nash equilibrium, may obtain a strong performance in some settings; however, such approaches fail to capitalize on historical and observed data from repeated interactions against our opponents. Opponent modeling algorithms integrate machine learning techniques to exploit suboptimal opponents utilizing available data; however, the effectiveness of such approaches in imperfect-information games to date is quite limited. We show that existing opponent modeling approaches fail to satisfy a simple desirable property even against static opponents drawn from a known prior distribution; namely, they do not guarantee that the model approaches the opponent's true strategy even in the limit as the number of game iterations approaches infinity. We develop a new algorithm that is able to achieve this property and runs efficiently by solving a convex minimization problem based on the sequence-form game representation using projected gradient descent. The algorithm is guaranteed to efficiently converge to the opponent's true strategy under standard Bayesian identifiability and visitation assumptions, given observations from gameplay and possibly additional historical data if it is available.
Computing Evolutionarily Stable Strategies in Multiplayer Games
We present an algorithm for computing all evolutionarily stable strategies in nondegenerate normal-form games with three or more players.
comment: Reverting to original title after fixing Google scholar merge
Systems and Control (EESS)
A formal theory on problem space as a semantic world model in systems engineering
Classic problem-space theory models problem solving as a navigation through a structured space of states, operators, goals, and constraints. Systems Engineering (SE) employs analogous constructs (functional analysis, operational analysis, scenarios, trade studies), yet still lacks a rigorous systems-theoretic representation of the problem space itself. In current practice, reasoning often proceeds directly from stakeholder goals to prescriptive artifacts. This makes foundational assumptions about the operational environment, admissible interactions, and contextual conditions implicit or prematurely embedded in architectures or requirements. This paper addresses that gap by formalizing the problem space as an explicit semantic world model containing theoretical constructs that are defined prior to requirements and solution commitments. These constructs along with the developed axioms, theorems and corollary establish a rigorous criterion for unambiguous boundary semantics, context-dependent interaction traceability to successful stakeholder goal satisfaction, and sufficiency of problem-space specification over which disciplined reasoning can occur independent of solution design. It offers a clear distinction between what is true of the problem domain and what is chosen as a solution. The paper concludes by discussing the significance of the theory on practitioners and provides a dialogue-based hypothetical case study between a stakeholder and an engineer, demonstrating how the theory guides problem framing before designing any prescriptive artifacts.
comment: Submitted to Wiley Systems Engineering Journal
Stochastic Actor-Critic: Mitigating Overestimation via Temporal Aleatoric Uncertainty
Off-policy actor-critic methods in reinforcement learning train a critic with temporal-difference updates and use it as a learning signal for the policy (actor). This design typically achieves higher sample efficiency than purely on-policy methods. However, critic networks tend to overestimate value estimates systematically. This is often addressed by introducing a pessimistic bias based on uncertainty estimates. Current methods employ ensembling to quantify the critic's epistemic uncertainty-uncertainty due to limited data and model ambiguity-to scale pessimistic updates. In this work, we propose a new algorithm called Stochastic Actor-Critic (STAC) that incorporates temporal (one-step) aleatoric uncertainty-uncertainty arising from stochastic transitions, rewards, and policy-induced variability in Bellman targets-to scale pessimistic bias in temporal-difference updates, rather than relying on epistemic uncertainty. STAC uses a single distributional critic network to model the temporal return uncertainty, and applies dropout to both the critic and actor networks for regularization. Our results show that pessimism based on a distributional critic alone suffices to mitigate overestimation, and naturally leads to risk-averse behavior in stochastic environments. Introducing dropout further improves training stability and performance by means of regularization. With this design, STAC achieves improved computational efficiency using a single distributional critic network.
comment: 19 pages
ARISE: Adaptive Reinforcement Integrated with Swarm Exploration SC 2026
Effective exploration remains a key challenge in RL, especially with non-stationary rewards or high-dimensional policies. We introduce ARISE, a lightweight framework that enhances reinforcement learning by augmenting standard policy-gradient methods with a compact swarm-based exploration layer. ARISE blends policy actions with particle-driven proposals, where each particle represents a candidate policy trajectory sampled in the action space, and modulates exploration adaptively using reward-variance cues. While easy benchmarks exhibit only slight improvements (e.g., +0.7% on CartPole-v1), ARISE yields substantial gains on more challenging tasks, including +46% on LunarLander-v3 and +22% on Hopper-v4, while preserving stability on Walker2d and Ant. Under non-stationary reward shifts, ARISE provides marked robustness advantages, outperforming PPO by +75 points on CartPole and improving LunarLander accordingly. Ablation studies confirm that both the swarm component and the adaptive mechanism contribute to the performance. Overall, ARISE offers a simple, architecture-agnostic route to more exploratory and resilient RL agents without altering core algorithmic structures.
comment: 12 pages. Accepted for presentation at WCSC 2026
From 2D to 3D terrain-following area coverage path planning
An algorithm for 3D terrain-following area coverage path planning is presented. Multiple adjacent paths are generated that are (i) locally apart from each other by a distance equal to the working width of a machinery, while (ii) simultaneously floating at a projection distance equal to a specific working height above the terrain. The complexities of the algorithm in comparison to its 2D equivalent are highlighted. These include uniformly spaced elevation data generation using an Inverse Distance Weighting-approach and a local search. Area coverage path planning results for real-world 3D data within an agricultural context are presented to validate the algorithm.
comment: 6 pages, 10 figures, 1 table
NMPC-Augmented Visual Navigation and Safe Learning Control for Large-Scale Mobile Robots
A large-scale mobile robot (LSMR) is a high-order multibody system that often operates on loose, unconsolidated terrain, which reduces traction. This paper presents a comprehensive navigation and control framework for an LSMR that ensures stability and safety-defined performance, delivering robust operation on slip-prone terrain by jointly leveraging high-performance techniques. The proposed architecture comprises four main modules: (1) a visual pose-estimation module that fuses onboard sensors and stereo cameras to provide an accurate, low-latency robot pose, (2) a high-level nonlinear model predictive control that updates the wheel motion commands to correct robot drift from the robot reference pose on slip-prone terrain, (3) a low-level deep neural network control policy that approximates the complex behavior of the wheel-driven actuation mechanism in LSMRs, augmented with robust adaptive control to handle out-of-distribution disturbances, ensuring that the wheels accurately track the updated commands issued by high-level control module, and (4) a logarithmic safety module to monitor the entire robot stack and guarantees safe operation. The proposed low-level control framework guarantees uniform exponential stability of the actuation subsystem, while the safety module ensures the whole system-level safety during operation. Comparative experiments on a 6,000 kg LSMR actuated by two complex electro-hydrostatic drives, while synchronizing modules operating at different frequencies.
Stability Verification for Switched Systems using Neural Multiple Lyapunov Functions
Stability analysis of switched systems, characterized by multiple operational modes and switching signals, is challenging due to their nonlinear dynamics. While frameworks such as multiple Lyapunov functions (MLF) provide a foundation for analysis, their computational applicability is limited for systems without favorable structure. This paper investigates stability analysis for switched systems under state-dependent switching conditions. We propose neural multiple Lyapunov functions (NMLF), a unified framework that combines the theoretical guarantees of MLF with the computational efficiency of neural Lyapunov functions (NLF). Our approach leverages a set of tailored loss functions and a counter-example guided inductive synthesis (CEGIS) scheme to train neural networks that rigorously satisfy MLF conditions. Through comprehensive simulations and theoretical analysis, we demonstrate NMLF's effectiveness and its potential for practical deployment in complex switched systems.
Cyberscurity Threats and Defense Mechanisms in IoT network
The rapid proliferation of Internet of Things (IoT) technologies, projected to exceed 30 billion interconnected devices by 2030, has significantly escalated the complexity of cybersecurity challenges. This survey aims to provide a comprehensive analysis of vulnerabilities, threats, and defense mechanisms, specifically focusing on the integration of network and application layers within real-time monitoring and decision-making systems. Employing an integrative review methodology, 59 scholarly articles published between 2009 and 2024 were selected from databases such as IEEE Xplore, ScienceDirect, and PubMed, utilizing keywords related to IoT vulnerabilities and security attacks. Key findings identify critical threat categories, including sensor vulnerabilities, Denial-of-Service (DoS) attacks, and public cloud insecurity. Conversely, the study highlights advanced defense approaches leveraging Artificial Intelligence (AI) for anomaly detection, Blockchain for decentralized trust, and Zero Trust Architecture (ZTA) for continuous verification. This paper contributes a novel five-layer IoT model and outlines future research directions involving quantum computing and 6G networks to bolster IoT ecosystem resilience.
Optimal Transport-Based Decentralized Multi-Agent Distribution Matching
This paper presents a decentralized control framework for distribution matching in multi-agent systems (MAS), where agents collectively achieve a prescribed terminal spatial distribution. The problem is formulated using optimal transport (Wasserstein distance), which provides a principled measure of distributional discrepancy and serves as the basis for the control design. To avoid solving the global optimal transport problem directly, the distribution-matching objective is reformulated into a tractable per-agent decision process, enabling each agent to identify its desired terminal locations using only locally available information. A sequential weight-update rule is introduced to construct feasible local transport plans, and a memory-based correction mechanism is incorporated to maintain reliable operation under intermittent and range-limited communication. Convergence guarantees are established, showing cycle-wise improvement of a surrogate transport cost under both linear and nonlinear agent dynamics. Simulation results demonstrate that the proposed framework achieves effective and scalable distribution matching while operating fully in a decentralized manner.
Probability-Aware Parking Selection
Current parking navigation systems often underestimate total travel time by failing to account for the time spent searching for a parking space, which significantly affects user experience, mode choice, congestion, and emissions. To address this issue, this paper introduces the probability-aware parking selection problem, which aims to direct drivers to the best parking location rather than straight to their destination. An adaptable dynamic programming framework is proposed for decision-making based on probabilistic information about parking availability at the parking lot level. Closed-form analysis determines when it is optimal to target a specific parking lot or explore alternatives, as well as the expected time cost. Sensitivity analysis and three illustrative cases are examined, demonstrating the model's ability to account for the dynamic nature of parking availability. Acknowledging the financial costs of permanent sensing infrastructure, the paper provides analytical and empirical assessments of errors incurred when leveraging stochastic observations to estimate parking availability. Experiments with real-world data from the US city of Seattle indicate this approach's viability, with mean absolute error decreasing from 7% to below 2% as observation frequency grows. In data-based simulations, probability-aware strategies demonstrate time savings up to 66% relative to probability-unaware baselines, yet still take up to 123% longer than direct-to-destination estimates.
comment: 10 pages, 6 figures, 3 tables. To be published in IEEE Transactions on Intelligent Transportation Systems
Simulations of MRI Guided and Powered Ferric Applicators for Tetherless Delivery of Therapeutic Interventions
Magnetic Resonance Imaging (MRI) is a well-established modality for pre-operative planning and is also explored for intra-operative guidance of procedures such as intravascular interventions. Among the experimental robot-assisted technologies, the magnetic field gradients of the MRI scanner are used to power and maneuver ferromagnetic applicators for accessing sites in the patient's body via the vascular network. In this work, we propose a computational platform for preoperative planning and modeling of MRI-powered applicators inside blood vessels. This platform was implemented as a two-way data and command pipeline that links the MRI scanner, the computational core, and the operator. The platform first processes multi-slice MR data to extract the vascular bed and then fits a virtual corridor inside the vessel. This corridor serves as a virtual fixture (VF), a forbidden region for the applicators to avoid vessel perforation or collision. The geometric features of the vessel centerline, the VF, and MRI safety compliance (dB/dt, max available gradient) are then used to generate magnetic field gradient waveforms. Different blood flow profiles can be user-selected, and those parameters are used for modeling the applicator's maneuvering. The modeling module further generates cues about whether the selected vascular path can be safely maneuvered. Given future experimental studies that require a real-time operation, the platform was implemented on the Qt framework (C/C++) with software modules performing specific tasks running on dedicated threads: PID controller, generation of VF, generation of MR gradient waveforms.
comment: 9 pages, 8 figures, published in ICBBB 2022
KAN-Therm: A Lightweight Battery Thermal Model Using Kolmogorov-Arnold Network
A battery management system (BMS) relies on real-time estimation of battery temperature distribution in battery cells to ensure safe and optimal operation of Lithium-ion batteries. However, physical BMS often suffers from memory and computational resource limitations required by high-fidelity models. Temperature estimation of batteries for safety-critical systems using physics-based models on physical BMS can potentially become challenging due to their higher computational time. In contrast, neural network based approaches offer faster estimation but require greater memory overhead. To address these challenges, we propose Kolmogorov-Arnold network (KAN) based thermal model, KAN-therm, to estimate the core temperature of a cylindrical battery. Unlike traditional neural network architectures, KAN uses learnable nonlinear activation functions that can effectively capture system complexity using relatively lean models. We have compared the memory overhead and estimation time of our model with state-of-the-art neural network models to demonstrate the applicability and potential scalability of KAN-therm on a physical BMS.
comment: 15 pages, 9 figures
Adaptive Learning Guided by Bias-Noise-Alignment Diagnostics
Learning systems deployed in nonstationary and safety-critical environments often suffer from instability, slow convergence, or brittle adaptation when learning dynamics evolve over time. While modern optimization, reinforcement learning, and meta-learning methods adapt to gradient statistics, they largely ignore the temporal structure of the error signal itself. This paper proposes a diagnostic-driven adaptive learning framework that explicitly models error evolution through a principled decomposition into bias, capturing persistent drift; noise, capturing stochastic variability; and alignment, capturing repeated directional excitation leading to overshoot. These diagnostics are computed online from lightweight statistics of loss or temporal-difference (TD) error trajectories and are independent of model architecture or task domain. We show that the proposed bias-noise-alignment decomposition provides a unifying control backbone for supervised optimization, actor-critic reinforcement learning, and learned optimizers. Within this framework, we introduce three diagnostic-driven instantiations: the Human-inspired Supervised Adaptive Optimizer (HSAO), Hybrid Error-Diagnostic Reinforcement Learning (HED-RL) for actor-critic methods, and the Meta-Learned Learning Policy (MLLP). Under standard smoothness assumptions, we establish bounded effective updates and stability properties for all cases. Representative diagnostic illustrations in actor-critic learning highlight how the proposed signals modulate adaptation in response to TD error structure. Overall, this work elevates error evolution to a first-class object in adaptive learning and provides an interpretable, lightweight foundation for reliable learning in dynamic environments.
comment: This preprint focuses on the theoretical framework and diagnostic behavior. Comprehensive experimental validation in application-specific settings is deferred to a companion experimental study
Tannenbaum's gain-margin optimization meets Polyak's heavy-ball algorithm
This paper highlights an apparent, yet relatively unknown link between algorithm design in optimization theory and controller synthesis in robust control. Specifically, quadratic optimization can be recast as a regulation problem within the framework of $\mathcal{H}_\infty$ control. From this vantage point, the optimality of Polyak's fastest heavy-ball algorithm can be ascertained as a solution to a gain margin optimization problem. The approach is independent of Polyak's original and brilliant argument, and relies on foundational work by Tannenbaum, who introduced and solved gain margin optimization via Nevanlinna--Pick interpolation theory. The link between first-order optimization methods and robust control sheds new light on the limits of algorithmic performance of such methods, and suggests a framework where similar computational tasks can be systematically studied and algorithms optimized. In particular, it raises the question as to whether periodically scheduled algorithms can achieve faster rates for quadratic optimization, in a manner analogous to periodic control that extends the gain margin beyond that of time-invariant control. This turns out not to be the case, due to the analytic obstruction of a transmission zero that is inherent in causal schemes. Interestingly, this obstruction can be removed with implicit algorithms, cast as feedback regulation problems with causal, but not strictly causal dynamics, thereby devoid of the transmission zero at infinity and able to achieve superior convergence rates.
comment: 27 pages, 8 figures
Iterative Tuning of Nonlinear Model Predictive Control for Robotic Manufacturing Tasks
Manufacturing processes are often perturbed by drifts in the environment and wear in the system, requiring control re-tuning even in the presence of repetitive operations. This paper presents an iterative learning framework for automatic tuning of Nonlinear Model Predictive Control (NMPC) weighting matrices based on task-level performance feedback. Inspired by norm-optimal Iterative Learning Control (ILC), the proposed method adaptively adjusts NMPC weights Q and R across task repetitions to minimize key performance indicators (KPIs) related to tracking accuracy, control effort, and saturation. Unlike gradient-based approaches that require differentiating through the NMPC solver, we construct an empirical sensitivity matrix, enabling structured weight updates without analytic derivatives. The framework is validated through simulation on a UR10e robot performing carbon fiber winding on a tetrahedral core. Results demonstrate that the proposed approach converges to near-optimal tracking performance (RMSE within 0.3% of offline Bayesian Optimization (BO)) in just 4 online repetitions, compared to 100 offline evaluations required by BO algorithm. The method offers a practical solution for adaptive NMPC tuning in repetitive robotic tasks, combining the precision of carefully optimized controllers with the flexibility of online adaptation.
Digital Twin based Automatic Reconfiguration of Robotic Systems in Smart Environments SC2
Robotic systems have become integral to smart environments, enabling applications ranging from urban surveillance and automated agriculture to industrial automation. However, their effective operation in dynamic settings - such as smart cities and precision farming - is challenged by continuously evolving topographies and environmental conditions. Traditional control systems often struggle to adapt quickly, leading to inefficiencies or operational failures. To address this limitation, we propose a novel framework for autonomous and dynamic reconfiguration of robotic controllers using Digital Twin technology. Our approach leverages a virtual replica of the robot's operational environment to simulate and optimize movement trajectories in response to real-world changes. By recalculating paths and control parameters in the Digital Twin and deploying the updated code to the physical robot, our method ensures rapid and reliable adaptation without manual intervention. This work advances the integration of Digital Twins in robotics, offering a scalable solution for enhancing autonomy in smart, dynamic environments.
comment: Accepted for presentation to 11th IEEE International Smart Cities Conference (ISC2 2025)
SUSTAINABLE Platform: Seamless Smart Farming Integration Towards Agronomy Automation SC2
The global agricultural sector is undergoing a transformative shift, driven by increasing food demands, climate variability and the need for sustainable practices. SUSTAINABLE is a smart farming platform designed to integrate IoT, AI, satellite imaging, and role-based task orchestration to enable efficient, traceable, and sustainable agriculture with a pilot usecase in viticulture. This paper explores current smart agriculture solutions, presents a comparative evaluation, and introduces SUSTAINABLE's key features, including satellite index integration, real-time environmental data, and role-aware task management tailored to Mediterranean vineyards.
comment: Accepted for presentation to 11th IEEE International Smart Cities Conference (ISC2 2025)
Stretching Rubber, Not Budgets: Accurate Parking Utilization on a Shoestring
Effective parking management is essential for ensuring safety and convenience in master-planned communities, particularly in active adult neighborhoods experiencing rapid growth. Accurately assessing parking utilization is a crucial first step in planning for future demand, but data collection methods can be costly and labor-intensive. This paper presents a low-cost yet highly accurate methodology for measuring parking utilization using pneumatic road tubes connected to portable traffic counters from JAMAR Technologies, Inc. By integrating results from JAMAR's analysis tool with custom Python scripting, the methodology enables precise parking lot counts through automated parameter optimization and error correction. The system's efficiency allows for scalable deployment without significant manual observation, reducing both costs and disruptions to daily operations. Using Tellico Village as a case study, this effort demonstrates that community planners can obtain actionable parking insights on a limited budget, empowering them to make informed decisions about capacity expansion and facility scheduling.
Any-Time Regret-Guaranteed Algorithm for Control of Linear Quadratic Systems
We propose a computationally efficient algorithm that achieves anytime regret of order $\mathcal{O}(\sqrt{t})$, with explicit dependence on the system dimensions and on the solution of the Discrete Algebraic Riccati Equation (DARE). Our approach builds on the SDP-based framework of \cite{cohen2019learning}, using an appropriately tuned regularization and a sufficiently accurate initial estimate to construct confidence ellipsoids for control design. A carefully designed input-perturbation mechanism is incorporated to ensure anytime performance. We develop two variants of the algorithm. The first enforces a notion of strong sequential stability, requiring each policy to be stabilizing and successive policies to remain close. However, enforcing this notion results in a suboptimal regret scaling. The second removes the sequential-stability requirement and instead requires only that each generated policy be stabilizing. Closed-loop stability is then preserved through a dwell-time-inspired policy-update rule, adapting ideas from switched-systems control to carefully balance exploration and exploitation. This class of algorithms also addresses key shortcomings of most existing approaches including certainty-equivalence-based methods which typically guarantee stability only in the Lyapunov sense and lack explicit uniform high-probability bounds on the state trajectory expressed in system-theoretic terms. Our analysis explicitly characterizes the trade-off between state amplification and regret, and shows that partially relaxing the sequential-stability requirement yields optimal regret. Finally, our method eliminates the need for any a priori bound on the norm of the DARE solution, an assumption required by all existing computationally efficient optimism in the face of uncertainty (OFU) based algorithms, and thereby removes the reliance of regret guarantees on such external inputs.
Robotics
Space Debris Removal using Nano-Satellites controlled by Low-Power Autonomous Agents
Space debris is an ever-increasing problem in space travel. There are already many old, no longer functional spacecraft and debris orbiting the earth, which endanger both the safe operation of satellites and space travel. Small nano-satellite swarms can address this problem by autonomously de-orbiting debris safely into the Earth's atmosphere. This work builds on the recent advances of autonomous agents deployed in resource-constrained platforms and shows a first simplified approach how such intelligent and autonomous nano-satellite swarms can be realized. We implement our autonomous agent software on wireless microcontrollers and perform experiments on a specialized test-bed to show the feasibility and overall energy efficiency of our approach.
comment: This is an open-access, author-archived version of a manuscript published in European Conference on Multi-Agent Systems 2024
Efficient Prediction of Dense Visual Embeddings via Distillation and RGB-D Transformers IROS 2025
In domestic environments, robots require a comprehensive understanding of their surroundings to interact effectively and intuitively with untrained humans. In this paper, we propose DVEFormer - an efficient RGB-D Transformer-based approach that predicts dense text-aligned visual embeddings (DVE) via knowledge distillation. Instead of directly performing classical semantic segmentation with fixed predefined classes, our method uses teacher embeddings from Alpha-CLIP to guide our efficient student model DVEFormer in learning fine-grained pixel-wise embeddings. While this approach still enables classical semantic segmentation, e.g., via linear probing, it further enables flexible text-based querying and other applications, such as creating comprehensive 3D maps. Evaluations on common indoor datasets demonstrate that our approach achieves competitive performance while meeting real-time requirements, operating at 26.3 FPS for the full model and 77.0 FPS for a smaller variant on an NVIDIA Jetson AGX Orin. Additionally, we show qualitative results that highlight the effectiveness and possible use cases in real-world applications. Overall, our method serves as a drop-in replacement for traditional segmentation approaches while enabling flexible natural-language querying and seamless integration into 3D mapping pipelines for mobile robotics.
comment: Published in Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
Replaceable Bit-based Gripper for Picking Cluttered Food Items
The food packaging industry goes through changes in food items and their weights quite rapidly. These items range from easy-to-pick, single-piece food items to flexible, long and cluttered ones. We propose a replaceable bit-based gripper system to tackle the challenge of weight-based handling of cluttered food items. The gripper features specialized food attachments(bits) that enhance its grasping capabilities, and a belt replacement system allows switching between different food items during packaging operations. It offers a wide range of control options, enabling it to grasp and drop specific weights of granular, cluttered, and entangled foods. We specifically designed bits for two flexible food items that differ in shape: ikura(salmon roe) and spaghetti. They represent the challenging categories of sticky, granular food and long, sticky, cluttered food, respectively. The gripper successfully picked up both spaghetti and ikura and demonstrated weight-specific dropping of these items with an accuracy over 80% and 95% respectively. The gripper system also exhibited quick switching between different bits, leading to the handling of a large range of food items.
Pure Inertial Navigation in Challenging Environments with Wheeled and Chassis Mounted Inertial Sensors
Autonomous vehicles and wheeled robots are widely used in many applications in both indoor and outdoor settings. In practical situations with limited GNSS signals or degraded lighting conditions, the navigation solution may rely only on inertial sensors and as result drift in time due to errors in the inertial measurement. In this work, we propose WiCHINS, a wheeled and chassis inertial navigation system by combining wheel-mounted-inertial sensors with a chassis-mounted inertial sensor for accurate pure inertial navigation. To that end, we derive a three-stage framework, each with a dedicated extended Kalman filter. This framework utilizes the benefits of each location (wheel/body) during the estimation process. To evaluate our proposed approach, we employed a dataset with five inertial measurement units with a total recording time of 228.6 minutes. We compare our approach with four other inertial baselines and demonstrate an average position error of 11.4m, which is $2.4\%$ of the average traveled distance, using two wheels and one body inertial measurement units. As a consequence, our proposed method enables robust navigation in challenging environments and helps bridge the pure-inertial performance gap.
Vehicle Painting Robot Path Planning Using Hierarchical Optimization
In vehicle production factories, the vehicle painting process employs multiple robotic arms to simultaneously apply paint to car bodies advancing along a conveyor line. Designing paint paths for these robotic arms, which involves assigning car body areas to arms and determining paint sequences for each arm, remains a time-consuming manual task for engineers, indicating the demand for automation and design time reduction. The unique constraints of the painting process hinder the direct application of conventional robotic path planning techniques, such as those used in welding. Therefore, this paper formulates the design of paint paths as a hierarchical optimization problem, where the upper-layer subproblem resembles a vehicle routing problem (VRP), and the lower-layer subproblem involves detailed path planning. This approach allows the use of different optimization algorithms at each layer, and permits flexible handling of constraints specific to the vehicle painting process through the design of variable representation, constraints, repair operators, and an initialization process at the upper and lower layers. Experiments with three commercially available vehicle models demonstrated that the proposed method can automatically design paths that satisfy all constraints for vehicle painting with quality comparable to those created manually by engineers.
SLAP: Slapband-based Autonomous Perching Drone with Failure Recovery for Vertical Tree Trunks
Perching allows unmanned aerial vehicles (UAVs) to reduce energy consumption, remain anchored for surface sampling operations, or stably survey their surroundings. Previous efforts for perching on vertical surfaces have predominantly focused on lightweight mechanical design solutions with relatively scant system-level integration. Furthermore, perching strategies for vertical surfaces commonly require high-speed, aggressive landing operations that are dangerous for a surveyor drone with sensitive electronics onboard. This work presents the preliminary investigation of a perching approach suitable for larger drones that both gently perches on vertical tree trunks and reacts and recovers from perch failures. The system in this work, called SLAP, consists of vision-based perch site detector, an IMU (inertial-measurement-unit)-based perch failure detector, an attitude controller for soft perching, an optical close-range detection system, and a fast active elastic gripper with microspines made from commercially-available slapbands. We validated this approach on a modified 1.2 kg commercial quadrotor with component and system analysis. Initial human-in-the-loop autonomous indoor flight experiments achieved a 75% perch success rate on a real oak tree segment across 20 flights, and 100% perch failure recovery across 2 flights with induced failures.
comment: Paper accepted to IEEE Aerospace Conference 2026. This is a pre-print
Application Research of a Deep Learning Model Integrating CycleGAN and YOLO in PCB Infrared Defect Detection
This paper addresses the critical bottleneck of infrared (IR) data scarcity in Printed Circuit Board (PCB) defect detection by proposing a cross-modal data augmentation framework integrating CycleGAN and YOLOv8. Unlike conventional methods relying on paired supervision, we leverage CycleGAN to perform unpaired image-to-image translation, mapping abundant visible-light PCB images into the infrared domain. This generative process synthesizes high-fidelity pseudo-IR samples that preserve the structural semantics of defects while accurately simulating thermal distribution patterns. Subsequently, we construct a heterogeneous training strategy that fuses generated pseudo-IR data with limited real IR samples to train a lightweight YOLOv8 detector. Experimental results demonstrate that this method effectively enhances feature learning under low-data conditions. The augmented detector significantly outperforms models trained on limited real data alone and approaches the performance benchmarks of fully supervised training, proving the efficacy of pseudo-IR synthesis as a robust augmentation strategy for industrial inspection.
comment: 8 pages,8 figures
CropNeRF: A Neural Radiance Field-Based Framework for Crop Counting
Rigorous crop counting is crucial for effective agricultural management and informed intervention strategies. However, in outdoor field environments, partial occlusions combined with inherent ambiguity in distinguishing clustered crops from individual viewpoints poses an immense challenge for image-based segmentation methods. To address these problems, we introduce a novel crop counting framework designed for exact enumeration via 3D instance segmentation. Our approach utilizes 2D images captured from multiple viewpoints and associates independent instance masks for neural radiance field (NeRF) view synthesis. We introduce crop visibility and mask consistency scores, which are incorporated alongside 3D information from a NeRF model. This results in an effective segmentation of crop instances in 3D and highly-accurate crop counts. Furthermore, our method eliminates the dependence on crop-specific parameter tuning. We validate our framework on three agricultural datasets consisting of cotton bolls, apples, and pears, and demonstrate consistent counting performance despite major variations in crop color, shape, and size. A comparative analysis against the state of the art highlights superior performance on crop counting tasks. Lastly, we contribute a cotton plant dataset to advance further research on this topic.
comment: 8 pages, 10 figures, and 2 tables
SLEI3D: Simultaneous Exploration and Inspection via Heterogeneous Fleets under Limited Communication
Robotic fleets such as unmanned aerial and ground vehicles have been widely used for routine inspections of static environments, where the areas of interest are known and planned in advance. However, in many applications, such areas of interest are unknown and should be identified online during exploration. Thus, this paper considers the problem of simultaneous exploration, inspection of unknown environments and then real-time communication to a mobile ground control station to report the findings. The heterogeneous robots are equipped with different sensors, e.g., long-range lidars for fast exploration and close-range cameras for detailed inspection. Furthermore, global communication is often unavailable in such environments, where the robots can only communicate with each other via ad-hoc wireless networks when they are in close proximity and free of obstruction. This work proposes a novel planning and coordination framework (SLEI3D) that integrates the online strategies for collaborative 3D exploration, adaptive inspection and timely communication (via the intermit-tent or proactive protocols). To account for uncertainties w.r.t. the number and location of features, a multi-layer and multi-rate planning mechanism is developed for inter-and-intra robot subgroups, to actively meet and coordinate their local plans. The proposed framework is validated extensively via high-fidelity simulations of numerous large-scale missions with up to 48 robots and 384 thousand cubic meters. Hardware experiments of 7 robots are also conducted. Project website is available at https://junfengchen-robotics.github.io/SLEI3D/.
Symphony: A Heuristic Normalized Calibrated Advantage Actor and Critic Algorithm in application for Humanoid Robots
In our work we not explicitly hint that it is a misconception to think that humans learn fast. Learning process takes time. Babies start learning to move in the restricted liquid area called placenta. Children often are limited by underdeveloped body. Even adults are not allowed to participate in complex competitions right away. However, with robots, when learning from scratch, we often don't have the privilege of waiting for dozen millions of steps. "Swaddling" regularization is responsible for restraining an agent in rapid but unstable development penalizing action strength in a specific way not affecting actions directly. The Symphony, Transitional-policy Deterministic Actor and Critic algorithm, is a concise combination of different ideas for possibility of training humanoid robots from scratch with Sample Efficiency, Sample Proximity and Safety of Actions in mind. It is no secret that continuous increase in Gaussian noise without appropriate smoothing is harmful for motors and gearboxes. Compared to Stochastic algorithms, we set a limited parametric noise and promote a reduced strength of actions, safely increasing entropy, since the actions are kind of immersed in weaker noise. When actions require more extreme values, actions rise above the weak noise. Training becomes empirically much safer for both the environment around and the robot's mechanisms. We use Fading Replay Buffer: using a fixed formula containing the hyperbolic tangent, we adjust the batch sampling probability: the memory contains a recent memory and a long-term memory trail. Fading Replay Buffer allows us to use Temporal Advantage when we improve the current Critic Network prediction compared to the exponential moving average. Temporal Advantage allows us to update Actor and Critic in one pass, as well as combine Actor and Critic in one Object and implement their Losses in one line.
comment: https://github.com/SuspensionRailway/symphony
Unified Embodied VLM Reasoning with Robotic Action via Autoregressive Discretized Pre-training
General-purpose robotic systems operating in open-world environments must achieve both broad generalization and high-precision action execution, a combination that remains challenging for existing Vision-Language-Action (VLA) models. While large Vision-Language Models (VLMs) improve semantic generalization, insufficient embodied reasoning leads to brittle behavior, and conversely, strong reasoning alone is inadequate without precise control. To provide a decoupled and quantitative assessment of this bottleneck, we introduce Embodied Reasoning Intelligence Quotient (ERIQ), a large-scale embodied reasoning benchmark in robotic manipulation, comprising 6K+ question-answer pairs across four reasoning dimensions. By decoupling reasoning from execution, ERIQ enables systematic evaluation and reveals a strong positive correlation between embodied reasoning capability and end-to-end VLA generalization. To bridge the gap from reasoning to precise execution, we propose FACT, a flow-matching-based action tokenizer that converts continuous control into discrete sequences while preserving high-fidelity trajectory reconstruction. The resulting GenieReasoner jointly optimizes reasoning and action in a unified space, outperforming both continuous-action and prior discrete-action baselines in real-world tasks. Together, ERIQ and FACT provide a principled framework for diagnosing and overcoming the reasoning-precision trade-off, advancing robust, general-purpose robotic manipulation. Project page: https://geniereasoner.github.io/GenieReasoner/
MDE-AgriVLN: Agricultural Vision-and-Language Navigation with Monocular Depth Estimation
Agricultural robots are serving as powerful assistants across a wide range of agricultural tasks, nevertheless, still heavily relying on manual operations or railway systems for movement. The AgriVLN method and the A2A benchmark pioneeringly extended Vision-and-Language Navigation (VLN) to the agricultural domain, enabling a robot to navigate to a target position following a natural language instruction. Unlike human binocular vision, most agricultural robots are only given a single camera for monocular vision, which results in limited spatial perception. To bridge this gap, we present the method of Agricultural Vision-and-Language Navigation with Monocular Depth Estimation (MDE-AgriVLN), in which we propose the MDE module generating depth features from RGB images, to assist the decision-maker on multimodal reasoning. When evaluated on the A2A benchmark, our MDE-AgriVLN method successfully increases Success Rate from 0.23 to 0.32 and decreases Navigation Error from 4.43m to 4.08m, demonstrating the state-of-the-art performance in the agricultural VLN domain. Code: https://github.com/AlexTraveling/MDE-AgriVLN.
Efficient Multi-Task Scene Analysis with RGB-D Transformers IJCNN 2023
Scene analysis is essential for enabling autonomous systems, such as mobile robots, to operate in real-world environments. However, obtaining a comprehensive understanding of the scene requires solving multiple tasks, such as panoptic segmentation, instance orientation estimation, and scene classification. Solving these tasks given limited computing and battery capabilities on mobile platforms is challenging. To address this challenge, we introduce an efficient multi-task scene analysis approach, called EMSAFormer, that uses an RGB-D Transformer-based encoder to simultaneously perform the aforementioned tasks. Our approach builds upon the previously published EMSANet. However, we show that the dual CNN-based encoder of EMSANet can be replaced with a single Transformer-based encoder. To achieve this, we investigate how information from both RGB and depth data can be effectively incorporated in a single encoder. To accelerate inference on robotic hardware, we provide a custom NVIDIA TensorRT extension enabling highly optimization for our EMSAFormer approach. Through extensive experiments on the commonly used indoor datasets NYUv2, SUNRGB-D, and ScanNet, we show that our approach achieves state-of-the-art performance while still enabling inference with up to 39.1 FPS on an NVIDIA Jetson AGX Orin 32 GB.
comment: Published in Proc. International Joint Conference on Neural Networks (IJCNN 2023)
Video-Based Detection and Analysis of Errors in Robotic Surgical Training
Robot-assisted minimally invasive surgeries offer many advantages but require complex motor tasks that take surgeons years to master. There is currently a lack of knowledge on how surgeons acquire these robotic surgical skills. Toward bridging this gap, a previous study followed surgical residents learning complex surgical dry lab tasks on a surgical robot over six months. Errors are an important measure for training and skill evaluation, but unlike in virtual simulations, in dry lab training, errors are difficult to monitor automatically. Here, we analyzed errors in the ring tower transfer task, in which surgical residents moved a ring along a curved wire as quickly and accurately as possible. We developed an image-processing algorithm using color and size thresholds, optical flow and short time Fourier transforms to detect collision errors and achieved a detection accuracy of approximately 95%. Using the detected errors and task completion time, we found that the residents reduced their completion time and number of errors over the six months, while the percentage of task time spent making errors remained relatively constant on average. This analysis sheds light on the learning process of the residents and can serve as a step towards providing error-related feedback to robotic surgeons.
comment: Title change; 9 pages, 4 figures, 1 table. Alex Geftler and Ilana Nisky contributed equally to this work
World In Your Hands: A Large-Scale and Open-source Ecosystem for Learning Human-centric Manipulation in the Wild
Large-scale pre-training is fundamental for generalization in language and vision models, but data for dexterous hand manipulation remains limited in scale and diversity, hindering policy generalization. Limited scenario diversity, misaligned modalities, and insufficient benchmarking constrain current human manipulation datasets. To address these gaps, we introduce World In Your Hands (WiYH), a large-scale open-source ecosystem for human-centric manipulation learning. WiYH includes (1) the Oracle Suite, a wearable data collection kit with an auto-labeling pipeline for accurate motion capture; (2) the WiYH Dataset, featuring over 1,000 hours of multi-modal manipulation data across hundreds of skills in diverse real-world scenarios; and (3) extensive annotations and benchmarks supporting tasks from perception to action. Furthermore, experiments based on the WiYH ecosystem show that integrating WiYH's human-centric data significantly enhances the generalization and robustness of dexterous hand policies in tabletop manipulation tasks. We believe that World In Your Hands will bring new insights into human-centric data collection and policy learning to the community.
comment: This dataset represents the first large-scale collection of real-world, human-centric multimodal data integrating vision, language, tactile sensing, and action (VLTA)
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving
Recent advancements in large vision language models (VLMs) tailored for autonomous driving (AD) have shown strong scene understanding and reasoning capabilities, making them undeniable candidates for end-to-end driving systems. However, limited work exists on studying the trustworthiness of DriveVLMs -- a critical factor that directly impacts public transportation safety. In this paper, we introduce AutoTrust, a comprehensive trustworthiness benchmark for large vision-language models in autonomous driving (DriveVLMs), considering diverse perspectives -- including trustfulness, safety, robustness, privacy, and fairness. We constructed the largest visual question-answering dataset for investigating trustworthiness issues in driving scenarios, comprising over 10k unique scenes and 18k queries. We evaluated six publicly available VLMs, spanning from generalist to specialist, from open-source to commercial models. Our exhaustive evaluations have unveiled previously undiscovered vulnerabilities of DriveVLMs to trustworthiness threats. Specifically, we found that the general VLMs like LLaVA-v1.6 and GPT-4o-mini surprisingly outperform specialized models fine-tuned for driving in terms of overall trustworthiness. DriveVLMs like DriveLM-Agent are particularly vulnerable to disclosing sensitive information. Additionally, both generalist and specialist VLMs remain susceptible to adversarial attacks and struggle to ensure unbiased decision-making across diverse environments and populations. Our findings call for immediate and decisive action to address the trustworthiness of DriveVLMs -- an issue of critical importance to public safety and the welfare of all citizens relying on autonomous transportation systems. We release all the codes and datasets in https://github.com/taco-group/AutoTrust.
comment: Published at TMLR 2025
Multiagent Systems
Mapping Human Anti-collusion Mechanisms to Multi-agent AI
As multi-agent AI systems become increasingly autonomous, evidence shows they can develop collusive strategies similar to those long observed in human markets and institutions. While human domains have accumulated centuries of anti-collusion mechanisms, it remains unclear how these can be adapted to AI settings. This paper addresses that gap by (i) developing a taxonomy of human anti-collusion mechanisms, including sanctions, leniency & whistleblowing, monitoring & auditing, market design, and governance and (ii) mapping them to potential interventions for multi-agent AI systems. For each mechanism, we propose implementation approaches. We also highlight open challenges, such as the attribution problem (difficulty attributing emergent coordination to specific agents) identity fluidity (agents being easily forked or modified) the boundary problem (distinguishing beneficial cooperation from harmful collusion) and adversarial adaptation (agents learning to evade detection).
Bio-inspired Agentic Self-healing Framework for Resilient Distributed Computing Continuum Systems
Human biological systems sustain life through extraordinary resilience, continually detecting damage, orchestrating targeted responses, and restoring function through self-healing. Inspired by these capabilities, this paper introduces ReCiSt, a bio-inspired agentic self-healing framework designed to achieve resilience in Distributed Computing Continuum Systems (DCCS). Modern DCCS integrate heterogeneous computing resources, ranging from resource-constrained IoT devices to high-performance cloud infrastructures, and their inherent complexity, mobility, and dynamic operating conditions expose them to frequent faults that disrupt service continuity. These challenges underscore the need for scalable, adaptive, and self-regulated resilience strategies. ReCiSt reconstructs the biological phases of Hemostasis, Inflammation, Proliferation, and Remodeling into the computational layers Containment, Diagnosis, Meta-Cognitive, and Knowledge for DCCS. These four layers perform autonomous fault isolation, causal diagnosis, adaptive recovery, and long-term knowledge consolidation through Language Model (LM)-powered agents. These agents interpret heterogeneous logs, infer root causes, refine reasoning pathways, and reconfigure resources with minimal human intervention. The proposed ReCiSt framework is evaluated on public fault datasets using multiple LMs, and no baseline comparison is included due to the scarcity of similar approaches. Nevertheless, our results, evaluated under different LMs, confirm ReCiSt's self-healing capabilities within tens of seconds with minimum of 10% of agent CPU usage. Our results also demonstrated depth of analysis to over come uncertainties and amount of micro-agents invoked to achieve resilience.
Offline Multi-Agent Reinforcement Learning for 6G Communications: Fundamentals, Applications and Future Directions
The next-generation wireless technologies, including beyond 5G and 6G networks, are paving the way for transformative applications such as vehicle platooning, smart cities, and remote surgery. These innovations are driven by a vast array of interconnected wireless entities, including IoT devices, access points, UAVs, and CAVs, which increase network complexity and demand more advanced decision-making algorithms. Artificial intelligence (AI) and machine learning (ML), especially reinforcement learning (RL), are key enablers for such networks, providing solutions to high-dimensional and complex challenges. However, as networks expand to multi-agent environments, traditional online RL approaches face cost, safety, and scalability limitations. Offline multi-agent reinforcement learning (MARL) offers a promising solution by utilizing pre-collected data, reducing the need for real-time interaction. This article introduces a novel offline MARL algorithm based on conservative Q-learning (CQL), ensuring safe and efficient training. We extend this with meta-learning to address dynamic environments and validate the approach through use cases in radio resource management and UAV networks. Our work highlights offline MARL's advantages, limitations, and future directions in wireless applications.
ClinicalReTrial: A Self-Evolving AI Agent for Clinical Trial Protocol Optimization
Clinical trial failure remains a central bottleneck in drug development, where minor protocol design flaws can irreversibly compromise outcomes despite promising therapeutics. Although cutting-edge AI methods achieve strong performance in predicting trial success, they are inherently reactive for merely diagnosing risk without offering actionable remedies once failure is anticipated. To fill this gap, this paper proposes ClinicalReTrial, a self-evolving AI agent framework that addresses this gap by casting clinical trial reasoning as an iterative protocol redesign problem. Our method integrates failure diagnosis, safety-aware modification, and candidate evaluation in a closed-loop, reward-driven optimization framework. Serving the outcome prediction model as a simulation environment, ClinicalReTrial enables low-cost evaluation of protocol modifications and provides dense reward signals for continuous self-improvement. To support efficient exploration, the framework maintains hierarchical memory that captures iteration-level feedback within trials and distills transferable redesign patterns across trials. Empirically, ClinicalReTrial improves 83.3% of trial protocols with a mean success probability gain of 5.7%, and retrospective case studies demonstrate strong alignment between the discovered redesign strategies and real-world clinical trial modifications.
Next Generation Intelligent Low-Altitude Economy Deployments: The O-RAN Perspective
Despite the growing interest in low-altitude economy (LAE) applications, including UAV-based logistics and emergency response, fundamental challenges remain in orchestrating such missions over complex, signal-constrained environments. These include the absence of real-time, resilient, and context-aware orchestration of aerial nodes with limited integration of artificial intelligence (AI) specialized for LAE missions. This paper introduces an open radio access network (O-RAN)-enabled LAE framework that leverages seamless coordination between the disaggregated RAN architecture, open interfaces, and RAN intelligent controllers (RICs) to facilitate closed-loop, AI-optimized, and mission-critical LAE operations. We evaluate the feasibility and performance of the proposed architecture via a semantic-aware rApp that acts as a terrain interpreter, offering semantic guidance to a reinforcement learning-enabled xApp, which performs real-time trajectory planning for LAE swarm nodes. We survey the capabilities of UAV testbeds that can be leveraged for LAE research, and present critical research challenges and standardization needs.
comment: This article has been accepted for publication in the IEEE Wireless Communications Magazine
μACP: A Formal Calculus for Expressive, Resource-Constrained Agent Communication AAMAS 2026
Agent communication remains a foundational problem in multi-agent systems: protocols such as FIPA-ACL guarantee semantic richness but are intractable for constrained environments, while lightweight IoT protocols achieve efficiency at the expense of expressiveness. This paper presents $μ$ACP, a formal calculus for expressive agent communication under explicit resource bounds. We formalize the Resource-Constrained Agent Communication (RCAC) model, prove that a minimal four-verb basis \textit{\{PING, TELL, ASK, OBSERVE\}} is suffices to encode finite-state FIPA protocols, and establish tight information-theoretic bounds on message complexity. We further show that $μ$ACP can implement standard consensus under partial synchrony and crash faults, yielding a constructive coordination framework for edge-native agents. Formal verification in TLA$^{+}$ (model checking) and Coq (mechanized invariants) establishes safety and boundedness, and supports liveness under modeled assumptions. Large-scale system simulations confirm ACP achieves a median end-to-end message latency of 34 ms (95th percentile 104 ms) at scale, outperforming prior agent and IoT protocols under severe resource constraints. The main contribution is a unified calculus that reconciles semantic expressiveness with provable efficiency, providing a rigorous foundation for the next generation of resource-constrained multi-agent systems.
comment: Accepted as a Full paper at AAMAS 2026
Bio-inspired Agentic Self-healing Framework for Resilient Distributed Computing Continuum Systems
Human biological systems sustain life through extraordinary resilience, continually detecting damage, orchestrating targeted responses, and restoring function through self-healing. Inspired by these capabilities, this paper introduces ReCiSt, a bio-inspired agentic self-healing framework designed to achieve resilience in Distributed Computing Continuum Systems (DCCS). Modern DCCS integrate heterogeneous computing resources, ranging from resource-constrained IoT devices to high-performance cloud infrastructures, and their inherent complexity, mobility, and dynamic operating conditions expose them to frequent faults that disrupt service continuity. These challenges underscore the need for scalable, adaptive, and self-regulated resilience strategies. ReCiSt reconstructs the biological phases of Hemostasis, Inflammation, Proliferation, and Remodeling into the computational layers Containment, Diagnosis, Meta-Cognitive, and Knowledge for DCCS. These four layers perform autonomous fault isolation, causal diagnosis, adaptive recovery, and long-term knowledge consolidation through Language Model (LM)-powered agents. These agents interpret heterogeneous logs, infer root causes, refine reasoning pathways, and reconfigure resources with minimal human intervention. The proposed ReCiSt framework is evaluated on public fault datasets using multiple LMs, and no baseline comparison is included due to the scarcity of similar approaches. Nevertheless, our results, evaluated under different LMs, confirm ReCiSt's self-healing capabilities within tens of seconds with minimum of 10% of agent CPU usage. Our results also demonstrated depth of analysis to over come uncertainties and amount of micro-agents invoked to achieve resilience.
Large-scale distributed synchronization systems, using a cancel-on-completion redundancy mechanism
We consider a class of multi-agent distributed synchronization systems, which are modeled as $n$ particles moving on the real line. This class generalizes the model of a multi-server queueing system, considered in [15], employing so-called cancel-on-completion (c.o.c.) redundancy mechanism, but is motivated by other applications as well. The model in [15] is a particle system, regulated at the left boundary point. The more general model of this paper is such that we allow regulation boundaries on either side, or both sides, or no regulation at all. We consider the mean-field asymptotic regime, when the number of particles $n$ and the job arrival rates go to infinity, while the job arrival rates per particle remain constant. The system state for a given $n$ is the empirical distribution of the particles' locations. The results include: the existence/uniqueness of fixed points of mean-field limits (ML), which describe the limiting dynamics of the system; conditions for the steady-state asymptotic independence (concentration of the stationary distribution on a single ML fixed point); the limits of the average velocity at which unregulated (free) particle system advances. In particular, our results for the left-regulated system unify and generalize the corresponding results in [15]. Our technical approach is such that the systems with different types of regulation are analyzed within a unified framework.
comment: 37 pages. Appendix added
Scaling Patterns in Adversarial Alignment: Evidence from Multi-LLM Jailbreak Experiments
Large language models (LLMs) increasingly operate in multi-agent and safety-critical settings, raising open questions about how their vulnerabilities scale when models interact adversarially. This study examines whether larger models can systematically jailbreak smaller ones - eliciting harmful or restricted behavior despite alignment safeguards. Using standardized adversarial tasks from JailbreakBench, we simulate over 6,000 multi-turn attacker-target exchanges across major LLM families and scales (0.6B-120B parameters), measuring both harm score and refusal behavior as indicators of adversarial potency and alignment integrity. Each interaction is evaluated through aggregated harm and refusal scores assigned by three independent LLM judges, providing a consistent, model-based measure of adversarial outcomes. Aggregating results across prompts, we find a strong and statistically significant correlation between mean harm and the logarithm of the attacker-to-target size ratio (Pearson r = 0.51, p < 0.001; Spearman rho = 0.52, p < 0.001), indicating that relative model size correlates with the likelihood and severity of harmful completions. Mean harm score variance is higher across attackers (0.18) than across targets (0.10), suggesting that attacker-side behavioral diversity contributes more to adversarial outcomes than target susceptibility. Attacker refusal frequency is strongly and negatively correlated with harm (rho = -0.93, p < 0.001), showing that attacker-side alignment mitigates harmful responses. These findings reveal that size asymmetry influences robustness and provide exploratory evidence for adversarial scaling patterns, motivating more controlled investigations into inter-model alignment and safety.
Adaptive GPU Resource Allocation for Multi-Agent Collaborative Reasoning in Serverless Environments
Multi-agent systems powered by large language models have emerged as a promising paradigm for solving complex reasoning tasks through collaborative intelligence. However, efficiently deploying these systems on serverless GPU platforms presents significant resource allocation challenges due to heterogeneous agent workloads, varying computational demands, and the need for cost-effective scaling. This paper presents an adaptive GPU resource allocation framework that achieves 85% latency reduction compared to round-robin scheduling while maintaining comparable throughput to static allocation, using an O(N) complexity algorithm for real-time adaptation. Our approach dynamically allocates GPU resources based on workload characteristics, agent priorities, and minimum resource requirements, enabling efficient utilization while maintaining quality of service. The framework addresses three key challenges: (1) heterogeneous computational demands across lightweight coordinators and heavyweight specialists, (2) dynamic workload fluctuations requiring millisecond-scale reallocation, and (3) capacity constraints in serverless environments. Through comprehensive simulations modeling realistic multi-agent workflows with four heterogeneous agents, we demonstrate that adaptive allocation outperforms static equal and round-robin strategies across latency, cost, and GPU utilization metrics. The framework provides a practical solution for deploying cost-efficient multi-agent AI systems on serverless GPU infrastructure.
comment: 6 pages, 2 figures
Systems and Control (EESS)
Safety for Weakly-Hard Control Systems via Graph-Based Barrier Functions
Despite significant advancement in technology, communication and computational failures are still prevalent in safety-critical engineering applications. Often, networked control systems experience packet dropouts, leading to open-loop behavior that significantly affects the behavior of the system. Similarly, in real-time control applications, control tasks frequently experience computational overruns and thus occasionally no new actuator command is issued. This article addresses the safety verification and controller synthesis problem for a class of control systems subject to weakly-hard constraints, i.e., a set of window-based constraints where the number of failures are bounded within a given time horizon. The results are based on a new notion of graph-based barrier functions that are specifically tailored to the considered system class, offering a set of constraints whose satisfaction leads to safety guarantees despite communication failures. Subsequent reformulations of the safety constraints are proposed to alleviate conservatism and improve computational tractability, and the resulting trade-offs are discussed. Finally, several numerical case studies demonstrate the effectiveness of the proposed approach.
comment: Submitted to IEEE TAC
Can Optimal Transport Improve Federated Inverse Reinforcement Learning?
In robotics and multi-agent systems, fleets of autonomous agents often operate in subtly different environments while pursuing a common high-level objective. Directly pooling their data to learn a shared reward function is typically impractical due to differences in dynamics, privacy constraints, and limited communication bandwidth. This paper introduces an optimal transport-based approach to federated inverse reinforcement learning (IRL). Each client first performs lightweight Maximum Entropy IRL locally, adhering to its computational and privacy limitations. The resulting reward functions are then fused via a Wasserstein barycenter, which considers their underlying geometric structure. We further prove that this barycentric fusion yields a more faithful global reward estimate than conventional parameter averaging methods in federated learning. Overall, this work provides a principled and communication-efficient framework for deriving a shared reward that generalizes across heterogeneous agents and environments.
Impact Assessment of Heterogeneous Grid Support Functions in Smart Inverter Deployments
The decarbonization of the energy sector has led to a significant high penetration of distributed energy resources (DERs), particularly photovoltaic (PV) systems, in low-voltage (LV) distribution networks. To maintain grid stability, recent standards (e.g., IEEE 1547-2018) mandate DERs to provide grid-support functionalities through smart inverters (SIs), which typically operate autonomously based on local measurements. However, as DER penetration increases, uncoordinated control modes of SIs can lead to adverse interactions, compromising system efficiency, voltage regulation, and overall stability. While previous studies have demonstrated the benefits of coordinated inverter control and optimal dispatch strategies, the system-wide impacts of heterogeneous SI groups operating under different control modes remain largely unexamined. This paper addresses this gap by assessing the dynamic interactions among multiple SI groups with varying control strategies, namely: Constant Power Factor (CPF), Volt-VAR, and Volt-Watt modes. Furthermore, the analysis covers both resistive and inductive feeder types. The validation is performed using a real-time setup. The CIRGE low-voltage (LV) distribution network is simulated in the Opal-RT platform as the test network, enabling realistic and high-fidelity evaluation of SI control interactions under practical grid conditions.
Next Generation Intelligent Low-Altitude Economy Deployments: The O-RAN Perspective
Despite the growing interest in low-altitude economy (LAE) applications, including UAV-based logistics and emergency response, fundamental challenges remain in orchestrating such missions over complex, signal-constrained environments. These include the absence of real-time, resilient, and context-aware orchestration of aerial nodes with limited integration of artificial intelligence (AI) specialized for LAE missions. This paper introduces an open radio access network (O-RAN)-enabled LAE framework that leverages seamless coordination between the disaggregated RAN architecture, open interfaces, and RAN intelligent controllers (RICs) to facilitate closed-loop, AI-optimized, and mission-critical LAE operations. We evaluate the feasibility and performance of the proposed architecture via a semantic-aware rApp that acts as a terrain interpreter, offering semantic guidance to a reinforcement learning-enabled xApp, which performs real-time trajectory planning for LAE swarm nodes. We survey the capabilities of UAV testbeds that can be leveraged for LAE research, and present critical research challenges and standardization needs.
comment: This article has been accepted for publication in the IEEE Wireless Communications Magazine
Impact of Clustering on the Observability and Controllability of Complex Networks
The increasing complexity and interconnectedness of systems across various fields have led to a growing interest in studying complex networks, particularly Scale-Free (SF) networks, which best model real-world systems. This paper investigates the influence of clustering on the observability and controllability of complex SF networks, framing these characteristics in the context of structured systems theory. In this paper, we show that densely clustered networks require fewer driver and observer nodes due to better information propagation within clusters. This relationship is of interest for optimizing network design in applications such as social networks and intelligent transportation systems. We first quantify the network observability/controllability requirements, and then, through Monte-Carlo simulations and different case studies, we show how clustering affects these metrics. Our findings offer practical insights into reducing control and observer nodes for sensor/actuator placement, particularly in resource-constrained setups. This work contributes to the understanding of network observability/controllability and presents techniques for improving these features through alterations in network structure and clustering.
comment: Cluster Computing Journal
Battery-time-space fragment-based formulation for the Electric Autonomous Dial-a-Ride Problem
The Electric Autonomous Dial-A-Ride Problem (E-ADARP) optimizes routing and scheduling for electric autonomous vehicles to transport customers from origins to destinations. It features a combined objective that minimizes travel cost and excess user ride time, and allows partial recharging. Motivated by practical scenarios where time and battery data are available with limited precision, we introduce a discrete variant of the problem, termed D-E-ADARP, in which all time and battery parameters are discretized. This enables the development of our alternative solution approach: the discrete battery-time-space fragment-based formulation (BTSFF). In this framework, a fragment represents a subpath with an associated cost that accounts for both travel cost and excess user ride time. The BTSFF network integrates spatial, temporal, and battery dimensions, with the latter two discretized into finite indices. Computational results show that BTSFF solves D-E-ADARP significantly more efficiently than existing methods applied to the original E-ADARP. In addition, BTSFF efficiently provides high-quality lower bounds for E-ADARP and accelerates solving its battery swap variants. For E-ADARP, a relaxed network is constructed by rounding down travel times and battery consumptions, enabling a valid lower bound. For battery swap variants, BTSFF integrates lazy constraints via callbacks to correct time discretization errors, guaranteeing optimal solutions. Experiments show BTSFF outperforms benchmark methods in efficiency.
$α^3$-Bench: A Unified Benchmark of Safety, Robustness, and Efficiency for LLM-Based UAV Agents over 6G Networks
Large Language Models (LLMs) are increasingly used as high level controllers for autonomous Unmanned Aerial Vehicle (UAV) missions. However, existing evaluations rarely assess whether such agents remain safe, protocol compliant, and effective under realistic next generation networking constraints. This paper introduces $α^3$-Bench, a benchmark for evaluating LLM driven UAV autonomy as a multi turn conversational reasoning and control problem operating under dynamic 6G conditions. Each mission is formulated as a language mediated control loop between an LLM based UAV agent and a human operator, where decisions must satisfy strict schema validity, mission policies, speaker alternation, and safety constraints while adapting to fluctuating network slices, latency, jitter, packet loss, throughput, and edge load variations. To reflect modern agentic workflows, $α^3$-Bench integrates a dual action layer supporting both tool calls and agent to agent coordination, enabling evaluation of tool use consistency and multi agent interactions. We construct a large scale corpus of 113k conversational UAV episodes grounded in UAVBench scenarios and evaluate 17 state of the art LLMs using a fixed subset of 50 episodes per scenario under deterministic decoding. We propose a composite $α^3$ metric that unifies six pillars: Task Outcome, Safety Policy, Tool Consistency, Interaction Quality, Network Robustness, and Communication Cost, with efficiency normalized scores per second and per thousand tokens. Results show that while several models achieve high mission success and safety compliance, robustness and efficiency vary significantly under degraded 6G conditions, highlighting the need for network aware and resource efficient LLM based UAV agents. The dataset is publicly available on GitHub : https://github.com/maferrag/AlphaBench
comment: 20 pages
Sampled-data Systems: Stability, Contractivity and Single-iteration Suboptimal MPC
This paper analyzes the stability of interconnected continuous-time (CT) and discrete-time (DT) systems coupled through sampling and zero-order hold mechanisms. The DT system updates its output at regular intervals $T>0$ by applying an $n$-fold composition of a given map. This setup is motivated by online and sampled-data implementations of optimization-based controllers - particularly model predictive control (MPC) - where the DT system models $n$ iterations of an algorithm approximating the solution of an optimization problem. We introduce the concept of a reduced model, defined as the limiting behavior of the sampled-data system as $T \to 0^+$ and $n \to +\infty$. Our main theoretical contribution establishes that when the reduced model is contractive, there exists a threshold duration $T(n)$ for each iteration count $n$ such that the CT-DT interconnection achieves exponential stability for all sampling periods $T < T(n)$. Finally, under the stronger condition that both the CT and DT systems are contractive, we show exponential stability of their interconnection using a small-gain argument. Our theoretical results provide new insights into suboptimal MPC stability, showing that convergence guarantees hold even when using a single iteration of the optimization algorithm - a practically significant finding for real-time control applications.
comment: Modifications relative to version 2: revised paper, conditionally accepted in IEEE TAC
Robotics
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time
We present SpaceTimePilot, a video diffusion model that disentangles space and time for controllable generative rendering. Given a monocular video, SpaceTimePilot can independently alter the camera viewpoint and the motion sequence within the generative process, re-rendering the scene for continuous and arbitrary exploration across space and time. To achieve this, we introduce an effective animation time-embedding mechanism in the diffusion process, allowing explicit control of the output video's motion sequence with respect to that of the source video. As no datasets provide paired videos of the same dynamic scene with continuous temporal variations, we propose a simple yet effective temporal-warping training scheme that repurposes existing multi-view datasets to mimic temporal differences. This strategy effectively supervises the model to learn temporal control and achieve robust space-time disentanglement. To further enhance the precision of dual control, we introduce two additional components: an improved camera-conditioning mechanism that allows altering the camera from the first frame, and CamxTime, the first synthetic space-and-time full-coverage rendering dataset that provides fully free space-time video trajectories within a scene. Joint training on the temporal-warping scheme and the CamxTime dataset yields more precise temporal control. We evaluate SpaceTimePilot on both real-world and synthetic data, demonstrating clear space-time disentanglement and strong results compared to prior work. Project page: https://zheninghuang.github.io/Space-Time-Pilot/ Code: https://github.com/ZheningHuang/spacetimepilot
comment: Project page: https://zheninghuang.github.io/Space-Time-Pilot/ Code: https://github.com/ZheningHuang/spacetimepilot
Coordinated Humanoid Manipulation with Choice Policies
Humanoid robots hold great promise for operating in human-centric environments, yet achieving robust whole-body coordination across the head, hands, and legs remains a major challenge. We present a system that combines a modular teleoperation interface with a scalable learning framework to address this problem. Our teleoperation design decomposes humanoid control into intuitive submodules, which include hand-eye coordination, grasp primitives, arm end-effector tracking, and locomotion. This modularity allows us to collect high-quality demonstrations efficiently. Building on this, we introduce Choice Policy, an imitation learning approach that generates multiple candidate actions and learns to score them. This architecture enables both fast inference and effective modeling of multimodal behaviors. We validate our approach on two real-world tasks: dishwasher loading and whole-body loco-manipulation for whiteboard wiping. Experiments show that Choice Policy significantly outperforms diffusion policies and standard behavior cloning. Furthermore, our results indicate that hand-eye coordination is critical for success in long-horizon tasks. Our work demonstrates a practical path toward scalable data collection and learning for coordinated humanoid manipulation in unstructured environments.
comment: Code and Website: https://choice-policy.github.io/
DarkEQA: Benchmarking Vision-Language Models for Embodied Question Answering in Low-Light Indoor Environments
Vision Language Models (VLMs) are increasingly adopted as central reasoning modules for embodied agents. Existing benchmarks evaluate their capabilities under ideal, well-lit conditions, yet robust 24/7 operation demands performance under a wide range of visual degradations, including low-light conditions at night or in dark environments--a core necessity that has been largely overlooked. To address this underexplored challenge, we present DarkEQA, an open-source benchmark for evaluating EQA-relevant perceptual primitives under multi-level low-light conditions. DarkEQA isolates the perception bottleneck by evaluating question answering from egocentric observations under controlled degradations, enabling attributable robustness analysis. A key design feature of DarkEQA is its physical fidelity: visual degradations are modeled in linear RAW space, simulating physics-based illumination drop and sensor noise followed by an ISP-inspired rendering pipeline. We demonstrate the utility of DarkEQA by evaluating a wide range of state-of-the-art VLMs and Low-Light Image Enhancement (LLIE) models. Our analysis systematically reveals VLMs' limitations when operating under these challenging visual conditions. Our code and benchmark dataset will be released upon acceptance.
comment: Submitted to IEEE Robotics and Automation Letters (RA-L)
Hierarchical Deformation Planning and Neural Tracking for DLOs in Constrained Environments
Deformable linear objects (DLOs) manipulation presents significant challenges due to DLOs' inherent high-dimensional state space and complex deformation dynamics. The wide-populated obstacles in realistic workspaces further complicate DLO manipulation, necessitating efficient deformation planning and robust deformation tracking. In this work, we propose a novel framework for DLO manipulation in constrained environments. This framework combines hierarchical deformation planning with neural tracking, ensuring reliable performance in both global deformation synthesis and local deformation tracking. Specifically, the deformation planner begins by generating a spatial path set that inherently satisfies the homotopic constraints associated with DLO keypoint paths. Next, a path-set-guided optimization method is applied to synthesize an optimal temporal deformation sequence for the DLO. In manipulation execution, a neural model predictive control approach, leveraging a data-driven deformation model, is designed to accurately track the planned DLO deformation sequence. The effectiveness of the proposed framework is validated in extensive constrained DLO manipulation tasks.
MSACL: Multi-Step Actor-Critic Learning with Lyapunov Certificates for Exponentially Stabilizing Control
Achieving provable stability in model-free reinforcement learning (RL) remains a challenge, particularly in balancing exploration with rigorous safety. This article introduces MSACL, a framework that integrates exponential stability theory with maximum entropy RL through multi-step Lyapunov certificate learning. Unlike methods relying on complex reward engineering, MSACL utilizes off-policy multi-step data to learn Lyapunov certificates satisfying theoretical stability conditions. By introducing Exponential Stability Labels (ESL) and a $λ$-weighted aggregation mechanism, the framework effectively balances the bias-variance trade-off in multi-step learning. Policy optimization is guided by a stability-aware advantage function, ensuring the learned policy promotes rapid Lyapunov descent. We evaluate MSACL across six benchmarks, including stabilization and nonlinear tracking tasks, demonstrating its superiority over state-of-the-art Lyapunov-based RL algorithms. MSACL achieves exponential stability and rapid convergence under simple rewards, while exhibiting significant robustness to uncertainties and generalization to unseen trajectories. Sensitivity analysis establishes the multi-step horizon $n=20$ as a robust default across diverse systems. By linking Lyapunov theory with off-policy actor-critic frameworks, MSACL provides a foundation for verifiably safe learning-based control. Source code and benchmark environments will be made publicly available.
VLN-MME: Diagnosing MLLMs as Language-guided Visual Navigation agents
Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities across a wide range of vision-language tasks. However, their performance as embodied agents, which requires multi-round dialogue spatial reasoning and sequential action prediction, needs further exploration. Our work investigates this potential in the context of Vision-and-Language Navigation (VLN) by introducing a unified and extensible evaluation framework to probe MLLMs as zero-shot agents by bridging traditional navigation datasets into a standardized benchmark, named VLN-MME. We simplify the evaluation with a highly modular and accessible design. This flexibility streamlines experiments, enabling structured comparisons and component-level ablations across diverse MLLM architectures, agent designs, and navigation tasks. Crucially, enabled by our framework, we observe that enhancing our baseline agent with Chain-of-Thought (CoT) reasoning and self-reflection leads to an unexpected performance decrease. This suggests MLLMs exhibit poor context awareness in embodied navigation tasks; although they can follow instructions and structure their output, their 3D spatial reasoning fidelity is low. VLN-MME lays the groundwork for systematic evaluation of general-purpose MLLMs in embodied navigation settings and reveals limitations in their sequential decision-making capabilities. We believe these findings offer crucial guidance for MLLM post-training as embodied agents.
ArtiSG: Functional 3D Scene Graph Construction via Human-demonstrated Articulated Objects Manipulation
3D scene graphs have empowered robots with semantic understanding for navigation and planning, yet they often lack the functional information required for physical manipulation, particularly regarding articulated objects. Existing approaches for inferring articulation mechanisms from static observations are prone to visual ambiguity, while methods that estimate parameters from state changes typically rely on constrained settings such as fixed cameras and unobstructed views. Furthermore, fine-grained functional elements like small handles are frequently missed by general object detectors. To bridge this gap, we present ArtiSG, a framework that constructs functional 3D scene graphs by encoding human demonstrations into structured robotic memory. Our approach leverages a robust articulation data collection pipeline utilizing a portable setup to accurately estimate 6-DoF articulation trajectories and axes even under camera ego-motion. We integrate these kinematic priors into a hierarchical and open-vocabulary graph while utilizing interaction data to discover inconspicuous functional elements missed by visual perception. Extensive real-world experiments demonstrate that ArtiSG significantly outperforms baselines in functional element recall and articulation estimation precision. Moreover, we show that the constructed graph serves as a reliable functional memory that effectively guides robots to perform language-directed manipulation tasks in real-world environments containing diverse articulated objects.
CropTrack: A Tracking with Re-Identification Framework for Precision Agriculture
Multiple-object tracking (MOT) in agricultural environments presents major challenges due to repetitive patterns, similar object appearances, sudden illumination changes, and frequent occlusions. Contemporary trackers in this domain rely on the motion of objects rather than appearance for association. Nevertheless, they struggle to maintain object identities when targets undergo frequent and strong occlusions. The high similarity of object appearances makes integrating appearance-based association nontrivial for agricultural scenarios. To solve this problem we propose CropTrack, a novel MOT framework based on the combination of appearance and motion information. CropTrack integrates a reranking-enhanced appearance association, a one-to-many association with appearance-based conflict resolution strategy, and an exponential moving average prototype feature bank to improve appearance-based association. Evaluated on publicly available agricultural MOT datasets, CropTrack demonstrates consistent identity preservation, outperforming traditional motion-based tracking methods. Compared to the state of the art, CropTrack achieves significant gains in identification F1 and association accuracy scores with a lower number of identity switches.
comment: 8 pages, 5 figures, and 3 tables
Explaining Why Things Go Where They Go: Interpretable Constructs of Human Organizational Preferences
Robotic systems for household object rearrangement often rely on latent preference models inferred from human demonstrations. While effective at prediction, these models offer limited insight into the interpretable factors that guide human decisions. We introduce an explicit formulation of object arrangement preferences along four interpretable constructs: spatial practicality (putting items where they naturally fit best in the space), habitual convenience (making frequently used items easy to reach), semantic coherence (placing items together if they are used for the same task or are contextually related), and commonsense appropriateness (putting things where people would usually expect to find them). To capture these constructs, we designed and validated a self-report questionnaire through a 63-participant online study. Results confirm the psychological distinctiveness of these constructs and their explanatory power across two scenarios (kitchen and living room). We demonstrate the utility of these constructs by integrating them into a Monte Carlo Tree Search (MCTS) planner and show that when guided by participant-derived preferences, our planner can generate reasonable arrangements that closely align with those generated by participants. This work contributes a compact, interpretable formulation of object arrangement preferences and a demonstration of how it can be operationalized for robot planning.
comment: Accepted to the 2026 ACM/IEEE International Conference on Human-Robot Interaction (HRI '26)
Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow
Generative video modeling has emerged as a compelling tool to zero-shot reason about plausible physical interactions for open-world manipulation. Yet, it remains a challenge to translate such human-led motions into the low-level actions demanded by robotic systems. We observe that given an initial image and task instruction, these models excel at synthesizing sensible object motions. Thus, we introduce Dream2Flow, a framework that bridges video generation and robotic control through 3D object flow as an intermediate representation. Our method reconstructs 3D object motions from generated videos and formulates manipulation as object trajectory tracking. By separating the state changes from the actuators that realize those changes, Dream2Flow overcomes the embodiment gap and enables zero-shot guidance from pre-trained video models to manipulate objects of diverse categories-including rigid, articulated, deformable, and granular. Through trajectory optimization or reinforcement learning, Dream2Flow converts reconstructed 3D object flow into executable low-level commands without task-specific demonstrations. Simulation and real-world experiments highlight 3D object flow as a general and scalable interface for adapting video generation models to open-world robotic manipulation. Videos and visualizations are available at https://dream2flow.github.io/.
comment: Project website: https://dream2flow.github.io/
Control of Microrobots with Reinforcement Learning under On-Device Compute Constraints
An important function of autonomous microrobots is the ability to perform robust movement over terrain. This paper explores an edge ML approach to microrobot locomotion, allowing for on-device, lower latency control under compute, memory, and power constraints. This paper explores the locomotion of a sub-centimeter quadrupedal microrobot via reinforcement learning (RL) and deploys the resulting controller on an ultra-small system-on-chip (SoC), SC$μ$M-3C, featuring an ARM Cortex-M0 microcontroller running at 5 MHz. We train a compact FP32 multilayer perceptron (MLP) policy with two hidden layers ($[128, 64]$) in a massively parallel GPU simulation and enhance robustness by utilizing domain randomization over simulation parameters. We then study integer (Int8) quantization (per-tensor and per-feature) to allow for higher inference update rates on our resource-limited hardware, and we connect hardware power budgets to achievable update frequency via a cycles-per-update model for inference on our Cortex-M0. We propose a resource-aware gait scheduling viewpoint: given a device power budget, we can select the gait mode (trot/intermediate/gallop) that maximizes expected RL reward at a corresponding feasible update frequency. Finally, we deploy our MLP policy on a real-world large-scale robot on uneven terrain, qualitatively noting that domain-randomized training can improve out-of-distribution stability. We do not claim real-world large-robot empirical zero-shot transfer in this work.
comment: 9 pages, 10 figures
LSRE: Latent Semantic Rule Encoding for Real-Time Semantic Risk Detection in Autonomous Driving
Real-world autonomous driving must adhere to complex human social rules that extend beyond legally codified traffic regulations. Many of these semantic constraints, such as yielding to emergency vehicles, complying with traffic officers' gestures, or stopping for school buses, are intuitive for humans yet difficult to encode explicitly. Although large vision-language models (VLMs) can interpret such semantics, their inference cost makes them impractical for real-time deployment.This work proposes LSRE, a Latent Semantic Rule Encoding framework that converts sparsely sampled VLM judgments into decision boundaries within the latent space of a recurrent world model. By encoding language-defined safety semantics into a lightweight latent classifier, LSRE enables real-time semantic risk assessment at 10 Hz without per-frame VLM queries. Experiments on six semantic-failure scenarios in CARLA demonstrate that LSRE attains semantic risk detection accuracy comparable to a large VLM baseline, while providing substantially earlier hazard anticipation and maintaining low computational latency. LSRE further generalizes to rarely seen semantic-similar test cases, indicating that language-guided latent classification offers an effective and deployable mechanism for semantic safety monitoring in autonomous driving.
Dynamic Policy Learning for Legged Robot with Simplified Model Pretraining and Model Homotopy Transfer
Generating dynamic motions for legged robots remains a challenging problem. While reinforcement learning has achieved notable success in various legged locomotion tasks, producing highly dynamic behaviors often requires extensive reward tuning or high-quality demonstrations. Leveraging reduced-order models can help mitigate these challenges. However, the model discrepancy poses a significant challenge when transferring policies to full-body dynamics environments. In this work, we introduce a continuation-based learning framework that combines simplified model pretraining and model homotopy transfer to efficiently generate and refine complex dynamic behaviors. First, we pretrain the policy using a single rigid body model to capture core motion patterns in a simplified environment. Next, we employ a continuation strategy to progressively transfer the policy to the full-body environment, minimizing performance loss. To define the continuation path, we introduce a model homotopy from the single rigid body model to the full-body model by gradually redistributing mass and inertia between the trunk and legs. The proposed method not only achieves faster convergence but also demonstrates superior stability during the transfer process compared to baseline methods. Our framework is validated on a range of dynamic tasks, including flips and wall-assisted maneuvers, and is successfully deployed on a real quadrupedal robot.
comment: 8 pages. Submitted to the IEEE for possible publication
CREPES-X: Hierarchical Bearing-Distance-Inertial Direct Cooperative Relative Pose Estimation System
Relative localization is critical for cooperation in autonomous multi-robot systems. Existing approaches either rely on shared environmental features or inertial assumptions or suffer from non-line-of-sight degradation and outliers in complex environments. Robust and efficient fusion of inter-robot measurements such as bearings, distances, and inertials for tens of robots remains challenging. We present CREPES-X (Cooperative RElative Pose Estimation System with multiple eXtended features), a hierarchical relative localization framework that enhances speed, accuracy, and robustness under challenging conditions, without requiring any global information. CREPES-X starts with a compact hardware design: InfraRed (IR) LEDs, an IR camera, an ultra-wideband module, and an IMU housed in a cube no larger than 6cm on each side. Then CREPES-X implements a two-stage hierarchical estimator to meet different requirements, considering speed, accuracy, and robustness. First, we propose a single-frame relative estimator that provides instant relative poses for multi-robot setups through a closed-form solution and robust bearing outlier rejection. Then a multi-frame relative estimator is designed to offer accurate and robust relative states by exploring IMU pre-integration via robocentric relative kinematics with loosely- and tightly-coupled optimization. Extensive simulations and real-world experiments validate the effectiveness of CREPES-X, showing robustness to up to 90% bearing outliers, proving resilience in challenging conditions, and achieving RMSE of 0.073m and 1.817° in real-world datasets.
comment: 21 pages, 23 figures, journal
ReSPIRe: Informative and Reusable Belief Tree Search for Robot Probabilistic Search and Tracking in Unknown Environments
Target search and tracking (SAT) is a fundamental problem for various robotic applications such as search and rescue and environmental exploration. This paper proposes an informative trajectory planning approach, namely ReSPIRe, for SAT in unknown cluttered environments under considerably inaccurate prior target information and limited sensing field of view. We first develop a novel sigma point-based approximation approach to fast and accurately estimate mutual information reward under non-Gaussian belief distributions, utilizing informative sampling in state and observation spaces to mitigate the computational intractability of integral calculation. To tackle significant uncertainty associated with inadequate prior target information, we propose the hierarchical particle structure in ReSPIRe, which not only extracts critical particles for global route guidance, but also adjusts the particle number adaptively for planning efficiency. Building upon the hierarchical structure, we develop the reusable belief tree search approach to build a policy tree for online trajectory planning under uncertainty, which reuses rollout evaluation to improve planning efficiency. Extensive simulations and real-world experiments demonstrate that ReSPIRe outperforms representative benchmark methods with smaller MI approximation error, higher search efficiency, and more stable tracking performance, while maintaining outstanding computational efficiency.
comment: 17 pages, 12 figures, accepted to IEEE Transactions on Systems, Man, and Cybernetics: Systems
VLA-RAIL: A Real-Time Asynchronous Inference Linker for VLA Models and Robots
Vision-Language-Action (VLA) models have achieved remarkable breakthroughs in robotics, with the action chunk playing a dominant role in these advances. Given the real-time and continuous nature of robotic motion control, the strategies for fusing a queue of successive action chunks have a profound impact on the overall performance of VLA models. Existing methods suffer from jitter, stalling, or even pauses in robotic action execution, which not only limits the achievable execution speed but also reduces the overall success rate of task completion. This paper introduces VLA-RAIL (A Real-Time Asynchronous Inference Linker), a novel framework designed to address these issues by conducting model inference and robot motion control asynchronously and guaranteeing smooth, continuous, and high-speed action execution. The core contributions of the paper are two fold: a Trajectory Smoother that effectively filters out the noise and jitter in the trajectory of one action chunk using polynomial fitting and a Chunk Fuser that seamlessly align the current executing trajectory and the newly arrived chunk, ensuring position, velocity, and acceleration continuity between two successive action chunks. We validate the effectiveness of VLA-RAIL on a benchmark of dynamic simulation tasks and several real-world manipulation tasks. Experimental results demonstrate that VLA-RAIL significantly reduces motion jitter, enhances execution speed, and improves task success rates, which will become a key infrastructure for the large-scale deployment of VLA models.
Antagonistic Bowden-Cable Actuation of a Lightweight Robotic Hand: Toward Dexterous Manipulation for Payload Constrained Humanoids
Humanoid robots toward human-level dexterity require robotic hands capable of simultaneously providing high grasping force, rapid actuation speeds, multiple degrees of freedom, and lightweight structures within human-like size constraints. Meeting these conflicting requirements remains challenging, as satisfying this combination typically necessitates heavier actuators and bulkier transmission systems, significantly restricting the payload capacity of robot arms. In this letter, we present a lightweight anthropomorphic hand actuated by Bowden cables, which uniquely combines rolling-contact joint optimization with antagonistic cable actuation, enabling single-motor-per-joint control with negligible cable-length deviation. By relocating the actuator module to the torso, the design substantially reduces distal mass while maintaining anthropomorphic scale and dexterity. Additionally, this antagonistic cable actuation eliminates the need for synchronization between motors. Using the proposed methods, the hand assembly with a distal mass of 236g (excluding remote actuators and Bowden sheaths) demonstrated reliable execution of dexterous tasks, exceeding 18N fingertip force and lifting payloads over one hundred times its own mass. Furthermore, robustness was validated through Cutkosky taxonomy grasps and trajectory consistency under perturbed actuator-hand transformations.
comment: Preprint
RoboMIND 2.0: A Multimodal, Bimanual Mobile Manipulation Dataset for Generalizable Embodied Intelligence
While data-driven imitation learning has revolutionized robotic manipulation, current approaches remain constrained by the scarcity of large-scale, diverse real-world demonstrations. Consequently, the ability of existing models to generalize across long-horizon bimanual tasks and mobile manipulation in unstructured environments remains limited. To bridge this gap, we present RoboMIND 2.0, a comprehensive real-world dataset comprising over 310K dual-arm manipulation trajectories collected across six distinct robot embodiments and 739 complex tasks. Crucially, to support research in contact-rich and spatially extended tasks, the dataset incorporates 12K tactile-enhanced episodes and 20K mobile manipulation trajectories. Complementing this physical data, we construct high-fidelity digital twins of our real-world environments, releasing an additional 20K-trajectory simulated dataset to facilitate robust sim-to-real transfer. To fully exploit the potential of RoboMIND 2.0, we propose MIND-2 system, a hierarchical dual-system frame-work optimized via offline reinforcement learning. MIND-2 integrates a high-level semantic planner (MIND-2-VLM) to decompose abstract natural language instructions into grounded subgoals, coupled with a low-level Vision-Language-Action executor (MIND-2-VLA), which generates precise, proprioception-aware motor actions.
Hybrid Motion Planning with Deep Reinforcement Learning for Mobile Robot Navigation
Autonomous mobile robots operating in complex, dynamic environments face the dual challenge of navigating large-scale, structurally diverse spaces with static obstacles while safely interacting with various moving agents. Traditional graph-based planners excel at long-range pathfinding but lack reactivity, while Deep Reinforcement Learning (DRL) methods demonstrate strong collision avoidance but often fail to reach distant goals due to a lack of global context. We propose Hybrid Motion Planning with Deep Reinforcement Learning (HMP-DRL), a hybrid framework that bridges this gap. Our approach utilizes a graph-based global planner to generate a path, which is integrated into a local DRL policy via a sequence of checkpoints encoded in both the state space and reward function. To ensure social compliance, the local planner employs an entity-aware reward structure that dynamically adjusts safety margins and penalties based on the semantic type of surrounding agents. We validate the proposed method through extensive testing in a realistic simulation environment derived from real-world map data. Comprehensive experiments demonstrate that HMP-DRL consistently outperforms other methods, including state-of-the-art approaches, in terms of key metrics of robot navigation: success rate, collision rate, and time to reach the goal. Overall, these findings confirm that integrating long-term path guidance with semantically-aware local control significantly enhances both the safety and reliability of autonomous navigation in complex human-centric settings.
comment: 22 pages, 4 figures
Resolving State Ambiguity in Robot Manipulation via Adaptive Working Memory Recoding
State ambiguity is common in robotic manipulation. Identical observations may correspond to multiple valid behavior trajectories. The visuomotor policy must correctly extract the appropriate types and levels of information from the history to identify the current task phase. However, naively extending the history window is computationally expensive and may cause severe overfitting. Inspired by the continuous nature of human reasoning and the recoding of working memory, we introduce PAM, a novel visuomotor Policy equipped with Adaptive working Memory. With minimal additional training cost in a two-stage manner, PAM supports a 300-frame history window while maintaining high inference speed. Specifically, a hierarchical frame feature extractor yields two distinct representations for motion primitives and temporal disambiguation. For compact representation, a context router with range-specific queries is employed to produce compact context features across multiple history lengths. And an auxiliary objective of reconstructing historical information is introduced to ensure that the context router acts as an effective bottleneck. We meticulously design 7 tasks and verify that PAM can handle multiple scenarios of state ambiguity simultaneously. With a history window of approximately 10 seconds, PAM still supports stable training and maintains inference speeds above 20Hz. Project website: https://tinda24.github.io/pam/
DISF: Disentangled Iterative Surface Fitting for Contact-stable Grasp Planning with Grasp Pose Alignment to the Object Center of Mass
In this work, we address the limitation of surface fitting-based grasp planning algorithm, which primarily focuses on geometric alignment between the gripper and object surface while overlooking the stability of contact point distribution, often resulting in unstable grasps due to inadequate contact configurations. To overcome this limitation, we propose a novel surface fitting algorithm that integrates contact stability while preserving geometric compatibility. Inspired by human grasping behavior, our method disentangles the grasp pose optimization into three sequential steps: (1) rotation optimization to align contact normals, (2) translation refinement to improve the alignment between the gripper frame origin and the object Center of Mass (CoM), and (3) gripper aperture adjustment to optimize contact point distribution. We validate our approach in simulation across 15 objects under both Known-shape (with clean CAD-derived dataset) and Observed-shape (with YCB object dataset) settings, including cross-platform grasp execution on three robot--gripper platforms. We further validate the method in real-world grasp experiments on a UR3e robot. Overall, DISF reduces CoM misalignment while maintaining geometric compatibility, translating into higher grasp success in both simulation and real-world execution compared to baselines. Additional videos and supplementary results are available on our project page: https://tomoya-yamanokuchi.github.io/disf-ras-project-page/
comment: 48 pages
Compositional Diffusion with Guided search for Long-Horizon Planning
Generative models have emerged as powerful tools for planning, with compositional approaches offering particular promise for modeling long-horizon task distributions by composing together local, modular generative models. This compositional paradigm spans diverse domains, from multi-step manipulation planning to panoramic image synthesis to long video generation. However, compositional generative models face a critical challenge: when local distributions are multimodal, existing composition methods average incompatible modes, producing plans that are neither locally feasible nor globally coherent. We propose Compositional Diffusion with Guided Search (CDGS), which addresses this \emph{mode averaging} problem by embedding search directly within the diffusion denoising process. Our method explores diverse combinations of local modes through population-based sampling, prunes infeasible candidates using likelihood-based filtering, and enforces global consistency through iterative resampling between overlapping segments. CDGS matches oracle performance on seven robot manipulation tasks, outperforming baselines that lack compositionality or require long-horizon training data. The approach generalizes across domains, enabling coherent text-guided panoramic images and long videos through effective local-to-global message passing. More details: https://cdgsearch.github.io/
comment: 38 pages, 18 figures
GRL-SNAM: Geometric Reinforcement Learning with Path Differential Hamiltonians for Simultaneous Navigation and Mapping in Unknown Environments
We present GRL-SNAM, a geometric reinforcement learning framework for Simultaneous Navigation and Mapping(SNAM) in unknown environments. A SNAM problem is challenging as it needs to design hierarchical or joint policies of multiple agents that control the movement of a real-life robot towards the goal in mapless environment, i.e. an environment where the map of the environment is not available apriori, and needs to be acquired through sensors. The sensors are invoked from the path learner, i.e. navigator, through active query responses to sensory agents, and along the motion path. GRL-SNAM differs from preemptive navigation algorithms and other reinforcement learning methods by relying exclusively on local sensory observations without constructing a global map. Our approach formulates path navigation and mapping as a dynamic shortest path search and discovery process using controlled Hamiltonian optimization: sensory inputs are translated into local energy landscapes that encode reachability, obstacle barriers, and deformation constraints, while policies for sensing, planning, and reconfiguration evolve stagewise via updating Hamiltonians. A reduced Hamiltonian serves as an adaptive score function, updating kinetic/potential terms, embedding barrier constraints, and continuously refining trajectories as new local information arrives. We evaluate GRL-SNAM on two different 2D navigation tasks. Comparing against local reactive baselines and global policy learning references under identical stagewise sensing constraints, it preserves clearance, generalizes to unseen layouts, and demonstrates that Geometric RL learning via updating Hamiltonians enables high-quality navigation through minimal exploration via local energy refinement rather than extensive global mapping. The code is publicly available on \href{https://github.com/CVC-Lab/GRL-SNAM}{Github}.
Reinforcement learning with timed constraints for robotics motion planning
Robotic systems operating in dynamic and uncertain environments increasingly require planners that satisfy complex task sequences while adhering to strict temporal constraints. Metric Interval Temporal Logic (MITL) offers a formal and expressive framework for specifying such time-bounded requirements; however, integrating MITL with reinforcement learning (RL) remains challenging due to stochastic dynamics and partial observability. This paper presents a unified automata-based RL framework for synthesizing policies in both Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs) under MITL specifications. MITL formulas are translated into Timed Limit-Deterministic Generalized Büchi Automata (Timed-LDGBA) and synchronized with the underlying decision process to construct product timed models suitable for Q-learning. A simple yet expressive reward structure enforces temporal correctness while allowing additional performance objectives. The approach is validated in three simulation studies: a $5 \times 5$ grid-world formulated as an MDP, a $10 \times 10$ grid-world formulated as a POMDP, and an office-like service-robot scenario. Results demonstrate that the proposed framework consistently learns policies that satisfy strict time-bounded requirements under stochastic transitions, scales to larger state spaces, and remains effective in partially observable environments, highlighting its potential for reliable robotic planning in time-critical and uncertain settings.
Dichotomous Diffusion Policy Optimization
Diffusion-based policies have gained growing popularity in solving a wide range of decision-making tasks due to their superior expressiveness and controllable generation during inference. However, effectively training large diffusion policies using reinforcement learning (RL) remains challenging. Existing methods either suffer from unstable training due to directly maximizing value objectives, or face computational issues due to relying on crude Gaussian likelihood approximation, which requires a large amount of sufficiently small denoising steps. In this work, we propose DIPOLE (Dichotomous diffusion Policy improvement), a novel RL algorithm designed for stable and controllable diffusion policy optimization. We begin by revisiting the KL-regularized objective in RL, which offers a desirable weighted regression objective for diffusion policy extraction, but often struggles to balance greediness and stability. We then formulate a greedified policy regularization scheme, which naturally enables decomposing the optimal policy into a pair of stably learned dichotomous policies: one aims at reward maximization, and the other focuses on reward minimization. Under such a design, optimized actions can be generated by linearly combining the scores of dichotomous policies during inference, thereby enabling flexible control over the level of greediness.Evaluations in offline and offline-to-online RL settings on ExORL and OGBench demonstrate the effectiveness of our approach. We also use DIPOLE to train a large vision-language-action (VLA) model for end-to-end autonomous driving (AD) and evaluate it on the large-scale real-world AD benchmark NAVSIM, highlighting its potential for complex real-world applications.
CLF-RL: Control Lyapunov Function Guided Reinforcement Learning
Reinforcement learning (RL) has shown promise in generating robust locomotion policies for bipedal robots, but often suffers from tedious reward design and sensitivity to poorly shaped objectives. In this work, we propose a structured reward shaping framework that leverages model-based trajectory generation and control Lyapunov functions (CLFs) to guide policy learning. We explore two model-based planners for generating reference trajectories: a reduced-order linear inverted pendulum (LIP) model for velocity-conditioned motion planning, and a precomputed gait library based on hybrid zero dynamics (HZD) using full-order dynamics. These planners define desired end-effector and joint trajectories, which are used to construct CLF-based rewards that penalize tracking error and encourage rapid convergence. This formulation provides meaningful intermediate rewards, and is straightforward to implement once a reference is available. Both the reference trajectories and CLF shaping are used only during training, resulting in a lightweight policy at deployment. We validate our method both in simulation and through extensive real-world experiments on a Unitree G1 robot. CLF-RL demonstrates significantly improved robustness relative to the baseline RL policy and better performance than a classic tracking reward RL formulation.
comment: 8 pages; 7 figures
Learning Spatial-Aware Manipulation Ordering NeurIPS 2025
Manipulation in cluttered environments is challenging due to spatial dependencies among objects, where an improper manipulation order can cause collisions or blocked access. Existing approaches often overlook these spatial relationships, limiting their flexibility and scalability. To address these limitations, we propose OrderMind, a unified spatial-aware manipulation ordering framework that directly learns object manipulation priorities based on spatial context. Our architecture integrates a spatial context encoder with a temporal priority structuring module. We construct a spatial graph using k-Nearest Neighbors to aggregate geometric information from the local layout and encode both object-object and object-manipulator interactions to support accurate manipulation ordering in real-time. To generate physically and semantically plausible supervision signals, we introduce a spatial prior labeling method that guides a vision-language model to produce reasonable manipulation orders for distillation. We evaluate OrderMind on our Manipulation Ordering Benchmark, comprising 163,222 samples of varying difficulty. Extensive experiments in both simulation and real-world environments demonstrate that our method significantly outperforms prior approaches in effectiveness and efficiency, enabling robust manipulation in cluttered scenes.
comment: Accepted to NeurIPS 2025
Theory of Mind for Explainable Human-Robot Interaction
Within the context of human-robot interaction (HRI), Theory of Mind (ToM) is intended to serve as a user-friendly backend to the interface of robotic systems, enabling robots to infer and respond to human mental states. When integrated into robots, ToM allows them to adapt their internal models to users' behaviors, enhancing the interpretability and predictability of their actions. Similarly, Explainable Artificial Intelligence (XAI) aims to make AI systems transparent and interpretable, allowing humans to understand and interact with them effectively. Since ToM in HRI serves related purposes, we propose to consider ToM as a form of XAI and evaluate it through the eValuation XAI (VXAI) framework and its seven desiderata. This paper identifies a critical gap in the application of ToM within HRI, as existing methods rarely assess the extent to which explanations correspond to the robot's actual internal reasoning. To address this limitation, we propose to integrate ToM within XAI frameworks. By embedding ToM principles inside XAI, we argue for a shift in perspective, as current XAI research focuses predominantly on the AI system itself and often lacks user-centered explanations. Incorporating ToM would enable a change in focus, prioritizing the user's informational needs and perspective.
Towards Data-Driven Metrics for Social Robot Navigation Benchmarking
This paper presents a joint effort towards the development of a data-driven Social Robot Navigation metric to facilitate benchmarking and policy optimization for ground robots. We compiled a dataset with 4427 trajectories -- 182 real and 4245 simulated -- and presented it to human raters, yielding a total of 4402 rated trajectories after data quality assurance. Notably, we provide the first all-encompassing learned social robot navigation metric, along qualitative and quantitative results, including the test loss achieved, a comparison against hand-crafted metrics, and an ablation study. All data, software, and model weights are publicly available.
AINav: Large Language Model-Based Adaptive Interactive Navigation
Robotic navigation in complex environments remains a critical research challenge. Traditional navigation methods focus on optimal trajectory generation within fixed free workspace, therefore struggling in environments lacking viable paths to the goal, such as disaster zones or cluttered warehouses. To address this problem, we propose AINav, an adaptive interactive navigation approach that proactively interacts with environments to create feasible paths to achieve originally unreachable goals. Specifically, we present a primitive skill tree for task planning with large language models (LLMs), facilitating effective reasoning to determine interaction objects and sequences. To ensure robust subtask execution, we adopt reinforcement learning to pre-train a comprehensive skill library containing versatile locomotion and interaction behaviors for motion planning. Furthermore, we introduce an adaptive replanning approach featuring two LLM-based modules: an advisor serving as a flexible replanning trigger and an arborist for autonomous plan adjustment. Integrated with the tree structure, the replanning mechanism allows for convenient node addition and pruning, enabling rapid plan adaptation in a priori unknown environments. Comprehensive simulations and experiments have demonstrated AINav's effectiveness and adaptivity in diverse scenarios. The supplementary video is available at: https://youtu.be/CjXm5KFx9AI.
comment: 13 pages, 12 figures, accepted to IEEE Robotics & Automation Magazine
Collaborative Representation Learning for Alignment of Tactile, Language, and Vision Modalities
Tactile sensing offers rich and complementary information to vision and language, enabling robots to perceive fine-grained object properties. However, existing tactile sensors lack standardization, leading to redundant features that hinder cross-sensor generalization. Moreover, existing methods fail to fully integrate the intermediate communication among tactile, language, and vision modalities. To address this, we propose TLV-CoRe, a CLIP-based Tactile-Language-Vision Collaborative Representation learning method. TLV-CoRe introduces a Sensor-Aware Modulator to unify tactile features across different sensors and employs tactile-irrelevant decoupled learning to disentangle irrelevant tactile features. Additionally, a Unified Bridging Adapter is introduced to enhance tri-modal interaction within the shared representation space. To fairly evaluate the effectiveness of tactile models, we further propose the RSS evaluation framework, focusing on Robustness, Synergy, and Stability across different methods. Experimental results demonstrate that TLV-CoRe significantly improves sensor-agnostic representation learning and cross-modal alignment, offering a new direction for multimodal tactile representation.
Passage-traversing optimal path planning with sampling-based algorithms
This paper introduces a new paradigm of optimal path planning, i.e., passage-traversing optimal path planning (PTOPP), that optimizes paths' traversed passages for specified optimization objectives. In particular, PTOPP is utilized to find the path with optimal accessible free space along its entire length, which represents a basic requirement for paths in robotics. As passages are places where free space shrinks and becomes constrained, the core idea is to leverage the path's passage traversal status to characterize its accessible free space comprehensively. To this end, a novel passage detection and free space decomposition method using proximity graphs is proposed, enabling fast detection of sparse but informative passages and environment decompositions. Based on this preprocessing, optimal path planning with accessible free space objectives or constraints is formulated as PTOPP problems compatible with sampling-based optimal planners. Then, sampling-based algorithms for PTOPP, including their dependent primitive procedures, are developed leveraging partitioned environments for fast passage traversal check. All these methods are implemented and thoroughly tested for effectiveness and efficiency validation. Compared to existing approaches, such as clearance-based methods, PTOPP demonstrates significant advantages in configurability, solution optimality, and efficiency, addressing prior limitations and incapabilities. It is believed to provide an efficient and versatile solution to accessible free space optimization over conventional avenues and more generally, to a broad class of path planning problems that can be formulated as PTOPP.
comment: 27 pages, 20 figures, 4 tables, journal paper
VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs
In most existing embodied navigation tasks, instructions are well-defined and unambiguous, such as instruction following and object searching. Under this idealized setting, agents are required solely to produce effective navigation outputs conditioned on vision and language inputs. However, real-world navigation instructions are often vague and ambiguous, requiring the agent to resolve uncertainty and infer user intent through active dialog. To address this gap, we propose Interactive Instance Object Navigation (IION), a task that requires agents not only to generate navigation actions but also to produce language outputs via active dialog, thereby aligning more closely with practical settings. IION extends Instance Object Navigation (ION) by allowing agents to freely consult an oracle in natural language while navigating. Building on this task, we present the Vision Language-Language Navigation (VL-LN) benchmark, which provides a large-scale, automatically generated dataset and a comprehensive evaluation protocol for training and assessing dialog-enabled navigation models. VL-LN comprises over 41k long-horizon dialog-augmented trajectories for training and an automatic evaluation protocol with an oracle capable of responding to agent queries. Using this benchmark, we train a navigation model equipped with dialog capabilities and show that it achieves significant improvements over the baselines. Extensive experiments and analyses further demonstrate the effectiveness and reliability of VL-LN for advancing research on dialog-enabled embodied navigation. Code and dataset: https://0309hws.github.io/VL-LN.github.io/
Large Language Model-Driven Closed-Loop UAV Operation with Semantic Observations
Recent advances in large Language Models (LLMs) have revolutionized mobile robots, including unmanned aerial vehicles (UAVs), enabling their intelligent operation within Internet of Things (IoT) ecosystems. However, LLMs still face challenges from logical reasoning and complex decision-making, leading to concerns about the reliability of LLM-driven UAV operations in IoT applications. In this paper, we propose a closed-loop LLM-driven UAV operation code generation framework that enables reliable UAV operations powered by effective feedback and refinement using two LLM modules, i.e., a Code Generator and an Evaluator. Our framework transforms numerical state observations from UAV operations into semantic trajectory descriptions to enhance the evaluator LLM's understanding of UAV dynamics for precise feedback generation. Our framework also enables a simulation-based refinement process, and hence eliminates the risks to physical UAVs caused by incorrect code execution during the refinement. Extensive experiments on UAV control tasks with different complexities are conducted. The experimental results show that our framework can achieve reliable UAV operations using LLMs, which significantly outperforms baseline methods in terms of success rate and completeness with the increase of task complexity.
comment: 13 pages, 10 figures
Dynamic Gap: Safe Gap-based Navigation in Dynamic Environments ICRA
This paper extends the family of gap-based local planners to unknown dynamic environments through generating provable collision-free properties for hierarchical navigation systems. Existing perception-informed local planners that operate in dynamic environments rely on emergent or empirical robustness for collision avoidance as opposed to performing formal analysis of dynamic obstacles. In addition to this, the obstacle tracking that is performed in these existent planners is often achieved with respect to a global inertial frame, subjecting such tracking estimates to transformation errors from odometry drift. The proposed local planner, dynamic gap, shifts the tracking paradigm to modeling how the free space, represented as gaps, evolves over time. Gap crossing and closing conditions are developed to aid in determining the feasibility of passage through gaps, and a breadth of simulation benchmarking is performed against other navigation planners in the literature where the proposed dynamic gap planner achieves the highest success rate out of all planners tested in all environments.
comment: Accepted to 2025 International Conference on Robotics and Automation (ICRA)
Hybrid A* Path Planning with Multi-Modal Motion Extension for Four-Wheel Steering Mobile Robots
Four-wheel independent steering (4WIS) systems provide mobile robots with a rich set of motion modes, such as Ackermann steering, lateral steering, and parallel movement, offering superior maneuverability in constrained environments. However, existing path planning methods generally assume a single kinematic model and thus fail to fully exploit the multi-modal capabilities of 4WIS platforms. To address this limitation, we propose an extended Hybrid A* framework that operates in a four-dimensional state space incorporating both spatial states and motion modes. Within this framework, we design multi-modal Reeds-Shepp curves tailored to the distinct kinematic constraints of each motion mode, develop an enhanced heuristic function that accounts for mode-switching costs, and introduce a terminal connection strategy with intelligent mode selection to ensure smooth transitions between different steering patterns. The proposed planner enables seamless integration of multiple motion modalities within a single path, significantly improving flexibility and adaptability in complex environments. Results demonstrate significantly improved planning performance for 4WIS robots in complex environments.
comment: Updated method details, parameters, and experimental scenarios
Multiagent Systems
Heterogeneous Multi-Agent Multi-Target Tracking using Cellular Sheaves
Multi-agent target tracking in the presence of nonlinear dynamics and agent heterogeneity, where state-space dimensions may differ, is a challenging problem that traditional graph Laplacian methods cannot easily address. This work leverages the framework of cellular sheaves, a mathematical generalization of graph theory, to natively model such heterogeneous systems. While existing coordination sheaf frameworks focus on cooperative problems like consensus, this work extends them to the non-cooperative target-tracking problem. The tracking of multiple, unknown targets is formulated as a harmonic extension problem on a cellular sheaf, accommodating nonlinear dynamics and external disturbances for all agents. A decentralized control law is developed using the sheaf Laplacian, and a corresponding Lyapunov-based stability analysis is provided to guarantee tracking error convergence, with results validated by simulation.
comment: 8 pages
BEDA: Belief Estimation as Probabilistic Constraints for Performing Strategic Dialogue Acts AAMAS 2026
Strategic dialogue requires agents to execute distinct dialogue acts, for which belief estimation is essential. While prior work often estimates beliefs accurately, it lacks a principled mechanism to use those beliefs during generation. We bridge this gap by first formalizing two core acts Adversarial and Alignment, and by operationalizing them via probabilistic constraints on what an agent may generate. We instantiate this idea in BEDA, a framework that consists of the world set, the belief estimator for belief estimation, and the conditional generator that selects acts and realizes utterances consistent with the inferred beliefs. Across three settings, Conditional Keeper Burglar (CKBG, adversarial), Mutual Friends (MF, cooperative), and CaSiNo (negotiation), BEDA consistently outperforms strong baselines: on CKBG it improves success rate by at least 5.0 points across backbones and by 20.6 points with GPT-4.1-nano; on Mutual Friends it achieves an average improvement of 9.3 points; and on CaSiNo it achieves the optimal deal relative to all baselines. These results indicate that casting belief estimation as constraints provides a simple, general mechanism for reliable strategic dialogue.
comment: Accepted by AAMAS 2026
Dynamic Strategy Adaptation in Multi-Agent Environments with Large Language Models
Large language models (LLMs) demonstrate strong reasoning abilities across mathematical, strategic, and linguistic tasks, yet little is known about how well they reason in dynamic, real-time, multi-agent scenarios, such as collaborative environments in which agents continuously adapt to each other's behavior, as in cooperative gameplay settings. In this paper, we bridge this gap by combining LLM-driven agents with strategic reasoning and real-time adaptation in cooperative, multi-agent environments grounded in game-theoretic principles such as belief consistency and Nash equilibrium. The proposed framework applies broadly to dynamic scenarios in which agents coordinate, communicate, and make decisions in response to continuously changing conditions. We provide real-time strategy refinement and adaptive feedback mechanisms that enable agents to dynamically adjust policies based on immediate contextual interactions, in contrast to previous efforts that evaluate LLM capabilities in static or turn-based settings. Empirical results show that our method achieves up to a 26\% improvement in return over PPO baselines in high-noise environments, while maintaining real-time latency under 1.05 milliseconds. Our approach improves collaboration efficiency, task completion rates, and flexibility, illustrating that game-theoretic guidance integrated with real-time feedback enhances LLM performance, ultimately fostering more resilient and flexible strategic multi-agent systems.
Triple-BERT: Do We Really Need MARL for Order Dispatch on Ride-Sharing Platforms?
On-demand ride-sharing platforms, such as Uber and Lyft, face the intricate real-time challenge of bundling and matching passengers-each with distinct origins and destinations-to available vehicles, all while navigating significant system uncertainties. Due to the extensive observation space arising from the large number of drivers and orders, order dispatching, though fundamentally a centralized task, is often addressed using Multi-Agent Reinforcement Learning (MARL). However, independent MARL methods fail to capture global information and exhibit poor cooperation among workers, while Centralized Training Decentralized Execution (CTDE) MARL methods suffer from the curse of dimensionality. To overcome these challenges, we propose Triple-BERT, a centralized Single Agent Reinforcement Learning (MARL) method designed specifically for large-scale order dispatching on ride-sharing platforms. Built on a variant TD3, our approach addresses the vast action space through an action decomposition strategy that breaks down the joint action probability into individual driver action probabilities. To handle the extensive observation space, we introduce a novel BERT-based network, where parameter reuse mitigates parameter growth as the number of drivers and orders increases, and the attention mechanism effectively captures the complex relationships among the large pool of driver and orders. We validate our method using a real-world ride-hailing dataset from Manhattan. Triple-BERT achieves approximately an 11.95% improvement over current state-of-the-art methods, with a 4.26% increase in served orders and a 22.25% reduction in pickup times. Our code, trained model parameters, and processed data are publicly available at the repository https://github.com/RS2002/Triple-BERT .
One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms
Order dispatch is a critical task in ride-sharing systems with Autonomous Vehicles (AVs), directly influencing efficiency and profits. Recently, Multi-Agent Reinforcement Learning (MARL) has emerged as a promising solution to this problem by decomposing the large state and action spaces among individual agents, effectively addressing the Curse of Dimensionality (CoD) in transportation market, which is caused by the substantial number of vehicles, passengers, and orders. However, conventional MARL-based approaches heavily rely on accurate estimation of the value function, which becomes problematic in large-scale, highly uncertain environments. To address this issue, we propose two novel methods that bypass value function estimation, leveraging the homogeneous property of AV fleets. First, we draw an analogy between AV fleets and groups in Group Relative Policy Optimization (GRPO), adapting it to the order dispatch task. By replacing the Proximal Policy Optimization (PPO) baseline with the group average reward-to-go, GRPO eliminates critic estimation errors and reduces training bias. Inspired by this baseline replacement, we further propose One-Step Policy Optimization (OSPO), demonstrating that the optimal policy can be trained using only one-step group rewards under a homogeneous fleet. Experiments on a real-world ride-hailing dataset show that both GRPO and OSPO achieve promising performance across all scenarios, efficiently optimizing pickup times and the number of served orders using simple Multilayer Perceptron (MLP) networks. Furthermore, OSPO outperforms GRPO in all scenarios, attributed to its elimination of bias caused by the bounded time horizon of GRPO. Our code, trained models, and processed data are provided at https://github.com/RS2002/OSPO .
Systems and Control (EESS)
MSACL: Multi-Step Actor-Critic Learning with Lyapunov Certificates for Exponentially Stabilizing Control
Achieving provable stability in model-free reinforcement learning (RL) remains a challenge, particularly in balancing exploration with rigorous safety. This article introduces MSACL, a framework that integrates exponential stability theory with maximum entropy RL through multi-step Lyapunov certificate learning. Unlike methods relying on complex reward engineering, MSACL utilizes off-policy multi-step data to learn Lyapunov certificates satisfying theoretical stability conditions. By introducing Exponential Stability Labels (ESL) and a $λ$-weighted aggregation mechanism, the framework effectively balances the bias-variance trade-off in multi-step learning. Policy optimization is guided by a stability-aware advantage function, ensuring the learned policy promotes rapid Lyapunov descent. We evaluate MSACL across six benchmarks, including stabilization and nonlinear tracking tasks, demonstrating its superiority over state-of-the-art Lyapunov-based RL algorithms. MSACL achieves exponential stability and rapid convergence under simple rewards, while exhibiting significant robustness to uncertainties and generalization to unseen trajectories. Sensitivity analysis establishes the multi-step horizon $n=20$ as a robust default across diverse systems. By linking Lyapunov theory with off-policy actor-critic frameworks, MSACL provides a foundation for verifiably safe learning-based control. Source code and benchmark environments will be made publicly available.
One-Shot Camera-Based Extrusion Optimization for High Speed Fused Filament Fabrication
Off-the-shelf fused filament fabrication 3D printers are widely accessible and convenient, yet they exhibit quality loss at high speeds due to dynamic mis-synchronization between printhead motion and material extrusion systems, notably corner over-extrusion. Existing methods require specialized hardware, extensive calibration, or firmware modifications that are inaccessible to most users. This work presents a practical, end-to-end optimization framework that enhances high-speed printing using only standard 3D printers and a phone camera, without requiring additional complex setup. The method employs a one-shot calibration approach in which two simple printed patterns, captured by a phone camera, enable identification of extrusion dynamics and cornering behavior. The identified systems enable a model-based constrained optimal control strategy that generates optimized G-code, synchronizing motion and extrusion. Experiments show reduced width tracking error, mitigated corner defects, and lower surface roughness, achieving surface quality at 3600 mm/min comparable to conventional printing at 1600 mm/min, effectively doubling production speed while maintaining print quality. This accessible, hardware-minimal approach enables a wide range of fused filament fabrication users to achieve high-quality, high-speed additive manufacturing.
Heterogeneous Multi-Agent Multi-Target Tracking using Cellular Sheaves
Multi-agent target tracking in the presence of nonlinear dynamics and agent heterogeneity, where state-space dimensions may differ, is a challenging problem that traditional graph Laplacian methods cannot easily address. This work leverages the framework of cellular sheaves, a mathematical generalization of graph theory, to natively model such heterogeneous systems. While existing coordination sheaf frameworks focus on cooperative problems like consensus, this work extends them to the non-cooperative target-tracking problem. The tracking of multiple, unknown targets is formulated as a harmonic extension problem on a cellular sheaf, accommodating nonlinear dynamics and external disturbances for all agents. A decentralized control law is developed using the sheaf Laplacian, and a corresponding Lyapunov-based stability analysis is provided to guarantee tracking error convergence, with results validated by simulation.
comment: 8 pages
Trustworthy Equipment Monitoring via Cascaded Anomaly Detection and Thermal Localization
Predictive maintenance demands accurate anomaly detection and trustable explanations. Although multimodal fusion of sensor time-series and thermal imagery shows promise, we demonstrate that naive fusion strategies can paradoxically degrade performance. This paper introduces a Cascaded Anomaly Detection framework that decouples detection and localization. Stage 1 employs an LSTM-based sensor encoder with temporal attention for high-accuracy detection, while Stage 2 activates a CNN-based thermal encoder for post-detection fault localization. Our results reveal that sensor-only detection outperforms full fusion by 8.3 percentage points (93.08% vs. 84.79% F1-score), challenging the assumption that additional modalities invariably improve performance. We further contribute an explainability pipeline integrating SHAP, temporal/spatial attention, and gate weight analysis. This analysis uncovers a "modality bias" where fusion models assign 65-87% weight to the weaker thermal modality. Validated on a real-world bearing dataset (78,397 samples), our cascaded approach achieves state-of-the-art accuracy while providing actionable diagnostics for maintenance decision-making.
Control of Microrobots with Reinforcement Learning under On-Device Compute Constraints
An important function of autonomous microrobots is the ability to perform robust movement over terrain. This paper explores an edge ML approach to microrobot locomotion, allowing for on-device, lower latency control under compute, memory, and power constraints. This paper explores the locomotion of a sub-centimeter quadrupedal microrobot via reinforcement learning (RL) and deploys the resulting controller on an ultra-small system-on-chip (SoC), SC$μ$M-3C, featuring an ARM Cortex-M0 microcontroller running at 5 MHz. We train a compact FP32 multilayer perceptron (MLP) policy with two hidden layers ($[128, 64]$) in a massively parallel GPU simulation and enhance robustness by utilizing domain randomization over simulation parameters. We then study integer (Int8) quantization (per-tensor and per-feature) to allow for higher inference update rates on our resource-limited hardware, and we connect hardware power budgets to achievable update frequency via a cycles-per-update model for inference on our Cortex-M0. We propose a resource-aware gait scheduling viewpoint: given a device power budget, we can select the gait mode (trot/intermediate/gallop) that maximizes expected RL reward at a corresponding feasible update frequency. Finally, we deploy our MLP policy on a real-world large-scale robot on uneven terrain, qualitatively noting that domain-randomized training can improve out-of-distribution stability. We do not claim real-world large-robot empirical zero-shot transfer in this work.
comment: 9 pages, 10 figures
Exact compensation of communication delays for discrete-time heterogeneous multi-agent linear systems with applications to SIR epidemic model
This paper investigates the output synchronization problem for discrete-time heterogeneous multi-agent systems (MASs) subject to distinct communication delays. The presence of such delays prevents the instantaneous delivery of information from neighboring nodes, thereby severely degrading the performance of standard distributed control schemes. To overcome this, we propose a prediction-based framework for exact delay compensation. Specifically, we introduce predictors combined with a mechanism of distributed predictors, which enables the recursive reconstruction of future state information across the communication network. Building upon these predictors, we construct prediction-based distributed observers and formulate both prediction-based distributed state-feedback and dynamic output-feedback controllers. Theoretical analysis confirms that the proposed strategy eliminates the impact of delays after a finite number of steps, ensuring output synchronization. The effectiveness of the methods is validated through a numerical example and a Koopman operator-based linear Susceptible-Infected-Recovered (SIR) epidemic model. Notably, for a population of 4 million, the proposed delay compensation strategy achieves a reduction of over 200,000 infected individuals at the peak, underscoring its potential significance in epidemic mitigation.
Average Consensus with Dynamic Quantization Framing and Finite-Time Termination over Limited-Bandwidth Directed Networks
This paper proposes a deterministic distributed algorithm, referred to as PP-ACDC, that achieves exact average consensus over possibly unbalanced directed graphs using only a fixed and a priori specified number of quantization bits. The method integrates Push-Pull (surplus) consensus dynamics with a dynamic quantization framing scheme combining zooming and midpoint shifting, enabling agents to preserve the true global average while progressively refining their quantization precision. We establish a rigorous convergence theory showing that PP-ACDC achieves asymptotic (exact) average consensus on any strongly connected digraph under appropriately chosen quantization parameters. Moreover, we develop a fully distributed and synchronized finite-time termination mechanism, and we provide a formal proof on the detection of $ε$-convergence to the average within a finite number of iterations. Numerical simulations corroborate the theoretical results and demonstrate that PP-ACDC achieves reliable, communication-efficient, and precise average consensus even under very tight bit budgets, underscoring its suitability for large-scale and resource-constrained multi-agent systems operating over directed networks.
comment: arXiv admin note: substantial text overlap with arXiv:2508.06893
BatteryAgent: Synergizing Physics-Informed Interpretation with LLM Reasoning for Intelligent Battery Fault Diagnosis
Fault diagnosis of lithium-ion batteries is critical for system safety. While existing deep learning methods exhibit superior detection accuracy, their "black-box" nature hinders interpretability. Furthermore, restricted by binary classification paradigms, they struggle to provide root cause analysis and maintenance recommendations. To address these limitations, this paper proposes BatteryAgent, a hierarchical framework that integrates physical knowledge features with the reasoning capabilities of Large Language Models (LLMs). The framework comprises three core modules: (1) A Physical Perception Layer that utilizes 10 mechanism-based features derived from electrochemical principles, balancing dimensionality reduction with physical fidelity; (2) A Detection and Attribution Layer that employs Gradient Boosting Decision Trees and SHAP to quantify feature contributions; and (3) A Reasoning and Diagnosis Layer that leverages an LLM as the agent core. This layer constructs a "numerical-semantic" bridge, combining SHAP attributions with a mechanism knowledge base to generate comprehensive reports containing fault types, root cause analysis, and maintenance suggestions. Experimental results demonstrate that BatteryAgent effectively corrects misclassifications on hard boundary samples, achieving an AUROC of 0.986, which significantly outperforms current state-of-the-art methods. Moreover, the framework extends traditional binary detection to multi-type interpretable diagnosis, offering a new paradigm shift from "passive detection" to "intelligent diagnosis" for battery safety management.
Waste-to-Energy-Coupled AI Data Centers: Cooling Efficiency and Grid Resilience
AI data-center expansion is increasingly constrained by the coupled availability of deliverable electricity and heat-rejection (cooling) capacity. We propose and evaluate an integrated Waste-to-Energy-AI Data Center configuration that treats cooling as a first-class energy service rather than an unavoidable electricity burden. The coupled system is modeled as an input-output 'black box' with transparent boundaries and a standalone benchmark in which mechanical chilling is powered by grid electricity. The central mechanism is energy-grade matching: low-grade WtE thermal output drives absorption cooling to deliver chilled service, thereby displacing baseline cooling electricity. We show that thermoeconomic superiority is governed by three first-order determinants, (i) cooling coverage of IT heat load, (ii) parasitic electricity for transport and auxiliaries, and (iii) distance-driven delivery decay, yielding a break-even corridor beyond which net benefits vanish. Comparative statics characterize sensitivity to IT utilization, feedstock quality (waste LHV and throughput), climate parameterization, and corridor distance. We translate these accounting gains into decision language through a computable prototype for Levelized Cost of Computing (LCOC) and an ESG valuation channel grounded in measurable mechanisms, without re-deriving full lifecycle inventories. The framework provides siting-ready feasibility conditions for WtE-AIDC coupling in urban AI corridors under grid stress.
VLA-RAIL: A Real-Time Asynchronous Inference Linker for VLA Models and Robots
Vision-Language-Action (VLA) models have achieved remarkable breakthroughs in robotics, with the action chunk playing a dominant role in these advances. Given the real-time and continuous nature of robotic motion control, the strategies for fusing a queue of successive action chunks have a profound impact on the overall performance of VLA models. Existing methods suffer from jitter, stalling, or even pauses in robotic action execution, which not only limits the achievable execution speed but also reduces the overall success rate of task completion. This paper introduces VLA-RAIL (A Real-Time Asynchronous Inference Linker), a novel framework designed to address these issues by conducting model inference and robot motion control asynchronously and guaranteeing smooth, continuous, and high-speed action execution. The core contributions of the paper are two fold: a Trajectory Smoother that effectively filters out the noise and jitter in the trajectory of one action chunk using polynomial fitting and a Chunk Fuser that seamlessly align the current executing trajectory and the newly arrived chunk, ensuring position, velocity, and acceleration continuity between two successive action chunks. We validate the effectiveness of VLA-RAIL on a benchmark of dynamic simulation tasks and several real-world manipulation tasks. Experimental results demonstrate that VLA-RAIL significantly reduces motion jitter, enhances execution speed, and improves task success rates, which will become a key infrastructure for the large-scale deployment of VLA models.
Taking Advantage of Rational Canonical Form for Faster Ring-LWE based Encrypted Controller with Recursive Multiplication
This paper aims to provide an efficient implementation of encrypted linear dynamic controllers that perform recursive multiplications on a Ring-Learning With Errors (Ring-LWE) based cryptosystem. By adopting a system-theoretical approach, we significantly reduce both time and space complexities, particularly the number of homomorphic operations required for recursive multiplications. Rather than encrypting the entire state matrix of a given controller, the state matrix is transformed into its rational canonical form, whose sparse and circulant structure enables that encryption and computation are required only on its nontrivial columns. Furthermore, we propose a novel method to ``pack'' each of the input and the output matrices into a single polynomial, thereby reducing the number of homomorphic operations. Simulation results demonstrate that the proposed design enables a remarkably fast implementation of encrypted controllers.
comment: 8 pages, 1 figures, presented at 2025 IEEE Conference on Decision and Control
Decentralized No-Regret Frequency-Time Scheduling for FMCW Radar Interference Avoidance
Automotive FMCW radars are indispensable to modern ADAS and autonomous-driving systems, but their increasing density has intensified the risk of mutual interference. Existing mitigation techniques, including reactive receiver-side suppression, proactive waveform design, and cooperative scheduling, often face limitations in scalability, reliance on side-channel communication, or degradation of range-Doppler resolution. Building on our earlier work on decentralized Frequency-Domain No-Regret hopping, this paper introduces a unified time-frequency game-theoretic framework that enables radars to adapt across both spectral and temporal resources. We formulate the interference-avoidance problem as a repeated anti-coordination game, in which each radar autonomously updates a mixed strategy over frequency subbands and chirp-level time offsets using regret-minimization dynamics. We show that the proposed Time-Frequency No-Regret Hopping algorithm achieves vanishing external and swap regret, and that the induced empirical play converges to an $\varepsilon$-coarse correlated equilibrium or a correlated equilibrium. Theoretical analysis provides regret bounds in the joint domain, revealing how temporal adaptation implicitly regularizes frequency selection and enhances robustness against asynchronous interference. Numerical experiments with multi-radar scenarios demonstrate substantial improvements in SINR, collision rate, and range-Doppler quality compared with time-frequency random hopping and centralized Nash-based benchmarks.
A Graph Neural Network with Auxiliary Task Learning for Missing PMU Data Reconstruction
In wide-area measurement systems (WAMS), phasor measurement unit (PMU) measurement is prone to data missingness due to hardware failures, communication delays, and cyber-attacks. Existing data-driven methods are limited by inadaptability to concept drift in power systems, poor robustness under high missing rates, and reliance on the unrealistic assumption of full system observability. Thus, this paper proposes an auxiliary task learning (ATL) method for reconstructing missing PMU data. First, a K-hop graph neural network (GNN) is proposed to enable direct learning on the subgraph consisting of PMU nodes, overcoming the limitation of the incompletely observable system. Then, an auxiliary learning framework consisting of two complementary graph networks is designed for accurate reconstruction: a spatial-temporal GNN extracts spatial-temporal dependencies from PMU data to reconstruct missing values, and another auxiliary GNN utilizes the low-rank property of PMU data to achieve unsupervised online learning. In this way, the low-rank properties of the PMU data are dynamically leveraged across the architecture to ensure robustness and self-adaptation. Numerical results demonstrate the superior offline and online performance of the proposed method under high missing rates and incomplete observability.
Some remarks on stochastic converse Lyapunov theorems
In this brief note, we investigate some constructions of Lyapunov functions for stochastic discrete-time stabilizable dynamical systems, in other words, controlled Markov chains. The main question here is whether a Lyapunov function in some statistical sense exists if the respective controlled Markov chain admits a stabilizing policy. We demonstrate some constructions extending on the classical results for deterministic systems. Some limitations of the constructed Lyapunov functions for stabilization are discussed, particularly for stabilization in mean. Although results for deterministic systems are well known, the stochastic case was addressed in less detail, which the current paper remarks on. A distinguishable feature of this work is the study of stabilizers that possess computationally tractable convergence certificates.
comment: 19 pages. Accepted for Elsevier/Automatica
Value of Multi-pursuer Single-evader Pursuit-evasion Game with Terminal Cost of Evader's Position: Relaxation of Convexity Condition
In this study, we consider a multi-pursuer single-evader quantitative pursuit-evasion game with payoff function that includes only the terminal cost. The terminal cost is a function related only to the terminal position of the evader. This problem has been extensively studied in target defense games. Here, we prove that a candidate for the value function generated by geometric method is the viscosity solution of the corresponding Hamilton-Jacobi-Isaacs partial differential equation (HJI PDE) Dirichlet problem. Therefore, the value function of the game at each point can be computed by a mathematical program. In our work, the convexity of the terminal cost or the target is not required. The terminal cost only needs to be locally Lipschitz continuous. The cases in which the terminal costs or the targets are not convex are covered. Therefore, our result is more universal than those of previous studies, and the complexity of the proof is improved. We also discuss the optimal strategies in this game and present an intuitive explanation of this value function.
comment: 21 pages, 6 figures
Agentic AI Systems in Electrical Power Systems Engineering: Current State-of-the-Art and Challenges
Agentic AI systems have recently emerged as a critical and transformative approach in artificial intelligence, offering capabilities that extend far beyond traditional AI agents and contemporary generative AI models. This rapid evolution necessitates a clear conceptual and taxonomical understanding to differentiate this new paradigm. Our paper addresses this gap by providing a comprehensive review that establishes a precise definition and taxonomy for "agentic AI," with the aim of distinguishing it from previous AI paradigms. The concepts are gradually introduced, starting with a highlight of its diverse applications across the broader field of engineering. The paper then presents four detailed, state-of-the-art use case applications specifically within electrical engineering. These case studies demonstrate practical impact, ranging from an advanced agentic framework for streamlining complex power system studies and benchmarking to a novel system developed for survival analysis of dynamic pricing strategies in battery swapping stations. Finally, to ensure robust deployment, the paper provides detailed failure mode investigations. From these findings, we derive actionable recommendations for the design and implementation of safe, reliable, and accountable agentic AI systems, offering a critical resource for researchers and practitioners.
Hybrid A* Path Planning with Multi-Modal Motion Extension for Four-Wheel Steering Mobile Robots
Four-wheel independent steering (4WIS) systems provide mobile robots with a rich set of motion modes, such as Ackermann steering, lateral steering, and parallel movement, offering superior maneuverability in constrained environments. However, existing path planning methods generally assume a single kinematic model and thus fail to fully exploit the multi-modal capabilities of 4WIS platforms. To address this limitation, we propose an extended Hybrid A* framework that operates in a four-dimensional state space incorporating both spatial states and motion modes. Within this framework, we design multi-modal Reeds-Shepp curves tailored to the distinct kinematic constraints of each motion mode, develop an enhanced heuristic function that accounts for mode-switching costs, and introduce a terminal connection strategy with intelligent mode selection to ensure smooth transitions between different steering patterns. The proposed planner enables seamless integration of multiple motion modalities within a single path, significantly improving flexibility and adaptability in complex environments. Results demonstrate significantly improved planning performance for 4WIS robots in complex environments.
comment: Updated method details, parameters, and experimental scenarios
Robotics
What Drives Success in Physical Planning with Joint-Embedding Predictive World Models?
A long-standing challenge in AI is to develop agents capable of solving a wide range of physical tasks and generalizing to new, unseen tasks and environments. A popular recent approach involves training a world model from state-action trajectories and subsequently use it with a planning algorithm to solve new tasks. Planning is commonly performed in the input space, but a recent family of methods has introduced planning algorithms that optimize in the learned representation space of the world model, with the promise that abstracting irrelevant details yields more efficient planning. In this work, we characterize models from this family as JEPA-WMs and investigate the technical choices that make algorithms from this class work. We propose a comprehensive study of several key components with the objective of finding the optimal approach within the family. We conducted experiments using both simulated environments and real-world robotic data, and studied how the model architecture, the training objective, and the planning algorithm affect planning success. We combine our findings to propose a model that outperforms two established baselines, DINO-WM and V-JEPA-2-AC, in both navigation and manipulation tasks. Code, data and checkpoints are available at https://github.com/facebookresearch/jepa-wms.
Energy-Aware Bayesian Control Barrier Functions for Physics-Informed Gaussian Process Dynamics
We study safe control for dynamical systems whose continuous-time dynamics are learned with Gaussian processes (GPs), focusing on mechanical and port-Hamiltonian systems where safety is naturally expressed via energy constraints. The availability of a GP Hamiltonian posterior naturally raises the question of how to systematically exploit this structure to design an energy-aware control barrier function with high-probability safety guarantees. We address this problem by developing a Bayesian-CBF framework and instantiating it with energy-aware Bayesian-CBFs (EB-CBFs) that construct conservative energy-based barriers directly from the Hamiltonian and vector-field posteriors, yielding safety filters that minimally modify a nominal controller while providing probabilistic energy safety guarantees. Numerical simulations on a mass-spring system demonstrate that the proposed EB-CBFs achieve high-probability safety under noisy sampled GP-learned dynamics.
Foundation models on the bridge: Semantic hazard detection and safety maneuvers for maritime autonomy with vision-language models
The draft IMO MASS Code requires autonomous and remotely supervised maritime vessels to detect departures from their operational design domain, enter a predefined fallback that notifies the operator, permit immediate human override, and avoid changing the voyage plan without approval. Meeting these obligations in the alert-to-takeover gap calls for a short-horizon, human-overridable fallback maneuver. Classical maritime autonomy stacks struggle when the correct action depends on meaning (e.g., diver-down flag means people in the water, fire close by means hazard). We argue (i) that vision-language models (VLMs) provide semantic awareness for such out-of-distribution situations, and (ii) that a fast-slow anomaly pipeline with a short-horizon, human-overridable fallback maneuver makes this practical in the handover window. We introduce Semantic Lookout, a camera-only, candidate-constrained vision-language model (VLM) fallback maneuver selector that selects one cautious action (or station-keeping) from water-valid, world-anchored trajectories under continuous human authority. On 40 harbor scenes we measure per-call scene understanding and latency, alignment with human consensus (model majority-of-three voting), short-horizon risk-relief on fire hazard scenes, and an on-water alert->fallback maneuver->operator handover. Sub-10 s models retain most of the awareness of slower state-of-the-art models. The fallback maneuver selector outperforms geometry-only baselines and increases standoff distance on fire scenes. A field run verifies end-to-end operation. These results support VLMs as semantic fallback maneuver selectors compatible with the draft IMO MASS Code, within practical latency budgets, and motivate future work on domain-adapted, hybrid autonomy that pairs foundation-model semantics with multi-sensor bird's-eye-view perception and short-horizon replanning.
comment: 17 pages without bibliography or appendix. The main paper has 16 figures
Subsecond 3D Mesh Generation for Robot Manipulation
3D meshes are a fundamental representation widely used in computer science and engineering. In robotics, they are particularly valuable because they capture objects in a form that aligns directly with how robots interact with the physical world, enabling core capabilities such as predicting stable grasps, detecting collisions, and simulating dynamics. Although automatic 3D mesh generation methods have shown promising progress in recent years, potentially offering a path toward real-time robot perception, two critical challenges remain. First, generating high-fidelity meshes is prohibitively slow for real-time use, often requiring tens of seconds per object. Second, mesh generation by itself is insufficient. In robotics, a mesh must be contextually grounded, i.e., correctly segmented from the scene and registered with the proper scale and pose. Additionally, unless these contextual grounding steps remain efficient, they simply introduce new bottlenecks. In this work, we introduce an end-to-end system that addresses these challenges, producing a high-quality, contextually grounded 3D mesh from a single RGB-D image in under one second. Our pipeline integrates open-vocabulary object segmentation, accelerated diffusion-based mesh generation, and robust point cloud registration, each optimized for both speed and accuracy. We demonstrate its effectiveness in a real-world manipulation task, showing that it enables meshes to be used as a practical, on-demand representation for robotics perception and planning.
comment: In submission
Counterfactual VLA: Self-Reflective Vision-Language-Action Model with Adaptive Reasoning
Recent reasoning-augmented Vision-Language-Action (VLA) models have improved the interpretability of end-to-end autonomous driving by generating intermediate reasoning traces. Yet these models primarily describe what they perceive and intend to do, rarely questioning whether their planned actions are safe or appropriate. This work introduces Counterfactual VLA (CF-VLA), a self-reflective VLA framework that enables the model to reason about and revise its planned actions before execution. CF-VLA first generates time-segmented meta-actions that summarize driving intent, and then performs counterfactual reasoning conditioned on both the meta-actions and the visual context. This step simulates potential outcomes, identifies unsafe behaviors, and outputs corrected meta-actions that guide the final trajectory generation. To efficiently obtain such self-reflective capabilities, we propose a rollout-filter-label pipeline that mines high-value scenes from a base (non-counterfactual) VLA's rollouts and labels counterfactual reasoning traces for subsequent training rounds. Experiments on large-scale driving datasets show that CF-VLA improves trajectory accuracy by up to 17.6%, enhances safety metrics by 20.5%, and exhibits adaptive thinking: it only enables counterfactual reasoning in challenging scenarios. By transforming reasoning traces from one-shot descriptions to causal self-correction signals, CF-VLA takes a step toward self-reflective autonomous driving agents that learn to think before they act.
Fast and Realistic Automated Scenario Simulations and Reporting for an Autonomous Racing Stack
In this paper, we describe the automated simulation and reporting pipeline implemented for our autonomous racing stack, ur.autopilot. The backbone of the simulation is based on a high-fidelity model of the vehicle interfaced as a Functional Mockup Unit (FMU). The pipeline can execute the software stack and the simulation up to three times faster than real-time, locally or on GitHub for Continuous Integration/- Continuous Delivery (CI/CD). As the most important input of the pipeline, there is a set of running scenarios. Each scenario allows the initialization of the ego vehicle in different initial conditions (position and speed), as well as the initialization of any other configuration of the stack. This functionality is essential to validate efficiently critical modules, like the one responsible for high-speed overtaking maneuvers or localization, which are among the most challenging aspects of autonomous racing. Moreover, we describe how we implemented a fault injection module, capable of introducing sensor delays and perturbations as well as modifying outputs of any node of the stack. Finally, we describe the design of our automated reporting process, aimed at maximizing the effectiveness of the simulation analysis.
comment: Accepted to the 2026 IEEE/SICE International Symposium on System Integration (SII 2026)
Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems
The rapid advancement of autonomous systems, including self-driving vehicles and drones, has intensified the need to forge true Spatial Intelligence from multi-modal onboard sensor data. While foundation models excel in single-modal contexts, integrating their capabilities across diverse sensors like cameras and LiDAR to create a unified understanding remains a formidable challenge. This paper presents a comprehensive framework for multi-modal pre-training, identifying the core set of techniques driving progress toward this goal. We dissect the interplay between foundational sensor characteristics and learning strategies, evaluating the role of platform-specific datasets in enabling these advancements. Our central contribution is the formulation of a unified taxonomy for pre-training paradigms: ranging from single-modality baselines to sophisticated unified frameworks that learn holistic representations for advanced tasks like 3D object detection and semantic occupancy prediction. Furthermore, we investigate the integration of textual inputs and occupancy representations to facilitate open-world perception and planning. Finally, we identify critical bottlenecks, such as computational efficiency and model scalability, and propose a roadmap toward general-purpose multi-modal foundation models capable of achieving robust Spatial Intelligence for real-world deployment.
comment: Preprint; 38 pages, 7 figures, 9 tables; GitHub at https://github.com/worldbench/awesome-spatial-intelligence
Geometric Multi-Session Map Merging with Learned Local Descriptors
Multi-session map merging is crucial for extended autonomous operations in large-scale environments. In this paper, we present GMLD, a learning-based local descriptor framework for large-scale multi-session point cloud map merging that systematically aligns maps collected across different sessions with overlapping regions. The proposed framework employs a keypoint-aware encoder and a plane-based geometric transformer to extract discriminative features for loop closure detection and relative pose estimation. To further improve global consistency, we include inter-session scan matching cost factors in the factor-graph optimization stage. We evaluate our framework on the public datasets, as well as self-collected data from diverse environments. The results show accurate and robust map merging with low error, and the learned features deliver strong performance in both loop closure detection and relative pose estimation.
New Insights into Cascaded Geometric Flight Control: From Performance Guarantees to Practical Pitfalls
We present a new stability proof for cascaded geometric control used by aerial vehicles tracking time-varying position trajectories. Our approach uses sliding variables and a recently proposed quaternion-based sliding controller to demonstrate that exponentially convergent position trajectory tracking is theoretically possible. Notably, our analysis reveals new aspects of the control strategy, including how tracking error in the attitude loop influences the position loop, how model uncertainties affect the closed-loop system, and the practical pitfalls of the control architecture.
comment: V1
3D Path-Following Guidance via Nonlinear Model Predictive Control for Fixed-Wing Small UAS
This paper presents the design, implementation, and flight test results of two novel 3D path-following guidance algorithms based on nonlinear model predictive control (MPC), with specific application to fixed-wing small uncrewed aircraft systems. To enable MPC, control-augmented modelling and system identification of the RAAVEN small uncrewed aircraft is presented. Two formulations of MPC are then showcased. The first schedules a static reference path rate over the MPC horizon, incentivizing a constant inertial speed. The second, with inspiration from model predictive contouring control, dynamically optimizes for the reference path rate over the controller horizon as the system operates. This allows for a weighted tradeoff between path progression and distance from path, two competing objectives in path-following guidance. Both controllers are formulated to operate over general smooth 3D arc-length parameterized curves. The MPC guidance algorithms are flown over several high-curvature test paths, with comparison to a baseline lookahead guidance law. The results showcase the real-world feasibility and superior performance of nonlinear MPC for 3D path-following guidance at ground speeds up to 36 meters per second.
UniAct: Unified Motion Generation and Action Streaming for Humanoid Robots
A long-standing objective in humanoid robotics is the realization of versatile agents capable of following diverse multimodal instructions with human-level flexibility. Despite advances in humanoid control, bridging high-level multimodal perception with whole-body execution remains a significant bottleneck. Existing methods often struggle to translate heterogeneous instructions -- such as language, music, and trajectories -- into stable, real-time actions. Here we show that UniAct, a two-stage framework integrating a fine-tuned MLLM with a causal streaming pipeline, enables humanoid robots to execute multimodal instructions with sub-500 ms latency. By unifying inputs through a shared discrete codebook via FSQ, UniAct ensures cross-modal alignment while constraining motions to a physically grounded manifold. This approach yields a 19% improvement in the success rate of zero-shot tracking of imperfect reference motions. We validate UniAct on UniMoCap, our 20-hour humanoid motion benchmark, demonstrating robust generalization across diverse real-world scenarios. Our results mark a critical step toward responsive, general-purpose humanoid assistants capable of seamless interaction through unified perception and control.
comment: Project page: https://jnnan.github.io/uniact/
World In Your Hands: A Large-Scale and Open-source Ecosystem for Learning Human-centric Manipulation in the Wild
Large-scale pre-training is fundamental for generalization in language and vision models, but data for dexterous hand manipulation remains limited in scale and diversity, hindering policy generalization. Limited scenario diversity, misaligned modalities, and insufficient benchmarking constrain current human manipulation datasets. To address these gaps, we introduce World In Your Hands (WiYH), a large-scale open-source ecosystem for human-centric manipulation learning. WiYH includes (1) the Oracle Suite, a wearable data collection kit with an auto-labeling pipeline for accurate motion capture; (2) the WiYH Dataset, featuring over 1,000 hours of multi-modal manipulation data across hundreds of skills in diverse real-world scenarios; and (3) extensive annotations and benchmarks supporting tasks from perception to action. Furthermore, experiments based on the WiYH ecosystem show that integrating WiYH's human-centric data significantly enhances the generalization and robustness of dexterous hand policies in tabletop manipulation tasks. We believe that World In Your Hands will bring new insights into human-centric data collection and policy learning to the community.
Real-world Reinforcement Learning from Suboptimal Interventions
Real-world reinforcement learning (RL) offers a promising approach to training precise and dexterous robotic manipulation policies in an online manner, enabling robots to learn from their own experience while gradually reducing human labor. However, prior real-world RL methods often assume that human interventions are optimal across the entire state space, overlooking the fact that even expert operators cannot consistently provide optimal actions in all states or completely avoid mistakes. Indiscriminately mixing intervention data with robot-collected data inherits the sample inefficiency of RL, while purely imitating intervention data can ultimately degrade the final performance achievable by RL. The question of how to leverage potentially suboptimal and noisy human interventions to accelerate learning without being constrained by them thus remains open. To address this challenge, we propose SiLRI, a state-wise Lagrangian reinforcement learning algorithm for real-world robot manipulation tasks. Specifically, we formulate the online manipulation problem as a constrained RL optimization, where the constraint bound at each state is determined by the uncertainty of human interventions. We then introduce a state-wise Lagrange multiplier and solve the problem via a min-max optimization, jointly optimizing the policy and the Lagrange multiplier to reach a saddle point. Built upon a human-as-copilot teleoperation system, our algorithm is evaluated through real-world experiments on diverse manipulation tasks. Experimental results show that SiLRI effectively exploits human suboptimal interventions, reducing the time required to reach a 90% success rate by at least 50% compared with the state-of-the-art RL method HIL-SERL, and achieving a 100% success rate on long-horizon manipulation tasks where other RL methods struggle to succeed. Project website: https://silri-rl.github.io/.
DRL-TH: Jointly Utilizing Temporal Graph Attention and Hierarchical Fusion for UGV Navigation in Crowded Environments
Deep reinforcement learning (DRL) methods have demonstrated potential for autonomous navigation and obstacle avoidance of unmanned ground vehicles (UGVs) in crowded environments. Most existing approaches rely on single-frame observation and employ simple concatenation for multi-modal fusion, which limits their ability to capture temporal context and hinders dynamic adaptability. To address these challenges, we propose a DRL-based navigation framework, DRL-TH, which leverages temporal graph attention and hierarchical graph pooling to integrate historical observations and adaptively fuse multi-modal information. Specifically, we introduce a temporal-guided graph attention network (TG-GAT) that incorporates temporal weights into attention scores to capture correlations between consecutive frames, thereby enabling the implicit estimation of scene evolution. In addition, we design a graph hierarchical abstraction module (GHAM) that applies hierarchical pooling and learnable weighted fusion to dynamically integrate RGB and LiDAR features, achieving balanced representation across multiple scales. Extensive experiments demonstrate that our DRL-TH outperforms existing methods in various crowded environments. We also implemented DRL-TH control policy on a real UGV and showed that it performed well in real world scenarios.
Safe Sliding Mode Control for Marine Vessels Using High-Order Control Barrier Functions and Fast Projection
This paper presents a novel safe control framework that integrates Sliding Mode Control (SMC), High-Order Control Barrier Functions (HOCBFs) with state-dependent adaptiveness and a lightweight projection for collision-free navigation of an over-actuated 3-DOF marine surface vessel subjected to strong environmental disturbances (wind, waves, and current). SMC provides robustness to matched disturbances common in marine operations, while HOCBFs enforce forward invariance of obstacle-avoidance constraints. A fast half-space projection method adjusts the SMC control only when needed, preserving robustness and minimizing chattering. The approach is evaluated on a nonlinear marine platform model that includes added mass, hydrodynamic damping, and full thruster allocation. Simulation results show robust navigation, guaranteed obstacle avoidance, and computational efficiency suitable for real-time embedded use. For small marine robots and surface vessels with limited onboard computational resources-where execution speed and computational efficiency are critical-the SMC-HOCBF framework constitutes a strong candidate for safety-critical control.
Local Path Optimization in The Latent Space Using Learned Distance Gradient IROS 2025
Constrained motion planning is a common but challenging problem in robotic manipulation. In recent years, data-driven constrained motion planning algorithms have shown impressive planning speed and success rate. Among them, the latent motion method based on manifold approximation is the most efficient planning algorithm. Due to errors in manifold approximation and the difficulty in accurately identifying collision conflicts within the latent space, time-consuming path validity checks and path replanning are required. In this paper, we propose a method that trains a neural network to predict the minimum distance between the robot and obstacles using latent vectors as inputs. The learned distance gradient is then used to calculate the direction of movement in the latent space to move the robot away from obstacles. Based on this, a local path optimization algorithm in the latent space is proposed, and it is integrated with the path validity checking process to reduce the time of replanning. The proposed method is compared with state-of-the-art algorithms in multiple planning scenarios, demonstrating the fastest planning speed
comment: This paper has been published in IROS 2025
Heteroscedastic Bayesian Optimization-Based Dynamic PID Tuning for Accurate and Robust UAV Trajectory Tracking IROS 2025
Unmanned Aerial Vehicles (UAVs) play an important role in various applications, where precise trajectory tracking is crucial. However, conventional control algorithms for trajectory tracking often exhibit limited performance due to the underactuated, nonlinear, and highly coupled dynamics of quadrotor systems. To address these challenges, we propose HBO-PID, a novel control algorithm that integrates the Heteroscedastic Bayesian Optimization (HBO) framework with the classical PID controller to achieve accurate and robust trajectory tracking. By explicitly modeling input-dependent noise variance, the proposed method can better adapt to dynamic and complex environments, and therefore improve the accuracy and robustness of trajectory tracking. To accelerate the convergence of optimization, we adopt a two-stage optimization strategy that allow us to more efficiently find the optimal controller parameters. Through experiments in both simulation and real-world scenarios, we demonstrate that the proposed method significantly outperforms state-of-the-art (SOTA) methods. Compared to SOTA methods, it improves the position accuracy by 24.7% to 42.9%, and the angular accuracy by 40.9% to 78.4%.
comment: Accepted by IROS 2025 (2025 IEEE/RSJ International Conference on Intelligent Robots and Systems)
RANGER: A Monocular Zero-Shot Semantic Navigation Framework through Contextual Adaptation
Efficiently finding targets in complex environments is fundamental to real-world embodied applications. While recent advances in multimodal foundation models have enabled zero-shot object goal navigation, allowing robots to search for arbitrary objects without fine-tuning, existing methods face two key limitations: (1) heavy reliance on precise depth and pose information provided by simulators, which restricts applicability in real-world scenarios; and (2) lack of in-context learning (ICL) capability, making it difficult to quickly adapt to new environments, as in leveraging short videos. To address these challenges, we propose RANGER, a novel zero-shot, open-vocabulary semantic navigation framework that operates using only a monocular camera. Leveraging powerful 3D foundation models, RANGER eliminates the dependency on depth and pose while exhibiting strong ICL capability. By simply observing a short video of a new environment, the system can also significantly improve task efficiency without requiring architectural modifications or fine-tuning. The framework integrates several key components: keyframe-based 3D reconstruction, semantic point cloud generation, vision-language model (VLM)-driven exploration value estimation, high-level adaptive waypoint selection, and low-level action execution. Experiments on the HM3D benchmark and real-world environments demonstrate that RANGER achieves competitive performance in terms of navigation success rate and exploration efficiency, while showing superior ICL adaptability, with no previous 3D mapping of the environment required.
GR-Dexter Technical Report
Vision-language-action (VLA) models have enabled language-conditioned, long-horizon robot manipulation, but most existing systems are limited to grippers. Scaling VLA policies to bimanual robots with high degree-of-freedom (DoF) dexterous hands remains challenging due to the expanded action space, frequent hand-object occlusions, and the cost of collecting real-robot data. We present GR-Dexter, a holistic hardware-model-data framework for VLA-based generalist manipulation on a bimanual dexterous-hand robot. Our approach combines the design of a compact 21-DoF robotic hand, an intuitive bimanual teleoperation system for real-robot data collection, and a training recipe that leverages teleoperated robot trajectories together with large-scale vision-language and carefully curated cross-embodiment datasets. Across real-world evaluations spanning long-horizon everyday manipulation and generalizable pick-and-place, GR-Dexter achieves strong in-domain performance and improved robustness to unseen objects and unseen instructions. We hope GR-Dexter serves as a practical step toward generalist dexterous-hand robotic manipulation.
ROBOPOL: Social Robotics Meets Vehicular Communications for Cooperative Automated Driving
On the way towards full autonomy, sharing roads between automated vehicles and human actors in so-called mixed traffic is unavoidable. Moreover, even if all vehicles on the road were autonomous, pedestrians would still be crossing the streets. We propose social robots as moderators between autonomous vehicles and vulnerable road users (VRU). To this end, we identify four enablers requiring integration: (1) advanced perception, allowing the robot to see the environment; (2) vehicular communications allowing connected vehicles to share intentions and the robot to send guiding commands; (3) social human-robot interaction allowing the robot to effectively communicate with VRUs and drivers; (4) formal specification allowing the robot to understand traffic and plan accordingly. This paper presents an overview of the key enablers and report on a first proof-of-concept integration of the first three enablers envisioning a social robot advising pedestrians in scenarios with a cooperative automated e-bike.
Unified Embodied VLM Reasoning with Robotic Action via Autoregressive Discretized Pre-training
General-purpose robotic systems operating in open-world environments must achieve both broad generalization and high-precision action execution, a combination that remains challenging for existing Vision-Language-Action (VLA) models. While large Vision-Language Models (VLMs) improve semantic generalization, insufficient embodied reasoning leads to brittle behavior, and conversely, strong reasoning alone is inadequate without precise control. To provide a decoupled and quantitative assessment of this bottleneck, we introduce Embodied Reasoning Intelligence Quotient (ERIQ), a large-scale embodied reasoning benchmark in robotic manipulation, comprising 6K+ question-answer pairs across four reasoning dimensions. By decoupling reasoning from execution, ERIQ enables systematic evaluation and reveals a strong positive correlation between embodied reasoning capability and end-to-end VLA generalization. To bridge the gap from reasoning to precise execution, we propose FACT, a flow-matching-based action tokenizer that converts continuous control into discrete sequences while preserving high-fidelity trajectory reconstruction. The resulting GenieReasoner jointly optimizes reasoning and action in a unified space, outperforming both continuous-action and prior discrete-action baselines in real-world tasks. Together, ERIQ and FACT provide a principled framework for diagnosing and overcoming the reasoning-precision trade-off, advancing robust, general-purpose robotic manipulation.
RflyUT-Sim: A Simulation Platform for Development and Testing of Complex Low-Altitude Traffic Control
Significant challenges are posed by simulation and testing in the field of low-altitude unmanned aerial vehicle (UAV) traffic due to the high costs associated with large-scale UAV testing and the complexity of establishing low-altitude traffic test scenarios. Stringent safety requirements make high fidelity one of the key metrics for simulation platforms. Despite advancements in simulation platforms for low-altitude UAVs, there is still a shortage of platforms that feature rich traffic scenarios, high-precision UAV and scenario simulators, and comprehensive testing capabilities for low-altitude traffic. Therefore, this paper introduces an integrated high-fidelity simulation platform for low-altitude UAV traffic. This platform simulates all components of the UAV traffic network, including the control system, the traffic management system, the UAV system, the communication network , the anomaly and fault modules, etc. Furthermore, it integrates RflySim/AirSim and Unreal Engine 5 to develop full-state models of UAVs and 3D maps that model the real world using the oblique photogrammetry technique. Additionally, the platform offers a wide range of interfaces, and all models and scenarios can be customized with a high degree of flexibility. The platform's source code has been released, making it easier to conduct research related to low-altitude traffic.
Guided Diffusion-based Generation of Adversarial Objects for Real-World Monocular Depth Estimation Attacks
Monocular Depth Estimation (MDE) serves as a core perception module in autonomous driving systems, but it remains highly susceptible to adversarial attacks. Errors in depth estimation may propagate through downstream decision making and influence overall traffic safety. Existing physical attacks primarily rely on texture-based patches, which impose strict placement constraints and exhibit limited realism, thereby reducing their effectiveness in complex driving environments. To overcome these limitations, this work introduces a training-free generative adversarial attack framework that generates naturalistic, scene-consistent adversarial objects via a diffusion-based conditional generation process. The framework incorporates a Salient Region Selection module that identifies regions most influential to MDE and a Jacobian Vector Product Guidance mechanism that steers adversarial gradients toward update directions supported by the pre-trained diffusion model. This formulation enables the generation of physically plausible adversarial objects capable of inducing substantial adversarial depth shifts. Extensive digital and physical experiments demonstrate that our method significantly outperforms existing attacks in effectiveness, stealthiness, and physical deployability, underscoring its strong practical implications for autonomous driving safety assessment.
Evaluation of Impression Difference of a Domestic Mobile Manipulator with Autonomous and/or Remote Control in Fetch-and-Carry Tasks
A single service robot can present two distinct agencies: its onboard autonomy and an operator-mediated agency, yet users experience them through one physical body. We formalize this dual-agency structure as a User-Robot-Operator triad in an autonomous remote-control setting that combines autonomous execution with remote human support. Prior to the recent surge of language-based and multimodal interfaces, we developed and evaluated an early-stage prototype in 2020 that combined natural-language text chat with freehand sketch annotations over the robot's live camera view to support remote intervention. We evaluated three modes - autonomous, remote, and hybrid - in controlled fetch-and-carry tasks using a domestic mobile manipulator (HSR) on a World Robot Summit 2020 rule-compliant test field. The results show systematic mode-dependent differences in user-rated affinity and additional insights on perceived security, indicating that switching or blending agency within one robot measurably shapes human impressions. These findings provide empirical guidance for designing human-in-the-loop mobile manipulation in domestic physical tasks.
comment: Published in Advanced Robotics (2020). This paper defines an autonomous remote-control setting that makes the User-Robot-Operator triadic interaction explicit in physical tasks, and reports empirical differences in affinity and perceived security across autonomous, remote, and hybrid modes. Please cite: Advanced Robotics 34(20):1291-1308, 2020. DOI:10.1080/01691864.2020.1780152
SHIELD: Spherical-Projection Hybrid-Frontier Integration for Efficient LiDAR-based Drone Exploration
This paper introduces SHIELD, a Spherical-Projection Hybrid-Frontier Integration for Efficient LiDAR-based Drone exploration method. Although laser LiDAR offers the advantage of a wide field of view, its application in UAV exploration still faces several challenges. The observation quality of LiDAR point clouds is generally inferior to that of depth cameras. Traditional frontier methods based on known and unknown regions impose a heavy computational burden, especially when handling the wide field of view of LiDAR. In addition, regions without point cloud are also difficult to classify as free space through raycasting. To address these problems, the SHIELD is proposed. It maintains an observation-quality occupancy map and performs ray-casting on this map to address the issue of inconsistent point-cloud quality during exploration. A hybrid frontier method is used to tackle both the computational burden and the limitations of point-cloud quality exploration. In addition, an outward spherical-projection ray-casting strategy is proposed to jointly ensure flight safety and exploration efficiency in open areas. Simulations and flight experiments prove the effectiveness of SHIELD. This work will be open-sourced to contribute to the research community.
Cross-embodied Co-design for Dexterous Hands
Dexterous manipulation is limited by both control and design, without consensus as to what makes manipulators best for performing dexterous tasks. This raises a fundamental challenge: how should we design and control robot manipulators that are optimized for dexterity? We present a co-design framework that learns task-specific hand morphology and complementary dexterous control policies. The framework supports 1) an expansive morphology search space including joint, finger, and palm generation, 2) scalable evaluation across the wide design space via morphology-conditioned cross-embodied control, and 3) real-world fabrication with accessible components. We evaluate the approach across multiple dexterous tasks, including in-hand rotation with simulation and real deployment. Our framework enables an end-to-end pipeline that can design, train, fabricate, and deploy a new robotic hand in under 24 hours. The full framework will be open-sourced and available on our website: https://an-axolotl.github.io/co-design-for-dexterity.github.io/
A New Software Tool for Generating and Visualizing Robot Self-Collision Matrices
In robotics, it is common to check whether a given robot state results in self-intersection (i.e., a self-collision query) or to assess its distance from such an intersection (i.e., a self-proximity query). These checks are typically performed between pairs of shapes attached to different robot links. However, many of these shape pairs can be excluded in advance, as their configurations are known to always or never result in contact. This information is typically encoded in a self-collision matrix, where each entry (i, j) indicates whether a check should be performed between shape i and shape j. While the MoveIt Setup Assistant is widely used to generate such matrices, current tools are limited by static visualization, lack of proximity support, rigid single-geometry assumptions, and tedious refinement workflows, hindering flexibility and reuse in downstream robotics applications. In this work, we introduce an interactive tool that overcomes these limitations by generating and visualizing self-collision matrices across multiple shape representations, enabling dynamic inspection, filtering, and refinement of shape pairs. Outputs are provided in both JSON and YAML for easy integration. The system is implemented in Rust and uses the Bevy game engine to deliver high-quality visualizations. We demonstrate its effectiveness on multiple robot platforms, showing that matrices generated using diverse shape types yield faster and more accurate self-collision and self-proximity queries.
Beyond URDF: The Universal Robot Description Directory for Shared, Extensible, and Standardized Robot Models
Robots are typically described in software by specification files (e.g., URDF, SDF, MJCF, USD) that encode only basic kinematic, dynamic, and geometric information. As a result, downstream applications such as simulation, planning, and control must repeatedly re-derive richer data, leading to redundant computations, fragmented implementations, and limited standardization. In this work, we introduce the Universal Robot Description Directory (URDD), a modular representation that organizes derived robot information into structured, easy-to-parse JSON and YAML modules. Our open-source toolkit automatically generates URDDs from URDFs, with a Rust implementation supporting Bevy-based visualization. Additionally, we provide a JavaScript/Three.js viewer for web-based inspection of URDDs. Experiments on multiple robot platforms show that URDDs can be generated efficiently, encapsulate substantially richer information than standard specification files, and directly enable the construction of core robotics subroutines. URDD provides a unified, extensible resource for reducing redundancy and establishing shared standards across robotics frameworks. We conclude with a discussion on the limitations and implications of our work.
APOLLO Blender: A Robotics Library for Visualization and Animation in Blender
High-quality visualizations are an essential part of robotics research, enabling clear communication of results through figures, animations, and demonstration videos. While Blender is a powerful and freely available 3D graphics platform, its steep learning curve and lack of robotics-focused integrations make it difficult and time-consuming for researchers to use effectively. In this work, we introduce a lightweight software library that bridges this gap by providing simple scripting interfaces for common robotics visualization tasks. The library offers three primary capabilities: (1) importing robots and environments directly from standardized descriptions such as URDF; (2) Python-based scripting tools for keyframing robot states and visual attributes; and (3) convenient generation of primitive 3D shapes for schematic figures and animations. Together, these features allow robotics researchers to rapidly create publication-ready images, animations, and explanatory schematics without needing extensive Blender expertise. We demonstrate the library through a series of proof-of-concept examples and conclude with a discussion of current limitations and opportunities for future extensions.
Flowing from Reasoning to Motion: Learning 3D Hand Trajectory Prediction from Egocentric Human Interaction Videos
Prior works on 3D hand trajectory prediction are constrained by datasets that decouple motion from semantic supervision and by models that weakly link reasoning and action. To address these, we first present the EgoMAN dataset, a large-scale egocentric dataset for interaction stage-aware 3D hand trajectory prediction with 219K 6DoF trajectories and 3M structured QA pairs for semantic, spatial, and motion reasoning. We then introduce the EgoMAN model, a reasoning-to-motion framework that links vision-language reasoning and motion generation via a trajectory-token interface. Trained progressively to align reasoning with motion dynamics, our approach yields accurate and stage-aware trajectories with generalization across real-world scenes.
comment: Project website: https://egoman-project.github.io
OSVI-WM: One-Shot Visual Imitation for Unseen Tasks using World-Model-Guided Trajectory Generation
Visual imitation learning enables robotic agents to acquire skills by observing expert demonstration videos. In the one-shot setting, the agent generates a policy after observing a single expert demonstration without additional fine-tuning. Existing approaches typically train and evaluate on the same set of tasks, varying only object configurations, and struggle to generalize to unseen tasks with different semantic or structural requirements. While some recent methods attempt to address this, they exhibit low success rates on hard test tasks that, despite being visually similar to some training tasks, differ in context and require distinct responses. Additionally, most existing methods lack an explicit model of environment dynamics, limiting their ability to reason about future states. To address these limitations, we propose a novel framework for one-shot visual imitation learning via world-model-guided trajectory generation. Given an expert demonstration video and the agent's initial observation, our method leverages a learned world model to predict a sequence of latent states and actions. This latent trajectory is then decoded into physical waypoints that guide the agent's execution. Our method is evaluated on two simulated benchmarks and three real-world robotic platforms, where it consistently outperforms prior approaches, with over 30% improvement in some cases. The code is available at https://github.com/raktimgg/osvi-wm.
SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling
Data scarcity remains a fundamental barrier to achieving fully autonomous surgical robots. While large scale vision language action (VLA) models have shown impressive generalization in household and industrial manipulation by leveraging paired video action data from diverse domains, surgical robotics suffers from the paucity of datasets that include both visual observations and accurate robot kinematics. In contrast, vast corpora of surgical videos exist, but they lack corresponding action labels, preventing direct application of imitation learning or VLA training. In this work, we aim to alleviate this problem by learning policy models from SurgWorld, a world model designed for surgical physical AI. We curated the Surgical Action Text Alignment (SATA) dataset with detailed action description specifically for surgical robots. Then we built SurgeWorld based on the most advanced physical AI world model and SATA. It's able to generate diverse, generalizable and realistic surgery videos. We are also the first to use an inverse dynamics model to infer pseudokinematics from synthetic surgical videos, producing synthetic paired video action data. We demonstrate that a surgical VLA policy trained with these augmented data significantly outperforms models trained only on real demonstrations on a real surgical robot platform. Our approach offers a scalable path toward autonomous surgical skill acquisition by leveraging the abundance of unlabeled surgical video and generative world modeling, thus opening the door to generalizable and data efficient surgical robot policies.
Reproducibility in the Control of Autonomous Mobility-on-Demand Systems
Autonomous Mobility-on-Demand (AMoD) systems, powered by advances in robotics, control, and Machine Learning (ML), offer a promising paradigm for future urban transportation. AMoD offers fast and personalized travel services by leveraging centralized control of autonomous vehicle fleets to optimize operations and enhance service performance. However, the rapid growth of this field has outpaced the development of standardized practices for evaluating and reporting results, leading to significant challenges in reproducibility. As AMoD control algorithms become increasingly complex and data-driven, a lack of transparency in modeling assumptions, experimental setups, and algorithmic implementation hinders scientific progress and undermines confidence in the results. This paper presents a systematic study of reproducibility in AMoD research. We identify key components across the research pipeline, spanning system modeling, control problems, simulation design, algorithm specification, and evaluation, and analyze common sources of irreproducibility. We survey prevalent practices in the literature, highlight gaps, and propose a structured framework to assess and improve reproducibility. Specifically, concrete guidelines are offered, along with a "reproducibility checklist", to support future work in achieving replicable, comparable, and extensible results. While focused on AMoD, the principles and practices we advocate generalize to a broader class of cyber-physical systems that rely on networked autonomy and data-driven control. This work aims to lay the foundation for a more transparent and reproducible research culture in the design and deployment of intelligent mobility systems.
Contingency Model-based Control (CMC) for Communicationless Cooperative Collision Avoidance in Robot Swarms
Cooperative collision avoidance between robots, or `agents,' in swarm operations remains an open challenge. Assuming a decentralized architecture, each agent is responsible for making its own decisions and choosing its control actions. Most existing approaches rely on a (wireless) communication network between (some of) the agents. In reality, however, communication is brittle. It may be affected by latency, further delays and packet losses, and transmission faults. Moreover, it is subject to adversarial attacks, such as jamming or spoofing. This paper proposes Contingency Model-based Control (CMC), a decentralized cooperative approach that does not rely on communication. Instead, the control algorithm is based on consensual rules that are designed for all agents offline, similar to traffic rules. For CMC, this includes the definition of a contingency trajectory for each robot, and perpendicular bisecting planes as collision avoidance constraints. The setup permits a full guarantee of recursive feasibility and collision avoidance between all swarm members in closed-loop operation. CMC naturally satisfies the plug & play paradigm, i.e., new robots may enter the swarm dynamically. The effectiveness of the CMC regime is demonstrated in two numerical examples, showing that the collision avoidance guarantee is intact and the robot swarm operates smoothly in a constrained environment.
Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
Multimodal models have achieved remarkable progress in recent years. Nevertheless, they continue to exhibit notable limitations in spatial understanding and reasoning, the very capability that anchors artificial general intelligence in the physical world. With the recent release of GPT-5, allegedly the most powerful AI model to date, it is timely to examine where the leading models (GPT, Gemini, Grok, Seed, Qwen, and Intern) stand on the path toward spatial intelligence (SI). We thus propose EASI for holistic Evaluation of multimodAl LLMs on Spatial Intelligence. EASI conceptualizes a comprehensive taxonomy of spatial tasks that unifies existing benchmarks and a growing collection of newly curated ones, enabling systematic evaluation of state-of-the-art models. In this report, we conduct the study across eight key benchmarks, at a cost exceeding ten billion total tokens. Our empirical study then reveals that (1) GPT-5 demonstrates unprecedented strength in SI, yet (2) still falls short of human performance significantly across a broad spectrum of SI-tasks. Moreover, we (3) show that SI-tasks expose greater model capability deficiency than non-SI tasks, to the extent that (4) proprietary models do not exhibit a decisive advantage when facing the most difficult ones. In addition, we conduct a qualitative evaluation across a diverse set of scenarios that are intuitive for humans, yet fail the most advanced multimodal models. EASI is an ongoing community effort: we have open-sourced the EASI codebase that provides a one-stop and reproducible solution with standardized interfaces, integrated protocols and prompts that significantly reduce the friction of configuring and running multiple benchmarks; we have also launched an accompanying EASI leaderboard to provide a continually updated snapshot of model performance across the full SI spectrum, accelerating collective progress toward robust SI.
comment: Codebase: https://github.com/EvolvingLMMs-Lab/EASI/ ; Leaderboard: https://huggingface.co/spaces/lmms-lab-si/EASI-Leaderboard
Scaling Spatial Intelligence with Multimodal Foundation Models
Despite remarkable progress, multimodal foundation models still exhibit surprising deficiencies in spatial intelligence. In this work, we explore scaling up multimodal foundation models to cultivate spatial intelligence within the SenseNova-SI family, built upon established multimodal foundations including visual understanding models (i.e., Qwen3-VL and InternVL3) and unified understanding and generation models (i.e., Bagel). We take a principled approach to constructing high-performing and robust spatial intelligence by systematically curating SenseNova-SI-8M: eight million diverse data samples under a rigorous taxonomy of spatial capabilities. SenseNova-SI demonstrates unprecedented performance across a broad range of spatial intelligence benchmarks: 68.7% on VSI-Bench, 43.3% on MMSI, 85.6% on MindCube, 54.6% on ViewSpatial, and 50.1% on SITE, while maintaining strong general multimodal understanding (e.g., 84.9% on MMBench-En). More importantly, we analyze the impact of data scaling, discuss early signs of emergent generalization capabilities enabled by diverse data training, analyze the risk of overfitting and language shortcuts, present a preliminary study on spatial chain-of-thought reasoning, and validate the potential downstream application. SenseNova-SI is an ongoing project, and this report will be updated continuously. All newly trained multimodal foundation models are publicly released to facilitate further research in this direction.
comment: Codebase: https://github.com/OpenSenseNova/SenseNova-SI ; Models: https://huggingface.co/collections/sensenova/sensenova-si
How Robot Dogs See the Unseeable: Improving Visual Interpretability via Peering for Exploratory Robots
In vegetated environments, such as forests, exploratory robots play a vital role in navigating complex, cluttered environments where human access is limited and traditional equipment struggles. Visual occlusion from obstacles, such as foliage, can severely obstruct a robot's sensors, impairing scene understanding. We show that "peering", a characteristic side-to-side movement used by insects to overcome their visual limitations, can also allow robots to markedly improve visual reasoning under partial occlusion. This is accomplished by applying core signal processing principles, specifically optical synthetic aperture sensing, together with the vision reasoning capabilities of modern large multimodal models. Peering enables real-time, high-resolution, and wavelength-independent perception, which is crucial for vision-based scene understanding across a wide range of applications. The approach is low-cost and immediately deployable on any camera-equipped robot. We investigated different peering motions and occlusion masking strategies, demonstrating that, unlike peering, state-of-the-art multi-view 3D vision techniques fail in these conditions due to their high susceptibility to occlusion. Our experiments were carried out on an industrial-grade quadrupedal robot. However, the ability to peer is not limited to such platforms, but potentially also applicable to bipedal, hexapod, wheeled, or crawling platforms. Robots that can effectively see through partial occlusion will gain superior perception abilities - including enhanced scene understanding, situational awareness, camouflage breaking, and advanced navigation in complex environments.
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
Vision-Language-Action (VLA) models aim to predict robotic actions based on visual observations and language instructions. Existing approaches require fine-tuning pre-trained visionlanguage models (VLMs) as visual and language features are independently fed into downstream policies, degrading the pre-trained semantic alignments. We propose OTTER, a novel VLA architecture that leverages these existing alignments through explicit, text-aware visual feature extraction. Instead of processing all visual features, OTTER selectively extracts and passes only task-relevant visual features that are semantically aligned with the language instruction to the policy transformer. This allows OTTER to keep the pre-trained vision-language encoders frozen. Thereby, OTTER preserves and utilizes the rich semantic understanding learned from large-scale pre-training, enabling strong zero-shot generalization capabilities. In simulation and real-world experiments, OTTER significantly outperforms existing VLA models, demonstrating strong zeroshot generalization to novel objects and environments. Video, code, checkpoints, and dataset: https://ottervla.github.io/.
Analysis of Errors in Robotic Surgical Skill Acquisition with Video-Based Detection
Robot-assisted minimally invasive surgeries offer many advantages but require complex motor tasks that take surgeons years to master. There is currently a lack of knowledge on how surgeons acquire these robotic surgical skills. Toward bridging this gap, a previous study followed surgical residents learning complex surgical dry lab tasks on a surgical robot over six months. Errors are an important measure for training and skill evaluation, but unlike in virtual simulations, in dry lab training, errors are difficult to monitor automatically. Here, we analyzed errors in the ring tower transfer task, in which surgical residents moved a ring along a curved wire as quickly and accurately as possible. We developed an image-processing algorithm using color and size thresholds, optical flow and short time Fourier transforms to detect collision errors and achieved a detection accuracy of approximately 95%. Using the detected errors and task completion time, we found that the residents reduced their completion time and number of errors over the six months, while the percentage of task time spent making errors remained relatively constant on average. This analysis sheds light on the learning process of the residents and can serve as a step towards providing error-related feedback to robotic surgeons.
comment: 9 pages, 4 figures, 1 table. Alex Geftler and Ilana Nisky contributed equally to this work
CAML: Collaborative Auxiliary Modality Learning for Multi-Agent Systems
Multi-modal learning has emerged as a key technique for improving performance across domains such as autonomous driving, robotics, and reasoning. However, in certain scenarios, particularly in resource-constrained environments, some modalities available during training may be absent during inference. While existing frameworks effectively utilize multiple data sources during training and enable inference with reduced modalities, they are primarily designed for single-agent settings. This poses a critical limitation in dynamic environments such as connected autonomous vehicles (CAV), where incomplete data coverage can lead to decision-making blind spots. Conversely, some works explore multi-agent collaboration but without addressing missing modality at test time. To overcome these limitations, we propose Collaborative Auxiliary Modality Learning (CAML), a novel multi-modal multi-agent framework that enables agents to collaborate and share multi-modal data during training, while allowing inference with reduced modalities during testing. Experimental results in collaborative decision-making for CAV in accident-prone scenarios demonstrate that CAML achieves up to a 58.1% improvement in accident detection. Additionally, we validate CAML on real-world aerial-ground robot data for collaborative semantic segmentation, achieving up to a 10.6% improvement in mIoU.
RoboMirror: Understand Before You Imitate for Video to Humanoid Locomotion
Humans learn locomotion through visual observation, interpreting visual content first before imitating actions. However, state-of-the-art humanoid locomotion systems rely on either curated motion capture trajectories or sparse text commands, leaving a critical gap between visual understanding and control. Text-to-motion methods suffer from semantic sparsity and staged pipeline errors, while video-based approaches only perform mechanical pose mimicry without genuine visual understanding. We propose RoboMirror, the first retargeting-free video-to-locomotion framework embodying "understand before you imitate". Leveraging VLMs, it distills raw egocentric/third-person videos into visual motion intents, which directly condition a diffusion-based policy to generate physically plausible, semantically aligned locomotion without explicit pose reconstruction or retargeting. Extensive experiments validate the effectiveness of RoboMirror, it enables telepresence via egocentric videos, drastically reduces third-person control latency by 80%, and achieves a 3.7% higher task success rate than baselines. By reframing humanoid control around video understanding, we bridge the visual understanding and action gap.
PolaRiS: Scalable Real-to-Sim Evaluations for Generalist Robot Policies
A significant challenge for robot learning research is our ability to accurately measure and compare the performance of robot policies. Benchmarking in robotics is historically challenging due to the stochasticity, reproducibility, and time-consuming nature of real-world rollouts. This challenge is exacerbated for recent generalist policies, which has to be evaluated across a wide variety of scenes and tasks. Evaluation in simulation offers a scalable complement to real world evaluations, but the visual and physical domain gap between existing simulation benchmarks and the real world has made them an unreliable signal for policy improvement. Furthermore, building realistic and diverse simulated environments has traditionally required significant human effort and expertise. To bridge the gap, we introduce Policy Evaluation and Environment Reconstruction in Simulation (PolaRiS), a scalable real-to-sim framework for high-fidelity simulated robot evaluation. PolaRiS utilizes neural reconstruction methods to turn short video scans of real-world scenes into interactive simulation environments. Additionally, we develop a simple simulation data co-training recipe that bridges remaining real-to-sim gaps and enables zero-shot evaluation in unseen simulation environments. Through extensive paired evaluations between simulation and the real world, we demonstrate that PolaRiS evaluations provide a much stronger correlation to real world generalist policy performance than existing simulated benchmarks. Its simplicity also enables rapid creation of diverse simulated environments. As such, this work takes a step towards distributed and democratized evaluation for the next generation of robotic foundation models.
comment: Website: https://polaris-evals.github.io/
Multiagent Systems
MaRCA: Multi-Agent Reinforcement Learning for Dynamic Computation Allocation in Large-Scale Recommender Systems
Modern recommender systems face significant computational challenges due to growing model complexity and traffic scale, making efficient computation allocation critical for maximizing business revenue. Existing approaches typically simplify multi-stage computation resource allocation, neglecting inter-stage dependencies, thus limiting global optimality. In this paper, we propose MaRCA, a multi-agent reinforcement learning framework for end-to-end computation resource allocation in large-scale recommender systems. MaRCA models the stages of a recommender system as cooperative agents, using Centralized Training with Decentralized Execution (CTDE) to optimize revenue under computation resource constraints. We introduce an AutoBucket TestBench for accurate computation cost estimation, and a Model Predictive Control (MPC)-based Revenue-Cost Balancer to proactively forecast traffic loads and adjust the revenue-cost trade-off accordingly. Since its end-to-end deployment in the advertising pipeline of a leading global e-commerce platform in November 2024, MaRCA has consistently handled hundreds of billions of ad requests per day and has delivered a 16.67% revenue uplift using existing computation resources.
comment: 12 pages, 5 figures
SCP: Accelerating Discovery with a Global Web of Autonomous Scientific Agents
We introduce SCP: the Science Context Protocol, an open-source standard designed to accelerate discovery by enabling a global network of autonomous scientific agents. SCP is built on two foundational pillars: (1) Unified Resource Integration: At its core, SCP provides a universal specification for describing and invoking scientific resources, spanning software tools, models, datasets, and physical instruments. This protocol-level standardization enables AI agents and applications to discover, call, and compose capabilities seamlessly across disparate platforms and institutional boundaries. (2) Orchestrated Experiment Lifecycle Management: SCP complements the protocol with a secure service architecture, which comprises a centralized SCP Hub and federated SCP Servers. This architecture manages the complete experiment lifecycle (registration, planning, execution, monitoring, and archival), enforces fine-grained authentication and authorization, and orchestrates traceable, end-to-end workflows that bridge computational and physical laboratories. Based on SCP, we have constructed a scientific discovery platform that offers researchers and agents a large-scale ecosystem of more than 1,600 tool resources. Across diverse use cases, SCP facilitates secure, large-scale collaboration between heterogeneous AI systems and human researchers while significantly reducing integration overhead and enhancing reproducibility. By standardizing scientific context and tool orchestration at the protocol level, SCP establishes essential infrastructure for scalable, multi-institution, agent-driven science.
An Comparative Analysis about KYC on a Recommendation System Toward Agentic Recommendation System
This research presents a cutting-edge recommendation system utilizing agentic AI for KYC (Know Your Customer in the financial domain), and its evaluation across five distinct content verticals: Advertising (Ad), News, Gossip, Sharing (User-Generated Content), and Technology (Tech). The study compares the performance of four experimental groups, grouping by the intense usage of KYC, benchmarking them against the Normalized Discounted Cumulative Gain (nDCG) metric at truncation levels of $k=1$, $k=3$, and $k=5$. By synthesizing experimental data with theoretical frameworks and industry benchmarks from platforms such as Baidu and Xiaohongshu, this research provides insight by showing experimental results for engineering a large-scale agentic recommendation system.
comment: 5 pages, 1 figure
Simulation-Free PSRO: Removing Game Simulation from Policy Space Response Oracles
Policy Space Response Oracles (PSRO) combines game-theoretic equilibrium computation with learning and is effective in approximating Nash Equilibrium in zero-sum games. However, the computational cost of PSRO has become a significant limitation to its practical application. Our analysis shows that game simulation is the primary bottleneck in PSRO's runtime. To address this issue, we conclude the concept of Simulation-Free PSRO and summarize existing methods that instantiate this concept. Additionally, we propose a novel Dynamic Window-based Simulation-Free PSRO, which introduces the concept of a strategy window to replace the original strategy set maintained in PSRO. The number of strategies in the strategy window is limited, thereby simplifying opponent strategy selection and improving the robustness of the best response. Moreover, we use Nash Clustering to select the strategy to be eliminated, ensuring that the number of strategies within the strategy window is effectively limited. Our experiments across various environments demonstrate that the Dynamic Window mechanism significantly reduces exploitability compared to existing methods, while also exhibiting excellent compatibility. Our code is available at https://github.com/enochliu98/SF-PSRO.
KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta
Making deep learning recommendation model (DLRM) training and inference fast and efficient is important. However, this presents three key system challenges - model architecture diversity, kernel primitive diversity, and hardware generation and architecture heterogeneity. This paper presents KernelEvolve-an agentic kernel coding framework-to tackle heterogeneity at-scale for DLRM. KernelEvolve is designed to take kernel specifications as input and automate the process of kernel generation and optimization for recommendation model across heterogeneous hardware architectures. KernelEvolve does so by operating at multiple programming abstractions, from Triton and CuTe DSL to low-level hardware agnostic languages, spanning the full hardware-software optimization stack. The kernel optimization process is described as graph-based search with selection policy, universal operator, fitness function, and termination rule, dynamically adapts to runtime execution context through retrieval-augmented prompt synthesis. We designed, implemented, and deployed KernelEvolve to optimize a wide variety of production recommendation models across generations of NVIDIA and AMD GPUs, as well as Meta's AI accelerators. We validate KernelEvolve on the publicly-available KernelBench suite, achieving 100% pass rate on all 250 problems across three difficulty levels, and 160 PyTorch ATen operators across three heterogeneous hardware platforms, demonstrating 100% correctness. KernelEvolve reduces development time from weeks to hours and achieves substantial performance improvements over PyTorch baselines across diverse production use cases and for heterogeneous AI systems at-scale. Beyond performance efficiency improvements, KernelEvolve significantly mitigates the programmability barrier for new AI hardware by enabling automated kernel generation for in-house developed AI hardware.
Toward Autonomous Engineering Design: A Knowledge-Guided Multi-Agent Framework
The engineering design process often demands expertise from multiple domains, leading to complex collaborations and iterative refinements. Traditional methods can be resource-intensive and prone to inefficiencies. To address this, we formalize the engineering design process through a multi-agent AI framework that integrates structured design and review loops. The framework introduces specialized knowledge-driven agents that collaborate to generate and refine design candidates. As an exemplar, we demonstrate its application to the aerodynamic optimization of 4-digit NACA airfoils. The framework consists of three key AI agents: a Graph Ontologist, a Design Engineer, and a Systems Engineer. The Graph Ontologist employs a Large Language Model (LLM) to construct two domain-specific knowledge graphs from airfoil design literature. The Systems Engineer, informed by a human manager, formulates technical requirements that guide design generation and evaluation. The Design Engineer leverages the design knowledge graph and computational tools to propose candidate airfoils meeting these requirements. The Systems Engineer reviews and provides feedback both qualitative and quantitative using its own knowledge graph, forming an iterative feedback loop until a design is validated by the manager. The final design is then optimized to maximize performance metrics such as the lift-to-drag ratio. Overall, this work demonstrates how collaborative AI agents equipped with structured knowledge representations can enhance efficiency, consistency, and quality in the engineering design process.
comment: Added appendices and updated literature review
Systems and Control (EESS)
Energy-Aware Bayesian Control Barrier Functions for Physics-Informed Gaussian Process Dynamics
We study safe control for dynamical systems whose continuous-time dynamics are learned with Gaussian processes (GPs), focusing on mechanical and port-Hamiltonian systems where safety is naturally expressed via energy constraints. The availability of a GP Hamiltonian posterior naturally raises the question of how to systematically exploit this structure to design an energy-aware control barrier function with high-probability safety guarantees. We address this problem by developing a Bayesian-CBF framework and instantiating it with energy-aware Bayesian-CBFs (EB-CBFs) that construct conservative energy-based barriers directly from the Hamiltonian and vector-field posteriors, yielding safety filters that minimally modify a nominal controller while providing probabilistic energy safety guarantees. Numerical simulations on a mass-spring system demonstrate that the proposed EB-CBFs achieve high-probability safety under noisy sampled GP-learned dynamics.
Design of Linear Residual Generators for Combined Fault Detection and Estimation in Nonlinear Systems
A systematic method for the design of linear residual generators for combined fault detection and estimation in nonlinear systems is developed. The proposed residual generator is a linear functional observer built for an extended system that incorporates the fault dynamics from a linear exo-system, and in addition possesses disturbance-decoupling properties. Necessary and sufficient conditions for the existence of such residual generators for nonlinear systems are derived. As long as these conditions are satisfied, we obtain explicit design formulas for the residual generator. The results are illustrated through a chemical reactor case study, which demonstrates the effectiveness of the proposed methodology.
Multipliers for forced Lurye systems with slope-restricted nonlinearities
Dynamic multipliers can be used to guarantee the stability of Lurye systems with slope-restricted nonlinearities, but give no guarantee that the closed-loop system has finite incremental gain. We show that multipliers guarantee the closed-loop power gain to be bounded and quantifiable. Power may be measured about an appropriate steady state bias term, provided the multiplier does not require the nonlinearity to be odd. Hence dynamic multipliers can be used to guarantee such Lurye systems have low sensitivity to noise, provided other exogenous signals have constant steady state. For periodic excitation, the closed-loop response can apparently have a subharmonic or chaotic response. We revisit a class of multipliers that can guarantee a unique, attractive and period-preserving solution. We show the multipliers can be derived using classical tools and reconsider assumptions required for their application. Their phase limitations are inherited from those of discrete-time multipliers. The multipliers cannot be used at all frequencies unless the circle criterion can also be applied; this is consistent with known results about dynamic multipliers and incremental stability.
comment: 16 pages, 14 figures, submitted for review to IEEE Transactions on Automatic Control
Adaptive Learning Guided by Bias-Noise-Alignment Diagnostics
Learning systems deployed in nonstationary and safety-critical environments often suffer from instability, slow convergence, or brittle adaptation when learning dynamics evolve over time. While modern optimization, reinforcement learning, and meta-learning methods adapt to gradient statistics, they largely ignore the temporal structure of the error signal itself. This paper proposes a diagnostic-driven adaptive learning framework that explicitly models error evolution through a principled decomposition into bias, capturing persistent drift; noise, capturing stochastic variability; and alignment, capturing repeated directional excitation leading to overshoot. These diagnostics are computed online from lightweight statistics of loss or temporal-difference error trajectories and are independent of model architecture or task domain. We show that the proposed bias-noise-alignment decomposition provides a unifying control backbone for supervised optimization, actor-critic reinforcement learning, and learned optimizers. Building on this framework, we derive diagnostic-driven instantiations including a stabilized supervised optimizer, a diagnostic-regulated actor-critic scheme, and a diagnostic-conditioned learned optimizer. Under standard smoothness assumptions, we establish bounded effective updates and stability properties for all cases. Representative diagnostic illustrations in actor-critic learning highlight how the proposed signals modulate adaptation in response to temporal-difference error structure. Overall, this work elevates error evolution to a first-class object in adaptive learning and provides an interpretable, lightweight foundation for reliable learning in dynamic environments.
comment: This preprint focuses on the theoretical framework and diagnostic behavior. Comprehensive experimental validation in application-specific settings is deferred to a companion experimental study
Bayesian Subspace Identification in the MIMO Case
This report investigates the extension of the Bayesian Subspace System Identification method proposed in our previous work to the Multiple-Input Multiple-Output (MIMO) case. We derive new equivariant priors and posterior distributions specifically suited for the MIMO framework. Numerical results utilizing the DAISY dataset are reported to validate the approach.
Fast and Realistic Automated Scenario Simulations and Reporting for an Autonomous Racing Stack
In this paper, we describe the automated simulation and reporting pipeline implemented for our autonomous racing stack, ur.autopilot. The backbone of the simulation is based on a high-fidelity model of the vehicle interfaced as a Functional Mockup Unit (FMU). The pipeline can execute the software stack and the simulation up to three times faster than real-time, locally or on GitHub for Continuous Integration/- Continuous Delivery (CI/CD). As the most important input of the pipeline, there is a set of running scenarios. Each scenario allows the initialization of the ego vehicle in different initial conditions (position and speed), as well as the initialization of any other configuration of the stack. This functionality is essential to validate efficiently critical modules, like the one responsible for high-speed overtaking maneuvers or localization, which are among the most challenging aspects of autonomous racing. Moreover, we describe how we implemented a fault injection module, capable of introducing sensor delays and perturbations as well as modifying outputs of any node of the stack. Finally, we describe the design of our automated reporting process, aimed at maximizing the effectiveness of the simulation analysis.
comment: Accepted to the 2026 IEEE/SICE International Symposium on System Integration (SII 2026)
New Insights into Cascaded Geometric Flight Control: From Performance Guarantees to Practical Pitfalls
We present a new stability proof for cascaded geometric control used by aerial vehicles tracking time-varying position trajectories. Our approach uses sliding variables and a recently proposed quaternion-based sliding controller to demonstrate that exponentially convergent position trajectory tracking is theoretically possible. Notably, our analysis reveals new aspects of the control strategy, including how tracking error in the attitude loop influences the position loop, how model uncertainties affect the closed-loop system, and the practical pitfalls of the control architecture.
comment: V1
Safe Sliding Mode Control for Marine Vessels Using High-Order Control Barrier Functions and Fast Projection
This paper presents a novel safe control framework that integrates Sliding Mode Control (SMC), High-Order Control Barrier Functions (HOCBFs) with state-dependent adaptiveness and a lightweight projection for collision-free navigation of an over-actuated 3-DOF marine surface vessel subjected to strong environmental disturbances (wind, waves, and current). SMC provides robustness to matched disturbances common in marine operations, while HOCBFs enforce forward invariance of obstacle-avoidance constraints. A fast half-space projection method adjusts the SMC control only when needed, preserving robustness and minimizing chattering. The approach is evaluated on a nonlinear marine platform model that includes added mass, hydrodynamic damping, and full thruster allocation. Simulation results show robust navigation, guaranteed obstacle avoidance, and computational efficiency suitable for real-time embedded use. For small marine robots and surface vessels with limited onboard computational resources-where execution speed and computational efficiency are critical-the SMC-HOCBF framework constitutes a strong candidate for safety-critical control.
Now or Never: Continuous Surveillance AIoT System for Ephemeral Events in Intermittent Sensor Networks
Wilderness monitoring tasks, such as poaching surveillance and forest fire detection, require pervasive and high-accuracy sensing. While AIoT offers a promising path, covering vast, inaccessible regions necessitates the massive deployment of maintenance-free, battery-less nodes with limited computational resources. However, these constraints create a critical `Availability Gap.' Conventional intermittent operations prioritize computation throughput, forcing sensors to sleep during energy buffering. Consequently, systems miss ephemeral, `now-or-never' events (e.g., Vocalizations of natural monuments or Fire), which is fatal for detecting rare but high-stakes anomalies. To address this, we propose an Energy-aware Elastic Split Computing Algorithm that prioritizes continuous sensing by dynamically offloading tasks to energy-rich neighbors. Preliminary results demonstrate stable monitoring of an additional $2,496\;\text{m}^2$ and the capture of approximately 103 more critical events per day. Ultimately, this algorithm establishes a robust foundation for building resilient, fail-safe surveillance systems even on resource-constrained nodes.
Hybrid Voltage and Current Control Method for Harmonic Mitigation of Single-Phase AC Loads in DC Microgrids
DC microgrids provide an efficient framework for the interconnection of DC distributed energy resources (DERs) and DC loads. To continue to supply legacy single-phase AC loads, DC/AC converters can be integrated in the DC microgrid. The oscillatory instantaneous power of the single-phase AC load translates into a harmonic current on the converter's DC side, which increases the losses and causes unwanted voltage harmonics in the DC microgrid. To mitigate this issue, this paper proposes a hybrid voltage and current control method (HCM) for DERs. This scheme consists of an inner current control loop and an outer control layer which determines the reference for the inner loop. The outer control layer combines the DC voltage control loop with an output harmonic current control loop. This hybrid structure enables simultaneous regulation of the DC components of the DER output voltage and control of the harmonic component of the DER output current in accordance with the local single-phase AC load's demand. Frequency-domain analysis of the proposed method is presented to demonstrate the DC voltage and harmonic current loops are decoupled and there is no unwanted interaction between them. Additionally, time-domain response of the proposed scheme is validated through hardware-in-the-loop test results.
comment: This manuscript has been submitted to IEEE-IAS for journal publication
Discovering Optimal Robust Minimum Redundancy Arrays (RMRAs) through Exhaustive Search and Algebraic Formulation of a New Sub-Optimal RMRA
Modern sparse arrays are maximally economic in that they retain just as many sensors required to provide a specific aperture while maintaining a hole-free difference coarray. As a result, these are susceptible to the failure of even a single sensor. Contrarily, two-fold redundant sparse arrays (TFRSAs) and robust minimum redundancy arrays (RMRAs) ensure robustness against single-sensor failures due to their inherent redundancy in their coarrays. At present, optimal RMRA configurations are known only for arrays with sensor counts N=6 to N=10. To this end, this paper proposes two objectives: (i) developing a systematic algorithm to discover optimal RMRAs for N>10, and (ii) obtaining a new family of near-/sub-optimal RMRA that can be completely specified using closed-form expressions (CFEs). We solve the combinatorial optimization problem of finding RMRAs using an exhaustive search technique implemented in MATLAB. Optimal RMRAs for N = 11 to 14 were successfully found and near/sub-optimal arrays for N = 15 to 20 were determined using the proposed technique. As a byproduct of the exhaustive search, a large catalogue of valid near- and sub-optimal RMRAs was also obtained. In the second stage, CFEs for a new TFRSA were obtained by applying pattern mining and algebraic generalizations to the arrays obtained through exhaustive search. The proposed family enjoys CFEs for sensor positions, available aperture, and achievable degrees of freedom (DOFs). The CFEs have been thoroughly validated using MATLAB and are found to be valid for $N\geq8$. Hence, it can be concluded that the novelty of this work is two-fold: extending the catalogue of known optimal RMRAs and formulating a sub-optimal RMRA that abides by CFEs.
comment: 8 Pages, 2 Figures, IEEE Journal Format
ROBOPOL: Social Robotics Meets Vehicular Communications for Cooperative Automated Driving
On the way towards full autonomy, sharing roads between automated vehicles and human actors in so-called mixed traffic is unavoidable. Moreover, even if all vehicles on the road were autonomous, pedestrians would still be crossing the streets. We propose social robots as moderators between autonomous vehicles and vulnerable road users (VRU). To this end, we identify four enablers requiring integration: (1) advanced perception, allowing the robot to see the environment; (2) vehicular communications allowing connected vehicles to share intentions and the robot to send guiding commands; (3) social human-robot interaction allowing the robot to effectively communicate with VRUs and drivers; (4) formal specification allowing the robot to understand traffic and plan accordingly. This paper presents an overview of the key enablers and report on a first proof-of-concept integration of the first three enablers envisioning a social robot advising pedestrians in scenarios with a cooperative automated e-bike.
Economic and Technical Feasibility of V2G in Non-Road Mobile Machinery sector
This paper investigates the economic and technical feasibility of integrating Vehicle-to-Grid (V2G) technology in the Non-Road Mobile Machinery (NRMM) sector. These often-idling assets, with their substantial battery capacities, present a unique opportunity to participate in energy markets, providing grid services and generating additional revenue. A novel methodology is introduced that integrates Bayesian Optimization (BO) to optimize the energy infrastructure together with an operating strategy optimization to reduce the electricity costs while enhancing grid interaction. While the focus lies on the methodology, the financial opportunities for the use-case of an electric NRMM rental service will be presented. However, the study is limited by the availability of real-world data on the usage of electric NRMM and does not address regulatory challenges of V2G. Further research is needed to extend the model accuracy and validate these findings.
comment: Conference publication
Hardware Acceleration for Neural Networks: A Comprehensive Survey
Neural networks have become a dominant computational workload across cloud and edge platforms, but rapid growth in model size and deployment diversity has exposed hardware bottlenecks increasingly dominated by memory movement, communication, and irregular operators rather than peak arithmetic throughput. This survey reviews the technology landscape for hardware acceleration of deep learning, spanning GPUs and tensor-core architectures; domain-specific accelerators (e.g., TPUs/NPUs); FPGA-based designs; ASIC inference engines; and emerging LLM-serving accelerators such as LPUs (language processing units), alongside in-/near-memory computing and neuromorphic/analog approaches. We organize the space using a unified taxonomy across (i) workloads (CNNs, RNNs, GNNs, and Transformers/LLMs), (ii) execution settings (training vs.\ inference; datacenter vs.\ edge), and (iii) optimization levers (reduced precision, sparsity and pruning, operator fusion, compilation and scheduling, and memory-system/interconnect design). We synthesize key architectural ideas including systolic arrays, vector and SIMD engines, specialized attention and softmax kernels, quantization-aware datapaths, and high-bandwidth memory, and we discuss how software stacks and compilers bridge model semantics to hardware. Finally, we highlight open challenges -- including efficient long-context LLM inference (KV-cache management), robust support for dynamic and sparse workloads, energy- and security-aware deployment, and fair benchmarking -- and point to promising directions for the next generation of neural acceleration.
Hybrid dynamical systems modeling of power systems
The increasing integration of renewable energy sources has introduced complex dynamic behavior in power systems that challenge the adequacy of traditional continuous-time modeling approaches. These developments call for modeling frameworks that can capture the intricate interplay between continuous dynamics and discrete events characterizing modern grid operations. Hybrid dynamical systems offer a rigorous foundation for representing such mixed dynamics and have emerged as a valuable tool in power system analysis. Despite their potential, existing studies remain focused on isolated applications or case-specific implementations, offering limited generalizability and guidance for model selection. This paper addresses that gap by providing a comprehensive overview of hybrid modeling approaches relevant to power systems. It critically examines key formalisms, including hybrid automata, switched systems, and piecewise affine models, evaluating their respective strengths, limitations, and suitability across control, stability, and system design tasks. In doing so, the paper identifies open challenges and outlines future research directions to support the systematic application of hybrid methods in renewable-rich, converter-dominated power systems
Two-Stage Robust Optimal Operation of Distribution Networks Considering Renewable Energy and Demand Asymmetric Uncertainties
This paper presents a confidence level-based distributionally information gap decision theory (CL-DIGDT) framework for the two-stage robust optimal operation of distribution networks, aiming at deriving an optimal operational scheme capable of addressing asymmetric uncertainties related to renewable energy and load demands. Building on conventional IGDT, the proposed framework utilizes the confidence level to capture the asymmetric characteristics of uncertainties and maximize the risk-averse capability of the solution in a probabilistic manner. To account for the probabilistic consideration, the imprecise Dirichlet model is employed to construct the ambiguity sets of uncertainties, reducing reliance on precise probability distributions. Consequently, a two-stage robust optimal operation model for distribution networks using CL-DIGDT is developed. An iterative method is proposed to solve the model and determine the upper and lower bounds of the objective function. Case study demonstrates that the proposed approach yields a more robust and statistically optimized solution with required accuracy compared to existing method, contributing to a reduction in first-stage cost by 0.84%, second-stage average cost by 6.7%, and significantly increasing the reliability of the solution by 8%.
Contingency Model-based Control (CMC) for Communicationless Cooperative Collision Avoidance in Robot Swarms
Cooperative collision avoidance between robots, or `agents,' in swarm operations remains an open challenge. Assuming a decentralized architecture, each agent is responsible for making its own decisions and choosing its control actions. Most existing approaches rely on a (wireless) communication network between (some of) the agents. In reality, however, communication is brittle. It may be affected by latency, further delays and packet losses, and transmission faults. Moreover, it is subject to adversarial attacks, such as jamming or spoofing. This paper proposes Contingency Model-based Control (CMC), a decentralized cooperative approach that does not rely on communication. Instead, the control algorithm is based on consensual rules that are designed for all agents offline, similar to traffic rules. For CMC, this includes the definition of a contingency trajectory for each robot, and perpendicular bisecting planes as collision avoidance constraints. The setup permits a full guarantee of recursive feasibility and collision avoidance between all swarm members in closed-loop operation. CMC naturally satisfies the plug & play paradigm, i.e., new robots may enter the swarm dynamically. The effectiveness of the CMC regime is demonstrated in two numerical examples, showing that the collision avoidance guarantee is intact and the robot swarm operates smoothly in a constrained environment.
Deep Reinforcement Learning Optimization for Uncertain Nonlinear Systems via Event-Triggered Robust Adaptive Dynamic Programming
This work proposes a unified control architecture that couples a Reinforcement Learning (RL)-driven controller with a disturbance-rejection Extended State Observer (ESO), complemented by an Event-Triggered Mechanism (ETM) to limit unnecessary computations. The ESO is utilized to estimate the system states and the lumped disturbance in real time, forming the foundation for effective disturbance compensation. To obtain near-optimal behavior without an accurate system description, a value-iteration-based Adaptive Dynamic Programming (ADP) method is adopted for policy approximation. The inclusion of the ETM ensures that parameter updates of the learning module are executed only when the state deviation surpasses a predefined bound, thereby preventing excessive learning activity and substantially reducing computational load. A Lyapunov-oriented analysis is used to characterize the stability properties of the resulting closed-loop system. Numerical experiments further confirm that the developed approach maintains strong control performance and disturbance tolerance, while achieving a significant reduction in sampling and processing effort compared with standard time-triggered ADP schemes.
comment: 9 pages, 9 figures
Robotics
Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation
The primary obstacle for applying reinforcement learning (RL) to real-world robotics is the design of effective reward functions. While recently learning-based Process Reward Models (PRMs) are a promising direction, they are often hindered by two fundamental limitations: their reward models lack step-aware understanding and rely on single-view perception, leading to unreliable assessments of fine-grained manipulation progress; and their reward shaping procedures are theoretically unsound, often inducing a semantic trap that misguides policy optimization. To address these, we introduce Dopamine-Reward, a novel reward modeling method for learning a general-purpose, step-aware process reward model from multi-view inputs. At its core is our General Reward Model (GRM), trained on a vast 3,400+ hour dataset, which leverages Step-wise Reward Discretization for structural understanding and Multi-Perspective Reward Fusion to overcome perceptual limitations. Building upon Dopamine-Reward, we propose Dopamine-RL, a robust policy learning framework that employs a theoretically-sound Policy-Invariant Reward Shaping method, which enables the agent to leverage dense rewards for efficient self-improvement without altering the optimal policy, thereby fundamentally avoiding the semantic trap. Extensive experiments across diverse simulated and real-world tasks validate our approach. GRM achieves state-of-the-art accuracy in reward assessment, and Dopamine-RL built on GRM significantly improves policy learning efficiency. For instance, after GRM is adapted to a new task in a one-shot manner from a single expert trajectory, the resulting reward model enables Dopamine-RL to improve the policy from near-zero to 95% success with only 150 online rollouts (approximately 1 hour of real robot interaction), while retaining strong generalization across tasks. Project website: https://robo-dopamine.github.io
comment: 27 pages, 11 figures
The Bulldozer Technique: Efficient Elimination of Local Minima Traps for APF-Based Robot Navigation
Path planning is a fundamental component in autonomous mobile robotics, enabling a robot to navigate from its current location to a desired goal while avoiding obstacles. Among the various techniques, Artificial Potential Field (APF) methods have gained popularity due to their simplicity, real-time responsiveness, and low computational requirements. However, a major limitation of conventional APF approaches is the local minima trap problem, where the robot becomes stuck in a position with no clear direction toward the goal. This paper proposes a novel path planning technique, termed the Bulldozer, which addresses the local minima issue while preserving the inherent advantages of APF. The Bulldozer technique introduces a backfilling mechanism that systematically identifies and eliminates local minima regions by increasing their potential values, analogous to a bulldozer filling potholes in a road. Additionally, a ramp-based enhancement is incorporated to assist the robot in escaping trap areas when starting within a local minimum. The proposed technique is experimentally validated using a physical mobile robot across various maps with increasing complexity. Comparative analyses are conducted against standard APF, adaptive APF, and well-established planning algorithms such as A*, PRM, and RRT. Results demonstrate that the Bulldozer technique effectively resolves the local minima problem while achieving superior execution speed and competitive path quality. Furthermore, a kinematic tracking controller is employed to assess the smoothness and traceability of the planned paths, confirming their suitability for real-world execution.
Do You Have Freestyle? Expressive Humanoid Locomotion via Audio Control
Humans intuitively move to sound, but current humanoid robots lack expressive improvisational capabilities, confined to predefined motions or sparse commands. Generating motion from audio and then retargeting it to robots relies on explicit motion reconstruction, leading to cascaded errors, high latency, and disjointed acoustic-actuation mapping. We propose RoboPerform, the first unified audio-to-locomotion framework that can directly generate music-driven dance and speech-driven co-speech gestures from audio. Guided by the core principle of "motion = content + style", the framework treats audio as implicit style signals and eliminates the need for explicit motion reconstruction. RoboPerform integrates a ResMoE teacher policy for adapting to diverse motion patterns and a diffusion-based student policy for audio style injection. This retargeting-free design ensures low latency and high fidelity. Experimental validation shows that RoboPerform achieves promising results in physical plausibility and audio alignment, successfully transforming robots into responsive performers capable of reacting to audio.
RoboMirror: Understand Before You Imitate for Video to Humanoid Locomotion
Humans learn locomotion through visual observation, interpreting visual content first before imitating actions. However, state-of-the-art humanoid locomotion systems rely on either curated motion capture trajectories or sparse text commands, leaving a critical gap between visual understanding and control. Text-to-motion methods suffer from semantic sparsity and staged pipeline errors, while video-based approaches only perform mechanical pose mimicry without genuine visual understanding. We propose RoboMirror, the first retargeting-free video-to-locomotion framework embodying "understand before you imitate". Leveraging VLMs, it distills raw egocentric/third-person videos into visual motion intents, which directly condition a diffusion-based policy to generate physically plausible, semantically aligned locomotion without explicit pose reconstruction or retargeting. Extensive experiments validate the effectiveness of RoboMirror, it enables telepresence via egocentric videos, drastically reduces third-person control latency by 80%, and achieves a 3.7% higher task success rate than baselines. By reframing humanoid control around video understanding, we bridge the visual understanding and action gap.
The N-5 Scaling Law: Topological Dimensionality Reduction in the Optimal Design of Fully-actuated Multirotors
The geometric design of fully-actuated and omnidirectional N-rotor aerial vehicles is conventionally formulated as a parametric optimization problem, seeking a single optimal set of N orientations within a fixed architectural family. This work departs from that paradigm to investigate the intrinsic topological structure of the optimization landscape itself. We formulate the design problem on the product manifold of Projective Lines \RP^2^N, fixing the rotor positions to the vertices of polyhedral chassis while varying their lines of action. By minimizing a coordinate-invariant Log-Volume isotropy metric, we reveal that the topology of the global optima is governed strictly by the symmetry of the chassis. For generic (irregular) vertex arrangements, the solutions appear as a discrete set of isolated points. However, as the chassis geometry approaches regularity, the solution space undergoes a critical phase transition, collapsing onto an N-dimensional Torus of the lines tangent at the vertexes to the circumscribing sphere of the chassis, and subsequently reducing to continuous 1-dimensional curves driven by Affine Phase Locking. We synthesize these observations into the N-5 Scaling Law: an empirical relationship holding for all examined regular planar polygons and Platonic solids (N <= 10), where the space of optimal configurations consists of K=N-5 disconnected 1D topological branches. We demonstrate that these locking patterns correspond to a sequence of admissible Star Polygons {N/q}, allowing for the exact prediction of optimal phases for arbitrary N. Crucially, this topology reveals a design redundancy that enables optimality-preserving morphing: the vehicle can continuously reconfigure along these branches while preserving optimal isotropic control authority.
Interactive Robot Programming for Surface Finishing via Task-Centric Mixed Reality Interfaces
Lengthy setup processes that require robotics expertise remain a major barrier to deploying robots for tasks involving high product variability and small batch sizes. As a result, collaborative robots, despite their advanced sensing and control capabilities, are rarely used for surface finishing in small-scale craft and manufacturing settings. To address this gap, we propose a novel robot programming approach that enables non-experts to intuitively program robots through interactive, task-focused workflows. For that, we developed a new surface segmentation algorithm that incorporates human input to identify and refine workpiece regions for processing. Throughout the programming process, users receive continuous visual feedback on the robot's learned model, enabling them to iteratively refine the segmentation result. Based on the segmented surface model, a robot trajectory is generated to cover the desired processing area. We evaluated multiple interaction designs across two comprehensive user studies to derive an optimal interface that significantly reduces user workload, improves usability and enables effective task programming even for users with limited practical experience.
comment: Currently under review at Intelligent Service Robotics
A Kalman Filter-Based Disturbance Observer for Steer-by-Wire Systems
Steer-by-Wire systems replace mechanical linkages, which provide benefits like weight reduction, design flexibility, and compatibility with autonomous driving. However, they are susceptible to high-frequency disturbances from unintentional driver torque, known as driver impedance, which can degrade steering performance. Existing approaches either rely on direct torque sensors, which are costly and impractical, or lack the temporal resolution to capture rapid, high-frequency driver-induced disturbances. We address this limitation by designing a Kalman filter-based disturbance observer that estimates high-frequency driver torque using only motor state measurements. We model the drivers passive torque as an extended state using a PT1-lag approximation and integrate it into both linear and nonlinear Steer-by-Wire system models. In this paper, we present the design, implementation and simulation of this disturbance observer with an evaluation of different Kalman filter variants. Our findings indicate that the proposed disturbance observer accurately reconstructs driver-induced disturbances with only minimal delay 14ms. We show that a nonlinear extended Kalman Filter outperforms its linear counterpart in handling frictional nonlinearities, improving estimation during transitions from static to dynamic friction. Given the study's methodology, it was unavoidable to rely on simulation-based validation rather than real-world experimentation. Further studies are needed to investigate the robustness of the observers under real-world driving conditions.
Unsupervised Learning for Detection of Rare Driving Scenarios
The detection of rare and hazardous driving scenarios is a critical challenge for ensuring the safety and reliability of autonomous systems. This research explores an unsupervised learning framework for detecting rare and extreme driving scenarios using naturalistic driving data (NDD). We leverage the recently proposed Deep Isolation Forest (DIF), an anomaly detection algorithm that combines neural network-based feature representations with Isolation Forests (IFs), to identify non-linear and complex anomalies. Data from perception modules, capturing vehicle dynamics and environmental conditions, is preprocessed into structured statistical features extracted from sliding windows. The framework incorporates t-distributed stochastic neighbor embedding (t-SNE) for dimensionality reduction and visualization, enabling better interpretability of detected anomalies. Evaluation is conducted using a proxy ground truth, combining quantitative metrics with qualitative video frame inspection. Our results demonstrate that the proposed approach effectively identifies rare and hazardous driving scenarios, providing a scalable solution for anomaly detection in autonomous driving systems. Given the study's methodology, it was unavoidable to depend on proxy ground truth and manually defined feature combinations, which do not encompass the full range of real-world driving anomalies or their nuanced contextual dependencies.
Soft Robotic Technological Probe for Speculative Fashion Futures
Emerging wearable robotics demand design approaches that address not only function, but also social meaning. In response, we present Sumbrella, a soft robotic garment developed as a speculative fashion probe. We first detail the design and fabrication of the Sumbrella, including sequenced origami-inspired bistable units, fabric pneumatic actuation chambers, cable driven shape morphing mechanisms, computer vision components, and an integrated wearable system comprising a hat and bolero jacket housing power and control electronics. Through a focus group with twelve creative technologists, we then used Sumbrella as a technological probe to explore how people interpreted, interacted, and imagined future relationships with soft robotic wearables. While Sumbrella allowed our participants to engage in rich discussion around speculative futures and expressive potential, it also surfaced concerns about exploitation, surveillance, and the personal risks and societal ethics of embedding biosensing technologies in public life. We contribute to the Human-Robot Interaction (HRI) field key considerations and recommendations for designing soft robotic garments, including the potential for kinesic communication, the impact of such technologies on social dynamics, and the importance of ethical guidelines. Finally, we provide a reflection on our application of speculative design; proposing that it allows HRI researchers to not only consider functionality, but also how wearable robots influence definitions of what is considered acceptable or desirable in public settings.
Act2Goal: From World Model To General Goal-conditioned Policy
Specifying robotic manipulation tasks in a manner that is both expressive and precise remains a central challenge. While visual goals provide a compact and unambiguous task specification, existing goal-conditioned policies often struggle with long-horizon manipulation due to their reliance on single-step action prediction without explicit modeling of task progress. We propose Act2Goal, a general goal-conditioned manipulation policy that integrates a goal-conditioned visual world model with multi-scale temporal control. Given a current observation and a target visual goal, the world model generates a plausible sequence of intermediate visual states that captures long-horizon structure. To translate this visual plan into robust execution, we introduce Multi-Scale Temporal Hashing (MSTH), which decomposes the imagined trajectory into dense proximal frames for fine-grained closed-loop control and sparse distal frames that anchor global task consistency. The policy couples these representations with motor control through end-to-end cross-attention, enabling coherent long-horizon behavior while remaining reactive to local disturbances. Act2Goal achieves strong zero-shot generalization to novel objects, spatial layouts, and environments. We further enable reward-free online adaptation through hindsight goal relabeling with LoRA-based finetuning, allowing rapid autonomous improvement without external supervision. Real-robot experiments demonstrate that Act2Goal improves success rates from 30% to 90% on challenging out-of-distribution tasks within minutes of autonomous interaction, validating that goal-conditioned world models with multi-scale temporal control provide structured guidance necessary for robust long-horizon manipulation. Project page: https://act2goal.github.io/
Robust Deep Learning Control with Guaranteed Performance for Safe and Reliable Robotization in Heavy-Duty Machinery
Today's heavy-duty mobile machines (HDMMs) face two transitions: from diesel-hydraulic actuation to clean electric systems driven by climate goals, and from human supervision toward greater autonomy. Diesel-hydraulic systems have long dominated, so full electrification, via direct replacement or redesign, raises major technical and economic challenges. Although advanced artificial intelligence (AI) could enable higher autonomy, adoption in HDMMs is limited by strict safety requirements, and these machines still rely heavily on human supervision. This dissertation develops a control framework that (1) simplifies control design for electrified HDMMs through a generic modular approach that is energy-source independent and supports future modifications, and (2) defines hierarchical control policies that partially integrate AI while guaranteeing safety-defined performance and stability. Five research questions align with three lines of investigation: a generic robust control strategy for multi-body HDMMs with strong stability across actuation types and energy sources; control solutions that keep strict performance under uncertainty and faults while balancing robustness and responsiveness; and methods to interpret and trust black-box learning strategies so they can be integrated stably and verified against international safety standards. The framework is validated in three case studies spanning different actuators and conditions, covering heavy-duty mobile robots and robotic manipulators. Results appear in five peer-reviewed publications and one unpublished manuscript, advancing nonlinear control and robotics and supporting both transitions.
comment: Doctoral Dissertation, Tampere University
Theory of Mind for Explainable Human-Robot Interaction
Within the context of human-robot interaction (HRI), Theory of Mind (ToM) is intended to serve as a user-friendly backend to the interface of robotic systems, enabling robots to infer and respond to human mental states. When integrated into robots, ToM allows them to adapt their internal models to users' behaviors, enhancing the interpretability and predictability of their actions. Similarly, Explainable Artificial Intelligence (XAI) aims to make AI systems transparent and interpretable, allowing humans to understand and interact with them effectively. Since ToM in HRI serves related purposes, we propose to consider ToM as a form of XAI and evaluate it through the eValuation XAI (VXAI) framework and its seven desiderata. This paper identifies a critical gap in the application of ToM within HRI, as existing methods rarely assess the extent to which explanations correspond to the robot's actual internal reasoning. To address this limitation, we propose to integrate ToM within XAI frameworks. By embedding ToM principles inside XAI, we argue for a shift in perspective, as current XAI research focuses predominantly on the AI system itself and often lacks user-centered explanations. Incorporating ToM would enable a change in focus, prioritizing the user's informational needs and perspective.
Assessing behaviour coverage in a multi-agent system simulation for autonomous vehicle testing
As autonomous vehicle technology advances, ensuring the safety and reliability of these systems becomes paramount. Consequently, comprehensive testing methodologies are essential to evaluate the performance of autonomous vehicles in diverse and complex real-world scenarios. This study focuses on the behaviour coverage analysis of a multi-agent system simulation designed for autonomous vehicle testing, and provides a systematic approach to measure and assess behaviour coverage within the simulation environment. By defining a set of driving scenarios, and agent interactions, we evaluate the extent to which the simulation encompasses a broad range of behaviours relevant to autonomous driving. Our findings highlight the importance of behaviour coverage in validating the effectiveness and robustness of autonomous vehicle systems. Through the analysis of behaviour coverage metrics and coverage-based testing, we identify key areas for improvement and optimization in the simulation framework. Thus, a Model Predictive Control (MPC) pedestrian agent is proposed, where its objective function is formulated to encourage \textit{interesting} tests while promoting a more realistic behaviour than other previously studied pedestrian agents. This research contributes to advancing the field of autonomous vehicle testing by providing insights into the comprehensive evaluation of system behaviour in simulated environments. The results offer valuable implications for enhancing the safety, reliability, and performance of autonomous vehicles through rigorous testing methodologies.
Optimal Scalability-Aware Allocation of Swarm Robots: From Linear to Retrograde Performance via Marginal Gains
In collective systems, the available agents are a limited resource that must be allocated among tasks to maximize collective performance. Computing the optimal allocation of several agents to numerous tasks through a brute-force approach can be infeasible, especially when each task's performance scales differently with the increase of agents. For example, difficult tasks may require more agents to achieve similar performances compared to simpler tasks, but performance may saturate nonlinearly as the number of allocated agents increases. We propose a computationally efficient algorithm, based on marginal performance gains, for optimally allocating agents to tasks with concave scalability functions, including linear, saturating, and retrograde scaling, to achieve maximum collective performance. We test the algorithm by allocating a simulated robot swarm among collective decision-making tasks, where embodied agents sample their environment and exchange information to reach a consensus on spatially distributed environmental features. We vary task difficulties by different geometrical arrangements of environmental features in space (patchiness). In this scenario, decision performance in each task scales either as a saturating curve (following the Condorcet's Jury Theorem in an interference-free setup) or as a retrograde curve (when physical interference among robots restricts their movement). Using simple robot simulations, we show that our algorithm can be useful in allocating robots among tasks. Our approach aims to advance the deployment of future real-world multi-robot systems.
comment: 14 pages, 11 figures, Accepted for publication in IEEE Transactions on Systems, Man, and Cybernetics: Systems
PCR-ORB: Enhanced ORB-SLAM3 with Point Cloud Refinement Using Deep Learning-Based Dynamic Object Filtering
Visual Simultaneous Localization and Mapping (vSLAM) systems encounter substantial challenges in dynamic environments where moving objects compromise tracking accuracy and map consistency. This paper introduces PCR-ORB (Point Cloud Refinement ORB), an enhanced ORB-SLAM3 framework that integrates deep learning-based point cloud refinement to mitigate dynamic object interference. Our approach employs YOLOv8 for semantic segmentation combined with CUDA-accelerated processing to achieve real-time performance. The system implements a multi-stage filtering strategy encompassing ground plane estimation, sky region removal, edge filtering, and temporal consistency validation. Comprehensive evaluation on the KITTI dataset (sequences 00-09) demonstrates performance characteristics across different environmental conditions and scene types. Notable improvements are observed in specific sequences, with sequence 04 achieving 25.9% improvement in ATE RMSE and 30.4% improvement in ATE median. However, results show mixed performance across sequences, indicating scenario-dependent effectiveness. The implementation provides insights into dynamic object filtering challenges and opportunities for robust navigation in complex environments.
comment: 17 pages, 2 figures, 1 table
Explainable Neural Inverse Kinematics for Obstacle-Aware Robotic Manipulation: A Comparative Analysis of IKNet Variants
Deep neural networks have accelerated inverse-kinematics (IK) inference to the point where low cost manipulators can execute complex trajectories in real time, yet the opaque nature of these models contradicts the transparency and safety requirements emerging in responsible AI regulation. This study proposes an explainability centered workflow that integrates Shapley-value attribution with physics-based obstacle avoidance evaluation for the ROBOTIS OpenManipulator-X. Building upon the original IKNet, two lightweight variants-Improved IKNet with residual connections and Focused IKNet with position-orientation decoupling are trained on a large, synthetically generated pose-joint dataset. SHAP is employed to derive both global and local importance rankings, while the InterpretML toolkit visualizes partial-dependence patterns that expose non-linear couplings between Cartesian poses and joint angles. To bridge algorithmic insight and robotic safety, each network is embedded in a simulator that subjects the arm to randomized single and multi-obstacle scenes; forward kinematics, capsule-based collision checks, and trajectory metrics quantify the relationship between attribution balance and physical clearance. Qualitative heat maps reveal that architectures distributing importance more evenly across pose dimensions tend to maintain wider safety margins without compromising positional accuracy. The combined analysis demonstrates that explainable AI(XAI) techniques can illuminate hidden failure modes, guide architectural refinements, and inform obstacle aware deployment strategies for learning based IK. The proposed methodology thus contributes a concrete path toward trustworthy, data-driven manipulation that aligns with emerging responsible-AI standards.
comment: 27 pages, 16 figures
Beyond Coverage Path Planning: Can UAV Swarms Perfect Scattered Regions Inspections?
Unmanned Aerial Vehicles (UAVs) have revolutionized inspection tasks by offering a safer, more efficient, and flexible alternative to traditional methods. However, battery limitations often constrain their effectiveness, necessitating the development of optimized flight paths and data collection techniques. While existing approaches like coverage path planning (CPP) ensure comprehensive data collection, they can be inefficient, especially when inspecting multiple non connected Regions of Interest (ROIs). This paper introduces the Fast Inspection of Scattered Regions (FISR) problem and proposes a novel solution, the multi UAV Disjoint Areas Inspection (mUDAI) method. The introduced approach implements a two fold optimization procedure, for calculating the best image capturing positions and the most efficient UAV trajectories, balancing data resolution and operational time, minimizing redundant data collection and resource consumption. The mUDAI method is designed to enable rapid, efficient inspections of scattered ROIs, making it ideal for applications such as security infrastructure assessments, agricultural inspections, and emergency site evaluations. A combination of simulated evaluations and real world deployments is used to validate and quantify the method's ability to improve operational efficiency while preserving high quality data capture, demonstrating its effectiveness in real world operations. An open source Python implementation of the mUDAI method can be found on GitHub (https://github.com/soc12/mUDAI) and the collected and processed data from the real world experiments are all hosted on Zenodo (https://zenodo.org/records/13866483). Finally, this online platform (https://sites.google.com/view/mudai-platform/) allows interested readers to interact with the mUDAI method and generate their own multi UAV FISR missions.
A Human-Oriented Cooperative Driving Approach: Integrating Driving Intention, State, and Conflict
Human-vehicle cooperative driving serves as a vital bridge to fully autonomous driving by improving driving flexibility and gradually building driver trust and acceptance of autonomous technology. To establish more natural and effective human-vehicle interaction, we propose a Human-Oriented Cooperative Driving (HOCD) approach that primarily minimizes human-machine conflict by prioritizing driver intention and state. In implementation, we take both tactical and operational levels into account to ensure seamless human-vehicle cooperation. At the tactical level, we design an intention-aware trajectory planning method, using intention consistency cost as the core metric to evaluate the trajectory and align it with driver intention. At the operational level, we develop a control authority allocation strategy based on reinforcement learning, optimizing the policy through a designed reward function to achieve consistency between driver state and authority allocation. The results of simulation and human-in-the-loop experiments demonstrate that our proposed approach not only aligns with driver intention in trajectory planning but also ensures a reasonable authority allocation. Compared to other cooperative driving approaches, the proposed HOCD approach significantly enhances driving performance and mitigates human-machine conflict.The code is available at https://github.com/i-Qin/HOCD.
SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling
Data scarcity remains a fundamental barrier to achieving fully autonomous surgical robots. While large scale vision language action (VLA) models have shown impressive generalization in household and industrial manipulation by leveraging paired video action data from diverse domains, surgical robotics suffers from the paucity of datasets that include both visual observations and accurate robot kinematics. In contrast, vast corpora of surgical videos exist, but they lack corresponding action labels, preventing direct application of imitation learning or VLA training. In this work, we aim to alleviate this problem by learning policy models from SurgWorld, a world model designed for surgical physical AI. We curated the Surgical Action Text Alignment (SATA) dataset with detailed action description specifically for surgical robots. Then we built SurgeWorld based on the most advanced physical AI world model and SATA. It's able to generate diverse, generalizable and realistic surgery videos. We are also the first to use an inverse dynamics model to infer pseudokinematics from synthetic surgical videos, producing synthetic paired video action data. We demonstrate that a surgical VLA policy trained with these augmented data significantly outperforms models trained only on real demonstrations on a real surgical robot platform. Our approach offers a scalable path toward autonomous surgical skill acquisition by leveraging the abundance of unlabeled surgical video and generative world modeling, thus opening the door to generalizable and data efficient surgical robot policies.
Breaking Symmetry-Induced Degeneracy in Multi-Agent Ergodic Coverage via Stochastic Spectral Control
Multi-agent ergodic coverage via Spectral Multiscale Coverage (SMC) provides a principled framework for driving a team of agents so that their collective time-averaged trajectories match a prescribed spatial distribution. While classical SMC has demonstrated empirical success, it can suffer from gradient cancellation, particularly when agents are initialized near symmetry points of the target distribution, leading to undesirable behaviors such as stalling or motion constrained along symmetry axes. In this work, we rigorously characterize the initial conditions and symmetry-induced invariant manifolds that give rise to such directional degeneracy in first-order agent dynamics. To address this, we introduce a stochastic perturbation combined with a contraction term and prove that the resulting dynamics ensure almost-sure escape from zero-gradient manifolds while maintaining mean-square boundedness of agent trajectories. Simulations on symmetric multi-modal reference distributions demonstrate that the proposed stochastic SMC effectively mitigates transient stalling and axis-constrained motion, while ensuring that all agent trajectories remain bounded within the domain.
A Sequential Hermaphrodite Coupling Mechanism for Lattice-based Modular Robots
Lattice-based modular robot systems are envisioned for large-scale construction in extreme environments, such as space. Coupling mechanisms for heterogeneous structural modules should meet all of the following requirements: single-sided coupling and decoupling, flat surfaces when uncoupled, and coupling to passive coupling interfaces as well as coupling behavior between coupling mechanisms. The design requirements for such a coupling mechanism are complex. We propose a novel shape-matching mechanical coupling mechanism that satisfies these design requirements. This mechanism enables controlled, sequential transitions between male and female states. When uncoupled, all mechanisms are in the female state. To enable single-sided coupling, one side of the mechanisms switches to the male state during the coupling process. Single-sided decoupling is possible not only from the male side but also from the female side by forcibly switching the opposite mechanism's male state to the female state. This coupling mechanism can be applied to various modular robot systems and robot arm tool changers.
comment: Author's version of a manuscript accepted at the 2025 IEEE International Conference on Mechatronics (ICM). (c) IEEE. The final published version is available at https://doi.org/10.1109/ICM62621.2025.10934866
Towards the Automation in the Space Station: Feasibility Study and Ground Tests of a Multi-Limbed Intra-Vehicular Robot
This paper presents a feasibility study, including simulations and prototype tests, on the autonomous operation of a multi-limbed intra-vehicular robot (mobile manipulator), shortly MLIVR, designed to assist astronauts with logistical tasks on the International Space Station (ISS). Astronauts spend significant time on tasks such as preparation, close-out, and the collection and transportation of goods, reducing the time available for critical mission activities. Our study explores the potential for a mobile manipulator to support these operations, emphasizing the need for autonomous functionality to minimize crew and ground operator effort while enabling real-time task execution. We focused on the robot's transportation capabilities, simulating its motion planning in 3D space. The actual motion execution was tested with a prototype on a 2D table to mimic a microgravity environment. The results demonstrate the feasibility of performing these tasks with minimal human intervention, offering a promising solution to enhance operational efficiency on the ISS.
comment: Author's version of a manuscript accepted at the 2025 IEEE/SICE International Symposium on System Integration (SII). (c) IEEE. The final published version is available at https://doi.org/10.1109/SII59315.2025.10870890
Pole-centric Descriptors for Robust Robot Localization: Evaluation under Pole-at-Distance (PaD) Observations using the Small Pole Landmark (SPL) Dataset
While pole-like structures are widely recognized as stable geometric anchors for long-term robot localization, their identification reliability degrades significantly under Pole-at-Distance (Pad) observations typical of large-scale urban environments. This paper shifts the focus from descriptor design to a systematic investigation of descriptor robustness. Our primary contribution is the establishment of a specialized evaluation framework centered on the Small Pole Landmark (SPL) dataset. This dataset is constructed via an automated tracking-based association pipeline that captures multi-view, multi-distance observations of the same physical landmarks without manual annotation. Using this framework, we present a comparative analysis of Contrastive Learning (CL) and Supervised Learning (SL) paradigms. Our findings reveal that CL induces a more robust feature space for sparse geometry, achieving superior retrieval performance particularly in the 5--10m range. This work provides an empirical foundation and a scalable methodology for evaluating landmark distinctiveness in challenging real-world scenarios.
comment: 3 pages, technical report
A New Software Tool for Generating and Visualizing Robot Self-Collision Matrices
In robotics, it is common to check whether a given robot state results in self-intersection (i.e., a self-collision query) or to assess its distance from such an intersection (i.e., a self-proximity query). These checks are typically performed between pairs of shapes attached to different robot links. However, many of these shape pairs can be excluded in advance, as their configurations are known to always or never result in contact. This information is typically encoded in a self-collision matrix, where each entry (i, j) indicates whether a check should be performed between shape i and shape j. While the MoveIt Setup Assistant is widely used to generate such matrices, current tools are limited by static visualization, lack of proximity support, rigid single-geometry assumptions, and tedious refinement workflows, hindering flexibility and reuse in downstream robotics applications. In this work, we introduce an interactive tool that overcomes these limitations by generating and visualizing self-collision matrices across multiple shape representations, enabling dynamic inspection, filtering, and refinement of shape pairs. Outputs are provided in both JSON and YAML for easy integration. The system is implemented in Rust and uses the Bevy game engine to deliver high-quality visualizations. We demonstrate its effectiveness on multiple robot platforms, showing that matrices generated using diverse shape types yield faster and more accurate self-collision and self-proximity queries.
Beyond URDF: The Universal Robot Description Directory for Shared, Extensible, and Standardized Robot Models
Robots are typically described in software by specification files (e.g., URDF, SDF, MJCF, USD) that encode only basic kinematic, dynamic, and geometric information. As a result, downstream applications such as simulation, planning, and control must repeatedly re-derive richer data, leading to redundant computations, fragmented implementations, and limited standardization. In this work, we introduce the Universal Robot Description Directory (URDD), a modular representation that organizes derived robot information into structured, easy-to-parse JSON and YAML modules. Our open-source toolkit automatically generates URDDs from URDFs, with a Rust implementation supporting Bevy-based visualization. Additionally, we provide a JavaScript/Three.js viewer for web-based inspection of URDDs. Experiments on multiple robot platforms show that URDDs can be generated efficiently, encapsulate substantially richer information than standard specification files, and directly enable the construction of core robotics subroutines. URDD provides a unified, extensible resource for reducing redundancy and establishing shared standards across robotics frameworks. We conclude with a discussion on the limitations and implications of our work.
Learning to Feel the Future: DreamTacVLA for Contact-Rich Manipulation
Vision-Language-Action (VLA) models have shown remarkable generalization by mapping web-scale knowledge to robotic control, yet they remain blind to physical contact. Consequently, they struggle with contact-rich manipulation tasks that require reasoning about force, texture, and slip. While some approaches incorporate low-dimensional tactile signals, they fail to capture the high-resolution dynamics essential for such interactions. To address this limitation, we introduce DreamTacVLA, a framework that grounds VLA models in contact physics by learning to feel the future. Our model adopts a hierarchical perception scheme in which high-resolution tactile images serve as micro-vision inputs coupled with wrist-camera local vision and third-person macro vision. To reconcile these multi-scale sensory streams, we first train a unified policy with a Hierarchical Spatial Alignment (HSA) loss that aligns tactile tokens with their spatial counterparts in the wrist and third-person views. To further deepen the model's understanding of fine-grained contact dynamics, we finetune the system with a tactile world model that predicts future tactile signals. To mitigate tactile data scarcity and the wear-prone nature of tactile sensors, we construct a hybrid large-scale dataset sourced from both high-fidelity digital twin and real-world experiments. By anticipating upcoming tactile states, DreamTacVLA acquires a rich model of contact physics and conditions its actions on both real observations and imagined consequences. Across contact-rich manipulation tasks, it outperforms state-of-the-art VLA baselines, achieving up to 95% success, highlighting the importance of understanding physical contact for robust, touch-aware robotic agents.
Simultaneous Extrinsic Contact and In-Hand Pose Estimation via Distributed Tactile Sensing
Prehensile autonomous manipulation, such as peg insertion, tool use, or assembly, require precise in-hand understanding of the object pose and the extrinsic contacts made during interactions. Providing accurate estimation of pose and contacts is challenging. Tactile sensors can provide local geometry at the sensor and force information about the grasp, but the locality of sensing means resolving poses and contacts from tactile alone is often an ill-posed problem, as multiple configurations can be consistent with the observations. Adding visual feedback can help resolve ambiguities, but can suffer from noise and occlusions. In this work, we propose a method that pairs local observations from sensing with the physical constraints of contact. We propose a set of factors that ensure local consistency with tactile observations as well as enforcing physical plausibility, namely, that the estimated pose and contacts must respect the kinematic and force constraints of quasi-static rigid body interactions. We formalize our problem as a factor graph, allowing for efficient estimation. In our experiments, we demonstrate that our method outperforms existing geometric and contact-informed estimation pipelines, especially when only tactile information is available. Video results can be found at https://tacgraph.github.io/.
comment: 8 pages. IEEE Robotics and Automation Letters, 2026
Leveraging Synthetic Priors for Monocular Depth Estimation in Specular Surgical Environments
Accurate Monocular Depth Estimation (MDE) is critical for robotic surgery but remains fragile in specular, fluid-filled endoscopic environments. Existing self-supervised methods, typically relying on foundation models trained with noisy real-world pseudo-labels, often suffer from boundary collapse on thin surgical tools and transparent surfaces. In this work, we address this by leveraging the high-fidelity synthetic priors of the Depth Anything V2 architecture, which inherently captures precise geometric details of thin structures. We efficiently adapt these priors to the medical domain using Dynamic Vector Low-Rank Adaptation (DV-LORA), minimizing the parameter budget while bridging the synthetic-to-real gap. Additionally, we introduce a physically-stratified evaluation protocol on the SCARED dataset to rigorously quantify performance in high-specularity regimes often masked by aggregate metrics. Our approach establishes a new state-of-the-art, achieving an accuracy (< 1.25) of 98.1% and reducing Squared Relative Error by over 17% compared to established baselines, demonstrating superior robustness in adverse surgical lighting.
GrOMP: Grasped Object Manifold Projection for Multimodal Imitation Learning of Manipulation
Imitation Learning (IL) holds great potential for learning repetitive manipulation tasks, such as those in industrial assembly. However, its effectiveness is often limited by insufficient trajectory precision due to compounding errors. In this paper, we introduce Grasped Object Manifold Projection (GrOMP), an interactive method that mitigates these errors by constraining a non-rigidly grasped object to a lower-dimensional manifold. GrOMP assumes a precise task in which a manipulator holds an object that may shift within the grasp in an observable manner and must be mated with a grounded part. Crucially, all GrOMP enhancements are learned from the same expert dataset used to train the base IL policy, and are adjusted with an n-arm bandit-based interactive component. We propose a theoretical basis for GrOMP's improvement upon the well-known compounding error bound in IL literature. We demonstrate the framework on four precise assembly tasks using tactile feedback, and note that the approach remains modality-agnostic. Data and videos are available at williamvdb.github.io/GrOMPsite.
comment: 8 pages, 8 figures, 2 tables
Mechanically Programming the Cross-Sectional Shape of Soft Growing Robotic Structures for Patient Transfer
Pneumatic soft everting robotic structures have the potential to facilitate human transfer tasks due to their ability to grow underneath humans without sliding friction and their utility as a flexible sling when deflated. Tubular structures naturally yield circular cross-sections when inflated, whereas a robotic sling must be both thin enough to grow between them and their resting surface and wide enough to cradle the human. Recent works have achieved flattened cross-sections by including rigid components into the structure, but this reduces conformability to the human. We present a method of mechanically programming the cross-section of soft everting robotic structures using flexible strips that constrain radial expansion between points along the outer membrane. Our method enables simultaneously wide and thin profiles while maintaining the full multi-axis flexibility of traditional slings. We develop and validate a model relating the geometric design specifications to the fabrication parameters, and experimentally characterize their effects on growth rate. Finally, we prototype a soft growing robotic sling system and demonstrate its use for assisting a single caregiver in bed-to-chair patient transfer.
Gaussian Process Implicit Surfaces as Control Barrier Functions for Safe Robot Navigation
Level set methods underpin modern safety techniques such as control barrier functions (CBFs), while also serving as implicit surface representations for geometric shapes via distance fields. Inspired by these two paradigms, we propose a unified framework where the implicit surface itself acts as a CBF. We leverage Gaussian process (GP) implicit surface (GPIS) to represent the safety boundaries, using safety samples which are derived from sensor measurements to condition the GP. The GP posterior mean defines the implicit safety surface (safety belief), while the posterior variance provides a robust safety margin. Although GPs have favorable properties such as uncertainty estimation and analytical tractability, they scale cubically with data. To alleviate this issue, we develop a sparse solution called sparse Gaussian CBFs. To the best of our knowledge, GPIS have not been explicitly used to synthesize CBFs. We validate the approach on collision avoidance tasks in two settings: a simulated 7-DOF manipulator operating around the Stanford bunny, and a quadrotor navigating in 3D around a physical chair. In both cases, Gaussian CBFs (with and without sparsity) enable safe interaction and collision-free execution of trajectories that would otherwise intersect the objects.
comment: 8 pages, 7 figures, under review
Never too Cocky to Cooperate: An FIM and RL-based USV-AUV Collaborative System for Underwater Tasks in Extreme Sea Conditions
This paper develops a novel unmanned surface vehicle (USV)-autonomous underwater vehicle (AUV) collaborative system designed to enhance underwater task performance in extreme sea conditions. The system integrates a dual strategy: (1) high-precision multi-AUV localization enabled by Fisher information matrix-optimized USV path planning, and (2) reinforcement learning-based cooperative planning and control method for multi-AUV task execution. Extensive experimental evaluations in the underwater data collection task demonstrate the system's operational feasibility, with quantitative results showing significant performance improvements over baseline methods. The proposed system exhibits robust coordination capabilities between USV and AUVs while maintaining stability in extreme sea conditions. To facilitate reproducibility and community advancement, we provide an open-source simulation toolkit available at: https://github.com/360ZMEM/USV-AUV-colab .
comment: This paper has been accepted by IEEE Transactions on Mobile Computing
Anti-Slip AI-Driven Model-Free Control with Global Exponential Stability in Skid-Steering Robots IROS
Undesired lateral and longitudinal wheel slippage can disrupt a mobile robot's heading angle, traction, and, eventually, desired motion. This issue makes the robotization and accurate modeling of heavy-duty machinery very challenging because the application primarily involves off-road terrains, which are susceptible to uneven motion and severe slippage. As a step toward robotization in skid-steering heavy-duty robot (SSHDR), this paper aims to design an innovative robust model-free control system developed by neural networks to strongly stabilize the robot dynamics in the presence of a broad range of potential wheel slippages. Before the control design, the dynamics of the SSHDR are first investigated by mathematically incorporating slippage effects, assuming that all functional modeling terms of the system are unknown to the control system. Then, a novel tracking control framework to guarantee global exponential stability of the SSHDR is designed as follows: 1) the unknown modeling of wheel dynamics is approximated using radial basis function neural networks (RBFNNs); and 2) a new adaptive law is proposed to compensate for slippage effects and tune the weights of the RBFNNs online during execution. Simulation and experimental results verify the proposed tracking control performance of a 4,836 kg SSHDR operating on slippery terrain.
comment: This paper has been published in 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Contingency Model-based Control (CMC) for Communicationless Cooperative Collision Avoidance in Robot Swarms
Cooperative collision avoidance between robots in swarm operations remains an open challenge. Assuming a decentralized architecture, each robot is responsible for making its own control decisions, including motion planning. To this end, most existing approaches mostly rely some form of (wireless) communication between the agents of the swarm. In reality, however, communication is brittle. It may be affected by latency, further delays and packet losses, transmission faults, and is subject to adversarial attacks, such as jamming or spoofing. This paper proposes Contingency Model-based Control (CMC) as a communicationless alternative. It follows the implicit cooperation paradigm, under which the design of the robots is based on consensual (offline) rules, similar to traffic rules. They include the definition of a contingency trajectory for each robot, and a method for construction of mutual collision avoidance constraints. The setup is shown to guarantee the recursive feasibility and collision avoidance between all swarm members in closed-loop operation. Moreover, CMC naturally satisfies the Plug \& Play paradigm, i.e., for new robots entering the swarm. Two numerical examples demonstrate that the collision avoidance guarantee is intact and that the robot swarm operates smoothly under the CMC regime.
Think, Act, Learn: A Framework for Autonomous Robotic Agents using Closed-Loop Large Language Models
The integration of Large Language Models (LLMs) into robotics has unlocked unprecedented capabilities in high-level task planning. However, most current systems operate in an open-loop fashion, where LLMs act as one-shot planners, rendering them brittle and unable to adapt to unforeseen circumstances in dynamic physical environments. To overcome this limitation, this paper introduces the "Think, Act, Learn" (T-A-L) framework, a novel architecture that enables an embodied agent to autonomously learn and refine its policies through continuous interaction. Our framework establishes a closed-loop cycle where an LLM first "thinks" by decomposing high-level commands into actionable plans. The robot then "acts" by executing these plans while gathering rich, multimodal sensory feedback. Critically, the "learn" module processes this feedback to facilitate LLM-driven self-reflection, allowing the agent to perform causal analysis on its failures and generate corrective strategies. These insights are stored in an experiential memory to guide future planning cycles. We demonstrate through extensive experiments in both simulation and the real world that our T-A-L agent significantly outperforms baseline methods, including open-loop LLMs, Behavioral Cloning, and traditional Reinforcement Learning. Our framework achieves over a 97% success rate on complex, long-horizon tasks, converges to a stable policy in an average of just 9 trials, and exhibits remarkable generalization to unseen tasks. This work presents a significant step towards developing more robust, adaptive, and truly autonomous robotic agents.
comment: 13 pages, 7 figures
ReSemAct: Advancing Fine-Grained Robotic Manipulation via Semantic Structuring and Affordance Refinement
Fine-grained robotic manipulation requires grounding natural language into appropriate affordance targets. However, most existing methods driven by foundation models often compress rich semantics into oversimplified affordances, preventing exploitation of implicit semantic information. To address these challenges, we present ReSemAct, a novel unified manipulation framework that introduces Semantic Structuring and Affordance Refinement (SSAR), powered by the automated synergistic reasoning between Multimodal Large Language Models (MLLMs) and Vision Foundation Models (VFMs). Specifically, the Semantic Structuring module derives a unified semantic affordance description from natural language and RGB observations, organizing affordance regions, implicit functional intent, and coarse affordance anchors into a structured representation for downstream refinement. Building upon this specification, the Affordance Refinement strategy instantiates two complementary flows that separately specialize geometry and position, yielding fine-grained affordance targets. These refined targets are then encoded as real-time joint-space optimization objectives, enabling reactive and robust manipulation in dynamic environments. Extensive simulation and real-world experiments are conducted in semantically rich household and sparse chemical lab environments. The results demonstrate that ReSemAct performs diverse tasks under zero-shot conditions, showcasing the robustness of SSAR with foundation models in fine-grained manipulation. Code and videos at https://github.com/scy-v/ReSemAct and https://resemact.github.io.
comment: Code and videos: https://github.com/scy-v/ReSemAct and https://resemact.github.io
LidarDM: Generative LiDAR Simulation in a Generated World
We present LidarDM, a novel LiDAR generative model capable of producing realistic, layout-aware, physically plausible, and temporally coherent LiDAR videos. LidarDM stands out with two unprecedented capabilities in LiDAR generative modeling: (i) LiDAR generation guided by driving scenarios, offering significant potential for autonomous driving simulations, and (ii) 4D LiDAR point cloud generation, enabling the creation of realistic and temporally coherent sequences. At the heart of our model is a novel integrated 4D world generation framework. Specifically, we employ latent diffusion models to generate the 3D scene, combine it with dynamic actors to form the underlying 4D world, and subsequently produce realistic sensory observations within this virtual environment. Our experiments indicate that our approach outperforms competing algorithms in realism, temporal coherency, and layout consistency. We additionally show that LidarDM can be used as a generative world model simulator for training and testing perception models.
UniTacHand: Unified Spatio-Tactile Representation for Human to Robotic Hand Skill Transfer
Tactile sensing is crucial for robotic hands to achieve human-level dexterous manipulation, especially in scenarios with visual occlusion. However, its application is often hindered by the difficulty of collecting large-scale real-world robotic tactile data. In this study, we propose to collect low-cost human manipulation data using haptic gloves for tactile-based robotic policy learning. The misalignment between human and robotic tactile data makes it challenging to transfer policies learned from human data to robots. To bridge this gap, we propose UniTacHand, a unified representation to align robotic tactile information captured by dexterous hands with human hand touch obtained from gloves. First, we project tactile signals from both human hands and robotic hands onto a morphologically consistent 2D surface space of the MANO hand model. This unification standardizes the heterogeneous data structures and inherently embeds the tactile signals with spatial context. Then, we introduce a contrastive learning method to align them into a unified latent space, trained on only 10 minutes of paired data from our data collection system. Our approach enables zero-shot tactile-based policy transfer from humans to a real robot, generalizing to objects unseen in the pre-training data. We also demonstrate that co-training on mixed data, including both human and robotic demonstrations via UniTacHand, yields better performance and data efficiency compared with using only robotic data. UniTacHand paves a path toward general, scalable, and data-efficient learning for tactile-based dexterous hands.
comment: The first two authors contributed equally
Developing a Fundamental Diagram for Urban Air Mobility Based on Physical Experiments
Urban Air Mobility (UAM) is an emerging application of unmanned aerial vehicles (UAVs) that promises to reduce travel time and alleviate congestion in urban transportation systems. As drone density increases, UAM operations are expected to experience congestion similar to that in ground traffic. However, the fundamental characteristics of UAM traffic flow, particularly under real-world operating conditions, remain poorly understood. This study proposes a general framework for constructing the fundamental diagram (FD) of UAM traffic by integrating theoretical analysis with physical experiments. To the best of our knowledge, this is the first study to derive a UAM FD using real-world physical test data. On the theoretical side, we design two drone control laws for collision avoidance and develop simulation-based traffic generation methods to produce diverse UAM traffic scenarios. Based on Edie's definition, traffic flow theory is then applied to construct the FD and characterize the macroscopic properties of UAM traffic. To account for real-world disturbances and modeling uncertainties, we further conduct physical experiments on a reduced-scale testbed using Bitcraze Crazyflie drones. Both simulation and physical test trajectory data are collected and organized into the UAMTra2Flow dataset, which is analyzed using the proposed framework. Preliminary results indicate that classical FD structures for ground transportation are also applicable to UAM systems. Notably, FD curves obtained from physical experiments exhibit deviations from simulation-based results, highlighting the importance of experimental validation. Finally, results from the reduced-scale testbed are scaled to realistic operating conditions to provide practical insights for future UAM traffic systems. The dataset and code for this paper are publicly available at https://github.com/CATS-Lab/UAM-FD.
CHARM: Considering Human Attributes for Reinforcement Modeling
Reinforcement Learning from Human Feedback has recently achieved significant success in various fields, and its performance is highly related to feedback quality. While much prior work acknowledged that human teachers' characteristics would affect human feedback patterns, there is little work that has closely investigated the actual effects. In this work, we designed an exploratory study investigating how human feedback patterns are associated with human characteristics. We conducted a public space study with two long horizon tasks and 46 participants. We found that feedback patterns are not only correlated with task statistics, such as rewards, but also correlated with participants' characteristics, especially robot experience and educational background. Additionally, we demonstrated that human feedback value can be more accurately predicted with human characteristics compared to only using task statistics. All human feedback and characteristics we collected, and codes for our data collection and predicting more accurate human feedback are available at https://github.com/AABL-Lab/CHARM
How Much Progress Did I Make? An Unexplored Human Feedback Signal for Teaching Robots
Enhancing the expressiveness of human teaching is vital for both improving robots' learning from humans and the human-teaching-robot experience. In this work, we characterize and test a little-used teaching signal: \textit{progress}, designed to represent the completion percentage of a task. We conducted two online studies with 76 crowd-sourced participants and one public space study with 40 non-expert participants to validate the capability of this progress signal. We find that progress indicates whether the task is successfully performed, reflects the degree of task completion, identifies unproductive but harmless behaviors, and is likely to be more consistent across participants. Furthermore, our results show that giving progress does not require extra workload and time. An additional contribution of our work is a dataset of 40 non-expert demonstrations from the public space study through an ice cream topping-adding task, which we observe to be multi-policy and sub-optimal, with sub-optimality not only from teleoperation errors but also from exploratory actions and attempts. The dataset is available at https://github.com/TeachingwithProgress/Non-Expert\_Demonstrations.
comment: 8 pages. RO-MAN 2024
Model-free source seeking of exponentially convergent unicycle: theoretical and robotic experimental results
This paper introduces a novel model-free, real-time unicycle-based source seeking design. This design autonomously steers the unicycle dynamic system towards the extremum point of an objective function or physical/scalar signal that is unknown expression-wise, but accessible via measurements. A key contribution of this paper is that the introduced design converges exponentially to the extremum point of objective functions (or scalar signals) that behave locally like a higher-degree power function (e.g., fourth-degree polynomial function) as opposed to locally quadratic objective functions, the usual case in literature. We provide theoretical results and design characterization, supported by a variety of simulation results that demonstrate the robustness of the proposed design, including cases with different initial conditions and measurement delays/noise. Also, for the first time in the literature, we provide experimental robotic results that demonstrate the effectiveness of the proposed design and its exponential convergence ability.
MambaIO: Global-Coordinate Inertial Odometry for Pedestrians via Multi-Scale Frequency-Decoupled Modeling
Inertial Odometry (IO) enables real-time localization using only acceleration and angular velocity measurements from an Inertial Measurement Unit (IMU), making it a promising solution for localization in consumer-grade applications. Traditionally, researchers have routinely transformed IMU measurements into the global frame to obtain smoother motion representations. However, recent studies in drone scenarios have demonstrated that the body frame can significantly improve localization accuracy, prompting a re-evaluation of the suitability of the global frame for pedestrian IO. To address this issue, this paper systematically evaluates the effectiveness of the global frame in pedestrian IO through theoretical analysis, qualitative inspection, and quantitative experiments. Building upon these findings, we further propose MambaIO, which decomposes IMU measurements into high-frequency and low-frequency components using a Laplacian pyramid. The low-frequency component is processed by a Mamba architecture to extract implicit contextual motion cues, while the high-frequency component is handled by a convolutional structure to capture fine-grained local motion details. Experiments on multiple public datasets show that MambaIO substantially reduces localization error and achieves state-of-the-art (SOTA) performance. To the best of our knowledge, this is the first application of the Mamba architecture to the IO task.
Enhancing Fatigue Detection through Heterogeneous Multi-Source Data Integration and Cross-Domain Modality Imputation
Fatigue detection for human operators plays a key role in safety critical applications such as aviation, mining, and long haul transport. While numerous studies have demonstrated the effectiveness of high fidelity sensors in controlled laboratory environments, their performance often degrades when ported to real world settings due to noise, lighting conditions, and field of view constraints, thereby limiting their practicality. This paper formalizes a deployment oriented setting for real world fatigue detection, where high quality sensors are often unavailable in practical applications. To address this challenge, we propose leveraging knowledge from heterogeneous source domains, including high fidelity sensors that are difficult to deploy in the field but commonly used in controlled environments, to assist fatigue detection in the real world target domain. Building on this idea, we design a heterogeneous and multiple source fatigue detection framework that adaptively utilizes the available modalities in the target domain while exploiting diverse configurations in the source domains through alignment across domains and modality imputation. Our experiments, conducted using a field deployed sensor setup and two publicly available human fatigue datasets, demonstrate the practicality, robustness, and improved generalization of our approach across subjects and domains. The proposed method achieves consistent gains over strong baselines in sensor constrained scenarios. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
comment: 4figures,14pages
Towards autonomous photogrammetric forest inventory using a lightweight under-canopy robotic drone
Drones are increasingly used in forestry to capture high-resolution remote sensing data, supporting enhanced monitoring, assessment, and decision-making processes. While operations above the forest canopy are already highly automated, flying inside forests remains challenging, primarily relying on manual piloting. In dense forests, relying on the Global Navigation Satellite System (GNSS) for localization is not feasible. In addition, the drone must autonomously adjust its flight path to avoid collisions. Recently, advancements in robotics have enabled autonomous drone flights in GNSS-denied obstacle-rich areas. In this article, a step towards autonomous forest data collection is taken by building a prototype of a robotic under-canopy drone utilizing state-of-the-art open source methods and validating its performance for data collection inside forests. Specifically, the study focused on camera-based autonomous flight under the forest canopy and photogrammetric post-processing of the data collected with the low-cost onboard stereo camera. The autonomous flight capability of the prototype was evaluated through multiple test flights in boreal forests. The tree parameter estimation capability was studied by performing diameter at breast height (DBH) estimation. The prototype successfully carried out flights in selected challenging forest environments, and the experiments showed promising performance in forest 3D modeling with a miniaturized stereoscopic photogrammetric system. The DBH estimation achieved a root mean square error (RMSE) of 3.33 - 3.97 cm (10.69 - 12.98 %) across all trees. For trees with a DBH less than 30 cm, the RMSE was 1.16 - 2.56 cm (5.74 - 12.47 %). The results provide valuable insights into autonomous under-canopy forest mapping and highlight the critical next steps for advancing lightweight robotic drone systems for mapping complex forest environments.
comment: This is an Accepted Manuscript of an article published by Taylor & Francis in International Journal of Remote Sensing on 03 Dec 2025. The published final Version of Record is openly available at: https://doi.org/10.1080/01431161.2025.2579803
Robust Robotic Exploration and Mapping Using Generative Occupancy Map Synthesis
We present a novel approach for enhancing robotic exploration by using generative occupancy mapping. We implement SceneSense, a diffusion model designed and trained for predicting 3D occupancy maps given partial observations. Our proposed approach probabilistically fuses these predictions into a running occupancy map in real-time, resulting in significant improvements in map quality and traversability. We deploy SceneSense on a quadruped robot and validate its performance with real-world experiments to demonstrate the effectiveness of the model. In these experiments we show that occupancy maps enhanced with SceneSense predictions better estimate the distribution of our fully observed ground truth data ($24.44\%$ FID improvement around the robot and $75.59\%$ improvement at range). We additionally show that integrating SceneSense enhanced maps into our robotic exploration stack as a ``drop-in'' map improvement, utilizing an existing off-the-shelf planner, results in improvements in robustness and traversability time. Finally, we show results of full exploration evaluations with our proposed system in two dissimilar environments and find that locally enhanced maps provide more consistent exploration results than maps constructed only from direct sensor measurements.
comment: arXiv admin note: text overlap with arXiv:2409.10681
Multiagent Systems
Nested Browser-Use Learning for Agentic Information Seeking
Information-seeking (IS) agents have achieved strong performance across a range of wide and deep search tasks, yet their tool use remains largely restricted to API-level snippet retrieval and URL-based page fetching, limiting access to the richer information available through real browsing. While full browser interaction could unlock deeper capabilities, its fine-grained control and verbose page content returns introduce substantial complexity for ReAct-style function-calling agents. To bridge this gap, we propose Nested Browser-Use Learning (NestBrowse), which introduces a minimal and complete browser-action framework that decouples interaction control from page exploration through a nested structure. This design simplifies agentic reasoning while enabling effective deep-web information acquisition. Empirical results on challenging deep IS benchmarks demonstrate that NestBrowse offers clear benefits in practice. Further in-depth analyses underscore its efficiency and flexibility.
Assessing behaviour coverage in a multi-agent system simulation for autonomous vehicle testing
As autonomous vehicle technology advances, ensuring the safety and reliability of these systems becomes paramount. Consequently, comprehensive testing methodologies are essential to evaluate the performance of autonomous vehicles in diverse and complex real-world scenarios. This study focuses on the behaviour coverage analysis of a multi-agent system simulation designed for autonomous vehicle testing, and provides a systematic approach to measure and assess behaviour coverage within the simulation environment. By defining a set of driving scenarios, and agent interactions, we evaluate the extent to which the simulation encompasses a broad range of behaviours relevant to autonomous driving. Our findings highlight the importance of behaviour coverage in validating the effectiveness and robustness of autonomous vehicle systems. Through the analysis of behaviour coverage metrics and coverage-based testing, we identify key areas for improvement and optimization in the simulation framework. Thus, a Model Predictive Control (MPC) pedestrian agent is proposed, where its objective function is formulated to encourage \textit{interesting} tests while promoting a more realistic behaviour than other previously studied pedestrian agents. This research contributes to advancing the field of autonomous vehicle testing by providing insights into the comprehensive evaluation of system behaviour in simulated environments. The results offer valuable implications for enhancing the safety, reliability, and performance of autonomous vehicles through rigorous testing methodologies.
Optimal Scalability-Aware Allocation of Swarm Robots: From Linear to Retrograde Performance via Marginal Gains
In collective systems, the available agents are a limited resource that must be allocated among tasks to maximize collective performance. Computing the optimal allocation of several agents to numerous tasks through a brute-force approach can be infeasible, especially when each task's performance scales differently with the increase of agents. For example, difficult tasks may require more agents to achieve similar performances compared to simpler tasks, but performance may saturate nonlinearly as the number of allocated agents increases. We propose a computationally efficient algorithm, based on marginal performance gains, for optimally allocating agents to tasks with concave scalability functions, including linear, saturating, and retrograde scaling, to achieve maximum collective performance. We test the algorithm by allocating a simulated robot swarm among collective decision-making tasks, where embodied agents sample their environment and exchange information to reach a consensus on spatially distributed environmental features. We vary task difficulties by different geometrical arrangements of environmental features in space (patchiness). In this scenario, decision performance in each task scales either as a saturating curve (following the Condorcet's Jury Theorem in an interference-free setup) or as a retrograde curve (when physical interference among robots restricts their movement). Using simple robot simulations, we show that our algorithm can be useful in allocating robots among tasks. Our approach aims to advance the deployment of future real-world multi-robot systems.
comment: 14 pages, 11 figures, Accepted for publication in IEEE Transactions on Systems, Man, and Cybernetics: Systems
The Law of Multi-Model Collaboration: Scaling Limits of Model Ensembling for Large Language Models
Recent advances in large language models (LLMs) have been largely driven by scaling laws for individual models, which predict performance improvements as model parameters and data volume increase. However, the capabilities of any single LLM are inherently bounded. One solution originates from intricate interactions among multiple LLMs, rendering their collective performance surpasses that of any constituent model. Despite the rapid proliferation of multi-model integration techniques such as model routing and post-hoc ensembling, a unifying theoretical framework of performance scaling for multi-model collaboration remains absent. In this work, we propose the Law of Multi-model Collaboration, a scaling law that predicts the performance limits of LLM ensembles based on their aggregated parameter budget. To quantify the intrinsic upper bound of multi-model collaboration, we adopt a method-agnostic formulation and assume an idealized integration oracle where the total cross-entropy loss of each sample is determined by the minimum loss of any model in the model pool. Experimental results reveal that multi-model systems follow a power-law scaling with respect to the total parameter count, exhibiting a more significant improvement trend and a lower theoretical loss floor compared to single model scaling. Moreover, ensembles of heterogeneous model families achieve better performance scaling than those formed within a single model family, indicating that model diversity is a primary driver of collaboration gains. These findings suggest that model collaboration represents a critical axis for extending the intelligence frontier of LLMs.
KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta
Making deep learning recommendation model (DLRM) training and inference fast and efficient is important. However, this presents three key system challenges - model architecture diversity, kernel primitive diversity, and hardware generation and architecture heterogeneity. This paper presents KernelEvolve-an agentic kernel coding framework-to tackle heterogeneity at-scale for DLRM. KernelEvolve is designed to take kernel specifications as input and automate the process of kernel generation and optimization for recommendation model across heterogeneous hardware architectures. KernelEvolve does so by operating at multiple programming abstractions, from Triton and CuTe DSL to low-level hardware agnostic languages, spanning the full hardware-software optimization stack. The kernel optimization process is described as graph-based search with selection policy, universal operator, fitness function, and termination rule, dynamically adapts to runtime execution context through retrieval-augmented prompt synthesis. We designed, implemented, and deployed KernelEvolve to optimize a wide variety of production recommendation models across generations of NVIDIA and AMD GPUs, as well as Meta's AI accelerators. We validate KernelEvolve on the publicly-available KernelBench suite, achieving 100% pass rate on all 250 problems across three difficulty levels, and 160 PyTorch ATen operators across three heterogeneous hardware platforms, demonstrating 100% correctness. KernelEvolve reduces development time from weeks to hours and achieves substantial performance improvements over PyTorch baselines across diverse production use cases and for heterogeneous AI systems at-scale. Beyond performance efficiency improvements, KernelEvolve significantly mitigates the programmability barrier for new AI hardware by enabling automated kernel generation for in-house developed AI hardware.
SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search
Large Language Models (LLMs) often falter at complex planning tasks that require exploration and self-correction, as their linear reasoning process struggles to recover from early mistakes. While search algorithms like Monte Carlo Tree Search (MCTS) can explore alternatives, they are often ineffective when guided by sparse rewards and fail to leverage the rich semantic capabilities of LLMs. We introduce SPIRAL (Symbolic LLM Planning via Grounded and Reflective Search), a novel framework that embeds a cognitive architecture of three specialized LLM agents into an MCTS loop. SPIRAL's key contribution is its integrated planning pipeline where a Planner proposes creative next steps, a Simulator grounds the search by predicting realistic outcomes, and a Critic provides dense reward signals through reflection. This synergy transforms MCTS from a brute-force search into a guided, self-correcting reasoning process. On the DailyLifeAPIs and HuggingFace datasets, SPIRAL consistently outperforms the default Chain-of-Thought planning method and other state-of-the-art agents. More importantly, it substantially surpasses other state-of-the-art agents; for example, SPIRAL achieves 83.6% overall accuracy on DailyLifeAPIs, an improvement of over 16 percentage points against the next-best search framework, while also demonstrating superior token efficiency. Our work demonstrates that structuring LLM reasoning as a guided, reflective, and grounded search process yields more robust and efficient autonomous planners. The source code, full appendices, and all experimental data are available for reproducibility at the official project repository.
Multi-Agent Framework for Threat Mitigation and Resilience in AI-Based Systems
Machine learning (ML) underpins foundation models in finance, healthcare, and critical infrastructure, making them targets for data poisoning, model extraction, prompt injection, automated jailbreaking, and preference-guided black-box attacks that exploit model comparisons. Larger models can be more vulnerable to introspection-driven jailbreaks and cross-modal manipulation. Traditional cybersecurity lacks ML-specific threat modeling for foundation, multimodal, and RAG systems. Objective: Characterize ML security risks by identifying dominant TTPs, vulnerabilities, and targeted lifecycle stages. Methods: We extract 93 threats from MITRE ATLAS (26), AI Incident Database (12), and literature (55), and analyze 854 GitHub/Python repositories. A multi-agent RAG system (ChatGPT-4o, temp 0.4) mines 300+ articles to build an ontology-driven threat graph linking TTPs, vulnerabilities, and stages. Results: We identify unreported threats including commercial LLM API model stealing, parameter memorization leakage, and preference-guided text-only jailbreaks. Dominant TTPs include MASTERKEY-style jailbreaking, federated poisoning, diffusion backdoors, and preference optimization leakage, mainly impacting pre-training and inference. Graph analysis reveals dense vulnerability clusters in libraries with poor patch propagation. Conclusion: Adaptive, ML-specific security frameworks, combining dependency hygiene, threat intelligence, and monitoring, are essential to mitigate supply-chain and inference risks across the ML lifecycle.
comment: 56 pages, 18 Figures, 22 Tables, TOSEM
It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents
Web-based agents powered by large language models are increasingly used for tasks such as email management or professional networking. Their reliance on dynamic web content, however, makes them vulnerable to prompt injection attacks: adversarial instructions hidden in interface elements that persuade the agent to divert from its original task. We introduce the Task-Redirecting Agent Persuasion Benchmark (TRAP), an evaluation for studying how persuasion techniques misguide autonomous web agents on realistic tasks. Across six frontier models, agents are susceptible to prompt injection in 25\% of tasks on average (13\% for GPT-5 to 43\% for DeepSeek-R1), with small interface or contextual changes often doubling success rates and revealing systemic, psychologically driven vulnerabilities in web-based agents. We also provide a modular social-engineering injection framework with controlled experiments on high-fidelity website clones, allowing for further benchmark expansion.
Zero-Trust Agentic Federated Learning for Secure IIoT Defense Systems
Recent attacks on critical infrastructure, including the 2021 Oldsmar water treatment breach and 2023 Danish energy sector compromises, highlight urgent security gaps in Industrial IoT (IIoT) deployments. While Federated Learning (FL) enables privacy-preserving collaborative intrusion detection, existing frameworks remain vulnerable to Byzantine poisoning attacks and lack robust agent authentication. We propose Zero-Trust Agentic Federated Learning (ZTA-FL), a defense in depth framework combining: (1) TPM-based cryptographic attestation achieving less than 0.0000001 false acceptance rate, (2) a novel SHAP-weighted aggregation algorithm providing explainable Byzantine detection under non-IID conditions with theoretical guarantees, and (3) privacy-preserving on-device adversarial training. Comprehensive experiments across three IDS benchmarks (Edge-IIoTset, CIC-IDS2017, UNSW-NB15) demonstrate that ZTA-FL achieves 97.8 percent detection accuracy, 93.2 percent accuracy under 30 percent Byzantine attacks (outperforming FLAME by 3.1 percent, p less than 0.01), and 89.3 percent adversarial robustness while reducing communication overhead by 34 percent. We provide theoretical analysis, failure mode characterization, and release code for reproducibility.
comment: 9 Pages and 6 figures, Submitted in conference 2nd IEEE Conference on Secure and Trustworthy Cyber Infrastructure for IoT and Microelectronics, Houston TX, USA
Developing a Fundamental Diagram for Urban Air Mobility Based on Physical Experiments
Urban Air Mobility (UAM) is an emerging application of unmanned aerial vehicles (UAVs) that promises to reduce travel time and alleviate congestion in urban transportation systems. As drone density increases, UAM operations are expected to experience congestion similar to that in ground traffic. However, the fundamental characteristics of UAM traffic flow, particularly under real-world operating conditions, remain poorly understood. This study proposes a general framework for constructing the fundamental diagram (FD) of UAM traffic by integrating theoretical analysis with physical experiments. To the best of our knowledge, this is the first study to derive a UAM FD using real-world physical test data. On the theoretical side, we design two drone control laws for collision avoidance and develop simulation-based traffic generation methods to produce diverse UAM traffic scenarios. Based on Edie's definition, traffic flow theory is then applied to construct the FD and characterize the macroscopic properties of UAM traffic. To account for real-world disturbances and modeling uncertainties, we further conduct physical experiments on a reduced-scale testbed using Bitcraze Crazyflie drones. Both simulation and physical test trajectory data are collected and organized into the UAMTra2Flow dataset, which is analyzed using the proposed framework. Preliminary results indicate that classical FD structures for ground transportation are also applicable to UAM systems. Notably, FD curves obtained from physical experiments exhibit deviations from simulation-based results, highlighting the importance of experimental validation. Finally, results from the reduced-scale testbed are scaled to realistic operating conditions to provide practical insights for future UAM traffic systems. The dataset and code for this paper are publicly available at https://github.com/CATS-Lab/UAM-FD.
Systems and Control (EESS)
A Review of Community-Centric Power Systems Resilience Assessment and Enhancement Strategies
This paper presents a comprehensive review of resilience metrics, covering both engineering-based measures, such as fragility-curve modeling, and data-driven approaches, including triangular and trapezoidal representations. Next, the paper examines the interdependencies between power systems resilience and community resilience, addressing socioeconomic and behavioral dimensions, infrastructure interconnections, and the emerging role of resilience hubs. The review then synthesizes state-of-the-art strategies for enhancing power system resilience, including network hardening, resource allocation, optimal scheduling, and reconfiguration techniques. Special emphasis is placed on the integration of Artificial Intelligence (AI) methods and the techno-legal dimensions of resilient power systems and communities. In particular, the paper contrasts the regulatory landscapes of the European Union and the United States, highlighting key similarities and distinctions. By analyzing methodologies for mitigating the impacts of high-impact, low-probability (HILP) events, the review identifies critical research gaps and outlines promising directions for future investigation.
comment: This paper is under review at an Elsevier journal. Revisions may be made in response to peer review
NashOpt -- A Python Library for Computing Generalized Nash Equilibria
NashOpt is an open-source Python library for computing and designing generalized Nash equilibria (GNEs) in noncooperative games with shared constraints and real-valued decision variables. The library exploits the joint Karush-Kuhn-Tucker (KKT) conditions of all players to handle both general nonlinear GNEs and linear-quadratic games, including their variational versions. Nonlinear games are solved via nonlinear least-squares formulations, relying on JAX for automatic differentiation. Linear-quadratic GNEs are reformulated as mixed-integer linear programs, enabling efficient computation of multiple equilibria. The framework also supports inverse-game and Stackelberg game-design problems. The capabilities of NashOpt are demonstrated through several examples, including noncooperative game-theoretic control problems of linear quadratic regulation and model predictive control. The library is available at https://github.com/bemporad/nashopt
comment: 23 pages, 6 figures
Unsupervised Learning for Detection of Rare Driving Scenarios
The detection of rare and hazardous driving scenarios is a critical challenge for ensuring the safety and reliability of autonomous systems. This research explores an unsupervised learning framework for detecting rare and extreme driving scenarios using naturalistic driving data (NDD). We leverage the recently proposed Deep Isolation Forest (DIF), an anomaly detection algorithm that combines neural network-based feature representations with Isolation Forests (IFs), to identify non-linear and complex anomalies. Data from perception modules, capturing vehicle dynamics and environmental conditions, is preprocessed into structured statistical features extracted from sliding windows. The framework incorporates t-distributed stochastic neighbor embedding (t-SNE) for dimensionality reduction and visualization, enabling better interpretability of detected anomalies. Evaluation is conducted using a proxy ground truth, combining quantitative metrics with qualitative video frame inspection. Our results demonstrate that the proposed approach effectively identifies rare and hazardous driving scenarios, providing a scalable solution for anomaly detection in autonomous driving systems. Given the study's methodology, it was unavoidable to depend on proxy ground truth and manually defined feature combinations, which do not encompass the full range of real-world driving anomalies or their nuanced contextual dependencies.
Sidelobe Modification for an Offset Gregorian Reflector System using a Reconfigurable Intelligent Surface-Equipped Subreflector
In past work, we described the use of a reconfigurable intelligent surface (RIS) mounted on the rim of an axisymmetric prime focus-fed reflector to create nulls in the close-in sidelobes. In this paper, we show that similar performance is possible in an offset Gregorian reflector system using a RIS on the rim of the subreflector. Applications include radio astronomy, where offset Gregorian reflectors are common and observations are subject to deleterious levels of interference from satellites entering through sidelobes. We show that an efficient RIS replacing the outer one-third of the subreflector surface, employing passive elements with 1-bit phase-only control, can create a null in the peak of the second sidelobe in the quiescent pattern. This is achieved using a simple unconstrained optimization algorithm to set the states of the RIS elements. The algorithm yields a deep null with just 0.2~dB reduction in main lobe directivity, despite lacking any constraints on main lobe pattern. Compared to our previous approach of mounting the RIS on the rim of the main reflector, the subreflector-based approach demonstrated in this paper requires a much smaller RIS and can implemented in existing systems by replacing the subreflector.
comment: 4 pages, 3 figures
Robust Deep Learning Control with Guaranteed Performance for Safe and Reliable Robotization in Heavy-Duty Machinery
Today's heavy-duty mobile machines (HDMMs) face two transitions: from diesel-hydraulic actuation to clean electric systems driven by climate goals, and from human supervision toward greater autonomy. Diesel-hydraulic systems have long dominated, so full electrification, via direct replacement or redesign, raises major technical and economic challenges. Although advanced artificial intelligence (AI) could enable higher autonomy, adoption in HDMMs is limited by strict safety requirements, and these machines still rely heavily on human supervision. This dissertation develops a control framework that (1) simplifies control design for electrified HDMMs through a generic modular approach that is energy-source independent and supports future modifications, and (2) defines hierarchical control policies that partially integrate AI while guaranteeing safety-defined performance and stability. Five research questions align with three lines of investigation: a generic robust control strategy for multi-body HDMMs with strong stability across actuation types and energy sources; control solutions that keep strict performance under uncertainty and faults while balancing robustness and responsiveness; and methods to interpret and trust black-box learning strategies so they can be integrated stably and verified against international safety standards. The framework is validated in three case studies spanning different actuators and conditions, covering heavy-duty mobile robots and robotic manipulators. Results appear in five peer-reviewed publications and one unpublished manuscript, advancing nonlinear control and robotics and supporting both transitions.
comment: Doctoral Dissertation, Tampere University
Control Co-design of systems with parabolic partial differential equation dynamics
In this paper we study the control co-design (CCD) synthesis problem for a class of systems with parabolic partial differential equation (PDE) dynamics. We formulate CCD problem and finally derive an approximate CCD problem with matrix algebraic constraint. We then solve this approximate problem with gradient-based method and prove that the optimal solution also stabilizes the PDE system. We justify approach through numerical examples.
Agentic AI-Enhanced Semantic Communications: Foundations, Architecture, and Applications
Semantic communications (SemCom), as one of the key technologies for 6G, is shifting networks from bit transmission to semantic information exchange. On this basis, introducing agentic artificial intelligence (AI) with perception, memory, reasoning, and action capabilities provides a practicable path to intelligent communications. This paper provides a systematic exposition of how agentic AI empowers SemCom from the perspectives of research foundations, system architecture, and application scenarios. We first provide a comprehensive review of existing studies by agent types, covering embedded agents, large language model (LLM)/large vision model (LVM) agents, and reinforcement learning (RL) agents. Additionally, we propose a unified agentic AI-enhanced SemCom framework covering the application layer, the semantic layer, and the cloud-edge collaboration layer, forming a closed loop from intent to encoding to transmission to decoding to action to evaluation. We also present several typical scenarios, including multi-vehicle collaborative perception, multi-robot cooperative rescue, and agentic operations for intellicise (intelligent and concise) networks. Furthermore, we introduce an agentic knowledge base (KB)-based joint source-channel coding case study, AKB-JSCC, where the source KB and channel KB are built by LLM/LVM agents and RL agents, respectively. Experimental results show that AKB-JSCC achieves higher information reconstruction quality under different channel conditions. Finally, we discuss future evolution and research directions, providing a reference for portable, verifiable, and controllable research and deployment of agentic SemCom.
Revealing design archetypes and flexibility in e-molecule import pathways using Modeling to Generate Alternatives and interpretable machine learning
Given the central role of green e-molecule imports in the European energy transition, many studies optimize import pathways and identify a single cost-optimal solution. However, cost optimality is fragile, as real-world implementation depends on regulatory, spatial, and stakeholder constraints that are difficult to represent in optimization models and can render cost-optimal designs infeasible. To address this limitation, we generate a diverse set of near-cost-optimal alternatives within an acceptable cost margin using Modeling to Generate Alternatives, accounting for unmodeled uncertainties. Interpretable machine learning is then applied to extract insights from the resulting solution space. The approach is applied to hydrogen import pathways considering hydrogen, ammonia, methane, and methanol as carriers. Results reveal a broad near-optimal space with great flexibility: solar, wind, and storage are not strictly required to remain within 10% of the cost optimum. Wind constraints favor solar-storage methanol pathways, while limited storage favors wind-based ammonia or methane pathways.
A Learning-Driven Stochastic Hybrid System Framework for Detecting Unobservable Contingencies in Power Systems
This paper presents a new learning based Stochastic Hybrid System (LSHS) framework designed for the detection and classification of contingencies in modern power systems. Unlike conventional monitoring schemes, the proposed approach is capable of identifying unobservable events that remain hidden from standard sensing infrastructures, such as undetected protection system malfunctions. The framework operates by analyzing deviations in system outputs and behaviors, which are then categorized into three groups: physical, control, and measurement contingencies based on their impact on the SHS model. The SHS model integrates both system dynamics and observer-driven state estimation error dynamics. Within this architecture, machine learning classifiers are employed to achieve rapid and accurate categorization of contingencies. The effectiveness of the method is demonstrated through simulations on the IEEE 5-bus and 30-bus systems, where results indicate substantial improvements in both detection speed and accuracy compared with existing approaches.
The Dawn of Agentic EDA: A Survey of Autonomous Digital Chip Design
This survey provides a comprehensive overview of the integration of Generative AI and Agentic AI within the field of Digital Electronic Design Automation (EDA). The paper first reviews the paradigmatic evolution from traditional Computer-Aided Design (CAD) to AI-assisted EDA (AI4EDA), and finally to the emerging AI-Native and Agentic design paradigms. We detail the application of these paradigms across the digital chip design flow, including the construction of agentic cognitive architectures based on multimodal foundation models, frontend RTL code generation and intelligent verification, and backend physical design featuring algorithmic innovations and tool orchestration. We validate these methodologies through integrated case studies, demonstrating practical viability from microarchitecture definition to GDSII. Special emphasis is placed on the potential for cross-stage feedback loops where agents utilize backend PPA metrics to autonomously refine frontend logic. Furthermore, this survey delves into the dual-faceted impact on security, covering novel adversarial risks, automated vulnerability repair, and privacy-preserving infrastructure. Finally, the paper critically summarizes current challenges related to hallucinations, data scarcity, and black-box tools, and outlines future trends towards L4 autonomous chip design. Ultimately, this work aims to define the emerging field of Agentic EDA and provide a strategic roadmap for the transition from AI-assisted tools to fully autonomous design engineers.
Multi-objective control strategy of Electro-Mechanical Transmission Based on Driving Pattern Division
Based on the driving requirement and power balance of heavy-duty vehicle equipped with Electro-Mechanical Transmission (EMT), optimization goals under different driving patterns are put forward. The optimization objectives are changed into a comprehensive optimization target based on the method of weighting, which is calculated by using analytic hierarchy process (AHP) under different working conditions. According to theory of Dynamic Programming (DP), a multi-object control strategy of DP under different driving patterns is proposed. This strategy is verified by simulation and contrasted with rule strategy, the results show that comprehensive performance is significantly enhanced, and the fuel economy is highly improved especially.
comment: 25pages 10figures
Learning-based data-enabled economic predictive control with convex optimization for nonlinear systems
In this article, we propose a data-enabled economic predictive control method for a class of nonlinear systems, which aims to optimize the economic operational performance while handling hard constraints on the system outputs. Two lifting functions are constructed via training neural networks, which generate mapped input and mapped output in a higher-dimensional space, where the nonlinear economic cost function can be approximated using a quadratic function of the mapped variables. The data-enabled predictive control framework is extended to address nonlinear dynamics by using the mapped input and the mapped output that belong to a virtual linear representation, which serves as an approximation of the original nonlinear system. Additionally, we reconstruct the system output variables from the mapped output, on which hard output constraints are imposed. The online control problem is formulated as a convex optimization problem, despite the nonlinearity of the system dynamics and the original economic cost function. Theoretical analysis is presented to justify the suitability of the proposed method for nonlinear systems. We evaluate the proposed method through two large-scale industrial case studies: (i) a biological water treatment process, and (ii) a solvent-based shipboard post-combustion carbon capture process. These studies demonstrate its effectiveness and advantages.
comment: 18 pages,7 figures,9 tables
Breaking Symmetry-Induced Degeneracy in Multi-Agent Ergodic Coverage via Stochastic Spectral Control
Multi-agent ergodic coverage via Spectral Multiscale Coverage (SMC) provides a principled framework for driving a team of agents so that their collective time-averaged trajectories match a prescribed spatial distribution. While classical SMC has demonstrated empirical success, it can suffer from gradient cancellation, particularly when agents are initialized near symmetry points of the target distribution, leading to undesirable behaviors such as stalling or motion constrained along symmetry axes. In this work, we rigorously characterize the initial conditions and symmetry-induced invariant manifolds that give rise to such directional degeneracy in first-order agent dynamics. To address this, we introduce a stochastic perturbation combined with a contraction term and prove that the resulting dynamics ensure almost-sure escape from zero-gradient manifolds while maintaining mean-square boundedness of agent trajectories. Simulations on symmetric multi-modal reference distributions demonstrate that the proposed stochastic SMC effectively mitigates transient stalling and axis-constrained motion, while ensuring that all agent trajectories remain bounded within the domain.
Distributed Beamforming in Massive MIMO Communication for a Constellation of Airborne Platform Stations
Non-terrestrial base stations (NTBSs), including high-altitude platform stations (HAPSs) and hot-air balloons (HABs), are integral to next-generation wireless networks, offering coverage in remote areas and enhancing capacity in dense regions. In this paper, we propose a distributed beamforming framework for a massive MIMO network with a constellation of aerial platform stations (APSs). Our approach leverages an entropy-based multi-agent deep reinforcement learning (DRL) model, where each APS operates as an independent agent using imperfect channel state information (CSI) in both training and testing phases. Unlike conventional methods, our model does not require CSI sharing among APSs, significantly reducing overhead. Simulations results demonstrate that our method outperforms zero forcing (ZF) and maximum ratio transmission (MRT) techniques, particularly in high-interference scenarios, while remaining robust to CSI imperfections. Additionally, our framework exhibits scalability, maintaining stable performance over an increasing number of users and various cluster configurations. Therefore, the proposed method holds promise for dynamic and interference-rich NTBS networks, advancing scalable and robust wireless solutions.
Gaussian Process Implicit Surfaces as Control Barrier Functions for Safe Robot Navigation
Level set methods underpin modern safety techniques such as control barrier functions (CBFs), while also serving as implicit surface representations for geometric shapes via distance fields. Inspired by these two paradigms, we propose a unified framework where the implicit surface itself acts as a CBF. We leverage Gaussian process (GP) implicit surface (GPIS) to represent the safety boundaries, using safety samples which are derived from sensor measurements to condition the GP. The GP posterior mean defines the implicit safety surface (safety belief), while the posterior variance provides a robust safety margin. Although GPs have favorable properties such as uncertainty estimation and analytical tractability, they scale cubically with data. To alleviate this issue, we develop a sparse solution called sparse Gaussian CBFs. To the best of our knowledge, GPIS have not been explicitly used to synthesize CBFs. We validate the approach on collision avoidance tasks in two settings: a simulated 7-DOF manipulator operating around the Stanford bunny, and a quadrotor navigating in 3D around a physical chair. In both cases, Gaussian CBFs (with and without sparsity) enable safe interaction and collision-free execution of trajectories that would otherwise intersect the objects.
comment: 8 pages, 7 figures, under review
Never too Cocky to Cooperate: An FIM and RL-based USV-AUV Collaborative System for Underwater Tasks in Extreme Sea Conditions
This paper develops a novel unmanned surface vehicle (USV)-autonomous underwater vehicle (AUV) collaborative system designed to enhance underwater task performance in extreme sea conditions. The system integrates a dual strategy: (1) high-precision multi-AUV localization enabled by Fisher information matrix-optimized USV path planning, and (2) reinforcement learning-based cooperative planning and control method for multi-AUV task execution. Extensive experimental evaluations in the underwater data collection task demonstrate the system's operational feasibility, with quantitative results showing significant performance improvements over baseline methods. The proposed system exhibits robust coordination capabilities between USV and AUVs while maintaining stability in extreme sea conditions. To facilitate reproducibility and community advancement, we provide an open-source simulation toolkit available at: https://github.com/360ZMEM/USV-AUV-colab .
comment: This paper has been accepted by IEEE Transactions on Mobile Computing
Anti-Slip AI-Driven Model-Free Control with Global Exponential Stability in Skid-Steering Robots IROS
Undesired lateral and longitudinal wheel slippage can disrupt a mobile robot's heading angle, traction, and, eventually, desired motion. This issue makes the robotization and accurate modeling of heavy-duty machinery very challenging because the application primarily involves off-road terrains, which are susceptible to uneven motion and severe slippage. As a step toward robotization in skid-steering heavy-duty robot (SSHDR), this paper aims to design an innovative robust model-free control system developed by neural networks to strongly stabilize the robot dynamics in the presence of a broad range of potential wheel slippages. Before the control design, the dynamics of the SSHDR are first investigated by mathematically incorporating slippage effects, assuming that all functional modeling terms of the system are unknown to the control system. Then, a novel tracking control framework to guarantee global exponential stability of the SSHDR is designed as follows: 1) the unknown modeling of wheel dynamics is approximated using radial basis function neural networks (RBFNNs); and 2) a new adaptive law is proposed to compensate for slippage effects and tune the weights of the RBFNNs online during execution. Simulation and experimental results verify the proposed tracking control performance of a 4,836 kg SSHDR operating on slippery terrain.
comment: This paper has been published in 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
EM++: A parameter learning framework for stochastic switching systems
This paper proposes a general switching dynamical system model, and a custom majorization-minimization-based algorithm EM++ for identifying its parameters. For certain families of distributions, such as Gaussian distributions, this algorithm reduces to the well-known expectation-maximization method. We prove global convergence of the algorithm under suitable assumptions, thus addressing an important open issue in the switching system identification literature. The effectiveness of both the proposed model and algorithm is validated through extensive numerical experiments.
Contingency Model-based Control (CMC) for Communicationless Cooperative Collision Avoidance in Robot Swarms
Cooperative collision avoidance between robots in swarm operations remains an open challenge. Assuming a decentralized architecture, each robot is responsible for making its own control decisions, including motion planning. To this end, most existing approaches mostly rely some form of (wireless) communication between the agents of the swarm. In reality, however, communication is brittle. It may be affected by latency, further delays and packet losses, transmission faults, and is subject to adversarial attacks, such as jamming or spoofing. This paper proposes Contingency Model-based Control (CMC) as a communicationless alternative. It follows the implicit cooperation paradigm, under which the design of the robots is based on consensual (offline) rules, similar to traffic rules. They include the definition of a contingency trajectory for each robot, and a method for construction of mutual collision avoidance constraints. The setup is shown to guarantee the recursive feasibility and collision avoidance between all swarm members in closed-loop operation. Moreover, CMC naturally satisfies the Plug \& Play paradigm, i.e., for new robots entering the swarm. Two numerical examples demonstrate that the collision avoidance guarantee is intact and that the robot swarm operates smoothly under the CMC regime.
Combined Learning of Linear Parameter-Varying Models and Robust Control Invariant Sets
Dynamical models identified from data are frequently employed in control system design. However, decoupling system identification from controller synthesis can result in situations where no suitable controller exists after a model has been identified. In this work, we introduce a novel control-oriented regularization in the identification procedure to ensure the existence of a controller that can enforce constraints on system variables robustly. The combined identification algorithm includes: (i) the concurrent learning of an uncertain model and a nominal model using an observer; (ii) a regularization term on the model parameters defined as the size of the largest robust control invariant set for the uncertain model. To make the learning problem tractable, we consider nonlinear models in quasi Linear Parameter-Varying (qLPV) form, utilizing a novel scheduling function parameterization that facilitates the derivation of an associated uncertain linear model. The robust control invariant set is represented as a polytope, and we adopt novel results from polytope geometry to derive the regularization function as the optimal value of a convex quadratic program. Additionally, we present new model-reduction approaches that exploit the chosen model structure. Numerical examples on classical identification benchmarks demonstrate the efficacy of our approach. A simple control scheme is also derived to provide an example of data-driven control of a constrained nonlinear system.
comment: 17 Pages, Corresponding code found on https://github.com/samku/qLPV_ concurrent_identification
Sequential Operation of Residential Energy Hubs using Physics-Based Economic Nonlinear MPC
The operation of residential energy hubs with multiple energy carriers (electricity, heat, mobility) poses a significant challenge due to different carrier dynamics, hybrid storage coordination and high-dimensional action-spaces. Energy management systems oversee their operation, deciding the set points of the primary control layer. This paper presents a novel 2-stage economic model predictive controller for electrified buildings including physics-based models of the battery degradation and thermal systems. The hierarchical control operates in the Dutch sequential energy markets. In particular common assumptions regarding intra-day markets (auction and continuous-time) are discussed as well as the coupling of the different storage systems. The best control policy it is best to follow continuous time intra-day in the summer and the intra-day auction in the winter. This sequential operation comes at the expense of increased battery degradation. Lastly, under our controller, the realized short-term flexibility of the thermal energy storage is marginal compared to the flexibility delivered by stationary battery pack and electric vehicles with bidirectional charging.
An Adaptive Environment-Aware Transformer Autoencoder for UAV-FSO with Dynamic Complexity Control
The rise of sixth-generation (6G) wireless networks sets high demands on UAV-assisted Free Space Optical (FSO) communications, where the channel environment becomes more complex and variable due to both atmospheric turbulence and UAV-induced vibrations. These factors increase the challenge of maintaining reliable communication and require adaptive processing methods. Autoencoders are promising as they learn optimal encodings from channel data. However, existing autoencoder designs are generic and lack the specific adaptability and computational flexibility needed for UAV-FSO scenarios. To address this, we propose AEAT-AE (Adaptive Environment-aware Transformer Autoencoder), a Transformer-based framework that integrates environmental parameters into both encoder and decoder via a cross-attention mechanism. Moreover, AEAT-AE incorporates a Deep Q-Network (DQN) that dynamically selects which layers of the Transformer autoencoder to activate based on real-time environmental inputs, effectively balancing performance and computational cost. Simulation results demonstrate that AEAT-AE outperforms conventional methods in bit error rate while maintaining efficient runtime, representing a novel tailored solution for next-generation UAV-FSO communications.
Leveraging High-Fidelity Digital Models and Reinforcement Learning for Mission Engineering: A Case Study of Aerial Firefighting Under Perfect Information
As systems engineering (SE) objectives evolve from design and operation of monolithic systems to complex System of Systems (SoS), the discipline of Mission Engineering (ME) has emerged which is increasingly being accepted as a new line of thinking for the SE community. Moreover, mission environments are uncertain, dynamic, and mission outcomes are a direct function of how the mission assets will interact with this environment. This proves static architectures brittle and calls for analytically rigorous approaches for ME. To that end, this paper proposes an intelligent mission coordination methodology that integrates digital mission models with Reinforcement Learning (RL), that specifically addresses the need for adaptive task allocation and reconfiguration. More specifically, we are leveraging a Digital Engineering (DE) based infrastructure that is composed of a high-fidelity digital mission model and agent-based simulation; and then we formulate the mission tactics management problem as a Markov Decision Process (MDP), and employ an RL agent trained via Proximal Policy Optimization. By leveraging the simulation as a sandbox, we map the system states to actions, refining the policy based on realized mission outcomes. The utility of the RL-based intelligent mission coordinator is demonstrated through an aerial firefighting case study. Our findings indicate that the RL-based intelligent mission coordinator not only surpasses baseline performance but also significantly reduces the variability in mission performance. Thus, this study serves as a proof of concept demonstrating that DE-enabled mission simulations combined with advanced analytical tools offer a mission-agnostic framework for improving ME practice; which can be extended to more complicated fleet design and selection problems in the future from a mission-first perspective.
Myopically Verifiable Probabilistic Certificates for Safe Control and Learning
This paper addresses the design of safety certificates for stochastic systems, with a focus on ensuring long-term safety through fast real-time control. In stochastic environments, set invariance-based methods that restrict the probability of risk events in infinitesimal time intervals may exhibit significant long-term risks due to cumulative uncertainties/risks. On the other hand, reachability-based approaches that account for the long-term future may require prohibitive computation in real-time decision making. To overcome this challenge involving stringent long-term safety vs. computation tradeoffs, we first introduce a novel technique termed `probabilistic invariance'. This technique characterizes the invariance conditions of the probability of interest. When the target probability is defined using long-term trajectories, this technique can be used to design myopic conditions/controllers with assured long-term safe probability. Then, we integrate this technique into safe control and learning. The proposed control methods efficiently assure long-term safety using neural networks or model predictive controllers with short outlook horizons. The proposed learning methods can be used to guarantee long-term safety during and after training. Finally, we demonstrate the performance of the proposed techniques in numerical simulations.
Robotics
APOLLO Blender: A Robotics Library for Visualization and Animation in Blender
High-quality visualizations are an essential part of robotics research, enabling clear communication of results through figures, animations, and demonstration videos. While Blender is a powerful and freely available 3D graphics platform, its steep learning curve and lack of robotics-focused integrations make it difficult and time-consuming for researchers to use effectively. In this work, we introduce a lightweight software library that bridges this gap by providing simple scripting interfaces for common robotics visualization tasks. The library offers three primary capabilities: (1) importing robots and environments directly from standardized descriptions such as URDF; (2) Python-based scripting tools for keyframing robot states and visual attributes; and (3) convenient generation of primitive 3D shapes for schematic figures and animations. Together, these features allow robotics researchers to rapidly create publication-ready images, animations, and explanatory schematics without needing extensive Blender expertise. We demonstrate the library through a series of proof-of-concept examples and conclude with a discussion of current limitations and opportunities for future extensions.
Embodied Learning of Reward for Musculoskeletal Control with Vision Language Models
Discovering effective reward functions remains a fundamental challenge in motor control of high-dimensional musculoskeletal systems. While humans can describe movement goals explicitly such as "walking forward with an upright posture," the underlying control strategies that realize these goals are largely implicit, making it difficult to directly design rewards from high-level goals and natural language descriptions. We introduce Motion from Vision-Language Representation (MoVLR), a framework that leverages vision-language models (VLMs) to bridge the gap between goal specification and movement control. Rather than relying on handcrafted rewards, MoVLR iteratively explores the reward space through iterative interaction between control optimization and VLM feedback, aligning control policies with physically coordinated behaviors. Our approach transforms language and vision-based assessments into structured guidance for embodied learning, enabling the discovery and refinement of reward functions for high-dimensional musculoskeletal locomotion and manipulation. These results suggest that VLMs can effectively ground abstract motion descriptions in the implicit principles governing physiological motor control.
comment: 18 pages, 8 figures
Embodied Robot Manipulation in the Era of Foundation Models: Planning and Learning Perspectives
Recent advances in vision, language, and multimodal learning have substantially accelerated progress in robotic foundation models, with robot manipulation remaining a central and challenging problem. This survey examines robot manipulation from an algorithmic perspective and organizes recent learning-based approaches within a unified abstraction of high-level planning and low-level control. At the high level, we extend the classical notion of task planning to include reasoning over language, code, motion, affordances, and 3D representations, emphasizing their role in structured and long-horizon decision making. At the low level, we propose a training-paradigm-oriented taxonomy for learning-based control, organizing existing methods along input modeling, latent representation learning, and policy learning. Finally, we identify open challenges and prospective research directions related to scalability, data efficiency, multimodal physical interaction, and safety. Together, these analyses aim to clarify the design space of modern foundation models for robotic manipulation.
comment: This work is a re-architected core derived from the full survey (arXiv:2510.10903) , refined to highlight the most central themes and representative studies
PreGME: Prescribed Performance Control of Aerial Manipulators based on Variable-Gain ESO
An aerial manipulator, comprising a multirotor base and a robotic arm, is subject to significant dynamic coupling between these two components. Therefore, achieving precise and robust motion control is a challenging yet important objective. Here, we propose a novel prescribed performance motion control framework based on variable-gain extended state observers (ESOs), referred to as PreGME. The method includes variable-gain ESOs for real-time estimation of dynamic coupling and a prescribed performance flight control that incorporates error trajectory constraints. Compared with existing methods, the proposed approach exhibits the following two characteristics. First, the adopted variable-gain ESOs can accurately estimate rapidly varying dynamic coupling. This enables the proposed method to handle manipulation tasks that require aggressive motion of the robotic arm. Second, by prescribing the performance, a preset error trajectory is generated to guide the system evolution along this trajectory. This strategy allows the proposed method to ensure the tracking error remains within the prescribed performance envelope, thereby achieving high-precision control. Experiments on a real platform, including aerial staff twirling, aerial mixology, and aerial cart-pulling experiments, are conducted to validate the effectiveness of the proposed method. Experimental results demonstrate that even under the dynamic coupling caused by rapid robotic arm motion (end-effector velocity: 1.02 m/s, acceleration: 5.10 m/s$^2$), the proposed method achieves high tracking performance.
comment: 12 pages, 6 figures
P-FABRIK: A General Intuitive and Robust Inverse Kinematics Method for Parallel Mechanisms Using FABRIK Approach
Traditional geometric inverse kinematics methods for parallel mechanisms rely on specific spatial geometry constraints. However, their application to redundant parallel mechanisms is challenged due to the increased constraint complexity. Moreover, it will output no solutions and cause unpredictable control problems when the target pose lies outside its workspace. To tackle these challenging issues, this work proposes P-FABRIK, a general, intuitive, and robust inverse kinematics method to find one feasible solution for diverse parallel mechanisms based on the FABRIK algorithm. By decomposing the general parallel mechanism into multiple serial sub-chains using a new topological decomposition strategy, the end targets of each sub-chain can be subsequently revised to calculate the inverse kinematics solutions iteratively. Multiple case studies involving planar, standard, and redundant parallel mechanisms demonstrated the proposed method's generality across diverse parallel mechanisms. Furthermore, numerical simulation studies verified its efficacy and computational efficiency, as well as its robustness ability to handle out-of-workspace targets.
comment: 7 pages, 8 figures, and 2 tables
The body is not there to compute: Comment on "Informational embodiment: Computational role of information structure in codes and robots" by Pitti et al
Applying the lens of computation and information has been instrumental in driving the technological progress of our civilization as well as in empowering our understanding of the world around us. The digital computer was and for many still is the leading metaphor for how our mind operates. Information theory (IT) has also been important in our understanding of how nervous systems encode and process information. The target article deploys information and computation to bodies: to understand why they have evolved in particular ways (animal bodies) and to design optimal bodies (robots). In this commentary, I argue that the main role of bodies is not to compute.
comment: Comment on Pitti, A., Austin, M., Nakajima, K., & Kuniyoshi, Y. (2025). Informational Embodiment: Computational role of information structure in codes and robots. Physics of Life Reviews 53, 262-276. https://doi.org/10.1016/j.plrev.2025.03.018. Also available as arXiv:2408.12950
MUSON: A Reasoning-oriented Multimodal Dataset for Socially Compliant Navigation in Urban Environments
Socially compliant navigation requires structured reasoning over dynamic pedestrians and physical constraints to ensure safe and interpretable decisions. However, existing social navigation datasets often lack explicit reasoning supervision and exhibit highly long-tailed action distributions, limiting models' ability to learn safety-critical behaviors. To address these issues, we introduce MUSON, a multimodal dataset for short-horizon social navigation collected across diverse indoor and outdoor campus scenes. MUSON adopts a structured five-step Chain-of-Thought annotation consisting of perception, prediction, reasoning, action, and explanation, with explicit modeling of static physical constraints and a rationally balanced discrete action space. Compared to SNEI, MUSON provides consistent reasoning, action, and explanation. Benchmarking multiple state-of-the-art Small Vision Language Models on MUSON shows that Qwen2.5-VL-3B achieves the highest decision accuracy of 0.8625, demonstrating that MUSON serves as an effective and reusable benchmark for socially compliant navigation. The dataset is publicly available at https://huggingface.co/datasets/MARSLab/MUSON
Two-Robot Computational Landscape: A Complete Characterization of Model Power in Minimal Mobile Robot Systems
The computational power of autonomous mobile robots under the Look-Compute-Move (LCM) model has been widely studied through an extensive hierarchy of robot models defined by the presence of memory, communication, and synchrony assumptions. While the general n-robot landscape has been largely established, the exact structure for two robots has remained unresolved. This paper presents the first complete characterization of the computational power of two autonomous robots across all major models, namely OBLOT, FSTA, FCOM, and LUMI, under the full spectrum of schedulers (FSYNCH, SSYNCH, ASYNCH, and their atomic variants). Our results reveal a landscape that fundamentally differs from the general case. Most notably, we prove that FSTA^F and LUMI^F coincide under full synchrony, a surprising collapse indicating that perfect synchrony can substitute both memory and communication when only two robots exist. We also show that FSTA and FCOM are orthogonal: there exists a problem solvable in the weakest communication model but impossible even in the strongest finite-state model, completing the bidirectional incomparability. All equivalence and separation results are derived through a novel simulation-free method, providing a unified and constructive view of the two-robot hierarchy. This yields the first complete and exact computational landscape for two robots, highlighting the intrinsic challenges of coordination at the minimal scale.
comment: 23 pages, 3 figures
Active Constraint Learning in High Dimensions from Demonstrations
We present an iterative active constraint learning (ACL) algorithm, within the learning from demonstrations (LfD) paradigm, which intelligently solicits informative demonstration trajectories for inferring an unknown constraint in the demonstrator's environment. Our approach iteratively trains a Gaussian process (GP) on the available demonstration dataset to represent the unknown constraints, uses the resulting GP posterior to query start/goal states, and generates informative demonstrations which are added to the dataset. Across simulation and hardware experiments using high-dimensional nonlinear dynamics and unknown nonlinear constraints, our method outperforms a baseline, random-sampling based method at accurately performing constraint inference from an iteratively generated set of sparse but informative demonstrations.
comment: Under review, 25 pages, 11 figures
Sistema de navegación de cobertura para vehículos no holonómicos en ambientes de exterior
In mobile robotics, coverage navigation refers to the deliberate movement of a robot with the purpose of covering a certain area or volume. Performing this task properly is fundamental for the execution of several activities, for instance, cleaning a facility with a robotic vacuum cleaner. In the mining industry, it is required to perform coverage in several unit processes related with material movement using industrial machinery, for example, in cleaning tasks, in dumps, and in the construction of tailings dam walls. The automation of these processes is fundamental to enhance the security associated with their execution. In this work, a coverage navigation system for a non-holonomic robot is presented. This work is intended to be a proof of concept for the potential automation of various unit processes that require coverage navigation like the ones mentioned before. The developed system includes the calculation of routes that allow a mobile platform to cover a specific area, and incorporates recovery behaviors in case that an unforeseen event occurs, such as the arising of dynamic or previously unmapped obstacles in the terrain to be covered, e.g., other machines or pedestrians passing through the area, being able to perform evasive maneuvers and post-recovery to ensure a complete coverage of the terrain. The system was tested in different simulated and real outdoor environments, obtaining results near 90% of coverage in the majority of experiments. The next step of development is to scale up the utilized robot to a mining machine/vehicle whose operation will be validated in a real environment. The result of one of the tests performed in the real world can be seen in the video available in https://youtu.be/gK7_3bK1P5g.
comment: 13 pages, in Spanish language, 12 figures, accepted at Tercer Congreso Iberoamericano de Minería Subterranea y a Cielo Abierto, UMining 2024
Effective Game-Theoretic Motion Planning via Nested Search
To facilitate effective, safe deployment in the real world, individual robots must reason about interactions with other agents, which often occur without explicit communication. Recent work has identified game theory, particularly the concept of Nash Equilibrium (NE), as a key enabler for behavior-aware decision-making. Yet, existing work falls short of fully unleashing the power of game-theoretic reasoning. Specifically, popular optimization-based methods require simplified robot dynamics and tend to get trapped in local minima due to convexification. Other works that rely on payoff matrices suffer from poor scalability due to the explicit enumeration of all possible trajectories. To bridge this gap, we introduce Game-Theoretic Nested Search (GTNS), a novel, scalable, and provably correct approach for computing NEs in general dynamical systems. GTNS efficiently searches the action space of all agents involved, while discarding trajectories that violate the NE constraint (no unilateral deviation) through an inner search over a lower-dimensional space. Our algorithm enables explicit selection among equilibria by utilizing a user-specified global objective, thereby capturing a rich set of realistic interactions. We demonstrate the approach on a variety of autonomous driving and racing scenarios where we achieve solutions in mere seconds on commodity hardware.
comment: Updated version. Additional experiment included. Cosmetic/formatting changes made
Relative Localization System Design for SnailBot: A Modular Self-reconfigurable Robot
This paper presents the design and implementation of a relative localization system for SnailBot, a modular self reconfigurable robot. The system integrates ArUco marker recognition, optical flow analysis, and IMU data processing into a unified fusion framework, enabling robust and accurate relative positioning for collaborative robotic tasks. Experimental validation demonstrates the effectiveness of the system in realtime operation, with a rule based fusion strategy ensuring reliability across dynamic scenarios. The results highlight the potential for scalable deployment in modular robotic systems.
comment: The design presented in the article does not correspond to the actual situation
Adaptive Keyframe Selection for Scalable 3D Scene Reconstruction in Dynamic Environments
In this paper, we propose an adaptive keyframe selection method for improved 3D scene reconstruction in dynamic environments. The proposed method integrates two complementary modules: an error-based selection module utilizing photometric and structural similarity (SSIM) errors, and a momentum-based update module that dynamically adjusts keyframe selection thresholds according to scene motion dynamics. By dynamically curating the most informative frames, our approach addresses a key data bottleneck in real-time perception. This allows for the creation of high-quality 3D world representations from a compressed data stream, a critical step towards scalable robot learning and deployment in complex, dynamic environments. Experimental results demonstrate significant improvements over traditional static keyframe selection strategies, such as fixed temporal intervals or uniform frame skipping. These findings highlight a meaningful advancement toward adaptive perception systems that can dynamically respond to complex and evolving visual scenes. We evaluate our proposed adaptive keyframe selection module on two recent state-of-the-art 3D reconstruction networks, Spann3r and CUT3R, and observe consistent improvements in reconstruction quality across both frameworks. Furthermore, an extensive ablation study confirms the effectiveness of each individual component in our method, underlining their contribution to the overall performance gains.
comment: Accepted at ROBOVIS 2026
Multiagent Systems
Heterogeneity in Multi-Agent Reinforcement Learning
Heterogeneity is a fundamental property in multi-agent reinforcement learning (MARL), which is closely related not only to the functional differences of agents, but also to policy diversity and environmental interactions. However, the MARL field currently lacks a rigorous definition and deeper understanding of heterogeneity. This paper systematically discusses heterogeneity in MARL from the perspectives of definition, quantification, and utilization. First, based on an agent-level modeling of MARL, we categorize heterogeneity into five types and provide mathematical definitions. Second, we define the concept of heterogeneity distance and propose a practical quantification method. Third, we design a heterogeneity-based multi-agent dynamic parameter sharing algorithm as an example of the application of our methodology. Case studies demonstrate that our method can effectively identify and quantify various types of agent heterogeneity. Experimental results show that the proposed algorithm, compared to other parameter sharing baselines, has better interpretability and stronger adaptability. The proposed methodology will help the MARL community gain a more comprehensive and profound understanding of heterogeneity, and further promote the development of practical algorithms.
Reinforcement Networks: novel framework for collaborative Multi-Agent Reinforcement Learning tasks
Modern AI systems often comprise multiple learnable components that can be naturally organized as graphs. A central challenge is the end-to-end training of such systems without restrictive architectural or training assumptions. Such tasks fit the theory and approaches of the collaborative Multi-Agent Reinforcement Learning (MARL) field. We introduce Reinforcement Networks, a general framework for MARL that organizes agents as vertices in a directed acyclic graph (DAG). This structure extends hierarchical RL to arbitrary DAGs, enabling flexible credit assignment and scalable coordination while avoiding strict topologies, fully centralized training, and other limitations of current approaches. We formalize training and inference methods for the Reinforcement Networks framework and connect it to the LevelEnv concept to support reproducible construction, training, and evaluation. We demonstrate the effectiveness of our approach on several collaborative MARL setups by developing several Reinforcement Networks models that achieve improved performance over standard MARL baselines. Beyond empirical gains, Reinforcement Networks unify hierarchical, modular, and graph-structured views of MARL, opening a principled path toward designing and training complex multi-agent systems. We conclude with theoretical and practical directions - richer graph morphologies, compositional curricula, and graph-aware exploration. That positions Reinforcement Networks as a foundation for a new line of research in scalable, structured MARL.
Adaptive Trust Consensus for Blockchain IoT: Comparing RL, DRL, and MARL Against Naive, Collusive, Adaptive, Byzantine, and Sleeper Attacks
Securing blockchain-enabled IoT networks against sophisticated adversarial attacks remains a critical challenge. This paper presents a trust-based delegated consensus framework integrating Fully Homomorphic Encryption (FHE) with Attribute-Based Access Control (ABAC) for privacy-preserving policy evaluation, combined with learning-based defense mechanisms. We systematically compare three reinforcement learning approaches -- tabular Q-learning (RL), Deep RL with Dueling Double DQN (DRL), and Multi-Agent RL (MARL) -- against five distinct attack families: Naive Malicious Attack (NMA), Collusive Rumor Attack (CRA), Adaptive Adversarial Attack (AAA), Byzantine Fault Injection (BFI), and Time-Delayed Poisoning (TDP). Experimental results on a 16-node simulated IoT network reveal significant performance variations: MARL achieves superior detection under collusive attacks (F1=0.85 vs. DRL's 0.68 and RL's 0.50), while DRL and MARL both attain perfect detection (F1=1.00) against adaptive attacks where RL fails (F1=0.50). All agents successfully defend against Byzantine attacks (F1=1.00). Most critically, the Time-Delayed Poisoning attack proves catastrophic for all agents, with F1 scores dropping to 0.11-0.16 after sleeper activation, demonstrating the severe threat posed by trust-building adversaries. Our findings indicate that coordinated multi-agent learning provides measurable advantages for defending against sophisticated trust manipulation attacks in blockchain IoT environments.
comment: 34 pages, 19 figures, 10 tables. Code available at https://github.com/soham-padia/blockchain-iot-trust
MARPO: A Reflective Policy Optimization for Multi Agent Reinforcement Learning
We propose Multi Agent Reflective Policy Optimization (MARPO) to alleviate the issue of sample inefficiency in multi agent reinforcement learning. MARPO consists of two key components: a reflection mechanism that leverages subsequent trajectories to enhance sample efficiency, and an asymmetric clipping mechanism that is derived from the KL divergence and dynamically adjusts the clipping range to improve training stability. We evaluate MARPO in classic multi agent environments, where it consistently outperforms other methods.
comment: 9 pages
Effective Game-Theoretic Motion Planning via Nested Search
To facilitate effective, safe deployment in the real world, individual robots must reason about interactions with other agents, which often occur without explicit communication. Recent work has identified game theory, particularly the concept of Nash Equilibrium (NE), as a key enabler for behavior-aware decision-making. Yet, existing work falls short of fully unleashing the power of game-theoretic reasoning. Specifically, popular optimization-based methods require simplified robot dynamics and tend to get trapped in local minima due to convexification. Other works that rely on payoff matrices suffer from poor scalability due to the explicit enumeration of all possible trajectories. To bridge this gap, we introduce Game-Theoretic Nested Search (GTNS), a novel, scalable, and provably correct approach for computing NEs in general dynamical systems. GTNS efficiently searches the action space of all agents involved, while discarding trajectories that violate the NE constraint (no unilateral deviation) through an inner search over a lower-dimensional space. Our algorithm enables explicit selection among equilibria by utilizing a user-specified global objective, thereby capturing a rich set of realistic interactions. We demonstrate the approach on a variety of autonomous driving and racing scenarios where we achieve solutions in mere seconds on commodity hardware.
comment: Updated version. Additional experiment included. Cosmetic/formatting changes made
Scaling Clinician-Grade Feature Generation from Clinical Notes with Multi-Agent Language Models
Developing accurate clinical prediction models is often bottlenecked by the difficulty of deriving meaningful structured features from unstructured EHR notes, a process that traditionally requires manual, unscalable clinical abstraction. In this study, we first established a rigorous patient-level Clinician Feature Generation (CFG) protocol, in which domain experts manually reviewed notes to define and extract nuanced features for a cohort of 147 patients with prostate cancer. As a high-fidelity ground truth, this labor-intensive process provided the blueprint for SNOW (Scalable Note-to-Outcome Workflow), a transparent multi-agent large language model (LLM) system designed to autonomously mimic the iterative reasoning and validation workflow of clinical experts. On 5-year cancer recurrence prediction, SNOW (AUC-ROC 0.767) achieved performance comparable to manual CFG (0.762) and outperformed structured baselines, clinician-guided LLM extraction, and six representational feature generation (RFG) approaches. Once configured, SNOW produced the full patient-level feature table in 12 hours with 5 hours of clinician oversight, reducing human expert effort by approximately 48-fold versus manual CFG. To test scalability where manual CFG is infeasible, we deployed SNOW on an external heart failure with preserved ejection fraction (HFpEF) cohort from MIMIC-IV (n=2,084); without task-specific tuning, SNOW generated prognostic features that outperformed baseline and RFG methods for 30-day (SNOW: 0.851) and 1-year (SNOW: 0.763) mortality prediction. These results demonstrate that a modular LLM agent-based system can scale expert-level feature generation from clinical notes, while enabling interpretable use of unstructured EHR text in outcome prediction and preserving generalizability across a variety of settings and conditions.
Multi-agent Self-triage System with Medical Flowcharts
Online health resources and large language models (LLMs) are increasingly used as a first point of contact for medical decision-making, yet their reliability in healthcare remains limited by low accuracy, lack of transparency, and susceptibility to unverified information. We introduce a proof-of-concept conversational self-triage system that guides LLMs with 100 clinically validated flowcharts from the American Medical Association, providing a structured and auditable framework for patient decision support. The system leverages a multi-agent framework consisting of a retrieval agent, a decision agent, and a chat agent to identify the most relevant flowchart, interpret patient responses, and deliver personalized, patient-friendly recommendations, respectively. Performance was evaluated at scale using synthetic datasets of simulated conversations. The system achieved 95.29% top-3 accuracy in flowchart retrieval (N=2,000) and 99.10% accuracy in flowchart navigation across varied conversational styles and conditions (N=37,200). By combining the flexibility of free-text interaction with the rigor of standardized clinical protocols, this approach demonstrates the feasibility of transparent, accurate, and generalizable AI-assisted self-triage, with potential to support informed patient decision-making while improving healthcare resource utilization.
Systems and Control (EESS)
Real-Time Forward Kinematics and Jacobians for Control of an MRI-Guided Magnetically Actuated Robotic Catheter
This paper presents a forward kinematics and analytical Jacobian computation approach for real-time control of a novel magnetic resonance imaging (MRI)-actuated robotic catheter. The MRI-actuated robotic catheter is modeled as a series of rigid and flexible segments and actuated by magnetic torques generated on a set of current-carrying microcoils embedded on the catheter body by the magnetic field of the MRI scanner. First, a real-time forward kinematic modeling approach of the robotic catheter employing the static Cosserat-rod theory is presented. Second, the analytical calculation approach of the forward kinematic Jacobians of the proposed forward kinematic model is presented. The accuracy, reproducibility, and computational efficiency of the proposed methods are evaluated using a robotic catheter prototype with a single coil set, where catheter tip trajectories collected by a catadioptric stereo camera tracking system are validated using the desired tip trajectories. Experimental results demonstrate that the proposed method can successfully control the catheter in an open loop to perform complex trajectories with real-time computational efficiency, paving the way for accurate closed-loop control with real-time MR-imaging feedback.
Global Frequency Reference Tracking as an Oscillation Suppression Mechanism in VSM Primary Control: A Coupled-Oscillator Study
Synchronization in power systems is traditionally achieved through physical network coupling, whereby inverter-based resources (IBRs) and synchronous machines converge to a common frequency via oscillatory swing dynamics. In conventional operation, secondary control acts on a slow time scale and is typically engaged only after the primary dynamics have largely settled. As a result, in the absence of an explicit global reference, disturbances can induce prolonged transients and large phase excursions. This work considers a setting in which the total active power balance is known and maintained at all times, and proposes a control architecture for virtual synchronous machine (VSM) based inverters in which all units track a broadcast global frequency reference. Under this assumption, synchronization is transformed from a mutual oscillator locking problem into a reference tracking problem. Using a second order swing network model, we show that embedding a simple proportional integral (PI) frequency controller can significantly improves transient behavior. A washout mechanism ensures that the additional control action vanishes in steady state, thereby preserving network determined power sharing. Simulations on a three oscillator network demonstrate reduced frequency overshoot, elimination of underdamped oscillations, and lower angular stress compared to conventional open loop synchronization, highlighting the effectiveness of a global frequency reference as a coordination mechanism for grid-forming inverter networks.
A Bezier Curve Based Approach to the Convexification of the AC Optimal Power Flow Problem
The Alternating Current Optimal Power Flow (ACOPF) problem remains one of the most fundamental yet computationally challenging tasks in power systems operation and planning due to its nonconvex, nonlinear, and multimodal nature. This paper proposes a convex reformulation of the AC power flow problem by introducing auxiliary variables to isolate nonlinear terms, applying logarithmic transformations to exploit product-sum properties, and approximating with Bezier curves using a novel convexifying butterfly shaped function. This model is intended for assessing and operating weak power systems that face challenges with reactive power supply and overall network robustness. Its formulation closely mirrors the AC formulation, particularly regarding active and reactive power dispatch and network voltage levels. The proposed model achieves convergence on large test systems (e.g., IEEE 118 bus) in seconds and is validated against exact AC solutions. This convex formulation stands out not only for its mathematical transparency and intuitive structure but also for its ease of validation and implementation, making it an accessible and reliable tool for researchers and system operators for energy planning. The numerical analysis conducted on the IEEE 118 bus system yielded average percentage errors in the state variables specifically, the magnitudes and angles of nodal voltages of just 0.0008 percentage and 0.014 degree, respectively, when compared with the precise AC formulation. These results underscore the high accuracy and reliability of the proposed methodology.
comment: 10 pages, 7 figures
PreGME: Prescribed Performance Control of Aerial Manipulators based on Variable-Gain ESO
An aerial manipulator, comprising a multirotor base and a robotic arm, is subject to significant dynamic coupling between these two components. Therefore, achieving precise and robust motion control is a challenging yet important objective. Here, we propose a novel prescribed performance motion control framework based on variable-gain extended state observers (ESOs), referred to as PreGME. The method includes variable-gain ESOs for real-time estimation of dynamic coupling and a prescribed performance flight control that incorporates error trajectory constraints. Compared with existing methods, the proposed approach exhibits the following two characteristics. First, the adopted variable-gain ESOs can accurately estimate rapidly varying dynamic coupling. This enables the proposed method to handle manipulation tasks that require aggressive motion of the robotic arm. Second, by prescribing the performance, a preset error trajectory is generated to guide the system evolution along this trajectory. This strategy allows the proposed method to ensure the tracking error remains within the prescribed performance envelope, thereby achieving high-precision control. Experiments on a real platform, including aerial staff twirling, aerial mixology, and aerial cart-pulling experiments, are conducted to validate the effectiveness of the proposed method. Experimental results demonstrate that even under the dynamic coupling caused by rapid robotic arm motion (end-effector velocity: 1.02 m/s, acceleration: 5.10 m/s$^2$), the proposed method achieves high tracking performance.
comment: 12 pages, 6 figures
Weak state synchronization of homogeneous multi-agent systems with adaptive protocols
In this paper, we study scale-free weak synchronization for multi-agent systems (MAS). In other words, we design a protocol for the agents without using any knowledge about the network. We do not even require knowledge about the connectivity of the network. Each protocol contains an adaptive parameter to tune the protocol automatically to the demands of the network.
comment: This paper was submitted to 2026 CCDC at Dec. 25, 2025. Different from the submitted version, this version includes all simulation results
Distributed Fusion Estimation with Protecting Exogenous Inputs
In the context of distributed fusion estimation, directly transmitting local estimates to the fusion center may cause a privacy leakage concerning exogenous inputs. Thus, it is crucial to protect exogenous inputs against full eavesdropping while achieving distributed fusion estimation. To address this issue, a noise injection strategy is provided by injecting mutually independent noises into the local estimates transmitted to the fusion center. To determine the covariance matrices of the injected noises, a constrained minimization problem is constructed by minimizing the sum of mean square errors of the local estimates while ensuring (ε, δ)-differential privacy. Suffering from the non-convexity of the minimization problem, an approach of relaxation is proposed, which efficiently solves the minimization problem without sacrificing differential privacy level. Then, a differentially private distributed fusion estimation algorithm based on the covariance intersection approach is developed. Further, by introducing a feedback mechanism, the fusion estimation accuracy is enhanced on the premise of the same (ε, δ)-differential privacy. Finally, an illustrative example is provided to demonstrate the effectiveness of the proposed algorithms, and the trade-off between differential privacy level and fusion estimation accuracy.
A Neural Network-Based Real-time Casing Collar Recognition System for Downhole Instruments
Accurate downhole positioning is critical in oil and gas operations but is often compromised by signal degradation in traditional surface-based Casing Collar Locator (CCL) monitoring. To address this, we present an in-situ, real-time collar recognition system using embedded neural network. We introduce lightweight "Collar Recognition Nets" (CRNs) optimized for resource-constrained ARM Cortex-M7 microprocessors. By leveraging temporal and depthwise separable convolutions, our most compact model reduces computational complexity to just 8,208 MACs while maintaining an F1 score of 0.972. Hardware validation confirms an average inference latency of 343.2 μs, demonstrating that robust, autonomous signal processing is feasible within the severe power and space limitations of downhole instrumentation.
Assessment of a Hybrid Energy System for Reliable and Sustainable Power Supply to Boru Meda Hospital in Ethiopia
This study aims to evaluate the techno-economic feasibility of hybrid energy systems (HES) including Grid for providing reliable and sustainable power to Boru Meda Hospital, Ethiopia. HOMER pro 3.11.2 was used to design and evaluate a novel, integrated optimization and comparative assessment of diverse HRES, specif ically adjusted to the energy consumptions and available resources of the Hospital. The scenario evaluation showed that interconnecting photovoltaic (PV), biomass generator (BG), wind power (WP), diesel generator (DG), battery, and converter can effectively provide the Hospital's daily energy consumption of 11,214.66 kWh while conforming reliability and reducing emissions. The PV/BG/batt/conv configuration emerged as the most cost-effective and sustainable alternative, attaining the lowest LCOE of \$0.339/kWh, an NPC of \$25.7 million, and a 100% renewable energy fraction with simple pay back of 7.26 yr. As a result, the operational cost associated with the consumption of 500.00 L of diesel per month can be entirely avoided. The DG-integrated hybrids exhibit advanced techno-economic capability with significant worth, strong ROI (20\%) and IRR (18\%), endorsed by fast capital recovery (7.21-8.71 years). Overall, the hybrid system offers an optimal balance of cost, reliability, and sustainability, making it a promising and scalable solution for electrification of energy scare institution and areas in Ethiopia, thereby contributing to national sustainable energy development goals.
Reach-Avoid Differential game with Reachability Analysis for UAVs: A decomposition approach
Reach-avoid (RA) games have significant applications in security and defense, particularly for unmanned aerial vehicles (UAVs). These problems are inherently challenging due to the need to consider obstacles, consider the adversarial nature of opponents, ensure optimality, and account for nonlinear dynamics. Hamilton-Jacobi (HJ) reachability analysis has emerged as a powerful tool for tackling these challenges; however, while it has been applied to games involving two spatial dimensions, directly extending this approach to three spatial dimensions is impossible due to high dimensionality. On the other hand, alternative approaches for solving RA games lack the generality to consider games with three spatial dimensions involving agents with non-trivial system dynamics. In this work, we propose a novel framework for dimensionality reduction by decomposing the problem into a horizontal RA sub-game and a vertical RA sub-game. We then solve each sub-game using HJ reachability analysis and consider second-order dynamics that account for the defender's acceleration. To reconstruct the solution to the original RA game from the sub-games, we introduce a HJ-based tracking control algorithm in each sub-game that not only guarantees capture of the attacker but also tracking of the attacker thereafter. We prove the conditions under which the capture guarantees are maintained. The effectiveness of our approach is demonstrated via numerical simulations, showing that the decomposition maintains optimality and guarantees in the original problem. Our methods are also validated in a Gazebo physics simulator, achieving successful capture of quadrotors in three spatial dimensions space for the first time to the best of our knowledge.
comment: Paper version accepted to the Journal of Guidance, Control, and Dynamics (JGCD)
A Time-Barrier Lyapunov Condition for Predefined-Time Stability
Predefined-time stability enables convergence within a user-specified time independent of initial conditions. Existing results are predominantly based on autonomous Lyapunov inequalities, where the predefined-time is realized through integral bounds on state-dependent decay and therefore acts as an upper bound rather than a structurally enforced deadline. This paper introduces a time-barrier predefined-time stability concept in which convergence is enforced through a nonautonomous Lyapunov mechanism that intrinsically restricts the remaining available time. A sufficient Lyapunov-based condition is established, guaranteeing convergence before the predefined deadline via divergence of a time-dependent barrier. It is further shown that this mechanism cannot be reproduced by classical autonomous predefined-time stability formulations, thereby constituting a distinct stability notion. The proposed approach provides a concise and transparent means of enforcing hard convergence deadlines in nonlinear systems.
comment: 4 pages, 0 figures
Active Constraint Learning in High Dimensions from Demonstrations
We present an iterative active constraint learning (ACL) algorithm, within the learning from demonstrations (LfD) paradigm, which intelligently solicits informative demonstration trajectories for inferring an unknown constraint in the demonstrator's environment. Our approach iteratively trains a Gaussian process (GP) on the available demonstration dataset to represent the unknown constraints, uses the resulting GP posterior to query start/goal states, and generates informative demonstrations which are added to the dataset. Across simulation and hardware experiments using high-dimensional nonlinear dynamics and unknown nonlinear constraints, our method outperforms a baseline, random-sampling based method at accurately performing constraint inference from an iteratively generated set of sparse but informative demonstrations.
comment: Under review, 25 pages, 11 figures
Relative Localization System Design for SnailBot: A Modular Self-reconfigurable Robot
This paper presents the design and implementation of a relative localization system for SnailBot, a modular self reconfigurable robot. The system integrates ArUco marker recognition, optical flow analysis, and IMU data processing into a unified fusion framework, enabling robust and accurate relative positioning for collaborative robotic tasks. Experimental validation demonstrates the effectiveness of the system in realtime operation, with a rule based fusion strategy ensuring reliability across dynamic scenarios. The results highlight the potential for scalable deployment in modular robotic systems.
comment: The design presented in the article does not correspond to the actual situation
Off-grid solar energy storage system with hybrid lithium iron phosphate (LFP) and lead-acid batteries in high mountains: a case report of Jiujiu Cabins in Taiwan
Mountain huts are buildings located at high altitude, offering a place for hikers and providing shelter. Energy supply to mountain huts remains an ongoing issue. Using renewable energies could be an appropriate solution. Jiujiu Cabins, a famous mountain hut in Shei-Pa National Park, Taiwan, has operated an off-grid solar energy storage system (ESS) with lead-acid batteries. In 2021, a serious system failure took place, leading to no electricity. After a detailed on-site survey, a reorganization and repair project was implemented, and the energy system came back to operate normally. Meanwhile, an eco-friendly lithium iron phosphate battery (LFP battery) ESS replaces part of the lead-acid battery ESS, forming a hybrid ESS, making a better and greener off-grid solar ESS. In this case report, the energy architecture, detailed descriptions, and historical status of the system are provided. An on-site survey of the failed energy system, a system improvement project, and a future plan are listed.
comment: 10 pages, 9 figures, 3 tables
Probabilistic Dynamic Line Rating with Line Graph Convolutional LSTM
Dynamic line rating (DLR) is an effective approach to enhancing the utilization of existing transmission line infrastructure by adapting line ratings according to real-time weather conditions. Accurate DLR forecasts are essential for grid operators to effectively schedule generation, manage transmission congestion, and lower operating costs. As renewable generation becomes increasingly variable and weather-dependent, accurate DLR forecasts are also crucial for improving renewable utilization and reducing curtailment during congested periods. Deterministic forecasts, however, often inadequately represent actual line capacities under uncertain weather conditions, leading to operational risks and costly real-time adjustments. To overcome these limitations, we propose a novel network-wide probabilistic DLR forecasting model that leverages both spatial and temporal information, significantly reducing the operational risks and inefficiencies inherent in deterministic methods. Case studies on a synthetic Texas 123-bus system demonstrate that the proposed method not only enhances grid reliability by effectively capturing true DLR values, but also substantially reduces operational costs.
comment: 10 pages, 8 figures. arXiv admin note: text overlap with arXiv:2411.12963
Robotics
Modeling of UAV Tether Aerodynamics for Real-Time Simulation
One of the main limitations of multirotor UAVs is their short flight time due to battery constraints. A practical solution for continuous operation is to power the drone from the ground via a tether. While this approach has been demonstrated for stationary systems, scenarios with a fast-moving base vehicle or strong wind conditions require modeling the tether forces, including aerodynamic effects. In this work, we propose two complementary approaches for real-time quasi-static tether modeling with aerodynamics. The first is an analytical method based on catenary theory with a uniform drag assumption, achieving very fast solve times below 1ms. The second is a numerical method that discretizes the tether into segments and lumped masses, solving the equilibrium equations using CasADi and IPOPT. By leveraging initialization strategies, such as warm starting and analytical initialization, real-time performance was achieved with a solve time of 5ms, while allowing for flexible force formulations. Both approaches were validated in real-world tests using a load cell to measure the tether force. The results show that the analytical method provides sufficient accuracy for most tethered UAV applications with minimal computational cost, while the numerical method offers higher flexibility and physical accuracy when required. These approaches form a lightweight and extensible framework for real-time tether simulation, applicable to both offline optimization and online tasks such as simulation, control, and trajectory planning.
ParaMaP: Parallel Mapping and Collision-free Motion Planning for Reactive Robot Manipulation
Real-time and collision-free motion planning remains challenging for robotic manipulation in unknown environments due to continuous perception updates and the need for frequent online replanning. To address these challenges, we propose a parallel mapping and motion planning framework that tightly integrates Euclidean Distance Transform (EDT)-based environment representation with a sampling-based model predictive control (SMPC) planner. On the mapping side, a dense distance-field-based representation is constructed using a GPU-based EDT and augmented with a robot-masked update mechanism to prevent false self-collision detections during online perception. On the planning side, motion generation is formulated as a stochastic optimization problem with a unified objective function and efficiently solved by evaluating large batches of candidate rollouts in parallel within a SMPC framework, in which a geometrically consistent pose tracking metric defined on SE(3) is incorporated to ensure fast and accurate convergence to the target pose. The entire mapping and planning pipeline is implemented on the GPU to support high-frequency replanning. The effectiveness of the proposed framework is validated through extensive simulations and real-world experiments on a 7-DoF robotic manipulator. More details are available at: https://zxw610.github.io/ParaMaP.
VLA-Arena: An Open-Source Framework for Benchmarking Vision-Language-Action Models
While Vision-Language-Action models (VLAs) are rapidly advancing towards generalist robot policies, it remains difficult to quantitatively understand their limits and failure modes. To address this, we introduce a comprehensive benchmark called VLA-Arena. We propose a novel structured task design framework to quantify difficulty across three orthogonal axes: (1) Task Structure, (2) Language Command, and (3) Visual Observation. This allows us to systematically design tasks with fine-grained difficulty levels, enabling a precise measurement of model capability frontiers. For Task Structure, VLA-Arena's 170 tasks are grouped into four dimensions: Safety, Distractor, Extrapolation, and Long Horizon. Each task is designed with three difficulty levels (L0-L2), with fine-tuning performed exclusively on L0 to assess general capability. Orthogonal to this, language (W0-W4) and visual (V0-V4) perturbations can be applied to any task to enable a decoupled analysis of robustness. Our extensive evaluation of state-of-the-art VLAs reveals several critical limitations, including a strong tendency toward memorization over generalization, asymmetric robustness, a lack of consideration for safety constraints, and an inability to compose learned skills for long-horizon tasks. To foster research addressing these challenges and ensure reproducibility, we provide the complete VLA-Arena framework, including an end-to-end toolchain from task definition to automated evaluation and the VLA-Arena-S/M/L datasets for fine-tuning. Our benchmark, data, models, and leaderboard are available at https://vla-arena.github.io.
Clutter-Resistant Vision-Language-Action Models through Object-Centric and Geometry Grounding
Recent Vision-Language-Action (VLA) models have made impressive progress toward general-purpose robotic manipulation by post-training large Vision-Language Models (VLMs) for action prediction. Yet most VLAs entangle perception and control in a monolithic pipeline optimized purely for action, which can erode language-conditioned grounding. In our real-world tabletop tests, policies over-grasp when the target is absent, are distracted by clutter, and overfit to background appearance. To address these issues, we propose OBEYED-VLA (OBject-centric and gEometrY groundED VLA), a framework that explicitly disentangles perceptual grounding from action reasoning. Instead of operating directly on raw RGB, OBEYED-VLA augments VLAs with a perception module that grounds multi-view inputs into task-conditioned, object-centric, and geometry-aware observations. This module includes a VLM-based object-centric grounding stage that selects task-relevant object regions across camera views, along with a complementary geometric grounding stage that emphasizes the 3D structure of these objects over their appearance. The resulting grounded views are then fed to a pretrained VLA policy, which we fine-tune exclusively on single-object demonstrations collected without environmental clutter or non-target objects. On a real-world UR10e tabletop setup, OBEYED-VLA substantially improves robustness over strong VLA baselines across four challenging regimes and multiple difficulty levels: distractor objects, absent-target rejection, background appearance changes, and cluttered manipulation of unseen objects. Ablation studies confirm that both semantic grounding and geometry-aware grounding are critical to these gains. Overall, the results indicate that making perception an explicit, object-centric component is an effective way to strengthen and generalize VLA-based robotic manipulation.
comment: Under review. Project website: https://uark-aicv.github.io/OBEYED_VLA
Topology-Preserving Scalar Field Optimization for Boundary-Conforming Spiral Toolpaths on Multiply Connected Freeform Surfaces
Ball-end milling path planning on multiply connected freeform surfaces is pivotal for high-quality and efficient machining of components in automotive and aerospace manufacturing. Although scalar-field-based optimization provides a unified framework for multi-objective toolpath generation, maintaining boundary conformity while eliminating zero-gradient singularities that cause iso-curve branching or termination and disrupt toolpath continuity remains challenging on multiply connected surfaces. We propose an efficient strategy to robustly enforce these constraints throughout optimization. Conformal slit mapping is employed to construct a feasible, singularity-free initial scalar field. The optimization is reformulated as a topology-preserving mesh deformation governed by boundary-synchronous updates, enabling globally optimized spacing, scallop-height uniformity, and smooth trajectory transitions. Consequently, the toolpaths are continuous, boundary-conforming, and free of self-intersections. Milling experiments demonstrate that, compared with a state-of-the-art conformal slit mapping-based method, the proposed approach increases machining efficiency by 14.24%, improves scallop-height uniformity by 5.70%, and reduces milling impact-induced vibrations by over 10%. The strategy offers broad applicability in high-performance machining scenarios.
comment: 24Pages,12Figures
Asymmetric Friction in Geometric Locomotion
Geometric mechanics models of locomotion have provided insight into how robots and animals use environmental interactions to convert internal shape changes into displacement through the world, encoding this relationship in a ``motility map''. A key class of such motility maps arises from (possibly anisotropic) linear drag acting on the system's individual body parts, formally described via Riemannian metrics on the motions of the system's individual body parts. The motility map can then be generated by invoking a sub-Riemannian constraint on the aggregate system motion under which the position velocity induced by a given shape velocity is that which minimizes the power dissipated via friction. The locomotion of such systems is ``geometric'' in the sense that the final position reached by the system depends only on the sequence of shapes that the system passes through, but not on the rate with which the shape changes are made. In this paper, we consider a far more general class of systems in which the drag may be not only anisotropic (with different coefficients for forward/backward and left/right motions), but also asymmetric (with different coefficients for forward and backward motions). Formally, including asymmetry in the friction replaces the Riemannian metrics on the body parts with Finsler metrics. We demonstrate that the sub-Riemannian approach to constructing the system motility map extends naturally to a sub-Finslerian approach and identify system properties analogous to the constraint curvature of sub-Riemannian systems that allow for the characterization of the system motion capabilities.
comment: 23 pages, 15 figures
Pose-Guided Residual Refinement for Interpretable Text-to-Motion Generation and Editing
Text-based 3D motion generation aims to automatically synthesize diverse motions from natural-language descriptions to extend user creativity, whereas motion editing modifies an existing motion sequence in response to text while preserving its overall structure. Pose-code-based frameworks such as CoMo map quantifiable pose attributes into discrete pose codes that support interpretable motion control, but their frame-wise representation struggles to capture subtle temporal dynamics and high-frequency details, often degrading reconstruction fidelity and local controllability. To address this limitation, we introduce pose-guided residual refinement for motion (PGR$^2$M), a hybrid representation that augments interpretable pose codes with residual codes learned via residual vector quantization (RVQ). A pose-guided RVQ tokenizer decomposes motion into pose latents that encode coarse global structure and residual latents that model fine-grained temporal variations. Residual dropout further discourages over-reliance on residuals, preserving the semantic alignment and editability of the pose codes. On top of this tokenizer, a base Transformer autoregressively predicts pose codes from text, and a refine Transformer predicts residual codes conditioned on text, pose codes, and quantization stage. Experiments on HumanML3D and KIT-ML show that PGR$^2$M improves Fréchet inception distance and reconstruction metrics for both generation and editing compared with CoMo and recent diffusion- and tokenization-based baselines, while user studies confirm that it enables intuitive, structure-preserving motion edits.
Bugs with Features: Vision-Based Fault-Tolerant Collective Motion Inspired by Nature
In collective motion, perceptually-limited individuals move in an ordered manner, without centralized control. The perception of each individual is highly localized, as is its ability to interact with others. While natural collective motion is robust, most artificial swarms are brittle. This particularly occurs when vision is used as the sensing modality, due to ambiguities and information-loss inherent in visual perception. This paper presents mechanisms for robust collective motion inspired by studies of locusts. First, we develop a robust distance estimation method that combines visually perceived horizontal and vertical sizes of neighbors. Second, we introduce intermittent locomotion as a mechanism that allows robots to reliably detect peers that fail to keep up, and disrupt the motion of the swarm. We show how such faulty robots can be avoided in a manner that is robust to errors in classifying them as faulty. Through extensive physics-based simulation experiments, we show dramatic improvements to swarm resilience when using these techniques. We show these are relevant to both distance-based Avoid-Attract models, as well as to models relying on Alignment, in a wide range of experiment settings.
Emergence of Human to Robot Transfer in Vision-Language-Action Models
Vision-language-action (VLA) models can enable broad open world generalization, but require large and diverse datasets. It is appealing to consider whether some of this data can come from human videos, which cover diverse real-world situations and are easy to obtain. However, it is difficult to train VLAs with human videos alone, and establishing a mapping between humans and robots requires manual engineering and presents a major research challenge. Drawing inspiration from advances in large language models, where the ability to learn from diverse supervision emerges with scale, we ask whether a similar phenomenon holds for VLAs that incorporate human video data. We introduce a simple co-training recipe, and find that human-to-robot transfer emerges once the VLA is pre-trained on sufficient scenes, tasks, and embodiments. Our analysis suggests that this emergent capability arises because diverse pretraining produces embodiment-agnostic representations for human and robot data. We validate these findings through a series of experiments probing human to robot skill transfer and find that with sufficiently diverse robot pre-training our method can nearly double the performance on generalization settings seen only in human data.
Forecasting in Offline Reinforcement Learning for Non-stationary Environments NeurIPS 2025
Offline Reinforcement Learning (RL) provides a promising avenue for training policies from pre-collected datasets when gathering additional interaction data is infeasible. However, existing offline RL methods often assume stationarity or only consider synthetic perturbations at test time, assumptions that often fail in real-world scenarios characterized by abrupt, time-varying offsets. These offsets can lead to partial observability, causing agents to misperceive their true state and degrade performance. To overcome this challenge, we introduce Forecasting in Non-stationary Offline RL (FORL), a framework that unifies (i) conditional diffusion-based candidate state generation, trained without presupposing any specific pattern of future non-stationarity, and (ii) zero-shot time-series foundation models. FORL targets environments prone to unexpected, potentially non-Markovian offsets, requiring robust agent performance from the onset of each episode. Empirical evaluations on offline RL benchmarks, augmented with real-world time-series data to simulate realistic non-stationarity, demonstrate that FORL consistently improves performance compared to competitive baselines. By integrating zero-shot forecasting with the agent's experience, we aim to bridge the gap between offline RL and the complexities of real-world, non-stationary environments.
comment: The Thirty-Ninth Annual Conference on Neural Information Processing Systems, NeurIPS 2025
Symphony: A Heuristic Normalized Calibrated Advantage Actor and Critic Algorithm in application for Humanoid Robots
In our work we not explicitly hint that it is a misconception to think that humans learn fast. Learning process takes time. Babies start learning to move in the restricted liquid area called placenta. Children often are limited by underdeveloped body. Even adults are not allowed to participate in complex competitions right away. However, with robots, when learning from scratch, we often don't have the privilege of waiting for dozen millions of steps. "Swaddling" regularization is responsible for restraining an agent in rapid but unstable development penalizing action strength in a specific way not affecting actions directly. The Symphony, Transitional-policy Deterministic Actor and Critic algorithm, is a concise combination of different ideas for possibility of training humanoid robots from scratch with Sample Efficiency, Sample Proximity and Safety of Actions in mind. It is no secret that continuous increase in Gaussian noise without appropriate smoothing is harmful for motors and gearboxes. Compared to Stochastic algorithms, we set a limited parametric noise and promote a reduced strength of actions, safely increasing entropy, since the actions are kind of immersed in weaker noise. When actions require more extreme values, actions rise above the weak noise. Training becomes empirically much safer for both the environment around and the robot's mechanisms. We use Fading Replay Buffer: using a fixed formula containing the hyperbolic tangent, we adjust the batch sampling probability: the memory contains a recent memory and a long-term memory trail. Fading Replay Buffer allows us to use Temporal Advantage when we improve the current Critic Network prediction compared to the exponential moving average. Temporal Advantage allows us to update Actor and Critic in one pass, as well as combine Actor and Critic in one Object and implement their Losses in one line.
comment: https://github.com/SuspensionRailway/symphony
RiemanLine: Riemannian Manifold Representation of 3D Lines for Factor Graph Optimization
Minimal parametrization of 3D lines plays a critical role in camera localization and structural mapping. Existing representations in robotics and computer vision predominantly handle independent lines, overlooking structural regularities such as sets of parallel lines that are pervasive in man-made environments. This paper introduces \textbf{RiemanLine}, a unified minimal representation for 3D lines formulated on Riemannian manifolds that jointly accommodates both individual lines and parallel-line groups. Our key idea is to decouple each line landmark into global and local components: a shared vanishing direction optimized on the unit sphere $\mathcal{S}^2$, and scaled normal vectors constrained on orthogonal subspaces, enabling compact encoding of structural regularities. For $n$ parallel lines, the proposed representation reduces the parameter space from $4n$ (orthonormal form) to $2n+2$, naturally embedding parallelism without explicit constraints. We further integrate this parameterization into a factor graph framework, allowing global direction alignment and local reprojection optimization within a unified manifold-based bundle adjustment. Extensive experiments on ICL-NUIM, TartanAir, and synthetic benchmarks demonstrate that our method achieves significantly more accurate pose estimation and line reconstruction, while reducing parameter dimensionality and improving convergence stability.
On The Computational Complexity of Minimum Aerial Photographs for Planar Region Coverage
With the popularity of drone technologies, aerial photography has become prevalent in many daily scenarios such as environment monitoring, structure inspection, law enforcement etc. A central challenge in this domain is the efficient coverage of a target area with photographs that can entirely capture the region, while respecting constraints such as the image resolution, and limited number of pictures that can be taken. This work investigates the computational complexity of covering a simple planar polygon using squares and circles. Specifically, it shows inapproximability gaps of $1.165$ (for squares) and $1.25$ (for restricted square centers) and develops a $2.828$-optimal approximation algorithm, demonstrating that these problems are computationally intractable to approximate. The intuitions of this work can extend beyond aerial photography to broader applications such as pesticide spraying and strategic sensor placement.
Driving Beyond Privilege: Distilling Dense-Reward Knowledge into Sparse-Reward Policies
We study how to exploit dense simulator-defined rewards in vision-based autonomous driving without inheriting their misalignment with deployment metrics. In realistic simulators such as CARLA, privileged state (e.g., lane geometry, infractions, time-to-collision) can be converted into dense rewards that stabilize and accelerate model-based reinforcement learning, but policies trained directly on these signals often overfit and fail to generalize when evaluated on sparse objectives such as route completion and collision-free overtaking. We propose reward-privileged world model distillation, a two-stage framework in which a teacher DreamerV3-style agent is first trained with a dense privileged reward, and only its latent dynamics are distilled into a student trained solely on sparse task rewards. Teacher and student share the same observation space (semantic bird's-eye-view images); privileged information enters only through the teacher's reward, and the student does not imitate the teacher's actions or value estimates. Instead, the student's world model is regularized to match the teacher's latent dynamics while its policy is learned from scratch on sparse success/failure signals. In CARLA lane-following and overtaking benchmarks, sparse-reward students outperform both dense-reward teachers and sparse-from-scratch baselines. On unseen lane-following routes, reward-privileged distillation improves success by about 23 percent relative to the dense teacher while maintaining comparable or better safety. On overtaking, students retain near-perfect performance on training routes and achieve up to a 27x improvement in success on unseen routes, with improved lane keeping. These results show that dense rewards can be leveraged to learn richer dynamics models while keeping the deployed policy optimized strictly for sparse, deployment-aligned objectives.
InDRiVE: Reward-Free World-Model Pretraining for Autonomous Driving via Latent Disagreement
Model-based reinforcement learning (MBRL) can reduce interaction cost for autonomous driving by learning a predictive world model, but it typically still depends on task-specific rewards that are difficult to design and often brittle under distribution shift. This paper presents InDRiVE, a DreamerV3-style MBRL agent that performs reward-free pretraining in CARLA using only intrinsic motivation derived from latent ensemble disagreement. Disagreement acts as a proxy for epistemic uncertainty and drives the agent toward under-explored driving situations, while an imagination-based actor-critic learns a planner-free exploration policy directly from the learned world model. After intrinsic pretraining, we evaluate zero-shot transfer by freezing all parameters and deploying the pretrained exploration policy in unseen towns and routes. We then study few-shot adaptation by training a task policy with limited extrinsic feedback for downstream objectives (lane following and collision avoidance). Experiments in CARLA across towns, routes, and traffic densities show that disagreement-based pretraining yields stronger zero-shot robustness and robust few-shot collision avoidance under town shift and matched interaction budgets, supporting the use of intrinsic disagreement as a practical reward-free pretraining signal for reusable driving world models.
RefAV: Towards Planning-Centric Scenario Mining
Autonomous Vehicles (AVs) collect and pseudo-label terabytes of multi-modal data localized to HD maps during normal fleet testing. However, identifying interesting and safety-critical scenarios from uncurated driving logs remains a significant challenge. Traditional scenario mining techniques are error-prone and prohibitively time-consuming, often relying on hand-crafted structured queries. In this work, we revisit spatio-temporal scenario mining through the lens of recent vision-language models (VLMs) to detect whether a described scenario occurs in a driving log and, if so, precisely localize it in both time and space. To address this problem, we introduce RefAV, a large-scale dataset of 10,000 diverse natural language queries that describe complex multi-agent interactions relevant to motion planning derived from 1000 driving logs in the Argoverse 2 Sensor dataset. We evaluate several referential multi-object trackers and present an empirical analysis of our baselines. Notably, we find that naively repurposing off-the-shelf VLMs yields poor performance, suggesting that scenario mining presents unique challenges. Lastly, we discuss our recently held competition and share insights from the community. Our code and dataset are available at https://github.com/CainanD/RefAV/ and https://argoverse.github.io/user-guide/tasks/scenario_mining.html
comment: Project Page: https://cainand.github.io/RefAV/
EMMA: Scaling Mobile Manipulation via Egocentric Human Data
Scaling mobile manipulation imitation learning is bottlenecked by expensive mobile robot teleoperation. We present Egocentric Mobile MAnipulation (EMMA), an end-to-end framework training mobile manipulation policies from human mobile manipulation data with static robot data, sidestepping mobile teleoperation. To accomplish this, we co-train human full-body motion data with static robot data. In our experiments across three real-world tasks, EMMA demonstrates comparable performance to baselines trained on teleoperated mobile robot data (Mobile ALOHA), achieving higher or equivalent task performance in full task success. We find that EMMA is able to generalize to new spatial configurations and scenes, and we observe positive performance scaling as we increase the hours of human data, opening new avenues for scalable robotic learning in real-world environments. Details of this project can be found at https://ego-moma.github.io/.
Multiagent Systems
The Wisdom of Deliberating AI Crowds: Does Deliberation Improve LLM-Based Forecasting?
Structured deliberation has been found to improve the performance of human forecasters. This study investigates whether a similar intervention, i.e. allowing LLMs to review each other's forecasts before updating, can improve accuracy in large language models (GPT-5, Claude Sonnet 4.5, Gemini Pro 2.5). Using 202 resolved binary questions from the Metaculus Q2 2025 AI Forecasting Tournament, accuracy was assessed across four scenarios: (1) diverse models with distributed information, (2) diverse models with shared information, (3) homogeneous models with distributed information, and (4) homogeneous models with shared information. Results show that the intervention significantly improves accuracy in scenario (2), reducing Log Loss by 0.020 or about 4 percent in relative terms (p = 0.017). However, when homogeneous groups (three instances of the same model) engaged in the same process, no benefit was observed. Unexpectedly, providing LLMs with additional contextual information did not improve forecast accuracy, limiting our ability to study information pooling as a mechanism. Our findings suggest that deliberation may be a viable strategy for improving LLM forecasting.
comment: 13 pages, 2 figures, 5 tables, for source code and data see https://github.com/priorb-source/delib-ai-wisdom
Hierarchical Pedagogical Oversight: A Multi-Agent Adversarial Framework for Reliable AI Tutoring AAAI 2026
Large Language Models (LLMs) are increasingly deployed as automated tutors to address educator shortages; however, they often fail at pedagogical reasoning, frequently validating incorrect student solutions (sycophancy) or providing overly direct answers that hinder learning. We introduce Hierarchical Pedagogical Oversight (HPO), a framework that adapts structured adversarial synthesis to educational assessment. Unlike cooperative multi-agent systems that often drift toward superficial consensus, HPO enforces a dialectical separation of concerns: specialist agents first distill dialogue context, which then grounds a moderated, five-act debate between opposing pedagogical critics. We evaluate this framework on the MRBench dataset of 1,214 middle-school mathematics dialogues. Our 8B-parameter model achieves a Macro F1 of 0.845, outperforming GPT-4o (0.812) by 3.3% while using 20 times fewer parameters. These results establish adversarial reasoning as a critical mechanism for deploying reliable, low-compute pedagogical oversight in resource-constrained environments.
comment: Accepted for presentation at the AAAI 2026 EGSAI Community Activity (AAAI 2026)
Bugs with Features: Vision-Based Fault-Tolerant Collective Motion Inspired by Nature
In collective motion, perceptually-limited individuals move in an ordered manner, without centralized control. The perception of each individual is highly localized, as is its ability to interact with others. While natural collective motion is robust, most artificial swarms are brittle. This particularly occurs when vision is used as the sensing modality, due to ambiguities and information-loss inherent in visual perception. This paper presents mechanisms for robust collective motion inspired by studies of locusts. First, we develop a robust distance estimation method that combines visually perceived horizontal and vertical sizes of neighbors. Second, we introduce intermittent locomotion as a mechanism that allows robots to reliably detect peers that fail to keep up, and disrupt the motion of the swarm. We show how such faulty robots can be avoided in a manner that is robust to errors in classifying them as faulty. Through extensive physics-based simulation experiments, we show dramatic improvements to swarm resilience when using these techniques. We show these are relevant to both distance-based Avoid-Attract models, as well as to models relying on Alignment, in a wide range of experiment settings.
Evolutionary Algorithms for Computing Nash Equilibria in Dynamic Games
Dynamic nonzero sum games are widely used to model multi agent decision making in control, economics, and related fields. Classical methods for computing Nash equilibria, especially in linear quadratic settings, rely on strong structural assumptions and become impractical for nonlinear dynamics, many players, or long horizons, where multiple local equilibria may exist. We show through examples that such methods can fail to reach the true global Nash equilibrium even in relatively small games. To address this, we propose two population based evolutionary algorithms for general dynamic games with linear or nonlinear dynamics and arbitrary objective functions: a co evolutionary genetic algorithm and a hybrid genetic algorithm particle swarm optimization scheme. Both approaches search directly over joint strategy spaces without restrictive assumptions and are less prone to getting trapped in local Nash equilibria, providing more reliable approximations to global Nash solutions.
QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning?
Credit assignment has remained a fundamental challenge in multi-agent reinforcement learning (MARL). Previous studies have primarily addressed this issue through value decomposition methods under the centralized training with decentralized execution paradigm, where neural networks are utilized to approximate the nonlinear relationship between individual Q-values and the global Q-value. Although these approaches have achieved considerable success in various benchmark tasks, they still suffer from several limitations, including imprecise attribution of contributions, limited interpretability, and poor scalability in high-dimensional state spaces. To address these challenges, we propose a novel algorithm, QLLM, which facilitates the automatic construction of credit assignment functions using large language models (LLMs). Specifically, the concept of TFCAF is introduced, wherein the credit allocation process is represented as a direct and expressive nonlinear functional formulation. A custom-designed coder-evaluator framework is further employed to guide the generation and verification of executable code by LLMs, significantly mitigating issues such as hallucination and shallow reasoning during inference. Furthermore, an IGM-Gating Mechanism enables QLLM to flexibly enforce or relax the monotonicity constraint depending on task demands, covering both IGM-compliant and non-monotonic scenarios. Extensive experiments conducted on several standard MARL benchmarks demonstrate that the proposed method consistently outperforms existing state-of-the-art baselines. Moreover, QLLM exhibits strong generalization capability and maintains compatibility with a wide range of MARL algorithms that utilize mixing networks, positioning it as a promising and versatile solution for complex multi-agent scenarios. The code is available at https://github.com/zhouyangjiang71-sys/QLLM.
comment: Following the correction of experimental and data preprocessing errors, all experiments have been rerun and the results have been validated. The code is now publicly available for reproducibility
Systems and Control (EESS)
Predictive Modeling of Power Outages during Extreme Events: Integrating Weather and Socio-Economic Factors
This paper presents a novel learning-based framework for predicting power outages caused by extreme events. The proposed approach specifically targets low-probability, high-consequence outage scenarios and leverages a comprehensive set of features derived from publicly available data sources. We integrate EAGLE-I outage records (2014-2024) with weather, socio-economic, infrastructure, and seasonal event data. Incorporating social and demographic indicators reveals underlying patterns of community vulnerability and provides a clearer understanding of outage risk during extreme conditions. Four machine learning models (Random Forest (RF), Support Vector Machine (SVM), Adaptive Boosting (AdaBoost), and Long Short-Term Memory (LSTM)) are evaluated. Experimental validation is performed on a large-scale dataset covering counties in the lower peninsula of Michigan. Among all models tested, the LSTM network achieves the lowest prediction error. Additionally, the results demonstrate that stronger economic conditions and more developed infrastructure are associated with lower outage occurrence.
comment: This is a preprint of a manuscript currently under review at Electric Power Systems Research. The content may be subject to change following peer review
From Electrochemical Energy Storage to Next-Generation Intelligent Battery Technologies for Electric Vehicles: A Survey
This study provides a comprehensive overview of recent advances in electrochemical energy storage, including Na+ -ion, metal-ion, and metal-air batteries, alongside innovations in electrode engineering, electrolytes, and solid-electrolyte interphase control. It also explores the integration of machine learning, digital twins, large language models and predictive analytics to enable intelligent battery management systems, enhancing performance, safety, and operational longevity. Key challenges, research gaps, and future prospects are addressed, highlighting opportunities presented by hybrid chemistry, scalable manufacturing, sustainability, and AI-driven optimization. This survey aims to provide researchers, engineers, and industry profesionnals with a comprehensive understanding of next-generation battery technologies for the evolving electric vehicles sector.
comment: This work was supervised by leading professor in the field (Pr. Mohamed Trari, Pr. Waseem Haider, Pr. Ylias Sabri)
Optimal Regulation of Nonlinear Input-Affine Systems via an Integral Reinforcement Learning-Based State-Dependent Riccati Equation Approach
The State-Dependent Riccati Equation (SDRE) technique generalizes the classical algebraic Riccati formulation to nonlinear systems by designing an input to the system that optimally(suboptimally) regulates system states toward the origin while simultaneously optimizing a quadratic performance index. In the SDRE technique, we solve the State-Dependent Riccati Equation to determine the control for regulating a nonlinear input-affine system. Since an analytic solution to SDRE is not straightforward, one method is to linearize the system at every state, solve the corresponding Algebraic Riccati Equation (ARE), and apply optimal control until the next state of the system. Completing this task with high frequency gives a result like the original SDRE technique. Both approaches require a complete model; therefore, here we propose a method that solves ARE in every state of the system using a partially model-free approach that learns optimal control in every state of the system, without explicit knowledge of the drift dynamics, based on Integral Reinforcement Learning (IRL). To show the effectiveness of our proposed approach, we apply it to the second-order nonlinear system in simulation and compare its performance with the classical SDRE method, which relies on the system's model and solves the ARE at each state. Our simulation results demonstrate that, with sufficient iterations, the IRL-based approach achieves approximately the same performance as the conventional SDRE method, demonstrating its capability as a reliable alternative for nonlinear system control that does not require an explicit environmental model. Index Terms-Algebraic Riccati Equation (ARE), Integral Reinforcement Learning (IRL), Nonlinear Input-Affine Systems, Optimal Regulation, State-Dependent Riccati Equation (SDRE)
comment: Presented at the 13th RSI International Conference on Robotics and Mechatronics (ICRoM 2025), Dec. 16-18, 2025, Tehran, Iran
On the Stealth of Unbounded Attacks Under Non-Negative-Kernel Feedback
The stealth of false data injection attacks (FDIAs) against feedback sensors in linear time-varying (LTV) control systems is investigated. In that regard, the following notions of stealth are pursued: For some finite $ε> 0$, i) an FDIA is deemed $ε$-stealthy if the deviation it produces in the signal that is monitored by the anomaly detector remains $ε$-bounded for all time, and ii) the $ε$-stealthy FDIA is further classified as untraceable if the bounded deviation dissipates over time (asymptotically). For LTV systems that contain a chain of $q \geq 1$ integrators and feedback controllers with non-negative impulse-response kernels, it is proved that polynomial (in time) FDIA signals of degree $a$ - growing unbounded over time - will remain i) $ε$-stealthy, for some finite $ε> 0$, if $a \leq q$, and ii) untraceable, if $a < q$. These results are obtained using the theory of linear Volterra integral equations.
comment: 8 pages, 3 figures, submitted to IFAC World Congress 2026
Tree Meets Transformer: A Hybrid Architecture for Scalable Power Allocation in Cell-Free Networks
Power allocation remains a fundamental challenge in wireless communication networks, particularly under dynamic user loads and large-scale deployments. While Transformerbased models have demonstrated strong performance, their computational cost scales poorly with the number of users. In this work, we propose a novel hybrid Tree-Transformer architecture that achieves scalable per-user power allocation. Our model compresses user features via a binary tree into a global root representation, applies a Transformer encoder solely to this root, and decodes per-user uplink and downlink powers through a shared decoder. This design achieves logarithmic depth and linear total complexity, enabling efficient inference across large and variable user sets without retraining or architectural changes. We evaluate our model on the max-min fairness problem in cellfree massive MIMO systems and demonstrate that it achieves near-optimal performance while significantly reducing inference time compared to full-attention baselines.
A Decomposition Method for Solving Sensitivity-Based Distributed Optimal Power Flow
Efficiently solving large-scale optimal power flow (OPF) problems is challenging due to the high dimensionality and interconnectivity of modern power systems. Decomposition methods offer a promising solution via partitioning large problems into smaller subproblems that can be solved in parallel, often with local information. These approaches reduce computational burden and improve flexibility by allowing agents to manage their local models. This article introduces a decomposition method that enables a distributed solution to OPF problems. The proposed method solves OPF problems with a sensitivity-based formulation using the alternating direction method of multipliers (ADMM) algorithm. We also propose a distributed method to compute system-wide sensitivities without sharing local parameters. This approach facilitates scalable optimization while satisfying global constraints and limiting data sharing. We demonstrate the effectiveness of the proposed approach using a large set of test systems and compare its performance against existing decomposition methods. The results show that the proposed method significantly outperforms the typical phase-angle formulation with a 14-times faster computation speed on average.
A Runtime-Adaptive Transformer Neural Network Accelerator on FPGAs
Transformer neural networks (TNN) excel in natural language processing (NLP), machine translation, and computer vision (CV) without relying on recurrent or convolutional layers. However, they have high computational and memory demands, particularly on resource-constrained devices like FPGAs. Moreover, transformer models vary in processing time across applications, requiring custom models with specific parameters. Designing custom accelerators for each model is complex and time-intensive. Some custom accelerators exist with no runtime adaptability, and they often rely on sparse matrices to reduce latency. However, hardware designs become more challenging due to the need for application-specific sparsity patterns. This paper introduces ADAPTOR, a runtime-adaptive accelerator for dense matrix computations in transformer encoders and decoders on FPGAs. ADAPTOR enhances the utilization of processing elements and on-chip memory, enhancing parallelism and reducing latency. It incorporates efficient matrix tiling to distribute resources across FPGA platforms and is fully quantized for computational efficiency and portability. Evaluations on Xilinx Alveo U55C data center cards and embedded platforms like VC707 and ZCU102 show that our design is 1.2$\times$ and 2.87$\times$ more power efficient than the NVIDIA K80 GPU and the i7-8700K CPU respectively. Additionally, it achieves a speedup of 1.7 to 2.25$\times$ compared to some state-of-the-art FPGA-based accelerators.
comment: Corrected based on the published journal on Microprocessors and Microsystems
NVM-in-Cache: Repurposing Commodity 6T SRAM Cache into NVM Analog Processing-in-Memory Engine using a Novel Compute-on-Powerline Scheme
The rapid growth of deep neural network (DNN) workloads has significantly increased the demand for large-capacity on-chip SRAM in machine learning (ML) applications, with SRAM arrays now occupying a substantial fraction of the total die area. To address the dual challenges of storage density and computation efficiency, this paper proposes an NVM-in-Cache architecture that integrates resistive RAM (RRAM) devices into a conventional 6T-SRAM cell, forming a compact 6T-2R bit-cell. This hybrid cell enables Processing-in-Memory (PIM) mode, which performs massively parallel multiply-and-accumulate (MAC) operations directly on cache power lines while preserving stored cache data. By exploiting the intrinsic properties of the 6T-2R structure, the architecture achieves additional storage capability, high computational throughput without any bit-cell area overhead. Circuit- and array-level simulations in GlobalFoundries 22nm FDSOI technology demonstrate that the proposed design achieves a throughput of 0.4 TOPS and 452.34 TOPS/W. For 128 row-parallel operations, the CIFAR-10 classification is demonstrated by mapping a Resnet-18 neural network, achieving an accuracy of 91.76%. These results highlight the potential of the NVM-in-Cache approach to serve as a scalable, energy-efficient computing method by re-purposing existing 6T SRAM cache architecture for next-generation AI accelerators and general purpose processors.
comment: 11 pages
Assessing the economic benefits of space weather mitigation investment decisions: Evidence from Aotearoa New Zealand
Space weather events pose a growing threat to modern economies, yet their macroeconomic consequences still remain underexplored. This study presents the first dedicated economic assessment of geomagnetic storm impacts on Aotearoa New Zealand, quantifying potential gross domestic product (GDP) losses across seven conservative disruption and mitigation scenarios due to an extreme coronal mass ejection (CME). The primary focus is upon the damaging impacts of geomagnetically induced currents (GICs) on the electrical power transmission network. We support space weather mitigation investments decisions by providing a first-order approximation of their potential economic benefits, using best-in-class scientific models. In the absence of mitigation, a severe but realistic storm could result in up to NZ\$8.36 billion in lost GDP, with more than half stemming from cascading supply chain effects. Yet, even less severe scenarios incur losses exceeding NZ\$3 billion. Importantly, even with conservative impact estimates we find that research-led operational strategies, such as optimized switching and islanding, can avoid up to NZ\$370 million in losses for as little as NZ\$0.5 million in expenditure, delivering a benefit-cost ratio of 740 to 1. Equally, physical protections such as GIC blocking devices achieve benefit-cost returns up to 80 to 1, highlighting the strong case for investment in space weather mitigation. When also acknowledging additional unmodelled impacts, including multi-billion losses in capital equipment and long-term revenue, the economic rationale for pre-emptive mitigation becomes even more pertinent. Future research needs to integrate the modelling of capital and revenue losses for strategically important industrial facilities.
Stabilizing NMPC Approaches for Underactuated Mechanical Systems on the SE(3) Manifold
This paper addresses the motion control problem for underactuated mechanical systems with full attitude control and one translational force input to manage the six degrees of freedom involved in the three-dimensional Euclidean space. These systems are often classified as second-order nonholonomic due to their completely nonintegrable acceleration constraints. To tackle this complex control problem, we propose two nonlinear model predictive control (NMPC) schemes that ensure closed-loop stability and recursive feasibility without terminal conditions. The system dynamics are modeled on the SE(3) manifold for a globally and unique description of rigid body configurations. One NMPC scheme also aims to reduce mission time as an economic criterion. The controllers' effectiveness is validated through numerical experiments on a quadrotor UAV.
comment: This is a preprint submitted to Automatica
A Bidirectional Diode-Clamp Circuit Paradigm for Time-Resolved Measurement of Electrical Short-Circuits
Conventional electrical fault models, which rely on static thresholds and instantaneous trip mechanisms, fail to capture the time-evolving dynamics of real faults, creating vulnerabilities in modern power systems. This paper introduces a diode-clamp circuit architecture that reconceives short-circuits as governed, sustained processes and establishes a physics-consistent, measurement system. An Arduino-based data acquisition system recorded continuous fault evolution across multiple input voltages and durations. Multi-resolution sampling at 10ms, 50ms, and 100ms enabled high-fidelity capture of both transients and sustained-state dynamics. The clamped mechanism constrained the circuit to a bounded regime, enabling repeatable observation. Experiments yielded definitive, measurable minima and maxima for voltage, current, and resistance, empirically refuting the classical assumption of instantaneous, unbounded current. Newly introduced metrics quantify this performance: the Sustained-to-Capacitive Energy Ratio (SCER ~1.53x10^12) proves fault energy originates from sustained dynamics, not transient discharge. The Sustained Fault Efficiency (SFE>1) demonstrates that governed fault power can exceed nominal operating power. This work provides the first fully validated short-circuit quantification system, yielding empirical data for next-generation battery management, adaptive grid protection, and fault-tolerant electronics.
comment: 123 pages, 22 Figures
Robotics
Bab_Sak Robotic Intubation System (BRIS): A Learning-Enabled Control Framework for Safe Fiberoptic Endotracheal Intubation
Endotracheal intubation is a critical yet technically demanding procedure, with failure or improper tube placement leading to severe complications. Existing robotic and teleoperated intubation systems primarily focus on airway navigation and do not provide integrated control of endotracheal tube advancement or objective verification of tube depth relative to the carina. This paper presents the Robotic Intubation System (BRIS), a compact, human-in-the-loop platform designed to assist fiberoptic-guided intubation while enabling real-time, objective depth awareness. BRIS integrates a four-way steerable fiberoptic bronchoscope, an independent endotracheal tube advancement mechanism, and a camera-augmented mouthpiece compatible with standard clinical workflows. A learning-enabled closed-loop control framework leverages real-time shape sensing to map joystick inputs to distal bronchoscope tip motion in Cartesian space, providing stable and intuitive teleoperation under tendon nonlinearities and airway contact. Monocular endoscopic depth estimation is used to classify airway regions and provide interpretable, anatomy-aware guidance for safe tube positioning relative to the carina. The system is validated on high-fidelity airway mannequins under standard and difficult airway configurations, demonstrating reliable navigation and controlled tube placement. These results highlight BRIS as a step toward safer, more consistent, and clinically compatible robotic airway management.
StereoVLA: Enhancing Vision-Language-Action Models with Stereo Vision
Stereo cameras closely mimic human binocular vision, providing rich spatial cues critical for precise robotic manipulation. Despite their advantage, the adoption of stereo vision in vision-language-action models (VLAs) remains underexplored. In this work, we present StereoVLA, a VLA model that leverages rich geometric cues from stereo vision. We propose a novel Geometric-Semantic Feature Extraction module that utilizes vision foundation models to extract and fuse two key features: 1) geometric features from subtle stereo-view differences for spatial perception; 2) semantic-rich features from the monocular view for instruction following. Additionally, we propose an auxiliary Interaction-Region Depth Estimation task to further enhance spatial perception and accelerate model convergence. Extensive experiments show that our approach outperforms baselines by a large margin in diverse tasks under the stereo setting and demonstrates strong robustness to camera pose variations.
Flexible Multitask Learning with Factorized Diffusion Policy
Multitask learning poses significant challenges due to the highly multimodal and diverse nature of robot action distributions. However, effectively fitting policies to these complex task distributions is often difficult, and existing monolithic models often underfit the action distribution and lack the flexibility required for efficient adaptation. We introduce a novel modular diffusion policy framework that factorizes complex action distributions into a composition of specialized diffusion models, each capturing a distinct sub-mode of the behavior space for a more effective overall policy. In addition, this modular structure enables flexible policy adaptation to new tasks by adding or fine-tuning components, which inherently mitigates catastrophic forgetting. Empirically, across both simulation and real-world robotic manipulation settings, we illustrate how our method consistently outperforms strong modular and monolithic baselines.
Aerial World Model for Long-horizon Visual Generation and Navigation in 3D Space
Unmanned aerial vehicles (UAVs) have emerged as powerful embodied agents. One of the core abilities is autonomous navigation in large-scale three-dimensional environments. Existing navigation policies, however, are typically optimized for low-level objectives such as obstacle avoidance and trajectory smoothness, lacking the ability to incorporate high-level semantics into planning. To bridge this gap, we propose ANWM, an aerial navigation world model that predicts future visual observations conditioned on past frames and actions, thereby enabling agents to rank candidate trajectories by their semantic plausibility and navigational utility. ANWM is trained on 4-DoF UAV trajectories and introduces a physics-inspired module: Future Frame Projection (FFP), which projects past frames into future viewpoints to provide coarse geometric priors. This module mitigates representational uncertainty in long-distance visual generation and captures the mapping between 3D trajectories and egocentric observations. Empirical results demonstrate that ANWM significantly outperforms existing world models in long-distance visual forecasting and improves UAV navigation success rates in large-scale environments.
Online Inertia Parameter Estimation for Unknown Objects Grasped by a Manipulator Towards Space Applications
Knowing the inertia parameters of a grasped object is crucial for dynamics-aware manipulation, especially in space robotics with free-floating bases. This work addresses the problem of estimating the inertia parameters of an unknown target object during manipulation. We apply and extend an existing online identification method by incorporating momentum conservation, enabling its use for the floating-base robots. The proposed method is validated through numerical simulations, and the estimated parameters are compared with ground-truth values. Results demonstrate accurate identification in the scenarios, highlighting the method's applicability to on-orbit servicing and other space missions.
comment: Author's version of a manuscript accepted at the International Conference on Space Robotics 2025 (iSpaRo 2025). (c) IEEE
Optimal Trajectory Planning for Orbital Robot Rendezvous and Docking
Approaching a tumbling target safely is a critical challenge in space debris removal missions utilizing robotic manipulators onboard servicing satellites. In this work, we propose a trajectory planning method based on nonlinear optimization for a close-range rendezvous to bring a free-floating, rotating debris object in a two-dimensional plane into the manipulator's workspace, as a preliminary step for its capture. The proposed method introduces a dynamic keep-out sphere that adapts depending on the approach conditions, allowing for closer and safer access to the target. Furthermore, a control strategy is developed to reproduce the optimized trajectory using discrete ON/OFF thrusters, considering practical implementation constraints.
comment: Author's version of a manuscript accepted at the International Conference on Space Robotics 2025 (iSpaRo 2025). (c) IEEE
MoonBot: Modular and On-Demand Reconfigurable Robot Toward Moon Base Construction
The allure of lunar surface exploration and development has recently captured widespread global attention. Robots have proved to be indispensable for exploring uncharted terrains, uncovering and leveraging local resources, and facilitating the construction of future human habitats. In this article, we introduce the modular and on-demand reconfigurable robot (MoonBot), a modular and reconfigurable robotic system engineered to maximize functionality while operating within the stringent mass constraints of lunar payloads and adapting to varying environmental conditions and task requirements. This article details the design and development of MoonBot and presents a preliminary field demonstration that validates the proof of concept through the execution of milestone tasks simulating the establishment of lunar infrastructure. These tasks include essential civil engineering operations, infrastructural component transportation and deployment, and assistive operations with inflatable modules. Furthermore, we systematically summarize the lessons learned during testing, focusing on the connector design and providing valuable insights for the advancement of modular robotic systems in future lunar missions.
comment: This is the authors' version of a paper accepted for publication in IEEE Transactions on Field Robotics, (c) IEEE. The final published version is available at https://doi.org/10.1109/TFR.2025.3624346
A Unified AI, Embedded, Simulation, and Mechanical Design Approach to an Autonomous Delivery Robot
This paper presents the development of a fully autonomous delivery robot integrating mechanical engineering, embedded systems, and artificial intelligence. The platform employs a heterogeneous computing architecture, with RPi 5 and ROS 2 handling AI-based perception and path planning, while ESP32 running FreeRTOS ensures real-time motor control. The mechanical design was optimized for payload capacity and mobility through precise motor selection and material engineering. Key technical challenges addressed include optimizing computationally intensive AI algorithms on a resource-constrained platform and implementing a low-latency, reliable communication link between the ROS 2 host and embedded controller. Results demonstrate deterministic, PID-based motor control through rigorous memory and task management, and enhanced system reliability via AWS IoT monitoring and a firmware-level motor shutdown failsafe. This work highlights a unified, multi-disciplinary methodology, resulting in a robust and operational autonomous delivery system capable of real-world deployment.
VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs
In most existing embodied navigation tasks, instructions are well-defined and unambiguous, such as instruction following and object searching. Under this idealized setting, agents are required solely to produce effective navigation outputs conditioned on vision and language inputs. However, real-world navigation instructions are often vague and ambiguous, requiring the agent to resolve uncertainty and infer user intent through active dialog. To address this gap, we propose Interactive Instance Object Navigation (IION), a task that requires agents not only to generate navigation actions but also to produce language outputs via active dialog, thereby aligning more closely with practical settings. IION extends Instance Object Navigation (ION) by allowing agents to freely consult an oracle in natural language while navigating. Building on this task, we present the Vision Language-Language Navigation (VL-LN) benchmark, which provides a large-scale, automatically generated dataset and a comprehensive evaluation protocol for training and assessing dialog-enabled navigation models. VL-LN comprises over 41k long-horizon dialog-augmented trajectories for training and an automatic evaluation protocol with an oracle capable of responding to agent queries. Using this benchmark, we train a navigation model equipped with dialog capabilities and show that it achieves significant improvements over the baselines. Extensive experiments and analyses further demonstrate the effectiveness and reliability of VL-LN for advancing research on dialog-enabled embodied navigation. Code and dataset: https://0309hws.github.io/VL-LN.github.io/
RoboCade: Gamifying Robot Data Collection
Imitation learning from human demonstrations has become a dominant approach for training autonomous robot policies. However, collecting demonstration datasets is costly: it often requires access to robots and needs sustained effort in a tedious, long process. These factors limit the scale of data available for training policies. We aim to address this scalability challenge by involving a broader audience in a gamified data collection experience that is both accessible and motivating. Specifically, we develop a gamified remote teleoperation platform, RoboCade, to engage general users in collecting data that is beneficial for downstream policy training. To do this, we embed gamification strategies into the design of the system interface and data collection tasks. In the system interface, we include components such as visual feedback, sound effects, goal visualizations, progress bars, leaderboards, and badges. We additionally propose principles for constructing gamified tasks that have overlapping structure with useful downstream target tasks. We instantiate RoboCade on three manipulation tasks -- including spatial arrangement, scanning, and insertion. To illustrate the viability of gamified robot data collection, we collect a demonstration dataset through our platform, and show that co-training robot policies with this data can improve success rate on non-gamified target tasks (+16-56%). Further, we conduct a user study to validate that novice users find the gamified platform significantly more enjoyable than a standard non-gamified platform (+24%). These results highlight the promise of gamified data collection as a scalable, accessible, and engaging method for collecting demonstration data.
comment: 10 pages, 9 figures
LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
Visual-Language-Action (VLA) models report impressive success rates on robotic manipulation benchmarks, yet these results may mask fundamental weaknesses in robustness. We perform a systematic vulnerability analysis by introducing controlled perturbations across seven dimensions: objects layout, camera viewpoints, robot initial states, language instructions, light conditions, background textures and sensor noise. We comprehensively analyzed multiple state-of-the-art models and revealed consistent brittleness beneath apparent competence. Our analysis exposes critical weaknesses: models exhibit extreme sensitivity to perturbation factors, including camera viewpoints and robot initial states, with performance dropping from 95% to below 30% under modest perturbations. Surprisingly, models are largely insensitive to language variations, with further experiments revealing that models tend to ignore language instructions completely. Our findings challenge the assumption that high benchmark scores equate to true competency and highlight the need for evaluation practices that assess reliability under realistic variation.
Transformer Driven Visual Servoing for Fabric Texture Matching Using Dual-Arm Manipulator
In this paper, we propose a method to align and place a fabric piece on top of another using a dual-arm manipulator and a grayscale camera, so that their surface textures are accurately matched. We propose a novel control scheme that combines Transformer-driven visual servoing with dualarm impedance control. This approach enables the system to simultaneously control the pose of the fabric piece and place it onto the underlying one while applying tension to keep the fabric piece flat. Our transformer-based network incorporates pretrained backbones and a newly introduced Difference Extraction Attention Module (DEAM), which significantly enhances pose difference prediction accuracy. Trained entirely on synthetic images generated using rendering software, the network enables zero-shot deployment in real-world scenarios without requiring prior training on specific fabric textures. Real-world experiments demonstrate that the proposed system accurately aligns fabric pieces with different textures.
comment: 8 pages, 11 figures. Accepted to IEEE Robotics and Automation Letters (RA-L)
Super-LIO: A Robust and Efficient LiDAR-Inertial Odometry System with a Compact Mapping Strategy
LiDAR-Inertial Odometry (LIO) is a foundational technique for autonomous systems, yet its deployment on resource-constrained platforms remains challenging due to computational and memory limitations. We propose Super-LIO, a robust LIO system that demands both high performance and accuracy, ideal for applications such as aerial robots and mobile autonomous systems. At the core of Super-LIO is a compact octo-voxel-based map structure, termed OctVox, that limits each voxel to eight fused subvoxels, enabling strict point density control and incremental denoising during map updates. This design enables a simple yet efficient and accurate map structure, which can be easily integrated into existing LIO frameworks. Additionally, Super-LIO designs a heuristic-guided KNN strategy (HKNN) that accelerates the correspondence search by leveraging spatial locality, further reducing runtime overhead. We evaluated the proposed system using four publicly available datasets and several self-collected datasets, totaling more than 30 sequences. Extensive testing on both X86 and ARM platforms confirms that Super-LIO offers superior efficiency and robustness, while maintaining competitive accuracy. Super-LIO processes each frame approximately 73% faster than SOTA, while consuming less CPU resources. The system is fully open-source and plug-and-play compatible with a wide range of LiDAR sensors and platforms. The implementation is available at: https://github.com/Liansheng-Wang/Super-LIO.git
comment: 8 pages, 5 figures
Learning collision risk proactively from naturalistic driving data at scale
Accurately and proactively alerting drivers or automated systems to emerging collisions is crucial for road safety, particularly in highly interactive and complex urban environments. Existing methods either require labour-intensive annotation of sparse risk, struggle to consider varying contextual factors, or are tailored to limited scenarios. Here we present the Generalised Surrogate Safety Measure (GSSM), a data-driven approach that learns collision risk from naturalistic driving without the need for crash or risk labels. Trained over multiple datasets and evaluated on 2,591 real-world crashes and near-crashes, a basic GSSM using only instantaneous motion kinematics achieves an area under the precision-recall curve of 0.9, and secures a median time advance of 2.6 seconds to prevent potential collisions. Incorporating additional interaction patterns and contextual factors provides further performance gains. Across interaction scenarios such as rear-end, merging, and turning, GSSM consistently outperforms existing baselines in accuracy and timeliness. These results establish GSSM as a scalable, context-aware, and generalisable foundation to identify risky interactions before they become unavoidable, supporting proactive safety in autonomous driving systems and traffic incident management. Code and experiment data are openly accessible at https://github.com/Yiru-Jiao/GSSM.
comment: Equation (15) in the previous versions was wrong, which has been corrected since v4
Think, Act, Learn: A Framework for Autonomous Robotic Agents using Closed-Loop Large Language Models
The integration of Large Language Models (LLMs) into robotics has unlocked unprecedented capabilities in high-level task planning. However, most current systems operate in an open-loop fashion, where LLMs act as one-shot planners, rendering them brittle and unable to adapt to unforeseen circumstances in dynamic physical environments. To overcome this limitation, this paper introduces the "Think, Act, Learn" (T-A-L) framework, a novel architecture that enables an embodied agent to autonomously learn and refine its policies through continuous interaction. Our framework establishes a closed-loop cycle where an LLM first "thinks" by decomposing high-level commands into actionable plans. The robot then "acts" by executing these plans while gathering rich, multimodal sensory feedback. Critically, the "learn" module processes this feedback to facilitate LLM-driven self-reflection, allowing the agent to perform causal analysis on its failures and generate corrective strategies. These insights are stored in an experiential memory to guide future planning cycles. We demonstrate through extensive experiments in both simulation and the real world that our T-A-L agent significantly outperforms baseline methods, including open-loop LLMs, Behavioral Cloning, and traditional Reinforcement Learning. Our framework achieves over a 97% success rate on complex, long-horizon tasks, converges to a stable policy in an average of just 9 trials, and exhibits remarkable generalization to unseen tasks. This work presents a significant step towards developing more robust, adaptive, and truly autonomous robotic agents.
comment: 13 pages, 7 figures
RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic
Embodied agents powered by vision-language models (VLMs) are increasingly capable of executing complex real-world tasks, yet they remain vulnerable to hazardous instructions that may trigger unsafe behaviors. Runtime safety guardrails, which intercept hazardous actions during task execution, offer a promising solution due to their flexibility. However, existing defenses often rely on static rule filters or prompt-level control, which struggle to address implicit risks arising in dynamic, temporally dependent, and context-rich environments. To address this, we propose RoboSafe, a hybrid reasoning runtime safeguard for embodied agents through executable predicate-based safety logic. RoboSafe integrates two complementary reasoning processes on a Hybrid Long-Short Safety Memory. We first propose a Backward Reflective Reasoning module that continuously revisits recent trajectories in short-term memory to infer temporal safety predicates and proactively triggers replanning when violations are detected. We then propose a Forward Predictive Reasoning module that anticipates upcoming risks by generating context-aware safety predicates from the long-term safety memory and the agent's multimodal observations. Together, these components form an adaptive, verifiable safety logic that is both interpretable and executable as code. Extensive experiments across multiple agents demonstrate that RoboSafe substantially reduces hazardous actions (-36.8% risk occurrence) compared with leading baselines, while maintaining near-original task performance. Real-world evaluations on physical robotic arms further confirm its practicality. Code will be released upon acceptance.
comment: 11 pages, 6 figures
Never-Ending Behavior-Cloning Agent for Robotic Manipulation
Relying on multi-modal observations, embodied robots (e.g., humanoid robots) could perform multiple robotic manipulation tasks in unstructured real-world environments. However, most language-conditioned behavior-cloning agents in robots still face existing long-standing challenges, i.e., 3D scene representation and human-level task learning, when adapting into a series of new tasks in practical scenarios. We here investigate these above challenges with NBAgent in embodied robots, a pioneering language-conditioned Never-ending Behavior-cloning Agent, which can continually learn observation knowledge of novel 3D scene semantics and robot manipulation skills from skill-shared and skill-specific attributes, respectively. Specifically, we propose a skill-shared semantic rendering module and a skill-shared representation distillation module to effectively learn 3D scene semantics from skill-shared attribute, further tackling 3D scene representation overlooking. Meanwhile, we establish a skill-specific evolving planner to perform manipulation knowledge decoupling, which can continually embed novel skill-specific knowledge like human from latent and low-rank space. Finally, we design a never-ending embodied robot manipulation benchmark, and expensive experiments demonstrate the significant performance of our method.
comment: 17 pages, 6 figures, 9 tables
Multiagent Systems
MASFIN: A Multi-Agent System for Decomposed Financial Reasoning and Forecasting NeurIPS 2025
Recent advances in large language models (LLMs) are transforming data-intensive domains, with finance representing a high-stakes environment where transparent and reproducible analysis of heterogeneous signals is essential. Traditional quantitative methods remain vulnerable to survivorship bias, while many AI-driven approaches struggle with signal integration, reproducibility, and computational efficiency. We introduce MASFIN, a modular multi-agent framework that integrates LLMs with structured financial metrics and unstructured news, while embedding explicit bias-mitigation protocols. The system leverages GPT-4.1-nano for reproducability and cost-efficient inference and generates weekly portfolios of 15-30 equities with allocation weights optimized for short-term performance. In an eight-week evaluation, MASFIN delivered a 7.33% cumulative return, outperforming the S&P 500, NASDAQ-100, and Dow Jones benchmarks in six of eight weeks, albeit with higher volatility. These findings demonstrate the promise of bias-aware, generative AI frameworks for financial forecasting and highlight opportunities for modular multi-agent design to advance practical, transparent, and reproducible approaches in quantitative finance.
comment: Accepted to the NeurIPS 2025 Workshop on Generative AI in Finance
Analyzing Code Injection Attacks on LLM-based Multi-Agent Systems in Software Development
Agentic AI and Multi-Agent Systems are poised to dominate industry and society imminently. Powered by goal-driven autonomy, they represent a powerful form of generative AI, marking a transition from reactive content generation into proactive multitasking capabilities. As an exemplar, we propose an architecture of a multi-agent system for the implementation phase of the software engineering process. We also present a comprehensive threat model for the proposed system. We demonstrate that while such systems can generate code quite accurately, they are vulnerable to attacks, including code injection. Due to their autonomous design and lack of humans in the loop, these systems cannot identify and respond to attacks by themselves. This paper analyzes the vulnerability of multi-agent systems and concludes that the coder-reviewer-tester architecture is more resilient than both the coder and coder-tester architectures, but is less efficient at writing code. We find that by adding a security analysis agent, we mitigate the loss in efficiency while achieving even better resiliency. We conclude by demonstrating that the security analysis agent is vulnerable to advanced code injection attacks, showing that embedding poisonous few-shot examples in the injected code can increase the attack success rate from 0% to 71.95%.
AI-Generated Code Is Not Reproducible (Yet): An Empirical Study of Dependency Gaps in LLM-Based Coding Agents
The rise of Large Language Models (LLMs) as coding agents promises to accelerate software development, but their impact on generated code reproducibility remains largely unexplored. This paper presents an empirical study investigating whether LLM-generated code can be executed successfully in a clean environment with only OS packages and using only the dependencies that the model specifies. We evaluate three state-of-the-art LLM coding agents (Claude Code, OpenAI Codex, and Gemini) across 300 projects generated from 100 standardized prompts in Python, JavaScript, and Java. We introduce a three-layer dependency framework (distinguishing between claimed, working, and runtime dependencies) to quantify execution reproducibility. Our results show that only 68.3% of projects execute out-of-the-box, with substantial variation across languages (Python 89.2%, Java 44.0%). We also find a 13.5 times average expansion from declared to actual runtime dependencies, revealing significant hidden dependencies.
SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents
Agentic reinforcement learning (RL) holds great promise for the development of autonomous agents under complex GUI tasks, but its scalability remains severely hampered by the verification of task completion. Existing task verification is treated as a passive, post-hoc process: a verifier (i.e., rule-based scoring script, reward or critic model, and LLM-as-a-Judge) analyzes the agent's entire interaction trajectory to determine if the agent succeeds. Such processing of verbose context that contains irrelevant, noisy history poses challenges to the verification protocols and therefore leads to prohibitive cost and low reliability. To overcome this bottleneck, we propose SmartSnap, a paradigm shift from this passive, post-hoc verification to proactive, in-situ self-verification by the agent itself. We introduce the Self-Verifying Agent, a new type of agent designed with dual missions: to not only complete a task but also to prove its accomplishment with curated snapshot evidences. Guided by our proposed 3C Principles (Completeness, Conciseness, and Creativity), the agent leverages its accessibility to the online environment to perform self-verification on a minimal, decisive set of snapshots. Such evidences are provided as the sole materials for a general LLM-as-a-Judge verifier to determine their validity and relevance. Experiments on mobile tasks across model families and scales demonstrate that our SmartSnap paradigm allows training LLM-driven agents in a scalable manner, bringing performance gains up to 26.08% and 16.66% respectively to 8B and 30B models. The synergizing between solution finding and evidence seeking facilitates the cultivation of efficient, self-verifying agents with competitive performance against DeepSeek V3.1 and Qwen3-235B-A22B.
AI Urban Scientist: Multi-Agent Collaborative Automation for Urban Research
Urban research aims to understand how cities operate and evolve as complex adaptive systems. With the rapid growth of urban data and analytical methodologies, the central challenge of the field has shifted from data availability to the integration of heterogeneous data into coherent, verifiable urban knowledge through multidisciplinary approaches. Recent advances in AI, particularly the emergence of large language models (LLMs), have enabled the development of AI scientists capable of autonomous reasoning, hypothesis generation, and data-driven experimentation, demonstrating substantial potential for autonomous urban research. However, most general-purpose AI systems remain misaligned with the domain-specific knowledge, methodological conventions, and inferential standards required in urban studies. Here, we introduce the AI Urban Scientist, a knowledge-driven multi-agent framework designed to support autonomous urban research. Grounded in hypotheses, peer-review feedback, datasets, and research methodologies distilled from large-scale prior studies, the system constructs structured domain knowledge that guides LLM-based agents to automatically generate hypotheses, identify and integrate multi-source urban datasets, conduct empirical analyses and simulations, and iteratively refine analytical methods. Through this process, the framework synthesizes new insights in urban science and accelerates the urban research lifecycle.
Computational Foundations for Strategic Coopetition: Formalizing Interdependence and Complementarity
Coopetition refers to simultaneous cooperation and competition among actors who "cooperate to grow the pie and compete to split it up." Modern socio-technical systems are characterized by strategic coopetition in which actors concomitantly cooperate to create value and compete to capture it. While conceptual modeling languages such as i* provide rich qualitative representations of strategic dependencies, they lack mechanisms for quantitative analysis of dynamic trade-offs. Conversely, classical game theory offers mathematical rigor but strips away contextual richness. This technical report bridges this gap by developing computational foundations that formalize two critical dimensions of coopetition: interdependence and complementarity. We ground interdependence in i* structural dependency analysis, translating depender-dependee-dependum relationships into quantitative interdependence coefficients through a structured translation framework. We formalize complementarity following Brandenburger and Nalebuff's Added Value concept, modeling synergistic value creation with validated parameterization. We integrate structural dependencies with bargaining power in value appropriation and introduce a game-theoretic formulation where Nash Equilibrium incorporates structural interdependence. Validation combines comprehensive experimental testing comprising over 22,000 trials across power and logarithmic value function specifications, demonstrating functional form robustness, with empirical application to the Samsung-Sony S-LCD joint venture (2004-2011). This technical report serves as the foundational reference for a coordinated research program examining strategic coopetition in multi-agent systems, with companion work addressing trust dynamics, collective action, and reciprocity mechanisms.
comment: 39 pages, 9 figures, Validation artifacts including source code available at https://github.com/vikpant/strategic-coopetition/tree/master/TR_validation/TR1_foundations
Towards Global Optimality in Cooperative MARL with the Transformation And Distillation Framework
Decentralized execution is one core demand in multi-agent reinforcement learning (MARL). Recently, most popular MARL algorithms have adopted decentralized policies to enable decentralized execution, and use gradient descent as the optimizer. However, there is hardly any theoretical analysis of these algorithms taking the optimization method into consideration, and we find that various popular MARL algorithms with decentralized policies are suboptimal in toy tasks when gradient descent is chosen as their optimization method. In this paper, we theoretically analyze two common classes of algorithms with decentralized policies -- multi-agent policy gradient methods and value-decomposition methods, and prove their suboptimality when gradient descent is used. To address the suboptimality issue, we propose the Transformation And Distillation (TAD) framework, which reformulates a multi-agent MDP as a special single-agent MDP with a sequential structure and enables decentralized execution by distilling the learned policy on the derived "single-agent" MDP. The approach is a two-stage learning paradigm that addresses the optimization problem in cooperative MARL, providing optimality guarantee with decent execution performance. Empirically, we implement TAD-PPO based on PPO, which can theoretically perform optimal policy learning in the finite multi-agent MDPs and shows significant outperformance on a large set of cooperative multi-agent tasks, from matrix game, hallway task, to StarCraft II, and football game.
MTTR-A: Measuring Cognitive Recovery Latency in Multi-Agent Systems
Reliability in multi-agent systems (MAS) built on large language models is increasingly limited by cognitive failures rather than infrastructure faults. Existing observability tools describe failures but do not quantify how quickly distributed reasoning recovers once coherence is lost. We introduce MTTR-A (Mean Time-to-Recovery for Agentic Systems), a runtime reliability metric that measures cognitive recovery latency in MAS. MTTR-A adapts classical dependability theory to agentic orchestration, capturing the time required to detect reasoning drift and restore coherent operation. We further define complementary metrics, including MTBF and a normalized recovery ratio (NRR), and establish theoretical bounds linking recovery latency to long-run cognitive uptime. Using a LangGraph-based benchmark with simulated drift and reflex recovery, we empirically demonstrate measurable recovery behavior across multiple reflex strategies. This work establishes a quantitative foundation for runtime cognitive dependability in distributed agentic systems.
comment: preprint
Systems and Control (EESS)
Optimal Placement of Data Centers to Support Power Distribution Networks Using Intelligent Algorithms with Economic Indicators
Data centers are among the fastest growing electricity consumers and can impose severe voltage drops and feeder losses when connected to weak distribution networks. This paper formulates a techno economic siting problem in which each candidate data center site is mapped to a bus of the distribution network and is assumed to deploy on site renewable generation and power electronic interfaces, resulting in a controllable net active power injection equivalent to distributed generation. A mixed integer nonlinear optimization model is developed to jointly select the connection bus and size the DG capacity while respecting network operating limits. The objective combines three normalized terms including active power losses, a voltage deviation index capturing profile quality, and investment cost derived from location dependent land price and unit DG cost. To address the discrete continuous search space, an intelligent genetic algorithm is embedded in a multi scenario decision framework with adaptive weight tuning. Three stakeholder scenarios prioritize losses, voltage quality, or techno economic balance, and additional balanced scenarios are generated automatically until the optimal bus decision converges. A case study on the IEEE 33 bus radial system demonstrates the effectiveness of the approach. The converged design selects bus 14 with 1.10 MW DG, reducing total losses from 202.67 kW to 129.37 kW while improving the minimum bus voltage to 0.933 per unit at a moderate investment cost of 1.33 MUSD. The proposed framework provides an interpretable pathway to integrate economic indicators into distribution aware data center siting.
comment: 7 pages, 4 figures, 4 tables
Optimal Trajectory Planning for Orbital Robot Rendezvous and Docking
Approaching a tumbling target safely is a critical challenge in space debris removal missions utilizing robotic manipulators onboard servicing satellites. In this work, we propose a trajectory planning method based on nonlinear optimization for a close-range rendezvous to bring a free-floating, rotating debris object in a two-dimensional plane into the manipulator's workspace, as a preliminary step for its capture. The proposed method introduces a dynamic keep-out sphere that adapts depending on the approach conditions, allowing for closer and safer access to the target. Furthermore, a control strategy is developed to reproduce the optimized trajectory using discrete ON/OFF thrusters, considering practical implementation constraints.
comment: Author's version of a manuscript accepted at the International Conference on Space Robotics 2025 (iSpaRo 2025). (c) IEEE
Low-cost Pyranometer-Based ANN Approach for MPPT in Solar PV Systems
This article presents a study on the application of artificial neural networks (ANNs) for maximum power point tracking (MPPT) in photovoltaic (PV) systems using low-cost pyranometer sensors. The proposed approach integrates pyranometers, temperature sensors, and an ANN to estimate the duty cycle of a DC/DC converter, enabling the system to consistently operate at its maximum power point. The strategy was implemented in the local control of a Cuk converter and experimentally validated against the conventional Perturb and Observe (P&O) method. Results demonstrate that the ANN-based technique, leveraging affordable sensor technology, achieves accurate MPPT performance with reduced fluctuations, enhancing the responsiveness and efficiency of PV tracking systems.
comment: License corrected. Content unchanged
Client-Aided Secure Two-Party Computation of Dynamic Controllers
In this paper, we propose a secure two-party computation protocol for dynamic controllers using a secret sharing scheme. The proposed protocol realizes outsourcing of controller computation to two servers, while controller parameters, states, inputs, and outputs are kept secret against the servers. Unlike previous encrypted controls in a single-server setting, the proposed method can operate a dynamic controller for an infinite time horizon without controller state decryption or input re-encryption. We show that the control performance achievable by the proposed protocol can be made arbitrarily close to that attained by the unencrypted controller. Furthermore, system-theoretic and cryptographic modifications of the protocol are presented to improve the communication complexity. The feasibility of the protocol is demonstrated through numerical examples of PID and observer-based controls.
comment: 12 pages, 4 figures
Differentially Private Gradient-Tracking-Based Distributed Stochastic Optimization over Directed Graphs
This paper proposes a differentially private gradient-tracking-based distributed stochastic optimization algorithm over directed graphs. In particular, privacy noises are incorporated into each agent's state and tracking variable to mitigate information leakage, after which the perturbed states and tracking variables are transmitted to neighbors. We design two novel schemes for the step-sizes and the sampling number within the algorithm. The sampling parameter-controlled subsampling method employed by both schemes enhances the differential privacy level, and ensures a finite cumulative privacy budget even over infinite iterations. The algorithm achieves both almost sure and mean square convergence for nonconvex objectives. Furthermore, when nonconvex objectives satisfy the Polyak-Lojasiewicz condition, Scheme (S1) achieves a polynomial mean square convergence rate, and Scheme (S2) achieves an exponential mean square convergence rate. The trade-off between privacy and convergence is presented. The effectiveness of the algorithm and its superior performance compared to existing works are illustrated through numerical examples of distributed training on the benchmark datasets "MNIST" and "CIFAR-10".
Improving Requirements Classification with SMOTE-Tomek Preprocessing
This study emphasizes the domain of requirements engineering by applying the SMOTE-Tomek preprocessing technique, combined with stratified K-fold cross-validation, to address class imbalance in the PROMISE dataset. This dataset comprises 969 categorized requirements, classified into functional and non-functional types. The proposed approach enhances the representation of minority classes while maintaining the integrity of validation folds, leading to a notable improvement in classification accuracy. Logistic regression achieved 76.16\%, significantly surpassing the baseline of 58.31\%. These results highlight the applicability and efficiency of machine learning models as scalable and interpretable solutions.
comment: 21 pages, 5 figures, Preprint
MTTR-A: Measuring Cognitive Recovery Latency in Multi-Agent Systems
Reliability in multi-agent systems (MAS) built on large language models is increasingly limited by cognitive failures rather than infrastructure faults. Existing observability tools describe failures but do not quantify how quickly distributed reasoning recovers once coherence is lost. We introduce MTTR-A (Mean Time-to-Recovery for Agentic Systems), a runtime reliability metric that measures cognitive recovery latency in MAS. MTTR-A adapts classical dependability theory to agentic orchestration, capturing the time required to detect reasoning drift and restore coherent operation. We further define complementary metrics, including MTBF and a normalized recovery ratio (NRR), and establish theoretical bounds linking recovery latency to long-run cognitive uptime. Using a LangGraph-based benchmark with simulated drift and reflex recovery, we empirically demonstrate measurable recovery behavior across multiple reflex strategies. This work establishes a quantitative foundation for runtime cognitive dependability in distributed agentic systems.
comment: preprint
Robotics
HELP: Hierarchical Embodied Language Planner for Household Tasks
Embodied agents tasked with complex scenarios, whether in real or simulated environments, rely heavily on robust planning capabilities. When instructions are formulated in natural language, large language models (LLMs) equipped with extensive linguistic knowledge can play this role. However, to effectively exploit the ability of such models to handle linguistic ambiguity, to retrieve information from the environment, and to be based on the available skills of an agent, an appropriate architecture must be designed. We propose a Hierarchical Embodied Language Planner, called HELP, consisting of a set of LLM-based agents, each dedicated to solving a different subtask. We evaluate the proposed approach on a household task and perform real-world experiments with an embodied agent. We also focus on the use of open source LLMs with a relatively small number of parameters, to enable autonomous deployment.
MAction-SocialNav: Multi-Action Socially Compliant Navigation via Reasoning-enhanced Prompt Tuning
Socially compliant navigation requires robots to move safely and appropriately in human-centered environments by respecting social norms. However, social norms are often ambiguous, and in a single scenario, multiple actions may be equally acceptable. Most existing methods simplify this problem by assuming a single correct action, which limits their ability to handle real-world social uncertainty. In this work, we propose MAction-SocialNav, an efficient vision language model for socially compliant navigation that explicitly addresses action ambiguity, enabling generating multiple plausible actions within one scenario. To enhance the model's reasoning capability, we introduce a novel meta-cognitive prompt (MCP) method. Furthermore, to evaluate the proposed method, we curate a multi-action socially compliant navigation dataset that accounts for diverse conditions, including crowd density, indoor and outdoor environments, and dual human annotations. The dataset contains 789 samples, each with three-turn conversation, split into 710 training samples and 79 test samples through random selection. We also design five evaluation metrics to assess high-level decision precision, safety, and diversity. Extensive experiments demonstrate that the proposed MAction-SocialNav achieves strong social reasoning performance while maintaining high efficiency, highlighting its potential for real-world human robot navigation. Compared with zero-shot GPT-4o and Claude, our model achieves substantially higher decision quality (APG: 0.595 vs. 0.000/0.025) and safety alignment (ER: 0.264 vs. 0.642/0.668), while maintaining real-time efficiency (1.524 FPS, over 3x faster).
Structural Induced Exploration for Balanced and Scalable Multi-Robot Path Planning
Multi-robot path planning is a fundamental yet challenging problem due to its combinatorial complexity and the need to balance global efficiency with fair task allocation among robots. Traditional swarm intelligence methods, although effective on small instances, often converge prematurely and struggle to scale to complex environments. In this work, we present a structure-induced exploration framework that integrates structural priors into the search process of the ant colony optimization (ACO). The approach leverages the spatial distribution of the task to induce a structural prior at initialization, thereby constraining the search space. The pheromone update rule is then designed to emphasize structurally meaningful connections and incorporates a load-aware objective to reconcile the total travel distance with individual robot workload. An explicit overlap suppression strategy further ensures that tasks remain distinct and balanced across the team. The proposed framework was validated on diverse benchmark scenarios covering a wide range of instance sizes and robot team configurations. The results demonstrate consistent improvements in route compactness, stability, and workload distribution compared to representative metaheuristic baselines. Beyond performance gains, the method also provides a scalable and interpretable framework that can be readily applied to logistics, surveillance, and search-and-rescue applications where reliable large-scale coordination is essential.
comment: 20pages, 6Figues
AstraNav-Memory: Contexts Compression for Long Memory
Lifelong embodied navigation requires agents to accumulate, retain, and exploit spatial-semantic experience across tasks, enabling efficient exploration in novel environments and rapid goal reaching in familiar ones. While object-centric memory is interpretable, it depends on detection and reconstruction pipelines that limit robustness and scalability. We propose an image-centric memory framework that achieves long-term implicit memory via an efficient visual context compression module end-to-end coupled with a Qwen2.5-VL-based navigation policy. Built atop a ViT backbone with frozen DINOv3 features and lightweight PixelUnshuffle+Conv blocks, our visual tokenizer supports configurable compression rates; for example, under a representative 16$\times$ compression setting, each image is encoded with about 30 tokens, expanding the effective context capacity from tens to hundreds of images. Experimental results on GOAT-Bench and HM3D-OVON show that our method achieves state-of-the-art navigation performance, improving exploration in unfamiliar environments and shortening paths in familiar ones. Ablation studies further reveal that moderate compression provides the best balance between efficiency and accuracy. These findings position compressed image-centric memory as a practical and scalable interface for lifelong embodied agents, enabling them to reason over long visual histories and navigate with human-like efficiency.
SymDrive: Realistic and Controllable Driving Simulator via Symmetric Auto-regressive Online Restoration
High-fidelity and controllable 3D simulation is essential for addressing the long-tail data scarcity in Autonomous Driving (AD), yet existing methods struggle to simultaneously achieve photorealistic rendering and interactive traffic editing. Current approaches often falter in large-angle novel view synthesis and suffer from geometric or lighting artifacts during asset manipulation. To address these challenges, we propose SymDrive, a unified diffusion-based framework capable of joint high-quality rendering and scene editing. We introduce a Symmetric Auto-regressive Online Restoration paradigm, which constructs paired symmetric views to recover fine-grained details via a ground-truth-guided dual-view formulation and utilizes an auto-regressive strategy for consistent lateral view generation. Furthermore, we leverage this restoration capability to enable a training-free harmonization mechanism, treating vehicle insertion as context-aware inpainting to ensure seamless lighting and shadow consistency. Extensive experiments demonstrate that SymDrive achieves state-of-the-art performance in both novel-view enhancement and realistic 3D vehicle insertion.
World-Coordinate Human Motion Retargeting via SAM 3D Body
Recovering world-coordinate human motion from monocular videos with humanoid robot retargeting is significant for embodied intelligence and robotics. To avoid complex SLAM pipelines or heavy temporal models, we propose a lightweight, engineering-oriented framework that leverages SAM 3D Body (3DB) as a frozen perception backbone and uses the Momentum HumanRig (MHR) representation as a robot-friendly intermediate. Our method (i) locks the identity and skeleton-scale parameters of per tracked subject to enforce temporally consistent bone lengths, (ii) smooths per-frame predictions via efficient sliding-window optimization in the low-dimensional MHR latent space, and (iii) recovers physically plausible global root trajectories with a differentiable soft foot-ground contact model and contact-aware global optimization. Finally, we retarget the reconstructed motion to the Unitree G1 humanoid using a kinematics-aware two-stage inverse kinematics pipeline. Results on real monocular videos show that our method has stable world trajectories and reliable robot retargeting, indicating that structured human representations with lightweight physical constraints can yield robot-ready motion from monocular input.
A Novel Robotic Variable Stiffness Mechanism Based on Helically Wound Structured Electrostatic Layer Jamming
This paper introduces a novel variable stiffness mechanism termed Helically Wound Structured Electrostatic Layer Jamming (HWS-ELJ) and systematically investigates its potential applications in variable stiffness robotic finger design. The proposed method utilizes electrostatic attraction to enhance interlayer friction, thereby suppressing relative sliding and enabling tunable stiffness. Compared with conventional planar ELJ, the helical configuration of HWS-ELJ provides exponentially increasing stiffness adjustment with winding angle, achieving significantly greater stiffness enhancement for the same electrode contact area while reducing the required footprint under equivalent stiffness conditions. Considering the practical advantage of voltage-based control, a series of experimental tests under different initial force conditions were conducted to evaluate the stiffness modulation characteristics of HWS-ELJ. The results demonstrated its rational design and efficacy, with outcomes following the deduced theoretical trends. Furthermore, a robotic finger prototype integrating HWS-ELJ was developed, demonstrating voltage-driven stiffness modulation and confirming the feasibility of the proposed robotic variable stiffness mechanism.
Spatiotemporal Tubes for Probabilistic Temporal Reach-Avoid-Stay Task in Uncertain Dynamic Environment
In this work, we extend the Spatiotemporal Tube (STT) framework to address Probabilistic Temporal Reach-Avoid-Stay (PrT-RAS) tasks in dynamic environments with uncertain obstacles. We develop a real-time tube synthesis procedure that explicitly accounts for time-varying uncertain obstacles and provides formal probabilistic safety guarantees. The STT is formulated as a time-varying ball in the state space whose center and radius evolve online based on uncertain sensory information. We derive a closed-form, approximation-free control law that confines the system trajectory within the tube, ensuring both probabilistic safety and task satisfaction. Our method offers a formal guarantee for probabilistic avoidance and finite-time task completion. The resulting controller is model-free, approximation-free, and optimization-free, enabling efficient real-time execution while guaranteeing convergence to the target. The effectiveness and scalability of the framework are demonstrated through simulation studies and hardware experiments on mobile robots, a UAV, and a 7-DOF manipulator navigating in cluttered and uncertain environments.
World-Coordinate Human Motion Retargeting via SAM 3D Body
Recovering world-coordinate human motion from monocular videos with humanoid robot retargeting is significant for embodied intelligence and robotics. To avoid complex SLAM pipelines or heavy temporal models, we propose a lightweight, engineering-oriented framework that leverages SAM 3D Body (3DB) as a frozen perception backbone and uses the Momentum HumanRig (MHR) representation as a robot-friendly intermediate. Our method (i) locks the identity and skeleton-scale parameters of per tracked subject to enforce temporally consistent bone lengths, (ii) smooths per-frame predictions via efficient sliding-window optimization in the low-dimensional MHR latent space, and (iii) recovers physically plausible global root trajectories with a differentiable soft foot-ground contact model and contact-aware global optimization. Finally, we retarget the reconstructed motion to the Unitree G1 humanoid using a kinematics-aware two-stage inverse kinematics pipeline. Results on real monocular videos show that our method has stable world trajectories and reliable robot retargeting, indicating that structured human representations with lightweight physical constraints can yield robot-ready motion from monocular input.
Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections
We address key challenges in Dataset Aggregation (DAgger) for real-world contact-rich manipulation: how to collect informative human correction data and how to effectively update policies with this new data. We introduce Compliant Residual DAgger (CR-DAgger), which contains two novel components: 1) a Compliant Intervention Interface that leverages compliance control, allowing humans to provide gentle, accurate delta action corrections without interrupting the ongoing robot policy execution; and 2) a Compliant Residual Policy formulation that learns from human corrections while incorporating force feedback and force control. Our system significantly enhances performance on precise contact-rich manipulation tasks using minimal correction data, improving base policy success rates by 64% on four challenging tasks (book flipping, belt assembly, cable routing, and gear insertion) while outperforming both retraining-from-scratch and finetuning approaches. Through extensive real-world experiments, we provide practical guidance for implementing effective DAgger in real-world robot learning tasks. Result videos are available at: https://compliant-residual-dagger.github.io
Planar Juggling of a Devil-Stick using Discrete VHCs
Planar juggling of a devil-stick using impulsive inputs is addressed using the concept of discrete virtual holonomic constraints (DVHC). The location of the center-of-mass of the devil-stick is specified in terms of its orientation at the discrete instants when impulsive control inputs are applied. The discrete zero dynamics (DZD) resulting from the choice of DVHC provides conditions for stable juggling. A control design that enforces the DVHC and an orbit stabilizing controller are presented. The approach is validated in simulation.
comment: 9 pages, 7 figures; this is an extended version of the article published in the IEEE Control Systems Letters
How Robot Dogs See the Unseeable: Improving Visual Interpretability via Peering for Exploratory Robots
Occlusion from obstacles, such as foliage, can severely obstruct a robot's sensors, impairing scene understanding. We show that "peering", a characteristic side-to-side movement used by insects to overcome their visual limitations, can also allow robots to markedly improve visual reasoning under partial occlusion. This is accomplished by applying core signal processing principles, specifically optical synthetic aperture sensing, together with the vision reasoning capabilities of modern large multimodal models. Peering enables real-time, high-resolution, and wavelength-independent perception, which is crucial for vision-based scene understanding across a wide range of applications. The approach is low-cost and immediately deployable on any camera-equipped robot. We investigated different peering motions and occlusion masking strategies, demonstrating that, unlike peering, state-of-the-art multi-view 3D vision techniques fail in these conditions due to their high susceptibility to occlusion. Robots that see through occlusion will gain superior perception abilities - including enhanced scene understanding, situational awareness, camouflage breaking, and advanced navigation.
UniTacHand: Unified Spatio-Tactile Representation for Human to Robotic Hand Skill Transfer
Tactile sensing is crucial for robotic hands to achieve human-level dexterous manipulation, especially in scenarios with visual occlusion. However, its application is often hindered by the difficulty of collecting large-scale real-world robotic tactile data. In this study, we propose to collect low-cost human manipulation data using haptic gloves for tactile-based robotic policy learning. The misalignment between human and robotic tactile data makes it challenging to transfer policies learned from human data to robots. To bridge this gap, we propose UniTacHand, a unified representation to align robotic tactile information captured by dexterous hands with human hand touch obtained from gloves. First, we project tactile signals from both human hands and robotic hands onto a morphologically consistent 2D surface space of the MANO hand model. This unification standardizes the heterogeneous data structures and inherently embeds the tactile signals with spatial context. Then, we introduce a contrastive learning method to align them into a unified latent space, trained on only 10 minutes of paired data from our data collection system. Our approach enables zero-shot tactile-based policy transfer from humans to a real robot, generalizing to objects unseen in the pre-training data. We also demonstrate that co-training on mixed data, including both human and robotic demonstrations via UniTacHand, yields better performance and data efficiency compared with using only robotic data. UniTacHand paves a path toward general, scalable, and data-efficient learning for tactile-based dexterous hands.
FAR-AVIO: Fast and Robust Schur-Complement Based Acoustic-Visual-Inertial Fusion Odometry with Sensor Calibration
Underwater environments impose severe challenges to visual-inertial odometry systems, as strong light attenuation, marine snow and turbidity, together with weakly exciting motions, degrade inertial observability and cause frequent tracking failures over long-term operation. While tightly coupled acoustic-visual-inertial fusion, typically implemented through an acoustic Doppler Velocity Log (DVL) integrated with visual-inertial measurements, can provide accurate state estimation, the associated graph-based optimization is often computationally prohibitive for real-time deployment on resource-constrained platforms. Here we present FAR-AVIO, a Schur-Complement based, tightly coupled acoustic-visual-inertial odometry framework tailored for underwater robots. FAR-AVIO embeds a Schur complement formulation into an Extended Kalman Filter(EKF), enabling joint pose-landmark optimization for accuracy while maintaining constant-time updates by efficiently marginalizing landmark states. On top of this backbone, we introduce Adaptive Weight Adjustment and Reliability Evaluation(AWARE), an online sensor health module that continuously assesses the reliability of visual, inertial and DVL measurements and adaptively regulates their sigma weights, and we develop an efficient online calibration scheme that jointly estimates DVL-IMU extrinsics, without dedicated calibration manoeuvres. Numerical simulations and real-world underwater experiments consistently show that FAR-AVIO outperforms state-of-the-art underwater SLAM baselines in both localization accuracy and computational efficiency, enabling robust operation on low-power embedded platforms. Our implementation has been released as open source software at https://far-vido.gitbook.io/far-vido-docs.
VTAO-BiManip: Masked Visual-Tactile-Action Pre-training with Object Understanding for Bimanual Dexterous Manipulation
Bimanual dexterous manipulation remains significant challenges in robotics due to the high DoFs of each hand and their coordination. Existing single-hand manipulation techniques often leverage human demonstrations to guide RL methods but fail to generalize to complex bimanual tasks involving multiple sub-skills. In this paper, we introduce VTAO-BiManip, a novel framework that combines visual-tactile-action pretraining with object understanding to facilitate curriculum RL to enable human-like bimanual manipulation. We improve prior learning by incorporating hand motion data, providing more effective guidance for dual-hand coordination than binary tactile feedback. Our pretraining model predicts future actions as well as object pose and size using masked multimodal inputs, facilitating cross-modal regularization. To address the multi-skill learning challenge, we introduce a two-stage curriculum RL approach to stabilize training. We evaluate our method on a bottle-cap unscrewing task, demonstrating its effectiveness in both simulated and real-world environments. Our approach achieves a success rate that surpasses existing visual-tactile pretraining methods by over 20%.
Motus: A Unified Latent Action World Model
While a general embodied agent must function as a unified system, current methods are built on isolated models for understanding, world modeling, and control. This fragmentation prevents unifying multimodal generative capabilities and hinders learning from large-scale, heterogeneous data. In this paper, we propose Motus, a unified latent action world model that leverages existing general pretrained models and rich, sharable motion information. Motus introduces a Mixture-of-Transformer (MoT) architecture to integrate three experts (i.e., understanding, video generation, and action) and adopts a UniDiffuser-style scheduler to enable flexible switching between different modeling modes (i.e., world models, vision-language-action models, inverse dynamics models, video generation models, and video-action joint prediction models). Motus further leverages the optical flow to learn latent actions and adopts a recipe with three-phase training pipeline and six-layer data pyramid, thereby extracting pixel-level "delta action" and enabling large-scale action pretraining. Experiments show that Motus achieves superior performance against state-of-the-art methods in both simulation (a +15% improvement over X-VLA and a +45% improvement over Pi0.5) and real-world scenarios(improved by +11~48%), demonstrating unified modeling of all functionalities and priors significantly benefits downstream robotic tasks.
PlanScope: Learning to Plan Within Decision Scope for Urban Autonomous Driving
In the context of urban autonomous driving, imitation learning-based methods have shown remarkable effectiveness, with a typical practice to minimize the discrepancy between expert driving logs and predictive decision sequences. As expert driving logs natively contain future short-term decisions with respect to events, such as sudden obstacles or rapidly changing traffic signals. We believe that unpredictable future events and corresponding expert reactions can introduce reasoning disturbances, negatively affecting the convergence efficiency of planning models. At the same time, long-term decision information, such as maintaining a reference lane or avoiding stationary obstacles, is essential for guiding short-term decisions. Our preliminary experiments on shortening the planning horizon show a rise-and-fall trend in driving performance, supporting these hypotheses. Based on these insights, we present PlanScope, a sequential-decision-learning framework with novel techniques for separating short-term and long-term decisions in decision logs. To identify and extract each decision component, the Wavelet Transform on trajectory profiles is proposed. After that, to enhance the detail-generating ability of Neural Networks, extra Detail Decoders are proposed. Finally, to enable in-scope decision supervision across detail levels, Multi-Scope Supervision strategies are adopted during training. The proposed methods, especially the time-dependent normalization, outperform baseline models in closed-loop evaluations on the nuPlan dataset, offering a plug-and-play solution to enhance existing planning models.
Design and Research of a Self-Propelled Pipeline Robot Based on Force Analysis and Dynamic Simulation
In pipeline inspection, traditional tethered inspection robots are severely constrained by cable length and weight, which greatly limit their travel range and accessibility. To address these issues, this paper proposes a self-propelled pipeline robot design based on force analysis and dynamic simulation, with a specific focus on solving core challenges including vertical climbing failure and poor passability in T-branch pipes. Adopting a wheeled configuration and modular design, the robot prioritizes the core demand of body motion control. Specifically, 3D modeling of the robot was first completed using SolidWorks. Subsequently, the model was imported into ADAMS for dynamic simulation, which provided a basis for optimizing the drive module and motion control strategy.To verify the robot's dynamic performance, an experimental platform with acrylic pipes was constructed. Through adjusting its body posture to surmount obstacles and select directions, the robot has demonstrated its ability to stably traverse various complex pipeline scenarios. Notably, this work offers a technical feasibility reference for the application of pipeline robots in the inspection of medium and low-pressure urban gas pipelines.
comment: The article needs further revision
Research on Dead Reckoning Algorithm for Self-Propelled Pipeline Robots in Three-Dimensional Complex Pipelines
In the field of gas pipeline location, existing pipeline location methods mostly rely on pipeline location instruments. However, when faced with complex and curved pipeline scenarios, these methods often fail due to problems such as cable entanglement and insufficient equipment flexibility. To address this pain point, we designed a self-propelled pipeline robot. This robot can autonomously complete the location work of complex and curved pipelines in complex pipe networks without external dragging. In terms of pipeline mapping technology, traditional visual mapping and laser mapping methods are easily affected by lighting conditions and insufficient features in the confined space of pipelines, resulting in mapping drift and divergence problems. In contrast, the pipeline location method that integrates inertial navigation and wheel odometers is less affected by pipeline environmental factors. Based on this, this paper proposes a pipeline robot location method based on extended Kalman filtering (EKF). Firstly, the body attitude angle is initially obtained through an inertial measurement unit (IMU). Then, the extended Kalman filtering algorithm is used to improve the accuracy of attitude angle estimation. Finally, high-precision pipeline location is achieved by combining wheel odometers. During the testing phase, the roll wheels of the pipeline robot needed to fit tightly against the pipe wall to reduce slippage. However, excessive tightness would reduce the flexibility of motion control due to excessive friction. Therefore, a balance needed to be struck between the robot's motion capability and positioning accuracy. Experiments were conducted using the self-propelled pipeline robot in a rectangular loop pipeline, and the results verified the effectiveness of the proposed dead reckoning algorithm.
comment: The paper needs further revision
Multiagent Systems
Multi-agent Adaptive Mechanism Design
We study a sequential mechanism design problem in which a principal seeks to elicit truthful reports from multiple rational agents while starting with no prior knowledge of agents' beliefs. We introduce Distributionally Robust Adaptive Mechanism (DRAM), a general framework combining insights from both mechanism design and online learning to jointly address truthfulness and cost-optimality. Throughout the sequential game, the mechanism estimates agents' beliefs and iteratively updates a distributionally robust linear program with shrinking ambiguity sets to reduce payments while preserving truthfulness. Our mechanism guarantees truthful reporting with high probability while achieving $\tilde{O}(\sqrt{T})$ cumulative regret, and we establish a matching lower bound showing that no truthful adaptive mechanism can asymptotically do better. The framework generalizes to plug-in estimators, supporting structured priors and delayed feedback. To our knowledge, this is the first adaptive mechanism under general settings that maintains truthfulness and achieves optimal regret when incentive constraints are unknown and must be learned.
PERELMAN: Pipeline for scientific literature meta-analysis. Technical report
We present PERELMAN (PipEline foR sciEntific Literature Meta-ANalysis), an agentic framework designed to extract specific information from a large corpus of scientific articles to support large-scale literature reviews and meta-analyses. Our central goal is to reliably transform heterogeneous article content into a unified, machine-readable representation. PERELMAN first elicits domain knowledge-including target variables, inclusion criteria, units, and normalization rules-through a structured dialogue with a subject-matter expert. This domain knowledge is then reused across multiple stages of the pipeline and guides coordinated agents in extracting evidence from narrative text, tables, and figures, enabling consistent aggregation across studies. In order to assess reproducibility and validate our implementation, we evaluate the system on the task of reproducing the meta-analysis of layered Li-ion cathode properties (NMC811 material). We describe our solution, which has the potential to reduce the time required to prepare meta-analyses from months to minutes.
Democratizing Drug Discovery with an Orchestrated, Knowledge-Driven Multi-Agent Team for User-Guided Therapeutic Design
Therapeutic discovery remains a formidable challenge, impeded by the fragmentation of specialized domains and the execution gap between computational design and physiological validation. Although generative AI offers promise, current models often function as passive assistants rather than as autonomous executors. Here, we introduce OrchestRA, a human-in-the-loop multi-agent platform that unifies biology, chemistry, and pharmacology into an autonomous discovery engine. Unlike static code generators, our agents actively execute simulations and reason the results to drive iterative optimization. Governed by an Orchestrator, a Biologist Agent leverages deep reasoning over a massive knowledge graph (>10 million associations) to pinpoint high-confidence targets; a Chemist Agent autonomously detects structural pockets for de novo design or drug repositioning; and a Pharmacologist Agent evaluates candidates via rigorous physiologically based pharmacokinetic (PBPK) simulations. This architecture establishes a dynamic feedback loop where pharmacokinetic and toxicity profiles directly trigger structural reoptimization. By seamlessly integrating autonomous execution with human guidance, OrchestRA democratizes therapeutic design, transforming drug discovery from a stochastic search to a programmable evidence-based engineering discipline.
comment: 51 pages, 4 figures (with supplementary information)
The AI Committee: A Multi-Agent Framework for Automated Validation and Remediation of Web-Sourced Data
Many research areas rely on data from the web to gain insights and test their methods. However, collecting comprehensive research datasets often demands manually reviewing many web pages to identify and record relevant data points, which is labor-intensive and susceptible to error. While the emergence of large language models (LLM)-powered web agents has begun to automate parts of this process, they often struggle to ensure the validity of the data they collect. Indeed, these agents exhibit several recurring failure modes - including hallucinating or omitting values, misinterpreting page semantics, and failing to detect invalid information - which are subtle and difficult to detect and correct manually. To address this, we introduce the AI Committee, a novel model-agnostic multi-agent system that automates the process of validating and remediating web-sourced datasets. Each agent is specialized in a distinct task in the data quality assurance pipeline, from source scrutiny and fact-checking to data remediation and integrity validation. The AI Committee leverages various LLM capabilities - including in-context learning for dataset adaptation, chain-of-thought reasoning for complex semantic validation, and a self-correction loop for data remediation - all without task-specific training. We demonstrate the effectiveness of our system by applying it to three real-world datasets, showing that it generalizes across LLMs and significantly outperforms baseline approaches, achieving data completeness up to 78.7% and precision up to 100%. We additionally conduct an ablation study demonstrating the contribution of each agent to the Committee's performance. This work is released as an open-source tool for the research community.
A Plan Reuse Mechanism for LLM-Driven Agent
Integrating large language models (LLMs) into personal assistants, like Xiao Ai and Blue Heart V, effectively enhances their ability to interact with humans, solve complex tasks, and manage IoT devices. Such assistants are also termed LLM-driven agents. Upon receiving user requests, the LLM-driven agent generates plans using an LLM, executes these plans through various tools, and then returns the response to the user. During this process, the latency for generating a plan with an LLM can reach tens of seconds, significantly degrading user experience. Real-world dataset analysis shows that about 30% of the requests received by LLM-driven agents are identical or similar, which allows the reuse of previously generated plans to reduce latency. However, it is difficult to accurately define the similarity between the request texts received by the LLM-driven agent through directly evaluating the original request texts. Moreover, the diverse expressions of natural language and the unstructured format of plan texts make implementing plan reuse challenging. To address these issues, we present and implement a plan reuse mechanism for LLM-driven agents called AgentReuse. AgentReuse leverages the similarities and differences among requests' semantics and uses intent classification to evaluate the similarities between requests and enable the reuse of plans. Experimental results based on a real-world dataset demonstrate that AgentReuse achieves a 93% effective plan reuse rate, an F1 score of 0.9718, and an accuracy of 0.9459 in evaluating request similarities, reducing latency by 93.12% compared with baselines without using the reuse mechanism.
comment: This paper is an English version of A Plan Reuse Mechanism for LLM-Driven Agent published in 2024 in the Journal of Computer Research and Development
Systems and Control (EESS)
Smart IoT-Based Leak Forecasting and Detection for Energy-Efficient Liquid Cooling in AI Data Centers
AI data centers which are GPU centric, have adopted liquid cooling to handle extreme heat loads, but coolant leaks result in substantial energy loss through unplanned shutdowns and extended repair periods. We present a proof-of-concept smart IoT monitoring system combining LSTM neural networks for probabilistic leak forecasting with Random Forest classifiers for instant detection. Testing on synthetic data aligned with ASHRAE 2021 standards, our approach achieves 96.5% detection accuracy and 87% forecasting accuracy at 90% probability within plus or minus 30-minute windows. Analysis demonstrates that humidity, pressure, and flow rate deliver strong predictive signals, while temperature exhibits minimal immediate response due to thermal inertia in server hardware. The system employs MQTT streaming, InfluxDB storage, and Streamlit dashboards, forecasting leaks 2-4 hours ahead while identifying sudden events within 1 minute. For a typical 47-rack facility, this approach could prevent roughly 1,500 kWh annual energy waste through proactive maintenance rather than reactive emergency procedures. While validation remains synthetic-only, results establish feasibility for future operational deployment in sustainable data center operations.
comment: 7 pages, 6 figures, IEEE conference format
Economic and Reliability Value of Improved Offshore Wind Forecasting in Bulk Power Grid Operation: A Case Study of The New York Power Grid
This study investigates the economic and reliability benefits of improved offshore wind forecasting for grid operations along the U.S. East Coast. We introduce and evaluate a state-of-the-art, machine-learning-based offshore wind forecasting model tailored for this region by integrating its improved forecasts into a dynamic reserve procurement framework aligned with New York Independent System Operator (NYISO) practices to evaluate their economic value. To determine system-wide reserve needs, plant-specific reserves are aggregated. However, conventional methods overlook spatial correlation across sites, often leading to over procurement. To address this, we propose a risk-based reserve aggregation technique that leverages spatial diversification. Additionally, we evaluate the reliability improvements enabled by the enhanced offshore wind forecast. To evaluate the operational impact, we propose an operational resource adequacy framework that captures uncertainty from forecast errors and grid conditions. Using this framework, we quantify key reliability metrics under different offshore wind forecast scenarios. Using New York State as a case study, we find that the improved forecast enables more accurate reserve estimation, reducing procurement costs by 5.53% in 2035 scenario compared to a well-validated numerical weather prediction model. Applying the risk-based aggregation further reduces total production costs by 7.21%. From a reliability perspective, the improved forecasts lower the system Loss of Load Probability (LOLP) by approximately 19% in the 2035 scenario, highlighting its potential to enhance system reliability during real-time grid operations.
comment: Submitted to Applied Energy
Asynchronous Averaging on Dynamic Graphs with Selective Neighborhood Contraction
We study a discrete-time consensus model in which agents iteratively update their states through interactions on a dynamic social network. At each step, a single agent is selected asynchronously and averages the values of its current neighbors. A distinctive feature of our model is that an agent's neighborhood may contract following an update, while non-selected agents may add or remove neighbors independently. This creates a time-varying communication structure with endogenous contraction. We show that under mild assumptions--specifically, that the evolving graph is connected infinitely often--the system reaches consensus almost surely. Our results extend classical consensus theory on time-varying graphs and asynchronous updates by introducing selective neighborhood contraction, offering new insights into agreement dynamics in evolving social systems.
comment: 10 pages, 12 figures
Towards Learning-Based Formula 1 Race Strategies
This paper presents two complementary frameworks to optimize Formula 1 race strategies, jointly accounting for energy allocation, tire wear and pit stop timing. First, the race scenario is modeled using lap time maps and a dynamic tire wear model capturing the main trade-offs arising during a race. Then, we solve the problem by means of a mixed-integer nonlinear program that handles the integer nature of the pit stop decisions. The same race scenario is embedded into a reinforcement learning environment, on which an agent is trained. Providing fast inference at runtime, this method is suited to improve human decision-making during real races. The learned policy's suboptimality is assessed with respect to the optimal solution, both in a nominal scenario and with an unforeseen disturbance. In both cases, the agent achieves approximately 5s of suboptimality on 1.5h of race time, mainly attributable to the different energy allocation strategy. This work lays the foundations for learning-based race strategies and provides a benchmark for future developments.
Dynamic Cooperative Strategies in Search Engine Advertising Market: With and Without Retail Competition
In search engine advertising (SEA) market, where competition among retailers is intense and multifaceted, channel coordination between retailers and manufacturers emerges as a critical factor, which significantly influences the effectiveness of advertising strategies. This research attempts to provide managerial guidelines for cooperative advertising in the SEA context by modeling two cooperative advertising decision scenarios. Scenario I defines a simple cooperative channel consisting of one manufacturer and one retailer. In Scenario II, we consider a more general setting where there is an independent retailer who competes with the Manufacturer-Retailer alliance in Scenario I. We propose a novel cooperative advertising optimization model, wherein a manufacturer can advertise product directly through SEA campaigns and indirectly by subsidizing its retailer. To highlight the distinctive features of SEA, our model incorporates dynamic quality scores and focuses on a finite time horizon. In each scenario, we provide a feasible equilibrium solution of optimal policies for all members. Subsequently, we conduct numerical experiments to perform sensitivity analysis for both the quality score and gross margin. Additionally, we explore the impact of the initial market share of the competing retailer in Scenario II. Finally, we investigate how retail competition affects the cooperative alliance's optimal strategy and channel performance. Our identified properties derived from the equilibrium and numerical analyses offer crucial insights for participants engaged in cooperative advertising within the SEA market.
comment: 60 pages, 17 figures,6 tables
Spatiotemporal Tubes for Probabilistic Temporal Reach-Avoid-Stay Task in Uncertain Dynamic Environment
In this work, we extend the Spatiotemporal Tube (STT) framework to address Probabilistic Temporal Reach-Avoid-Stay (PrT-RAS) tasks in dynamic environments with uncertain obstacles. We develop a real-time tube synthesis procedure that explicitly accounts for time-varying uncertain obstacles and provides formal probabilistic safety guarantees. The STT is formulated as a time-varying ball in the state space whose center and radius evolve online based on uncertain sensory information. We derive a closed-form, approximation-free control law that confines the system trajectory within the tube, ensuring both probabilistic safety and task satisfaction. Our method offers a formal guarantee for probabilistic avoidance and finite-time task completion. The resulting controller is model-free, approximation-free, and optimization-free, enabling efficient real-time execution while guaranteeing convergence to the target. The effectiveness and scalability of the framework are demonstrated through simulation studies and hardware experiments on mobile robots, a UAV, and a 7-DOF manipulator navigating in cluttered and uncertain environments.
Convergence Analysis of Natural Power Method and Its Applications to Control
This paper analyzes the discrete-time natural power method, demonstrating its convergence to the dominant $r$-dimensional subspace corresponding to the $r$ eigenvalues with the largest absolute values. This contrasts with the Oja flow, which targets eigenvalues with the largest real parts. We leverage this property to develop methods for model order reduction and low-rank controller synthesis for discrete-time LTI systems, proving preservation of key system properties. We also extend the low-rank control framework to slowly-varying LTV systems, showing its utility for tracking time-varying dominant subspaces.
comment: 6 pages. submitted
Planar Juggling of a Devil-Stick using Discrete VHCs
Planar juggling of a devil-stick using impulsive inputs is addressed using the concept of discrete virtual holonomic constraints (DVHC). The location of the center-of-mass of the devil-stick is specified in terms of its orientation at the discrete instants when impulsive control inputs are applied. The discrete zero dynamics (DZD) resulting from the choice of DVHC provides conditions for stable juggling. A control design that enforces the DVHC and an orbit stabilizing controller are presented. The approach is validated in simulation.
comment: 9 pages, 7 figures; this is an extended version of the article published in the IEEE Control Systems Letters
Robotics
Quadrupped-Legged Robot Movement Plan Generation using Large Language Model
Traditional control interfaces for quadruped robots often impose a high barrier to entry, requiring specialized technical knowledge for effective operation. To address this, this paper presents a novel control framework that integrates Large Language Models (LLMs) to enable intuitive, natural language-based navigation. We propose a distributed architecture where high-level instruction processing is offloaded to an external server to overcome the onboard computational constraints of the DeepRobotics Jueying Lite 3 platform. The system grounds LLM-generated plans into executable ROS navigation commands using real-time sensor fusion (LiDAR, IMU, and Odometry). Experimental validation was conducted in a structured indoor environment across four distinct scenarios, ranging from single-room tasks to complex cross-zone navigation. The results demonstrate the system's robustness, achieving an aggregate success rate of over 90\% across all scenarios, validating the feasibility of offloaded LLM-based planning for autonomous quadruped deployment in real-world settings.
LookPlanGraph: Embodied Instruction Following Method with VLM Graph Augmentation
Methods that use Large Language Models (LLM) as planners for embodied instruction following tasks have become widespread. To successfully complete tasks, the LLM must be grounded in the environment in which the robot operates. One solution is to use a scene graph that contains all the necessary information. Modern methods rely on prebuilt scene graphs and assume that all task-relevant information is available at the start of planning. However, these approaches do not account for changes in the environment that may occur between the graph construction and the task execution. We propose LookPlanGraph - a method that leverages a scene graph composed of static assets and object priors. During plan execution, LookPlanGraph continuously updates the graph with relevant objects, either by verifying existing priors or discovering new entities. This is achieved by processing the agents egocentric camera view using a Vision Language Model. We conducted experiments with changed object positions VirtualHome and OmniGibson simulated environments, demonstrating that LookPlanGraph outperforms methods based on predefined static scene graphs. To demonstrate the practical applicability of our approach, we also conducted experiments in a real-world setting. Additionally, we introduce the GraSIF (Graph Scenes for Instruction Following) dataset with automated validation framework, comprising 514 tasks drawn from SayPlan Office, BEHAVIOR-1K, and VirtualHome RobotHow. Project page available at https://lookplangraph.github.io .
RoboCade: Gamifying Robot Data Collection
Imitation learning from human demonstrations has become a dominant approach for training autonomous robot policies. However, collecting demonstration datasets is costly: it often requires access to robots and needs sustained effort in a tedious, long process. These factors limit the scale of data available for training policies. We aim to address this scalability challenge by involving a broader audience in a gamified data collection experience that is both accessible and motivating. Specifically, we develop a gamified remote teleoperation platform, RoboCade, to engage general users in collecting data that is beneficial for downstream policy training. To do this, we embed gamification strategies into the design of the system interface and data collection tasks. In the system interface, we include components such as visual feedback, sound effects, goal visualizations, progress bars, leaderboards, and badges. We additionally propose principles for constructing gamified tasks that have overlapping structure with useful downstream target tasks. We instantiate RoboCade on three manipulation tasks -- including spatial arrangement, scanning, and insertion. To illustrate the viability of gamified robot data collection, we collect a demonstration dataset through our platform, and show that co-training robot policies with this data can improve success rate on non-gamified target tasks (+16-56%). Further, we conduct a user study to validate that novice users find the gamified platform significantly more enjoyable than a standard non-gamified platform (+24%). These results highlight the promise of gamified data collection as a scalable, accessible, and engaging method for collecting demonstration data.
comment: 10 pages, 9 figures
UniTacHand: Unified Spatio-Tactile Representation for Human to Robotic Hand Skill Transfer
Tactile sensing is crucial for robotic hands to achieve human-level dexterous manipulation, especially in scenarios with visual occlusion. However, its application is often hindered by the difficulty of collecting large-scale real-world robotic tactile data. In this study, we propose to collect low-cost human manipulation data using haptic gloves for tactile-based robotic policy learning. The misalignment between human and robotic tactile data makes it challenging to transfer policies learned from human data to robots. To bridge this gap, we propose UniTacHand, a unified representation to align robotic tactile information captured by dexterous hands with human hand touch obtained from gloves. First, we project tactile signals from both human hands and robotic hands onto a morphologically consistent 2D surface space of the MANO hand model. This unification standardizes the heterogeneous data structures and inherently embeds the tactile signals with spatial context. Then, we introduce a contrastive learning method to align them into a unified latent space, trained on only 10 minutes of paired data from our data collection system. Our approach enables zero-shot tactile-based policy transfer from humans to a real robot, generalizing to objects unseen in the pre-training data. We also demonstrate that co-training on mixed data, including both human and robotic demonstrations via UniTacHand, yields better performance and data efficiency compared with using only robotic data. UniTacHand paves a path toward general, scalable, and data-efficient learning for tactile-based dexterous hands.
Relative Localization System Design for SnailBot: A Modular Self-reconfigurable Robot
This paper presents the design and implementation of a relative localization system for SnailBot, a modular self reconfigurable robot. The system integrates ArUco marker recognition, optical flow analysis, and IMU data processing into a unified fusion framework, enabling robust and accurate relative positioning for collaborative robotic tasks. Experimental validation demonstrates the effectiveness of the system in realtime operation, with a rule based fusion strategy ensuring reliability across dynamic scenarios. The results highlight the potential for scalable deployment in modular robotic systems.
comment: 7 pages, 7 figures, 4 algorithms
RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic
Embodied agents powered by vision-language models (VLMs) are increasingly capable of executing complex real-world tasks, yet they remain vulnerable to hazardous instructions that may trigger unsafe behaviors. Runtime safety guardrails, which intercept hazardous actions during task execution, offer a promising solution due to their flexibility. However, existing defenses often rely on static rule filters or prompt-level control, which struggle to address implicit risks arising in dynamic, temporally dependent, and context-rich environments. To address this, we propose RoboSafe, a hybrid reasoning runtime safeguard for embodied agents through executable predicate-based safety logic. RoboSafe integrates two complementary reasoning processes on a Hybrid Long-Short Safety Memory. We first propose a Backward Reflective Reasoning module that continuously revisits recent trajectories in short-term memory to infer temporal safety predicates and proactively triggers replanning when violations are detected. We then propose a Forward Predictive Reasoning module that anticipates upcoming risks by generating context-aware safety predicates from the long-term safety memory and the agent's multimodal observations. Together, these components form an adaptive, verifiable safety logic that is both interpretable and executable as code. Extensive experiments across multiple agents demonstrate that RoboSafe substantially reduces hazardous actions (-36.8% risk occurrence) compared with leading baselines, while maintaining near-original task performance. Real-world evaluations on physical robotic arms further confirm its practicality. Code will be released upon acceptance.
comment: 11 pages, 6 figures
Wireless Center of Pressure Feedback System for Humanoid Robot Balance Control using ESP32-C3
Maintaining stability during the single-support phase is a fundamental challenge in humanoid robotics, particularly in dance robots that require complex maneuvers and high mechanical freedom. Traditional tethered sensor configurations often restrict joint movement and introduce mechanical noises. This study proposes a wireless embedded balance system designed to maintain stability on uneven surfaces. The system utilizes a custom-designed foot unit integrated with four load cells and an ESP32-C3 microcontroller to estimate the Center of Pressure (CoP) in real time. The CoP data were transmitted wirelessly to the main controller to minimize the wiring complexity of the 29-DoF VI-ROSE humanoid robot. A PID control strategy is implemented to adjust the torso, hip, and ankle roll joints based on CoP feedback. Experimental characterization demonstrated high sensor precision with an average measurement error of 14.8 g. Furthermore, the proposed control system achieved a 100% success rate in maintaining balance during single-leg lifting tasks at a 3-degree inclination with optimized PID parameters (Kp=0.10, Kd=0.005). These results validate the efficacy of wireless CoP feedback in enhancing the postural stability of humanoid robots, without compromising their mechanical flexibility.
Schrödinger's Navigator: Imagining an Ensemble of Futures for Zero-Shot Object Navigation
Zero-shot object navigation (ZSON) requires a robot to locate a target object in a previously unseen environment without relying on pre-built maps or task-specific training. However, existing ZSON methods often struggle in realistic and cluttered environments, particularly when the scene contains heavy occlusions, unknown risks, or dynamically moving target objects. To address these challenges, we propose \textbf{Schrödinger's Navigator}, a navigation framework inspired by Schrödinger's thought experiment on uncertainty. The framework treats unobserved space as a set of plausible future worlds and reasons over them before acting. Conditioned on egocentric visual inputs and three candidate trajectories, a trajectory-conditioned 3D world model imagines future observations along each path. This enables the agent to see beyond occlusions and anticipate risks in unseen regions without requiring extra detours or dense global mapping. The imagined 3D observations are fused into the navigation map and used to update a value map. These updates guide the policy toward trajectories that avoid occlusions, reduce exposure to uncertain space, and better track moving targets. Experiments on a Go2 quadruped robot across three challenging scenarios, including severe static occlusions, unknown risks, and dynamically moving targets, show that Schrödinger's Navigator consistently outperforms strong ZSON baselines in self-localization, object localization, and overall Success Rate in occlusion-heavy environments. These results demonstrate the effectiveness of trajectory-conditioned 3D imagination in enabling robust zero-shot object navigation.
Flocking phase transition and threat responses in bio-inspired autonomous drone swarms
Collective motion inspired by animal groups offers powerful design principles for autonomous aerial swarms. We present a bio-inspired 3D flocking algorithm in which each drone interacts only with a minimal set of influential neighbors, relying solely on local alignment and attraction cues. By systematically tuning these two interaction gains, we map a phase diagram revealing sharp transitions between swarming and schooling, as well as a critical region where susceptibility, polarization fluctuations, and reorganization capacity peak. Outdoor experiments with a swarm of ten drones, combined with simulations using a calibrated flight-dynamics model, show that operating near this transition enhances responsiveness to external disturbances. When confronted with an intruder, the swarm performs rapid collective turns, transient expansions, and reliably recovers high alignment within seconds. These results demonstrate that minimal local-interaction rules are sufficient to generate multiple collective phases and that simple gain modulation offers an efficient mechanism to adjust stability, flexibility, and resilience in drone swarms.
SparScene: Efficient Traffic Scene Representation via Sparse Graph Learning for Large-Scale Trajectory Generation
Multi-agent trajectory generation is a core problem for autonomous driving and intelligent transportation systems. However, efficiently modeling the dynamic interactions between numerous road users and infrastructures in complex scenes remains an open problem. Existing methods typically employ distance-based or fully connected dense graph structures to capture interaction information, which not only introduces a large number of redundant edges but also requires complex and heavily parameterized networks for encoding, thereby resulting in low training and inference efficiency, limiting scalability to large and complex traffic scenes. To overcome the limitations of existing methods, we propose SparScene, a sparse graph learning framework designed for efficient and scalable traffic scene representation. Instead of relying on distance thresholds, SparScene leverages the lane graph topology to construct structure-aware sparse connections between agents and lanes, enabling efficient yet informative scene graph representation. SparScene adopts a lightweight graph encoder that efficiently aggregates agent-map and agent-agent interactions, yielding compact scene representations with substantially improved efficiency and scalability. On the motion prediction benchmark of the Waymo Open Motion Dataset (WOMD), SparScene achieves competitive performance with remarkable efficiency. It generates trajectories for more than 200 agents in a scene within 5 ms and scales to more than 5,000 agents and 17,000 lanes with merely 54 ms of inference time with a GPU memory of 2.9 GB, highlighting its superior scalability for large-scale traffic scenes.
comment: 13 pages, 7 figures, 5 tables
Robust and Efficient MuJoCo-based Model Predictive Control via Web of Affine Spaces Derivatives ICRA 2026
MuJoCo is a powerful and efficient physics simulator widely used in robotics. One common way it is applied in practice is through Model Predictive Control (MPC), which uses repeated rollouts of the simulator to optimize future actions and generate responsive control policies in real time. To make this process more accessible, the open source library MuJoCo MPC (MJPC) provides ready-to-use MPC algorithms and implementations built directly on top of the MuJoCo simulator. However, MJPC relies on finite differencing (FD) to compute derivatives through the underlying MuJoCo simulator, which is often a key bottleneck that can make it prohibitively costly for time-sensitive tasks, especially in high-DOF systems or complex scenes. In this paper, we introduce the use of Web of Affine Spaces (WASP) derivatives within MJPC as a drop-in replacement for FD. WASP is a recently developed approach for efficiently computing sequences of accurate derivative approximations. By reusing information from prior, related derivative calculations, WASP accelerates and stabilizes the computation of new derivatives, making it especially well suited for MPC's iterative, fine-grained updates over time. We evaluate WASP across a diverse suite of MJPC tasks spanning multiple robot embodiments. Our results suggest that WASP derivatives are particularly effective in MJPC: it integrates seamlessly across tasks, delivers consistently robust performance, and achieves up to a 2$\mathsf{x}$ speedup compared to an FD backend when used with derivative-based planners, such as iLQG. In addition, WASP-based MPC outperforms MJPC's stochastic sampling-based planners on our evaluation tasks, offering both greater efficiency and reliability. To support adoption and future research, we release an open-source implementation of MJPC with WASP derivatives fully integrated.
comment: Submitted to 2026 IEEE International Conference on Robotics & Automation (ICRA 2026)
Global End-Effector Pose Control of an Underactuated Aerial Manipulator via Reinforcement Learning
Aerial manipulators, which combine robotic arms with multi-rotor drones, face strict constraints on arm weight and mechanical complexity. In this work, we study a lightweight 2-degree-of-freedom (DoF) arm mounted on a quadrotor via a differential mechanism, capable of full six-DoF end-effector pose control. While the minimal design enables simplicity and reduced payload, it also introduces challenges such as underactuation and sensitivity to external disturbances, including manipulation of heavy loads and pushing tasks. To address these, we employ reinforcement learning, training a Proximal Policy Optimization (PPO) agent in simulation to generate feedforward commands for quadrotor acceleration and body rates, along with joint angle targets. These commands are tracked by an incremental nonlinear dynamic inversion (INDI) attitude controller and a PID joint controller, respectively. Flight experiments demonstrate centimeter-level position accuracy and degree-level orientation precision, with robust performance under external force disturbances. The results highlight the potential of learning-based control strategies for enabling contact-rich aerial manipulation using simple, lightweight platforms.
comment: 8 pages, 6 figures
Language-Guided Grasp Detection with Coarse-to-Fine Learning for Robotic Manipulation
Grasping is one of the most fundamental challenging capabilities in robotic manipulation, especially in unstructured, cluttered, and semantically diverse environments. Recent researches have increasingly explored language-guided manipulation, where robots not only perceive the scene but also interpret task-relevant natural language instructions. However, existing language-conditioned grasping methods typically rely on shallow fusion strategies, leading to limited semantic grounding and weak alignment between linguistic intent and visual grasp reasoning.In this work, we propose Language-Guided Grasp Detection (LGGD) with a coarse-to-fine learning paradigm for robotic manipulation. LGGD leverages CLIP-based visual and textual embeddings within a hierarchical cross-modal fusion pipeline, progressively injecting linguistic cues into the visual feature reconstruction process. This design enables fine-grained visual-semantic alignment and improves the feasibility of the predicted grasps with respect to task instructions. In addition, we introduce a language-conditioned dynamic convolution head (LDCH) that mixes multiple convolution experts based on sentence-level features, enabling instruction-adaptive coarse mask and grasp predictions. A final refinement module further enhances grasp consistency and robustness in complex scenes.Experiments on the OCID-VLG and Grasp-Anything++ datasets show that LGGD surpasses existing language-guided grasping methods, exhibiting strong generalization to unseen objects and diverse language queries. Moreover, deployment on a real robotic platform demonstrates the practical effectiveness of our approach in executing accurate, instruction-conditioned grasp actions. The code will be released publicly upon acceptance.
comment: Submitted to IEEE Journal
Tracing Energy Flow: Learning Tactile-based Grasping Force Control to Prevent Slippage in Dynamic Object Interaction
Regulating grasping force to reduce slippage during dynamic object interaction remains a fundamental challenge in robotic manipulation, especially when objects are manipulated by multiple rolling contacts, have unknown properties (such as mass or surface conditions), and when external sensing is unreliable. In contrast, humans can quickly regulate grasping force by touch, even without visual cues. Inspired by this ability, we aim to enable robotic hands to rapidly explore objects and learn tactile-driven grasping force control under motion and limited sensing. We propose a physics-informed energy abstraction that models the object as a virtual energy container. The inconsistency between the fingers' applied power and the object's retained energy provides a physically grounded signal for inferring slip-aware stability. Building on this abstraction, we employ model-based learning and planning to efficiently model energy dynamics from tactile sensing and perform real-time grasping force optimization. Experiments in both simulation and hardware demonstrate that our method can learn grasping force control from scratch within minutes, effectively reduce slippage, and extend grasp duration across diverse motion-object pairs, all without relying on external sensing or prior object knowledge.
comment: 8 pages. Accepted by IEEE Robotics and Automation Letters (RA-L)
Multimodal Sensing for Robot-Assisted Sub-Tissue Feature Detection in Physiotherapy Palpation
Robotic palpation relies on force sensing, but force signals in soft-tissue environments are variable and cannot reliably reveal subtle subsurface features. We present a compact multimodal sensor that integrates high-resolution vision-based tactile imaging with a 6-axis force-torque sensor. In experiments on silicone phantoms with diverse subsurface tendon geometries, force signals alone frequently produce ambiguous responses, while tactile images reveal clear structural differences in presence, diameter, depth, crossings, and multiplicity. Yet accurate force tracking remains essential for maintaining safe, consistent contact during physiotherapeutic interaction. Preliminary results show that combining tactile and force modalities enables robust subsurface feature detection and controlled robotic palpation.
comment: 6 pages, 9 figures, submitted to DMD2026
Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions
Bayesian Reinforcement Learning (BRL) provides a framework for generalisation of Reinforcement Learning (RL) problems from its use of Bayesian task parameters in the transition and reward models. However, classical BRL methods assume known forms of transition and reward models, reducing their applicability in real-world problems. As a result, recent deep BRL methods have started to incorporate model learning, though the use of neural networks directly on the joint data and task parameters requires optimising the Evidence Lower Bound (ELBO). ELBOs are difficult to optimise and may result in indistinctive task parameters, hence compromised BRL policies. To this end, we introduce a novel deep BRL method, Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions (GLiBRL), that enables efficient and accurate learning of transition and reward models, with fully tractable marginal likelihood and Bayesian inference on task parameters and model noises. On challenging MetaWorld ML10/45 benchmarks, GLiBRL improves the success rate of one of the state-of-the-art deep BRL methods, VariBAD, by up to 2.7x. Comparing against representative or recent deep BRL / Meta-RL methods, such as MAML, RL2, SDVT, TrMRL and ECET, GLiBRL also demonstrates its low-variance and decent performance consistently.
From Human Bias to Robot Choice: How Occupational Contexts and Racial Priming Shape Robot Selection
As artificial agents increasingly integrate into professional environments, fundamental questions have emerged about how societal biases influence human-robot selection decisions. We conducted two comprehensive experiments (N = 1,038) examining how occupational contexts and stereotype activation shape robotic agent choices across construction, healthcare, educational, and athletic domains. Participants made selections from artificial agents that varied systematically in skin tone and anthropomorphic characteristics. Our study revealed distinct context-dependent patterns. Healthcare and educational scenarios demonstrated strong favoritism toward lighter-skinned artificial agents, while construction and athletic contexts showed greater acceptance of darker-toned alternatives. Participant race was associated with systematic differences in selection patterns across professional domains. The second experiment demonstrated that exposure to human professionals from specific racial backgrounds systematically shifted later robotic agent preferences in stereotype-consistent directions. These findings show that occupational biases and color-based discrimination transfer directly from human-human to human-robot evaluation contexts. The results highlight mechanisms through which robotic deployment may unintentionally perpetuate existing social inequalities.
comment: HRI '26
ETP-R1: Evolving Topological Planning with Reinforcement Fine-tuning for Vision-Language Navigation in Continuous Environments
Vision-Language Navigation in Continuous Environments (VLN-CE) requires an embodied agent to navigate towards target in continuous environments, following natural language instructions. While current graph-based methods offer an efficient, structured approach by abstracting the environment into a topological map and simplifying the action space to waypoint selection, they lag behind methods based on Large Vision-Language Models (LVLMs) in leveraging large-scale data and advanced training paradigms. In this paper, we try to bridge this gap by introducing ETP-R1, a framework that applies the paradigm of scaling up data and Reinforcement Fine-Tuning (RFT) to a graph-based VLN-CE model. To build a strong foundation, we first construct a high-quality, large-scale pretraining dataset using the Gemini API. This dataset consists of diverse, low-hallucination instructions for topological trajectories, providing rich supervision for our graph-based policy to map language to topological paths. This foundation is further strengthened by unifying data from both R2R and RxR tasks for joint pretraining. Building on this, we introduce a three-stage training paradigm, which culminates in the first application of closed-loop, online RFT to a graph-based VLN-CE model, powered by the Group Relative Policy Optimization (GRPO) algorithm. Extensive experiments demonstrate that our approach is highly effective, establishing new state-of-the-art performance across all major metrics on both the R2R-CE and RxR-CE benchmarks. Our code is available at https://github.com/Cepillar/ETP-R1.
comment: 8 pages, 6 figures
Certifiable Alignment of GNSS and Local Frames via Lagrangian Duality
Estimating the absolute orientation of a local system relative to a global navigation satellite system (GNSS) reference often suffers from local minima and high dependency on satellite availability. Existing methods for this alignment task rely on abundant satellites unavailable in GNSS-degraded environments, or use local optimization methods which cannot guarantee the optimality of a solution. This work introduces a globally optimal solver that transforms raw pseudo-range or Doppler measurements into a convexly relaxed problem. The proposed method is certifiable, meaning it can numerically verify the correctness of the result, filling a gap where existing local optimizers fail. We first formulate the original frame alignment problem as a nonconvex quadratically constrained quadratic program (QCQP) problem and relax the QCQP problem to a concave Lagrangian dual problem that provides a lower cost bound for the original problem. Then we perform relaxation tightness and observability analysis to derive criteria for certifiable optimality of the solution. Finally, simulation and real world experiments are conducted to evaluate the proposed method. The experiments show that our method provides certifiably optimal solutions even with only 2 satellites with Doppler measurements and 2D vehicle motion, while the traditional velocity-based VOBA method and the advanced GVINS alignment technique may fail or converge to local optima without notice. To support the development of GNSS-based navigation techniques in robotics, all code and data are open-sourced at https://github.com/Baoshan-Song/Certifiable-Doppler-alignment.
Stretchable and High-Precision Optical Tactile Sensor for Trajectory Tracking of Parallel Mechanisms IROS
Stretchable sensors indicate promising prospects for soft robotics, medical devices, and human-machine interactions due to the high compliance of soft materials. Discrete sensing strategies, including sensor arrays and distributed sensors, are broadly involved in tactile sensors across versatile applications. However, it remains a challenge to achieve high spatial resolution with self-decoupled capacity and insensitivity to other off-axis stimuli for stretchable tactile sensors. Herein, we develop a stretchable tactile sensor based on the proposed continuous spectral-filtering principle, allowing superhigh resolution for applied stimuli. This proposed sensor enables a high-linear spatial response (0.996) even during stretching and bending, and high continuous spatial (7 μm) and force (5 mN) resolutions with design scalability and interaction robustness to survive piercing and cutting. We further demonstrate the sensors' performance by integrating them into a planar parallel mechanism for precise trajectory tracking (rotational resolution: 0.02°) in real time.
comment: Accepted by 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Proprioception Enhances Vision Language Model in Generating Captions and Subtask Segmentations for Robot Task
From the perspective of future developments in robotics, it is crucial to verify whether foundation models trained exclusively on offline data, such as images and language, can understand the robot motion. In particular, since Vision Language Models (VLMs) do not include low-level motion information from robots in their training datasets, video understanding including trajectory information remains a significant challenge. In this study, we assess two capabilities of VLMs through a video captioning task with low-level robot motion information: (1) automatic captioning of robot tasks and (2) segmentation of a series of tasks. Both capabilities are expected to enhance the efficiency of robot imitation learning by linking language and motion and serve as a measure of the foundation model's performance. The proposed method generates multiple "scene" captions using image captions and trajectory data from robot tasks. The full task caption is then generated by summarizing these individual captions. Additionally, the method performs subtask segmentation by comparing the similarity between text embeddings of image captions. In both captioning tasks, the proposed method aims to improve performance by providing the robot's motion data - joint and end-effector states - as input to the VLM. Simulator experiments were conducted to validate the effectiveness of the proposed method.
Early warning signals for loss of control
Maintaining stability in feedback systems, from aircraft and autonomous robots to biological and physiological systems, relies on monitoring their behavior and continuously adjusting their inputs. Incremental damage can make such control fragile. This tends to go unnoticed until a small perturbation induces instability (i.e. loss of control). Traditional methods in the field of engineering rely on accurate system models to compute a safe set of operating instructions, which become invalid when the, possibly damaged, system diverges from its model. Here we demonstrate that the approach of such a feedback system towards instability can nonetheless be monitored through dynamical indicators of resilience. This holistic system safety monitor does not rely on a system model and is based on the generic phenomenon of critical slowing down, shown to occur in the climate, biology and other complex nonlinear systems approaching criticality. Our findings for engineered devices opens up a wide range of applications involving real-time early warning systems as well as an empirical guidance of resilient system design exploration, or "tinkering". While we demonstrate the validity using drones, the generic nature of the underlying principles suggest that these indicators could apply across a wider class of controlled systems including reactors, aircraft, and self-driving cars.
Planetary Terrain Datasets and Benchmarks for Rover Path Planning
Planetary rover exploration is attracting renewed interest with several upcoming space missions to the Moon and Mars. However, a substantial amount of data from prior missions remain underutilized for path planning and autonomous navigation research. As a result, there is a lack of space mission-based planetary datasets, standardized benchmarks, and evaluation protocols. In this paper, we take a step towards coordinating these three research directions in the context of planetary rover path planning. We propose the first two large planar benchmark datasets, MarsPlanBench and MoonPlanBench, derived from high-resolution digital terrain images of Mars and the Moon. In addition, we set up classical and learned path planning algorithms, in a unified framework, and evaluate them on our proposed datasets and on a popular planning benchmark. Through comprehensive experiments, we report new insights on the performance of representative path planning algorithms on planetary terrains, for the first time to the best of our knowledge. Our results show that classical algorithms can achieve up to 100% global path planning success rates on average across challenging terrains such as Moon's north and south poles. This suggests, for instance, why these algorithms are used in practice by NASA. Conversely, learning-based models, although showing promising results in less complex environments, still struggle to generalize to planetary domains. To serve as a starting point for fundamental path planning research, our code and datasets will be released at: https://github.com/mchancan/PlanetaryPathBench.
EVE: A Generator-Verifier System for Generative Policies
Visuomotor policies based on generative architectures such as diffusion and flow-based matching have shown strong performance but degrade under distribution shifts, demonstrating limited recovery capabilities without costly finetuning. In the language modeling domain, test-time compute scaling has revolutionized reasoning capabilities of modern LLMs by leveraging additional inference-time compute for candidate solution refinement. These methods typically leverage foundation models as verification modules in a zero-shot manner to synthesize improved candidate solutions. In this work, we hypothesize that generative policies can similarly benefit from additional inference-time compute that employs zero-shot VLM-based verifiers. A systematic analysis of improving policy performance through the generation-verification framework remains relatively underexplored in the current literature. To this end, we introduce EVE - a modular, generator-verifier interaction framework - that boosts the performance of pretrained generative policies at test time, with no additional training. EVE wraps a frozen base policy with multiple zero-shot, VLM-based verifier agents. Each verifier proposes action refinements to the base policy candidate actions, while an action incorporator fuses the aggregated verifier output into the base policy action prediction to produce the final executed action. We study design choices for generator-verifier information interfacing across a system of verifiers with distinct capabilities. Across a diverse suite of manipulation tasks, EVE consistently improves task success rates without any additional policy training. Through extensive ablations, we isolate the contribution of verifier capabilities and action incorporator strategies, offering practical guidelines to build scalable, modular generator-verifier systems for embodied control.
Developing a Fundamental Diagram for Urban Air Mobility Based on Physical Experiments
Urban Air Mobility (UAM) is an emerging application of unmanned aerial vehicles (UAVs) that promises to reduce travel time and alleviate congestion in urban transportation systems. As drone density increases, UAM operations are expected to experience congestion similar to that in ground traffic. However, the fundamental characteristics of UAM traffic flow, particularly under real-world operating conditions, remain poorly understood. This study proposes a general framework for constructing the fundamental diagram (FD) of UAM traffic by integrating theoretical analysis with physical experiments. To the best of our knowledge, this is the first study to derive a UAM FD using real-world physical test data. On the theoretical side, we design two drone control laws for collision avoidance and develop simulation-based traffic generation methods to produce diverse UAM traffic scenarios. Based on Edie's definition, traffic flow theory is then applied to construct the FD and characterize the macroscopic properties of UAM traffic. To account for real-world disturbances and modeling uncertainties, we further conduct physical experiments on a reduced-scale testbed using Bitcraze Crazyflie drones. Both simulation and physical test trajectory data are collected and organized into the UAMTra2Flow dataset, which is analyzed using the proposed framework. Preliminary results indicate that classical FD structures for ground transportation are also applicable to UAM systems. Notably, FD curves obtained from physical experiments exhibit deviations from simulation-based results, highlighting the importance of experimental validation. Finally, results from the reduced-scale testbed are scaled to realistic operating conditions to provide practical insights for future UAM traffic systems. The dataset and code for this paper are publicly available at https://github.com/CATS-Lab/UAM-FD.
Fast Navigation Through Occluded Spaces via Language-Conditioned Map Prediction
In cluttered environments, motion planners often face a trade-off between safety and speed due to uncertainty caused by occlusions and limited sensor range. In this work, we investigate whether co-pilot instructions can help robots plan more decisively while remaining safe. We introduce PaceForecaster, as an approach that incorporates such co-pilot instructions into local planners. PaceForecaster takes the robot's local sensor footprint (Level-1) and the provided co-pilot instructions as input and predicts (i) a forecasted map with all regions visible from Level-1 (Level-2) and (ii) an instruction-conditioned subgoal within Level-2. The subgoal provides the planner with explicit guidance to exploit the forecasted environment in a goal-directed manner. We integrate PaceForecaster with a Log-MPPI controller and demonstrate that using language-conditioned forecasts and goals improves navigation performance by 36% over a local-map-only baseline while in polygonal environments.
Safe Path Planning and Observation Quality Enhancement Strategy for Unmanned Aerial Vehicles in Water Quality Monitoring Tasks
Unmanned Aerial Vehicle (UAV) spectral remote sensing technology is widely used in water quality monitoring. However, in dynamic environments, varying illumination conditions, such as shadows and specular reflection (sun glint), can cause severe spectral distortion, thereby reducing data availability. To maximize the acquisition of high-quality data while ensuring flight safety, this paper proposes an active path planning method for dynamic light and shadow disturbance avoidance. First, a dynamic prediction model is constructed to transform the time-varying light and shadow disturbance areas into three-dimensional virtual obstacles. Second, an improved Interfered Fluid Dynamical System (IFDS) algorithm is introduced, which generates a smooth initial obstacle avoidance path by building a repulsive force field. Subsequently, a Model Predictive Control (MPC) framework is employed for rolling-horizon path optimization to handle flight dynamics constraints and achieve real-time trajectory tracking. Furthermore, a Dynamic Flight Altitude Adjustment (DFAA) mechanism is designed to actively reduce the flight altitude when the observable area is narrow, thereby enhancing spatial resolution. Simulation results show that, compared with traditional PID and single obstacle avoidance algorithms, the proposed method achieves an obstacle avoidance success rate of 98% in densely disturbed scenarios, significantly improves path smoothness, and increases the volume of effective observation data by approximately 27%. This research provides an effective engineering solution for precise UAV water quality monitoring in complex illumination environments.
Evaluating an Adaptive Multispectral Turret System for Autonomous Tracking Across Variable Illumination Conditions
Autonomous robotic platforms are playing a growing role across the emergency services sector, supporting missions such as search and rescue operations in disaster zones and reconnaissance. However, traditional red-green-blue (RGB) detection pipelines struggle in low-light environments, and thermal-based systems lack color and texture information. To overcome these limitations, we present an adaptive framework that fuses RGB and long-wave infrared (LWIR) video streams at multiple fusion ratios and dynamically selects the optimal detection model for each illumination condition. We trained 33 You Only Look Once (YOLO) models on over 22,000 annotated images spanning three light levels: no-light (<10 lux), dim-light (10-1000 lux), and full-light (>1000 lux). To integrate both modalities, fusion was performed by blending aligned RGB and LWIR frames at eleven ratios, from full RGB (100/0) to full LWIR (0/100) in 10% increments. Evaluation showed that the best full-light model (80/20 RGB-LWIR) and dim-light model (90/10 fusion) achieved 92.8% and 92.0% mean confidence; both significantly outperformed the YOLOv5 nano (YOLOv5n) and YOLOv11 nano (YOLOv11n) baselines. Under no-light conditions, the top 40/60 fusion reached 71.0%, exceeding baselines though not statistically significant. Adaptive RGB-LWIR fusion improved detection confidence and reliability across all illumination conditions, enhancing autonomous robotic vision performance.
NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance
Learning to navigate in dynamic and complex open-world environments is a critical yet challenging capability for autonomous robots. Existing approaches often rely on cascaded modular frameworks, which require extensive hyperparameter tuning or learning from limited real-world demonstration data. In this paper, we propose Navigation Diffusion Policy (NavDP), an end-to-end network trained solely in simulation that enables zero-shot sim-to-real transfer across diverse environments and robot embodiments. The core of NavDP is a unified transformer-based architecture that jointly learns trajectory generation and trajectory evaluation, both conditioned solely on local RGB-D observation. By learning to predict critic values for contrastive trajectory samples, our proposed approach effectively leverages supervision from privileged information available in simulation, thereby fostering accurate spatial understanding and enabling the distinction between safe and dangerous behaviors. To support this, we develop an efficient data generation pipeline in simulation and construct a large-scale dataset encompassing over one million meters of navigation experience across 3,000 scenes. Empirical experiments in both simulated and real-world environments demonstrate that NavDP significantly outperforms prior state-of-the-art methods. Furthermore, we identify key factors influencing the generalization performance of NavDP. The dataset and code are publicly available at https://wzcai99.github.io/navigation-diffusion-policy.github.io.
comment: Project Page: https://wzcai99.github.io/navigation-diffusion-policy.github.io/
State-Conditional Adversarial Learning: An Off-Policy Visual Domain Transfer Method for End-to-End Imitation Learning
We study visual domain transfer for end-to-end imitation learning in a realistic and challenging setting where target-domain data are strictly off-policy, expert-free, and scarce. We first provide a theoretical analysis showing that the target-domain imitation loss can be upper bounded by the source-domain loss plus a state-conditional latent KL divergence between source and target observation models. Guided by this result, we propose State- Conditional Adversarial Learning, an off-policy adversarial framework that aligns latent distributions conditioned on system state using a discriminator-based estimator of the conditional KL term. Experiments on visually diverse autonomous driving environments built on the BARC-CARLA simulator demonstrate that SCAL achieves robust transfer and strong sample efficiency.
RGMP: Recurrent Geometric-prior Multimodal Policy for Generalizable Humanoid Robot Manipulation
Humanoid robots exhibit significant potential in executing diverse human-level skills. However, current research predominantly relies on data-driven approaches that necessitate extensive training datasets to achieve robust multimodal decision-making capabilities and generalizable visuomotor control. These methods raise concerns due to the neglect of geometric reasoning in unseen scenarios and the inefficient modeling of robot-target relationships within the training data, resulting in significant waste of training resources. To address these limitations, we present the Recurrent Geometric-prior Multimodal Policy (RGMP), an end-to-end framework that unifies geometric-semantic skill reasoning with data-efficient visuomotor control. For perception capabilities, we propose the Geometric-prior Skill Selector, which infuses geometric inductive biases into a vision language model, producing adaptive skill sequences for unseen scenes with minimal spatial common sense tuning. To achieve data-efficient robotic motion synthesis, we introduce the Adaptive Recursive Gaussian Network, which parameterizes robot-object interactions as a compact hierarchy of Gaussian processes that recursively encode multi-scale spatial relationships, yielding dexterous, data-efficient motion synthesis even from sparse demonstrations. Evaluated on both our humanoid robot and desktop dual-arm robot, the RGMP framework achieves 87% task success in generalization tests and exhibits 5x greater data efficiency than the state-of-the-art model. This performance underscores its superior cross-domain generalization, enabled by geometric-semantic reasoning and recursive-Gaussion adaptation.
DAPPER: Discriminability-Aware Policy-to-Policy Preference-Based Reinforcement Learning for Query-Efficient Robot Skill Acquisition
Preference-based Reinforcement Learning (PbRL) enables policy learning through simple queries comparing trajectories from a single policy. While human responses to these queries make it possible to learn policies aligned with human preferences, PbRL suffers from low query efficiency, as policy bias limits trajectory diversity and reduces the number of discriminable queries available for learning preferences. This paper identifies preference discriminability, which quantifies how easily a human can judge which trajectory is closer to their ideal behavior, as a key metric for improving query efficiency. To address this, we move beyond comparisons within a single policy and instead generate queries by comparing trajectories from multiple policies, as training them from scratch promotes diversity without policy bias. We propose Discriminability-Aware Policy-to-Policy Preference-Based Efficient Reinforcement Learning (DAPPER), which integrates preference discriminability with trajectory diversification achieved by multiple policies. DAPPER trains new policies from scratch after each reward update and employs a discriminator that learns to estimate preference discriminability, enabling the prioritized sampling of more discriminable queries. During training, it jointly maximizes the preference reward and preference discriminability score, encouraging the discovery of highly rewarding and easily distinguishable policies. Experiments in simulated and real-world legged robot environments demonstrate that DAPPER outperforms previous methods in query efficiency, particularly under challenging preference discriminability conditions.
comment: Accepted for IEEE Robotics & Automation Magazine (RAM)
Imperative Learning: A Self-supervised Neuro-Symbolic Learning Framework for Robot Autonomy
Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, labeling data for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neuro-symbolic (NeSy) computational framework, imperative learning (IL), for robot autonomy, leveraging the generalization abilities of symbolic reasoning. The framework of IL consists of three primary components: a neural module, a reasoning engine, and a memory system. We formulate IL as a special bilevel optimization (BLO), which enables reciprocal learning over the three modules. This overcomes the label-intensive obstacles associated with data-driven approaches and takes advantage of symbolic reasoning concerning logical reasoning, physical principles, geometric analysis, etc. We discuss several optimization techniques for IL and verify their effectiveness in five distinct robot autonomy tasks including path planning, rule induction, optimal control, visual odometry, and multi-robot routing. Through various experiments, we show that IL can significantly enhance robot autonomy capabilities and we anticipate that it will catalyze further research across diverse domains.
TacMan-Turbo: Proactive Tactile Control for Robust and Efficient Articulated Object Manipulation
Adept manipulation of articulated objects is essential for robots to operate successfully in human environments. Such manipulation requires both effectiveness--reliable operation despite uncertain object structures--and efficiency--swift execution with minimal redundant steps and smooth actions. Existing approaches struggle to achieve both objectives simultaneously: methods relying on predefined kinematic models lack effectiveness when encountering structural variations, while tactile-informed approaches achieve robust manipulation without kinematic priors but compromise efficiency through reactive, step-by-step exploration-compensation cycles. This paper introduces TacMan-Turbo, a novel proactive tactile control framework for articulated object manipulation that mitigates this fundamental trade-off. Unlike previous approaches that treat tactile contact deviations merely as error signals requiring compensation, our method interprets these deviations as rich sources of local kinematic information. This new perspective enables our controller to predict optimal future interactions and make proactive adjustments, significantly enhancing manipulation efficiency. In comprehensive evaluations across 200 diverse simulated articulated objects and real-world experiments, our approach maintains a 100% success rate while significantly outperforming the previous tactile-informed method in time efficiency, action efficiency, and trajectory smoothness (all p-values < 0.0001). These results demonstrate that the long-standing trade-off between effectiveness and efficiency in articulated object manipulation can be successfully resolved without relying on prior kinematic knowledge.
Analyzing Key Objectives in Human-to-Robot Retargeting for Dexterous Manipulation
Kinematic retargeting from human hands to robot hands is essential for transferring dexterity from humans to robots in manipulation teleoperation and imitation learning. However, due to mechanical differences between human and robot hands, completely reproducing human motions on robot hands is impossible. Existing works on retargeting incorporate various optimization objectives, focusing on different aspects of hand configuration. However, the lack of experimental comparative studies leaves the significance and effectiveness of these objectives unclear. This work aims to analyze these retargeting objectives for dexterous manipulation through extensive real-world comparative experiments. Specifically, we propose a comprehensive retargeting objective formulation that integrates intuitively crucial factors appearing in recent approaches. The significance of each factor is evaluated through experimental ablation studies on the full objective in kinematic posture retargeting and real-world teleoperated manipulation tasks. Experimental results and conclusions provide valuable insights for designing more accurate and effective retargeting algorithms for real-world dexterous manipulation.
comment: v2: Extended the main text with additional analysis and implementation details
CoDrone: Autonomous Drone Navigation Assisted by Edge and Cloud Foundation Models
Autonomous navigation for Unmanned Aerial Vehicles faces key challenges from limited onboard computational resources, which restrict deployed deep neural networks to shallow architectures incapable of handling complex environments. Offloading tasks to remote edge servers introduces high latency, creating an inherent trade-off in system design. To address these limitations, we propose CoDrone - the first cloud-edge-end collaborative computing framework integrating foundation models into autonomous UAV cruising scenarios - effectively leveraging foundation models to enhance performance of resource-constrained unmanned aerial vehicle platforms. To reduce onboard computation and data transmission overhead, CoDrone employs grayscale imagery for the navigation model. When enhanced environmental perception is required, CoDrone leverages the edge-assisted foundation model Depth Anything V2 for depth estimation and introduces a novel one-dimensional occupancy grid-based navigation method - enabling fine-grained scene understanding while advancing efficiency and representational simplicity of autonomous navigation. A key component of CoDrone is a Deep Reinforcement Learning-based neural scheduler that seamlessly integrates depth estimation with autonomous navigation decisions, enabling real-time adaptation to dynamic environments. Furthermore, the framework introduces a UAV-specific vision language interaction module incorporating domain-tailored low-level flight primitives to enable effective interaction between the cloud foundation model and the UAV. The introduction of VLM enhances open-set reasoning capabilities in complex unseen scenarios. Experimental results show CoDrone outperforms baseline methods under varying flight speeds and network conditions, achieving a 40% increase in average flight distance and a 5% improvement in average Quality of Navigation.
comment: This paper is accepted by the IEEE Internet of Things Journal (IoT-J) for publication in the Special Issue on "Augmented Edge Sensing Intelligence for Low-Altitude IoT Systems"
Nonholonomic Robot Parking by Feedback -- Part I: Modular Strict CLF Designs
It has been known in the robotics literature since about 1995 that, in polar coordinates, the nonholonomic unicycle is asymptotically stabilizable by smooth feedback, even globally. We introduce a modular design framework that selects the forward velocity to decouple the radial coordinate, allowing the steering subsystem to be stabilized independently. Within this structure, we develop families of feedback laws using passivity, backstepping, and integrator forwarding. Each law is accompanied by a strict control Lyapunov function, including barrier variants that enforce angular constraints. These strict CLFs provide constructive class KL convergence estimates and enable eigenvalue assignment at the target equilibrium. The framework generalizes and extends prior modular and nonmodular approaches, while preparing the ground for inverse optimal and adaptive redesigns in the sequel paper.
comment: arXiv admin note: text overlap with arXiv:2509.25575
Multiagent Systems
A Plan Reuse Mechanism for LLM-Driven Agent
Integrating large language models (LLMs) into personal assistants, like Xiao Ai and Blue Heart V, effectively enhances their ability to interact with humans, solve complex tasks, and manage IoT devices. Such assistants are also termed LLM-driven agents. Upon receiving user requests, the LLM-driven agent generates plans using an LLM, executes these plans through various tools, and then returns the response to the user. During this process, the latency for generating a plan with an LLM can reach tens of seconds, significantly degrading user experience. Real-world dataset analysis shows that about 30% of the requests received by LLM-driven agents are identical or similar, which allows the reuse of previously generated plans to reduce latency. However, it is difficult to accurately define the similarity between the request texts received by the LLM-driven agent through directly evaluating the original request texts. Moreover, the diverse expressions of natural language and the unstructured format of plan texts make implementing plan reuse challenging. To address these issues, we present and implement a plan reuse mechanism for LLM-driven agents called AgentReuse. AgentReuse leverages the similarities and differences among requests' semantics and uses intent classification to evaluate the similarities between requests and enable the reuse of plans. Experimental results based on a real-world dataset demonstrate that AgentReuse achieves a 93% effective plan reuse rate, an F1 score of 0.9718, and an accuracy of 0.9459 in evaluating request similarities, reducing latency by 93.12% compared with baselines without using the reuse mechanism.
comment: This paper is an English version of A Plan Reuse Mechanism for LLM-Driven Agent published in 2024 in the Journal of Computer Research and Development
CoTDeceptor:Adversarial Code Obfuscation Against CoT-Enhanced LLM Code Agents
LLM-based code agents(e.g., ChatGPT Codex) are increasingly deployed as detector for code review and security auditing tasks. Although CoT-enhanced LLM vulnerability detectors are believed to provide improved robustness against obfuscated malicious code, we find that their reasoning chains and semantic abstraction processes exhibit exploitable systematic weaknesses.This allows attackers to covertly embed malicious logic, bypass code review, and propagate backdoored components throughout real-world software supply chains.To investigate this issue, we present CoTDeceptor, the first adversarial code obfuscation framework targeting CoT-enhanced LLM detectors. CoTDeceptor autonomously constructs evolving, hard-to-reverse multi-stage obfuscation strategy chains that effectively disrupt CoT-driven detection logic.We obtained malicious code provided by security enterprise, experimental results demonstrate that CoTDeceptor achieves stable and transferable evasion performance against state-of-the-art LLMs and vulnerability detection agents. CoTDeceptor bypasses 14 out of 15 vulnerability categories, compared to only 2 bypassed by prior methods. Our findings highlight potential risks in real-world software supply chains and underscore the need for more robust and interpretable LLM-powered security analysis systems.
FinAgent: An Agentic AI Framework Integrating Personal Finance and Nutrition Planning
The issue of limited household budgets and nutritional demands continues to be a challenge especially in the middle-income environment where food prices fluctuate. This paper introduces a price aware agentic AI system, which combines personal finance management with diet optimization. With household income and fixed expenditures, medical and well-being status, as well as real-time food costs, the system creates nutritionally sufficient meals plans at comparatively reasonable prices that automatically adjust to market changes. The framework is implemented in a modular multi-agent architecture, which has specific agents (budgeting, nutrition, price monitoring, and health personalization). These agents share the knowledge base and use the substitution graph to ensure that the nutritional quality is maintained at a minimum cost. Simulations with a representative Saudi household case study show a steady 12-18\% reduction in costs relative to a static weekly menu, nutrient adequacy of over 95\% and high performance with price changes of 20-30%. The findings indicate that the framework can locally combine affordability with nutritional adequacy and provide a viable avenue of capacity-building towards sustainable and fair diet planning in line with Sustainable Development Goals on Zero Hunger and Good Health.
comment: This paper was presented at the IEEE International Conference on Computing and Applications (ICCA 2025), Bahrain
A Blockchain-Monitored Agentic AI Architecture for Trusted Perception-Reasoning-Action Pipelines
The application of agentic AI systems in autonomous decision-making is growing in the areas of healthcare, smart cities, digital forensics, and supply chain management. Even though these systems are flexible and offer real-time reasoning, they also raise concerns of trust and oversight, and integrity of the information and activities upon which they are founded. The paper suggests a single architecture model comprising of LangChain-based multi-agent system with a permissioned blockchain to guarantee constant monitoring, policy enforcement, and immutable auditability of agentic action. The framework relates the perception conceptualization-action cycle to a blockchain layer of governance that verifies the inputs, evaluates recommended actions, and documents the outcomes of the execution. A Hyperledger Fabric-based system, action executors MCP-integrated, and LangChain agent are introduced and experiments of smart inventory management, traffic-signal control, and healthcare monitoring are done. The results suggest that blockchain-security verification is efficient in preventing unauthorized practices, offers traceability throughout the whole decision-making process, and maintains operational latency within reasonable ranges. The suggested framework provides a universal system of implementing high-impact agentic AI applications that are autonomous yet responsible.
comment: This paper was presented at the IEEE International Conference on Computing and Applications (ICCA 2025), Bahrain
DAO-Agent: Zero Knowledge-Verified Incentives for Decentralized Multi-Agent Coordination
Autonomous Large Language Model (LLM)-based multi-agent systems have emerged as a promising paradigm for facilitating cross-application and cross-organization collaborations. These autonomous agents often operate in trustless environments, where centralized coordination faces significant challenges, such as the inability to ensure transparent contribution measurement and equitable incentive distribution. While blockchain is frequently proposed as a decentralized coordination platform, it inherently introduces high on-chain computation costs and risks exposing sensitive execution information of the agents. Consequently, the core challenge lies in enabling auditable task execution and fair incentive distribution for autonomous LLM agents in trustless environments, while simultaneously preserving their strategic privacy and minimizing on-chain costs. To address this challenge, we propose DAO-Agent, a novel framework that integrates three key technical innovations: (1) an on-chain decentralized autonomous organization (DAO) governance mechanism for transparent coordination and immutable logging; (2) a ZKP mechanism approach that enables Shapley-based contribution measurement off-chain, and (3) a hybrid on-chain/off-chain architecture that verifies ZKP-validated contribution measurements on-chain with minimal computational overhead. We implement DAO-Agent and conduct end-to-end experiments using a crypto trading task as a case study. Experimental results demonstrate that DAO-Agent achieves up to 99.9% reduction in verification gas costs compared to naive on-chain alternatives, with constant-time verification complexity that remains stable as coalition size increases, thereby establishing a scalable foundation for agent coordination in decentralized environments.
comment: 10 pages, 1 figure
Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning
Spatial reasoning in 3D scenes requires precise geometric calculations that challenge vision-language models. Visual programming addresses this by decomposing problems into steps calling specialized tools, yet existing methods rely on either fixed toolsets or speculative tool induction before solving problems, resulting in suboptimal programs and poor utilization of induced tools. We present Transductive Visual Programming (TVP), a novel framework that builds new tools from its own experience rather than speculation. TVP first solves problems using basic tools while accumulating experiential solutions into an Example Library, then abstracts recurring patterns from these programs into reusable higher-level tools for an evolving Tool Library. This allows TVP to tackle new problems with increasingly powerful tools learned from experience. On Omni3D-Bench, TVP achieves state-of-the-art performance, outperforming GPT-4o by 22% and the previous best visual programming system by 11%. Our transductively learned tools are used 5x more frequently as core program dependency than inductively created ones, demonstrating more effective tool discovery and reuse. The evolved tools also show strong generalization to unseen spatial tasks, achieving superior performance on benchmarks from SpatialScore-Hard collection without any testset-specific modification. Our work establishes experience-driven transductive tool creation as a powerful paradigm for building self-evolving visual programming agents that effectively tackle challenging spatial reasoning tasks. We release our code at https://transductive-visualprogram.github.io/.
comment: Project Website: https://transductive-visualprogram.github.io/
Developing a Fundamental Diagram for Urban Air Mobility Based on Physical Experiments
Urban Air Mobility (UAM) is an emerging application of unmanned aerial vehicles (UAVs) that promises to reduce travel time and alleviate congestion in urban transportation systems. As drone density increases, UAM operations are expected to experience congestion similar to that in ground traffic. However, the fundamental characteristics of UAM traffic flow, particularly under real-world operating conditions, remain poorly understood. This study proposes a general framework for constructing the fundamental diagram (FD) of UAM traffic by integrating theoretical analysis with physical experiments. To the best of our knowledge, this is the first study to derive a UAM FD using real-world physical test data. On the theoretical side, we design two drone control laws for collision avoidance and develop simulation-based traffic generation methods to produce diverse UAM traffic scenarios. Based on Edie's definition, traffic flow theory is then applied to construct the FD and characterize the macroscopic properties of UAM traffic. To account for real-world disturbances and modeling uncertainties, we further conduct physical experiments on a reduced-scale testbed using Bitcraze Crazyflie drones. Both simulation and physical test trajectory data are collected and organized into the UAMTra2Flow dataset, which is analyzed using the proposed framework. Preliminary results indicate that classical FD structures for ground transportation are also applicable to UAM systems. Notably, FD curves obtained from physical experiments exhibit deviations from simulation-based results, highlighting the importance of experimental validation. Finally, results from the reduced-scale testbed are scaled to realistic operating conditions to provide practical insights for future UAM traffic systems. The dataset and code for this paper are publicly available at https://github.com/CATS-Lab/UAM-FD.
AgentTutor: Empowering Personalized Learning with Multi-Turn Interactive Teaching in Intelligent Education Systems AAAI2026
The rapid advancement of large-scale language models (LLMs) has shown their potential to transform intelligent education systems (IESs) through automated teaching and learning support applications. However, current IESs often rely on single-turn static question-answering, which fails to assess learners' cognitive levels, cannot adjust teaching strategies based on real-time feedback, and is limited to providing simple one-off responses. To address these issues, we introduce AgentTutor, a multi-turn interactive intelligent education system to empower personalized learning. It features an LLM-powered generative multi-agent system and a learner-specific personalized learning profile environment that dynamically optimizes and delivers teaching strategies based on learners' learning status, personalized goals, learning preferences, and multimodal study materials. It includes five key modules: curriculum decomposition, learner assessment, dynamic strategy, teaching reflection, and knowledge & experience memory. We conducted extensive experiments on multiple benchmark datasets, AgentTutor significantly enhances learners' performance while demonstrating strong effectiveness in multi-turn interactions and competitiveness in teaching quality among other baselines.
comment: AAAI2026 Workshop AI4EDU
Electromagnetic Formation Flying Using Alternating Magnetic Field Forces and Control Barrier Functions for State and Input Constraints
This article presents a feedback control algorithm for electromagnetic formation flying with constraints on the satellites' states and control inputs. The algorithm combines several key techniques. First, we use alternating magnetic field forces to decouple the electromagnetic forces between each pair of satellites in the formation. Each satellite's electromagnetic actuation system is driven by a sum of amplitude-modulated sinusoids, where amplitudes are controlled in order to prescribe the time-averaged force between each pair of satellites. Next, the desired time-averaged force is computed from a optimal control that satisfies state constraints (i.e., no collisions and an upper limit on intersatellite speeds) and input constraints (i.e., not exceeding satellite's apparent power capability). The optimal time-averaged force is computed using a single relaxed control barrier function that is obtained by composing multiple control barrier functions that are designed to enforce each state and input constraint. Finally, we demonstrate the satellite formation control method in numerical simulations.
comment: Preprint submitted to IEEE Transactions on Aerospace and Electronic Systems (TAES). arXiv admin note: substantial text overlap with arXiv:2411.16908
Computational Foundations for Strategic Coopetition: Formalizing Interdependence and Complementarity
Coopetition refers to simultaneous cooperation and competition among actors wherein actors 'cooperate to grow the pie and compete to split it up.' Modern socio-technical systems are characterized by strategic coopetition wherein actors concomitantly cooperate to create value and compete to capture it. While conceptual modeling languages such as i* provide rich qualitative representations of strategic dependencies, they lack mechanisms for quantitative analysis of dynamic trade-offs. Conversely, classical game theory offers mathematical rigor but strips away contextual richness. This report bridges this gap by developing computational foundations that formalize two critical dimensions of coopetition: interdependence and complementarity. We ground interdependence in i* structural dependency analysis, translating depender-dependee-dependum relationships into quantitative interdependence coefficients via a structured translation framework. We formalize complementarity following Brandenburger and Nalebuff's Added Value concept, modeling synergistic value creation with validated parameterization. We integrate structural dependencies with bargaining power in value appropriation and introduce a game-theoretic formulation where Nash Equilibrium incorporates structural interdependence. Validation combines over 22,000 experimental trials across power and logarithmic specifications with the Samsung-Sony S-LCD joint venture (2004-2011). Under strict historical alignment scoring, logarithmic specifications achieve 58/60 compared to power functions (46/60), producing realistic 41% cooperation increases aligning with documented S-LCD patterns while power functions produce 166% increases exceeding realistic bounds. Statistical significance confirmed at p < 0.001, Cohen's d > 9.
comment: 39 pages, 9 figures, This technical report serves as the foundational reference for a coordinated research program examining strategic coopetition in requirements engineering and multi-agent systems, with companion work addressing trust dynamics, team production, and reciprocity mechanisms
Consistent Opponent Modeling in Imperfect-Information Games
The goal of agents in multi-agent environments is to maximize total reward against the opposing agents that are encountered. Following a game-theoretic solution concept, such as Nash equilibrium, may obtain a strong performance in some settings; however, such approaches fail to capitalize on historical and observed data from repeated interactions against our opponents. Opponent modeling algorithms integrate machine learning techniques to exploit suboptimal opponents utilizing available data; however, the effectiveness of such approaches in imperfect-information games to date is quite limited. We show that existing opponent modeling approaches fail to satisfy a simple desirable property even against static opponents drawn from a known prior distribution; namely, they do not guarantee that the model approaches the opponent's true strategy even in the limit as the number of game iterations approaches infinity. We develop a new algorithm that is able to achieve this property and runs efficiently by solving a convex minimization problem based on the sequence-form game representation using projected gradient descent. The algorithm is guaranteed to efficiently converge to the opponent's true strategy under standard Bayesian identifiability and visitation assumptions, given observations from gameplay and possibly additional historical data if it is available.
Systems and Control (EESS)
A Lyapunov-Based Small-Gain Theorem for Fixed-Time ISS: Theory, Optimization, and Games
We develop a Lyapunov-based small-gain theorem for establishing fixed-time input-to-state stability (FxT-ISS) guarantees in interconnected nonlinear dynamical systems. The proposed framework considers interconnections in which each subsystem admits a FxT-ISS Lyapunov function, providing robustness with respect to external inputs. We show that, under an appropriate nonlinear small-gain condition, the overall interconnected system inherits the FxT-ISS property. In this sense, the proposed result complements existing Lyapunov-based smallgain theorems for asymptotic and finite-time stability, and enables a systematic analysis of interconnection structures exhibiting fixed-time stability. To illustrate the applicability of the theory, we study feedback-based optimization problems with time-varying cost functions, and Nash-equilibrium seeking for noncooperative games with nonlinear dynamical plants in the loop. For both problems, we present a class of non-smooth gradient or pseudogradient-based controllers that achieve fixed-time convergence without requiring time-scale separation and using real-time feedback. Numerical examples are provided to validate the theoretical findings.
Enhancing Grid Resilience for Giga-Watt Scale Data Centers Using High Voltage Circuit Breaker Operated Braking Resistors
As hyperscale and co-located data centers scale, the electric grid sees an increase in large, voltage-sensitive IT loads with these data center plant size ranging between 500 MW to 2 GW. A sudden loss of these loads as they switch to onsite UPS during grid voltage excursion events causes a grid frequency rise from generation and load imbalance, and a voltage rise because less power is flowing through the network. This paper proposes and theoretically demonstrates the use of high voltage circuit breaker operated braking resistors at data center transmission substations as an effective strategy in enhancing grid resilience under such large load loss scenarios. We developed a test bed to illustrate the dynamic behavior of the system with resistive braking on a gigawatt scale data center load cluster connected to a 345 kV network. The braking resistor(s), which in the case of inverter rich system comes in a multi-stage configuration, are connected or disconnected via high-speed circuit breaker(s). Results show that insertion for 0.25 to 0.85 seconds sufficiently reduce rate of change of frequency and provides time for primary governor response and capacitor switching to restore steady state. Sensitivity across different synchronous machines and inverter-based resource mix are tested and confirms robustness. We conclude circuit breaker controlled resistive braking is a practical means to enhance Bulk Electric System (BES) resilience for gigawatt scale data centers. The approach integrates with protection, needs no generator changes, and can be scaled with cluster size or growth of the data center facility load.
comment: Provincially accepted for publication in 2025 IEEE International Conference on Energy Technologies for Future Grids (ETFG) conference proceedings
$\mathcal{K}$-Lorentzian Polynomials, Semipositive Cones, and Cone-Stable EVI Systems
Lorentzian and completely log-concave polynomials have recently emerged as a unifying framework for negative dependence, log-concavity, and convexity in combinatorics and probability. We extend this theory to variational analysis and cone-constrained dynamics by studying $K$-Lorentzian and $K$-completely log-concave polynomials over a proper convex cone $K\subset\mathbb{R}^n$. For a $K$-Lorentzian form $f$ and $v\in\operatorname{int}K$, we define an open cone $K^\circ(f,v)$ and a closed cone $K(f,v)$ via directional derivatives along $v$, recovering the usual hyperbolicity cone when $f$ is hyperbolic. We prove that $K^\circ(f,v)$ is a proper cone and equals $\operatorname{int}K(f,v)$. If $f$ is $K(f,v)$-Lorentzian, then $K(f,v)$ is convex and maximal among convex cones on which $f$ is Lorentzian. Using the Rayleigh matrix $M_f(x)=\nabla f(x)\nabla f(x)^T - f(x)\nabla^2 f(x)$, we obtain cone-restricted Rayleigh inequalities and show that two-direction Rayleigh inequalities on $K$ are equivalent to an acuteness condition for the bilinear form $v^T M_f(x) w$. This yields a cone-restricted negative-dependence interpretation linking the curvature of $\log f$ to covariance properties of associated Gibbs measures. For determinantal generating polynomials, we identify the intersection of the hyperbolicity cone with the nonnegative orthant as the classical semipositive cone, and we extend this construction to general proper cones via $K$-semipositive cones. Finally, for linear evolution variational inequality (LEVI) systems, we show that if $q(x)=x^T A x$ is (strictly) $K$-Lorentzian, then $A$ is (strictly) $K$-copositive and yields Lyapunov (semi-)stability on $K$, giving new Lyapunov criteria for cone-constrained dynamics.
comment: 23 pages, 5 figures
ARX-Implementation of encrypted nonlinear dynamic controllers using observer form
While computation-enabled cryptosystems applied to control systems have improved security and privacy, a major issue is that the number of recursive operations on encrypted data is limited to a finite number of times in most cases, especially where fast computation is required. To allow for nonlinear dynamic control under this constraint, a method for representing a state-space system model as an auto-regressive model with exogenous inputs (ARX model) is proposed. With the input as well as the output of the plant encrypted and transmitted to the controller, the reformulated ARX form can compute each output using only a finite number of operations, from its several previous inputs and outputs. Existence of a stable observer for the controller is a key condition for the proposed representation. The representation replaces the controller with an observer form and applies a method similar to finite-impulse-response approximation. It is verified that the approximation error and its effect can be made arbitrarily small by an appropriate choice of a parameter, under stability of the observer and the closed-loop system. Simulation results demonstrate the effectiveness of the proposed method.
comment: 5 pages, 2 figures
Relative Localization System Design for SnailBot: A Modular Self-reconfigurable Robot
This paper presents the design and implementation of a relative localization system for SnailBot, a modular self reconfigurable robot. The system integrates ArUco marker recognition, optical flow analysis, and IMU data processing into a unified fusion framework, enabling robust and accurate relative positioning for collaborative robotic tasks. Experimental validation demonstrates the effectiveness of the system in realtime operation, with a rule based fusion strategy ensuring reliability across dynamic scenarios. The results highlight the potential for scalable deployment in modular robotic systems.
comment: 7 pages, 7 figures, 4 algorithms
Wireless Center of Pressure Feedback System for Humanoid Robot Balance Control using ESP32-C3
Maintaining stability during the single-support phase is a fundamental challenge in humanoid robotics, particularly in dance robots that require complex maneuvers and high mechanical freedom. Traditional tethered sensor configurations often restrict joint movement and introduce mechanical noises. This study proposes a wireless embedded balance system designed to maintain stability on uneven surfaces. The system utilizes a custom-designed foot unit integrated with four load cells and an ESP32-C3 microcontroller to estimate the Center of Pressure (CoP) in real time. The CoP data were transmitted wirelessly to the main controller to minimize the wiring complexity of the 29-DoF VI-ROSE humanoid robot. A PID control strategy is implemented to adjust the torso, hip, and ankle roll joints based on CoP feedback. Experimental characterization demonstrated high sensor precision with an average measurement error of 14.8 g. Furthermore, the proposed control system achieved a 100% success rate in maintaining balance during single-leg lifting tasks at a 3-degree inclination with optimized PID parameters (Kp=0.10, Kd=0.005). These results validate the efficacy of wireless CoP feedback in enhancing the postural stability of humanoid robots, without compromising their mechanical flexibility.
A Multimodal Human-Centered Framework for Assessing Pedestrian Well-Being in the Wild
Pedestrian well-being is a critical yet rarely measured component of sustainable urban mobility and livable city design. Existing approaches to evaluating pedestrian environments often rely on static, infrastructure-based indices or retrospective surveys, which overlook the dynamic, subjective, and psychophysiological dimensions of everyday walking experience. This paper introduces a multimodal, human-centered framework for assessing pedestrian well-being in the wild by integrating three complementary data streams: continuous physiological sensing, geospatial tracking, and momentary self-reports collected using the Experience Sampling Method. The framework conceptualizes pedestrian experience as a triangulation enabling a holistic understanding of how urban environments influence well-being. The utility of our framework is then demonstrated through a naturalistic case study conducted in the Greater Philadelphia region, in which participants wore research-grade wearable sensors and carried GPS-enabled smartphones during their regular daily activities. Physiological indicators of autonomic nervous system activity, including heart rate variability and electrodermal activity, were synchronized with spatial trajectories and in situ self-reports of stress, affect, and perceived infrastructure conditions. Results illustrate substantial inter- and intra-individual variability in both subjective experience and physiological response, as well as context-dependent patterns associated with traffic exposure, pedestrian infrastructure quality, and environmental enclosure. The findings also suggest that commonly used walkability indices may not fully capture experiential dimensions of pedestrian well-being. By enabling real-world, multimodal measurement of pedestrian experience, the proposed framework offers a scalable and transferable approach for advancing human-centered urban analytics.
Safe Navigation with Zonotopic Tubes: An Elastic Tube-based MPC Framework
This paper presents an elastic tube-based model predictive control (MPC) framework for unknown discrete-time linear systems subject to disturbances. Unlike most existing elastic tube-based MPC methods, we do not assume perfect knowledge of the system model or disturbance realizations bounds. Instead, a conservative zonotopic disturbance set is initialized and iteratively refined using data and prior knowledge: data are used to identify matrix zonotope model sets for the system dynamics, while prior physical knowledge is employed to discard models and disturbances inconsistent with known constraints. This process yields constrained matrix zonotopes representing disturbance realizations and dynamics that enable a principled fusion of offline information with limited online data, improving MPC feasibility and performance. The proposed design leverages closed-loop system characterization to learn and refine control gains that maintain a small tube size. By separating open-loop model mismatch from closed-loop effects in the error dynamics, the method avoids dependence on the size of the state and input operating regions, thereby reducing conservatism. An adaptive co-design of the tube and ancillary feedback ensures $λ$-contractive zonotopic tubes, guaranteeing robust positive invariance, improved feasibility margins, and enhanced disturbance tolerance. We establish recursive feasibility conditions and introduce a polyhedral Lyapunov candidate for the error tube, proving exponential stability of the closed-loop error dynamics under the adaptive tube-gain updates. Simulations demonstrate improved robustness, enlarged feasibility regions, and safe closed-loop performance using only a small amount of online data.
Flocking phase transition and threat responses in bio-inspired autonomous drone swarms
Collective motion inspired by animal groups offers powerful design principles for autonomous aerial swarms. We present a bio-inspired 3D flocking algorithm in which each drone interacts only with a minimal set of influential neighbors, relying solely on local alignment and attraction cues. By systematically tuning these two interaction gains, we map a phase diagram revealing sharp transitions between swarming and schooling, as well as a critical region where susceptibility, polarization fluctuations, and reorganization capacity peak. Outdoor experiments with a swarm of ten drones, combined with simulations using a calibrated flight-dynamics model, show that operating near this transition enhances responsiveness to external disturbances. When confronted with an intruder, the swarm performs rapid collective turns, transient expansions, and reliably recovers high alignment within seconds. These results demonstrate that minimal local-interaction rules are sufficient to generate multiple collective phases and that simple gain modulation offers an efficient mechanism to adjust stability, flexibility, and resilience in drone swarms.
Dyna-Style Reinforcement Learning Modeling and Control of Non-linear Dynamics
Controlling systems with complex, nonlinear dynamics poses a significant challenge, particularly in achieving efficient and robust control. In this paper, we propose a Dyna-Style Reinforcement Learning control framework that integrates Sparse Identification of Nonlinear Dynamics (SINDy) with Twin Delayed Deep Deterministic Policy Gradient (TD3) reinforcement learning. SINDy is used to identify a data-driven model of the system, capturing its key dynamics without requiring an explicit physical model. This identified model is used to generate synthetic rollouts that are periodically injected into the reinforcement learning replay buffer during training on the real environment, enabling efficient policy learning with limited data available. By leveraging this hybrid approach, we mitigate the sample inefficiency of traditional model-free reinforcement learning methods while ensuring accurate control of nonlinear systems. To demonstrate the effectiveness of this framework, we apply it to a bi-rotor system as a case study, evaluating its performance in stabilization and trajectory tracking. The results show that our SINDy-TD3 approach achieves superior accuracy and robustness compared to direct reinforcement learning techniques, highlighting the potential of combining data-driven modeling with reinforcement learning for complex dynamical systems.
LSTM-Based Modeling and Reinforcement Learning Control of a Magnetically Actuated Catheter
Autonomous magnetic catheter systems are emerging as a promising approach for the future of minimally invasive interventions. This study presents a novel approach that begins by modeling the nonlinear and hysteretic dynamics of a magnetically actuated catheter system, consists of a magnetic catheter manipulated by servo-controlled magnetic fields generated by two external permanent magnets, and its complex behavior is captured using a Long Short-Term Memory (LSTM) neural network. This model validated against experimental setup's data with a root mean square error (RMSE) of 0.42 mm and 99.8% coverage within 3 mm, establishing it as a reliable surrogate model. This LSTM enables the training of Reinforcement Learning (RL) agents for controlling the system and avoiding damage to the real setup, with the potential for subsequent fine-tuning on the physical system. We implemented Deep Q-Network (DQN) and actor-critic RL controllers, comparing these two agents first for regulation and subsequently for path following along linear and half-sinusoidal paths for the catheter tip. The actor-critic outperforms DQN, offering greater accuracy and faster performance with less error, along with smoother trajectories at a 10 Hz sampling rate, in both regulation and path following compared to the DQN controller. This performance, due to the continuous action space, suits dynamic navigation tasks like navigating curved vascular structures for practical applications.
comment: Presented at the 13th RSI International Conference on Robotics and Mechatronics (ICRoM 2025), Dec. 16-18, 2025, Tehran, Iran
Energy-Gain Control of Time-Varying Systems: Receding Horizon Approximation
Standard formulations of prescribed worstcase disturbance energy-gain control policies for linear time-varying systems depend on all forward model data. In a discrete-time setting, this dependence arises through a backward Riccati recursion. The aim herein is to consider the infinite-horizon $\ell_2$ gain performance of state feedback policies with only finite receding-horizon preview of the model parameters. The proposed synthesis of controllers subject to such a constraint leverages the strict contraction of lifted Riccati operators under uniform controllability and observability. The main approximation result establishes a sufficient number of preview steps for the performance loss to remain below any set tolerance, relative to the baseline gain bound of the associated infinite-preview controller. Aspects of the main result are explored in the context of a numerical example.
comment: Accepted to appear in IEEE TAC
Partitioned robustness analysis of networks with uncertain links
An input-output model for networks with link uncertainty is developed. The main result presents a set of integral quadratic constraints (IQCs) that collectively imply robust stability of the uncertain network dynamics. The model dependency of each IQC is localized according to an edge-based partition of the network graph. The class of admissible network partitions affords scope for trading-off scalability against conservativeness. This is illustrated by numerical example.
comment: Submitted
Universal Transient Stability Analysis: A Large Language Model-Enabled Dynamics Prediction Framework
Existing dynamics prediction frameworks for transient stability analysis (TSA) fail to achieve multi-scenario "universality"--the inherent ability of a single, pre-trained architecture to generalize across diverse operating conditions, unseen faults, and heterogeneous systems. To address this, this paper proposes TSA-LLM, a large language model (LLM)-based universal framework that models multi-variate transient dynamics prediction as a univariate generative task with three key innovations: First, a novel data processing pipeline featuring channel independence decomposition to resolve dimensional heterogeneity, sample-wise normalization to eliminate separate stable or unstable pipelines, and temporal patching for efficient long-sequence modeling; Second, a parameter-efficient freeze-and-finetune strategy that augments the LLM's architecture with dedicated input embedding and output projection layers while freezing core transformer blocks to preserve generic feature extraction capabilities; Third, a two-stage fine-tuning scheme that combines teacher forcing, which feeds the model ground-truth data during initial training, with scheduled sampling, which gradually shifts to leveraging model-generated predictions, to mitigate cumulative errors in long-horizon iterative prediction. Comprehensive testing demonstrates the framework's universality, as TSA-LLM trained solely on the New England 39-bus system achieves zero-shot generalization to mixed stability conditions and unseen faults, and matches expert performance on the larger Iceland 189-bus system with only 5% fine-tuning data. This multi-scenario versatility validates a universal framework that eliminates scenario-specific retraining and achieves scalability via large-scale parameters and cross-scenario training data.
Decentralized water-level balancing for irrigation channels in storage critical operations
A feedback control system is proposed for balancing the deviations of water levels from set-points along open channels subject to uncertain supply-demand mismatch that exceeds individual pool capacity. Decentralized controllers adjust the gate flows between pools to regulate potentially weighted differences between neighbouring water-level errors to zero in steady state. A sequential SISO loop-shaping procedure is developed for the design of each local flow controller based on distributed parameter transfer function models of the channel dynamics. Recursive feasibility of the procedure for relevant performance specifications, and stability of the resulting MIMO closed-loop, are verified by supporting analysis. Both numerical simulations and field trial results are presented.
comment: Accepted to appear in IEEE Transactions on Control Systems Technology
Accelerating Underground Pumped Hydro Energy Storage Scheduling with Decision-Focused Learning
Underground pumped hydro energy storage (UPHES) systems play a critical role in grid-scale energy storage for renewable integration, yet optimal day-ahead scheduling remains computationally prohibitive due to nonlinear turbine performance characteristics and discrete operational modes. This paper presents a decision-focused learning (DFL) framework that addresses the computational-accuracy trade-off in UPHES day-ahead scheduling. The proposed methodology employs neural networks to predict penalty weights that guide recursive linearization, transforming the intractable MINLP into a sequence of convex quadratic programs trained end-to-end via differentiable optimization layers. Case studies across 19 representative Belgian electricity market scenarios demonstrate that the DFL framework effectively navigates the trade-off between solution quality and computation time. As a refinement tool, the framework improves profit by 1.1% over piecewise MIQP baselines. Alternatively, as a real-time scheduler initialized with linear approximations, it achieves a 300-fold speedup (3.87s vs 1205.79s) while maintaining profitability within 3.6% of the piecewise MIQP benchmark. Thus, the presented DFL framework enables flexible prioritization between profit maximization and real-time responsiveness.
Systemization of Knowledge: Resilience and Fault Tolerance in Cyber-Physical Systems
Cyber-Physical Systems (CPS) now support critical infrastructure spanning transportation, energy, manufacturing, medical devices, and autonomous robotics. Their defining characteristic is the tight coupling between digital computation and continuous physical dynamics which enables sophisticated autonomy but also creates highly non-linear failure modes. Small disturbances at sensors, firmware, networks, or physical interfaces can propagate through estimation and control pipelines, producing cascading instabilities that defy traditional single-layer reasoning. This Systematization of Knowledge (SoK) unifies nearly two decades of CPS resilience research into a structured Origin-Layer-Effect (OLE) taxonomy. This taxonomy provides a cross-layer lens for understanding how faults arise, how they propagate, and why unrelated CPS failures often share deep structural similarities. By mapping representative systems including RockDrone, MAYDAY, M2MON, HACMS, Byzantine fault-tolerant control, and learning-based recovery mechanisms onto the taxonomy, we reveal patterns of coverage, persistent blind spots, and recurring pathways of fault amplification. Our analysis identifies four structural gaps that span multiple CPS domains: (1) physical-model manipulation, (2) ML-enabled control without stability guarantees, (3) semantic inconsistencies between formal models and firmware, and (4) inadequate forensic visibility across cyber and physical layers. These insights motivate new directions for resilient CPS design, integrating robust control, runtime monitoring, formal assurance, and system-level visibility.
comment: Systemization of knowledge paper. Approximately 13 pages, 3 figures, 3 tables
Early warning signals for loss of control
Maintaining stability in feedback systems, from aircraft and autonomous robots to biological and physiological systems, relies on monitoring their behavior and continuously adjusting their inputs. Incremental damage can make such control fragile. This tends to go unnoticed until a small perturbation induces instability (i.e. loss of control). Traditional methods in the field of engineering rely on accurate system models to compute a safe set of operating instructions, which become invalid when the, possibly damaged, system diverges from its model. Here we demonstrate that the approach of such a feedback system towards instability can nonetheless be monitored through dynamical indicators of resilience. This holistic system safety monitor does not rely on a system model and is based on the generic phenomenon of critical slowing down, shown to occur in the climate, biology and other complex nonlinear systems approaching criticality. Our findings for engineered devices opens up a wide range of applications involving real-time early warning systems as well as an empirical guidance of resilient system design exploration, or "tinkering". While we demonstrate the validity using drones, the generic nature of the underlying principles suggest that these indicators could apply across a wider class of controlled systems including reactors, aircraft, and self-driving cars.
Robustness Certificates for Neural Networks against Adversarial Attacks
The increasing use of machine learning in safety-critical domains amplifies the risk of adversarial threats, especially data poisoning attacks that corrupt training data to degrade performance or induce unsafe behavior. Most existing defenses lack formal guarantees or rely on restrictive assumptions about the model class, attack type, extent of poisoning, or point-wise certification, limiting their practical reliability. This paper introduces a principled formal robustness certification framework that models gradient-based training as a discrete-time dynamical system (dt-DS) and formulates poisoning robustness as a formal safety verification problem. By adapting the concept of barrier certificates (BCs) from control theory, we introduce sufficient conditions to certify a robust radius ensuring that the terminal model remains safe under worst-case ${\ell}_p$-norm based poisoning. To make this practical, we parameterize BCs as neural networks trained on finite sets of poisoned trajectories. We further derive probably approximately correct (PAC) bounds by solving a scenario convex program (SCP), which yields a confidence lower bound on the certified robustness radius generalizing beyond the training set. Importantly, our framework also extends to certification against test-time attacks, making it the first unified framework to provide formal guarantees in both training and test-time attack settings. Experiments on MNIST, SVHN, and CIFAR-10 show that our approach certifies non-trivial perturbation budgets while being model-agnostic and requiring no prior knowledge of the attack or contamination level.
Lyapunov-Based Kolmogorov-Arnold Network (KAN) Adaptive Control
Recent advancements in Lyapunov-based Deep Neural Networks (Lb-DNNs) have demonstrated improved performance over shallow NNs and traditional adaptive control for nonlinear systems with uncertain dynamics. Existing Lb-DNNs rely on multi-layer perceptrons (MLPs), which lack interpretable insights. As a first step towards embedding interpretable insights in the control architecture, this paper develops the first Lyapunov-based Kolmogorov-Arnold Networks (Lb-KAN) adaptive control method for uncertain nonlinear systems. Unlike MLPs with deep-layer matrix multiplications, KANs provide interpretable insights by direct functional decomposition. In this framework, KANs are employed to approximate uncertain dynamics and embedded into the control law, enabling visualizable functional decomposition. The analytical update laws are constructed from a Lyapunov-based analysis for real-time learning without prior data in a KAN architecture. The analysis uses the distinct KAN approximation theorem to formally bound the reconstruction error and its effect on the performance. The update law is derived by incorporating the KAN's learnable parameters into a Jacobian matrix, enabling stable, analytical, real-time adaptation and ensuring asymptotic convergence of tracking errors. Moreover, the Lb-KAN provides a foundation for interpretability characteristics by achieving visualizable functional decomposition. Simulation results demonstrate that the Lb-KAN controller reduces the function approximation error by 20.2% and 18.0% over the baseline Lb-LSTM and Lb-DNN methods, respectively.
A Survey of Freshness-Aware Wireless Networking with Reinforcement Learning
The age of information (AoI) has become a central measure of data freshness in modern wireless systems, yet existing surveys either focus on classical AoI formulations or provide broad discussions of reinforcement learning (RL) in wireless networks without addressing freshness as a unified learning problem. Motivated by this gap, this survey examines RL specifically through the lens of AoI and generalized freshness optimization. We organize AoI and its variants into native, function-based, and application-oriented families, providing a clearer view of how freshness should be modeled in B5G and 6G systems. Building on this foundation, we introduce a policy-centric taxonomy that reflects the decisions most relevant to freshness, consisting of update-control RL, medium-access RL, risk-sensitive RL, and multi-agent RL. This structure provides a coherent framework for understanding how learning can support sampling, scheduling, trajectory planning, medium access, and distributed coordination. We further synthesize recent progress in RL-driven freshness control and highlight open challenges related to delayed decision processes, stochastic variability, and cross-layer design. The goal is to establish a unified foundation for learning-based freshness optimization in next-generation wireless networks.
Safe Path Planning and Observation Quality Enhancement Strategy for Unmanned Aerial Vehicles in Water Quality Monitoring Tasks
Unmanned Aerial Vehicle (UAV) spectral remote sensing technology is widely used in water quality monitoring. However, in dynamic environments, varying illumination conditions, such as shadows and specular reflection (sun glint), can cause severe spectral distortion, thereby reducing data availability. To maximize the acquisition of high-quality data while ensuring flight safety, this paper proposes an active path planning method for dynamic light and shadow disturbance avoidance. First, a dynamic prediction model is constructed to transform the time-varying light and shadow disturbance areas into three-dimensional virtual obstacles. Second, an improved Interfered Fluid Dynamical System (IFDS) algorithm is introduced, which generates a smooth initial obstacle avoidance path by building a repulsive force field. Subsequently, a Model Predictive Control (MPC) framework is employed for rolling-horizon path optimization to handle flight dynamics constraints and achieve real-time trajectory tracking. Furthermore, a Dynamic Flight Altitude Adjustment (DFAA) mechanism is designed to actively reduce the flight altitude when the observable area is narrow, thereby enhancing spatial resolution. Simulation results show that, compared with traditional PID and single obstacle avoidance algorithms, the proposed method achieves an obstacle avoidance success rate of 98% in densely disturbed scenarios, significantly improves path smoothness, and increases the volume of effective observation data by approximately 27%. This research provides an effective engineering solution for precise UAV water quality monitoring in complex illumination environments.
Electromagnetic Formation Flying Using Alternating Magnetic Field Forces and Control Barrier Functions for State and Input Constraints
This article presents a feedback control algorithm for electromagnetic formation flying with constraints on the satellites' states and control inputs. The algorithm combines several key techniques. First, we use alternating magnetic field forces to decouple the electromagnetic forces between each pair of satellites in the formation. Each satellite's electromagnetic actuation system is driven by a sum of amplitude-modulated sinusoids, where amplitudes are controlled in order to prescribe the time-averaged force between each pair of satellites. Next, the desired time-averaged force is computed from a optimal control that satisfies state constraints (i.e., no collisions and an upper limit on intersatellite speeds) and input constraints (i.e., not exceeding satellite's apparent power capability). The optimal time-averaged force is computed using a single relaxed control barrier function that is obtained by composing multiple control barrier functions that are designed to enforce each state and input constraint. Finally, we demonstrate the satellite formation control method in numerical simulations.
comment: Preprint submitted to IEEE Transactions on Aerospace and Electronic Systems (TAES). arXiv admin note: substantial text overlap with arXiv:2411.16908
Improving Action Smoothness for a Cascaded Online Learning Flight Control System
This paper aims to improve the action smoothness of a cascaded online learning flight control system. Although the cascaded structure is widely used in flight control design, its stability can be compromised by oscillatory control actions, which poses challenges for practical engineering applications. To address this issue, we introduce an online temporal smoothness technique and a low-pass filter to reduce the amplitude and frequency of the control actions. Fast Fourier Transform (FFT) is used to analyze policy performance in the frequency domain. Simulation results demonstrate the improvements achieved by the two proposed techniques.
Differentially Private Gradient-Tracking-Based Distributed Stochastic Optimization over Directed Graphs
This paper proposes a differentially private gradient-tracking-based distributed stochastic optimization algorithm over directed graphs. In particular, privacy noises are incorporated into each agent's state and tracking variable to mitigate information leakage, after which the perturbed states and tracking variables are transmitted to neighbors. We design two novel schemes for the step-sizes and the sampling number within the algorithm. The sampling parameter-controlled subsampling method employed by both schemes enhances the differential privacy level, and ensures a finite cumulative privacy budget even over infinite iterations. The algorithm achieves both almost sure and mean square convergence for nonconvex objectives. Furthermore, when nonconvex objectives satisfy the Polyak-Lojasiewicz condition, Scheme (S1) achieves a polynomial mean square convergence rate, and Scheme (S2) achieves an exponential mean square convergence rate. The trade-off between privacy and convergence is presented. The effectiveness of the algorithm and its superior performance compared to existing works are illustrated through numerical examples of distributed training on the benchmark datasets "MNIST" and "CIFAR-10".
Turbulent Multiple-Scattering Channel Modeling for Ultraviolet Communications: A Monte-Carlo Integration Approach
Modeling of multiple-scattering channels in atmospheric turbulence is essential for the performance analysis of long-distance non-line-of-sight (NLOS) ultraviolet (UV) communications. Existing works on the turbulent channel modeling for NLOS UV communications either focused on single-scattering cases or estimate the turbulent fluctuation effect in an unreliable way based on Monte-Carlo simulation (MCS) approach. In this paper, we establish a comprehensive turbulent multiple-scattering channel model by using a more efficient Monte-Carlo integration (MCI) approach for NLOS UV communications, where both the scattering, absorption, and turbulence effects are considered. Compared with the MCS approach, the MCI approach is more interpretable for estimating the turbulent fluctuation. To achieve this, we first introduce the scattering, absorption, and turbulence effects for NLOS UV communications in turbulent channels. Then we propose the estimation methods based on MCI approach for estimating both the turbulent fluctuation and the distribution of turbulent fading coefficient. Numerical results demonstrate that the turbulence-induced scattering effect can always be ignored for typical UV communication scenarios. Besides, the turbulent fluctuation will increase as either the communication distance increases or the zenith angle decreases, which is compatible with existing experimental results and also with our experimental results. Moreover, we demonstrate numerically that the distribution of the turbulent fading coefficient for UV multiple-scattering channels under all turbulent conditions can be approximated as log-normal distribution; and we also demonstrate both numerically and experimentally that the turbulent fading can be approximated as a Gaussian distribution under weak turbulence.
comment: 28 pages,9 figures
Closed-Loop Control Law for Low Thrust Orbit Transfer with Guaranteed Stability
Electric propulsion is used to maximize payload capacity in communication satellites. These orbit raising maneuvers span several months and hundreds of revolutions, making trajectory design a complex challenge. The literature typically addresses this problem using feedback laws, with Q-law being one of the most prominent approaches. However, Q-law suffers from closed-loop stability issues, limiting its suitability for real-time on-board implementation. In this work, we focus on closed-loop orbit raising rather than offline trajectory planning and address the stability limitations of the Q-law through a Lyapunov based control design. A Lyapunov-guided modification of the classical Q-law is proposed to ensure closed-loop stability and enable real-time implementation. The effectiveness of the proposed method is demonstrated through closed-loop orbit transfers across various scenarios, including co-planar transfers, equatorial to polar orbit transfers, and geostationary transfer orbit (GTO) to geostationary earth orbit (GEO) transfers.
comment: 6 pages, 5 figures, 3 tables -- Accepted for publication in Indian Control Conference 2025
Powered Descent Trajectory Optimization of Chandrayaan-3 using Radau Collocation and Controllable Sets
India achieved a significant milestone on August $23^{\text{rd}}$ 2023, becoming the fourth country to accomplish a soft landing on the Moon. This paper presents the powered descent trajectory design for the Chandrayaan-3 mission. The optimization framework is based on pseudospectral Radau collocation, and controllability-based waypoint refinement is employed to further enhance the robustness of the trajectory against state and control perturbations. Furthermore, the trade-off between fuel consumption and robustness is explicitly quantified, providing insights into the practical considerations of mission planning.
comment: 6 pages, 6 figure, Accepted for publication in Indian Control Conference 2025
Reaction-diffusion systems derived from kinetic theory for Multiple Sclerosis
We present a mathematical study for the development of Multiple Sclerosis in which a spatio-temporal kinetic { theory} model describes, at the mesoscopic level, the dynamics of a high number of interacting agents. We consider both interactions among different populations of human cells and the motion of immune cells, stimulated by cytokines. Moreover, we reproduce the consumption of myelin sheath due to anomalously activated lymphocytes and its restoration by oligodendrocytes. Successively, we fix a small time parameter and assume that the considered processes occur at different scales. This allows us to perform a formal limit, obtaining macroscopic reaction-diffusion equations for the number densities with a chemotaxis term. A natural step is then to study the system, inquiring about the formation of spatial patterns through a Turing instability analysis of the problem and basing the discussion on the microscopic parameters of the model. In particular, we get spatial patterns oscillating in time that may reproduce brain lesions characteristic of different phases of the pathology.
Linear Stability Analysis of a Constant Quaternion Difference Attitude Controller
It is quite often claimed, and correctly so, that linear methods cannot achieve global stability results for attitude control, and conversely that nonlinear control is essential in order to achieve (almost) globally stable tracking of general attitude trajectories. On account of this definitive result, and also because of the existence of powerful nonlinear control techniques, there has been relatively very little work analyzing the limits and performance of linear attitude control. It is the purpose of this paper to provide a characterization of the stability achievable for one class of linear attitude control problems, namely those leading to a constant quaternion difference. In this paper, we analytically derive a critical error angle below which linearized dynamics lead to natural marginal stability for such a system, and above which the system is unstable. The dynamics are then used to derive a locally stable linear attitude controller whose performance is validated using simulations.
Towards Hierarchical Multi-Agent Decision-Making for Uncertainty-Aware EV Charging
Recent advances in bidirectional EV charging and discharging systems have spurred interest in workplace applications. However, real-world deployments face various dynamic factors, such as fluctuating electricity prices and uncertain EV departure times, that hinder effective energy management. To address these issues and minimize building electricity costs while meeting EV charging requirements, we design a hierarchical multi-agent structure in which a high-level agent coordinates overall charge or discharge decisions based on real-time pricing, while multiple low-level agents manage individual power level accordingly. For uncertain EV departure times, we propose a novel uncertainty-aware critic augmentation mechanism for low-level agents that improves the evaluation of power-level decisions and ensures robust control under such uncertainty. Building upon these two key designs, we introduce HUCA, a real-time charging control framework that coordinates energy supply among the building and EVs. Experiments on real-world electricity datasets show that HUCA significantly reduces electricity costs and maintains competitive performance in meeting EV charging requirements under both simulated certain and uncertain departure scenarios. The results further highlight the importance of hierarchical control and the proposed critic augmentation under the uncertain departure scenario. A case study illustrates HUCA's capability to allocate energy between the building and EVs in real time, underscoring its potential for practical use.
Nonholonomic Robot Parking by Feedback -- Part II: Nonmodular, Inverse Optimal, Adaptive, Prescribed/Fixed-Time and Safe Designs
For the unicycle system, we provide constructive methods for the design of feedback laws that have one or more of the following properties: being nonmodular and globally exponentially stabilizing, inverse optimal, robust to arbitrary decrease or increase of input coefficients, adaptive, prescribed/fixed-time stabilizing, and safe (ensuring the satisfaction of state constraints). Our nonmodular backstepping feedbacks are implementable with either unidirectional or bidirectional velocity actuation. Thanks to constructing families of strict CLFs for the unicycle, we introduce a general design framework and families of feedback laws for the unicycle, which are inverse optimal with respect to meaningful costs. These inverse optimal feedback laws are endowed with robustness to actuator uncertainty and arbitrarily low input saturation due to the unicycle's driftlessness. Besides ensuring robustness to unknown input coefficients, we also develop adaptive laws for these unknown coefficients, enabling the achievement of good transient performance with lower initial control effort. Additionally, we develop controllers that achieve stabilization within a user-specified time horizon using two systematic methods: time-dilated prescribed-time design with smooth-in-time convergence to zero of both the states and the inputs and homogeneity-based fixed-time control that provides an explicit bound on the settling time. Finally, with a nonovershooting design we guarantee strictly forward motion without curb violation. This article, along with its Part I, lays a broad constructive design foundation for stabilization of the nonholonomic unicycle.
comment: 16 pages. arXiv admin note: text overlap with arXiv:2509.25563
Adaptive Trajectory Bundle Method for Roll-to-Roll Manufacturing Systems
Roll-to-roll (R2R) manufacturing requires precise tension and velocity control under operational constraints. Model predictive control demands gradient computation, while sampling-based methods like MPPI struggle with hard constraint satisfaction. This paper presents an adaptive trajectory bundle method that achieves rigorous constraint handling through derivative-free sequential convex programming. The approach approximates nonlinear dynamics and costs via interpolated sample bundles, replacing Taylor-series linearization with function-value interpolation. Adaptive trust region and penalty mechanisms automatically adjust based on constraint violation metrics, eliminating manual tuning. We establish convergence guarantees proving finite-time feasibility and convergence to stationary points of the constrained problem. Simulations on a six-zone R2R system demonstrate that the adaptive method achieves 4.3\% lower tension RMSE than gradient-based MPC and 11.1\% improvement over baseline TBM in velocity transients, with superior constraint satisfaction compared to MPPI variants. Experimental validation on an R2R dry transfer system confirms faster settling and reduced overshoot relative to LQR and non-adaptive TBM.
Optimal Control with Natural Images: Efficient Reinforcement Learning using Overcomplete Sparse Codes
Optimal control and sequential decision making are widely used in many complex tasks. Optimal control over a sequence of natural images is a first step towards understanding the role of vision in control. Here, we formalize this problem as a reinforcement learning task, and derive general conditions under which an image includes enough information to implement an optimal policy. Reinforcement learning is shown to provide a computationally efficient method for finding optimal policies when natural images are encoded into "efficient" image representations. This is demonstrated by introducing a new reinforcement learning benchmark that easily scales to large numbers of states and long horizons. In particular, by representing each image as an overcomplete sparse code, we are able to efficiently solve an optimal control task that is orders of magnitude larger than those tasks solvable using complete codes. Theoretical justification for this behaviour is provided. This work also demonstrates that deep learning is not necessary for efficient optimal control with natural images.
Nonholonomic Robot Parking by Feedback -- Part I: Modular Strict CLF Designs
It has been known in the robotics literature since about 1995 that, in polar coordinates, the nonholonomic unicycle is asymptotically stabilizable by smooth feedback, even globally. We introduce a modular design framework that selects the forward velocity to decouple the radial coordinate, allowing the steering subsystem to be stabilized independently. Within this structure, we develop families of feedback laws using passivity, backstepping, and integrator forwarding. Each law is accompanied by a strict control Lyapunov function, including barrier variants that enforce angular constraints. These strict CLFs provide constructive class KL convergence estimates and enable eigenvalue assignment at the target equilibrium. The framework generalizes and extends prior modular and nonmodular approaches, while preparing the ground for inverse optimal and adaptive redesigns in the sequel paper.
comment: arXiv admin note: text overlap with arXiv:2509.25575
Safe Control of Multi-Agent Systems with Minimal Communication
In many multi-agent systems, communication is limited by bandwidth, latency, and energy constraints. Designing controllers that achieve coordination and safety with minimal communication is critical for scalable and reliable deployment. This paper presents a method for designing controllers that minimize inter-agent communication in multi-agent systems while satisfying safety and coordination requirements, while conforming to communication delay constraints. The control synthesis problem is cast as a rank minimization problem, where a convex relaxation is obtained via system level synthesis. Simulation results on various tasks, including trajectory tracking with relative and heterogeneous sensing, demonstrate that the proposed method significantly reduces inter-agent transmission compared to baseline approaches.
comment: to appear at 2025 IEEE Conference on Decision and Control (CDC)
Robotics
LightTact: A Visual-Tactile Fingertip Sensor for Deformation-Independent Contact Sensing
Contact often occurs without macroscopic surface deformation, such as during interaction with liquids, semi-liquids, or ultra-soft materials. Most existing tactile sensors rely on deformation to infer contact, making such light-contact interactions difficult to perceive robustly. To address this, we present LightTact, a visual-tactile fingertip sensor that makes contact directly visible via a deformation-independent, optics-based principle. LightTact uses an ambient-blocking optical configuration that suppresses both external light and internal illumination at non-contact regions, while transmitting only the diffuse light generated at true contacts. As a result, LightTact produces high-contrast raw images in which non-contact pixels remain near-black (mean gray value < 3) and contact pixels preserve the natural appearance of the contacting surface. Built on this, LightTact achieves accurate pixel-level contact segmentation that is robust to material properties, contact force, surface appearance, and environmental lighting. We further integrate LightTact on a robotic arm and demonstrate manipulation behaviors driven by extremely light contact, including water spreading, facial-cream dipping, and thin-film interaction. Finally, we show that LightTact's spatially aligned visual-tactile images can be directly interpreted by existing vision-language models, enabling resistor value reasoning for robotic sorting.
LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving
Simulators can generate virtually unlimited driving data, yet imitation learning policies in simulation still struggle to achieve robust closed-loop performance. Motivated by this gap, we empirically study how misalignment between privileged expert demonstrations and sensor-based student observations can limit the effectiveness of imitation learning. More precisely, experts have significantly higher visibility (e.g., ignoring occlusions) and far lower uncertainty (e.g., knowing other vehicles' actions), making them difficult to imitate reliably. Furthermore, navigational intent (i.e., the route to follow) is under-specified in student models at test time via only a single target point. We demonstrate that these asymmetries can measurably limit driving performance in CARLA and offer practical interventions to address them. After careful modifications to narrow the gaps between expert and student, our TransFuser v6 (TFv6) student policy achieves a new state of the art on all major publicly available CARLA closed-loop benchmarks, reaching 95 DS on Bench2Drive and more than doubling prior performances on Longest6~v2 and Town13. Additionally, by integrating perception supervision from our dataset into a shared sim-to-real pipeline, we show consistent gains on the NAVSIM and Waymo Vision-Based End-to-End driving benchmarks. Our code, data, and models are publicly available at https://github.com/autonomousvision/lead.
Drift-Corrected Monocular VIO and Perception-Aware Planning for Autonomous Drone Racing
The Abu Dhabi Autonomous Racing League(A2RL) x Drone Champions League competition(DCL) requires teams to perform high-speed autonomous drone racing using only a single camera and a low-quality inertial measurement unit -- a minimal sensor set that mirrors expert human drone racing pilots. This sensor limitation makes the system susceptible to drift from Visual-Inertial Odometry (VIO), particularly during long and fast flights with aggressive maneuvers. This paper presents the system developed for the championship, which achieved a competitive performance. Our approach corrected VIO drift by fusing its output with global position measurements derived from a YOLO-based gate detector using a Kalman filter. A perception-aware planner generated trajectories that balance speed with the need to keep gates visible for the perception system. The system demonstrated high performance, securing podium finishes across multiple categories: third place in the AI Grand Challenge with top speed of 43.2 km/h, second place in the AI Drag Race with over 59 km/h, and second place in the AI Multi-Drone Race. We detail the complete architecture and present a performance analysis based on experimental data from the competition, contributing our insights on building a successful system for monocular vision-based autonomous drone flight.
Contingency Model-based Control (CMC) for Communicationless Cooperative Collision Avoidance in Robot Swarms
Cooperative collision avoidance between robots in swarm operations remains an open challenge. Assuming a decentralized architecture, each robot is responsible for making its own control decisions, including motion planning. To this end, most existing approaches mostly rely some form of (wireless) communication between the agents of the swarm. In reality, however, communication is brittle. It may be affected by latency, further delays and packet losses, transmission faults, and is subject to adversarial attacks, such as jamming or spoofing. This paper proposes Contingency Model-based Control (CMC) as a communicationless alternative. It follows the implicit cooperation paradigm, under which the design of the robots is based on consensual (offline) rules, similar to traffic rules. They include the definition of a contingency trajectory for each robot, and a method for construction of mutual collision avoidance constraints. The setup is shown to guarantee the recursive feasibility and collision avoidance between all swarm members in closed-loop operation. Moreover, CMC naturally satisfies the Plug \& Play paradigm, i.e., for new robots entering the swarm. Two numerical examples demonstrate that the collision avoidance guarantee is intact and that the robot swarm operates smoothly under the CMC regime.
FAR-AVIO: Fast and Robust Schur-Complement Based Acoustic-Visual-Inertial Fusion Odometry with Sensor Calibration
Underwater environments impose severe challenges to visual-inertial odometry systems, as strong light attenuation, marine snow and turbidity, together with weakly exciting motions, degrade inertial observability and cause frequent tracking failures over long-term operation. While tightly coupled acoustic-visual-inertial fusion, typically implemented through an acoustic Doppler Velocity Log (DVL) integrated with visual-inertial measurements, can provide accurate state estimation, the associated graph-based optimization is often computationally prohibitive for real-time deployment on resource-constrained platforms. Here we present FAR-AVIO, a Schur-Complement based, tightly coupled acoustic-visual-inertial odometry framework tailored for underwater robots. FAR-AVIO embeds a Schur complement formulation into an Extended Kalman Filter(EKF), enabling joint pose-landmark optimization for accuracy while maintaining constant-time updates by efficiently marginalizing landmark states. On top of this backbone, we introduce Adaptive Weight Adjustment and Reliability Evaluation(AWARE), an online sensor health module that continuously assesses the reliability of visual, inertial and DVL measurements and adaptively regulates their sigma weights, and we develop an efficient online calibration scheme that jointly estimates DVL-IMU extrinsics, without dedicated calibration manoeuvres. Numerical simulations and real-world underwater experiments consistently show that FAR-AVIO outperforms state-of-the-art underwater SLAM baselines in both localization accuracy and computational efficiency, enabling robust operation on low-power embedded platforms. Our implementation has been released as open source software at https://far-vido.gitbook.io/far-vido-docs.
Design and Modeling of a Simple-Structured Continuously Variable Transmission Utilizing Shape Memory Alloy Superelasticity for Twisted String Actuator
Twisted String Actuators (TSAs) are widely used in robotics but suffer from a limited range of Transmission Ratio (TR) variation, restricting their efficiency under varying loads.To overcome this, we propose a novel lightweight, simple-structured Continuously Variable Transmission (CVT) mechanism for TSA utilizing Shape Memory Alloy (SMA) superelasticity. The CVT mechanism consists solely of a pair of highly lightweight superelastic SMA rods connecting the ends of twisted strings. These rods deform under external loads, adjusting the inter-string distance to enable continuous TR variation.We develop a comprehensive theoretical model that integrates three critical nonlinearities
comment: 15pages,Fig14
Pneumatic bladder links with wide range of motion joints for articulated inflatable robots IROS2024
Exploration of various applications is the frontier of research on inflatable robots. We proposed an articulated robots consisting of multiple pneumatic bladder links connected by rolling contact joints called Hillberry joints. The bladder link is made of a double-layered structure of tarpaulin sheet and polyurethane sheet, which is both airtight and flexible in shape. The integration of the Hilberry joint into an inflatable robot is also a new approach. The rolling contact joint allows wide range of motion of $\pm 150 ^{\circ}$, the largest among the conventional inflatable joints. Using the proposed mechanism for inflatable robots, we demonstrated moving a 500 g payload with a 3-DoF arm and lifting 3.4 kg and 5 kg payloads with 2-DoF and 1-DoF arms, respectively. We also experimented with a single 3-DoF inflatable leg attached to a dolly to show that the proposed structure worked for legged locomotion.
comment: Accepted at IROS2024 (IEEE/RSJ International Conference on Intelligent Robots and Systems)
KnowVal: A Knowledge-Augmented and Value-Guided Autonomous Driving System
Visual-language reasoning, driving knowledge, and value alignment are essential for advanced autonomous driving systems. However, existing approaches largely rely on data-driven learning, making it difficult to capture the complex logic underlying decision-making through imitation or limited reinforcement rewards. To address this, we propose KnowVal, a new autonomous driving system that enables visual-language reasoning through the synergistic integration of open-world perception and knowledge retrieval. Specifically, we construct a comprehensive driving knowledge graph that encodes traffic laws, defensive driving principles, and ethical norms, complemented by an efficient LLM-based retrieval mechanism tailored for driving scenarios. Furthermore, we develop a human-preference dataset and train a Value Model to guide interpretable, value-aligned trajectory assessment. Experimental results show that our method substantially improves planning performance while remaining compatible with existing architectures. Notably, KnowVal achieves the lowest collision rate on nuScenes and state-of-the-art results on Bench2Drive.
ActionFlow: A Pipelined Action Acceleration for Vision Language Models on Edge
Vision-Language-Action (VLA) models have emerged as a unified paradigm for robotic perception and control, enabling emergent generalization and long-horizon task execution. However, their deployment in dynamic, real-world environments is severely hin dered by high inference latency. While smooth robotic interaction requires control frequencies of 20 to 30 Hz, current VLA models typi cally operate at only 3-5 Hz on edge devices due to the memory bound nature of autoregressive decoding. Existing optimizations often require extensive retraining or compromise model accuracy. To bridge this gap, we introduce ActionFlow, a system-level inference framework tailored for resource-constrained edge plat forms. At the core of ActionFlow is a Cross-Request Pipelin ing strategy, a novel scheduler that redefines VLA inference as a macro-pipeline of micro-requests. The strategy intelligently batches memory-bound Decode phases with compute-bound Prefill phases across continuous time steps to maximize hardware utilization. Furthermore, to support this scheduling, we propose a Cross Request State Packed Forward operator and a Unified KV Ring Buffer, which fuse fragmented memory operations into efficient dense computations. Experimental results demonstrate that ActionFlow achieves a 2.55x improvement in FPS on the OpenVLA-7B model without retraining, enabling real-time dy namic manipulation on edge hardware. Our work is available at https://anonymous.4open.science/r/ActionFlow-1D47.
Finite-Time Control Based on Differential Flatness for Wheeled Mobile Robots with Experimental Validation
A robust tracking control strategy is designed to empower wheeled mobile robots (WMRs) to track predetermined routes while operating in diverse fields and encountering disturbances like strong winds or uneven path conditions, which affect tracking performance. Ensuring the applicability of this tracking method in real-world scenarios is essential. To accomplish this, the WMR model is initially transformed into a linear canonical form by leveraging the differential flatness of its kinematic model, facilitating controller design. Subsequently, a novel integral nonlinear hyperplane-based sliding mode control (INH-SMC) technique is proposed for WMR under disturbances. The stability of the technique is analyzed and verified. Finally, its practical viability is demonstrated through a comparative real-world indoor experiment on a TurtleBot3 WMR subjected to disturbances, confirming the feasibility and efficacy of the proposed approach.
UrbanV2X: A Multisensory Vehicle-Infrastructure Dataset for Cooperative Navigation in Urban Areas SC 2025
Due to the limitations of a single autonomous vehicle, Cellular Vehicle-to-Everything (C-V2X) technology opens a new window for achieving fully autonomous driving through sensor information sharing. However, real-world datasets supporting vehicle-infrastructure cooperative navigation in complex urban environments remain rare. To address this gap, we present UrbanV2X, a comprehensive multisensory dataset collected from vehicles and roadside infrastructure in the Hong Kong C-V2X testbed, designed to support research on smart mobility applications in dense urban areas. Our onboard platform provides synchronized data from multiple industrial cameras, LiDARs, 4D radar, ultra-wideband (UWB), IMU, and high-precision GNSS-RTK/INS navigation systems. Meanwhile, our roadside infrastructure provides LiDAR, GNSS, and UWB measurements. The entire vehicle-infrastructure platform is synchronized using the Precision Time Protocol (PTP), with sensor calibration data provided. We also benchmark various navigation algorithms to evaluate the collected cooperative data. The dataset is publicly available at https://polyu-taslab.github.io/UrbanV2X/.
comment: 8 pages, 9 figures, IEEE ITSC 2025
Asynchronous Fast-Slow Vision-Language-Action Policies for Whole-Body Robotic Manipulation
Most Vision-Language-Action (VLA) systems integrate a Vision-Language Model (VLM) for semantic reasoning with an action expert generating continuous action signals, yet both typically run at a single unified frequency. As a result, policy performance is constrained by the low inference speed of large VLMs. This mandatory synchronous execution severely limits control stability and real-time performance in whole-body robotic manipulation, which involves more joints, larger motion spaces, and dynamically changing views. We introduce a truly asynchronous Fast-Slow VLA framework (DuoCore-FS), organizing the system into a fast pathway for high-frequency action generation and a slow pathway for rich VLM reasoning. The system is characterized by two key features. First, a latent representation buffer bridges the slow and fast systems. It stores instruction semantics and action-reasoning representation aligned with the scene-instruction context, providing high-level guidance to the fast pathway. Second, a whole-body action tokenizer provides a compact, unified representation of whole-body actions. Importantly, the VLM and action expert are still jointly trained end-to-end, preserving unified policy learning while enabling asynchronous execution. DuoCore-FS supports a 3B-parameter VLM while achieving 30 Hz whole-body action-chunk generation, approximately three times as fast as prior VLA models with comparable model sizes. Real-world whole-body manipulation experiments demonstrate improved task success rates and significantly enhanced responsiveness compared to synchronous Fast-Slow VLA baselines. The implementation of DuoCore-FS, including training, inference, and deployment, is provided to commercial users by Astribot as part of the Astribot robotic platform.
LoLA: Long Horizon Latent Action Learning for General Robot Manipulation
The capability of performing long-horizon, language-guided robotic manipulation tasks critically relies on leveraging historical information and generating coherent action sequences. However, such capabilities are often overlooked by existing Vision-Language-Action (VLA) models. To solve this challenge, we propose LoLA (Long Horizon Latent Action Learning), a framework designed for robot manipulation that integrates long-term multi-view observations and robot proprioception to enable multi-step reasoning and action generation. We first employ Vision-Language Models to encode rich contextual features from historical sequences and multi-view observations. We further introduces a key module, State-Aware Latent Re-representation, which transforms visual inputs and language commands into actionable robot motion space. Unlike existing VLA approaches that merely concatenate robot proprioception (e.g., joint angles) with VL embeddings, this module leverages such robot states to explicitly ground VL representations in physical scale through a learnable "embodiment-anchored" latent space. We trained LoLA on diverse robotic pre-training datasets and conducted extensive evaluations on simulation benchmarks (SIMPLER and LIBERO), as well as two real-world tasks on Franka and Bi-Manual Aloha robots. Results show that LoLA significantly outperforms prior state-of-the-art methods (e.g., pi0), particularly in long-horizon manipulation tasks.
Enhancing annotations for 5D apple pose estimation through 3D Gaussian Splatting (3DGS)
Automating tasks in orchards is challenging because of the large amount of variation in the environment and occlusions. One of the challenges is apple pose estimation, where key points, such as the calyx, are often occluded. Recently developed pose estimation methods no longer rely on these key points, but still require them for annotations, making annotating challenging and time-consuming. Due to the abovementioned occlusions, there can be conflicting and missing annotations of the same fruit between different images. Novel 3D reconstruction methods can be used to simplify annotating and enlarge datasets. We propose a novel pipeline consisting of 3D Gaussian Splatting to reconstruct an orchard scene, simplified annotations, automated projection of the annotations to images, and the training and evaluation of a pose estimation method. Using our pipeline, 105 manual annotations were required to obtain 28,191 training labels, a reduction of 99.6%. Experimental results indicated that training with labels of fruits that are $\leq95\%$ occluded resulted in the best performance, with a neutral F1 score of 0.927 on the original images and 0.970 on the rendered images. Adjusting the size of the training dataset had small effects on the model performance in terms of F1 score and pose estimation accuracy. It was found that the least occluded fruits had the best position estimation, which worsened as the fruits became more occluded. It was also found that the tested pose estimation method was unable to correctly learn the orientation estimation of apples.
comment: 33 pages, excluding appendices. 17 figures
Detecting Non-Optimal Decisions of Embodied Agents via Diversity-Guided Metamorphic Testing
As embodied agents advance toward real-world deployment, ensuring optimal decisions becomes critical for resource-constrained applications. Current evaluation methods focus primarily on functional correctness, overlooking the non-functional optimality of generated plans. This gap can lead to significant performance degradation and resource waste. We identify and formalize the problem of Non-optimal Decisions (NoDs), where agents complete tasks successfully but inefficiently. We present NoD-DGMT, a systematic framework for detecting NoDs in embodied agent task planning via diversity-guided metamorphic testing. Our key insight is that optimal planners should exhibit invariant behavioral properties under specific transformations. We design four novel metamorphic relations capturing fundamental optimality properties: position detour suboptimality, action optimality completeness, condition refinement monotonicity, and scene perturbation invariance. To maximize detection efficiency, we introduce a diversity-guided selection strategy that actively selects test cases exploring different violation categories, avoiding redundant evaluations while ensuring comprehensive diversity coverage. Extensive experiments on the AI2-THOR simulator with four state-of-the-art planning models demonstrate that NoD-DGMT achieves violation detection rates of 31.9% on average, with our diversity-guided filter improving rates by 4.3% and diversity scores by 3.3 on average. NoD-DGMT significantly outperforms six baseline methods, with 16.8% relative improvement over the best baseline, and demonstrates consistent superiority across different model architectures and task complexities.
Learning Skills from Action-Free Videos
Learning from videos offers a promising path toward generalist robots by providing rich visual and temporal priors beyond what real robot datasets contain. While existing video generative models produce impressive visual predictions, they are difficult to translate into low-level actions. Conversely, latent-action models better align videos with actions, but they typically operate at the single-step level and lack high-level planning capabilities. We bridge this gap by introducing Skill Abstraction from Optical Flow (SOF), a framework that learns latent skills from large collections of action-free videos. Our key idea is to learn a latent skill space through an intermediate representation based on optical flow that captures motion information aligned with both video dynamics and robot actions. By learning skills in this flow-based latent space, SOF enables high-level planning over video-derived skills and allows for easier translation of these skills into actions. Experiments show that our approach consistently improves performance in both multitask and long-horizon settings, demonstrating the ability to acquire and compose skills directly from raw visual data.
Bring My Cup! Personalizing Vision-Language-Action Models with Visual Attentive Prompting
While Vision-Language-Action (VLA) models generalize well to generic instructions, they struggle with personalized commands such as "bring my cup", where the robot must act on one specific instance among visually similar objects. We study this setting of manipulating personal objects, in which a VLA must identify and control a user-specific object unseen during training using only a few reference images. To address this challenge, we propose Visual Attentive Prompting (VAP), a simple-yet-effective training-free perceptual adapter that equips frozen VLAs with top-down selective attention. VAP treats the reference images as a non-parametric visual memory, grounds the personal object in the scene through open-vocabulary detection and embedding-based matching, and then injects this grounding as a visual prompt by highlighting the object and rewriting the instruction. We construct two simulation benchmarks, Personalized-SIMPLER and Personalized-VLABench, and a real-world tabletop benchmark to evaluate personalized manipulation across multiple robots and tasks. Experiments show that VAP consistently outperforms generic policies and token-learning baselines in both success rate and correct-object manipulation, helping to bridge the gap between semantic understanding and instance-level control.
YCB-Handovers Dataset: Analyzing Object Weight Impact on Human Handovers to Adapt Robotic Handover Motion
This paper introduces the YCB-Handovers dataset, capturing motion data of 2771 human-human handovers with varying object weights. The dataset aims to bridge a gap in human-robot collaboration research, providing insights into the impact of object weight in human handovers and readiness cues for intuitive robotic motion planning. The underlying dataset for object recognition and tracking is the YCB (Yale-CMU-Berkeley) dataset, which is an established standard dataset used in algorithms for robotic manipulation, including grasping and carrying objects. The YCB-Handovers dataset incorporates human motion patterns in handovers, making it applicable for data-driven, human-inspired models aimed at weight-sensitive motion planning and adaptive robotic behaviors. This dataset covers an extensive range of weights, allowing for a more robust study of handover behavior and weight variation. Some objects also require careful handovers, highlighting contrasts with standard handovers. We also provide a detailed analysis of the object's weight impact on the human reaching motion in these handovers.
comment: Paper presented at the IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2025
Towards Optimal Performance and Action Consistency Guarantees in Dec-POMDPs with Inconsistent Beliefs and Limited Communication
Multi-agent decision-making under uncertainty is fundamental for effective and safe autonomous operation. In many real-world scenarios, each agent maintains its own belief over the environment and must plan actions accordingly. However, most existing approaches assume that all agents have identical beliefs at planning time, implying these beliefs are conditioned on the same data. Such an assumption is often impractical due to limited communication. In reality, agents frequently operate with inconsistent beliefs, which can lead to poor coordination and suboptimal, potentially unsafe, performance. In this paper, we address this critical challenge by introducing a novel decentralized framework for optimal joint action selection that explicitly accounts for belief inconsistencies. Our approach provides probabilistic guarantees for both action consistency and performance with respect to open-loop multi-agent POMDP (which assumes all data is always communicated), and selectively triggers communication only when needed. Furthermore, we address another key aspect of whether, given a chosen joint action, the agents should share data to improve expected performance in inference. Simulation results show our approach outperforms state-of-the-art algorithms.
comment: 9 pages, 3 figures, 2 tables
A General Purpose Method for Robotic Interception of Non-Cooperative Dynamic Targets
This paper presents a general purpose framework for autonomous, vision-based interception of dynamic, non-cooperative targets, validated across three distinct mobility platforms: an unmanned aerial vehicle (UAV), a four-wheeled ground rover, and an air-thruster spacecraft testbed. The approach relies solely on a monocular camera with fiducials for target tracking and operates entirely in the local observer frame without the need for global information. The core contribution of this work is a streamlined and general approach to autonomous interception that can be adapted across robots with varying dynamics, as well as our comprehensive study of the robot interception problem across heterogenous mobility systems under limited observability and no global localization. Our method integrates (1) an Extended Kalman Filter for relative pose estimation amid intermittent measurements, (2) a history-conditioned motion predictor for dynamic target trajectory propagation, and (3) a receding-horizon planner solving a constrained convex program in real time to ensure time-efficient and kinematically feasible interception paths. Our operating regime assumes that observability is restricted by partial fields of view, sensor dropouts, and target occlusions. Experiments are performed in these conditions and include autonomous UAV landing on dynamic targets, rover rendezvous and leader-follower tasks, and spacecraft proximity operations. Results from simulated and physical experiments demonstrate robust performance with low interception errors (both during station-keeping and upon scenario completion), high success rates under deterministic and stochastic target motion profiles, and real-time execution on embedded processors such as the Jetson Orin, VOXL2, and Raspberry Pi 5. These results highlight the framework's generalizability, robustness, and computational efficiency.
comment: 10 pages, 11 figures, 5 tables. Accepted to IEEE Aerospace Conference 2026
Fixed-time control with prescribed performance for path following of underwater gliders
Underwater gliders are increasingly deployed in challenging missions - such as hurricane-season observations and long-endurance environmental monitoring - where strong currents and turbulence pose significant risks to navigation safety. To address these practical challenges, this paper presents a fixed-time prescribed performance control scheme for the 3D path following of underwater gliders subject to model uncertainties and environmental disturbances. The primary contribution is the integration of a finite-time performance function within a fixed-time control framework. This synthesis ensures that the tracking errors are constrained within prescribed performance bounds and converge to a compact set within a fixed time, independent of initial conditions. A second key contribution is the development of a fixed-time sliding mode disturbance observer that provides accurate finite-time estimation of lumped disturbances, enhancing the system's robustness. Integrated with an iLOS guidance law, the proposed controller enables precise and safe waypoint following. Numerical simulations demonstrate that the proposed method outperforms conventional sliding mode and prescribed performance controllers in tracking accuracy, convergence speed, and control effort smoothness, validating its efficacy for robust underwater navigation.
comment: 22 pages, 13 figures, 2 tables, Submitted to Ocean Engineering
Anytime Metaheuristic Framework for Global Route Optimization in Expected-Time Mobile Search
Expected-time mobile search (ETS) is a fundamental robotics task where a mobile sensor navigates an environment to minimize the expected time required to locate a hidden object. Global route optimization for ETS in static 2D continuous environments remains largely underexplored due to the intractability of objective evaluation, stemming from the continuous nature of the environment and the interplay of motion and visibility constraints. Prior work has addressed this through partial discretization, leading to discrete-sensing formulations tackled via utility-greedy heuristics. Others have taken an indirect approach by heuristically approximating the objective using minimum latency problems on fixed graphs, enabling global route optimization via efficient metaheuristics. This paper builds on and significantly extends the latter by introducing Milaps (Minimum latency problems), a model-based solution framework for ETS. Milaps integrates novel auxiliary objectives and adapts a recent anytime metaheuristic for the traveling deliveryman problem, chosen for its strong performance under tight runtime constraints. Evaluations on a novel large-scale dataset demonstrate superior trade-offs between solution quality and runtime compared to state-of-the-art baselines. The best-performing strategy rapidly generates a preliminary solution, assigns static weights to sensing configurations, and optimizes global costs metaheuristically. Additionally, a qualitative study highlights the framework's flexibility across diverse scenarios.
comment: 20 pages, 42 figures (including subfigures); submitted to IEEE Transactions on Robotics (T-RO) in February 2025
Learning Safe Autonomous Driving Policies Using Predictive Safety Representations
Safe reinforcement learning (SafeRL) is a prominent paradigm for autonomous driving, where agents are required to optimize performance under strict safety requirements. This dual objective creates a fundamental tension, as overly conservative policies limit driving efficiency while aggressive exploration risks safety violations. The Safety Representations for Safer Policy Learning (SRPL) framework addresses this challenge by equipping agents with a predictive model of future constraint violations and has shown promise in controlled environments. This paper investigates whether SRPL extends to real-world autonomous driving scenarios. Systematic experiments on the Waymo Open Motion Dataset (WOMD) and NuPlan demonstrate that SRPL can improve the reward-safety tradeoff, achieving statistically significant improvements in success rate (effect sizes r = 0.65-0.86) and cost reduction (effect sizes r = 0.70-0.83), with p < 0.05 for observed improvements. However, its effectiveness depends on the underlying policy optimizer and the dataset distribution. The results further show that predictive safety representations play a critical role in improving robustness to observation noise. Additionally, in zero-shot cross-dataset evaluation, SRPL-augmented agents demonstrate improved generalization compared to non-SRPL methods. These findings collectively demonstrate the potential of predictive safety representations to strengthen SafeRL for autonomous driving.
comment: 8 pages, 4 figures
Categorical Equivariant Deep Learning: Category-Equivariant Neural Networks and Universal Approximation Theorems
We develop a theory of category-equivariant neural networks (CENNs) that unifies group/groupoid-equivariant networks, poset/lattice-equivariant networks, graph and sheaf neural networks. Equivariance is formulated as naturality in a topological category with Radon measures. Formulating linear and nonlinear layers in the categorical setup, we prove the equivariant universal approximation theorem in the general setting: the class of finite-depth CENNs is dense in the space of continuous equivariant transformations. We instantiate the framework for groups/groupoids, posets/lattices, graphs and cellular sheaves, deriving universal approximation theorems for them in a systematic manner. Categorical equivariant deep learning thus allows us to expand the horizons of equivariant deep learning beyond group actions, encompassing not only geometric symmetries but also contextual and compositional symmetries.
Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification
Verifiers--functions assigning rewards to agent behavior--have been key for AI progress in domains like math and code. However, extending gains to domains without clear-cut success criteria (e.g., computer use) remains a challenge: while humans can recognize desired outcomes, translating this intuition into scalable rules is nontrivial. Multimodal Large Language Models (MLLMs) emerge as a promising solution, given their world knowledge, human-preference alignment, and reasoning skills. We evaluate MLLMs as verifiers across web navigation, computer use, and robotic manipulation, and identify a critical limitation: a strong tendency to over-validate agent behavior, a phenomenon we term agreement bias. This bias is pervasive across models, resilient to test-time scaling, and poses risks to existing methods relying on MLLM evaluations. We discuss methods to evaluate and improve MLLM verifiers and introduce Self-Grounded Verification (SGV), a lightweight method that harnesses MLLMs' own sampling mechanisms by modulating (un)conditional generation to better leverage their knowledge, alignment, and reasoning. SGV operates in two steps: first, the MLLM is elicited to generate broad priors about desired behavior, independent of the data under evaluation. Then, conditioned on self-generated priors, it reasons over and evaluates a candidate trajectory. SGV yields more human-aligned evaluations with gains of up to 25pp in failure detection, 14pp in accuracy, and benefits extending to downstream applications. In self-refinement and online supervision, SGV boosts task completion of a GUI specialist in OSWorld, a diffusion policy in robomimic, and a ReAct agent in VisualWebArena--setting a new state of the art, surpassing the previous best by 20pp. We release an updated version of VisualWebArena featuring more human-aligned evaluators, high-fidelity environment parallelism, and speedups of over 10x.
comment: Our code, models, and data are publicly available at https://mshalimay.github.io/agreement-bias-sgv/
Learning Stack-of-Tasks Management for Redundant Robots
This paper presents a novel framework for automatically learning complete Stack-of-Tasks (SoT) controllers for redundant robotic systems, including task priorities, activation logic, and control parameters. Unlike classical SoT pipelines-where task hierarchies are manually defined and tuned-our approach optimizes the full SoT structure directly from a user-specified cost function encoding intuitive preferences such as safety, precision, manipulability, or execution speed. The method combines Genetic Programming with simulation-based evaluation to explore both discrete (priority order, task activation) and continuous (gains, trajectory durations) components of the controller. We validate the framework on a dual-arm mobile manipulator (the ABB mobile-YuMi research platform), demonstrating robust convergence across multiple cost definitions, automatic suppression of irrelevant tasks, and strong resilience to distractors. Learned SoTs exhibit expert-like hierarchical structure and adapt naturally to multi-objective trade-offs. Crucially, all controllers transfer from Gazebo simulation to the real robot, achieving safe and precise motion without additional tuning. Experiments in static and dynamic environments show reliable obstacle avoidance, high tracking accuracy, and predictable behavior in the presence of humans. The proposed method provides an interpretable and scalable alternative to manual SoT design, enabling rapid, user-driven generation of task execution hierarchies for complex robotic systems.
Continuous Vision-Language-Action Co-Learning with Semantic-Physical Alignment for Behavioral Cloning AAAI 2026
Language-conditioned manipulation facilitates human-robot interaction via behavioral cloning (BC), which learns control policies from human demonstrations and serves as a cornerstone of embodied AI. Overcoming compounding errors in sequential action decisions remains a central challenge to improving BC performance. Existing approaches mitigate compounding errors through data augmentation, expressive representation, or temporal abstraction. However, they suffer from physical discontinuities and semantic-physical misalignment, leading to inaccurate action cloning and intermittent execution. In this paper, we present Continuous vision-language-action Co-Learning with Semantic-Physical Alignment (CCoL), a novel BC framework that ensures temporally consistent execution and fine-grained semantic grounding. It generates robust and smooth action execution trajectories through continuous co-learning across vision, language, and proprioceptive inputs (e.g., robot internal states). Meanwhile, we anchor language semantics to visuomotor representations by a bidirectional cross-attention to learn contextual information for action generation, successfully overcoming the problem of semantic-physical misalignment. Extensive experiments show that CCoL achieves an average 8.0% relative improvement across three simulation suites, with up to 19.2% relative gain in human-demonstrated bimanual insertion tasks. Real-world tests on a 7-DoF robot further confirm CCoL's generalization under unseen and noisy object states.
comment: Accepted at AAAI 2026, the Project website is available at https://qhemu.github.io/CCoL/
LeLaR: The First In-Orbit Demonstration of an AI-Based Satellite Attitude Controller
Attitude control is essential for many satellite missions. Classical controllers, however, are time-consuming to design and sensitive to model uncertainties and variations in operational boundary conditions. Deep Reinforcement Learning (DRL) offers a promising alternative by learning adaptive control strategies through autonomous interaction with a simulation environment. Overcoming the Sim2Real gap, which involves deploying an agent trained in simulation onto the real physical satellite, remains a significant challenge. In this work, we present the first successful in-orbit demonstration of an AI-based attitude controller for inertial pointing maneuvers. The controller was trained entirely in simulation and deployed to the InnoCube 3U nanosatellite, which was developed by the Julius-Maximilians-Universität Würzburg in cooperation with the Technische Universität Berlin, and launched in January 2025. We present the AI agent design, the methodology of the training procedure, the discrepancies between the simulation and the observed behavior of the real satellite, and a comparison of the AI-based attitude controller with the classical PD controller of InnoCube. Steady-state metrics confirm the robust performance of the AI-based controller during repeated in-orbit maneuvers.
comment: 55 pages, 27 figures, 29 tables. The maneuver telemetry datasets generated and analyzed during this work are available in the GitHub repository under https://github.com/kdjebko/lelar-in-orbit-data
DRAE: Dynamic Retrieval-Augmented Expert Networks for Lifelong Learning and Task Adaptation in Robotics ACL 2025
We introduce Dynamic Retrieval-Augmented Expert Networks (DRAE), a groundbreaking architecture that addresses the challenges of lifelong learning, catastrophic forgetting, and task adaptation by combining the dynamic routing capabilities of Mixture-of-Experts (MoE); leveraging the knowledge-enhancement power of Retrieval-Augmented Generation (RAG); incorporating a novel hierarchical reinforcement learning (RL) framework; and coordinating through ReflexNet-SchemaPlanner-HyperOptima (RSHO).DRAE dynamically routes expert models via a sparse MoE gating mechanism, enabling efficient resource allocation while leveraging external knowledge through parametric retrieval (P-RAG) to augment the learning process. We propose a new RL framework with ReflexNet for low-level task execution, SchemaPlanner for symbolic reasoning, and HyperOptima for long-term context modeling, ensuring continuous adaptation and memory retention. Experimental results show that DRAE significantly outperforms baseline approaches in long-term task retention and knowledge reuse, achieving an average task success rate of 82.5% across a set of dynamic robotic manipulation tasks, compared to 74.2% for traditional MoE models. Furthermore, DRAE maintains an extremely low forgetting rate, outperforming state-of-the-art methods in catastrophic forgetting mitigation. These results demonstrate the effectiveness of our approach in enabling flexible, scalable, and efficient lifelong learning for robotics.
comment: Accepted to the main conference of the Annual Meeting of the Association for Computational Linguistics (ACL 2025)
Conservative Bias in Multi-Teacher Learning: Why Agents Prefer Low-Reward Advisors
Interactive reinforcement learning (IRL) has shown promise in enabling autonomous agents and robots to learn complex behaviours from human teachers, yet the dynamics of teacher selection remain poorly understood. This paper reveals an unexpected phenomenon in IRL: when given a choice between teachers with different reward structures, learning agents overwhelmingly prefer conservative, low-reward teachers (93.16% selection rate) over those offering 20x higher rewards. Through 1,250 experimental runs in navigation tasks with multiple expert teachers, we discovered: (1) Conservative bias dominates teacher selection: agents systematically choose the lowest-reward teacher, prioritising consistency over optimality; (2) Critical performance thresholds exist at teacher availability rho >= 0.6 and accuracy omega >= 0.6, below which the framework fails catastrophically; (3) The framework achieves 159% improvement over baseline Q-learning under concept drift. These findings challenge fundamental assumptions about optimal teaching in RL and suggest potential implications for human-robot collaboration, where human preferences for safety and consistency may align with the observed agent selection behaviour, potentially informing training paradigms for safety-critical robotic applications.
comment: 10 pages, 5 figures. Accepted at ACRA 2025 (Australasian Conference on Robotics and Automation)
Tactile-based Object Retrieval From Granular Media
We introduce GEOTACT, the first robotic system capable of grasping and retrieving objects of potentially unknown shapes buried in a granular environment. While important in many applications, ranging from mining and exploration to search and rescue, this type of interaction with granular media is difficult due to the uncertainty stemming from visual occlusion and noisy contact signals. To address these challenges, we use a learning method relying exclusively on touch feedback, trained end-to-end with simulated sensor noise. We show that our problem formulation leads to the natural emergence of learned pushing behaviors that the manipulator uses to reduce uncertainty and funnel the object to a stable grasp despite spurious and noisy tactile readings. We introduce a training curriculum that bootstraps learning in simulated granular environments, enabling zero-shot transfer to real hardware. Despite being trained only on seven objects with primitive shapes, our method is shown to successfully retrieve 35 different objects, including rigid, deformable, and articulated objects with complex shapes. Videos and additional information can be found at https://jxu.ai/geotact.
LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry
Trajectory planning in unstructured environments is a fundamental and challenging capability for mobile robots. Traditional modular pipelines suffer from latency and cascading errors across perception, localization, mapping, and planning modules. Recent end-to-end learning methods map raw visual observations directly to control signals or trajectories, promising greater performance and efficiency in open-world settings. However, most prior end-to-end approaches still rely on separate localization modules that depend on accurate sensor extrinsic calibration for self-state estimation, thereby limiting generalization across embodiments and environments. We introduce LoGoPlanner, a localization-grounded, end-to-end navigation framework that addresses these limitations by: (1) finetuning a long-horizon visual-geometry backbone to ground predictions with absolute metric scale, thereby providing implicit state estimation for accurate localization; (2) reconstructing surrounding scene geometry from historical observations to supply dense, fine-grained environmental awareness for reliable obstacle avoidance; and (3) conditioning the policy on implicit geometry bootstrapped by the aforementioned auxiliary tasks, thereby reducing error propagation. We evaluate LoGoPlanner in both simulation and real-world settings, where its fully end-to-end design reduces cumulative error while metric-aware geometry memory enhances planning consistency and obstacle avoidance, leading to more than a 27.3\% improvement over oracle-localization baselines and strong generalization across embodiments and environments. The code and models have been made publicly available on the https://steinate.github.io/logoplanner.github.io.
comment: Project page:https://steinate.github.io/logoplanner.github.io/
GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation
We present GR-RL, a robotic learning framework that turns a generalist vision-language-action (VLA) policy into a highly capable specialist for long-horizon dexterous manipulation. Assuming the optimality of human demonstrations is core to existing VLA policies. However, we claim that in highly dexterous and precise manipulation tasks, human demonstrations are noisy and suboptimal. GR-RL proposes a multi-stage training pipeline that filters, augments, and reinforces the demonstrations by reinforcement learning. First, GR-RL learns a vision-language-conditioned task progress, filters the demonstration trajectories, and only keeps the transitions that contribute positively to the progress. Specifically, we show that by directly applying offline RL with sparse reward, the resulting $Q$-values can be treated as a robust progress function. Next, we introduce morphological symmetry augmentation that greatly improves the generalization and performance of GR-RL. Lastly, to better align the VLA policy with its deployment behaviors for high-precision control, we perform online RL by learning a latent space noise predictor. With this pipeline, GR-RL is, to our knowledge, the first learning-based policy that can autonomously lace up a shoe by threading shoelaces through multiple eyelets with an 83.3% success rate, a task requiring long-horizon reasoning, millimeter-level precision, and compliant soft-body interaction. We hope GR-RL provides a step toward enabling generalist robot foundation models to specialize into reliable real-world experts.
Interaction Dataset of Autonomous Vehicles with Traffic Lights and Signs
This paper presents the development of a comprehensive dataset capturing interactions between Autonomous Vehicles (AVs) and traffic control devices, specifically traffic lights and stop signs. Derived from the Waymo Motion dataset, our work addresses a critical gap in the existing literature by providing real-world trajectory data on how AVs navigate these traffic control devices. We propose a methodology for identifying and extracting relevant interaction trajectory data from the Waymo Motion dataset, incorporating over 37,000 instances with traffic lights and 44,000 with stop signs. Our methodology includes defining rules to identify various interaction types, extracting trajectory data, and applying a wavelet-based denoising method to smooth the acceleration and speed profiles and eliminate anomalous values, thereby enhancing the trajectory quality. Quality assessment metrics indicate that trajectories obtained in this study have anomaly proportions in acceleration and jerk profiles reduced to near-zero levels across all interaction categories. By making this dataset publicly available, we aim to address the current gap in datasets containing AV interaction behaviors with traffic lights and signs. Based on the organized and published dataset, we can gain a more in-depth understanding of AVs' behavior when interacting with traffic lights and signs. This will facilitate research on AV integration into existing transportation infrastructures and networks, supporting the development of more accurate behavioral models and simulation tools.
STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models
Recent advances in Vision-Language-Action (VLA) models, powered by large language models and reinforcement learning-based fine-tuning, have shown remarkable progress in robotic manipulation. Existing methods often treat long-horizon actions as linguistic sequences and apply trajectory-level optimization methods such as Trajectory-wise Preference Optimization (TPO) or Proximal Policy Optimization (PPO), leading to coarse credit assignment and unstable training. However, unlike language, where a unified semantic meaning is preserved despite flexible sentence order, action trajectories progress through causally chained stages with different learning difficulties. This motivates progressive stage optimization. Thereby, we present Stage-Aware Reinforcement (STARE), a module that decomposes a long-horizon action trajectory into semantically meaningful stages and provides dense, interpretable, and stage-aligned reinforcement signals. Integrating STARE into TPO and PPO, we yield Stage-Aware TPO (STA-TPO) and Stage-Aware PPO (STA-PPO) for offline stage-wise preference and online intra-stage interaction, respectively. Further building on supervised fine-tuning as initialization, we propose the Imitation -> Preference -> Interaction (IPI), a serial fine-tuning pipeline for improving action accuracy in VLA models. Experiments on SimplerEnv and ManiSkill3 demonstrate substantial gains, achieving state-of-the-art success rates of 98.0 percent on SimplerEnv and 96.4 percent on ManiSkill3 tasks.
Towards Autonomous Navigation in Endovascular Interventions
Cardiovascular diseases remain the leading cause of global mortality, with minimally invasive treatment options offered through endovascular interventions. However, the precision and adaptability of current robotic systems for endovascular navigation are limited by heuristic control, low autonomy, and the absence of haptic feedback. This thesis presents an integrated AI-driven framework for autonomous guidewire navigation in complex vascular environments, addressing key challenges in data availability, simulation fidelity, and navigational accuracy. A high-fidelity, real-time simulation platform, CathSim, is introduced for reinforcement learning based catheter navigation, featuring anatomically accurate vascular models and contact dynamics. Building on CathSim, the Expert Navigation Network is developed, a policy that fuses visual, kinematic, and force feedback for autonomous tool control. To mitigate data scarcity, the open-source, bi-planar fluoroscopic dataset Guide3D is proposed, comprising more than 8,700 annotated images for 3D guidewire reconstruction. Finally, SplineFormer, a transformer-based model, is introduced to directly predict guidewire geometry as continuous B-spline parameters, enabling interpretable, real-time navigation. The findings show that combining high-fidelity simulation, multimodal sensory fusion, and geometric modelling substantially improves autonomous endovascular navigation and supports safer, more precise minimally invasive procedures.
Multiagent Systems
LongVideoAgent: Multi-Agent Reasoning with Long Videos
Recent advances in multimodal LLMs and systems that use tools for long-video QA point to the promise of reasoning over hour-long episodes. However, many methods still compress content into lossy summaries or rely on limited toolsets, weakening temporal grounding and missing fine-grained cues. We propose a multi-agent framework in which a master LLM coordinates a grounding agent to localize question-relevant segments and a vision agent to extract targeted textual observations. The master agent plans with a step limit, and is trained with reinforcement learning to encourage concise, correct, and efficient multi-agent cooperation. This design helps the master agent focus on relevant clips via grounding, complements subtitles with visual detail, and yields interpretable trajectories. On our proposed LongTVQA and LongTVQA+ which are episode-level datasets aggregated from TVQA/TVQA+, our multi-agent system significantly outperforms strong non-agent baselines. Experiments also show reinforcement learning further strengthens reasoning and planning for the trained agent. Code and data will be shared at https://longvideoagent.github.io/.
When Natural Strategies Meet Fuzziness and Resource-Bounded Actions (Extended Version)
In formal strategic reasoning for Multi-Agent Systems (MAS), agents are typically assumed to (i) employ arbitrarily complex strategies, (ii) execute each move at zero cost, and (iii) operate over fully crisp game structures. These idealized assumptions stand in stark contrast with human decision making in real world environments. The natural strategies framework along with some of its recent variants, partially addresses this gap by restricting strategies to concise rules guarded by regular expressions. Yet, it still overlook both the cost of each action and the uncertainty that often characterizes human perception of facts over the time. In this work, we introduce HumanATLF, a logic that builds upon natural strategies employing both fuzzy semantics and resource bound actions: each action carries a real valued cost drawn from a non refillable budget, and atomic conditions and goals have degrees in [0,1]. We give a formal syntax and semantics, and prove that model checking is in P when both the strategy complexity k and resource budget b are fixed, NP complete if just one strategic operator over Boolean objectives is allowed, and Delta^P_2 complete when k and b vary. Moreover, we show that recall based strategies can be decided in PSPACE. We implement our algorithms in VITAMIN, an open source model checking tool for MAS and validate them on an adversarial resource aware drone rescue scenario.
Chain-of-Anomaly Thoughts with Large Vision-Language Models
Automated video surveillance with Large Vision-Language Models is limited by their inherent bias towards normality, often failing to detect crimes. While Chain-of-Thought reasoning strategies show significant potential for improving performance in language tasks, the lack of inductive anomaly biases in their reasoning further steers the models towards normal interpretations. To address this, we propose Chain-of-Anomaly-Thoughts (CoAT), a multi-agent reasoning framework that introduces inductive criminal bias in the reasoning process through a final, anomaly-focused classification layer. Our method significantly improves Anomaly Detection, boosting F1-score by 11.8 p.p. on challenging low-resolution footage and Anomaly Classification by 3.78 p.p. in high-resolution videos.
comment: 2 pages, 3 figures, 1 table. Accepted for RECPAD 2025
MAR:Multi-Agent Reflexion Improves Reasoning Abilities in LLMs
LLMs have shown the capacity to improve their performance on reasoning tasks through reflecting on their mistakes, and acting with these reflections in mind. However, continual reflections of the same LLM onto itself exhibit degeneration of thought, where the LLM continues to repeat the same errors again and again even with the knowledge that its wrong. To address this problem, we instead introduce multi-agent with multi-persona debators as the method to generate reflections. Through out extensive experimentation, we've found that the leads to better diversity of in the reflections generated by the llm agent. We demonstrate an accuracy of 47% EM HotPot QA (question answering) and 82.7% on HumanEval (programming), both performances surpassing reflection with a single llm.
Towards Optimal Performance and Action Consistency Guarantees in Dec-POMDPs with Inconsistent Beliefs and Limited Communication
Multi-agent decision-making under uncertainty is fundamental for effective and safe autonomous operation. In many real-world scenarios, each agent maintains its own belief over the environment and must plan actions accordingly. However, most existing approaches assume that all agents have identical beliefs at planning time, implying these beliefs are conditioned on the same data. Such an assumption is often impractical due to limited communication. In reality, agents frequently operate with inconsistent beliefs, which can lead to poor coordination and suboptimal, potentially unsafe, performance. In this paper, we address this critical challenge by introducing a novel decentralized framework for optimal joint action selection that explicitly accounts for belief inconsistencies. Our approach provides probabilistic guarantees for both action consistency and performance with respect to open-loop multi-agent POMDP (which assumes all data is always communicated), and selectively triggers communication only when needed. Furthermore, we address another key aspect of whether, given a chosen joint action, the agents should share data to improve expected performance in inference. Simulation results show our approach outperforms state-of-the-art algorithms.
comment: 9 pages, 3 figures, 2 tables
Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification
Verifiers--functions assigning rewards to agent behavior--have been key for AI progress in domains like math and code. However, extending gains to domains without clear-cut success criteria (e.g., computer use) remains a challenge: while humans can recognize desired outcomes, translating this intuition into scalable rules is nontrivial. Multimodal Large Language Models (MLLMs) emerge as a promising solution, given their world knowledge, human-preference alignment, and reasoning skills. We evaluate MLLMs as verifiers across web navigation, computer use, and robotic manipulation, and identify a critical limitation: a strong tendency to over-validate agent behavior, a phenomenon we term agreement bias. This bias is pervasive across models, resilient to test-time scaling, and poses risks to existing methods relying on MLLM evaluations. We discuss methods to evaluate and improve MLLM verifiers and introduce Self-Grounded Verification (SGV), a lightweight method that harnesses MLLMs' own sampling mechanisms by modulating (un)conditional generation to better leverage their knowledge, alignment, and reasoning. SGV operates in two steps: first, the MLLM is elicited to generate broad priors about desired behavior, independent of the data under evaluation. Then, conditioned on self-generated priors, it reasons over and evaluates a candidate trajectory. SGV yields more human-aligned evaluations with gains of up to 25pp in failure detection, 14pp in accuracy, and benefits extending to downstream applications. In self-refinement and online supervision, SGV boosts task completion of a GUI specialist in OSWorld, a diffusion policy in robomimic, and a ReAct agent in VisualWebArena--setting a new state of the art, surpassing the previous best by 20pp. We release an updated version of VisualWebArena featuring more human-aligned evaluators, high-fidelity environment parallelism, and speedups of over 10x.
comment: Our code, models, and data are publicly available at https://mshalimay.github.io/agreement-bias-sgv/
Multi-Agent Intelligence for Multidisciplinary Decision-Making in Gastrointestinal Oncology
Multimodal clinical reasoning in the field of gastrointestinal (GI) oncology necessitates the integrated interpretation of endoscopic imagery, radiological data, and biochemical markers. Despite the evident potential exhibited by Multimodal Large Language Models (MLLMs), they frequently encounter challenges such as context dilution and hallucination when confronted with intricate, heterogeneous medical histories. In order to address these limitations, a hierarchical Multi-Agent Framework is proposed, which emulates the collaborative workflow of a human Multidisciplinary Team (MDT). The system attained a composite expert evaluation score of 4.60/5.00, thereby demonstrating a substantial improvement over the monolithic baseline. It is noteworthy that the agent-based architecture yielded the most substantial enhancements in reasoning logic and medical accuracy. The findings indicate that mimetic, agent-based collaboration provides a scalable, interpretable, and clinically robust paradigm for automated decision support in oncology.
CardAIc-Agents: A Multimodal Framework with Hierarchical Adaptation for Cardiac Care Support
Cardiovascular diseases (CVDs) remain the foremost cause of mortality worldwide, a burden worsened by a severe deficit of healthcare workers. Artificial intelligence (AI) agents have shown potential to alleviate this gap through automated detection and proactive screening, yet their clinical application remains limited by: 1) rigid sequential workflows, whereas clinical care often requires adaptive reasoning that select specific tests and, based on their results, guides personalised next steps; 2) reliance solely on intrinsic model capabilities to perform role assignment without domain-specific tool support; 3) general and static knowledge bases without continuous learning capability; and 4) fixed unimodal or bimodal inputs and lack of on-demand visual outputs when clinicians require visual clarification. In response, a multimodal framework, CardAIc-Agents, was proposed to augment models with external tools and adaptively support diverse cardiac tasks. First, a CardiacRAG agent generated task-aware plans from updatable cardiac knowledge, while the Chief agent integrated tools to autonomously execute these plans and deliver decisions. Second, to enable adaptive and case-specific customization, a stepwise update strategy was developed to dynamically refine plans based on preceding execution results, once the task was assessed as complex. Third, a multidisciplinary discussion team was proposed which was automatically invoked to interpret challenging cases, thereby supporting further adaptation. In addition, visual review panels were provided to assist validation when clinicians raised concerns. Experiments across three datasets showed the efficiency of CardAIc-Agents compared to mainstream Vision-Language Models (VLMs) and state-of-the-art agentic systems.
A Game-Theoretic Framework for Distributed Load Balancing: Static and Dynamic Game Models
Motivated by applications in job scheduling, queuing networks, and load balancing in cyber-physical systems, we develop and analyze a game-theoretic framework to balance the load among servers in static and dynamic settings. In these applications, jobs/tasks are held by selfish entities that do not want to coordinate with each other, yet the goal is to balance the load among servers in a distributed manner. First, we provide a static game formulation in which each player holds a job with a specific processing requirement and wants to schedule it fractionally among a set of heterogeneous servers to minimize its average processing time. We show that this static game is a potential game with a pure Nash equilibrium (NE). In particular, the best-response dynamics converge to such an NE after $n$ iterations, where $n$ is the number of players. Additionally, we bound the price of anarchy (PoA) of the static game in terms of game parameters. We then extend our results to a dynamic game setting, where jobs arrive and get processed, and players observe the load on the servers to decide how to schedule their jobs. In this setting, we show that if the players update their strategies using dynamic best-response, the system eventually becomes fully load-balanced and the players' strategies converge to the pure NE of the static game. In particular, we show that the convergence time scales only polynomially with respect to the game parameters. Finally, we provide numerical results to evaluate the performance of our proposed algorithms.
comment: To appear in Automatica, vol. 185, 2026
Social learning moderates the tradeoffs between efficiency, stability, and equity in group foraging
Collective foragers, from animals to robotic swarms, must balance exploration and exploitation to locate sparse resources efficiently. While social learning is known to facilitate this balance, how the range of information sharing shapes group-level outcomes remains unclear. Here, we develop a minimal collective foraging model in which individuals combine independent exploration, local exploitation, and socially guided movement. We show that foraging efficiency is maximized at an intermediate social learning range, where groups exploit discovered resources without suppressing independent discovery. This optimal regime also minimizes temporal burstiness in resource intake, reducing starvation risk. Increasing social learning range further improves equity among individuals but degrades efficiency through redundant exploitation. Introducing risky (negative) targets shifts the optimal range upward; in contrast, when penalties are ignored, randomly distributed negative cues can further enhance efficiency by constraining unproductive exploration. Together, these results reveal how local information rules regulate a fundamental trade-off between efficiency, stability, and equity, providing design principles for biological foraging systems and engineered collectives.
comment: Code and data: https://github.com/LoneStar97/social-learning-search ; additional videos of agents' movement: https://www.youtube.com/playlist?list=PLgRFM9nAjJRwoZvCGBAdCIE-BYNgPmSuV
SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios
Existing benchmarks for AI coding agents focus on isolated, single-issue tasks such as fixing a bug or implementing a small feature. However, real-world software engineering is fundamentally a long-horizon endeavor: developers must interpret high-level requirements, plan coordinated changes across many files, and evolve codebases over multiple iterations while preserving existing functionality. We introduce SWE-EVO, a benchmark that evaluates agents on this long-horizon software evolution challenge. Constructed from release notes and version histories of seven mature open-source Python projects, Tool comprises 48 evolution tasks that require agents to implement multi-step modifications spanning an average of 21 files, validated against comprehensive test suites averaging 874 tests per instance. Experiments with state-of-the-art models reveal a striking capability gap: even GPT-5 with OpenHands achieves only a 21 percent resolution rate on Tool, compared to 65 percent on the single-issue SWE-Bench Verified. This demonstrates that current agents struggle with sustained, multi-file reasoning. We also propose Fix Rate, a fine-grained metric that captures partial progress toward solving these complex, long-horizon tasks.
Systems and Control (EESS)
Modeling Economic Systems as Multiport Networks
In this paper, we demonstrate how multiport network theory can be used as a powerful modeling tool in economics. The critical insight is using the port concept to pair the flow of goods (the electrical current) with the agent's incentive (the voltage) in an economic interaction. By building networks of agents interacting through ports, we create models with multiple levels of abstraction, from the macro level down to the micro level. We are thereby able to model complex macroeconomic systems whose dynamical behavior is emergent from the micro level. Using the LTSpice circuit simulator, we then design and analyze a series of example systems that range in complexity from the textbook Robinson Crusoe economy to a model of an entire economy.
comment: 29 pages, 16 figures, to be submitted to Journal of the Franklin Institute
Leveraging High-Fidelity Digital Models and Reinforcement Learning for Mission Engineering: A Case Study of Aerial Firefighting Under Perfect Information
As systems engineering (SE) objectives evolve from design and operation of monolithic systems to complex System of Systems (SoS), the discipline of Mission Engineering (ME) has emerged which is increasingly being accepted as a new line of thinking for the SE community. Moreover, mission environments are uncertain, dynamic, and mission outcomes are a direct function of how the mission assets will interact with this environment. This proves static architectures brittle and calls for analytically rigorous approaches for ME. To that end, this paper proposes an intelligent mission coordination methodology that integrates digital mission models with Reinforcement Learning (RL), that specifically addresses the need for adaptive task allocation and reconfiguration. More specifically, we are leveraging a Digital Engineering (DE) based infrastructure that is composed of a high-fidelity digital mission model and agent-based simulation; and then we formulate the mission tactics management problem as a Markov Decision Process (MDP), and employ an RL agent trained via Proximal Policy Optimization. By leveraging the simulation as a sandbox, we map the system states to actions, refining the policy based on realized mission outcomes. The utility of the RL-based intelligent mission coordinator is demonstrated through an aerial firefighting case study. Our findings indicate that the RL-based intelligent mission coordinator not only surpasses baseline performance but also significantly reduces the variability in mission performance. Thus, this study serves as a proof of concept demonstrating that DE-enabled mission simulations combined with advanced analytical tools offer a mission-agnostic framework for improving ME practice; which can be extended to more complicated fleet design and selection problems in the future from a mission-first perspective.
Expected Revenue, Risk, and Grid Impact of Bitcoin Mining: A Decision-Theoretic Perspective
Most current assessments use ex post proxies that miss uncertainty and fail to consistently capture the rapid change in bitcoin mining. We introduce a unified, ex ante statistical model that derives expected return, downside risk, and upside potential profit from the first principles of mining: Each hash is a Bernoulli trial with a Bitcoin block difficulty-based success probability. The model yields closed-form expected revenue per hash-rate unit, risk metrics in different scenarios, and upside-profit probabilities for different fleet sizes. Empirical calibration closely matches previously reported observations, yielding a unified, faithful quantification across hardware, pools, and operating conditions. This foundation enables more reliable analysis of mining impacts and behavior.
Fast Fixed-time Convergence in Nonlinear Dynamical Systems
A fast convergence in a fixed-time of solutions of nonlinear dynamical systems, for which special requirements are satisfied on the derivative of a quadratic function calculated along the solutions of the system, is proposed. The conditions for the system solutions to converge to zero and to a given region within a fixed-time are obtained. To achieve fast convergence, a negative power is applied to the derivative of a quadratic function within a specific time interval during the evolution of the system. The application of the proposed results to the design of control laws for arbitrary order linear plants using the backstepping method is considered. All the main results are accompanied by numerical modelling and a comparison of the proposed solutions with some existing ones.
Divergence Method to Stability Study of Andronov-Vyshnegradsky Problem. Hidden Oscillations
The classical Andronov-Vyshnegradsky problem, which deals with locating regions of stability and oscillations in control systems with a Watt regulator, is solved using a divergence method for studying the stability of dynamic systems. This system is studied both with and without the self-regulation effect. The exact value of the hidden boundary of the global stability region is obtained. The stability criteria for a system with a Watt regulator are also presented in the context of the solvability of a linear matrix inequality. Computer modelling shows that the system exhibits hidden oscillations when the self-regulation effect is present and when it is not. The conditions for computing the hidden boundary of global stability are determined by three parameters in the Watt regulator model.
Contingency Model-based Control (CMC) for Communicationless Cooperative Collision Avoidance in Robot Swarms
Cooperative collision avoidance between robots in swarm operations remains an open challenge. Assuming a decentralized architecture, each robot is responsible for making its own control decisions, including motion planning. To this end, most existing approaches mostly rely some form of (wireless) communication between the agents of the swarm. In reality, however, communication is brittle. It may be affected by latency, further delays and packet losses, transmission faults, and is subject to adversarial attacks, such as jamming or spoofing. This paper proposes Contingency Model-based Control (CMC) as a communicationless alternative. It follows the implicit cooperation paradigm, under which the design of the robots is based on consensual (offline) rules, similar to traffic rules. They include the definition of a contingency trajectory for each robot, and a method for construction of mutual collision avoidance constraints. The setup is shown to guarantee the recursive feasibility and collision avoidance between all swarm members in closed-loop operation. Moreover, CMC naturally satisfies the Plug \& Play paradigm, i.e., for new robots entering the swarm. Two numerical examples demonstrate that the collision avoidance guarantee is intact and that the robot swarm operates smoothly under the CMC regime.
A Tutorial to Multirate Extended Kalman Filter Design for Monitoring of Agricultural Anaerobic Digestion Plants
In many applications of biotechnology, measurements are available at different sampling rates, e.g., due to online sensors and offline lab analysis. Offline measurements typically involve time delays that may be unknown a priori due to the underlying laboratory procedures. This multirate (MR) setting poses a challenge to Kalman filtering, where conventionally measurement data is assumed to be available on an equidistant time grid and without delays. The present study derives the MR version of an extended Kalman filter (EKF) based on sample state augmentation, and applies it to the anaerobic digestion (AD) process in a simulative agricultural setting. The performance of the MR-EKF is investigated for various scenarios, i.e., varying delay lengths, measurement noise levels, plant-model mismatch (PMM), and initial state error. Provided with an adequate tuning, the MR-EKF could be demonstrated to reliably estimate the process state, to appropriately fuse delayed offline measurements, and to smooth noisy online measurements well. Because of the sample state augmentation approach, the delay length of offline measurements does not critically impair state estimation performance, provided observability is not lost during the delays. Poor state initialization and PMM affect convergence more than measurement noise levels. Further, selecting an appropriate tuning was found to be critically important for successful application of the MR-EKF, for which a systematic approach is presented. This study provides implementation guidance for practitioners aiming at successfully applying state estimation for multirate systems. It thereby contributes to develop demand-driven operation of biogas plants, which may aid in stabilizing a renewable electricity grid.
comment: extended appendix
Data-based Moving Horizon Estimation under Irregularly Measured Data
In this work, we introduce a sample- and data-based moving horizon estimation framework for linear systems. We perform state estimation in a sample-based fashion in the sense that we assume to have only few, irregular output measurements available. This setting is encountered in applications where measuring is expensive or time-consuming. Furthermore, the state estimation framework does not rely on a standard mathematical model, but on an implicit system representation based on measured data. We prove sample-based practical robust exponential stability of the proposed estimator under mild assumptions. Furthermore, we apply the proposed scheme to estimate the states of a gastrointestinal tract absorption system.
comment: 8 pages
Inference in Latent Force Models Using Optimal State Estimation
Latent force models, a class of hybrid modeling approaches, integrate physical knowledge of system dynamics with a latent force - an unknown, unmeasurable input modeled as a Gaussian process. In this work, we introduce two optimal state estimation frameworks to reconstruct the latent forces and to estimate the states. In contrast to state-of-the-art approaches, the designed estimators enable the consideration of system-inherent constraints. Finally, the performance of the novel frameworks is investigated in several numerical examples. In particular, we demonstrate the performance of the new framework in a real-world biomedical example - the hypothalamic-pituitary-thyroid axis - using hormone measurements.
comment: 8 pages
Sliding Mode Control for a Parabolic-Elliptic PDE System with Boundary Perturbation
In this paper, we address the robustness of parabolic-elliptic systems under boundary control. A sliding mode control strategy is proposed to reject matched perturbations. The stability analysis establishes finite-time convergence of the sliding manifold and exponential stability of the closed-loop system. Since the closed-loop system is discontinuous, we also prove its well-posedness. A numerical example is provided to validate the effectiveness of the proposed approach.
Finite-Time Control Based on Differential Flatness for Wheeled Mobile Robots with Experimental Validation
A robust tracking control strategy is designed to empower wheeled mobile robots (WMRs) to track predetermined routes while operating in diverse fields and encountering disturbances like strong winds or uneven path conditions, which affect tracking performance. Ensuring the applicability of this tracking method in real-world scenarios is essential. To accomplish this, the WMR model is initially transformed into a linear canonical form by leveraging the differential flatness of its kinematic model, facilitating controller design. Subsequently, a novel integral nonlinear hyperplane-based sliding mode control (INH-SMC) technique is proposed for WMR under disturbances. The stability of the technique is analyzed and verified. Finally, its practical viability is demonstrated through a comparative real-world indoor experiment on a TurtleBot3 WMR subjected to disturbances, confirming the feasibility and efficacy of the proposed approach.
Robust safety design for strict-feedback nonlinear systems via observer-based linear time varying feedback
This paper develops a robust safety-critical control method for nonlinear strictfeedback systems with mismatched disturbances. Using a state transformation and a linear time-varying disturbance observer, the system is converted into a form that enables safe control design. The approach ensures forward invariance of the safety set and also applies to disturbancefree systems. Safety is proven for all cases, and a numerical example illustrates the results.
Joint Design of Embedded Index Coding and Beamforming for MIMO-based Distributed Computing via Multi-Agent Reinforcement Learning
In distributed computing systems, reducing the communication load during the data shuffling phase is a critical challenge, as excessive inter-node transmissions are a major performance bottleneck. One promising approach to alleviate this burden is Embedded Index Coding (EIC), which exploits cached data at user nodes to encode transmissions more efficiently. However, most prior work on EIC has focused on minimizing code length in wired, error-free environments-an objective often suboptimal for wireless multiple-input multiple-output (MIMO) systems, where channel conditions and spatial multiplexing gains must be considered. This paper investigates the joint design of EIC and transmit beamforming in MIMO systems to minimize total transmission time, an NP-hard problem. We first present a conventional optimization method that determines the optimal EIC via exhaustive search. To address its prohibitive complexity and adapt to dynamic wireless environments, we propose a novel, low-complexity multi-agent reinforcement learning (MARL) framework. The proposed framework enables decentralized agents to act on local observations while effectively managing the hybrid action space of discrete EIC selection and continuous beamforming design. Simulation results demonstrate that the proposed MARL-based approach achieves near-optimal performance with significantly reduced complexity, underscoring its effectiveness and practicality for real-world wireless systems.
From Dissipativity Property to Data-Driven GAS Certificate of Degree-One Homogeneous Networks with Unknown Topology
In this work, we propose a data-driven divide and conquer strategy for the stability analysis of interconnected homogeneous nonlinear networks of degree one with unknown models and a fully unknown topology. The proposed scheme leverages joint dissipativity-type properties of subsystems described by storage functions, while providing a stability certificate over unknown interconnected networks. In our data-driven framework, we begin by formulating the required conditions for constructing storage functions as a robust convex program (RCP). Given that unknown models of subsystems are integrated into one of the constraints of the RCP, we collect data from trajectories of each unknown subsystem and provide a scenario convex program (SCP) that aligns with the original RCP. We solve the SCP as a linear program and construct a storage function for each subsystem with unknown dynamics. Under some newly developed data-driven compositionality conditions, we then construct a Lyapunov function for the fully unknown interconnected network utilizing storage functions derived from data of individual subsystems. We show that our data-driven {divide and conquer strategy} provides correctness guarantees (as opposed to probabilistic confidence) while significantly mitigating the sample complexity problem existing in data-driven approaches. To illustrate the effectiveness of our proposed results, we apply our approaches to three different case studies involving interconnected homogeneous (nonlinear) networks with unknown models. We collect data from trajectories of unknown subsystems and verify the global asymptotic stability (GAS) of the interconnected system with a guaranteed confidence.
An Optimal Policy for Learning Controllable Dynamics by Exploration
Controllable Markov chains describe the dynamics of sequential decision making tasks and are the central component in optimal control and reinforcement learning. In this work, we give the general form of an optimal policy for learning controllable dynamics in an unknown environment by exploring over a limited time horizon. This policy is simple to implement and efficient to compute, and allows an agent to ``learn by exploring" as it maximizes its information gain in a greedy fashion by selecting controls from a constraint set that changes over time during exploration. We give a simple parameterization for the set of controls, and present an algorithm for finding an optimal policy. The reason for this policy is due to the existence of certain types of states that restrict control of the dynamics; such as transient states, absorbing states, and non-backtracking states. We show why the occurrence of these states makes a non-stationary policy essential for achieving optimal exploration. Six interesting examples of controllable dynamics are treated in detail. Policy optimality is demonstrated using counting arguments, comparing with suboptimal policies, and by making use of a sequential improvement property from dynamic programming.
$\mathscr{H}_2$ Model Reduction for Augmented Model of Linear Non-Markovian Quantum Systems
An augmented system model provides an effective way to model non-Markovian quantum systems, which is useful in filtering and control for this class of systems. However, since a large number of ancillary quantum oscillators representing internal modes of a non-Markovian environment directly interact with the principal system in these models, the dimension of the augmented system may be very large causing significant computational burden in designing filters and controllers. In this context, this paper proposes an $\mathscr{H}_2$ model reduction method for the augmented model of linear non-Markovian quantum systems. We first establish necessary and sufficient conditions for the physical realizability of the augmented model of linear non-Markovian quantum systems, which are more stringent than those for Markovian quantum systems. However, these physical realizability conditions of augmented system model pose non-convex constrains in the optimization problem of model reduction, which makes the problem different from the corresponding classical model reduction problem. To solve the problem, we derive necessary conditions for determining the input matrix in the reduced model, with which a theorem for designing the system matrix of the ancillary system in the reduced system is proved. Building on this, we convert the nonlinear equality constraints into inequality constraints so that a semidefinite programming algorithm can be developed to solve the optimization problem for model reduction. A numerical example of a two-mode linear quantum system driven by three internal modes of a non-Markovian environment validates the effectiveness of our method.
comment: 15 pages, 3 figures
HeylandCircle: A Computational Framework for the Geometric Reconstruction of the Heyland Circle Diagram
The Heyland circle diagram is a classical graphical tool for representing the steady-state behavior of induction machines using no-load and blocked-rotor test data. While widely used in alternating-current machinery texts, the diagram is typically presented as a hand-constructed aid and lacks a standardized computational formulation. This paper presents HeylandCircle, a computational framework that reconstructs the classical Heyland circle diagram directly from standard test parameters. The framework formalizes the traditional geometric construction as a deterministic, reproducible sequence of geometric operations, establishing a clear mapping between measured data, fixed geometric objects, and steady-state operating points. Quantities such as power factor, slip, output power, torque, and efficiency are obtained through explicit geometric relationships on the constructed diagram. Validation using a representative textbook example demonstrates close agreement with classical results. The framework provides a computational realization of the traditional Heyland diagram suitable for instruction, analysis, and systematic extension.
comment: 8 pages, 1 figure
Spatio-Temporal Graph Neural Networks for Dairy Farm Sustainability Forecasting and Counterfactual Policy Analysis
This study introduces a novel data-driven framework and the first-ever county-scale application of Spatio-Temporal Graph Neural Networks (STGNN) to forecast composite sustainability indices from herd-level operational records. The methodology employs a novel, end-to-end pipeline utilizing a Variational Autoencoder (VAE) to augment Irish Cattle Breeding Federation (ICBF) datasets, preserving joint distributions while mitigating sparsity. A first-ever pillar-based scoring formulation is derived via Principal Component Analysis, identifying Reproductive Efficiency, Genetic Management, Herd Health, and Herd Management, to construct weighted composite indices. These indices are modelled using a novel STGNN architecture that explicitly encodes geographic dependencies and non-linear temporal dynamics to generate multi-year forecasts for 2026-2030.
Extragradient methods with complexity guarantees for hierarchical variational inequalities
In the framework of a real Hilbert space we consider the problem of approaching solutions to a class of hierarchical variational inequality problems, subsuming several other problem classes including certain mathematical programs under equilibrium constraints, constrained min-max problems, hierarchical game problems, optimal control under VI constraints, and simple bilevel optimization problems. For this general problem formulation, we establish rates of convergence in terms of suitably constructed gap functions, measuring feasibility gaps and optimality gaps. We present worst-case iteration complexity results on both levels of the variational problem, as well as weak convergence under a geometric weak sharpness condition on the lower level solution set. Our results match and improve the state of the art in terms of their iteration complexity and the generality of the problem formulation.
X-GridAgent: An LLM-Powered Agentic AI System for Assisting Power Grid Analysis
The growing complexity of power system operations has created an urgent need for intelligent, automated tools to support reliable and efficient grid management. Conventional analysis tools often require significant domain expertise and manual effort, which limits their accessibility and adaptability. To address these challenges, this paper presents X-GridAgent, a novel large language model (LLM)-powered agentic AI system designed to automate complex power system analysis through natural language queries. The system integrates domain-specific tools and specialized databases under a three-layer hierarchical architecture comprising planning, coordination, and action layers. This architecture offers high flexibility and adaptability to previously unseen tasks, while providing a modular and extensible framework that can be readily expanded to incorporate new tools, data sources, or analytical capabilities. To further enhance performance, we introduce two novel algorithms: (1) LLM-driven prompt refinement with human feedback, and (2) schema-adaptive hybrid retrieval-augmented generation (RAG) for accurate information retrieval from large-scale structured grid datasets. Experimental evaluations across a variety of user queries and power grid cases demonstrate the effectiveness and reliability of X-GridAgent in automating interpretable and rigorous power system analysis.
Fixed-time control with prescribed performance for path following of underwater gliders
Underwater gliders are increasingly deployed in challenging missions - such as hurricane-season observations and long-endurance environmental monitoring - where strong currents and turbulence pose significant risks to navigation safety. To address these practical challenges, this paper presents a fixed-time prescribed performance control scheme for the 3D path following of underwater gliders subject to model uncertainties and environmental disturbances. The primary contribution is the integration of a finite-time performance function within a fixed-time control framework. This synthesis ensures that the tracking errors are constrained within prescribed performance bounds and converge to a compact set within a fixed time, independent of initial conditions. A second key contribution is the development of a fixed-time sliding mode disturbance observer that provides accurate finite-time estimation of lumped disturbances, enhancing the system's robustness. Integrated with an iLOS guidance law, the proposed controller enables precise and safe waypoint following. Numerical simulations demonstrate that the proposed method outperforms conventional sliding mode and prescribed performance controllers in tracking accuracy, convergence speed, and control effort smoothness, validating its efficacy for robust underwater navigation.
comment: 22 pages, 13 figures, 2 tables, Submitted to Ocean Engineering
Optimized Rolling Allocation of Outages for Damage Assesment
Natural disasters often inflict severe damage on distribution grids. Rapid, reliable damage assessment (DA) is essential for storm restoration, yet most optimization work targets repair dispatch after faults are identified. This paper presents a production, rolling horizon DA crew allocation system deployed across multiple U.S. states in Eversource Energy's service territory and used during live storms. The method implements a sequential k-job assignment policy per available crew, executed on a fixed cadence and on operators' control. The objective jointly prioritizes critical facilities and customer impact while controlling travel time on the actual road network via the Google Maps API. A key constraint is the absence of live crew GPS; we infer crew locations from the last confirmed DA site and robustify travel estimates for staleness, yielding stable recommendations without continuous tracking. The operator remains in the loop with controls to limit churn and to publish a feasible plan. Using data from the March 7 New Hampshire storm with 90 moderate outages and seven DA crews, we observe shorter time to first assessment, fewer revisits with reduced distance traveled. To our knowledge, this is among the first multi-state enterprise integrated deployments to treat DA crews as a first-class optimized resource in storm restoration.
Adaptive Real-Time Scheduling Algorithms for Embedded Systems
Embedded systems are becoming more in demand to work in dynamic and uncertain environments, and being confined to the strong requirements of real-time. Conventional static scheduling models usually cannot cope with runtime modification in workload, resource availability, or system updates. This brief survey covers the area of feedback-based control (e.g., Feedback Control Scheduling) and interdependence between tasks (e.g., Symbiotic Scheduling of Periodic Tasks) models. It also borders on predictive methods and power management, combining methods based on Dynamic Voltage and Frequency Scaling (DVFS). In this paper, key mechanisms are briefly summarized, influencing trade-offs relating to adaptivity/predictability, typical metrics of evaluation, and ongoing problems, especially in situations where safety is a critical factor, giving a succinct and easy-to-understand introduction to researchers and practitioners who have to cope with the changing environment of adaptive real-time systems.
An Equivalent and Unified Virtual Battery Modeling Framework for Flexibility Characterization of Building HVAC Systems
The heating, ventilation and air-conditioning (HVAC) system dominates building's energy consumption and meanwhile exhibits substantial operational flexibility that can be exploited for providing grid services. However, the goal is largely hindered by the difficulty to characterize the system's operating flexibility due to the complex building thermal dynamics, system operating limits and human comfort constraints. To address this challenge, this paper develops an unified virtual battery (VB) modeling framework for characterizing the operating flexibility of both single-zone and multi-zone building HVAC systems, enabling flexible buildings to function like virtual batteries. Specifically, a physically meaningful representation state is first identified to represent building thermal conditions under thermal comfort constraints and a VB model is then established for characterizing the operating flexibility of single-zone HVAC systems. We subsequently extend the VB modeling framework to multi-zone HVAC systems and establish a set of zone-level VB models to characterize the building's zonal operating flexibility. We further develop a systematic method to aggregate the VB models into a low-order and low-complexity aggregated VB model, significantly reducing model and computational complexity. We demonstrate the VB model through demand response (DR) applications and conclude that the VB model can well capture the operating flexibility of building HVAC systems and enable effective DR participation. The DR strategies obtained from the VB model can be efficiently decomposed to zone-level control inputs for maintaining human thermal comfort while achieving near-optimal operation cost.
comment: 13 pages, 6 figures
From Visual Perception to Deep Empathy: An Automated Assessment Framework for House-Tree-Person Drawings Using Multimodal LLMs and Multi-Agent Collaboration
Background: The House-Tree-Person (HTP) drawing test, introduced by John Buck in 1948, remains a widely used projective technique in clinical psychology. However, it has long faced challenges such as heterogeneous scoring standards, reliance on examiners subjective experience, and a lack of a unified quantitative coding system. Results: Quantitative experiments showed that the mean semantic similarity between Multimodal Large Language Model (MLLM) interpretations and human expert interpretations was approximately 0.75 (standard deviation about 0.05). In structurally oriented expert data sets, this similarity rose to 0.85, indicating expert-level baseline comprehension. Qualitative analyses demonstrated that the multi-agent system, by integrating social-psychological perspectives and destigmatizing narratives, effectively corrected visual hallucinations and produced psychological reports with high ecological validity and internal coherence. Conclusions: The findings confirm the potential of multimodal large models as standardized tools for projective assessment. The proposed multi-agent framework, by dividing roles, decouples feature recognition from psychological inference and offers a new paradigm for digital mental-health services. Keywords: House-Tree-Person test; multimodal large language model; multi-agent collaboration; cosine similarity; computational psychology; artificial intelligence
comment: 16 pages, 8 figures
Deep Reinforcement Learning Optimization for Uncertain Nonlinear Systems via Event-Triggered Robust Adaptive Dynamic Programming
This work proposes a unified control architecture that couples a Reinforcement Learning (RL)-driven controller with a disturbance-rejection Extended State Observer (ESO), complemented by an Event-Triggered Mechanism (ETM) to limit unnecessary computations. The ESO is utilized to estimate the system states and the lumped disturbance in real time, forming the foundation for effective disturbance compensation. To obtain near-optimal behavior without an accurate system description, a value-iteration-based Adaptive Dynamic Programming (ADP) method is adopted for policy approximation. The inclusion of the ETM ensures that parameter updates of the learning module are executed only when the state deviation surpasses a predefined bound, thereby preventing excessive learning activity and substantially reducing computational load. A Lyapunov-oriented analysis is used to characterize the stability properties of the resulting closed-loop system. Numerical experiments further confirm that the developed approach maintains strong control performance and disturbance tolerance, while achieving a significant reduction in sampling and processing effort compared with standard time-triggered ADP schemes.
comment: 9 pages, 9 figures
Turbulent Multiple-Scattering Channel Modeling for Ultraviolet Communications: A Monte-Carlo Integration Approach
Modeling of multiple-scattering channels in atmospheric turbulence is essential for the performance analysis of long-distance non-line-of-sight (NLOS) ultraviolet (UV) communications. Existing works on the turbulent channel modeling for NLOS UV communications either focused on single-scattering cases or estimate the turbulent fluctuation effect in an unreliable way based on Monte-Carlo simulation (MCS) approach. In this paper, we establish a comprehensive turbulent multiple-scattering channel model by using a more efficient Monte-Carlo integration (MCI) approach for NLOS UV communications, where both the scattering, absorption, and turbulence effects are considered. Compared with the MCS approach, the MCI approach is more interpretable for estimating the turbulent fluctuation. To achieve this, we first introduce the scattering, absorption, and turbulence effects for NLOS UV communications in turbulent channels. Then we propose the estimation methods based on MCI approach for estimating both the turbulent fluctuation and the distribution of turbulent fading coefficient. Numerical results demonstrate that the turbulence-induced scattering effect can always be ignored for typical UV communication scenarios. Besides, the turbulent fluctuation will increase as either the communication distance increases or the zenith angle decreases, which is compatible with existing experimental results and also with our experimental results. Moreover, we demonstrate numerically that the distribution of the turbulent fading coefficient for UV multiple-scattering channels under all turbulent conditions can be approximated as log-normal distribution; and we also demonstrate both numerically and experimentally that the turbulent fading can be approximated as a Gaussian distribution under weak turbulence.
comment: 28 pages,9 figures
Learning Stack-of-Tasks Management for Redundant Robots
This paper presents a novel framework for automatically learning complete Stack-of-Tasks (SoT) controllers for redundant robotic systems, including task priorities, activation logic, and control parameters. Unlike classical SoT pipelines-where task hierarchies are manually defined and tuned-our approach optimizes the full SoT structure directly from a user-specified cost function encoding intuitive preferences such as safety, precision, manipulability, or execution speed. The method combines Genetic Programming with simulation-based evaluation to explore both discrete (priority order, task activation) and continuous (gains, trajectory durations) components of the controller. We validate the framework on a dual-arm mobile manipulator (the ABB mobile-YuMi research platform), demonstrating robust convergence across multiple cost definitions, automatic suppression of irrelevant tasks, and strong resilience to distractors. Learned SoTs exhibit expert-like hierarchical structure and adapt naturally to multi-objective trade-offs. Crucially, all controllers transfer from Gazebo simulation to the real robot, achieving safe and precise motion without additional tuning. Experiments in static and dynamic environments show reliable obstacle avoidance, high tracking accuracy, and predictable behavior in the presence of humans. The proposed method provides an interpretable and scalable alternative to manual SoT design, enabling rapid, user-driven generation of task execution hierarchies for complex robotic systems.
LeLaR: The First In-Orbit Demonstration of an AI-Based Satellite Attitude Controller
Attitude control is essential for many satellite missions. Classical controllers, however, are time-consuming to design and sensitive to model uncertainties and variations in operational boundary conditions. Deep Reinforcement Learning (DRL) offers a promising alternative by learning adaptive control strategies through autonomous interaction with a simulation environment. Overcoming the Sim2Real gap, which involves deploying an agent trained in simulation onto the real physical satellite, remains a significant challenge. In this work, we present the first successful in-orbit demonstration of an AI-based attitude controller for inertial pointing maneuvers. The controller was trained entirely in simulation and deployed to the InnoCube 3U nanosatellite, which was developed by the Julius-Maximilians-Universität Würzburg in cooperation with the Technische Universität Berlin, and launched in January 2025. We present the AI agent design, the methodology of the training procedure, the discrepancies between the simulation and the observed behavior of the real satellite, and a comparison of the AI-based attitude controller with the classical PD controller of InnoCube. Steady-state metrics confirm the robust performance of the AI-based controller during repeated in-orbit maneuvers.
comment: 55 pages, 27 figures, 29 tables. The maneuver telemetry datasets generated and analyzed during this work are available in the GitHub repository under https://github.com/kdjebko/lelar-in-orbit-data
Integrating Conductor Health into Dynamic Line Rating and Unit Commitment under Uncertainty
Dynamic line rating (DLR) enables greater utilization of existing transmission lines by leveraging real-time weather data. However, the elevated temperature operation (ETO) of conductors under DLR is often overlooked, despite its long-term impact on conductor health. This paper addresses this issue by 1) quantifying depreciation costs associated with ETO and 2) proposing a Conductor Health-Aware Unit Commitment (CHA-UC) that internalizes these costs in operational decisions. The CHA-UC incorporates a robust linear approximation of conductor temperature and integration of expected depreciation costs due to hourly ETO into the objective function. Case studies on the Texas 123-bus backbone test system using NOAA weather data demonstrate that the proposed CHA-UC model reduces the total cost by 0.8% and renewable curtailment by 84%compared to static line rating (SLR), while conventional DLR operation without risk consideration resulted in higher costs due to excessive ETO. Further analysis of the commitment decisions and the line temperature statistics confirms that the CHA-UC achieves safer line flows by shifting generator commitments. Finally, we examine the emergent correlation between wind generation and DLR forecast errors, and show that CHA-UC adaptively manages this effect by relaxing flows for risk-hedging conditions while tightening flows for risk-amplifying ones.
Learning Enhanced Ensemble Filters
The filtering distribution in hidden Markov models evolves according to the law of a mean-field model in state-observation space. The ensemble Kalman filter (EnKF) approximates this mean-field model with an ensemble of interacting particles, employing a Gaussian ansatz for the joint distribution of the state and observation at each observation time. These methods are robust, but the Gaussian ansatz limits accuracy. Here this shortcoming is addressed by using machine learning to map the joint predicted state and observation to the updated state estimate. The derivation of methods from a mean field formulation of the true filtering distribution suggests a single parametrization of the algorithm that can be deployed at different ensemble sizes. And we use a mean field formulation of the ensemble Kalman filter as an inductive bias for our architecture. To develop this perspective, in which the mean-field limit of the algorithm and finite interacting ensemble particle approximations share a common set of parameters, a novel form of neural operator is introduced, taking probability distributions as input: a measure neural mapping (MNM). A MNM is used to design a novel approach to filtering, the MNM-enhanced ensemble filter (MNMEF), which is defined in both the mean-field limit and for interacting ensemble particle approximations. The ensemble approach uses empirical measures as input to the MNM and is implemented using the set transformer, which is invariant to ensemble permutation and allows for different ensemble sizes. In practice fine-tuning of a small number of parameters, for specific ensemble sizes, further enhances the accuracy of the scheme. The promise of the approach is demonstrated by its superior root-mean-square-error performance relative to leading methods in filtering the Lorenz '96 and Kuramoto-Sivashinsky models.
comment: Accepted by the Journal of Computational Physics
Safe Online Control-Informed Learning
This paper proposes a Safe Online Control-Informed Learning framework for safety-critical autonomous systems. The framework unifies optimal control, parameter estimation, and safety constraints into an online learning process. It employs an extended Kalman filter to incrementally update system parameters in real time, enabling robust and data-efficient adaptation under uncertainty. A softplus barrier function enforces constraint satisfaction during learning and control while eliminating the dependence on high-quality initial guesses. Theoretical analysis establishes convergence and safety guarantees, and the framework's effectiveness is demonstrated on cart-pole and robot-arm systems.
Gait Transitions in Load-Pulling Quadrupeds: Insights from Sled Dogs and a Minimal SLIP Model
Quadrupedal animals employ diverse galloping strategies to optimize speed, stability, and energy efficiency. However, the biomechanical mechanisms that enable adaptive gait transitions during high-speed locomotion under load remain poorly understood. In this study, we present new empirical and modeling insights into the biomechanics of load-pulling quadrupeds, using sprint sled dogs as a model system. High-speed video and force recordings reveal that sled dogs often switch between rotary and transverse galloping gaits within just a few strides and without any observable changes in speed, stride duration, or terrain, providing clear evidence of locomotor multistability during high-speed load-pulling. To investigate the mechanical basis of these transitions, a physics-based quadrupedal Spring-Loaded Inverted Pendulum model with hybrid dynamics and prescribed footfall sequences to reproduce the asymmetric galloping patterns observed in racing sled dogs. Through trajectory optimization, we replicate experimentally observed gait sequences and identify swing-leg stiffness modulation as a key control mechanism for inducing transitions. This work provides a much-needed biomechanical perspective on high-speed animal draft and establishes a modeling framework for studying locomotion in pulling quadrupeds, with implications for both biological understanding and the design of adaptive legged systems.
Network-Realised Model Predictive Control Part II: Distributed Constraint Management
A two-layer control architecture is proposed, which promotes scalable implementations for model predictive controllers. The top layer acts as both reference governor for the bottom layer, and as a feedback controller for the regulated network. By employing set-based methods, global theoretical guarantees are obtained by enforcing local constraints upon the network's variables and upon those of the first layer's implementation. The proposed technique offers recursive feasibility guarantees as one of its central features, and the expressions of the resulting predictive strategies bear a striking resemblance to classical formulations from model predictive control literature, allowing for flexible and easily customisable implementations.
comment: 20 pages, 9 figures, 4 tables
Demonstrating Integrative, Scalable and Extensible Modeling of Hydrological Systems with Model-Based Systems Engineering and Hetero-functional Graph Theory
Worsening global challenges demand solutions grounded in a systems-level understanding of coupled social and environmental dynamics. Existing environmental models encode extensive knowledge of individual systems, yet much of this information remains isolated within domain-specific formulations and data structures. This paper introduces a unified modeling framework that formalizes information from existing process models by asserting real-world physical relationships onto their underlying mathematical representations. By integrating Model-Based Systems Engineering (MBSE) with Hetero-functional Graph Theory (HFGT), the framework establishes a consistent ontology that explicitly defines system structure and behavior. Illustrative hydrological examples demonstrate implementation of the methodology, showing how relationships embedded in conventional process models can be made explicit and scalable. While simplified, these examples provide a first step toward applying the approach to complex environmental systems. More broadly, the methodology offers a foundation for future modeling of systems of systems within a shared computational architecture.
Modeling skiers flows via Wardrope equilibrium in closed capacitated networks
We propose an equilibrium model of ski resorts where users are assigned to cycles in a closed network. As queues form on lifts with limited capacity, we derive an efficient way to find waiting times via convex optimization. The equilibrium problem is formulated as a variational inequality, and numerical experiments show that it can be solved using standard algorithms.
comment: Corrected the statement about the uniqueness of waiting times: lift waiting times are not necessarily unique, but cycle waiting times are
Robotics
Zero-shot Reconstruction of In-Scene Object Manipulation from Video
We build the first system to address the problem of reconstructing in-scene object manipulation from a monocular RGB video. It is challenging due to ill-posed scene reconstruction, ambiguous hand-object depth, and the need for physically plausible interactions. Existing methods operate in hand centric coordinates and ignore the scene, hindering metric accuracy and practical use. In our method, we first use data-driven foundation models to initialize the core components, including the object mesh and poses, the scene point cloud, and the hand poses. We then apply a two-stage optimization that recovers a complete hand-object motion from grasping to interaction, which remains consistent with the scene information observed in the input video.
LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry
Trajectory planning in unstructured environments is a fundamental and challenging capability for mobile robots. Traditional modular pipelines suffer from latency and cascading errors across perception, localization, mapping, and planning modules. Recent end-to-end learning methods map raw visual observations directly to control signals or trajectories, promising greater performance and efficiency in open-world settings. However, most prior end-to-end approaches still rely on separate localization modules that depend on accurate sensor extrinsic calibration for self-state estimation, thereby limiting generalization across embodiments and environments. We introduce LoGoPlanner, a localization-grounded, end-to-end navigation framework that addresses these limitations by: (1) finetuning a long-horizon visual-geometry backbone to ground predictions with absolute metric scale, thereby providing implicit state estimation for accurate localization; (2) reconstructing surrounding scene geometry from historical observations to supply dense, fine-grained environmental awareness for reliable obstacle avoidance; and (3) conditioning the policy on implicit geometry bootstrapped by the aforementioned auxiliary tasks, thereby reducing error propagation.We evaluate LoGoPlanner in both simulation and real-world settings, where its fully end-to-end design reduces cumulative error while metric-aware geometry memory enhances planning consistency and obstacle avoidance, leading to more than a 27.3\% improvement over oracle-localization baselines and strong generalization across embodiments and environments. The code and models have been made publicly available on the \href{https://steinate.github.io/logoplanner.github.io/}{project page}.
comment: Project page:https://steinate.github.io/logoplanner.github.io/
Learning Generalizable Hand-Object Tracking from Synthetic Demonstrations
We present a system for learning generalizable hand-object tracking controllers purely from synthetic data, without requiring any human demonstrations. Our approach makes two key contributions: (1) HOP, a Hand-Object Planner, which can synthesize diverse hand-object trajectories; and (2) HOT, a Hand-Object Tracker that bridges synthetic-to-physical transfer through reinforcement learning and interaction imitation learning, delivering a generalizable controller conditioned on target hand-object states. Our method extends to diverse object shapes and hand morphologies. Through extensive evaluations, we show that our approach enables dexterous hands to track challenging, long-horizon sequences including object re-arrangement and agile in-hand reorientation. These results represent a significant step toward scalable foundation controllers for manipulation that can learn entirely from synthetic data, breaking the data bottleneck that has long constrained progress in dexterous manipulation.
LeLaR: The First In-Orbit Demonstration of an AI-Based Satellite Attitude Controller
Attitude control is essential for many satellite missions. Classical controllers, however, are time-consuming to design and sensitive to model uncertainties and variations in operational boundary conditions. Deep Reinforcement Learning (DRL) offers a promising alternative by learning adaptive control strategies through autonomous interaction with a simulation environment. Overcoming the Sim2Real gap, which involves deploying an agent trained in simulation onto the real physical satellite, remains a significant challenge. In this work, we present the first successful in-orbit demonstration of an AI-based attitude controller for inertial pointing maneuvers. The controller was trained entirely in simulation and deployed to the InnoCube 3U nanosatellite, which was developed by the Julius-Maximilians-Universität Würzburg in cooperation with the Technische Universität Berlin, and launched in January 2025. We present the AI agent design, the methodology of the training procedure, the discrepancies between the simulation and the observed behavior of the real satellite, and a comparison of the AI-based attitude controller with the classical PD controller of InnoCube. Steady-state metrics confirm the robust performance of the AI-based controller during repeated in-orbit maneuvers.
comment: 55 pages, 27 figures, 29 tables. The maneuver telemetry datasets generated and analyzed during this work are available in the GitHub repository https://github.com/kdjebko/lelar-in-orbit-data
LIMOncello: Revisited IKFoM on the SGal(3) Manifold for Fast LiDAR-Inertial Odometry
This work introduces LIMOncello, a tightly coupled LiDAR-Inertial Odometry system that models 6-DoF motion on the $\mathrm{SGal}(3)$ manifold within an iterated error-state Kalman filter backend. Compared to state representations defined on $\mathrm{SO}(3)\times\mathbb{R}^6$, the use of $\mathrm{SGal}(3)$ provides a coherent and numerically stable discrete-time propagation model that helps limit drift in low-observability conditions. LIMOncello also includes a lightweight incremental i-Octree mapping backend that enables faster updates and substantially lower memory usage than incremental kd-tree style map structures, without relying on locality-restricted search heuristics. Experiments on multiple real-world datasets show that LIMOncello achieves competitive accuracy while improving robustness in geometrically sparse environments. The system maintains real-time performance with stable memory growth and is released as an extensible open-source implementation at https://github.com/CPerezRuiz335/LIMOncello.
Results of the 2024 CommonRoad Motion Planning Competition for Autonomous Vehicles
Over the past decade, a wide range of motion planning approaches for autonomous vehicles has been developed to handle increasingly complex traffic scenarios. However, these approaches are rarely compared on standardized benchmarks, limiting the assessment of relative strengths and weaknesses. To address this gap, we present the setup and results of the 4th CommonRoad Motion Planning Competition held in 2024, conducted using the CommonRoad benchmark suite. This annual competition provides an open-source and reproducible framework for benchmarking motion planning algorithms. The benchmark scenarios span highway and urban environments with diverse traffic participants, including passenger cars, buses, and bicycles. Planner performance is evaluated along four dimensions: efficiency, safety, comfort, and compliance with selected traffic rules. This report introduces the competition format and provides a comparison of representative high-performing planners from the 2023 and 2024 editions.
REALM: A Real-to-Sim Validated Benchmark for Generalization in Robotic Manipulation
Vision-Language-Action (VLA) models empower robots to understand and execute tasks described by natural language instructions. However, a key challenge lies in their ability to generalize beyond the specific environments and conditions they were trained on, which is presently difficult and expensive to evaluate in the real-world. To address this gap, we present REALM, a new simulation environment and benchmark designed to evaluate the generalization capabilities of VLA models, with a specific emphasis on establishing a strong correlation between simulated and real-world performance through high-fidelity visuals and aligned robot control. Our environment offers a suite of 15 perturbation factors, 7 manipulation skills, and more than 3,500 objects. Finally, we establish two task sets that form our benchmark and evaluate the π_{0}, π_{0}-FAST, and GR00T N1.5 VLA models, showing that generalization and robustness remain an open challenge. More broadly, we also show that simulation gives us a valuable proxy for the real-world and allows us to systematically probe for and quantify the weaknesses and failure modes of VLAs. Project page: https://martin-sedlacek.com/realm
comment: 9 pages, 10 figures
MaP-AVR: A Meta-Action Planner for Agents Leveraging Vision Language Models and Retrieval-Augmented Generation
Embodied robotic AI systems designed to manage complex daily tasks rely on a task planner to understand and decompose high-level tasks. While most research focuses on enhancing the task-understanding abilities of LLMs/VLMs through fine-tuning or chain-of-thought prompting, this paper argues that defining the planned skill set is equally crucial. To handle the complexity of daily environments, the skill set should possess a high degree of generalization ability. Empirically, more abstract expressions tend to be more generalizable. Therefore, we propose to abstract the planned result as a set of meta-actions. Each meta-action comprises three components: {move/rotate, end-effector status change, relationship with the environment}. This abstraction replaces human-centric concepts, such as grasping or pushing, with the robot's intrinsic functionalities. As a result, the planned outcomes align seamlessly with the complete range of actions that the robot is capable of performing. Furthermore, to ensure that the LLM/VLM accurately produces the desired meta-action format, we employ the Retrieval-Augmented Generation (RAG) technique, which leverages a database of human-annotated planning demonstrations to facilitate in-context learning. As the system successfully completes more tasks, the database will self-augment to continue supporting diversity. The meta-action set and its integration with RAG are two novel contributions of our planner, denoted as MaP-AVR, the meta-action planner for agents composed of VLM and RAG. To validate its efficacy, we design experiments using GPT-4o as the pre-trained LLM/VLM model and OmniGibson as our robotic platform. Our approach demonstrates promising performance compared to the current state-of-the-art method. Project page: https://map-avr.github.io/.
comment: 8 pages, 10 figures, This work was completed in December 2024
Sign Language Recognition using Parallel Bidirectional Reservoir Computing
Sign language recognition (SLR) facilitates communication between deaf and hearing communities. Deep learning based SLR models are commonly used but require extensive computational resources, making them unsuitable for deployment on edge devices. To address these limitations, we propose a lightweight SLR system that combines parallel bidirectional reservoir computing (PBRC) with MediaPipe. MediaPipe enables real-time hand tracking and precise extraction of hand joint coordinates, which serve as input features for the PBRC architecture. The proposed PBRC architecture consists of two echo state network (ESN) based bidirectional reservoir computing (BRC) modules arranged in parallel to capture temporal dependencies, thereby creating a rich feature representation for classification. We trained our PBRC-based SLR system on the Word-Level American Sign Language (WLASL) video dataset, achieving top-1, top-5, and top-10 accuracies of 60.85%, 85.86%, and 91.74%, respectively. Training time was significantly reduced to 18.67 seconds due to the intrinsic properties of reservoir computing, compared to over 55 minutes for deep learning based methods such as Bi-GRU. This approach offers a lightweight, cost-effective solution for real-time SLR on edge devices.
Mixed formulation and structure-preserving discretization of Cosserat rod dynamics in a port-Hamiltonian framework
An energy-based modeling framework for the nonlinear dynamics of spatial Cosserat rods undergoing large displacements and rotations is proposed. The mixed formulation features independent displacement, velocity and stress variables and is further objective and locking-free. Finite rotations are represented using a director formulation that avoids singularities and yields a constant mass matrix. This results in an infinite-dimensional nonlinear port-Hamiltonian (PH) system governed by partial differential-algebraic equations with a quadratic energy functional. Using a time-differentiated compliance form of the stress-strain relations allows for the imposition of kinematic constraints, such as inextensibility or shear-rigidity. A structure-preserving finite element discretization leads to a finite-dimensional system with PH structure, thus facilitating the design of an energy-momentum consistent integration scheme. Dissipative material behavior (via the generalized-Maxwell model) and non-standard actuation approaches (via pneumatic chambers or tendons) integrate naturally into the framework. As illustrated by selected numerical examples, the present framework establishes a new approach to energy-momentum consistent formulations in computational mechanics involving finite rotations.
comment: 37 pages, 16 figures, currently under review
Real2Edit2Real: Generating Robotic Demonstrations via a 3D Control Interface
Recent progress in robot learning has been driven by large-scale datasets and powerful visuomotor policy architectures, yet policy robustness remains limited by the substantial cost of collecting diverse demonstrations, particularly for spatial generalization in manipulation tasks. To reduce repetitive data collection, we present Real2Edit2Real, a framework that generates new demonstrations by bridging 3D editability with 2D visual data through a 3D control interface. Our approach first reconstructs scene geometry from multi-view RGB observations with a metric-scale 3D reconstruction model. Based on the reconstructed geometry, we perform depth-reliable 3D editing on point clouds to generate new manipulation trajectories while geometrically correcting the robot poses to recover physically consistent depth, which serves as a reliable condition for synthesizing new demonstrations. Finally, we propose a multi-conditional video generation model guided by depth as the primary control signal, together with action, edge, and ray maps, to synthesize spatially augmented multi-view manipulation videos. Experiments on four real-world manipulation tasks demonstrate that policies trained on data generated from only 1-5 source demonstrations can match or outperform those trained on 50 real-world demonstrations, improving data efficiency by up to 10-50x. Moreover, experimental results on height and texture editing demonstrate the framework's flexibility and extensibility, indicating its potential to serve as a unified data generation framework.
TwinAligner: Visual-Dynamic Alignment Empowers Physics-aware Real2Sim2Real for Robotic Manipulation
The robotics field is evolving towards data-driven, end-to-end learning, inspired by multimodal large models. However, reliance on expensive real-world data limits progress. Simulators offer cost-effective alternatives, but the gap between simulation and reality challenges effective policy transfer. This paper introduces TwinAligner, a novel Real2Sim2Real system that addresses both visual and dynamic gaps. The visual alignment module achieves pixel-level alignment through SDF reconstruction and editable 3DGS rendering, while the dynamic alignment module ensures dynamic consistency by identifying rigid physics from robot-object interaction. TwinAligner improves robot learning by providing scalable data collection and establishing a trustworthy iterative cycle, accelerating algorithm development. Quantitative evaluations highlight TwinAligner's strong capabilities in visual and dynamic real-to-sim alignment. This system enables policies trained in simulation to achieve strong zero-shot generalization to the real world. The high consistency between real-world and simulated policy performance underscores TwinAligner's potential to advance scalable robot learning. Code and data will be released on https://twin-aligner.github.io
OMP: One-step Meanflow Policy with Directional Alignment
Robot manipulation, a key capability of embodied AI, has turned to data-driven generative policy frameworks, but mainstream approaches like Diffusion Models suffer from high inference latency and Flow-based Methods from increased architectural complexity. While simply applying meanFlow on robotic tasks achieves single-step inference and outperforms FlowPolicy, it lacks few-shot generalization due to fixed temperature hyperparameters in its Dispersive Loss and misaligned predicted-true mean velocities. To solve these issues, this study proposes an improved MeanFlow-based Policies: we introduce a lightweight Cosine Loss to align velocity directions and use the Differential Derivation Equation (DDE) to optimize the Jacobian-Vector Product (JVP) operator. Experiments on Adroit and Meta-World tasks show the proposed method outperforms MP1 and FlowPolicy in average success rate, especially in challenging Meta-World tasks, effectively enhancing few-shot generalization and trajectory accuracy of robot manipulation policies while maintaining real-time performance, offering a more robust solution for high-precision robotic manipulation.
Comparison and Evaluation of Different Simulation Environments for Rigid Body Systems
Rigid body dynamics simulators are important tools for the design, analysis and optimization of mechanical systems in a variety of technical and scientific applications. This study examines four different simulation environments (Adams, Simscape, OpenModelica, and VEROSIM), focusing in particular on the comparison of the modeling methods, the numerical solvers, and the treatment of numerical problems that arise especially in closed-loop kinematics (esp. redundant boundary conditions and static equilibrium problem). A novel and complex crane boom of a real forestry machine serves as a practical benchmark application example. The direct comparison of the different solution approaches in the examined simulation tools supports the user in selecting the most suitable tool for his application.
comment: Accepted at the 10th MHI-Fachkolloquium
Are All Data Necessary? Efficient Data Pruning for Large-scale Autonomous Driving Dataset via Trajectory Entropy Maximization
Collecting large-scale naturalistic driving data is essential for training robust autonomous driving planners. However, real-world datasets often contain a substantial amount of repetitive and low-value samples, which lead to excessive storage costs and bring limited benefits to policy learning. To address this issue, we propose an information-theoretic data pruning method that effectively reduces the training data volume without compromising model performance. Our approach evaluates the trajectory distribution information entropy of driving data and iteratively selects high-value samples that preserve the statistical characteristics of the original dataset in a model-agnostic manner. From a theoretical perspective, we show that maximizing trajectory entropy effectively constrains the Kullback-Leibler divergence between the pruned subset and the original data distribution, thereby maintaining generalization ability. Comprehensive experiments on the NuPlan benchmark with a large-scale imitation learning framework demonstrate that the proposed method can reduce the dataset size by up to 40% while maintaining closed-loop performance. This work provides a lightweight and theoretically grounded approach for scalable data management and efficient policy learning in autonomous driving systems.
comment: 7 pages, 4 figures
Translating Flow to Policy via Hindsight Online Imitation
Recent advances in hierarchical robot systems leverage a high-level planner to propose task plans and a low-level policy to generate robot actions. This design allows training the planner on action-free or even non-robot data sources (e.g., videos), providing transferable high-level guidance. Nevertheless, grounding these high-level plans into executable actions remains challenging, especially with the limited availability of high-quality robot data. To this end, we propose to improve the low-level policy through online interactions. Specifically, our approach collects online rollouts, retrospectively annotates the corresponding high-level goals from achieved outcomes, and aggregates these hindsight-relabeled experiences to update a goal-conditioned imitation policy. Our method, Hindsight Flow-conditioned Online Imitation (HinFlow), instantiates this idea with 2D point flows as the high-level planner. Across diverse manipulation tasks in both simulation and physical world, our method achieves more than $2\times$ performance improvement over the base policy, significantly outperforming the existing methods. Moreover, our framework enables policy acquisition from planners trained on cross-embodiment video data, demonstrating its potential for scalable and transferable robot learning.
Vision-Aided Relative State Estimation for Approach and Landing on a Moving Platform with Inertial Measurements
This paper tackles the problem of estimating the relative position, orientation, and velocity between a UAV and a planar platform undergoing arbitrary 3D motion during approach and landing. The estimation relies on measurements from Inertial Measurement Units (IMUs) mounted on both systems, assuming there is a suitable communication channel to exchange data, together with visual information provided by an onboard monocular camera, from which the bearing (line-of-sight direction) to the platform's center and the normal vector of its planar surface are extracted. We propose a cascade observer with a complementary filter on SO(3) to reconstruct the relative attitude, followed by a linear Riccati observer for relative position and velocity estimation. Convergence of both observers is established under persistently exciting conditions, and the cascade is shown to be almost globally asymptotically and locally exponentially stable. We further extend the design to the case where the platform's rotation is restricted to its normal axis and show that its measured linear acceleration can be exploited to recover the remaining unobservable rotation angle. A sufficient condition to ensure local exponential convergence in this setting is provided. The performance of the proposed observers is validated through extensive simulations.
comment: 13 pages, 4 figures. Submitted to IFAC World Congress 2026
Vision-Language-Policy Model for Dynamic Robot Task Planning
Bridging the gap between natural language commands and autonomous execution in unstructured environments remains an open challenge for robotics. This requires robots to perceive and reason over the current task scene through multiple modalities, and to plan their behaviors to achieve their intended goals. Traditional robotic task-planning approaches often struggle to bridge low-level execution with high-level task reasoning, and cannot dynamically update task strategies when instructions change during execution, which ultimately limits their versatility and adaptability to new tasks. In this work, we propose a novel language model-based framework for dynamic robot task planning. Our Vision-Language-Policy (VLP) model, based on a vision-language model fine-tuned on real-world data, can interpret semantic instructions and integrate reasoning over the current task scene to generate behavior policies that control the robot to accomplish the task. Moreover, it can dynamically adjust the task strategy in response to changes in the task, enabling flexible adaptation to evolving task requirements. Experiments conducted with different robots and a variety of real-world tasks show that the trained model can efficiently adapt to novel scenarios and dynamically update its policy, demonstrating strong planning autonomy and cross-embodiment generalization. Videos: https://robovlp.github.io/
comment: Manuscript under review
A Flexible Field-Based Policy Learning Framework for Diverse Robotic Systems and Sensors
We present a cross robot visuomotor learning framework that integrates diffusion policy based control with 3D semantic scene representations from D3Fields to enable category level generalization in manipulation. Its modular design supports diverse robot camera configurations including UR5 arms with Microsoft Azure Kinect arrays and bimanual manipulators with Intel RealSense sensors through a low latency control stack and intuitive teleoperation. A unified configuration layer enables seamless switching between setups for flexible data collection training and evaluation. In a grasp and lift block task the framework achieved an 80 percent success rate after only 100 demonstration episodes demonstrating robust skill transfer between platforms and sensing modalities. This design paves the way for scalable real world studies in cross robotic generalization.
comment: 6 pages, 7 figures, conference: SII 2026. Cancun, Mexico
WorldRFT: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving AAAI 2026
Latent World Models enhance scene representation through temporal self-supervised learning, presenting a perception annotation-free paradigm for end-to-end autonomous driving. However, the reconstruction-oriented representation learning tangles perception with planning tasks, leading to suboptimal optimization for planning. To address this challenge, we propose WorldRFT, a planning-oriented latent world model framework that aligns scene representation learning with planning via a hierarchical planning decomposition and local-aware interactive refinement mechanism, augmented by reinforcement learning fine-tuning (RFT) to enhance safety-critical policy performance. Specifically, WorldRFT integrates a vision-geometry foundation model to improve 3D spatial awareness, employs hierarchical planning task decomposition to guide representation optimization, and utilizes local-aware iterative refinement to derive a planning-oriented driving policy. Furthermore, we introduce Group Relative Policy Optimization (GRPO), which applies trajectory Gaussianization and collision-aware rewards to fine-tune the driving policy, yielding systematic improvements in safety. WorldRFT achieves state-of-the-art (SOTA) performance on both open-loop nuScenes and closed-loop NavSim benchmarks. On nuScenes, it reduces collision rates by 83% (0.30% -> 0.05%). On NavSim, using camera-only sensors input, it attains competitive performance with the LiDAR-based SOTA method DiffusionDrive (87.8 vs. 88.1 PDMS).
comment: AAAI 2026, first version
CoDrone: Autonomous Drone Navigation Assisted by Edge and Cloud Foundation Models
Autonomous navigation for Unmanned Aerial Vehicles faces key challenges from limited onboard computational resources, which restrict deployed deep neural networks to shallow architectures incapable of handling complex environments. Offloading tasks to remote edge servers introduces high latency, creating an inherent trade-off in system design. To address these limitations, we propose CoDrone - the first cloud-edge-end collaborative computing framework integrating foundation models into autonomous UAV cruising scenarios - effectively leveraging foundation models to enhance performance of resource-constrained unmanned aerial vehicle platforms. To reduce onboard computation and data transmission overhead, CoDrone employs grayscale imagery for the navigation model. When enhanced environmental perception is required, CoDrone leverages the edge-assisted foundation model Depth Anything V2 for depth estimation and introduces a novel one-dimensional occupancy grid-based navigation method - enabling fine-grained scene understanding while advancing efficiency and representational simplicity of autonomous navigation. A key component of CoDrone is a Deep Reinforcement Learning-based neural scheduler that seamlessly integrates depth estimation with autonomous navigation decisions, enabling real-time adaptation to dynamic environments. Furthermore, the framework introduces a UAV-specific vision language interaction module incorporating domain-tailored low-level flight primitives to enable effective interaction between the cloud foundation model and the UAV. The introduction of VLM enhances open-set reasoning capabilities in complex unseen scenarios. Experimental results show CoDrone outperforms baseline methods under varying flight speeds and network conditions, achieving a 40% increase in average flight distance and a 5% improvement in average Quality of Navigation.
comment: This paper is accepted by the IEEE Internet of Things Journal (IoT-J) for publication in the Special Issue on "Augmented Edge Sensing Intelligence for Low-Altitude IoT Systems"
EGM: Efficiently Learning General Motion Tracking Policy for High Dynamic Humanoid Whole-Body Control
Learning a general motion tracking policy from human motions shows great potential for versatile humanoid whole-body control. Conventional approaches are not only inefficient in data utilization and training processes but also exhibit limited performance when tracking highly dynamic motions. To address these challenges, we propose EGM, a framework that enables efficient learning of a general motion tracking policy. EGM integrates four core designs. Firstly, we introduce a Bin-based Cross-motion Curriculum Adaptive Sampling strategy to dynamically orchestrate the sampling probabilities based on tracking error of each motion bin, eficiently balancing the training process across motions with varying dificulty and durations. The sampled data is then processed by our proposed Composite Decoupled Mixture-of-Experts (CDMoE) architecture, which efficiently enhances the ability to track motions from different distributions by grouping experts separately for upper and lower body and decoupling orthogonal experts from shared experts to separately handle dedicated features and general features. Central to our approach is a key insight we identified: for training a general motion tracking policy, data quality and diversity are paramount. Building on these designs, we develop a three-stage curriculum training flow to progressively enhance the policy's robustness against disturbances. Despite training on only 4.08 hours of data, EGM generalized robustly across 49.25 hours of test motions, outperforming baselines on both routine and highly dynamic tasks.
IndoorUAV: Benchmarking Vision-Language UAV Navigation in Continuous Indoor Environments
Vision-Language Navigation (VLN) enables agents to navigate in complex environments by following natural language instructions grounded in visual observations. Although most existing work has focused on ground-based robots or outdoor Unmanned Aerial Vehicles (UAVs), indoor UAV-based VLN remains underexplored, despite its relevance to real-world applications such as inspection, delivery, and search-and-rescue in confined spaces. To bridge this gap, we introduce \textbf{IndoorUAV}, a novel benchmark and method specifically tailored for VLN with indoor UAVs. We begin by curating over 1,000 diverse and structurally rich 3D indoor scenes from the Habitat simulator. Within these environments, we simulate realistic UAV flight dynamics to collect diverse 3D navigation trajectories manually, further enriched through data augmentation techniques. Furthermore, we design an automated annotation pipeline to generate natural language instructions of varying granularity for each trajectory. This process yields over 16,000 high-quality trajectories, comprising the \textbf{IndoorUAV-VLN} subset, which focuses on long-horizon VLN. To support short-horizon planning, we segment long trajectories into sub-trajectories by selecting semantically salient keyframes and regenerating concise instructions, forming the \textbf{IndoorUAV-VLA} subset. Finally, we introduce \textbf{IndoorUAV-Agent}, a novel navigation model designed for our benchmark, leveraging task decomposition and multimodal reasoning. We hope IndoorUAV serves as a valuable resource to advance research on vision-language embodied AI in the indoor aerial navigation domain.
VLNVerse: A Benchmark for Vision-Language Navigation with Versatile, Embodied, Realistic Simulation and Evaluation
Despite remarkable progress in Vision-Language Navigation (VLN), existing benchmarks remain confined to fixed, small-scale datasets with naive physical simulation. These shortcomings limit the insight that the benchmarks provide into sim-to-real generalization, and create a significant research gap. Furthermore, task fragmentation prevents unified/shared progress in the area, while limited data scales fail to meet the demands of modern LLM-based pretraining. To overcome these limitations, we introduce VLNVerse: a new large-scale, extensible benchmark designed for Versatile, Embodied, Realistic Simulation, and Evaluation. VLNVerse redefines VLN as a scalable, full-stack embodied AI problem. Its Versatile nature unifies previously fragmented tasks into a single framework and provides an extensible toolkit for researchers. Its Embodied design moves beyond intangible and teleporting "ghost" agents that support full-kinematics in a Realistic Simulation powered by a robust physics engine. We leverage the scale and diversity of VLNVerse to conduct a comprehensive Evaluation of existing methods, from classic models to MLLM-based agents. We also propose a novel unified multi-task model capable of addressing all tasks within the benchmark. VLNVerse aims to narrow the gap between simulated navigation and real-world generalization, providing the community with a vital tool to boost research towards scalable, general-purpose embodied locomotion agents.
PalpAid: Multimodal Pneumatic Tactile Sensor for Tissue Palpation
The tactile properties of tissue, such as elasticity and stiffness, often play an important role in surgical oncology when identifying tumors and pathological tissue boundaries. Though extremely valuable, robot-assisted surgery comes at the cost of reduced sensory information to the surgeon; typically, only vision is available. Sensors proposed to overcome this sensory desert are often bulky, complex, and incompatible with the surgical workflow. We present PalpAid, a multimodal pneumatic tactile sensor equipped with a microphone and pressure sensor, converting contact force into an internal pressure differential. The pressure sensor acts as an event detector, while the auditory signature captured by the microphone assists in tissue delineation. We show the design, fabrication, and assembly of sensory units with characterization tests to show robustness to use, inflation-deflation cycles, and integration with a robotic system. Finally, we show the sensor's ability to classify 3D-printed hard objects with varying infills and soft ex vivo tissues. Overall, PalpAid aims to fill the sensory gap intelligently and allow improved clinical decision-making.
DTCCL: Disengagement-Triggered Contrastive Continual Learning for Autonomous Bus Planners
Autonomous buses run on fixed routes but must operate in open, dynamic urban environments. Disengagement events on these routes are often geographically concentrated and typically arise from planner failures in highly interactive regions. Such policy-level failures are difficult to correct using conventional imitation learning, which easily overfits to sparse disengagement data. To address this issue, this paper presents a Disengagement-Triggered Contrastive Continual Learning (DTCCL) framework that enables autonomous buses to improve planning policies through real-world operation. Each disengagement triggers cloud-based data augmentation that generates positive and negative samples by perturbing surrounding agents while preserving route context. Contrastive learning refines policy representations to better distinguish safe and unsafe behaviors, and continual updates are applied in a cloud-edge loop without human supervision. Experiments on urban bus routes demonstrate that DTCCL improves overall planning performance by 48.6 percent compared with direct retraining, validating its effectiveness for scalable, closed-loop policy improvement in autonomous public transport.
Affordance RAG: Hierarchical Multimodal Retrieval with Affordance-Aware Embodied Memory for Mobile Manipulation ICRA 2026
In this study, we address the problem of open-vocabulary mobile manipulation, where a robot is required to carry a wide range of objects to receptacles based on free-form natural language instructions. This task is challenging, as it involves understanding visual semantics and the affordance of manipulation actions. To tackle these challenges, we propose Affordance RAG, a zero-shot hierarchical multimodal retrieval framework that constructs Affordance-Aware Embodied Memory from pre-explored images. The model retrieves candidate targets based on regional and visual semantics and reranks them with affordance scores, allowing the robot to identify manipulation options that are likely to be executable in real-world environments. Our method outperformed existing approaches in retrieval performance for mobile manipulation instruction in large-scale indoor environments. Furthermore, in real-world experiments where the robot performed mobile manipulation in indoor environments based on free-form instructions, the proposed method achieved a task success rate of 85%, outperforming existing methods in both retrieval performance and overall task success.
comment: Accepted to IEEE RA-L, with presentation at ICRA 2026
A Framework for Deploying Learning-based Quadruped Loco-Manipulation
Quadruped mobile manipulators offer strong potential for agile loco-manipulation but remain difficult to control and transfer reliably from simulation to reality. Reinforcement learning (RL) shows promise for whole-body control, yet most frameworks are proprietary and hard to reproduce on real hardware. We present an open pipeline for training, benchmarking, and deploying RL-based controllers on the Unitree B1 quadruped with a Z1 arm. The framework unifies sim-to-sim and sim-to-real transfer through ROS, re-implementing a policy trained in Isaac Gym, extending it to MuJoCo via a hardware abstraction layer, and deploying the same controller on physical hardware. Sim-to-sim experiments expose discrepancies between Isaac Gym and MuJoCo contact models that influence policy behavior, while real-world teleoperated object-picking trials show that coordinated whole-body control extends reach and improves manipulation over floating-base baselines. The pipeline provides a transparent, reproducible foundation for developing and analyzing RL-based loco-manipulation controllers and will be released open source to support future research.
Point What You Mean: Visually Grounded Instruction Policy
Vision-Language-Action (VLA) models align vision and language with embodied control, but their object referring ability remains limited when relying solely on text prompt, especially in cluttered or out-of-distribution (OOD) scenes. In this study, we introduce the Point-VLA, a plug-and-play policy that augments language instructions with explicit visual cues (e.g., bounding boxes) to resolve referential ambiguity and enable precise object-level grounding. To efficiently scale visually grounded datasets, we further develop an automatic data annotation pipeline requiring minimal human effort. We evaluate Point-VLA on diverse real-world referring tasks and observe consistently stronger performance than text-only instruction VLAs, particularly in cluttered or unseen-object scenarios, with robust generalization. These results demonstrate that Point-VLA effectively resolves object referring ambiguity through pixel-level visual grounding, achieving more generalizable embodied control.
A Time-efficient Prioritised Scheduling Algorithm to Optimise Initial Flock Formation of Drones
Drone applications continue to expand across various domains, with flocking offering enhanced cooperative capabilities but introducing significant challenges during initial formation. Existing flocking algorithms often struggle with efficiency and scalability, particularly when potential collisions force drones into suboptimal trajectories. This paper presents a time-efficient prioritised scheduling algorithm that improves the initial formation process of drone flocks. The method assigns each drone a priority based on its number of potential collisions and its likelihood of reaching its target position without permanently obstructing other drones. Using this hierarchy, each drone computes an appropriate delay to ensure a collision-free path. Simulation results show that the proposed algorithm successfully generates collision-free trajectories for flocks of up to 5000 drones and outperforms the coupling-degree-based heuristic prioritised planning method (CDH-PP) in both performance and computational efficiency.
comment: 35 pages
Gaussian Variational Inference with Non-Gaussian Factors for State Estimation: A UWB Localization Case Study
This letter extends the exactly sparse Gaussian variational inference (ESGVI) algorithm for state estimation in two complementary directions. First, ESGVI is generalized to operate on matrix Lie groups, enabling the estimation of states with orientation components while respecting the underlying group structure. Second, factors are introduced to accommodate heavy-tailed and skewed noise distributions, as commonly encountered in ultra-wideband (UWB) localization due to non-line-of-sight (NLOS) and multipath effects. Both extensions are shown to integrate naturally within the ESGVI framework while preserving its sparse and derivative-free structure. The proposed approach is validated in a UWB localization experiment with NLOS-rich measurements, demonstrating improved accuracy and comparable consistency. Finally, a Python implementation within a factor-graph-based estimation framework is made open-source (https://github.com/decargroup/gvi_ws) to support broader research use.
A Class of Axis-Angle Attitude Control Laws for Rotational Systems
We introduce a new class of attitude control laws for rotational systems, which generalizes the use of the Euler axis-angle representation beyond quaternion-based formulations. Using basic Lyapunov's stability theory and the notion of extended $K_{\infty}$ functions, we developed a method for determining and enforcing the global asymptotic stability of the single fixed point of the resulting closed-loop (CL) scheme. In contrast with traditional quaternion-based methods, the proposed generalized axis-angle approach enables greater flexibility in the design of the control law, which is of great utility when employed in combination with a switching scheme whose transition state depends on the angular velocity of the controlled rotational system. Through simulation and real-time experimental results, we demonstrate the effectiveness of the proposed approach. According to the recorded data, in the execution of high-speed tumble-recovery maneuvers, the new method consistently achieves shorter stabilization times and requires lower control effort relative to those corresponding to the quaternion-based and geometric-control methods used as benchmarks.
comment: 6 pages, 4 figures. Accepted for publication in IEEE Control Systems Letters
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
Vision-Language-Action (VLA) models adapt large vision-language backbones to map images and instructions into robot actions. However, prevailing VLAs either generate actions auto-regressively in a fixed left-to-right order or attach separate MLP or diffusion heads outside the backbone, leading to fragmented information pathways and specialized training requirements that hinder a unified, scalable architecture. We present Discrete Diffusion VLA, a unified-transformer policy that models discretized action chunks with discrete diffusion. The design retains diffusion's progressive refinement paradigm while remaining natively compatible with the discrete token interface of VLMs. Our method achieves an adaptive decoding order that resolves easy action elements before harder ones and uses secondary re-masking to revisit uncertain predictions across refinement rounds, which improves consistency and enables robust error correction. This unified decoder preserves pre-trained vision-language priors, supports parallel decoding, breaks the autoregressive bottleneck, and reduces the number of function evaluations. Discrete Diffusion VLA achieves 96.3% avg. success rates on LIBERO, 71.2% visual matching on SimplerEnv-Fractal and 54.2% overall on SimplerEnv-Bridge. We also provide ablation study on vision-language ability retention on LIBERO-OOD (Out-of-Distribution) benchmark, with our method improving over autoregressive, MLP decoder and continuous diffusion baselines. These findings indicate that discrete-diffusion VLA supports precise action modeling and consistent training, laying groundwork for scaling VLA to larger models and datasets. Our code is available at https://github.com/Liang-ZX/DiscreteDiffusionVLA/tree/libero.
comment: New experiments on VL retention and new ablations. 18 pages
AsyMoE: Leveraging Modal Asymmetry for Enhanced Expert Specialization in Large Vision-Language Models
Large Vision-Language Models (LVLMs) have demonstrated impressive performance on multimodal tasks through scaled architectures and extensive training. However, existing Mixture of Experts (MoE) approaches face challenges due to the asymmetry between visual and linguistic processing. Visual information is spatially complete, while language requires maintaining sequential context. As a result, MoE models struggle to balance modality-specific features and cross-modal interactions. Through systematic analysis, we observe that language experts in deeper layers progressively lose contextual grounding and rely more on parametric knowledge rather than utilizing the provided visual and linguistic information. To address this, we propose AsyMoE, a novel architecture that models this asymmetry using three specialized expert groups. We design intra-modality experts for modality-specific processing, hyperbolic inter-modality experts for hierarchical cross-modal interactions, and evidence-priority language experts to suppress parametric biases and maintain contextual grounding. Extensive experiments demonstrate that AsyMoE achieves 26.58% and 15.45% accuracy improvements over vanilla MoE and modality-specific MoE respectively, with 25.45% fewer activated parameters than dense models.
comment: This submission has been withdrawn by the authors due to a fundamental error in the methodology that affects the validity of the main results
Towards Senior-Robot Interaction: Reactive Robot Dog Gestures
As the global population ages, many seniors face the problem of loneliness. Companion robots offer a potential solution. However, current companion robots often lack advanced functionality, while task-oriented robots are not designed for social interaction, limiting their suitability and acceptance by seniors. Our work introduces a senior-oriented system for quadruped robots that allows for more intuitive user input and provides more socially expressive output. For user input, we implemented a MediaPipe-based module for hand gesture and head movement recognition, enabling control without a remote. For output, we designed and trained robotic dog gestures using curriculum-based reinforcement learning in Isaac Gym, progressing from simple standing to three-legged balancing and leg extensions, and more. The final tests achieved over 95\% success on average in simulation, and we validated a key social gesture (the paw-lift) on a Unitree robot. Real-world tests demonstrated the feasibility and social expressiveness of this framework, while also revealing sim-to-real challenges in joint compliance, load distribution, and balance control. These contributions advance the development of practical quadruped robots as social companions for the senior and outline pathways for sim-to-real adaptation and inform future user studies.
comment: Accepted at the Australasian Conference on Robotics and Automation (ACRA) 2025
VERDI: VLM-Embedded Reasoning for Autonomous Driving
While autonomous driving (AD) stacks struggle with decision making under partial observability and real-world complexity, human drivers are capable of commonsense reasoning to make near-optimal decisions with limited information. Recent work has attempted to leverage finetuned Vision-Language Models (VLMs) for trajectory planning at inference time to emulate human behavior. Despite their success in benchmark evaluations, these methods are often impractical to deploy (a 70B parameter VLM inference at merely 8 tokens per second requires more than 160G of memory), and their monolithic network structure prohibits safety decomposition. To bridge this gap, we propose VLM-Embedded Reasoning for autonomous Driving (VERDI), a training-time framework that distills the reasoning process and commonsense knowledge of VLMs into the AD stack. VERDI augments modular differentiable end-to-end (e2e) AD models by aligning intermediate module outputs at the perception, prediction, and planning stages with text features explaining the driving reasoning process produced by VLMs. By encouraging alignment in latent space, VERDI enables the modular AD stack to internalize structured reasoning, without incurring the inference-time costs of large VLMs. We validate VERDI in both open-loop (NuScenes and Bench2Drive benchmarks) and closed-loop (HugSim Simulator) settings. We find that VERDI outperforms existing e2e methods that do not embed reasoning by up to 11% in $\ell_{2}$ distance and 11% in driving performance, while maintaining real-time inference speed.
Continuous Vision-Language-Action Co-Learning with Semantic-Physical Alignment for Behavioral Cloning AAAI 2026
Language-conditioned manipulation facilitates human-robot interaction via behavioral cloning (BC), which learns control policies from human demonstrations and serves as a cornerstone of embodied AI. Overcoming compounding errors in sequential action decisions remains a central challenge to improving BC performance. Existing approaches mitigate compounding errors through data augmentation, expressive representation, or temporal abstraction. However, they suffer from physical discontinuities and semantic-physical misalignment, leading to inaccurate action cloning and intermittent execution. In this paper, we present Continuous vision-language-action Co-Learning with Semantic-Physical Alignment (CCoL), a novel BC framework that ensures temporally consistent execution and fine-grained semantic grounding. It generates robust and smooth action execution trajectories through continuous co-learning across vision, language, and proprioceptive inputs (e.g., robot internal states). Meanwhile, we anchor language semantics to visuomotor representations by a bidirectional cross-attention to learn contextual information for action generation, successfully overcoming the problem of semantic-physical misalignment. Extensive experiments show that CCoL achieves an average 8.0% relative improvement across three simulation suites, with up to 19.2% relative gain in human-demonstrated bimanual insertion tasks. Real-world tests on a 7-DoF robot further confirm CCoL's generalization under unseen and noisy object states.
comment: Accepted at AAAI 2026, the Project website is available at https://qhemu.github.io/CCoL/
BRIC: Bridging Kinematic Plans and Physical Control at Test Time AAAI'26
We propose BRIC, a novel test-time adaptation (TTA) framework that enables long-term human motion generation by resolving execution discrepancies between diffusion-based kinematic motion planners and reinforcement learning-based physics controllers. While diffusion models can generate diverse and expressive motions conditioned on text and scene context, they often produce physically implausible outputs, leading to execution drift during simulation. To address this, BRIC dynamically adapts the physics controller to noisy motion plans at test time, while preserving pre-trained skills via a loss function that mitigates catastrophic forgetting. In addition, BRIC introduces a lightweight test-time guidance mechanism that steers the diffusion model in the signal space without updating its parameters. By combining both adaptation strategies, BRIC ensures consistent and physically plausible long-term executions across diverse environments in an effective and efficient manner. We validate the effectiveness of BRIC on a variety of long-term tasks, including motion composition, obstacle avoidance, and human-scene interaction, achieving state-of-the-art performance across all tasks.
comment: Accepted to AAAI'26
Confidence Calibration in Vision-Language-Action Models
Trustworthy robot behavior requires not only high levels of task success but also that the robot can reliably quantify how likely it is to succeed. To this end, we present a first-of-its-kind study of confidence calibration in vision-language-action (VLA) foundation models, which map visual observations and natural language instructions to low-level robot motor commands. We establish a confidence baseline for VLAs, examine how task success relates to calibration error and how calibration evolves over time, and introduce two lightweight techniques to remedy the miscalibration we observe: prompt ensembles and action-wise Platt scaling. Our aim in this study is to begin to develop the tools and conceptual understanding necessary to render VLAs both highly performant and highly trustworthy via reliable uncertainty quantification.
comment: 38 pages, 19 figures; additional experiments with VLA variants
PRISM-Loc: a Lightweight Long-range LiDAR Localization in Urban Environments with Topological Maps ICRA 2026
We propose PRISM-Loc - a lightweight and robust approach for localization in large outdoor environments that combines a compact topological representation with a novel scan-matching and curb-detection module operating on raw LiDAR scans. The method is designed for resource-constrained platforms and emphasizes real-time performance and resilience to common urban sensing challenges. It provides accurate localization in compact topological maps using global place recognition and an original scan matching technique. Experiments on standard benchmarks and on an embedded platform demonstrate the effectiveness of our approach. Our method achieves a 99\% success rate on the large-scale ITLP-Campus dataset while running at 150 ms per localization and using a 20 MB map for localization. We highlight three main contributions: (1) a compact representation for city-scale localization; (2) a novel curb detection and scan matching pipeline operating directly on raw LiDAR points; (3) a thorough evaluation of our method with performance analysis.
comment: This version was submitted to ICRA 2026 conference
DeltaFlow: An Efficient Multi-frame Scene Flow Estimation Method NeurIPS 2025
Previous dominant methods for scene flow estimation focus mainly on input from two consecutive frames, neglecting valuable information in the temporal domain. While recent trends shift towards multi-frame reasoning, they suffer from rapidly escalating computational costs as the number of frames grows. To leverage temporal information more efficiently, we propose DeltaFlow ($Δ$Flow), a lightweight 3D framework that captures motion cues via a $Δ$ scheme, extracting temporal features with minimal computational cost, regardless of the number of frames. Additionally, scene flow estimation faces challenges such as imbalanced object class distributions and motion inconsistency. To tackle these issues, we introduce a Category-Balanced Loss to enhance learning across underrepresented classes and an Instance Consistency Loss to enforce coherent object motion, improving flow accuracy. Extensive evaluations on the Argoverse 2, Waymo and nuScenes datasets show that $Δ$Flow achieves state-of-the-art performance with up to 22% lower error and $2\times$ faster inference compared to the next-best multi-frame supervised method, while also demonstrating a strong cross-domain generalization ability. The code is open-sourced at https://github.com/Kin-Zhang/DeltaFlow along with trained model weights.
comment: NeurIPS 2025 Spotlight, 18 pages (10 main pages + 8 supp materail), 11 figures, code at https://github.com/Kin-Zhang/DeltaFlow
Geometric Data-Driven Multi-Jet Locomotion Inspired by Salps
Salps are marine animals consisting of chains of jellyfish-like units. Their efficient underwater locomotion by coordinating multi-jet propulsion has aroused great interest in robotics. This paper presents a geometric mechanics framework for salp-inspired robots. We study a new type of geometric mechanics models inspired by salps, in which control inputs are not restricted to the shape axes, analyze nonlinear controllability, and develop motion planning and feedback control methods. We introduce the "LandSalp" robot, which serves as a physical realization of the reduced-order, drag-dominated model of salp swimming, enabling controlled evaluation of locomotion strategies without many confounding factors of underwater experiments. We extend least-squares- and inverse-dynamics-based system identification to learn the Riemannian metric of the drag-dominated model from experimental data using Lie group differentiation. With about three minutes of data, we identify an accurate model of LandSalp. Simulation and hardware experiments demonstrate omnidirectional locomotion, shape regulation, and bending maneuvers, providing a principled pathway toward more capable salp-inspired robots.
comment: 20 pages, 16 figures
GUIDEd Agents: Enhancing Navigation Policies through Task-Specific Uncertainty Abstraction in Localization-Limited Environments
Autonomous vehicles performing navigation tasks in complex environments face significant challenges due to uncertainty in state estimation. In many scenarios, such as stealth operations or resource-constrained settings, accessing high-precision localization comes at a significant cost, forcing robots to rely primarily on less precise state estimates. Our key observation is that different tasks require varying levels of precision in different regions: a robot navigating a crowded space might need precise localization near obstacles but can operate effectively with less precision elsewhere. In this paper, we present a planning method for integrating task-specific uncertainty requirements directly into navigation policies. We introduce Task-Specific Uncertainty Maps (TSUMs), which abstract the acceptable levels of state estimation uncertainty across different regions. TSUMs align task requirements and environmental features using a shared representation space, generated via a domain-adapted encoder. Using TSUMs, we propose Generalized Uncertainty Integration for Decision-Making and Execution (GUIDE), a policy conditioning framework that incorporates these uncertainty requirements into robot decision-making. We find that TSUMs provide an effective way to abstract task-specific uncertainty requirements, and conditioning policies on TSUMs enables the robot to reason about the context-dependent value of certainty and adapt its behavior accordingly. We show how integrating GUIDE into reinforcement learning frameworks allows the agent to learn navigation policies that effectively balance task completion and uncertainty management without explicit reward engineering. We evaluate GUIDE on various real-world robotic navigation tasks and find that it demonstrates significant improvement in task completion rates compared to baseline methods that do not explicitly consider task-specific uncertainty.
comment: Accepted for publication at RAL (Robotics and automation letters). Updated with the final version
TakeAD: Preference-based Post-optimization for End-to-end Autonomous Driving with Expert Takeover Data
Existing end-to-end autonomous driving methods typically rely on imitation learning (IL) but face a key challenge: the misalignment between open-loop training and closed-loop deployment. This misalignment often triggers driver-initiated takeovers and system disengagements during closed-loop execution. How to leverage those expert takeover data from disengagement scenarios and effectively expand the IL policy's capability presents a valuable yet unexplored challenge. In this paper, we propose TakeAD, a novel preference-based post-optimization framework that fine-tunes the pre-trained IL policy with this disengagement data to enhance the closed-loop driving performance. First, we design an efficient expert takeover data collection pipeline inspired by human takeover mechanisms in real-world autonomous driving systems. Then, this post optimization framework integrates iterative Dataset Aggregation (DAgger) for imitation learning with Direct Preference Optimization (DPO) for preference alignment. The DAgger stage equips the policy with fundamental capabilities to handle disengagement states through direct imitation of expert interventions. Subsequently, the DPO stage refines the policy's behavior to better align with expert preferences in disengagement scenarios. Through multiple iterations, the policy progressively learns recovery strategies for disengagement states, thereby mitigating the open-loop gap. Experiments on the closed-loop Bench2Drive benchmark demonstrate our method's effectiveness compared with pure IL methods, with comprehensive ablations confirming the contribution of each component.
comment: This work has been accepted by IEEE RA-L. Manuscript submitted: July, 8, 2025; Accepted: November, 24, 2025
QDepth-VLA: Quantized Depth Prediction as Auxiliary Supervision for Vision-Language-Action Models
Spatial perception and reasoning are crucial for Vision-Language-Action (VLA) models to accomplish fine-grained manipulation tasks. However, existing approaches often lack the ability to understand and reason over the essential 3D structures necessary for precise control. To address this limitation, we propose QDepth-VLA, a general framework that augments VLA models with an auxiliary depth prediction task. A dedicated depth expert is designed to predict quantized latent tokens of depth maps obtained from a VQ-VAE encoder, enabling the model to learn depth-aware representations that capture critical geometric cues. Experimental results on the simulation benchmarks and real-world tasks demonstrate that QDepth-VLA yields strong spatial reasoning and competitive performance on manipulation tasks.
Deformable Cluster Manipulation via Whole-Arm Policy Learning
Manipulating clusters of deformable objects presents a substantial challenge with widespread applicability, but requires contact-rich whole-arm interactions. A potential solution must address the limited capacity for realistic model synthesis, high uncertainty in perception, and the lack of efficient spatial abstractions, among others. We propose a novel framework for learning model-free policies integrating two modalities: 3D point clouds and proprioceptive touch indicators, emphasising manipulation with full body contact awareness, going beyond traditional end-effector modes. Our reinforcement learning framework leverages a distributional state representation, aided by kernel mean embeddings, to achieve improved training efficiency and real-time inference. Furthermore, we propose a novel context-agnostic occlusion heuristic to clear deformables from a target region for exposure tasks. We deploy the framework in a power line clearance scenario and observe that the agent generates creative strategies leveraging multiple arm links for de-occlusion. Finally, we perform zero-shot sim-to-real policy transfer, allowing the arm to clear real branches with unknown occlusion patterns, unseen topology, and uncertain dynamics. Website: https://sites.google.com/view/dcmwap/
DRO-EDL-MPC: Evidential Deep Learning-Based Distributionally Robust Model Predictive Control for Safe Autonomous Driving
Safety is a critical concern in motion planning for autonomous vehicles. Modern autonomous vehicles rely on neural network-based perception, but making control decisions based on these inference results poses significant safety risks due to inherent uncertainties. To address this challenge, we present a distributionally robust optimization (DRO) framework that accounts for both aleatoric and epistemic perception uncertainties using evidential deep learning (EDL). Our approach introduces a novel ambiguity set formulation based on evidential distributions that dynamically adjusts the conservativeness according to perception confidence levels. We integrate this uncertainty-aware constraint into model predictive control (MPC), proposing the DRO-EDL-MPC algorithm with computational tractability for autonomous driving applications. Validation in the CARLA simulator demonstrates that our approach maintains efficiency under high perception confidence while enforcing conservative constraints under low confidence.
comment: Accepted to IEEE Robotics and Automation Letters (RA-L)
BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes ICCV 2025
Recent advances in deep learning-based point cloud registration have improved generalization, yet most methods still require retraining or manual parameter tuning for each new environment. In this paper, we identify three key factors limiting generalization: (a) reliance on environment-specific voxel size and search radius, (b) poor out-of-domain robustness of learning-based keypoint detectors, and (c) raw coordinate usage, which exacerbates scale discrepancies. To address these issues, we present a zero-shot registration pipeline called BUFFER-X by (a) adaptively determining voxel size/search radii, (b) using farthest point sampling to bypass learned detectors, and (c) leveraging patch-wise scale normalization for consistent coordinate bounds. In particular, we present a multi-scale patch-based descriptor generation and a hierarchical inlier search across scales to improve robustness in diverse scenes. We also propose a novel generalizability benchmark using 11 datasets that cover various indoor/outdoor scenarios and sensor modalities, demonstrating that BUFFER-X achieves substantial generalization without prior information or manual parameter tuning for the test datasets. Our code is available at https://github.com/MIT-SPARK/BUFFER-X.
comment: 20 pages, 14 figures. Accepted as a highlight paper at ICCV 2025
FORWARD: Dataset of a forwarder operating in rough terrain
We present FORWARD, a high-resolution multimodal dataset of a cut-to-length forwarder operating in rough terrain on two harvest sites in the middle part of Sweden. The forwarder is a large Komatsu model equipped with vehicle telematics sensors, including global positioning via satellite navigation, movement sensors, accelerometers, and engine sensors. The vehicle was additionally equipped with cameras, operator vibration sensors, and multiple IMUs. The data includes event time logs recorded at 5 Hz of driving speed, fuel consumption, vehicle position with centimeter accuracy, and crane use while the vehicle operates in forest areas, aerially laser-scanned with a resolution of around 1500 points per square meter. Production log files (StanForD standard) with time-stamped machine events, extensive video material, and terrain data in various formats are included as well. About 18 hours of regular wood extraction work during three days is annotated from 360-video material into individual work elements and included in the dataset. We also include scenario specifications of conducted experiments on forest roads and in terrain. Scenarios include repeatedly driving the same routes with and without steel tracks, different load weights, and different target driving speeds. The dataset is intended for developing models and algorithms for trafficability, perception, and autonomous control of forest machines using artificial intelligence, simulation, and experiments on physical testbeds. In part, we focus on forwarders traversing terrain, avoiding or handling obstacles, and loading or unloading logs, with consideration for efficiency, fuel consumption, safety, and environmental impact. Other benefits of the open dataset include the ability to explore auto-generation and calibration of forestry machine simulators and automation scenario descriptions using the data recorded in the field.
comment: 28 pages, 22 figures
Explainable deep learning improves human mental models of self-driving cars
Self-driving cars increasingly rely on deep neural networks to achieve human-like driving. The opacity of such black-box planners makes it challenging for the human behind the wheel to accurately anticipate when they will fail, with potentially catastrophic consequences. While research into interpreting these systems has surged, most of it is confined to simulations or toy setups due to the difficulty of real-world deployment, leaving the practical utility of such techniques unknown. Here, we introduce the Concept-Wrapper Network (CW-Net), a method for explaining the behavior of machine-learning-based planners by grounding their reasoning in human-interpretable concepts. We deploy CW-Net on a real self-driving car and show that the resulting explanations improve the human driver's mental model of the car, allowing them to better predict its behavior. To our knowledge, this is the first demonstration that explainable deep learning integrated into self-driving cars can be both understandable and useful in a realistic deployment setting. CW-Net accomplishes this level of intelligibility while providing explanations which are causally faithful and do not sacrifice driving performance. Overall, our study establishes a general pathway to interpretability for autonomous agents by way of concept-based explanations, which could help make them more transparent and safe.
comment: MST & JAS contributed equally to this work
Multiagent Systems
A Multi-Agent Retrieval-Augmented Framework for Work-in-Progress Predictio
Work-in-Progress (WiP) prediction is critical for predictive process monitoring, enabling accurate anticipation of workload fluctuations and optimized operational planning. This paper proposes a retrieval-augmented, multi-agent framework that combines retrieval-augmented generation (RAG) and collaborative multi-agent reasoning for WiP prediction. The narrative generation component transforms structured event logs into semantically rich natural language stories, which are embedded into a semantic vector-based process memory to facilitate dynamic retrieval of historical context during inference. The framework includes predictor agents that independently leverage retrieved historical contexts and a decision-making assistant agent that extracts high-level descriptive signals from recent events. A fusion agent then synthesizes predictions using ReAct-style reasoning over agent outputs and retrieved narratives. We evaluate our framework on two real-world benchmark datasets. Results show that the proposed retrieval-augmented multi-agent approach achieves competitive prediction accuracy, obtaining a Mean Absolute Percentage Error (MAPE) of 1.50\% on one dataset, and surpassing Temporal Convolutional Networks (TCN), Long Short-Term Memory (LSTM), and persistence baselines. The results highlight improved robustness, demonstrating the effectiveness of integrating retrieval mechanisms and multi-agent reasoning in WiP prediction.
comment: 15th International Conference on Digital Image Processing and Pattern Recognition (DPPR 2025), December 20 ~ 21, 2025, Sydney, Australia
Mechanism-Based Intelligence (MBI): Differentiable Incentives for Rational Coordination and Guaranteed Alignment in Multi-Agent Systems
Autonomous multi-agent systems are fundamentally fragile: they struggle to solve the Hayekian Information problem (eliciting dispersed private knowledge) and the Hurwiczian Incentive problem (aligning local actions with global objectives), making coordination computationally intractable. I introduce Mechanism-Based Intelligence (MBI), a paradigm that reconceptualizes intelligence as emergent from the coordination of multiple "brains", rather than a single one. At its core, the Differentiable Price Mechanism (DPM) computes the exact loss gradient $$ \mathbf{G}_i = - \frac{\partial \mathcal{L}}{\partial \mathbf{x}_i} $$ as a dynamic, VCG-equivalent incentive signal, guaranteeing Dominant Strategy Incentive Compatibility (DSIC) and convergence to the global optimum. A Bayesian extension ensures incentive compatibility under asymmetric information (BIC). The framework scales linearly ($\mathcal{O}(N)$) with the number of agents, bypassing the combinatorial complexity of Dec-POMDPs and is empirically 50x faster than Model-Free Reinforcement Learning. By structurally aligning agent self-interest with collective objectives, it provides a provably efficient, auditable and generalizable approach to coordinated, trustworthy and scalable multi-agent intelligence grounded in economic principles.
HyperAgent: Leveraging Hypergraphs for Topology Optimization in Multi-Agent Communication
Recent advances in large language model-powered multi-agent systems have demonstrated remarkable collective intelligence through effective communication. However, existing approaches face two primary challenges: (i) \textit{Ineffective group collaboration modeling}, as they rely on pairwise edge representations in graph structures, limiting their ability to capture relationships among multiple agents; and (ii) \textit{Limited task-adaptiveness in communication topology design}, leading to excessive communication cost for simple tasks and insufficient coordination for complex scenarios. These issues restrict the scalability and practical deployment of adaptive collaboration frameworks. To address these challenges, we propose \textbf{HyperAgent}, a hypergraph-based framework that optimizes communication topologies and effectively captures group collaboration patterns using direct hyperedge representations. Unlike edge-based approaches, HyperAgent uses hyperedges to link multiple agents within the same subtask and employs hypergraph convolutional layers to achieve one-step information aggregation in collaboration groups. Additionally, it incorporates a variational autoencoder framework with sparsity regularization to dynamically adjust hypergraph topologies based on task complexity. Experiments highlight the superiority of HyperAgent in both performance and efficiency. For instance, on GSM8K, HyperAgent achieves 95.07\% accuracy while reducing token consumption by 25.33\%, demonstrating the potential of hypergraph-based optimization for multi-agent communication.
comment: This submission has been withdrawn by the authors due to a fundamental error in the methodology that affects the validity of the main results
RadAgents: Multimodal Agentic Reasoning for Chest X-ray Interpretation with Radiologist-like Workflows ML4H'25
Agentic systems offer a potential path to solve complex clinical tasks through collaboration among specialized agents, augmented by tool use and external knowledge bases. Nevertheless, for chest X-ray (CXR) interpretation, prevailing methods remain limited: (i) reasoning is frequently neither clinically interpretable nor aligned with guidelines, reflecting mere aggregation of tool outputs; (ii) multimodal evidence is insufficiently fused, yielding text-only rationales that are not visually grounded; and (iii) systems rarely detect or resolve cross-tool inconsistencies and provide no principled verification mechanisms. To bridge the above gaps, we present RadAgents, a multi-agent framework that couples clinical priors with task-aware multimodal reasoning and encodes a radiologist-style workflow into a modular, auditable pipeline. In addition, we integrate grounding and multimodal retrieval-augmentation to verify and resolve context conflicts, resulting in outputs that are more reliable, transparent, and consistent with clinical practice.
comment: ML4H'25; Work in progress
Personalized and Resilient Distributed Learning Through Opinion Dynamics
In this paper, we address two practical challenges of distributed learning in multi-agent network systems, namely personalization and resilience. Personalization is the need of heterogeneous agents to learn local models tailored to their own data and tasks, while still generalizing well; on the other hand, the learning process must be resilient to cyberattacks or anomalous training data to avoid disruption. Motivated by a conceptual affinity between these two requirements, we devise a distributed learning algorithm that combines distributed gradient descent and the Friedkin-Johnsen model of opinion dynamics to fulfill both of them. We quantify its convergence speed and the neighborhood that contains the final learned models, which can be easily controlled by tuning the algorithm parameters to enforce a more personalized/resilient behavior. We numerically showcase the effectiveness of our algorithm on synthetic and real-world distributed learning tasks, where it achieves high global accuracy both for personalized models and with malicious agents compared to standard strategies.
comment: Published on IEEE Transactions on Control of Network Systems. Final accepted version
Networked Communication for Decentralised Cooperative Agents in Mean-Field Control
The mean-field framework has been used to find approximate solutions to problems involving very large populations of symmetric, anonymous agents, which may be intractable by other methods. The cooperative mean-field control (MFC) problem has received less attention than the non-cooperative mean-field game (MFG), despite the former potentially being more useful as a tool for engineering large-scale collective behaviours. Decentralised communication algorithms have recently been introduced to MFGs, giving benefits to learning speed and robustness. Inspired by this, we introduce networked communication to MFC - where populations arguably have broader incentive to communicate - and in particular to the setting where decentralised agents learn online from a single, non-episodic run of the empirical system. We adapt recent MFG algorithms to this new setting, as well as contributing a novel sub-routine allowing networked agents to estimate the global average reward from their local neighbourhood. Previous theoretical analysis of decentralised communication in MFGs does not extend trivially to MFC. We therefore contribute new theory proving that in MFC the networked communication scheme allows agents to increase social welfare faster than under both of the two typical alternative architectures, namely independent and centralised learning. We provide experiments that support this new result across different classes of cooperative game, and also give numerous ablation studies and additional experiments concerning numbers of communication round and robustness to communication failures.
Networked Communication for Mean-Field Games with Function Approximation and Empirical Mean-Field Estimation
Recent algorithms allow decentralised agents, possibly connected via a communication network, to learn equilibria in mean-field games from a non-episodic run of the empirical system. However, these algorithms are for tabular settings: this computationally limits the size of agents' observation space, meaning the algorithms cannot handle anything but small state spaces, nor generalise beyond policies depending only on the agent's local state to so-called 'population-dependent' policies. We address this limitation by introducing function approximation to the existing setting, drawing on the Munchausen Online Mirror Descent method that has previously been employed only in finite-horizon, episodic, centralised settings. While this permits us to include the mean field in the observation for players' policies, it is unrealistic to assume decentralised agents have access to this global information: we therefore also provide new algorithms allowing agents to locally estimate the global empirical distribution, and to improve this estimate via inter-agent communication. We prove theoretically that exchanging policy information helps networked agents outperform both independent and even centralised agents in function-approximation settings. Our experiments demonstrate this happening empirically, and show that the communication network allows decentralised agents to estimate the mean field for population-dependent policies.
Networked Communication for Decentralised Agents in Mean-Field Games
Methods like multi-agent reinforcement learning struggle to scale with growing population size. Mean-field games (MFGs) are a game-theoretic approach that can circumvent this by finding a solution for an abstract infinite population, which can then be used as an approximate solution for the $N$-agent problem. However, classical mean-field algorithms usually only work under restrictive conditions. We take steps to address this by introducing networked communication to MFGs, in particular to settings that use a single, non-episodic run of $N$ decentralised agents to simulate the infinite population, as is likely to be most reasonable in real-world deployments. We prove that our architecture's sample guarantees lie between those of earlier theoretical algorithms for the centralised- and independent-learning architectures, varying dependent on network structure and the number of communication rounds. However, the sample guarantees of the three theoretical algorithms do not actually result in practical convergence times. We thus contribute practical enhancements to all three algorithms allowing us to present their first empirical demonstrations. We then show that in practical settings where the theoretical hyperparameters are not observed, giving fewer loops but poorer estimation of the Q-function, our communication scheme still respects the earlier theoretical analysis: it considerably accelerates learning over the independent case, which hardly seems to learn at all, and often performs similarly to the centralised case, while removing the restrictive assumption of the latter. We provide ablations and additional studies showing that our networked approach also has advantages over both alternatives in terms of robustness to update failures and to changes in population size.
MAPPO-LCR: Multi-Agent Proximal Policy Optimization with Local Cooperation Reward in Spatial Public Goods Games
Spatial public goods games model collective dilemmas where individual payoffs depend on population-level strategy configurations. Most existing studies rely on evolutionary update rules or value-based reinforcement learning methods. These approaches struggle to represent payoff coupling and non-stationarity in large interacting populations. This work introduces Multi-Agent Proximal Policy Optimization (MAPPO) into spatial public goods games for the first time. In these games, individual returns are intrinsically coupled through overlapping group interactions. Proximal Policy Optimization (PPO) treats agents as independent learners and ignores this coupling during value estimation. MAPPO addresses this limitation through a centralized critic that evaluates joint strategy configurations. To study neighborhood-level cooperation signals under this framework, we propose MAPPO with Local Cooperation Reward, termed MAPPO-LCR. The local cooperation reward aligns policy updates with surrounding cooperative density without altering the original game structure. MAPPO-LCR preserves decentralized execution while enabling population-level value estimation during training. Extensive simulations demonstrate stable cooperation emergence and reliable convergence across enhancement factors. Statistical analyses further confirm the learning advantage of MAPPO over PPO in spatial public goods games.
Cooperation as a Black Box: Conceptual Fluctuation and Diagnostic Tools for Misalignment in MAS
Misalignment in Multi-Agent Systems (MAS) is frequently treated as a technical failure. Yet, issues may arise from the conceptual design phase, where semantic ambiguity and normative projection occur. The Rabbit-Duck illusion illustrates how perspective-dependent readings of agent behavior, such as the conflation of cooperation-coordination, can create epistemic instability; e.g., coordinated agents in cooperative Multi-Agent Reinforcement Learning (MARL) benchmarks being interpreted as morally aligned, despite being optimized for shared utility maximization only. Motivated by three drivers of meaning-level misalignment in MAS (coordination-cooperation ambiguity, conceptual fluctuation, and semantic instability), we introduce the Misalignment Mosaic: a framework for diagnosing how misalignment emerges through language, framing, and design assumptions. The Mosaic comprises four components: 1. Terminological Inconsistency, 2. Interpretive Ambiguity, 3. Concept-to-Code Decay, and 4. Morality as Cooperation. Building on insights from the Morality-as-Cooperation Theory, we call for consistent meaning-level grounding in MAS to ensure systems function as intended: technically and ethically. This need is particularly urgent as MAS principles influence broader Artificial Intelligence (AI) workflows, amplifying risks in trust, interpretability, and governance. While this work focuses on the coordination/cooperation ambiguity, the Mosaic generalizes to other overloaded terms, such as alignment, autonomy, and trust.
Systems and Control (EESS)
Optimal-coupling-observer AV motion control securing comfort in the presence of cyber attacks
The security of Automated Vehicles (AVs) is an important emerging area of research in traffic safety. Methods have been published and evaluated in experimental vehicles to secure safe AV control in the presence of attacks, but human motion comfort is rarely investigated in such studies. In this paper, we present an innovative optimal-coupling-observer-based framework that rejects the impact of bounded sensor attacks in a network of connected and automated vehicles from safety and comfort point of view. We demonstrate its performance in car following with cooperative adaptive cruise control for platoons with redundant distance and velocity sensors. The error dynamics are formulated as a Linear Time Variant (LTV) system, resulting in complex stability conditions that are investigated using a Linear Matrix Inequality (LMI) approach guaranteeing global asymptotic stability. We prove the capability of the framework to secure occupants' safety and comfort in the presence of bounded attacks. In the onset of attack, the framework rapidly detects attacked sensors and switches to the most reliable observer eliminating attacked sensors, even with modest attack magnitudes. Without our proposed method, severe (but bounded) attacks result in collisions and major discomfort. With our method, attacks had negligible effects on motion comfort evaluated using ISO-2631 Ride Comfort and Motion Sickness indexes. The results pave the path to bring comfort to the forefront of AVs security.
LeLaR: The First In-Orbit Demonstration of an AI-Based Satellite Attitude Controller
Attitude control is essential for many satellite missions. Classical controllers, however, are time-consuming to design and sensitive to model uncertainties and variations in operational boundary conditions. Deep Reinforcement Learning (DRL) offers a promising alternative by learning adaptive control strategies through autonomous interaction with a simulation environment. Overcoming the Sim2Real gap, which involves deploying an agent trained in simulation onto the real physical satellite, remains a significant challenge. In this work, we present the first successful in-orbit demonstration of an AI-based attitude controller for inertial pointing maneuvers. The controller was trained entirely in simulation and deployed to the InnoCube 3U nanosatellite, which was developed by the Julius-Maximilians-Universität Würzburg in cooperation with the Technische Universität Berlin, and launched in January 2025. We present the AI agent design, the methodology of the training procedure, the discrepancies between the simulation and the observed behavior of the real satellite, and a comparison of the AI-based attitude controller with the classical PD controller of InnoCube. Steady-state metrics confirm the robust performance of the AI-based controller during repeated in-orbit maneuvers.
comment: 55 pages, 27 figures, 29 tables. The maneuver telemetry datasets generated and analyzed during this work are available in the GitHub repository https://github.com/kdjebko/lelar-in-orbit-data
Exact Recourse Functions for Aggregations of EVs Operating in Imbalance Markets
We study optimal charging of large electric vehicle populations that are exposed to a single real-time imbalance price. The problem is naturally cast as a multistage stochastic linear programme (MSLP), which can be solved by algorithms such as Stochastic Dual Dynamic Programming. However, these methods scale poorly with the number of devices and stages. This paper presents a novel approach to overcome this curse of dimensionality. Building prior work that characterises the aggregate flexibility sets of populations of EVs as a permutahdron, we reformulate the original problem in terms of aggregated quantities. The geometric structure of permutahedra lets us (i) construct an optimal disaggregation policy, (ii) derive an exact, lower-dimensional MSLP, and (iii) characterise the expected recourse function as piecewise affine with a finite, explicit partition. In particular, we provide closed-form expressions for the slopes and intercepts of each affine region via truncated expectations of future prices, yielding an exact form for the recourse function and first-stage policy. Comprehensive numerical studies validate our claims and demonstrate the practical utility of this work.
A Gauss-Newton-Induced Structure-Exploiting Algorithm for Differentiable Optimal Control
Differentiable optimal control, particularly differentiable nonlinear model predictive control (NMPC), provides a powerful framework that enjoys the complementary benefits of machine learning and control theory. A key enabler of differentiable optimal control is the computation of derivatives of the optimal trajectory with respect to problem parameters, i.e., trajectory derivatives. Previous works compute trajectory derivatives by solving a differential Karush-Kuhn-Tucker (KKT) system, and achieve this efficiently by constructing an equivalent auxiliary system. However, we find that directly exploiting the matrix structures in the differential KKT system yields significant computation speed improvements. Motivated by this insight, we propose FastDOC, which applies a Gauss-Newton approximation of Hessian and takes advantage of the resulting block-sparsity and positive semidefinite properties of the matrices involved. These structural properties enable us to accelerate the computationally expensive matrix factorization steps, resulting in a factor-of-two speedup in theoretical computational complexity, and in a synthetic benchmark FastDOC achieves up to a 180% time reduction compared to the baseline method. Finally, we validate the method on an imitation learning task for human-like autonomous driving, where the results demonstrate the effectiveness of the proposed FastDOC in practical applications.
comment: 8 pages, 7 figures, submitted to IFAC World Congress 2026 for possible publication
Hybrid Analytical-Machine Learning Framework for Ripple Factor Estimation in Cockcroft-Walton Voltage Multipliers with Residual Correction for Non-Ideal Effects
Cockcroft-Walton (CW) voltage multipliers suffer from output ripple that classical analytical models underestimate due to neglected non-idealities like diode drops and capacitor ESR, particularly in high-stage, low-frequency and heavy-load regimes. This paper proposes a hybrid framework that generates a comprehensive 324-case MATLAB/Simulink dataset varying stages (2-8), input voltage (5-25 kV), capacitance (1-10 μF), frequency (50-500 Hz) and load (6-60 MΩ), then trains a Random Forest model to predict residuals between simulated and theoretical peak-to-peak ripple. The approach achieves 70.6% RMSE reduction (131 V vs. 448 V) globally and 66.7% in critical regimes, with near-zero bias, enabling physically interpretable design optimization while outperforming pure ML in extrapolation reliability.
comment: 6 Pages, 2 figures, IEEE Conference Template used
Machine Learning of Temperature-dependent Chemical Kinetics Using Parallel Droplet Microreactors
Temperature is a fundamental regulator of chemical and biochemical kinetics, yet capturing nonlinear thermal effects directly from experimental data remains a major challenge due to limited throughput and model flexibility. Recent advances in machine learning have enabled flexible modeling beyond conventional physical laws, but most existing strategies remain confined to surrogate models of end-point yields rather than full kinetic dynamics. Consequently, an end-to-end framework that unifies systematic kinetic data acquisition with machine learning based modeling has been lacking. In this paper, we present a unified framework that integrates droplet microfluidics with machine learning for the systematic analysis of temperature-dependent reaction kinetics. The platform is specifically designed to enable stable immobilization and long-term time-lapse imaging of thousands of droplets under dynamic thermal gradients. This configuration yields massively parallel time-resolved datasets across diverse temperature conditions that capture transient kinetics and provides particularly suitable inputs for training machine-learning models of reaction dynamics. Leveraging these datasets, we train Neural ODE models, which embed neural networks within differential equations to flexibly represent nonlinear temperature dependencies beyond conventional formulations. We demonstrate accurate prediction of enzymatic kinetics across diverse thermal environments, highlighting the robustness and versatility of the approach. Our framework bridges high-throughput experimental data acquisition with data-driven modeling, establishing a versatile foundation for enhanced predictive ability and rational analysis and design of temperature-sensitive biochemical processes.
Mixed formulation and structure-preserving discretization of Cosserat rod dynamics in a port-Hamiltonian framework
An energy-based modeling framework for the nonlinear dynamics of spatial Cosserat rods undergoing large displacements and rotations is proposed. The mixed formulation features independent displacement, velocity and stress variables and is further objective and locking-free. Finite rotations are represented using a director formulation that avoids singularities and yields a constant mass matrix. This results in an infinite-dimensional nonlinear port-Hamiltonian (PH) system governed by partial differential-algebraic equations with a quadratic energy functional. Using a time-differentiated compliance form of the stress-strain relations allows for the imposition of kinematic constraints, such as inextensibility or shear-rigidity. A structure-preserving finite element discretization leads to a finite-dimensional system with PH structure, thus facilitating the design of an energy-momentum consistent integration scheme. Dissipative material behavior (via the generalized-Maxwell model) and non-standard actuation approaches (via pneumatic chambers or tendons) integrate naturally into the framework. As illustrated by selected numerical examples, the present framework establishes a new approach to energy-momentum consistent formulations in computational mechanics involving finite rotations.
comment: 37 pages, 16 figures, currently under review
Power feedback strategy based on efficiency trajectory analysis for HCPV sun tracking
This paper presents a control strategy for sun trackers which adapts continuously to different sources of error, avoiding the necessity of any kind of calibration by analyzing the produced electric power to sense the position of the Sun. The proposed strategy is able to meet the strict specifications for HCPV sun trackers despite of mechanical uncertainties (misalignments in the structure itself, misalignment of the solar modules with respect to the wing, etc.) and installation uncertainties (misalignments of the platform with respect to geographical north). Experimental results with an industrial-grade solar tracker showing the validity of the proposed control strategy under sunny and moderate cloudy conditions, as well as with different installation precisions by un-calibrating the system on purpose are exposed.
Stability Analysis of a B-Spline Deep Neural Operator for Nonlinear Systems
This paper investigates the stability properties of neural operators through the structured representation offered by the Hybrid B-spline Deep Neural Operator (HBDNO). While existing stability-aware architectures typically enforce restrictive constraints that limit universality, HBDNO preserves full expressive power by representing outputs via B-spline control points. We show that these control points form a natural observable for post-training stability analysis. By applying Dynamic Mode Decomposition and connecting the resulting discrete dynamics to the Koopman operator framework, we provide a principled approach to spectral characterization of learned operators. Numerical results demonstrate the ability to assess stability and reveal future directions for safety-critical applications.
Topology-based Conditions for Multiconsensus under the Signed Friedkin-Johnsen Model
In this paper, we address the multiconsensus problem in networked systems, where agents are partitioned into disjoint subgroups and the states of agents within a subgroup are driven to consensus. Our objective is to present a distributed control law that leads to multiconsensus in signed digraphs. To this end, we examine the convergence of opinions under the opposing rule-based signed Friedkin-Johnsen (SFJ) model and present conditions that lead to multiconsensus under this model. Interestingly, the proposed conditions depend only on graph topology and signed interactions and not on the edge weights of the network. Consequently, the proposed SFJ-based control law relaxes the in-degree balance and homogeneity of trust-distrust, frequently assumed in the literature. Finally, we add simulation results to demonstrate the proposed conditions for multiconsensus.
Vision-Aided Relative State Estimation for Approach and Landing on a Moving Platform with Inertial Measurements
This paper tackles the problem of estimating the relative position, orientation, and velocity between a UAV and a planar platform undergoing arbitrary 3D motion during approach and landing. The estimation relies on measurements from Inertial Measurement Units (IMUs) mounted on both systems, assuming there is a suitable communication channel to exchange data, together with visual information provided by an onboard monocular camera, from which the bearing (line-of-sight direction) to the platform's center and the normal vector of its planar surface are extracted. We propose a cascade observer with a complementary filter on SO(3) to reconstruct the relative attitude, followed by a linear Riccati observer for relative position and velocity estimation. Convergence of both observers is established under persistently exciting conditions, and the cascade is shown to be almost globally asymptotically and locally exponentially stable. We further extend the design to the case where the platform's rotation is restricted to its normal axis and show that its measured linear acceleration can be exploited to recover the remaining unobservable rotation angle. A sufficient condition to ensure local exponential convergence in this setting is provided. The performance of the proposed observers is validated through extensive simulations.
comment: 13 pages, 4 figures. Submitted to IFAC World Congress 2026
Modular Landfill Remediation for AI Grid Resilience
Rising AI electricity demand and persistent landfill methane emissions constitute coupled constraints on U.S. digital infrastructure and decarbonization. While China has achieved a rapid 'de-landfilling' transition through centralized coordination, the U.S. remains structurally 'locked in' to landfilling due to fragmented governance and carbon accounting incentives. This paper proposes a modular legacy landfill remediation framework to address these dual challenges within U.S. institutional constraints. By treating legacy sites as stock resources, the proposed system integrates excavation, screening, and behind-the-meter combined heat and power (CHP) to transform environmental liabilities into resilience assets. A system analysis of a representative AI corridor demonstrates that such modules can mitigate site-level methane by 60-70% and recover urban land, while supplying approximately 20 MW of firm, islandable power. Although contributing only approximately 5% of a hyperscale data center's bulk load, it provides critical microgrid resilience and black-start capability. We conclude that remediation-oriented waste-to-energy should be valued not as a substitute for bulk renewables, but as a strategic control volume for buffering critical loads against grid volatility while resolving long-term environmental liabilities.
Semantic Communication for Rate-Limited Closed-Loop Distributed Communication-Sensing-Control Systems
The growing integration of distributed integrated sensing and communication (ISAC) with closed-loop control in intelligent networks demands efficient information transmission under stringent bandwidth constraints. To address this challenge, this paper proposes a unified framework for goal-oriented semantic communication in distributed SCC systems. Building upon Weaver's three-level model, we establish a hierarchical semantic formulation with three error levels (L1: observation reconstruction, L2: state estimation, and L3: control) to jointly optimize their corresponding objectives. Based on this formulation, we propose a unified goal-oriented semantic compression and rate adaptation framework that is applicable to different semantic error levels and optimization goals across the SCC loop. A rate-limited multi-sensor LQR system is used as a case study to validate the proposed framework. We employ a GRU-based AE for semantic compression and a PPO-based rate adaptation algorithm that dynamically allocates transmission rates across sensors. Results show that the proposed framework effectively captures task-relevant semantics and adapts its resource allocation strategies across different semantic levels, thereby achieving level-specific performance gains under bandwidth constraints.
comment: 13 pages, 18 figures. This work has been submitted to the IEEE for possible publication
Finite-sample guarantees for data-driven forward-backward operator methods
We establish finite sample certificates on the quality of solutions produced by data-based forward-backward (FB) operator splitting schemes. As frequently happens in stochastic regimes, we consider the problem of finding a zero of the sum of two operators, where one is either unavailable in closed form or computationally expensive to evaluate, and shall therefore be approximated using a finite number of noisy oracle samples. Under the lens of algorithmic stability, we then derive probabilistic bounds on the distance between a true zero and the FB output without making specific assumptions about the underlying data distribution. We show that under weaker conditions ensuring the convergence of FB schemes, stability bounds grow proportionally to the number of iterations. Conversely, stronger assumptions yield stability guarantees that are independent of the iteration count. We then specialize our results to a popular FB stochastic Nash equilibrium seeking algorithm and validate our theoretical bounds on a control problem for smart grids, where the energy price uncertainty is approximated by means of historical data.
Optical design and characterization of a multi-depth vision simulator
We present a vision simulator device (Katsim), a compact near-eye optical display designed for assessing postoperative corrected vision, preoperative intraocular lens (IOL) assessment, and objective IOL characterization. The system forms a virtual image using an amplitude-modulated LCoS spatial light modulator (AM-SLM), RGB LED illumination, and a high-speed varifocal lens. In the proposed architecture, the LED illumination and varifocal lens diopter changes are triggered in synchrony with the SLM RGB subframes, rendering three depth planes perceptually simultaneously via high-frequency time-multiplexing. Operating at 60 frames per second (fps), the system achieves an effective 180 Hz depth-coded cycle, enabling sharp, multi-depth rendering within a dynamically adjustable depth range from 0.2 m to optical infinity. The system's eyebox is configurable from 1 to 5 mm, while maintaining a fixed spatial location and preserving angular magnification regardless of changes in focus or eyebox size. The designed system features a 9.15-degree field of view. An integrated infrared pupil-tracking module detects non-cataractous regions of the cataractous crystalline lens, and the projected imagery is mechanically steered through those clear zones in real time. The proposed vision simulator supports both subjective simulation of post-surgical vision for patient-specific counseling and objective optical evaluation of IOLs, including resolution and contrast fidelity (e.g., modulation transfer function, contrast transfer function, and defocus curves). By decoupling depth modulation from eyebox position and size, the system offers a modular, portable platform that supports enhanced preoperative planning, personalized IOL selection, objective IOL characterization, and use as a novel research vision tool.
comment: 16 pages, 8 figures. Preprint submitted to Biomedical Optics Express journal
Rapid stabilization of the heat equation with localized disturbance
This paper studies the rapid stabilization of a multidimensional heat equation in the presence of an unknown spatially localized disturbance. A novel multivalued feedback control strategy is proposed, which synthesizes the frequency Lyapunov method (introduced by Xiang [41]) with the sign multivalued operator. This methodology connects Lyapunov-based stability analysis with spectral inequalities, while the inclusion of the sign operator ensures robustness against the disturbance. The closed-loop system is governed by a differential inclusion, for which well-posedness is proved via the theory of maximal monotone operators. This approach not only guarantees exponential stabilization but also circumvents the need for explicit disturbance modeling or estimation.
OPBO: Order-Preserving Bayesian Optimization
Bayesian optimization is an effective method for solving expensive black-box optimization problems. Most existing methods use Gaussian processes (GP) as the surrogate model for approximating the black-box objective function, it is well-known that it can fail in high-dimensional space (e.g., dimension over 500). We argue that the reliance of GP on precise numerical fitting is fundamentally ill-suited in high-dimensional space, where it leads to prohibitive computational complexity. In order to address this, we propose a simple order-preserving Bayesian optimization (OPBO) method, where the surrogate model preserves the order, instead of the value, of the black-box objective function. Then we can use a simple but effective OP neural network (NN) to replace GP as the surrogate model. Moreover, instead of searching for the best solution from the acquisition model, we select good-enough solutions in the ordinal set to reduce computational cost. The experimental results show that for high-dimensional (over 500) black-box optimization problems, the proposed OPBO significantly outperforms traditional BO methods based on regression NN and GP. The source code is available at https://github.com/pengwei222/OPBO.
comment: 13 pages
A Time-efficient Prioritised Scheduling Algorithm to Optimise Initial Flock Formation of Drones
Drone applications continue to expand across various domains, with flocking offering enhanced cooperative capabilities but introducing significant challenges during initial formation. Existing flocking algorithms often struggle with efficiency and scalability, particularly when potential collisions force drones into suboptimal trajectories. This paper presents a time-efficient prioritised scheduling algorithm that improves the initial formation process of drone flocks. The method assigns each drone a priority based on its number of potential collisions and its likelihood of reaching its target position without permanently obstructing other drones. Using this hierarchy, each drone computes an appropriate delay to ensure a collision-free path. Simulation results show that the proposed algorithm successfully generates collision-free trajectories for flocks of up to 5000 drones and outperforms the coupling-degree-based heuristic prioritised planning method (CDH-PP) in both performance and computational efficiency.
comment: 35 pages
A Class of Axis-Angle Attitude Control Laws for Rotational Systems
We introduce a new class of attitude control laws for rotational systems, which generalizes the use of the Euler axis-angle representation beyond quaternion-based formulations. Using basic Lyapunov's stability theory and the notion of extended $K_{\infty}$ functions, we developed a method for determining and enforcing the global asymptotic stability of the single fixed point of the resulting closed-loop (CL) scheme. In contrast with traditional quaternion-based methods, the proposed generalized axis-angle approach enables greater flexibility in the design of the control law, which is of great utility when employed in combination with a switching scheme whose transition state depends on the angular velocity of the controlled rotational system. Through simulation and real-time experimental results, we demonstrate the effectiveness of the proposed approach. According to the recorded data, in the execution of high-speed tumble-recovery maneuvers, the new method consistently achieves shorter stabilization times and requires lower control effort relative to those corresponding to the quaternion-based and geometric-control methods used as benchmarks.
comment: 6 pages, 4 figures. Accepted for publication in IEEE Control Systems Letters
Semantic Communication for Rate-Limited Closed-Loop Distributed Communication-Sensing-Control Systems
The growing integration of distributed integrated sensing and communication (ISAC) with closed-loop control in intelligent networks demands efficient information transmission under stringent bandwidth constraints. To address this challenge, this paper proposes a unified framework for goal-oriented semantic communication in distributed SCC systems. Building upon Weaver's three-level model, we establish a hierarchical semantic formulation with three error levels (L1: observation reconstruction, L2: state estimation, and L3: control) to jointly optimize their corresponding objectives. Based on this formulation, we propose a unified goal-oriented semantic compression and rate adaptation framework that is applicable to different semantic error levels and optimization goals across the SCC loop. A rate-limited multi-sensor LQR system is used as a case study to validate the proposed framework. We employ a GRU-based AE for semantic compression and a PPO-based rate adaptation algorithm that dynamically allocates transmission rates across sensors. Results show that the proposed framework effectively captures task-relevant semantics and adapts its resource allocation strategies across different semantic levels, thereby achieving level-specific performance gains under bandwidth constraints.
comment: 13 pages, 18 figures. This work has been submitted to the IEEE for possible publication
An overview of systems-theoretic guarantees in data-driven model predictive control
The development of control methods based on data has seen a surge of interest in recent years. When applying data-driven controllers in real-world applications, providing theoretical guarantees for the closed-loop system is of crucial importance to ensure reliable operation. In this review, we provide an overview of data-driven model predictive control (MPC) methods for controlling unknown systems with guarantees on systems-theoretic properties such as stability, robustness, and constraint satisfaction. The considered approaches rely on the Fundamental Lemma from behavioral theory in order to predict input-output trajectories directly from data. We cover various setups, ranging from linear systems and noise-free data to more realistic formulations with noise and nonlinearities, and we provide an overview of different techniques to ensure guarantees for the closed-loop system. Moreover, we discuss avenues for future research that may further improve the theoretical understanding and practical applicability of data-driven MPC.
On Zero-sum Game Representation for Replicator Dynamics
Replicator dynamics have been widely used in evolutionary game theory to model how strategy frequencies evolve over time in large populations. The so-called payoff matrix encodes the pairwise fitness that each strategy obtains when interacting with every other strategy, and it solely determines the replicator dynamics. If the payoff matrix is unknown, we show in this paper that it cannot be inferred from observed strategy frequencies alone -- distinct payoff matrices can induce the same replicator dynamics. We thus look for a canonical representative of the payoff matrix in the equivalence class. The main result of the paper is to show that for every polynomial replicator dynamics (i.e., the vector field is a polynomial), there always exists a skew-symmetric, polynomial payoff matrix that can induce the given dynamics.
Configuration-Constrained Tube MPC for Periodic Operation
Periodic operation often emerges as the economically optimal mode in industrial processes, particularly under varying economic or environmental conditions. This paper proposes a robust model predictive control (MPC) framework for uncertain systems modeled as polytopic linear differential inclusions (LDIs), where the dynamics evolve as convex combinations of finitely many affine control systems with additive disturbances. The robust control problem is reformulated as a convex optimization program by optimizing over configuration-constrained polytopic tubes and tracks a periodic trajectory that is optimal for a given economic criterion. Artificial variables embedded in the formulation ensure recursive feasibility and robust constraint satisfaction when the economic criterion is updated online, while guaranteeing convergence to the corresponding optimal periodic tube when the criterion remains constant. To improve computational efficiency, we introduce a quadratic over-approximation of the periodic cost under a Lipschitz continuity assumption, yielding a Quadratic Program (QP) formulation that preserves the above theoretical guarantees. The effectiveness and scalability of the approach are demonstrated on a benchmark example and a ball-plate system with eight states.
comment: 11 pages, 3 figures, submitted for IEEE-TACON
Deep Reinforcement Learning Optimization for Uncertain Nonlinear Systems via Event-Triggered Robust Adaptive Dynamic Programming
This work proposes a unified control architecture that couples a Reinforcement Learning (RL)-driven controller with a disturbance-rejection Extended State Observer (ESO), complemented by an Event-Triggered Mechanism (ETM) to limit unnecessary computations. The ESO is utilized to estimate the system states and the lumped disturbance in real time, forming the foundation for effective disturbance compensation. To obtain near-optimal behavior without an accurate system description, a value-iteration-based Adaptive Dynamic Programming (ADP) method is adopted for policy approximation. The inclusion of the ETM ensures that parameter updates of the learning module are executed only when the state deviation surpasses a predefined bound, thereby preventing excessive learning activity and substantially reducing computational load. A Lyapunov-oriented analysis is used to characterize the stability properties of the resulting closed-loop system. Numerical experiments further confirm that the developed approach maintains strong control performance and disturbance tolerance, while achieving a significant reduction in sampling and processing effort compared with standard time-triggered ADP schemes.
comment: we have identified some technical issues, including the mathematical derivation. After discussion, all authors have agreed that the analysis requires a thorough re-derivation to ensure correctness and rigor
A Riemannian Optimization Perspective of the Gauss-Newton Method for Feedforward Neural Networks
In this work, we establish non-asymptotic convergence bounds for the Gauss-Newton method in training neural networks with smooth activations. In the underparameterized regime, the Gauss-Newton gradient flow in parameter space induces a Riemannian gradient flow on a low-dimensional embedded submanifold of the function space. Using tools from Riemannian optimization, we establish geodesic Polyak-Lojasiewicz and Lipschitz-smoothness conditions for the loss under appropriately chosen output scaling, yielding geometric convergence to the optimal in-class predictor at an explicit rate independent of the conditioning of the Gram matrix. In the overparameterized regime, we propose adaptive, curvature-aware regularization schedules that ensure fast geometric convergence to a global optimum at a rate independent of the minimum eigenvalue of the neural tangent kernel and, locally, of the modulus of strong convexity of the loss. These results demonstrate that Gauss-Newton achieves accelerated convergence rates in settings where first-order methods exhibit slow convergence due to ill-conditioned kernel matrices and loss landscapes.
Networked Communication for Mean-Field Games with Function Approximation and Empirical Mean-Field Estimation
Recent algorithms allow decentralised agents, possibly connected via a communication network, to learn equilibria in mean-field games from a non-episodic run of the empirical system. However, these algorithms are for tabular settings: this computationally limits the size of agents' observation space, meaning the algorithms cannot handle anything but small state spaces, nor generalise beyond policies depending only on the agent's local state to so-called 'population-dependent' policies. We address this limitation by introducing function approximation to the existing setting, drawing on the Munchausen Online Mirror Descent method that has previously been employed only in finite-horizon, episodic, centralised settings. While this permits us to include the mean field in the observation for players' policies, it is unrealistic to assume decentralised agents have access to this global information: we therefore also provide new algorithms allowing agents to locally estimate the global empirical distribution, and to improve this estimate via inter-agent communication. We prove theoretically that exchanging policy information helps networked agents outperform both independent and even centralised agents in function-approximation settings. Our experiments demonstrate this happening empirically, and show that the communication network allows decentralised agents to estimate the mean field for population-dependent policies.
Networked Communication for Decentralised Agents in Mean-Field Games
Methods like multi-agent reinforcement learning struggle to scale with growing population size. Mean-field games (MFGs) are a game-theoretic approach that can circumvent this by finding a solution for an abstract infinite population, which can then be used as an approximate solution for the $N$-agent problem. However, classical mean-field algorithms usually only work under restrictive conditions. We take steps to address this by introducing networked communication to MFGs, in particular to settings that use a single, non-episodic run of $N$ decentralised agents to simulate the infinite population, as is likely to be most reasonable in real-world deployments. We prove that our architecture's sample guarantees lie between those of earlier theoretical algorithms for the centralised- and independent-learning architectures, varying dependent on network structure and the number of communication rounds. However, the sample guarantees of the three theoretical algorithms do not actually result in practical convergence times. We thus contribute practical enhancements to all three algorithms allowing us to present their first empirical demonstrations. We then show that in practical settings where the theoretical hyperparameters are not observed, giving fewer loops but poorer estimation of the Q-function, our communication scheme still respects the earlier theoretical analysis: it considerably accelerates learning over the independent case, which hardly seems to learn at all, and often performs similarly to the centralised case, while removing the restrictive assumption of the latter. We provide ablations and additional studies showing that our networked approach also has advantages over both alternatives in terms of robustness to update failures and to changes in population size.
Neural Controller for Incremental Stability of Unknown Continuous-time Systems
This work primarily focuses on synthesizing a controller that guarantees an unknown continuous-time system to be incrementally input-to-state stable ($δ$-ISS). In this context, the notion of $δ$-ISS control Lyapunov function ($δ$-ISS-CLF) for the continuous-time system is introduced. Combined with the controller, the $δ$-ISS-CLF guarantees that the system is incrementally stable. As the paper deals with unknown dynamical systems, the controller as well as the $δ$-ISS-CLF are parametrized using neural networks. The data set used to train the neural networks is generated from the state space of the system by proper sampling. Now, to give a formal guarantee that the controller makes the system incrementally stable, we develop a validity condition by having some Lipschitz continuity assumptions and incorporate the condition into the training framework to ensure a provable correctness guarantee at the end of the training process. Finally, we demonstrate the effectiveness of the proposed approach through several case studies: a scalar system with a non-affine, non-polynomial structure, a one-link manipulator system, a nonlinear Moore-Greitzer model of a jet engine, a magnetic levitator system and a rotating rigid spacecraft model.
comment: arXiv admin note: substantial text overlap with arXiv:2503.04129
GUIDEd Agents: Enhancing Navigation Policies through Task-Specific Uncertainty Abstraction in Localization-Limited Environments
Autonomous vehicles performing navigation tasks in complex environments face significant challenges due to uncertainty in state estimation. In many scenarios, such as stealth operations or resource-constrained settings, accessing high-precision localization comes at a significant cost, forcing robots to rely primarily on less precise state estimates. Our key observation is that different tasks require varying levels of precision in different regions: a robot navigating a crowded space might need precise localization near obstacles but can operate effectively with less precision elsewhere. In this paper, we present a planning method for integrating task-specific uncertainty requirements directly into navigation policies. We introduce Task-Specific Uncertainty Maps (TSUMs), which abstract the acceptable levels of state estimation uncertainty across different regions. TSUMs align task requirements and environmental features using a shared representation space, generated via a domain-adapted encoder. Using TSUMs, we propose Generalized Uncertainty Integration for Decision-Making and Execution (GUIDE), a policy conditioning framework that incorporates these uncertainty requirements into robot decision-making. We find that TSUMs provide an effective way to abstract task-specific uncertainty requirements, and conditioning policies on TSUMs enables the robot to reason about the context-dependent value of certainty and adapt its behavior accordingly. We show how integrating GUIDE into reinforcement learning frameworks allows the agent to learn navigation policies that effectively balance task completion and uncertainty management without explicit reward engineering. We evaluate GUIDE on various real-world robotic navigation tasks and find that it demonstrates significant improvement in task completion rates compared to baseline methods that do not explicitly consider task-specific uncertainty.
comment: Accepted for publication at RAL (Robotics and automation letters). Updated with the final version
JITServe: SLO-aware LLM Serving with Imprecise Request Information
The integration of Large Language Models (LLMs) into applications ranging from interactive chatbots to multi-agent systems has introduced a wide spectrum of service-level objectives (SLOs) for responsiveness. These include latency-sensitive requests emphasizing per-token latency in streaming chat, deadline-sensitive requests requiring rapid full responses to trigger external tools, and compound requests with evolving dependencies across multiple LLM calls. Despite-or perhaps, because of-this workload diversity and unpredictable request information (e.g., response lengths and dependencies), existing request schedulers have focused on aggregate performance, unable to ensure application-level SLO needs. This paper presents JITServe, the first SLO-aware LLM serving system designed to maximize service goodput (e.g., the number of tokens meeting request SLOs) across diverse workloads. JITServe novelly schedules requests using imprecise request information and gradually relaxes this conservatism by refining request information estimates as generation progresses. It applies a grouped margin goodput maximization algorithm to allocate just enough serving bandwidth to satisfy each request's SLO just-in-time (JIT), maximizing residual capacity for others, while deciding the composition of requests in a batch to maximize efficiency and goodput with provable guarantees. Our evaluation across diverse realistic workloads, including chat, deep research, and agentic pipelines, shows that JITServe improves service goodput by 1.4x-6.3x, alternatively achieving 28.5%-83.2% resource savings, compared to state-of-the-art designs.
Physics-Informed Dynamical Modeling of Extrusion-Based 3D Printing Processes
The trade-off between model fidelity and computational cost remains a central challenge in the computational modeling of extrusion-based 3D printing, particularly for real time optimization and control. Although high fidelity simulations have advanced considerably for offline analysis, dynamical modeling tailored for online, control-oriented applications is still significantly underdeveloped. In this study, we propose a reduced order dynamical flow model that captures the transient behavior of extrusion-based 3D printing. The model is grounded in physics-based principles derived from the Navier Stokes equations and further simplified through spatial averaging and input dependent parameterization. To assess its performance, the model is identified via a nonlinear least squares approach using Computational Fluid Dynamics (CFD) simulation data spanning a range of printing conditions and subsequently validated across multiple combinations of training and testing scenarios. The results demonstrate strong agreement with the CFD data within the nozzle, the nozzle substrate gap, and the deposited layer regions. Overall, the proposed reduced order model successfully captures the dominant flow dynamics of the process while maintaining a level of simplicity compatible with real time control and optimization.
comment: 19 pages, 12 figures
Robotics
Optimizing Robotic Placement via Grasp-Dependent Feasibility Prediction
In this paper, we study whether inexpensive, physics-free supervision can reliably prioritize grasp-place candidates for budget-aware pick-and-place. From an object's initial pose, target pose, and a candidate grasp, we generate two path-aware geometric labels: path-wise inverse kinematics (IK) feasibility across a fixed approach-grasp-lift waypoint template, and a transit collision flag from mesh sweeps along the same template. A compact dual-output MLP learns these signals from pose encodings, and at test time its scores rank precomputed candidates for a rank-then-plan policy under the same IK gate and planner as the baseline. Although learned from cheap labels only, the scores transfer to physics-enabled executed trajectories: at a fixed planning budget the policy finds successful paths sooner with fewer planner calls while keeping final success on par or better. This work targets a single rigid cuboid with side-face grasps and a fixed waypoint template, and we outline extensions to varied objects and richer waypoint schemes.
comment: Accepted in ACRA 2025
Construction and deformation of P-hedra using control polylines
In the 19th International Symposium on Advances in Robot Kinematics the author introduced a novel class of continuous flexible discrete surfaces and mentioned that these so-called P-hedra (or P-nets) allow direct access to their spatial shapes by three control polylines. In this follow-up paper we study this intuitive method, which makes these flexible planar quad surfaces suitable for transformable design tasks by means of interactive tools. The construction of P-hedra from the control polylines can also be used for an efficient algorithmic computation of their isometric deformations. In addition we discuss flexion limits, bifurcation configurations, developable/flat-foldable pattern and tubular P-hedra.
comment: 8 pages, 4 figures
InDRiVE: Reward-Free World-Model Pretraining for Autonomous Driving via Latent Disagreement
Model-based reinforcement learning (MBRL) can reduce interaction cost for autonomous driving by learning a predictive world model, but it typically still depends on task-specific rewards that are difficult to design and often brittle under distribution shift. This paper presents InDRiVE, a DreamerV3-style MBRL agent that performs reward-free pretraining in CARLA using only intrinsic motivation derived from latent ensemble disagreement. Disagreement acts as a proxy for epistemic uncertainty and drives the agent toward under-explored driving situations, while an imagination-based actor-critic learns a planner-free exploration policy directly from the learned world model. After intrinsic pretraining, we evaluate zero-shot transfer by freezing all parameters and deploying the pretrained exploration policy in unseen towns and routes. We then study few-shot adaptation by training a task policy with limited extrinsic feedback for downstream objectives (lane following and collision avoidance). Experiments in CARLA across towns, routes, and traffic densities show that disagreement-based pretraining yields stronger zero-shot robustness and robust few-shot collision avoidance under town shift and matched interaction budgets, supporting the use of intrinsic disagreement as a practical reward-free pretraining signal for reusable driving world models.
Multimodal Classification Network Guided Trajectory Planning for Four-Wheel Independent Steering Autonomous Parking Considering Obstacle Attributes
Four-wheel Independent Steering (4WIS) vehicles have attracted increasing attention for their superior maneuverability. Human drivers typically choose to cross or drive over the low-profile obstacles (e.g., plastic bags) to efficiently navigate through narrow spaces, while existing planners neglect obstacle attributes, causing inefficiency or path-finding failures. To address this, we propose a trajectory planning framework integrating the 4WIS hybrid A* and Optimal Control Problem (OCP), in which the hybrid A* provides an initial path to enhance the OCP solution. Specifically, a multimodal classification network is introduced to assess scene complexity (hard/easy task) by fusing image and vehicle state data. For hard tasks, guided points are set to decompose complex tasks into local subtasks, improving the search efficiency of 4WIS hybrid A*. The multiple steering modes of 4WIS vehicles (Ackermann, diagonal, and zero-turn) are also incorporated into node expansion and heuristic designs. Moreover, a hierarchical obstacle handling strategy is designed to guide the node expansion considering obstacle attributes, i.e., 'non-traversable', 'crossable', and 'drive-over' obstacles. It allows crossing or driving over obstacles instead of the 'avoid-only' strategy, greatly enhancing success rates of pathfinding. We also design a logical constraint for the 'drive-over' obstacle by limiting its velocity to ensure safety. Furthermore, to address dynamic obstacles with motion uncertainty, we introduce a probabilistic risk field model, constructing risk-aware driving corridors that serve as linear collision constraints in OCP. Experimental results demonstrate the proposed framework's effectiveness in generating safe, efficient, and smooth trajectories for 4WIS vehicles, especially in constrained environments.
DSO-VSA: a Variable Stiffness Actuator with Decoupled Stiffness and Output Characteristics for Rehabilitation Robotics
Stroke-induced motor impairment often results in substantial loss of upper-limb function, creating a strong demand for rehabilitation robots that enable safe and transparent physical human-robot interaction (pHRI). Variable stiffness actuators are well suited for such applications. However, in most existing designs, stiffness is coupled with the deflection angle, complicating both modeling and control. To address this limitation, this paper presents a variable stiffness actuator featuring decoupled stiffness and output behavior for rehabilitation robotics. The system integrates a variable stiffness mechanism that combines a variable-length lever with a hypocycloidal straight-line mechanism to achieve a linear torque-deflection relationship and continuous stiffness modulation from near zero to theoretically infinite. It also incorporates a differential transmission mechanism based on a planetary gear system that enables dual-motor load sharing. A cascade PI controller is further developed on the basis of the differential configuration, in which the position-loop term jointly regulates stiffness and deflection angle, effectively suppressing stiffness fluctuations and output disturbances. The performance of prototype was experimentally validated through stiffness calibration, stiffness regulation, torque control, decoupled characteristics, and dual-motor load sharing, indicating the potential for rehabilitation exoskeletons and other pHRI systems.
CauTraj: A Causal-Knowledge-Guided Framework for Lane-Changing Trajectory Planning of Autonomous Vehicles
Enhancing the performance of trajectory planners for lane - changing vehicles is one of the key challenges in autonomous driving within human - machine mixed traffic. Most existing studies have not incorporated human drivers' prior knowledge when designing trajectory planning models. To address this issue, this study proposes a novel trajectory planning framework that integrates causal prior knowledge into the control process. Both longitudinal and lateral microscopic behaviors of vehicles are modeled to quantify interaction risk, and a staged causal graph is constructed to capture causal dependencies in lane-changing scenarios. Causal effects between the lane-changing vehicle and surrounding vehicles are then estimated using causal inference, including average causal effects (ATE) and conditional average treatment effects (CATE). These causal priors are embedded into a model predictive control (MPC) framework to enhance trajectory planning. The proposed approach is validated on naturalistic vehicle trajectory datasets. Experimental results show that: (1) causal inference provides interpretable and stable quantification of vehicle interactions; (2) individual causal effects reveal driver heterogeneity; and (3) compared with the baseline MPC, the proposed method achieves a closer alignment with human driving behaviors, reducing maximum trajectory deviation from 1.2 m to 0.2 m, lateral velocity fluctuation by 60%, and yaw angle variability by 50%. These findings provide methodological support for human-like trajectory planning and practical value for improving safety, stability, and realism in autonomous vehicle testing and traffic simulation platforms.
Offline Reinforcement Learning for End-to-End Autonomous Driving
End-to-end (E2E) autonomous driving models that take only camera images as input and directly predict a future trajectory are appealing for their computational efficiency and potential for improved generalization via unified optimization; however, persistent failure modes remain due to reliance on imitation learning (IL). While online reinforcement learning (RL) could mitigate IL-induced issues, the computational burden of neural rendering-based simulation and large E2E networks renders iterative reward and hyperparameter tuning costly. We introduce a camera-only E2E offline RL framework that performs no additional exploration and trains solely on a fixed simulator dataset. Offline RL offers strong data efficiency and rapid experimental iteration, yet is susceptible to instability from overestimation on out-of-distribution (OOD) actions. To address this, we construct pseudo ground-truth trajectories from expert driving logs and use them as a behavior regularization signal, suppressing imitation of unsafe or suboptimal behavior while stabilizing value learning. Training and closed-loop evaluation are conducted in a neural rendering environment learned from the public nuScenes dataset. Empirically, the proposed method achieves substantial improvements in collision rate and route completion compared with IL baselines. Our code will be available at [URL].
comment: 15 pages
Geometric-Photometric Event-based 3D Gaussian Ray Tracing
Event cameras offer a high temporal resolution over traditional frame-based cameras, which makes them suitable for motion and structure estimation. However, it has been unclear how event-based 3D Gaussian Splatting (3DGS) approaches could leverage fine-grained temporal information of sparse events. This work proposes a framework to address the trade-off between accuracy and temporal resolution in event-based 3DGS. Our key idea is to decouple the rendering into two branches: event-by-event geometry (depth) rendering and snapshot-based radiance (intensity) rendering, by using ray-tracing and the image of warped events. The extensive evaluation shows that our method achieves state-of-the-art performance on the real-world datasets and competitive performance on the synthetic dataset. Also, the proposed method works without prior information (e.g., pretrained image reconstruction models) or COLMAP-based initialization, is more flexible in the event selection number, and achieves sharp reconstruction on scene edges with fast training time. We hope that this work deepens our understanding of the sparse nature of events for 3D reconstruction. The code will be released.
comment: 15 pages, 10 figures, 5 tables
ChronoDreamer: Action-Conditioned World Model as an Online Simulator for Robotic Planning
We present ChronoDreamer, an action-conditioned world model for contact-rich robotic manipulation. Given a history of egocentric RGB frames, contact maps, actions, and joint states, ChronoDreamer predicts future video frames, contact distributions, and joint angles via a spatial-temporal transformer trained with MaskGIT-style masked prediction. Contact is encoded as depth-weighted Gaussian splat images that render 3D forces into a camera-aligned format suitable for vision backbones. At inference, predicted rollouts are evaluated by a vision-language model that reasons about collision likelihood, enabling rejection sampling of unsafe actions before execution. We train and evaluate on DreamerBench, a simulation dataset generated with Project Chrono that provides synchronized RGB, contact splat, proprioception, and physics annotations across rigid and deformable object scenarios. Qualitative results demonstrate that the model preserves spatial coherence during non-contact motion and generates plausible contact predictions, while the LLM-based judge distinguishes collision from non-collision trajectories.
SD2AIL: Adversarial Imitation Learning from Synthetic Demonstrations via Diffusion Models
Adversarial Imitation Learning (AIL) is a dominant framework in imitation learning that infers rewards from expert demonstrations to guide policy optimization. Although providing more expert demonstrations typically leads to improved performance and greater stability, collecting such demonstrations can be challenging in certain scenarios. Inspired by the success of diffusion models in data generation, we propose SD2AIL, which utilizes synthetic demonstrations via diffusion models. We first employ a diffusion model in the discriminator to generate synthetic demonstrations as pseudo-expert data that augment the expert demonstrations. To selectively replay the most valuable demonstrations from the large pool of (pseudo-) expert demonstrations, we further introduce a prioritized expert demonstration replay strategy (PEDR). The experimental results on simulation tasks demonstrate the effectiveness and robustness of our method. In particular, in the Hopper task, our method achieves an average return of 3441, surpassing the state-of-the-art method by 89. Our code will be available at https://github.com/positron-lpc/SD2AIL.
CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives SP
We introduce CRISP, a method that recovers simulatable human motion and scene geometry from monocular video. Prior work on joint human-scene reconstruction relies on data-driven priors and joint optimization with no physics in the loop, or recovers noisy geometry with artifacts that cause motion tracking policies with scene interactions to fail. In contrast, our key insight is to recover convex, clean, and simulation-ready geometry by fitting planar primitives to a point cloud reconstruction of the scene, via a simple clustering pipeline over depth, normals, and flow. To reconstruct scene geometry that might be occluded during interactions, we make use of human-scene contact modeling (e.g., we use human posture to reconstruct the occluded seat of a chair). Finally, we ensure that human and scene reconstructions are physically-plausible by using them to drive a humanoid controller via reinforcement learning. Our approach reduces motion tracking failure rates from 55.2\% to 6.9\% on human-centric video benchmarks (EMDB, PROX), while delivering a 43\% faster RL simulation throughput. We further validate it on in-the-wild videos including casually-captured videos, Internet videos, and even Sora-generated videos. This demonstrates CRISP's ability to generate physically-valid human motion and interaction environments at scale, greatly advancing real-to-sim applications for robotics and AR/VR.
comment: Project page: https://crisp-real2sim.github.io/CRISP-Real2Sim/
Flying robots for a smarter life
Innovative ideas are continuously emerging to produce better life conditions where essential human needs are supposed to be fulfilled with perfect scenarios leading us to propose modern strategies drawing the future of smart cities. In this context, flying robots are increasingly exploited in many fields to improve the quality of our life. This paper illustrates new designs of flying robots that could be used to perform a variety of advanced missions like investigating the state of high-power lines and manipulating cabling maintenance procedures when failures are detected, evaluating the state of the outer edge of sidewalks to color their partially or wholly erased parts, and spraying pesticides to trees or crops that are affected by different diseases. Creating such smart devices demands developing many other partial designs relying on AI-based algorithms, computer vision techniques, and embedded systems. A variety of techniques that we have recently developed in this field are presented here.
Collaborative Representation Learning for Alignment of Tactile, Language, and Vision Modalities AAAI 2026
Tactile sensing offers rich and complementary information to vision and language, enabling robots to perceive fine-grained object properties. However, existing tactile sensors lack standardization, leading to redundant features that hinder cross-sensor generalization. Moreover, existing methods fail to fully integrate the intermediate communication among tactile, language, and vision modalities. To address this, we propose TLV-CoRe, a CLIP-based Tactile-Language-Vision Collaborative Representation learning method. TLV-CoRe introduces a Sensor-Aware Modulator to unify tactile features across different sensors and employs tactile-irrelevant decoupled learning to disentangle irrelevant tactile features. Additionally, a Unified Bridging Adapter is introduced to enhance tri-modal interaction within the shared representation space. To fairly evaluate the effectiveness of tactile models, we further propose the RSS evaluation framework, focusing on Robustness, Synergy, and Stability across different methods. Experimental results demonstrate that TLV-CoRe significantly improves sensor-agnostic representation learning and cross-modal alignment, offering a new direction for multimodal tactile representation.
comment: Accepted by AAAI 2026
Fine-Tuning of Neural Network Approximate MPC without Retraining via Bayesian Optimization
Approximate model-predictive control (AMPC) aims to imitate an MPC's behavior with a neural network, removing the need to solve an expensive optimization problem at runtime. However, during deployment, the parameters of the underlying MPC must usually be fine-tuned. This often renders AMPC impractical as it requires repeatedly generating a new dataset and retraining the neural network. Recent work addresses this problem by adapting AMPC without retraining using approximated sensitivities of the MPC's optimization problem. Currently, this adaption must be done by hand, which is labor-intensive and can be unintuitive for high-dimensional systems. To solve this issue, we propose using Bayesian optimization to tune the parameters of AMPC policies based on experimental data. By combining model-based control with direct and local learning, our approach achieves superior performance to nominal AMPC on hardware, with minimal experimentation. This allows automatic and data-efficient adaptation of AMPC to new system instances and fine-tuning to cost functions that are difficult to directly implement in MPC. We demonstrate the proposed method in hardware experiments for the swing-up maneuver on an inverted cartpole and yaw control of an under-actuated balancing unicycle robot, a challenging control problem.
Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge NeurIPS
We present a vision-action policy that won 1st place in the 2025 BEHAVIOR Challenge - a large-scale benchmark featuring 50 diverse long-horizon household tasks in photo-realistic simulation, requiring bimanual manipulation, navigation, and context-aware decision making. Building on the Pi0.5 architecture, we introduce several innovations. Our primary contribution is correlated noise for flow matching, which improves training efficiency and enables correlation-aware inpainting for smooth action sequences. We also apply learnable mixed-layer attention and System 2 stage tracking for ambiguity resolution. Training employs multi-sample flow matching to reduce variance, while inference uses action compression and challenge-specific correction rules. Our approach achieves 26% q-score across all 50 tasks on both public and private leaderboards.
comment: 2025 NeurIPS Behavior Challenge 1st place solution
Event Spectroscopy: Event-based Multispectral and Depth Sensing using Structured Light
Uncrewed aerial vehicles (UAVs) are increasingly deployed in forest environments for tasks such as environmental monitoring and search and rescue, which require safe navigation through dense foliage and precise data collection. Traditional sensing approaches, including passive multispectral and RGB imaging, suffer from latency, poor depth resolution, and strong dependence on ambient light - especially under forest canopies. In this work, we present a novel event spectroscopy system that simultaneously enables high-resolution, low-latency depth reconstruction with integrated multispectral imaging using a single sensor. Depth is reconstructed using structured light, and by modulating the wavelength of the projected structured light, our system captures spectral information in controlled bands between 650 nm and 850 nm. We demonstrate up to $60\%$ improvement in RMSE over commercial depth sensors and validate the spectral accuracy against a reference spectrometer and commercial multispectral cameras, demonstrating comparable performance. A portable version limited to RGB (3 wavelengths) is used to collect real-world depth and spectral data from a Masoala Rainforest. We demonstrate the use of this prototype for color image reconstruction and material differentiation between leaves and branches using spectral and depth data. Our results show that adding depth (available at no extra effort with our setup) to material differentiation improves the accuracy by over $30\%$ compared to color-only method. Our system, tested in both lab and real-world rainforest environments, shows strong performance in depth estimation, RGB reconstruction, and material differentiation - paving the way for lightweight, integrated, and robust UAV perception and data collection in complex natural environments.
comment: This work has been accepted for publication in IEEE Robotics and Automation Letters
OW-Rep: Open World Object Detection with Instance Representation Learning WACV 2026
Open World Object Detection(OWOD) addresses realistic scenarios where unseen object classes emerge, enabling detectors trained on known classes to detect unknown objects and incrementally incorporate the knowledge they provide. While existing OWOD methods primarily focus on detecting unknown objects, they often overlook the rich semantic relationships between detected objects, which are essential for scene understanding and applications in open-world environments (e.g., open-world tracking and novel class discovery). In this paper, we extend the OWOD framework to jointly detect unknown objects and learn semantically rich instance embeddings, enabling the detector to capture fine-grained semantic relationships between instances. To this end, we propose two modules that leverage the rich and generalizable knowledge of Vision Foundation Models(VFMs) and can be integrated into open-world object detectors. First, the Unknown Box Refine Module uses instance masks from the Segment Anything Model to accurately localize unknown objects. The Embedding Transfer Module then distills instance-wise semantic similarities from VFM features to the detector's embeddings via a relaxed contrastive loss, enabling the detector to learn a semantically meaningful and generalizable instance feature. Extensive experiments show that our method significantly improves both unknown object detection and instance embedding quality, while also enhancing performance in downstream tasks such as open-world tracking.
comment: Accepted to WACV 2026. Our project website can be found at https://sunohlee.github.io/OW-Rep/
Multiagent Systems
Quantifying the Lifelong Impact of Resilience Interventions via Agent-Based LLM Simulation
Establishing the long-term, causal impact of psychological interventions on life outcomes is a grand challenge for the social sciences, caught between the limitations of correlational longitudinal studies and short-term randomized controlled trials (RCTs). This paper introduces Large-Scale Agent-based Longitudinal Simulation (LALS), a framework that resolves this impasse by simulating multi-decade, counterfactual life trajectories. The methodology employs a "digital clone" design where 2,500 unique LLM-based agent personas (grounded in a curated corpus of 3,917 empirical research articles) are each cloned across a 2x2 factorial experiment. Specifically, the simulation models the efficacy of extended psychological resilience training (Intervention vs. Control) either in childhood or as a young adult (age 6 vs. age 18). Comparing digital clones enables exceptionally precise causal inference. The simulation provides a quantitative, causal estimate of a resilience intervention's lifelong effects, revealing significant reductions in mortality, a lower incidence of dementia, and a substantial increase in accumulated wealth. Crucially, the results uncover a crucial developmental window: the intervention administered at age 6 produced more than double the positive impact on lifetime wealth compared to the same intervention at age 18. These benefits were most pronounced for agents from low-socioeconomic backgrounds, highlighting a powerful buffering effect. The LALS framework serves as a "computational wind tunnel" for social science, offering a new paradigm for generating and testing causal hypotheses about the complex, lifelong dynamics that shape human capital and well-being.
comment: 31 pages, 5 figures
MemEvolve: Meta-Evolution of Agent Memory Systems
Self-evolving memory systems are unprecedentedly reshaping the evolutionary paradigm of large language model (LLM)-based agents. Prior work has predominantly relied on manually engineered memory architectures to store trajectories, distill experience, and synthesize reusable tools, enabling agents to evolve on the fly within environment interactions. However, this paradigm is fundamentally constrained by the staticity of the memory system itself: while memory facilitates agent-level evolving, the underlying memory architecture cannot be meta-adapted to diverse task contexts. To address this gap, we propose MemEvolve, a meta-evolutionary framework that jointly evolves agents' experiential knowledge and their memory architecture, allowing agent systems not only to accumulate experience but also to progressively refine how they learn from it. To ground MemEvolve in prior research and foster openness in future self-evolving systems, we introduce EvolveLab, a unified self-evolving memory codebase that distills twelve representative memory systems into a modular design space (encode, store, retrieve, manage), providing both a standardized implementation substrate and a fair experimental arena. Extensive evaluations on four challenging agentic benchmarks demonstrate that MemEvolve achieves (I) substantial performance gains, improving frameworks such as SmolAgent and Flash-Searcher by up to $17.06\%$; and (II) strong cross-task and cross-LLM generalization, designing memory architectures that transfer effectively across diverse benchmarks and backbone models.
Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection
Large language model (LLM)-based multi-agent systems (MAS) have shown strong capabilities in solving complex tasks. As MAS become increasingly autonomous in various safety-critical tasks, detecting malicious agents has become a critical security concern. Although existing graph anomaly detection (GAD)-based defenses can identify anomalous agents, they mainly rely on coarse sentence-level information and overlook fine-grained lexical cues, leading to suboptimal performance. Moreover, the lack of interpretability in these methods limits their reliability and real-world applicability. To address these limitations, we propose XG-Guard, an explainable and fine-grained safeguarding framework for detecting malicious agents in MAS. To incorporate both coarse and fine-grained textual information for anomalous agent identification, we utilize a bi-level agent encoder to jointly model the sentence- and token-level representations of each agent. A theme-based anomaly detector further captures the evolving discussion focus in MAS dialogues, while a bi-level score fusion mechanism quantifies token-level contributions for explanation. Extensive experiments across diverse MAS topologies and attack scenarios demonstrate robust detection performance and strong interpretability of XG-Guard.
comment: 14 pages, 3 tables, 5 figures
A Multi-agent Text2SQL Framework using Small Language Models and Execution Feedback
Text2SQL, the task of generating SQL queries from natural language text, is a critical challenge in data engineering. Recently, Large Language Models (LLMs) have demonstrated superior performance for this task due to their advanced comprehension and generation capabilities. However, privacy and cost considerations prevent companies from using Text2SQL solutions based on external LLMs offered as a service. Rather, small LLMs (SLMs) that are openly available and can hosted in-house are adopted. These SLMs, in turn, lack the generalization capabilities of larger LLMs, which impairs their effectiveness for complex tasks such as Text2SQL. To address these limitations, we propose MATS, a novel Text2SQL framework designed specifically for SLMs. MATS uses a multi-agent mechanism that assigns specialized roles to auxiliary agents, reducing individual workloads and fostering interaction. A training scheme based on reinforcement learning aligns these agents using feedback obtained during execution, thereby maintaining competitive performance despite a limited LLM size. Evaluation results using on benchmark datasets show that MATS, deployed on a single- GPU server, yields accuracy that are on-par with large-scale LLMs when using significantly fewer parameters. Our source code and data are available at https://github.com/thanhdath/mats-sql.
DASH: Deception-Augmented Shared Mental Model for a Human-Machine Teaming System
We present DASH (Deception-Augmented Shared mental model for Human-machine teaming), a novel framework that enhances mission resilience by embedding proactive deception into Shared Mental Models (SMM). Designed for mission-critical applications such as surveillance and rescue, DASH introduces "bait tasks" to detect insider threats, e.g., compromised Unmanned Ground Vehicles (UGVs), AI agents, or human analysts, before they degrade team performance. Upon detection, tailored recovery mechanisms are activated, including UGV system reinstallation, AI model retraining, or human analyst replacement. In contrast to existing SMM approaches that neglect insider risks, DASH improves both coordination and security. Empirical evaluations across four schemes (DASH, SMM-only, no-SMM, and baseline) show that DASH sustains approximately 80% mission success under high attack rates, eight times higher than the baseline. This work contributes a practical human-AI teaming framework grounded in shared mental models, a deception-based strategy for insider threat detection, and empirical evidence of enhanced robustness under adversarial conditions. DASH establishes a foundation for secure, adaptive human-machine teaming in contested environments.
comment: 17 pages, 16 figures
Adaptive Accountability in Networked MAS: Tracing and Mitigating Emergent Norms at Scale
Large-scale networked multi-agent systems increasingly underpin critical infrastructure, yet their collective behavior can drift toward undesirable emergent norms that elude conventional governance mechanisms. We introduce an adaptive accountability framework that (i) continuously traces responsibility flows through a lifecycle-aware audit ledger, (ii) detects harmful emergent norms online via decentralized sequential hypothesis tests, and (iii) deploys local policy and reward-shaping interventions that realign agents with system-level objectives in near real time. We prove a bounded-compromise theorem showing that whenever the expected intervention cost exceeds an adversary's payoff, the long-run proportion of compromised interactions is bounded by a constant strictly less than one. Extensive high-performance simulations with up to 100 heterogeneous agents, partial observability, and stochastic communication graphs show that our framework prevents collusion and resource hoarding in at least 90% of configurations, boosts average collective reward by 12-18%, and lowers the Gini inequality index by up to 33% relative to a PPO baseline. These results demonstrate that a theoretically principled accountability layer can induce ethically aligned, self-regulating behavior in complex MAS without sacrificing performance or scalability.
ShibuyaSocial: Multi-scale Model of Pedestrian Flows in Scramble Crossing
This paper presents a learning-based model of pedestrian flows that integrates multi scale behaviors such as global route selection and local collision avoidance in urban spaces, particularly focusing on pedestrian movements at Shibuya scramble crossing. Since too much congestion of pedestrian flows can cause serious accidents, mathematically modeling and predicting pedestrian behaviors is important for preventing such accidents and providing a safe and comfortable environment. Although numerous studies have investigated learning-based modeling methods, most of them focus only on the local behavior of pedestrians, such as collision avoidance with neighbors and environmental objects. In an actual environment, pedestrian behavior involves more complicated decision making including global route selection. Moreover, a state transition from stopping to walking at a traffic light should be considered simultaneously. In this study, the proposed model integrates local behaviors with global route selection, using an Attention mechanism to ensure consistent global and local behavior predictions. We recorded video data of pedestrians at Shibuya scramble crossing and trained the proposed model using pedestrian walking trajectory data obtained from the video. Simulations of pedestrian behaviors based on the trained model qualitatively and quantitatively validated that the proposed model can appropriately predict pedestrian behaviors.
Learning to Design City-scale Transit Routes
Designing efficient transit route networks is an NP-hard problem with exponentially large solution spaces that traditionally relies on manual planning processes. We present an end-to-end reinforcement learning (RL) framework based on graph attention networks for sequential transit network construction. To address the long-horizon credit assignment challenge, we introduce a two-level reward structure combining incremental topological feedback with simulation-based terminal rewards. We evaluate our approach on a new real-world dataset from Bloomington, Indiana with topologically accurate road networks, census-derived demand, and existing transit routes. Our learned policies substantially outperform existing designs and traditional heuristics across two initialization schemes and two modal-split scenarios. Under high transit adoption with transit center initialization, our approach achieves 25.6% higher service rates, 30.9\% shorter wait times, and 21.0% better bus utilization compared to the real-world network. Under mixed-mode conditions with random initialization, it delivers 68.8% higher route efficiency than demand coverage heuristics and 5.9% lower travel times than shortest path construction. These results demonstrate that end-to-end RL can design transit networks that substantially outperform both human-designed systems and hand-crafted heuristics on realistic city-scale benchmarks.
Multi-Agent LLM Committees for Autonomous Software Beta Testing
Manual software beta testing is costly and time-consuming, while single-agent large language model (LLM) approaches suffer from hallucinations and inconsistent behavior. We propose a multi-agent committee framework in which diverse vision-enabled LLMs collaborate through a three-round voting protocol to reach consensus on testing actions. The framework combines model diversity, persona-driven behavioral variation, and visual user interface understanding to systematically explore web applications. Across 84 experimental runs with 9 testing personas and 4 scenarios, multi-agent committees achieve an 89.5 percent overall task success rate. Configurations with 2 to 4 agents reach 91.7 to 100 percent success, compared to 78.0 percent for single-agent baselines, yielding improvements of 13.7 to 22.0 percentage points. At the action level, the system attains a 93.1 percent success rate with a median per-action latency of 0.71 seconds, enabling real-time and continuous integration testing. Vision-enabled agents successfully identify user interface elements, with navigation and reporting achieving 100 percent success and form filling achieving 99.2 percent success. We evaluate the framework on WebShop and OWASP benchmarks, achieving 74.7 percent success on WebShop compared to a 50.1 percent published GPT-3 baseline, and 82.0 percent success on OWASP Juice Shop security testing with coverage of 8 of the 10 OWASP Top 10 vulnerability categories. Across 20 injected regressions, the committee achieves an F1 score of 0.91 for bug detection, compared to 0.78 for single-agent baselines. The open-source implementation enables reproducible research and practical deployment of LLM-based software testing in CI/CD pipelines.
A Modular Reference Architecture for MCP-Servers Enabling Agentic BIM Interaction
Agentic workflows driven by large language models (LLMs) are increasingly applied to Building Information Modelling (BIM), enabling natural-language retrieval, modification and generation of IFC models. Recent work has begun adopting the emerging Model Context Protocol (MCP) as a uniform tool-calling interface for LLMs, simplifying the agent side of BIM interaction. While MCP standardises how LLMs invoke tools, current BIM-side implementations are still authoring tool-specific and ad hoc, limiting reuse, evaluation, and workflow portability across environments. This paper addresses this gap by introducing a modular reference architecture for MCP servers that enables API-agnostic, isolated and reproducible agentic BIM interactions. From a systematic analysis of recurring capabilities in recent literature, we derive a core set of requirements. These inform a microservice architecture centred on an explicit adapter contract that decouples the MCP interface from specific BIM-APIs. A prototype implementation using IfcOpenShell demonstrates feasibility across common modification and generation tasks. Evaluation across representative scenarios shows that the architecture enables reliable workflows, reduces coupling, and provides a reusable foundation for systematic research.
comment: Submitted to the GNI Symposium on Artificial Intelligence for the Built World (Technical University of Munich, May 18--20, 2026)
Simulation of Crowd Egress with Environmental Stressors
This article introduces a modeling framework to characterize evacuee response to environmental stimuli during emergency egress. The model is developed in consistency with stress theory, which explains how an organism reacts to environmental stressors (e.g., alarm signals or hazardous factors such as smoke and fire). We integrate the theory into the well-known social force model, and develop a framework to simulate crowd evacuation behavior in multi-compartment layout in buildings. Our method serves as a theoretical basis to study crowd movement at bottlenecks, and simulate their herding behavior and way-finding activities in normal and hazardous conditions. The pre-movement behavior is also briefly investigated by using opinion dynamics with a social group model. The algorithms have been partly tested in FDS+EVAC as well as our simulation platform crowdEgress.
comment: 30 pages, 19 figures
Systems and Control (EESS)
Distribution Network Restoration with Mobile Resources Dispatch: A Simulation-Based Online Dynamic Programming Approach
Dispatching mobile resources such as repair crews and mobile emergency generators is essential for the rapid restoration of distribution systems after extreme events. However, the restoration process is affected by various uncertain factors including repair time, road condition, and newly observed failures, necessitating online decision-making in response to real-time information. This paper proposes a simulation-based online dynamic programming approach to provide real-time restoration actions considering the dispatch of mobile resources. Using an index-based priority rule as the base policy, the remaining cumulative loss at the current state and a given action is evaluated from online simulation. As the base policy is explicit, the simulation is efficient. Then, the action leading to the minimum cumulative loss is chosen. It is proven that such a strategy improves the performance of base policy. The proposed policy adapts to real-time information updates and does not rely on offline training, so incurs no data and convergence-related issues, which is important in restoration tasks where the historical data of extreme events is rare. The rolling optimization approach may not meet the requirement of online use, because routing mobile resources gives rise to large-scale discrete optimization problems. Case studies on 123-bus and 8500-bus systems demonstrate that the proposed method achieves higher efficiency and better performance compared with rolling horizon optimization.
Real-Time Remote Monitoring of Correlated Markovian Sources
We investigate real-time tracking of two correlated stochastic processes over a shared wireless channel. The joint evolution of the processes is modeled as a two-dimensional discrete-time Markov chain. Each process is observed by a dedicated sampler and independently reconstructed at a remote monitor according to a task-specific objective. Although both processes originate from a common underlying phenomenon (e.g., distinct features of the same source), each monitor is interested only in its corresponding feature. A reconstruction error is incurred when the true and reconstructed states mismatch at one or both monitors. To address this problem, we propose an error-aware joint sampling and transmission policy, under which each sampler probabilistically generates samples only when the current process state differs from the most recently reconstructed state at its corresponding monitor. We adopt the time-averaged reconstruction error as the primary performance metric and benchmark the proposed policy against state-of-the-art joint sampling and transmission schemes. For each policy, we derive closed-form expressions for the resulting time-averaged reconstruction error. We further formulate and solve an optimization problem that minimizes the time-averaged reconstruction error subject to an average sampling cost constraint. Analytical and numerical results demonstrate that the proposed error-aware policy achieves the minimum time-averaged reconstruction error among the considered schemes while efficiently utilizing the sampling budget. The performance gains are particularly pronounced in regimes with strong inter-process correlation and stringent tracking requirements, where frequent sampling by both samplers is necessary.
Smart nudging for efficient routing through networks
In this paper, we formulate the design of efficient digitalised deposit return schemes as a control problem. We focus on the recycling of paper cups, though the proposed methodology applies more broadly to reverse logistics systems arising in circular economy R-strategies. Each item is assumed to carry a digital wallet through which monetary rewards are allocated to actors transferring the item across successive stages, incentivising completion of the recycling process. System efficiency is ensured by: (i) decentralised algorithms that avoid congestion at individual nodes; (ii) a decentralised AIMD-based algorithm that optimally splits the deposit across layers; and (iii) a feedback control loop that dynamically adjusts the deposit to achieve a desired throughput. The effectiveness of the framework is demonstrated through extensive simulations using realistic paper cup recycling data.
Comparing Dynamical Models Through Diffeomorphic Vector Field Alignment
Dynamical systems models such as recurrent neural networks (RNNs) are increasingly popular in theoretical neuroscience for hypothesis-generation and data analysis. Evaluating the dynamics in such models is key to understanding their learned generative mechanisms. However, such evaluation is impeded by two major challenges: First, comparison of learned dynamics across models is difficult because there is no enforced equivalence of their coordinate systems. Second, identification of mechanistically important low-dimensional motifs (e.g., limit sets) is intractable in high-dimensional nonlinear models such as RNNs. Here, we propose a comprehensive framework to address these two issues, termed Diffeomorphic vector field alignment FOR learned Models (DFORM). DFORM learns a nonlinear coordinate transformation between the state spaces of two dynamical systems, which aligns their trajectories in a maximally one-to-one manner. In so doing, DFORM enables an assessment of whether two models exhibit topological equivalence, i.e., similar mechanisms despite differences in coordinate systems. A byproduct of this method is a means to locate dynamical motifs on low-dimensional manifolds embedded within higher-dimensional systems. We verified DFORM's ability to identify linear and nonlinear coordinate transformations using canonical topologically equivalent systems, RNNs, and systems related by nonlinear flows. DFORM was also shown to provide a quantification of similarity between topologically distinct systems. We then demonstrated that DFORM can locate important dynamical motifs including invariant manifolds and saddle limit sets within high-dimensional models. Finally, using a set of RNN models trained on human functional MRI (fMRI) recordings, we illustrated that DFORM can identify limit cycles from high-dimensional data-driven models, which agreed well with prior numerical analysis.
comment: 57 pages, 18 figures. For associated code, see https://github.com/rq-Chen/DFORM_stable
Distributionally Robust Multi-Agent Reinforcement Learning for Intelligent Traffic Control
Learning-based traffic signal control is typically optimized for average performance under a few nominal demand patterns, which can result in poor behavior under atypical traffic conditions. To address this, we develop a distributionally robust multi-agent reinforcement learning framework for signal control on a 3x3 urban grid calibrated from a contiguous 3x3 subarea of central Athens covered by the pNEUMA trajectory dataset (Barmpounakis and Geroliminis, 2020). Our approach proceeds in three stages. First, we train a baseline multi-agent RL controller in which each intersection is governed by a proximal policy optimization agent with discrete signal phases, using a centralized training, decentralized execution paradigm. Second, to capture demand uncertainty, we construct eight heterogeneous origin-destination-based traffic scenarios-one directly derived from pNEUMA and seven synthetically generated-to span a wide range of spatial and temporal demand patterns. Over this scenario set, we train a contextual-bandit worst-case estimator that assigns mixture weights to estimate adversarial demand distributions conditioned on context. Finally, without modifying the controller architecture, we fine-tune the baseline multi-agent reinforcement learning agents under these estimated worst-case mixtures to obtain a distributionally robust multi-agent reinforcement learning controller. Across all eight scenarios, as well as on an unseen validation network based on the Sioux Falls configuration, the distributionally robust multi-agent reinforcement learning controller consistently reduces horizon-averaged queues and increases average speeds relative to the baseline, achieving up to 51% shorter queues and 38% higher speeds on the worst-performing scenarios.
Improving the Accuracy of DC Optimal Power Flow Formulations via Parameter Optimization
DC Optimal Power Flow (DC-OPF) problems optimize the generators' active power setpoints while satisfying constraints based on the DC power flow linearization. The computational tractability advantages of DC-OPF problems come at the expense of inaccuracies relative to AC Optimal Power Flow (AC-OPF) problems that accurately model the nonlinear steady-state behavior of power grids. This paper proposes an algorithm that significantly improves the accuracy of the generators' active power setpoints from DC-OPF problems with respect to the corresponding AC-OPF problems over a specified range of operating conditions. Using sensitivity information in a machine learning-inspired methodology, this algorithm tunes coefficient and bias parameters in the DC power flow approximation to improve the accuracy of the resulting DC-OPF solutions. Employing the Truncated Newton Conjugate-Gradient (TNC) method, a Quasi-Newton optimization technique, this parameter tuning occurs during an offline training phase, with the resulting parameters then used in online computations. Numerical results underscore the algorithm's efficacy with accuracy improvements in squared two-norm and $\infty$-norm losses of up to $90\%$ and $79\%$, respectively, relative to traditional DC-OPF formulations.
Fine-Tuning of Neural Network Approximate MPC without Retraining via Bayesian Optimization
Approximate model-predictive control (AMPC) aims to imitate an MPC's behavior with a neural network, removing the need to solve an expensive optimization problem at runtime. However, during deployment, the parameters of the underlying MPC must usually be fine-tuned. This often renders AMPC impractical as it requires repeatedly generating a new dataset and retraining the neural network. Recent work addresses this problem by adapting AMPC without retraining using approximated sensitivities of the MPC's optimization problem. Currently, this adaption must be done by hand, which is labor-intensive and can be unintuitive for high-dimensional systems. To solve this issue, we propose using Bayesian optimization to tune the parameters of AMPC policies based on experimental data. By combining model-based control with direct and local learning, our approach achieves superior performance to nominal AMPC on hardware, with minimal experimentation. This allows automatic and data-efficient adaptation of AMPC to new system instances and fine-tuning to cost functions that are difficult to directly implement in MPC. We demonstrate the proposed method in hardware experiments for the swing-up maneuver on an inverted cartpole and yaw control of an under-actuated balancing unicycle robot, a challenging control problem.
Robotics
Systematic Benchmarking of SUMO Against Data-Driven Traffic Simulators
This paper presents a systematic benchmarking of the model-based microscopic traffic simulator SUMO against state-of-the-art data-driven traffic simulators using large-scale real-world datasets. Using the Waymo Open Motion Dataset (WOMD) and the Waymo Open Sim Agents Challenge (WOSAC), we evaluate SUMO under both short-horizon (8s) and long-horizon (60s) closed-loop simulation settings. To enable scalable evaluation, we develop Waymo2SUMO, an automated pipeline that converts WOMD scenarios into SUMO simulations. On the WOSAC benchmark, SUMO achieves a realism meta metric of 0.653 while requiring fewer than 100 tunable parameters. Extended rollouts show that SUMO maintains low collision and offroad rates and exhibits stronger long-horizon stability than representative data-driven simulators. These results highlight complementary strengths of model-based and data-driven approaches for autonomous driving simulation and benchmarking.
comment: Source code is available at https://github.com/LuminousLamp/SUMO-Benchmark
STORM: Search-Guided Generative World Models for Robotic Manipulation
We present STORM (Search-Guided Generative World Models), a novel framework for spatio-temporal reasoning in robotic manipulation that unifies diffusion-based action generation, conditional video prediction, and search-based planning. Unlike prior Vision-Language-Action (VLA) models that rely on abstract latent dynamics or delegate reasoning to language components, STORM grounds planning in explicit visual rollouts, enabling interpretable and foresight-driven decision-making. A diffusion-based VLA policy proposes diverse candidate actions, a generative video world model simulates their visual and reward outcomes, and Monte Carlo Tree Search (MCTS) selectively refines plans through lookahead evaluation. Experiments on the SimplerEnv manipulation benchmark demonstrate that STORM achieves a new state-of-the-art average success rate of 51.0 percent, outperforming strong baselines such as CogACT. Reward-augmented video prediction substantially improves spatio-temporal fidelity and task relevance, reducing Frechet Video Distance by over 75 percent. Moreover, STORM exhibits robust re-planning and failure recovery behavior, highlighting the advantages of search-guided generative world models for long-horizon robotic manipulation.
comment: Under submission
When Robots Say No: The Empathic Ethical Disobedience Benchmark
Robots must balance compliance with safety and social expectations as blind obedience can cause harm, while over-refusal erodes trust. Existing safe reinforcement learning (RL) benchmarks emphasize physical hazards, while human-robot interaction trust studies are small-scale and hard to reproduce. We present the Empathic Ethical Disobedience (EED) Gym, a standardized testbed that jointly evaluates refusal safety and social acceptability. Agents weigh risk, affect, and trust when choosing to comply, refuse (with or without explanation), clarify, or propose safer alternatives. EED Gym provides different scenarios, multiple persona profiles, and metrics for safety, calibration, and refusals, with trust and blame models grounded in a vignette study. Using EED Gym, we find that action masking eliminates unsafe compliance, while explanatory refusals help sustain trust. Constructive styles are rated most trustworthy, empathic styles -- most empathic, and safe RL methods improve robustness but also make agents more prone to overly cautious behavior. We release code, configurations, and reference policies to enable reproducible evaluation and systematic human-robot interaction research on refusal and trust. At submission time, we include an anonymized reproducibility package with code and configs, and we commit to open-sourcing the full repository after the paper is accepted.
comment: Accepted at the ACM/IEEE International Conference on Human-Robot Interaction (HRI 2026). This is a preprint of the author-accepted manuscript
AOMGen: Photoreal, Physics-Consistent Demonstration Generation for Articulated Object Manipulation
Recent advances in Vision-Language-Action (VLA) and world-model methods have improved generalization in tasks such as robotic manipulation and object interaction. However, Successful execution of such tasks depends on large, costly collections of real demonstrations, especially for fine-grained manipulation of articulated objects. To address this, we present AOMGen, a scalable data generation framework for articulated manipulation which is instantiated from a single real scan, demonstration and a library of readily available digital assets, yielding photoreal training data with verified physical states. The framework synthesizes synchronized multi-view RGB temporally aligned with action commands and state annotations for joints and contacts, and systematically varies camera viewpoints, object styles, and object poses to expand a single execution into a diverse corpus. Experimental results demonstrate that fine-tuning VLA policies on AOMGen data increases the success rate from 0% to 88.7%, and the policies are tested on unseen objects and layouts.
Learning Semantic Atomic Skills for Multi-Task Robotic Manipulation
While imitation learning has shown impressive results in single-task robot manipulation, scaling it to multi-task settings remains a fundamental challenge due to issues such as suboptimal demonstrations, trajectory noise, and behavioral multi-modality. Existing skill-based methods attempt to address this by decomposing actions into reusable abstractions, but they often rely on fixed-length segmentation or environmental priors that limit semantic consistency and cross-task generalization. In this work, we propose AtomSkill, a novel multi-task imitation learning framework that learns and leverages a structured Atomic Skill Space for composable robot manipulation. Our approach is built on two key technical contributions. First, we construct a Semantically Grounded Atomic Skill Library by partitioning demonstrations into variable-length skills using gripper-state keyframe detection and vision-language model annotation. A contrastive learning objective ensures the resulting skill embeddings are both semantically consistent and temporally coherent. Second, we propose an Action Generation module with Keypose Imagination, which jointly predicts a skill's long-horizon terminal keypose and its immediate action sequence. This enables the policy to reason about overarching motion goals and fine-grained control simultaneously, facilitating robust skill chaining. Extensive experiments in simulated and real-world environments show that AtomSkill consistently outperforms state-of-the-art methods across diverse manipulation tasks.
Dynamic Entropy Tuning in Reinforcement Learning Low-Level Quadcopter Control: Stochasticity vs Determinism
This paper explores the impact of dynamic entropy tuning in Reinforcement Learning (RL) algorithms that train a stochastic policy. Its performance is compared against algorithms that train a deterministic one. Stochastic policies optimize a probability distribution over actions to maximize rewards, while deterministic policies select a single deterministic action per state. The effect of training a stochastic policy with both static entropy and dynamic entropy and then executing deterministic actions to control the quadcopter is explored. It is then compared against training a deterministic policy and executing deterministic actions. For the purpose of this research, the Soft Actor-Critic (SAC) algorithm was chosen for the stochastic algorithm while the Twin Delayed Deep Deterministic Policy Gradient (TD3) was chosen for the deterministic algorithm. The training and simulation results show the positive effect the dynamic entropy tuning has on controlling the quadcopter by preventing catastrophic forgetting and improving exploration efficiency.
comment: This is the Author Accepted Manuscript version of a paper accepted for publication. The final published version is available via IEEE Xplore
Reinforcement Learning Position Control of a Quadrotor Using Soft Actor-Critic (SAC)
This paper proposes a new Reinforcement Learning (RL) based control architecture for quadrotors. With the literature focusing on controlling the four rotors' RPMs directly, this paper aims to control the quadrotor's thrust vector. The RL agent computes the percentage of overall thrust along the quadrotor's z-axis along with the desired Roll ($φ$) and Pitch ($θ$) angles. The agent then sends the calculated control signals along with the current quadrotor's Yaw angle ($ψ$) to an attitude PID controller. The PID controller then maps the control signals to motor RPMs. The Soft Actor-Critic algorithm, a model-free off-policy stochastic RL algorithm, was used to train the RL agents. Training results show the faster training time of the proposed thrust vector controller in comparison to the conventional RPM controllers. Simulation results show smoother and more accurate path-following for the proposed thrust vector controller.
comment: This is the Author Accepted Manuscript version of a paper accepted for publication. The final published version is available via IEEE Xplore
Deterministic Reconstruction of Tennis Serve Mechanics: From Aerodynamic Constraints to Internal Torques via Rigid-Body Dynamics
Most conventional studies on tennis serve biomechanics rely on phenomenological observations comparing professional and amateur players or, more recently, on AI-driven statistical analyses of motion data. While effective at describing \textit{what} elite players do, these approaches often fail to explain \textit{why} such motions are physically necessary from a mechanistic perspective. This paper proposes a deterministic, physics-based approach to the tennis serve using a 12-degree-of-freedom multi-segment model of the human upper body. Rather than fitting the model to motion capture data, we solve the inverse kinematics problem via trajectory optimization to rigorously satisfy the aerodynamic boundary conditions required for Flat, Slice, and Kick serves. We subsequently perform an inverse dynamics analysis based on the Principle of Virtual Power to compute the net joint torques. The simulation results reveal that while the kinematic trajectories for different serves may share visual similarities, the underlying kinetic profiles differ drastically. A critical finding is that joints exhibiting minimal angular displacement (kinematically ``quiet'' phases), particularly at the wrist, require substantial and highly time-varying torques to counteract gravitational loading and dynamic coupling effects. By elucidating the dissociation between visible kinematics and internal kinetics, this study provides a first-principles framework for understanding the mechanics of the tennis serve, moving beyond simple imitation of elite techniques.
comment: 9 figures
On The Computational Complexity for Minimizing Aerial Photographs for Full Coverage of a Planar Region
With the popularity of drone technologies, aerial photography have become prevalent in many daily scenarios such as environment monitoring, structure inspection, law enforcement etc. A central challenge in this domain is the efficient coverage of a target area with photographs that can entirely capture the region, while respecting constraints such as the image resolution, and limited number of pictures that can be taken. This work investigates the computational complexity of several fundamental problems arised from this challenge. By abstracting the aerial photography problem into the coverage problems in computational geometry, we demonstrate that most of these problems are in fact computationally intractable, with the implication that traditional algorithms cannot solve them efficiently. The intuitions of this work can extend beyond aerial photography to broader applications such as pesticide spraying, and strategic sensor placement.
Joint Learning of Depth, Pose, and Local Radiance Field for Large Scale Monocular 3D Reconstruction
Photorealistic 3-D reconstruction from monocular video collapses in large-scale scenes when depth, pose, and radiance are solved in isolation: scale-ambiguous depth yields ghost geometry, long-horizon pose drift corrupts alignment, and a single global NeRF cannot model hundreds of metres of content. We introduce a joint learning framework that couples all three factors and demonstrably overcomes each failure case. Our system begins with a Vision-Transformer (ViT) depth network trained with metric-scale supervision, giving globally consistent depths despite wide field-of-view variations. A multi-scale feature bundle-adjustment (BA) layer refines camera poses directly in feature space--leveraging learned pyramidal descriptors instead of brittle keypoints--to suppress drift on unconstrained trajectories. For scene representation, we deploy an incremental local-radiance-field hierarchy: new hash-grid NeRFs are allocated and frozen on-the-fly when view overlap falls below a threshold, enabling city-block-scale coverage on a single GPU. Evaluated on the Tanks and Temples benchmark, our method reduces Absolute Trajectory Error to 0.001-0.021 m across eight indoor-outdoor sequences--up to 18x lower than BARF and 2x lower than NoPe-NeRF--while maintaining sub-pixel Relative Pose Error. These results demonstrate that metric-scale, drift-free 3-D reconstruction and high-fidelity novel-view synthesis are achievable from a single uncalibrated RGB camera.
comment: 8 pages, 2 figures, 2 tables
Fractional-order Modeling for Nonlinear Soft Actuators via Particle Swarm Optimization
Modeling soft pneumatic actuators with high precision remains a fundamental challenge due to their highly nonlinear and compliant characteristics. This paper proposes an innovative modeling framework based on fractional-order differential equations (FODEs) to accurately capture the dynamic behavior of soft materials. The unknown parameters within the fractional-order model are identified using particle swarm optimization (PSO), enabling parameter estimation directly from experimental data without reliance on pre-established material databases or empirical constitutive laws. The proposed approach effectively represents the complex deformation phenomena inherent in soft actuators. Experimental results validate the accuracy and robustness of the developed model, demonstrating improvement in predictive performance compared to conventional modeling techniques. The presented framework provides a data-efficient and database-independent solution for soft actuator modeling, advancing the precision and adaptability of soft robotic system design.
comment: 6 pages, 4 figures
LLaViDA: A Large Language Vision Driving Assistant for Explicit Reasoning and Enhanced Trajectory Planning
Trajectory planning is a fundamental yet challenging component of autonomous driving. End-to-end planners frequently falter under adverse weather, unpredictable human behavior, or complex road layouts, primarily because they lack strong generalization or few-shot capabilities beyond their training data. We propose LLaViDA, a Large Language Vision Driving Assistant that leverages a Vision-Language Model (VLM) for object motion prediction, semantic grounding, and chain-of-thought reasoning for trajectory planning in autonomous driving. A two-stage training pipeline--supervised fine-tuning followed by Trajectory Preference Optimization (TPO)--enhances scene understanding and trajectory planning by injecting regression-based supervision, produces a powerful "VLM Trajectory Planner for Autonomous Driving." On the NuScenes benchmark, LLaViDA surpasses state-of-the-art end-to-end and other recent VLM/LLM-based baselines in open-loop trajectory planning task, achieving an average L2 trajectory error of 0.31 m and a collision rate of 0.10% on the NuScenes test set. The code for this paper is available at GitHub.
Alternating Minimization for Time-Shifted Synergy Extraction in Human Hand Coordination
Identifying motor synergies -- coordinated hand joint patterns activated at task-dependent time shifts -- from kinematic data is central to motor control and robotics. Existing two-stage methods first extract candidate waveforms (via SVD) and then select shifted templates using sparse optimization, requiring at least two datasets and complicating data collection. We introduce an optimization-based framework that jointly learns a small set of synergies and their sparse activation coefficients. The formulation enforces group sparsity for synergy selection and element-wise sparsity for activation timing. We develop an alternating minimization method in which coefficient updates decouple across tasks and synergy updates reduce to regularized least-squares problems. Our approach requires only a single data set, and simulations show accurate velocity reconstruction with compact, interpretable synergies.
comment: 7 pages, 5 figures
On Swarm Leader Identification using Probing Policies
Identifying the leader within a robotic swarm is crucial, especially in adversarial contexts where leader concealment is necessary for mission success. This work introduces the interactive Swarm Leader Identification (iSLI) problem, a novel approach where an adversarial probing agent identifies a swarm's leader by physically interacting with its members. We formulate the iSLI problem as a Partially Observable Markov Decision Process (POMDP) and employ Deep Reinforcement Learning, specifically Proximal Policy Optimization (PPO), to train the prober's policy. The proposed approach utilizes a novel neural network architecture featuring a Timed Graph Relationformer (TGR) layer combined with a Simplified Structured State Space Sequence (S5) model. The TGR layer effectively processes graph-based observations of the swarm, capturing temporal dependencies and fusing relational information using a learned gating mechanism to generate informative representations for policy learning. Extensive simulations demonstrate that our TGR-based model outperforms baseline graph neural network architectures and exhibits significant zero-shot generalization capabilities across varying swarm sizes and speeds different from those used during training. The trained prober achieves high accuracy in identifying the leader, maintaining performance even in out-of-training distribution scenarios, and showing appropriate confidence levels in its predictions. Real-world experiments with physical robots further validate the approach, confirming successful sim-to-real transfer and robustness to dynamic changes, such as unexpected agent disconnections.
comment: 13 pages, journal
Structural Integrality in Task Assignment and Path Finding via Total Unimodularity of Petri Net Models
Task Assignment and Path Finding (TAPF) concerns computing collision-free motions for multiple robots while jointly selecting goal locations. In this paper, safety is enforced by requiring unit-capacity traversal between successive intermediate markings, yielding coordination strategies that are valid independently of any specific time interpretation. Existing optimization-based approaches typically rely on time-expanded network-flow models, which result in large mixed-integer programs and limited scalability. We instead develop a Petri net (PN)-based optimization framework that exploits structural properties of the motion model to improve computational efficiency without explicit time expansion. When robot motion is modeled by strongly connected state-machine PNs, we show that, once the congestion level (equivalently, the synchronization depth) is fixed to an integer value, the resulting motion-planning constraint matrix is totally unimodular. Consequently, the corresponding LP relaxation admits integral optimal solutions for the motion variables. When the estimated congestion exceeds one, we introduce a synchronization-on-demand mechanism based on intermediate markings; for a fixed number of synchronization stages, the associated constraint matrices remain totally unimodular, thereby preserving integrality of the motion variables. Finally, we extend TAPF to Boolean specifications over regions of interest and propose a two-stage LP/mixed-integer linear programming (MILP) scheme in which integrality is confined to task-selection variables. Simulations on large benchmarks demonstrate substantial scalability improvements over time-expanded optimization baselines.
AnyNav: Visual Neuro-Symbolic Friction Learning for Off-road Navigation
Off-road navigation is critical for a wide range of field robotics applications from planetary exploration to disaster response. However, it remains a longstanding challenge due to unstructured environments and the inherently complex terrain-vehicle interactions. Traditional physics-based methods struggle to accurately capture the nonlinear dynamics underlying these interactions, while purely data-driven approaches often overfit to specific motion patterns, vehicle geometries, or platforms, limiting their generalization in diverse, real-world scenarios. To address these limitations, we introduce AnyNav, a vision-based friction estimation and navigation framework grounded in neuro-symbolic principles. Our approach integrates neural networks for visual perception with symbolic physical models for reasoning about terrain-vehicle dynamics. To enable self-supervised learning in real-world settings, we adopt the imperative learning paradigm, employing bilevel optimization to train the friction network through physics-based optimization. This explicit incorporation of physical reasoning substantially enhances generalization across terrains, vehicle types, and operational conditions. Leveraging the predicted friction coefficients, we further develop a physics-informed navigation system capable of generating physically feasible, time-efficient paths together with corresponding speed profiles. We demonstrate that AnyNav seamlessly transfers from simulation to real-world robotic platforms, exhibiting strong robustness across different four-wheeled vehicles and diverse off-road environments.
Py-DiSMech: A Scalable and Efficient Framework for Discrete Differential Geometry-Based Modeling and Control of Soft Robots
High-fidelity simulation has become essential to the design and control of soft robots, where large geometric deformations and complex contact interactions challenge conventional modeling tools. Recent advances in the field demand simulation frameworks that combine physical accuracy, computational scalability, and seamless integration with modern control and optimization pipelines. In this work, we present Py-DiSMech, a Python-based, open-source simulation framework for modeling and control of soft robotic structures grounded in the principles of Discrete Differential Geometry (DDG). By discretizing geometric quantities such as curvature and strain directly on meshes, Py-DiSMech captures the nonlinear deformation of rods, shells, and hybrid structures with high fidelity and reduced computational cost. The framework introduces (i) a fully vectorized NumPy implementation achieving order-of-magnitude speed-ups over existing geometry-based simulators; (ii) a penalty-energy-based fully implicit contact model that supports rod-rod, rod-shell, and shell-shell interactions; (iii) a natural-strain-based feedback-control module featuring a proportional-integral (PI) controller for shape regulation and trajectory tracking; and (iv) a modular, object-oriented software design enabling user-defined elastic energies, actuation schemes, and integration with machine-learning libraries. Benchmark comparisons demonstrate that Py-DiSMech substantially outperforms the state-of-the-art simulator Elastica in computational efficiency while maintaining physical accuracy. Together, these features establish Py-DiSMech as a scalable, extensible platform for simulation-driven design, control validation, and sim-to-real research in soft robotics.
comment: Software: https://github.com/structuresComp/dismech-python Supplementary Video: https://youtu.be/AfFcoAZR0Go
Iterative Compositional Data Generation for Robot Control
Collecting robotic manipulation data is expensive, making it impractical to acquire demonstrations for the combinatorially large space of tasks that arise in multi-object, multi-robot, and multi-environment settings. While recent generative models can synthesize useful data for individual tasks, they do not exploit the compositional structure of robotic domains and struggle to generalize to unseen task combinations. We propose a semantic compositional diffusion transformer that factorizes transitions into robot-, object-, obstacle-, and objective-specific components and learns their interactions through attention. Once trained on a limited subset of tasks, we show that our model can zero-shot generate high-quality transitions from which we can learn control policies for unseen task combinations. Then, we introduce an iterative self-improvement procedure in which synthetic data is validated via offline reinforcement learning and incorporated into subsequent training rounds. Our approach substantially improves zero-shot performance over monolithic and hard-coded compositional baselines, ultimately solving nearly all held-out tasks and demonstrating the emergence of meaningful compositional structure in the learned representations.
comment: Added experiments in Appendix A; no change to main paper
Vidar: Embodied Video Diffusion Model for Generalist Manipulation
Scaling general-purpose manipulation to new robot embodiments remains challenging: each platform typically needs large, homogeneous demonstrations, and end-to-end pixel-to-action pipelines may degenerate under background and viewpoint shifts. Based on previous advances in video-based robot control, we present Vidar, consisting of an embodied video diffusion model as the generalizable prior and a masked inverse dynamics model (MIDM) as the adapter. We leverage a video diffusion model pre-trained at Internet scale, and further continuously pre-train it for the embodied domain using 750K multi-view trajectories collected from three real-world robot platforms. For this embodied pre-training, we introduce a unified observation space that jointly encodes robot, camera, task, and scene contexts. The MIDM module learns action-relevant pixel masks without dense labels, grounding the prior into the target embodiment's action space while suppressing distractors. With only 20 minutes of human demonstrations on an unseen robot (1% of typical data), Vidar outperforms state-of-the-art baselines and generalizes to unseen tasks, backgrounds, and camera layouts. Our results suggest a scalable recipe for "one prior, many embodiments": strong, inexpensive video priors together with minimal on-robot alignment.
Peer-Aware Cost Estimation in Nonlinear General-Sum Dynamic Games for Mutual Learning and Intent Inference AAMAS 2026
Dynamic game theory is a powerful tool in modeling multi-agent interactions and human-robot systems. In practice, since the objective functions of both agents may not be explicitly known to each other, these interactions can be modeled as incomplete-information general-sum dynamic games. Solving for equilibrium policies for such games presents a major challenge, especially if the games involve nonlinear underlying dynamics. To simplify the problem, existing work often assumes that one agent is an expert with complete information about its peer, which can lead to biased estimates and failures in coordination. To address this challenge, we propose a nonlinear peer-aware cost estimation (N-PACE) algorithm for general-sum dynamic games. In N-PACE, using iterative linear quadratic (ILQ) approximation of dynamic games, each agent explicitly models the learning dynamics of its peer agent while inferring their objective functions and updating its own control policy accordingly in real time, which leads to unbiased and fast learning of the unknown objective function of the peer agent. Additionally, we demonstrate how N-PACE enables intent communication by explicitly modeling the peer's learning dynamics. Finally, we show how N-PACE outperforms baseline methods that disregard the learning behavior of the other agent, both analytically and using our case studies
comment: Extended version of our AAMAS 2026 accepted paper with an expanded appendix. Compared to the previous arXiv version, we add theoretical guarantees, additional experiments, new baselines, and more in-depth appendix discussion, along with a link to the GitHub repository
Human Centric General Physical Intelligence for Agile Manufacturing Automation
Agile human-centric manufacturing increasingly requires resilient robotic solutions that are capable of safe and productive interactions within unstructured environments of modern factories. While multi-modal sensor fusion provides comprehensive situational awareness yet robots must also contextualize their reasoning to achieve deep semantic understanding of complex scenes. Foundation model particularly Vision-Language-Action (VLA) models have emerged as promising approach on integrating diverse perceptual modalities and spatio-temporal reasoning abilities to ground physical actions to realize General Physical Intelligence (GPI) across various robotic embodiments. Although GPI has been conceptually discussed in literature but its pivotal role and practical deployment in agile manufacturing remain underexplored. To address this gap, this practical review systematically surveys recent advances in VLA models through the lens of GPI by offering comparative analysis of leading implementations and evaluating their industrial readiness via structured ablation study. The state of the art is organized into six thematic pillars including multisensory representation learning, sim2real transfer, planning and control, uncertainty and safety measures and benchmarking. Finally, the review highlights open challenges and future directions for integrating GPI into industrial ecosystems to align with the vision of Industry 5.0 for intelligent, adaptive and collaborative manufacturing ecosystem.
comment: Advanced Engineering Informatics
cVLA: Towards Efficient Camera-Space VLAs
Vision-Language-Action (VLA) models offer a compelling framework for tackling complex robotic manipulation tasks, but they are often expensive to train. In this paper, we propose a novel VLA approach that leverages the competitive performance of Vision Language Models (VLMs) on 2D images to directly infer robot end-effector poses in image frame coordinates. Unlike prior VLA models that output low-level controls, our model predicts trajectory waypoints, making it both more efficient to train and robot embodiment agnostic. Despite its lightweight design, our next-token prediction architecture effectively learns meaningful and executable robot trajectories. We further explore the underutilized potential of incorporating depth images, inference-time techniques such as decoding strategies, and demonstration-conditioned action generation. Our model is trained on a simulated dataset and exhibits strong sim-to-real transfer capabilities. We evaluate our approach using a combination of simulated and real data, demonstrating its effectiveness on a real robotic system.
comment: 20 pages, 10 figures; CoRL 2025 Workshop on Generalizable Priors for Robot Manipulation
Multiagent Systems
SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios
Existing benchmarks for AI coding agents focus on isolated, single-issue tasks such as fixing a bug or implementing a small feature. However, real-world software engineering is fundamentally a long-horizon endeavor: developers must interpret high-level requirements, plan coordinated changes across many files, and evolve codebases over multiple iterations while preserving existing functionality. We introduce SWE-EVO, a benchmark that evaluates agents on this long-horizon software evolution challenge. Constructed from release notes and version histories of seven mature open-source Python projects, Tool comprises 48 evolution tasks that require agents to implement multi-step modifications spanning an average of 21 files, validated against comprehensive test suites averaging 874 tests per instance. Experiments with state-of-the-art models reveal a striking capability gap: even GPT-5 with OpenHands achieves only a 21 percent resolution rate on Tool, compared to 65 percent on the single-issue SWE-Bench Verified. This demonstrates that current agents struggle with sustained, multi-file reasoning. We also propose Fix Rate, a fine-grained metric that captures partial progress toward solving these complex, long-horizon tasks.
Agent-Based Output Drift Detection for Breast Cancer Response Prediction in a Multisite Clinical Decision Support System
Modern clinical decision support systems can concurrently serve multiple, independent medical imaging institutions, but their predictive performance may degrade across sites due to variations in patient populations, imaging hardware, and acquisition protocols. Continuous surveillance of predictive model outputs offers a safe and reliable approach for identifying such distributional shifts without ground truth labels. However, most existing methods rely on centralized monitoring of aggregated predictions, overlooking site-specific drift dynamics. We propose an agent-based framework for detecting drift and assessing its severity in multisite clinical AI systems. To evaluate its effectiveness, we simulate a multi-center environment for output-based drift detection, assigning each site a drift monitoring agent that performs batch-wise comparisons of model outputs against a reference distribution. We analyse several multi-center monitoring schemes, that differ in how the reference is obtained (site-specific, global, production-only and adaptive), alongside a centralized baseline. Results on real-world breast cancer imaging data using a pathological complete response prediction model shows that all multi-center schemes outperform centralized monitoring, with F1-score improvements up to 10.3% in drift detection. In the absence of site-specific references, the adaptive scheme performs best, with F1-scores of 74.3% for drift detection and 83.7% for drift severity classification. These findings suggest that adaptive, site-aware agent-based drift monitoring can enhance reliability of multisite clinical decision support systems.
comment: Accepted at MICAD (Medical Imaging and Computer-Aided Diagnosis) 2025
Snowveil: A Framework for Decentralised Preference Discovery
Aggregating subjective preferences of a large group is a fundamental challenge in computational social choice, traditionally reliant on central authorities. To address the limitations of this model, this paper introduces Decentralised Preference Discovery (DPD), the problem of determining the collective will of an electorate under constraints of censorship resistance, partial information, and asynchronous communication. We propose Snowveil, a novel framework for this task. Snowveil uses an iterative, gossip-based protocol where voters repeatedly sample the preferences of a small, random subset of the electorate to progressively converge on a collective outcome. We demonstrate the framework's modularity by designing the Constrained Hybrid Borda (CHB), a novel aggregation rule engineered to balance broad consensus with strong plurality support, and provide a rigorous axiomatic analysis of its properties. By applying a potential function and submartingale theory, we develop a multi-level analytical method to show that the system almost surely converges to a stable, single-winner in finite time, a process that can then be iterated to construct a set of winning candidates for multi-winner scenarios. This technique is largely agnostic to the specific aggregation rule, requiring only that it satisfies core social choice axioms like Positive Responsiveness, thus offering a formal toolkit for a wider class of DPD protocols. Furthermore, we present a comprehensive empirical analysis through extensive simulation, validating Snowveil's $O(n)$ scalability. Overall, this work advances the understanding of how a stable consensus can emerge from subjective, complex, and diverse preferences in decentralised systems for large electorates.
On Swarm Leader Identification using Probing Policies
Identifying the leader within a robotic swarm is crucial, especially in adversarial contexts where leader concealment is necessary for mission success. This work introduces the interactive Swarm Leader Identification (iSLI) problem, a novel approach where an adversarial probing agent identifies a swarm's leader by physically interacting with its members. We formulate the iSLI problem as a Partially Observable Markov Decision Process (POMDP) and employ Deep Reinforcement Learning, specifically Proximal Policy Optimization (PPO), to train the prober's policy. The proposed approach utilizes a novel neural network architecture featuring a Timed Graph Relationformer (TGR) layer combined with a Simplified Structured State Space Sequence (S5) model. The TGR layer effectively processes graph-based observations of the swarm, capturing temporal dependencies and fusing relational information using a learned gating mechanism to generate informative representations for policy learning. Extensive simulations demonstrate that our TGR-based model outperforms baseline graph neural network architectures and exhibits significant zero-shot generalization capabilities across varying swarm sizes and speeds different from those used during training. The trained prober achieves high accuracy in identifying the leader, maintaining performance even in out-of-training distribution scenarios, and showing appropriate confidence levels in its predictions. Real-world experiments with physical robots further validate the approach, confirming successful sim-to-real transfer and robustness to dynamic changes, such as unexpected agent disconnections.
comment: 13 pages, journal
SafeSieve: From Heuristics to Experience in Progressive Pruning for LLM-based Multi-Agent Communication AAAI-2026
LLM-based multi-agent systems exhibit strong collaborative capabilities but often suffer from redundant communication and excessive token overhead. Existing methods typically enhance efficiency through pretrained GNNs or greedy algorithms, but often isolate pre- and post-task optimization, lacking a unified strategy. To this end, we present SafeSieve, a progressive and adaptive multi-agent pruning algorithm that dynamically refines the inter-agent communication through a novel dual-mechanism. SafeSieve integrates initial LLM-based semantic evaluation with accumulated performance feedback, enabling a smooth transition from heuristic initialization to experience-driven refinement. Unlike existing greedy Top-k pruning methods, SafeSieve employs 0-extension clustering to preserve structurally coherent agent groups while eliminating ineffective links. Experiments across benchmarks (SVAMP, HumanEval, etc.) showcase that SafeSieve achieves 94.01% average accuracy while reducing token usage by 12.4%-27.8%. Results further demonstrate robustness under prompt injection attacks (1.23% average accuracy drop). In heterogeneous settings, SafeSieve reduces deployment costs by 13.3% while maintaining performance. These results establish SafeSieve as an efficient, GPU-free, and scalable framework for practical multi-agent systems. Our code can be found here: https://github.com/csgen/SafeSieve
comment: AAAI-2026 poster; 7 pages for main content, 5 figures, 4 tables
JustAct+: A Framework for Auditable Multi-Agent Systems Regulated by Inter-Organisational Policies
In open multi-agent agent systems that cross organisational boundaries, agent actions must be regulated by complex policies. Consider medical data processing systems, which must observe generic laws (e.g., EU data protection regulations) and also specific participants' resource conditions (e.g., Bob consents to sharing his X-Rays with EU hospitals). Presently, we address the implementation of these systems as distributed software. Solutions to key sub-problems are available: existing policy languages capture the necessary normative concepts and formalise the computational representation and reasoning about policies, and existing distributed algorithms and protocols coordinate agents' changing actions and policies. But which policies and protocols are useful in application? With the JustAct framework, we characterise a class of multi-agent systems where actors justify their actions with sufficient policy information collected from dynamic policy statements and agreements. We prove key properties of these systems, e.g., any decision that an action is permitted now cannot be refuted later, regardless of any added statements or updated agreements. We study a particular instance of the framework by specifying (in Rocq) and implementing (in Rust) a particular policy language and runtime system for mediating agent communications. We demonstrate and assess JustAct via a case study of this implementation: we reproduce the usage scenarios of Brane, an existing policy-regulated, inter-domain, medical data processing system.
Systems and Control (EESS)
Scaling up Stability: Reinforcement Learning for Distributed Control of Networked Systems in the Space of Stabilizing Policies
We study distributed control of networked systems through reinforcement learning, where neural policies must be simultaneously scalable, expressive and stabilizing. We introduce a policy parameterization that embeds Graph Neural Networks (GNNs) into a Youla-like magnitude-direction parameterization, yielding distributed stochastic controllers that guarantee network-level closed-loop stability by design. The magnitude is implemented as a stable operator consisting of a GNN acting on disturbance feedback, while the direction is a GNN acting on local observations. We prove robustness of the closed loop to perturbations in both the graph topology and model parameters, and show how to integrate our parameterization with Proximal Policy Optimization. Experiments on a multi-agent navigation task show that policies trained on small networks transfer directly to larger ones and unseen network topologies, achieve higher returns and lower variance than a state-of-the-art MARL baseline while preserving stability.
The Illusion of Consistency: Selection-Induced Bias in Gated Kalman Innovation Statistics
Validation gating is a fundamental component of classical Kalman-based tracking systems. Only measurements whose normalized innovation squared (NIS) falls below a prescribed threshold are considered for state update. While this procedure is statistically motivated by the chi-square distribution, it implicitly replaces the unconditional innovation process with a conditionally observed one, restricted to the validation event. This paper shows that innovation statistics computed after gating converge to gate-conditioned rather than nominal quantities. Under classical linear--Gaussian assumptions, we derive exact expressions for the first- and second-order moments of the innovation conditioned on ellipsoidal gating, and show that gating induces a deterministic, dimension-dependent contraction of the innovation covariance. The analysis is extended to NN association, which is shown to act as an additional statistical selection operator. We prove that selecting the minimum-norm innovation among multiple in-gate measurements introduces an unavoidable energy contraction, implying that nominal innovation statistics cannot be preserved under nontrivial gating and association. Closed-form results in the two-dimensional case quantify the combined effects and illustrate their practical significance.
comment: 8 pages, preprint
Sink Proximity: A Novel Approach for Online Vehicle Dispatch in Ride-hailing
Ride-hailing platforms have a profound impact on urban transportation systems, and their performance largely depends on how intelligently they dispatch vehicles in real time. In this work, we develop a new approach to online vehicle dispatch that strengthens a platform's ability to serve more requests under demand uncertainty. We introduce a novel measure called sink proximity, a network-science-inspired measure that captures how demand and vehicle flows are likely to evolve across the city. By integrating this measure into a shareability-network framework, we design an online dispatch algorithm that naturally considers future network states, without depending on fragile spatiotemporal forecasts. Numerical studies demonstrate that our proposed solution significantly improves the request service rate under peak hours within a receding horizon framework with limited future information available.
Prioritized Constraints in Optimization-Based Control
We provide theoretical foundations and computational tools for the systematic design of optimization-based control laws with constraints that have different priorities. By introducing the concept of prioritized intersections, we extend and unify previous work on the topic. Moreover, to enable the use of prioritized intersection in real-time applications, we propose an efficient solver for forming such intersections for polyhedral constraints. The solver in question is a tailored implementation of a dual active-set quadratic programming solver that leverages the particular problem structure of the optimization problems arising for prioritized intersections. The method is validated in a real-time MPC application for autonomous driving, where it successfully resolves six different levels of conflicting constraints, confirming its efficiency and practicality for control. Furthermore, we show that the proposed solver outperforms existing solvers for hierarchical quadratic programming, making it relevant beyond control applications.
A Distributed Hierarchical Spatio-Temporal Edge-Enhanced Graph Neural Network for City-Scale Dynamic Logistics Routing
City-scale logistics routing has become increasingly challenging as metropolitan road networks grow to tens of millions of edges and traffic conditions evolve rapidly under high-volume mobility demands. Conventional centralized routing algorithms and monolithic graph neural network (GNN) models suffer from limited scalability, high latency, and poor real-time adaptability, which restricts their effectiveness in large urban logistics systems. To address these challenges, this paper proposes a Distributed Hierarchical Spatio-Temporal Edge-Enhanced Graph Neural Network (HSTE-GNN) for dynamic routing over ultra-large road networks. The framework partitions the city-scale graph into regional subgraphs processed in parallel across distributed computing nodes, enabling efficient learning of localized traffic dynamics. Within each region, an edge-enhanced spatio-temporal module jointly models node states, dynamic edge attributes, and short-term temporal dependencies. A hierarchical coordination layer further aggregates cross-region representations through an asynchronous parameter-server mechanism, ensuring global routing coherence under high-frequency traffic updates. This distributed hierarchical design balances local responsiveness with global consistency, significantly improving scalability and inference efficiency. Experiments on real-world large-scale traffic datasets from Beijing and New York demonstrate that HSTE-GNN outperforms strong spatio-temporal baselines such as ST-GRAPH, achieving 34.9% lower routing delay, 14.7% lower MAPE, and 11.8% lower RMSE, while improving global route consistency by 7.3%. These results confirm that the proposed framework provides a scalable, adaptive, and efficient solution for next-generation intelligent transportation systems and large-scale logistics platforms.
On Hyperexponential Stabilization of Linear Infinite-Dimensional Systems
This paper study the hyperexponential stabilization for infinite-dimensional system on Hilbert space by a distributed time depending control law. The well-posedness of the closed loop for every time is obtained through the use of maximal monotone operator. The hyperexponential stability and ISS property of the closed loop is established using Lyapunov analysis and time scale transformation.
Virtual Resistance-Based Control for Grid-Connected Inverters using Persidskii Systems Approach
This work addresses virtual resistance (VR)based control for grid-connected inverters, which enhances transient damping, reduces steady-state errors, and improves robustness to grid disturbances without requiring additional voltage sensors. Classical passivity-based VR control is robust, but limited by restrictive sector bounds on nonlinearities. We extend these bounds and model the closed-loop system as a generalized Persidskii-type nonlinear system. Using this framework, we derive input-to-state stability (ISS) conditions that account for the extended nonlinearities and external disturbances, providing a systematic and less conservative approach to VR control design under practical operating conditions, which is validated through extensive simulations.
Why Most Optimism Bandit Algorithms Have the Same Regret Analysis: A Simple Unifying Theorem
Several optimism-based stochastic bandit algorithms -- including UCB, UCB-V, linear UCB, and finite-arm GP-UCB -- achieve logarithmic regret using proofs that, despite superficial differences, follow essentially the same structure. This note isolates the minimal ingredients behind these analyses: a single high-probability concentration condition on the estimators, after which logarithmic regret follows from two short deterministic lemmas describing radius collapse and optimism-forced deviations. The framework yields unified, near-minimal proofs for these classical algorithms and extends naturally to many contemporary bandit variants.
Neural Proofs for Sound Verification and Control of Complex Systems
This informal contribution presents an ongoing line of research that is pursuing a new approach to the construction of sound proofs for the formal verification and control of complex stochastic models of dynamical systems, of reactive programs and, more generally, of models of Cyber-Physical Systems. Neural proofs are made up of two key components: 1) proof rules encode requirements entailing the verification of general temporal specifications over the models of interest; and 2) certificates that discharge such rules, namely they are constructed from said proof rules with an inductive (that is, cyclic, repetitive) approach; this inductive approach involves: 2a) accessing samples from the model's dynamics and accordingly training neural networks, whilst 2b) generalising such networks via SAT-modulo-theory (SMT) queries that leverage the full knowledge of the models. In the context of sequential decision making problems over complex stochastic models, it is possible to additionally generate provably-correct policies/strategies/controllers, namely state-feedback functions that, in conjunction with neural certificates, formally attain the given specifications for the models of interest.
Robust H-infinity control under stochastic requirements: minimizing conditional value-at-risk instead of worst-case performance
Conventional robust $\mathcal H_2/\mathcal H_\infty$ control minimizes the worst-case performance, often leading to a conservative design driven by very rare and somewhat arbitrary parametric configurations. To reduce this conservatism while taking advantage of the stochastic properties of Monte-Carlo sampling and its compatibility with parallel computing, we introduce an alternative paradigm that optimizes the controller with respect to a stochastic criterion, namely the conditional value at risk. We illustrate the potential of this approach on a realistic satellite benchmark, showing that it can significantly improve overall performance by tolerating some degradation in very rare worst-case scenarios.
comment: Preprint
State-Space Averaging Revisited via Reconstruction Operators
This paper presents an operator-theoretic reconstruction of an equivalent continuous-time LTI model from an exact sampled-data (Poincaré-map) baseline of a piecewise-linear switching system. The rebuilding is explicitly expressed via matrix logarithms. By expanding the logarithm of a product of matrix exponentials using the Baker--Campbell--Hausdorff (BCH) formula, we show that the classical state-space averaging (SSA) model can be interpreted as the leading-order truncation of this exact reconstruction when the switching period is small and the ripple is small. The same view explains why SSA critically relies on low-frequency and small-ripple assumptions, and why the method becomes fragile for converters with more than two subintervals per cycle. Finally, we provide a complexity-reduced, SSA-flavoured implementation strategy for obtaining the required spectral quantities and a real-valued logarithm without explicitly calling eigen-decomposition or complex matrix logarithms, by exploiting $2\times 2$ invariants and a minimal real-lift construction.
Trustworthy and Explainable Deep Reinforcement Learning for Safe and Energy-Efficient Process Control: A Use Case in Industrial Compressed Air Systems
This paper presents a trustworthy reinforcement learning approach for the control of industrial compressed air systems. We develop a framework that enables safe and energy-efficient operation under realistic boundary conditions and introduce a multi-level explainability pipeline combining input perturbation tests, gradient-based sensitivity analysis, and SHAP (SHapley Additive exPlanations) feature attribution. An empirical evaluation across multiple compressor configurations shows that the learned policy is physically plausible, anticipates future demand, and consistently respects system boundaries. Compared to the installed industrial controller, the proposed approach reduces unnecessary overpressure and achieves energy savings of approximately 4\,\% without relying on explicit physics models. The results further indicate that system pressure and forecast information dominate policy decisions, while compressor-level inputs play a secondary role. Overall, the combination of efficiency gains, predictive behavior, and transparent validation supports the trustworthy deployment of reinforcement learning in industrial energy systems.
Power Converter DC Link Ripple and Network Unbalance as Active Constraints in Distribution System Optimal Power Flow
The mitigation of unbalanced grid voltages or currents by voltage source converters results in power ripple on the dc link, and is a key converter design parameter due to hardware or stability considerations. Despite the importance of this issue for system design and operation, the use of Optimal Power Flow (OPF)-based methods capturing the interaction between dc link ripple and converter unbalanced operation has been largely unexplored. In this work, the magnitude of the power ripple is derived for generic multi-terminal converters, then introduced as a bilinear OPF constraint for two-level converter topologies. OPF case studies demonstrate the necessity to model both neutral current and dc link ripple, with tradeoffs between capacitor sizing and leg sizing highlighted for phase current unbalance mitigation applications. Time domain simulations of a grid-connected four-wire voltage source converter verify the accuracy and validity of the algebraic formulation. It is concluded that awareness of dc link ripple impacts and constraints will be of growing importance for distribution system operators.
comment: This work has been submitted to the IEEE for possible publication
Regularized Distributed MPC for UAV Networks: Stabilizing Coupled Motion and Hybrid Beam Alignment
This letter investigates the coupled control problem in UAV networks utilizing high-frequency hybrid beamsteering. While phased arrays enable rapid electronic scanning, their finite Field of View (FoV) imposes a fundamental constraint that necessitates active mechanical steering of the airframe to maintain connectivity. We propose a decentralized Model Predictive Control (MPC) framework that jointly optimizes trajectory and heading to maximize network sum-capacity subject to safety constraints. Addressing the numerical instability caused by fast-fading channel nulls, we introduce a regularized surrogate cost function based on discrete spatial smoothing. We analytically prove that this approximation bounds the cost curvature, restoring the Lipschitz continuity of the gradient. Crucially, we derive a sufficient condition linking this Lipschitz constant to the controller gain, guaranteeing the contraction and linear convergence of the distributed best-response dynamics. Simulation results demonstrate that the proposed algorithm effectively navigates the trade-off between electronic beam tracking and kinematic safety, significantly systematically outperforming velocity-aligned baselines.
comment: Submitted to IEEE Control Systems Letters (LCSS). 6 pages, 3 figures
FedWiLoc: Federated Learning for Privacy-Preserving WiFi Indoor Localization
Current data-driven Wi-Fi-based indoor localization systems face three critical challenges: protecting user privacy, achieving accurate predictions in dynamic multipath environments, and generalizing across different deployments. Traditional Wi-Fi localization systems often compromise user privacy, particularly when facing compromised access points (APs) or man-in-the-middle attacks. As IoT devices proliferate in indoor environments, developing solutions that deliver accurate localization while robustly protecting privacy has become imperative. We introduce FedWiLoc, a privacy-preserving indoor localization system that addresses these challenges through three key innovations. First, FedWiLoc employs a split architecture where APs process Channel State Information (CSI) locally and transmit only privacy-preserving embedding vectors to user devices, preventing raw CSI exposure. Second, during training, FedWiLoc uses federated learning to collaboratively train the model across APs without centralizing sensitive user data. Third, we introduce a geometric loss function that jointly optimizes angle-of-arrival predictions and location estimates, enforcing geometric consistency to improve accuracy in challenging multipath conditions. Extensive evaluation across six diverse indoor environments spanning over 2,000 sq. ft. demonstrates that FedWiLoc outperforms state-of-the-art methods by up to 61.9% in median localization error while maintaining strong privacy guarantees throughout both training and inference.
Structural Integrality in Task Assignment and Path Finding via Total Unimodularity of Petri Net Models
Task Assignment and Path Finding (TAPF) concerns computing collision-free motions for multiple robots while jointly selecting goal locations. In this paper, safety is enforced by requiring unit-capacity traversal between successive intermediate markings, yielding coordination strategies that are valid independently of any specific time interpretation. Existing optimization-based approaches typically rely on time-expanded network-flow models, which result in large mixed-integer programs and limited scalability. We instead develop a Petri net (PN)-based optimization framework that exploits structural properties of the motion model to improve computational efficiency without explicit time expansion. When robot motion is modeled by strongly connected state-machine PNs, we show that, once the congestion level (equivalently, the synchronization depth) is fixed to an integer value, the resulting motion-planning constraint matrix is totally unimodular. Consequently, the corresponding LP relaxation admits integral optimal solutions for the motion variables. When the estimated congestion exceeds one, we introduce a synchronization-on-demand mechanism based on intermediate markings; for a fixed number of synchronization stages, the associated constraint matrices remain totally unimodular, thereby preserving integrality of the motion variables. Finally, we extend TAPF to Boolean specifications over regions of interest and propose a two-stage LP/mixed-integer linear programming (MILP) scheme in which integrality is confined to task-selection variables. Simulations on large benchmarks demonstrate substantial scalability improvements over time-expanded optimization baselines.
Any-Time Regret-Guaranteed Algorithm for Control of Linear Quadratic Systems
We propose a computationally efficient algorithm that achieves anytime regret of order $\mathcal{O}(\sqrt{t})$, with explicit dependence on the system dimensions and on the solution of the Discrete Algebraic Riccati Equation (DARE). Our approach uses an appropriately tuned regularization and a sufficiently accurate initial estimate to construct confidence ellipsoids for control design. A carefully designed input-perturbation mechanism is incorporated to ensure anytime performance. We develop two variants of the algorithm. The first enforces strong sequential stability, requiring each policy to be stabilizing and successive policies to remain close. This sequential condition helps prevent state explosion at policy update times; however, it results in a suboptimal regret scaling with respect to the DARE solution. Motivated by this limitation, we introduce a second class of algorithms that removes this requirement and instead requires only that each generated policy be stabilizing. Closed-loop stability is then preserved through a dwell-time inspired policy-update rule. This class of algorithms also addresses key shortcomings of most existing approaches which lack explicit high-probability bounds on the state trajectory expressed in system-theoretic terms. Our analysis shows that partially relaxing the sequential-stability requirement yields optimal regret. Finally, our method eliminates the need for any \emph{a priori} bound on the norm of the DARE solution, an assumption required by all existing computationally efficient OFU based algorithms.
PowerMamba: A Deep State Space Model and Comprehensive Benchmark for Time Series Prediction in Electric Power Systems
The electricity sector is undergoing substantial transformations due to the rising electrification of demand, enhanced integration of renewable energy resources, and the emergence of new technologies. These changes are rendering the electric grid more volatile and unpredictable, making it difficult to maintain reliable operations. In order to address these issues, advanced time series prediction models are needed for closing the gap between the forecasted and actual grid outcomes. In this paper, we introduce a multivariate time series prediction model that combines traditional state space models with deep learning methods to simultaneously capture and predict the underlying dynamics of multiple time series. Additionally, we design a time series processing module that incorporates high-resolution external forecasts into sequence-to-sequence prediction models, achieving this with negligible increases in size and no loss of accuracy. We also release an extended dataset spanning five years of load, electricity price, ancillary service price, and renewable generation. To complement this dataset, we provide an open-access toolbox that includes our proposed model, the dataset itself, and several state-of-the-art prediction models, thereby creating a unified framework for benchmarking advanced machine learning approaches. Our findings indicate that the proposed model outperforms existing models across various prediction tasks, improving state-of-the-art prediction error by an average of 7% and decreasing model parameters by 43%.
comment: This paper has been accepted for publication in the Journal of IEEE Transactions on Power Systems
An Architecture for Remote Container Builds and Artifact Delivery Using a Controller-Light Jenkins CI/CD Pipeline
Resource-intensive builds are often executed directly on the controller by conventional Jenkins installations, which can lower reliability and overload system resources. Jenkins functions as a containerized controller with persistent volumes in the controller-light CI/CD framework presented in this paper, delegating difficult build and packaging tasks to a remote Docker host. The controller container maintains secure SSH connections to remote compute nodes while focusing solely on orchestration and reporting. Atomic deployments with time-stamped backups, containerized build environments, immutable artifact packaging, and automated notifications are all included in the system. Faster build throughput, reduced CPU and RAM consumption on the controller, and reduced artifact delivery latency are all revealed by experimental evaluation. For small and medium-sized DevOps businesses looking for scalable automation without adding orchestration complexity, this method offers a repeatable, low-maintenance solution.
comment: v2: revised writing and presentation; clarified contributions and experimental discussion
Closing the Loop Inside Neural Networks: Causality-Guided Layer Adaptation for Fault Recovery Control
This paper studies the problem of real-time fault recovery control for nonlinear control-affine systems subject to actuator loss of effectiveness faults and external disturbances. We derive a two-stage framework that combines causal inference with selective online adaptation to achieve an effective learning-based recovery control method. In the offline phase, we develop a causal layer attribution technique based on the average causal effect (ACE) to evaluate the relative importance of each layer in a pretrained deep neural network (DNN) controller compensating for faults. This methodology identifies a subset of high-impact layers responsible for robust fault compensation. In the online phase, we deploy a Lyapunov-based gradient update to adapt only the ACE-selected layer to circumvent the need for full-network or last-layer only updates. The proposed adaptive controller guarantees uniform ultimate boundedness (UUB) with exponential convergence of the closed-loop system in the presence of actuator faults and external disturbances. Compared to conventional adaptive DNN controllers with full-network adaptation, our methodology has a reduced computational overhead. To demonstrate the effectiveness of our proposed methodology, a case study is provided on a 3-axis attitude control system of a spacecraft with four reaction wheels.
Informativity Conditions for Multiple Signals: Properties, Experimental Design, and Applications (extended version)
Recent studies highlight the importance of persistently exciting condition in single signal sequence for model identification and data-driven control methodologies. However, maintaining prolonged excitation in control signals introduces significant challenges, as continuous excitation can reduce the lifetime of mechanical devices. In this paper, we introduce three informativity conditions for various types of multi-signal data, each augmented by weight factors. We explore the interrelations between these conditions and their rank properties in linear time-invariant systems. Furthermore, we introduce open-loop experimental design methods tailored to each of the three conditions, which can synthesize the required excitation conditions either offline or online, even in the presence of limited information within each signal segment. We demonstrate the effectiveness of these informativity conditions in least-squares identification. Additionally, all three conditions can extend Willems' fundamental lemma and are utilized to assess the properties of the system. Illustrative examples confirm that these conditions yield satisfactory outcomes in both least-squares identification and the construction of data-driven controllers.
Peer-Aware Cost Estimation in Nonlinear General-Sum Dynamic Games for Mutual Learning and Intent Inference AAMAS 2026
Dynamic game theory is a powerful tool in modeling multi-agent interactions and human-robot systems. In practice, since the objective functions of both agents may not be explicitly known to each other, these interactions can be modeled as incomplete-information general-sum dynamic games. Solving for equilibrium policies for such games presents a major challenge, especially if the games involve nonlinear underlying dynamics. To simplify the problem, existing work often assumes that one agent is an expert with complete information about its peer, which can lead to biased estimates and failures in coordination. To address this challenge, we propose a nonlinear peer-aware cost estimation (N-PACE) algorithm for general-sum dynamic games. In N-PACE, using iterative linear quadratic (ILQ) approximation of dynamic games, each agent explicitly models the learning dynamics of its peer agent while inferring their objective functions and updating its own control policy accordingly in real time, which leads to unbiased and fast learning of the unknown objective function of the peer agent. Additionally, we demonstrate how N-PACE enables intent communication by explicitly modeling the peer's learning dynamics. Finally, we show how N-PACE outperforms baseline methods that disregard the learning behavior of the other agent, both analytically and using our case studies
comment: Extended version of our AAMAS 2026 accepted paper with an expanded appendix. Compared to the previous arXiv version, we add theoretical guarantees, additional experiments, new baselines, and more in-depth appendix discussion, along with a link to the GitHub repository
Robotics
Diffusion Forcing for Multi-Agent Interaction Sequence Modeling
Understanding and generating multi-person interactions is a fundamental challenge with broad implications for robotics and social computing. While humans naturally coordinate in groups, modeling such interactions remains difficult due to long temporal horizons, strong inter-agent dependencies, and variable group sizes. Existing motion generation methods are largely task-specific and do not generalize to flexible multi-agent generation. We introduce MAGNet (Multi-Agent Diffusion Forcing Transformer), a unified autoregressive diffusion framework for multi-agent motion generation that supports a wide range of interaction tasks through flexible conditioning and sampling. MAGNet performs dyadic prediction, partner inpainting, and full multi-agent motion generation within a single model, and can autoregressively generate ultra-long sequences spanning hundreds of v. Building on Diffusion Forcing, we introduce key modifications that explicitly model inter-agent coupling during autoregressive denoising, enabling coherent coordination across agents. As a result, MAGNet captures both tightly synchronized activities (e.g, dancing, boxing) and loosely structured social interactions. Our approach performs on par with specialized methods on dyadic benchmarks while naturally extending to polyadic scenarios involving three or more interacting people, enabled by a scalable architecture that is agnostic to the number of agents. We refer readers to the supplemental video, where the temporal dynamics and spatial coordination of generated interactions are best appreciated. Project page: https://von31.github.io/MAGNet/
RadarGen: Automotive Radar Point Cloud Generation from Cameras
We present RadarGen, a diffusion model for synthesizing realistic automotive radar point clouds from multi-view camera imagery. RadarGen adapts efficient image-latent diffusion to the radar domain by representing radar measurements in bird's-eye-view form that encodes spatial structure together with radar cross section (RCS) and Doppler attributes. A lightweight recovery step reconstructs point clouds from the generated maps. To better align generation with the visual scene, RadarGen incorporates BEV-aligned depth, semantic, and motion cues extracted from pretrained foundation models, which guide the stochastic generation process toward physically plausible radar patterns. Conditioning on images makes the approach broadly compatible, in principle, with existing visual datasets and simulation frameworks, offering a scalable direction for multimodal generative simulation. Evaluations on large-scale driving data show that RadarGen captures characteristic radar measurement distributions and reduces the gap to perception models trained on real data, marking a step toward unified generative simulation across sensing modalities.
comment: Project page: https://radargen.github.io/
AnyTask: an Automated Task and Data Generation Framework for Advancing Sim-to-Real Policy Learning
Generalist robot learning remains constrained by data: large-scale, diverse, and high-quality interaction data are expensive to collect in the real world. While simulation has become a promising way for scaling up data collection, the related tasks, including simulation task design, task-aware scene generation, expert demonstration synthesis, and sim-to-real transfer, still demand substantial human effort. We present AnyTask, an automated framework that pairs massively parallel GPU simulation with foundation models to design diverse manipulation tasks and synthesize robot data. We introduce three AnyTask agents for generating expert demonstrations aiming to solve as many tasks as possible: 1) ViPR, a novel task and motion planning agent with VLM-in-the-loop Parallel Refinement; 2) ViPR-Eureka, a reinforcement learning agent with generated dense rewards and LLM-guided contact sampling; 3) ViPR-RL, a hybrid planning and learning approach that jointly produces high-quality demonstrations with only sparse rewards. We train behavior cloning policies on generated data, validate them in simulation, and deploy them directly on real robot hardware. The policies generalize to novel object poses, achieving 44% average success across a suite of real-world pick-and-place, drawer opening, contact-rich pushing, and long-horizon manipulation tasks. Our project website is at https://anytask.rai-inst.com .
comment: 28 pages, 25 figures. The first four authors contributed equally
Planning as Descent: Goal-Conditioned Latent Trajectory Synthesis in Learned Energy Landscapes
We present Planning as Descent (PaD), a framework for offline goal-conditioned reinforcement learning that grounds trajectory synthesis in verification. Instead of learning a policy or explicit planner, PaD learns a goal-conditioned energy function over entire latent trajectories, assigning low energy to feasible, goal-consistent futures. Planning is realized as gradient-based refinement in this energy landscape, using identical computation during training and inference to reduce train-test mismatch common in decoupled modeling pipelines. PaD is trained via self-supervised hindsight goal relabeling, shaping the energy landscape around the planning dynamics. At inference, multiple trajectory candidates are refined under different temporal hypotheses, and low-energy plans balancing feasibility and efficiency are selected. We evaluate PaD on OGBench cube manipulation tasks. When trained on narrow expert demonstrations, PaD achieves state-of-the-art 95\% success, strongly outperforming prior methods that peak at 68\%. Remarkably, training on noisy, suboptimal data further improves success and plan efficiency, highlighting the benefits of verification-driven planning. Our results suggest learning to evaluate and refine trajectories provides a robust alternative to direct policy learning for offline, reward-free planning.
UniStateDLO: Unified Generative State Estimation and Tracking of Deformable Linear Objects Under Occlusion for Constrained Manipulation
Perception of deformable linear objects (DLOs), such as cables, ropes, and wires, is the cornerstone for successful downstream manipulation. Although vision-based methods have been extensively explored, they remain highly vulnerable to occlusions that commonly arise in constrained manipulation environments due to surrounding obstacles, large and varying deformations, and limited viewpoints. Moreover, the high dimensionality of the state space, the lack of distinctive visual features, and the presence of sensor noises further compound the challenges of reliable DLO perception. To address these open issues, this paper presents UniStateDLO, the first complete DLO perception pipeline with deep-learning methods that achieves robust performance under severe occlusion, covering both single-frame state estimation and cross-frame state tracking from partial point clouds. Both tasks are formulated as conditional generative problems, leveraging the strong capability of diffusion models to capture the complex mapping between highly partial observations and high-dimensional DLO states. UniStateDLO effectively handles a wide range of occlusion patterns, including initial occlusion, self-occlusion, and occlusion caused by multiple objects. In addition, it exhibits strong data efficiency as the entire network is trained solely on a large-scale synthetic dataset, enabling zero-shot sim-to-real generalization without any real-world training data. Comprehensive simulation and real-world experiments demonstrate that UniStateDLO outperforms all state-of-the-art baselines in both estimation and tracking, producing globally smooth yet locally precise DLO state predictions in real time, even under substantial occlusions. Its integration as the front-end module in a closed-loop DLO manipulation system further demonstrates its ability to support stable feedback control in complex, constrained 3-D environments.
comment: The first two authors contributed equally. Project page: https://unistatedlo.github.io
A Dual Quaternion based RRT* Path Planning Approach for Satellite Rendezvous and Docking
This paper proposes a sampling-based motion planner that employs a dual quaternion representation to generate smooth, collision-free six-degree-of-freedom pose trajectories for satellite rendezvous and docking under keep-out zone constraints. The proposed planner integrates the dual quaternion algebra directly into an RRT* framework, thereby enabling natural screw motion interpolation in SE(3). The dual quaternion-based RRT* has been implemented in Python and demonstrated on a representative multi-obstacle scenario. A comparison with a standard RRT* using separate translation and quaternion steering highlights the enhanced pose continuity and obstacle avoidance of the proposed method. The present approach is purely kinematic in nature and does not take into account relative orbital dynamics. Consequently, the resulting path provides a preliminary estimate for a subsequent optimisation-based trajectory planner, which will refine the motion with dynamic constraints for the purpose of practical satellite rendezvous and docking missions.
comment: 6 pages, CAMSAT 2025, This work has been accepted to IFAC
Vidarc: Embodied Video Diffusion Model for Closed-loop Control
Robotic arm manipulation in data-scarce settings is a highly challenging task due to the complex embodiment dynamics and diverse contexts. Recent video-based approaches have shown great promise in capturing and transferring the temporal and physical interactions by pre-training on Internet-scale video data. However, such methods are often not optimized for the embodiment-specific closed-loop control, typically suffering from high latency and insufficient grounding. In this paper, we present Vidarc (Video Diffusion for Action Reasoning and Closed-loop Control), a novel autoregressive embodied video diffusion approach augmented by a masked inverse dynamics model. By grounding video predictions with action-relevant masks and incorporating real-time feedback through cached autoregressive generation, Vidarc achieves fast, accurate closed-loop control. Pre-trained on one million cross-embodiment episodes, Vidarc surpasses state-of-the-art baselines, achieving at least a 15% higher success rate in real-world deployment and a 91% reduction in latency. We also highlight its robust generalization and error correction capabilities across previously unseen robotic platforms.
Learning Safe Autonomous Driving Policies Using Predictive Safety Representations ICRA 2026
Safe reinforcement learning (SafeRL) is a prominent paradigm for autonomous driving, where agents are required to optimize performance under strict safety requirements. This dual objective creates a fundamental tension, as overly conservative policies limit driving efficiency while aggressive exploration risks safety violations. The Safety Representations for Safer Policy Learning (SRPL) framework addresses this challenge by equipping agents with a predictive model of future constraint violations and has shown promise in controlled environments. This paper investigates whether SRPL extends to real-world autonomous driving scenarios. Systematic experiments on the Waymo Open Motion Dataset (WOMD) and NuPlan demonstrate that SRPL can improve the reward-safety tradeoff, achieving statistically significant improvements in success rate (effect sizes r = 0.65-0.86) and cost reduction (effect sizes r = 0.70-0.83), with p < 0.05 for observed improvements. However, its effectiveness depends on the underlying policy optimizer and the dataset distribution. The results further show that predictive safety representations play a critical role in improving robustness to observation noise. Additionally, in zero-shot cross-dataset evaluation, SRPL-augmented agents demonstrate improved generalization compared to non-SRPL methods. These findings collectively demonstrate the potential of predictive safety representations to strengthen SafeRL for autonomous driving.
comment: 8 pages, 4 figures. Submitted to ICRA 2026
Optimized Scheduling and Positioning of Mobile Manipulators in Collaborative Applications
The growing integration of mobile robots in shared workspaces requires efficient path planning and coordination between the agents, accounting for safety and productivity. In this work, we propose a digital model-based optimization framework for mobile manipulators in human-robot collaborative environments, in order to determine the sequence of robot base poses and the task scheduling for the robot. The complete problem is treated as black-box, and Particle Swarm Optimization (PSO) is employed to balance conflicting Key-Performance Indicators (KPIs). We demonstrate improvements in cycle time, task sequencing, and adaptation to human presence in a collaborative box-packing scenario.
comment: Accepted at The IFAC Joint Conference on Computers, Cognition and Communication (J3C) 2025
On Using Neural Networks to Learn Safety Speed Reduction in Human-Robot Collaboration: A Comparative Analysis
In Human-Robot Collaboration, safety mechanisms such as Speed and Separation Monitoring and Power and Force Limitation dynamically adjust the robot's speed based on human proximity. While essential for risk reduction, these mechanisms introduce slowdowns that makes cycle time estimation a hard task and impact job scheduling efficiency. Existing methods for estimating cycle times or designing schedulers often rely on predefined safety models, which may not accurately reflect real-world safety implementations, as these depend on case-specific risk assessments. In this paper, we propose a deep learning approach to predict the robot's safety scaling factor directly from process execution data. We analyze multiple neural network architectures and demonstrate that a simple feed-forward network effectively estimates the robot's slowdown. This capability is crucial for improving cycle time predictions and designing more effective scheduling algorithms in collaborative robotic environments.
comment: Accepted at IEEE Internation Conference on Emerging Technologies and Factory Automation (ETFA) 2025
Kinematics-Aware Diffusion Policy with Consistent 3D Observation and Action Space for Whole-Arm Robotic Manipulation
Whole-body control of robotic manipulators with awareness of full-arm kinematics is crucial for many manipulation scenarios involving body collision avoidance or body-object interactions, which makes it insufficient to consider only the end-effector poses in policy learning. The typical approach for whole-arm manipulation is to learn actions in the robot's joint space. However, the unalignment between the joint space and actual task space (i.e., 3D space) increases the complexity of policy learning, as generalization in task space requires the policy to intrinsically understand the non-linear arm kinematics, which is difficult to learn from limited demonstrations. To address this issue, this letter proposes a kinematics-aware imitation learning framework with consistent task, observation, and action spaces, all represented in the same 3D space. Specifically, we represent both robot states and actions using a set of 3D points on the arm body, naturally aligned with the 3D point cloud observations. This spatially consistent representation improves the policy's sample efficiency and spatial generalizability while enabling full-body control. Built upon the diffusion policy, we further incorporate kinematics priors into the diffusion processes to guarantee the kinematic feasibility of output actions. The joint angle commands are finally calculated through an optimization-based whole-body inverse kinematics solver for execution. Simulation and real-world experimental results demonstrate higher success rates and stronger spatial generalizability of our approach compared to existing methods in body-aware manipulation policy learning.
comment: The first two authors contributed equally. Project Website: https://kinematics-aware-diffusion-policy.github.io
Learning-Based Safety-Aware Task Scheduling for Efficient Human-Robot Collaboration
Ensuring human safety in collaborative robotics can compromise efficiency because traditional safety measures increase robot cycle time when human interaction is frequent. This paper proposes a safety-aware approach to mitigate efficiency losses without assuming prior knowledge of safety logic. Using a deep-learning model, the robot learns the relationship between system state and safety-induced speed reductions based on execution data. Our framework does not explicitly predict human motions but directly models the interaction effects on robot speed, simplifying implementation and enhancing generalizability to different safety logics. At runtime, the learned model optimizes task selection to minimize cycle time while adhering to safety requirements. Experiments on a pick-and-packaging scenario demonstrated significant reductions in cycle times.
comment: 8 pages
Deep Learning-based Robust Autonomous Navigation of Aerial Robots in Dense Forests
Autonomous aerial navigation in dense natural environments remains challenging due to limited visibility, thin and irregular obstacles, GNSS-denied operation, and frequent perceptual degradation. This work presents an improved deep learning-based navigation framework that integrates semantically enhanced depth encoding with neural motion-primitive evaluation for robust flight in cluttered forests. Several modules are incorporated on top of the original sevae-ORACLE algorithm to address limitations observed during real-world deployment, including lateral control for sharper maneuvering, a temporal consistency mechanism to suppress oscillatory planning decisions, a stereo-based visual-inertial odometry solution for drift-resilient state estimation, and a supervisory safety layer that filters unsafe actions in real time. A depth refinement stage is included to improve the representation of thin branches and reduce stereo noise, while GPU optimization increases onboard inference throughput from 4 Hz to 10 Hz. The proposed approach is evaluated against several existing learning-based navigation methods under identical environmental conditions and hardware constraints. It demonstrates higher success rates, more stable trajectories, and improved collision avoidance, particularly in highly cluttered forest settings. The system is deployed on a custom quadrotor in three boreal forest environments, achieving fully autonomous completion in all flights in moderate and dense clutter, and 12 out of 15 flights in highly dense underbrush. These results demonstrate improved reliability and safety over existing navigation methods in complex natural environments.
Adaptive Covariance and Quaternion-Focused Hybrid Error-State EKF/UKF for Visual-Inertial Odometry
This study presents an innovative hybrid Visual-Inertial Odometry (VIO) method for Unmanned Aerial Vehicles (UAVs) that is resilient to environmental challenges and capable of dynamically assessing sensor reliability. Built upon a loosely coupled sensor fusion architecture, the system utilizes a novel hybrid Quaternion-focused Error-State EKF/UKF (Qf-ES-EKF/UKF) architecture to process inertial measurement unit (IMU) data. This architecture first propagates the entire state using an Error-State Extended Kalman Filter (ESKF) and then applies a targeted Scaled Unscented Kalman Filter (SUKF) step to refine only the orientation. This sequential process blends the accuracy of SUKF in quaternion estimation with the overall computational efficiency of ESKF. The reliability of visual measurements is assessed via a dynamic sensor confidence score based on metrics, such as image entropy, intensity variation, motion blur, and inference quality, adapting the measurement noise covariance to ensure stable pose estimation even under challenging conditions. Comprehensive experimental analyses on the EuRoC MAV dataset demonstrate key advantages: an average improvement of 49% in position accuracy in challenging scenarios, an average of 57% in rotation accuracy over ESKF-based methods, and SUKF-comparable accuracy achieved with approximately 48% lower computational cost than a full SUKF implementation. These findings demonstrate that the presented approach strikes an effective balance between computational efficiency and estimation accuracy, and significantly enhances UAV pose estimation performance in complex environments with varying sensor reliability.
ImagineNav++: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
Visual navigation is a fundamental capability for autonomous home-assistance robots, enabling long-horizon tasks such as object search. While recent methods have leveraged Large Language Models (LLMs) to incorporate commonsense reasoning and improve exploration efficiency, their planning remains constrained by textual representations, which cannot adequately capture spatial occupancy or scene geometry--critical factors for navigation decisions. We explore whether Vision-Language Models (VLMs) can achieve mapless visual navigation using only onboard RGB/RGB-D streams, unlocking their potential for spatial perception and planning. We achieve this through an imagination-powered navigation framework, ImagineNav++, which imagines future observation images from candidate robot views and translates navigation planning into a simple best-view image selection problem for VLMs. First, a future-view imagination module distills human navigation preferences to generate semantically meaningful viewpoints with high exploration potential. These imagined views then serve as visual prompts for the VLM to identify the most informative viewpoint. To maintain spatial consistency, we develop a selective foveation memory mechanism, which hierarchically integrates keyframe observations via a sparse-to-dense framework, constructing a compact yet comprehensive memory for long-term spatial reasoning. This approach transforms goal-oriented navigation into a series of tractable point-goal navigation tasks. Extensive experiments on open-vocabulary object and instance navigation benchmarks show that ImagineNav++ achieves SOTA performance in mapless settings, even surpassing most map-based methods, highlighting the importance of scene imagination and memory in VLM-based spatial reasoning.
comment: 17 pages, 10 figures. arXiv admin note: text overlap with arXiv:2410.09874
Personalized Gait Patterns During Exoskeleton-Aided Training May Have Minimal Effect on User Experience. Insights from a Pilot Study
Robot-aided gait rehabilitation facilitates high-intensity and repeatable therapy. However, most exoskeletons rely on pre-recorded, non-personalized gait trajectories constrained to the sagittal plane, potentially limiting movement naturalness and user comfort. We present a data-driven gait personalization framework for an exoskeleton that supports multi-planar motion, including hip abduction/adduction and pelvic translation and rotation. Personalized trajectories to individual participants were generated using regression models trained on anthropometric, demographic, and walking speed data from a normative database. In a within-subject experiment involving ten unimpaired participants, these personalized trajectories were evaluated in regard to comfort, naturalness, and overall experience and compared against two standard patterns from the same database: one averaging all the trajectories, and one randomly selected. We did not find relevant differences across pattern conditions, despite all trajectories being executed with high accuracy thanks to a stiff position-derivative controller. We found, however, that pattern conditions in later trials were rated as more comfortable and natural than those in the first trial, suggesting that participants might have adapted to walking within the exoskeleton, regardless of the enforced gait pattern. Our findings highlight the importance of integrating subjective feedback when designing personalized gait controllers and accounting for user adaptation during experimentation.
TakeAD: Preference-based Post-optimization for End-to-end Autonomous Driving with Expert Takeover Data
Existing end-to-end autonomous driving methods typically rely on imitation learning (IL) but face a key challenge: the misalignment between open-loop training and closed-loop deployment. This misalignment often triggers driver-initiated takeovers and system disengagements during closed-loop execution. How to leverage those expert takeover data from disengagement scenarios and effectively expand the IL policy's capability presents a valuable yet unexplored challenge. In this paper, we propose TakeAD, a novel preference-based post-optimization framework that fine-tunes the pre-trained IL policy with this disengagement data to enhance the closed-loop driving performance. First, we design an efficient expert takeover data collection pipeline inspired by human takeover mechanisms in real-world autonomous driving systems. Then, this post optimization framework integrates iterative Dataset Aggregation (DAgger) for imitation learning with Direct Preference Optimization (DPO) for preference alignment. The DAgger stage equips the policy with fundamental capabilities to handle disengagement states through direct imitation of expert interventions. Subsequently, the DPO stage refines the policy's behavior to better align with expert preferences in disengagement scenarios. Through multiple iterations, the policy progressively learns recovery strategies for disengagement states, thereby mitigating the open-loop gap. Experiments on the closed-loop Bench2Drive benchmark demonstrate our method's effectiveness compared with pure IL methods, with comprehensive ablations confirming the contribution of each component.
Flying in Clutter on Monocular RGB by Learning in 3D Radiance Fields with Domain Adaptation
Modern autonomous navigation systems predominantly rely on lidar and depth cameras. However, a fundamental question remains: Can flying robots navigate in clutter using solely monocular RGB images? Given the prohibitive costs of real-world data collection, learning policies in simulation offers a promising path. Yet, deploying such policies directly in the physical world is hindered by the significant sim-to-real perception gap. Thus, we propose a framework that couples the photorealism of 3D Gaussian Splatting (3DGS) environments with Adversarial Domain Adaptation. By training in high-fidelity simulation while explicitly minimizing feature discrepancy, our method ensures the policy relies on domain-invariant cues. Experimental results demonstrate that our policy achieves robust zero-shot transfer to the physical world, enabling safe and agile flight in unstructured environments with varying illumination.
comment: 8 pages, 7 figures
Neuro-Symbolic Control with Large Language Models for Language-Guided Spatial Tasks
Although large language models (LLMs) have recently become effective tools for language-conditioned control in embodied systems, instability, slow convergence, and hallucinated actions continue to limit their direct application to continuous control. A modular neuro-symbolic control framework that clearly distinguishes between low-level motion execution and high-level semantic reasoning is proposed in this work. While a lightweight neural delta controller performs bounded, incremental actions in continuous space, a locally deployed LLM interprets symbolic tasks. We assess the suggested method in a planar manipulation setting with spatial relations between objects specified by language. Numerous tasks and local language models, such as Mistral, Phi, and LLaMA-3.2, are used in extensive experiments to compare LLM-only control, neural-only control, and the suggested LLM+DL framework. In comparison to LLM-only baselines, the results show that the neuro-symbolic integration consistently increases both success rate and efficiency, achieving average step reductions exceeding 70% and speedups of up to 8.83x while remaining robust to language model quality. The suggested framework enhances interpretability, stability, and generalization without any need of reinforcement learning or costly rollouts by controlling the LLM to symbolic outputs and allocating uninterpreted execution to a neural controller trained on artificial geometric data. These outputs show empirically that neuro-symbolic decomposition offers a scalable and principled way to integrate language understanding with ongoing control, this approach promotes the creation of dependable and effective language-guided embodied systems.
RecipeMasterLLM: Revisiting RoboEarth in the Era of Large Language Models
RoboEarth was a pioneering initiative in cloud robotics, establishing a foundational framework for robots to share and exchange knowledge about actions, objects, and environments through a standardized knowledge graph. Initially, this knowledge was predominantly hand-crafted by engineers using RDF triples within OWL Ontologies, with updates, such as changes in an object's pose, being asserted by the robot's control and perception routines. However, with the advent and rapid development of Large Language Models (LLMs), we believe that the process of knowledge acquisition can be significantly automated. To this end, we propose RecipeMasterLLM, a high-level planner, that generates OWL action ontologies based on a standardized knowledge graph in response to user prompts. This architecture leverages a fine-tuned LLM specifically trained to understand and produce action descriptions consistent with the RoboEarth standardized knowledge graph. Moreover, during the Retrieval-Augmented Generation (RAG) phase, environmental knowledge is supplied to the LLM to enhance its contextual understanding and improve the accuracy of the generated action descriptions.
A Service Robot's Guide to Interacting with Busy Customers
The growing use of service robots in hospitality highlights the need to understand how to effectively communicate with pre-occupied customers. This study investigates the efficacy of commonly used communication modalities by service robots, namely, acoustic/speech, visual display, and micromotion gestures in capturing attention and communicating intention with a user in a simulated restaurant scenario. We conducted a two-part user study (N=24) using a Temi robot to simulate delivery tasks, with participants engaged in a typing game (MonkeyType) to emulate a state of busyness. The participants' engagement in the typing game is measured by words per minute (WPM) and typing accuracy. In Part 1, we compared non-verbal acoustic cue versus baseline conditions to assess attention capture during a single-cup delivery task. In Part 2, we evaluated the effectiveness of speech, visual display, micromotion and their multimodal combination in conveying specific intentions (correct cup selection) during a two-cup delivery task. The results indicate that, while speech is highly effective in capturing attention, it is less successful in clearly communicating intention. Participants rated visual as the most effective modality for intention clarity, followed by speech, with micromotion being the lowest ranked.These findings provide insights into optimizing communication strategies for service robots, highlighting the distinct roles of attention capture and intention communication in enhancing user experience in dynamic hospitality settings.
comment: Presented at ACRA 2025. 10 pages, 4 figures. Includes a user study (N=24) using the Temi robot evaluating speech, visual, and micromotion modalities
Research on Dead Reckoning Algorithm for Self-Propelled Pipeline Robots in Three-Dimensional Complex Pipelines
In the field of gas pipeline location, existing pipeline location methods mostly rely on pipeline location instruments. However, when faced with complex and curved pipeline scenarios, these methods often fail due to problems such as cable entanglement and insufficient equipment flexibility. To address this pain point, we designed a self-propelled pipeline robot. This robot can autonomously complete the location work of complex and curved pipelines in complex pipe networks without external dragging. In terms of pipeline mapping technology, traditional visual mapping and laser mapping methods are easily affected by lighting conditions and insufficient features in the confined space of pipelines, resulting in mapping drift and divergence problems. In contrast, the pipeline location method that integrates inertial navigation and wheel odometers is less affected by pipeline environmental factors. Based on this, this paper proposes a pipeline robot location method based on extended Kalman filtering (EKF). Firstly, the body attitude angle is initially obtained through an inertial measurement unit (IMU). Then, the extended Kalman filtering algorithm is used to improve the accuracy of attitude angle estimation. Finally, high-precision pipeline location is achieved by combining wheel odometers. During the testing phase, the roll wheels of the pipeline robot needed to fit tightly against the pipe wall to reduce slippage. However, excessive tightness would reduce the flexibility of motion control due to excessive friction. Therefore, a balance needed to be struck between the robot's motion capability and positioning accuracy. Experiments were conducted using the self-propelled pipeline robot in a rectangular loop pipeline, and the results verified the effectiveness of the proposed dead reckoning algorithm.
comment: 8 pages, 9 figures
Design and Research of a Self-Propelled Pipeline Robot Based on Force Analysis and Dynamic Simulation
In pipeline inspection, traditional tethered inspection robots are severely constrained by cable length and weight, which greatly limit their travel range and accessibility. To address these issues, this paper proposes a self-propelled pipeline robot design based on force analysis and dynamic simulation, with a specific focus on solving core challenges including vertical climbing failure and poor passability in T-branch pipes. Adopting a wheeled configuration and modular design, the robot prioritizes the core demand of body motion control. Specifically, 3D modeling of the robot was first completed using SolidWorks. Subsequently, the model was imported into ADAMS for dynamic simulation, which provided a basis for optimizing the drive module and motion control strategy.To verify the robot's dynamic performance, an experimental platform with acrylic pipes was constructed. Through adjusting its body posture to surmount obstacles and select directions, the robot has demonstrated its ability to stably traverse various complex pipeline scenarios. Notably, this work offers a technical feasibility reference for the application of pipeline robots in the inspection of medium and low-pressure urban gas pipelines.
comment: 7 pages, 14 figures
Semantic Co-Speech Gesture Synthesis and Real-Time Control for Humanoid Robots
We present an innovative end-to-end framework for synthesizing semantically meaningful co-speech gestures and deploying them in real-time on a humanoid robot. This system addresses the challenge of creating natural, expressive non-verbal communication for robots by integrating advanced gesture generation techniques with robust physical control. Our core innovation lies in the meticulous integration of a semantics-aware gesture synthesis module, which derives expressive reference motions from speech input by leveraging a generative retrieval mechanism based on large language models (LLMs) and an autoregressive Motion-GPT model. This is coupled with a high-fidelity imitation learning control policy, the MotionTracker, which enables the Unitree G1 humanoid robot to execute these complex motions dynamically and maintain balance. To ensure feasibility, we employ a robust General Motion Retargeting (GMR) method to bridge the embodiment gap between human motion data and the robot platform. Through comprehensive evaluation, we demonstrate that our combined system produces semantically appropriate and rhythmically coherent gestures that are accurately tracked and executed by the physical robot. To our knowledge, this work represents a significant step toward general real-world use by providing a complete pipeline for automatic, semantic-aware, co-speech gesture generation and synchronized real-time physical deployment on a humanoid robot.
Conservative Bias in Multi-Teacher Learning: Why Agents Prefer Low-Reward Advisors
Interactive reinforcement learning (IRL) has shown promise in enabling autonomous agents and robots to learn complex behaviours from human teachers, yet the dynamics of teacher selection remain poorly understood. This paper reveals an unexpected phenomenon in IRL: when given a choice between teachers with different reward structures, learning agents overwhelmingly prefer conservative, low-reward teachers (93.16% selection rate) over those offering 20x higher rewards. Through 1,250 experimental runs in navigation tasks with multiple expert teachers, we discovered: (1) Conservative bias dominates teacher selection: agents systematically choose the lowest-reward teacher, prioritising consistency over optimality; (2) Critical performance thresholds exist at teacher availability rho >= 0.6 and accuracy omega >= 0.6, below which the framework fails catastrophically; (3) The framework achieves 159% improvement over baseline Q-learning under concept drift. These findings challenge fundamental assumptions about optimal teaching in RL and suggest potential implications for human-robot collaboration, where human preferences for safety and consistency may align with the observed agent selection behaviour, potentially informing training paradigms for safety-critical robotic applications.
comment: 10 pages, 5 figures. Accepted at ACRA 2025 (Australasian Conference on Robotics and Automation)
Towards Senior-Robot Interaction: Reactive Robot Dog Gestures
As the global population ages, many seniors face the problem of loneliness. Companion robots offer a potential solution. However, current companion robots often lack advanced functionality, while task-oriented robots are not designed for social interaction, limiting their suitability and acceptance by seniors. Our work introduces a senior-oriented system for quadruped robots that allows for more intuitive user input and provides more socially expressive output. For user input, we implemented a MediaPipe-based module for hand gesture and head movement recognition, enabling control without a remote. For output, we designed and trained robotic dog gestures using curriculum-based reinforcement learning in Isaac Gym, progressing from simple standing to three-legged balancing and leg extensions, and more. The final tests achieved over 95\% success on average in simulation, and we validated a key social gesture (the paw-lift) on a Unitree robot. Real-world tests demonstrated the feasibility and social expressiveness of this framework, while also revealing sim-to-real challenges in joint compliance, load distribution, and balance control. These contributions advance the development of practical quadruped robots as social companions for the senior and outline pathways for sim-to-real adaptation and inform future user studies.
comment: Accepted at the Australasian Conference on Robotics and Automation (ACRA) 2025
Towards Autonomous Navigation in Endovascular Interventions
Cardiovascular diseases remain the leading cause of global mortality, with minimally invasive treatment options offered through endovascular interventions. However, the precision and adaptability of current robotic systems for endovascular navigation are limited by heuristic control, low autonomy, and the absence of haptic feedback. This thesis presents an integrated AI-driven framework for autonomous guidewire navigation in complex vascular environments, addressing key challenges in data availability, simulation fidelity, and navigational accuracy. A high-fidelity, real-time simulation platform, CathSim, is introduced for reinforcement learning based catheter navigation, featuring anatomically accurate vascular models and contact dynamics. Building on CathSim, the Expert Navigation Network is developed, a policy that fuses visual, kinematic, and force feedback for autonomous tool control. To mitigate data scarcity, the open-source, bi-planar fluoroscopic dataset Guide3D is proposed, comprising more than 8,700 annotated images for 3D guidewire reconstruction. Finally, SplineFormer, a transformer-based model, is introduced to directly predict guidewire geometry as continuous B-spline parameters, enabling interpretable, real-time navigation. The findings show that combining high-fidelity simulation, multimodal sensory fusion, and geometric modelling substantially improves autonomous endovascular navigation and supports safer, more precise minimally invasive procedures.
SurgiPose: Estimating Surgical Tool Kinematics from Monocular Video for Surgical Robot Learning
Imitation learning (IL) has shown immense promise in enabling autonomous dexterous manipulation, including learning surgical tasks. To fully unlock the potential of IL for surgery, access to clinical datasets is needed, which unfortunately lack the kinematic data required for current IL approaches. A promising source of large-scale surgical demonstrations is monocular surgical videos available online, making monocular pose estimation a crucial step toward enabling large-scale robot learning. Toward this end, we propose SurgiPose, a differentiable rendering based approach to estimate kinematic information from monocular surgical videos, eliminating the need for direct access to ground truth kinematics. Our method infers tool trajectories and joint angles by optimizing tool pose parameters to minimize the discrepancy between rendered and real images. To evaluate the effectiveness of our approach, we conduct experiments on two robotic surgical tasks: tissue lifting and needle pickup, using the da Vinci Research Kit Si (dVRK Si). We train imitation learning policies with both ground truth measured kinematics and estimated kinematics from video and compare their performance. Our results show that policies trained on estimated kinematics achieve comparable success rates to those trained on ground truth data, demonstrating the feasibility of using monocular video based kinematic estimation for surgical robot learning. By enabling kinematic estimation from monocular surgical videos, our work lays the foundation for large scale learning of autonomous surgical policies from online surgical data.
comment: 8 pages, 6 figures, 2 tables
Design of a Polymer-based Steerable Cannula for Neurosurgical Applications
Robotically steerable compliant surgical tools offer several advantages over rigid tools, including enhanced dexterity, reduced tissue damage, and the ability to generate non-linear trajectories in minimally invasive neurosurgical procedures. Many existing robotic neurosurgical tools are designed using stainless steel or nitinol materials. Using polymer-based materials instead can offer advantages such as reduced interference in magnetic resonance imaging, enhanced safety for guiding electrically powered instruments, and reduced tissue damage due to inherent compliance. Several polymer materials have been used in robotic surgical applications, such as polyimide, polycarbonate, and elastic resin. Various fabrication strategies have also been proposed, including standard microfabrication techniques, thermal drawing, and 3-D printing. In our previous work, a tendon-driven, notched-tube was designed for several neurosurgical robotic tools, utilizing laser micromachining to reduce the stiffness of the tube in certain directions. This fabrication method is desirable because it has a single-step process, has high precision, and does not require a cleanroom or harsh chemicals. Past studies have explored laser-micromachining of polymer material for surgical applications such as stent fabrication. In this work, we explore extending the use of the laser micromachining approach to the fabrication of polyimide (PI) robotically steerable cannulas for neurosurgical applications. Utilizing the method presented in this work, we fabricated joints as small as 1.5 mm outer diameter (OD). Multiple joints were fabricated using PI tubes of different ODs, and the loading behavior of the fabricated joints was experimentally characterized.
comment: Presented at the 14th Conference on New Technologies for Computer and Robot Assisted Surgery (CRAS 2025)
Design and Integration of Thermal and Vibrotactile Feedback for Lifelike Touch in Social Robots
Zoomorphic Socially Assistive Robots (SARs) offer an alternative source of social touch for individuals who cannot access animal companionship. However, current SARs provide only limited, passive touch-based interactions and lack the rich haptic cues, such as warmth, heartbeat or purring, that are characteristic of human-animal touch. This limits their ability to evoke emotionally engaging, life-like physical interactions. We present a multimodal tactile prototype, which was used to augment the established PARO robot, integrating thermal and vibrotactile feedback to simulate feeling biophysiological signals. A flexible heating interface delivers body-like warmth, while embedded actuators generate heartbeat-like rhythms and continuous purring sensations. These cues were iteratively designed and calibrated with input from users and haptics experts. We outline the design process and offer reproducible guidelines to support the development of emotionally resonant and biologically plausible touch interactions with SARs.
Embodied4C: Measuring What Matters for Embodied Vision-Language Navigation
Vision-language navigation requires agents to reason and act under constraints of embodiment. While vision-language models (VLMs) demonstrate strong generalization, current benchmarks provide limited understanding of how embodiment -- i.e., the choice of physical platform, sensor configuration, and modality alignment -- influences perception, reasoning, and control. We introduce Embodied4C, a closed-loop benchmark designed as a Turing test for embodied reasoning. The benchmark evaluates the core embodied capabilities of VLMs across three heterogeneous embodiments -- autonomous vehicles, aerial drones, and robotic manipulators -- through approximately 1.1K one-shot reasoning questions and 58 goal-directed navigation tasks. These tasks jointly assess four foundational dimensions: semantic, spatial, temporal, and physical reasoning. Each embodiment presents dynamic sensor configurations and environment variations to probe generalization beyond platform-specific adaptation. To prevent embodiment overfitting, Embodied4C integrates domain-far queries targeting abstract and cross-context reasoning. Comprehensive evaluation across ten state-of-the-art VLMs and four embodied control baselines shows that cross-modal alignment and instruction tuning matter more than scale, while spatial and temporal reasoning remains the primary bottleneck for reliable embodied competence.
Robotic VLA Benefits from Joint Learning with Motion Image Diffusion
Vision-Language-Action (VLA) models have achieved remarkable progress in robotic manipulation by mapping multimodal observations and instructions directly to actions. However, they typically mimic expert trajectories without predictive motion reasoning, which limits their ability to reason about what actions to take. To address this limitation, we propose joint learning with motion image diffusion, a novel strategy that enhances VLA models with motion reasoning capabilities. Our method extends the VLA architecture with a dual-head design: while the action head predicts action chunks as in vanilla VLAs, an additional motion head, implemented as a Diffusion Transformer (DiT), predicts optical-flow-based motion images that capture future dynamics. The two heads are trained jointly, enabling the shared VLM backbone to learn representations that couple robot control with motion knowledge. This joint learning builds temporally coherent and physically grounded representations without modifying the inference pathway of standard VLAs, thereby maintaining test-time latency. Experiments in both simulation and real-world environments demonstrate that joint learning with motion image diffusion improves the success rate of pi-series VLAs to 97.5% on the LIBERO benchmark and 58.0% on the RoboTwin benchmark, yielding a 23% improvement in real-world performance and validating its effectiveness in enhancing the motion reasoning capability of large-scale VLAs.
comment: Website: https://vla-motion.github.io/
Unifying Deep Predicate Invention with Pre-trained Foundation Models
Long-horizon robotic tasks are hard due to continuous state-action spaces and sparse feedback. Symbolic world models help by decomposing tasks into discrete predicates that capture object properties and relations. Existing methods learn predicates either top-down, by prompting foundation models without data grounding, or bottom-up, from demonstrations without high-level priors. We introduce UniPred, a bilevel learning framework that unifies both. UniPred uses large language models (LLMs) to propose predicate effect distributions that supervise neural predicate learning from low-level data, while learned feedback iteratively refines the LLM hypotheses. Leveraging strong visual foundation model features, UniPred learns robust predicate classifiers in cluttered scenes. We further propose a predicate evaluation method that supports symbolic models beyond STRIPS assumptions. Across five simulated and one real-robot domains, UniPred achieves 2-4 times higher success rates than top-down methods and 3-4 times faster learning than bottom-up approaches, advancing scalable and flexible symbolic world modeling for robotics.
comment: 18 pages, 11 figures
mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs
Prevailing Vision-Language-Action Models (VLAs) for robotic manipulation are built upon vision-language backbones pretrained on large-scale, but disconnected static web data. As a result, despite improved semantic generalization, the policy must implicitly infer complex physical dynamics and temporal dependencies solely from robot trajectories. This reliance creates an unsustainable data burden, necessitating continuous, large-scale expert data collection to compensate for the lack of innate physical understanding. We contend that while vision-language pretraining effectively captures semantic priors, it remains blind to physical causality. A more effective paradigm leverages video to jointly capture semantics and visual dynamics during pretraining, thereby isolating the remaining task of low-level control. To this end, we introduce mimic-video, a novel Video-Action Model (VAM) that pairs a pretrained Internet-scale video model with a flow matching-based action decoder conditioned on its latent representations. The decoder serves as an Inverse Dynamics Model (IDM), generating low-level robot actions from the latent representation of video-space action plans. Our extensive evaluation shows that our approach achieves state-of-the-art performance on simulated and real-world robotic manipulation tasks, improving sample efficiency by 10x and convergence speed by 2x compared to traditional VLA architectures.
comment: Revised Introduction, Related Work, and Appendix. Additional minor notational and grammatical fixes
Maintaining the Level of a Payload carried by Multi-Robot System on Irregular Surface
In this paper, we introduce a multi robot payload transport system to carry payloads through an environment of unknown and uneven inclinations while maintaining the desired orientation of the payload. For this task, we used custom built robots with a linear actuator (pistons) mounted on top of each robot. The system continuously monitors the payload's orientation and computes the required piston height of each robot to maintain the desired orientation of the payload. In this work, we propose an open loop controller coupled with a closed loop PID controller to achieve the goal. As our modelling makes no assumptions on the type of terrain, the system can work on any unknown and uneven terrains and inclinations. We showcase the efficacy of our proposed controller by testing it on various simulated environments with varied and complex terrains.
Mitigating Undesired Conditions in Flexible Production with Product-Process-Resource Asset Knowledge Graphs ISWC 2025
Contemporary industrial cyber-physical production systems (CPPS) composed of robotic workcells face significant challenges in the analysis of undesired conditions due to the flexibility of Industry 4.0 that disrupts traditional quality assurance mechanisms. This paper presents a novel industry-oriented semantic model called Product-Process-Resource Asset Knowledge Graph (PPR-AKG), which is designed to analyze and mitigate undesired conditions in flexible CPPS. Built on top of the well-proven Product-Process-Resource (PPR) model originating from ISA-95 and VDI-3682, a comprehensive OWL ontology addresses shortcomings of conventional model-driven engineering for CPPS, particularly inadequate undesired condition and error handling representation. The integration of semantic technologies with large language models (LLMs) provides intuitive interfaces for factory operators, production planners, and engineers to interact with the entire model using natural language. Evaluation with the use case addressing electric vehicle battery remanufacturing demonstrates that the PPR-AKG approach efficiently supports resource allocation based on explicitly represented capabilities as well as identification and mitigation of undesired conditions in production. The key contributions include (1) a holistic PPR-AKG model capturing multi-dimensional production knowledge, and (2) the useful combination of the PPR-AKG with LLM-based chatbots for human interaction.
comment: Originally published online by CEUR Workshop Proceedings (CEUR-WS.org, ISSN 1613-0073) within ISWC 2025 Companion Volume. Available online: https://ceur-ws.org/Vol-4085/ and https://ceur-ws.org/Vol-4085/paper9.pdf
Cooperative Task Spaces for Multi-Arm Manipulation Control based on Similarity Transformations
Many tasks in human environments require collaborative behavior between multiple kinematic chains, either to provide additional support for carrying big and bulky objects or to enable the dexterity that is required for in-hand manipulation. Since these complex systems often have a very high number of degrees of freedom coordinating their movements is notoriously difficult to model. In this article, we present the derivation of the theoretical foundations for cooperative task spaces of multi-arm robotic systems based on geometric primitives defined using conformal geometric algebra. Based on the similarity transformations of these cooperative geometric primitives, we derive an abstraction of complex robotic systems that enables representing these systems in a way that directly corresponds to single-arm systems. By deriving the associated analytic and geometric Jacobian matrices, we then show the straightforward integration of our approach into classical control techniques rooted in operational space control. We demonstrate this using bimanual manipulators, humanoids and multi-fingered hands in optimal control experiments for reaching desired geometric primitives and in teleoperation experiments using differential kinematics control. We then discuss how the geometric primitives naturally embed nullspace structures into the controllers that can be exploited for introducing secondary control objectives. This work, represents the theoretical foundations of this cooperative manipulation control framework, and thus the experiments are presented in an abstract way, while giving pointers towards potential future applications.
MILE: A Mechanically Isomorphic Exoskeleton Data Collection System with Fingertip Visuotactile Sensing for Dexterous Manipulation
Imitation learning provides a promising approach to dexterous hand manipulation, but its effectiveness is limited by the lack of large-scale, high-fidelity data. Existing data-collection pipelines suffer from inaccurate motion retargeting, low data-collection efficiency, and missing high-resolution fingertip tactile sensing. We address this gap with MILE, a mechanically isomorphic teleoperation and data-collection system co-designed from human hand to exoskeleton to robotic hand. The exoskeleton is anthropometrically derived from the human hand, and the robotic hand preserves one-to-one joint-position isomorphism, eliminating nonlinear retargeting and enabling precise, natural control. The exoskeleton achieves a multi-joint mean absolute angular error below one degree, while the robotic hand integrates compact fingertip visuotactile modules that provide high-resolution tactile observations. Built on this retargeting-free interface, we teleoperate complex, contact-rich in-hand manipulation and efficiently collect a multimodal dataset comprising high-resolution fingertip visuotactile signals, RGB-D images, and joint positions. The teleoperation pipeline achieves a mean success rate improvement of 64%. Incorporating fingertip tactile observations further increases the success rate by an average of 25% over the vision-only baseline, validating the fidelity and utility of the dataset. Further details are available at: https://sites.google.com/view/mile-system.
VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments
This paper proposes VLA-AN, an efficient and onboard Vision-Language-Action (VLA) framework dedicated to autonomous drone navigation in complex environments. VLA-AN addresses four major limitations of existing large aerial navigation models: the data domain gap, insufficient temporal navigation with reasoning, safety issues with generative action policies, and onboard deployment constraints. First, we construct a high-fidelity dataset utilizing 3D Gaussian Splatting (3D-GS) to effectively bridge the domain gap. Second, we introduce a progressive three-stage training framework that sequentially reinforces scene comprehension, core flight skills, and complex navigation capabilities. Third, we design a lightweight, real-time action module coupled with geometric safety correction. This module ensures fast, collision-free, and stable command generation, mitigating the safety risks inherent in stochastic generative policies. Finally, through deep optimization of the onboard deployment pipeline, VLA-AN achieves a robust real-time 8.3x improvement in inference throughput on resource-constrained UAVs. Extensive experiments demonstrate that VLA-AN significantly improves spatial grounding, scene reasoning, and long-horizon navigation, achieving a maximum single-task success rate of 98.1%, and providing an efficient, practical solution for realizing full-chain closed-loop autonomy in lightweight aerial robots.
CBMC-V3: A CNS-inspired Control Framework Towards Manipulation Agility with SNN
As robotic arm applications extend beyond industrial settings into service-oriented sectors such as catering, household and retail, existing control algorithms struggle to achieve the agile manipulation required for complex environments with dynamic trajectories, unpredictable interactions, and diverse objects. This paper presents a biomimetic control framework based on Spiking Neural Networks (SNNs), inspired by the human Central Nervous System (CNS), to achieve agile control in such environments. The proposed framework features five control modules (cerebral cortex, cerebellum, thalamus, brainstem, and spinal cord), three hierarchical control levels (first-order, second-order, and third-order), and two information pathways (ascending and descending). Each module is fully implemented using SNN. The spinal cord module uses spike encoding and Leaky Integrate-and-Fire (LIF) neurons for feedback control. The brainstem module employs a network of LIF and non-spiking LIF neurons to dynamically adjust spinal cord parameters via reinforcement learning. The thalamus module similarly employs a network of LIF and non-spiking LIF neurons to adjust the cerebellum's torque outputs via reinforcement learning. The cerebellum module, which provides feedfoward gravity compensation torques, uses a recurrent SNN to learn the robotic arm's dynamics through regression. The framework is validated both in simulation and on real-world robotic arm platform under various loads and trajectories. Results demonstrate that our method outperforms the industrial-grade position control in manipulation agility.
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges
Vision-Language-Action (VLA) models are driving a revolution in robotics, enabling machines to understand instructions and interact with the physical world. This field is exploding with new models and datasets, making it both exciting and challenging to keep pace with. This survey offers a clear and structured guide to the VLA landscape. We design it to follow the natural learning path of a researcher: we start with the basic Modules of any VLA model, trace the history through key Milestones, and then dive deep into the core Challenges that define recent research frontier. Our main contribution is a detailed breakdown of the five biggest challenges in: (1) Representation, (2) Execution, (3) Generalization, (4) Safety, and (5) Dataset and Evaluation. This structure mirrors the developmental roadmap of a generalist agent: establishing the fundamental perception-action loop, scaling capabilities across diverse embodiments and environments, and finally ensuring trustworthy deployment-all supported by the essential data infrastructure. For each of them, we review existing approaches and highlight future opportunities. We position this paper as both a foundational guide for newcomers and a strategic roadmap for experienced researchers, with the dual aim of accelerating learning and inspiring new ideas in embodied intelligence. A live version of this survey, with continuous updates, is maintained on our \href{https://suyuz1.github.io/VLA-Survey-Anatomy/}{project page}.
comment: project page: https://suyuz1.github.io/VLA-Survey-Anatomy/
MiVLA: Towards Generalizable Vision-Language-Action Model with Human-Robot Mutual Imitation Pre-training
While leveraging abundant human videos and simulated robot data poses a scalable solution to the scarcity of real-world robot data, the generalization capability of existing vision-language-action models (VLAs) remains limited by mismatches in camera views, visual appearance, and embodiment morphologies. To overcome this limitation, we propose MiVLA, a generalizable VLA empowered by human-robot mutual imitation pre-training, which leverages inherent behavioral similarity between human hands and robotic arms to build a foundation of strong behavioral priors for both human actions and robotic control. Specifically, our method utilizes kinematic rules with left/right hand coordinate systems for bidirectional alignment between human and robot action spaces. Given human or simulated robot demonstrations, MiVLA is trained to forecast behavior trajectories for one embodiment, and imitate behaviors for another one unseen in the demonstration. Based on this mutual imitation, it integrates the behavioral fidelity of real-world human data with the manipulative diversity of simulated robot data into a unified model, thereby enhancing the generalization capability for downstream tasks. Extensive experiments conducted on both simulation and real-world platforms with three robots (ARX, PiPer and LocoMan), demonstrate that MiVLA achieves strong improved generalization capability, outperforming state-of-the-art VLAs (e.g., $\boldsymbolπ_{0}$, $\boldsymbolπ_{0.5}$ and H-RDT) by 25% in simulation, and 14% in real-world robot control tasks.
Phantom Menace: Exploring and Enhancing the Robustness of VLA Models Against Physical Sensor Attacks AAAI 2026
Vision-Language-Action (VLA) models revolutionize robotic systems by enabling end-to-end perception-to-action pipelines that integrate multiple sensory modalities, such as visual signals processed by cameras and auditory signals captured by microphones. This multi-modality integration allows VLA models to interpret complex, real-world environments using diverse sensor data streams. Given the fact that VLA-based systems heavily rely on the sensory input, the security of VLA models against physical-world sensor attacks remains critically underexplored. To address this gap, we present the first systematic study of physical sensor attacks against VLAs, quantifying the influence of sensor attacks and investigating the defenses for VLA models. We introduce a novel "Real-Sim-Real" framework that automatically simulates physics-based sensor attack vectors, including six attacks targeting cameras and two targeting microphones, and validates them on real robotic systems. Through large-scale evaluations across various VLA architectures and tasks under varying attack parameters, we demonstrate significant vulnerabilities, with susceptibility patterns that reveal critical dependencies on task types and model designs. We further develop an adversarial-training-based defense that enhances VLA robustness against out-of-distribution physical perturbations caused by sensor attacks while preserving model performance. Our findings expose an urgent need for standardized robustness benchmarks and mitigation strategies to secure VLA deployments in safety-critical environments.
comment: Accepted by AAAI 2026 main track
Collaborative Object Handover in a Robot Crafting Assistant
Robots are increasingly working alongside people, delivering food to patrons in restaurants or helping workers on assembly lines. These scenarios often involve object handovers between the person and the robot. To achieve safe and efficient human-robot collaboration (HRC), it is important to incorporate human context in a robot's handover strategies. We develop a collaborative handover model trained on human teleoperation data collected in a naturalistic crafting task. To evaluate its performance, we conduct cross-validation experiments on the training dataset as well as a user study in the same HRC crafting task. The handover episodes and user perceptions of the autonomous handover policy were compared with those of the human teleoperated handovers. While the cross-validation experiment and user study indicate that the autonomous policy successfully achieved collaborative handovers, the comparison with human teleoperation revealed avenues for further improvements.
comment: Published at the 2025 Australasian Conference on Robotics and Automation (ACRA 2025): https://www.araa.asn.au/conference/acra-2025/
DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents
Hierarchical Reinforcement Learning (HRL) agents often struggle with long-horizon visual planning due to their reliance on error-prone distance metrics. We propose Discrete Hierarchical Planning (DHP), a method that replaces continuous distance estimates with discrete reachability checks to evaluate subgoal feasibility. DHP recursively constructs tree-structured plans by decomposing long-term goals into sequences of simpler subtasks, using a novel advantage estimation strategy that inherently rewards shorter plans and generalizes beyond training depths. In addition, to address the data efficiency challenge, we introduce an exploration strategy that generates targeted training examples for the planning modules without needing expert data. Experiments in 25-room navigation environments demonstrate a 100% success rate (vs. 90% baseline). We also present an offline variant that achieves state-of-the-art results on OGBench benchmarks, with up to 71% absolute gains on giant HumanoidMaze tasks, demonstrating our core contributions are architecture-agnostic. The method also generalizes to momentum-based control tasks and requires only log N steps for replanning. Theoretical analysis and ablations validate our design choices.
Efficient Image-Goal Navigation with Representative Latent World Model
World models enable robots to conduct counterfactual reasoning in physical environments by predicting future world states. While conventional approaches often prioritize pixel-level reconstruction of future scenes, such detailed rendering is computationally intensive and unnecessary for planning tasks like navigation. We therefore propose that prediction and planning can be efficiently performed directly within a latent space of high-level semantic representations. To realize this, we introduce the Representative Latent space Navigation World Model (ReL-NWM). Rather than relying on reconstructionoriented latent embeddings, our method leverages a pre-trained representation encoder, DINOv3, and incorporates specialized mechanisms to effectively integrate action signals and historical context within this representation space. By operating entirely in the latent domain, our model bypasses expensive explicit reconstruction and achieves highly efficient navigation planning. Experiments show state-of-the-art trajectory prediction and image-goal navigation performance on multiple benchmarks. Additionally, we demonstrate real-world applicability by deploying the system on a Unitree G1 humanoid robot, confirming its efficiency and robustness in practical navigation scenarios.
WebATLAS: An LLM Agent with Experience-Driven Memory and Action Simulation NeurIPS 2025
Large Language Model (LLM) web agents often struggle with long-horizon web navigation and web task completion in new websites, producing inefficient action sequences unless fine-tuned on environment-specific data. We show that experience-driven memory, combined with look-ahead action simulation, is sufficient for LLM agents to adapt to unseen web environments by remembering past failures and predicting the consequences of future actions. We introduce WebATLAS (Actor-Critic Task-completion with Look-ahead Action Simulation), a memory-augmented LLM web agent that learns a lightweight internal model of the environment from interaction experience and performs hypothetical action rollouts before acting in the real world. WebATLAS builds a persistent cognitive map via curiosity-driven exploration, stores interaction outcomes as experience-based memory, and evaluates candidate actions in cognitive space using a planner--simulator--critic loop. This enables the agent to reuse past experience, avoid previously unsuccessful behaviors, and generate more efficient plans. We evaluate WebATLAS on the WebArena-Lite benchmark for autonomous web navigation and demonstrate a success rate of 63%, outperforming the previous state-of-the-art at 53.9%. Unlike previous systems, our modular architecture requires no website-specific LLM fine-tuning. Ablation studies confirm that experience-driven memory, look-ahead action simulation, and hierarchical replanning play complementary roles in enabling robust, training-free web agents.
comment: 9 pages, NeurIPS 2025 Workshop on Language Agents and World Models
HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments
We study the problem of robot navigation in dense and interactive crowds with static constraints such as corridors and furniture. Previous methods fail to consider all types of spatial and temporal interactions among agents and obstacles, leading to unsafe and inefficient robot paths. In this article, we leverage a graph-based representation of crowded and constrained scenarios and propose a structured framework to learn robot navigation policies with deep reinforcement learning. We first split the representations of different inputs and propose a heterogeneous spatio-temporal graph to model distinct interactions among humans, robots, and obstacles. Based on the heterogeneous spatio-temporal graph, we propose HEIGHT, a novel navigation policy network architecture with different components to capture heterogeneous interactions through space and time. HEIGHT utilizes attention mechanisms to prioritize important interactions and a recurrent network to track changes in the dynamic scene over time, encouraging the robot to avoid collisions adaptively. Through extensive simulation and real-world experiments, we demonstrate that HEIGHT outperforms state-of-the-art baselines in terms of success, navigation time, and generalization to domain shifts in challenging navigation scenarios. More information is available at https://sites.google.com/view/crowdnav-height/home.
comment: Accepted to IEEE Transactions of Automation Science and Engineering (T-ASE)
Towards a Multi-Embodied Grasping Agent
Multi-embodiment grasping focuses on developing approaches that exhibit generalist behavior across diverse gripper designs. Existing methods often learn the kinematic structure of the robot implicitly and face challenges due to the difficulty of sourcing the required large-scale data. In this work, we present a data-efficient, flow-based, equivariant grasp synthesis architecture that can handle different gripper types with variable degrees of freedom and successfully exploit the underlying kinematic model, deducing all necessary information solely from the gripper and scene geometry. Unlike previous equivariant grasping methods, we translated all modules from the ground up to JAX and provide a model with batching capabilities over scenes, grippers, and grasps, resulting in smoother learning, improved performance and faster inference time. Our dataset encompasses grippers ranging from humanoid hands to parallel yaw grippers and includes 25,000 scenes and 20 million grasps.
comment: 8 pages, 3 figures
Multiagent Systems
Verifiability-First Agents: Provable Observability and Lightweight Audit Agents for Controlling Autonomous LLM Systems
As LLM-based agents grow more autonomous and multi-modal, ensuring they remain controllable, auditable, and faithful to deployer intent becomes critical. Prior benchmarks measured the propensity for misaligned behavior and showed that agent personalities and tool access significantly influence misalignment. Building on these insights, we propose a Verifiability-First architecture that (1) integrates run-time attestations of agent actions using cryptographic and symbolic methods, (2) embeds lightweight Audit Agents that continuously verify intent versus behavior using constrained reasoning, and (3) enforces challenge-response attestation protocols for high-risk operations. We introduce OPERA (Observability, Provable Execution, Red-team, Attestation), a benchmark suite and evaluation protocol designed to measure (i) detectability of misalignment, (ii) time to detection under stealthy strategies, and (iii) resilience of verifiability mechanisms to adversarial prompt and persona injection. Our approach shifts the evaluation focus from how likely misalignment is to how quickly and reliably misalignment can be detected and remediated.
MAPPO-LCR: Multi-Agent Policy Optimization with Local Cooperation Reward in Spatial Public Goods Games
Spatial public goods games model collective dilemmas where individual payoffs depend on population-level strategy configurations. Most existing studies rely on evolutionary update rules or value-based reinforcement learning methods. These approaches struggle to represent payoff coupling and non-stationarity in large interacting populations. This work introduces Multi-Agent Proximal Policy Optimization (MAPPO) into spatial public goods games for the first time. In these games, individual returns are intrinsically coupled through overlapping group interactions. Proximal Policy Optimization (PPO) treats agents as independent learners and ignores this coupling during value estimation. MAPPO addresses this limitation through a centralized critic that evaluates joint strategy configurations. To study neighborhood-level cooperation signals under this framework, we propose MAPPO with Local Cooperation Reward, termed MAPPO-LCR. The local cooperation reward aligns policy updates with surrounding cooperative density without altering the original game structure. MAPPO-LCR preserves decentralized execution while enabling population-level value estimation during training. Extensive simulations demonstrate stable cooperation emergence and reliable convergence across enhancement factors. Statistical analyses further confirm the learning advantage of MAPPO over PPO in spatial public goods games.
Efficient Mixture-of-Agents Serving via Tree-Structured Routing, Adaptive Pruning, and Dependency-Aware Prefill-Decode Overlap
Mixture-of-Agents (MoA) inference can suffer from dense inter-agent communication and low hardware utilization, which jointly inflate serving latency. We present a serving design that targets these bottlenecks through an algorithm-system co-design. First, we replace dense agent interaction graphs with a hierarchical tree topology that induces structured sparsity in inter-agent communication. Second, we introduce a runtime adaptive mechanism that selectively terminates or skips downstream agent invocations using semantic agreement and confidence signals from intermediate outputs. Third, we pipeline agent execution by overlapping incremental prefilling with decoding across dependency-related agents, improving utilization and reducing inference latency. Across representative tasks, this approach substantially reduces end-to-end latency (up to 90%) while maintaining comparable accuracy (within $\pm$1%) relative to dense-connectivity MoA baselines, and can improve accuracy in certain settings.
comment: 13 pages, 4 figures, submitted to Design Automation Conference (DAC) 2026
Rethinking Multi-Agent Intelligence Through the Lens of Small-World Networks
Large language models (LLMs) have enabled multi-agent systems (MAS) in which multiple agents argue, critique, and coordinate to solve complex tasks, making communication topology a first-class design choice. Yet most existing LLM-based MAS either adopt fully connected graphs, simple sparse rings, or ad-hoc dynamic selection, with little structural guidance. In this work, we revisit classic theory on small-world (SW) networks and ask: what changes if we treat SW connectivity as a design prior for MAS? We first bridge insights from neuroscience and complex networks to MAS, highlighting how SW structures balance local clustering and long-range integration. Using multi-agent debate (MAD) as a controlled testbed, experiment results show that SW connectivity yields nearly the same accuracy and token cost, while substantially stabilizing consensus trajectories. Building on this, we introduce an uncertainty-guided rewiring scheme for scaling MAS, where long-range shortcuts are added between epistemically divergent agents using LLM-oriented uncertainty signals (e.g., semantic entropy). This yields controllable SW structures that adapt to task difficulty and agent heterogeneity. Finally, we discuss broader implications of SW priors for MAS design, framing them as stabilizers of reasoning, enhancers of robustness, scalable coordinators, and inductive biases for emergent cognitive roles.
comment: Under Review
Adaptive Agents in Spatial Double-Auction Markets: Modeling the Emergence of Industrial Symbiosis AAMAS
Industrial symbiosis fosters circularity by enabling firms to repurpose residual resources, yet its emergence is constrained by socio-spatial frictions that shape costs, matching opportunities, and market efficiency. Existing models often overlook the interaction between spatial structure, market design, and adaptive firm behavior, limiting our understanding of where and how symbiosis arises. We develop an agent-based model where heterogeneous firms trade byproducts through a spatially embedded double-auction market, with prices and quantities emerging endogenously from local interactions. Leveraging reinforcement learning, firms adapt their bidding strategies to maximize profit while accounting for transport costs, disposal penalties, and resource scarcity. Simulation experiments reveal the economic and spatial conditions under which decentralized exchanges converge toward stable and efficient outcomes. Counterfactual regret analysis shows that sellers' strategies approach a near Nash equilibrium, while sensitivity analysis highlights how spatial structures and market parameters jointly govern circularity. Our model provides a basis for exploring policy interventions that seek to align firm incentives with sustainability goals, and more broadly demonstrates how decentralized coordination can emerge from adaptive agents in spatially constrained markets.
comment: AAMAS CC-BY 4.0 licence. Adaptive Agents in Spatial Double-Auction Markets: Modeling the Emergence of Industrial Symbiosis. Full paper. In Proc. of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026), Paphos, Cyprus, May 25 - 29, 2026, IFAAMAS, 10 pages
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows
We introduce a finance & accounting benchmark (Finch) for evaluating AI agents on real-world, enterprise-grade professional workflows -- interleaving data entry, structuring, formatting, web search, cross-file retrieval, calculation, modeling, validation, translation, visualization, and reporting. Finch is sourced from authentic enterprise workspaces at Enron (15,000 spreadsheets and 500,000 emails from 150 employees) and other financial institutions, preserving in-the-wild messiness across multimodal artifacts (text, tables, formulas, charts, code, and images) and spanning diverse domains such as budgeting, trading, and asset management. We propose a workflow construction process that combines LLM-assisted discovery with expert annotation: (1) LLM-assisted, expert-verified derivation of workflows from real-world email threads and version histories of spreadsheet files, and (2) meticulous expert annotation for workflows, requiring over 700 hours of domain-expert effort. This yields 172 composite workflows with 384 tasks, involving 1,710 spreadsheets with 27 million cells, along with PDFs and other artifacts, capturing the intrinsically messy, long-horizon, knowledge-intensive, and collaborative nature of real-world enterprise work. We conduct both human and automated evaluations of frontier AI systems including GPT 5.1, Claude Sonnet 4.5, Gemini 3 Pro, Grok 4, and Qwen 3 Max, and GPT 5.1 Pro spends 48 hours in total yet passes only 38.4% of workflows, while Claude Sonnet 4.5 passes just 25.0%. Comprehensive case studies further surface the challenges that real-world enterprise workflows pose for AI agents.
Parallelism Meets Adaptiveness: Scalable Documents Understanding in Multi-Agent LLM Systems AAAI 2026
Large language model (LLM) agents have shown increasing promise for collaborative task completion. However, existing multi-agent frameworks often rely on static workflows, fixed roles, and limited inter-agent communication, reducing their effectiveness in open-ended, high-complexity domains. This paper proposes a coordination framework that enables adaptiveness through three core mechanisms: dynamic task routing, bidirectional feedback, and parallel agent evaluation. The framework allows agents to reallocate tasks based on confidence and workload, exchange structured critiques to iteratively improve outputs, and crucially compete on high-ambiguity subtasks with evaluator-driven selection of the most suitable result. We instantiate these principles in a modular architecture and demonstrate substantial improvements in factual coverage, coherence, and efficiency over static and partially adaptive baselines. Our findings highlight the benefits of incorporating both adaptiveness and structured competition in multi-agent LLM systems.
comment: Accepted at AAAI 2026 Workshop on WoMAPF, Camera ready version
WebATLAS: An LLM Agent with Experience-Driven Memory and Action Simulation NeurIPS 2025
Large Language Model (LLM) web agents often struggle with long-horizon web navigation and web task completion in new websites, producing inefficient action sequences unless fine-tuned on environment-specific data. We show that experience-driven memory, combined with look-ahead action simulation, is sufficient for LLM agents to adapt to unseen web environments by remembering past failures and predicting the consequences of future actions. We introduce WebATLAS (Actor-Critic Task-completion with Look-ahead Action Simulation), a memory-augmented LLM web agent that learns a lightweight internal model of the environment from interaction experience and performs hypothetical action rollouts before acting in the real world. WebATLAS builds a persistent cognitive map via curiosity-driven exploration, stores interaction outcomes as experience-based memory, and evaluates candidate actions in cognitive space using a planner--simulator--critic loop. This enables the agent to reuse past experience, avoid previously unsuccessful behaviors, and generate more efficient plans. We evaluate WebATLAS on the WebArena-Lite benchmark for autonomous web navigation and demonstrate a success rate of 63%, outperforming the previous state-of-the-art at 53.9%. Unlike previous systems, our modular architecture requires no website-specific LLM fine-tuning. Ablation studies confirm that experience-driven memory, look-ahead action simulation, and hierarchical replanning play complementary roles in enabling robust, training-free web agents.
comment: 9 pages, NeurIPS 2025 Workshop on Language Agents and World Models
MTTR-A: Measuring Cognitive Recovery Latency in Multi-Agent Systems
Reliability in multi-agent systems (MAS) built on large language models is increasingly limited by cognitive failures rather than infrastructure faults. Existing observability tools describe failures but do not quantify how quickly distributed reasoning recovers once coherence is lost. We introduce MTTR-A (Mean Time-to-Recovery for Agentic Systems), a runtime reliability metric that measures cognitive recovery latency in MAS. MTTR-A adapts classical dependability theory to agentic orchestration, capturing the time required to detect reasoning drift and restore coherent operation. We further define complementary metrics, including MTBF and a normalized recovery ratio (NRR), and establish theoretical bounds linking recovery latency to long-run cognitive uptime. Using a LangGraph-based benchmark with simulated drift and reflex recovery, we empirically demonstrate measurable recovery behavior across multiple reflex strategies. This work establishes a quantitative foundation for runtime cognitive dependability in distributed agentic systems.
comment: preprint
STAIR: Stability criterion for Time-windowed Assignment and Internal adversarial influence in Routing and decision-making
A major limitation of existing routing algorithms for multi-agent systems is that they are designed without considering the potential presence of adversarial agents in the decision-making loop, which could lead to severe performance degradation in real-life applications where adversarial agents may be present. We study autonomous pickup-and-delivery routing problems in which adversarial agents launch coordinated denial-of-service attacks by spoofing their locations. This deception causes the central scheduler to assign pickup requests to adversarial agents instead of cooperative agents. Adversarial agents then choose not to service the requests with the goal of disrupting the operation of the system, leading to delays, cancellations, and potential instability in the routing policy. Policy stability in routing problems is typically defined as the cost of the policy being uniformly bounded over time, and it has been studied through two different lenses: queuing theory and reinforcement learning (RL), which are not well suited for routing with adversaries. In this paper, we propose a new stability criterion, STAIR, which is easier to analyze than queuing-theory-based stability in adversarial settings. Furthermore, STAIR does not depend on a chosen discount factor as is the case in discounted RL stability. STAIR directly links stability to desired operational metrics, like a finite number of rejected requests. This characterization is particularly useful in adversarial settings as it provides a metric for monitoring the effect of adversaries in the operation of the system. Furthermore, we demonstrate STAIR's practical relevance through simulations on real-world San Francisco mobility-on-demand data. We also identify a phenomenon of degenerate stability that arises in the adversarial routing problem, and we introduce time-window constraints in the decision-making algorithm to mitigate it.
comment: Requires major changes
Systems and Control (EESS)
Distributionally Robust Imitation Learning: Layered Control Architecture for Certifiable Autonomy
Imitation learning (IL) enables autonomous behavior by learning from expert demonstrations. While more sample-efficient than comparative alternatives like reinforcement learning, IL is sensitive to compounding errors induced by distribution shifts. There are two significant sources of distribution shifts when using IL-based feedback laws on systems: distribution shifts caused by policy error and distribution shifts due to exogenous disturbances and endogenous model errors due to lack of learning. Our previously developed approaches, Taylor Series Imitation Learning (TaSIL) and $\mathcal{L}_1$ -Distributionally Robust Adaptive Control (\ellonedrac), address the challenge of distribution shifts in complementary ways. While TaSIL offers robustness against policy error-induced distribution shifts, \ellonedrac offers robustness against distribution shifts due to aleatoric and epistemic uncertainties. To enable certifiable IL for learned and/or uncertain dynamical systems, we formulate \textit{Distributionally Robust Imitation Policy (DRIP)} architecture, a Layered Control Architecture (LCA) that integrates TaSIL and~\ellonedrac. By judiciously designing individual layer-centric input and output requirements, we show how we can guarantee certificates for the entire control pipeline. Our solution paves the path for designing fully certifiable autonomy pipelines, by integrating learning-based components, such as perception, with certifiable model-based decision-making through the proposed LCA approach.
comment: 18 pages, 5 figures
NeuRehab: A Reinforcement Learning and Spiking Neural Network-Based Rehab Automation Framework
Recent advancements in robotic rehabilitation therapy have provided modular exercise systems for post-stroke muscle recovery with basic control schemes. But these systems struggle to adapt to patients' complex and ever-changing behaviour, and to operate within mobile settings, such as heat and power. To aid this, we present NeuRehab: an end-to-end framework consisting of a training and inference pipeline with AI-based automation, co-designed with neuromorphic computing-based control systems that balance action performance, power consumption, and observed latency. The framework consists of 2 partitions. One is designated for the rehabilitation device based on ultra-low power spiking networks deployed on dedicated neuromorphic hardware. The other resides on stationary hardware that can accommodate computationally intensive hardware for fine-tuning on a per-patient basis. By maintaining a communication channel between both the modules and splitting the algorithm components, the power and latency requirements of the movable system have been optimised, while retaining the learning performance advantages of compute- and power-hungry hardware on the stationary machine. As part of the framework, we propose (a) the split machine learning processes for efficiency in architectural utilisation, and (b) task-specific temporal optimisations to lower edge-inference control latency. This paper evaluates the proposed methods on a reference stepper motor-based shoulder exercise. Overall, these methods offer comparable performance uplifts over the State-of-the-art for neuromorphic deployment, while achieving over 60% savings in both power and latency during inference compared to standard implementations.
comment: 18 pages, 22 figures, 3 tables, for submission to IOP Neuromorphic Computing and Engineering
Techno-Economic Case Study of a Rural Local Electricity Community in Switzerland
Local Electricity Communities (communautés électriques locales, CEL) will become operational in Switzerland in 2026, allowing prosumers, consumers, and storage operators within the same municipality and distribution system operator (DSO) area to exchange electricity over the public grid with reduced distribution tariffs. This report examines a rural Swiss case study to explore the techno-economic implications of CELs for both participants and the local DSO. The findings indicate that CELs can enhance the local use of renewable generation, particularly photovoltaics, and offer modest financial gains, with outcomes strongly shaped by community size, composition, and tariff design. Larger and more heterogeneous communities achieve better internal matching of supply and demand, though the overall incentive remains limited because the tariff reduction applies only to distribution charges. The study further shows that internal energy exchange is maximized when local PV generation covers roughly 1-2 times the community load. For DSOs, CELs reduce grid imports (27-46%), resulting in a substantial reduction in distribution tariff revenues (17-36%), necessitating regulatory adaptation. While centralized batteries provide economic value to members, their technical impact on the grid remains modest due to their small, economically optimized capacity. Larger centralized storage is shown to reduce transformer peak power, but risks increasing line loading, suggesting a need for careful sizing and placement.
comment: 33 pages and 12 figures
UAV-Enabled ISAC: Towards On-Demand Sensing Services and Enhanced Communication
In this paper, we investigate an integrated sensing-and-communication (ISAC) network enabled by an unmanned aerial vehicle (UAV). The UAV is supposed to fly along a periodical circular trajectory at a fixed height for ISAC service supply from the sky. We consider on-demand sensing services, where on-demand detection and on-demand localization requests may be activated at any time toward any position within the targeted serving region. While guaranteeing satisfactory accuracy for both on-demand sensing tasks, we aim at maximizing the minimum achievable throughput among all communication users, via joint optimizing the UAV trajectory and communication user scheduling. To address the complicated problem with infinite sensing constraints, we characterize the on-demand detection constraint as a restricted deployment area for UAV and the on-demand localization constraint as Cramer-Rao Bound (CRB) constraints over finite reference target points, based on which the original problem is simplified to more tractable one. Afterwards, particularly aiming to ensure no violations of CRB constraints, we propose a convex approximation for the reformulated problem, where tight approximation is guaranteed at given local solution. The construction strategy for convex problem approximation allows an efficient iterative algorithm with verified convergence to a superior suboptimal solution. At last, with simulations, we verified the applicability of our developed optimization scheme in strictly fulfilling the on-demand sensing constraints and the effectiveness of our proposed solution for simultaneously enhancing the communication throughput in UAV-enabled ISAC.
Layer-to-layer Closed-loop Switched Heating and Cooling Control of the Laser Powder Bed Fusion Process
This study investigates the stabilization of interlayer temperature in the laser powder bed fusion process through a novel switched layer-to-layer closed-loop feedback controller. The controller architecture aims to measure the interlayer temperature by a laterally positioned thermal camera and maintain a preset reference temperature by switching between the heating mode through dynamic laser power adjustment and the cooling mode by assigning interlayer dwell time to allow cooling between layers. The switching controller employs a feedback optimization control algorithm for the heating mode to adjust the laser power, and a triggering algorithm that increases the interlayer dwell time until the interlayer temperature reaches the reference value. Additionally, the study compares the performance of the proposed controller in both supported and unsupported overhanging parts to evaluate the effect of support structures on the controller performance as well as the thermal behavior of overhanging parts. Results demonstrate the controller's effectiveness in stabilizing interlayer temperature across varying cross-sectional areas while remaining within the material's stable processing zone. In the heating mode, the controller efficiently stabilizes temperature, even in geometries with significant cross-section variation. The study also identifies trade-offs among process efficiency, energy consumption, and build time. Supported parts exhibit reduced overheating but consume more energy and material, while unsupported parts stabilize interlayer temperature faster but with longer build times due to increased dwell time assignments. The research highlights notable improvements in interlayer temperature control for geometries prone to excessive thermal stresses. Moreover, the introduction of interlayer dwell time offers a practical solution to maintaining thermal stability in complex geometries.
Dispatch-Aware Learning for Optimal Transmission Switching
Optimal transmission switching (OTS) improves optimal power flow (OPF) by selectively opening transmission lines, but its mixed-integer formulation increases computational complexity, especially on large grids. To deal with this, we propose a dispatch-aware deep neural network (DA-DNN) that accelerates DC-OTS without relying on pre-solved labels. DA-DNN predicts line states and passes them through an embedded differentiable DC-OPF layer, using the resulting generation cost as the loss function so that all physical network constraints are enforced throughout training and inference. In addition, we adopt a customized weight-bias initialization that keeps every forward pass feasible from the first iteration, which allows stable learning. Once trained, the proposed DA-DNN successfully produces a feasible topology and dispatch pair in the same time as solving the DCOPF, whereas conventional mixed-integer solvers become intractable. Moreover, the embedded OPF layer enables DA-DNN to generalize to untrained system configurations, such as changes in line flow limits. As a result, the proposed method successfully captures the economic advantages of OTS while maintaining scalability and generalization ability.
comment: 10 pages, 5 figures
Linear Attention for Joint Power Optimization and User-Centric Clustering in Cell-Free Networks
Optimal AP clustering and power allocation are critical in user-centric cell-free massive MIMO systems. Existing deep learning models lack flexibility to handle dynamic network configurations. Furthermore, many approaches overlook pilot contamination and suffer from high computational complexity. In this paper, we propose a lightweight transformer model that overcomes these limitations by jointly predicting AP clusters and powers solely from spatial coordinates of user devices and AP. Our model is architecture-agnostic to users load, handles both clustering and power allocation without channel estimation overhead, and eliminates pilot contamination by assigning users to AP within a pilot reuse constraint. We also incorporate a customized linear attention mechanism to capture user-AP interactions efficiently and enable linear scalability with respect to the number of users. Numerical results confirm the model's effectiveness in maximizing the minimum spectral efficiency and providing near-optimal performance while ensuring adaptability and scalability in dynamic scenarios.
comment: Submitted
Grey graphs and its application
In multi-attribute decision-making problems where the attribute values are interval grey numbers, a simplified form based on kernels and the degree of greyness is presented. Combining fuzzy graph theory with the kernel and the degree of greyness of interval grey numbers, grey graphs and their corresponding operation rules are presented. This paper presents a new multi-attribute decision-making method based on grey graph theory. We analyzed and evaluated the alternative schemes using grey graph. Lastly, a numerical example was conducted in order to demonstrate the effectiveness and feasibility of the proposed method.
Uncertainty-Aware 3D UAV Tracking Using Single-Anchor UWB Measurements
In this letter, we present an uncertainty-aware single-anchor Ultra-Wideband (UWB)-based 3D tracking framework. Specifically, a mobile Unmanned Aerial Vehicle (UAV) maintains a desired standoff distance to a moving target using range and 3D bearing measurements from a multi-antenna UWB anchor rigidly mounted on the UAV. To enhance the stability and safety under measurement degradation and motion uncertainty, we jointly design a robust factor-graph-based target localization method and a covariance-aware control Lyapunov function--control barrier function (CLF--CBF) tracking controller. This controller adaptively adjusts distance bounds and safety margins based on the posterior target covariance provided by the factor graph. The proposed system is evaluated through numerical simulations and real-world experiments carried out in a narrow indoor corridor environment.
Cooperative Energy Scheduling of Multi-Microgrids Based on Risk-Sensitive Reinforcement Learning
With the rapid development of distributed renewable energy, multi-microgrids play an increasingly important role in improving the flexibility and reliability of energy supply. Reinforcement learning has shown great potential in coordination strategies due to its model-free nature. Current methods lack explicit quantification of the relationship between individual and joint risk values, resulting in obscured credit assignment. Moreover, they often depend on explicit communication, which becomes inefficient as system complexity grows. To address these challenges, this paper proposes a risk-sensitive reinforcement learning framework with shared memory (RRL-SM) for multi-microgrid scheduling. Specifically, a risk-sensitive value factorization scheme is proposed to quantify the relationship between individual and joint risk values by leveraging distributional modeling and attention-based representations, thereby aligning local decisions with global risk objectives. An implicit shared-memory coordination mechanism is implemented through a global memory space to enhance the overall efficiency of decentralized decision-making. Collectively, the integrated approach delivers more reliable cooperative scheduling under renewable energy uncertainty. Simulation results show that RRL-SM reduces load-shedding risk by 84.5%, demonstrating a favorable balance between reliability and economic performance.
Bayesian Holonic Systems: Equilibrium, Uniqueness, and Computation
This paper addresses the challenge of modeling and control in hierarchical, multi-agent systems, known as holonic systems, where local agent decisions are coupled with global systemic outcomes. We introduce the Bayesian Holonic Equilibrium (BHE), a concept that ensures consistency between agent-level rationality and system-wide emergent behavior. We establish the theoretical soundness of the BHE by showing its existence and, under stronger regularity conditions, its uniqueness. We propose a two-time scale learning algorithm to compute such an equilibrium. This algorithm mirrors the system's structure, with a fast timescale for intra-holon strategy coordination and a slow timescale for inter-holon belief adaptation about external risks. The convergence of the algorithm to the theoretical equilibrium is validated through a numerical experiment on a continuous public good game. This work provides a complete theoretical and algorithmic framework for the principled design and analysis of strategic risk in complex, coupled control systems.
Timing-Aware Two-Player Stochastic Games with Self-Triggered Control
We study self-triggered two-player stochastic games on Piecewise Deterministic Markov Processes (PDMPs) where each agent decides when to observe and which open-loop action to hold. Augmenting the state with clocks and committed controls yields flow regions (both hold) and trigger surfaces (at least one updates). The framework covers both blind simultaneous (Nash) timing and observable sequential (Stackelberg) commitments; the former leads to coupled, intractable QVIs, while the latter admits a nested Hamilton-Jacobi-Bellman quasi-variational inequality and a tractable dynamic-programming decomposition. We outline a computational scheme based on implicit differentiation of the follower's fixed point. A pursuit-evasion example illustrates the strategic timing interaction.
A Games-in-Games Paradigm for Strategic Hybrid Jump-Diffusions: Hamilton-Jacobi-Isaacs Hierarchy and Spectral Structure
This paper develops a hierarchical games-in-games control architecture for hybrid stochastic systems governed by regime-switching jump-diffusions. We model the interplay between continuous state dynamics and discrete mode transitions as a bilevel differential game: an inner layer solves a robust stochastic control problem within each regime, while a strategic outer layer modulates the transition intensities of the underlying Markov chain. A Dynkin-based analysis yields a system of coupled Hamilton-Jacobi-Isaacs (HJI) equations. We prove that for the class of Linear-Quadratic games and Exponential-Affine games, this hierarchy admits tractable semi-closed form solutions via coupled matrix differential equations. We prove that for the class of Linear-Quadratic games and Exponential-Affine games, this hierarchy admits tractable semi-closed form solutions via coupled matrix differential equations. The framework is demonstrated through a case study on adversarial market microstructure, showing how the outer layer's strategic switching pre-emptively adjusts inventory spreads against latent regime risks, which leads to a hyper-alert equilibrium.
Review of Power Electronic Solutions for Dielectric Barrier Discharge Applications
This paper presents a comprehensive review of dielectric barrier discharge (DBD) power supply topologies, aiming to bridge the gap between DBD applications and power electronics design. Two key aspects are examined: the dependence of the DBD electrical model on reactor geometry, and application-driven requirements for injected waveform characteristics, including shapes, voltage amplitude, frequency, and modulation techniques. On this basis, the paper systematically reviews two major categories of power supplies: sinusoidal types comprising transformerless and transformer-based resonant inverters, and pulsed power supplies (PPSs). The review summarizes performance trade-offs, highlights untested topologies and emerging applications, and offers guidance for advancing high-performance DBD power supply design for next-generation systems.
comment: 26 pages, 32 figures. Under conditional acceptance at IEEE Transactions on Power Electronics
Robustness of Delayed Higher Order Sliding Mode Control
In this paper, the feasibility of recently developed higher order delayed sliding mode controllers is addressed. With this aim the robustness against the measurement noise and mismatched perturbations for the systems governed by such controllers is established using ISS implicit Lyapunov-Razumikhin function approach. To illustrate proposed results, a simulation example validating the efficiency of the method is provided.
comment: 16 pages, 3 figures
Reduced-order asymptotic observer of nonlinear friction for precise motion control
Nonlinear friction has long been, and continues to be, one of the major challenges for precision motion control systems. A linear asymptotic observer of the motion state variables with nonlinear friction uses a dedicated state-space representation of the dynamic friction force (including pre-sliding) [25], which is robust to the noisy input signals. The observer implements the reduced-order Luenberger observation law, while assuming the output displacement is the only available measurement. The latter is relevant for multiple motion control applications with use of encoder-type sensors. The uniform asymptotic stability and convergence analysis of the proposed observer are elaborated in the present work by using the Lyapunov function-based stability criterion by Ignatyev and imposing parametric constraints on the time-dependent eigenvalues of the system matrix to be always negative real. A design procedure for assigning a dominant, and thus slowest, real pole of the observer is proposed. Explanative numerical examples accompany the developed analysis. In addition, a thorough experimental evaluation is given for the proposed observer-based friction compensation which is performed for positioning and tracking tasks. The observer-based compensation, which can serve as a plug-in to a standard feedback controller, extends a PID feedback control that is optimally tuned for disturbance suppression. The experimental results are compared with and without the plugged-in observer-based compensator.
comment: 10 pages, 6 figures
Combinatorial Control Barrier Functions: Nested Boolean and p-choose-r Compositions of Safety Constraints
This paper investigates the problem of composing multiple control barrier functions (CBFs) -- and matrix control barrier functions (MCBFs) -- through logical and combinatorial operations. Standard CBF formulations naturally enable conjunctive (AND) combinations, but disjunctive (OR) and more general logical structures introduce nonsmoothness and possibly a combinatorial blow-up in the number of logical combinations. We introduce the framework of combinatorial CBFs that addresses p-choose-r safety specifications and their nested composition. The proposed framework ensures safety for the exact safe set in a scalable way, using the original number of primitive constraints. We establish theoretical guarantees on safety under these compositions, and we demonstrate their use on a patrolling problem in a multi-agent system.
comment: 6 pages, 3 figures, Accepted for publication in Control System Letters (L-CSS) with the possibility of presenting at the American Control Conference (ACC) 2026
The Battle of the Water Futures
The highly anticipated 'Battle of the Water Networks' is back with a new challenge for the water community. This competition will be hosted at the 4th International Joint Conference on Water Distribution Systems Analysis and Computing and Control in the Water Industry (WDSA/CCWI 2026), taking place in Paphos, Cyprus, from May 18-21, 2026. This competition embodies the core mission of Water-Futures and the theme for WDSA/CCWI 2026: "Designing the next generation of urban water (and wastewater) systems." The objective is to design and operate a water distribution system over a long-term horizon under deep uncertainty, with interventions applied in stages. For the first time, this challenge features a staged-design approach, unobservable and unknown uncertainties, and incorporates elements of policymaking and artificial intelligence. The solutions will be assessed using a transparent and inspectable open-source evaluation framework.
Quantifying the Dunkelflaute -- An analysis of variable renewable energy droughts in Europe
Variable renewable energy droughts, also called "Dunkelflaute", emerge as a challenge for climate-neutral energy systems based on variable renewables. Here we characterize European drought events for on- and offshore wind power, solar photovoltaics, and renewable technology portfolios, using 38 historic weather years and an advanced identification method. Their characteristics heavily depend on the chosen drought threshold, questioning the usefulness of single-threshold analyses. Applying a multi-threshold framework, we quantify how the complementarity of wind and solar power temporally and spatially alleviates drought frequency, return periods, duration, and severity within (portfolio effect) and across countries (balancing effect). We identify the most extreme droughts, which drive major discharging periods of long-duration storage in a fully renewable European energy system, based on a policy-relevant decarbonization scenario. Such events comprise sequences of shorter droughts of varying severity. The most extreme event occurred in winter 1996/97 and lasted 55 days in an idealized, perfectly interconnected setting. The average renewable availability during this period was still 47% of its long-run mean. System planners must consider such events when planning for storage and other flexibility technologies. Methodologically, we conclude that using arbitrary single calendar years is not suitable for modeling weather-resilient energy scenarios.
Cooperative Task Spaces for Multi-Arm Manipulation Control based on Similarity Transformations
Many tasks in human environments require collaborative behavior between multiple kinematic chains, either to provide additional support for carrying big and bulky objects or to enable the dexterity that is required for in-hand manipulation. Since these complex systems often have a very high number of degrees of freedom coordinating their movements is notoriously difficult to model. In this article, we present the derivation of the theoretical foundations for cooperative task spaces of multi-arm robotic systems based on geometric primitives defined using conformal geometric algebra. Based on the similarity transformations of these cooperative geometric primitives, we derive an abstraction of complex robotic systems that enables representing these systems in a way that directly corresponds to single-arm systems. By deriving the associated analytic and geometric Jacobian matrices, we then show the straightforward integration of our approach into classical control techniques rooted in operational space control. We demonstrate this using bimanual manipulators, humanoids and multi-fingered hands in optimal control experiments for reaching desired geometric primitives and in teleoperation experiments using differential kinematics control. We then discuss how the geometric primitives naturally embed nullspace structures into the controllers that can be exploited for introducing secondary control objectives. This work, represents the theoretical foundations of this cooperative manipulation control framework, and thus the experiments are presented in an abstract way, while giving pointers towards potential future applications.
Decentralized Voltage Control of AC Microgrids with Constant Power Loads using Control Barrier Functions
This paper proposes a novel nonlinear decentralized voltage controller for constrained regulation of meshed AC Microgrid networks with high penetration of time-varying constant power loads. Modelling the load demand as a constantly evolving unknown disturbance, the network model is reformulated in a cascaded structure composed of a nominal, \ie uncertainty-free, and an error subsystem. By adopting a suitable control barrier function, we formulate a continuous-time control law and derive analytic conditions on the tuning parameters, such that the distance between the true and the nominal state trajectories is bounded. Under sufficient conditions, we prove asymptotic stability of the cascaded dynamics with respect to an equilibrium set and also provide an estimate of the region of attraction. In addition, it is rigorously shown that the proposed nonlinear control law enforces constrained regulation around a rated voltage value, without the need of saturation devices. The operation of the closed-loop system is illustrated both via simulation and real-time HIL scenarios, demonstrating bounded operation and convergence to a neighbourhood of the desired reference vector.
comment: 12 pages
Nonlinear moving horizon estimation for robust state and parameter estimation -- extended version
We propose a moving horizon estimation scheme to estimate the states and the unknown constant parameters of general nonlinear uncertain discrete-time systems. The proposed framework and analysis explicitly do not involve the a priori verification of a particular excitation condition for the parameters. Instead, we use online information about the actual excitation of the parameters at any time during operation and ensure that the regularization term in the cost function is always automatically selected appropriately. This ensures that the state and parameter estimation error is bounded for all times, even if the parameters are never (or only rarely) excited during operation. Robust exponential stability of the state and parameter estimation error emerges under an additional uniform condition on the maximum duration of insufficient excitation. The theoretical results are illustrated by a numerical example.
comment: Replaced by version corresponding to the final published paper
Observer-based Differentially Private Consensus for Linear Multi-agent Systems
This paper investigates the differentially private consensus problem for general linear multi-agent systems (MASs) based on output feedback protocols. To protect the output information, which is considered private data and may be at high risk of exposure, Laplace noise is added to the information exchange. The conditions for achieving mean square and almost sure consensus in observer-based MASs are established using the backstepping method and the convergence theory for nonnegative almost supermartingales. It is shown that the separation principle remains valid for the consensus problem of linear MASs with decaying Laplace noise. Furthermore, the convergence rate is provided. Then, a joint design framework is developed for state estimation gain, feedback control gain, and noise to ensure the preservation of ε-differential privacy. The output information of each agent is shown to be protected at every time step. Finally, sufficient conditions are established for simultaneously achieving consensus and preserving differential privacy for linear MASs utilizing both full-order and reduced-order observers. Meanwhile, an ε*-differentially private consensus is achieved to meet the desired privacy level. Two simulation examples are provided to validate the theoretical results.
Data-Driven Nonconvex Reachability Analysis using Exact Multiplication
This paper addresses a fundamental challenge in data-driven reachability analysis: accurately representing and propagating non-convex reachable sets. We propose a novel approach using constrained polynomial zonotopes to describe reachable sets for unknown LTI systems. Unlike constrained zonotopes commonly used in existing literature, constrained polynomial zonotopes are closed under multiplication with constrained matrix zonotopes. We leverage this property to develop an exact multiplication method that preserves the non-convex geometry of reachable sets without resorting to approximations. We demonstrate that our approach provides tighter over-approximations of reachable sets for LTI systems compared to conventional methods.
comment: This paper has been accepted at the 64th IEEE Conference on Decision and Control (CDC 2025)
BattBee: Equivalent Circuit Modeling and Early Detection of Thermal Runaway Triggered by Internal Short Circuits for Lithium-Ion Batteries
Lithium-ion batteries are the enabling power source for transportation electrification. However, in real-world applications, they remain vulnerable to internal short circuits (ISCs) and the consequential risk of thermal runaway (TR). Toward addressing the challenge of ISCs and TR, we undertake a systematic study that extends from dynamic modeling to fault detection in this paper. First, we develop {\em BattBee}, the first equivalent circuit model to specifically describe the onset of ISCs and the evolution of subsequently induced TR. Drawing upon electrochemical modeling, the model can simulate ISCs at different severity levels and predict their impact on the initiation and progression of TR events. With the physics-inspired design, this model offers strong physical interpretability and predictive accuracy, while maintaining structural simplicity to allow fast computation. Then, building upon the BattBee model, we develop fault detection observers and derive detection criteria together with decision-making logics to identify the occurrence and emergence of ISC and TR events. This detection approach is principled in design and fast in computation, lending itself to practical applications. Validation based on simulations and experimental data demonstrates the effectiveness of both the BattBee model and the ISC/TR detection approach. The research outcomes underscore this study's potential for real-world battery safety risk management.
comment: 19 pages, 15 figures, 2 tables
MTTR-A: Measuring Cognitive Recovery Latency in Multi-Agent Systems
Reliability in multi-agent systems (MAS) built on large language models is increasingly limited by cognitive failures rather than infrastructure faults. Existing observability tools describe failures but do not quantify how quickly distributed reasoning recovers once coherence is lost. We introduce MTTR-A (Mean Time-to-Recovery for Agentic Systems), a runtime reliability metric that measures cognitive recovery latency in MAS. MTTR-A adapts classical dependability theory to agentic orchestration, capturing the time required to detect reasoning drift and restore coherent operation. We further define complementary metrics, including MTBF and a normalized recovery ratio (NRR), and establish theoretical bounds linking recovery latency to long-run cognitive uptime. Using a LangGraph-based benchmark with simulated drift and reflex recovery, we empirically demonstrate measurable recovery behavior across multiple reflex strategies. This work establishes a quantitative foundation for runtime cognitive dependability in distributed agentic systems.
comment: preprint
STAIR: Stability criterion for Time-windowed Assignment and Internal adversarial influence in Routing and decision-making
A major limitation of existing routing algorithms for multi-agent systems is that they are designed without considering the potential presence of adversarial agents in the decision-making loop, which could lead to severe performance degradation in real-life applications where adversarial agents may be present. We study autonomous pickup-and-delivery routing problems in which adversarial agents launch coordinated denial-of-service attacks by spoofing their locations. This deception causes the central scheduler to assign pickup requests to adversarial agents instead of cooperative agents. Adversarial agents then choose not to service the requests with the goal of disrupting the operation of the system, leading to delays, cancellations, and potential instability in the routing policy. Policy stability in routing problems is typically defined as the cost of the policy being uniformly bounded over time, and it has been studied through two different lenses: queuing theory and reinforcement learning (RL), which are not well suited for routing with adversaries. In this paper, we propose a new stability criterion, STAIR, which is easier to analyze than queuing-theory-based stability in adversarial settings. Furthermore, STAIR does not depend on a chosen discount factor as is the case in discounted RL stability. STAIR directly links stability to desired operational metrics, like a finite number of rejected requests. This characterization is particularly useful in adversarial settings as it provides a metric for monitoring the effect of adversaries in the operation of the system. Furthermore, we demonstrate STAIR's practical relevance through simulations on real-world San Francisco mobility-on-demand data. We also identify a phenomenon of degenerate stability that arises in the adversarial routing problem, and we introduce time-window constraints in the decision-making algorithm to mitigate it.
comment: Requires major changes
Robotics
DVGT: Driving Visual Geometry Transformer
Perceiving and reconstructing 3D scene geometry from visual inputs is crucial for autonomous driving. However, there still lacks a driving-targeted dense geometry perception model that can adapt to different scenarios and camera configurations. To bridge this gap, we propose a Driving Visual Geometry Transformer (DVGT), which reconstructs a global dense 3D point map from a sequence of unposed multi-view visual inputs. We first extract visual features for each image using a DINO backbone, and employ alternating intra-view local attention, cross-view spatial attention, and cross-frame temporal attention to infer geometric relations across images. We then use multiple heads to decode a global point map in the ego coordinate of the first frame and the ego poses for each frame. Unlike conventional methods that rely on precise camera parameters, DVGT is free of explicit 3D geometric priors, enabling flexible processing of arbitrary camera configurations. DVGT directly predicts metric-scaled geometry from image sequences, eliminating the need for post-alignment with external sensors. Trained on a large mixture of driving datasets including nuScenes, OpenScene, Waymo, KITTI, and DDAD, DVGT significantly outperforms existing models on various scenarios. Code is available at https://github.com/wzzheng/DVGT.
comment: Code is available at https://github.com/wzzheng/DVGT
Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning
Standard practice across domains from robotics to language is to first pretrain a policy on a large-scale demonstration dataset, and then finetune this policy, typically with reinforcement learning (RL), in order to improve performance on deployment domains. This finetuning step has proved critical in achieving human or super-human performance, yet while much attention has been given to developing more effective finetuning algorithms, little attention has been given to ensuring the pretrained policy is an effective initialization for RL finetuning. In this work we seek to understand how the pretrained policy affects finetuning performance, and how to pretrain policies in order to ensure they are effective initializations for finetuning. We first show theoretically that standard behavioral cloning (BC) -- which trains a policy to directly match the actions played by the demonstrator -- can fail to ensure coverage over the demonstrator's actions, a minimal condition necessary for effective RL finetuning. We then show that if, instead of exactly fitting the observed demonstrations, we train a policy to model the posterior distribution of the demonstrator's behavior given the demonstration dataset, we do obtain a policy that ensures coverage over the demonstrator's actions, enabling more effective finetuning. Furthermore, this policy -- which we refer to as the posterior behavioral cloning (PostBC) policy -- achieves this while ensuring pretrained performance is no worse than that of the BC policy. We then show that PostBC is practically implementable with modern generative models in robotic control domains -- relying only on standard supervised learning -- and leads to significantly improved RL finetuning performance on both realistic robotic control benchmarks and real-world robotic manipulation tasks, as compared to standard behavioral cloning.
MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning
Mobile manipulators in households must both navigate and manipulate. This requires a compact, semantically rich scene representation that captures where objects are, how they function, and which parts are actionable. Scene graphs are a natural choice, yet prior work often separates spatial and functional relations, treats scenes as static snapshots without object states or temporal updates, and overlooks information most relevant for accomplishing the current task. To address these limitations, we introduce MomaGraph, a unified scene representation for embodied agents that integrates spatial-functional relationships and part-level interactive elements. However, advancing such a representation requires both suitable data and rigorous evaluation, which have been largely missing. We thus contribute MomaGraph-Scenes, the first large-scale dataset of richly annotated, task-driven scene graphs in household environments, along with MomaGraph-Bench, a systematic evaluation suite spanning six reasoning capabilities from high-level planning to fine-grained scene understanding. Built upon this foundation, we further develop MomaGraph-R1, a 7B vision-language model trained with reinforcement learning on MomaGraph-Scenes. MomaGraph-R1 predicts task-oriented scene graphs and serves as a zero-shot task planner under a Graph-then-Plan framework. Extensive experiments demonstrate that our model achieves state-of-the-art results among open-source models, reaching 71.6% accuracy on the benchmark (+11.4% over the best baseline), while generalizing across public benchmarks and transferring effectively to real-robot experiments.
comment: 25 pages, 10 figures. Project page:https://hybridrobotics.github.io/MomaGraph/
Flowing from Reasoning to Motion: Learning 3D Hand Trajectory Prediction from Egocentric Human Interaction Videos
Prior works on 3D hand trajectory prediction are constrained by datasets that decouple motion from semantic supervision and by models that weakly link reasoning and action. To address these, we first present the EgoMAN dataset, a large-scale egocentric dataset for interaction stage-aware 3D hand trajectory prediction with 219K 6DoF trajectories and 3M structured QA pairs for semantic, spatial, and motion reasoning. We then introduce the EgoMAN model, a reasoning-to-motion framework that links vision-language reasoning and motion generation via a trajectory-token interface. Trained progressively to align reasoning with motion dynamics, our approach yields accurate and stage-aware trajectories with generalization across real-world scenes.
comment: Project website: https://egoman-project.github.io
Sceniris: A Fast Procedural Scene Generation Framework
Synthetic 3D scenes are essential for developing Physical AI and generative models. Existing procedural generation methods often have low output throughput, creating a significant bottleneck in scaling up dataset creation. In this work, we introduce Sceniris, a highly efficient procedural scene generation framework for rapidly generating large-scale, collision-free scene variations. Sceniris also provides an optional robot reachability check, providing manipulation-feasible scenes for robot tasks. Sceniris is designed for maximum efficiency by addressing the primary performance limitations of the prior method, Scene Synthesizer. Leveraging batch sampling and faster collision checking in cuRobo, Sceniris achieves at least 234x speed-up over Scene Synthesizer. Sceniris also expands the object-wise spatial relationships available in prior work to support diverse scene requirements. Our code is available at https://github.com/rai-inst/sceniris
comment: Code is available at https://github.com/rai-inst/sceniris
PolaRiS: Scalable Real-to-Sim Evaluations for Generalist Robot Policies
A significant challenge for robot learning research is our ability to accurately measure and compare the performance of robot policies. Benchmarking in robotics is historically challenging due to the stochasticity, reproducibility, and time-consuming nature of real-world rollouts. This challenge is exacerbated for recent generalist policies, which has to be evaluated across a wide variety of scenes and tasks. Evaluation in simulation offers a scalable complement to real world evaluations, but the visual and physical domain gap between existing simulation benchmarks and the real world has made them an unreliable signal for policy improvement. Furthermore, building realistic and diverse simulated environments has traditionally required significant human effort and expertise. To bridge the gap, we introduce Policy Evaluation and Environment Reconstruction in Simulation (PolaRiS), a scalable real-to-sim framework for high-fidelity simulated robot evaluation. PolaRiS utilizes neural reconstruction methods to turn short video scans of real-world scenes into interactive simulation environments. Additionally, we develop a simple simulation data co-training recipe that bridges remaining real-to-sim gaps and enables zero-shot evaluation in unseen simulation environments. Through extensive paired evaluations between simulation and the real world, we demonstrate that PolaRiS evaluations provide a much stronger correlation to real world generalist policy performance than existing simulated benchmarks. Its simplicity also enables rapid creation of diverse simulated environments. As such, this work takes a step towards distributed and democratized evaluation for the next generation of robotic foundation models.
comment: Website: https://polaris-evals.github.io/
ReinforceGen: Hybrid Skill Policies with Automated Data Generation and Reinforcement Learning
Long-horizon manipulation has been a long-standing challenge in the robotics community. We propose ReinforceGen, a system that combines task decomposition, data generation, imitation learning, and motion planning to form an initial solution, and improves each component through reinforcement-learning-based fine-tuning. ReinforceGen first segments the task into multiple localized skills, which are connected through motion planning. The skills and motion planning targets are trained with imitation learning on a dataset generated from 10 human demonstrations, and then fine-tuned through online adaptation and reinforcement learning. When benchmarked on the Robosuite dataset, ReinforceGen reaches 80% success rate on all tasks with visuomotor controls in the highest reset range setting. Additional ablation studies show that our fine-tuning approaches contributes to an 89% average performance increase. More results and videos available in https://reinforcegen.github.io/
OPENTOUCH: Bringing Full-Hand Touch to Real-World Interaction
The human hand is our primary interface to the physical world, yet egocentric perception rarely knows when, where, or how forcefully it makes contact. Robust wearable tactile sensors are scarce, and no existing in-the-wild datasets align first-person video with full-hand touch. To bridge the gap between visual perception and physical interaction, we present OpenTouch, the first in-the-wild egocentric full-hand tactile dataset, containing 5.1 hours of synchronized video-touch-pose data and 2,900 curated clips with detailed text annotations. Using OpenTouch, we introduce retrieval and classification benchmarks that probe how touch grounds perception and action. We show that tactile signals provide a compact yet powerful cue for grasp understanding, strengthen cross-modal alignment, and can be reliably retrieved from in-the-wild video queries. By releasing this annotated vision-touch-pose dataset and benchmark, we aim to advance multimodal egocentric perception, embodied learning, and contact-rich robotic manipulation.
comment: https://opentouch-tactile.github.io/
GeoPredict: Leveraging Predictive Kinematics and 3D Gaussian Geometry for Precise VLA Manipulation
Vision-Language-Action (VLA) models achieve strong generalization in robotic manipulation but remain largely reactive and 2D-centric, making them unreliable in tasks that require precise 3D reasoning. We propose GeoPredict, a geometry-aware VLA framework that augments a continuous-action policy with predictive kinematic and geometric priors. GeoPredict introduces a trajectory-level module that encodes motion history and predicts multi-step 3D keypoint trajectories of robot arms, and a predictive 3D Gaussian geometry module that forecasts workspace geometry with track-guided refinement along future keypoint trajectories. These predictive modules serve exclusively as training-time supervision through depth-based rendering, while inference requires only lightweight additional query tokens without invoking any 3D decoding. Experiments on RoboCasa Human-50, LIBERO, and real-world manipulation tasks show that GeoPredict consistently outperforms strong VLA baselines, especially in geometry-intensive and spatially demanding scenarios.
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence
Robotic generalization relies on physical intelligence: the ability to reason about state changes, contact-rich interactions, and long-horizon planning under egocentric perception and action. However, most VLMs are trained primarily on third-person data, creating a fundamental viewpoint mismatch for humanoid robots. Scaling robot egocentric data collection remains impractical due to high cost and limited diversity, whereas large-scale human egocentric videos offer a scalable alternative that naturally capture rich interaction context and causal structure. The key challenge is to convert raw egocentric videos into structured and reliable embodiment training supervision. Accordingly, we propose an Egocentric2Embodiment translation pipeline that transforms first-person videos into multi-level, schema-driven VQA supervision with enforced evidence grounding and temporal consistency, enabling the construction of the Egocentric2Embodiment dataset (E2E-3M) at scale. An egocentric-aware embodied brain, termed PhysBrain, is obtained by training on the E2E-3M dataset. PhysBrain exhibits substantially improved egocentric understanding, particularly for planning on EgoThink. It provides an egocentric-aware initialization that enables more sample-efficient VLA fine-tuning and higher SimplerEnv success rates (53.9\%), demonstrating effective transfer from human egocentric supervision to downstream robot control.
comment: 17 pages, 4 figures
Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future
Autonomous driving has long relied on modular "Perception-Decision-Action" pipelines, where hand-crafted interfaces and rule-based components often break down in complex or long-tailed scenarios. Their cascaded design further propagates perception errors, degrading downstream planning and control. Vision-Action (VA) models address some limitations by learning direct mappings from visual inputs to actions, but they remain opaque, sensitive to distribution shifts, and lack structured reasoning or instruction-following capabilities. Recent progress in Large Language Models (LLMs) and multimodal learning has motivated the emergence of Vision-Language-Action (VLA) frameworks, which integrate perception with language-grounded decision making. By unifying visual understanding, linguistic reasoning, and actionable outputs, VLAs offer a pathway toward more interpretable, generalizable, and human-aligned driving policies. This work provides a structured characterization of the emerging VLA landscape for autonomous driving. We trace the evolution from early VA approaches to modern VLA frameworks and organize existing methods into two principal paradigms: End-to-End VLA, which integrates perception, reasoning, and planning within a single model, and Dual-System VLA, which separates slow deliberation (via VLMs) from fast, safety-critical execution (via planners). Within these paradigms, we further distinguish subclasses such as textual vs. numerical action generators and explicit vs. implicit guidance mechanisms. We also summarize representative datasets and benchmarks for evaluating VLA-based driving systems and highlight key challenges and open directions, including robustness, interpretability, and instruction fidelity. Overall, this work aims to establish a coherent foundation for advancing human-compatible autonomous driving systems.
comment: Preprint; 40 pages, 7 figures, 9 tables; GitHub at https://github.com/worldbench/awesome-vla-for-ad
VERM: Leveraging Foundation Models to Create a Virtual Eye for Efficient 3D Robotic Manipulation
When performing 3D manipulation tasks, robots have to execute action planning based on perceptions from multiple fixed cameras. The multi-camera setup introduces substantial redundancy and irrelevant information, which increases computational costs and forces the model to spend extra training time extracting crucial task-relevant details. To filter out redundant information and accurately extract task-relevant features, we propose the VERM (Virtual Eye for Robotic Manipulation) method, leveraging the knowledge in foundation models to imagine a virtual task-adaptive view from the constructed 3D point cloud, which efficiently captures necessary information and mitigates occlusion. To facilitate 3D action planning and fine-grained manipulation, we further design a depth-aware module and a dynamic coarse-to-fine procedure. Extensive experimental results on both simulation benchmark RLBench and real-world evaluations demonstrate the effectiveness of our method, surpassing previous state-of-the-art methods while achieving 1.89x speedup in training time and 1.54x speedup in inference speed. More results can be found on our project website at https://verm-ral.github.io .
comment: Accepted at RA-L 2025
Olaf: Bringing an Animated Character to Life in the Physical World
Animated characters often move in non-physical ways and have proportions that are far from a typical walking robot. This provides an ideal platform for innovation in both mechanical design and stylized motion control. In this paper, we bring Olaf to life in the physical world, relying on reinforcement learning guided by animation references for control. To create the illusion of Olaf's feet moving along his body, we hide two asymmetric legs under a soft foam skirt. To fit actuators inside the character, we use spherical and planar linkages in the arms, mouth, and eyes. Because the walk cycle results in harsh contact sounds, we introduce additional rewards that noticeably reduce impact noise. The large head, driven by small actuators in the character's slim neck, creates a risk of overheating, amplified by the costume. To keep actuators from overheating, we feed temperature values as additional inputs to policies, introducing new rewards to keep them within bounds. We validate the efficacy of our modeling in simulation and on hardware, demonstrating an unmatched level of believability for a costumed robotic character.
A Formal Modular Synthesis Approach for the Coordination of 3-D Robotic Construction with Multi-robots
In this paper, we deal with the problem of coordinating multiple robots to build 3-D structures. This problem consists of a set of mobile robots that interact with each other in order to autonomously build a predefined 3-D structure. Our approach is based on Supervisory Control Theory, and it allows us to synthesize from models that represent a single robot and the target structure a correct-by-construction reactive controller, called supervisor. When this supervisor is replicated for the other robots, then the target structure can be completed by all robots
Tri-Select: A Multi-Stage Visual Data Selection Framework for Mobile Visual Crowdsensing
Mobile visual crowdsensing enables large-scale, fine-grained environmental monitoring through the collection of images from distributed mobile devices. However, the resulting data is often redundant and heterogeneous due to overlapping acquisition perspectives, varying resolutions, and diverse user behaviors. To address these challenges, this paper proposes Tri-Select, a multi-stage visual data selection framework that efficiently filters redundant and low-quality images. Tri-Select operates in three stages: (1) metadata-based filtering to discard irrelevant samples; (2) spatial similarity-based spectral clustering to organize candidate images; and (3) a visual-feature-guided selection based on maximum independent set search to retain high-quality, representative images. Experiments on real-world and public datasets demonstrate that Tri-Select improves both selection efficiency and dataset quality, making it well-suited for scalable crowdsensing applications.
SNOW: Spatio-Temporal Scene Understanding with World Knowledge for Open-World Embodied Reasoning
Autonomous robotic systems require spatio-temporal understanding of dynamic environments to ensure reliable navigation and interaction. While Vision-Language Models (VLMs) provide open-world semantic priors, they lack grounding in 3D geometry and temporal dynamics. Conversely, geometric perception captures structure and motion but remains semantically sparse. We propose SNOW (Scene Understanding with Open-World Knowledge), a training-free and backbone-agnostic framework for unified 4D scene understanding that integrates VLM-derived semantics with point cloud geometry and temporal consistency. SNOW processes synchronized RGB images and 3D point clouds, using HDBSCAN clustering to generate object-level proposals that guide SAM2-based segmentation. Each segmented region is encoded through our proposed Spatio-Temporal Tokenized Patch Encoding (STEP), producing multimodal tokens that capture localized semantic, geometric, and temporal attributes. These tokens are incrementally integrated into a 4D Scene Graph (4DSG), which serves as 4D prior for downstream reasoning. A lightweight SLAM backend anchors all STEP tokens spatially in the environment, providing the global reference alignment, and ensuring unambiguous spatial grounding across time. The resulting 4DSG forms a queryable, unified world model through which VLMs can directly interpret spatial scene structure and temporal dynamics. Experiments on a diverse set of benchmarks demonstrate that SNOW enables precise 4D scene understanding and spatially grounded inference, thereby setting new state-of-the-art performance in several settings, highlighting the importance of structured 4D priors for embodied reasoning and autonomous robotics.
AG-MPBS: a Mobility-Aware Prediction and Behavior-Based Scheduling Framework for Air-Ground Unmanned Systems
As unmanned systems such as Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) become increasingly important to applications like urban sensing and emergency response, efficiently recruiting these autonomous devices to perform time-sensitive tasks has become a critical challenge. This paper presents MPBS (Mobility-aware Prediction and Behavior-based Scheduling), a scalable task recruitment framework that treats each device as a recruitable "user". MPBS integrates three key modules: a behavior-aware KNN classifier, a time-varying Markov prediction model for forecasting device mobility, and a dynamic priority scheduling mechanism that considers task urgency and base station performance. By combining behavioral classification with spatiotemporal prediction, MPBS adaptively assigns tasks to the most suitable devices in real time. Experimental evaluations on the real-world GeoLife dataset show that MPBS significantly improves task completion efficiency and resource utilization. The proposed framework offers a predictive, behavior-aware solution for intelligent and collaborative scheduling in unmanned systems.
Single-View Shape Completion for Robotic Grasping in Clutter
In vision-based robot manipulation, a single camera view can only capture one side of objects of interest, with additional occlusions in cluttered scenes further restricting visibility. As a result, the observed geometry is incomplete, and grasp estimation algorithms perform suboptimally. To address this limitation, we leverage diffusion models to perform category-level 3D shape completion from partial depth observations obtained from a single view, reconstructing complete object geometries to provide richer context for grasp planning. Our method focuses on common household items with diverse geometries, generating full 3D shapes that serve as input to downstream grasp inference networks. Unlike prior work, which primarily considers isolated objects or minimal clutter, we evaluate shape completion and grasping in realistic clutter scenarios with household objects. In preliminary evaluations on a cluttered scene, our approach consistently results in better grasp success rates than a naive baseline without shape completion by 23% and over a recent state of the art shape completion approach by 19%. Our code is available at https://amm.aass.oru.se/shape-completion-grasping/.
E-SDS: Environment-aware See it, Do it, Sorted - Automated Environment-Aware Reinforcement Learning for Humanoid Locomotion
Vision-language models (VLMs) show promise in automating reward design in humanoid locomotion, which could eliminate the need for tedious manual engineering. However, current VLM-based methods are essentially "blind", as they lack the environmental perception required to navigate complex terrain. We present E-SDS (Environment-aware See it, Do it, Sorted), a framework that closes this perception gap. E-SDS integrates VLMs with real-time terrain sensor analysis to automatically generate reward functions that facilitate training of robust perceptive locomotion policies, grounded by example videos. Evaluated on a Unitree G1 humanoid across four distinct terrains (simple, gaps, obstacles, stairs), E-SDS uniquely enabled successful stair descent, while policies trained with manually-designed rewards or a non-perceptive automated baseline were unable to complete the task. In all terrains, E-SDS also reduced velocity tracking error by 51.9-82.6%. Our framework reduces the human effort of reward design from days to less than two hours while simultaneously producing more robust and capable locomotion policies.
comment: 12 pages, 3 figures, 4 tables. Accepted at RiTA 2025 (Springer LNNS)
A2VISR: An Active and Adaptive Ground-Aerial Localization System Using Visual Inertial and Single-Range Fusion
It's a practical approach using the ground-aerial collaborative system to enhance the localization robustness of flying robots in cluttered environments, especially when visual sensors degrade. Conventional approaches estimate the flying robot's position using fixed cameras observing pre-attached markers, which could be constrained by limited distance and susceptible to capture failure. To address this issue, we improve the ground-aerial localization framework in a more comprehensive manner, which integrates active vision, single-ranging, inertial odometry, and optical flow. First, the designed active vision subsystem mounted on the ground vehicle can be dynamically rotated to detect and track infrared markers on the aerial robot, improving the field of view and the target recognition with a single camera. Meanwhile, the incorporation of single-ranging extends the feasible distance and enhances re-capture capability under visual degradation. During estimation, a dimension-reduced estimator fuses multi-source measurements based on polynomial approximation with an extended sliding window, balancing computational efficiency and redundancy. Considering different sensor fidelities, an adaptive sliding confidence evaluation algorithm is implemented to assess measurement quality and dynamically adjust the weighting parameters based on moving variance. Finally, extensive experiments under conditions such as smoke interference, illumination variation, obstacle occlusion, prolonged visual loss, and extended operating range demonstrate that the proposed approach achieves robust online localization, with an average root mean square error of approximately 0.09 m, while maintaining resilience to capture loss and sensor failures.
comment: accept by IEEE Transactions on Industrial Electronics
ManiLong-Shot: Interaction-Aware One-Shot Imitation Learning for Long-Horizon Manipulation AAAI 2026
One-shot imitation learning (OSIL) offers a promising way to teach robots new skills without large-scale data collection. However, current OSIL methods are primarily limited to short-horizon tasks, thus limiting their applicability to complex, long-horizon manipulations. To address this limitation, we propose ManiLong-Shot, a novel framework that enables effective OSIL for long-horizon prehensile manipulation tasks. ManiLong-Shot structures long-horizon tasks around physical interaction events, reframing the problem as sequencing interaction-aware primitives instead of directly imitating continuous trajectories. This primitive decomposition can be driven by high-level reasoning from a vision-language model (VLM) or by rule-based heuristics derived from robot state changes. For each primitive, ManiLong-Shot predicts invariant regions critical to the interaction, establishes correspondences between the demonstration and the current observation, and computes the target end-effector pose, enabling effective task execution. Extensive simulation experiments show that ManiLong-Shot, trained on only 10 short-horizon tasks, generalizes to 20 unseen long-horizon tasks across three difficulty levels via one-shot imitation, achieving a 22.8% relative improvement over the SOTA. Additionally, real-robot experiments validate ManiLong-Shot's ability to robustly execute three long-horizon manipulation tasks via OSIL, confirming its practical applicability.
comment: Accepted by AAAI 2026
Privacy-Aware Sharing of Raw Spatial Sensor Data for Cooperative Perception
Cooperative perception between vehicles is poised to offer robust and reliable scene understanding. Recently, we are witnessing experimental systems research building testbeds that share raw spatial sensor data for cooperative perception. While there has been a marked improvement in accuracies and is the natural way forward, we take a moment to consider the problems with such an approach for eventual adoption by automakers. In this paper, we first argue that new forms of privacy concerns arise and discourage stakeholders to share raw sensor data. Next, we present SHARP, a research framework to minimize privacy leakage and drive stakeholders towards the ambitious goal of raw data based cooperative perception. Finally, we discuss open questions for networked systems, mobile computing, perception researchers, industry and government in realizing our proposed framework.
Towards Closing the Domain Gap with Event Cameras
Although traditional cameras are the primary sensor for end-to-end driving, their performance suffers greatly when the conditions of the data they were trained on does not match the deployment environment, a problem known as the domain gap. In this work, we consider the day-night lighting difference domain gap. Instead of traditional cameras we propose event cameras as a potential alternative which can maintain performance across lighting condition domain gaps without requiring additional adjustments. Our results show that event cameras maintain more consistent performance across lighting conditions, exhibiting domain-shift penalties that are generally comparable to or smaller than grayscale frames and provide superior baseline performance in cross-domain scenarios.
comment: Accepted to Australasian Conference on Robotics and Automation (ACRA), 2025
A simulation platform calibration method for automated vehicle evaluation: accurate on both vehicle level and traffic flow level
Simulation testing is a fundamental approach for evaluating automated vehicles (AVs). To ensure its reliability, it is crucial to accurately replicate interactions between AVs and background traffic, which necessitates effective calibration. However, existing calibration methods often fall short in achieving this goal. To address this gap, this study introduces a simulation platform calibration method that ensures high accuracy at both the vehicle and traffic flow levels. The method offers several key features:(1) with the capability of calibration for vehicle-to-vehicle interaction; (2) with accuracy assurance; (3) with enhanced efficiency; (4) with pipeline calibration capability. The proposed method is benchmarked against a baseline with no calibration and a state-of-the-art calibration method. Results show that it enhances the accuracy of interaction replication by 83.53% and boosts calibration efficiency by 76.75%. Furthermore, it maintains accuracy across both vehicle-level and traffic flow-level metrics, with an improvement of 51.9%. Notably, the entire calibration process is fully automated, requiring no human intervention.
A Task-Driven, Planner-in-the-Loop Computational Design Framework for Modular Manipulators
Modular manipulators composed of pre-manufactured and interchangeable modules offer high adaptability across diverse tasks. However, their deployment requires generating feasible motions while jointly optimizing morphology and mounted pose under kinematic, dynamic, and physical constraints. Moreover, traditional single-branch designs often extend reach by increasing link length, which can easily violate torque limits at the base joint. To address these challenges, we propose a unified task-driven computational framework that integrates trajectory planning across varying morphologies with the co-optimization of morphology and mounted pose. Within this framework, a hierarchical model predictive control (HMPC) strategy is developed to enable motion planning for both redundant and non-redundant manipulators. For design optimization, the CMA-ES is employed to efficiently explore a hybrid search space consisting of discrete morphology configurations and continuous mounted poses. Meanwhile, a virtual module abstraction is introduced to enable bi-branch morphologies, allowing an auxiliary branch to offload torque from the primary branch and extend the achievable workspace without increasing the capacity of individual joint modules. Extensive simulations and hardware experiments on polishing, drilling, and pick-and-place tasks demonstrate the effectiveness of the proposed framework. The results show that: 1) the framework can generate multiple feasible designs that satisfy kinematic and dynamic constraints while avoiding environmental collisions for given tasks; 2) flexible design objectives, such as maximizing manipulability, minimizing joint effort, or reducing the number of modules, can be achieved by customizing the cost functions; and 3) a bi-branch morphology capable of operating in a large workspace can be realized without requiring more powerful basic modules.
Driving in Corner Case: A Real-World Adversarial Closed-Loop Evaluation Platform for End-to-End Autonomous Driving
Safety-critical corner cases, difficult to collect in the real world, are crucial for evaluating end-to-end autonomous driving. Adversarial interaction is an effective method to generate such safety-critical corner cases. While existing adversarial evaluation methods are built for models operating in simplified simulation environments, adversarial evaluation for real-world end-to-end autonomous driving has been little explored. To address this challenge, we propose a closed-loop evaluation platform for end-to-end autonomous driving, which can generate adversarial interactions in real-world scenes. In our platform, the real-world image generator cooperates with an adversarial traffic policy to evaluate various end-to-end models trained on real-world data. The generator, based on flow matching, efficiently and stably generates real-world images according to the traffic environment information. The efficient adversarial surrounding vehicle policy is designed to model challenging interactions and create corner cases that current autonomous driving systems struggle to handle. Experimental results demonstrate that the platform can generate realistic driving images efficiently. Through evaluating the end-to-end models such as UniAD and VAD, we demonstrate that based on the adversarial policy, our platform evaluates the performance degradation of the tested model in corner cases. This result indicates that this platform can effectively detect the model's potential issues, which will facilitate the safety and robustness of end-to-end autonomous driving.
DiffeoMorph: Learning to Morph 3D Shapes Using Differentiable Agent-Based Simulations
Biological systems can form complex three-dimensional structures through the collective behavior of identical agents -- cells that follow the same internal rules and communicate without central control. How such distributed control gives rise to precise global patterns remains a central question not only in developmental biology but also in distributed robotics, programmable matter, and multi-agent learning. Here, we introduce DiffeoMorph, an end-to-end differentiable framework for learning a morphogenesis protocol that guides a population of agents to morph into a target 3D shape. Each agent updates its position and internal state using an attention-based SE(3)-equivariant graph neural network, based on its own internal state and signals received from other agents. To train this system, we introduce a new shape-matching loss based on the 3D Zernike polynomials, which compares the predicted and target shapes as continuous spatial distributions, not as discrete point clouds, and is invariant to agent ordering, number of agents, and rigid-body transformations. To enforce full SO(3) invariance -- invariant to rotations yet sensitive to reflections, we include an alignment step that optimally rotates the predicted Zernike spectrum to match the target before computing the loss. This results in a bilevel problem, with the inner loop optimizing a unit quaternion for the best alignment and the outer loop updating the agent model. We compute gradients through the alignment step using implicit differentiation. We perform systematic benchmarking to establish the advantages of our shape-matching loss over other standard distance metrics for shape comparison tasks. We then demonstrate that DiffeoMorph can form a range of shapes -- from simple ellipsoids to complex morphologies -- using only minimal spatial cues.
Learning to Plan, Planning to Learn: Adaptive Hierarchical RL-MPC for Sample-Efficient Decision Making
We propose a new approach for solving planning problems with a hierarchical structure, fusing reinforcement learning and MPC planning. Our formulation tightly and elegantly couples the two planning paradigms. It leverages reinforcement learning actions to inform the MPPI sampler, and adaptively aggregates MPPI samples to inform the value estimation. The resulting adaptive process leverages further MPPI exploration where value estimates are uncertain, and improves training robustness and the overall resulting policies. This results in a robust planning approach that can handle complex planning problems and easily adapts to different applications, as demonstrated over several domains, including race driving, modified Acrobot, and Lunar Lander with added obstacles. Our results in these domains show better data efficiency and overall performance in terms of both rewards and task success, with up to a 72% increase in success rate compared to existing approaches, as well as accelerated convergence (x2.1) compared to non-adaptive sampling.
comment: 23 pages, 8 figures. Under review
Lang2Manip: A Tool for LLM-Based Symbolic-to-Geometric Planning for Manipulation
Simulation is essential for developing robotic manipulation systems, particularly for task and motion planning (TAMP), where symbolic reasoning interfaces with geometric, kinematic, and physics-based execution. Recent advances in Large Language Models (LLMs) enable robots to generate symbolic plans from natural language, yet executing these plans in simulation often requires robot-specific engineering or planner-dependent integration. In this work, we present a unified pipeline that connects an LLM-based symbolic planner with the Kautham motion planning framework to achieve generalizable, robot-agnostic symbolic-to-geometric manipulation. Kautham provides ROS-compatible support for a wide range of industrial manipulators and offers geometric, kinodynamic, physics-driven, and constraint-based motion planning under a single interface. Our system converts language instructions into symbolic actions and computes and executes collision-free trajectories using any of Kautham's planners without additional coding. The result is a flexible and scalable tool for language-driven TAMP that is generalized across robots, planning modalities, and manipulation tasks.
comment: Submitted to ICARA
Mr.MSTE: Multi-robot Multi-Source Term Estimation with Wind-Aware Coverage Control
This paper presents a Multi-Robot Multi-Source Term Estimation (MRMSTE) framework that enables teams of mobile robots to collaboratively sample gas concentrations and infer the parameters of an unknown number of airborne releases. The framework is built on a hybrid Bayesian inference scheme that represents the joint multi-source probability density and incorporates physics-informed state transitions, including source birth, removal, and merging induced by atmospheric dispersion. A superposition-based measurement model is naturally accommodated, allowing sparse concentration measurements to be exploited efficiently. To guide robot deployment, we introduce a wind-aware coverage control (WCC) strategy that integrates the evolving multi-source belief with local wind information to prioritize regions of high detection likelihood. Unlike conventional coverage control or information-theoretic planners, WCC explicitly accounts for anisotropic plume transport when modelling sensor performance, leading to more effective sensor placement for multi-source estimation. Monte Carlo studies demonstrate faster convergence and improved separation of individual source beliefs compared to traditional coverage-based strategies and small-scale static sensor networks. Real-world experiments with CO2 releases using TurtleBot platforms further validate the proposed approach, demonstrating its practicality for scalable multi-robot gas-sensing applications.
Dual-Channel Tomographic Tactile Skin with Pneumatic Pressure Sensing for Improved Force Estimation
Tactile skins based on Electrical Impedance Tomography (EIT) enable large-area contact localization with few electrodes, but suffer from nonuniform sensitivity that limits force estimation accuracy. This work introduces a dual-channel tactile skin that integrates an EIT layer with a pneumatic pressure layer and a calibration framework that leverages their complementary strengths. The EIT layer provides robust multi-contact localization, while the pneumatic pressure layer supplies a stable scalar measurement that serves as contact force estimation. A location-aware correction method is introduced, learning smooth spatial gain and offset fields from a single-session calibration, enabling spatially consistent multi-contact force estimation. The proposed system achieves accurate force estimation across diverse contact configurations, generalizes to varying indenter sizes, and preserves EIT's inherent advantages in multi-contact localization. By letting the pneumatic pressure layer handle the force estimation and using the EIT layer to determine where each contact occurs, the method avoids the need for large datasets, complicated calibration setups, and heavy machine-learning pipelines often required by previous EIT-only approaches. This dual-channel design provides a practical, scalable, and easy-to-calibrate solution for building large-area robotic skins.
comment: This work has been submitted to the IEEE for possible publication
Unleashing Humanoid Reaching Potential via Real-world-Ready Skill Space
Humans possess a large reachable space in the 3D world, enabling interaction with objects at varying heights and distances. However, realizing such large-space reaching on humanoids is a complex whole-body control problem and requires the robot to master diverse skills simultaneously-including base positioning and reorientation, height and body posture adjustments, and end-effector pose control. Learning from scratch often leads to optimization difficulty and poor sim2real transferability. To address this challenge, we propose Real-world-Ready Skill Space (R2S2). Our approach begins with a carefully designed skill library consisting of real-world-ready primitive skills. We ensure optimal performance and robust sim2real transfer through individual skill tuning and sim2real evaluation. These skills are then ensembled into a unified latent space, serving as a structured prior that helps task execution in an efficient and sim2real transferable manner. A high-level planner, trained to sample skills from this space, enables the robot to accomplish real-world goal-reaching tasks. We demonstrate zero-shot sim2real transfer and validate R2S2 in multiple challenging goal-reaching scenarios.
TACOS: Task Agnostic COordinator of a multi-drone System
When a single pilot is responsible for managing a multi-drone system, the task may demand varying levels of autonomy, from direct control of individual UAVs, to group-level coordination, to fully autonomous swarm behaviors for accomplishing high-level tasks. Enabling such flexible interaction requires a framework that supports multiple modes of shared autonomy. As language models continue to improve in reasoning and planning, they provide a natural foundation for such systems, reducing pilot workload by enabling high-level task delegation through intuitive, language-based interfaces. In this paper we present TACOS (Task-Agnostic COordinator of a multi-drone System), a unified framework that enables high-level natural language control of multi-UAV systems through Large Language Models (LLMs). TACOS integrates three key capabilities into a single architecture: a one-to-many natural language interface for intuitive user interaction, an intelligent coordinator for translating user intent into structured task plans, and an autonomous agent that executes plans interacting with the real world. TACOS allows a LLM to interact with a library of executable APIs, bridging semantic reasoning with real-time multi-robot coordination. We demonstrate the system on a real-world multi-drone system, and conduct an ablation study to assess the contribution of each module.
comment: 8 pages, 9 figures, submitted to the IEEE for possible publication
VLG-Loc: Vision-Language Global Localization from Labeled Footprint Maps IROS
This paper presents Vision-Language Global Localization (VLG-Loc), a novel global localization method that uses human-readable labeled footprint maps containing only names and areas of distinctive visual landmarks in an environment. While humans naturally localize themselves using such maps, translating this capability to robotic systems remains highly challenging due to the difficulty of establishing correspondences between observed landmarks and those in the map without geometric and appearance details. To address this challenge, VLG-Loc leverages a vision-language model (VLM) to search the robot's multi-directional image observations for the landmarks noted in the map. The method then identifies robot poses within a Monte Carlo localization framework, where the found landmarks are used to evaluate the likelihood of each pose hypothesis. Experimental validation in simulated and real-world retail environments demonstrates superior robustness compared to existing scan-based methods, particularly under environmental changes. Further improvements are achieved through the probabilistic fusion of visual and scan-based localization.
comment: v2: Updated the citation of SparseLoc from an arXiv preprint to its published version in the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Robust Finetuning of Vision-Language-Action Robot Policies via Parameter Merging
Generalist robot policies, trained on large and diverse datasets, have demonstrated the ability to generalize across a wide spectrum of behaviors, enabling a single policy to act in varied real-world environments. However, they still fall short on new tasks not covered in the training data. When finetuned on limited demonstrations of a new task, these policies often overfit to the specific demonstrations--not only losing their prior abilities to solve a wide variety of generalist tasks but also failing to generalize within the new task itself. In this work, we aim to develop a method that preserves the generalization capabilities of the generalist policy during finetuning, allowing a single policy to robustly incorporate a new skill into its repertoire. Our goal is a single policy that both learns to generalize to variations of the new task and retains the broad competencies gained from pretraining. We show that this can be achieved through a simple yet effective strategy: interpolating the weights of a finetuned model with that of the pretrained model. We show, across extensive simulated and real-world experiments, that such model merging produces a single model that inherits the generalist abilities of the base model and learns to solve the new task robustly, outperforming both the pretrained and finetuned model on out-of-distribution variations of the new task. Moreover, we show that model merging performance scales with the amount of pretraining data, and enables continual acquisition of new skills in a lifelong learning setting, without sacrificing previously learned generalist abilities.
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges
Vision-Language-Action (VLA) models are driving a revolution in robotics, enabling machines to understand instructions and interact with the physical world. This field is exploding with new models and datasets, making it both exciting and challenging to keep pace with. This survey offers a clear and structured guide to the VLA landscape. We design it to follow the natural learning path of a researcher: we start with the basic Modules of any VLA model, trace the history through key Milestones, and then dive deep into the core Challenges that define recent research frontier. Our main contribution is a detailed breakdown of the five biggest challenges in: (1) Representation, (2) Execution, (3) Generalization, (4) Safety, and (5) Dataset and Evaluation. This structure mirrors the developmental roadmap of a generalist agent: establishing the fundamental perception-action loop, scaling capabilities across diverse embodiments and environments, and finally ensuring trustworthy deployment-all supported by the essential data infrastructure. For each of them, we review existing approaches and highlight future opportunities. We position this paper as both a foundational guide for newcomers and a strategic roadmap for experienced researchers, with the dual aim of accelerating learning and inspiring new ideas in embodied intelligence. A live version of this survey, with continuous updates, is maintained on our \href{https://suyuz1.github.io/Survery/}{project page}.
comment: project page: https://suyuz1.github.io/Survery/
Long-Horizon Visual Imitation Learning via Plan and Code Reflection
Learning from long-horizon demonstrations with complex action sequences presents significant challenges for visual imitation learning, particularly in understanding temporal relationships of actions and spatial relationships between objects. In this paper, we propose a new agent framework that incorporates two dedicated reflection modules to enhance both plan and code generation. The plan generation module produces an initial action sequence, which is then verified by the plan reflection module to ensure temporal coherence and spatial alignment with the demonstration video. The code generation module translates the plan into executable code, while the code reflection module verifies and refines the generated code to ensure correctness and consistency with the generated plan. These two reflection modules jointly enable the agent to detect and correct errors in both the plan generation and code generation, improving performance in tasks with intricate temporal and spatial dependencies. To support systematic evaluation, we introduce LongVILBench, a benchmark comprising 300 human demonstrations with action sequences of up to 18 steps. LongVILBench emphasizes temporal and spatial complexity across multiple task types. Experimental results demonstrate that existing methods perform poorly on this benchmark, whereas our new framework establishes a strong baseline for long-horizon visual imitation learning.
comment: 9 pages, 4 figures
Self-localization on a 3D map by fusing global and local features from a monocular camera
Self-localization on a 3D map by using an inexpensive monocular camera is required to realize autonomous driving. Self-localization based on a camera often uses a convolutional neural network (CNN) that can extract local features that are calculated by nearby pixels. However, when dynamic obstacles, such as people, are present, CNN does not work well. This study proposes a new method combining CNN with Vision Transformer, which excels at extracting global features that show the relationship of patches on whole image. Experimental results showed that, compared to the state-of-the-art method (SOTA), the accuracy improvement rate in a CG dataset with dynamic obstacles is 1.5 times higher than that without dynamic obstacles. Moreover, the self-localization error of our method is 20.1% smaller than that of SOTA on public datasets. Additionally, our robot using our method can localize itself with 7.51cm error on average, which is more accurate than SOTA.
"I am here for you": How relational conversational AI appeals to adolescents, especially those who are socially and emotionally vulnerable
General-purpose conversational AI chatbots and AI companions increasingly provide young adolescents with emotionally supportive conversations, raising questions about how conversational style shapes anthropomorphism and emotional reliance. In a preregistered online experiment with 284 adolescent-parent dyads, youth aged 11-15 and their parents read two matched transcripts in which a chatbot responded to an everyday social problem using either a relational style (first-person, affiliative, commitment language) or a transparent style (explicit nonhumanness, informational tone). Adolescents more often preferred the relational than the transparent style, whereas parents were more likely to prefer transparent style than adolescents. Adolescents rated the relational chatbot as more human-like, likable, trustworthy and emotionally close, while perceiving both styles as similarly helpful. Adolescents who preferred relational style had lower family and peer relationship quality and higher stress and anxiety than those preferring transparent style or both chatbots. These findings identify conversational style as a key design lever for youth AI safety, showing that relational framing heightens anthropomorphism, trust and emotional closeness and can be especially appealing to socially and emotionally vulnerable adolescents, who may be at increased risk for emotional reliance on conversational AI.
comment: I updated a title
Imperative Learning: A Self-supervised Neuro-Symbolic Learning Framework for Robot Autonomy
Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, labeling data for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neuro-symbolic (NeSy) computational framework, imperative learning (IL), for robot autonomy, leveraging the generalization abilities of symbolic reasoning. The framework of IL consists of three primary components: a neural module, a reasoning engine, and a memory system. We formulate IL as a special bilevel optimization (BLO), which enables reciprocal learning over the three modules. This overcomes the label-intensive obstacles associated with data-driven approaches and takes advantage of symbolic reasoning concerning logical reasoning, physical principles, geometric analysis, etc. We discuss several optimization techniques for IL and verify their effectiveness in five distinct robot autonomy tasks including path planning, rule induction, optimal control, visual odometry, and multi-robot routing. Through various experiments, we show that IL can significantly enhance robot autonomy capabilities and we anticipate that it will catalyze further research across diverse domains.
Provable Ordering and Continuity in Vision-Language Pretraining for Generalizable Embodied Agents NeurIPS 2025
Pre-training vision-language representations on human action videos has emerged as a promising approach to reduce reliance on large-scale expert demonstrations for training embodied agents. However, prior methods often employ time contrastive learning based on goal-reaching heuristics, progressively aligning language instructions from the initial to the final frame. This overemphasis on future frames can result in erroneous vision-language associations, as actions may terminate early or include irrelevant moments in the end. To address this issue, we propose Action Temporal Coherence Learning (AcTOL) to learn ordered and continuous vision-language representations without rigid goal-based constraint. AcTOL treats a video as a continuous trajectory where it (1) contrasts semantic differences between frames to reflect their natural ordering, and (2) imposes a local Brownian bridge constraint to ensure smooth transitions across intermediate frames. Extensive imitation learning experiments on both simulated and real robots show that the pretrained features significantly enhance downstream manipulation tasks with high robustness to different linguistic styles of instructions, offering a viable pathway toward generalized embodied agents.
comment: NeurIPS 2025 Poster
Transformer Driven Visual Servoing and Dual Arm Impedance Control for Fabric Texture Matching
In this paper, we propose a method to align and place a fabric piece on top of another using a dual-arm manipulator and a grayscale camera, so that their surface textures are accurately matched. We propose a novel control scheme that combines Transformer-driven visual servoing with dualarm impedance control. This approach enables the system to simultaneously control the pose of the fabric piece and place it onto the underlying one while applying tension to keep the fabric piece flat. Our transformer-based network incorporates pretrained backbones and a newly introduced Difference Extraction Attention Module (DEAM), which significantly enhances pose difference prediction accuracy. Trained entirely on synthetic images generated using rendering software, the network enables zero-shot deployment in real-world scenarios without requiring prior training on specific fabric textures. Real-world experiments demonstrate that the proposed system accurately aligns fabric pieces with different textures.
comment: 8 pages, 11 figures. Accepted to IEEE Robotics and Automation Letters (RA-L)
A Practical Guide for Incorporating Symmetry in Diffusion Policy NeurIPS 2025
Recently, equivariant neural networks for policy learning have shown promising improvements in sample efficiency and generalization, however, their wide adoption faces substantial barriers due to implementation complexity. Equivariant architectures typically require specialized mathematical formulations and custom network design, posing significant challenges when integrating with modern policy frameworks like diffusion-based models. In this paper, we explore a number of straightforward and practical approaches to incorporate symmetry benefits into diffusion policies without the overhead of full equivariant designs. Specifically, we investigate (i) invariant representations via relative trajectory actions and eye-in-hand perception, (ii) integrating equivariant vision encoders, and (iii) symmetric feature extraction with pretrained encoders using Frame Averaging. We first prove that combining eye-in-hand perception with relative or delta action parameterization yields inherent SE(3)-invariance, thus improving policy generalization. We then perform a systematic experimental study on those design choices for integrating symmetry in diffusion policies, and conclude that an invariant representation with equivariant feature extraction significantly improves the policy performance. Our method achieves performance on par with or exceeding fully equivariant architectures while greatly simplifying implementation.
comment: NeurIPS 2025
Accelerating Hybrid Model Predictive Control using Warm-Started Generalized Benders Decomposition
Hybrid model predictive control with both continuous and discrete variables is widely applicable to robotic control tasks, especially those involving contacts with the environment. Due to combinatorial complexity, the solving speed of hybrid MPC can be insufficient for real-time applications. In this paper, we propose a hybrid MPC algorithm based on Generalized Benders Decomposition. The algorithm enumerates and stores cutting planes online inside a finite buffer and transfers them across MPC iterations to provide warm-starts for new problem instances, significantly enhancing solving speed. We theoretically analyze this warm-starting performance by modeling the deviation of mode sequences through temporal shifting and stretching, deriving bounds on the dual gap between transferred optimality cuts and the true optimal costs, and utilizing these bounds to quantify the level of suboptimality guaranteed in the first solve of the Benders Master Problem. Our algorithm is validated in simulation through controlling a cart-pole system with soft contact walls, a free-flying robot navigating around obstacles, and a humanoid robot standing on one leg while pushing against walls with its hands for balance. For our benchmark problems, the algorithm enumerates cuts on the order of only tens to hundreds while reaching speeds 2-3 times faster than the off-the-shelf solver Gurobi, oftentimes exceeding 1000 Hz. The code is available at https://github.com/XuanLin/Benders-MPC.
comment: Significantly revised to emphasize theoretical bounds. The heuristic Master Problem algorithm and GCS tightening experiments are preserved in v1
Deadlock-Free Hybrid RL-MAPF Framework for Zero-Shot Multi-Robot Navigation
Multi-robot navigation in cluttered environments presents fundamental challenges in balancing reactive collision avoidance with long-range goal achievement. When navigating through narrow passages or confined spaces, deadlocks frequently emerge that prevent agents from reaching their destinations, particularly when Reinforcement Learning (RL) control policies encounter novel configurations out of learning distribution. Existing RL-based approaches suffer from limited generalization capability in unseen environments. We propose a hybrid framework that seamlessly integrates RL-based reactive navigation with on-demand Multi-Agent Path Finding (MAPF) to explicitly resolve topological deadlocks. Our approach integrates a safety layer that monitors agent progress to detect deadlocks and, when detected, triggers a coordination controller for affected agents. The framework constructs globally feasible trajectories via MAPF and regulates waypoint progression to reduce inter-agent conflicts during navigation. Extensive evaluation on dense multi-agent benchmarks shows that our method boosts task completion from marginal to near-universal success, markedly reducing deadlocks and collisions. When integrated with hierarchical task planning, it enables coordinated navigation for heterogeneous robots, demonstrating that coupling reactive RL navigation with selective MAPF intervention yields a robust, zero-shot performance.
comment: 18 pages (including appendix), 3 figures. Project page (videos, animations, additional resources): https://wanghaoyi518.github.io/rl-mapf-project-page/
Symphony: A Heuristic Normalized Calibrated Advantage Actor and Critic Algorithm in application for Humanoid Robots
In our work we not explicitly hint that it is a misconception to think that humans learn fast. Learning process takes time. Babies start learning to move in the restricted liquid area called placenta. Children often are limited by underdeveloped body. Even adults are not allowed to participate in complex competitions right away. However, with robots, when learning from scratch, we often don't have the privilege of waiting for dozen millions of steps. "Swaddling" regularization is responsible for restraining an agent in rapid but unstable development penalizing action strength in a specific way not affecting actions directly. The Symphony, Transitional-policy Deterministic Actor and Critic algorithm, is a concise combination of different ideas for possibility of training humanoid robots from scratch with Sample Efficiency, Sample Proximity and Safety of Actions in mind. It is no secret that continuous increase in Gaussian noise without appropriate smoothing is harmful for motors and gearboxes. Compared to Stochastic algorithms, we set a limited parametric noise and promote a reduced strength of actions, safely increasing entropy, since the actions are kind of immersed in weaker noise. When actions require more extreme values, actions rise above the weak noise. Training becomes empirically much safer for both the environment around and the robot's mechanisms. We use Fading Replay Buffer: using a fixed formula containing the hyperbolic tangent, we adjust the batch sampling probability: the memory contains a recent memory and a long-term memory trail. Fading Replay Buffer allows us to use Temporal Advantage when we improve the current Critic Network prediction compared to the exponential moving average. Temporal Advantage allows us to update Actor and Critic in one pass, as well as combine Actor and Critic in one Object and implement their Losses in one line.
comment: https://github.com/SuspensionRailway/symphony
Multiagent Systems
Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game
We introduce Stackelberg Learning from Human Feedback (SLHF), a new framework for preference optimization. SLHF frames the alignment problem as a sequential-move game between two policies: a Leader, which commits to an action, and a Follower, which responds conditionally on the Leader's action. This approach decomposes preference optimization into a refinement problem for the Follower and an optimization problem against an adversary for the Leader. Unlike Reinforcement Learning from Human Feedback (RLHF), which assigns scalar rewards to actions, or Nash Learning from Human Feedback (NLHF), which seeks a simultaneous-move equilibrium, SLHF leverages the asymmetry of sequential play to capture richer preference structures. The sequential design of SLHF naturally enables inference-time refinement, as the Follower learns to improve the Leader's actions, and these refinements can be leveraged through iterative sampling. We compare the solution concepts of SLHF, RLHF, and NLHF, and lay out key advantages in consistency, data sensitivity, and robustness to intransitive preferences. Experiments on large language models demonstrate that SLHF achieves strong alignment across diverse preference datasets, scales from 0.5B to 8B parameters, and yields inference-time refinements that transfer across model families without further fine-tuning.
comment: 10 pages, 5 tables, 1 figures
Don't Guess, Escalate: Towards Explainable Uncertainty-Calibrated AI Forensic Agents
AI is reshaping the landscape of multimedia forensics. We propose AI forensic agents: reliable orchestrators that select and combine forensic detectors, identify provenance and context, and provide uncertainty-aware assessments. We highlight pitfalls in current solutions and introduce a unified framework to improve the authenticity verification process.
A Formal Modular Synthesis Approach for the Coordination of 3-D Robotic Construction with Multi-robots
In this paper, we deal with the problem of coordinating multiple robots to build 3-D structures. This problem consists of a set of mobile robots that interact with each other in order to autonomously build a predefined 3-D structure. Our approach is based on Supervisory Control Theory, and it allows us to synthesize from models that represent a single robot and the target structure a correct-by-construction reactive controller, called supervisor. When this supervisor is replicated for the other robots, then the target structure can be completed by all robots
NDRL: Cotton Irrigation and Nitrogen Application with Nested Dual-Agent Reinforcement Learning ICONIP 2025
Effective irrigation and nitrogen fertilization have a significant impact on crop yield. However, existing research faces two limitations: (1) the high complexity of optimizing water-nitrogen combinations during crop growth and poor yield optimization results; and (2) the difficulty in quantifying mild stress signals and the delayed feedback, which results in less precise dynamic regulation of water and nitrogen and lower resource utilization efficiency. To address these issues, we propose a Nested Dual-Agent Reinforcement Learning (NDRL) method. The parent agent in NDRL identifies promising macroscopic irrigation and fertilization actions based on projected cumulative yield benefits, reducing ineffective explorationwhile maintaining alignment between objectives and yield. The child agent's reward function incorporates quantified Water Stress Factor (WSF) and Nitrogen Stress Factor (NSF), and uses a mixed probability distribution to dynamically optimize daily strategies, thereby enhancing both yield and resource efficiency. We used field experiment data from 2023 and 2024 to calibrate and validate the Decision Support System for Agrotechnology Transfer (DSSAT) to simulate real-world conditions and interact with NDRL. Experimental results demonstrate that, compared to the best baseline, the simulated yield increased by 4.7% in both 2023 and 2024, the irrigation water productivity increased by 5.6% and 5.1% respectively, and the nitrogen partial factor productivity increased by 6.3% and 1.0% respectively. Our method advances the development of cotton irrigation and nitrogen fertilization, providing new ideas for addressing the complexity and precision issues in agricultural resource management and for sustainable agricultural development.
comment: Accepted by ICONIP 2025
AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding
Recent multimodal large language models (MLLMs) such as GPT-4o and Qwen3-Omni show strong perception but struggle in multi-speaker, dialogue-centric settings that demand agentic reasoning tracking who speaks, maintaining roles, and grounding events across time. These scenarios are central to multimodal audio-video understanding, where models must jointly reason over audio and visual streams in applications such as conversational video assistants and meeting analytics. We introduce AMUSE, a benchmark designed around tasks that are inherently agentic, requiring models to decompose complex audio-visual interactions into planning, grounding, and reflection steps. It evaluates MLLMs across three modes zero-shot, guided, and agentic and six task families, including spatio-temporal speaker grounding and multimodal dialogue summarization. Across all modes, current models exhibit weak multi-speaker reasoning and inconsistent behavior under both non-agentic and agentic evaluation. Motivated by the inherently agentic nature of these tasks and recent advances in LLM agents, we propose RAFT, a data-efficient agentic alignment framework that integrates reward optimization with intrinsic multimodal self-evaluation as reward and selective parameter adaptation for data and parameter efficient updates. Using RAFT, we achieve up to 39.52\% relative improvement in accuracy on our benchmark. Together, AMUSE and RAFT provide a practical platform for examining agentic reasoning in multimodal models and improving their capabilities.
Ev-Trust: A Strategy Equilibrium Trust Mechanism for Evolutionary Games in LLM-Based Multi-Agent Services
The rapid evolution of the Web toward an agent-centric paradigm, driven by large language models (LLMs), has enabled autonomous agents to reason, plan, and interact in complex decentralized environments. However, the openness and heterogeneity of LLM-based multi-agent systems also amplify the risks of deception, fraud, and misinformation, posing severe challenges to trust establishment and system robustness. To address this issue, we propose Ev-Trust, a strategy-equilibrium trust mechanism grounded in evolutionary game theory. This mechanism integrates direct trust, indirect trust, and expected revenue into a dynamic feedback structure that guides agents' behavioral evolution toward equilibria. Within a decentralized "Request-Response-Payment-Evaluation" service framework, Ev-Trust enables agents to adaptively adjust strategies, naturally excluding malicious participants while reinforcing high-quality collaboration. Furthermore, our theoretical derivation based on replicator dynamics equations proves the existence and stability of local evolutionary equilibria. Experimental results indicate that our approach effectively reflects agent trustworthiness in LLM-driven open service interaction scenarios, reduces malicious strategies, and increases collective revenue. We hope Ev-Trust can provide a new perspective on trust modeling for the agentic service web in group evolutionary game scenarios.
comment: 12 pages, 11 figures
Evaluation of Generative Models for Emotional 3D Animation Generation in VR
Social interactions incorporate nonverbal signals to convey emotions alongside speech, including facial expressions and body gestures. Generative models have demonstrated promising results in creating full-body nonverbal animations synchronized with speech; however, evaluations using statistical metrics in 2D settings fail to fully capture user-perceived emotions, limiting our understanding of model effectiveness. To address this, we evaluate emotional 3D animation generative models within a Virtual Reality (VR) environment, emphasizing user-centric metrics emotional arousal realism, naturalness, enjoyment, diversity, and interaction quality in a real-time human-agent interaction scenario. Through a user study (N=48), we examine perceived emotional quality for three state of the art speech-driven 3D animation methods across two emotions happiness (high arousal) and neutral (mid arousal). Additionally, we compare these generative models against real human expressions obtained via a reconstruction-based method to assess both their strengths and limitations and how closely they replicate real human facial and body expressions. Our results demonstrate that methods explicitly modeling emotions lead to higher recognition accuracy compared to those focusing solely on speech-driven synchrony. Users rated the realism and naturalness of happy animations significantly higher than those of neutral animations, highlighting the limitations of current generative models in handling subtle emotional states. Generative models underperformed compared to reconstruction-based methods in facial expression quality, and all methods received relatively low ratings for animation enjoyment and interaction quality, emphasizing the importance of incorporating user-centric evaluations into generative model development. Finally, participants positively recognized animation diversity across all generative models.
comment: 20 pages, 5 figures. Webpage: https://emotional3dhumans.github.io/
DiffeoMorph: Learning to Morph 3D Shapes Using Differentiable Agent-Based Simulations
Biological systems can form complex three-dimensional structures through the collective behavior of identical agents -- cells that follow the same internal rules and communicate without central control. How such distributed control gives rise to precise global patterns remains a central question not only in developmental biology but also in distributed robotics, programmable matter, and multi-agent learning. Here, we introduce DiffeoMorph, an end-to-end differentiable framework for learning a morphogenesis protocol that guides a population of agents to morph into a target 3D shape. Each agent updates its position and internal state using an attention-based SE(3)-equivariant graph neural network, based on its own internal state and signals received from other agents. To train this system, we introduce a new shape-matching loss based on the 3D Zernike polynomials, which compares the predicted and target shapes as continuous spatial distributions, not as discrete point clouds, and is invariant to agent ordering, number of agents, and rigid-body transformations. To enforce full SO(3) invariance -- invariant to rotations yet sensitive to reflections, we include an alignment step that optimally rotates the predicted Zernike spectrum to match the target before computing the loss. This results in a bilevel problem, with the inner loop optimizing a unit quaternion for the best alignment and the outer loop updating the agent model. We compute gradients through the alignment step using implicit differentiation. We perform systematic benchmarking to establish the advantages of our shape-matching loss over other standard distance metrics for shape comparison tasks. We then demonstrate that DiffeoMorph can form a range of shapes -- from simple ellipsoids to complex morphologies -- using only minimal spatial cues.
On the Role of Contextual Information and Ego States in LLM Agent Behavior for Transactional Analysis Dialogues ACL
LLM-powered agents are now used in many areas, from customer support to education, and there is increasing interest in their ability to act more like humans. This includes fields such as social, political, and psychological research, where the goal is to model group dynamics and social behavior. However, current LLM agents often lack the psychological depth and consistency needed to capture the real patterns of human thinking. They usually provide direct or statistically likely answers, but they miss the deeper goals, emotional conflicts, and motivations that drive real human interactions. This paper proposes a Multi-Agent System (MAS) inspired by Transactional Analysis (TA) theory. In the proposed system, each agent is divided into three ego states - Parent, Adult, and Child. The ego states are treated as separate knowledge structures with their own perspectives and reasoning styles. To enrich their response process, they have access to an information retrieval mechanism that allows them to retrieve relevant contextual information from their vector stores. This architecture is evaluated through ablation tests in a simulated dialogue scenario, comparing agents with and without information retrieval. The results are promising and open up new directions for exploring how psychologically grounded structures can enrich agent behavior. The contribution is an agent architecture that integrates Transactional Analysis theory with contextual information retrieval to enhance the realism of LLM-based multi-agent simulations.
comment: Presented at the 39th Pacific Asia Conference on Language, Information and Computation (PACLIC 39)
PAACE: A Plan-Aware Automated Agent Context Engineering Framework
Large Language Model (LLM) agents are increasingly deployed in complex, multi-step workflows involving planning, tool use, reflection, and interaction with external knowledge systems. These workflows generate rapidly expanding contexts that must be curated, transformed, and compressed to maintain fidelity, avoid attention dilution, and reduce inference cost. Prior work on summarization and query-aware compression largely ignores the multi-step, plan-aware nature of agentic reasoning. In this work, we introduce PAACE (Plan-Aware Automated Context Engineering), a unified framework for optimizing the evolving state of LLM agents through next-k-task relevance modeling, plan-structure analysis, instruction co-refinement, and function-preserving compression. PAACE comprises (1) PAACE-Syn, a large-scale generator of synthetic agent workflows annotated with stepwise compression supervision, and (2) PAACE-FT, a family of distilled, plan-aware compressors trained from successful teacher demonstrations. Experiments on long-horizon benchmarks (AppWorld, OfficeBench, and 8-Objective QA) demonstrate that PAACE consistently improves agent correctness while substantially reducing context load. On AppWorld, PAACE achieves higher accuracy than all baselines while lowering peak context and cumulative dependency. On OfficeBench and multi-hop QA, PAACE improves both accuracy and F1, achieving fewer steps, lower peak tokens, and reduced attention dependency. Distilled PAACE-FT retains 97 percent of the teacher's performance while reducing inference cost by over an order of magnitude, enabling practical deployment of plan-aware compression with compact models.
Computing ESS in Multiplayer Games
We present an algorithm for computing all evolutionarily stable strategies in nondegenerate normal-form games with three or more players.
comment: Full title is "Computing Evolutionarily Stable Strategies in Multiplayer Games." I am modifying it because Google scholar is merging it with a different paper. I hope to revert the title back in a few weeks
Towards Pervasive Distributed Agentic Generative AI -- A State of The Art
The rapid advancement of intelligent agents and Large Language Models (LLMs) is reshaping the pervasive computing field. Their ability to perceive, reason, and act through natural language understanding enables autonomous problem-solving in complex pervasive environments, including the management of heterogeneous sensors, devices, and data. This survey outlines the architectural components of LLM agents (profiling, memory, planning, and action) and examines their deployment and evaluation across various scenarios. Than it reviews computational and infrastructural advancements (cloud to edge) in pervasive computing and how AI is moving in this field. It highlights state-of-the-art agent deployment strategies and applications, including local and distributed execution on resource-constrained devices. This survey identifies key challenges of these agents in pervasive computing such as architectural, energetic and privacy limitations. It finally proposes what we called "Agent as a Tool", a conceptual framework for pervasive agentic AI, emphasizing context awareness, modularity, security, efficiency and effectiveness.
Breaking the Performance Ceiling in Reinforcement Learning requires Inference Strategies
Reinforcement learning (RL) systems have countless applications, from energy-grid management to protein design. However, such real-world scenarios are often extremely difficult, combinatorial in nature, and require complex coordination between multiple agents. This level of complexity can cause even state-of-the-art RL systems, trained until convergence, to hit a performance ceiling which they are unable to break out of with zero-shot inference. Meanwhile, many digital or simulation-based applications allow for an inference phase that utilises a specific time and compute budget to explore multiple attempts before outputting a final solution. In this work, we show that such an inference phase employed at execution time, and the choice of a corresponding inference strategy, are key to breaking the performance ceiling observed in complex multi-agent RL problems. Our main result is striking: we can obtain up to a 126% and, on average, a 45% improvement over the previous state-of-the-art across 17 tasks, using only a couple seconds of extra wall-clock time during execution. We also demonstrate promising compute scaling properties, supported by over 60k experiments, making it the largest study on inference strategies for complex RL to date. Our experimental data and code are available at https://sites.google.com/view/inference-strategies-rl.
comment: Neurips '25 version (post conference)
TACOS: Task Agnostic COordinator of a multi-drone System
When a single pilot is responsible for managing a multi-drone system, the task may demand varying levels of autonomy, from direct control of individual UAVs, to group-level coordination, to fully autonomous swarm behaviors for accomplishing high-level tasks. Enabling such flexible interaction requires a framework that supports multiple modes of shared autonomy. As language models continue to improve in reasoning and planning, they provide a natural foundation for such systems, reducing pilot workload by enabling high-level task delegation through intuitive, language-based interfaces. In this paper we present TACOS (Task-Agnostic COordinator of a multi-drone System), a unified framework that enables high-level natural language control of multi-UAV systems through Large Language Models (LLMs). TACOS integrates three key capabilities into a single architecture: a one-to-many natural language interface for intuitive user interaction, an intelligent coordinator for translating user intent into structured task plans, and an autonomous agent that executes plans interacting with the real world. TACOS allows a LLM to interact with a library of executable APIs, bridging semantic reasoning with real-time multi-robot coordination. We demonstrate the system on a real-world multi-drone system, and conduct an ablation study to assess the contribution of each module.
comment: 8 pages, 9 figures, submitted to the IEEE for possible publication
The Illusion of Rationality: Tacit Bias and Strategic Dominance in Frontier LLM Negotiation Games
Large language models (LLMs) are increasingly being deployed as autonomous agents on behalf of institutions and individuals in economic, political, and social settings that involve negotiation. Yet this trend carries significant risks if their strategic behavior is not well understood. In this work, we revisit the NegotiationArena framework and run controlled simulation experiments on a diverse set of frontier LLMs across three multi turn bargaining games: Buyer Seller, Multi turn Ultimatum, and Resource Exchange. We ask whether improved general reasoning capabilities lead to rational, unbiased, and convergent negotiation strategies. Our results challenge this assumption. We find that models diverge into distinct, model specific strategic equilibria rather than converging to a unified optimal behavior. Moreover, strong numerical and semantic anchoring effects persist: initial offers are highly predictive of final agreements, and models consistently generate biased proposals by collapsing diverse internal valuations into rigid, generic price points. More concerningly, we observe dominance patterns in which some models systematically achieve higher payoffs than their counterparts. These findings underscore an urgent need to develop mechanisms to mitigate these issues before deploying such systems in real-world scenarios.
comment: 9 pages, 5 figures
Upgrading Democracies with Fairer Voting Methods
Voting methods are instrumental design elements of democracies. Citizens use them to express and aggregate their preferences to reach a collective decision. However, voting outcomes can be as sensitive to voting rules as they are to people's voting choices. Despite significance and interdisciplinary scientific progress, several democracies keep relying on outdated voting methods that do not fit modern, pluralistic societies well, while lacking social innovation. Here, we demonstrate how one can upgrade real-world democracies, namely by using alternative preferential voting methods such as cumulative voting and the method of equal shares designed for a proportional representation of voters' preferences. We rigorously evaluate the striking voting outcomes of these fair voting methods in a new participatory budgeting approach applied in the city of Aarau, Switzerland, including past and follow-up evidence. Results show more winning projects with the same budget. They also show broader geographic and preference representation of citizens by the elected projects, in particular for voters who used to be under-represented. We provide causal evidence showing that citizens prefer proportional voting methods, which possess strong legitimacy without the need of very specialized technical explanations. We also reveal strong underlying democratic values exhibited by citizens who support fair voting methods such as altruism and compromise. These findings come with the momentum to unleash a new and long-awaited participation blueprint of how to upgrade democracies globally.
comment: Includes Supplementary Information
Systems and Control (EESS)
Observer-based Differentially Private Consensus for Linear Multi-agent Systems
This paper investigates the differentially private consensus problem for general linear multi-agent systems (MASs) based on output feedback protocols. To protect the output information, which is considered private data and may be at high risk of exposure, Laplace noise is added to the information exchange. The conditions for achieving mean square and almost sure consensus in observer-based MASs are established using the backstepping method and the convergence theory for nonnegative almost supermartingales. It is shown that the separation principle remains valid for the consensus problem of linear MASs with decaying Laplace noise. Furthermore, the convergence rate is provided. Then, a joint design framework is developed for state estimation gain, feedback control gain, and noise to ensure the preservation of ε-differential privacy. The output information of each agent is shown to be protected at every time step. Finally, sufficient conditions are established for simultaneously achieving consensus and preserving differential privacy for linear MASs utilizing both full-order and reduced-order observers. Meanwhile, an ε*-differentially private consensus is achieved to meet the desired privacy level. Two simulation examples are provided to validate the theoretical results.
Learning-based Approximate Model Predictive Control for an Impact Wrench Tool
Learning-based model predictive control has emerged as a powerful approach for handling complex dynamics in mechatronic systems, enabling data-driven performance improvements while respecting safety constraints. However, when computational resources are severely limited, as in battery-powered tools with embedded processors, existing approaches struggle to meet real-time requirements. In this paper, we address the problem of real-time torque control for impact wrenches, where high-frequency control updates are necessary to accurately track the fast transients occurring during periodic impact events, while maintaining high-performance safety-critical control that mitigates harmful vibrations and component wear. The key novelty of the approach is that we combine data-driven model augmentation through Gaussian process regression with neural network approximation of the resulting control policy. This insight allows us to deploy predictive control on resource-constrained embedded platforms while maintaining both constraint satisfaction and microsecond-level inference times. The proposed framework is evaluated through numerical simulations and hardware experiments on a custom impact wrench testbed. The results show that our approach successfully achieves real-time control suitable for high-frequency operation while maintaining constraint satisfaction and improving tracking accuracy compared to baseline PID control.
Resilience of coupled systems under deep uncertainty and dynamic complexity: An integrative literature review
Resilience in coupled systems is increasingly critical in addressing global challenges such as climate change and pandemics. These systems show unpredictable behaviour due to dynamic complexity and deep uncertainty across spatiotemporal scales. Despite growing interest, few studies systematically integrate both concepts when assessing resilience. This paper conducts an integrative review of 102 English-language publications to identify gaps in current approaches. Findings reveal that most papers address lower levels of uncertainty and rarely consider dynamic complexity and deep uncertainty simultaneously, which limits the effectiveness of resilience strategies. To advance systems research, we propose a conceptual framework and practical tools to support researchers and decision-makers in evaluating and improving resilience. The paper also outlines future research directions for more robust, adaptive, and integrative resilience assessments.
A Formal Modular Synthesis Approach for the Coordination of 3-D Robotic Construction with Multi-robots
In this paper, we deal with the problem of coordinating multiple robots to build 3-D structures. This problem consists of a set of mobile robots that interact with each other in order to autonomously build a predefined 3-D structure. Our approach is based on Supervisory Control Theory, and it allows us to synthesize from models that represent a single robot and the target structure a correct-by-construction reactive controller, called supervisor. When this supervisor is replicated for the other robots, then the target structure can be completed by all robots
From Liability to Asset: A Three-Mode Grid-Forming Control Framework for Centralized Data Center UPS Systems
AI workloads are turning large data centers into highly dynamic power-electronic loads; fault-time behavior and workload pulsing can stress weak-grid points of interconnection. This paper proposes a centralized medium-voltage (MV) uninterruptible power supply (UPS) control architecture implemented as three operating modes: Mode 1 regulates a DC stiff bus and shapes normal-operation grid draw, Mode 2 enforces current-limited fault-mode P--Q priority with UPS battery energy storage system (UPS-BESS) buffering and a rate-limited post-fault "soft return," and Mode 3 optionally provides droop-based fast frequency response via grid-draw modulation. Fundamental-frequency averaged dq simulations (50 MW block, short-circuit ratio (SCR) = 1.5, 0.5 p.u. three-phase dip for 150~ms) show zero unserved information-technology (IT) energy (0.00000 MWh vs.0.00208 MWh for a momentary-cessation benchmark), a 0.57 p.u. peak inverter current (vs. 1.02 p.u. for a synchronous-reference-frame phase-locked loop (SRF-PLL) low-voltage ride-through (LVRT) baseline), a nonzero mean fault-window grid draw of 0.20~p.u. (vs.approx 0 for momentary cessation), and an improved settled point-of-common-coupling (PCC) voltage minimum of 0.79 p.u. after one cycle (vs. 0.66 p.u.). A forced-oscillation case study applies a 1 Hz pulsed load (+/- 0.25 p.u.) and shows that the normal-operation shaping filters the oscillation seen by the grid while the UPS-BESS buffers the pulsing component.
Machine Learning-based Optimal Control for Colloidal Self-Assembly
Achieving precise control of colloidal self-assembly into specific patterns remains a longstanding challenge due to the complex process dynamics. Recently, machine learning-based state representation and reinforcement learning-based control strategies have started to accumulate popularity in the field, showing great potential in achieving an automatable and generalizable approach to producing patterned colloidal assembly. In this work, we adopted a machine learning-based optimal control framework, combining unsupervised learning and graph convolutional neural work for state observation with deep reinforcement learning-based optimal control policy calculation, to provide a data-driven control approach that can potentially be generalized to other many-body self-assembly systems. With Brownian dynamics simulations, we demonstrated its superior performance as compared to traditional order parameter-based state description, and its efficacy in obtaining ordered 2-dimensional spherical colloidal self-assembly in an electric field-mediated system with an actual success rate of 97%.
comment: 19 pages, 5 figures, 1 table
Economic versus energetic model predictive control of a cold production plant with thermal energy storage
Economic model predictive control has been proposed as a means for solving the unit loading and unit allocation problem in multi-chiller cooling plants. The adjective economic stems from the use of financial cost due to electricity consumption in a time horizon, such is the loss function minimized at each sampling period. The energetic approach is rarely encountered. This article presents for the first time a comparison between the energetic optimization objective and the economic one. The comparison is made on a cooling plant using air-cooled water chillers and a cold storage system. Models developed have been integrated into Simscape, and non-convex mixed optimization methods used to achieve optimal control trajectories for both energetic and economic goals considered separately. The results over several scenarios, and in different seasons, support the consideration of the energetic approach despite the current prevalence of the economic one. The results are dependent on the electric season and the available tariffs. In particular, for the high electric season and considering a representative tariff, the results show that an increment of about 2.15% in energy consumption takes place when using the economic approach instead of the energetic one. On the other hand, a reduction in cost of 2.94% is achieved.
comment: 14 pages
Optimal chiller loading including transients
Scheduling and loading of chillers in a multi-chiller plant is considered. A new framework is introduced considering an extended set of independent variables for the optimization problem of energy consumption. In this way the number of decision variables is increased, providing extra degrees of freedom to optimize cooling plant operation. The dynamic effects due to transients arising from switching on and off of units are usually not considered in the literature dealing with Optimal Chiller Loading/Sequencing which is restricted to the static case. In this paper, these effects are treated in a way that results in a manageable optimization problem. A Simultaneous Perturbation Stochastic Approximation solution is deployed for the problem and the proposed method is compared with a similar but static approach showing the benefits in terms of reduced energy consumption.
Contraction Analysis of Filippov Solutions in Multi-Modal Piecewise Smooth Systems
This paper provides conditions to ensure contractive behavior of Filippov solutions generated by multi-modal piecewise smooth (PWS) systems. These conditions are instrumental in analyzing the asymptotic behavior of PWS systems, such as convergence towards an equilibrium point or a limit cycle. The work is motivated by a known principle for contraction analysis of bimodal PWS systems which ensures that the flow dynamics of each mode and the sliding dynamics on the switching manifold are contracting. This approach is extended first to PWS systems with multiple non-intersecting switching manifolds in Rn, and then to two intersecting switching manifolds in R2. Numerical examples are provided to validate the theoretical findings, along with a discussion on extensions to more general intersecting switching manifolds in Rn.
Using Seminorms To Analyze Contraction of Switched Systems With Only Non-Contracting Modes
This paper investigates contraction properties of switched dynamical systems for the case that all modes are non-contracting, thereby extending existing results that require at least one mode to be contracting. Leveraging the property that unstable systems may still exhibit stable behavior within certain subspaces, conditions are provided which ensure contracting evolution within a given subspace of the state space of the switched system. These conditions are derived using the concepts of seminorms and semi-contracting systems. Then, by selecting a set of subspaces whose corresponding seminorms form a separating family of the state space, and by verifying whether a given mode is contracting in each subspace, conditions on the activation time of each mode are provided by which contraction on the complete state space is guaranteed. Numerical examples are presented for illustration.
Closed Loop Reference Optimization for Extrusion Additive Manufacturing
Various defects occur during material extrusion additive manufacturing processes that degrade the quality of the 3D printed parts and lead to significant material waste. This motivates feedback control of the extrusion process to mitigate defects and prevent print failure. We propose a linear quadratic regulator (LQR) for closed-loop control with force feedback to provide accurate width tracking of the extruded filament. Furthermore, we propose preemptive optimization of the reference force given to the LQR that accounts for the performance of the LQR and generates the optimal reference for the closed loop extrusion dynamics and machine constraints. Simulation results demonstrate the improved tracking performance and response time. Experiments on a Fused Filament Fabrication 3D printer showcase a root mean square error improvement of 39.57% compared to tracking the unmodified reference as well as an 83.7% shorter settling time.
Black-Start Power Capacity Sizing and Control Strategy for an Islanded DFIG Wind-to-Hydrogen System
This paper proposes a black-start method for an off-grid wind-to-hydrogen (W2H) system comprising a wind farm based on Doubly-Fed Induction Generators (DFIGs), proton exchange membrane fuel cells (PEMFCs) serving as the black-start power source, and a hydrogen production industry. The PEMFC is installed within the hydrogen industry to facilitate direct access to hydrogen fuel. Based on the microgrid topology and black-start scheme, this study innovatively sizes the rated capacity of the PEMFC through power flow analysis. The capacity must be sufficient to charge passive components such as transmission lines and transformers, provide rotor excitation, and supply wind turbine (WT) and electrolyzer (ELZ) auxiliaries during startup. The proposed system integrates wind-hydrogen coordinated control (WHCC) and hydrogen-storage coordinated control (HSCC). Under maximum power point tracking (MPPT) of the WTs, the ELZ follows power fluctuations to absorb wind output, ensuring stable voltage and frequency. Fixed-frequency control applied to either the DFIG or PEMFC converters enables DFIGs to retain conventional grid-following (GFL) operation, reducing converter development costs. For both control modes, this paper establishes the black-start sequence and formulates a comprehensive coordinated control strategy for the entire system. The entire control system is validated through simulations in MATLAB/Simulink. Results confirm that the calculated PEMFC capacity supports reliable black-start, while the black-start control strategy ensures smooth system self-startup. Furthermore, the coordinated control strategy maintains stable frequency and voltage under fluctuating wind power, demonstrating the practicality and robustness of the proposed approach.
Consensus tracking of perturbed open multi-agent systems with repelling antagonistic interactions
An open multi-agent system (OMAS) comprises migrating agents which produce a flexible network structure that is naturally switching and size-varying. Meanwhile, agent migrations also make an OMAS more prone to environmental adversities. In this work, we deal with the consensus tracking problem of OMASs suffering these migration-induced adversities, including non-vanishing perturbations in the agent dynamics/state and the repelling antagonistic interactions among agents, over an intermittently disconnected signed digraph. The OMAS is interpreted into a perturbed $M^3D$ system in which unstable subsystems are created when repelling interactions dominate the normal cooperative ones in the OMAS network regardless of its connectivity. To handle the destabilizing effects brought by the repelling interaction as well as the non-vanishing perturbations, we extend the stability theory for $M^3D$ systems and apply it to the OMAS to show that practical consensus tracking can be achieved if the migration-induced switching satisfies the piecewise average dwell time and activation time ratio constraints. Particularly, we indicate that for vanishing perturbations and repelling interactions, asymptotic tracking can be expected under weaker switching constraints.
Synchronization, Identification, and Signal Detection for Underwater Photon-Counting Communications With Input-Dependent Shot Noise
Photon counting (PhC) is an effective detection technology for underwater optical wireless communication (OWC) systems. The presence of signal-dependent Poisson shot noise and asynchronous multi-user interference (MUI) complicates the processing of received data signals, hindering the effective signal detection of PhC OWC systems. This paper proposes a novel iterative signal detection method in grant-free, multi-user, underwater PhC OWC systems with signal-dependent Poisson shot noise. We first introduce a new synchronization algorithm with a unique frame structure design.The algorithm performs active user identification and transmission delay estimation. Specifically, the estimation is performed first on a user group basis and then at the individual user level with reduced complexity and latency.We also develop a nonlinear iterative multi-user detection (MUD) algorithm that utilizes a detection window for each user to identify interfering symbols and estimate MUI on a slot-by-slot basis, followed by maximum \textit{a-posteriori} probability detection of user signals.Simulations demonstrate that our scheme achieves bit error rates comparable to scenarios with transmission delays known and signal detection perfectly synchronized.
Perception-Limited Smooth Safety Filtering
This paper develops a smooth safety-filtering framework for nonlinear control-affine systems under limited perception. Classical Control Barrier Function (CBF) filters assume global availability of the safety function - its value and gradient must be known everywhere - an assumption incompatible with sensing-limited settings, and the resulting filters often exhibit nonsmooth switching when constraints activate. We propose two complementary perception-aware safety filters applicable to general control-invariant safety sets. The first introduces a smooth perception gate that modulates barrier constraints based on sensing range, yielding a closed-form Lipschitz-safe controller with forward-invariance guarantees. The second replaces the hard CBF constraint with a differentiable penalty term, leading to a smooth unconstrained optimization-based safety filter consistent with CBF principles. For both designs, we establish existence, uniqueness, and forward invariance of the closed-loop trajectories. Numerical results demonstrate that the proposed smooth filters enable the synthesis of higher-order tracking controllers for systems such as drones and second-order ground robots, offering substantially smoother and more robust safety-critical behaviors than classical CBF-based filters.
comment: 8 pages
Security Risks of Agentic Vehicles: A Systematic Analysis of Cognitive and Cross-Layer Threats
Agentic AI is increasingly being explored and introduced in both manually driven and autonomous vehicles, leading to the notion of Agentic Vehicles (AgVs), with capabilities such as memory-based personalization, goal interpretation, strategic reasoning, and tool-mediated assistance. While frameworks such as the OWASP Agentic AI Security Risks highlight vulnerabilities in reasoning-driven AI systems, they are not designed for safety-critical cyber-physical platforms such as vehicles, nor do they account for interactions with other layers such as perception, communication, and control layers. This paper investigates security threats in AgVs, including OWASP-style risks and cyber-attacks from other layers affecting the agentic layer. By introducing a role-based architecture for agentic vehicles, consisting of a Personal Agent and a Driving Strategy Agent, we will investigate vulnerabilities in both agentic AI layer and cross-layer risks, including risks originating from upstream layers (e.g., perception layer, control layer, etc.). A severity matrix and attack-chain analysis illustrate how small distortions can escalate into misaligned or unsafe behavior in both human-driven and autonomous vehicles. The resulting framework provides the first structured foundation for analyzing security risks of agentic AI in both current and emerging vehicle platforms.
Verifiable Natural Language to Linear Temporal Logic Translation: A Benchmark Dataset and Evaluation Suite
Empirical evaluation of state-of-the-art natural-language (NL) to temporal-logic (TL) translation systems reveals near-perfect performance on existing benchmarks. However, current studies measure only the accuracy of the translation of NL logic into formal TL, ignoring a system's capacity to ground atomic propositions into new scenarios or environments. This is a critical feature, necessary for the verification of resulting formulas in a concrete state space. Consequently, most NL-to-TL translation frameworks propose their own bespoke dataset in which the correct grounding is known a-priori, inflating performance metrics and neglecting the need for extensible, domain-general systems. In this paper, we introduce the Verifiable Linear Temporal Logic Benchmark ( VLTL-Bench), a unifying benchmark that measures verification and verifiability of automated NL-to-LTL translation. The dataset consists of four unique state spaces and thousands of diverse natural language specifications and corresponding formal specifications in temporal logic. Moreover, the benchmark contains sample traces to validate the temporal logic expressions. While the benchmark directly supports end-to-end evaluation, we observe that many frameworks decompose the process into i) lifting, ii) grounding, iii) translation, and iv) verification. The benchmark provides ground truths after each of these steps to enable researches to improve and evaluate different substeps of the overall problem. To encourage methodologically sound advances in verifiable NL-to-LTL translation approaches, we release VLTL-Bench here: https://www.kaggle.com/datasets/dubascudes/vltl bench.
Improving Action Smoothness for a Cascaded Online Learning Flight Control System
This paper aims to improve the action smoothness of a cascaded online learning flight control system. Although the cascaded structure is widely used in flight control design, its stability can be compromised by oscillatory control actions, which poses challenges for practical engineering applications. To address this issue, we introduce an online temporal smoothness technique and a low-pass filter to reduce the amplitude and frequency of the control actions. Fast Fourier Transform (FFT) is used to analyze policy performance in the frequency domain. Simulation results demonstrate the improvements achieved by the two proposed techniques.
Hill-Type Stability Analysis of Periodic Solutions of Fractional-Order Differential Equations
This paper explores stability properties of periodic solutions of (nonlinear) fractional-order differential equations (FODEs). As classical Caputo-type FODEs do not admit exactly periodic solutions, we propose a framework of Liouville-Weyl-type FODEs, which do admit exactly periodic solutions and are an extension of Caputo-type FODEs. Local linearization around a periodic solution results in perturbation dynamics governed by a linear time-periodic differential equation. In the classical integer-order case, the perturbation dynamics is therefore described by Floquet theory, i.e. the exponential growth or decay of perturbations is expressed by Floquet exponents which can be assessed using the Hill matrix approach. For fractional-order systems, however, a rigorous Floquet theory is lacking. Here, we explore the limitations when trying to extend Floquet theory and the Hill matrix method to linear time-periodic fractional-order differential equations (LTP-FODEs) as local linearization of nonlinear fractional-order systems. A key result of the paper is that such an extended Floquet theory can only assess exponentially growing solutions of LTP-FODEs. Moreover, we provide an analysis of linear time-invariant fractional-order systems (LTI-FODEs) with algebraically decaying solutions and show that the inaccessibility of decaying solutions through Floquet theory is already present in the time-invariant case.
comment: 22 pages, 6 figures
Feedback stabilization of some fourth-order nonlinear parabolic equations with saturated controls
In this work, we analyze the internal and boundary stabilization of the Cahn-Hilliard and Kuramoto-Sivashinsky equations under saturated feedback control. We conduct our study through the spectral analysis of the associated linear operator. We identify a finite number of eigenvalues related to the unstable part of the system and then design a stabilization strategy based on modal decomposition, linear matrix inequalities (LMIs), and geometric conditions on the saturation function. Local exponential stabilization in $H^{2}$ is established.
Hull Clustering with Blended Representative Periods for Energy System Optimization Models
The growing integration of renewable energy sources into power systems requires planning models to account for not only demand variability but also fluctuations in renewable availability during operational periods. Capturing this temporal detail over long planning horizons can be computationally demanding or even intractable. A common approach to address this challenge is to approximate the problem using a reduced set of selected time periods, known as representative periods (RPs). However, using too few RPs can significantly degrade solution quality. In this paper, we propose the method of hull clustering with blended RPs to enhance traditional clustering-based RP approaches in two key ways. First, instead of selecting typical cluster centers (e.g., centroids or medoids) as RPs, our method is based on extreme points, which are more likely to be constraint-binding. Second, it represents base periods as weighted combinations of RPs (e.g., convex or conic blends), approximating the full time horizon more accurately and with fewer RPs. Through two case studies based on data from the European network operators, we demonstrate that hull clustering with blended RPs outperforms traditional RP techniques in both regret and computational efficiency.
Historical Information Accelerates Decentralized Optimization: A Proximal Bundle Method
Historical information, such as past function values or gradients, has significant potential to enhance decentralized optimization methods for two key reasons: first, it provides richer information about the objective function, which also explains its established success in centralized optimization; second, unlike the second-order derivative or its alternatives, historical information has already been computed or communicated and requires no additional cost to acquire. Despite this potential, it remains underexploited. In this work, we employ a proximal bundle framework to incorporate the function values and gradients at historical iterates and adapt the framework to the proximal decentralized gradient descent method, resulting in a Decentralized Proximal Bundle Method (DPBM). To broaden its applicability, we further extend DPBM to the asynchronous and stochastic setting. We theoretically analysed the convergence of the proposed methods. Notably, both the asynchronous DPBM and its stochastic variant can converge with fixed step-sizes that are independent of delays, which is superior to the delay-dependent step-sizes required by most existing asynchronous optimization methods, as it is easier to determine and often leads to faster convergence. Numerical experiments on classification problems demonstrate that by using historical information, our methods yield faster convergence and stronger robustness in the step-sizes.
Iterative Joint Detection of Kalman Filter and Channel Decoder for Sensor-to-Controller Link in Wireless Networked Control Systems
In this letter, we propose an iterative joint detection algorithm of Kalman filter (KF) and channel decoder for the sensor-to-controller link of wireless networked control systems, which utilizes the prior information of control system to improve control and communication performance. In this algorithm, we first use the KF to estimate the probability density of the control system outputs and calculate the prior probability of received signals to assist decoder. Then, the possible outputs of the control system are traversed to update the prior probability in order to implement iterative detection. The simulation results show that the prior information and the iterative structure can reduce the block error rate performance of communications while improving the root mean square error performance of controls.
comment: This letter has been accepted by IEEE WCL
Optimizing the Network Topology of a Linear Reservoir Computer
Machine learning has become a fundamental approach for modeling, prediction, and control, enabling systems to learn from data and perform complex tasks. Reservoir computing is a machine learning tool that leverages high-dimensional dynamical systems to efficiently process temporal data for prediction and observation tasks. Traditionally, the connectivity of the network that underlies a reservoir computer (RC) is generated randomly, lacking a principled design. Here, we focus on optimizing the connectivity of a linear RC to improve its performance and interpretability, which we achieve by decoupling the RC dynamics into a number of independent modes. We then proceed to optimize each one of these modes to perform a given task, which corresponds to selecting an optimal RC connectivity in terms of a given set of eigenvalues of the RC adjacency matrix. Simulations on networks of varying sizes show that the optimized RC significantly outperforms randomly constructed reservoirs in both training and testing phases and often surpasses nonlinear reservoirs of comparable size. This approach provides both practical performance advantages and theoretical guidelines for designing efficient, task-specific, and analytically transparent RC architectures.
Data-driven uncertainty-aware seakeeping prediction of the Delft 372 catamaran using ensemble Hankel dynamic mode decomposition
In this study, we present and validate an ensemble-based Hankel Dynamic Mode Decomposition with control (HDMDc) for uncertainty-aware seakeeping predictions of a high-speed catamaran, namely the Delft 372 model. Experimental measurements (time histories) of wave elevation at the longitudinal center of gravity, heave, pitch, notional flight-deck velocity, notional bridge acceleration, and total resistance were collected from irregular wave basin tests on a 1:33.3 scale replica of the Delft 372 model under sea state 5 conditions at Fr = 0.425, and organized into training, validation, and test sets. The HDMDc algorithm constructs an equation-free linear reduced-order model of the seakeeping vessel by augmenting states and inputs with their time-lagged copies to capture nonlinear and memory effects. Two ensembling strategies, namely Bayesian HDMDc (BHDMDc), which samples hyperparameters considered stochastic variables with prior distribution to produce posterior mean forecasts with confidence intervals, and Frequentist HDMDc (FHDMDc), which aggregates multiple model obtained over data subsets, are compared in providing seakeeping prediction and uncertainty quantification. The FHDMDc approach is found to improve the accuracy of the predictions compared to the deterministic counterpart, also providing robust uncertainty estimation; whereas the application of BHDMDc to the present test case is not found beneficial in comparison to the deterministic model. FHDMDc-derived probability density functions for the motions closely match both experimental data and URANS results, demonstrating reliable and computationally efficient seakeeping prediction for design and operational support.
A Review of Software for Designing and Operating Quantum Networks
Quantum network protocol development is crucial to realizing a production-grade network that can support distributed sensing, secure communication, and utility-scale quantum computation. However, the transition from laboratory demonstration to deployable networks requires software implementations of architectures and protocols tailored to the unique constraints of quantum systems. This paper reviews the current state of software implementations for quantum networks, organized around the three-plane abstraction of infrastructure, logical, and control/service planes. We cover software for both designing quantum network protocols (e.g., SeQUeNCe, QuISP, and NetSquid) and operating them, with a focus on essential control/service plane functions such as entanglement, topology, and resource management, in a proposed taxonomy. Our review highlights a persistent gap between theoretical protocol proposals and their realization in simulators or testbeds, particularly in dynamic topology and network management. We conclude by outlining open challenges and proposing a roadmap for developing scalable software architectures to enable hybrid, large-scale quantum networks.
comment: 12 pages, 5 figures, 3 tables, journal
Accelerating Hybrid Model Predictive Control using Warm-Started Generalized Benders Decomposition
Hybrid model predictive control with both continuous and discrete variables is widely applicable to robotic control tasks, especially those involving contacts with the environment. Due to combinatorial complexity, the solving speed of hybrid MPC can be insufficient for real-time applications. In this paper, we propose a hybrid MPC algorithm based on Generalized Benders Decomposition. The algorithm enumerates and stores cutting planes online inside a finite buffer and transfers them across MPC iterations to provide warm-starts for new problem instances, significantly enhancing solving speed. We theoretically analyze this warm-starting performance by modeling the deviation of mode sequences through temporal shifting and stretching, deriving bounds on the dual gap between transferred optimality cuts and the true optimal costs, and utilizing these bounds to quantify the level of suboptimality guaranteed in the first solve of the Benders Master Problem. Our algorithm is validated in simulation through controlling a cart-pole system with soft contact walls, a free-flying robot navigating around obstacles, and a humanoid robot standing on one leg while pushing against walls with its hands for balance. For our benchmark problems, the algorithm enumerates cuts on the order of only tens to hundreds while reaching speeds 2-3 times faster than the off-the-shelf solver Gurobi, oftentimes exceeding 1000 Hz. The code is available at https://github.com/XuanLin/Benders-MPC.
comment: Significantly revised to emphasize theoretical bounds. The heuristic Master Problem algorithm and GCS tightening experiments are preserved in v1
Robotics
mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs
Prevailing Vision-Language-Action Models (VLAs) for robotic manipulation are built upon vision-language backbones pretrained on large-scale, but disconnected static web data. As a result, despite improved semantic generalization, the policy must implicitly infer complex physical dynamics and temporal dependencies solely from robot trajectories. This reliance creates an unsustainable data burden, necessitating continuous, large-scale expert data collection to compensate for the lack of innate physical understanding. We contend that while vision-language pretraining effectively captures semantic priors, it remains blind to physical causality. A more effective paradigm leverages video to jointly capture semantics and visual dynamics during pretraining, thereby isolating the remaining task of low-level control. To this end, we introduce \model, a novel Video-Action Model (VAM) that pairs a pretrained Internet-scale video model with a flow matching-based action decoder conditioned on its latent representations. The decoder serves as an Inverse Dynamics Model (IDM), generating low-level robot actions from the latent representation of video-space action plans. Our extensive evaluation shows that our approach achieves state-of-the-art performance on simulated and real-world robotic manipulation tasks, improving sample efficiency by 10x and convergence speed by 2x compared to traditional VLA architectures.
An Open Toolkit for Underwater Field Robotics
Underwater robotics is becoming increasingly important for marine science, environmental monitoring, and subsea industrial operations, yet the development of underwater manipulation and actuation systems remains restricted by high costs, proprietary designs, and limited access to modular, research-oriented hardware. While open-source initiatives have democratized vehicle construction and control software, a substantial gap persists for joint-actuated systems-particularly those requiring waterproof, feedback-enabled actuation suitable for manipulators, grippers, and bioinspired devices. As a result, many research groups face lengthy development cycles, limited reproducibility, and difficulty transitioning laboratory prototypes to field-ready platforms. To address this gap, we introduce an open, cost-effective hardware and software toolkit for underwater manipulation research. The toolkit includes a depth-rated Underwater Robotic Joint (URJ) with early leakage detection, compact control and power management electronics, and a ROS2-based software stack for sensing and multi-mode actuation. All CAD models, fabrication files, PCB sources, firmware, and ROS2 packages are openly released, enabling local manufacturing, modification, and community-driven improvement. The toolkit has undergone extensive laboratory testing and multiple field deployments, demonstrating reliable operation up to 40 m depth across diverse applications, including a 3-DoF underwater manipulator, a tendon-driven soft gripper, and an underactuated sediment sampler. These results validate the robustness, versatility, and reusability of the toolkit for real marine environments. By providing a fully open, field-tested platform, this work aims to lower the barrier to entry for underwater manipulation research, improve reproducibility, and accelerate innovation in underwater field robotics.
comment: 10 pages, 8 figures
OMCL: Open-vocabulary Monte Carlo Localization
Robust robot localization is an important prerequisite for navigation planning. If the environment map was created from different sensors, robot measurements must be robustly associated with map features. In this work, we extend Monte Carlo Localization using vision-language features. These open-vocabulary features enable to robustly compute the likelihood of visual observations, given a camera pose and a 3D map created from posed RGB-D images or aligned point clouds. The abstract vision-language features enable to associate observations and map elements from different modalities. Global localization can be initialized by natural language descriptions of the objects present in the vicinity of locations. We evaluate our approach using Matterport3D and Replica for indoor scenes and demonstrate generalization on SemanticKITTI for outdoor scenes.
comment: Accepted to IEEE RA-L
QuantGraph: A Receding-Horizon Quantum Graph Solver
Dynamic programming is a cornerstone of graph-based optimization. While effective, it scales unfavorably with problem size. In this work, we present QuantGraph, a two-stage quantum-enhanced framework that casts local and global graph-optimization problems as quantum searches over discrete trajectory spaces. The solver is designed to operate efficiently by first finding a sequence of locally optimal transitions in the graph (local stage), without considering full trajectories. The accumulated cost of these transitions acts as a threshold that prunes the search space (up to 60% reduction for certain examples). The subsequent global stage, based on this threshold, refines the solution. Both stages utilize variants of the Grover-adaptive-search algorithm. To achieve scalability and robustness, we draw on principles from control theory and embed QuantGraph's global stage within a receding-horizon model-predictive-control scheme. This classical layer stabilizes and guides the quantum search, improving precision and reducing computational burden. In practice, the resulting closed-loop system exhibits robust behavior and lower overall complexity. Notably, for a fixed query budget, QuantGraph attains a 2x increase in control-discretization precision while still benefiting from Grover-search's inherent quadratic speedup compared to classical methods.
comment: P.Vaidhyanathan and A. Papatheodorou contributed equally to this work. 11 pages, 4 figures, 1 table, 2 algorithms
Load-Based Variable Transmission Mechanism for Robotic Applications
This paper presents a Load-Based Variable Transmission (LBVT) mechanism designed to enhance robotic actuation by dynamically adjusting the transmission ratio in response to external torque demands. Unlike existing variable transmission systems that require additional actuators for active control, the proposed LBVT mechanism leverages a pre-tensioned spring and a four-bar linkage to passively modify the transmission ratio, thereby reducing the complexity of robot joint actuation systems. The effectiveness of the LBVT mechanism is evaluated through simulation-based analyses. The results confirm that the system achieves up to a 40 percent increase in transmission ratio upon reaching a predefined torque threshold, effectively amplifying joint torque when required without additional actuation. Furthermore, the simulations demonstrate a torque amplification effect triggered when the applied force exceeds 18 N, highlighting the system ability to autonomously respond to varying load conditions. This research contributes to the development of lightweight, efficient, and adaptive transmission systems for robotic applications, particularly in legged robots where dynamic torque adaptation is critical.
comment: 22nd International Conference on Advanced Robotics (ICAR 2025)
Photorealistic Phantom Roads in Real Scenes: Disentangling 3D Hallucinations from Physical Geometry
Monocular depth foundation models achieve remarkable generalization by learning large-scale semantic priors, but this creates a critical vulnerability: they hallucinate illusory 3D structures from geometrically planar but perceptually ambiguous inputs. We term this failure the 3D Mirage. This paper introduces the first end-to-end framework to probe, quantify, and tame this unquantified safety risk. To probe, we present 3D-Mirage, the first benchmark of real-world illusions (e.g., street art) with precise planar-region annotations and context-restricted crops. To quantify, we propose a Laplacian-based evaluation framework with two metrics: the Deviation Composite Score (DCS) for spurious non-planarity and the Confusion Composite Score (CCS) for contextual instability. To tame this failure, we introduce Grounded Self-Distillation, a parameter-efficient strategy that surgically enforces planarity on illusion ROIs while using a frozen teacher to preserve background knowledge, thus avoiding catastrophic forgetting. Our work provides the essential tools to diagnose and mitigate this phenomenon, urging a necessary shift in MDE evaluation from pixel-wise accuracy to structural and contextual robustness. Our code and benchmark will be publicly available to foster this exciting research direction.
MiVLA: Towards Generalizable Vision-Language-Action Model with Human-Robot Mutual Imitation Pre-training
While leveraging abundant human videos and simulated robot data poses a scalable solution to the scarcity of real-world robot data, the generalization capability of existing vision-language-action models (VLAs) remains limited by mismatches in camera views, visual appearance, and embodiment morphologies. To overcome this limitation, we propose MiVLA, a generalizable VLA empowered by human-robot mutual imitation pre-training, which leverages inherent behavioral similarity between human hands and robotic arms to build a foundation of strong behavioral priors for both human actions and robotic control. Specifically, our method utilizes kinematic rules with left/right hand coordinate systems for bidirectional alignment between human and robot action spaces. Given human or simulated robot demonstrations, MiVLA is trained to forecast behavior trajectories for one embodiment, and imitate behaviors for another one unseen in the demonstration. Based on this mutual imitation, it integrates the behavioral fidelity of real-world human data with the manipulative diversity of simulated robot data into a unified model, thereby enhancing the generalization capability for downstream tasks. Extensive experiments conducted on both simulation and real-world platforms with three robots (ARX, PiPer and LocoMan), demonstrate that MiVLA achieves strong improved generalization capability, outperforming state-of-the-art VLAs (e.g., $\boldsymbolπ_{0}$, $\boldsymbolπ_{0.5}$ and H-RDT) by 25% in simulation, and 14% in real-world robot control tasks.
Remotely Detectable Robot Policy Watermarking
The success of machine learning for real-world robotic systems has created a new form of intellectual property: the trained policy. This raises a critical need for novel methods that verify ownership and detect unauthorized, possibly unsafe misuse. While watermarking is established in other domains, physical policies present a unique challenge: remote detection. Existing methods assume access to the robot's internal state, but auditors are often limited to external observations (e.g., video footage). This ``Physical Observation Gap'' means the watermark must be detected from signals that are noisy, asynchronous, and filtered by unknown system dynamics. We formalize this challenge using the concept of a \textit{glimpse sequence}, and introduce Colored Noise Coherency (CoNoCo), the first watermarking strategy designed for remote detection. CoNoCo embeds a spectral signal into the robot's motions by leveraging the policy's inherent stochasticity. To show it does not degrade performance, we prove CoNoCo preserves the marginal action distribution. Our experiments demonstrate strong, robust detection across various remote modalities, including motion capture and side-way/top-down video footage, in both simulated and real-world robot experiments. This work provides a necessary step toward protecting intellectual property in robotics, offering the first method for validating the provenance of physical policies non-invasively, using purely remote observations.
GuangMing-Explorer: A Four-Legged Robot Platform for Autonomous Exploration in General Environments
Autonomous exploration is a fundamental capability that tightly integrates perception, planning, control, and motion execution. It plays a critical role in a wide range of applications, including indoor target search, mapping of extreme environments, resource exploration, etc. Despite significant progress in individual components, a holistic and practical description of a completely autonomous exploration system, encompassing both hardware and software, remains scarce. In this paper, we present GuangMing-Explorer, a fully integrated autonomous exploration platform designed for robust operation across diverse environments. We provide a comprehensive overview of the system architecture, including hardware design, software stack, algorithm deployment, and experimental configuration. Extensive real-world experiments demonstrate the platform's effectiveness and efficiency in executing autonomous exploration tasks, highlighting its potential for practical deployment in complex and unstructured environments.
comment: 6 pages, published in ICUS2025
A Network-Based Framework for Modeling and Analyzing Human-Robot Coordination Strategies
Studies of human-robot interaction in dynamic and unstructured environments show that as more advanced robotic capabilities are deployed, the need for cooperative competencies to support collaboration with human problem-holders increases. Designing human-robot systems to meet these demands requires an explicit understanding of the work functions and constraints that shape the feasibility of alternative joint work strategies. Yet existing human-robot interaction frameworks either emphasize computational support for real-time execution or rely on static representations for design, offering limited support for reasoning about coordination dynamics during early-stage conceptual design. To address this gap, this article presents a novel computational framework for analyzing joint work strategies in human-robot systems by integrating techniques from functional modeling with graph-theoretic representations. The framework characterizes collective work in terms of the relationships among system functions and the physical and informational structure of the work environment, while explicitly capturing how coordination demands evolve over time. Its use during conceptual design is demonstrated through a case study in disaster robotics, which shows how the framework can be used to support early trade-space exploration of human-robot coordination strategies and to identify cooperative competencies that support flexible management of coordination overhead. These results show how the framework makes coordination demands and their temporal evolution explicit, supporting design-time reasoning about cooperative competency requirements and work demands prior to implementation.
comment: Under review at IEEE Transactions on Human-Machine Systems. 12 pages, 5 figures
VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments
This paper proposes VLA-AN, an efficient and onboard Vision-Language-Action (VLA) framework dedicated to autonomous drone navigation in complex environments. VLA-AN addresses four major limitations of existing large aerial navigation models: the data domain gap, insufficient temporal navigation with reasoning, safety issues with generative action policies, and onboard deployment constraints. First, we construct a high-fidelity dataset utilizing 3D Gaussian Splatting (3D-GS) to effectively bridge the domain gap. Second, we introduce a progressive three-stage training framework that sequentially reinforces scene comprehension, core flight skills, and complex navigation capabilities. Third, we design a lightweight, real-time action module coupled with geometric safety correction. This module ensures fast, collision-free, and stable command generation, mitigating the safety risks inherent in stochastic generative policies. Finally, through deep optimization of the onboard deployment pipeline, VLA-AN achieves a robust real-time 8.3x improvement in inference throughput on resource-constrained UAVs. Extensive experiments demonstrate that VLA-AN significantly improves spatial grounding, scene reasoning, and long-horizon navigation, achieving a maximum single-task success rate of 98.1%, and providing an efficient, practical solution for realizing full-chain closed-loop autonomy in lightweight aerial robots.
Infrastructure-based Autonomous Mobile Robots for Internal Logistics -- Challenges and Future Perspectives
The adoption of Autonomous Mobile Robots (AMRs) for internal logistics is accelerating, with most solutions emphasizing decentralized, onboard intelligence. While AMRs in indoor environments like factories can be supported by infrastructure, involving external sensors and computational resources, such systems remain underexplored in the literature. This paper presents a comprehensive overview of infrastructure-based AMR systems, outlining key opportunities and challenges. To support this, we introduce a reference architecture combining infrastructure-based sensing, on-premise cloud computing, and onboard autonomy. Based on the architecture, we review core technologies for localization, perception, and planning. We demonstrate the approach in a real-world deployment in a heavy-vehicle manufacturing environment and summarize findings from a user experience (UX) evaluation. Our aim is to provide a holistic foundation for future development of scalable, robust, and human-compatible AMR systems in complex industrial environments.
EPSM: A Novel Metric to Evaluate the Safety of Environmental Perception in Autonomous Driving
Extensive evaluation of perception systems is crucial for ensuring the safety of intelligent vehicles in complex driving scenarios. Conventional performance metrics such as precision, recall and the F1-score assess the overall detection accuracy, but they do not consider the safety-relevant aspects of perception. Consequently, perception systems that achieve high scores in these metrics may still cause misdetections that could lead to severe accidents. Therefore, it is important to evaluate not only the overall performance of perception systems, but also their safety. We therefore introduce a novel safety metric for jointly evaluating the most critical perception tasks, object and lane detection. Our proposed framework integrates a new, lightweight object safety metric that quantifies the potential risk associated with object detection errors, as well as an lane safety metric including the interdependence between both tasks that can occur in safety evaluation. The resulting combined safety score provides a unified, interpretable measure of perception safety performance. Using the DeepAccident dataset, we demonstrate that our approach identifies safety critical perception errors that conventional performance metrics fail to capture. Our findings emphasize the importance of safety-centric evaluation methods for perception systems in autonomous driving.
comment: Submitted at IEEE IV 2026
Criticality Metrics for Relevance Classification in Safety Evaluation of Object Detection in Automated Driving
Ensuring safety is the primary objective of automated driving, which necessitates a comprehensive and accurate perception of the environment. While numerous performance evaluation metrics exist for assessing perception capabilities, incorporating safety-specific metrics is essential to reliably evaluate object detection systems. A key component for safety evaluation is the ability to distinguish between relevant and non-relevant objects - a challenge addressed by criticality or relevance metrics. This paper presents the first in-depth analysis of criticality metrics for safety evaluation of object detection systems. Through a comprehensive review of existing literature, we identify and assess a range of applicable metrics. Their effectiveness is empirically validated using the DeepAccident dataset, which features a variety of safety-critical scenarios. To enhance evaluation accuracy, we propose two novel application strategies: bidirectional criticality rating and multi-metric aggregation. Our approach demonstrates up to a 100% improvement in terms of criticality classification accuracy, highlighting its potential to significantly advance the safety evaluation of object detection systems in automated vehicles.
comment: Accepted at IEEE ICVES 2025
I am here for you": How relational conversational AI appeals to adolescents, especially those who are socially and emotionally vulnerable
General-purpose conversational AI chatbots and AI companions increasingly provide young adolescents with emotionally supportive conversations, raising questions about how conversational style shapes anthropomorphism and emotional reliance. In a preregistered online experiment with 284 adolescent-parent dyads, youth aged 11-15 and their parents read two matched transcripts in which a chatbot responded to an everyday social problem using either a relational style (first-person, affiliative, commitment language) or a transparent style (explicit nonhumanness, informational tone). Adolescents more often preferred the relational than the transparent style, whereas parents were more likely to prefer transparent style than adolescents. Adolescents rated the relational chatbot as more human-like, likable, trustworthy and emotionally close, while perceiving both styles as similarly helpful. Adolescents who preferred relational style had lower family and peer relationship quality and higher stress and anxiety than those preferring transparent style or both chatbots. These findings identify conversational style as a key design lever for youth AI safety, showing that relational framing heightens anthropomorphism, trust and emotional closeness and can be especially appealing to socially and emotionally vulnerable adolescents, who may be at increased risk for emotional reliance on conversational AI.
BEV-Patch-PF: Particle Filtering with BEV-Aerial Feature Matching for Off-Road Geo-Localization
We propose BEV-Patch-PF, a GPS-free sequential geo-localization system that integrates a particle filter with learned bird's-eye-view (BEV) and aerial feature maps. From onboard RGB and depth images, we construct a BEV feature map. For each 3-DoF particle pose hypothesis, we crop the corresponding patch from an aerial feature map computed from a local aerial image queried around the approximate location. BEV-Patch-PF computes a per-particle log-likelihood by matching the BEV feature to the aerial patch feature. On two real-world off-road datasets, our method achieves 7.5x lower absolute trajectory error (ATE) on seen routes and 7.0x lower ATE on unseen routes than a retrieval-based baseline, while maintaining accuracy under dense canopy and shadow. The system runs in real time at 10 Hz on an NVIDIA Tesla T4, enabling practical robot deployment.
NAP3D: NeRF Assisted 3D-3D Pose Alignment for Autonomous Vehicles
Accurate localization is essential for autonomous vehicles, yet sensor noise and drift over time can lead to significant pose estimation errors, particularly in long-horizon environments. A common strategy for correcting accumulated error is visual loop closure in SLAM, which adjusts the pose graph when the agent revisits previously mapped locations. These techniques typically rely on identifying visual mappings between the current view and previously observed scenes and often require fusing data from multiple sensors. In contrast, this work introduces NeRF-Assisted 3D-3D Pose Alignment (NAP3D), a complementary approach that leverages 3D-3D correspondences between the agent's current depth image and a pre-trained Neural Radiance Field (NeRF). By directly aligning 3D points from the observed scene with synthesized points from the NeRF, NAP3D refines the estimated pose even from novel viewpoints, without relying on revisiting previously observed locations. This robust 3D-3D formulation provides advantages over conventional 2D-3D localization methods while remaining comparable in accuracy and applicability. Experiments demonstrate that NAP3D achieves camera pose correction within 5 cm on a custom dataset, robustly outperforming a 2D-3D Perspective-N-Point baseline. On TUM RGB-D, NAP3D consistently improves 3D alignment RMSE by approximately 6 cm compared to this baseline given varying noise, despite PnP achieving lower raw rotation and translation parameter error in some regimes, highlighting NAP3D's improved geometric consistency in 3D space. By providing a lightweight, dataset-agnostic tool, NAP3D complements existing SLAM and localization pipelines when traditional loop closure is unavailable.
comment: 10 pages, 5 figures, 2 tables
HERO: Hierarchical Traversable 3D Scene Graphs for Embodied Navigation Among Movable Obstacles
3D Scene Graphs (3DSGs) constitute a powerful representation of the physical world, distinguished by their abilities to explicitly model the complex spatial, semantic, and functional relationships between entities, rendering a foundational understanding that enables agents to interact intelligently with their environment and execute versatile behaviors. Embodied navigation, as a crucial component of such capabilities, leverages the compact and expressive nature of 3DSGs to enable long-horizon reasoning and planning in complex, large-scale environments. However, prior works rely on a static-world assumption, defining traversable space solely based on static spatial layouts and thereby treating interactable obstacles as non-traversable. This fundamental limitation severely undermines their effectiveness in real-world scenarios, leading to limited reachability, low efficiency, and inferior extensibility. To address these issues, we propose HERO, a novel framework for constructing Hierarchical Traversable 3DSGs, that redefines traversability by modeling operable obstacles as pathways, capturing their physical interactivity, functional semantics, and the scene's relational hierarchy. The results show that, relative to its baseline, HERO reduces PL by 35.1% in partially obstructed environments and increases SR by 79.4% in fully obstructed ones, demonstrating substantially higher efficiency and reachability.
ISS Policy : Scalable Diffusion Policy with Implicit Scene Supervision
Vision-based imitation learning has enabled impressive robotic manipulation skills, but its reliance on object appearance while ignoring the underlying 3D scene structure leads to low training efficiency and poor generalization. To address these challenges, we introduce \emph{Implicit Scene Supervision (ISS) Policy}, a 3D visuomotor DiT-based diffusion policy that predicts sequences of continuous actions from point cloud observations. We extend DiT with a novel implicit scene supervision module that encourages the model to produce outputs consistent with the scene's geometric evolution, thereby improving the performance and robustness of the policy. Notably, ISS Policy achieves state-of-the-art performance on both single-arm manipulation tasks (MetaWorld) and dexterous hand manipulation (Adroit). In real-world experiments, it also demonstrates strong generalization and robustness. Additional ablation studies show that our method scales effectively with both data and parameters. Code and videos will be released.
SWIFT-Nav: Stability-Aware Waypoint-Level TD3 with Fuzzy Arbitration for UAV Navigation in Cluttered Environments
Efficient and reliable UAV navigation in cluttered and dynamic environments remains challenging. We propose SWIFT-Nav: Stability-aware Waypoint-level Integration of Fuzzy arbitration and TD3 for Navigation, a TD3-based navigation framework that achieves fast, stable convergence to obstacle-aware paths. The system couples a sensor-driven perception front end with a TD3 waypoint policy: the perception module converts LiDAR ranges into a confidence-weighted safety map and goal cues, while the TD3 policy is trained with Prioritised Experience Replay to focus on high-error transitions and a decaying epsilon-greedy exploration schedule that gradually shifts from exploration to exploitation. A lightweight fuzzy-logic layer computes a safety score from radial measurements and near obstacles, gates mode switching and clamps unsafe actions; in parallel, task-aligned reward shaping combining goal progress, clearance, and switch-economy terms provides dense, well-scaled feedback that accelerates learning. Implemented in Webots with proximity-based collision checking, our approach consistently outperforms baselines in trajectory smoothness and generalization to unseen layouts, while preserving real-time responsiveness. These results show that combining TD3 with replay prioritisation, calibrated exploration, and fuzzy-safety rules yields a robust and deployable solution for UAV navigation in cluttered scenes.
comment: 10 pages, Accepted at Australasian Conference on Robotics and Automation (ACRA) 2025
Maintaining the Level of a Payload carried by Multi-Robot System on Irregular Surface
In this paper, we introduce a multi robot payload transport system to carry payloads through an environment of unknown and uneven inclinations while maintaining the desired orientation of the payload. For this task, we used custom built robots with a linear actuator (pistons) mounted on top of each robot. The system continuously monitors the payload's orientation and computes the required piston height of each robot to maintain the desired orientation of the payload. In this work, we propose an open loop controller coupled with a closed loop PID controller to achieve the goal. As our modelling makes no assumptions on the type of terrain, the system can work on any unknown and uneven terrains and inclinations. We showcase the efficacy of our proposed controller by testing it on various simulated environments with varied and complex terrains.
Few-Shot Inference of Human Perceptions of Robot Performance in Social Navigation Scenarios
Understanding how humans evaluate robot behavior during human-robot interactions is crucial for developing socially aware robots that behave according to human expectations. While the traditional approach to capturing these evaluations is to conduct a user study, recent work has proposed utilizing machine learning instead. However, existing data-driven methods require large amounts of labeled data, which limits their use in practice. To address this gap, we propose leveraging the few-shot learning capabilities of Large Language Models (LLMs) to improve how well a robot can predict a user's perception of its performance, and study this idea experimentally in social navigation tasks. To this end, we extend the SEAN TOGETHER dataset with additional real-world human-robot navigation episodes and participant feedback. Using this augmented dataset, we evaluate the ability of several LLMs to predict human perceptions of robot performance from a small number of in-context examples, based on observed spatio-temporal cues of the robot and surrounding human motion. Our results demonstrate that LLMs can match or exceed the performance of traditional supervised learning models while requiring an order of magnitude fewer labeled instances. We further show that prediction performance can improve with more in-context examples, confirming the scalability of our approach. Additionally, we investigate what kind of sensor-based information an LLM relies on to make these inferences by conducting an ablation study on the input features considered for performance prediction. Finally, we explore the novel application of personalized examples for in-context learning, i.e., drawn from the same user being evaluated, finding that they further enhance prediction accuracy. This work paves the path to improving robot behavior in a scalable manner through user-centered feedback.
dLITE: Differentiable Lighting-Informed Trajectory Evaluation for On-Orbit Inspection
Visual inspection of space-borne assets is of increasing interest to spacecraft operators looking to plan maintenance, characterise damage, and extend the life of high-value satellites in orbit. The environment of Low Earth Orbit (LEO) presents unique challenges when planning inspection operations that maximise visibility, information, and data quality. Specular reflection of sunlight from spacecraft bodies, self-shadowing, and dynamic lighting in LEO significantly impact the quality of data captured throughout an orbit. This is exacerbated by the relative motion between spacecraft, which introduces variable imaging distances and attitudes during inspection. Planning inspection trajectories with the aide of simulation is a common approach. However, the ability to design and optimise an inspection trajectory specifically to improve the resulting image quality in proximity operations remains largely unexplored. In this work, we present $\partial$LITE, an end-to-end differentiable simulation pipeline for on-orbit inspection operations. We leverage state-of-the-art differentiable rendering tools and a custom orbit propagator to enable end-to-end optimisation of orbital parameters based on visual sensor data. $\partial$LITE enables us to automatically design non-obvious trajectories, vastly improving the quality and usefulness of attained data. To our knowledge, our differentiable inspection-planning pipeline is the first of its kind and provides new insights into modern computational approaches to spacecraft mission planning. Project page: https://appearance-aware.github.io/dlite/
comment: 13 pages, 9 images
SORS: A Modular, High-Fidelity Simulator for Soft Robots
The deployment of complex soft robots in multiphysics environments requires advanced simulation frameworks that not only capture interactions between different types of material, but also translate accurately to real-world performance. Soft robots pose unique modeling challenges due to their large nonlinear deformations, material incompressibility, and contact interactions, which complicate both numerical stability and physical accuracy. Despite recent progress, robotic simulators often struggle with modeling such phenomena in a scalable and application-relevant manner. We present SORS (Soft Over Rigid Simulator), a versatile, high-fidelity simulator designed to handle these complexities for soft robot applications. Our energy-based framework, built on the finite element method, allows modular extensions, enabling the inclusion of custom-designed material and actuation models. To ensure physically consistent contact handling, we integrate a constrained nonlinear optimization based on sequential quadratic programming, allowing for stable and accurate modeling of contact phenomena. We validate our simulator through a diverse set of real-world experiments, which include cantilever deflection, pressure-actuation of a soft robotic arm, and contact interactions from the PokeFlex dataset. In addition, we showcase the potential of our framework for control optimization of a soft robotic leg. These tests confirm that our simulator can capture both fundamental material behavior and complex actuation dynamics with high physical fidelity. By bridging the sim-to-real gap in these challenging domains, our approach provides a validated tool for prototyping next-generation soft robots, filling the gap of extensibility, fidelity, and usability in the soft robotic ecosystem.
comment: This work has been submitted to the IEEE for possible publication. Code and data are available at github.com/srl-ethz/sors
R4: Retrieval-Augmented Reasoning for Vision-Language Models in 4D Spatio-Temporal Space
Humans perceive and reason about their surroundings in four dimensions by building persistent, structured internal representations that encode semantic meaning, spatial layout, and temporal dynamics. These multimodal memories enable them to recall past events, infer unobserved states, and integrate new information into context-dependent reasoning. Inspired by this capability, we introduce R4, a training-free framework for retrieval-augmented reasoning in 4D spatio-temporal space that equips vision-language models (VLMs) with structured, lifelong memory. R4 continuously constructs a 4D knowledge database by anchoring object-level semantic descriptions in metric space and time, yielding a persistent world model that can be shared across agents. At inference, natural language queries are decomposed into semantic, spatial, and temporal keys to retrieve relevant observations, which are integrated into the VLM's reasoning. Unlike classical retrieval-augmented generation methods, retrieval in R4 operates directly in 4D space, enabling episodic and collaborative reasoning without training. Experiments on embodied question answering and navigation benchmarks demonstrate that R4 substantially improves retrieval and reasoning over spatio-temporal information compared to baselines, advancing a new paradigm for embodied 4D reasoning in dynamic environments.
Large Video Planner Enables Generalizable Robot Control
General-purpose robots require decision-making models that generalize across diverse tasks and environments. Recent works build robot foundation models by extending multimodal large language models (MLLMs) with action outputs, creating vision-language-action (VLA) systems. These efforts are motivated by the intuition that MLLMs' large-scale language and image pretraining can be effectively transferred to the action output modality. In this work, we explore an alternative paradigm of using large-scale video pretraining as a primary modality for building robot foundation models. Unlike static images and language, videos capture spatio-temporal sequences of states and actions in the physical world that are naturally aligned with robotic behavior. We curate an internet-scale video dataset of human activities and task demonstrations, and train, for the first time at a foundation-model scale, an open video model for generative robotics planning. The model produces zero-shot video plans for novel scenes and tasks, which we post-process to extract executable robot actions. We evaluate task-level generalization through third-party selected tasks in the wild and real-robot experiments, demonstrating successful physical execution. Together, these results show robust instruction following, strong generalization, and real-world feasibility. We release both the model and dataset to support open, reproducible video-based robot learning. Our website is available at https://www.boyuan.space/large-video-planner/.
comment: 29 pages, 16 figures
Safety with Agency: Human-Centered Safety Filter with Application to AI-Assisted Motorsports
We propose a human-centered safety filter (HCSF) for shared autonomy that significantly enhances system safety without compromising human agency. Our HCSF is built on a neural safety value function, which we first learn scalably through black-box interactions and then use at deployment to enforce a novel state-action control barrier function (Q-CBF) safety constraint. Since this Q-CBF safety filter does not require any knowledge of the system dynamics for both synthesis and runtime safety monitoring and intervention, our method applies readily to complex, black-box shared autonomy systems. Notably, our HCSF's CBF-based interventions modify the human's actions minimally and smoothly, avoiding the abrupt, last-moment corrections delivered by many conventional safety filters. We validate our approach in a comprehensive in-person user study using Assetto Corsa-a high-fidelity car racing simulator with black-box dynamics-to assess robustness in "driving on the edge" scenarios. We compare both trajectory data and drivers' perceptions of our HCSF assistance against unassisted driving and a conventional safety filter. Experimental results show that 1) compared to having no assistance, our HCSF improves both safety and user satisfaction without compromising human agency or comfort, and 2) relative to a conventional safety filter, our proposed HCSF boosts human agency, comfort, and satisfaction while maintaining robustness.
comment: Accepted to Robotics: Science and Systems (R:SS) 2025, 22 pages, 16 figures, 7 tables Updates for v4: typos in Appendix Subsection A revised
ProbeMDE: Uncertainty-Guided Active Proprioception for Monocular Depth Estimation in Surgical Robotics
Monocular depth estimation (MDE) provides a useful tool for robotic perception, but its predictions are often uncertain and inaccurate in challenging environments such as surgical scenes where textureless surfaces, specular reflections, and occlusions are common. To address this, we propose ProbeMDE, a cost-aware active sensing framework that combines RGB images with sparse proprioceptive measurements for MDE. Our approach utilizes an ensemble of MDE models to predict dense depth maps conditioned on both RGB images and on a sparse set of known depth measurements obtained via proprioception, where the robot has touched the environment in a known configuration. We quantify predictive uncertainty via the ensemble's variance and measure the gradient of the uncertainty with respect to candidate measurement locations. To prevent mode collapse while selecting maximally informative locations to propriocept (touch), we leverage Stein Variational Gradient Descent (SVGD) over this gradient map. We validate our method in both simulated and physical experiments on central airway obstruction surgical phantoms. Our results demonstrate that our approach outperforms baseline methods across standard depth estimation metrics, achieving higher accuracy while minimizing the number of required proprioceptive measurements. Project page: https://brittonjordan.github.io/probe_mde/
comment: 9 pages, 5 figures. Project page: https://brittonjordan.github.io/probe_mde/
New Location Science Models with Applications to UAV-Based Disaster Relief
Natural and human-made disasters can cause severe devastation and claim thousands of lives worldwide. Therefore, developing efficient methods for disaster response and management is a critical task for relief teams. One of the most essential components of effective response is the rapid collection of information about affected areas, damages, and victims. More data translates into better coordination, faster rescue operations, and ultimately, more lives saved. However, in some disasters, such as earthquakes, the communication infrastructure is often partially or completely destroyed, making it extremely difficult for victims to send distress signals and for rescue teams to locate and assist them in time. Unmanned Aerial Vehicles (UAVs) have emerged as valuable tools in such scenarios. In particular, a fleet of UAVs can be dispatched from a mobile station to the affected area to facilitate data collection and establish temporary communication networks. Nevertheless, real-world deployment of UAVs faces several challenges, with adverse weather conditions--especially wind--being among the most significant. To address this, we develop a novel mathematical framework to determine the optimal location of a mobile UAV station while explicitly accounting for the heterogeneity of the UAVs and the effect of wind. In particular, we generalize the Sylvester problem to introduce the Sylvester-Fermat-Torricelli (SFT) problem, which captures complex factors such as wind influence, UAV heterogeneity, and back-and-forth motion within a unified framework. The proposed framework enhances the practicality of UAV-based disaster response planning by accounting for real-world factors such as wind and UAV heterogeneity. Experimental results demonstrate that it can reduce wasted operational time by up to 84%, making post-disaster missions significantly more efficient and effective.
Embodied Co-Design for Rapidly Evolving Agents: Taxonomy, Frontiers, and Challenges
Brain-body co-evolution enables animals to develop complex behaviors in their environments. Inspired by this biological synergy, embodied co-design (ECD) has emerged as a transformative paradigm for creating intelligent agents-from virtual creatures to physical robots-by jointly optimizing their morphologies and controllers rather than treating control in isolation. This integrated approach facilitates richer environmental interactions and robust task performance. In this survey, we provide a systematic overview of recent advances in ECD. We first formalize the concept of ECD and position it within related fields. We then introduce a hierarchical taxonomy: a lower layer that breaks down agent design into three fundamental components-controlling brain, body morphology, and task environment-and an upper layer that integrates these components into four major ECD frameworks: bi-level, single-level, generative, and open-ended. This taxonomy allows us to synthesize insights from more than one hundred recent studies. We further review notable benchmarks, datasets, and applications in both simulated and real-world scenarios. Finally, we identify significant challenges and offer insights into promising future research directions. A project associated with this survey has been created at https://github.com/Yuxing-Wang-THU/SurveyBrainBody.
Supervisory Measurement-Guided Noise Covariance Estimation
Reliable state estimation hinges on accurate specification of sensor noise covariances, which weigh heterogeneous measurements. In practice, these covariances are difficult to identify due to environmental variability, front-end preprocessing, and other reasons. We address this by formulating noise covariance estimation as a bilevel optimization that, from a Bayesian perspective, factorizes the joint likelihood of so-called odometry and supervisory measurements, thereby balancing information utilization with computational efficiency. The factorization converts the nested Bayesian dependency into a chain structure, enabling efficient parallel computation: at the lower level, an invariant extended Kalman filter with state augmentation estimates trajectories, while a derivative filter computes analytical gradients in parallel for upper-level gradient updates. The upper level refines the covariance to guide the lower-level estimation. Experiments on synthetic and real-world datasets show that our method achieves higher efficiency over existing baselines.
Integration of UWB Radar on Mobile Robots for Continuous Obstacle and Environment Mapping
This paper presents an infrastructure-free approach for obstacle detection and environmental mapping using ultra-wideband (UWB) radar mounted on a mobile robotic platform. Traditional sensing modalities such as visual cameras and Light Detection and Ranging (LiDAR) fail in environments with poor visibility due to darkness, smoke, or reflective surfaces. In these visioned-impaired conditions, UWB radar offers a promising alternative. To this end, this work explores the suitability of robot-mounted UWB radar for environmental mapping in dynamic, anchor-free scenarios. The study investigates how different materials (metal, concrete and plywood) and UWB radio channels (5 and 9) influence the Channel Impulse Response (CIR). Furthermore, a processing pipeline is proposed to achieve reliable mapping of detected obstacles, consisting of 3 steps: (i) target identification (based on CIR peak detection), (ii) filtering (based on peak properties, signal-to-noise score, and phase-difference of arrival), and (iii) clustering (based on distance estimation and angle-of-arrival estimation). The proposed approach successfully reduces noise and multipath effects, resulting in an obstacle detection precision of at least 90.71% and a recall of 88.40% on channel 9 even when detecting low-reflective materials such as concrete. This work offers a foundation for further development of UWB-based localisation and mapping (SLAM) systems that do not rely on visual features and, unlike conventional UWB localisation systems, do not require on fixed anchor nodes for triangulation.
comment: This paper has been submitted to IEEE Access Journal and is currently undergoing review
Event Camera Meets Mobile Embodied Perception: Abstraction, Algorithm, Acceleration, Application
With the increasing complexity of mobile device applications, these devices are evolving toward high agility. This shift imposes new demands on mobile sensing, particularly in achieving high-accuracy and low-latency. Event-based vision has emerged as a disruptive paradigm, offering high temporal resolution and low latency, making it well-suited for high-accuracy and low-latency sensing tasks on high-agility platforms. However, the presence of substantial noisy events, lack of stable, persistent semantic information, and large data volume pose challenges for event-based data processing on resource-constrained mobile devices. This paper surveys the literature from 2014 to 2025 and presents a comprehensive overview of event-based mobile sensing, encompassing its fundamental principles, event \textit{abstraction} methods, \textit{algorithm} advancements, and both hardware and software \textit{acceleration} strategies. We discuss key \textit{applications} of event cameras in mobile sensing, including visual odometry, object tracking, optical flow, and 3D reconstruction, while highlighting challenges associated with event data processing, sensor fusion, and real-time deployment. Furthermore, we outline future research directions, such as improving the event camera with advanced optics, leveraging neuromorphic computing for efficient processing, and integrating bio-inspired algorithms. To support ongoing research, we provide an open-source \textit{Online Sheet} with recent developments. We hope this survey serves as a reference, facilitating the adoption of event-based vision across diverse applications.
comment: Accepted by ACM CSUR,35 pages
Registering the 4D Millimeter Wave Radar Point Clouds Via Generalized Method of Moments
4D millimeter wave radars (4D radars) are new emerging sensors that provide point clouds of objects with both position and radial velocity measurements. Compared to LiDARs, they are more affordable and reliable sensors for robots' perception under extreme weather conditions. On the other hand, point cloud registration is an essential perception module that provides robot's pose feedback information in applications such as Simultaneous Localization and Mapping (SLAM). Nevertheless, the 4D radar point clouds are sparse and noisy compared to those of LiDAR, and hence we shall confront great challenges in registering the radar point clouds. To address this issue, we propose a point cloud registration framework for 4D radars based on Generalized Method of Moments. The method does not require explicit point-to-point correspondences between the source and target point clouds, which is difficult to compute for sparse 4D radar point clouds. Moreover, we show the consistency of the proposed method. Experiments on both synthetic and real-world datasets show that our approach achieves higher accuracy and robustness than benchmarks, and the accuracy is even comparable to LiDAR-based frameworks.
SignBot: Learning Human-to-Humanoid Sign Language Interaction
Sign language is a natural and visual form of language that uses movements and expressions to convey meaning, serving as a crucial means of communication for individuals who are deaf or hard-of-hearing (DHH). However, the number of people proficient in sign language remains limited, highlighting the need for technological advancements to bridge communication gaps and foster interactions with minorities. Based on recent advancements in embodied humanoid robots, we propose SignBot, a novel framework for human-robot sign language interaction. SignBot integrates a cerebellum-inspired motion control component and a cerebral-oriented module for comprehension and interaction. Specifically, SignBot consists of: 1) Motion Retargeting, which converts human sign language datasets into robot-compatible kinematics; 2) Motion Control, which leverages a learning-based paradigm to develop a robust humanoid control policy for tracking sign language gestures; and 3) Generative Interaction, which incorporates translator, responser, and generator of sign language, thereby enabling natural and effective communication between robots and humans. Simulation and real-world experimental results demonstrate that SignBot can effectively facilitate human-robot interaction and perform sign language motions with diverse robots and datasets. SignBot represents a significant advancement in automatic sign language interaction on embodied humanoid robot platforms, providing a promising solution to improve communication accessibility for the DHH community.
Context Representation via Action-Free Transformer encoder-decoder for Meta Reinforcement Learning
Reinforcement learning (RL) enables robots to operate in uncertain environments, but standard approaches often struggle with poor generalization to unseen tasks. Context-adaptive meta reinforcement learning addresses these limitations by conditioning on the task representation, yet they mostly rely on complete action information in the experience making task inference tightly coupled to a specific policy. This paper introduces Context Representation via Action Free Transformer encoder decoder (CRAFT), a belief model that infers task representations solely from sequences of states and rewards. By removing the dependence on actions, CRAFT decouples task inference from policy optimization, supports modular training, and leverages amortized variational inference for scalable belief updates. Built on a transformer encoder decoder with rotary positional embeddings, the model captures long range temporal dependencies and robustly encodes both parametric and non-parametric task variations. Experiments on the MetaWorld ML-10 robotic manipulation benchmark show that CRAFT achieves faster adaptation, improved generalization, and more effective exploration compared to context adaptive meta--RL baselines. These findings highlight the potential of action-free inference as a foundation for scalable RL in robotic control.
Demonstration Sidetracks: Categorizing Systematic Non-Optimality in Human Demonstrations
Learning from Demonstration (LfD) is a popular approach for robots to acquire new skills, but most LfD methods suffer from imperfections in human demonstrations. Prior work typically treats these suboptimalities as random noise. In this paper we study non-optimal behaviors in non-expert demonstrations and show that they are systematic, forming what we call demonstration sidetracks. Using a public space study with 40 participants performing a long-horizon robot task, we recreated the setup in simulation and annotated all demonstrations. We identify four types of sidetracks (Exploration, Mistake, Alignment, Pause) and one control pattern (one-dimension control). Sidetracks appear frequently across participants, and their temporal and spatial distribution is tied to task context. We also find that users' control patterns depend on the control interface. These insights point to the need for better models of suboptimal demonstrations to improve LfD algorithms and bridge the gap between lab training and real-world deployment. All demonstrations, infrastructure, and annotations are available at https://github.com/AABL-Lab/Human-Demonstration-Sidetracks.
Off-Road Navigation via Implicit Neural Representation of Terrain Traversability
Autonomous off-road navigation requires robots to estimate terrain traversability from onboard sensors and plan accordingly. Conventional approaches typically rely on sampling-based planners such as MPPI to generate short-term control actions that aim to minimize traversal time and risk measures derived from the traversability estimates. These planners can react quickly but optimize only over a short look-ahead window, limiting their ability to reason about the full path geometry, which is important for navigating in challenging off-road environments. Moreover, they lack the ability to adjust speed based on the terrain bumpiness, which is important for smooth navigation on challenging terrains. In this paper, we introduce TRAIL (Traversability with an Implicit Learned Representation), an off-road navigation framework that leverages an implicit neural representation to continuously parameterize terrain properties. This representation yields spatial gradients that enable integration with a novel gradient-based trajectory optimization method that adapts the path geometry and speed profile based on terrain traversability.
comment: 9 pages
Dexterous Manipulation through Imitation Learning: A Survey
Dexterous manipulation, which refers to the ability of a robotic hand or multi-fingered end-effector to skillfully control, reorient, and manipulate objects through precise, coordinated finger movements and adaptive force modulation, enables complex interactions similar to human hand dexterity. With recent advances in robotics and machine learning, there is a growing demand for these systems to operate in complex and unstructured environments. Traditional model-based approaches struggle to generalize across tasks and object variations due to the high dimensionality and complex contact dynamics of dexterous manipulation. Although model-free methods such as reinforcement learning (RL) show promise, they require extensive training, large-scale interaction data, and carefully designed rewards for stability and effectiveness. Imitation learning (IL) offers an alternative by allowing robots to acquire dexterous manipulation skills directly from expert demonstrations, capturing fine-grained coordination and contact dynamics while bypassing the need for explicit modeling and large-scale trial-and-error. This survey provides an overview of dexterous manipulation methods based on imitation learning, details recent advances, and addresses key challenges in the field. Additionally, it explores potential research directions to enhance IL-driven dexterous manipulation. Our goal is to offer researchers and practitioners a comprehensive introduction to this rapidly evolving domain.
comment: 32pages, 6 figures, 9 tables
CaFe-TeleVision: A Coarse-to-Fine Teleoperation System with Immersive Situated Visualization for Enhanced Ergonomics
Teleoperation presents a promising paradigm for remote control and robot proprioceptive data collection. Despite recent progress, current teleoperation systems still suffer from limitations in efficiency and ergonomics, particularly in challenging scenarios. In this paper, we propose CaFe-TeleVision, a coarse-to-fine teleoperation system with immersive situated visualization for enhanced ergonomics. At its core, a coarse-to-fine control mechanism is proposed in the retargeting module to bridge workspace disparities, jointly optimizing efficiency and physical ergonomics. To stream immersive feedback with adequate visual cues for human vision systems, an on-demand situated visualization technique is integrated in the perception module, which reduces the cognitive load for multi-view processing. The system is built on a humanoid collaborative robot and validated with six challenging bimanual manipulation tasks. User study among 24 participants confirms that CaFe-TeleVision enhances ergonomics with statistical significance, indicating a lower task load and a higher user acceptance during teleoperation. Quantitative results also validate the superior performance of our system across six tasks, surpassing comparative methods by up to 28.89% in success rate and accelerating by 26.81% in completion time. Project webpage: https://clover-cuhk.github.io/cafe_television/
comment: Project webpage: https://clover-cuhk.github.io/cafe_television/ Code: https://github.com/Zixin-Tang/CaFe-TeleVision
History-Enhanced Two-Stage Transformer for Aerial Vision-and-Language Navigation
Aerial Vision-and-Language Navigation (AVLN) requires Unmanned Aerial Vehicle (UAV) agents to localize targets in large-scale urban environments based on linguistic instructions. While successful navigation demands both global environmental reasoning and local scene comprehension, existing UAV agents typically adopt mono-granularity frameworks that struggle to balance these two aspects. To address this limitation, this work proposes a History-Enhanced Two-Stage Transformer (HETT) framework, which integrates the two aspects through a coarse-to-fine navigation pipeline. Specifically, HETT first predicts coarse-grained target positions by fusing spatial landmarks and historical context, then refines actions via fine-grained visual analysis. In addition, a historical grid map is designed to dynamically aggregate visual features into a structured spatial memory, enhancing comprehensive scene awareness. Additionally, the CityNav dataset annotations are manually refined to enhance data quality. Experiments on the refined CityNav dataset show that HETT delivers significant performance gains, while extensive ablation studies further verify the effectiveness of each component.
Fast and Continual Learning for Hybrid Control Policies using Generalized Benders Decomposition
Hybrid model predictive control with both continuous and discrete variables is widely applicable to robotic control tasks, especially those involving contact with the environment. Due to the combinatorial complexity, the solving speed of hybrid MPC can be insufficient for real-time applications. In this paper, we proposed a hybrid MPC solver based on Generalized Benders Decomposition (GBD). The algorithm enumerates and stores cutting planes online inside a finite buffer. After a short cold-start phase, the stored cuts provide warm-starts for the new problem instances to enhance the solving speed. Despite the disturbance and randomly changing environment, the solving speed maintains. Leveraging on the sparsity of feasibility cuts, we also propose a fast algorithm for Benders master problems. Our solver is validated through controlling a cart-pole system with randomly moving soft contact walls, and a free-flying robot navigating around obstacles. The results show that with significantly less data than previous works, the solver reaches competitive speeds to the off-the-shelf solver Gurobi despite the Python overhead.
comment: This paper has been withdrawn by the author. It has been superseded by a significantly updated version available at arXiv:2406.00780
Expert Switching for Robust AAV Landing: A Dual-Detector Framework in Simulation
Reliable helipad detection is essential for Autonomous Aerial Vehicle (AAV) landing, especially under GPS-denied or visually degraded conditions. While modern detectors such as YOLOv8 offer strong baseline performance, single-model pipelines struggle to remain robust across the extreme scale transitions that occur during descent, where helipads appear small at high altitude and large near touchdown. To address this limitation, we propose a scale-adaptive dual-expert perception framework that decomposes the detection task into far-range and close-range regimes. Two YOLOv8 experts are trained on scale-specialized versions of the HelipadCat dataset, enabling one model to excel at detecting small, low-resolution helipads and the other to provide high-precision localization when the target dominates the field of view. During inference, both experts operate in parallel, and a geometric gating mechanism selects the expert whose prediction is most consistent with the AAV's viewpoint. This adaptive routing prevents the degradation commonly observed in single-detector systems when operating across wide altitude ranges. The dual-expert perception module is evaluated in a closed-loop landing environment that integrates CARLA's photorealistic rendering with NASA's GUAM flight-dynamics engine. Results show substantial improvements in alignment stability, landing accuracy, and overall robustness compared to single-detector baselines. By introducing a scale-aware expert routing strategy tailored to the landing problem, this work advances resilient vision-based perception for autonomous descent and provides a foundation for future multi-expert AAV frameworks.
comment: To be Published in AIAA SciTech 2026
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
Large language models leverage internet-scale text data, yet embodied AI remains constrained by the prohibitive costs of physical trajectory collection. Desktop environments -- particularly gaming -- offer a compelling alternative: they provide rich sensorimotor interactions at scale while maintaining the structured observation-action coupling essential for embodied learning. We present D2E (Desktop to Embodied AI), a framework that demonstrates desktop interactions can serve as an effective pretraining substrate for robotics embodied AI tasks. Unlike prior work that remained domain-specific (e.g., VPT for Minecraft) or kept data proprietary (e.g., SIMA), D2E establishes a complete pipeline from scalable desktop data collection to verified transfer in embodied domains. Our framework comprises three components: (1) the OWA Toolkit that unifies diverse desktop interactions into a standardized format with 152x compression, (2) the Generalist-IDM that achieves strong zero-shot generalization across unseen games through timestamp-based event prediction, enabling internet-scale pseudo-labeling, and (3) VAPT that transfers desktop-pretrained representations to physical manipulation and navigation. Using 1.3K+ hours of data (259 hours of human demonstrations, and 1K+ hours of pseudo-labeled gameplay), we achieve a total of 96.6% success rate on LIBERO manipulation and 83.3% on CANVAS navigation benchmarks. This validates that sensorimotor primitives in digital interactions exhibit sufficient invariance to transfer meaningfully to physical embodied tasks, establishing desktop pretraining as a practical paradigm for robotics. We will make all our work public, including the OWA toolkit, datasets of human-collected and pseudo-labeled, and VAPT-trained models available at https://worv-ai.github.io/d2e/
Multiagent Systems
Mapis: A Knowledge-Graph Grounded Multi-Agent Framework for Evidence-Based PCOS Diagnosis
Polycystic Ovary Syndrome (PCOS) constitutes a significant public health issue affecting 10% of reproductive-aged women, highlighting the critical importance of developing effective diagnostic tools. Previous machine learning and deep learning detection tools are constrained by their reliance on large-scale labeled data and an lack of interpretability. Although multi-agent systems have demonstrated robust capabilities, the potential of such systems for PCOS detection remains largely unexplored. Existing medical multi-agent frameworks are predominantly designed for general medical tasks, suffering from insufficient domain integration and a lack of specific domain knowledge. To address these challenges, we propose Mapis, the first knowledge-grounded multi-agent framework explicitly designed for guideline-based PCOS diagnosis. Specifically, it built upon the 2023 International Guideline into a structured collaborative workflow that simulates the clinical diagnostic process. It decouples complex diagnostic tasks across specialized agents: a gynecological endocrine agent and a radiology agent collaborative to verify inclusion criteria, while an exclusion agent strictly rules out other causes. Furthermore, we construct a comprehensive PCOS knowledge graph to ensure verifiable, evidence-based decision-making. Extensive experiments on public benchmarks and specialized clinical datasets, benchmarking against nine diverse baselines, demonstrate that Mapis significantly outperforms competitive methods. On the clinical dataset, it surpasses traditional machine learning models by 13.56%, single-agent by 6.55%, and previous medical multi-agent systems by 7.05% in Accuracy.
Epistemic diversity across language models mitigates knowledge collapse
The growing use of artificial intelligence (AI) raises concerns of knowledge collapse, i.e., a reduction to the most dominant and central set of ideas. Prior work has demonstrated single-model collapse, defined as performance decay in an AI model trained on its own output. Inspired by ecology, we ask whether AI ecosystem diversity, that is, diversity among models, can mitigate such a collapse. We build on the single-model approach but focus on ecosystems of models trained on their collective output. To study the effect of diversity on model performance, we segment the training data across language models and evaluate the resulting ecosystems over ten, self-training iterations. We find that increased epistemic diversity mitigates collapse, but, interestingly, only up to an optimal level. Our results suggest that an ecosystem containing only a few diverse models fails to express the rich mixture of the full, true distribution, resulting in rapid performance decay. Yet distributing the data across too many models reduces each model's approximation capacity on the true distribution, leading to poor performance already in the first iteration step. In the context of AI monoculture, our results suggest the need to monitor diversity across AI systems and to develop policies that incentivize more domain- and community-specific models.
comment: 16 pages, 7 figures
Solving Multi-Agent Multi-Goal Path Finding Problems in Polynomial Time
In this paper, we plan missions for a fleet of agents in undirected graphs, such as grids, with multiple goals. In contrast to regular multi-agent path-finding, the solver finds and updates the assignment of goals to the agents on its own. In the continuous case for a point agent with motions in the Euclidean plane, the problem can be solved arbitrarily close to optimal. For discrete variants that incur node and edge conflicts, we show that it can be solved in polynomial time, which is unexpected, since traditional vehicle routing on general graphs is NP-hard. We implement a corresponding planner that finds conflict-free optimized routes for the agents. Global assignment strategies greatly reduce the number of conflicts, with the remaining ones resolved by elaborating on the concept of ants-on-the-stick, by solving local assignment problems, by interleaving agent paths, and by kicking agents that have already arrived out of their destinations
What Is Your AI Agent Buying? Evaluation, Biases, Model Dependence, & Emerging Implications for Agentic E-Commerce
Online marketplaces will be transformed by autonomous AI agents acting on behalf of consumers. Rather than humans browsing and clicking, AI agents can parse webpages or leverage APIs to view, evaluate and choose products. We investigate the behavior of AI agents using ACES, a provider-agnostic framework for auditing agent decision-making. We reveal that agents can exhibit choice homogeneity, often concentrating demand on a few ``modal'' products while ignoring others entirely. Yet, these preferences are unstable: model updates can drastically reshuffle market shares. Furthermore, randomized trials show that while agents have improved over time on simple tasks with a clearly identified best choice, they exhibit strong position biases -- varying across providers and model versions, and persisting even in text-only "headless" interfaces -- undermining any universal notion of a ``top'' rank. Agents also consistently penalize sponsored tags while rewarding platform endorsements, and sensitivities to price, ratings, and reviews vary sharply across models. Finally, we demonstrate that sellers can respond: a seller-side agent making simple, query-conditional description tweaks can drive significant gains in market share. These findings reveal that agentic markets are volatile and fundamentally different from human-centric commerce, highlighting the need for continuous auditing and raising questions for platform design, seller strategy and regulation.
EvoLattice: Persistent Internal-Population Evolution through Multi-Alternative Quality-Diversity Graph Representations for LLM-Guided Program Discovery
Large language models (LLMs) are increasingly used to evolve programs and multi-agent systems, yet most existing approaches rely on overwrite-based mutations that maintain only a single candidate at a time. Such methods discard useful variants, suffer from destructive edits, and explore a brittle search space prone to structural failure. We introduce EvoLattice, a framework that represents an entire population of candidate programs or agent behaviors within a single directed acyclic graph. Each node stores multiple persistent alternatives, and every valid path through the graph defines a distinct executable candidate, yielding a large combinatorial search space without duplicating structure. EvoLattice enables fine-grained alternative-level evaluation by scoring each alternative across all paths in which it appears, producing statistics that reveal how local design choices affect global performance. These statistics provide a dense, data-driven feedback signal for LLM-guided mutation, recombination, and pruning, while preserving successful components. Structural correctness is guaranteed by a deterministic self-repair mechanism that enforces acyclicity and dependency consistency independently of the LLM. EvoLattice naturally extends to agent evolution by interpreting alternatives as prompt fragments or sub-agent behaviors. Across program synthesis (proxy and optimizer meta-learning), EvoLattice yields more stable evolution, greater expressivity, and stronger improvement trajectories than prior LLM-guided methods. The resulting dynamics resemble quality-diversity optimization, emerging implicitly from EvoLattice's internal multi-alternative representation rather than an explicit external archive.
TrafficGamer: Reliable and Flexible Traffic Simulation for Safety-Critical Scenarios with Game-Theoretic Oracles
While modern Autonomous Vehicle (AV) systems can develop reliable driving policies under regular traffic conditions, they frequently struggle with safety-critical traffic scenarios. This difficulty primarily arises from the rarity of such scenarios in driving datasets and the complexities associated with predictive modeling of multiple vehicles. Effectively simulating safety-critical traffic situations is therefore a crucial challenge. In this paper, we introduce TrafficGamer, which facilitates game-theoretic traffic simulation by viewing common road driving as a multi-agent game. When we evaluate the empirical performance across various real-world datasets, TrafficGamer ensures both the fidelity, exploitability, and diversity of the simulated scenarios, guaranteeing that they not only statically align with real-world traffic distribution but also efficiently capture equilibria for representing safety-critical scenarios involving multiple agents compared with other methods. Additionally, the results demonstrate that TrafficGamer provides highly flexible simulations across various contexts. Specifically, we demonstrate that the generated scenarios can dynamically adapt to equilibria of varying tightness by configuring risk-sensitive constraints during optimization. We have provided a demo webpage at: https://anonymous.4open.science/api/repo/trafficgamer-demo-1EE0/file/index.html.
Parallelism Meets Adaptiveness: Scalable Documents Understanding in Multi-Agent LLM Systems AAAI 2026
Large language model (LLM) agents have shown increasing promise for collaborative task completion. However, existing multi-agent frameworks often rely on static workflows, fixed roles, and limited inter-agent communication, reducing their effectiveness in open-ended, high-complexity domains. This paper proposes a coordination framework that enables adaptiveness through three core mechanisms: dynamic task routing, bidirectional feedback, and parallel agent evaluation. The framework allows agents to reallocate tasks based on confidence and workload, exchange structured critiques to iteratively improve outputs, and crucially compete on high-ambiguity subtasks with evaluator-driven selection of the most suitable result. We instantiate these principles in a modular architecture and demonstrate substantial improvements in factual coverage, coherence, and efficiency over static and partially adaptive baselines. Our findings highlight the benefits of incorporating both adaptiveness and structured competition in multi-agent LLM systems.
comment: Accepted at AAAI 2026 Workshop on WoMAPF
Stronger-MAS: Multi-Agent Reinforcement Learning for Collaborative LLMs
Multi-agent systems (MAS) and reinforcement learning (RL) are widely used to enhance the agentic capabilities of large language models (LLMs). MAS improves task performance through role-based orchestration, while RL uses environmental rewards to learn stronger policies, such as GRPO-style optimization. However, applying on-policy RL to MAS remains underexplored and presents unique challenges. Algorithmically, standard GRPO grouping assumptions break down because prompts vary by role and by turn. System-wise, the training stack must support MAS-workflow rollouts and on-policy updates for both single-policy and multi-policy models. We propose AT-GRPO, which includes (i) an agent- and turn-wise grouped RL algorithm tailored to MAS and (ii) a training system that supports both single- and multi-policy regimes. Across game, planning, coding, and math tasks, AT-GRPO delivers substantial gains. On long-horizon planning, it increases accuracy from a 14.0 to 47.0 percent single-agent RL baseline to 96.0 to 99.5 percent. It also improves reasoning performance, with average gains of 3.87 to 7.62 percent on coding tasks and 9.0 to 17.93 percent on math. Code and environments are available at: https://github.com/pettingllms-ai/PettingLLMs.
Improved High-probability Convergence Guarantees of Decentralized SGD
Convergence in high-probability (HP) has been receiving increasing interest, due to its attractive properties, such as exponentially decaying tail bounds and strong guarantees for each individual run of an algorithm. While HP guarantees are extensively studied in centralized settings, much less is understood in the decentralized, networked setup. Existing HP studies in decentralized settings impose strong assumptions, like uniformly bounded gradients, or asymptotically vanishing noise, resulting in a significant gap between assumptions used to establish convergence in the HP and the mean-squared error (MSE) sense, even for vanilla Decentralized Stochastic Gradient Descent ($\mathtt{DSGD}$) algorithm. This is contrary to centralized settings, where it is known that $\mathtt{SGD}$ converges in HP under the same conditions on the cost function as needed to guarantee MSE convergence. Motivated by this observation, we revisit HP guarantees for $\mathtt{DSGD}$ in the presence of light-tailed noise. We show that $\mathtt{DSGD}$ converges in HP under the same conditions on the cost as in the MSE sense, removing uniformly bounded gradients and other restrictive assumptions, while simultaneously achieving order-optimal rates for both non-convex and strongly convex costs. Moreover, our improved analysis yields linear speed-up in the number of users, demonstrating that $\mathtt{DSGD}$ maintains strong performance in the HP sense and matches existing MSE guarantees. Our improved results stem from a careful analysis of the MGF of quantities of interest (norm-squared of gradient or optimality gap) and the MGF of the consensus gap between users' models. To achieve linear speed-up, we provide a novel result on the variance-reduction effect of decentralized methods in the HP sense and more fine-grained bounds on the MGF for strongly convex costs, which are both of independent interest.
comment: 48 pages, 2 figures
Systems and Control (EESS)
Multi-Modal Semantic Communication
Semantic communication aims to transmit information most relevant to a task rather than raw data, offering significant gains in communication efficiency for applications such as telepresence, augmented reality, and remote sensing. Recent transformer-based approaches have used self-attention maps to identify informative regions within images, but they often struggle in complex scenes with multiple objects, where self-attention lacks explicit task guidance. To address this, we propose a novel Multi-Modal Semantic Communication framework that integrates text-based user queries to guide the information extraction process. Our proposed system employs a cross-modal attention mechanism that fuses visual features with language embeddings to produce soft relevance scores over the visual data. Based on these scores and the instantaneous channel bandwidth, we use an algorithm to transmit image patches at adaptive resolutions using independently trained encoder-decoder pairs, with total bitrate matching the channel capacity. At the receiver, the patches are reconstructed and combined to preserve task-critical information. This flexible and goal-driven design enables efficient semantic communication in complex and bandwidth-constrained environments.
Service-Oriented Fast Frequency Response from Flexible Loads and Energy Storage in Low-Inertia Power Systems
The increasing penetration of inverter-based renewable generation has significantly reduced system inertia, making modern power grids more vulnerable to rapid frequency deviations following disturbances. While a wide range of flexible resources-including electric vehicles (EVs), data centers, and battery energy storage systems (BESS)-have demonstrated the physical capability to provide fast frequency response (FFR), existing studies primarily focus on individual resource performance or controller-level designs. A systematic framework that translates heterogeneous FFR capabilities into deployable, system-level frequency services remains largely unexplored. This paper proposes a service-oriented coordination framework for fast frequency response from flexible loads and energy storage, bridging the gap between physical capability assessment and grid-operational utilization. The framework decomposes frequency support into multiple time-critical service layers based on response speed, power capacity, and energy sustainability, and dynamically allocates FFR responsibilities among heterogeneous resources accordingly. By explicitly accounting for response latency, saturation limits, and energy constraints, the proposed approach enables coordinated dispatch that prioritizes ultra-fast resources for initial frequency arrest while leveraging slower but energy-rich resources to sustain recovery.
Enhancing industrial microalgae production through Economic Model Predictive Control
The industrial production of microalgae is an important and sustainable process, but its actual competitiveness is closely related to its optimization. The biological nature of the process hinders this task, mainly due to the high nonlinearity of the process along with its changing nature, features that make its modeling, control and optimization remarkably challenging. This paper presents an economic optimization framework aiming to enhance the operation of such systems. An Economic Model Predictive Controller is proposed, centralizing the decision making and achieving the theoretical optimal operation. Different scenarios with changing climate conditions are presented, and a comparison with the typical, non-optimized industrial process operation is established. The obtained results achieve economic optimization and dynamic stability of the process, while providing some insight into the priorities during process operation at industrial level, and justifying the use of optimal controllers over traditional operation.
Operator-Theoretic Joint Estimation of Aging-Aware State of Charge and Control-Informed State of Health
Accurate estimation of a battery's state of charge and state of health is essential for safe and reliable battery management. Existing approaches often decouple these two states, lack stability guarantees, and exhibit limited generalization across operating conditions. This study introduces a unified operator-theoretic framework for aging-aware state of charge and control-informed state of health estimation. The architecture couples a Koopman-based latent dynamics model, which enables linear forecasting of nonlinear discharge-capacity evolution under varying operational conditions, with a neural operator that maps measurable intra-cycle signals to state of charge. The predicted discharge capacity is incorporated as a static correction within the neural operator pathway, yielding an age-aware state of charge estimate. Stability is ensured through spectral-radius clipping of the Koopman operator. The overall framework is trained end-to-end and evaluated on real-world lithium-ion battery datasets, demonstrating real-time capability while maintaining stable dynamics. To handle condition shifts and unseen regimes, the method integrates both zero-shot and few-shot out-of-distribution adaptation using only a limited number of cycles. Results show accurate and stable capacity forecasts, competitive state of charge trajectories on held-out cycles, and a direct, model-consistent mechanism for tracking capacity fade as a surrogate for state of health across diverse operating conditions.
comment: This paper has been submitted to IEEE for publication
Scheduling the Charge of Temporally Flexible Electric Vehicles: a Market-based Approach
The increasing electrification of human activities and the rapid integration of variable renewable energy sources strain the power grid. A solution to address the need for more grid storage is to use the battery of electric vehicles as a back-up capacity. However, drivers tend to disconnect their electric vehicle when its battery is needed the most. We propose a charge scheduler that incentivizes drivers to delay their disconnection to improve vehicle-to-grid services. We also leverage drivers' temporal flexibility to alleviate congestion in oversubscribed charging stations. We formulate the computation of an optimal flexible schedule as a mixed-integer quadratic problem. We tractably approximate its solution using the Alternating Direction Method of Multipliers. Considering the possibility that strategic drivers misreport their charging preferences to the station coordinator, we then propose a Vickrey-Clarke-Groves mechanism that incentivizes truthful reporting. We conclude with a simulated case study using real-world data to quantitatively assess the added value of drivers' temporal flexibility for enhancing vehicle-to-grid services and reducing station congestion.
Exact Learning of Linear Model Predictive Control Laws using Oblique Decision Trees with Linear Predictions
Model Predictive Control (MPC) is a powerful strategy for constrained multivariable systems but faces computational challenges in real-time deployment due to its online optimization requirements. While explicit MPC and neural network approximations mitigate this burden, they suffer from scalability issues or lack interpretability, limiting their applicability in safety-critical systems. This work introduces a data-driven framework that directly learns the Linear MPC control law from sampled state-action pairs using Oblique Decision Trees with Linear Predictions (ODT-LP), achieving both computational efficiency and interpretability. By leveraging the piecewise affine structure of Linear MPC, we prove that the Linear MPC control law can be replicated by finite-depth ODT-LP models. We develop a gradient-based training algorithm using smooth approximations of tree routing functions to learn this structure from grid-sampled Linear MPC solutions, enabling end-to-end optimization. Input-to-state stability is established under bounded approximation errors, with explicit error decomposition into learning inaccuracies and sampling errors to inform model design. Numerical experiments demonstrate that ODT-LP controllers match MPC's closed-loop performance while reducing online evaluation time by orders of magnitude compared to MPC, explicit MPC, neural network, and random forest counterparts. The transparent tree structure enables formal verification of control logic, bridging the gap between computational efficiency and certifiable reliability for safety-critical systems.
comment: 6 pages, 4 figures, accepted by and presented at the 64th IEEE Conference on Decision and Control (CDC) in December 2025
Ising Machines for Model Predictive Path Integral-Based Optimal Control
We present a sampling-based Model Predictive Control (MPC) method that implements Model Predictive Path Integral (MPPI) as an \emph{Ising machine}, suitable for novel forms of probabilistic computing. By expressing the control problem as a Quadratic Unconstrained Binary Optimization (QUBO) problem, we map MPC onto an energy landscape suitable for Gibbs sampling from an Ising model. This formulation enables efficient exploration of (near-)optimal control trajectories. We demonstrate that the approach achieves accurate trajectory tracking compared to a reference MPPI implementation, highlighting the potential of Ising-based MPPI for real-time control in robotics and autonomous systems.
QuantGraph: A Receding-Horizon Quantum Graph Solver
Dynamic programming is a cornerstone of graph-based optimization. While effective, it scales unfavorably with problem size. In this work, we present QuantGraph, a two-stage quantum-enhanced framework that casts local and global graph-optimization problems as quantum searches over discrete trajectory spaces. The solver is designed to operate efficiently by first finding a sequence of locally optimal transitions in the graph (local stage), without considering full trajectories. The accumulated cost of these transitions acts as a threshold that prunes the search space (up to 60% reduction for certain examples). The subsequent global stage, based on this threshold, refines the solution. Both stages utilize variants of the Grover-adaptive-search algorithm. To achieve scalability and robustness, we draw on principles from control theory and embed QuantGraph's global stage within a receding-horizon model-predictive-control scheme. This classical layer stabilizes and guides the quantum search, improving precision and reducing computational burden. In practice, the resulting closed-loop system exhibits robust behavior and lower overall complexity. Notably, for a fixed query budget, QuantGraph attains a 2x increase in control-discretization precision while still benefiting from Grover-search's inherent quadratic speedup compared to classical methods.
comment: P.Vaidhyanathan and A. Papatheodorou contributed equally to this work. 11 pages, 4 figures, 1 table, 2 algorithms
A Container-based Approach For Proactive Asset Administration Shell Digital Twins
In manufacturing, digital twins, realized as Asset Administration Shells (AAS), have emerged as a prevalent practice. These digital replicas, often utilized as structured repositories of asset-related data, facilitate interoperability across diverse systems. However, extant approaches treat the AAS as a static information model, lacking support for dynamic service integration and system adaptation. The existing body of literature has not yet thoroughly explored the potential for integrating executable behavior, particularly in the form of containerized services, into or from the AAS. This integration could serve to enable proactive functionality. In this paper, we propose a submodel-based architecture that introduces a structured service notion to the AAS, enabling services to dynamically interact with and adapt AAS instances at runtime. This concept is implemented through the extension of a submodel with behavioral definitions, resulting in a modular event-driven architecture capable of deploying containerized services based on embedded trigger conditions. The approach is illustrated through a case study on a 3-axis milling machine. Our contribution enables the AAS to serve not only as a passive digital representation but also as an active interface for executing added-value services.%, thereby laying the foundation for future AI-driven adaptation and system-level intelligence in digital twin environments.
Lithium-ion battery degradation: Introducing the concept of reservoirs to design for lifetime
Designing lithium-ion batteries for long service life remains a challenge, as most cells are optimized for beginning-of-life metrics such as energy density, often overlooking how design and operating conditions shape degradation. This work introduces a degradation-aware design framework built around finite, interacting reservoirs (lithium, porosity, and electrolyte) that are depleted over time by coupled degradation processes. We extend a physics-based Doyle-Fuller-Newman model to include validated mechanisms such as SEI growth, lithium plating, cracking, and solvent dry-out, and simulate how small design changes impact lifetime. Across more than 1,000 cycles, we find that increasing electrolyte volume by just 1% or porosity by 5% can extend service life by over 30% without significantly affecting cell energy density. However, lithium excess, while boosting initial capacity, can accelerate failure if not supported by sufficient structural or ionic buffers. Importantly, we show that interaction between reservoirs is crucial to optimal design: multi-reservoir tuning yields either synergistic benefits or compound failures, depending on operating conditions. We also quantify how C-rate and operating temperature influence degradation pathways, emphasizing the need for co-optimized design and usage profiles. By reframing degradation as a problem of managing finite internal reservoirs, this work offers a predictive and mechanistic foundation for designing lithium-ion batteries that balance energy, durability, and application-specific needs.
comment: 20 Pages, 5 figures
Remotely Detectable Robot Policy Watermarking
The success of machine learning for real-world robotic systems has created a new form of intellectual property: the trained policy. This raises a critical need for novel methods that verify ownership and detect unauthorized, possibly unsafe misuse. While watermarking is established in other domains, physical policies present a unique challenge: remote detection. Existing methods assume access to the robot's internal state, but auditors are often limited to external observations (e.g., video footage). This ``Physical Observation Gap'' means the watermark must be detected from signals that are noisy, asynchronous, and filtered by unknown system dynamics. We formalize this challenge using the concept of a \textit{glimpse sequence}, and introduce Colored Noise Coherency (CoNoCo), the first watermarking strategy designed for remote detection. CoNoCo embeds a spectral signal into the robot's motions by leveraging the policy's inherent stochasticity. To show it does not degrade performance, we prove CoNoCo preserves the marginal action distribution. Our experiments demonstrate strong, robust detection across various remote modalities, including motion capture and side-way/top-down video footage, in both simulated and real-world robot experiments. This work provides a necessary step toward protecting intellectual property in robotics, offering the first method for validating the provenance of physical policies non-invasively, using purely remote observations.
Consensus-based formation of a swarm of quadrotors interacting over ring digraphs
This work proposes a cooperative strategy for a group of quadrotors interacting over ring digraphs with macro-vertices of size two. Consensus for a group of general double integrators has been initially investigated, and it has been proved that through a suitable choice of a single controller parameter, consensus and stability of the resulting networked dynamical system can be ensured. This further opens up the possibility of achieving a desired formation and to move a swarm of quadrotors, interacting over ring digraphs, at a desired flight velocity, using a single controller gain. An analysis of achievable velocities is performed. Examples have been provided to offer deeper insights into the obtained analytical results. Simulation studies clearly demonstrate that a desired formation is achieved, starting from arbitrary initial positions, while also ensuring convergence to a final desired flight velocity.
Remote Magnetic Levitation Using Reduced Attitude Control and Parametric Field Models
Electromagnetic navigation systems (eMNS) are increasingly used in minimally invasive procedures such as endovascular interventions and targeted drug delivery due to their ability to generate fast and precise magnetic fields. In this paper, we utilize the OctoMag eMNS to achieve remote levitation and control of a rigid body across large air gaps which showcases the dynamic capabilities of clinical eMNS. A compact parametric analytical model maps coil currents to the forces and torques acting on the levitating object, eliminating the need for computationally expensive simulations or lookup tables and leading to a levitator agnostic modeling approach. Translational motion is stabilized using linear quadratic regulators. A nonlinear time-invariant controller is used to regulate the reduced attitude accounting for the inherent uncontrollability of rotations about the dipole axis and stabilizing the full five degrees of freedom controllable pose subspace. We analyze key design limitations and evaluate the approach through trajectory tracking experiments. This work demonstrates the dynamic capabilities and potential of feedback control in electromagnetic navigation, which is likely to open up new medical applications.
Historical Information Accelerates Decentralized Optimization: A Proximal Bundle Method
Historical information, such as past function values or gradients, has significant potential to enhance decentralized optimization methods for two key reasons: first, it provides richer information about the objective function, which also explains its established success in centralized optimization; second, unlike the second-order derivative or its alternatives, historical information has already been computed or communicated and requires no additional cost to acquire. Despite this potential, it remains underexploited. In this work, we employ a proximal bundle framework to incorporate the function values and gradients at historical iterates and adapt the framework to the proximal decentralized gradient descent method, resulting in a Decentralized Proximal Bundle Method (DPBM). To broaden its applicability, we further extend DPBM to the asynchronous and stochastic setting. We theoretically analysed the convergence of the proposed methods. Notably, both the asynchronous DPBM and its stochastic variant can converge with fixed step-sizes that are independent of delays, which is superior to the delay-dependent step-sizes required by most existing asynchronous optimization methods, as it is easier to determine and often leads to faster convergence. Numerical experiments on classification problems demonstrate that by using historical information, our methods yield faster convergence and stronger robustness in the step-sizes.
Intertemporal Hedging Demand under Epstein-Zin Preferences in a Multi-Asset Long-Run Risk Model: Evidence from Projected Pontryagin-Guided Deep Policy Optimization
I study intertemporal hedging demand in a continuous-time multi-asset long-run risk (LRR) model under Epstein--Zin (EZ) recursive preferences. The investor trades a risk-free asset and several risky assets whose drifts and volatilities depend on an Ornstein--Uhlenbeck type LRR factor. Preferences are described by EZ utility with risk aversion $R$, elasticity of intertemporal substitution $ψ$, and discount rate $δ$, so that the standard time-additive CRRA case appears as a limiting benchmark. To handle the high-dimensional consumption--investment problem, I use a projected Pontryagin-guided deep policy optimization (P-PGDPO) scheme adapted to EZ preferences. The method starts from the continuous-time Hamiltonian implied by the Pontryagin maximum principle, represents the value and costate processes with neural networks, and updates the policy along the Hamiltonian gradient. Portfolio constraints and a lower bound on wealth are enforced by explicit projection operators rather than by adding ad hoc penalties. Three main findings emerge from numerical experiments in a five-asset LRR economy: \textbf{(1)} the P-PGDPO algorithm achieves stable convergence across multiple random seeds, validating its reliability for solving high-dimensional EZ problems; \textbf{(2)} wealth floors materially reduce hedging demand by limiting the investor's ability to exploit intertemporal risk-return tradeoffs; and \textbf{(3)} the learned hedging portfolios concentrate exposure in assets with high correlation to the LRR factor, confirming that EZ agents actively hedge long-run uncertainty rather than merely following myopic rules. Because EZ preferences nest time-additive CRRA in the limit $ψ\to 1/R$, I use CRRA as an explicit diagnostic benchmark and, when needed, a warm start to stabilize training in high dimensions.
EMFusion: Conditional Diffusion Framework for Trustworthy Frequency Selective EMF Forecasting in Wireless Networks
The rapid growth in wireless infrastructure has increased the need to accurately estimate and forecast electromagnetic field (EMF) levels to ensure ongoing compliance, assess potential health impacts, and support efficient network planning. While existing studies rely on univariate forecasting of wideband aggregate EMF data, frequency-selective multivariate forecasting is needed to capture the inter-operator and inter-frequency variations essential for proactive network planning. To this end, this paper introduces EMFusion, a conditional multivariate diffusion-based probabilistic forecasting framework that integrates diverse contextual factors (e.g., time of day, season, and holidays) while providing explicit uncertainty estimates. The proposed architecture features a residual U-Net backbone enhanced by a cross-attention mechanism that dynamically integrates external conditions to guide the generation process. Furthermore, EMFusion integrates an imputation-based sampling strategy that treats forecasting as a structural inpainting task, ensuring temporal coherence even with irregular measurements. Unlike standard point forecasters, EMFusion generates calibrated probabilistic prediction intervals directly from the learned conditional distribution, providing explicit uncertainty quantification essential for trustworthy decision-making. Numerical experiments conducted on frequency-selective EMF datasets demonstrate that EMFusion with the contextual information of working hours outperforms the baseline models with or without conditions. The EMFusion outperforms the best baseline by 23.85% in continuous ranked probability score (CRPS), 13.93% in normalized root mean square error, and reduces prediction CRPS error by 22.47%.
comment: Submission for possible publication
Lyapunov-based Adaptive Transformer (LyAT) for Control of Stochastic Nonlinear Systems
This paper presents a novel Lyapunov-based Adaptive Transformer (LyAT) controller for stochastic nonlinear systems. While transformers have shown promise in various control applications due to sequential modeling through self-attention mechanisms, they have not been used within adaptive control architectures that provide stability guarantees. Existing transformer-based approaches for control rely on offline training with fixed weights, resulting in open-loop implementations that lack real-time adaptation capabilities and stability assurances. To address these limitations, a continuous LyAT controller is developed that adaptively estimates drift and diffusion uncertainties in stochastic dynamical systems without requiring offline pre-training. A key innovation is the analytically derived adaptation law constructed from a Lyapunov-based stability analysis, which enables real-time weight updates while guaranteeing probabilistic uniform ultimate boundedness of tracking and parameter estimation errors. Experimental validation on a quadrotor demonstrates the performance of the developed controller.
A Comprehensive Benchmark Platform for Process Control Research of Outdoor Microalgae Raceway Reactors
This paper presents a benchmarking framework to evaluate process control strategies in outdoor microalgae raceway reactors, integrating four key control regulation tasks: pH, dissolved oxygen (DO), culture volume through coordinated harvest-dilution actions, and temperature via a sump-mounted spiral heat exchanger. The benchmark is built upon a high-fidelity, experimentally calibrated dynamic model that captures the strongly coupled thermal, physicochemical, and biological processes governing industrial-scale open raceway ponds. A closed-loop simulation environment is provided, featuring realistic actuator constraints, gas transport delays, stiff integration, and a fully specified scenario based on multi-day outdoor disturbances (irradiance, temperature, wind, and humidity). Four user-replaceable controllers define the manipulation of CO2 injection, air bubbling, harvest/dilution sequencing, and heat-exchanger operation. The platform computes a unified global performance index, in addition to individual metrics for each control problem, combining tracking error, gas and energy usage, and biomass productivity, enabling consistent and quantitative comparison of alternative control strategies. Baseline regulatory architectures (On/Off, PI/PID, and Economic Model Predictive Control (EMPC)) are included to illustrate the benchmark use for classical and advanced control methods. By providing an openly specified, reproducible, and computationally tractable benchmark with well-defined function interfaces, this work aims to bridge control methodology and outdoor algal bioprocess engineering, and to support the development of multivariable control strategies for disturbance-rich environmental systems.
A Fair, Flexible, Zero-Waste Digital Electricity Market: A First-Principles Approach Combining Automatic Market Making, Holarchic Architectures and Shapley Theory
This thesis presents a fundamental rethink of electricity market design at the wholesale and balancing layers. Rather than treating markets as static spot clearing mechanisms, it reframes them as a continuously online, event driven dynamical control system: a two sided marketplace operating directly on grid physics. Existing energy only, capacity augmented, and zonal market designs are shown to admit no shock robust Nash equilibrium under realistic uncertainty, instead relying on price caps, uplift, and regulatory intervention to preserve solvency and security. In response, the thesis develops a holarchic Automatic Market Maker (AMM) in which prices are bounded, exogenous control signals derived from physical tightness rather than emergent equilibrium outcomes. The AMM generalises nodal and zonal pricing through nested scarcity layers, from node to cluster to zone to region to system, such that participant facing prices inherit from the tightest binding constraint. Nodal and zonal pricing therefore emerge as special cases of a unified scarcity propagation rule. Beyond pricing, the AMM functions as a scarcity aware control system and a digitally enforceable rulebook for fair access and proportional allocation under shortage. Fuel costs are recovered through pay as bid energy dispatch consistent with merit order, while non fuel operating and capital costs are allocated according to adequacy, flexibility, and locational contribution. Large scale simulations demonstrate bounded input bounded output stability, controllable procurement costs, zero structural waste, and improved distributional outcomes. The architecture is climate aligned and policy configurable, but requires a managed transition and new operational tools for system operators and market participants.
comment: PhD thesis
Safety with Agency: Human-Centered Safety Filter with Application to AI-Assisted Motorsports
We propose a human-centered safety filter (HCSF) for shared autonomy that significantly enhances system safety without compromising human agency. Our HCSF is built on a neural safety value function, which we first learn scalably through black-box interactions and then use at deployment to enforce a novel state-action control barrier function (Q-CBF) safety constraint. Since this Q-CBF safety filter does not require any knowledge of the system dynamics for both synthesis and runtime safety monitoring and intervention, our method applies readily to complex, black-box shared autonomy systems. Notably, our HCSF's CBF-based interventions modify the human's actions minimally and smoothly, avoiding the abrupt, last-moment corrections delivered by many conventional safety filters. We validate our approach in a comprehensive in-person user study using Assetto Corsa-a high-fidelity car racing simulator with black-box dynamics-to assess robustness in "driving on the edge" scenarios. We compare both trajectory data and drivers' perceptions of our HCSF assistance against unassisted driving and a conventional safety filter. Experimental results show that 1) compared to having no assistance, our HCSF improves both safety and user satisfaction without compromising human agency or comfort, and 2) relative to a conventional safety filter, our proposed HCSF boosts human agency, comfort, and satisfaction while maintaining robustness.
comment: Accepted to Robotics: Science and Systems (R:SS) 2025, 22 pages, 16 figures, 7 tables Updates for v4: typos in Appendix Subsection A revised
Embodied Co-Design for Rapidly Evolving Agents: Taxonomy, Frontiers, and Challenges
Brain-body co-evolution enables animals to develop complex behaviors in their environments. Inspired by this biological synergy, embodied co-design (ECD) has emerged as a transformative paradigm for creating intelligent agents-from virtual creatures to physical robots-by jointly optimizing their morphologies and controllers rather than treating control in isolation. This integrated approach facilitates richer environmental interactions and robust task performance. In this survey, we provide a systematic overview of recent advances in ECD. We first formalize the concept of ECD and position it within related fields. We then introduce a hierarchical taxonomy: a lower layer that breaks down agent design into three fundamental components-controlling brain, body morphology, and task environment-and an upper layer that integrates these components into four major ECD frameworks: bi-level, single-level, generative, and open-ended. This taxonomy allows us to synthesize insights from more than one hundred recent studies. We further review notable benchmarks, datasets, and applications in both simulated and real-world scenarios. Finally, we identify significant challenges and offer insights into promising future research directions. A project associated with this survey has been created at https://github.com/Yuxing-Wang-THU/SurveyBrainBody.
Simultaneous and Proportional Finger Motion Decoding Using Spatial Features from High-Density Surface Electromyography
Restoring natural and intuitive hand function requires simultaneous and proportional control (SPC) of multiple degrees of freedom (DoFs). This study systematically evaluated the multichannel linear descriptors-based block field method (MLD-BFM) for continuous decoding of five finger-joint DoFs by leveraging the rich spatial information of high-density surface electromyography (HD sEMG). Twenty-one healthy participants performed dynamic sinusoidal finger movements while HD sEMG signals were recorded from the extensor digitorum communis (EDC) and flexor digitorum superficialis (FDS) muscles. MLD-BFM extracted region-specific spatial features, including effective field strength ($Σ$), field-strength variation rate ($Φ$), and spatial complexity ($Ω$). Model performance was optimized (block size: $2 \times 2$; window: 0.15 s) and compared with conventional time-domain features and dimensionality reduction approaches when applied to multi-output regression models. MLD-BFM consistently achieved the highest $\mathrm{R}^2_{\mathrm{vw}}$ values across all models. The multilayer perceptron (MLP) combined with MLD-BFM yielded the best performance ($\mathrm{R}^2_{\mathrm{vw}} = 86.68\% \pm 0.33$). Time-domain features also showed strong predictive capability and were statistically comparable to MLD-BFM in some models, whereas dimensionality reduction techniques exhibited lower accuracy. Decoding accuracy was higher for the middle and ring fingers than for the thumb. Overall, MLD-BFM improved continuous finger movement decoding accuracy, underscoring the importance of taking advantage of the spatial richness of HD sEMG. These findings suggest that spatially structured features enhance SPC and provide practical guidance for designing robust, real-time, and responsive myoelectric interfaces.
comment: 39 pages, 13 figures, 2 tables
Online convex optimization for robust control of constrained dynamical systems
This article investigates the problem of controlling linear time-invariant systems subject to time-varying and a priori unknown cost functions, state and input constraints, and exogenous disturbances. We combine the online convex optimization framework with tools from robust model predictive control to propose an algorithm that is able to guarantee robust constraint satisfaction. The performance of the closed loop emerging from application of our framework is studied in terms of its dynamic regret, which is proven to be bounded linearly by the variation of the cost functions and the magnitude of the disturbances. We corroborate our theoretical findings and illustrate implementational aspects of the proposed algorithm by a numerical case study on a tracking control problem of an autonomous vehicle.
comment: 16 pages
Missing Money and Market-Based Adequacy in Deeply Decarbonized Power Systems with Long-Duration Energy Storage
The ability of deeply decarbonised power systems to ensure adequacy may increasingly depend on long-duration energy storage (LDES). A central challenge is whether capacity markets (CMs), originally designed around thermal generation, can provide efficient investment signals when storage becomes a central participant. While recent studies have advanced methods for accrediting variable renewables and short-duration storage, the effectiveness of these methods in CMs with substantial LDES penetration remains largely unexplored. To address this gap, we extend a two-stage stochastic equilibrium investment model by endogenising continuous, duration-based capacity accreditation for storage and apply it to a Great Britain-based case using 40 years of weather-driven demand and renewable profiles under varying emission limits. Results show that well-calibrated CMs can sustain near-efficient investment and mitigate revenue volatility, but their effectiveness diminishes in deeply decarbonized systems, underscoring both their potential and the regulatory challenges of supporting large-scale LDES.
Accelerated Decentralized Constraint-Coupled Optimization: A Dual$^2$ Approach
In this paper, we focus on a class of decentralized constraint-coupled optimization problem: $\min_{x_i \in \mathbb{R}^{d_i}, i \in \mathcal{I}; y \in \mathbb{R}^p}$ $\sum_{i=1}^n\left(f_i(x_i) + g_i(x_i)\right) + h(y) \ \text{s.t.} \ \sum_{i=1}^{n}A_ix_i = y$, over an undirected and connected network of $n$ agents. Here, $f_i$, $g_i$, and $A_i$ represent private information of agent $i \in \mathcal{I} = \{1, \cdots, n\}$, while $h$ is public for all agents. Building on a novel dual$^2$ approach, we develop two accelerated algorithms to solve this problem: the inexact Dual$^2$ Accelerated (iD2A) gradient method and the Multi-consensus inexact Dual$^2$ Accelerated (MiD2A) gradient method. We demonstrate that both iD2A and MiD2A can guarantee asymptotic convergence under a milder condition on $h$ compared to existing algorithms. Furthermore, under additional assumptions, we establish linear convergence rates and derive significantly lower communication and computational complexity bounds than those of existing algorithms. Several numerical experiments validate our theoretical analysis and demonstrate the practical superiority of the proposed algorithms.
Differentially Private Gradient-Tracking-Based Distributed Stochastic Optimization over Directed Graphs
This paper proposes a differentially private gradient-tracking-based distributed stochastic optimization algorithm over directed graphs. In particular, privacy noises are incorporated into each agent's state and tracking variable to mitigate information leakage, after which the perturbed states and tracking variables are transmitted to neighbors. We design two novel schemes for the step-sizes and the sampling number within the algorithm. The sampling parameter-controlled subsampling method employed by both schemes enhances the differential privacy level, and ensures a finite cumulative privacy budget even over infinite iterations. The algorithm achieves both almost sure and mean square convergence for nonconvex objectives. Furthermore, when nonconvex objectives satisfy the Polyak-Lojasiewicz (PL) condition, Scheme (S1) achieves a polynomial mean square convergence rate, and Scheme (S2) achieves an exponential mean square convergence rate. The trade-off between privacy and convergence is presented. The effectiveness of the algorithm and its superior performance compared to existing works are illustrated through numerical examples of distributed training on the benchmark datasets "MNIST" and "CIFAR-10".
Observation Compression in Rate-Limited Closed-Loop Distributed ISAC Systems: From Signal Reconstruction to Control
In closed-loop distributed multi-sensor integrated sensing and communication (ISAC) systems, performance often hinges on transmitting high-dimensional sensor observations over rate-limited networks. In this paper, we first present a general framework for rate-limited closed-loop distributed ISAC systems, and then propose an autoencoder-based observation compression method to overcome the constraints imposed by limited transmission capacity. Building on this framework, we conduct a case study using a closed-loop linear quadratic regulator (LQR) system to analyze how the interplay among observation, compression, and state dimensions affects reconstruction accuracy, state estimation error, and control performance. In multi-sensor scenarios, our results further show that optimal resource allocation initially prioritizes low-noise sensors until the compression becomes lossless, after which resources are reallocated to high-noise sensors.
comment: 6 pages, 15 figures. This work has been accepted by Globecom Workshop 2025
Large Language Model-Based Intelligent Antenna Design System
Antenna simulation typically involves modeling and optimization, which are time-consuming and labor-intensive, slowing down antenna analysis and design. This paper presents a prototype of a large language model (LLM)-based antenna design system (LADS) to assist in antenna simulation. LADS generates antenna models with textual descriptions and images extracted from academic papers, patents, and technical reports (either one or multiple), and it interacts with engineers to iteratively refine the designs. After that, LADS configures and runs an optimizer to meet the design specifications. The effectiveness of LADS is demonstrated by a monopole slotted antenna generated from images and descriptions from the literature. To improve gain stability across the 3.1-10.6 GHz ultra-wide band, LADS modifies the cross-slot into an H-slot and changes substrate material, followed by parameter optimization. As a result, the gain variation is reduced while maintaining the same gain level. The LLM-enabled antenna modeling (LEAM) is available at: https://github.com/TaoWu974/LEAM.
comment: Code are available: https://github.com/TaoWu974/LEAM. Accepted by and will be presented in EuCAP 2026, Dublin
AI-Assisted Game Management Decisions: A Fuzzy Logic Approach to Real-Time Soccer Substitutions
In elite soccer, substitution decisions entail significant financial and sporting consequences yet remain heavily reliant on intuition or predictive models that merely mimic historical biases. This paper introduces a Fuzzy Logic based Decision Support System (DSS) designed for real time, prescriptive game management. Unlike traditional Machine Learning approaches that encounter a predictive ceiling by attempting to replicate human behavior, our system audits performance through an objective, rule based inference engine. We propose a methodological advancement by reformulating the PlayeRank metric into a Cumulative Mean with Role Aware Normalization, eliminating the play time exposure bias inherent in cumulative sum models to enable accurate intra match comparison. The system integrates this refined metric with physiological proxies (fatigue) and contextual variables (disciplinary risk modulated by tactical role) to calculate a dynamic Substitution Priority (P final). Validation via a case study of the 2018 FIFA World Cup match between Brazil and Belgium demonstrates the system's ecological validity: it not only aligned with expert consensus on executed substitutions (for example Gabriel Jesus) but, crucially, identified high risk scenarios ignored by human decision makers. Specifically, the model flagged the "FAGNER Paradox" - a maximum priority defensive risk - minutes before a critical yellow card, and detected the "Lukaku Paradox", where an isolated assist masked a severe drop in participation. These results confirm that Fuzzy Logic offers a transparent, explainable, and superior alternative to black box models for optimizing real time tactical decisions.
comment: This version contains a methodological inconsistency in the performance evaluation formulation described in Section 3 (Methodology). Specifically, the definition of the temporal aggregation of player performance was incomplete, leading to an ambiguity between cumulative and normalized metrics in the intra
Improving Action Smoothness for a Cascaded Online Learning Flight Control System
This paper aims to improve the action smoothness of a cascaded online learning flight control system. Although the cascaded structure is widely used in flight control design, its stability can be compromised by oscillatory control actions, which poses challenges for practical engineering applications. To address this issue, we introduce an online temporal smoothness technique and a low-pass filter to reduce the amplitude and frequency of the control actions. Fast Fourier Transform (FFT) is used to analyze policy performance in the frequency domain. Simulation results demonstrate the improvements achieved by the two proposed techniques.
Exploring the Influence of Residential Electric Vehicle Charging on Distribution System Hosting Capacity -- A Case-Study in Arizona
The installation of high-capacity fast chargers for electric vehicles (EVs) is posing a significant risk to the distribution grid as the increased demand from widespread residential EV charging could exceed the technical limits of the distribution system. Addressing this issue is critical, given that current infrastructure upgrades to enhance EV hosting capacity are both costly and time-consuming. Moreover, the inherent uncertainties associated with EV charging parameters make it challenging for power utilities to accurately assess the impact of EVs added to specific locations. To address these knowledge gaps, this study (a) introduces an algorithm to coordinate residential EV charging, and (b) proposes a comprehensive framework that evaluates all transformers within a feeder. The proposed method is applied to a real-world feeder, which includes 120 transformers of varying capacities. The results demonstrate that this approach effectively manages a substantial number of EVs without overloading any of the transformers, while also pinpointing locations that must be prioritized for future upgrades. This framework can serve as a valuable reference for utilities when conducting distribution system evaluations for supporting the growing EV penetration.
Robotics
CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives SP
We introduce CRISP, a method that recovers simulatable human motion and scene geometry from monocular video. Prior work on joint human-scene reconstruction relies on data-driven priors and joint optimization with no physics in the loop, or recovers noisy geometry with artifacts that cause motion tracking policies with scene interactions to fail. In contrast, our key insight is to recover convex, clean, and simulation-ready geometry by fitting planar primitives to a point cloud reconstruction of the scene, via a simple clustering pipeline over depth, normals, and flow. To reconstruct scene geometry that might be occluded during interactions, we make use of human-scene contact modeling (e.g., we use human posture to reconstruct the occluded seat of a chair). Finally, we ensure that human and scene reconstructions are physically-plausible by using them to drive a humanoid controller via reinforcement learning. Our approach reduces motion tracking failure rates from 55.2\% to 6.9\% on human-centric video benchmarks (EMDB, PROX), while delivering a 43\% faster RL simulation throughput. We further validate it on in-the-wild videos including casually-captured videos, Internet videos, and even Sora-generated videos. This demonstrates CRISP's ability to generate physically-valid human motion and interaction environments at scale, greatly advancing real-to-sim applications for robotics and AR/VR.
comment: Project page: https://crisp-real2sim.github.io/CRISP-Real2Sim/
CHIP: Adaptive Compliance for Humanoid Control through Hindsight Perturbation
Recent progress in humanoid robots has unlocked agile locomotion skills, including backflipping, running, and crawling. Yet it remains challenging for a humanoid robot to perform forceful manipulation tasks such as moving objects, wiping, and pushing a cart. We propose adaptive Compliance Humanoid control through hIsight Perturbation (CHIP), a plug-and-play module that enables controllable end-effector stiffness while preserving agile tracking of dynamic reference motions. CHIP is easy to implement and requires neither data augmentation nor additional reward tuning. We show that a generalist motion-tracking controller trained with CHIP can perform a diverse set of forceful manipulation tasks that require different end-effector compliance, such as multi-robot collaboration, wiping, box delivery, and door opening.
comment: The first two authors contributed equally. Project page: https://nvlabs.github.io/CHIP/
EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models
Achieving truly adaptive embodied intelligence requires agents that learn not just by imitating static demonstrations, but by continuously improving through environmental interaction, which is akin to how humans master skills through practice. Vision-Language-Action (VLA) models have advanced robotic manipulation by leveraging large language models, yet remain fundamentally limited by Supervised Finetuning (SFT): requiring hundreds of demonstrations per task, rigidly memorizing trajectories, and failing to adapt when deployment conditions deviate from training. We introduce EVOLVE-VLA, a test-time training framework enabling VLAs to continuously adapt through environment interaction with minimal or zero task-specific demonstrations. The key technical challenge is replacing oracle reward signals (unavailable at test time) with autonomous feedback. We address this through a learned progress estimator providing dense feedback, and critically, we design our framework to ``tame'' this inherently noisy signal via two mechanisms: (1) an accumulative progress estimation mechanism smoothing noisy point-wise estimates, and (2) a progressive horizon extension strategy enabling gradual policy evolution. EVOLVE-VLA achieves substantial gains: +8.6\% on long-horizon tasks, +22.0\% in 1-shot learning, and enables cross-task generalization -- achieving 20.8\% success on unseen tasks without task-specific demonstrations training (vs. 0\% for pure SFT). Qualitative analysis reveals emergent capabilities absent in demonstrations, including error recovery and novel strategies. This work represents a critical step toward VLAs that truly learn and adapt, moving beyond static imitation toward continuous self-improvements.
comment: 15 pages
Nonlinear System Identification Nano-drone Benchmark
We introduce a benchmark for system identification based on 75k real-world samples from the Crazyflie 2.1 Brushless nano-quadrotor, a sub-50g aerial vehicle widely adopted in robotics research. The platform presents a challenging testbed due to its multi-input, multi-output nature, open-loop instability, and nonlinear dynamics under agile maneuvers. The dataset comprises four aggressive trajectories with synchronized 4-dimensional motor inputs and 13-dimensional output measurements. To enable fair comparison of identification methods, the benchmark includes a suite of multi-horizon prediction metrics for evaluating both one-step and multi-step error propagation. In addition to the data, we provide a detailed description of the platform and experimental setup, as well as baseline models highlighting the challenge of accurate prediction under real-world noise and actuation nonlinearities. All data, scripts, and reference implementations are released as open-source at https://github.com/idsia-robotics/nanodrone-sysid-benchmark to facilitate transparent comparison of algorithms and support research on agile, miniaturized aerial robotics.
A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning
Affordance prediction, which identifies interaction regions on objects based on language instructions, is critical for embodied AI. Prevailing end-to-end models couple high-level reasoning and low-level grounding into a single monolithic pipeline and rely on training over annotated datasets, which leads to poor generalization on novel objects and unseen environments. In this paper, we move beyond this paradigm by proposing A4-Agent, a training-free agentic framework that decouples affordance prediction into a three-stage pipeline. Our framework coordinates specialized foundation models at test time: (1) a $\textbf{Dreamer}$ that employs generative models to visualize $\textit{how}$ an interaction would look; (2) a $\textbf{Thinker}$ that utilizes large vision-language models to decide $\textit{what}$ object part to interact with; and (3) a $\textbf{Spotter}$ that orchestrates vision foundation models to precisely locate $\textit{where}$ the interaction area is. By leveraging the complementary strengths of pre-trained models without any task-specific fine-tuning, our zero-shot framework significantly outperforms state-of-the-art supervised methods across multiple benchmarks and demonstrates robust generalization to real-world settings.
Geometric Parameter Optimization of a Novel 3-(PP(2-(UPS))) Redundant Parallel Mechanism based on Workspace Determination
Redundant parallel robots are normally employed in scenarios requiring good precision, high load capability, and large workspace compared to traditional parallel mechanisms. However, the elementary robotic configuration and geometric parameter optimization are still quite challenging. This paper proposes a novel 3-(PP(2-(UPS))) redundant parallel mechanism, with good generalizability first, and further investigates the kinematic optimization issue by analyzing and investigating how its key geometric parameters influence the volume, shape, boundary completeness, and orientation capabilities of its workspace. The torsional capability index TI_1 and tilting capability index TI_2 are defined to evaluate the orientation performance of the mechanism. Numerical simulation studies are completed to indicate the analysis, providing reasonable but essential references for the parameter optimization of 3-(PP(2-(UPS))) and other similar redundant parallel mechanisms.
comment: 7 pages, 5 figures
Odyssey: An Automotive Lidar-Inertial Odometry Dataset for GNSS-denied situations
The development and evaluation of Lidar-Inertial Odometry (LIO) and Simultaneous Localization and Mapping (SLAM) systems requires a precise ground truth. The Global Navigation Satellite System (GNSS) is often used as a foundation for this, but its signals can be unreliable in obstructed environments due to multi-path effects or loss-of-signal. While existing datasets compensate for the sporadic loss of GNSS signals by incorporating Inertial Measurement Unit (IMU) measurements, the commonly used Micro-Electro-Mechanical Systems (MEMS) or Fiber Optic Gyroscope (FOG)-based systems do not permit the prolonged study of GNSS-denied environments. To close this gap, we present Odyssey, a LIO dataset with a focus on GNSS-denied environments such as tunnels and parking garages as well as other underrepresented, yet ubiquitous situations such as stop-and-go-traffic, bumpy roads and wide open fields. Our ground truth is derived from a navigation-grade Inertial Navigation System (INS) equipped with a Ring Laser Gyroscope (RLG), offering exceptional bias stability characteristics compared to IMUs used in existing datasets and enabling the prolonged and accurate study of GNSS-denied environments. This makes Odyssey the first publicly available dataset featuring a RLG-based INS. Besides providing data for LIO, we also support other tasks, such as place recognition, through the threefold repetition of all trajectories as well as the integration of external mapping data by providing precise geodetic coordinates. All data, dataloader and other material is available online at https://odyssey.uni-goettingen.de/ .
comment: 9 pages, 4 figures, submitted to International Journal of Robotics Research (IJRR)
Quadratic Kalman Filter for Elliptical Extended Object Tracking based on Decoupling State Components
Extended object tracking involves estimating both the physical extent and kinematic parameters of a target object, where typically multiple measurements are observed per time step. In this article, we propose a deterministic closed-form elliptical extended object tracker, based on decoupling of the kinematics, orientation, and axis lengths. By disregarding potential correlations between these state components, fewer approximations are required for the individual estimators than for an overall joint solution. The resulting algorithm outperforms existing algorithms, reaching the accuracy of sampling-based procedures. Additionally, a batch-based variant is introduced, yielding highly efficient computation while outperforming all comparable state-of-the-art algorithms. This is validated both by a simulation study using common models from literature, as well as an extensive quantitative evaluation on real automotive radar data.
comment: 13 pages, 8 figures, submitted to IEEE Transactions on Aerospace and Electronic Systems
Synthetic Data Pipelines for Adaptive, Mission-Ready Militarized Humanoids
Omnia presents a synthetic data driven pipeline to accelerate the training, validation, and deployment readiness of militarized humanoids. The approach converts first-person spatial observations captured from point-of-view recordings, smart glasses, augmented reality headsets, and spatial browsing workflows into scalable, mission-specific synthetic datasets for humanoid autonomy. By generating large volumes of high-fidelity simulated scenarios and pairing them with automated labeling and model training, the pipeline enables rapid iteration on perception, navigation, and decision-making capabilities without the cost, risk, or time constraints of extensive field trials. The resulting datasets can be tuned quickly for new operational environments and threat conditions, supporting both baseline humanoid performance and advanced subsystems such as multimodal sensing, counter-detection survivability, and CBRNE-relevant reconnaissance behaviors. This work targets faster development cycles and improved robustness in complex, contested settings by exposing humanoid systems to broad scenario diversity early in the development process.
comment: 6 pages; xTech Humanoid white paper submission
A Comprehensive Safety Metric to Evaluate Perception in Autonomous Systems SC 2020
Complete perception of the environment and its correct interpretation is crucial for autonomous vehicles. Object perception is the main component of automotive surround sensing. Various metrics already exist for the evaluation of object perception. However, objects can be of different importance depending on their velocity, orientation, distance, size, or the potential damage that could be caused by a collision due to a missed detection. Thus, these additional parameters have to be considered for safety evaluation. We propose a new safety metric that incorporates all these parameters and returns a single easily interpretable safety assessment score for object perception. This new metric is evaluated with both real world and virtual data sets and compared to state of the art metrics.
comment: Accepted at IEEE ITSC 2020
CoLD Fusion: A Real-time Capable Spline-based Fusion Algorithm for Collective Lane Detection
Comprehensive environment perception is essential for autonomous vehicles to operate safely. It is crucial to detect both dynamic road users and static objects like traffic signs or lanes as these are required for safe motion planning. However, in many circumstances a complete perception of other objects or lanes is not achievable due to limited sensor ranges, occlusions, and curves. In scenarios where an accurate localization is not possible or for roads where no HD maps are available, an autonomous vehicle must rely solely on its perceived road information. Thus, extending local sensing capabilities through collective perception using vehicle-to-vehicle communication is a promising strategy that has not yet been explored for lane detection. Therefore, we propose a real-time capable approach for collective perception of lanes using a spline-based estimation of undetected road sections. We evaluate our proposed fusion algorithm in various situations and road types. We were able to achieve real-time capability and extend the perception range by up to 200%.
comment: Accepted at IEEE IV 2023
Fine-Tuning of Neural Network Approximate MPC without Retraining via Bayesian Optimization
Approximate model-predictive control (AMPC) aims to imitate an MPC's behavior with a neural network, removing the need to solve an expensive optimization problem at runtime. However, during deployment, the parameters of the underlying MPC must usually be fine-tuned. This often renders AMPC impractical as it requires repeatedly generating a new dataset and retraining the neural network. Recent work addresses this problem by adapting AMPC without retraining using approximated sensitivities of the MPC's optimization problem. Currently, this adaption must be done by hand, which is labor-intensive and can be unintuitive for high-dimensional systems. To solve this issue, we propose using Bayesian optimization to tune the parameters of AMPC policies based on experimental data. By combining model-based control with direct and local learning, our approach achieves superior performance to nominal AMPC on hardware, with minimal experimentation. This allows automatic and data-efficient adaptation of AMPC to new system instances and fine-tuning to cost functions that are difficult to directly implement in MPC. We demonstrate the proposed method in hardware experiments for the swing-up maneuver on an inverted cartpole and yaw control of an under-actuated balancing unicycle robot, a challenging control problem.
A Geometric Task-Space Port-Hamiltonian Formulation for Redundant Manipulators
We present a novel geometric port-Hamiltonian formulation of redundant manipulators performing a differential kinematic task $η=J(q)\dot{q}$, where $q$ is a point on the configuration manifold, $η$ is a velocity-like task space variable, and $J(q)$ is a linear map representing the task, for example the classical analytic or geometric manipulator Jacobian matrix. The proposed model emerges from a change of coordinates from canonical Hamiltonian dynamics, and splits the standard Hamiltonian momentum variable into a task-space momentum variable and a null-space momentum variable. Properties of this model and relation to Lagrangian formulations present in the literature are highlighted. Finally, we apply the proposed model in an \textit{Interconnection and Damping Assignment Passivity-Based Control} (IDA-PBC) design to stabilize and shape the impedance of a 7-DOF Emika Panda robot in simulation.
Field evaluation and optimization of a lightweight lidar-based UAV navigation system for dense boreal forest environments
The interest in the usage of uncrewed aerial vehicles (UAVs) for forest applications has increased in recent years. While above-canopy flight has reached a high level of autonomy, navigating under-canopy remains a significant challenge. The use of autonomous UAVs could reduce the burden of data collection, which has motivated the development of numerous solutions for under-canopy autonomous flight. However, the experiments conducted in the literature and their reporting lack rigor. Very rarely, the density and the difficulty of the test forests are reported, or multiple flights are flown, and the success rate of those flights is reported. The aim of this study was to implement an autonomously flying quadrotor based on a lightweight lidar using openly available algorithms and test its behavior in real forest environments. A set of rigorous experiments was conducted with a quadrotor prototype utilizing the IPC path planner and LTA-OM SLAM algorithm. Based on the results of the first 33 flights, the original system was further enhanced. With the optimized system, 60 flights were performed, resulting in a total of 93 test flights. The optimized system performed significantly better in terms of reliability and flight mission completion times, achieving success rates of 12/15 in a medium-density forest and 15/15 in a dense forest, at a target flight velocity of 1 m/s. At a target flight velocity of 2 m/s, it had a success rate of 12/15 and 5/15, respectively. Furthermore, a standardized testing setup and evaluation criteria were proposed, enabling consistent performance comparisons of autonomous under-canopy UAV systems, enhancing reproducibility, guiding system improvements, and accelerating progress in forest robotics.
comment: This work has been submitted to the IEEE for possible publication
ARCADE: Adaptive Robot Control with Online Changepoint-Aware Bayesian Dynamics Learning
Real-world robots must operate under evolving dynamics caused by changing operating conditions, external disturbances, and unmodeled effects. These may appear as gradual drifts, transient fluctuations, or abrupt shifts, demanding real-time adaptation that is robust to short-term variation yet responsive to lasting change. We propose a framework for modeling the nonlinear dynamics of robotic systems that can be updated in real time from streaming data. The method decouples representation learning from online adaptation, using latent representations learned offline to support online closed-form Bayesian updates. To handle evolving conditions, we introduce a changepoint-aware mechanism with a latent variable inferred from data likelihoods that indicates continuity or shift. When continuity is likely, evidence accumulates to refine predictions; when a shift is detected, past information is tempered to enable rapid re-learning. This maintains calibrated uncertainty and supports probabilistic reasoning about transient, gradual, or structural change. We prove that the adaptive regret of the framework grows only logarithmically in time and linearly with the number of shifts, competitive with an oracle that knows timings of shift. We validate on cartpole simulations and real quadrotor flights with swinging payloads and mid-flight drops, showing improved predictive accuracy, faster recovery, and more accurate closed-loop tracking than relevant baselines.
CaFe-TeleVision: A Coarse-to-Fine Teleoperation System with Immersive Situated Visualization for Enhanced Ergonomics
Teleoperation presents a promising paradigm for remote control and robot proprioceptive data collection. Despite recent progress, current teleoperation systems still suffer from limitations in efficiency and ergonomics, particularly in challenging scenarios. In this paper, we propose CaFe-TeleVision, a coarse-to-fine teleoperation system with immersive situated visualization for enhanced ergonomics. At its core, a coarse-to-fine control mechanism is proposed in the retargeting module to bridge workspace disparities, jointly optimizing efficiency and physical ergonomics. To stream immersive feedback with adequate visual cues for human vision systems, an on-demand situated visualization technique is integrated in the perception module, which reduces the cognitive load for multi-view processing. The system is built on a humanoid collaborative robot and validated with six challenging bimanual manipulation tasks. User study among 24 participants confirms that CaFe-TeleVision enhances ergonomics with statistical significance, indicating a lower task load and a higher user acceptance during teleoperation. Quantitative results also validate the superior performance of our system across six tasks, surpassing comparative methods by up to 28.89% in success rate and accelerating by 26.81% in completion time. Project webpage: https://clover-cuhk.github.io/cafe_television/
History-Enhanced Two-Stage Transformer for Aerial Vision-and-Language Navigation
Aerial Vision-and-Language Navigation (AVLN) requires Unmanned Aerial Vehicle (UAV) agents to localize targets in large-scale urban environments based on linguistic instructions. While successful navigation demands both global environmental reasoning and local scene comprehension, existing UAV agents typically adopt mono-granularity frameworks that struggle to balance these two aspects. To address this limitation, this work proposes a History-Enhanced Two-Stage Transformer (HETT) framework, which integrates the two aspects through a coarse-to-fine navigation pipeline. Specifically, HETT first predicts coarse-grained target positions by fusing spatial landmarks and historical context, then refines actions via fine-grained visual analysis. In addition, a historical grid map is designed to dynamically aggregate visual features into a structured spatial memory, enhancing comprehensive scene awareness. Additionally, the CityNav dataset annotations are manually refined to enhance data quality. Experiments on the refined CityNav dataset show that HETT delivers significant performance gains, while extensive ablation studies further verify the effectiveness of each component.
DRAW2ACT: Turning Depth-Encoded Trajectories into Robotic Demonstration Videos
Video diffusion models provide powerful real-world simulators for embodied AI but remain limited in controllability for robotic manipulation. Recent works on trajectory-conditioned video generation address this gap but often rely on 2D trajectories or single modality conditioning, which restricts their ability to produce controllable and consistent robotic demonstrations. We present DRAW2ACT, a depth-aware trajectory-conditioned video generation framework that extracts multiple orthogonal representations from the input trajectory, capturing depth, semantics, shape and motion, and injects them into the diffusion model. Moreover, we propose to jointly generate spatially aligned RGB and depth videos, leveraging cross-modality attention mechanisms and depth supervision to enhance the spatio-temporal consistency. Finally, we introduce a multimodal policy model conditioned on the generated RGB and depth sequences to regress the robot's joint angles. Experiments on Bridge V2, Berkeley Autolab, and simulation benchmarks show that DRAW2ACT achieves superior visual fidelity and consistency while yielding higher manipulation success rates compared to existing baselines.
Trajectory Tracking for Multi-Manipulator Systems in Constrained Environments
We consider the problem of cooperative manipulation by a mobile multi-manipulator system operating in obstacle-cluttered and highly constrained environments under spatio-temporal task specifications. The task requires transporting a grasped object while respecting both continuous robot dynamics and discrete geometric constraints arising from obstacles and narrow passages. To address this hybrid structure, we propose a multi-rate planning and control framework that combines offline generation of an STL-satisfying object trajectory and collision-free base footprints with online constrained inverse kinematics and continuous-time feedback control. The resulting closed-loop system enables coordinated reconfiguration of multiple manipulators while tracking the desired object motion. The approach is evaluated in high-fidelity physics simulations using three Franka Emika Panda mobile manipulators rigidly grasping an object.
SUPER -- A Framework for Sensitivity-based Uncertainty-aware Performance and Risk Assessment in Visual Inertial Odometry
While many visual odometry (VO), visual-inertial odometry (VIO), and SLAM systems achieve high accuracy, the majority of existing methods miss to assess risks at runtime. This paper presents SUPER (Sensitivity-based Uncertainty-aware PErformance and Risk assessment) that is a generic and explainable framework that propagates uncertainties via sensitivities for real-time risk assessment in VIO. The scientific novelty lies in the derivation of a real-time risk indicator that is backend-agnostic and exploits the Schur complement blocks of the Gauss-Newton normal matrix to propagate uncertainties. Practically, the Schur complement captures the sensitivity that reflects the influence of the uncertainty on the risk occurrence. Our framework estimates risks on the basis of the residual magnitudes, geometric conditioning, and short horizon temporal trends without requiring ground truth knowledge. Our framework enables to reliably predict trajectory degradation 50 frames ahead with an improvement of 20% to the baseline. In addition, SUPER initiates a stop or relocalization policy with 89.1% recall. The framework is backend agnostic and operates in real time with less than 0.2% additional CPU cost. Experiments show that SUPER provides consistent uncertainty estimates. A SLAM evaluation highlights the applicability to long horizon mapping.
Interactive Motion Planning for Human-Robot Collaboration Based on Human-Centric Configuration Space Ergonomic Field
Industrial human-robot collaboration requires motion planning that is collision-free, responsive, and ergonomically safe to reduce fatigue and musculoskeletal risk. We propose the Configuration Space Ergonomic Field (CSEF), a continuous and differentiable field over the human joint space that quantifies ergonomic quality and provides gradients for real-time ergonomics-aware planning. An efficient algorithm constructs CSEF from established metrics with joint-wise weighting and task conditioning, and we integrate it into a gradient-based planner compatible with impedance-controlled robots. In a 2-DoF benchmark, CSEF-based planning achieves higher success rates, lower ergonomic cost, and faster computation than a task-space ergonomic planner. Hardware experiments with a dual-arm robot in unimanual guidance, collaborative drilling, and bimanual cocarrying show faster ergonomic cost reduction, closer tracking to optimized joint targets, and lower muscle activation than a point-to-point baseline. CSEF-based planning method reduces average ergonomic scores by up to 10.31% for collaborative drilling tasks and 5.60% for bimanual co-carrying tasks while decreasing activation in key muscle groups, indicating practical benefits for real-world deployment.
comment: 10 pages, 9 figures
Context Representation via Action-Free Transformer encoder-decoder for Meta Reinforcement Learning
Reinforcement learning (RL) enables robots to operate in uncertain environments, but standard approaches often struggle with poor generalization to unseen tasks. Context-adaptive meta reinforcement learning addresses these limitations by conditioning on the task representation, yet they mostly rely on complete action information in the experience making task inference tightly coupled to a specific policy. This paper introduces Context Representation via Action Free Transformer encoder decoder (CRAFT), a belief model that infers task representations solely from sequences of states and rewards. By removing the dependence on actions, CRAFT decouples task inference from policy optimization, supports modular training, and leverages amortized variational inference for scalable belief updates. Built on a transformer encoder decoder with rotary positional embeddings, the model captures long range temporal dependencies and robustly encodes both parametric and non-parametric task variations. Experiments on the MetaWorld ML-10 robotic manipulation benchmark show that CRAFT achieves faster adaptation, improved generalization, and more effective exploration compared to context adaptive meta--RL baselines. These findings highlight the potential of action-free inference as a foundation for scalable RL in robotic control.
Expert Switching for Robust AAV Landing: A Dual-Detector Framework in Simulation
Reliable helipad detection is essential for Autonomous Aerial Vehicle (AAV) landing, especially under GPS-denied or visually degraded conditions. While modern detectors such as YOLOv8 offer strong baseline performance, single-model pipelines struggle to remain robust across the extreme scale transitions that occur during descent, where helipads appear small at high altitude and large near touchdown. To address this limitation, we propose a scale-adaptive dual-expert perception framework that decomposes the detection task into far-range and close-range regimes. Two YOLOv8 experts are trained on scale-specialized versions of the HelipadCat dataset, enabling one model to excel at detecting small, low-resolution helipads and the other to provide high-precision localization when the target dominates the field of view. During inference, both experts operate in parallel, and a geometric gating mechanism selects the expert whose prediction is most consistent with the AAV's viewpoint. This adaptive routing prevents the degradation commonly observed in single-detector systems when operating across wide altitude ranges. The dual-expert perception module is evaluated in a closed-loop landing environment that integrates CARLA's photorealistic rendering with NASA's GUAM flight-dynamics engine. Results show substantial improvements in alignment stability, landing accuracy, and overall robustness compared to single-detector baselines. By introducing a scale-aware expert routing strategy tailored to the landing problem, this work advances resilient vision-based perception for autonomous descent and provides a foundation for future multi-expert AAV frameworks.
E-Navi: Environmental Adaptive Navigation for UAVs on Resource Constrained Platforms
The ability to adapt to changing environments is crucial for the autonomous navigation systems of Unmanned Aerial Vehicles (UAVs). However, existing navigation systems adopt fixed execution configurations without considering environmental dynamics based on available computing resources, e.g., with a high execution frequency and task workload. This static approach causes rigid flight strategies and excessive computations, ultimately degrading flight performance or even leading to failures in UAVs. Despite the necessity for an adaptive system, dynamically adjusting workloads remains challenging, due to difficulties in quantifying environmental complexity and modeling the relationship between environment and system configuration. Aiming at adapting to dynamic environments, this paper proposes E-Navi, an environmental-adaptive navigation system for UAVs that dynamically adjusts task executions on the CPUs in response to environmental changes based on available computational resources. Specifically, the perception-planning pipeline of UAVs navigation system is redesigned through dynamic adaptation of mapping resolution and execution frequency, driven by the quantitative environmental complexity evaluations. In addition, E-Navi supports flexible deployment across hardware platforms with varying levels of computing capability. Extensive Hardware-In-the-Loop and real-world experiments demonstrate that the proposed system significantly outperforms the baseline method across various hardware platforms, achieving up to 53.9% navigation task workload reduction, up to 63.8% flight time savings, and delivering more stable velocity control.
Sample-Efficient Robot Skill Learning for Construction Tasks: Benchmarking Hierarchical Reinforcement Learning and Vision-Language-Action VLA Model
This study evaluates two leading approaches for teaching construction robots new skills to understand their applicability for construction automation: a Vision-Language-Action (VLA) model and Reinforcement Learning (RL) methods. The goal is to understand both task performance and the practical effort needed to deploy each approach on real jobs. The authors developed two teleoperation interfaces to control the robots and collect the demonstrations needed, both of which proved effective for training robots for long-horizon and dexterous tasks. In addition, the authors conduct a three-stage evaluation. First, the authors compare a Multi-Layer Perceptron (MLP) policy with a Deep Q-network (DQN) imitation model to identify the stronger RL baseline, focusing on model performance, generalization, and a pick-up experiment. Second, three different VLA models are trained in two different scenarios and compared with each other. Third, the authors benchmark the selected RL baseline against the VLA model using computational and sample-efficiency measures and then a robot experiment on a multi-stage panel installation task that includes transport and installation. The VLA model demonstrates strong generalization and few-shot capability, achieving 60% and 100% success in the pickup phase. In comparison, DQN can be made robust but needs additional noise during tuning, which increases the workload. Overall, the findings indicate that VLA offers practical advantages for changing tasks by reducing programming effort and enabling useful performance with minimal data, while DQN provides a viable baseline when sufficient tuning effort is acceptable.
CLAIM: Camera-LiDAR Alignment with Intensity and Monodepth IROS 2025
In this paper, we unleash the potential of the powerful monodepth model in camera-LiDAR calibration and propose CLAIM, a novel method of aligning data from the camera and LiDAR. Given the initial guess and pairs of images and LiDAR point clouds, CLAIM utilizes a coarse-to-fine searching method to find the optimal transformation minimizing a patched Pearson correlation-based structure loss and a mutual information-based texture loss. These two losses serve as good metrics for camera-LiDAR alignment results and require no complicated steps of data processing, feature extraction, or feature matching like most methods, rendering our method simple and adaptive to most scenes. We validate CLAIM on public KITTI, Waymo, and MIAS-LCEC datasets, and the experimental results demonstrate its superior performance compared with the state-of-the-art methods. The code is available at https://github.com/Tompson11/claim.
comment: Accepted by IROS 2025
Impact of Robot Facial-Audio Expressions on Human Robot Trust Dynamics and Trust Repair
Despite recent advances in robotics and human-robot collaboration in the AEC industry, trust has mostly been treated as a static factor, with little guidance on how it changes across events during collaboration. This paper investigates how a robot's task performance and its expressive responses after outcomes shape the dynamics of human trust over time. To this end, we designed a controlled within-subjects study with two construction-inspired tasks, Material Delivery (physical assistance) and Information Gathering (perceptual assistance), and measured trust repeatedly (four times per task) using the 14-item Trust Perception Scale for HRI plus a redelegation choice. The robot produced two multimodal expressions, a "glad" display with a brief confirmation after success, and a "sad" display with an apology and a request for a second chance after failure. The study was conducted in a lab environment with 30 participants and a quadruped platform, and we evaluated trust dynamics and repair across both tasks. Results show that robot success reliably increases trust, failure causes sharp drops, and apology-based expressions partially restores trust (44% recovery in Material Delivery; 38% in Information Gathering). Item-level analysis indicates that recovered trust was driven mostly by interaction and communication factors, with competence recovering partially and autonomy aspects changing least. Additionally, age group and prior attitudes moderated trust dynamics with younger participants showed larger but shorter-lived changes, mid-20s participants exhibited the most durable repair, and older participants showed most conservative dynamics. This work provides a foundation for future efforts that adapt repair strategies to task demands and user profiles to support safe, productive adoption of robots on construction sites.
Autonomous Construction-Site Safety Inspection Using Mobile Robots: A Multilayer VLM-LLM Pipeline
Construction safety inspection remains mostly manual, and automated approaches still rely on task-specific datasets that are hard to maintain in fast-changing construction environments due to frequent retraining. Meanwhile, field inspection with robots still depends on human teleoperation and manual reporting, which are labor-intensive. This paper aims to connect what a robot sees during autonomous navigation to the safety rules that are common in construction sites, automatically generating a safety inspection report. To this end, we proposed a multi-layer framework with two main modules: robotics and AI. On the robotics side, SLAM and autonomous navigation provide repeatable coverage and targeted revisits via waypoints. On AI side, a Vision Language Model (VLM)-based layer produces scene descriptions; a retrieval component powered grounds those descriptions in OSHA and site policies; Another VLM-based layer assesses the safety situation based on rules; and finally Large Language Model (LLM) layer generates safety reports based on previous outputs. The framework is validated with a proof-of-concept implementation and evaluated in a lab environment that simulates common hazards across three scenarios. Results show high recall with competitive precision compared to state-of-the-art closed-source models. This paper contributes a transparent, generalizable pipeline that moves beyond black-box models by exposing intermediate artifacts from each layer and keeping the human in the loop. This work provides a foundation for future extensions to additional tasks and settings within and beyond construction context.
Breathe with Me: Synchronizing Biosignals for User Embodiment in Robots
Embodiment of users within robotic systems has been explored in human-robot interaction, most often in telepresence and teleoperation. In these applications, synchronized visuomotor feedback can evoke a sense of body ownership and agency, contributing to the experience of embodiment. We extend this work by employing embreathment, the representation of the user's own breath in real time, as a means for enhancing user embodiment experience in robots. In a within-subjects experiment, participants controlled a robotic arm, while its movements were either synchronized or non-synchronized with their own breath. Synchrony was shown to significantly increase body ownership, and was preferred by most participants. We propose the representation of physiological signals as a novel interoceptive pathway for human-robot interaction, and discuss implications for telepresence, prosthetics, collaboration with robots, and shared autonomy.
comment: Accepted to appear in the ACM/IEEE International Conference on Human-Robot Interaction (HRI '26), Edinburgh, United Kingdom. Iddo Yehoshua Wald and Amber Maimon contributed equally
Semantic-Drive: Democratizing Long-Tail Data Curation via Open-Vocabulary Grounding and Neuro-Symbolic VLM Consensus
The development of robust Autonomous Vehicles (AVs) is bottlenecked by the scarcity of "Long-Tail" training data. While fleets collect petabytes of video logs, identifying rare safety-critical events (e.g., erratic jaywalking, construction diversions) remains a manual, cost-prohibitive process. Existing solutions rely on coarse metadata search, which lacks precision, or cloud-based VLMs, which are privacy-invasive and expensive. We introduce Semantic-Drive, a local-first, neuro-symbolic framework for semantic data mining. Our approach decouples perception into two stages: (1) Symbolic Grounding via a real-time open-vocabulary detector (YOLOE) to anchor attention, and (2) Cognitive Analysis via a Reasoning VLM that performs forensic scene analysis. To mitigate hallucination, we implement a "System 2" inference-time alignment strategy, utilizing a multi-model "Judge-Scout" consensus mechanism. Benchmarked on the nuScenes dataset against the Waymo Open Dataset (WOD-E2E) taxonomy, Semantic-Drive achieves a Recall of 0.966 (vs. 0.475 for CLIP) and reduces Risk Assessment Error by 40% ccompared to the best single scout models. The system runs entirely on consumer hardware (NVIDIA RTX 3090), offering a privacy-preserving alternative to the cloud.
Closing the Loop: Motion Prediction Models beyond Open-Loop Benchmarks
Fueled by motion prediction competitions and benchmarks, recent years have seen the emergence of increasingly large learning based prediction models, many with millions of parameters, focused on improving open-loop prediction accuracy by mere centimeters. However, these benchmarks fail to assess whether such improvements translate to better performance when integrated into an autonomous driving stack. In this work, we systematically evaluate the interplay between state-of-the-art motion predictors and motion planners. Our results show that higher open-loop accuracy does not always correlate with better closed-loop driving behavior and that other factors, such as temporal consistency of predictions and planner compatibility, also play a critical role. Furthermore, we investigate downsized variants of these models, and, surprisingly, find that in some cases models with up to 86% fewer parameters yield comparable or even superior closed-loop driving performance. Our code is available at https://github.com/aumovio/pred2plan.
Data-fused MPC with Guarantees: Application to Flying Humanoid Robots
This paper introduces a Data-Fused Model Predictive Control (DFMPC) framework that combines physics-based models with data-driven representations of unknown dynamics. Leveraging Willems' Fundamental Lemma and an artificial equilibrium formulation, the method enables tracking of changing, potentially unreachable setpoints while explicitly handling measurement noise through slack variables and regularization. We provide guarantees of recursive feasibility and practical stability under input-output constraints for a specific class of reference signals. The approach is validated on the iRonCub flying humanoid robot, integrating analytical momentum models with data-driven turbine dynamics. Simulations show improved tracking and robustness compared to a purely model-based MPC, while maintaining real-time feasibility.
comment: This paper has been accepted for publication in IEEE Control Systems Letters (L-CSS)
Mirror Skin: In Situ Visualization of Robot Touch Intent on Robotic Skin
Effective communication of robotic touch intent is a key factor in promoting safe and predictable physical human-robot interaction (pHRI). While intent communication has been widely studied, existing approaches lack the spatial specificity and semantic depth necessary to convey robot touch actions. We present Mirror Skin, a cephalopod-inspired concept that utilizes high-resolution, mirror-like visual feedback on robotic skin. By mapping in-situ visual representations of a human's body parts onto the corresponding robot's touch region, Mirror Skin communicates who shall initiate touch, where it will occur, and when it is imminent. To inform the design of Mirror Skin, we conducted a structured design exploration with experts in virtual reality (VR), iteratively refining six key dimensions. A subsequent controlled user study demonstrated that Mirror Skin significantly enhances accuracy and reduces response times for interpreting touch intent. These findings highlight the potential of visual feedback on robotic skin to communicate human-robot touch interactions.
MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning
Current Vision-Language-Action (VLA) paradigms in autonomous driving primarily rely on Imitation Learning (IL), which introduces inherent challenges such as distribution shift and causal confusion. Online Reinforcement Learning offers a promising pathway to address these issues through trial-and-error learning. However, applying online reinforcement learning to VLA models in autonomous driving is hindered by inefficient exploration in continuous action spaces. To overcome this limitation, we propose MindDrive, a VLA framework comprising a large language model (LLM) with two distinct sets of LoRA parameters. The one LLM serves as a Decision Expert for scenario reasoning and driving decision-making, while the other acts as an Action Expert that dynamically maps linguistic decisions into feasible trajectories. By feeding trajectory-level rewards back into the reasoning space, MindDrive enables trial-and-error learning over a finite set of discrete linguistic driving decisions, instead of operating directly in a continuous action space. This approach effectively balances optimal decision-making in complex scenarios, human-like driving behavior, and efficient exploration in online reinforcement learning. Using the lightweight Qwen-0.5B LLM, MindDrive achieves Driving Score (DS) of 78.04 and Success Rate (SR) of 55.09% on the challenging Bench2Drive benchmark. To the best of our knowledge, this is the first work to demonstrate the effectiveness of online reinforcement learning for the VLA model in autonomous driving.
comment: 16 pages, 12 figures, 6 tables; Project Page: https://xiaomi-mlab.github.io/MindDrive/
Recent Advances in Multi-Agent Human Trajectory Prediction: A Comprehensive Review
With the emergence of powerful data-driven methods in human trajectory prediction (HTP), gaining a finer understanding of multi-agent interactions lies within hand's reach, with important implications in areas such as social robot navigation, autonomous navigation, and crowd modeling. This survey reviews some of the most recent advancements in deep learning-based multi-agent trajectory prediction, focusing on studies published between 2020 and 2025. We categorize the existing methods based on their architectural design, their input representations, and their overall prediction strategies, placing a particular emphasis on models evaluated using the ETH/UCY benchmark. Furthermore, we highlight key challenges and future research directions in the field of multi-agent HTP.
comment: 45 pages
MAPS$^2$: Multi-Robot Autonomous Motion Planning under Signal Temporal Logic Specifications
This article presents MAPS$^2$ : a distributed algorithm that allows multi-robot systems to deliver coupled tasks expressed as Signal Temporal Logic (STL) constraints. Classical control theoretical tools addressing STL constraints either adopt a limited fragment of the STL formula or require approximations of min/max operators, whereas works maximising robustness through optimisation-based methods often suffer from local minima, relaxing any completeness arguments due to the NP-hard nature of the problem. Endowed with probabilistic guarantees, MAPS$^2$ provides an anytime algorithm that iteratively improves the robots' trajectories. The algorithm selectively imposes spatial constraints by taking advantage of the temporal properties of the STL. The algorithm is distributed, in the sense that each robot calculates its trajectory by communicating only with its immediate neighbours as defined via a communication graph. We illustrate the efficiency of MAPS$^2$ by conducting extensive simulation and experimental studies, verifying the generation of STL satisfying trajectories.
MMDrive: Interactive Scene Understanding Beyond Vision with Multi-representational Fusion
Vision-language models enable the understanding and reasoning of complex traffic scenarios through multi-source information fusion, establishing it as a core technology for autonomous driving. However, existing vision-language models are constrained by the image understanding paradigm in 2D plane, which restricts their capability to perceive 3D spatial information and perform deep semantic fusion, resulting in suboptimal performance in complex autonomous driving environments. This study proposes MMDrive, an multimodal vision-language model framework that extends traditional image understanding to a generalized 3D scene understanding framework. MMDrive incorporates three complementary modalities, including occupancy maps, LiDAR point clouds, and textual scene descriptions. To this end, it introduces two novel components for adaptive cross-modal fusion and key information extraction. Specifically, the Text-oriented Multimodal Modulator dynamically weights the contributions of each modality based on the semantic cues in the question, guiding context-aware feature integration. The Cross-Modal Abstractor employs learnable abstract tokens to generate compact, cross-modal summaries that highlight key regions and essential semantics. Comprehensive evaluations on the DriveLM and NuScenes-QA benchmarks demonstrate that MMDrive achieves significant performance gains over existing vision-language models for autonomous driving, with a BLEU-4 score of 54.56 and METEOR of 41.78 on DriveLM, and an accuracy score of 62.7% on NuScenes-QA. MMDrive effectively breaks the traditional image-only understanding barrier, enabling robust multimodal reasoning in complex driving environments and providing a new foundation for interpretable autonomous driving scene understanding.
Decomposed Object Manipulation via Dual-Actor Policy
Object manipulation, which focuses on learning to perform tasks on similar parts across different types of objects, can be divided into an approaching stage and a manipulation stage. However, previous works often ignore this characteristic of the task and rely on a single policy to directly learn the whole process of object manipulation. To address this problem, we propose a novel Dual-Actor Policy, termed DAP, which explicitly considers different stages and leverages heterogeneous visual priors to enhance each stage. Specifically, we introduce an affordance-based actor to locate the functional part in the manipulation task, thereby improving the approaching process. Following this, we propose a motion flow-based actor to capture the movement of the component, facilitating the manipulation process. Finally, we introduce a decision maker to determine the current stage of DAP and select the corresponding actor. Moreover, existing object manipulation datasets contain few objects and lack the visual priors needed to support training. To address this, we construct a simulated dataset, the Dual-Prior Object Manipulation Dataset, which combines the two visual priors and includes seven tasks, including two challenging long-term, multi-stage tasks. Experimental results on our dataset, the RoboTwin benchmark and real-world scenarios illustrate that our method consistently outperforms the SOTA method by 5.55%, 14.7% and 10.4% on average respectively.
Intrinsic-Motivation Multi-Robot Social Formation Navigation with Coordinated Exploration
This paper investigates the application of reinforcement learning (RL) to multi-robot social formation navigation, a critical capability for enabling seamless human-robot coexistence. While RL offers a promising paradigm, the inherent unpredictability and often uncooperative dynamics of pedestrian behavior pose substantial challenges, particularly concerning the efficiency of coordinated exploration among robots. To address this, we propose a novel coordinated-exploration multi-robot RL algorithm introducing an intrinsic motivation exploration. Its core component is a self-learning intrinsic reward mechanism designed to collectively alleviate policy conservatism. Moreover, this algorithm incorporates a dual-sampling mode within the centralized training and decentralized execution framework to enhance the representation of both the navigation policy and the intrinsic reward, leveraging a two-time-scale update rule to decouple parameter updates. Empirical results on social formation navigation benchmarks demonstrate the proposed algorithm's superior performance over existing state-of-the-art methods across crucial metrics. Our code and video demos are available at: https://github.com/czxhunzi/CEMRRL.
Multiagent Systems
Multi-Agent Medical Decision Consensus Matrix System: An Intelligent Collaborative Framework for Oncology MDT Consultations
Multidisciplinary team (MDT) consultations are the gold standard for cancer care decision-making, yet current practice lacks structured mechanisms for quantifying consensus and ensuring decision traceability. We introduce a Multi-Agent Medical Decision Consensus Matrix System that deploys seven specialized large language model agents, including an oncologist, a radiologist, a nurse, a psychologist, a patient advocate, a nutritionist and a rehabilitation therapist, to simulate realistic MDT workflows. The framework incorporates a mathematically grounded consensus matrix that uses Kendall's coefficient of concordance to objectively assess agreement. To further enhance treatment recommendation quality and consensus efficiency, the system integrates reinforcement learning methods, including Q-Learning, PPO and DQN. Evaluation across five medical benchmarks (MedQA, PubMedQA, DDXPlus, MedBullets and SymCat) shows substantial gains over existing approaches, achieving an average accuracy of 87.5% compared with 83.8% for the strongest baseline, a consensus achievement rate of 89.3% and a mean Kendall's W of 0.823. Expert reviewers rated the clinical appropriateness of system outputs at 8.9/10. The system guarantees full evidence traceability through mandatory citations of clinical guidelines and peer-reviewed literature, following GRADE principles. This work advances medical AI by providing structured consensus measurement, role-specialized multi-agent collaboration and evidence-based explainability to improve the quality and efficiency of clinical decision-making.
comment: 14 pages, 6 figures
Grammar Search for Multi-Agent Systems
Automatic search for Multi-Agent Systems has recently emerged as a key focus in agentic AI research. Several prior approaches have relied on LLM-based free-form search over the code space. In this work, we propose a more structured framework that explores the same space through a fixed set of simple, composable components. We show that, despite lacking the generative flexibility of LLMs during the candidate generation stage, our method outperforms prior approaches on four out of five benchmarks across two domains: mathematics and question answering. Furthermore, our method offers additional advantages, including a more cost-efficient search process and the generation of modular, interpretable multi-agent systems with simpler logic.
Single-Agent Scaling Fails Multi-Agent Intelligence: Towards Foundation Models with Native Multi-Agent Intelligence
Foundation models (FMs) are increasingly assuming the role of the ''brain'' of AI agents. While recent efforts have begun to equip FMs with native single-agent abilities -- such as GUI interaction or integrated tool use -- we argue that the next frontier is endowing FMs with native multi-agent intelligence. We identify four core capabilities of FMs in multi-agent contexts: understanding, planning, efficient communication, and adaptation. Contrary to assumptions about the spontaneous emergence of such abilities, we provide extensive empirical evidence, across 41 large language models and 7 challenging benchmarks, showing that scaling single-agent performance alone does not automatically yield robust multi-agent intelligence. To address this gap, we outline key research directions -- spanning dataset construction, evaluation, training paradigms, and safety considerations -- for building FMs with native multi-agent intelligence.
Exploring Network-Knowledge Graph Duality: A Case Study in Agentic Supply Chain Risk Analysis
Large Language Models (LLMs) struggle with the complex, multi-modal, and network-native data underlying financial risk. Standard Retrieval-Augmented Generation (RAG) oversimplifies relationships, while specialist models are costly and static. We address this gap with an LLM-centric agent framework for supply chain risk analysis. Our core contribution is to exploit the inherent duality between networks and knowledge graphs (KG). We treat the supply chain network as a KG, allowing us to use structural network science principles for retrieval. A graph traverser, guided by network centrality scores, efficiently extracts the most economically salient risk paths. An agentic architecture orchestrates this graph retrieval alongside data from numerical factor tables and news streams. Crucially, it employs novel ``context shells'' -- descriptive templates that embed raw figures in natural language -- to make quantitative data fully intelligible to the LLM. This lightweight approach enables the model to generate concise, explainable, and context-rich risk narratives in real-time without costly fine-tuning or a dedicated graph database.
comment: Accepted to the 2nd Workshop on LLMs and Generative AI in Finance: International Conference on AI in Finance(ICAIF) 2025;7 pages, 3 figures
Beyond Task Completion: An Assessment Framework for Evaluating Agentic AI Systems
Recent advances in agentic AI have shifted the focus from standalone Large Language Models (LLMs) to integrated systems that combine LLMs with tools, memory, and other agents to perform complex tasks. These multi-agent architectures enable coordinated reasoning, planning, and execution across diverse domains, allowing agents to collaboratively automate complex workflows. Despite these advances, evaluation and assessment of LLM agents and the multi-agent systems they constitute remain a fundamental challenge. Although various approaches have been proposed in the software engineering literature for evaluating conventional software components, existing methods for AI-based systems often overlook the non-deterministic nature of models. This non-determinism introduces behavioral uncertainty during execution, yet existing evaluations rely on binary task completion metrics that fail to capture it. Evaluating agentic systems therefore requires examining additional dimensions, including the agent ability to invoke tools, ingest and retrieve memory, collaborate with other agents, and interact effectively with its environment. These challenges emerged during our ongoing industry collaboration with MontyCloud Inc., when we deployed an agentic system in production. These limitations surfaced during deployment, highlighting practical gaps in the current evaluation methods and the need for a systematic assessment of agent behavior beyond task outcomes. Informed by these observations and established definitions of agentic systems, we propose an end-to-end Agent Assessment Framework with four evaluation pillars encompassing LLMs, Memory, Tools, and Environment. We validate the framework on a representative Autonomous CloudOps use case, where experiments reveal behavioral deviations overlooked by conventional metrics, demonstrating its effectiveness in capturing runtime uncertainties.
Optimizing Highway Traffic Flow in Mixed Autonomy: A Multiagent Truncated Rollout Approach
The development of connected and autonomous vehicles (CAVs) offers substantial opportunities to enhance traffic efficiency. However, in mixed autonomy environments where CAVs coexist with human-driven vehicles (HDVs), achieving efficient coordination among CAVs remains challenging due to heterogeneous driving behaviors. To address this, this paper proposes a multiagent truncated rollout approach that enhances CAV speed coordination to improve highway throughput while reducing computational overhead. In this approach, a traffic density evolution equation is formulated that comprehensively accounts for the presence or absence of CAVs, and a distributed coordination control framework is established accordingly. By incorporating kinematic information from neighbor agents and employing an agent-by-agent sequential solution mechanism, our method enables explicit cooperation among CAVs. Furthermore, we introduce a truncated rollout scheme that adaptively shortens the optimization horizon based on the evaluation of control sequences. This significantly reduces the time complexity, thereby improving real-time performance and scalability. Theoretical analysis provides rigorous guarantees on the stability and performance improvement of the system. Simulations conducted on real-world bottleneck scenarios demonstrate that, in large-scale mixed traffic flows, the proposed method outperforms conventional model predictive control methods by reducing both the average travel time in the bottleneck area and overall computational time, highlighting its strong potential for practical deployment.
Bayesian Ego-graph Inference for Networked Multi-Agent Reinforcement Learning NeurIPS 2025
In networked multi-agent reinforcement learning (Networked-MARL), decentralized agents must act under local observability and constrained communication over fixed physical graphs. Existing methods often assume static neighborhoods, limiting adaptability to dynamic or heterogeneous environments. While centralized frameworks can learn dynamic graphs, their reliance on global state access and centralized infrastructure is impractical in real-world decentralized systems. We propose a stochastic graph-based policy for Networked-MARL, where each agent conditions its decision on a sampled subgraph over its local physical neighborhood. Building on this formulation, we introduce BayesG, a decentralized actor-framework that learns sparse, context-aware interaction structures via Bayesian variational inference. Each agent operates over an ego-graph and samples a latent communication mask to guide message passing and policy computation. The variational distribution is trained end-to-end alongside the policy using an evidence lower bound (ELBO) objective, enabling agents to jointly learn both interaction topology and decision-making strategies. BayesG outperforms strong MARL baselines on large-scale traffic control tasks with up to 167 agents, demonstrating superior scalability, efficiency, and performance.
comment: Accepted at NeurIPS 2025 https://openreview.net/forum?id=3qeTs05bRL
Rethinking the Reliability of Multi-agent System: A Perspective from Byzantine Fault Tolerance
Ensuring the reliability of agent architectures and effectively identifying problematic agents when failures occur are crucial challenges in multi-agent systems (MAS). Advances in large language models (LLMs) have established LLM-based agents as a major branch of MAS, enabling major breakthroughs in complex problem solving and world modeling. However, the reliability implications of this shift remain largely unexplored. i.e., whether substituting traditional agents with LLM-based agents can effectively enhance the reliability of MAS. In this work, we investigate and quantify the reliability of LLM-based agents from the perspective of Byzantine fault tolerance. We observe that LLM-based agents demonstrate stronger skepticism when processing erroneous message flows, a characteristic that enables them to outperform traditional agents across different topological structures. Motivated by the results of the pilot experiment, we design CP-WBFT, a confidence probe-based weighted Byzantine Fault Tolerant consensus mechanism to enhance the stability of MAS with different topologies. It capitalizes on the intrinsic reflective and discriminative capabilities of LLMs by employing a probe-based, weighted information flow transmission method to improve the reliability of LLM-based agents. Extensive experiments demonstrate that CP-WBFT achieves superior performance across diverse network topologies under extreme Byzantine conditions (85.7\% fault rate). Notably, our approach surpasses traditional methods by attaining remarkable accuracy on various topologies and maintaining strong reliability in both mathematical reasoning and safety assessment tasks.
MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models
The integration of deep learning-based glaucoma detection with large language models (LLMs) presents an automated strategy to mitigate ophthalmologist shortages and improve clinical reporting efficiency. However, applying general LLMs to medical imaging remains challenging due to hallucinations, limited interpretability, and insufficient domain-specific medical knowledge, which can potentially reduce clinical accuracy. Although recent approaches combining imaging models with LLM reasoning have improved reporting, they typically rely on a single generalist agent, restricting their capacity to emulate the diverse and complex reasoning found in multidisciplinary medical teams. To address these limitations, we propose MedChat, a multi-agent diagnostic framework and platform that combines specialized vision models with multiple role-specific LLM agents, all coordinated by a director agent. This design enhances reliability, reduces hallucination risk, and enables interactive diagnostic reporting through an interface tailored for clinical review and educational use. Code available at https://github.com/Purdue-M2/MedChat.
Systems and Control (EESS)
Enhancing Orbital Debris Remediation with Reconfigurable Space-Based Laser Constellations
Orbital debris poses an escalating threat to space missions and the long-term sustainability of Earth's orbital environment. The literature proposes various approaches for orbital debris remediation, including the use of multiple space-based lasers that collaboratively engage debris targets. While the proof of concept for this laser-based approach has been demonstrated, critical questions remain about its scalability and responsiveness as the debris population continues to expand rapidly. This paper introduces constellation reconfiguration as a system-level strategy to address these limitations. Through coordinated orbital maneuvers, laser-equipped satellites can dynamically adapt their positions to respond to evolving debris distributions and time-critical events. We formalize this concept as the Reconfigurable Laser-to-Debris Engagement Scheduling Problem (R-L2D-ESP), an optimization framework that determines the optimal sequence of constellation reconfigurations and laser engagements to maximize debris remediation capacity, which quantifies the constellation's ability to nudge, deorbit, or perform just-in-time collision avoidance maneuvers on debris objects. To manage the complexity of this combinatorial optimization problem, we employ a receding horizon approach. Our experiments reveal that reconfigurable constellations significantly outperform static ones, achieving greater debris remediation capacity and successfully deorbiting substantially more debris objects. Additionally, our sensitivity analyses identify the key parameters that influence remediation performance the most, providing essential insights for future system design. These findings demonstrate that constellation reconfiguration represents a promising advancement for laser-based debris removal systems, offering the adaptability and scalability necessary to enhance this particular approach to orbital debris remediation.
comment: Accepted to the 2026 IEEE Aerospace Conference
gridfm-datakit-v1: A Python Library for Scalable and Realistic Power Flow and Optimal Power Flow Data Generation
We introduce gridfm-datakit-v1, a Python library for generating realistic and diverse Power Flow (PF) and Optimal Power Flow (OPF) datasets for training Machine Learning (ML) solvers. Existing datasets and libraries face three main challenges: (1) lack of realistic stochastic load and topology perturbations, limiting scenario diversity; (2) PF datasets are restricted to OPF-feasible points, hindering generalization of ML solvers to cases that violate operating limits (e.g., branch overloads or voltage violations); and (3) OPF datasets use fixed generator cost functions, limiting generalization across varying costs. gridfm-datakit addresses these challenges by: (1) combining global load scaling from real-world profiles with localized noise and supporting arbitrary N-k topology perturbations to create diverse yet realistic datasets; (2) generating PF samples beyond operating limits; and (3) producing OPF data with varying generator costs. It also scales efficiently to large grids (up to 10,000 buses). Comparisons with OPFData, OPF-Learn, PGLearn, and PF$Δ$ are provided. Available on GitHub at https://github.com/gridfm/gridfm-datakit under Apache 2.0 and via `pip install gridfm-datakit`.
comment: Main equal contributors: Alban Puech, Matteo Mazzonelli. Other equal contributors: Celia Cintas, Tamara R. Govindasamy, Mangaliso Mngomezulu, Jonas Weiss
Scalable Nonlinear DeePC: Bridging Direct and Indirect Methods and Basis Reduction
This paper studies regularized data-enabled predictive control (DeePC) within a nonlinear framework and its relationship to subspace predictive control (SPC). The $Π$-regularization is extended to general basis functions and it is shown that, under suitable conditions, the resulting basis functions DeePC formulation constitutes a relaxation of basis functions SPC. To improve scalability, we introduce an SVD-based dimensionality reduction that preserves the equivalence with SPC, and we derive a reduced Π-regularization. A LASSO based sparse basis selection method is proposed to obtain a reduced basis from lifted data. Simulations on a nonlinear van der Pol oscillator model indicate that, in the absence of noise, DeePC and SPC yield equivalent absolute mean tracking errors (AMEs) when large penalties are applied. In contrast, under noisy measurements, careful tuning of the DeePC regularization results in a reduced AME, outperforming SPC.
The Innovation Null Space of the Kalman Predictor: A Stochastic Perspective for DeePC
Willems' fundamental lemma uses a key decision variable $g$ to combine measured input-output data and describe trajectories of a linear time-invariant system. In this paper, we ask: what is a good choice for this vector $g$ when the system is affected by noise? For a linear system with Gaussian noise, we show that there exists an optimal subspace for this decision variable $g$, which is the null space of the innovation Hankel matrix. If the decision vector lies in this null space, the resulting predictor gets closer to the Kalman predictor. To show this, we use a result that we refer to as the Kalman Filter Fundamental Lemma (KFFL), which applies Willems' lemma to the Kalman predictor. This viewpoint also explains several existing data-driven predictive control methods: regularized DeePC schemes act as soft versions of the innovation null-space constraint, instrumental-variable methods enforce it by construction, and ARX-based approaches explicitly estimate this innovation null space.
Closed-Loop Consistent, Causal Data-Driven Predictive Control via SSARX
We propose a fundamental-lemma-free data-driven predictive control (DDPC) scheme for synthesizing model predictive control (MPC)-like policies directly from input-output data. Unlike the well-known DeePC approach and other DDPC methods that rely on Willems' fundamental lemma, our method avoids stacked Hankel representations and the DeePC decision variable g. Instead, we develop a closed-loop consistent, causal DDPC scheme based on the multi-step predictor Subspace-ARX (SSARX). The method first (i) estimates predictor/observer Markov parameters via a high-order ARX model to decouple the noise, then (ii) learns a multi-step past-to-future map by regression, optionally with a reduced-rank constraint. The SSARX predictor is strictly causal, which allows it to be integrated naturally into an MPC formulation. Our experimental results show that SSARX performs competitively with other methods when applied to closed-loop data affected by measurement and process noise.
Equivariant Observer for Bearing Estimation with Linear and Angular Velocity Inputs
This work addresses the problem of designing an equivariant observer for a first order dynamical system on the unit-sphere. Building upon the established case of unit bearing vector dynamics with angular velocity inputs, we introduce an additional linear velocity input projected onto the unit-sphere tangent space. This extended formulation is particularly useful in image-based visual servoing scenarios where stable bearing estimates are required and the relative velocity between the vehicle and target features must be accounted for. Leveraging lifted kinematics to the Special Orthogonal group, we design an observer for the bearing vector and prove its almost global asymptotic stability. Additionally, we demonstrate how the equivariant observer can be expressed in the original state manifold. Numerical simulation results validate the effectiveness of the proposed algorithm.
comment: This work has been submitted to the 2026 European Control Conference
Nonlinear System Identification Nano-drone Benchmark
We introduce a benchmark for system identification based on 75k real-world samples from the Crazyflie 2.1 Brushless nano-quadrotor, a sub-50g aerial vehicle widely adopted in robotics research. The platform presents a challenging testbed due to its multi-input, multi-output nature, open-loop instability, and nonlinear dynamics under agile maneuvers. The dataset comprises four aggressive trajectories with synchronized 4-dimensional motor inputs and 13-dimensional output measurements. To enable fair comparison of identification methods, the benchmark includes a suite of multi-horizon prediction metrics for evaluating both one-step and multi-step error propagation. In addition to the data, we provide a detailed description of the platform and experimental setup, as well as baseline models highlighting the challenge of accurate prediction under real-world noise and actuation nonlinearities. All data, scripts, and reference implementations are released as open-source at https://github.com/idsia-robotics/nanodrone-sysid-benchmark to facilitate transparent comparison of algorithms and support research on agile, miniaturized aerial robotics.
Equivariant Filter Cascade for Relative Attitude, Target's Angular Velocity, and Gyroscope Bias Estimation
Rendezvous and docking between a chaser spacecraft and an uncooperative target, such as an inoperative satellite, require synchronization between the chaser spacecraft and the target. In these scenarios, the chaser must estimate the relative attitude and angular velocity of the target using onboard sensors, in the presence of gyroscope bias. In this work, we propose a cascade of Equivariant Filters (EqF) to address this problem. The first stage of the cascade estimates the chaser's attitude and the bias, using measurements from a star tracker, while the second stage of the cascade estimates the relative attitude and the target's angular velocity, using observations of two known, non-collinear vectors fixed in the target frame. The stability of the EqF cascade is theoretically analyzed and simulation results demonstrate the filter cascade's performance.
comment: This work has been submitted to the 2026 European Control Conference
Temporal interference stimulation for deep brain neuromodulation in humans
For decades, focal non-invasive neuromodulation of deep brain regions has not been possible because of the steep depth-focality trade-off of conventional non-invasive brain stimulation (NIBS) techniques, such as transcranial magnetic stimulation (TMS) or classical transcranial electric stimulation (tES). Deep brain stimulation has therefore largely relied on invasive approaches in clinical populations, requiring surgery. Transcranial Temporal Interference Stimulation (tTIS) has recently emerged as a promising method to overcome this challenge and allows for the first time focal non-invasive electrical deep brain stimulation. The method, which was first validated through computational modeling and rodent work, has now been successfully translated to humans to target deep brain regions such as the hippocampus or striatum. In this Perspective, we present current evidence for tTIS-based neuromodulation, underlying mechanisms and discuss future developments of this promising technology. More specifically, we highlight key opportunities and challenges for fundamental neuroscience as well as for the design of new interventions in neuropsychiatric disorders. We also discuss the status of understanding and challenges regarding the basic mechanisms of action of tTIS and possible lines of technological innovation to optimize stimulation, in particular in terms of intensity and focality. Overall, we suggest that following the first proof-of-concepts, an important multidisciplinary research effort is now required to further validate the use of tTIS in multiple applications, understand its underlying principles and optimize the technology in the view of a wider scientific and clinical deployment.
Fine-Tuning of Neural Network Approximate MPC without Retraining via Bayesian Optimization
Approximate model-predictive control (AMPC) aims to imitate an MPC's behavior with a neural network, removing the need to solve an expensive optimization problem at runtime. However, during deployment, the parameters of the underlying MPC must usually be fine-tuned. This often renders AMPC impractical as it requires repeatedly generating a new dataset and retraining the neural network. Recent work addresses this problem by adapting AMPC without retraining using approximated sensitivities of the MPC's optimization problem. Currently, this adaption must be done by hand, which is labor-intensive and can be unintuitive for high-dimensional systems. To solve this issue, we propose using Bayesian optimization to tune the parameters of AMPC policies based on experimental data. By combining model-based control with direct and local learning, our approach achieves superior performance to nominal AMPC on hardware, with minimal experimentation. This allows automatic and data-efficient adaptation of AMPC to new system instances and fine-tuning to cost functions that are difficult to directly implement in MPC. We demonstrate the proposed method in hardware experiments for the swing-up maneuver on an inverted cartpole and yaw control of an under-actuated balancing unicycle robot, a challenging control problem.
A Geometric Task-Space Port-Hamiltonian Formulation for Redundant Manipulators
We present a novel geometric port-Hamiltonian formulation of redundant manipulators performing a differential kinematic task $η=J(q)\dot{q}$, where $q$ is a point on the configuration manifold, $η$ is a velocity-like task space variable, and $J(q)$ is a linear map representing the task, for example the classical analytic or geometric manipulator Jacobian matrix. The proposed model emerges from a change of coordinates from canonical Hamiltonian dynamics, and splits the standard Hamiltonian momentum variable into a task-space momentum variable and a null-space momentum variable. Properties of this model and relation to Lagrangian formulations present in the literature are highlighted. Finally, we apply the proposed model in an \textit{Interconnection and Damping Assignment Passivity-Based Control} (IDA-PBC) design to stabilize and shape the impedance of a 7-DOF Emika Panda robot in simulation.
A Data-Driven Approach for Electric Vehicle Powertrain Modeling
Electrification in the automotive industry and increasing powertrain complexity demand accelerated, cost-effective development cycles. While data-driven models are recently investigated at component level, a gap exists in systematically integrating them into cohesive, system-level simulations for virtual validation. This paper addresses this gap by presenting a modular framework for developing powertrain simulations. By defining standardized interfaces for key components-the battery, inverter, and electric motor-our methodology enables independently developed models, whether data-driven, physics-based, or empirical, to be easily integrated. This approach facilitates scalable system-level modeling, aims to shorten development timelines and to meet the agile demands of the modern automotive industry.
Estimating Reaction Rate Constants from Impedance Spectra: Simulating the Multistep Oxygen Evolution Reaction
The efficiency of water electrolysis in a photoelectrochemical cell is largely limited by the oxygen evolution reaction (OER) at its semiconductor photoanode. Reaction rate constants are key to investigating the slow kinetics of the multistep OER, as they indicate the rate-determining step. While these rate constants are usually calculated based on first-principles simulations, this research aims to estimate them from experimental electrochemical impedance spectroscopy (EIS) data. Starting from a microkinetic model for charge transfer at the semiconductor-electrolyte interface, an expression for the impedance as a function of the rate constants is derived. At lower potentials, the order of this impedance model is reduced, thus eliminating the rate constants corresponding to the last reaction steps. Moreover, it is shown that EIS data from at least two potentials needs to be combined in order to uniquely identify the rate constants of a particular reduced order model. Therefore, this work details a sample maximum likelihood estimator that integrates not only multiple frequencies, but also multiple potentials simultaneously. Measuring multiple periods of the current density and potential signals, allows this frequency domain estimator to take measurement uncertainty into account. In addition, due to the large numerical range of the rate constants, various scaling methods are implemented to achieve numerical stability. To find suitable initial values for the highly nonlinear optimization problem, different global estimation methods are compared. The complete estimation procedure of the rate constants is illustrated on simulated EIS data of a hematite photoanode.
comment: 18 pages, 8 figures
LegionITS: A Federated Intrusion-Tolerant System Architecture
The growing sophistication, frequency, and diversity of cyberattacks increasingly exceed the capacity of individual entities to fully understand and counter them. While existing solutions, such as Security Information and Event Management (SIEM) systems, Security Orchestration, Automation, and Response (SOAR) platforms, and Security Operation Center (SOC), play a vital role in mitigating known threats, they often struggle to effectively address emerging and unforeseen attacks. To increase the effectiveness of cyber defense, it is essential to foster greater information sharing between entities; however, this requires addressing the challenge of exchanging sensitive data without compromising confidentiality or operational security. To address the challenges of secure and confidential Cyber Threat Intelligence (CTI) sharing, we propose a novel architecture that federates Intrusion Tolerant Systems (ITSs) and leverages concepts from Malware Information Sharing Platform (MISP) to empower SOCs. This framework enables controlled collaboration and data privacy while enhancing collective defenses. As a proof of concept, we evaluate one module by applying Differential Privacy (DP) to Federated Learning (FL), observing a manageable accuracy drop from 98.42% to 85.98% (average loss 12.44%) while maintaining reliable detection of compromised messages. These results highlight the viability of secure data sharing and establishes a foundation for the future full-scale implementation of LegionITS.
comment: 17 pages, 4 Figures, 1 Table
Trajectory Tracking for Multi-Manipulator Systems in Constrained Environments
We consider the problem of cooperative manipulation by a mobile multi-manipulator system operating in obstacle-cluttered and highly constrained environments under spatio-temporal task specifications. The task requires transporting a grasped object while respecting both continuous robot dynamics and discrete geometric constraints arising from obstacles and narrow passages. To address this hybrid structure, we propose a multi-rate planning and control framework that combines offline generation of an STL-satisfying object trajectory and collision-free base footprints with online constrained inverse kinematics and continuous-time feedback control. The resulting closed-loop system enables coordinated reconfiguration of multiple manipulators while tracking the desired object motion. The approach is evaluated in high-fidelity physics simulations using three Franka Emika Panda mobile manipulators rigidly grasping an object.
KalMRACO: Unifying Kalman Filter and Model Reference Adaptive Control for Robust Control and Estimation of Uncertain Systems
A common assumption when applying the Kalman filter is a priori knowledge of the system parameters. These parameters are not necessarily known, and this may limit real-world applications of the Kalman filter. The well-established Model Reference Adaptive Controller (MRAC) utilizes a known reference model and ensures that the input-output behavior of a potentially unknown system converges to that of the reference model. We present KalMRACO, a unification of the Kalman filter and MRAC leveraging the reference model of MRAC as the Kalman filter system model, thus eliminating, to a large degree, the need for knowledge of the underlying system parameters in the application of the Kalman filter. We also introduce the concept of blending estimated states and measurements in the feedback law to handle stability issues during the initial transient. KalMRACO is validated through simulations and lab trials on an underwater vehicle. Results show superior tracking of the reference model state, observer state convergence, and noise mitigation properties.
comment: 7 pages, 4 figures
Coordinated Fast Frequency Response from Electric Vehicles, Data Centers, and Battery Energy Storage Systems
High renewable penetration has significantly reduced system inertia in modern power grids, increasing the need for fast frequency response (FFR) from distributed and non-traditional resources. While electric vehicles (EVs), data centers, and battery energy storage systems (BESS) have each demonstrated the capability to provide sub-second active power support, their combined frequency response potential has not been systematically evaluated. This paper proposes a coordinated control framework that aggregates these heterogeneous resources to provide fast, stable, and reliable FFR. Dynamic models for EV fleets, data center UPS and workload modulation, and BESS are developed, explicitly capturing their response times, power limits, and operational constraints. A hierarchical control architecture is introduced, where an upper-level coordinator dynamically allocates FFR among resources based on response speed and available capacity, and lower-level controllers implement the actual power response. Case studies based on the IEEE 39-bus test system demonstrate that the coordinated EV-DC-BESS framework improves frequency nadir by up to 0.2 Hz, reduces RoCoF, and accelerates frequency recovery compared with single-resource FFR. Results confirm that synergistic coordination significantly enhances grid stability, especially in low-inertia scenarios. This work highlights the value of multi-resource aggregation for future frequency regulation markets in renewable-dominated grids.
Fast Frequency Response Potential of Data Centers through Workload Modulation and UPS Coordination
The rapid growth of renewable energy sources has significantly reduced system inertia and increased the need for fast frequency response (FFR) in modern power systems. Data centers, as large and flexible electrical consumers, hold great potential to contribute to frequency stabilization due to their controllable IT workloads and on-site uninterruptible power supply (UPS) systems. This paper investigates the feasibility of leveraging data centers for providing fast frequency response through real-time workload modulation and UPS coordination. A dynamic model combining data center power consumption and grid frequency dynamics is developed, capturing the interactions between IT servers, cooling systems, and energy storage. Control strategies based on frequency deviation are implemented to adjust server power and discharge UPS batteries during frequency events. Case studies on a modified IEEE 39-bus system demonstrate that the proposed strategy can effectively reduce frequency nadir and shorten recovery time without compromising service quality. The results highlight the promising role of data centers as grid-supporting resources in future low-inertia systems.
Optimizing Sensor Node Localization for Achieving Sustainable Smart Agriculture System Connectivity
The innovative agriculture system is revolutionizing how we farm, making it one of the most critical innovations of our time! Yet it faces significant connectivity challenges, particularly with the sensors that power this technology. An efficient sensor deployment solution is still required to maximize the network's detection capabilities and efficiency while minimizing resource consumption and operational costs. This paper introduces an innovative sensor allocation optimization method that employs a Gradient-Based Iteration with Lagrange. The proposed method enhances coverage by utilizing a hybrid approach while minimizing the number of sensor nodes required under grid-based allocation. The proposed sensor distribution outperformed the classic deterministic deployment across coverage, number of sensors, cost, and power consumption. Furthermore, scalability is enhanced by extending sensing coverage to the remaining area via Bluetooth, which has a shorter communication range. Moreover, the proposed algorithm achieved 98.5% wireless sensor coverage, compared with 95% for the particle swarm distribution.
Restless Multi-Process Multi-Armed Bandits with Applications to Self-Driving Microscopies
High-content screening microscopy generates large amounts of live-cell imaging data, yet its potential remains constrained by the inability to determine when and where to image most effectively. Optimally balancing acquisition time, computational capacity, and photobleaching budgets across thousands of dynamically evolving regions of interest remains an open challenge, further complicated by limited field-of-view adjustments and sensor sensitivity. Existing approaches either rely on static sampling or heuristics that neglect the dynamic evolution of biological processes, leading to inefficiencies and missed events. Here, we introduce the restless multi-process multi-armed bandit (RMPMAB), a new decision-theoretic framework in which each experimental region is modeled not as a single process but as an ensemble of Markov chains, thereby capturing the inherent heterogeneity of biological systems such as asynchronous cell cycles and heterogeneous drug responses. Building upon this foundation, we derive closed-form expressions for transient and asymptotic behaviors of aggregated processes, and design scalable Whittle index policies with sub-linear complexity in the number of imaging regions. Through both simulations and a real biological live-cell imaging dataset, we show that our approach achieves substantial improvements in throughput under resource constraints. Notably, our algorithm outperforms Thomson Sampling, Bayesian UCB, epsilon-Greedy, and Round Robin by reducing cumulative regret by more than 37% in simulations and capturing 93% more biologically relevant events in live imaging experiments, underscoring its potential for transformative smart microscopy. Beyond improving experimental efficiency, the RMPMAB framework unifies stochastic decision theory with optimal autonomous microscopy control, offering a principled approach to accelerate discovery across multidisciplinary sciences.
Assessing the Frequency Response Potential of Heavy-Duty Electric Vehicles with Vehicle-to-Grid Integration in the California Power System
The integration of heavy-duty electric vehicles (EVs) with Vehicle-to-Grid (V2G) capability can enhance primary frequency response and improve stability in power systems with high renewable penetration. This study evaluates the technical potential of heavy-duty EV fleets to support the California power grid under three practical charging strategies: immediate charging, delayed charging, and constant-minimum-power charging. We develop a simulation framework that couples aggregated frequency dynamics with battery and charger constraints, state-of-charge management, and fleet-availability profiles. Performance is assessed using standard frequency security metrics, including nadir, rate-of-change-of-frequency, overshoot, and settling time, across credible contingency scenarios and renewable generation conditions. Results indicate that both non-V2G modes and V2G-enabled operation can contribute meaningful primary response, with V2G providing the strongest and fastest support while respecting mobility and network limits. Sensitivity analyses show that the relative benefits depend on charging strategy, control parameters, and renewable output, highlighting design trade-offs between response magnitude, duration, and battery usage. Overall, heavy-duty EV fleets-when coordinated by appropriate charging and V2G controls-offer a viable resource for strengthening primary frequency control on the California grid and mitigating stability challenges associated with increasing renewable penetration.
comment: This work has been submitted to International Journal of Electrical Power & Energy System and is currently under consideration
Compensation of Coarse Quantization Effects on Channel Estimation and BER in Massive MIMO
Low-resolution quantization is essential to reduce implementation cost and power consumption in massive multiple-input multiple-output (MIMO) systems for 5G and 6G. While most existing studies assume perfect channel state information (CSI), we model the impact of coarse quantization noise on both channel estimation and data transmission, yielding a more realistic assessment of system performance under imperfect CSI conditions in the uplink. We develop a tight approximation for the bit-error ratio (BER) of uncoded M-QAM with zero-forcing detection, based on the linear minimum mean-square error (LMMSE) channel estimate. These analytical results enable compensation strategies that jointly optimize quantization resolution, transmit power, and pilot length across different numbers of users and base station antennas. We further demonstrate the applicability of the proposed framework through several design scenarios that highlight its effectiveness in optimizing system parameters and improving energy efficiency under quantization constraints. For example, in a 16-QAM system, extending the pilot sequence by 2.5 times and lowering transmit power by 0.5 dB enables a 3-bit quantized system to match the BER of the full-resolution case. The proposed framework offers a fast and accurate alternative to Monte Carlo simulations, enabling practical system optimization under realistic quantization constraints.
comment: 12 pages, submitted to IEEE Transactions
Competent Discrete Time Modeling For analogue controlled PWM Converter Considering State-Feedback
Ever since R.D.Middlebrook proposed the state space averaging notion. The small signal model has been widely used as a design tool to tune control parameters. As Moore's law is continuing and the AI chip's high demand for power consumption and dynamic response, the control bandwidth needs to be boosted. However, the average model has two basic assumptions: the low-frequency assumption, the small ripple assumption. In high-bandwidth design, these two assumptions are violated. In order to solve this, various methods have been proposed. This paper gives a comprehensive overview of the existing small signal model for PWM converters from the following perspectives: 1. model fidelity, 2. analytical tractability. 3. complexity of the derivation process and result 4.generality.
Closing the Loop: Motion Prediction Models beyond Open-Loop Benchmarks
Fueled by motion prediction competitions and benchmarks, recent years have seen the emergence of increasingly large learning based prediction models, many with millions of parameters, focused on improving open-loop prediction accuracy by mere centimeters. However, these benchmarks fail to assess whether such improvements translate to better performance when integrated into an autonomous driving stack. In this work, we systematically evaluate the interplay between state-of-the-art motion predictors and motion planners. Our results show that higher open-loop accuracy does not always correlate with better closed-loop driving behavior and that other factors, such as temporal consistency of predictions and planner compatibility, also play a critical role. Furthermore, we investigate downsized variants of these models, and, surprisingly, find that in some cases models with up to 86% fewer parameters yield comparable or even superior closed-loop driving performance. Our code is available at https://github.com/aumovio/pred2plan.
A constrained optimization approach to nonlinear system identification through simulation error minimization
This paper introduces a novel approach to system identification for nonlinear input-output models that minimizes the simulation error and frames the problem as a constrained optimization task. The proposed method addresses vanishing gradient issues, enabling faster convergence than traditional gradient-based techniques. We present an algorithm based on feedback linearization control of Lagrange multipliers and conduct a theoretical analysis of its performance. We prove that the algorithm converges to a local minimum, and it enhances computational efficiency by exploiting the problem's structure. Numerical experiments demonstrate that our approach outperforms gradient-based methods in both computational effort and estimation accuracy.
Data-fused MPC with Guarantees: Application to Flying Humanoid Robots
This paper introduces a Data-Fused Model Predictive Control (DFMPC) framework that combines physics-based models with data-driven representations of unknown dynamics. Leveraging Willems' Fundamental Lemma and an artificial equilibrium formulation, the method enables tracking of changing, potentially unreachable setpoints while explicitly handling measurement noise through slack variables and regularization. We provide guarantees of recursive feasibility and practical stability under input-output constraints for a specific class of reference signals. The approach is validated on the iRonCub flying humanoid robot, integrating analytical momentum models with data-driven turbine dynamics. Simulations show improved tracking and robustness compared to a purely model-based MPC, while maintaining real-time feasibility.
comment: This paper has been accepted for publication in IEEE Control Systems Letters (L-CSS)
Data-driven Implementations of Various Generalizations of Balanced Truncation
There exist two main approaches for non-intrusive implementations of approximate balanced truncation within the Loewner framework: the quadrature-based method [1] and the Alternating Direction Implicit (ADI)-based method [2]. Both approaches rely solely on samples of the transfer function to construct truncated balanced models, eliminating the need for access to the original model's statespace realization. Recently, the quadrature-based approach has been extended to various generalizations of balanced truncation, including positive-real balanced truncation, bounded-real balanced truncation, and balanced stochastic truncation. While this extension [3] is theoretically non-intrusive-meaning it does not require the original state-space realization-it depends on samples of spectral factorizations of the transfer function. Since practical methods for obtaining such samples are currently unavailable, this extension remains largely a theoretical contribution. In this work, we present a non-intrusive ADI-type framework for these generalized balanced truncation methods that requires only samples of the original transfer function for implementation.
The Phantom of Davis-Wielandt Shell: A Unified Framework for Graphical Stability Analysis of MIMO LTI Systems
This paper presents a unified framework based on Davis-Wielandt (DW) shell for graphical stability analysis of multi-input and multi-output linear time-invariant feedback systems. Connections between DW shells and various graphical representations, as well as gain and phase measures, are established through an intuitive geometric perspective. Within this framework, we map the relationships and relative conservatism among various separation conditions. A rotated scaled relative graph ($θ$-SRG) concept is proposed as a mixed gain-phase representation, from which a closed-loop stability criterion is derived and shown to be the least conservative among the existing 2-D graphical conditions for bi-component feedback loops. We also propose a reliable and generalizable algorithm for visualizing the $θ$-SRGs and include a system example to demonstrate the reduced conservatism of the proposed condition.
comment: 16 pages, 12 figures
Learning Robust and Correct Controllers Guided by Feasibility-Aware Signal Temporal Logic via BarrierNet
Control Barrier Functions (CBFs) have emerged as a powerful tool for enforcing safety in optimization-based controllers, and their integration with Signal Temporal Logic (STL) has enabled the specification-driven synthesis of complex robotic behaviors. However, existing CBF-STL approaches typically rely on fixed hyperparameters and myopic, per-time step optimization, which can lead to overly conservative behavior, infeasibility near tight input limits, and difficulty satisfying long-horizon STL tasks. To address these limitations, we propose a feasibility-aware learning framework that embeds trainable, time-varying High Order Control Barrier Functions (HOCBFs) into a differentiable Quadratic Program (dQP). Our approach provides a systematic procedure for constructing time-varying HOCBF constraints for a broad fragment of STL and introduces a unified robustness measure that jointly captures STL satisfaction, QP feasibility, and control-bound compliance. Three neural networks-InitNet, RefNet, and an extended BarrierNet-collaborate to generate reference inputs and adapt constraint-related hyperparameters automatically over time and across initial conditions, reducing conservativeness while maximizing robustness. The resulting controller achieves STL satisfaction with strictly feasible dQPs and requires no manual tuning. Simulation results demonstrate that the proposed framework maintains high STL robustness under tight input bounds and significantly outperforms fixed-parameter and non-adaptive baselines in complex environments.
comment: 16 pages, 11 figures
Meta-Reinforcement Learning for Building Energy Management System
The building sector is one of the largest contributors to global energy consumption. Improving its energy efficiency is essential for reducing operational costs and greenhouse gas emissions. Energy management systems (EMS) play a key role in monitoring and controlling building appliances efficiently and reliably. With the increasing integration of renewable energy, intelligent EMS solutions have received growing attention. Reinforcement learning (RL) has recently been explored for this purpose and shows strong potential. However, most RL-based EMS methods require a large number of training steps to learn effective control policies, especially when adapting to unseen buildings, which limits their practical deployment. This paper introduces MetaEMS, a meta-reinforcement learning framework for EMS. MetaEMS improves learning efficiency by transferring knowledge from previously solved tasks to new ones through group-level and building-level adaptation, enabling fast adaptation and effective control across diverse building environments. Experimental results demonstrate that MetaEMS adapts more rapidly to unseen buildings and consistently outperforms baseline methods across various scenarios.
comment: arXiv admin note: text overlap with arXiv:1909.10165 by other authors
Standards-Compliant DM-RS Allocation via Temporal Channel Prediction for Massive MIMO Systems
Reducing feedback overhead in beyond 5G networks is a critical challenge, as the growing number of antennas in modern massive MIMO systems substantially increases the channel state information (CSI) feedback demand in frequency division duplex (FDD) systems. To address this, extensive research has focused on CSI compression and prediction, with neural network-based approaches gaining momentum and being considered for integration into the 3GPP 5G-Advanced standards. While deep learning has been effectively applied to CSI-limited beamforming and handover optimization, reference signal allocation under such constraints remains surprisingly underexplored. To fill this gap, we introduce the concept of channel prediction-based reference signal allocation (CPRS), which jointly optimizes channel prediction and DM-RS allocation to improve data throughput without requiring CSI feedback. We further propose a standards-compliant ViViT/CNN-based architecture that implements CPRS by treating evolving CSI matrices as sequential image-like data, enabling efficient and adaptive transmission in dynamic environments. Simulation results using ray-tracing channel data generated in NVIDIA Sionna validate the proposed method, showing up to 36.60% throughput improvement over benchmark strategies.
On the Impact of Voltage Unbalance on Distribution Locational Marginal Prices
Finding clear economic signals for distribution-network operation and expansion is increasingly important as single-phase loads and distributed energy resources escalate. These devices create phase-to-phase imbalances that manifest as voltage unbalance, a power quality issue that accelerates insulation aging in machines and increases network losses, thereby raising costs for operators and consumers. Traditional grid codes address unbalance via disparate hard limits on various indices thresholds that differ across standards, offer no dynamic economic incentive and undermine optimality. This paper proposes instead to treat voltage unbalance as a `soft limit' by adding penalty terms to grid operation costs within a three-phase optimal power flow to reflect the cost of the decrease in lifetime of assets due to being subject to voltage unbalance. This unified approach yields dynamic economic signals unbalance-aware Distribution Locational Marginal Prices (DLMP) that reflect the cost of power quality deviations. A novel mathematical decomposition of DLMP is developed, isolating the energy, loss, congestion, and unbalance components. Case studies conducted on two benchmark networks demonstrate the effectiveness and practical value of the proposed method. The results indicate that unbalance penalties reshape nodal prices, produce unexpected phase-level effects, and even allow scenarios where added load reduces unbalance and lowers costs, while providing planners and market designers with actionable insights to balance investment, operation, and power quality in modern distribution systems.
EDMD-Based Robust Observer Synthesis for Nonlinear Systems
This paper presents a data-driven Koopman operator-based approach for designing robust state observers for nonlinear systems. Based on a finite-dimensional surrogate of the Koopman generator, identified via an extended dynamic mode decomposition (EDMD) procedure, a tractable formulation of the observer design problem is enabled on the data-driven model with conic uncertainties. The resulting problem is cast as a semidefinite program (SDP) with linear matrix inequalities (LMIs), guaranteeing exponential convergence of the observer with a predetermined rate in a probabilistic sense. The approach bridges the gap between statistical error tolerance and observer convergence certification, and enables an explicit use of linear systems theory for nonlinear observation in a data-driven framework. Numerical studies demonstrate the effectiveness and flexibility of the proposed method.
comment: 9 pages, 2 figures. Submitted to Computers and Chemical Engineering, 2025
Robotics
NL2SpaTiaL: Generating Geometric Spatio-Temporal Logic Specifications from Natural Language for Manipulation Tasks
Spatio-Temporal Logic (SpaTiaL) offers a principled formalism for expressing geometric spatial requirements-an essential component of robotic manipulation, where object locations, neighborhood relations, pose constraints, and interactions directly determine task success. Yet prior works have largely relied on standard temporal logic (TL), which models only robot trajectories and overlooks object-level interactions. Existing datasets built from randomly generated TL formulas paired with natural-language descriptions therefore cover temporal operators but fail to represent the layered spatial relations that manipulation tasks depend on. To address this gap, we introduce a dataset generation framework that synthesizes SpaTiaL specifications and converts them into natural-language descriptions through a deterministic, semantics-preserving back-translation procedure. This pipeline produces the NL2SpaTiaL dataset, aligning natural language with multi-level spatial relations and temporal objectives to reflect the compositional structure of manipulation tasks. Building on this foundation, we propose a translation-verification framework equipped with a language-based semantic checker that ensures the generated SpaTiaL formulas faithfully encode the semantics specified by the input description. Experiments across a suite of manipulation tasks show that SpaTiaL-based representations yield more interpretable, verifiable, and compositional grounding for instruction following. Project website: https://sites.google.com/view/nl2spatial
RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics
Spatial tracing, as a fundamental embodied interaction ability for robots, is inherently challenging as it requires multi-step metric-grounded reasoning compounded with complex spatial referring and real-world metric measurement. However, existing methods struggle with this compositional task. To this end, we propose RoboTracer, a 3D-aware VLM that first achieves both 3D spatial referring and measuring via a universal spatial encoder and a regression-supervised decoder to enhance scale awareness during supervised fine-tuning (SFT). Moreover, RoboTracer advances multi-step metric-grounded reasoning via reinforcement fine-tuning (RFT) with metric-sensitive process rewards, supervising key intermediate perceptual cues to accurately generate spatial traces. To support SFT and RFT training, we introduce TraceSpatial, a large-scale dataset of 30M QA pairs, spanning outdoor/indoor/tabletop scenes and supporting complex reasoning processes (up to 9 steps). We further present TraceSpatial-Bench, a challenging benchmark filling the gap to evaluate spatial tracing. Experimental results show that RoboTracer surpasses baselines in spatial understanding, measuring, and referring, with an average success rate of 79.1%, and also achieves SOTA performance on TraceSpatial-Bench by a large margin, exceeding Gemini-2.5-Pro by 36% accuracy. Notably, RoboTracer can be integrated with various control policies to execute long-horizon, dynamic tasks across diverse robots (UR5, G1 humanoid) in cluttered real-world scenes.
comment: Project page: https://zhoues.github.io/RoboTracer
World Models Can Leverage Human Videos for Dexterous Manipulation
Dexterous manipulation is challenging because it requires understanding how subtle hand motion influences the environment through contact with objects. We introduce DexWM, a Dexterous Manipulation World Model that predicts the next latent state of the environment conditioned on past states and dexterous actions. To overcome the scarcity of dexterous manipulation datasets, DexWM is trained on over 900 hours of human and non-dexterous robot videos. To enable fine-grained dexterity, we find that predicting visual features alone is insufficient; therefore, we introduce an auxiliary hand consistency loss that enforces accurate hand configurations. DexWM outperforms prior world models conditioned on text, navigation, and full-body actions, achieving more accurate predictions of future states. DexWM also demonstrates strong zero-shot generalization to unseen manipulation skills when deployed on a Franka Panda arm equipped with an Allegro gripper, outperforming Diffusion Policy by over 50% on average in grasping, placing, and reaching tasks.
MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning
Current Vision-Language-Action (VLA) paradigms in autonomous driving primarily rely on Imitation Learning (IL), which introduces inherent challenges such as distribution shift and causal confusion. Online Reinforcement Learning offers a promising pathway to address these issues through trial-and-error learning. However, applying online reinforcement learning to VLA models in autonomous driving is hindered by inefficient exploration in continuous action spaces. To overcome this limitation, we propose MindDrive, a VLA framework comprising a large language model (LLM) with two distinct sets of LoRA parameters. The one LLM serves as a Decision Expert for scenario reasoning and driving decision-making, while the other acts as an Action Expert that dynamically maps linguistic decisions into feasible trajectories. By feeding trajectory-level rewards back into the reasoning space, MindDrive enables trial-and-error learning over a finite set of discrete linguistic driving decisions, instead of operating directly in a continuous action space. This approach effectively balances optimal decision-making in complex scenarios, human-like driving behavior, and efficient exploration in online reinforcement learning. MindDrive achieves strong closed-loop performance on the challenging Bench2Drive benchmark, with a Driving Score (DS) of 78.04 and a Success Rate (SR) of 55.09%. To the best of our knowledge, this is the first work to demonstrate the effectiveness of online reinforcement learning for the VLA model in autonomous driving.
comment: 16 pages, 12 figures, 6 tables; Project Page: https://xiaomi-mlab.github.io/MindDrive/
Near-Field Perception for Safety Enhancement of Autonomous Mobile Robots in Manufacturing Environments
Near-field perception is essential for the safe operation of autonomous mobile robots (AMRs) in manufacturing environments. Conventional ranging sensors such as light detection and ranging (LiDAR) and ultrasonic devices provide broad situational awareness but often fail to detect small objects near the robot base. To address this limitation, this paper presents a three-tier near-field perception framework. The first approach employs light-discontinuity detection, which projects a laser stripe across the near-field zone and identifies interruptions in the stripe to perform fast, binary cutoff sensing for obstacle presence. The second approach utilizes light-displacement measurement to estimate object height by analyzing the geometric displacement of a projected stripe in the camera image, which provides quantitative obstacle height information with minimal computational overhead. The third approach employs a computer vision-based object detection model on embedded AI hardware to classify objects, enabling semantic perception and context-aware safety decisions. All methods are implemented on a Raspberry Pi 5 system, achieving real-time performance at 25 or 50 frames per second. Experimental evaluation and comparative analysis demonstrate that the proposed hierarchy balances precision, computation, and cost, thereby providing a scalable perception solution for enabling safe operations of AMRs in manufacturing environments.
comment: Submitted to the 54th SME North American Manufacturing Research Conference (NAMRC 54)
Reinforcement Learning based 6-DoF Maneuvers for Microgravity Intravehicular Docking: A Simulation Study with Int-Ball2 in ISS-JEM
Autonomous free-flyers play a critical role in intravehicular tasks aboard the International Space Station (ISS), where their precise docking under sensing noise, small actuation mismatches, and environmental variability remains a nontrivial challenge. This work presents a reinforcement learning (RL) framework for six-degree-of-freedom (6-DoF) docking of JAXA's Int-Ball2 robot inside a high-fidelity Isaac Sim model of the Japanese Experiment Module (JEM). Using Proximal Policy Optimization (PPO), we train and evaluate controllers under domain-randomized dynamics and bounded observation noise, while explicitly modeling propeller drag-torque effects and polarity structure. This enables a controlled study of how Int-Ball2's propulsion physics influence RL-based docking performance in constrained microgravity interiors. The learned policy achieves stable and reliable docking across varied conditions and lays the groundwork for future extensions pertaining to Int-Ball2 in collision-aware navigation, safe RL, propulsion-accurate sim-to-real transfer, and vision-based end-to-end docking.
comment: Presented at AI4OPA Workshop at the International Conference on Space Robotics (iSpaRo) 2025 at Sendai, Japan
Evaluating the Navigation Capabilities of a Modified COAST Guidewire Robot in an Anatomical Phantom Model
To address the issues that arise due to the manual navigation of guidewires in endovascular interventions, research in medical robotics has taken a strong interest in developing robotically steerable guidewires, which offer the possibility of enhanced maneuverability and navigation, as the tip of the guidewire can be actively steered. The COaxially Aligned STeerable (COAST) guidewire robot has the ability to generate a wide variety of motions including bending motion with different bending lengths, follow-the-leader motion, and feedforward motion. In our past studies, we have explored different designs of the COAST guidewire robot and developed modeling, control, and sensing strategies for the COAST guidewire robot. In this study, the performance of a modified COAST guidewire robot is evaluated by conducting navigation experiments in an anatomical phantom model with pulsatile flow. The modified COAST guidewire robot is a simplified version of the COAST guidewire robot and consists of two tubes as opposed to three tubes. Through this study, we demonstrate the effectiveness of the modified COAST guidewire robot in navigating the tortuous phantom vasculature.
comment: Presented at the 14th Conference on New Technologies for Computer and Robot Assisted Surgery (CRAS 2025)
Universal Dexterous Functional Grasping via Demonstration-Editing Reinforcement Learning
Reinforcement learning (RL) has achieved great success in dexterous grasping, significantly improving grasp performance and generalization from simulation to the real world. However, fine-grained functional grasping, which is essential for downstream manipulation tasks, remains underexplored and faces several challenges: the complexity of specifying goals and reward functions for functional grasps across diverse objects, the difficulty of multi-task RL exploration, and the challenge of sim-to-real transfer. In this work, we propose DemoFunGrasp for universal dexterous functional grasping. We factorize functional grasping conditions into two complementary components - grasping style and affordance - and integrate them into an RL framework that can learn to grasp any object with any functional grasping condition. To address the multi-task optimization challenge, we leverage a single grasping demonstration and reformulate the RL problem as one-step demonstration editing, substantially enhancing sample efficiency and performance. Experimental results in both simulation and the real world show that DemoFunGrasp generalizes to unseen combinations of objects, affordances, and grasping styles, outperforming baselines in both success rate and functional grasping accuracy. In addition to strong sim-to-real capability, by incorporating a vision-language model (VLM) for planning, our system achieves autonomous instruction-following grasp execution.
comment: 19 pages
Fast Policy Learning for 6-DOF Position Control of Underwater Vehicles
Autonomous Underwater Vehicles (AUVs) require reliable six-degree-of-freedom (6-DOF) position control to operate effectively in complex and dynamic marine environments. Traditional controllers are effective under nominal conditions but exhibit degraded performance when faced with unmodeled dynamics or environmental disturbances. Reinforcement learning (RL) provides a powerful alternative but training is typically slow and sim-to-real transfer remains challenging. This work introduces a GPU-accelerated RL training pipeline built in JAX and MuJoCo-XLA (MJX). By jointly JIT-compiling large-scale parallel physics simulation and learning updates, we achieve training times of under two minutes.Through systematic evaluation of multiple RL algorithms, we show robust 6-DOF trajectory tracking and effective disturbance rejection in real underwater experiments, with policies transferred zero-shot from simulation. Our results provide the first explicit real-world demonstration of RL-based AUV position control across all six degrees of freedom.
Control of a Twin Rotor using Twin Delayed Deep Deterministic Policy Gradient (TD3)
This paper proposes a reinforcement learning (RL) framework for controlling and stabilizing the Twin Rotor Aerodynamic System (TRAS) at specific pitch and azimuth angles and tracking a given trajectory. The complex dynamics and non-linear characteristics of the TRAS make it challenging to control using traditional control algorithms. However, recent developments in RL have attracted interest due to their potential applications in the control of multirotors. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm was used in this paper to train the RL agent. This algorithm is used for environments with continuous state and action spaces, similar to the TRAS, as it does not require a model of the system. The simulation results illustrated the effectiveness of the RL control method. Next, external disturbances in the form of wind disturbances were used to test the controller's effectiveness compared to conventional PID controllers. Lastly, experiments on a laboratory setup were carried out to confirm the controller's effectiveness in real-world applications.
comment: This is the Author Accepted Manuscript version of a paper accepted for publication. The final published version is available via IEEE Xplore
Humanoid Robot Running Through Random Stepping Stones and Jumping Over Obstacles: Step Adaptation Using Spring-Mass Trajectories
This study proposes a step adaptation framework for running through spring-mass trajectories and deadbeat control gain libraries. It includes four main parts: (1) Automatic spring-mass trajectory library generation; (2) Deadbeat control gain library generation through an actively controlled template model that resembles the whole-body dynamics well; (3) Trajectory selection policy development for step adaptation; (4) Mapping spring-mass trajectories to a humanoid model through a whole-body control (WBC) framework also accounting for closed-kinematic chain systems, self collisions, and reactive limb swinging. We show the inclusiveness and the robustness of the proposed framework through various challenging and agile behaviors such as running through randomly generated stepping stones, jumping over random obstacles, performing slalom motions, changing the running direction suddenly with a random leg, and rejecting significant disturbances and uncertainties through the MuJoCo physics simulator. We also perform additional simulations under a comprehensive set of uncertainties and noise to better justify the proposed method's robustness to real-world challenges, including signal noise, imprecision, modeling errors, and delays. All the aforementioned behaviors are performed with a single library and the same set of WBC control parameters without additional tuning. The spring-mass and the deadbeat control gain library are automatically computed in 4.5 seconds in total for 315 different trajectories.
comment: Accepted for publication in Biomimetic Intelligence and Robotics. Supplemental video: https://youtu.be/HlAg2nbNct4
Intrinsic-Motivation Multi-Robot Social Formation Navigation with Coordinated Exploration
This paper investigates the application of reinforcement learning (RL) to multi-robot social formation navigation, a critical capability for enabling seamless human-robot coexistence. While RL offers a promising paradigm, the inherent unpredictability and often uncooperative dynamics of pedestrian behavior pose substantial challenges, particularly concerning the efficiency of coordinated exploration among robots. To address this, we propose a novel coordinated-exploration multi-robot RL algorithm introducing an intrinsic motivation exploration. Its core component is a self-learning intrinsic reward mechanism designed to collectively alleviate policy conservatism. Moreover, this algorithm incorporates a dual-sampling mode within the centralized training and decentralized execution framework to enhance the representation of both the navigation policy and the intrinsic reward, leveraging a two-time-scale update rule to decouple parameter updates. Empirical results on social formation navigation benchmarks demonstrate the proposed algorithm's superior performance over existing state-of-the-art methods across crucial metrics. Our code and video demos are available at: https://github.com/czxhunzi/CEMRRL.
SAMAY: System for Acoustic Measurement and Analysis
This paper describes an automatic bird call recording system called SAMAY, which is developed to study bird species by creating a database of large amounts of bird acoustic data. By analysing the recorded bird call data, the system can also be used for automatic classification of bird species, monitoring bird populations and analysing the impact of environmental changes. The system is driven through a powerful STM32F407 series microcontroller, supports 4 microphones, is equipped with 128 GB of storage capacity, and is powered by a 10400 mAh battery pack interfaced with a solar charger. In addition, the device is user-configurable over USB and Wi-Fi during runtime, ensuring user-friendly operation during field deployment.
Lightweight Dynamic Modeling of Cable-Driven Continuum Robots Based on Actuation-Space Energy Formulation
Cable-driven continuum robots (CDCRs) require accurate, real-time dynamic models for high-speed dynamics prediction or model-based control, making such capability an urgent need. In this paper, we propose the Lightweight Actuation-Space Energy Modeling (LASEM) framework for CDCRs, which formulates actuation potential energy directly in actuation space to enable lightweight yet accurate dynamic modeling. Through a unified variational derivation, the governing dynamics reduce to a single partial differential equation (PDE), requiring only the Euler moment balance while implicitly incorporating the Newton force balance. By also avoiding explicit computation of cable-backbone contact forces, the formulation simplifies the model structure and improves computational efficiency while preserving geometric accuracy and physical consistency. Importantly, the proposed framework for dynamic modeling natively supports both force-input and displacement-input actuation modes, a capability seldom achieved in existing dynamic formulations. Leveraging this lightweight structure, a Galerkin space-time modal discretization with analytical time-domain derivatives of the reduced state further enables an average 62.3% computational speedup over state-of-the-art real-time dynamic modeling approaches.
Post-Training and Test-Time Scaling of Generative Agent Behavior Models for Interactive Autonomous Driving
Learning interactive motion behaviors among multiple agents is a core challenge in autonomous driving. While imitation learning models generate realistic trajectories, they often inherit biases from datasets dominated by safe demonstrations, limiting robustness in safety-critical cases. Moreover, most studies rely on open-loop evaluation, overlooking compounding errors in closed-loop execution. We address these limitations with two complementary strategies. First, we propose Group Relative Behavior Optimization (GRBO), a reinforcement learning post-training method that fine-tunes pretrained behavior models via group relative advantage maximization with human regularization. Using only 10% of the training dataset, GRBO improves safety performance by over 40% while preserving behavioral realism. Second, we introduce Warm-K, a warm-started Top-K sampling strategy that balances consistency and diversity in motion selection. Our Warm-K method-based test-time scaling enhances behavioral consistency and reactivity at test time without retraining, mitigating covariate shift and reducing performance discrepancies. Demo videos are available in the supplementary material.
comment: 11 pages, 5 figures
A Unified Framework for Automated Assembly Sequence and Production Line Planning using Graph-based Optimization
This paper presents PyCAALP (Python-based Computer-Aided Assembly Line Planning), a framework for automated Assembly Sequence Planning (ASP) and Production Line Planning (PLP), employing a graph-based approach to model components and joints within production modules. The framework integrates kinematic boundary conditions, such as potential part collisions, to guarantee the feasibility of automated assembly planning. The developed algorithm computes all feasible production sequences, integrating modules for detecting spatial relationships and formulating geometric constraints. The algorithm incorporates additional attributes, including handling feasibility, tolerance matching, and joint compatibility, to manage the high combinatorial complexity inherent in assembly sequence generation. Heuristics, such as Single-Piece Flow assembly and geometrical constraint enforcement, are utilized to further refine the solution space, facilitating more efficient planning for complex assemblies. The PLP stage is formulated as a Mixed-Integer Program (MIP), balancing the total times of a fixed number of manufacturing stations. While some complexity reduction techniques may sacrifice optimality, they significantly reduce the MIPs computational time. Furthermore, the framework enables customization of engineering constraints and supports a flexible trade-off between ASP and PLP. The open-source nature of the framework, available at https://github.com/TUM-utg/PyCAALP, promotes further collaboration and adoption in both industrial and production research applications.
comment: Code available at https://github.com/TUM-utg/PyCAALP (repository will be made public prior to publication)
Multi-directional Safe Rectangle Corridor-Based MPC for Nonholonomic Robots Navigation in Cluttered Environment
Autonomous Mobile Robots (AMRs) have become indispensable in industrial applications due to their operational flexibility and efficiency. Navigation serves as a crucial technical foundation for accomplishing complex tasks. However, navigating AMRs in dense, cluttered, and semi-structured environments remains challenging, primarily due to nonholonomic vehicle dynamics, interactions with mixed static/dynamic obstacles, and the non-convex constrained nature of such operational spaces. To solve these problems, this paper proposes an Improved Sequential Model Predictive Control (ISMPC) navigation framework that systematically reformulates navigation tasks as sequential switched optimal control problems. The framework addresses the aforementioned challenges through two key innovations: 1) Implementation of a Multi-Directional Safety Rectangular Corridor (MDSRC) algorithm, which encodes the free space through rectangular convex regions to avoid collision with static obstacles, eliminating redundant computational burdens and accelerating solver convergence; 2) A sequential MPC navigation framework that integrates corridor constraints with barrier function constraints is proposed to achieve static and dynamic obstacle avoidance. The ISMPC navigation framework enables direct velocity generation for AMRs, simplifying traditional navigation algorithm architectures. Comparative experiments demonstrate the framework's superiority in free-space utilization ( an increase of 41.05$\%$ in the average corridor area) while maintaining real-time computational performance (average corridors generation latency of 3 ms).
comment: 9 pages, 11 figures, conference paper for the 2025 International Conference on Advanced Robotics and Mechatronics (ICARM), accepted
Differentiable Material Point Method for the Control of Deformable Objects
Controlling the deformation of flexible objects is challenging due to their non-linear dynamics and high-dimensional configuration space. This work presents a differentiable Material Point Method (MPM) simulator targeted at control applications. We exploit the differentiability of the simulator to optimize a control trajectory in an active damping problem for a hyperelastic rope. The simulator effectively minimizes the kinetic energy of the rope around 2$\times$ faster than a baseline MPPI method and to a 20% lower energy level, while using about 3% of the computation time.
comment: 7 Pages, 4 Figures, 1 Table
ALBATROSS: A robotised system for high-throughput electrolyte screening via automated electrolyte formulation, coin-cell fabrication, and electrochemical evaluation
As battery technologies advance toward higher stability and energy density, the need for extensive cell-level testing across various component configurations becomes critical. To evaluate performance and understand the operating principles of batteries in laboratory scale, fabrication and evaluation of coin cells are essential processes. However, the conventional coin-cell assembly and testing processes require significant time and labor from researchers, posing challenges to high-throughput screening research. In this study, we introduce an Automated Li-ion BAttery Testing RObot SyStem (ALBATROSS), an automated system capable of electrolyte formulation, coin-cell assembly, and electrochemical evaluation. The system, integrated within a argon-filled glovebox, enables fully automated assembly and testing of up to 48 cells without researcher intervention. By incorporating custom-designed robot gripper and 3D-printed structures optimized for precise cell handling, ALBATROSS achieved high assembly reliability, yielding a relative standard deviation (RSD) of less than 1.2% in discharge capacity and a standard deviation of less than 3 Ω in EIS measurements for NCM811||Li half cells. Owing to its high reliability and automation capability, ALBATROSS allows for the acquisition of high-quality coin-cell datasets, which are expected to accelerate the development of next-generation electrolytes.
Efficient Generation of Smooth Paths with Curvature Guarantees by Mollification
Most path following and trajectory tracking algorithms in mobile robotics require the desired path or trajectory to be defined by at least twice continuously differentiable functions to guarantee key properties such as global convergence, especially for nonholonomic robots like unicycles with speed constraints. Consequently, these algorithms typically exclude continuous but non-differentiable paths, such as piecewise functions. Despite this exclusion, such paths provide convenient high-level inputs for describing robot missions or behavior. While techniques such as spline interpolation or optimization-based methods are commonly used to smooth non-differentiable paths or create feasible ones from sequences of waypoints, they either can produce unnecessarily complex trajectories or are computationally expensive. In this work, we present a method to regularize non-differentiable functions and generate feasible paths through mollification. Specifically, we approximate an arbitrary path with a differentiable function that can converge to it with arbitrary precision. Additionally, we provide a systematic method for bounding the curvature of generated paths, which we demonstrate by applying it to paths resulting from linking a sequence of waypoints with segments. The proposed approach is computationally efficient, enabling real-time implementation on microcontrollers and compatibility with standard trajectory tracking and path following algorithms.
MMDrive: Interactive Scene Understanding Beyond Vision with Multi-representational Fusion
Vision-language models enable the understanding and reasoning of complex traffic scenarios through multi-source information fusion, establishing it as a core technology for autonomous driving. However, existing vision-language models are constrained by the image understanding paradigm in 2D plane, which restricts their capability to perceive 3D spatial information and perform deep semantic fusion, resulting in suboptimal performance in complex autonomous driving environments. This study proposes MMDrive, an multimodal vision-language model framework that extends traditional image understanding to a generalized 3D scene understanding framework. MMDrive incorporates three complementary modalities, including occupancy maps, LiDAR point clouds, and textual scene descriptions. To this end, it introduces two novel components for adaptive cross-modal fusion and key information extraction. Specifically, the Text-oriented Multimodal Modulator dynamically weights the contributions of each modality based on the semantic cues in the question, guiding context-aware feature integration. The Cross-Modal Abstractor employs learnable abstract tokens to generate compact, cross-modal summaries that highlight key regions and essential semantics. Comprehensive evaluations on the DriveLM and NuScenes-QA benchmarks demonstrate that MMDrive achieves significant performance gains over existing vision-language models for autonomous driving, with a BLEU-4 score of 54.56 and METEOR of 41.78 on DriveLM, and an accuracy score of 62.7% on NuScenes-QA. MMDrive effectively breaks the traditional image-only understanding barrier, enabling robust multimodal reasoning in complex driving environments and providing a new foundation for interpretable autonomous driving scene understanding.
Iterative Tuning of Nonlinear Model Predictive Control for Robotic Manufacturing Tasks
Manufacturing processes are often perturbed by drifts in the environment and wear in the system, requiring control re-tuning even in the presence of repetitive operations. This paper presents an iterative learning framework for automatic tuning of Nonlinear Model Predictive Control (NMPC) weighting matrices based on task-level performance feedback. Inspired by norm-optimal Iterative Learning Control (ILC), the proposed method adaptively adjusts NMPC weights Q and R across task repetitions to minimize key performance indicators (KPIs) related to tracking accuracy, control effort, and saturation. Unlike gradient-based approaches that require differentiating through the NMPC solver, we construct an empirical sensitivity matrix, enabling structured weight updates without analytic derivatives. The framework is validated through simulation on a UR10e robot performing carbon fiber winding on a tetrahedral core. Results demonstrate that the proposed approach converges to near-optimal tracking performance (RMSE within 0.3% of offline Bayesian Optimization (BO)) in just 4 online repetitions, compared to 100 offline evaluations required by BO algorithm. The method offers a practical solution for adaptive NMPC tuning in repetitive robotic tasks, combining the precision of carefully optimized controllers with the flexibility of online adaptation.
START: Traversing Sparse Footholds with Terrain Reconstruction
Traversing terrains with sparse footholds like legged animals presents a promising yet challenging task for quadruped robots, as it requires precise environmental perception and agile control to secure safe foot placement while maintaining dynamic stability. Model-based hierarchical controllers excel in laboratory settings, but suffer from limited generalization and overly conservative behaviors. End-to-end learning-based approaches unlock greater flexibility and adaptability, but existing state-of-the-art methods either rely on heightmaps that introduce noise and complex, costly pipelines, or implicitly infer terrain features from egocentric depth images, often missing accurate critical geometric cues and leading to inefficient learning and rigid gaits. To overcome these limitations, we propose START, a single-stage learning framework that enables agile, stable locomotion on highly sparse and randomized footholds. START leverages only low-cost onboard vision and proprioception to accurately reconstruct local terrain heightmap, providing an explicit intermediate representation to convey essential features relevant to sparse foothold regions. This supports comprehensive environmental understanding and precise terrain assessment, reducing exploration cost and accelerating skill acquisition. Experimental results demonstrate that START achieves zero-shot transfer across diverse real-world scenarios, showcasing superior adaptability, precise foothold placement, and robust locomotion.
OXE-AugE: A Large-Scale Robot Augmentation of OXE for Scaling Cross-Embodiment Policy Learning
Large and diverse datasets are needed for training generalist robot policies that have potential to control a variety of robot embodiments -- robot arm and gripper combinations -- across diverse tasks and environments. As re-collecting demonstrations and retraining for each new hardware platform are prohibitively costly, we show that existing robot data can be augmented for transfer and generalization. The Open X-Embodiment (OXE) dataset, which aggregates demonstrations from over 60 robot datasets, has been widely used as the foundation for training generalist policies. However, it is highly imbalanced: the top four robot types account for over 85\% of its real data, which risks overfitting to robot-scene combinations. We present AugE-Toolkit, a scalable robot augmentation pipeline, and OXE-AugE, a high-quality open-source dataset that augments OXE with 9 different robot embodiments. OXE-AugE provides over 4.4 million trajectories, more than triple the size of the original OXE. We conduct a systematic study of how scaling robot augmentation impacts cross-embodiment learning. Results suggest that augmenting datasets with diverse arms and grippers improves policy performance not only on the augmented robots, but also on unseen robots and even the original robots under distribution shifts. In physical experiments, we demonstrate that state-of-the-art generalist policies such as OpenVLA and $π_0$ benefit from fine-tuning on OXE-AugE, improving success rates by 24-45% on previously unseen robot-gripper combinations across four real-world manipulation tasks. Project website: https://OXE-AugE.github.io/.
Sequence of Expert: Boosting Imitation Planners for Autonomous Driving through Temporal Alternation
Imitation learning (IL) has emerged as a central paradigm in autonomous driving. While IL excels in matching expert behavior in open-loop settings by minimizing per-step prediction errors, its performance degrades unexpectedly in closed-loop due to the gradual accumulation of small, often imperceptible errors over time.Over successive planning cycles, these errors compound, potentially resulting in severe failures.Current research efforts predominantly rely on increasingly sophisticated network architectures or high-fidelity training datasets to enhance the robustness of IL planners against error accumulation, focusing on the state-level robustness at a single time point. However, autonomous driving is inherently a continuous-time process, and leveraging the temporal scale to enhance robustness may provide a new perspective for addressing this issue.To this end, we propose a method termed Sequence of Experts (SoE), a temporal alternation policy that enhances closed-loop performance without increasing model size or data requirements. Our experiments on large-scale autonomous driving benchmarks nuPlan demonstrate that SoE method consistently and significantly improves the performance of all the evaluated models, and achieves state-of-the-art performance.This module may provide a key and widely applicable support for improving the training efficiency of autonomous driving models.
PvP: Data-Efficient Humanoid Robot Learning with Proprioceptive-Privileged Contrastive Representations
Achieving efficient and robust whole-body control (WBC) is essential for enabling humanoid robots to perform complex tasks in dynamic environments. Despite the success of reinforcement learning (RL) in this domain, its sample inefficiency remains a significant challenge due to the intricate dynamics and partial observability of humanoid robots. To address this limitation, we propose PvP, a Proprioceptive-Privileged contrastive learning framework that leverages the intrinsic complementarity between proprioceptive and privileged states. PvP learns compact and task-relevant latent representations without requiring hand-crafted data augmentations, enabling faster and more stable policy learning. To support systematic evaluation, we develop SRL4Humanoid, the first unified and modular framework that provides high-quality implementations of representative state representation learning (SRL) methods for humanoid robot learning. Extensive experiments on the LimX Oli robot across velocity tracking and motion imitation tasks demonstrate that PvP significantly improves sample efficiency and final performance compared to baseline SRL methods. Our study further provides practical insights into integrating SRL with RL for humanoid WBC, offering valuable guidance for data-efficient humanoid robot learning.
comment: 13 pages, 12 figures
Multi-Robot Motion Planning from Vision and Language using Heat-Inspired Diffusion
Diffusion models have recently emerged as powerful tools for robot motion planning by capturing the multi-modal distribution of feasible trajectories. However, their extension to multi-robot settings with flexible, language-conditioned task specifications remains limited. Furthermore, current diffusion-based approaches incur high computational cost during inference and struggle with generalization because they require explicit construction of environment representations and lack mechanisms for reasoning about geometric reachability. To address these limitations, we present Language-Conditioned Heat-Inspired Diffusion (LCHD), an end-to-end vision-based framework that generates language-conditioned, collision-free trajectories. LCHD integrates CLIP-based semantic priors with a collision-avoiding diffusion kernel serving as a physical inductive bias that enables the planner to interpret language commands strictly within the reachable workspace. This naturally handles out-of-distribution scenarios -- in terms of reachability -- by guiding robots toward accessible alternatives that match the semantic intent, while eliminating the need for explicit obstacle information at inference time. Extensive evaluations on diverse real-world-inspired maps, along with real-robot experiments, show that LCHD consistently outperforms prior diffusion-based planners in success rate, while reducing planning latency.
Spatial-Aware VLA Pretraining through Visual-Physical Alignment from Human Videos
Vision-Language-Action (VLA) models provide a promising paradigm for robot learning by integrating visual perception with language-guided policy learning. However, most existing approaches rely on 2D visual inputs to perform actions in 3D physical environments, creating a significant gap between perception and action grounding. To bridge this gap, we propose a Spatial-Aware VLA Pretraining paradigm that performs explicit alignment between visual space and physical space during pretraining, enabling models to acquire 3D spatial understanding before robot policy learning. Starting from pretrained vision-language models, we leverage large-scale human demonstration videos to extract 3D visual and 3D action annotations, forming a new source of supervision that aligns 2D visual observations with 3D spatial reasoning. We instantiate this paradigm with VIPA-VLA, a dual-encoder architecture that incorporates a 3D visual encoder to augment semantic visual representations with 3D-aware features. When adapted to downstream robot tasks, VIPA-VLA achieves significantly improved grounding between 2D vision and 3D action, resulting in more robust and generalizable robotic policies.
Motus: A Unified Latent Action World Model
While a general embodied agent must function as a unified system, current methods are built on isolated models for understanding, world modeling, and control. This fragmentation prevents unifying multimodal generative capabilities and hinders learning from large-scale, heterogeneous data. In this paper, we propose Motus, a unified latent action world model that leverages existing general pretrained models and rich, sharable motion information. Motus introduces a Mixture-of-Transformer (MoT) architecture to integrate three experts (i.e., understanding, video generation, and action) and adopts a UniDiffuser-style scheduler to enable flexible switching between different modeling modes (i.e., world models, vision-language-action models, inverse dynamics models, video generation models, and video-action joint prediction models). Motus further leverages the optical flow to learn latent actions and adopts a recipe with three-phase training pipeline and six-layer data pyramid, thereby extracting pixel-level "delta action" and enabling large-scale action pretraining. Experiments show that Motus achieves superior performance against state-of-the-art methods in both simulation (a +15% improvement over X-VLA and a +45% improvement over Pi0.5) and real-world scenarios(improved by +11~48%), demonstrating unified modeling of all functionalities and priors significantly benefits downstream robotic tasks.
K-VARK: Kernelized Variance-Aware Residual Kalman Filter for Sensorless Force Estimation in Collaborative Robots
Reliable estimation of contact forces is crucial for ensuring safe and precise interaction of robots with unstructured environments. However, accurate sensorless force estimation remains challenging due to inherent modeling errors and complex residual dynamics and friction. To address this challenge, in this paper, we propose K-VARK (Kernelized Variance-Aware Residual Kalman filter), a novel approach that integrates a kernelized, probabilistic model of joint residual torques into an adaptive Kalman filter framework. Through Kernelized Movement Primitives trained on optimized excitation trajectories, K-VARK captures both the predictive mean and input-dependent heteroscedastic variance of residual torques, reflecting data variability and distance-to-training effects. These statistics inform a variance-aware virtual measurement update by augmenting the measurement noise covariance, while the process noise covariance adapts online via variational Bayesian optimization to handle dynamic disturbances. Experimental validation on a 6-DoF collaborative manipulator demonstrates that K-VARK achieves over 20% reduction in RMSE compared to state-of-the-art sensorless force estimation methods, yielding robust and accurate external force/torque estimation suitable for advanced tasks such as polishing and assembly.
Learning Terrain Aware Bipedal Locomotion via Reduced Dimensional Perceptual Representations
This work introduces a hierarchical strategy for terrain-aware bipedal locomotion that integrates reduced-dimensional perceptual representations to enhance reinforcement learning (RL)-based high-level (HL) policies for real-time gait generation. Unlike end-to-end approaches, our framework leverages latent terrain encodings via a Convolutional Variational Autoencoder (CNN-VAE) alongside reduced-order robot dynamics, optimizing the locomotion decision process with a compact state. We systematically analyze the impact of latent space dimensionality on learning efficiency and policy robustness. Additionally, we extend our method to be history-aware, incorporating sequences of recent terrain observations into the latent representation to improve robustness. To address real-world feasibility, we introduce a distillation method to learn the latent representation directly from depth camera images and provide preliminary hardware validation by comparing simulated and real sensor data. We further validate our framework using the high-fidelity Agility Robotics (AR) simulator, incorporating realistic sensor noise, state estimation, and actuator dynamics. The results confirm the robustness and adaptability of our method, underscoring its potential for hardware deployment.
Tackling Snow-Induced Challenges: Safe Autonomous Lane-Keeping with Robust Reinforcement Learning
This paper proposes two new algorithms for the lane keeping system (LKS) in autonomous vehicles (AVs) operating under snowy road conditions. These algorithms use deep reinforcement learning (DRL) to handle uncertainties and slippage. They include Action-Robust Recurrent Deep Deterministic Policy Gradient (AR-RDPG) and end-to-end Action-Robust convolutional neural network Attention Deterministic Policy Gradient (AR-CADPG), two action-robust approaches for decision-making. In the AR-RDPG method, within the perception layer, camera images are first denoised using multi-scale neural networks. Then, the centerline coefficients are extracted by a pre-trained deep convolutional neural network (DCNN). These coefficients, concatenated with the driving characteristics, are used as input to the control layer. The AR-CADPG method presents an end-to-end approach in which a convolutional neural network (CNN) and an attention mechanism are integrated within a DRL framework. Both methods are first trained in the CARLA simulator and validated under various snowy scenarios. Real-world experiments on a Jetson Nano-based autonomous vehicle confirm the feasibility and stability of the learned policies. Among the two models, the AR-CADPG approach demonstrates superior path-tracking accuracy and robustness, highlighting the effectiveness of combining temporal memory, adversarial resilience, and attention mechanisms in AVs.
SLIM-VDB: A Real-Time 3D Probabilistic Semantic Mapping Framework
This paper introduces SLIM-VDB, a new lightweight semantic mapping system with probabilistic semantic fusion for closed-set or open-set dictionaries. Advances in data structures from the computer graphics community, such as OpenVDB, have demonstrated significantly improved computational and memory efficiency in volumetric scene representation. Although OpenVDB has been used for geometric mapping in robotics applications, semantic mapping for scene understanding with OpenVDB remains unexplored. In addition, existing semantic mapping systems lack support for integrating both fixed-category and open-language label predictions within a single framework. In this paper, we propose a novel 3D semantic mapping system that leverages the OpenVDB data structure and integrates a unified Bayesian update framework for both closed- and open-set semantic fusion. Our proposed framework, SLIM-VDB, achieves significant reduction in both memory and integration times compared to current state-of-the-art semantic mapping approaches, while maintaining comparable mapping accuracy. An open-source C++ codebase with a Python interface is available at https://github.com/umfieldrobotics/slim-vdb.
comment: Accepted into R-AL
PrediFlow: A Flow-Based Prediction-Refinement Framework for Real-Time Human Motion Prediction in Human-Robot Collaboration
Stochastic human motion prediction is critical for safe and effective human-robot collaboration (HRC) in industrial remanufacturing, as it captures human motion uncertainties and multi-modal behaviors that deterministic methods cannot handle. While earlier works emphasize highly diverse predictions, they often generate unrealistic human motions. More recent methods focus on accuracy and real-time performance, yet there remains potential to improve prediction quality further without exceeding time budgets. Additionally, current research on stochastic human motion prediction in HRC typically considers human motion in isolation, neglecting the influence of robot motion on human behavior. To address these research gaps and enable real-time, realistic, and interaction-aware human motion prediction, we propose a novel prediction-refinement framework that integrates both human and robot observed motion to refine the initial predictions produced by a pretrained state-of-the-art predictor. The refinement module employs a Flow Matching structure to account for uncertainty. Experimental studies on the HRC desktop disassembly dataset demonstrate that our method significantly improves prediction accuracy while preserving the uncertainties and multi-modalities of human motion. Moreover, the total inference time of the proposed framework remains within the time budget, highlighting the effectiveness and practicality of our approach.
A Convex Obstacle Avoidance Formulation
Autonomous driving requires reliable collision avoidance in dynamic environments. Nonlinear Model Predictive Controllers (NMPCs) are suitable for this task, but struggle in time-critical scenarios requiring high frequency. To meet this demand, optimization problems are often simplified via linearization, narrowing the horizon window, or reduced temporal nodes, each compromising accuracy or reliability. This work presents the first general convex obstacle avoidance formulation, enabled by a novel approach to integrating logic. This facilitates the incorporation of an obstacle avoidance formulation into convex MPC schemes, enabling a convex optimization framework with substantially improved computational efficiency relative to conventional nonconvex methods. A key property of the formulation is that obstacle avoidance remains effective even when obstacles lie outside the prediction horizon, allowing shorter horizons for real-time deployment. In scenarios where nonconvex formulations are unavoidable, the proposed method meets or exceeds the performance of representative nonconvex alternatives. The method is evaluated in autonomous vehicle applications, where system dynamics are highly nonlinear.
comment: 18 pages, 17 figures
Constrained Policy Optimization via Sampling-Based Weight-Space Projection
Safety-critical learning requires policies that improve performance without leaving the safe operating regime. We study constrained policy learning where model parameters must satisfy unknown, rollout-based safety constraints. We propose SCPO, a sampling-based weight-space projection method that enforces safety directly in parameter space without requiring gradient access to the constraint functions. Our approach constructs a local safe region by combining trajectory rollouts with smoothness bounds that relate parameter changes to shifts in safety metrics. Each gradient update is then projected via a convex SOCP, producing a safe first-order step. We establish a safe-by-induction guarantee: starting from any safe initialization, all intermediate policies remain safe given feasible projections. In constrained control settings with a stabilizing backup policy, our approach further ensures closed-loop stability and enables safe adaptation beyond the conservative backup. On regression with harmful supervision and a constrained double-integrator task with malicious expert, our approach consistently rejects unsafe updates, maintains feasibility throughout training, and achieves meaningful primal objective improvement.
comment: Submitted to IFAC World Congress 2026
SocialNav-MoE: A Mixture-of-Experts Vision Language Model for Socially Compliant Navigation with Reinforcement Fine-Tuning
For robots navigating in human-populated environments, safety and social compliance are equally critical, yet prior work has mostly emphasized safety. Socially compliant navigation that accounts for human comfort, social norms, and contextual appropriateness remains underexplored. Vision language models (VLMs) show promise for this task; however, large-scale models incur substantial computational overhead, leading to higher inference latency and energy consumption, which makes them unsuitable for real-time deployment on resource-constrained robotic platforms. To address this issue, we investigate the effectiveness of small VLM and propose SocialNav-MoE, an efficient Mixture-of-Experts vision language model for socially compliant navigation with reinforcement fine-tuning (RFT). We further introduce a semantic similarity reward (SSR) to effectively leverage RFT for enhancing the decision-making capabilities. Additionally, we study the effectiveness of different small language model types (Phi, Qwen, and StableLM), routing strategies, and vision encoders (CLIP vs. SigLIP, frozen vs. fine-tuned). Experiments on the SNEI dataset demonstrate that SocialNav-MoE achieves an excellent balance between navigation accuracy and efficiency. The proposed SSR function is more effective than hard-level and character-level rewards. Source code will be released upon acceptance.
Active 6D Pose Estimation for Textureless Objects using Multi-View RGB Frames
Estimating the 6D pose of textureless objects from RGB images is an important problem in robotics. Due to appearance ambiguities, rotational symmetries, and severe occlusions, single-view based 6D pose estimators are still unable to handle a wide range of objects, motivating research towards multi-view pose estimation and next-best-view prediction that addresses these limitations. In this work, we propose a comprehensive active perception framework for estimating the 6D poses of textureless objects using only RGB images. Our approach is built upon a key idea: decoupling the 6D pose estimation into a two-step sequential process can greatly improve both accuracy and efficiency. First, we estimate the 3D translation of each object, resolving scale and depth ambiguities inherent to RGB images. These estimates are then used to simplify the subsequent task of determining the 3D orientation, which we achieve through canonical scale template matching. Building on this formulation, we then introduce an active perception strategy that predicts the next best camera viewpoint to capture an RGB image, effectively reducing object pose uncertainty and enhancing pose accuracy. We evaluate our method on the public ROBI and TOD datasets, as well as on our reconstructed transparent object dataset, T-ROBI. Under the same camera viewpoints, our multi-view pose estimation significantly outperforms state-of-the-art approaches. Furthermore, by leveraging our next-best-view strategy, our approach achieves high pose accuracy with fewer viewpoints than heuristic-based policies across all evaluated datasets. The accompanying video and T-ROBI dataset will be released on our project page: https://trailab.github.io/ActiveODPE.
Accelerated Rotation-Invariant Convolution for UAV Image Segmentation
Rotation invariance is essential for precise, object-level segmentation in UAV aerial imagery, where targets can have arbitrary orientations and exhibit fine-scale details. Conventional segmentation architectures like U-Net rely on convolution operators that are not rotation-invariant, leading to degraded segmentation accuracy across varying viewpoints. Rotation invariance can be achieved by expanding the filter bank across multiple orientations; however, this will significantly increase computational cost and memory traffic. In this paper, we introduce a GPU-optimized rotation-invariant convolution framework that eliminates the traditional data-lowering (im2col) step required for matrix-multiplication-based convolution. By exploiting structured data sharing among symmetrically rotated filters, our method achieves multi-orientation convolution with greatly reduced memory traffic and computational redundancy. We further generalize the approach to accelerate convolution with arbitrary (non-symmetric) rotation angles. Across extensive benchmarks, the proposed convolution achieves 20--55% faster training and 15--45% lower energy consumption than CUDNN, while maintaining accuracy comparable to state-of-the-art rotation-invariant methods. In the eight-orientation setting, our approach achieves up to 45% speedup and 41% energy savings on 256\(\times\)256 inputs, and 32% speedup and 23% lower energy usage on 1024\(\times\)1024 inputs. Integrated into a U-Net segmentation model, the framework yields up to 6% improvement in accuracy over the non-rotation-aware baseline. These results demonstrate that the proposed method provides an effective and highly efficient alternative to existing rotation-invariant CNN frameworks.
RoboCOIN: An Open-Sourced Bimanual Robotic Data COllection for INtegrated Manipulation
Bimanual manipulation is essential for achieving human-like dexterity in robots, but the large-scale and diverse bimanual robot datasets remain scarce due to hardware heterogeneity across robotic platforms. To address the challenge, we present RoboCOIN, a comprehensive multi-embodiment bimanual manipulation dataset with over 180,000 demonstrations collected from 15 distinct robotic platforms. The dataset covers 16 scenarios, including residential, commercial, and working environments, with 421 tasks systematically organized by bimanual coordination patterns and object properties. Our key innovation is a hierarchical capability pyramid that provides multi-level annotations, spanning trajectory-level concepts, segment-level subtasks, and frame-level kinematics. We further develop CoRobot, a comprehensive processing framework featuring Robot Trajectory Markup Language (RTML) for quality assessment, automated annotation generation, and unified multi-embodiment management. Extensive experiments demonstrate the reliability and effectiveness of RoboCOIN in multi-embodiment bimanual learning, with significant performance improvements across various model architectures and robotic platforms. The complete dataset and framework are open-sourced and publicly available for further research purposes. Project website: https://FlagOpen.github.io/RoboCOIN/.
comment: Fixed typos
CT-UIO: Continuous-Time UWB-Inertial-Odometer Localization Using Non-Uniform B-spline with Fewer Anchors
Ultra-wideband (UWB) based positioning with fewer anchors has attracted significant research interest in recent years, especially under energy-constrained conditions. However, most existing methods rely on discrete-time representations and smoothness priors to infer a robot's motion states, which often struggle with ensuring multi-sensor data synchronization. In this article, we present a continuous-time UWB-Inertial-Odometer localization system (CT-UIO), utilizing a non-uniform B-spline framework with fewer anchors. Unlike traditional uniform B-spline-based continuous-time methods, we introduce an adaptive knot-span adjustment strategy for non-uniform continuous-time trajectory representation. This is accomplished by adjusting control points dynamically based on movement speed. To enable efficient fusion of {inertial measurement unit (IMU) and odometer data, we propose an improved extended Kalman filter (EKF) with innovation-based adaptive estimation to provide short-term accurate motion prior. Furthermore, to address the challenge of achieving a fully observable UWB localization system under few-anchor conditions, the virtual anchor (VA) generation method based on multiple hypotheses is proposed. At the backend, we propose an adaptive sliding window strategy for global trajectory estimation. Comprehensive experiments are conducted on three self-collected datasets with different UWB anchor numbers and motion modes. The result shows that the proposed CT-UIO achieves 0.403m, 0.150m, and 0.189m localization accuracy in corridor, exhibition hall, and office environments, yielding 17.2%, 26.1%, and 15.2% improvements compared with competing state-of-the-art UIO systems, respectively. The codebase and datasets of this work will be open-sourced at https://github.com/JasonSun623/CT-UIO.
comment: Accepted to IEEE Transactions on Mobile Computing (TMC). The codebase and datasets will be open-sourced at https://github.com/JasonSun623/CT-UIO
Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach
In this paper we present a neurosymbolic architecture for coupling language-guided visual reasoning with robot manipulation. A non-expert human user can prompt the robot using unconstrained natural language, providing a referring expression (REF), a question (VQA), or a grasp action instruction. The system tackles all cases in a task-agnostic fashion through the utilization of a shared library of primitive skills. Each primitive handles an independent sub-task, such as reasoning about visual attributes, spatial relation comprehension, logic and enumeration, as well as arm control. A language parser maps the input query to an executable program composed of such primitives, depending on the context. While some primitives are purely symbolic operations (e.g. counting), others are trainable neural functions (e.g. visual grounding), therefore marrying the interpretability and systematic generalization benefits of discrete symbolic approaches with the scalability and representational power of deep networks. We generate a 3D vision-and-language synthetic dataset of tabletop scenes in a simulation environment to train our approach and perform extensive evaluations in both synthetic and real-world scenes. Results showcase the benefits of our approach in terms of accuracy, sample-efficiency, and robustness to the user's vocabulary, while being transferable to real-world scenes with few-shot visual fine-tuning. Finally, we integrate our method with a robot framework and demonstrate how it can serve as an interpretable solution for an interactive object-picking task, achieving an average success rate of 80.2\%, both in simulation and with a real robot. We make supplementary material available in https://gtziafas.github.io/neurosymbolic-manipulation.
comment: Published in International Journal of Robotics Research (IJRR) (2025)
A Survey of Behavior Foundation Model: Next-Generation Whole-Body Control System of Humanoid Robots
Humanoid robots are drawing significant attention as versatile platforms for complex motor control, human-robot interaction, and general-purpose physical intelligence. However, achieving efficient whole-body control (WBC) in humanoids remains a fundamental challenge due to sophisticated dynamics, underactuation, and diverse task requirements. While learning-based controllers have shown promise for complex tasks, their reliance on labor-intensive and costly retraining for new scenarios limits real-world applicability. To address these limitations, behavior(al) foundation models (BFMs) have emerged as a new paradigm that leverages large-scale pre-training to learn reusable primitive skills and broad behavioral priors, enabling zero-shot or rapid adaptation to a wide range of downstream tasks. In this paper, we present a comprehensive overview of BFMs for humanoid WBC, tracing their development across diverse pre-training pipelines. Furthermore, we discuss real-world applications, current limitations, urgent challenges, and future opportunities, positioning BFMs as a key approach toward scalable and general-purpose humanoid intelligence. Finally, we provide a curated and regularly updated collection of BFM papers and projects to facilitate more subsequent research, which is available at https://github.com/yuanmingqi/awesome-bfm-papers.
comment: 19 pages, 10 figures
WholeBodyVLA: Towards Unified Latent VLA for Whole-Body Loco-Manipulation Control
Humanoid robots require precise locomotion and dexterous manipulation to perform challenging loco-manipulation tasks. Yet existing approaches, modular or end-to-end, are deficient in manipulation-aware locomotion. This confines the robot to a limited workspace, preventing it from performing large-space loco-manipulation. We attribute this to: (1) the challenge of acquiring loco-manipulation knowledge due to the scarcity of humanoid teleoperation data, and (2) the difficulty of faithfully and reliably executing locomotion commands, stemming from the limited precision and stability of existing RL controllers. To acquire richer loco-manipulation knowledge, we propose a unified latent learning framework that enables Vision-Language-Action (VLA) system to learn from low-cost action-free egocentric videos. Moreover, an efficient human data collection pipeline is devised to augment the dataset and scale the benefits. To execute the desired locomotion commands more precisely, we present a loco-manipulation-oriented (LMO) RL policy specifically tailored for accurate and stable core loco-manipulation movements, such as advancing, turning, and squatting. Building on these components, we introduce WholeBodyVLA, a unified framework for humanoid loco-manipulation. To the best of our knowledge, WholeBodyVLA is one of its kind enabling large-space humanoid loco-manipulation. It is verified via comprehensive experiments on the AgiBot X2 humanoid, outperforming prior baseline by 21.3%. It also demonstrates strong generalization and high extensibility across a broad range of tasks.
HACTS: a Human-As-Copilot Teleoperation System for Robot Learning
Teleoperation is essential for autonomous robot learning, especially in manipulation tasks that require human demonstrations or corrections. However, most existing systems only offer unilateral robot control and lack the ability to synchronize the robot's status with the teleoperation hardware, preventing real-time, flexible intervention. In this work, we introduce HACTS (Human-As-Copilot Teleoperation System), a novel system that establishes bilateral, real-time joint synchronization between a robot arm and teleoperation hardware. This simple yet effective feedback mechanism, akin to a steering wheel in autonomous vehicles, enables the human copilot to intervene seamlessly while collecting action-correction data for future learning. Implemented using 3D-printed components and low-cost, off-the-shelf motors, HACTS is both accessible and scalable. Our experiments show that HACTS significantly enhances performance in imitation learning (IL) and reinforcement learning (RL) tasks, boosting IL recovery capabilities and data efficiency, and facilitating human-in-the-loop RL. HACTS paves the way for more effective and interactive human-robot collaboration and data-collection, advancing the capabilities of robot manipulation.
DRO-EDL-MPC: Evidential Deep Learning-Based Distributionally Robust Model Predictive Control for Safe Autonomous Driving
Safety is a critical concern in motion planning for autonomous vehicles. Modern autonomous vehicles rely on neural network-based perception, but making control decisions based on these inference results poses significant safety risks due to inherent uncertainties. To address this challenge, we present a distributionally robust optimization (DRO) framework that accounts for both aleatoric and epistemic perception uncertainties using evidential deep learning (EDL). Our approach introduces a novel ambiguity set formulation based on evidential distributions that dynamically adjusts the conservativeness according to perception confidence levels. We integrate this uncertainty-aware constraint into model predictive control (MPC), proposing the DRO-EDL-MPC algorithm with computational tractability for autonomous driving applications. Validation in the CARLA simulator demonstrates that our approach maintains efficiency under high perception confidence while enforcing conservative constraints under low confidence.
comment: Accepted to IEEE Robotics and Automation Letters (RA-L)
DualMap: Online Open-Vocabulary Semantic Mapping for Natural Language Navigation in Dynamic Changing Scenes
We introduce DualMap, an online open-vocabulary mapping system that enables robots to understand and navigate dynamically changing environments through natural language queries. Designed for efficient semantic mapping and adaptability to changing environments, DualMap meets the essential requirements for real-world robot navigation applications. Our proposed hybrid segmentation frontend and object-level status check eliminate the costly 3D object merging required by prior methods, enabling efficient online scene mapping. The dual-map representation combines a global abstract map for high-level candidate selection with a local concrete map for precise goal-reaching, effectively managing and updating dynamic changes in the environment. Through extensive experiments in both simulation and real-world scenarios, we demonstrate state-of-the-art performance in 3D open-vocabulary segmentation, efficient scene mapping, and online language-guided navigation. Project page: https://eku127.github.io/DualMap/
comment: 14 pages, 14 figures. Published in IEEE Robotics and Automation Letters (RA-L), 2025. Code: https://github.com/Eku127/DualMap Project page: https://eku127.github.io/DualMap/
MDE-AgriVLN: Agricultural Vision-and-Language Navigation with Monocular Depth Estimation
Agricultural robots are serving as powerful assistants across a wide range of agricultural tasks, nevertheless, still heavily relying on manual operations or railway systems for movement. The AgriVLN method and the A2A benchmark pioneeringly extended Vision-and-Language Navigation (VLN) to the agricultural domain, enabling a robot to navigate to a target position following a natural language instruction. Unlike human binocular vision, most agricultural robots are only given a single camera for monocular vision, which results in limited spatial perception. To bridge this gap, we present the method of Agricultural Vision-and-Language Navigation with Monocular Depth Estimation (MDE-AgriVLN), in which we propose the MDE module generating depth features from RGB images, to assist the decision-maker on multimodal reasoning. When evaluated on the A2A benchmark, our MDE-AgriVLN method successfully increases Success Rate from 0.23 to 0.32 and decreases Navigation Error from 4.43m to 4.08m, demonstrating the state-of-the-art performance in the agricultural VLN domain. Code: https://github.com/AlexTraveling/MDE-AgriVLN.
SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions
Humans continuously infer the states, goals, and behaviors of others by perceiving their surroundings in dynamic, real-world social interactions. However, most Theory of Mind (ToM) benchmarks only evaluate static, text-based scenarios, which have a significant gap compared to real interactions. We propose the SoMi-ToM benchmark, designed to evaluate multi-perspective ToM in embodied multi-agent complex social interactions. This benchmark is based on rich multimodal interaction data generated by the interaction environment SoMi, covering diverse crafting goals and social relationships. Our framework supports multi-level evaluation: (1) first-person evaluation provides multimodal (visual, dialogue, action, etc.) input from a first-person perspective during a task for real-time state inference, (2) third-person evaluation provides complete third-person perspective video and text records after a task for goal and behavior inference. This evaluation method allows for a more comprehensive examination of a model's ToM capabilities from both the subjective immediate experience and the objective global observation. We constructed a challenging dataset containing 35 third-person perspective videos, 363 first-person perspective images, and 1225 expert-annotated multiple-choice questions (three options). On this dataset, we systematically evaluated the performance of human subjects and several state-of-the-art large vision-language models (LVLMs). The results show that LVLMs perform significantly worse than humans on SoMi-ToM: the average accuracy gap between humans and models is 40.1% in first-person evaluation and 26.4% in third-person evaluation. This indicates that future LVLMs need to further improve their ToM capabilities in embodied, complex social interactions.
comment: 24 pages, 6 figures
MSG-Loc: Multi-Label Likelihood-based Semantic Graph Matching for Object-Level Global Localization
Robots are often required to localize in environments with unknown object classes and semantic ambiguity. However, when performing global localization using semantic objects, high semantic ambiguity intensifies object misclassification and increases the likelihood of incorrect associations, which in turn can cause significant errors in the estimated pose. Thus, in this letter, we propose a multi-label likelihood-based semantic graph matching framework for object-level global localization. The key idea is to exploit multi-label graph representations, rather than single-label alternatives, to capture and leverage the inherent semantic context of object observations. Based on these representations, our approach enhances semantic correspondence across graphs by combining the likelihood of each node with the maximum likelihood of its neighbors via context-aware likelihood propagation. For rigorous validation, data association and pose estimation performance are evaluated under both closed-set and open-set detection configurations. In addition, we demonstrate the scalability of our approach to large-vocabulary object categories in both real-world indoor scenes and synthetic environments.
comment: Accepted in IEEE Robotics and Automation Letters (2025)
ESPADA: Execution Speedup via Semantics Aware Demonstration Data Downsampling for Imitation Learning
Behavior-cloning based visuomotor policies enable precise manipulation but often inherit the slow, cautious tempo of human demonstrations, limiting practical deployment. However, prior studies on acceleration methods mainly rely on statistical or heuristic cues that ignore task semantics and can fail across diverse manipulation settings. We present ESPADA, a semantic and spatially aware framework that segments demonstrations using a VLM-LLM pipeline with 3D gripper-object relations, enabling aggressive downsampling only in non-critical segments while preserving precision-critical phases, without requiring extra data or architectural modifications, or any form of retraining. To scale from a single annotated episode to the full dataset, ESPADA propagates segment labels via Dynamic Time Warping (DTW) on dynamics-only features. Across both simulation and real-world experiments with ACT and DP baselines, ESPADA achieves approximately a 2x speed-up while maintaining success rates, narrowing the gap between human demonstrations and efficient robot control.
comment: project page: https://project-espada.github.io/espada/
Multiagent Systems
Fair Coordination in Strategic Scheduling
We consider a scheduling problem of strategic agents representing jobs of different weights. Each agent has to decide on one of a finite set of identical machines to get their job processed. In contrast to the common and exclusive focus on makespan minimization, we want the outcome to be fair under strategic considerations of the agents. Two natural properties are credibility, which ensures that the assignment is a Nash equilibrium and equality, requiring that agents with equal-weight jobs are assigned to machines of equal load. We combine these two with a hierarchy of fairness properties based on envy-freeness together with several relaxations based on the idea that envy seems more justified towards agents with a higher weight. We present a complete complexity landscape for satisfiability and decision versions of these properties, alone or in combination, and study them as structural constraints under makespan optimization. For our positive results, we develop a unified algorithmic approach, where we achieve different properties by fine-tuning key subroutines.
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows
We introduce a finance & accounting benchmark (Finch) for evaluating AI agents on real-world, enterprise-grade professional workflows -- interleaving data entry, structuring, formatting, web search, cross-file retrieval, calculation, modeling, validation, translation, visualization, and reporting. Finch is sourced from authentic enterprise workspaces at Enron (15,000 spreadsheets and 500,000 emails from 150 employees) and other financial institutions, preserving in-the-wild messiness across multimodal artifacts (text, tables, formulas, charts, code, and images) and spanning diverse domains such as budgeting, trading, and asset management. We propose a workflow construction process that combines LLM-assisted discovery with expert annotation: (1) LLM-assisted, expert-verified derivation of workflows from real-world email threads and version histories of spreadsheet files, and (2) meticulous expert annotation for workflows, requiring over 700 hours of domain-expert effort. This yields 172 composite workflows with 384 tasks, involving 1,710 spreadsheets with 27 million cells, along with PDFs and other artifacts, capturing the intrinsically messy, long-horizon, knowledge-intensive, and collaborative nature of real-world enterprise work. We conduct both human and automated evaluations of frontier AI systems including GPT 5.1, Claude Sonnet 4.5, Gemini 3 Pro, Grok 4, and Qwen 3 Max, and GPT 5.1 Pro spends 48 hours in total yet passes only 38.4% of workflows, while Claude Sonnet 4.5 passes just 25.0%. Comprehensive case studies further surface the challenges that real-world enterprise workflows pose for AI agents.
The Optimal Control Algorithm of Connected and Automated Vehicles at Roundabouts with Communication Delay
Connected and automated vehicles (CAVs) rely on wireless communication to exchange state information for distributed control, making communication delays a critical factor that can affect vehicle motion and degrade control performance, particularly in high-speed scenarios. To address these challenges in the complex environment of roundabout intersections, this paper proposes a roundabout control algorithm, which takes into account the uncertainty of interactive information caused by time delays. First, to maintain the required distance between the current vehicle and its preceding and following vehicles, conflicting vehicles are identified based on the time-to-collision (TTC) in the conflict zone. To fully consider communication performance, a vehicle motion model incorporating time delays is established. According to the distributed model predictive control (DMPC) mechanism, the vehicle motion control that satisfies the roundabout constraints is determined. Second, by scheduling the sequence of vehicles entering the roundabout, a multiscale optimization objective is developed by integrating vehicle motion indicators and roundabout system indicators. Traffic density and travel time are embedded into the optimization problem to guide vehicles to enter the roundabout safely and stably. Through a variety of simulation experiments, the effectiveness of the proposed control algorithm is verified by comparing its performance with that of multiple control algorithms under different autonomous vehicle penetration rates and heavy traffic load scenarios.
comment: 9 pages, 4 figures
Quantigence: A Multi-Agent AI Framework for Quantum Security Research
Cryptographically Relevant Quantum Computers (CRQCs) pose a structural threat to the global digital economy. Algorithms like Shor's factoring and Grover's search threaten to dismantle the public-key infrastructure (PKI) securing sovereign communications and financial transactions. While the timeline for fault-tolerant CRQCs remains probabilistic, the "Store-Now, Decrypt-Later" (SNDL) model necessitates immediate migration to Post-Quantum Cryptography (PQC). This transition is hindered by the velocity of research, evolving NIST standards, and heterogeneous deployment environments. To address this, we present Quantigence, a theory-driven multi-agent AI framework for structured quantum-security analysis. Quantigence decomposes research objectives into specialized roles - Cryptographic Analyst, Threat Modeler, Standards Specialist, and Risk Assessor - coordinated by a supervisory agent. Using "cognitive parallelism," agents reason independently to maintain context purity while execution is serialized on resource-constrained hardware (e.g., NVIDIA RTX 2060). The framework integrates external knowledge via the Model Context Protocol (MCP) and prioritizes vulnerabilities using the Quantum-Adjusted Risk Score (QARS), a formal extension of Mosca's Theorem. Empirical validation shows Quantigence achieves a 67% reduction in research turnaround time and superior literature coverage compared to manual workflows, democratizing access to high-fidelity quantum risk assessment.
comment: 13 pages, 2 figures
Multi-Agent Collaborative Framework for Intelligent IT Operations: An AOI System with Context-Aware Compression and Dynamic Task Scheduling
The proliferation of cloud-native architectures, characterized by microservices and dynamic orchestration, has rendered modern IT infrastructures exceedingly complex and volatile. This complexity generates overwhelming volumes of operational data, leading to critical bottlenecks in conventional systems: inefficient information processing, poor task coordination, and loss of contextual continuity during fault diagnosis and remediation. To address these challenges, we propose AOI (AI-Oriented Operations), a novel multi-agent collaborative framework that integrates three specialized agents with an LLM-based Context Compressor. Its core innovations include: (1) a dynamic task scheduling strategy that adaptively prioritizes operations based on real-time system states, and (2) a three-layer memory architecture comprising Working, Episodic, and Semantic layers that optimizes context retention and retrieval. Extensive experiments on both synthetic and real-world benchmarks demonstrate that AOI effectively mitigates information overload, achieving a 72.4% context compression ratio while preserving 92.8% of critical information and significantly enhances operational efficiency, attaining a 94.2% task success rate and reducing the Mean Time to Repair (MTTR) by 34.4% compared to the best baseline. This work presents a paradigm shift towards scalable, adaptive, and context-aware autonomous operations, enabling robust management of next-generation IT infrastructures with minimal human intervention.
Hierarchical Multi-agent Large Language Model Reasoning for Autonomous Functional Materials Discovery
Artificial intelligence is reshaping scientific exploration, but most methods automate procedural tasks without engaging in scientific reasoning, limiting autonomy in discovery. We introduce Materials Agents for Simulation and Theory in Electronic-structure Reasoning (MASTER), an active learning framework where large language models autonomously design, execute, and interpret atomistic simulations. In MASTER, a multimodal system translates natural language into density functional theory workflows, while higher-level reasoning agents guide discovery through a hierarchy of strategies, including a single agent baseline and three multi-agent approaches: peer review, triage-ranking, and triage-forms. Across two chemical applications, CO adsorption on Cu-surface transition metal (M) adatoms and on M-N-C catalysts, reasoning-driven exploration reduces required atomistic simulations by up to 90% relative to trial-and-error selection. Reasoning trajectories reveal chemically grounded decisions that cannot be explained by stochastic sampling or semantic bias. Altogether, multi-agent collaboration accelerates materials discovery and marks a new paradigm for autonomous scientific exploration.
comment: Keywords: Multi-agent reasoning; Large language models; Active learning; AI-driven simulation; Materials discovery; Density functional theory; Surface chemistry
EvoLattice: Persistent Internal-Population Evolution through Multi-Alternative Quality-Diversity Graph Representations for LLM-Guided Program Discovery
Large language models (LLMs) are increasingly used to evolve programs and multi-agent systems, yet most existing approaches rely on overwrite-based mutations that maintain only a single candidate at a time. Such methods discard useful variants, suffer from destructive edits, and explore a brittle search space prone to structural failure. We introduce EvoLattice, a framework that represents an entire population of candidate programs or agent behaviors within a single directed acyclic graph. Each node stores multiple persistent alternatives, and every valid path through the graph defines a distinct executable candidate, yielding a large combinatorial search space without duplicating structure. EvoLattice enables fine-grained alternative-level evaluation by scoring each alternative across all paths in which it appears, producing statistics that reveal how local design choices affect global performance. These statistics provide a dense, data-driven feedback signal for LLM-guided mutation, recombination, and pruning, while preserving successful components. Structural correctness is guaranteed by a deterministic self-repair mechanism that enforces acyclicity and dependency consistency independently of the LLM. EvoLattice naturally extends to agent evolution by interpreting alternatives as prompt fragments or sub-agent behaviors. Across program synthesis (proxy and optimizer meta-learning), EvoLattice yields more stable evolution, greater expressivity, and stronger improvement trajectories than prior LLM-guided methods. The resulting dynamics resemble quality-diversity optimization, emerging implicitly from EvoLattice's internal multi-alternative representation rather than an explicit external archive.
Practical challenges of control monitoring in frontier AI deployments
Automated control monitors could play an important role in overseeing highly capable AI agents that we do not fully trust. Prior work has explored control monitoring in simplified settings, but scaling monitoring to real-world deployments introduces additional dynamics: parallel agent instances, non-negligible oversight latency, incremental attacks between agent instances, and the difficulty of identifying scheming agents based on individual harmful actions. In this paper, we analyse design choices to address these challenges, focusing on three forms of monitoring with different latency-safety trade-offs: synchronous, semi-synchronous, and asynchronous monitoring. We introduce a high-level safety case sketch as a tool for understanding and comparing these monitoring protocols. Our analysis identifies three challenges -- oversight, latency, and recovery -- and explores them in four case studies of possible future AI deployments.
Adaptive GPU Resource Allocation for Multi-Agent Collaborative Reasoning in Serverless Environments
Multi-agent systems powered by large language models have emerged as a promising paradigm for solving complex reasoning tasks through collaborative intelligence. However, efficiently deploying these systems on serverless GPU platforms presents significant resource allocation challenges due to heterogeneous agent workloads, varying computational demands, and the need for cost-effective scaling. This paper presents an adaptive GPU resource allocation framework that achieves 85\% latency reduction compared to round-robin scheduling while maintaining comparable throughput to static allocation, using an $O(N)$ complexity algorithm for real-time adaptation. Our approach dynamically allocates GPU resources based on workload characteristics, agent priorities, and minimum resource requirements, enabling efficient utilization while maintaining quality of service. The framework addresses three key challenges: (1) heterogeneous computational demands across lightweight coordinators and heavyweight specialists, (2) dynamic workload fluctuations requiring millisecond-scale reallocation, and (3) capacity constraints in serverless environments. Through comprehensive simulations modeling realistic multi-agent workflows with four heterogeneous agents, we demonstrate that adaptive allocation outperforms static equal and round-robin strategies across latency, cost, and GPU utilization metrics. The framework provides a practical solution for deploying cost-efficient multi-agent AI systems on serverless GPU infrastructure.
comment: 7 pages, 2 figures
MALLM: Multi-Agent Large Language Models Framework EMNLP 2025
Multi-agent debate (MAD) has demonstrated the ability to augment collective intelligence by scaling test-time compute and leveraging expertise. Current frameworks for multi-agent debate are often designed towards tool use, lack integrated evaluation, or provide limited configurability of agent personas, response generators, discussion paradigms, and decision protocols. We introduce MALLM (Multi-Agent Large Language Models), an open-source framework that enables systematic analysis of MAD components. MALLM offers more than 144 unique configurations of MAD, including (1) agent personas (e.g., Expert, Personality), (2) response generators (e.g., Critical, Reasoning), (3) discussion paradigms (e.g., Memory, Relay), and (4) decision protocols (e.g., Voting, Consensus). MALLM uses simple configuration files to define a debate. Furthermore, MALLM can load any textual Hugging Face dataset (e.g., MMLU-Pro, WinoGrande) and provides an evaluation pipeline for easy comparison of MAD configurations. MALLM enables researchers to systematically configure, run, and evaluate debates for their problems, facilitating the understanding of the components and their interplay.
comment: Accepted at EMNLP 2025 (Demo)
AI Agents with Decentralized Identifiers and Verifiable Credentials
A fundamental limitation of current LLM-based AI agents is their inability to build differentiated trust among each other at the onset of an agent-to-agent dialogue. However, autonomous and interoperable trust establishment becomes essential once agents start to operate beyond isolated environments and engage in dialogues across individual or organizational boundaries. A promising way to fill this gap in Agentic AI is to equip agents with long-lived digital identities and introduce tamper-proof and flexible identity-bound attestations of agents, provisioned by commonly trusted third parties and designed for cross-domain verifiability. This article presents a conceptual framework and a prototypical multi-agent system, where each agent is endowed with a self-sovereign digital identity. It combines a unique and ledger-anchored W3C Decentralized Identifier (DID) of an agent with a set of third-party issued W3C Verifiable Credentials (VCs). This enables agents at the start of a dialog to prove ownership of their self-controlled DIDs for authentication purposes and to establish various cross-domain trust relationships through the spontaneous exchange of their self-hosted DID-bound VCs. A comprehensive evaluation of the prototypical implementation demonstrates technical feasibility but also reveals limitations once an agent's LLM is in sole charge to control the respective security procedures.
comment: Accepted for presentation at the 18th International Conference on Agents and Artificial Intelligence 2026
Unsupervised Acquisition of Discrete Grammatical Categories
This article presents experiments performed using a computational laboratory environment for language acquisition experiments. It implements a multi-agent system consisting of two agents: an adult language model and a daughter language model that aims to learn the mother language. Crucially, the daughter agent does not have access to the internal knowledge of the mother language model but only to the language exemplars the mother agent generates. These experiments illustrate how this system can be used to acquire abstract grammatical knowledge. We demonstrate how statistical analyses of patterns in the input data corresponding to grammatical categories yield discrete grammatical rules. These rules are subsequently added to the grammatical knowledge of the daughter language model. To this end, hierarchical agglomerative cluster analysis was applied to the utterances consecutively generated by the mother language model. It is argued that this procedure can be used to acquire structures resembling grammatical categories proposed by linguists for natural languages. Thus, it is established that non-trivial grammatical knowledge has been acquired. Moreover, the parameter configuration of this computational laboratory environment determined using training data generated by the mother language model is validated in a second experiment with a test set similarly resulting in the acquisition of non-trivial categories.
comment: 34 pages, 3 figures, 7 tables. Related work: Shakouri et al., arXiv:2512.02195
Convergence dynamics of Agent-to-Agent Interactions with Misaligned objectives
We develop and analyze a theoretical framework for agent-to-agent interactions in a simplified in-context linear regression setting. In our model, each agent is instantiated as a single-layer transformer with linear self-attention (LSA) trained to implement gradient-descent-like updates on a quadratic regression objective from in-context examples. We then study the coupled dynamics when two such LSA agents alternately update from each other's outputs under potentially misaligned fixed objectives. Within this framework, we characterize the generation dynamics and show that misalignment leads to a biased equilibrium where neither agent reaches its target, with residual errors predictable from the objective gap and the prompt-induced geometry. We further contrast this fixed objective regime with an adaptive multi-agent setting, wherein a helper agent updates a turn-based objective to implement a Newton-like step for the main agent, eliminating the plateau and accelerating its convergence. Experiments with trained LSA agents, as well as black-box GPT-5-mini runs on in-context linear regression tasks, are consistent with our theoretical predictions within this simplified setting. We view our framework as a mechanistic framework that links prompt geometry and objective misalignment to stability, bias, and robustness, and as a stepping stone toward analyzing more realistic multi-agent LLM systems.
Systems and Control (EESS)
Bidding Aggregated Flexibility in European Electricity Auctions
Bidding flexibility in day-ahead and intraday auctions would enable decentralized flexible resources, such as electric vehicles and heat pumps, to efficiently align their consumption with the intermittent generation of renewable energy. However, because these resources are individually too small to participate in those auctions directly, an aggregator (e.g., a utility) must act on their behalf. This requires aggregating many decentralized resources, which is a computationally challenging task. In this paper, we propose a computationally efficient and highly accurate method that is readily applicable to European day-ahead and intraday auctions. Distinct from existing methods, we aggregate only economically relevant power profiles, identified through price forecasts. The resulting flexibility is then conveyed to the market operator via exclusive groups of block bids. We evaluate our method for a utility serving the Swiss town of Losone, where flexibility from multiple heat pumps distributed across the grid must be aggregated and bid in the Swiss day-ahead auction. Results show that our method aggregates accurately, achieving 98% of the theoretically possible cost savings. This aggregation accuracy remains stable even as the number of heat pumps increases, while computation time grows only linearly, demonstrating strong scalability.
Competent Discrete Time Modeling For analogue controlled PWM Converter Considering State-Feedback
Ever since R.D.Middlebrook proposed the state space averaging notion. The small signal model has been widely used as a design tool to tune control parameters. As Moore's law is continuing and the AI chip's high demand for power consumption and dynamic response, the control bandwidth needs to be boosted. However, the average model has two basic assumptions: the low-frequency assumption, the small ripple assumption. In high-bandwidth design, these two assumptions are violated. In order to solve this, various methods have been proposed. This paper gives a comprehensive overview of the existing small signal model for PWM converters from the following perspectives: 1. model fidelity, 2. analytical tractability. 3. complexity of the derivation process and result 4.generality.
An $H_2$-norm approach to performance analysis of networked control systems under multiplicative routing transformations
This paper investigates the performance of networked control systems subject to multiplicative routing transformations that alter measurement pathways without directly injecting signals. Such transformations, arising from faults or adversarial actions, modify the feedback structure and can degrade performance while remaining stealthy. An $H_2$-norm framework is proposed to quantify the impact of these transformations by evaluating the ratio between the steady-state energies of performance and residual outputs. Equivalent linear matrix inequality (LMI) formulations are derived for computational assessment, and analytical upper bounds are established to estimate the worst-case degradation. The results provide structural insight into how routing manipulations influence closed-loop behavior and reveal conditions for stealthy multiplicative attacks.
comment: This work has been submitted to the IEEE for possible publication
Identification of Technical Design Constraints and Considerations for Transmission Grid Expansion Planning Projects
The large-scale deployment of renewable energy sources, particularly offshore wind, requires large-scale transmission grid expansion projects to transmit the produced low-carbon power to the main demand centers. However, the planning and design of such complex projects currently lack a transparent and systematic process that system operators can follow when considering such investments in their grids. This paper identifies and classifies the main technical design constraints and considerations relevant to the planning of transmission grid expansion projects, and more specifically, electrical energy hubs. Seven key areas of interest are identified, namely network integration, HVDC technologies, costs (CAPEX, OPEX, and space requirements), electricity market design, future proofness and modular expandability, reliability-availability-maintainability, and sustainability. Each area of interest is analyzed in terms of its technical and operational relevance, with technical design constraints and considerations derived from such analysis. In addition, a hierarchical classification of the identified constraints and considerations (and therefore areas of interest) is introduced, distinguishing them between three criticality classes, namely hard constraints, main drivers, and key considerations. The dependencies between the different areas are discussed, too. Therefore, this work provides system operators and policymakers with a structured basis to support a transparent planning methodology with clear decision hierarchies for investments in transmission grid expansion projects.
Impact analysis of hidden faults in nonlinear control systems using output-to-output gain
Networked control systems (NCSs) are vulnerable to faults and hidden malfunctions in communication channels that can degrade performance or even destabilize the closed loop. Classical metrics in robust control and fault detection typically treat impact and detectability separately, whereas the output-to-output gain (OOG) provides a unified measure of both. While existing results have been limited to linear systems, this paper extends the OOG framework to nonlinear NCSs with quadratically constrained nonlinearities, considering false-injection attacks that can also manipulate sensor measurements through nonlinear transformations. Specifically, we provide computationally efficient linear matrix inequality conditions and complementary frequency-domain tests that yield explicit upper bounds on the OOG of this class of nonlinear systems. Furthermore, we derive frequency-domain conditions for absolute stability of closed-loop systems, generalizing the Yakubovich quadratic criterion.
comment: This work has been submitted to IFAC World Congress 2026 for possible publication
Real-Time AI-Driven Milling Digital Twin Towards Extreme Low-Latency
Digital twin (DT) enables smart manufacturing by leveraging real-time data, AI models, and intelligent control systems. This paper presents a state-of-the-art analysis on the emerging field of DTs in the context of milling. The critical aspects of DT are explored through the lens of virtual models of physical milling, data flow from physical milling to virtual model, and feedback from virtual model to physical milling. Live data streaming protocols and virtual modeling methods are highlighted. A case study showcases the transformative capability of a real-time machine learning-driven live DT of tool-work contact in a milling process. Future research directions are outlined to achieve the goals of Industry 4.0 and beyond.
Evaluating the Navigation Capabilities of a Modified COAST Guidewire Robot in an Anatomical Phantom Model
To address the issues that arise due to the manual navigation of guidewires in endovascular interventions, research in medical robotics has taken a strong interest in developing robotically steerable guidewires, which offer the possibility of enhanced maneuverability and navigation, as the tip of the guidewire can be actively steered. The COaxially Aligned STeerable (COAST) guidewire robot has the ability to generate a wide variety of motions including bending motion with different bending lengths, follow-the-leader motion, and feedforward motion. In our past studies, we have explored different designs of the COAST guidewire robot and developed modeling, control, and sensing strategies for the COAST guidewire robot. In this study, the performance of a modified COAST guidewire robot is evaluated by conducting navigation experiments in an anatomical phantom model with pulsatile flow. The modified COAST guidewire robot is a simplified version of the COAST guidewire robot and consists of two tubes as opposed to three tubes. Through this study, we demonstrate the effectiveness of the modified COAST guidewire robot in navigating the tortuous phantom vasculature.
comment: Presented at the 14th Conference on New Technologies for Computer and Robot Assisted Surgery (CRAS 2025)
Balancing Timeliness and Privacy in Discrete-Time Updating Systems
We study the trade-off between Age of Information (AoI) and maximal leakage (MaxL) in discrete-time status updating systems. A source generates time-stamped update packets that are processed by a server that delivers them to a monitor. An adversary, who eavesdrops on the server-monitor link, wishes to infer the timing of the underlying source update sequence. The server must balance the timeliness of the status information at the monitor against the timing information leaked to the adversary. We consider a model with Bernoulli source updates under two classes of Last-Come-First-Served (LCFS) service policies: (1) Coupled policies that tie the server's deliveries to the update arrival process in a preemptive queue; (2) Decoupled (dumping) policies in which the server transmits its freshest update according to a schedule that is independent of the update arrivals. For each class, we characterize the structure of the optimal policy that minimizes AoI for a given MaxL rate. Our analysis reveals that decoupled dumping policies offer a superior age-leakage trade-off to coupled policies. When subject to a MaxL constraint, we prove that the optimal dumping strategy is achieved by dithering between two adjacent deterministic dump periods.
QoS-Aware State-Augmented Learnable Framework for 5G NR-U/Wi-Fi Coexistence: Impact of Parameter Selection and Enhanced Collision Resolution
Unlicensed spectrum supports diverse traffic with stringent Quality-of-Service (QoS) requirements. In NR-U/Wi-Fi coexistence,the values of MAC parameters critically influence delay, collision behavior, and airtime fairness and efficiency. In this paper, we investigate the impact of (i) cost scaling and violation modeling, (ii) choice of MAC parameters, and (iii) an enhanced collision resolution scheme for the Listen-Before-Talk (LBT) mechanism on the performance of a state-augmented constrained reinforcement learning controller for NR-U/Wi-Fi coexistence. Coexistence control is formulated as a constrained Markov decision process with an explicit delay constraint for high-priority traffic and fairness as the optimization goal. Our simulation results show three key findings: (1) signed, threshold-invariant cost scaling with temporal smoothing stabilizes learning and strengthens long-term constraint adherence; (2) use of the contention window parameter for control provides smoother adaptation and better delay compliance than other MAC parameters; and (3) enhanced LBT significantly reduces collisions and improves airtime efficiency. These findings provide practical insights for achieving robust, QoS-aware coexistence control.
comment: 10 pages, 6 figures, 2 tables
Neural Control Barrier Functions for Signal Temporal Logic Specifications with Input Constraints
Signal Temporal Logic (STL) provides a powerful framework to describe complex tasks involving temporal and logical behavior in dynamical systems. In this work, we address the problem of synthesizing controllers for continuous-time systems under STL specifications with input constraints. We propose a neural network-based framework for synthesizing time-varying control barrier functions (TVCBF) and their corresponding controllers for systems to fulfill STL specifications while respecting input constraints. We formulate barrier conditions incorporating the spatial and temporal logic of the given STL specification. Additionally, we introduce a validity condition to provide formal safety guarantees across the entire state space. Finally, we demonstrate the effectiveness of the proposed approach through several simulation studies considering different STL tasks.
A Multi-Worker Assembly Line Rebalancing with Spatial and Ergonomic Considerations
This work addresses the Assembly Line Rebalancing Problem in manual assembly systems where multiple workers operate in parallel within the same station - an industrially relevant scenario that remains insufficiently explored in the literature. A multi-objective optimization model is proposed that incorporates task reassignment, worker allocation, ergonomic evaluation, and explicit spatial feasibility through work-area constraints. The formulation minimizes deviations from the current configuration while promoting balanced workload and ergonomic conditions among workers. Computational experiments on synthetic problem instances demonstrate that the model consistently generates feasible and human-centered reconfigurations across varying cycle-time conditions, highlighting its potential as a decision-support tool for industrial rebalancing in flexible production environments.
Temporal parallelisation of continuous-time maximum-a-posteriori trajectory estimation
This paper proposes a parallel-in-time method for computing continuous-time maximum-a-posteriori (MAP) trajectory estimates of the states of partially observed stochastic differential equations (SDEs), with the goal of improving computational speed on parallel architectures. The MAP estimation problem is reformulated as a continuous-time optimal control problem based on the Onsager-Machlup functional. This reformulation enables the use of a previously proposed parallel-in-time solution for optimal control problems, which we adapt to the current problem. The structure of the resulting optimal control problem admits a parallel solution based on parallel associative scan algorithms. In the linear Gaussian special case, it yields a parallel Kalman-Bucy filter and a parallel continuous-time Rauch-Tung-Striebel smoother. These linear computational methods are further extended to nonlinear continuous-time state-space models through Taylor expansions. We also present the corresponding parallel two-filter smoother. The graphics processing unit (GPU) experiments on linear and nonlinear models demonstrate that the proposed framework achieves a significant speedup in computations while maintaining the accuracy of sequential algorithms.
An End-to-End Neural Network Transceiver Design for OFDM System with FPGA-Accelerated Implementation
The evolution toward sixth-generation (6G) wireless networks demands high-performance transceiver architectures capable of handling complex and dynamic environments. Conventional orthogonal frequency-division multiplexing (OFDM) receivers rely on cascaded discrete Fourier transform (DFT) and demodulation blocks, which are prone to inter-stage error propagation and suboptimal global performance. In this work, we propose two neural network (NN) models DFT-Net and Demodulation-Net (Demod-Net) to jointly replace the IDFT/DFT and demodulation modules in an OFDM transceiver. The models are trained end-to-end (E2E) to minimize bit error rate (BER) while preserving operator equivalence for hybrid deployment. A customized DFT-Demodulation Net Accelerator (DDNA) is further developed to efficiently map the proposed networks onto field-programmable gate array (FPGA) platforms. Leveraging fine-grained pipelining and block matrix operations, DDNA achieves high throughput and flexibility under stringent latency constraints. Experimental results show that the DL-based transceiver consistently outperforms the conventional OFDM system across multiple modulation schemes. With only a modest increase in hardware resource usage, it achieves approximately 1.5 dB BER gain and up to 66\% lower execution time.
Measurement of Material Volume Fractions in a Microwave Resonant Cavity Sensor Using Convolutional Neural Network
A non-destructive, real-time method for estimating the volume fraction of a dielectric mixture inside a resonant cavity is presented. A convolutional neural network (CNN)-based approach is used to estimate the fractional composition of two-phase dielectric mixtures inside a resonant cavity using scattering parameter (S-parameter) measurements. A rectangular cavity sensor with a strip feed structure is characterized using a vector network analyzer (VNA) from 0.01--20~GHz. The CNN is trained using both simulated and experimentally measured S-parameters and achieves high predictive accuracy even without de-embedding or filtering, demonstrating robustness to measurement imperfections. The simulation results achieve a coefficient of determination ($R^2$)=0.99 using $k$-fold cross-validation, while the experimental model using raw data achieves an $R^2=0.94$ with a mean absolute error (MAE) below 6\%. Data augmentation further improves the accuracy of the experimental prediction to above $R^2=0.998$ (MAE$<$0.72\%). The proposed method enables rapid, non-destructive, accurate, low-cost, and real-time estimation of material fractions, illustrating strong potential for sensing applications in microwave material characterization.
Plant Equivalent Controller Realizations for Attack-Resilient Cyber-Physical Systems
As cyber-physical systems (CPSs) become more dependent on data and communication networks, their vulnerability to false data injection (FDI) attacks has raised significant concerns. Among these, stealthy attacks, those that evade conventional detection mechanisms, pose a critical threat to closed-loop performance. This paper introduces a controller-oriented method to enhance CPS resiliency against such attacks without compromising nominal closed-loop behavior. Specifically, we propose the concept of plant equivalent controller (PEC) realizations, representing a class of dynamic output-feedback controllers that preserve the input-output behavior of a given base controller while exhibiting distinct robustness properties in the presence of disturbances and sensor attacks. To quantify and improve robustness, we employ reachable set analysis to assess the impact of stealthy attacks on the closed-loop dynamics. Building on this analysis, we provide mathematical tools (in terms of linear matrix inequalities) to synthesize the optimal PEC realization that minimizes the reachable set under peak-bounded disturbances. The proposed framework thus provides systematic analysis and synthesis tools to enhance the attack resilience of CPSs while maintaining the desired nominal performance. The effectiveness of the approach is demonstrated on the quadruple-tank process subject to stealthy sensor attacks.
Differentiable Material Point Method for the Control of Deformable Objects
Controlling the deformation of flexible objects is challenging due to their non-linear dynamics and high-dimensional configuration space. This work presents a differentiable Material Point Method (MPM) simulator targeted at control applications. We exploit the differentiability of the simulator to optimize a control trajectory in an active damping problem for a hyperelastic rope. The simulator effectively minimizes the kinetic energy of the rope around 2$\times$ faster than a baseline MPPI method and to a 20% lower energy level, while using about 3% of the computation time.
comment: 7 Pages, 4 Figures, 1 Table
Efficient Generation of Smooth Paths with Curvature Guarantees by Mollification
Most path following and trajectory tracking algorithms in mobile robotics require the desired path or trajectory to be defined by at least twice continuously differentiable functions to guarantee key properties such as global convergence, especially for nonholonomic robots like unicycles with speed constraints. Consequently, these algorithms typically exclude continuous but non-differentiable paths, such as piecewise functions. Despite this exclusion, such paths provide convenient high-level inputs for describing robot missions or behavior. While techniques such as spline interpolation or optimization-based methods are commonly used to smooth non-differentiable paths or create feasible ones from sequences of waypoints, they either can produce unnecessarily complex trajectories or are computationally expensive. In this work, we present a method to regularize non-differentiable functions and generate feasible paths through mollification. Specifically, we approximate an arbitrary path with a differentiable function that can converge to it with arbitrary precision. Additionally, we provide a systematic method for bounding the curvature of generated paths, which we demonstrate by applying it to paths resulting from linking a sequence of waypoints with segments. The proposed approach is computationally efficient, enabling real-time implementation on microcontrollers and compatibility with standard trajectory tracking and path following algorithms.
Iterative Tuning of Nonlinear Model Predictive Control for Robotic Manufacturing Tasks
Manufacturing processes are often perturbed by drifts in the environment and wear in the system, requiring control re-tuning even in the presence of repetitive operations. This paper presents an iterative learning framework for automatic tuning of Nonlinear Model Predictive Control (NMPC) weighting matrices based on task-level performance feedback. Inspired by norm-optimal Iterative Learning Control (ILC), the proposed method adaptively adjusts NMPC weights Q and R across task repetitions to minimize key performance indicators (KPIs) related to tracking accuracy, control effort, and saturation. Unlike gradient-based approaches that require differentiating through the NMPC solver, we construct an empirical sensitivity matrix, enabling structured weight updates without analytic derivatives. The framework is validated through simulation on a UR10e robot performing carbon fiber winding on a tetrahedral core. Results demonstrate that the proposed approach converges to near-optimal tracking performance (RMSE within 0.3% of offline Bayesian Optimization (BO)) in just 4 online repetitions, compared to 100 offline evaluations required by BO algorithm. The method offers a practical solution for adaptive NMPC tuning in repetitive robotic tasks, combining the precision of carefully optimized controllers with the flexibility of online adaptation.
On the Complementarity of Shared Electric Mobility and Renewable Energy Communities
Driven by the ongoing energy transition, shared mobility service providers are emerging actors in electrical power systems which aim to shift combustion-based mobility to electric paradigm. In the meantime, Energy Communities are deployed to enhance the local usage of distributed renewable production. As both ators share the same goal of satisfying the demand at the lowest cost, they could take advantage of their complementarity and coordinate their decisions to enhance each other operation. This paper presents an original Mixed-Integer Second Order Cone Programming long-term Electric Vehicle fleet planning optimization problem that integrates the coordination with a Renewable Energy Community and Vehicle-to-Grid capability. This model is used to assess the economic, energy, and grid performances of their collaboration in a 21 buses low-voltage distribution network. Key results show that, both actors coordination can help reducing the yearly cost up to 11.3 % compared to their stand-alone situation and that it may reduce the stress on the substation transformer by 46 % through the activation of the inherent EVs flexibility when subject to peak penalties from the grid operator.
Safe Control of Multi-Agent Systems with Minimal Communication
In many multi-agent systems, communication is limited by bandwidth, latency, and energy constraints. Designing controllers that achieve coordination and safety with minimal communication is critical for scalable and reliable deployment. This paper presents a method for designing controllers that minimize inter-agent communication in multi-agent systems while satisfying safety and coordination requirements, while conforming to communication delay constraints. The control synthesis problem is cast as a rank minimization problem, where a convex relaxation is obtained via system level synthesis. Simulation results on various tasks, including trajectory tracking with relative and heterogeneous sensing, demonstrate that the proposed method significantly reduces inter-agent transmission compared to baseline approaches.
comment: to appear at 2025 IEEE Conference on Decision and Control (CDC)
Large Language Models for Power System Applications: A Comprehensive Literature Survey
This comprehensive literature review examines the emerging applications of Large Language Models (LLMs) in power system engineering. Through a systematic analysis of recent research published between 2020 and 2025, we explore how LLMs are being integrated into various aspects of power system operations, planning, and management. The review covers key application areas including fault diagnosis, load forecasting, cybersecurity, control and optimization, system planning, simulation, and knowledge management. Our findings indicate that while LLMs show promising potential in enhancing power system operations through their advanced natural language processing and reasoning capabilities, significant challenges remain in their practical implementation. These challenges include limited domain-specific training data, concerns about reliability and safety in critical infrastructure, and the need for enhanced explainability. The review also highlights emerging trends such as the development of power system-specific LLMs and hybrid approaches combining LLMs with traditional power engineering methods. We identify crucial research directions for advancing the field, including the development of specialized architectures, improved security frameworks, and enhanced integration with existing power system tools. This survey provides power system researchers and practitioners with a comprehensive overview of the current state of LLM applications in the field and outlines future pathways for research and development.
comment: 17 pages, 1 table, 1 figure
Headroom as A Grid Service in Software-Defined Power Grids: A Peak-to-Peak Control Design Approach
To address system frequency challenges driven by the integration of renewable generation, advanced control strategies are designed at the device level to provide effective frequency support following disturbances. However, typically relying on energy-based performance metrics, these methods cannot guarantee the system frequency constraints such as frequency nadir and maximum Rate-of-Change-of-Frequency (RoCoF). Moreover, locally-designed frequency support cannot minimize the overall system cost to maintain frequency stability. On the other hand, the concept of frequency-constrained system scheduling is introduced, which incorporates frequency dynamic constraints into the system economic optimization, so that frequency requirements can be maintained with minimum cost. However, these works rely on analytical approximations of the frequency dynamic metrics, which are mathematically complicated and tend to be over-conservative for the approximation of IBR headroom requirements.
Information-Optimal Formation Geometry Design for Multimodal UAV Cooperative Perception
The efficacy of UAV swarm cooperative perception fundamentally depends on three-dimensional (3D) formation geometry, which governs target observability and sensor complementarity. In the literature, the exploitation of formation geometry and its impact on UAV sensing have rarely been studied, which can significantly degrade multimodal cooperative perception at scenarios where heterogeneous payloads (vision cameras and LiDAR) should be geometrically arranged to exploit their complementary strengths while managing communication interference and hardware budgets. To bridge this critical gap, we propose an information-theoretic optimization framework that allocation of UAVs and multimodal sensors, configures formation geometries, and flight control. The UAV-sensor allocation is optimized by the Fisher Information Matrix (FIM) determinant maximization. Under this framework we introduce an equivalent formation transition strategy that enhances field-of-view (FOV) coverage without compromising perception accuracy and communication interference. Furthermore, we design a novel Lyapunov-stable flight control scheme with logarithmic potential fields to generate energy-efficient trajectories for formation transitions. Extensive simulations demonstrate our formation-aware design achieves 25.0\% improvement in FOV coverage, 104.2\% enhancement in communication signal strength, and 47.2\% reduction in energy consumption compared to conventional benchmarks. This work establishes that task-driven geometric configuration represents a foundational rather than incidental component in next-generation UAV swarm systems.
Data-Driven Control via Conditional Mean Embeddings: Formal Guarantees via Uncertain MDP Abstraction
Controlling stochastic systems with unknown dynamics and under complex specifications is specially challenging in safety-critical settings, where performance guarantees are essential. We propose a data-driven policy synthesis framework that yields formal performance guarantees for such systems using conditional mean embeddings (CMEs) and uncertain Markov decision processes (UMDPs). From trajectory data, we learn the system's transition kernel as a CME, then construct a finite-state UMDP abstraction whose transition uncertainties capture learning and discretization errors. Next, we generate a policy with formal performance bounds through robust dynamic programming. We demonstrate and empirically validate our method through a temperature regulation benchmark.
Group-Theoretic Reinforcement Learning of Dynamical Decoupling Sequences
Dynamical decoupling seeks to mitigate phase decoherence in qubits by applying a carefully designed sequence of effectively instantaneous electromagnetic pulses. Although analytic solutions exist for pulse timings that are optimal under specific noise regimes, identifying the optimal timings for a realistic noise spectrum remains challenging. We propose a reinforcement learning (RL)-based method for designing pulse sequences on qubits. Our novel action set enables the RL agent to efficiently navigate this inherently non-convex optimization landscape. The action set, derived from Thompson's group $F$, is applicable to a broad class of sequential decision problems whose states can be represented as bounded sequences. We demonstrate that our RL agent can learn pulse sequences that minimize dephasing without requiring explicit knowledge of the underlying noise spectrum. This work opens the possibility for real-time learning of optimal dynamical decoupling sequences on qubits which are dephasing-limited. The model-free nature of our algorithm suggests that the agent may ultimately learn optimal pulse sequences even in the presence of unmodeled physical effects, such as pulse errors or non-Gaussian noise.
A Fair, Flexible, Zero-Waste Digital Electricity Market: A First-Principles Approach Combining Automatic Market Making, Holarchic Architectures and Shapley Theory
This thesis presents a fundamental rethink of electricity market design at the wholesale and balancing layers. Rather than treating markets as static spot clearing mechanisms, it reframes them as a continuously online, event driven dynamical control system: a two sided marketplace operating directly on grid physics. Existing energy only, capacity augmented, and zonal market designs are shown to admit no shock robust Nash equilibrium under realistic uncertainty, instead relying on price caps, uplift, and regulatory intervention to preserve solvency and security. In response, the thesis develops a holarchic Automatic Market Maker (AMM) in which prices are bounded, exogenous control signals derived from physical tightness rather than emergent equilibrium outcomes. The AMM generalises nodal and zonal pricing through nested scarcity layers, from node to cluster to zone to region to system, such that participant facing prices inherit from the tightest binding constraint. Nodal and zonal pricing therefore emerge as special cases of a unified scarcity propagation rule. Beyond pricing, the AMM functions as a scarcity aware control system and a digitally enforceable rulebook for fair access and proportional allocation under shortage. Fuel costs are recovered through pay as bid energy dispatch consistent with merit order, while non fuel operating and capital costs are allocated according to adequacy, flexibility, and locational contribution. Large scale simulations demonstrate bounded input bounded output stability, controllable procurement costs, zero structural waste, and improved distributional outcomes. The architecture is climate aligned and policy configurable, but requires a managed transition and new operational tools for system operators and market participants.
comment: PhD thesis
Simultaneous and Proportional Finger Motion Decoding Using Spatial Features from High-Density Surface Electromyography
Restoring natural and intuitive hand function requires simultaneous and proportional control (SPC) of multiple degrees of freedom (DoFs). This study systematically evaluated the multichannel linear descriptors-based block field method (MLD-BFM) for continuous decoding of five finger-joint DoFs by leveraging the rich spatial information of high-density surface electromyography (HD sEMG). Twenty-one healthy participants performed dynamic sinusoidal finger movements while HD sEMG signals were recorded from the \textit{extensor digitorum communis} (EDC) and \textit{flexor digitorum superficialis} (FDS) muscles. MLD-BFM extracted region-specific spatial features, including effective field strength ($Σ$), field-strength variation rate ($Φ$), and spatial complexity ($Ω$). Model performance was optimized (block size: $2 \times 2$; window: 0.15 s) and compared with conventional time-domain features and dimensionality reduction approaches when applied to multi-output regression models. MLD-BFM consistently achieved the highest $\mathrm{R}^2_{\mathrm{vw}}$ values across all models. The multilayer perceptron (MLP) combined with MLD-BFM yielded the best performance ($\mathrm{R}^2_{\mathrm{vw}} = 86.68\% \pm 0.33$). Time-domain features also showed strong predictive capability and were statistically comparable to MLD-BFM in some models, whereas dimensionality reduction techniques exhibited lower accuracy. Decoding accuracy was higher for the middle and ring fingers than for the thumb. Overall, MLD-BFM improved continuous finger movement decoding accuracy, underscoring the importance of taking advantage of the spatial richness of HD sEMG. These findings suggest that spatially structured features enhance SPC and provide practical guidance for designing robust, real-time, and responsive myoelectric interfaces.
comment: 39 pages, 13 figures, 2 tables
Safe Online Control-Informed Learning
This paper proposes a Safe Online Control-Informed Learning framework for safety-critical autonomous systems. The framework unifies optimal control, parameter estimation, and safety constraints into an online learning process. It employs an extended Kalman filter to incrementally update system parameters in real time, enabling robust and data-efficient adaptation under uncertainty. A softplus barrier function enforces constraint satisfaction during learning and control while eliminating the dependence on high-quality initial guesses. Theoretical analysis establishes convergence and safety guarantees, and the framework's effectiveness is demonstrated on cart-pole and robot-arm systems.
A Convex Obstacle Avoidance Formulation
Autonomous driving requires reliable collision avoidance in dynamic environments. Nonlinear Model Predictive Controllers (NMPCs) are suitable for this task, but struggle in time-critical scenarios requiring high frequency. To meet this demand, optimization problems are often simplified via linearization, narrowing the horizon window, or reduced temporal nodes, each compromising accuracy or reliability. This work presents the first general convex obstacle avoidance formulation, enabled by a novel approach to integrating logic. This facilitates the incorporation of an obstacle avoidance formulation into convex MPC schemes, enabling a convex optimization framework with substantially improved computational efficiency relative to conventional nonconvex methods. A key property of the formulation is that obstacle avoidance remains effective even when obstacles lie outside the prediction horizon, allowing shorter horizons for real-time deployment. In scenarios where nonconvex formulations are unavoidable, the proposed method meets or exceeds the performance of representative nonconvex alternatives. The method is evaluated in autonomous vehicle applications, where system dynamics are highly nonlinear.
comment: 18 pages, 17 figures
Delay Optimization in a Simple Offloading System: Extended Version
We consider a computation offloading system where jobs are processed sequentially at a local server followed by a higher-capacity cloud server. The system offers two service modes, differing in how the processing is split between the servers. Our goal is to design an optimal policy for assigning jobs to service modes and partitioning server resources in order to minimize delay. We begin by characterizing the system's stability region and establishing design principles for service modes that maximize throughput. For any given job assignment strategy, we derive the optimal resource partitioning and present a closed-form expression for the resulting delay. Moreover, we establish that the delay-optimal assignment policy exhibits a distinct breakaway structure: at low system loads, it is optimal to route all jobs through a single service mode, whereas beyond a critical load threshold, jobs must be assigned across both modes. We conclude by validating these theoretical insights through numerical evaluation.
Spacecraft Angular Rate Estimation via Event-Based Camera Sensing
This paper presents a method for determining spacecraft angular rates using event-based camera sensing. This is achieved by analyzing the temporal distribution of brightness events triggered by the apparent motion of stars. The location and polarity of the events are used to infer the apparent motion field of the stars, which is, in turn, employed to estimate the observer angular velocity in the camera frame. This can be converted to the spacecraft angular rates provided an attitude reference. The method is validated through numerical simulation for a synthetic dataset of event streams generated on random spacecraft pointing and rates conditions. The accuracy of the method is assessed, demonstrating its potential to complement or replace conventional rate sensors in spacecraft systems using event camera sensing.
A $μ$-Analysis and Synthesis Framework for Partial Integral Equations using IQCs
We develop a $μ$-analysis and synthesis framework for infinite-dimensional systems that leverages the Integral Quadratic Constraints (IQCs) to compute the structured singular value's upper bound. The methodology formulates robust stability and performance conditions jointly as Linear Partial Integral Inequalities within the Partial Integral Equation framework, establishing connections between IQC multipliers and $μ$-theory. Computational implementation via PIETOOLS enables computational tools that practically applicable to spatially distributed infinite dimensional systems. Illustrations with the help of Partial and Delay Differential Equations validate the effectiveness of the framework, showing a significant reduction in conservatism compared to unstructured methods and providing systematic tools for stability-performance trade-off analysis.
Hybrid Control as a Proxy for Detection and Mitigation of Sensor Attacks in Cooperative Driving
We propose a real-time hybrid controller scheme to detect and mitigate False-Data Injection (FDI) attacks on Cooperative Adaptive Cruise Control (CACC). Our method uses sensor redundancy to create equivalent controller realizations, each driven by distinct sensor subsets but producing identical control inputs when no attack occurs. By comparing control signals and measurements via majority voting, the scheme identifies compromised sensors in real-time and switches to a healthy controller. The hybrid controller uses attack-dependent flow and jump sets, and resets compromised controllers' states. Simulation results demonstrate the effectiveness of this approach.
comment: 6 pages, submitted to the IFAC World Congress 2026
GAPses: Versatile smart glasses for comfortable and fully-dry acquisition and parallel ultra-low-power processing of EEG and EOG
Recent advancements in head-mounted wearable technology are revolutionizing the field of biopotential measurement, but the integration of these technologies into practical, user-friendly devices remains challenging due to issues with design intrusiveness, comfort, and data privacy. To address these challenges, this paper presents GAPses, a novel smart glasses platform designed for unobtrusive, comfortable, and secure acquisition and processing of electroencephalography (EEG) and electrooculography (EOG) signals. We introduce a direct electrode-electronics interface with custom dry soft electrodes to enhance comfort for long wear. An integrated parallel ultra-low-power RISC-V processor (GAP9, Greenwaves Technologies) processes data at the edge, thereby eliminating the need for continuous data streaming through a wireless link, enhancing privacy, and increasing system reliability in adverse channel conditions. We demonstrate the broad applicability of the designed prototype through validation in a number of EEG-based interaction tasks, including alpha waves, steady-state visual evoked potential analysis, and motor movement classification. Furthermore, we demonstrate an EEG-based biometric subject recognition task, where we reach a sensitivity and specificity of 98.87% and 99.86% respectively, with only 8 EEG channels and an energy consumption per inference on the edge as low as 121 $μJ$. Moreover, in an EOG-based eye movement classification task, we reach an accuracy of 96.68% on 11 classes, resulting in an information transfer rate of 94.78 bit/min, which can be further increased to 161.43 bit/min by reducing the accuracy to 81.43%. The deployed implementation has an energy consumption of 40 $μJ$ per inference and a total system power of only 12.4 mW, of which only 1.61% is used for classification, allowing for continuous operation of more than 22 h with a small 75 mAh battery.
comment: 11 pages, 6 figures
An Input-Output Data-Driven Dissipativity Approach for Compositional Stability Certification of Interconnected LTI MIMO Systems
We propose an input-output data-driven framework for certifying the stability of interconnected multiple-input-multiple-output linear time-invariant discrete-time systems via QSR-dissipativity. That is, by using measured input-output trajectories of each subsystem, we verify dissipative properties and extract local passivity indices without requiring an explicit model identification. These passivity indices are then used to derive conditions under which the equilibrium of the interconnected system is stable. In particular, the framework identifies how the lack of passivity in some subsystems can be compensated by surpluses in others. The proposed approach enables a compositional stability analysis by combining subsystem-level conditions into a criterion valid for the overall interconnected system. We illustrate via a numerical case study, how to compute channel-wise passivity indices and infer stability guarantees directly from data with the proposed method.
BioGAP-Ultra: A Modular Edge-AI Platform for Wearable Multimodal Biosignal Acquisition and Processing
The growing demand for continuous physiological monitoring and human-machine interaction in real-world settings calls for wearable platforms that are flexible, low-power, and capable of on-device intelligence. This work presents BioGAP-Ultra, an advanced multimodal biosensing platform that supports synchronized acquisition of diverse electrophysiological and hemodynamic signals such as EEG, EMG, ECG, and PPG while enabling embedded AI processing at state-of-the-art energy efficiency. BioGAP-Ultra is a major extension of our previous BioGAP design aimed at meeting the rapidly growing requirements of wearable biosensing applications. It features (i) increased on-device storage (x2 SRAM, x4 FLASH), (ii) improved wireless connectivity (supporting up to 1.4 Mbit/s bandwidth, x4 higher than BioGAP), (iii) enhanced number of signal modalities (from 3 to 5) and analog input channels (x2). Further, it is accompanied by a real-time visualization and analysis software suite that supports the hardware design, providing access to raw data and real-time configurability on a mobile phone. Finally, we demonstrate the system's versatility through integration into various wearable form factors: an EEG-PPG headband consuming 32.8 mW, an EMG sleeve at 26.7 mW, and an ECG-PPG chestband requiring only 9.3 mW for continuous acquisition and streaming, tailored for diverse biosignal applications. To showcase its edge-AI capabilities, we further deploy two representative on-device applications: (1) ECG-PPG-based PAT estimation at 8.6 mW, and (2) EMG-ACC-based classification of reach-and-grasp motion phases, achieving 79.9 % $\pm$ 5.7 % accuracy at 23.6 mW. All hardware and software design files are also released open-source with a permissive license.
comment: 17 pages, 15 figures
Physics-informed Modularized Neural Network for Advanced Building Control by Deep Reinforcement Learning
Physics-informed machine learning (PIML) provides a promising solution for building energy modeling and can serve as a virtual environment to enable reinforcement learning (RL) agents to interact and learn. However, challenges remain in efficiently integrating physics priors, evaluating the effectiveness of physics constraints, balancing model accuracy and physics consistency, and enabling real-world implementation. To address these gaps, this study introduces a Physics-Informed Modularized Neural Network (PI-ModNN), which incorporates physics priors through a physics-informed model structure, loss functions, and hard constraints. A new evaluation metric called "temperature response violation" is developed to quantify the physical consistency of data-driven building dynamic models under varying control inputs and training data sizes. Additionally, a physics prior evaluation framework based on rule importance is proposed to assess the contribution of each individual physics prior, offering guidance on selecting appropriate PIML techniques. Results indicate that incorporating physical priors does not always improve model performance; inappropriate priors may decrease model accuracy and consistency. However, hard constraints are effective in enforcing model consistency. Furthermore, we present a general workflow for developing control-oriented PIML models and integrating them with deep reinforcement learning (DRL). Following this framework, a case study implementing DRL in an office space over three months demonstrates potential energy savings of 31.4%. Finally, we provide a general guideline for integrating data-driven models with advanced building control through a four-step evaluation framework, paving the way for reliable and scalable deployment of advanced building controls.
Estimating Hormone Concentrations in the Pituitary-Thyroid Feedback Loop from Irregularly Sampled Measurements
Model-based control techniques have recently been investigated for the recommendation of medication dosages to address thyroid diseases. These techniques often rely on knowledge of internal hormone concentrations that cannot be measured from blood samples. Moreover, the measurable concentrations are typically only obtainable at irregular sampling times. In this work, we empirically verify a notion of sample-based detectability that accounts for irregular sampling of the measurable concentrations on two pituitary-thyroid loop models representing patients with hypo- and hyperthyroidism, respectively, and include the internal concentrations as states. We then implement sample-based moving horizon estimation for the models, and test its performance on virtual patients across a range of sampling schemes. Our study shows robust stability of the estimator across all scenarios, and that more frequent sampling leads to less estimation error in the presence of model uncertainty and misreported dosages.
comment: 8 pages; This work has been submitted to IFAC for possible publication
Off-grid solar energy storage system with lithium iron phosphate (LFP) batteries in high mountains: a case report of Tianchi Lodge in Taiwan
Mountain huts are buildings located at high altitude, providing shelter and a place for hikers. Energy supply on mountain huts remains an open issue. Using renewable energies could be an appropriate solution. Tianchi Lodge, a famous mountain hut in Taiwan, has operated an off-grid solar energy storage system with lithium iron phosphate (LFP) batteries since 2020. In this case report, the energy architecture, detailed descriptions, and historical status of the system are provided.
comment: 6 pages, 3 figures, 1 table
On Structural Properties of Risk-Averse Optimal Stopping Problems
We establish structural properties of optimal stopping problems under time-consistent dynamic (coherent) risk measures, focusing on value function monotonicity and the existence of control limit (threshold) optimal policies. While such results are well developed for risk-neutral (expected-value) models, they remain underexplored in risk-averse settings. Coherent risk measures typically lack the tower property and are subadditive rather than additive, complicating structural analysis. We show that value function monotonicity mirrors the risk-neutral case. Moreover, if the risk envelope associated with each coherent risk measure admits a minimal element, the risk-averse optimal stopping problem reduces to an equivalent risk-neutral formulation. We also develop a general procedure for identifying control limit optimal policies and use it to derive practical, verifiable conditions on the risk measures and MDP structure that guarantee their existence. We illustrate the theory and verify these conditions through optimal stopping problems arising in operations, marketing, and finance.
Model Predictive Control with High-Probability Safety Guarantee for Nonlinear Stochastic Systems
We present a model predictive control (MPC) framework for nonlinear stochastic systems that ensures safety guarantee with high probability. Unlike most existing stochastic MPC schemes, our method adopts a set-erosion that converts the probabilistic safety constraint into a tractable deterministic safety constraint on a smaller safe set over deterministic dynamics. As a result, our method is compatible with any off-the-shelf deterministic MPC algorithm. The key to the effectiveness of our method is a tight bound on the stochastic fluctuation of a stochastic trajectory around its nominal version. Our method is scalable and can guarantee safety with high probability level (e.g., 99.99%), making it particularly suitable for safety-critical applications involving complex nonlinear dynamics. Rigorous analysis is conducted to establish a theoretical safety guarantee, and numerical experiments are provided to validate the effectiveness of the proposed MPC method.
Can Carbon-Aware Electric Load Shifting Reduce Emissions? An Equilibrium-Based Analysis
An increasing number of electric loads, such as hydrogen producers or data centers, can be characterized as carbon-sensitive, meaning that they are willing to adapt the timing and/or location of their electricity usage in order to minimize carbon footprints. However, the emission reduction efforts of these carbon-sensitive loads rely on carbon intensity information such as average carbon emissions, and it is unclear whether load shifting based on these signals effectively reduces carbon emissions. To address this open question, we design a carbon-aware equilibrium model, which expands the commonly used equilibrium model for standard (carbon-agnostic) electricity market clearing to include carbon-sensitive consumers that adapt their consumption based on average carbon emission signals and carbon costs. This analysis represents an idealized situation for carbon-sensitive consumers, where their carbon preferences are reflected directly in the market clearing, and contrasts with current practice, where carbon emission signals only become known to consumers a posteriori (i.e., after the market has already been cleared). Furthermore, we extend our model to consider temporal load shifting and time-varying maximum renewable generations. We employ illustrative three-bus examples and numerical simulations on the IEEE RTS-GMLC system to reveal the limitations of the widely adopted average carbon emission signal for guiding carbon emission reduction. Our model offers a novel perspective for evaluating the effectiveness of different carbon signals and contributes to new carbon signal design.
comment: 15 pages, 8 figures, accepted by ACM e-Energy 2026. arXiv admin note: text overlap with arXiv:2501.09853
Optimal Satellite Maneuvers for Spaceborne Jamming Attacks
Satellites are becoming exceedingly critical for communication, making them prime targets for cyber-physical attacks. We consider a rogue satellite in low Earth orbit that jams the uplink communication between another satellite and a ground station. To achieve maximal interference with minimal fuel consumption, the jammer carefully maneuvers itself relative to the target satellite's antenna. We cast this maneuvering objective as a two-stage optimal control problem, involving i) repositioning to an efficient jamming position before uplink communication commences; and ii) maintaining an efficient jamming position after communication has started. We obtain the optimal maneuvering trajectories for the jammer and perform simulations to show how they enable the disruption of uplink communication with reasonable fuel consumption.
Adversarial Pursuits in Cislunar Space
Cislunar space is becoming a critical domain for future lunar and interplanetary missions, yet its remoteness, sparse infrastructure, and unstable dynamics create single points of failure. Adversaries in cislunar orbits can exploit these vulnerabilities to pursue and jam co-located communication relays, potentially severing communications between lunar missions and the Earth. We study a pursuit-evasion scenario between two spacecraft in a cislunar orbit, where the evader must avoid a pursuer-jammer while remaining close to its nominal trajectory. We model the evader-pursuer interaction as a zero-sum adversarial differential game cast in the circular restricted three-body problem. This formulation incorporates critical aspects of cislunar orbital dynamics, including autonomous adjustment of the reference orbit phasing to enable aggressive evading maneuvers, and shaping of the evader's cost with the orbit's stable and unstable manifolds. We solve the resulting nonlinear game locally using a continuous-time differential dynamic programming variant, which iteratively applies linear-quadratic approximations to the Hamilton-Jacobi-Isaacs equation. We simulate the evader's behavior against both a worst-case and a linear-quadratic pursuer. Our results pave the way for securing future missions in cislunar space against emerging cyber threats.
comment: 17 pages, 9 figures
Robotics
SignRAG: A Retrieval-Augmented System for Scalable Zero-Shot Road Sign Recognition
Automated road sign recognition is a critical task for intelligent transportation systems, but traditional deep learning methods struggle with the sheer number of sign classes and the impracticality of creating exhaustive labeled datasets. This paper introduces a novel zero-shot recognition framework that adapts the Retrieval-Augmented Generation (RAG) paradigm to address this challenge. Our method first uses a Vision Language Model (VLM) to generate a textual description of a sign from an input image. This description is used to retrieve a small set of the most relevant sign candidates from a vector database of reference designs. Subsequently, a Large Language Model (LLM) reasons over the retrieved candidates to make a final, fine-grained recognition. We validate this approach on a comprehensive set of 303 regulatory signs from the Ohio MUTCD. Experimental results demonstrate the framework's effectiveness, achieving 95.58% accuracy on ideal reference images and 82.45% on challenging real-world road data. This work demonstrates the viability of RAG-based architectures for creating scalable and accurate systems for road sign recognition without task-specific training.
comment: Submitted to IV 2026
Cross-Level Sensor Fusion with Object Lists via Transformer for 3D Object Detection
In automotive sensor fusion systems, smart sensors and Vehicle-to-Everything (V2X) modules are commonly utilized. Sensor data from these systems are typically available only as processed object lists rather than raw sensor data from traditional sensors. Instead of processing other raw data separately and then fusing them at the object level, we propose an end-to-end cross-level fusion concept with Transformer, which integrates highly abstract object list information with raw camera images for 3D object detection. Object lists are fed into a Transformer as denoising queries and propagated together with learnable queries through the latter feature aggregation process. Additionally, a deformable Gaussian mask, derived from the positional and size dimensional priors from the object lists, is explicitly integrated into the Transformer decoder. This directs attention toward the target area of interest and accelerates model training convergence. Furthermore, as there is no public dataset containing object lists as a standalone modality, we propose an approach to generate pseudo object lists from ground-truth bounding boxes by simulating state noise and false positives and negatives. As the first work to conduct cross-level fusion, our approach shows substantial performance improvements over the vision-based baseline on the nuScenes dataset. It demonstrates its generalization capability over diverse noise levels of simulated object lists and real detectors.
comment: 6 pages, 3 figures, accepted at IV2025
MPC-Guided Safe Reinforcement Learning and Lipschitz-Based Filtering for Structured Nonlinear Systems
Modern engineering systems, such as autonomous vehicles, flexible robotics, and intelligent aerospace platforms, require controllers that are robust to uncertainties, adaptive to environmental changes, and safety-aware under real-time constraints. RL offers powerful data-driven adaptability for systems with nonlinear dynamics that interact with uncertain environments. RL, however, lacks built-in mechanisms for dynamic constraint satisfaction during exploration. MPC offers structured constraint handling and robustness, but its reliance on accurate models and computationally demanding online optimization may pose significant challenges. This paper proposes an integrated MPC-RL framework that combines stability and safety guarantees of MPC with the adaptability of RL. During training, MPC defines safe control bounds that guide the RL component and that enable constraint-aware policy learning. At deployment, the learned policy operates in real time with a lightweight safety filter based on Lipschitz continuity to ensure constraint satisfaction without heavy online optimizations. The approach, which is validated on a nonlinear aeroelastic wing system, demonstrates improved disturbance rejection, reduced actuator effort, and robust performance under turbulence. The architecture generalizes to other domains with structured nonlinearities and bounded disturbances, offering a scalable solution for safe artificial-intelligence-driven control in engineering applications.
SAGA: Open-World Mobile Manipulation via Structured Affordance Grounding
We present SAGA, a versatile and adaptive framework for visuomotor control that can generalize across various environments, task objectives, and user specifications. To efficiently learn such capability, our key idea is to disentangle high-level semantic intent from low-level visuomotor control by explicitly grounding task objectives in the observed environment. Using an affordance-based task representation, we express diverse and complex behaviors in a unified, structured form. By leveraging multimodal foundation models, SAGA grounds the proposed task representation to the robot's visual observation as 3D affordance heatmaps, highlighting task-relevant entities while abstracting away spurious appearance variations that would hinder generalization. These grounded affordances enable us to effectively train a conditional policy on multi-task demonstration data for whole-body control. In a unified framework, SAGA can solve tasks specified in different forms, including language instructions, selected points, and example demonstrations, enabling both zero-shot execution and few-shot adaptation. We instantiate SAGA on a quadrupedal manipulator and conduct extensive experiments across eleven real-world tasks. SAGA consistently outperforms end-to-end and modular baselines by substantial margins. Together, these results demonstrate that structured affordance grounding offers a scalable and effective pathway toward generalist mobile manipulation.
comment: 9 pages, 7 figures
VLG-Loc: Vision-Language Global Localization from Labeled Footprint Maps
This paper presents Vision-Language Global Localization (VLG-Loc), a novel global localization method that uses human-readable labeled footprint maps containing only names and areas of distinctive visual landmarks in an environment. While humans naturally localize themselves using such maps, translating this capability to robotic systems remains highly challenging due to the difficulty of establishing correspondences between observed landmarks and those in the map without geometric and appearance details. To address this challenge, VLG-Loc leverages a vision-language model (VLM) to search the robot's multi-directional image observations for the landmarks noted in the map. The method then identifies robot poses within a Monte Carlo localization framework, where the found landmarks are used to evaluate the likelihood of each pose hypothesis. Experimental validation in simulated and real-world retail environments demonstrates superior robustness compared to existing scan-based methods, particularly under environmental changes. Further improvements are achieved through the probabilistic fusion of visual and scan-based localization.
High Order Control Lyapunov Function - Control Barrier Function - Quadratic Programming Based Autonomous Driving Controller for Bicyclist Safety
Ensuring the safety of Vulnerable Road Users (VRUs) is a critical challenge in the development of advanced autonomous driving systems in smart cities. Among vulnerable road users, bicyclists present unique characteristics that make their safety both critical and also manageable. Vehicles often travel at significantly higher relative speeds when interacting with bicyclists as compared to their interactions with pedestrians which makes collision avoidance system design for bicyclist safety more challenging. Yet, bicyclist movements are generally more predictable and governed by clear traffic rules as compared to the sudden and sometimes erratic pedestrian motion, offering opportunities for model-based control strategies. To address bicyclist safety in complex traffic environments, this study proposes and develops a High Order Control Lyapunov Function High Order Control Barrier Function Quadratic Programming (HOCLF HOCBF QP) control framework. Through this framework, CLFs constraints guarantee system stability so that the vehicle can track its reference trajectory, whereas CBFs constraints ensure system safety by letting vehicle avoiding potential collisions region with surrounding obstacles. Then by solving a QP problem, an optimal control command that simultaneously satisfies stability and safety requirements can be calculated. Three key bicyclist crash scenarios recorded in the Fatality Analysis Reporting System (FARS) are recreated and used to comprehensively evaluate the proposed autonomous driving bicyclist safety control strategy in a simulation study. Simulation results demonstrate that the HOCLF HOCBF QP controller can help the vehicle perform robust, and collision-free maneuvers, highlighting its potential for improving bicyclist safety in complex traffic environments.
Making Robots Play by the Rules: The ROS 2 CLIPS-Executive
CLIPS is a rule-based programming language for building knowledge-driven applications, well suited for the complex task of coordinating autonomous robots. Inspired by the CLIPS-Executive originally developed for the lesser known Fawkes robotics framework, we present an Integration of CLIPS into the ROS ecosystem. Additionally, we show the flexibility of CLIPS by describing a PDDL-based planning framework integration.
HMPCC: Human-Aware Model Predictive Coverage Control
We address the problem of coordinating a team of robots to cover an unknown environment while ensuring safe operation and avoiding collisions with non-cooperative agents. Traditional coverage strategies often rely on simplified assumptions, such as known or convex environments and static density functions, and struggle to adapt to real-world scenarios, especially when humans are involved. In this work, we propose a human-aware coverage framework based on Model Predictive Control (MPC), namely HMPCC, where human motion predictions are integrated into the planning process. By anticipating human trajectories within the MPC horizon, robots can proactively coordinate their actions %avoid redundant exploration, and adapt to dynamic conditions. The environment is modeled as a Gaussian Mixture Model (GMM), representing regions of interest. Team members operate in a fully decentralized manner, without relying on explicit communication, an essential feature in hostile or communication-limited scenarios. Our results show that human trajectory forecasting enables more efficient and adaptive coverage, improving coordination between human and robotic agents.
Bayesian Optimization Parameter Tuning Framework for a Lyapunov Based Path Following Controller
Parameter tuning in real-world experiments is constrained by the limited evaluation budget available on hardware. The path-following controller studied in this paper reflects a typical situation in nonlinear geometric controller, where multiple gains influence the dynamics through coupled nonlinear terms. Such interdependence makes manual tuning inefficient and unlikely to yield satisfactory performance within a practical number of trials. To address this challenge, we propose a Bayesian optimization (BO) framework that treats the closed-loop system as a black box and selects controller gains using a Gaussian-process surrogate. BO offers model-free exploration, quantified uncertainty, and data-efficient search, making it well suited for tuning tasks where each evaluation is costly. The framework is implemented on Honda's AI-Formula three-wheeled robot and assessed through repeated full-lap experiments on a fixed test track. The results show that BO improves controller performance within 32 trials, including 15 warm-start initial evaluations, indicating that it can efficiently locate high-performing regions of the parameter space under real-world conditions. These findings demonstrate that BO provides a practical, reliable, and data-efficient tuning approach for nonlinear path-following controllers on real robotic platforms.
Optimized Conflict Management for Urban Air Mobility Using Swarm UAV Networks
Urban Air Mobility (UAM) poses unprecedented traffic coordination challenges, especially with increasing UAV densities in dense urban corridors. This paper introduces a mathematical model using a control algorithm to optimize an Edge AI-driven decentralized swarm architecture for intelligent conflict resolution, enabling real-time decision-making with low latency. Using lightweight neural networks, the system leverages edge nodes to perform distributed conflict detection and resolution. A simulation platform was developed to evaluate the scheme under various UAV densities. Results indicate that the conflict resolution time is dramatically minimized up to 3.8 times faster, and accuracy is enhanced compared to traditional centralized control models. The proposed architecture is highly promising for scalable, efficient, and safe aerial traffic management in future UAM systems.
comment: Preprint. Under review for conference submission
D3D-VLP: Dynamic 3D Vision-Language-Planning Model for Embodied Grounding and Navigation
Embodied agents face a critical dilemma that end-to-end models lack interpretability and explicit 3D reasoning, while modular systems ignore cross-component interdependencies and synergies. To bridge this gap, we propose the Dynamic 3D Vision-Language-Planning Model (D3D-VLP). Our model introduces two key innovations: 1) A Dynamic 3D Chain-of-Thought (3D CoT) that unifies planning, grounding, navigation, and question answering within a single 3D-VLM and CoT pipeline; 2) A Synergistic Learning from Fragmented Supervision (SLFS) strategy, which uses a masked autoregressive loss to learn from massive and partially-annotated hybrid data. This allows different CoT components to mutually reinforce and implicitly supervise each other. To this end, we construct a large-scale dataset with 10M hybrid samples from 5K real scans and 20K synthetic scenes that are compatible with online learning methods such as RL and DAgger. Our D3D-VLP achieves state-of-the-art results on multiple benchmarks, including Vision-and-Language Navigation (R2R-CE, REVERIE-CE, NavRAG-CE), Object-goal Navigation (HM3D-OVON), and Task-oriented Sequential Grounding and Navigation (SG3D). Real-world mobile manipulation experiments further validate the effectiveness.
LocoMamba: Vision-Driven Locomotion via End-to-End Deep Reinforcement Learning with Mamba
We introduce LocoMamba, a vision-driven cross-modal DRL framework built on selective state-space models, specifically leveraging Mamba, that achieves near-linear-time sequence modeling, effectively captures long-range dependencies, and enables efficient training with longer sequences. First, we embed proprioceptive states with a multilayer perceptron and patchify depth images with a lightweight convolutional neural network, producing compact tokens that improve state representation. Second, stacked Mamba layers fuse these tokens via near-linear-time selective scanning, reducing latency and memory footprint, remaining robust to token length and image resolution, and providing an inductive bias that mitigates overfitting. Third, we train the policy end-to-end with Proximal Policy Optimization under terrain and appearance randomization and an obstacle-density curriculum, using a compact state-centric reward that balances progress, smoothness, and safety. We evaluate our method in challenging simulated environments with static and moving obstacles as well as uneven terrain. Compared with state-of-the-art baselines, our method achieves higher returns and success rates with fewer collisions, exhibits stronger generalization to unseen terrains and obstacle densities, and improves training efficiency by converging in fewer updates under the same compute budget.
comment: 14 pages. This paper has been published in Advanced Engineering Informatics. Please cite the journal version: DOI: 10.1016/j.aei.2025.104230
Learning Obstacle Avoidance using Double DQN for Quadcopter Navigation
One of the challenges faced by Autonomous Aerial Vehicles is reliable navigation through urban environments. Factors like reduction in precision of Global Positioning System (GPS), narrow spaces and dynamically moving obstacles make the path planning of an aerial robot a complicated task. One of the skills required for the agent to effectively navigate through such an environment is to develop an ability to avoid collisions using information from onboard depth sensors. In this paper, we propose Reinforcement Learning of a virtual quadcopter robot agent equipped with a Depth Camera to navigate through a simulated urban environment.
comment: Fixed typo in second author's name throughout the paper
Human-Like Robot Impedance Regulation Skill Learning from Human-Human Demonstrations
Humans are experts in physical collaboration by leveraging cognitive abilities such as perception, reasoning, and decision-making to regulate compliance behaviors based on their partners' states and task requirements. Equipping robots with similar cognitive-inspired collaboration skills can significantly enhance the efficiency and adaptability of human-robot collaboration (HRC). This paper introduces an innovative HumanInspired Impedance Regulation Skill Learning framework (HIImpRSL) for robotic systems to achieve leader-follower and mutual adaptation in multiple physical collaborative tasks. The proposed framework enables the robot to adapt its compliance based on human states and reference trajectories derived from human-human demonstrations. By integrating electromyography (EMG) signals and motion data, we extract endpoint impedance profiles and reference trajectories to construct a joint representation via imitation learning. An LSTM-based module then learns task-oriented impedance regulation policies, which are implemented through a whole-body impedance controller for online impedance adaptation. Experimental validation was conducted through collaborative transportation, two interactive Tai Chi pushing hands, and collaborative sawing tasks with multiple human subjects, demonstrating the ability of our framework to achieve human-like collaboration skills and the superior performance from the perspective of interactive forces compared to four other related methods.
comment: 13 pages, 9 figures
EfficientFlow: Efficient Equivariant Flow Policy Learning for Embodied AI AAAI 2026
Generative modeling has recently shown remarkable promise for visuomotor policy learning, enabling flexible and expressive control across diverse embodied AI tasks. However, existing generative policies often struggle with data inefficiency, requiring large-scale demonstrations, and sampling inefficiency, incurring slow action generation during inference. We introduce EfficientFlow, a unified framework for efficient embodied AI with flow-based policy learning. To enhance data efficiency, we bring equivariance into flow matching. We theoretically prove that when using an isotropic Gaussian prior and an equivariant velocity prediction network, the resulting action distribution remains equivariant, leading to improved generalization and substantially reduced data demands. To accelerate sampling, we propose a novel acceleration regularization strategy. As direct computation of acceleration is intractable for marginal flow trajectories, we derive a novel surrogate loss that enables stable and scalable training using only conditional trajectories. Across a wide range of robotic manipulation benchmarks, the proposed algorithm achieves competitive or superior performance under limited data while offering dramatically faster inference. These results highlight EfficientFlow as a powerful and efficient paradigm for high-performance embodied AI.
comment: Accepted by AAAI 2026. Project Page: https://efficientflow.github.io/
Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics
Learning robust and generalizable world models is crucial for enabling efficient and scalable robotic control in real-world environments. In this work, we introduce a novel framework for learning world models that accurately capture complex, partially observable, and stochastic dynamics. The proposed method employs a dual-autoregressive mechanism and self-supervised training to achieve reliable long-horizon predictions without relying on domain-specific inductive biases, ensuring adaptability across diverse robotic tasks. We further propose a policy optimization framework that leverages world models for efficient training in imagined environments and seamless deployment in real-world systems. This work advances model-based reinforcement learning by addressing the challenges of long-horizon prediction, error accumulation, and sim-to-real transfer. By providing a scalable and robust framework, the introduced methods pave the way for adaptive and efficient robotic systems in real-world applications.
Enhancing Sample Efficiency and Exploration in Reinforcement Learning through the Integration of Diffusion Models and Proximal Policy Optimization
Proximal Policy Optimization (PPO) is widely used in continuous control due to its robustness and stable training, yet it remains sample-inefficient in tasks with expensive interactions and high-dimensional action spaces. This paper proposes PPO-DAP (PPO with Diffusion Action Prior), a strictly on-policy framework that improves exploration quality and learning efficiency without modifying the PPO objective. PPO-DAP follows a two-stage protocol. Offline, we pretrain a conditional diffusion action prior on logged trajectories to cover the action distribution supported by the behavior policy. Online, PPO updates the actor-critic only using newly collected on-policy rollouts, while the diffusion prior is adapted around the on-policy state distribution via parameter-efficient tuning (Adapter/LoRA) over a small parameter subset. For each on-policy state, the prior generates multiple action proposals and concentrates them toward high-value regions using critic-based energy reweighting and in-denoising gradient guidance. These proposals affect the actor only through a low-weight imitation loss and an optional soft KL regularizer to the prior; importantly, PPO gradients are never backpropagated through offline logs or purely synthetic trajectories. We further analyze the method from a dual-proximal perspective and derive a one-step performance lower bound. Across eight MuJoCo continuous-control tasks under a unified online budget of 1.0M environment steps, PPO-DAP consistently improves early learning efficiency (area under the learning curve over the first 40 epochs, ALC@40) and matches or exceeds the strongest on-policy baselines in final return on 6/8 tasks, with modest overhead (1.18+/-0.04x wall-clock time and 1.05+/-0.02x peak GPU memory relative to PPO).
Symphony: A Heuristic Normalized Calibrated Advantage Actor and Critic Algorithm in application for Humanoid Robots
In our work we not explicitly hint that it is a misconception to think that humans learn fast. Learning process takes time. Babies start learning to move in the restricted liquid area called placenta. Children often are limited by underdeveloped body. Even adults are not allowed to participate in complex competitions right away. However, with robots, when learning from scratch, we often don't have the privilege of waiting for dozen millions of steps. "Swaddling" regularization is responsible for restraining an agent in rapid but unstable development penalizing action strength in a specific way not affecting actions directly. The Symphony, Transitional-policy Deterministic Actor and Critic algorithm, is a concise combination of different ideas for possibility of training humanoid robots from scratch with Sample Efficiency, Sample Proximity and Safety of Actions in mind. It is no secret that continuous increase in Gaussian noise without appropriate smoothing is harmful for motors and gearboxes. Compared to Stochastic algorithms, we set a limited parametric noise and promote a reduced strength of actions, safely increasing entropy, since the actions are kind of immersed in weaker noise. When actions require more extreme values, actions rise above the weak noise. Training becomes empirically much safer for both the environment around and the robot's mechanisms. We use Fading Replay Buffer: using a fixed formula containing the hyperbolic tangent, we adjust the batch sampling probability: the memory contains a recent memory and a long-term memory trail. Fading Replay Buffer allows us to use Temporal Advantage when we improve the current Critic Network prediction compared to the exponential moving average. Temporal Advantage allows us to update Actor and Critic in one pass, as well as combine Actor and Critic in one Object and implement their Losses in one line.
comment: https://github.com/SuspensionRailway/symphony
Multiagent Systems
Beyond Task Completion: An Assessment Framework for Evaluating Agentic AI Systems
Recent advances in agentic AI have shifted the focus from standalone Large Language Models (LLMs) to integrated systems that combine LLMs with tools, memory, and other agents to perform complex tasks. These multi-agent architectures enable coordinated reasoning, planning, and execution across diverse domains, allowing agents to collaboratively automate complex workflows. Despite these advances, evaluation and assessment of LLM agents and the multi-agent systems they constitute remain a fundamental challenge. Although various approaches have been proposed in the software engineering literature for evaluating conventional software components, existing methods for AI-based systems often overlook the non-deterministic nature of models. This non-determinism introduces behavioral uncertainty during execution, yet existing evaluations rely on binary task completion metrics that fail to capture it. Evaluating agentic systems therefore requires examining additional dimensions, including the agent ability to invoke tools, ingest and retrieve memory, collaborate with other agents, and interact effectively with its environment. We propose an end-to-end Agent Assessment Framework with four evaluation pillars encompassing LLMs, Memory, Tools, and Environment. We validate the framework on a representative Autonomous CloudOps use case, where experiments reveal behavioral deviations overlooked by conventional metrics, demonstrating its effectiveness in capturing runtime uncertainties.
Value-Aware Multiagent Systems
This paper introduces the concept of value awareness in AI, which goes beyond the traditional value-alignment problem. Our definition of value awareness presents us with a concise and simplified roadmap for engineering value-aware AI. The roadmap is structured around three core pillars: (1) learning and representing human values using formal semantics, (2) ensuring the value alignment of both individual agents and multiagent systems, and (3) providing value-based explainability on behaviour. The paper presents a selection of our ongoing work on some of these topics, along with applications to real-life domains.
EcoNet: Multiagent Planning and Control Of Household Energy Resources Using Active Inference
Advances in automated systems afford new opportunities for intelligent management of energy at household, local area, and utility scales. Home Energy Management Systems (HEMS) can play a role by optimizing the schedule and use of household energy devices and resources. One challenge is that the goals of a household can be complex and conflicting. For example, a household might wish to reduce energy costs and grid-associated greenhouse gas emissions, yet keep room temperatures comfortable. Another challenge is that an intelligent HEMS agent must make decisions under uncertainty. An agent must plan actions into the future, but weather and solar generation forecasts, for example, provide inherently uncertain estimates of future conditions. This paper introduces EcoNet, a Bayesian approach to household and neighborhood energy management that is based on active inference. The aim is to improve energy management and coordination, while accommodating uncertainties and taking into account potentially conditional and conflicting goals and preferences. Simulation results are presented and discussed.
comment: 17 pages, 9 figures
Social welfare optimisation in well-mixed and structured populations
Research on promoting cooperation among autonomous, self-regarding agents has often focused on the bi-objective optimisation problem: minimising the total incentive cost while maximising the frequency of cooperation. However, the optimal value of social welfare under such constraints remains largely unexplored. In this work, we hypothesise that achieving maximal social welfare is not guaranteed at the minimal incentive cost required to drive agents to a desired cooperative state. To address this gap, we adopt to a single-objective approach focused on maximising social welfare, building upon foundational evolutionary game theory models that examined cost efficiency in finite populations, in both well-mixed and structured population settings. Our analytical model and agent-based simulations show how different interference strategies, including rewarding local versus global behavioural patterns, affect social welfare and dynamics of cooperation. Our results reveal a significant gap in the per-individual incentive cost between optimising for pure cost efficiency or cooperation frequency and optimising for maximal social welfare. Overall, our findings indicate that incentive design, policy, and benchmarking in multi-agent systems and human societies should prioritise welfare-centric objectives over proxy targets of cost or cooperation frequency.
Fast and Robust Flocking of Protesters on Street Networks
We present a simple model of protesters scattered throughout a city who want to gather into large and mobile groups. This model relies on random walkers on a street network that follow tactics built from a set of basic rules. Our goal is to identify the most important rules for fast and robust flocking of walkers. We explore a wide set of tactics and show the central importance of a specific rule based on alignment. Other rules alone perform poorly, but our experiments show that combining alignment with them enhances flocking, and that obtained groups are then remarkably robust.
The Traitors: Deception and Trust in Multi-Agent Language Model Simulations
As AI systems increasingly assume roles where trust and alignment with human values are essential, understanding when and why they engage in deception has become a critical research priority. We introduce The Traitors, a multi-agent simulation framework inspired by social deduction games, designed to probe deception, trust formation, and strategic communication among large language model (LLM) agents under asymmetric information. A minority of agents the traitors seek to mislead the majority, while the faithful must infer hidden identities through dialogue and reasoning. Our contributions are: (1) we ground the environment in formal frameworks from game theory, behavioral economics, and social cognition; (2) we develop a suite of evaluation metrics capturing deception success, trust dynamics, and collective inference quality; (3) we implement a fully autonomous simulation platform where LLMs reason over persistent memory and evolving social dynamics, with support for heterogeneous agent populations, specialized traits, and adaptive behaviors. Our initial experiments across DeepSeek-V3, GPT-4o-mini, and GPT-4o (10 runs per model) reveal a notable asymmetry: advanced models like GPT-4o demonstrate superior deceptive capabilities yet exhibit disproportionate vulnerability to others' falsehoods. This suggests deception skills may scale faster than detection abilities. Overall, The Traitors provides a focused, configurable testbed for investigating LLM behavior in socially nuanced interactions. We position this work as a contribution toward more rigorous research on deception mechanisms, alignment challenges, and the broader social reliability of AI systems.
comment: 9 main pages, 31 pages
Climate Driven Interactions Between Malaria Transmission and Diabetes Prevalence
Climate change is intensifying infectious and chronic diseases like malaria and diabetes, respectively, especially among the vulnerable populations. Global temperatures have risen by approximately $0.6^\circ$C since 1950, extending the window of transmission for mosquito-borne infections and worsening outcomes in diabetes due to metabolic stress caused by heat. People living with diabetes have already weakened immune defenses and, therefore, are at an alarmingly increased risk of contraction of malaria. However, most models rarely include both ways of interaction in changing climate conditions. In the paper, we introduce a new compartmental epidemiological model based on synthetic data fitted to disease patterns of India from 2019 to 2021. The framework captures temperature-dependent transmission parameters, seasonal variability, and different disease dynamics between diabetic and non-diabetic groups within the three-compartment system. Model calibration using Multi-Start optimization combined with Sequential Quadratic Programming allows us to find outstanding differences between populations. The odds of malaria infection in diabetic individuals were found to be 1.8--4.0 times higher, with peak infection levels in 35--36\%, as compared to 20--21\% in the non-diabetic ones. The fitted model was able to capture well the epidemiological patterns observed, while the basic reproduction number averaged around 2.3, ranging from 0.31 to 2.75 in different seasons. Given that India's diabetic population is set to rise to about 157 million people by 2050, these findings point to a pressing need for concerted efforts toward climate-informed health strategies and monitoring systems that address both malaria and diabetes jointly.
comment: 21 pages, 6 figures
Systems and Control (EESS)
On the embedding transformation for optimal control of multi-mode switched systems
This paper develops an embedding-based approach to solve switched optimal control problems (SOCPs) with an arbitrary number of subsystems. Initially, the discrete switching signal is represented by a set of binary variables, encoding each mode in binary format. An embedded optimal control problem (EOCP) is then formulated by replacing these binary variables with continuous embedded variables that can take intermediate values between zero and one. Although embedding allows SOCPs to be addressed using conventional techniques, the optimal solutions of EOCPs often yield intermediate values for binary variables, which may not be feasible for the original SOCP. To address this challenge, a modified EOCP (MEOCP) is introduced by adding a concave auxiliary cost function of appropriate dimensionality to the main cost function. This addition ensures that the optimal solution of the EOCP is bang-bang, and as a result, feasible for the original SOCP.
Heavy-Duty Electric Vehicles Contribution for Frequency Response in Power Systems with V2G
The integration of heavy-duty electric vehicles (EVs) with Vehicle-to-Grid (V2G) capability offers a promising solution to enhance grid stability by providing primary frequency response in power systems. This paper investigates the potential of heavy-duty EVs to support the California power grid under different charging strategies: immediate, delayed, and constant minimum power charging. Simulation results demonstrate that both V2G-capable EVs and non-V2G modes have great potential to provide primary frequency response, with V2G-capable EVs exhibiting especially strong contributions. The study highlights the influence of charging strategies, control modes, and grid conditions on EV contributions to grid stability, emphasizing their critical role in mitigating the adverse effects of renewable energy penetration.
comment: Accepted at The 16th IEEE International Conference on Green Energy and Smart Systems, 2025
CapOptix: An Options-Framework for Capacity Market Pricing
Electricity markets are under increasing pressure to maintain reliability amidst rising renewable penetration, demand variability, and occasional price shocks. Traditional capacity market designs often fall short in addressing this by relying on expected-value metrics of energy unserved, which overlook risk exposure in such systems. In this work, we present CapOptix, a capacity pricing framework that interprets capacity commitments as reliability options, i.e., financial derivatives of wholesale electricity prices. CapOptix characterizes the capacity premia charged by accounting for structural price shifts modeled by the Markov Regime Switching Process. We apply the framework to historical price data from multiple electricity markets and compare the resulting premium ranges with existing capacity remuneration mechanisms.
MPC-Guided Safe Reinforcement Learning and Lipschitz-Based Filtering for Structured Nonlinear Systems
Modern engineering systems, such as autonomous vehicles, flexible robotics, and intelligent aerospace platforms, require controllers that are robust to uncertainties, adaptive to environmental changes, and safety-aware under real-time constraints. RL offers powerful data-driven adaptability for systems with nonlinear dynamics that interact with uncertain environments. RL, however, lacks built-in mechanisms for dynamic constraint satisfaction during exploration. MPC offers structured constraint handling and robustness, but its reliance on accurate models and computationally demanding online optimization may pose significant challenges. This paper proposes an integrated MPC-RL framework that combines stability and safety guarantees of MPC with the adaptability of RL. During training, MPC defines safe control bounds that guide the RL component and that enable constraint-aware policy learning. At deployment, the learned policy operates in real time with a lightweight safety filter based on Lipschitz continuity to ensure constraint satisfaction without heavy online optimizations. The approach, which is validated on a nonlinear aeroelastic wing system, demonstrates improved disturbance rejection, reduced actuator effort, and robust performance under turbulence. The architecture generalizes to other domains with structured nonlinearities and bounded disturbances, offering a scalable solution for safe artificial-intelligence-driven control in engineering applications.
KANELÉ: Kolmogorov-Arnold Networks for Efficient LUT-based Evaluation
Low-latency, resource-efficient neural network inference on FPGAs is essential for applications demanding real-time capability and low power. Lookup table (LUT)-based neural networks are a common solution, combining strong representational power with efficient FPGA implementation. In this work, we introduce KANELÉ, a framework that exploits the unique properties of Kolmogorov-Arnold Networks (KANs) for FPGA deployment. Unlike traditional multilayer perceptrons (MLPs), KANs employ learnable one-dimensional splines with fixed domains as edge activations, a structure naturally suited to discretization and efficient LUT mapping. We present the first systematic design flow for implementing KANs on FPGAs, co-optimizing training with quantization and pruning to enable compact, high-throughput, and low-latency KAN architectures. Our results demonstrate up to a 2700x speedup and orders of magnitude resource savings compared to prior KAN-on-FPGA approaches. Moreover, KANELÉ matches or surpasses other LUT-based architectures on widely used benchmarks, particularly for tasks involving symbolic or physical formulas, while balancing resource usage across FPGA hardware. Finally, we showcase the versatility of the framework by extending it to real-time, power-efficient control systems.
comment: International Symposium on Field-Programmable Gate Arrays 2026 (ISFPGA'2026)
Data-driven Supervisory Control under Attacks via Spectral Learning
The technological advancements facilitating the rapid development of cyber-physical systems (CPS) also render such systems vulnerable to cyber attacks with devastating effects. Supervisory control is a commonly used control method to neutralize attacks on CPS. The supervisor strives to confine the (symbolic) paths of the system to a desired language via sensors and actuators in a closed control loop, even when attackers can manipulate the symbols received by the sensors and actuators. Currently, supervisory control methods face limitations when effectively identifying and mitigating unknown, broad-spectrum attackers. In order to capture the behavior of broad-spectrum attacks on both sensing and actuation channels we model the plant, supervisors, and attackers with finite-state transducers (FSTs). Our general method for addressing unknown attackers involves constructing FST models of the attackers from spectral analysis of their input and output symbol sequences recorded from a history of attack behaviors observed in a supervisory control loop. To construct these FST models, we devise a novel learning method based on the recorded history of attack behaviors. A supervisor is synthesized using such models to neutralize the attacks.
comment: 10 pages
Sensitivity increase of 3D printed, self-sensing, carbon fibers structures with conductive filament matrix due to flexural loading
The excellent structural and piezoresistive properties of continuous carbon fiber make it suitable for both structural and sensing applications. This work studies the use of 3D printed, continuous carbon fiber reinforced beams as self-sensing structures. It is demonstrated how the sensitivity of these carbon fiber strain gauges can be increased irreversibly by means of a pretreatment by ``breaking-in'' the sensors with a large compressive bending load. The increase in the gauge factor is attributed to local progressive fiber failure, due to the combination of the thermal residual stress from the printing process and external loading. The coextrusion of conductive filament around the carbon fibers is demonstrated as a means of improving the reliability, noise and electrical connection of the sensors. A micrograph of the sensor cross section shows that the conductive filament contacts the various carbon fiber bundles. All-in-all, the use of ``breaking-in'' carbon fiber strain gauges in combination with coextrusion of conductive filament hold promises for 3D printed structural sensors with a high sensitivity.
comment: 28 pages, 9 figures, 4 tables
Distributed Reinforcement Learning using Local Smart Meter Data for Voltage Regulation in Distribution Networks
Centralised reinforcement learning (RL) for voltage magnitude regulation in distribution networks typically involves numerous agent-environment interactions and power flow (PF) calculations, inducing computational overhead and privacy concerns over shared data. Thus, we propose a distributed RL algorithm to regulate voltage magnitude. First, a dynamic Thevenin equivalent model is integrated within smart meters (SM), enabling local voltage magnitude estimation using local SM data for RL agent training, and mitigating the dependency of synchronised data collection and centralised PF calculations. To mitigate estimation errors induced by Thevenin model inaccuracies, a voltage magnitude correction strategy that combines piecewise functions and neural networks is introduced. The piecewise function corrects the large errors of estimated voltage magnitude, while a neural network mimics the grid's sensitivity to control actions, improving action adjustment precision. Second, a coordination strategy is proposed to refine local RL agent actions online, preventing voltage magnitude violations induced by excessive actions from multiple independently trained agents. Case studies on energy storage systems validate the feasibility and effectiveness of the proposed approach, demonstrating its potential to improve voltage regulation in distribution networks.
A Rule-Aware Prompt Framework for Structured Numeric Reasoning in Cyber-Physical Systems
Many cyber-physical systems (CPS) rely on high-dimensional numeric telemetry and explicit operating rules to maintain safe and efficient operation. Recent large language models (LLMs) are increasingly considered as decision-support components in such systems, yet most deployments focus on textual inputs and do not directly address rule-grounded reasoning over numeric measurements. This paper proposes a rule-aware prompt framework that systematically encodes CPS domain context, numeric normalization, and decision rules into a modular prompt architecture for LLMs. The framework decomposes prompts into five reusable modules, including role specification, CPS domain context, numeric normalization, rule-aware reasoning, and output schema, and exposes an interface for plugging in diverse rule sets. A key design element is separating rule specification from the representation of normalized numeric deviations, which enables concise prompts that remain aligned with domain rules. We analyze how different normalization strategies and prompt configurations influence rule adherence, interpretability, and token efficiency. The framework is model-agnostic and applicable across CPS domains. To illustrate its behavior, we instantiate it on numeric anomaly assessment in an IEEE 118-bus electric power transmission network and evaluate several prompting and adaptation regimes. The results show that rule-aware, z-score-based value blocks and a hybrid LLM-detector architecture can substantially improve consistency with CPS rules and anomaly detection performance while reducing token usage, providing a reusable bridge between numeric telemetry and general-purpose LLMs.
High Order Control Lyapunov Function - Control Barrier Function - Quadratic Programming Based Autonomous Driving Controller for Bicyclist Safety
Ensuring the safety of Vulnerable Road Users (VRUs) is a critical challenge in the development of advanced autonomous driving systems in smart cities. Among vulnerable road users, bicyclists present unique characteristics that make their safety both critical and also manageable. Vehicles often travel at significantly higher relative speeds when interacting with bicyclists as compared to their interactions with pedestrians which makes collision avoidance system design for bicyclist safety more challenging. Yet, bicyclist movements are generally more predictable and governed by clear traffic rules as compared to the sudden and sometimes erratic pedestrian motion, offering opportunities for model-based control strategies. To address bicyclist safety in complex traffic environments, this study proposes and develops a High Order Control Lyapunov Function High Order Control Barrier Function Quadratic Programming (HOCLF HOCBF QP) control framework. Through this framework, CLFs constraints guarantee system stability so that the vehicle can track its reference trajectory, whereas CBFs constraints ensure system safety by letting vehicle avoiding potential collisions region with surrounding obstacles. Then by solving a QP problem, an optimal control command that simultaneously satisfies stability and safety requirements can be calculated. Three key bicyclist crash scenarios recorded in the Fatality Analysis Reporting System (FARS) are recreated and used to comprehensively evaluate the proposed autonomous driving bicyclist safety control strategy in a simulation study. Simulation results demonstrate that the HOCLF HOCBF QP controller can help the vehicle perform robust, and collision-free maneuvers, highlighting its potential for improving bicyclist safety in complex traffic environments.
Lexicographic Multi-Objective Stochastic Shortest Path with Mixed Max-Sum Costs
We study the Stochastic Shortest Path (SSP) problem for autonomous systems with mixed max-sum cost aggregations under Linear Temporal Logic constraints. Classical SSP formulations rely on sum-aggregated costs, which are suitable for cumulative quantities such as time or energy but fail to capture bottleneck-style objectives such as avoiding high-risk transitions, where performance is determined by the worst single event along a trajectory. Such objectives are particularly important in safety-critical systems, where even one hazardous transition can be unacceptable. To address this limitation, we introduce max-aggregated objectives that minimize the bottleneck cost, i.e., the maximum one-step cost along a trajectory. We show that standard Bellman equations on the original state space do not apply in this setting and propose an augmented MDP with a state variable tracking the running maximum cost, together with a value iteration algorithm. We further identify a cyclic policy phenomenon, where zero-marginal-cost cycles prevent goal reaching under max-aggregation, and resolve it via a finite-horizon formulation. To handle richer task requirements, linear temporal logic specifications are translated into deterministic finite automata and combined with the system to construct a product MDP. We propose a lexicographic value iteration algorithm that handles mixed max-sum objectives under lexicographic ordering on this product MDP. Gridworld case studies demonstrate the effectiveness of the proposed framework.
From Information Freshness to Semantics of Information and Goal-oriented Communications
Future wireless networks must support real-time, data-driven cyber-physical systems in which communication is tightly coupled with sensing, inference, control, and decision-making. Traditional communication paradigms centered on accuracy, throughput, and latency are increasingly inadequate for these systems, where the value of information depends on its semantic relevance to a specific task. This paper provides a unified exposition of the progression from classical distortion-based frameworks, through information freshness metrics such as the Age of Information (AoI) and its variants, to the emerging paradigm of goal-oriented semantics-aware communication. We organize and systematize existing semantics-aware metrics, including content- and version-aware measures, context-dependent distortion formulations, and history-dependent error persistence metrics that capture lasting impact and urgency. Within this framework, we highlight how these metrics address the limitations of purely accuracy- or freshness-centric designs, and how they collectively enable the selective generation and transmission of only task-relevant information. We further review analytical tools based on Markov decision process (MDP) and Lyapunov optimization methods that have been employed to characterize optimal or near-optimal timing and scheduling policies under semantic performance criteria and communication constraints. By synthesizing these developments into a coherent framework, the paper clarifies the design principles underlying goal-oriented, semantics-aware communication systems. It illustrates how they can significantly improve efficiency, reliability, and task performance. The presented perspective aims to serve as a bridge between information-theoretic, control-theoretic, and networking viewpoints, and to guide the design of semantic communication architectures for 6G and beyond.
comment: Submitted to IEEE as invited paper
An End-to-End Approach for Microgrid Probabilistic Forecasting and Robust Operation via Decision-focused Learning
High penetration of renewable energy sources (RES) introduces significant uncertainty and intermittency into microgrid operations, posing challenges to economic and reliable scheduling. To address this, this paper proposes an end-to-end decision-focused framework that jointly optimizes probabilistic forecasting and robust operation for microgrids. A multilayer encoder-decoder (MED) probabilistic forecasting model is integrated with a two-stage robust optimization (TSRO) model involving direct load control (DLC) through a differentiable decision pathway, enabling gradient-based feedback from operational outcomes to improve forecasting performance. Unlike conventional sequential approaches, the proposed method aligns forecasting accuracy with operational objectives by directly minimizing decision regret via a surrogate smart predict-then-optimize (SPO) loss function. This integration ensures that probabilistic forecasts are optimized for downstream decisions, enhancing both economic efficiency and robustness. Case studies on modified IEEE 33-bus and 69-bus systems demonstrate that the proposed framework achieves superior forecasting accuracy and operational performance, reducing total and net operation costs by up to 18% compared with conventional forecasting and optimization combinations. The results verify the effectiveness and scalability of the end-to-end decision-focused approach for resilient and cost-efficient microgrid management under uncertainty.
comment: 10 pages
Bayesian Optimization Parameter Tuning Framework for a Lyapunov Based Path Following Controller
Parameter tuning in real-world experiments is constrained by the limited evaluation budget available on hardware. The path-following controller studied in this paper reflects a typical situation in nonlinear geometric controller, where multiple gains influence the dynamics through coupled nonlinear terms. Such interdependence makes manual tuning inefficient and unlikely to yield satisfactory performance within a practical number of trials. To address this challenge, we propose a Bayesian optimization (BO) framework that treats the closed-loop system as a black box and selects controller gains using a Gaussian-process surrogate. BO offers model-free exploration, quantified uncertainty, and data-efficient search, making it well suited for tuning tasks where each evaluation is costly. The framework is implemented on Honda's AI-Formula three-wheeled robot and assessed through repeated full-lap experiments on a fixed test track. The results show that BO improves controller performance within 32 trials, including 15 warm-start initial evaluations, indicating that it can efficiently locate high-performing regions of the parameter space under real-world conditions. These findings demonstrate that BO provides a practical, reliable, and data-efficient tuning approach for nonlinear path-following controllers on real robotic platforms.
Electric Road Systems for Smart Cities: A Scalable Infrastructure Framework for Dynamic Wireless Charging
The transition to electric transportation is a key enabler for intelligent and sustainable cities; however, inadequate charging infrastructure remains a major barrier to large-scale electric vehicle (EV) adoption. This paper presents a scalable Electric Road System (ERS) architecture that enables Dynamic Wireless Charging (DWC) of EVs during motion. The proposed framework integrates inductive charging coils embedded in road pavement, real-time vehicle-to-infrastructure (V2I) communication, and adaptive energy management coordinated with smart grid systems. Modular road segments with a standardized charging process are employed to ensure scalability across urban corridors and interoperability among different EV platforms. System performance is evaluated using a co-simulation framework combining MATLAB-based power analysis with traffic inputs generated in SUMO. Key performance metrics include charging efficiency, energy cost per kilometer, and battery lifecycle improvement. Simulation results indicate a potential reduction in range anxiety and an increase in battery lifespan due to frequent shallow charging cycles. The study further discusses deployment challenges, policy considerations, and energy distribution strategies aligned with climate-resilient urban development. A case study of a tier-1 Indian city is presented to analyze the cost-benefit trade-offs of retrofitting high-density urban corridors with ERS. The proposed framework provides a practical foundation for next-generation EV infrastructure planning in smart cities.
comment: Preprint. Under review for conference submission. Simulation-based study
Quadratic-Programming-based Control of Multi-Robot Systems for Cooperative Object Transport
This paper investigates the control problem of steering a group of spherical mobile robots to cooperatively transport a spherical object. By controlling the movements of the robots to exert appropriate contact (pushing) forces, it is desired that the object follows a velocity command. To solve the problem, we first treat the robots' positions as virtual control inputs of the object, and propose a velocity-tracking controller based on quadratic programming (QP), enabling the robots to cooperatively generate desired contact forces while minimizing the sum of the contact-force magnitudes. Then, we design position-tracking controllers for the robots. By appropriately designing the objective function and the constraints for the QP, it is guaranteed that the QP admits a unique solution and the QP-based velocity-tracking controller is Lipschitz continuous. Finally, we consider the closed-loop system as an interconnection of two subsystems, corresponding to the velocity-tracking error of the object and the position-tracking error of the robots, and employ nonlinear small-gain techniques for stability analysis. The effectiveness of the proposed design is demonstrated through numerical simulations.
Feasible-Set Reshaping for Constraint Qualification in Optimization-Based Control
This paper presents a novel feasible-set reshaping technique to optimization-based control with ensured constraint qualification. In our problem setting, the feasible set of admissible control inputs depends on the real-time state of the plant, and the linear independence constraint qualification (LICQ) may not be satisfied in some regions of interest. By feasible-set reshaping, we project the constraints of the original feasible set onto an appropriately chosen constant matrix with its rows forming a positive span of the space of the optimization variable. It is proved that the reshaped feasible set is nonempty and satisfies LICQ, as long as the original feasible set is nonempty. The effectiveness of the proposed method is verified by constructing Lipschitz continuous quadratic-program-based (QP-based) controllers based on the reshaped feasible sets.
Robustness analysis in static and dynamic quantum state tomography
Quantum state tomography is a core task in quantum system identification. Real experimental conditions often deviate from nominal designs, introducing errors in both the measurement devices and the Hamiltonian governing the system's dynamics. In this paper, we investigate the robustness of quantum state tomography against such perturbations in both static and dynamic settings using linear regression estimation. We derive explicit bounds that quantify how bounded errors in the measurement devices and the Hamiltonian affect the mean squared error (MSE) upper bound in each scenario. Numerical simulations for qubit systems illustrate how these bounds scale with resources.
comment: 7 pages, 3 figures
A Unified Low-rank ADI Framework with Shared Linear Solves for Simultaneously Solving Multiple Lyapunov, Sylvester, and Riccati Equations
The alternating direction implicit (ADI) methods are computationally efficient and numerically effective tools for computing low-rank solutions of large-scale linear matrix equations. It is known in the literature that the low-rank ADI method for Lyapunov equations is a Petrov-Galerkin projection algorithm that implicitly performs model order reduction. It recursively enforces interpolation at the mirror images of the ADI shifts and places the poles of the reduced-order models at the ADI shifts. In this paper, we show that the low-rank ADI methods for Sylvester and Riccati equations are also Petrov-Galerkin projection algorithms that implicitly perform model order reduction. These methods likewise enforce interpolation at the mirror images of the ADI shifts; however, they do not place the poles at the mirror images of the interpolation points. Instead, their pole placement ensures that the projected Sylvester and Riccati equations they implicitly solve admit a unique solution. By observing that the ADI methods for Lyapunov, Sylvester, and Riccati equations differ only in pole placement and not in their interpolatory nature, we show that the shifted linear solves-which constitute the bulk of the computational cost-can be shared. The pole-placement step involves only small-scale operations and is therefore inexpensive. We propose a unified ADI framework that requires only two shifted linear solves per iteration to simultaneously solve six Lyapunov equations, one Sylvester equation, and ten Riccati equations, thus substantially increasing the return on investment for the computational cost spent on the linear solves. All operations needed to extract the individual solutions from these shared linear solves are small-scale and inexpensive. Three numerical examples are presented to demonstrate the effectiveness of the proposed ADI framework.
LocoMamba: Vision-Driven Locomotion via End-to-End Deep Reinforcement Learning with Mamba
We introduce LocoMamba, a vision-driven cross-modal DRL framework built on selective state-space models, specifically leveraging Mamba, that achieves near-linear-time sequence modeling, effectively captures long-range dependencies, and enables efficient training with longer sequences. First, we embed proprioceptive states with a multilayer perceptron and patchify depth images with a lightweight convolutional neural network, producing compact tokens that improve state representation. Second, stacked Mamba layers fuse these tokens via near-linear-time selective scanning, reducing latency and memory footprint, remaining robust to token length and image resolution, and providing an inductive bias that mitigates overfitting. Third, we train the policy end-to-end with Proximal Policy Optimization under terrain and appearance randomization and an obstacle-density curriculum, using a compact state-centric reward that balances progress, smoothness, and safety. We evaluate our method in challenging simulated environments with static and moving obstacles as well as uneven terrain. Compared with state-of-the-art baselines, our method achieves higher returns and success rates with fewer collisions, exhibits stronger generalization to unseen terrains and obstacle densities, and improves training efficiency by converging in fewer updates under the same compute budget.
comment: 14 pages. This paper has been published in Advanced Engineering Informatics. Please cite the journal version: DOI: 10.1016/j.aei.2025.104230
SuperGen: An Efficient Ultra-high-resolution Video Generation System with Sketching and Tiling
Diffusion models have recently achieved remarkable success in generative tasks (e.g., image and video generation), and the demand for high-quality content (e.g., 2K/4K videos) is rapidly increasing across various domains. However, generating ultra-high-resolution videos on existing standard-resolution (e.g., 720p) platforms remains challenging due to the excessive re-training requirements and prohibitively high computational and memory costs. To this end, we introduce SUPERGEN, an efficient tile-based framework for ultra-high-resolution video generation. SUPERGEN features a novel training-free algorithmic innovation with tiling to successfully support a wide range of resolutions without additional training efforts while significantly reducing both memory footprint and computational complexity. Moreover, SUPERGEN incorporates a tile-tailored, adaptive, region-aware caching strategy that accelerates video generation by exploiting redundancy across denoising steps and spatial regions. SUPERGEN also integrates cache-guided, communication-minimized tile parallelism for enhanced throughput and minimized latency. Evaluations show that SUPERGEN maximizes performance gains while achieving high output quality across various benchmarks.
Major Space Weather Risks Identified via Coupled Physics-Engineering-Economic Modeling
Space weather poses an important but under-quantified threat to critical infrastructure, the economy, and society. While extreme geomagnetic storms are recognized as potential global catastrophes, their socio-economic impacts remain poorly quantified. Here we present a novel physics-engineering-economic framework that links geophysical drivers of geomagnetic storms to power grid geoelectric fields, transformer vulnerability, and macroeconomic consequences. Using the United States as an example, we estimate daily economic losses from transformer thermal heating of 1.37 billion USD (95 percent confidence interval: 1.16 to 1.58 billion USD) for a 100-year geomagnetic storm, with power outages affecting 4.1 million people and 101,000 businesses. A 250-year event could raise losses to 2.09 billion USD per day (95 percent confidence interval: 1.84 to 2.34 billion USD), disrupting power for more than 6 million people and 155,000 businesses. Crucially, the framework is scalable and transferable, offering a template for assessing space weather risk to critical infrastructure in other countries. This integrative approach provides the first end-to-end quantification of space weather socio-economic impacts, bridging space physics through to policy-relevant metrics. Our results demonstrate that coupled socio-economic modeling of space weather is both feasible and essential, enabling evidence-based decision making in infrastructure protection and global risk management.
Adaptive Federated Learning to Optimize Integrated Flows in Cyber-Physical Data Centers
Data centers play an increasingly critical role in societal digitalization, yet their rapidly growing energy demand poses significant challenges for sustainable operation. To enhance the energy efficiency of geographically distributed data centers, this paper formulates a multi-period optimization model that captures the interdependence of electricity, heat, and data flows. The optimization of such integrated multi-domain flows inherently involves mixed-integer formulations and the access to proprietary or sensitive datasets, which correspondingly exacerbate computational complexity and raise data-privacy concerns. To address these challenges, an adaptive federated learning-to-optimization approach is proposed, accounting for the heterogeneity of datasets across distributed data centers. To safeguard privacy, cryptography techniques are leveraged in both the learning and optimization processes. A model acceptance criterion with convergence guarantee is developed to improve learning performance and filter out potentially contaminated data, while a verifiable double aggregation mechanism is further proposed to simultaneously ensure privacy and integrity of shared data during optimization. Theoretical analysis and numerical simulations demonstrate that the proposed approach preserves the privacy and integrity of shared data, achieves near-optimal performance, and exhibits high computational efficiency, making it suitable for large-scale data center optimization under privacy constraints.
Robotics
Autonomously Unweaving Multiple Cables Using Visual Feedback
Many cable management tasks involve separating out the different cables and removing tangles. Automating this task is challenging because cables are deformable and can have combinations of knots and multiple interwoven segments. Prior works have focused on untying knots in one cable, which is one subtask of cable management. However, in this paper, we focus on a different subtask called multi-cable unweaving, which refers to removing the intersections among multiple interwoven cables to separate them and facilitate further manipulation. We propose a method that utilizes visual feedback to unweave a bundle of loosely entangled cables. We formulate cable unweaving as a pick-and-place problem, where the grasp position is selected from discrete nodes in a graph-based cable state representation. Our cable state representation encodes both topological and geometric information about the cables from the visual image. To predict future cable states and identify valid actions, we present a novel state transition model that takes into account the straightening and bending of cables during manipulation. Using this state transition model, we select between two high-level action primitives and calculate predicted immediate costs to optimize the lower-level actions. We experimentally demonstrate that iterating the above perception-planning-action process enables unweaving electric cables and shoelaces with an 84% success rate on average.
comment: 6 pages, 5 figures
Sim2Real Reinforcement Learning for Soccer skills
This thesis work presents a more efficient and effective approach to training control-related tasks for humanoid robots using Reinforcement Learning (RL). The traditional RL methods are limited in adapting to real-world environments, complexity, and natural motions, but the proposed approach overcomes these limitations by using curriculum training and Adversarial Motion Priors (AMP) technique. The results show that the developed RL policies for kicking, walking, and jumping are more dynamic, and adaptive, and outperformed previous methods. However, the transfer of the learned policy from simulation to the real world was unsuccessful, highlighting the limitations of current RL methods in fully adapting to real-world scenarios.
comment: Undergrad Thesis
Unifying Quadrotor Motion Planning and Control by Chaining Different Fidelity Models
Many aerial tasks involving quadrotors demand both instant reactivity and long-horizon planning. High-fidelity models enable accurate control but are too slow for long horizons; low-fidelity planners scale but degrade closed-loop performance. We present Unique, a unified MPC that cascades models of different fidelity within a single optimization: a short-horizon, high-fidelity model for accurate control, and a long-horizon, low-fidelity model for planning. We align costs across horizons, derive feasibility-preserving thrust and body-rate constraints for the point-mass model, and introduce transition constraints that match the different states, thrust-induced acceleration, and jerk-body-rate relations. To prevent local minima emerging from nonsmooth clutter, we propose a 3D progressive smoothing schedule that morphs norm-based obstacles along the horizon. In addition, we deploy parallel randomly initialized MPC solvers to discover lower-cost local minima on the long, low-fidelity horizon. In simulation and real flights, under equal computational budgets, Unique improves closed-loop position or velocity tracking by up to 75% compared with standard MPC and hierarchical planner-tracker baselines. Ablations and Pareto analyses confirm robust gains across horizon variations, constraint approximations, and smoothing schedules.
INDOOR-LiDAR: Bridging Simulation and Reality for Robot-Centric 360 degree Indoor LiDAR Perception -- A Robot-Centric Hybrid Dataset
We present INDOOR-LIDAR, a comprehensive hybrid dataset of indoor 3D LiDAR point clouds designed to advance research in robot perception. Existing indoor LiDAR datasets often suffer from limited scale, inconsistent annotation formats, and human-induced variability during data collection. INDOOR-LIDAR addresses these limitations by integrating simulated environments with real-world scans acquired using autonomous ground robots, providing consistent coverage and realistic sensor behavior under controlled variations. Each sample consists of dense point cloud data enriched with intensity measurements and KITTI-style annotations. The annotation schema encompasses common indoor object categories within various scenes. The simulated subset enables flexible configuration of layouts, point densities, and occlusions, while the real-world subset captures authentic sensor noise, clutter, and domain-specific artifacts characteristic of real indoor settings. INDOOR-LIDAR supports a wide range of applications including 3D object detection, bird's-eye-view (BEV) perception, SLAM, semantic scene understanding, and domain adaptation between simulated and real indoor domains. By bridging the gap between synthetic and real-world data, INDOOR-LIDAR establishes a scalable, realistic, and reproducible benchmark for advancing robotic perception in complex indoor environments.
Programmable Deformation Design of Porous Soft Actuator through Volumetric-Pattern-Induced Anisotropy
Conventional soft pneumatic actuators, typically based on hollow elastomeric chambers, often suffer from small structural support and require costly geometry-specific redesigns for multimodal functionality. Porous materials such as foam, filled into chambers, can provide structural stability for the actuators. However, methods to achieve programmable deformation by tailoring the porous body itself remain underexplored. In this paper, a novel design method is presented to realize soft porous actuators with programmable deformation by incising specific patterns into the porous foam body. This approach introduces localized structural anisotropy of the foam guiding the material's deformation under a global vacuum input. Furthermore, three fundamental patterns on a cylindrical foam substrate are discussed: transverse for bending, longitudinal for tilting, and diagonal for twisting. A computational model is built with Finite Element Analysis (FEA), to investigate the mechanism of the incision-patterning method. Experiments demonstrate that with a potential optimal design of the pattern array number N, actuators can achieve bending up to $80^{\circ}$ (N=2), tilting of $18^{\circ}$ (N=1), and twisting of $115^{\circ}$ (N=8). The versatility of our approach is demonstrated via pattern transferability, scalability, and mold-less rapid prototyping of complex designs. As a comprehensive application, we translate the human hand crease map into a functional incision pattern, creating a bio-inspired soft robot hand capable of human-like adaptive grasping. Our work provides a new, efficient, and scalable paradigm for the design of multi-functional soft porous robots.
From Human Intention to Action Prediction: A Comprehensive Benchmark for Intention-driven End-to-End Autonomous Driving
Current end-to-end autonomous driving systems operate at a level of intelligence akin to following simple steering commands. However, achieving genuinely intelligent autonomy requires a paradigm shift: moving from merely executing low-level instructions to understanding and fulfilling high-level, abstract human intentions. This leap from a command-follower to an intention-fulfiller, as illustrated in our conceptual framework, is hindered by a fundamental challenge: the absence of a standardized benchmark to measure and drive progress on this complex task. To address this critical gap, we introduce Intention-Drive, the first comprehensive benchmark designed to evaluate the ability to translate high-level human intent into safe and precise driving actions. Intention-Drive features two core contributions: (1) a new dataset of complex scenarios paired with corresponding natural language intentions, and (2) a novel evaluation protocol centered on the Intent Success Rate (ISR), which assesses the semantic fulfillment of the human's goal beyond simple geometric accuracy. Through an extensive evaluation of a spectrum of baseline models on Intention-Drive, we reveal a significant performance deficit, showing that the baseline model struggle to achieve the comprehensive scene and intention understanding required for this advanced task.
CAR-CHASE: Car-Like Robot Conflict-Aware Heuristic Adaptive Search Enhancement
Multi-Agent Path Finding (MAPF) for car-like robots, addressed by algorithms such as Conflict-Based Search with Continuous Time (CL-CBS), faces significant computational challenges due to expensive kinematic heuristic calculations. Traditional heuristic caching assumes that the heuristic function depends only on the state, which is incorrect in CBS where constraints from conflict resolution make the search space context-dependent. We propose \textbf{CAR-CHASE} (Car-Like Robot Conflict-Aware Heuristic Adaptive Search Enhancement), a novel approach that combines \textbf{conflict-aware heuristic caching} -- which caches heuristic values based on both state and relevant constraint context -- with an \textbf{adaptive hybrid heuristic} that intelligently switches between fast approximate and exact computations. Our key innovations are (1) a compact \emph{conflict fingerprint} that efficiently encodes which constraints affect a state's heuristic, (2) a relevance filter using spatial, temporal, and geometric criteria, and (3) an adaptive switching strategy with theoretical quality bounds. Experimental evaluation on 480 benchmark instances with varying agent counts (10 to 30) and obstacle densities (0\% and 50\%) demonstrates a geometric mean speedup of 2.46$\times$ over the baseline CL-CBS implementation while maintaining solution optimality. The optimizations improve success rate from 77.9\% to 84.8\% (+6.9 percentage points), reduce total runtime by 70.1\%, and enable solving 33 additional instances that previously timed out. Performance gains scale with problem complexity, reaching up to 4.06$\times$ speedup for challenging 30-agent obstacle scenarios. Our techniques are general and applicable to other CBS variants.
Robust Underwater Localization of Buoyancy Driven microFloats Using Acoustic Time-of-Flight Measurements
Accurate underwater localization remains a challenge for inexpensive autonomous platforms that require highfrequency position updates. In this paper, we present a robust, low-cost localization pipeline for buoyancy-driven microFloats operating in coastal waters. We build upon previous work by introducing a bidirectional acoustic Time-of-Flight (ToF) localization framework, which incorporates both float-to-buoy and buoy-to-float transmissions, thereby increasing the number of usable measurements. The method integrates nonlinear trilateration with a filtering of computed position estimates based on geometric cost and Cramer-Rao Lower Bounds (CRLB). This approach removes outliers caused by multipath effects and other acoustic errors from the ToF estimation and improves localization robustness without relying on heavy smoothing. We validate the framework in two field deployments in Puget Sound, Washington, USA. The localization pipeline achieves median positioning errors below 4 m relative to GPS positions. The filtering technique shows a reduction in mean error from 139.29 m to 12.07 m, and improved alignment of trajectories with GPS paths. Additionally, we demonstrate a Time-Difference-of-Arrival (TDoA) localization for unrecovered floats that were transmitting during the experiment. Range-based acoustic localization techniques are widely used and generally agnostic to hardware-this work aims to maximize their utility by improving positioning frequency and robustness through careful algorithmic design.
comment: 9 pages
Learning to Get Up Across Morphologies: Zero-Shot Recovery with a Unified Humanoid Policy
Fall recovery is a critical skill for humanoid robots in dynamic environments such as RoboCup, where prolonged downtime often decides the match. Recent techniques using deep reinforcement learning (DRL) have produced robust get-up behaviors, yet existing methods require training of separate policies for each robot morphology. This paper presents a single DRL policy capable of recovering from falls across seven humanoid robots with diverse heights (0.48 - 0.81 m), weights (2.8 - 7.9 kg), and dynamics. Trained with CrossQ, the unified policy transfers zero-shot up to 86 +/- 7% (95% CI [81, 89]) on unseen morphologies, eliminating the need for robot-specific training. Comprehensive leave-one-out experiments, morph scaling analysis, and diversity ablations show that targeted morphological coverage improves zero-shot generalization. In some cases, the shared policy even surpasses the specialist baselines. These findings illustrate the practicality of morphology-agnostic control for fall recovery, laying the foundation for generalist humanoid control. The software is open-source and available at: https://github.com/utra-robosoccer/unified-humanoid-getup
comment: Accepted at 28th RoboCup International Symposium
Semantic Zone based 3D Map Management for Mobile Robot
Mobile robots in large-scale indoor environments, such as hospitals and logistics centers, require accurate 3D spatial representations. However, 3D maps consume substantial memory, making it difficult to maintain complete map data within limited computational resources. Existing SLAM frameworks typically rely on geometric distance or temporal metrics for memory management, often resulting in inefficient data retrieval in spatially compartmentalized environments. To address this, we propose a semantic zone-based 3D map management method that shifts the paradigm from geometry-centric to semantics-centric control. Our approach partitions the environment into meaningful spatial units (e.g., lobbies, hallways) and designates these zones as the primary unit for memory management. By dynamically loading only task-relevant zones into Working Memory (WM) and offloading inactive zones to Long-Term Memory (LTM), the system strictly enforces user-defined memory thresholds. Implemented within the RTAB-Map framework, our method demonstrates substantial reductions in unnecessary signature load/unload cycles and cumulative memory utilization compared to standard approaches. The results confirm that semantic zone-based management ensures stable, predictable memory usage while preserving map availability for navigation. Code is available at: https://github.com/huichangs/rtabmap/tree/segment
comment: 12 pages, 11 figures
Measuring What Matters: Scenario-Driven Evaluation for Trajectory Predictors in Autonomous Driving
Being able to anticipate the motion of surrounding agents is essential for the safe operation of autonomous driving systems in dynamic situations. While various methods have been proposed for trajectory prediction, the current evaluation practices still rely on error-based metrics (e.g., ADE, FDE), which reveal the accuracy from a post-hoc view but ignore the actual effect the predictor brings to the self-driving vehicles (SDVs), especially in complex interactive scenarios: a high-quality predictor not only chases accuracy, but should also captures all possible directions a neighbor agent might move, to support the SDVs' cautious decision-making. Given that the existing metrics hardly account for this standard, in our work, we propose a comprehensive pipeline that adaptively evaluates the predictor's performance by two dimensions: accuracy and diversity. Based on the criticality of the driving scenario, these two dimensions are dynamically combined and result in a final score for the predictor's performance. Extensive experiments on a closed-loop benchmark using real-world datasets show that our pipeline yields a more reasonable evaluation than traditional metrics by better reflecting the correlation of the predictors' evaluation with the autonomous vehicles' driving performance. This evaluation pipeline shows a robust way to select a predictor that potentially contributes most to the SDV's driving performance.
comment: 9 Pages, 8 Figures
A Hybrid Deep Learning Framework for Emotion Recognition in Children with Autism During NAO Robot-Mediated Interaction
Understanding emotional responses in children with Autism Spectrum Disorder (ASD) during social interaction remains a critical challenge in both developmental psychology and human-robot interaction. This study presents a novel deep learning pipeline for emotion recognition in autistic children in response to a name-calling event by a humanoid robot (NAO), under controlled experimental settings. The dataset comprises of around 50,000 facial frames extracted from video recordings of 15 children with ASD. A hybrid model combining a fine-tuned ResNet-50-based Convolutional Neural Network (CNN) and a three-layer Graph Convolutional Network (GCN) trained on both visual and geometric features extracted from MediaPipe FaceMesh landmarks. Emotions were probabilistically labeled using a weighted ensemble of two models: DeepFace's and FER, each contributing to soft-label generation across seven emotion classes. Final classification leveraged a fused embedding optimized via Kullback-Leibler divergence. The proposed method demonstrates robust performance in modeling subtle affective responses and offers significant promise for affective profiling of ASD children in clinical and therapeutic human-robot interaction contexts, as the pipeline effectively captures micro emotional cues in neurodivergent children, addressing a major gap in autism-specific HRI research. This work represents the first such large-scale, real-world dataset and pipeline from India on autism-focused emotion analysis using social robotics, contributing an essential foundation for future personalized assistive technologies.
comment: 12 pages, journal paper
Navigation Around Unknown Space Objects Using Visible-Thermal Image Fusion SC
As the popularity of on-orbit operations grows, so does the need for precise navigation around unknown resident space objects (RSOs) such as other spacecraft, orbital debris, and asteroids. The use of Simultaneous Localization and Mapping (SLAM) algorithms is often studied as a method to map out the surface of an RSO and find the inspector's relative pose using a lidar or conventional camera. However, conventional cameras struggle during eclipse or shadowed periods, and lidar, though robust to lighting conditions, tends to be heavier, bulkier, and more power-intensive. Thermal-infrared cameras can track the target RSO throughout difficult illumination conditions without these limitations. While useful, thermal-infrared imagery lacks the resolution and feature-richness of visible cameras. In this work, images of a target satellite in low Earth orbit are photo-realistically simulated in both visible and thermal-infrared bands. Pixel-level fusion methods are used to create visible/thermal-infrared composites that leverage the best aspects of each camera. Navigation errors from a monocular SLAM algorithm are compared between visible, thermal-infrared, and fused imagery in various lighting and trajectories. Fused imagery yields substantially improved navigation performance over visible-only and thermal-only methods.
comment: 18 pages, 11 figures. To be published in proceedings of AIAA SCITECH 2026 Forum
B-ActiveSEAL: Scalable Uncertainty-Aware Active Exploration with Tightly Coupled Localization-Mapping
Active robot exploration requires decision-making processes that integrate localization and mapping under tightly coupled uncertainty. However, managing these interdependent uncertainties over long-term operations in large-scale environments rapidly becomes computationally intractable. To address this challenge, we propose B-ActiveSEAL, a scalable information-theoretic active exploration framework that explicitly accounts for coupled uncertainties-from perception through mapping-into the decision-making process. Our framework (i) adaptively balances map uncertainty (exploration) and localization uncertainty (exploitation), (ii) accommodates a broad class of generalized entropy measures, enabling flexible and uncertainty-aware active exploration, and (iii) establishes Behavioral entropy (BE) as an effective information measure for active exploration by enabling intuitive and adaptive decision-making under coupled uncertainties. We establish a theoretical foundation for propagating coupled uncertainties and integrating them into general entropy formulations, enabling uncertainty-aware active exploration under tightly coupled localization-mapping. The effectiveness of the proposed approach is validated through rigorous theoretical analysis and extensive experiments on open-source maps and ROS-Unity simulations across diverse and complex environments. The results demonstrate that B-ActiveSEAL achieves a well-balanced exploration-exploitation trade-off and produces diverse, adaptive exploration behaviors across environments, highlighting clear advantages over representative baselines.
comment: 18 pages, 17 figures
Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes
Video diffusion techniques have advanced significantly in recent years; however, they struggle to generate realistic imagery of car crashes due to the scarcity of accident events in most driving datasets. Improving traffic safety requires realistic and controllable accident simulations. To tackle the problem, we propose Ctrl-Crash, a controllable car crash video generation model that conditions on signals such as bounding boxes, crash types, and an initial image frame. Our approach enables counterfactual scenario generation where minor variations in input can lead to dramatically different crash outcomes. To support fine-grained control at inference time, we leverage classifier-free guidance with independently tunable scales for each conditioning signal. Ctrl-Crash achieves state-of-the-art performance across quantitative video quality metrics (e.g., FVD and JEDi) and qualitative measurements based on a human-evaluation of physical realism and video quality compared to prior diffusion-based methods.
comment: Under review at Pattern Recognition Letters
End-to-End Dexterous Arm-Hand VLA Policies via Shared Autonomy: VR Teleoperation Augmented by Autonomous Hand VLA Policy for Efficient Data Collection
Achieving human-like dexterous manipulation remains a major challenge for general-purpose robots. While Vision-Language-Action (VLA) models show potential in learning skills from demonstrations, their scalability is limited by scarce high-quality training data. Existing data collection methods face inherent constraints: manual teleoperation overloads human operators, while automated planning often produces unnatural motions. We propose a Shared Autonomy framework that divides control between macro and micro motions. A human operator guides the robot's arm pose through intuitive VR teleoperation, while an autonomous DexGrasp-VLA policy handles fine-grained hand control using real-time tactile and visual feedback. This division significantly reduces cognitive load and enables efficient collection of high-quality coordinated arm-hand demonstrations. Using this data, we train an end-to-end VLA policy enhanced with our novel Arm-Hand Feature Enhancement module, which captures both distinct and shared representations of macro and micro movements for more natural coordination. Our Corrective Teleoperation system enables continuous policy improvement through human-in-the-loop failure recovery. Experiments demonstrate that our framework generates high-quality data with minimal manpower and achieves a 90% success rate across diverse objects, including unseen instances. Comprehensive evaluations validate the system's effectiveness in developing dexterous manipulation capabilities.
Entropy-Controlled Intrinsic Motivation Reinforcement Learning for Quadruped Robot Locomotion in Complex Terrains
Learning is the basis of both biological and artificial systems when it comes to mimicking intelligent behaviors. From the classical PPO (Proximal Policy Optimization), there is a series of deep reinforcement learning algorithms which are widely used in training locomotion policies for quadrupedal robots because of their stability and sample efficiency. However, among all these variants, experiments and simulations often converge prematurely, leading to suboptimal locomotion and reduced task performance. Therefore, in this paper, we introduce Entropy-Controlled Intrinsic Motivation (ECIM), an entropy-based reinforcement learning algorithm in contrast with the PPO series, that can reduce premature convergence by combining intrinsic motivation with adaptive exploration. For experiments, in order to parallel with other baselines, we chose to apply it in Isaac Gym across six terrain categories: upward slopes, downward slopes, uneven rough terrain, ascending stairs, descending stairs, and flat ground as widely used. For comparison, our experiments consistently achieve better performance: task rewards increase by 4--12%, peak body pitch oscillation is reduced by 23--29%, joint acceleration decreases by 20--32%, and joint torque consumption declines by 11--20%. Overall, our model ECIM, by combining entropy control and intrinsic motivation control, achieves better results in stability across different terrains for quadrupedal locomotion, and at the same time reduces energetic cost and makes it a practical choice for complex robotic control tasks.
BEASST: Behavioral Entropic Gradient based Adaptive Source Seeking for Mobile Robots
This paper presents BEASST (Behavioral Entropic Gradient-based Adaptive Source Seeking for Mobile Robots), a novel framework for robotic source seeking in complex, unknown environments. Our approach enables mobile robots to efficiently balance exploration and exploitation by modeling normalized signal strength as a surrogate probability of source location. Building on Behavioral Entropy(BE) with Prelec's probability weighting function, we define an objective function that adapts robot behavior from risk-averse to risk-seeking based on signal reliability and mission urgency. The framework provides theoretical convergence guarantees under unimodal signal assumptions and practical stability under bounded disturbances. Experimental validation across DARPA SubT and multi-room scenarios demonstrates that BEASST consistently outperforms state-of-the-art methods, achieving 15% reduction in path length and 20% faster source localization through intelligent uncertainty-driven navigation that dynamically transitions between aggressive pursuit and cautious exploration.
Camera Calibration via Circular Patterns: A Comprehensive Framework with Detection Uncertainty and Unbiased Projection Model
Camera calibration using planar targets has been widely favored, and two types of control points have been mainly considered as measurements: the corners of the checkerboard and the centroid of circles. Since a centroid is derived from numerous pixels, the circular pattern provides more precise measurements than the checkerboard. However, the existing projection model of circle centroids is biased under lens distortion, resulting in low performance. To surmount this limitation, we propose an unbiased projection model of the circular pattern and demonstrate its superior accuracy compared to the checkerboard. Complementing this, we introduce uncertainty into circular patterns to enhance calibration robustness and completeness. Defining centroid uncertainty improves the performance of calibration components, including pattern detection, optimization, and evaluation metrics. We also provide guidelines for performing good camera calibration based on the evaluation metric. The core concept of this approach is to model the boundary points of a two-dimensional shape as a Markov random field, considering its connectivity. The shape distribution is propagated to the centroid uncertainty through an appropriate shape representation based on the Green theorem. Consequently, the resulting framework achieves marked gains in calibration accuracy and robustness. The complete source code and demonstration video are available at https://github.com/chaehyeonsong/discocal.
Multiagent Systems
Emergence: Overcoming Privileged Information Bias in Asymmetric Embodied Agents via Active Querying
Large Language Models (LLMs) act as powerful reasoning engines but struggle with "symbol grounding" in embodied environments, particularly when information is asymmetrically distributed. We investigate the Privileged Information Bias (or "Curse of Knowledge"), where a knowledgeable "Leader" agent fails to guide a sensor-limited "Follower" due to a lack of Theory of Mind. To quantify this phenomenon, we propose a novel Asymmetric Assistive Reasoning framework within AI2-THOR. Our experiments reveal a significant "Success Gap": while the Leader successfully perceives the target in 35.0% of episodes, the collaborative team succeeds only 17.0% of the time, implying that nearly 50% of feasible plans fail solely due to communicative grounding errors. We demonstrate that a "Pull-based" protocol (active querying) is significantly more robust than standard "Push-based" instruction, with successful episodes featuring 2x the frequency of clarification requests. This research isolates the mechanism of active uncertainty reduction as a prerequisite for safe human-AI and robot-robot collaboration.
comment: 12 pages, 9 pages of content, 6 tables, 5 figures
Bidirectional human-AI collaboration in brain tumour assessments improves both expert human and AI agent performance
The benefits of artificial intelligence (AI) human partnerships-evaluating how AI agents enhance expert human performance-are increasingly studied. Though rarely evaluated in healthcare, an inverse approach is possible: AI benefiting from the support of an expert human agent. Here, we investigate both human-AI clinical partnership paradigms in the magnetic resonance imaging-guided characterisation of patients with brain tumours. We reveal that human-AI partnerships improve accuracy and metacognitive ability not only for radiologists supported by AI, but also for AI agents supported by radiologists. Moreover, the greatest patient benefit was evident with an AI agent supported by a human one. Synergistic improvements in agent accuracy, metacognitive performance, and inter-rater agreement suggest that AI can create more capable, confident, and consistent clinical agents, whether human or model-based. Our work suggests that the maximal value of AI in healthcare could emerge not from replacing human intelligence, but from AI agents that routinely leverage and amplify it.
comment: 38 pages, 6 figures, 7 supplementary figures
Trusted AI Agents in the Cloud
AI agents powered by large language models are increasingly deployed as cloud services that autonomously access sensitive data, invoke external tools, and interact with other agents. However, these agents run within a complex multi-party ecosystem, where untrusted components can lead to data leakage, tampering, or unintended behavior. Existing Confidential Virtual Machines (CVMs) provide only per binary protection and offer no guarantees for cross-principal trust, accelerator-level isolation, or supervised agent behavior. We present Omega, a system that enables trusted AI agents by enforcing end-to-end isolation, establishing verifiable trust across all contributing principals, and supervising every external interaction with accountable provenance. Omega builds on Confidential VMs and Confidential GPUs to create a Trusted Agent Platform that hosts many agents within a single CVM using nested isolation. It also provides efficient multi-agent orchestration with cross-principal trust establishment via differential attestation, and a policy specification and enforcement framework that governs data access, tool usage, and inter-agent communication for data protection and regulatory compliance. Implemented on AMD SEV-SNP and NVIDIA H100, Omega fully secures agent state across CVM-GPU, and achieves high performance while enabling high-density, policy-compliant multi-agent deployments at cloud scale.
Regret Lower Bounds for Decentralized Multi-Agent Stochastic Shortest Path Problems NeurIPS 2025
Multi-agent systems (MAS) are central to applications such as swarm robotics and traffic routing, where agents must coordinate in a decentralized manner to achieve a common objective. Stochastic Shortest Path (SSP) problems provide a natural framework for modeling decentralized control in such settings. While the problem of learning in SSP has been extensively studied in single-agent settings, the decentralized multi-agent variant remains largely unexplored. In this work, we take a step towards addressing that gap. We study decentralized multi-agent SSPs (Dec-MASSPs) under linear function approximation, where the transition dynamics and costs are represented using linear models. Applying novel symmetry-based arguments, we identify the structure of optimal policies. Our main contribution is the first regret lower bound for this setting based on the construction of hard-to-learn instances for any number of agents, $n$. Our regret lower bound of $Ω(\sqrt{K})$, over $K$ episodes, highlights the inherent learning difficulty in Dec-MASSPs. These insights clarify the learning complexity of decentralized control and can further guide the design of efficient learning algorithms in multi-agent systems.
comment: To appear in 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
Systems and Control (EESS)
Improved Directional State Transition Tensors for Accurate Aerocapture Performance Analysis
Aerocapture is a unique challenge for semi-analytical propagation because its nonconservative dynamics lead to force magnitudes that vary substantially across the trajectory. State transition tensors (STTs), higher-order Taylor series expansions of the solution flow, have been widely used as a computationally efficient semi-analytical propagation method for orbital scenarios, but have not previously been applied to aerocapture. However, obtaining the higher-order STTs requires integrating exponentially more equations. Directional state transition tensors (DSTTs) mitigate this cost by projecting the state into a reduced-dimension basis. This work develops novel dynamics analysis techniques to identify effective bases for this reduction, including augmented higher-order Cauchy Green tensors tailored to quantities of interest such as apoapsis radius. Results show that DSTTs constructed along these bases significantly reduce computational cost while maintaining accuracy in apoapsis and energy prediction. In particular, certain of these DSTTs outperform traditional DSTTs in nonlinear perturbation propagation for key state subsets and quantities of interest. These results establish STTs and DSTTs as practical tools for aerocapture performance analysis to enable robust guidance and navigation.
comment: Submitted to AIAA Journal of Guidance, Control, and Dynamics for publication
AI-Driven Real-Time Kick Classification in Olympic Taekwondo Using Sensor Fusion
Olympic Taekwondo has faced challenges in spectator engagement due to static, defensive gameplay and contentious scoring. Current Protector and Scoring Systems (PSS) rely on impact sensors and simplistic logic, encouraging safe strategies that diminish the sport's dynamism. This paper proposes an AI-powered scoring system that integrates existing PSS sensors with additional accelerometers, gyroscopes, magnetic/RFID, and impact force sensors in a sensor fusion framework. The system classifies kicks in real-time to identify technique type, contact location, impact force, and even the part of the foot used. A machine learning pipeline employing sensor fusion and Support Vector Machines (SVMs) is detailed, enabling automatic kick technique recognition for scoring. We present a novel kick scoring rubric that awards points based on specific kick techniques (e.g., turning and spinning kicks) to incentivize dynamic attacks. Drawing on a 2024 study achieving 96-98% accuracy, we validate the feasibility of real-time kick classification and further propose enhancements to this methodology, such as ensemble SVM classifiers and expanded datasets, to achieve the high-stakes accuracy required by the sport. We analyze how the proposed system can improve scoring fairness, reduce rule exploitation and illegitimate tactics, encourage more dynamic techniques, and enhance spectator understanding and excitement. The paper includes system design illustrations, a kick scoring table from an AI-augmented rule set, and discusses anticipated impacts on Olympic Taekwondo.
comment: 13 pages, 4 figures
Unifying Quadrotor Motion Planning and Control by Chaining Different Fidelity Models
Many aerial tasks involving quadrotors demand both instant reactivity and long-horizon planning. High-fidelity models enable accurate control but are too slow for long horizons; low-fidelity planners scale but degrade closed-loop performance. We present Unique, a unified MPC that cascades models of different fidelity within a single optimization: a short-horizon, high-fidelity model for accurate control, and a long-horizon, low-fidelity model for planning. We align costs across horizons, derive feasibility-preserving thrust and body-rate constraints for the point-mass model, and introduce transition constraints that match the different states, thrust-induced acceleration, and jerk-body-rate relations. To prevent local minima emerging from nonsmooth clutter, we propose a 3D progressive smoothing schedule that morphs norm-based obstacles along the horizon. In addition, we deploy parallel randomly initialized MPC solvers to discover lower-cost local minima on the long, low-fidelity horizon. In simulation and real flights, under equal computational budgets, Unique improves closed-loop position or velocity tracking by up to 75% compared with standard MPC and hierarchical planner-tracker baselines. Ablations and Pareto analyses confirm robust gains across horizon variations, constraint approximations, and smoothing schedules.
Data-Selective Online Battery Identification Using Extended Time Regular Expressions
In this paper, we propose a data-efficient online battery identification method which targets highly informative battery cell data segments based on the driving pattern of the vehicle. We consider the case of a vehicle driving on/off a motorway and construct an Extended Time Regular Expression (ETRE) to detect data segments fitting these driving patterns. Simulation results indicate that by only using up to 10.71% of the data on average, the proposed method provides a low-bias and low-variance estimator under non-negligible current and voltage noise compared to other conventional estimation algorithms.
comment: This work has been submitted to IFAC for possible publication
Evaluating Asynchronous Semantics in Trace-Discovered Resilience Models: A Case Study on the OpenTelemetry Demo
While distributed tracing and chaos engineering are becoming standard for microservices, resilience models remain largely manual and bespoke. We revisit a trace-discovered connectivity model that derives a service dependency graph from traces and uses Monte Carlo simulation to estimate endpoint availability under fail-stop service failures. Compared to earlier work, we (i) derive the graph directly from raw OpenTelemetry traces, (ii) attach endpoint-specific success predicates, and (iii) add a simple asynchronous semantics that treats Kafka edges as non-blocking for immediate HTTP success. We apply this model to the OpenTelemetry Demo ("Astronomy Shop") using a GitHub Actions workflow that discovers the graph, runs simulations, and executes chaos experiments that randomly kill microservices in a Docker Compose deployment. Across the studied failure fractions, the model reproduces the overall availability degradation curve, while asynchronous semantics for Kafka edges change predicted availabilities by at most about 10^(-5) (0.001 percentage points). This null result suggests that for immediate HTTP availability in this case study, explicitly modeling asynchronous dependencies is not warranted, and a simpler connectivity-only model is sufficient.
Measuring What Matters: Scenario-Driven Evaluation for Trajectory Predictors in Autonomous Driving
Being able to anticipate the motion of surrounding agents is essential for the safe operation of autonomous driving systems in dynamic situations. While various methods have been proposed for trajectory prediction, the current evaluation practices still rely on error-based metrics (e.g., ADE, FDE), which reveal the accuracy from a post-hoc view but ignore the actual effect the predictor brings to the self-driving vehicles (SDVs), especially in complex interactive scenarios: a high-quality predictor not only chases accuracy, but should also captures all possible directions a neighbor agent might move, to support the SDVs' cautious decision-making. Given that the existing metrics hardly account for this standard, in our work, we propose a comprehensive pipeline that adaptively evaluates the predictor's performance by two dimensions: accuracy and diversity. Based on the criticality of the driving scenario, these two dimensions are dynamically combined and result in a final score for the predictor's performance. Extensive experiments on a closed-loop benchmark using real-world datasets show that our pipeline yields a more reasonable evaluation than traditional metrics by better reflecting the correlation of the predictors' evaluation with the autonomous vehicles' driving performance. This evaluation pipeline shows a robust way to select a predictor that potentially contributes most to the SDV's driving performance.
comment: 9 Pages, 8 Figures
Braess' Paradoxes in Coupled Power and Transportation Systems
Transportation electrification introduces strong coupling between the power and transportation systems. In this paper, we generalize the classical notion of Braess' paradox to coupled power and transportation systems, and examine how the cross-system coupling induces new types of Braess' paradoxes. To this end, we model the power and transportation networks as graphs, coupled with charging points connecting to nodes in both graphs. The power system operation is characterized by the economic dispatch optimization, while the transportation system user equilibrium models travelers' route and charging choices. By analyzing simple coupled systems, we demonstrate that capacity expansion in either transportation or power system can deteriorate the performance of both systems, and uncover the fundamental mechanisms for such new Braess' paradoxes to occur. We also provide necessary and sufficient conditions of the occurrences of Braess' paradoxes for general coupled systems, leading to managerial insights for infrastructure planners. For general networks, through characterizing the generalized user equilibrium of the coupled systems, we develop efficient algorithms to detect Braess' paradoxes and novel charging pricing policies to mitigate them.
comment: 61 pages, 17 figures
Realizing Space-oriented Control in Smart Buildings via Word Embeddings
This paper presents a novel framework for implementing space-oriented control systems in smart buildings. In contrast to conventional device-oriented approaches, which often suffer from issues related to development efficiency and portability, our framework adopts a space-oriented paradigm that leverages natural language processing and word embedding techniques. The proposed framework features a chat-based graphical user interface (GUI) that converts natural language inputs into actionable OpenAI API calls, thereby enabling intuitive space level (e.g., room) control within smart environments. To support efficient embedding-based search and metadata retrieval, the framework integrates a vector database powered by Elasticsearch. This ensures the accurate identification and invocation of appropriate smart building APIs. A prototype implementation has been tested in a smart building environment at the University of Tokyo, demonstrating the feasibility of the approach.
comment: 4pages, 4figures, 1 table, accepted by IEEE GCCE 2025
Fractional Calculus in Optimal Control and Game Theory: Theory, Numerics, and Applications -- A Survey
Many physical, biological, and engineered systems exhibit memory effects that challenge Markovian models. Fractional calculus provides nonlocal operators to capture hereditary dynamics. This survey connects modeling, analysis, and controller/game design for systems with memory. We unify notation for Caputo, Riemann-Liouville, and Grunwald-Letnikov derivatives and relate them to practical approximations, including diffusive (sum-of-exponentials) state augmentation and frequency-domain realizations (e.g., Oustaloup). We review fractional extensions of the calculus of variations and the Pontryagin maximum principle, and dynamic-programming formulations with memory, including path-dependent HJB for optimal control and HJI for zero-sum games. We cover design tools such as LQR, MPC, and fractional-order PID, as well as fractional differential games with Nash, Stackelberg, and minimax equilibria. Computational approaches are compared across time-domain schemes, frequency-domain approximations, and diffusive augmentations, highlighting accuracy-complexity trade-offs and remedies for the curse of history (windowing and sum-of-exponentials). We conclude with applications and open problems on equilibria with memory, Isaacs-type conditions, constraint handling, and scalable solvers.
comment: 42 pages, 1 figure, comprehensive survey on fractional calculus in optimal control and game theory
A Framework for Scalable Digital Twin Deployment in Smart Campus Building Facility Management
Digital twin (DT) offers significant opportunities for enhancing facility management (FM) in campus environments. However, existing research often focuses narrowly on isolated domains, such as point-cloud geometry or energy analytics, without providing a scalable and interoperable workflow that integrates building geometry, equipment metadata, and operational data into a unified FM platform. This study proposes a comprehensive framework for scalable digital-twin deployment in smart campus buildings by integrating 3D laser scanning, BIM modeling, and IoT-enabled data visualization to support facility operations and maintenance. The methodology includes: (1) reality capture using terrestrial laser scanning and structured point-cloud processing; (2) development of an enriched BIM model incorporating architectural, mechanical, electrical, plumbing, conveying, and sensor systems; and (3) creation of a digital-twin environment that links equipment metadata, maintenance policies, and simulated IoT data within a digital-twin management platform. A case study of the Price Gilbert Building at Georgia Tech demonstrates the implementation of this workflow. A total of 509 equipment items were modeled and embedded with OmniClass classifications into the digital twin. Ten interactive dashboards were developed to visualize system performance. Results show that the proposed framework enables centralized asset documentation, improved system visibility, and enhanced preventive and reactive maintenance workflows. Although most IoT data were simulated due to limited existing sensor infrastructure, the prototype validates the feasibility of a scalable digital twin for facility management and establishes a reference model for real-time monitoring, analytics integration, and future autonomous building operations.
Neuromorphic Photonic Computing with an Electro-Optic Analog Memory
In neuromorphic photonic systems, device operations are typically governed by analog signals, necessitating digital-to-analog converters (DAC) and analog-to-digital converters (ADC). However, data movement between memory and these converters in conventional von Neumann architectures incur significant energy costs. We propose an analog electronic memory co-located with photonic computing units to eliminate repeated long-distance data movement. Here, we demonstrate a monolithically integrated neuromorphic photonic circuit with on-chip capacitive analog memory and evaluate its performance in machine learning for in situ training and inference using the MNIST dataset. Our analysis shows that integrating analog memory into a neuromorphic photonic architecture can achieve over 26x power savings compared to conventional SRAM-DAC architectures. Furthermore, maintaining a minimum analog memory retention-to-network-latency ratio of 100 maintains >90% inference accuracy, enabling leaky analog memories without substantial performance degradation. This approach reduces reliance on DACs, minimizes data movement, and offers a scalable pathway toward energy-efficient, high-speed neuromorphic photonic computing.
Cyber-Resilient Data-Driven Event-Triggered Secure Control for Autonomous Vehicles Under False Data Injection Attacks
This paper proposes a cyber-resilient secure control framework for autonomous vehicles (AVs) subject to false data injection (FDI) threats as actuator attacks. The framework integrates data-driven modeling, event-triggered communication, and fractional-order sliding mode control (FSMC) to enhance the resilience against adversarial interventions. A dynamic model decomposition (DMD)-based methodology is employed to extract the lateral dynamics from real-world data, eliminating the reliance on conventional mechanistic modeling. To optimize communication efficiency, an event-triggered transmission scheme is designed to reduce the redundant transmissions while ensuring system stability. Furthermore, an extended state observer (ESO) is developed for real-time estimation and mitigation of actuator attack effects. Theoretical stability analysis, conducted using Lyapunov methods and linear matrix inequality (LMI) formulations, guarantees exponential error convergence. Extensive simulations validate the proposed event-triggered secure control framework, demonstrating substantial improvements in attack mitigation, communication efficiency, and lateral tracking performance. The results show that the framework effectively counteracts actuator attacks while optimizing communication-resource utilization, making it highly suitable for safety-critical AV applications.
comment: 14 pages, 8 figures
Learning Dissipative Chaotic Dynamics with Boundedness Guarantees
Chaotic dynamics, commonly seen in weather systems and fluid turbulence, are characterized by their sensitivity to initial conditions, which makes accurate prediction challenging. Recent approaches have focused on developing data-driven models that attempt to preserve invariant statistics over long horizons since many chaotic systems exhibit dissipative behaviors and ergodicity. Despite the recent progress in such models, they are still often prone to generating unbounded trajectories, leading to invalid statistics evaluation. To address this fundamental challenge, we introduce a modular framework that provides formal guarantees of trajectory boundedness for neural network chaotic dynamics models. Our core contribution is a dissipative projection layer that leverages control-theoretic principles to ensure the learned system is dissipative. Specifically, our framework simultaneously learns a dynamics emulator and an energy-like function, where the latter is used to construct an algebraic dissipative constraint within the projection layer. Furthermore, the learned invariant level set provides an outer estimate for the system's strange attractor, which is known to be difficult to characterize due to its complex geometry. We demonstrate our model's ability to produce bounded long-horizon forecasts that preserve invariant statistics for chaotic dynamical systems, including Lorenz 96 and a reduced-order model of the Kuramoto-Sivashinsky equation.
Nash Equilibrium Learning In Large Populations With First-Order Payoff Modifications
We establish Nash equilibrium learning in large populations of noncooperative, strategic agents. Our analysis considers the broadest class to date of payoff mechanisms with first-order modifications, capable of modeling bounded rationality and anticipatory effects, averaging, or Padé delay approximations. We propose a framework that, for the first time, combines two nonstandard system-theoretic passivity notions. Our results hold for discontinuous best response dynamics alongside continuous learning rules, significantly extending prior work.
comment: 6 pages, 3 figures
Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes
We study the error introduced by entropy regularization in infinite-horizon discrete discounted Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength, both in a weighted KL-divergence and in value with a problem-specific exponent. This is in contrast to previously known estimates, of the order $O(τ)$, where $τ$ is the regularization strength. We provide a lower bound that matches our upper bound up to a polynomial term, thereby characterizing the exponential convergence rate for entropy regularization. Our proof relies on the observation that the solutions of entropy-regularized Markov decision processes solve a gradient flow of the unregularized reward with respect to a Riemannian metric common in natural policy gradient methods. This correspondence allows us to identify the limit of this gradient flow as the generalized maximum entropy optimal policy, thereby characterizing the implicit bias of this gradient flow, which corresponds to a time-continuous version of the natural policy gradient method. We use our improved error estimates to show that for entropy-regularized natural policy gradient methods, the overall error decays exponentially in the square root of the number of iterations, improving over existing sublinear guarantees. Finally, we extend our analysis to settings beyond the entropy. In particular, we characterize the implicit bias regarding general convex potentials and their resulting generalized natural policy gradients.
comment: 32 pages, 1 figure
Inverse Optimal Control for Linear Quadratic Tracking with Unknown Target States
This paper addresses the inverse optimal control for the linear quadratic tracking problem with a fixed but unknown target state, which aims to estimate the possible triplets comprising the target state, the state weight matrix, and the input weight matrix from observed optimal control input and the corresponding state trajectories. Sufficient conditions have been provided for the unique determination of both the linear quadratic cost function as well as the target state. A computationally efficient and numerically reliable parameter identification algorithm is proposed by equating optimal control strategies with a system of linear equations, and the associated relative error upper bound is derived in terms of data volume and signal-to-noise ratio. Moreover, the proposed inverse optimal control algorithm is applied for the joint cluster coordination and intent identification of a multi-agent system. By incorporating the structural constraint of the Laplace matrix, the relative error upper bound can be reduced accordingly. Finally, the algorithm's efficiency and accuracy are validated by a vehicle-on-a-lever example and a multi-agent formation control example.
Variational Bayesian Inference for Multiple Extended Targets or Unresolved Group Targets Tracking
In this work, we propose a method for tracking multiple extended targets or unresolvable group targets in a clutter environment. Firstly, based on the Random Matrix Model (RMM), the joint state of the target is modeled as the Gamma Gaussian Inverse Wishart (GGIW) distribution. Considering the uncertainty of measurement origin caused by the clutters, we adopt the idea of probabilistic data association and describe the joint association event as an unknown parameter in the joint prior distribution. Then the Variational Bayesian Inference (VBI) is employed to approximately solve the non-analytical posterior distribution. Furthermore, to ensure the practicability of the proposed method, we further provide two potential lightweight schemes to reduce its computational complexity. One of them is based on clustering, which effectively prunes the joint association events. The other is a simplification of the variational posterior through marginal association probabilities. Finally, the effectiveness of the proposed method is demonstrated by simulation and real data experiments, and we show that the proposed method outperforms current state-of-the-art methods in terms of accuracy and adaptability.
comment: 21 pages, 14 figures, 4 tables
Quantum Encrypted Control of Networked Systems
Encrypted control has been extensively studied to ensure the confidentiality of system states and control inputs for networked control systems. This paper presents a computationally efficient encrypted control framework for networked systems enabled by quantum communication. A quantum channel between sensors and actuators is used to generate identical secret keys, whose security is further enhanced through quantum key distribution. These keys enable lightweight encryption and decryption while preserving confidentiality and control accuracy. We develop a novel encryption-decryption architecture for state-feedback control of linear systems based on quantum keys, and characterize the impact of quantum state errors on closed-loop stability. In particular, we establish the existence of a critical threshold on intrinsic quantum noise below which stability is guaranteed. In contrast to classical encrypted control schemes, which may collapse under a single key-bit error, the proposed quantum encrypted control exhibits strong robustness to key imperfections. We further adopt quantization techniques to address the scenarios with limited communication bits in practical situations, and implement privacy protection for quantum keys based on a stochastic quantizer. These results demonstrate that integrating quantum technologies into control systems in a nontrivial and principled manner, even at their current level of maturity, can yield substantial performance gains in reducing computational complexity and improving resilience to key errors while ensuring security against multiple eavesdropping sources.
Fixed Horizon Linear Quadratic Covariance Steering in Continuous Time with Hilbert-Schmidt Terminal Cost
We formulate and solve the fixed horizon linear quadratic covariance steering problem in continuous time with a terminal cost measured in Hilbert-Schmidt (i.e., Frobenius) norm error between the desired and the controlled terminal covariances. For this problem, the necessary conditions of optimality become a coupled matrix ODE two-point boundary value problem. To solve this system of equations, we design a matricial recursive algorithm and prove its convergence. The proposed algorithm and its analysis make use of the linear fractional transforms parameterized by the state transition matrix of the associated Hamiltonian matrix. To illustrate the results, we provide two numerical examples: one with a two dimensional and another with a six dimensional state space.
AI-Assisted Game Management Decisions: A Fuzzy Logic Approach to Real-Time Soccer Substitutions
In elite soccer, substitution decisions entail significant financial and sporting consequences yet remain heavily reliant on intuition or predictive models that merely mimic historical biases. This paper introduces a Fuzzy Logic based Decision Support System (DSS) designed for real time, prescriptive game management. Unlike traditional Machine Learning approaches that encounter a predictive ceiling by attempting to replicate human behavior, our system audits performance through an objective, rule based inference engine. We propose a methodological advancement by reformulating the PlayeRank metric into a Cumulative Mean with Role Aware Normalization, eliminating the play time exposure bias inherent in cumulative sum models to enable accurate intra match comparison. The system integrates this refined metric with physiological proxies (fatigue) and contextual variables (disciplinary risk modulated by tactical role) to calculate a dynamic Substitution Priority (P final). Validation via a case study of the 2018 FIFA World Cup match between Brazil and Belgium demonstrates the system's ecological validity: it not only aligned with expert consensus on executed substitutions (for example Gabriel Jesus) but, crucially, identified high risk scenarios ignored by human decision makers. Specifically, the model flagged the "FAGNER Paradox" - a maximum priority defensive risk - minutes before a critical yellow card, and detected the "Lukaku Paradox", where an isolated assist masked a severe drop in participation. These results confirm that Fuzzy Logic offers a transparent, explainable, and superior alternative to black box models for optimizing real time tactical decisions.
comment: 34 pages, 7 figures