MyArxiv
Robotics
Source-Free Bistable Fluidic Gripper for Size-Selective and Stiffness-Adaptive Grasping
Conventional fluid-driven soft grippers typically depend on external sources, which limit portability and long-term autonomy. This work introduces a self-contained soft gripper with fixed size that operates solely through internal liquid redistribution among three interconnected bistable snap-through chambers. When the top sensing chamber deforms upon contact, the displaced liquid triggers snap-through expansion of the grasping chambers, enabling stable and size-selective grasping without continuous energy input. The internal hydraulic feedback further allows passive adaptation of gripping pressure to object stiffness. This source-free and compact design opens new possibilities for lightweight, stiffness-adaptive fluid-driven manipulation in soft robotics, providing a feasible approach for targeted size-specific sampling and operation in underwater and field environments.
Unconscious and Intentional Human Motion Cues for Expressive Robot-Arm Motion Design
This study investigates how human motion cues can be used to design expressive robot-arm movements. Using the imperfect-information game Geister, we analyzed two types of human piece-moving motions: natural gameplay (unconscious tendencies) and instructed expressions (intentional cues). Based on these findings, we created phase-specific robot motions by varying movement speed and stop duration, and evaluated observer impressions under two presentation modalities: a physical robot and a recorded video. Results indicate that late-phase motion timing, particularly during withdrawal, plays an important role in impression formation and that physical embodiment enhances the interpretability of motion cues. These findings provide insights for designing expressive robot motions based on human timing behavior.
comment: 5 pages, 5 figures, HAI2025 Workshop on Socially Aware and Cooperative Intelligent Systems
Motion Planning Under Temporal Logic Specifications In Semantically Unknown Environments
This paper addresses a motion planning problem to achieve spatio-temporal-logical tasks, expressed by syntactically co-safe linear temporal logic specifications (scLTL\next), in uncertain environments. Here, the uncertainty is modeled as probabilistic knowledge of the semantic labels of the environment. For example, the task may be "first go to region 1, then go to region 2", but the exact locations of regions 1 and 2 are not known a priori; only a probabilistic belief is available. We propose a novel automata-theoretic approach, where a special product automaton is constructed to capture the uncertainty related to semantic labels, and a reward function is designed for each edge of this product automaton. The proposed algorithm utilizes value iteration for online replanning. We present theoretical results along with simulations and experiments to demonstrate the efficacy of the proposed approach.
comment: 8 pages, 6 figures
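As a rough illustration of the value-iteration step mentioned in the abstract above, here is a minimal sketch on a tiny graph with edge rewards. The states, rewards, and discount factor are hypothetical and chosen only for illustration; they are not the paper's product automaton or reward design.

```python
# Hypothetical graph: state -> list of (next_state, edge_reward).
# "goal" stands in for an accepting state; all numbers are illustrative.
EDGES = {
    "start": [("a", -1.0), ("b", -1.0)],
    "a": [("goal", 10.0)],
    "b": [("a", -1.0)],
    "goal": [],
}


def value_iteration(edges, gamma=0.9, tol=1e-9, max_iters=1000):
    """Return the optimal value of each state under deterministic edges."""
    values = {state: 0.0 for state in edges}
    for _ in range(max_iters):
        delta = 0.0
        for state, outgoing in edges.items():
            if not outgoing:
                continue  # terminal state keeps value 0
            best = max(r + gamma * values[nxt] for nxt, r in outgoing)
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    return values


def greedy_policy(edges, values, gamma=0.9):
    """Pick, for each non-terminal state, the successor with the best backup."""
    return {
        state: max(outgoing, key=lambda e: e[1] + gamma * values[e[0]])[0]
        for state, outgoing in edges.items()
        if outgoing
    }


values = value_iteration(EDGES)
policy = greedy_policy(EDGES, values)
```

Replanning online would amount to rerunning `value_iteration` whenever the belief over labels (and hence the edge rewards) changes.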
Flying Robotics Art: ROS-based Drone Draws the Record-Breaking Mural
This paper presents the innovative design and successful deployment of a pioneering autonomous unmanned aerial system developed for executing the world's largest mural painted by a drone. Addressing the dual challenges of maintaining artistic precision and operational reliability under adverse outdoor conditions such as wind and direct sunlight, our work introduces a robust system capable of navigating and painting outdoors with unprecedented accuracy. Key to our approach is a novel navigation system that combines an infrared (IR) motion capture camera and LiDAR technology, enabling precise location tracking tailored specifically for large-scale artistic applications. We employ a unique control architecture that uses different regulation in tangential and normal directions relative to the planned path, enabling precise trajectory tracking and stable line rendering. We also present algorithms for trajectory planning and path optimization, allowing for complex curve drawing and area filling. The system includes a custom-designed paint spraying mechanism, specifically engineered to function effectively amidst the turbulent airflow generated by the drone's propellers, which also protects the drone's critical components from paint-related damage, ensuring longevity and consistent performance. Experimental results demonstrate the system's robustness and precision in varied conditions, showcasing its potential for autonomous large-scale art creation and expanding the functional applications of robotics in creative fields.
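The idea of regulating tangential and normal errors differently can be sketched in two dimensions as follows. This is a minimal proportional-control illustration; the function names and gains are assumptions, not the controller described in the paper.

```python
import math


def split_error(error, tangent):
    """Decompose a 2D position error into along-path and cross-path parts."""
    norm = math.hypot(tangent[0], tangent[1])
    tx, ty = tangent[0] / norm, tangent[1] / norm
    nx, ny = -ty, tx  # unit normal, 90 degrees counter-clockwise from tangent
    e_t = error[0] * tx + error[1] * ty
    e_n = error[0] * nx + error[1] * ny
    return e_t, e_n, (tx, ty), (nx, ny)


def velocity_command(error, tangent, k_t=0.5, k_n=2.0):
    """Proportional command with separate (hypothetical) gains per direction."""
    e_t, e_n, t, n = split_error(error, tangent)
    vx = k_t * e_t * t[0] + k_n * e_n * n[0]
    vy = k_t * e_t * t[1] + k_n * e_n * n[1]
    return vx, vy


# Path runs along +x; the vehicle is 3 m behind and 4 m to the side of its target.
command = velocity_command(error=(3.0, 4.0), tangent=(2.0, 0.0))
```

A higher normal gain pulls the vehicle back onto the line quickly, while a softer tangential gain lets progress along the path stay smooth.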
Multi-robot searching with limited sensing range for static and mobile intruders
We consider the problem of searching for an intruder in a geometric domain by utilizing multiple search robots. The domain is a simply connected orthogonal polygon with edges parallel to the Cartesian coordinate axes. Each robot has a limited sensing capability. We study the problem for both static and mobile intruders. It turns out that the problem of finding an intruder is NP-hard, even for a stationary intruder. Given this intractability, we turn our attention towards developing efficient and robust algorithms, namely methods based on space-filling curves, random search, and cooperative random search. Moreover, for each proposed algorithm, we evaluate the trade-off between the number of search robots and the time required for the robots to complete the search process while considering the geometric properties of the connected orthogonal search area.
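One way to picture a space-filling-curve search is to order grid cells along a Hilbert curve and hand each robot a contiguous piece of it. The sketch below assumes a square grid whose side is a power of two and an equal-length split; both are simplifications made here, not the paper's setting of a general orthogonal polygon.

```python
def hilbert_d2xy(n, d):
    """Map index d along a Hilbert curve to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Reflect within the current s x s quadrant.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x  # swap axes to rotate the quadrant
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_path(n):
    """All cells of the n x n grid in Hilbert-curve order."""
    return [hilbert_d2xy(n, d) for d in range(n * n)]


def split_among_robots(path, k):
    """Give each of k robots a contiguous segment of the curve."""
    size = -(-len(path) // k)  # ceiling division
    return [path[i:i + size] for i in range(0, len(path), size)]


path = hilbert_path(4)
segments = split_among_robots(path, 3)
```

Because consecutive cells on the curve are grid neighbors, each robot sweeps its segment without jumps, and adding robots shortens the longest segment.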
Manifold-constrained Hamilton-Jacobi Reachability Learning for Decentralized Multi-Agent Motion Planning
Safe multi-agent motion planning (MAMP) under task-induced constraints is a critical challenge in robotics. Many real-world scenarios require robots to navigate dynamic environments while adhering to manifold constraints imposed by tasks. For example, service robots must carry cups upright while avoiding collisions with humans or other robots. Despite recent advances in decentralized MAMP for high-dimensional systems, incorporating manifold constraints remains difficult. To address this, we propose a manifold-constrained Hamilton-Jacobi reachability (HJR) learning framework for decentralized MAMP. Our method solves HJR problems under manifold constraints to capture task-aware safety conditions, which are then integrated into a decentralized trajectory optimization planner. This enables robots to generate motion plans that are both safe and task-feasible without requiring assumptions about other agents' policies. Our approach generalizes across diverse manifold-constrained tasks and scales effectively to high-dimensional multi-agent manipulation problems. Experiments show that our method outperforms existing constrained motion planners and operates at speeds suitable for real-world applications. Video demonstrations are available at https://youtu.be/RYcEHMnPTH8 .
Multi-User Personalisation in Human-Robot Interaction: Using Quantitative Bipolar Argumentation Frameworks for Preferences Conflict Resolution
While personalisation in Human-Robot Interaction (HRI) has advanced significantly, most existing approaches focus on single-user adaptation, overlooking scenarios involving multiple stakeholders with potentially conflicting preferences. To address this, we propose the Multi-User Preferences Quantitative Bipolar Argumentation Framework (MUP-QBAF), a novel multi-user personalisation framework based on Quantitative Bipolar Argumentation Frameworks (QBAFs) that explicitly models and resolves multi-user preference conflicts. Unlike prior work in Argumentation Frameworks, which typically assumes static inputs, our approach is tailored to robotics: it incorporates both users' arguments and the robot's dynamic observations of the environment, allowing the system to adapt over time and respond to changing contexts. Preferences, both positive and negative, are represented as arguments whose strength is recalculated iteratively based on new information. The framework's properties and capabilities are presented and validated through a realistic case study, where an assistive robot mediates between the conflicting preferences of a caregiver and a care recipient during a frailty assessment task. This evaluation further includes a sensitivity analysis of argument base scores, demonstrating how preference outcomes can be shaped by user input and contextual observations. By offering a transparent, structured, and context-sensitive approach to resolving competing user preferences, this work advances the field of multi-user HRI. It provides a principled alternative to data-driven methods, enabling robots to navigate conflicts in real-world environments.
comment: Preprint submitted to a journal
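To make the notion of iteratively recalculated argument strength concrete, the sketch below evaluates a tiny acyclic QBAF with the DF-QuAD gradual semantics. The abstract does not say which semantics MUP-QBAF uses, so DF-QuAD is an assumption here, and the argument names and base scores are invented for illustration.

```python
from math import prod


def aggregate(strengths):
    """Probabilistic sum of a list of strengths; 0.0 for an empty list."""
    return 1.0 - prod(1.0 - s for s in strengths)


def combine(base, attack, support):
    """DF-QuAD combination of a base score with aggregated attack and support."""
    if attack >= support:
        return base - base * (attack - support)
    return base + (1.0 - base) * (support - attack)


def strength(arg, base, attackers, supporters):
    """Recursive strength of an argument in an acyclic QBAF."""
    va = aggregate([strength(a, base, attackers, supporters)
                    for a in attackers.get(arg, [])])
    vs = aggregate([strength(s, base, attackers, supporters)
                    for s in supporters.get(arg, [])])
    return combine(base[arg], va, vs)


# Hypothetical arguments: a care recipient's preference, a caregiver's
# objection, and a robot observation that supports the preference.
base = {"shorten_session": 0.5, "caregiver_objects": 0.4, "recipient_tired": 0.8}
attackers = {"shorten_session": ["caregiver_objects"]}
supporters = {"shorten_session": ["recipient_tired"]}

before = strength("shorten_session", base, attackers, supporters)

# A new observation lowers the base score of the supporting argument.
base["recipient_tired"] = 0.2
after = strength("shorten_session", base, attackers, supporters)
```

Re-evaluating after each change in a base score is what lets the outcome track new user input or environmental observations.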
OneOcc: Semantic Occupancy Prediction for Legged Robots with a Single Panoramic Camera
Robust 3D semantic occupancy is crucial for legged/humanoid robots, yet most semantic scene completion (SSC) systems target wheeled platforms with forward-facing sensors. We present OneOcc, a vision-only panoramic SSC framework designed for gait-introduced body jitter and 360° continuity. OneOcc combines: (i) Dual-Projection fusion (DP-ER) to exploit the annular panorama and its equirectangular unfolding, preserving 360° continuity and grid alignment; (ii) Bi-Grid Voxelization (BGV) to reason in Cartesian and cylindrical-polar spaces, reducing discretization bias and sharpening free/occupied boundaries; (iii) a lightweight decoder with Hierarchical AMoE-3D for dynamic multi-scale fusion and better long-range/occlusion reasoning; and (iv) plug-and-play Gait Displacement Compensation (GDC) learning feature-level motion correction without extra sensors. We also release two panoramic occupancy benchmarks: QuadOcc (real quadruped, first-person 360°) and Human360Occ (H3O) (CARLA human-ego 360° with RGB, Depth, semantic occupancy; standardized within-/cross-city splits). OneOcc sets new state-of-the-art (SOTA): on QuadOcc it beats strong vision baselines and popular LiDAR ones; on H3O it gains +3.83 mIoU (within-city) and +8.08 (cross-city). Modules are lightweight, enabling deployable full-surround perception for legged/humanoid robots. Datasets and code will be publicly available at https://github.com/MasterHow/OneOcc.
comment: Datasets and code will be publicly available at https://github.com/MasterHow/OneOcc
Indicating Robot Vision Capabilities with Augmented Reality
Research indicates that humans can mistakenly assume that robots and humans have the same field of view (FoV), possessing an inaccurate mental model of robots. This misperception may lead to failures during human-robot collaboration tasks where robots might be asked to complete impossible tasks about out-of-view objects. The issue is more severe when robots do not have a chance to scan the scene to update their world model while focusing on assigned tasks. To help align humans' mental models of robots' vision capabilities, we propose four FoV indicators in augmented reality (AR) and evaluate them in a human-subjects experiment (N=41) in terms of accuracy, confidence, task efficiency, and workload. These indicators span a spectrum from egocentric (robot's eye and head space) to allocentric (task space). Results showed that the allocentric blocks at the task space had the highest accuracy, albeit with a delay in interpreting the robot's FoV. The egocentric indicator of deeper eye sockets, which is also feasible as a physical alteration, increased accuracy as well. For all indicators, participants' confidence was high while cognitive load remained low. Finally, we contribute six guidelines for practitioners to apply our AR indicators or physical alterations to align humans' mental models with robots' vision capabilities.
ROSBag MCP Server: Analyzing Robot Data with LLMs for Agentic Embodied AI Applications
Agentic AI systems and Physical or Embodied AI systems have been two key research verticals at the forefront of Artificial Intelligence and Robotics, with Model Context Protocol (MCP) increasingly becoming a key component and enabler of agentic applications. However, the literature at the intersection of these verticals, i.e., Agentic Embodied AI, remains scarce. This paper introduces an MCP server for analyzing ROS and ROS 2 bags, allowing for analyzing, visualizing and processing robot data with natural language through LLMs and VLMs. We describe specific tooling built with robotics domain knowledge, with our initial release focused on mobile robotics and supporting natively the analysis of trajectories, laser scan data, transforms, or time series data. This is in addition to providing an interface to standard ROS 2 CLI tools ("ros2 bag list" or "ros2 bag info"), as well as the ability to filter bags with a subset of topics or trimmed in time. Coupled with the MCP server, we provide a lightweight UI that allows the benchmarking of the tooling with different LLMs, both proprietary (Anthropic, OpenAI) and open-source (through Groq). Our experimental results include the analysis of tool calling capabilities of eight different state-of-the-art LLM/VLM models, both proprietary and open-source, large and small. Our experiments indicate that there is a large divide in tool calling capabilities, with Kimi K2 and Claude Sonnet 4 demonstrating clearly superior performance. We also conclude that there are multiple factors affecting the success rates, from the tool description schema to the number of arguments, as well as the number of tools available to the models. The code is available with a permissive license at https://github.com/binabik-ai/mcp-rosbags.
Development of the Bioinspired Tendon-Driven DexHand 021 with Proprioceptive Compliance Control
The human hand plays a vital role in daily life and industrial applications, yet replicating its multifunctional capabilities, including motion, sensing, and coordinated manipulation, with robotic systems remains a formidable challenge. Developing a dexterous robotic hand requires balancing human-like agility with engineering constraints such as complexity, size-to-weight ratio, durability, and force-sensing performance. This letter presents DexHand 021, a high-performance, cable-driven five-finger robotic hand with 12 active and 7 passive degrees of freedom (DoFs), for a total of 19 DoFs in a lightweight 1 kg design. We propose a proprioceptive force-sensing-based admittance control method to enhance manipulation. Experimental results demonstrate its superior performance: a single-finger load capacity exceeding 10 N, fingertip repeatability under 0.001 m, and force estimation errors below 0.2 N. Compared to PID control, joint torques in multi-object grasping are reduced by 31.19%, which significantly improves force-sensing capability while preventing overload during collisions. The hand excels in both power and precision grasps, successfully executing 33 GRASP taxonomy motions and complex manipulation tasks. This work advances the design of lightweight, industrial-grade dexterous hands and enhances proprioceptive control, contributing to robotic manipulation and intelligent manufacturing.
comment: 8 pages, 18 figures, accepted by IEEE RAL
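For readers unfamiliar with admittance control, the following generic one-degree-of-freedom sketch shows the basic idea: a measured or estimated external force drives a virtual mass-spring-damper, and the resulting displacement becomes the position reference. The parameters and function name are illustrative assumptions and are unrelated to the hand's actual controller.

```python
def simulate_admittance(force, mass=1.0, damping=20.0, stiffness=100.0,
                        dt=0.001, steps=5000):
    """Integrate M*a + D*v + K*x = F for a constant external force F.

    Returns the final displacement x, which would serve as the position
    offset sent to a low-level position controller.
    """
    x = 0.0
    v = 0.0
    for _ in range(steps):
        a = (force - damping * v - stiffness * x) / mass
        v += a * dt  # semi-implicit Euler: update velocity first
        x += v * dt  # then position, using the new velocity
    return x


# A constant 2 N push settles near force / stiffness = 0.02 m.
offset = simulate_admittance(force=2.0)
```

Lower stiffness makes the joint yield more to the same contact force, which is the mechanism by which an admittance law can limit overload during collisions.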
Value Elicitation for a Socially Assistive Robot Addressing Social Anxiety: A Participatory Design Approach ECAI 2025
Social anxiety is a prevalent mental health condition that can significantly impact overall well-being and quality of life. Despite its widespread effects, adequate support or treatment for social anxiety is often insufficient. Advances in technology, particularly in social robotics, offer promising opportunities to complement traditional mental health. As an initial step toward developing effective solutions, it is essential to understand the values that shape what is considered meaningful, acceptable, and helpful. In this study, a participatory design workshop was conducted with mental health academic researchers to elicit the underlying values that should inform the design of socially assistive robots for social anxiety support. Through creative, reflective, and envisioning activities, participants explored scenarios and design possibilities, allowing for systematic elicitation of values, expectations, needs, and preferences related to robot-supported interventions. The findings reveal rich insights into design-relevant values, including adaptivity, acceptance, and efficacy, that are core to support for individuals with social anxiety. This study highlights the significance of a research-led approach to value elicitation, emphasising user-centred and context-aware design considerations in the development of socially assistive robots.
comment: Accepted at Value Engineering in AI (VALE) Workshop (ECAI 2025)
GUIDES: Guidance Using Instructor-Distilled Embeddings for Pre-trained Robot Policy Enhancement IROS 2025
Pre-trained robot policies serve as the foundation of many validated robotic systems, which encapsulate extensive embodied knowledge. However, they often lack the semantic awareness characteristic of foundation models, and replacing them entirely is impractical in many situations due to high costs and the loss of accumulated knowledge. To address this gap, we introduce GUIDES, a lightweight framework that augments pre-trained policies with semantic guidance from foundation models without requiring architectural redesign. GUIDES employs a fine-tuned vision-language model (Instructor) to generate contextual instructions, which are encoded by an auxiliary module into guidance embeddings. These embeddings are injected into the policy's latent space, allowing the legacy model to adapt to this new semantic input through brief, targeted fine-tuning. For inference-time robustness, a large language model-based Reflector monitors the Instructor's confidence and, when confidence is low, initiates a reasoning loop that analyzes execution history, retrieves relevant examples, and augments the VLM's context to refine subsequent actions. Extensive validation in the RoboCasa simulation environment across diverse policy architectures shows consistent and substantial improvements in task success rates. Real-world deployment on a UR5 robot further demonstrates that GUIDES enhances motion precision for critical sub-tasks such as grasping. Overall, GUIDES offers a practical and resource-efficient pathway to upgrade, rather than replace, validated robot policies.
comment: 8 pages, 4 figures, Accepted by IEEE IROS 2025 Workshop WIR-M
Collaborative Assembly Policy Learning of a Sightless Robot
This paper explores a physical human-robot collaboration (pHRC) task involving the joint insertion of a board into a frame by a sightless robot and a human operator. While admittance control is commonly used in pHRC tasks, it can be challenging to measure the force/torque applied by the human for accurate human intent estimation, limiting the robot's ability to assist in the collaborative task. Other methods that attempt to solve pHRC tasks using reinforcement learning (RL) are also unsuitable for the board-insertion task due to its safety constraints and sparse rewards. Therefore, we propose a novel RL approach that utilizes a human-designed admittance controller to facilitate more active robot behavior and reduce human effort. Through simulation and real-world experiments, we demonstrate that our approach outperforms admittance control in terms of success rate and task completion time. Additionally, we observed a significant reduction in measured force/torque when using our proposed approach compared to admittance control. The video of the experiments is available at https://youtu.be/va07Gw6YIog.
comment: Accepted by IEEE ROBIO 2025
Periodic Skill Discovery NeurIPS 2025
Unsupervised skill discovery in reinforcement learning (RL) aims to learn diverse behaviors without relying on external rewards. However, current methods often overlook the periodic nature of learned skills, focusing instead on increasing the mutual dependence between states and skills or maximizing the distance traveled in latent space. Considering that many robotic tasks -- particularly those involving locomotion -- require periodic behaviors across varying timescales, the ability to discover diverse periodic skills is essential. Motivated by this, we propose Periodic Skill Discovery (PSD), a framework that discovers periodic behaviors in an unsupervised manner. The key idea of PSD is to train an encoder that maps states to a circular latent space, thereby naturally encoding periodicity in the latent representation. By capturing temporal distance, PSD can effectively learn skills with diverse periods in complex robotic tasks, even with pixel-based observations. We further show that these learned skills achieve high performance on downstream tasks such as hurdling. Moreover, integrating PSD with an existing skill discovery method offers more diverse behaviors, thus broadening the agent's repertoire. Our code and demos are available at https://jonghaepark.github.io/psd/
comment: NeurIPS 2025
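The intuition behind a circular latent space can be shown with a hand-written phase mapping: states one full period apart land on the same point of the unit circle, while states half a period apart land on opposite points. PSD learns its encoder from data; the explicit `step % period` mapping below is only a stand-in assumed for illustration.

```python
import math


def circular_latent(step, period):
    """Place a time step of a periodic motion on the unit circle."""
    phase = 2.0 * math.pi * (step % period) / period
    return (math.cos(phase), math.sin(phase))


def latent_distance(a, b):
    """Euclidean distance between two latent points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


period = 8
same_phase = latent_distance(circular_latent(3, period),
                             circular_latent(3 + period, period))
half_period = latent_distance(circular_latent(0, period),
                              circular_latent(period // 2, period))
```

Here `same_phase` is zero and `half_period` equals the circle's diameter, so latent distance reflects temporal distance within a cycle rather than elapsed time overall.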
Learning-based Cooperative Robotic Paper Wrapping: A Unified Control Policy with Residual Force Control
Human-robot cooperation is essential in environments such as warehouses and retail stores, where workers frequently handle deformable objects like paper, bags, and fabrics. Coordinating robotic actions with human assistance remains difficult due to the unpredictable dynamics of deformable materials and the need for adaptive force control. To explore this challenge, we focus on the task of gift wrapping, which exemplifies a long-horizon manipulation problem involving precise folding, controlled creasing, and secure fixation of paper. Success is achieved when the robot completes the sequence to produce a neatly wrapped package with clean folds and no tears. We propose a learning-based framework that integrates a high-level task planner powered by a large language model (LLM) with a low-level hybrid imitation learning (IL) and reinforcement learning (RL) policy. At its core is a Sub-task Aware Robotic Transformer (START) that learns a unified policy from human demonstrations. The key novelty lies in capturing long-range temporal dependencies across the full wrapping sequence within a single model. Unlike vanilla Action Chunking with Transformer (ACT), typically applied to short tasks, our method introduces sub-task IDs that provide explicit temporal grounding. This enables robust performance across the entire wrapping process and supports flexible execution, as the policy learns sub-goals rather than merely replicating motion sequences. Our framework achieves a 97% success rate on real-world wrapping tasks. We show that the unified transformer-based policy reduces the need for specialized models, allows controlled human supervision, and effectively bridges high-level intent with the fine-grained force control required for deformable object manipulation.
Optimizing Earth-Moon Transfer and Cislunar Navigation: Integrating Low-Energy Trajectories, AI Techniques and GNSS-R Technologies
The rapid growth of cislunar activities, including lunar landings, the Lunar Gateway, and in-space refueling stations, requires advances in cost-efficient trajectory design and reliable integration of navigation and remote sensing. Traditional Earth-Moon transfers suffer from rigid launch windows and high propellant demands, while Earth-based GNSS systems provide little to no coverage beyond geostationary orbit. This limits autonomy and environmental awareness in cislunar space. This review compares four major transfer strategies by evaluating velocity requirements, flight durations, and fuel efficiency, and by identifying their suitability for both crewed and robotic missions. The emerging role of artificial intelligence and machine learning is highlighted: convolutional neural networks support automated crater recognition and digital terrain model generation, while deep reinforcement learning enables adaptive trajectory refinement during descent and landing to reduce risk and decision latency. The study also examines how GNSS-Reflectometry and advanced Positioning, Navigation, and Timing architectures can extend navigation capabilities beyond current limits. GNSS-R can act as a bistatic radar for mapping lunar ice, soil properties, and surface topography, while PNT systems support autonomous rendezvous, Lagrange point station-keeping, and coordinated satellite swarm operations. Combining these developments establishes a scalable framework for sustainable cislunar exploration and long-term human and robotic presence.
Learning Natural and Robust Hexapod Locomotion over Complex Terrains via Motion Priors based on Deep Reinforcement Learning
Multi-legged robots offer enhanced stability to navigate complex terrains with their multiple legs interacting with the environment. However, how to effectively coordinate the multiple legs in a larger action exploration space to generate natural and robust movements is a key issue. In this paper, we introduce a motion prior-based approach, successfully applying deep reinforcement learning algorithms to a real hexapod robot. We generate a dataset of optimized motion priors, and train an adversarial discriminator based on the priors to guide the hexapod robot to learn natural gaits. The learned policy is then successfully transferred to a real hexapod robot, where it demonstrates natural gait patterns and remarkable robustness without visual information in complex terrains. This is the first time that a reinforcement learning controller has been used to achieve complex terrain walking on a real hexapod robot.
SENT Map - Semantically Enhanced Topological Maps with Foundation Models ICRA 2025
We introduce SENT-Map, a semantically enhanced topological map for representing indoor environments, designed to support autonomous navigation and manipulation by leveraging advancements in foundational models (FMs). By representing the environment in a JSON text format, we enable semantic information to be added and edited in a format that both humans and FMs understand, while grounding the robot to existing nodes during planning to avoid infeasible states during deployment. Our proposed framework employs a two-stage approach, first mapping the environment alongside an operator with a Vision-FM, then using the SENT-Map representation alongside a natural-language query within an FM for planning. Our experimental results show that semantic enhancement enables even small, locally deployable FMs to successfully plan over indoor environments.
comment: Accepted at ICRA 2025 Workshop on Foundation Models and Neuro-Symbolic AI for Robotics
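A JSON topological map and the grounding check described above might look like the following sketch. The schema (`nodes`, `neighbors`, `notes`) and the room names are hypothetical and invented for this example; the actual SENT-Map format is not given in the abstract.

```python
import json
from collections import deque

# Hypothetical map: each node lists its neighbors and free-text semantics.
MAP_JSON = """
{
  "nodes": {
    "hallway": {"neighbors": ["kitchen", "office"], "notes": "main corridor"},
    "kitchen": {"neighbors": ["hallway"], "notes": "mugs are in the cupboard"},
    "office": {"neighbors": ["hallway"], "notes": "desk by the window"}
  }
}
"""


def load_map(text):
    """Parse the JSON text and return the node dictionary."""
    return json.loads(text)["nodes"]


def is_grounded(plan, nodes):
    """True if every step is a known node and consecutive steps are connected."""
    if any(step not in nodes for step in plan):
        return False
    return all(b in nodes[a]["neighbors"] for a, b in zip(plan, plan[1:]))


def shortest_route(nodes, start, goal):
    """Breadth-first search over the topological graph; None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        route = queue.popleft()
        if route[-1] == goal:
            return route
        for nxt in nodes[route[-1]]["neighbors"]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(route + [nxt])
    return None


nodes = load_map(MAP_JSON)
route = shortest_route(nodes, "office", "kitchen")
```

Checking a model-proposed plan with `is_grounded` before execution is one simple way to reject steps that name rooms or transitions absent from the map.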
RoboRAN: A Unified Robotics Framework for Reinforcement Learning-Based Autonomous Navigation
Autonomous robots must navigate and operate in diverse environments, from terrestrial and aquatic settings to aerial and space domains. While Reinforcement Learning (RL) has shown promise in training policies for specific autonomous robots, existing frameworks and benchmarks are often constrained to unique platforms, limiting generalization and fair comparisons across different mobility systems. In this paper, we present a multi-domain framework for training, evaluating and deploying RL-based navigation policies across diverse robotic platforms and operational environments. Our work presents four key contributions: (1) a scalable and modular framework, facilitating seamless robot-task interchangeability and reproducible training pipelines; (2) sim-to-real transfer demonstrated through real-world experiments with multiple robots, including a satellite robotic simulator, an unmanned surface vessel, and a wheeled ground vehicle; (3) the release of the first open-source API for deploying Isaac Lab-trained policies to real robots, enabling lightweight inference and rapid field validation; and (4) uniform tasks and metrics for cross-medium evaluation, through a unified evaluation testbed to assess performance of navigation tasks in diverse operational conditions (aquatic, terrestrial and space). By ensuring consistency between simulation and real-world deployment, RoboRAN lowers the barrier to developing adaptable RL-based navigation strategies. Its modular design enables straightforward integration of new robots and tasks through predefined templates, fostering reproducibility and extension to diverse domains. To support the community, we release RoboRAN as open-source.
comment: Accepted at Transactions on Machine Learning Research (TMLR)
Depth Matters: Multimodal RGB-D Perception for Robust Autonomous Agents ICRA 2025
Autonomous agents that rely purely on perception to make real-time control decisions require efficient and robust architectures. In this work, we demonstrate that augmenting RGB input with depth information significantly enhances our agents' ability to predict steering commands compared to using RGB alone. We benchmark lightweight recurrent controllers that leverage the fused RGB-D features for sequential decision-making. To train our models, we collect high-quality data using a small-scale autonomous car controlled by an expert driver via a physical steering wheel, capturing varying levels of steering difficulty. Our models were successfully deployed on real hardware and inherently avoided dynamic and static obstacles, under out-of-distribution conditions. Specifically, our findings reveal that the early fusion of depth data results in a highly robust controller, which remains effective even with frame drops and increased noise levels, without compromising the network's focus on the task.
comment: Submitted to ICRA 2025
An explicit construction of Kaleidocycles by elliptic theta functions
We consider the configuration space of ordered points on the two-dimensional sphere that satisfy a specific system of quadratic equations. We construct periodic orbits in this configuration space using elliptic theta functions and show that they simultaneously satisfy semi-discrete analogues of mKdV and sine-Gordon equations. The configuration space we investigate corresponds to the state space of a linkage mechanism known as the Kaleidocycle, and the constructed orbits describe the characteristic motion of the Kaleidocycle. A key consequence of our construction is the proof that Kaleidocycles exist for any number of tetrahedra greater than five. Our approach is founded on the relationship between the deformation of spatial curves and integrable systems, offering an intriguing example where an integrable system is explicitly solved to generate an orbit in the space of real solutions to polynomial equations defined by geometric constraints.
Autonomous Robotic Drilling System for Mice Cranial Window Creation
Robotic assistance for experimental manipulation in the life sciences is expected to enable favorable outcomes, regardless of the skill of the scientist. Experimental specimens in the life sciences are subject to individual variability and hence require intricate algorithms for successful autonomous robotic control. As a use case, we study cranial window creation in mice. This operation requires the removal of an 8-mm circular patch of the skull, which is approximately 300 µm thick, but the shape and thickness of the mouse skull vary significantly depending on the strain, sex, and age of the mouse. In this work, we develop an autonomous robotic drilling system with no offline planning, consisting of a trajectory planner with execution-time feedback from drilling completion level recognition based on image and force information. In the experiments, we first evaluate the image-and-force-based drilling completion level recognition by comparing it with other state-of-the-art deep learning image processing methods and conduct an ablation study in eggshell drilling to evaluate the impact of each module on system performance. Finally, the system performance is further evaluated in postmortem mice, achieving a success rate of 70% (14/20 trials) with an average drilling time of 9.3 min.
comment: 14 pages, 11 figures, accepted on T-ASE 2025
Toward Humanoid Brain-Body Co-design: Joint Optimization of Control and Morphology for Fall Recovery
Humanoid robots represent a central frontier in embodied intelligence, as their anthropomorphic form enables natural deployment in humans' workspace. Brain-body co-design for humanoids presents a promising approach to realizing this potential by jointly optimizing control policies and physical morphology. Within this context, fall recovery emerges as a critical capability. It not only enhances safety and resilience but also integrates naturally with locomotion systems, thereby advancing the autonomy of humanoids. In this paper, we propose RoboCraft, a scalable humanoid co-design framework for fall recovery that iteratively improves performance through the coupled updates of control policy and morphology. A shared policy pretrained across multiple designs is progressively finetuned on high-performing morphologies, enabling efficient adaptation without retraining from scratch. Concurrently, morphology search is guided by human-inspired priors and optimization algorithms, supported by a priority buffer that balances reevaluation of promising candidates with the exploration of novel designs. Experiments show that RoboCraft achieves an average performance gain of 44.55% on seven public humanoid robots, with morphology optimization driving at least 40% of the improvements in co-designing four humanoid robots, underscoring the critical role of humanoid co-design.
Mastering Contact-rich Tasks by Combining Soft and Rigid Robotics with Imitation Learning
Soft robots have the potential to revolutionize the use of robotic systems with their capability of establishing safe, robust, and adaptable interactions with their environment, but their precise control remains challenging. In contrast, traditional rigid robots offer high accuracy and repeatability but lack the flexibility of soft robots. We argue that combining these characteristics in a hybrid robotic platform can significantly enhance overall capabilities. This work presents a novel hybrid robotic platform that integrates a rigid manipulator with a fully developed soft arm. This system is equipped with the intelligence necessary to autonomously perform flexible and generalizable tasks through imitation learning. The physical softness and machine learning enable our platform to achieve highly generalizable skills, while the rigid components ensure precision and repeatability.
comment: Update with additional results and experiments
Augmented Reality for RObots (ARRO): Pointing Visuomotor Policies Towards Visual Robustness
Visuomotor policies trained on human expert demonstrations have recently shown strong performance across a wide range of robotic manipulation tasks. However, these policies remain highly sensitive to domain shifts stemming from background or robot embodiment changes, which limits their generalization capabilities. In this paper, we present ARRO, a novel visual representation that leverages zero-shot open-vocabulary segmentation and object detection models to efficiently mask out task-irrelevant regions of the scene in real time without requiring additional training, modeling of the setup, or camera calibration. By filtering visual distractors and overlaying virtual guides during both training and inference, ARRO improves robustness to scene variations and reduces the need for additional data collection. We extensively evaluate ARRO with Diffusion Policy on a range of tabletop manipulation tasks in both simulation and real-world environments, and further demonstrate its compatibility and effectiveness with generalist robot policies, such as Octo and OpenVLA. Across all settings in our evaluation, ARRO yields consistent performance gains, allows for selective masking to choose between different objects, and shows robustness even to challenging segmentation conditions. Videos showcasing our results are available at: https://augmented-reality-for-robots.github.io/
mmE-Loc: Facilitating Accurate Drone Landing with Ultra-High-Frequency Localization
For precise, efficient, and safe drone landings, ground platforms should locate descending drones accurately and in real time and guide them to designated spots. While mmWave sensing combined with cameras improves localization accuracy, the lower sampling frequency of traditional frame cameras compared to mmWave radar creates a bottleneck in system throughput. In this work, we upgrade the traditional frame camera to an event camera, a novel sensor whose sampling frequency harmonizes with that of mmWave radar within the ground platform setup, and introduce mmE-Loc, a high-precision, low-latency ground localization system designed for precise drone landings. To fully exploit the temporal consistency and spatial complementarity between these two modalities, we propose two innovative modules: (i) the Consistency-instructed Collaborative Tracking module, which further leverages physical knowledge of the drone's periodic micro-motions and structure for accurate measurement extraction, and (ii) the Graph-informed Adaptive Joint Optimization module, which integrates drone motion information for efficient sensor fusion and drone localization. Real-world experiments conducted in landing scenarios with a drone delivery company demonstrate that mmE-Loc significantly outperforms state-of-the-art methods in both accuracy and latency.
comment: 17 pages, 34 figures. Journal extended version of arXiv:2502.14992
Decentralized Aerial Manipulation of a Cable-Suspended Load using Multi-Agent Reinforcement Learning
This paper presents the first decentralized method to enable real-world 6-DoF manipulation of a cable-suspended load using a team of Micro-Aerial Vehicles (MAVs). Our method leverages multi-agent reinforcement learning (MARL) to train an outer-loop control policy for each MAV. Unlike state-of-the-art controllers that utilize a centralized scheme, our policy does not require global states, inter-MAV communications, or neighboring MAV information. Instead, agents communicate implicitly through load pose observations alone, which enables high scalability and flexibility. It also significantly reduces computing costs during inference time, enabling onboard deployment of the policy. In addition, we introduce a new action space design for the MAVs using linear acceleration and body rates. This choice, combined with a robust low-level controller, enables reliable sim-to-real transfer despite significant uncertainties caused by cable tension during dynamic 3D motion. We validate our method in various real-world experiments, including full-pose control under load model uncertainties, showing setpoint tracking performance comparable to the state-of-the-art centralized method. We also demonstrate cooperation amongst agents with heterogeneous control policies, and robustness to the complete in-flight loss of one MAV. Videos of experiments: https://autonomousrobots.nl/paper_websites/aerial-manipulation-marl
Thor: Towards Human-Level Whole-Body Reactions for Intense Contact-Rich Environments
Humanoids hold great potential for service, industrial, and rescue applications, in which robots must sustain whole-body stability while performing intense, contact-rich interactions with the environment. However, enabling humanoids to generate human-like, adaptive responses under such conditions remains a major challenge. To address this, we propose Thor, a humanoid framework for human-level whole-body reactions in contact-rich environments. Based on the robot's force analysis, we design a force-adaptive torso-tilt (FAT2) reward function to encourage humanoids to exhibit human-like responses during force-interaction tasks. To mitigate the high-dimensional challenges of humanoid control, Thor introduces a reinforcement learning architecture that decouples the upper body, waist, and lower body. Each component shares global observations of the whole body and jointly updates its parameters. Finally, we deploy Thor on the Unitree G1, and it substantially outperforms baselines in force-interaction tasks. Specifically, the robot achieves a peak pulling force of 167.7 N (approximately 48% of the G1's body weight) when moving backward and 145.5 N when moving forward, representing improvements of 68.9% and 74.7%, respectively, compared with the best-performing baseline. Moreover, Thor is capable of pulling a loaded rack (130 N) and opening a fire door with one hand (60 N). These results highlight Thor's effectiveness in enhancing humanoid force-interaction capabilities.
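The figures quoted in this abstract can be cross-checked with a little arithmetic. The sketch below assumes that "improvement" means a relative gain over the best baseline and that g = 9.81 m/s^2; both are assumptions, and the function name is hypothetical. It recovers the implied baseline forces and the body weight implied by the 48% figure, using only numbers stated above.

```python
def implied_baseline(value: float, improvement_pct: float) -> float:
    """Baseline b such that value = b * (1 + improvement_pct / 100).

    Assumes 'improvement' is a relative gain over the baseline.
    """
    return value / (1.0 + improvement_pct / 100.0)


G = 9.81  # m/s^2, assumed gravitational acceleration

# Peak pulling forces and improvements as quoted in the abstract.
backward_baseline = implied_baseline(167.7, 68.9)  # N
forward_baseline = implied_baseline(145.5, 74.7)   # N

# 167.7 N is stated to be approximately 48% of the robot's body weight.
implied_weight_n = 167.7 / 0.48
implied_mass_kg = implied_weight_n / G

print(f"implied baseline (backward): {backward_baseline:.1f} N")
print(f"implied baseline (forward):  {forward_baseline:.1f} N")
print(f"implied body weight: {implied_weight_n:.1f} N (~{implied_mass_kg:.1f} kg)")
```

Under these assumptions the best baseline would have peaked a little below 100 N in both directions, and the implied body mass is in the mid-30 kg range.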
Multi-Agent Reinforcement Learning for Autonomous Multi-Satellite Earth Observation: A Realistic Case Study
The exponential growth of Low Earth Orbit (LEO) satellites has revolutionised Earth Observation (EO) missions, addressing challenges in climate monitoring, disaster management, and more. However, autonomous coordination in multi-satellite systems remains a fundamental challenge. Traditional optimisation approaches struggle to handle the real-time decision-making demands of dynamic EO missions, necessitating the use of Reinforcement Learning (RL) and Multi-Agent Reinforcement Learning (MARL). In this paper, we investigate RL-based autonomous EO mission planning by modelling single-satellite operations and extending to multi-satellite constellations using MARL frameworks. We address key challenges, including energy and data storage limitations, uncertainties in satellite observations, and the complexities of decentralised coordination under partial observability. By leveraging a near-realistic satellite simulation environment, we evaluate the training stability and performance of state-of-the-art MARL algorithms, including PPO, IPPO, MAPPO, and HAPPO. Our results demonstrate that MARL can effectively balance imaging and resource management while addressing non-stationarity and reward interdependency in multi-satellite coordination. The insights gained from this study provide a foundation for autonomous satellite operations, offering practical guidelines for improving policy learning in decentralised EO missions.
AURA: Autonomous Upskilling with Retrieval-Augmented Agents
Designing reinforcement learning curricula for agile robots traditionally requires extensive manual tuning of reward functions, environment randomizations, and training configurations. We introduce AURA (Autonomous Upskilling with Retrieval-Augmented Agents), a schema-validated curriculum reinforcement learning (RL) framework that leverages Large Language Models (LLMs) as autonomous designers of multi-stage curricula. AURA transforms user prompts into YAML workflows that encode full reward functions, domain randomization strategies, and training configurations. All files are statically validated before any GPU time is used, ensuring efficient and reliable execution. A retrieval-augmented feedback loop allows specialized LLM agents to design, execute, and refine curriculum stages based on prior training results stored in a vector database, enabling continual improvement over time. Quantitative experiments show that AURA consistently outperforms LLM-guided baselines in generation success rate, humanoid locomotion, and manipulation tasks. Ablation studies highlight the importance of schema validation and retrieval for curriculum quality. AURA successfully trains end-to-end policies directly from user prompts and deploys them zero-shot on a custom humanoid robot in multiple environments - capabilities that did not exist previously with manually designed controllers. By abstracting the complexity of curriculum design, AURA enables scalable and adaptive policy learning pipelines that would be complex to construct by hand. Project page: https://aura-research.org/
Hybrid Dynamics Modeling and Trajectory Planning for a Cable-Trailer System with a Quadruped Robot
Inspired by sled-pulling dogs in transportation, we present a cable-trailer integrated with a quadruped robot system. The motion planning of this system faces challenges due to the interactions between the cable's state transitions, the trailer's nonholonomic constraints, and the system's underactuation. To address these challenges, we first develop a hybrid dynamics model that captures the cable's taut and slack states. A search algorithm is then introduced to compute a suboptimal trajectory while incorporating mode transitions. Additionally, we propose a novel collision avoidance constraint based on geometric polygons to formulate the trajectory optimization problem for the hybrid system. The proposed method is implemented on a Unitree A1 quadruped robot with a customized cable-trailer and validated through experiments. The real system demonstrates both agile and safe motion with cable mode transitions.
comment: 8 pages, 8 figures, Accepted by RA-L 2025
Multiagent Systems
Outbidding and Outbluffing Elite Humans: Mastering Liar's Poker via Self-Play and Reinforcement Learning
AI researchers have long focused on poker-like games as a testbed for environments characterized by multi-player dynamics, imperfect information, and reasoning under uncertainty. While recent breakthroughs have matched elite human play at no-limit Texas hold'em, the multi-player dynamics are subdued: most hands converge quickly with only two players engaged through multiple rounds of bidding. In this paper, we present Solly, the first AI agent to achieve elite human play in reduced-format Liar's Poker, a game characterized by extensive multi-player engagement. We trained Solly using self-play with a model-free, actor-critic, deep reinforcement learning algorithm. Solly played at an elite human level as measured by win rate (won over 50% of hands) and equity (money won) in heads-up and multi-player Liar's Poker. Solly also outperformed large language models (LLMs), including those with reasoning abilities, on the same metrics. Solly developed novel bidding strategies, randomized play effectively, and was not easily exploitable by world-class human players.
Multi-robot searching with limited sensing range for static and mobile intruders
We consider the problem of searching for an intruder in a geometric domain by utilizing multiple search robots. The domain is a simply connected orthogonal polygon with edges parallel to the Cartesian coordinate axes. Each robot has a limited sensing capability. We study the problem for both static and mobile intruders. It turns out that the problem of finding an intruder is NP-hard, even for a stationary intruder. Given this intractability, we turn our attention towards developing efficient and robust algorithms, namely methods based on space-filling curves, random search, and cooperative random search. Moreover, for each proposed algorithm, we evaluate the trade-off between the number of search robots and the time required for the robots to complete the search process while considering the geometric properties of the connected orthogonal search area.
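To make the space-filling-curve idea concrete, here is a minimal sketch. The abstract does not say which curve is used or how the polygon is discretised, so the Hilbert curve, the square 2^order x 2^order grid, the one-cell sensing footprint, and the function names are all assumptions made for illustration. Each robot sweeps one contiguous segment of the curve, which exposes the trade-off between the number of robots and the sweep length per robot.

```python
from typing import List, Tuple

Cell = Tuple[int, int]


def hilbert_d2xy(order: int, d: int) -> Cell:
    """Map index d along a Hilbert curve to (x, y) on a 2^order x 2^order grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Reflect the sub-square before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x  # Rotate by swapping the axes.
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def split_among_robots(order: int, num_robots: int) -> List[List[Cell]]:
    """Give each robot one contiguous segment of the curve to sweep."""
    total = (1 << order) ** 2
    cells = [hilbert_d2xy(order, d) for d in range(total)]
    chunk = -(-total // num_robots)  # Ceiling division.
    return [cells[i * chunk:(i + 1) * chunk] for i in range(num_robots)]


paths = split_among_robots(order=3, num_robots=4)
for robot_id, path in enumerate(paths):
    print(robot_id, len(path), path[0], path[-1])
```

Because consecutive cells on the curve are grid neighbours, each robot's segment is a connected sweep, and doubling the number of robots roughly halves the cells each one must visit in this idealised setting.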
Inter-Agent Trust Models: A Comparative Study of Brief, Claim, Proof, Stake, Reputation and Constraint in Agentic Web Protocol Design-A2A, AP2, ERC-8004, and Beyond AAAI 2026
As the "agentic web" takes shape, with billions of AI agents (often LLM-powered) autonomously transacting and collaborating, trust shifts from human oversight to protocol design. In 2025, several inter-agent protocols crystallized this shift, including Google's Agent-to-Agent (A2A), Agent Payments Protocol (AP2), and Ethereum's ERC-8004 "Trustless Agents," yet their underlying trust assumptions remain under-examined. This paper presents a comparative study of trust models in inter-agent protocol design: Brief (self- or third-party verifiable claims), Claim (self-proclaimed capabilities and identity, e.g. AgentCard), Proof (cryptographic verification, including zero-knowledge proofs and trusted execution environment attestations), Stake (bonded collateral with slashing and insurance), Reputation (crowd feedback and graph-based trust signals), and Constraint (sandboxing and capability bounding). For each, we analyze assumptions, attack surfaces, and design trade-offs, with particular emphasis on LLM-specific fragilities (prompt injection, sycophancy/nudge-susceptibility, hallucination, deception, and misalignment) that render purely reputational or claim-only approaches brittle. Our findings indicate no single mechanism suffices. We argue for trustless-by-default architectures anchored in Proof and Stake to gate high-impact actions, augmented by Brief for identity and discovery and Reputation overlays for flexibility and social signals. We comparatively evaluate A2A, AP2, ERC-8004 and related historical variations in academic research under metrics spanning security, privacy, latency/cost, and social robustness (Sybil/collusion/whitewashing resistance). We conclude with hybrid trust model recommendations that mitigate reputation gaming and misinformed LLM behavior, and we distill actionable design guidelines for safer, interoperable, and scalable agent economies.
comment: Submitted to AAAI 2026 Workshop on Trust and Control in Agentic AI (TrustAgent)
Learning Communication Skills in Multi-task Multi-agent Deep Reinforcement Learning
In multi-agent deep reinforcement learning (MADRL), agents can communicate with one another to perform a task in a coordinated manner. When multiple tasks are involved, agents can also leverage knowledge from one task to improve learning in other tasks. In this paper, we propose Multi-task Communication Skills (MCS), a MADRL with communication method that learns and performs multiple tasks simultaneously, with agents interacting through learnable communication protocols. MCS employs a Transformer encoder to encode task-specific observations into a shared message space, capturing shared communication skills among agents. To enhance coordination among agents, we introduce a prediction network that correlates messages with the actions of sender agents in each task. We adapt three multi-agent benchmark environments to multi-task settings, where the number of agents as well as the observation and action spaces vary across tasks. Experimental results demonstrate that MCS achieves better performance than multi-task MADRL baselines without communication, as well as single-task MADRL baselines with and without communication.
comment: 20 pages, 10 figures
Characterising Global Platforms: Centralised, Decentralised, Federated, and Grassroots
Global digital platforms are software systems designed to serve entire populations, with some already serving billions of people. We propose atomic transactions-based multiagent transition systems and protocols as a formal framework to study them; introduce essential agents, minimal sets of agents the removal of which makes communication impossible; and show that the cardinality of essential agents partitions all global platforms into four classes: (1) Centralised: one (the server); (2) Decentralised: finite $>1$ (bootstrap nodes); (3) Federated: infinite but not universal (all servers); (4) Grassroots: universal (all agents). Our illustrative formal example is a global social network, for which we provide centralised, decentralised, federated, and grassroots specifications via multiagent atomic transactions, and prove they satisfy basic correctness properties. We discuss informally additional global platforms: currencies, "sharing economy" apps, AI, and more. While this may be the first characterisation of centralised, decentralised, and federated global platforms, grassroots platforms have been formally defined previously, but using different notions. Here, we prove that their original definition implies that all agents are essential, placing grassroots platforms in a distinct class within the broader formal context that includes all global platforms. This work provides the first mathematical framework for classifying any global platform, existing or imagined, by providing a multiagent atomic-transactions specification of it and determining the cardinality of the minimal set of essential agents in the ensuing multiagent protocol. It thus
Toward Autonomous Engineering Design: A Knowledge-Guided Multi-Agent Framework
The engineering design process often demands expertise from multiple domains, leading to complex collaborations and iterative refinements. Traditional methods can be resource-intensive and prone to inefficiencies. To address this, we formalize the engineering design process through a multi-agent AI framework that integrates structured design and review loops. The framework introduces specialized knowledge-driven agents that collaborate to generate and refine design candidates. As an exemplar, we demonstrate its application to the aerodynamic optimization of 4-digit NACA airfoils. The framework consists of three key AI agents: a Graph Ontologist, a Design Engineer, and a Systems Engineer. The Graph Ontologist employs a Large Language Model (LLM) to construct two domain-specific knowledge graphs from airfoil design literature. The Systems Engineer, informed by a human manager, formulates technical requirements that guide design generation and evaluation. The Design Engineer leverages the design knowledge graph and computational tools to propose candidate airfoils meeting these requirements. The Systems Engineer reviews and provides feedback both qualitative and quantitative using its own knowledge graph, forming an iterative feedback loop until a design is validated by the manager. The final design is then optimized to maximize performance metrics such as the lift-to-drag ratio. Overall, this work demonstrates how collaborative AI agents equipped with structured knowledge representations can enhance efficiency, consistency, and quality in the engineering design process.
Scaling Multi-Agent Environment Co-Design with Diffusion Models
The agent-environment co-design paradigm jointly optimises agent policies and environment configurations in search of improved system performance. With application domains ranging from warehouse logistics to windfarm management, co-design promises to fundamentally change how we deploy multi-agent systems. However, current co-design methods struggle to scale. They collapse under high-dimensional environment design spaces and suffer from sample inefficiency when addressing moving targets inherent to joint optimisation. We address these challenges by developing Diffusion Co-Design (DiCoDe), a scalable and sample-efficient co-design framework pushing co-design towards practically relevant settings. DiCoDe incorporates two core innovations. First, we introduce Projected Universal Guidance (PUG), a sampling technique that enables DiCoDe to explore a distribution of reward-maximising environments while satisfying hard constraints such as spatial separation between obstacles. Second, we devise a critic distillation mechanism to share knowledge from the reinforcement learning critic, ensuring that the guided diffusion model adapts to evolving agent policies using a dense and up-to-date learning signal. Together, these improvements lead to superior environment-policy pairs when validated on challenging multi-agent environment co-design benchmarks including warehouse automation, multi-agent pathfinding and wind farm optimisation. Our method consistently exceeds the state-of-the-art, achieving, for example, 39% higher rewards in the warehouse setting with 66% fewer simulation samples. This sets a new standard in agent-environment co-design, and is a stepping stone towards reaping the rewards of co-design in real world domains.
ALAS: Transactional and Dynamic Multi-Agent LLM Planning
Large language models enable flexible multi-agent planning but remain fragile in practice: verification is often circular, state changes are not tracked for repair, and small faults trigger costly global recomputation. We present ALAS, a stateful, disruption-aware framework that separates planning from non-circular validation, records a versioned execution log for grounded checks and restore points, and performs localized repair that preserves work in progress. The validator operates independently of the planning LLM with fresh, bounded context, avoiding self-check loops and mid-context attrition. The repair protocol edits only the minimal affected region under explicit policies (retry, catch, timeout, backoff, idempotency keys, compensation, loop guards) defined in a canonical workflow IR that maps to Amazon States Language and Argo Workflows. On job-shop scheduling suites (DMU, TA) across five classical benchmarks, ALAS matches or exceeds strong single-LLM and multi-agent baselines, achieving 83.7% success, reducing token usage by 60%, and running 1.82× faster under comparable settings. A minimal reliability study shows that the validator detects injected structural faults with low overhead, and that localized repair contains runtime perturbations with a bounded edit radius and less makespan degradation than global recompute. Results indicate that the combination of validator isolation, versioned execution logs, and localized repair provides measurable efficiency, feasibility, and scalability for multi-agent LLM planning. Code and seeds will be released.
Divide by Question, Conquer by Agent: SPLIT-RAG with Question-Driven Graph Partitioning
Retrieval-Augmented Generation (RAG) systems empower large language models (LLMs) with external knowledge, yet struggle with efficiency-accuracy trade-offs when scaling to large knowledge graphs. Existing approaches often rely on monolithic graph retrieval, incurring unnecessary latency for simple queries and fragmented reasoning for complex multi-hop questions. To address these challenges, this paper proposes SPLIT-RAG, a multi-agent RAG framework built on question-driven semantic graph partitioning and collaborative subgraph retrieval. The framework first creates a Semantic Partitioning of Linked Information, then uses the Type-Specialized knowledge base to achieve Multi-Agent RAG. Attribute-aware graph segmentation divides knowledge graphs into semantically coherent subgraphs, ensuring that subgraphs align with different query types, while lightweight LLM agents are assigned to partitioned subgraphs and only relevant partitions are activated during retrieval, thus reducing the search space while enhancing efficiency. Finally, a hierarchical merging module resolves inconsistencies across subgraph-derived answers through logical verifications. Extensive experimental validation demonstrates considerable improvements compared to existing approaches.
comment: 20 pages, 4 figures
Beyond Single Pass, Looping Through Time: KG-IRAG with Iterative Knowledge Retrieval
Graph Retrieval-Augmented Generation (GraphRAG) has proven highly effective in enhancing the performance of Large Language Models (LLMs) on tasks that require external knowledge. By leveraging Knowledge Graphs (KGs), GraphRAG improves information retrieval for complex reasoning tasks, providing more precise and comprehensive retrieval and generating more accurate responses to QAs. However, most RAG methods fall short in addressing multi-step reasoning, particularly when both information extraction and inference are necessary. To address this limitation, this paper presents Knowledge Graph-Based Iterative Retrieval-Augmented Generation (KG-IRAG), a novel framework that integrates KGs with iterative reasoning to improve LLMs' ability to handle queries involving temporal and logical dependencies. Through iterative retrieval steps, KG-IRAG incrementally gathers relevant data from external KGs, enabling step-by-step reasoning. The proposed approach is particularly suited for scenarios where reasoning is required alongside dynamic temporal data extraction, such as determining optimal travel times based on weather conditions or traffic patterns. Experimental results show that KG-IRAG improves accuracy in complex reasoning tasks by effectively integrating external knowledge with iterative, logic-based retrieval. Additionally, three new datasets (weatherQA-Irish, weatherQA-Sydney, and trafficQA-TFNSW) are constructed to evaluate KG-IRAG's performance, demonstrating its potential beyond traditional RAG applications.
comment: 15 pages, 3 figures
Decentralized Aerial Manipulation of a Cable-Suspended Load using Multi-Agent Reinforcement Learning
This paper presents the first decentralized method to enable real-world 6-DoF manipulation of a cable-suspended load using a team of Micro-Aerial Vehicles (MAVs). Our method leverages multi-agent reinforcement learning (MARL) to train an outer-loop control policy for each MAV. Unlike state-of-the-art controllers that utilize a centralized scheme, our policy does not require global states, inter-MAV communications, or neighboring MAV information. Instead, agents communicate implicitly through load pose observations alone, which enables high scalability and flexibility. It also significantly reduces computing costs during inference time, enabling onboard deployment of the policy. In addition, we introduce a new action space design for the MAVs using linear acceleration and body rates. This choice, combined with a robust low-level controller, enables reliable sim-to-real transfer despite significant uncertainties caused by cable tension during dynamic 3D motion. We validate our method in various real-world experiments, including full-pose control under load model uncertainties, showing setpoint tracking performance comparable to the state-of-the-art centralized method. We also demonstrate cooperation amongst agents with heterogeneous control policies, and robustness to the complete in-flight loss of one MAV. Videos of experiments: https://autonomousrobots.nl/paper_websites/aerial-manipulation-marl
Large Language Models Miss the Multi-Agent Mark NeurIPS 2025
Recent interest in Multi-Agent Systems of Large Language Models (MAS LLMs) has led to an increase in frameworks leveraging multiple LLMs to tackle complex tasks. However, much of this literature appropriates the terminology of MAS without engaging with its foundational principles. In this position paper, we highlight critical discrepancies between MAS theory and current MAS LLMs implementations, focusing on four key areas: the social aspect of agency, environment design, coordination and communication protocols, and measuring emergent behaviours. Our position is that many MAS LLMs lack multi-agent characteristics such as autonomy, social interaction, and structured environments, and often rely on oversimplified, LLM-centric architectures. The field may slow down and lose traction by revisiting problems the MAS literature has already addressed. Therefore, we systematically analyse this issue and outline associated research opportunities; we advocate for better integrating established MAS concepts and more precise terminology to avoid mischaracterisation and missed opportunities.
comment: NeurIPS 2025 (position track)
Multi-Agent Reinforcement Learning for Autonomous Multi-Satellite Earth Observation: A Realistic Case Study
The exponential growth of Low Earth Orbit (LEO) satellites has revolutionised Earth Observation (EO) missions, addressing challenges in climate monitoring, disaster management, and more. However, autonomous coordination in multi-satellite systems remains a fundamental challenge. Traditional optimisation approaches struggle to handle the real-time decision-making demands of dynamic EO missions, necessitating the use of Reinforcement Learning (RL) and Multi-Agent Reinforcement Learning (MARL). In this paper, we investigate RL-based autonomous EO mission planning by modelling single-satellite operations and extending to multi-satellite constellations using MARL frameworks. We address key challenges, including energy and data storage limitations, uncertainties in satellite observations, and the complexities of decentralised coordination under partial observability. By leveraging a near-realistic satellite simulation environment, we evaluate the training stability and performance of state-of-the-art MARL algorithms, including PPO, IPPO, MAPPO, and HAPPO. Our results demonstrate that MARL can effectively balance imaging and resource management while addressing non-stationarity and reward interdependency in multi-satellite coordination. The insights gained from this study provide a foundation for autonomous satellite operations, offering practical guidelines for improving policy learning in decentralised EO missions.
Systems and Control (CS)
A Constant-Gain Equation-Error Framework for Airliner Aerodynamic Monitoring Using QAR Data
Monitoring the in-service aerodynamic performance of airliners is critical for operational efficiency and safety, but using operational Quick Access Recorder (QAR) data for this purpose presents significant challenges. This paper first establishes that the absence of key parameters, particularly aircraft moments of inertia, makes conventional state-propagation filters fundamentally unsuitable for this application. This limitation necessitates a decoupled, Equation-Error Method (EEM). However, we then demonstrate through a comparative analysis that standard recursive estimators with time-varying gains, such as Recursive Least Squares (RLS), also fail within an EEM framework, exhibiting premature convergence or instability when applied to low-excitation cruise data. To overcome these dual challenges, we propose and validate the Constant-Gain Equation-Error Method (CG-EEM). This framework employs a custom estimator with a constant, Kalman-like gain, which is perfectly suited to the stationary, low-signal-to-noise characteristics of cruise flight. The CG-EEM is extensively validated on a large, multi-fleet dataset of over 200 flights, where it produces highly consistent, physically plausible aerodynamic parameters and correctly identifies known performance differences between aircraft types. The result is a robust, scalable, and computationally efficient tool for fleet-wide performance monitoring and the early detection of performance degradation.
Flying Robotics Art: ROS-based Drone Draws the Record-Breaking Mural
This paper presents the innovative design and successful deployment of a pioneering autonomous unmanned aerial system developed for executing the world's largest mural painted by a drone. Addressing the dual challenges of maintaining artistic precision and operational reliability under adverse outdoor conditions such as wind and direct sunlight, our work introduces a robust system capable of navigating and painting outdoors with unprecedented accuracy. Key to our approach is a novel navigation system that combines an infrared (IR) motion capture camera and LiDAR technology, enabling precise location tracking tailored specifically for large-scale artistic applications. We employ a unique control architecture that uses different regulation in tangential and normal directions relative to the planned path, enabling precise trajectory tracking and stable line rendering. We also present algorithms for trajectory planning and path optimization, allowing for complex curve drawing and area filling. The system includes a custom-designed paint spraying mechanism, specifically engineered to function effectively amidst the turbulent airflow generated by the drone's propellers, which also protects the drone's critical components from paint-related damage, ensuring longevity and consistent performance. Experimental results demonstrate the system's robustness and precision in varied conditions, showcasing its potential for autonomous large-scale art creation and expanding the functional applications of robotics in creative fields.
Geometrically robust least squares through manifold optimization
This paper presents a methodology for solving a geometrically robust least squares problem, which arises in various applications where the model is subject to geometric constraints. The problem is formulated as a minimax optimization problem on a product manifold, where one variable is constrained to a ball describing uncertainty. To handle the constraint, an exact penalty method is applied. A first-order gradient descent ascent algorithm is proposed to solve the problem, and its convergence properties are illustrated by an example. The proposed method offers a robust approach to solving a wide range of problems arising in signal processing and data-driven control.
comment: Submitted to the 26th International Symposium on Mathematical Theory of Networks and Systems 19-23 August 2024, Cambridge, UK
Artificial-reference tracking MPC with probabilistically validated performance on industrial embedded systems
Industrial embedded systems are typically used to execute simple control algorithms due to their low computational resources. Despite these limitations, the implementation of advanced control techniques such as Model Predictive Control (MPC) has been explored by the control community in recent years, typically considering simple linear formulations or explicit ones to facilitate the online computation of the control input. These simplifications often lack features and properties that are desirable in real-world environments. In this article, we present an efficient implementation for embedded systems of MPC for tracking with artificial reference, solved via a recently developed structure-exploiting first-order method. This formulation is tailored to a wide range of applications by incorporating essential practical features at a small computational cost, including integration with an offset-free scheme, back-off parameters that enable constraint tightening, and soft constraints that preserve feasibility under disturbances or plant-model mismatch. We accompany this with a framework for probabilistic performance validation of the closed-loop system over long-term operation. We illustrate the applicability of the approach on a Programmable Logic Controller (PLC), incorporated in a hardware-in-the-loop setup to control a nonlinear continuous stirred-tank reactor. The behavior of the closed-loop system is probabilistically validated with respect to constraint violations and the number of iterations required at each time step by the MPC optimization algorithm.
comment: 14 pages, 24 figures
Tensor-Efficient High-Dimensional Q-learning
High-dimensional reinforcement learning faces challenges with complex calculations and low sample efficiency in large state-action spaces. Q-learning algorithms struggle particularly with the curse of dimensionality, where the number of state-action pairs grows exponentially with problem size. While neural network-based approaches like Deep Q-Networks have shown success, recent tensor-based methods using low-rank decomposition offer more parameter-efficient alternatives. Building upon existing tensor-based methods, we propose Tensor-Efficient Q-Learning (TEQL), which enhances low-rank tensor decomposition via improved block coordinate descent on discretized state-action spaces, incorporating novel exploration and regularization mechanisms. The key innovation is an exploration strategy that combines approximation error with visit count-based upper confidence bound to prioritize actions with high uncertainty, avoiding wasteful random exploration. Additionally, we incorporate a frequency-based penalty term in the objective function to encourage exploration of less-visited state-action pairs and reduce overfitting to frequently visited regions. Empirical results on classic control tasks demonstrate that TEQL outperforms conventional matrix-based methods and deep RL approaches in both sample efficiency and total rewards, making it suitable for resource-constrained applications, such as space and healthcare where sampling costs are high.
Powered Descent Trajectory Optimization of Chandrayaan-3 using Radau Collocation and Controllable Sets
India achieved a significant milestone on August $23^{\text{rd}}$ 2023, becoming the fourth country to accomplish a soft landing on the Moon. This paper presents the powered descent trajectory design for the Chandrayaan-3 mission. The optimization framework is based on pseudospectral Radau collocation, and controllability-based waypoint refinement is employed to further enhance the robustness of the trajectory against state and control perturbations. Furthermore, the trade-off between fuel consumption and robustness is explicitly quantified, providing insights into the practical considerations of mission planning.
comment: 6 pages, 6 figures, Accepted for publication in Indian Control Conference 2025
Manifold-constrained Hamilton-Jacobi Reachability Learning for Decentralized Multi-Agent Motion Planning
Safe multi-agent motion planning (MAMP) under task-induced constraints is a critical challenge in robotics. Many real-world scenarios require robots to navigate dynamic environments while adhering to manifold constraints imposed by tasks. For example, service robots must carry cups upright while avoiding collisions with humans or other robots. Despite recent advances in decentralized MAMP for high-dimensional systems, incorporating manifold constraints remains difficult. To address this, we propose a manifold-constrained Hamilton-Jacobi reachability (HJR) learning framework for decentralized MAMP. Our method solves HJR problems under manifold constraints to capture task-aware safety conditions, which are then integrated into a decentralized trajectory optimization planner. This enables robots to generate motion plans that are both safe and task-feasible without requiring assumptions about other agents' policies. Our approach generalizes across diverse manifold-constrained tasks and scales effectively to high-dimensional multi-agent manipulation problems. Experiments show that our method outperforms existing constrained motion planners and operates at speeds suitable for real-world applications. Video demonstrations are available at https://youtu.be/RYcEHMnPTH8 .
Exploiting Over-Approximation Errors as Preview Information for Nonlinear Control
We study the control of nonlinear constrained systems via over-approximations. Our key observation is that the over-approximation error, rather than being an unknown disturbance, can be exploited as input-dependent preview information. This leads to the notion of informed policies, which depend on both the state and the error. We formulate the concretization problem, i.e., recovering a valid input for the true system from a preview-based policy, as a fixed-point equation. Existence of solutions follows from the Brouwer fixed-point theorem, while efficient computation is enabled through closed-form, linear, or convex programs for input-affine systems, and through an iterative method based on the Banach fixed-point theorem for nonlinear systems.
comment: 7 pages, 2 figures
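The fixed-point view of concretization in the abstract above can be illustrated with a small scalar sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the error model `error(x, u) = 0.1*sin(u)`, the informed policy `informed_policy(x, w) = -0.5*x - w`, and the function name `concretize`. The composed map is a contraction with constant 0.1, so plain Banach iteration converges.

```python
import math

def error(x, u):
    # Hypothetical over-approximation error: the part of the true dynamics
    # that the simple model x + u leaves out. It depends on the input u.
    return 0.1 * math.sin(u)

def informed_policy(x, w):
    # Hypothetical informed policy: uses the state x AND the previewed error w.
    return -0.5 * x - w

def concretize(x, u0=0.0, tol=1e-12, max_iter=100):
    """Solve u = informed_policy(x, error(x, u)) by fixed-point iteration."""
    u = u0
    for _ in range(max_iter):
        u_next = informed_policy(x, error(x, u))
        if abs(u_next - u) < tol:
            return u_next
        u = u_next
    raise RuntimeError("fixed-point iteration did not converge")

x = 2.0
u = concretize(x)
# True system (assumed): x_next = x + u + error(x, u)
x_next = x + u + error(x, u)
print(u, x_next)
```

In this toy setting the concretized input cancels the error exactly, so the true system follows the nominal closed loop `x_next = 0.5 * x`.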
Explicit Ensemble Learning Surrogate for Joint Chance-Constrained Optimal Power Flow
The increasing penetration of renewable generation introduces uncertainty into power systems, challenging traditional deterministic optimization methods. Chance-constrained optimization offers an approach to balancing cost and risk; however, incorporating joint chance constraints introduces computational challenges. This paper presents an ensemble support vector machine surrogate for joint chance-constrained optimal power flow, where multiple linear classifiers are trained on simulated optimal power flow data and embedded as tractable hyperplane constraints via Big-M reformulations. The surrogate yields a polyhedral approximation of probabilistic line flow limits that preserves interpretability and scalability. Numerical experiments on the IEEE 118-bus system show that the proposed method achieves near-optimal costs with a negligible average error of $0.03\%$. These results demonstrate the promise of ensemble surrogates as efficient and transparent tools for risk-aware optimization of power systems.
Data-driven Modeling of Grid-following Control in Grid-connected Converters
As power systems evolve with the integration of renewable energy sources and the implementation of smart grid technologies, there is an increasing need for flexible and scalable modeling approaches capable of accurately capturing the complex dynamics of modern grids. To meet this need, various methods, such as the sparse identification of nonlinear dynamics and deep symbolic regression, have been developed to identify dynamical systems directly from data. In this study, we examine the application of a converter-based resource as a replacement for a traditional generator within a lossless transmission line linked to an infinite bus system. This setup is used to generate synthetic data in grid-following control mode, enabling the evaluation of these methods in effectively capturing system dynamics.
System Identification of a Moored ASV with Recessed Moon Pool via Deterministic and Bayesian Hankel-DMDc
This study addresses the system identification of a small autonomous surface vehicle (ASV) under moored conditions using Hankel dynamic mode decomposition with control (HDMDc) and its Bayesian extension (BHDMDc). Experiments were carried out on a Codevintec CK-14e ASV in the towing tank of CNR-INM, under both irregular and regular head-sea wave conditions. The ASV under investigation features a recessed moon pool, which induces nonlinear responses due to sloshing, thereby increasing the modelling challenge. Data-driven reduced-order models were built from measurements of vessel motions and mooring loads. The HDMDc framework provided accurate deterministic predictions of vessel dynamics, while the Bayesian formulation enabled uncertainty-aware characterization of the model response by accounting for variability in hyperparameter selection. Validation against experimental data demonstrated that both HDMDc and BHDMDc can predict the vessel's response to unseen regular and irregular wave excitations. In conclusion, the study shows that HDMDc-based ROMs are a viable data-driven alternative for system identification, demonstrating for the first time their generalization capability for a sea condition different from the training set, achieving high accuracy in reproducing vessel dynamics.
comment: 26 pages, 11 figures, 2 tables, 1 box
An Alternative Derivation and Optimal Design Method of the Generalized Bilinear Transformation for Discretizing Analog Systems
A popular method for designing digital systems is transforming the transfer function of the corresponding analog systems from the continuous-time domain (s-domain) into the discrete-time domain (z-domain) using the Euler or Tustin method. We demonstrate that these transformations are two specific forms of the Generalized Bilinear Transformation (GBT) with a design parameter, $\alpha$. However, the physical meaning and optimal design method for this parameter are not sufficiently studied. In this paper, we propose an alternative derivation of the GBT that employs a new hexagonal shape to approximate the enclosed area of the error function, and we define the parameter $\alpha$ as the shape factor. The physical meaning of the shape factor is revealed for the first time: it equals the percentage of the backward rectangular ratio of the proposed hexagonal shape. We demonstrate that the stable range of the shape factor is [0.5, 1] through domain mapping. Depending on the operating frequencies and the shape factor, we observe two distinct distortion modes, i.e., magnitude and phase distortion. We proceed to develop an optimal design method for the shape factor based on an objective function in the form of the normalized magnitude or phase error. Finally, a low-pass filter (LPF) is designed and tested to verify the effectiveness of the proposed method by comparing the theoretical calculations with the experimental results.
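A minimal sketch of how the shape factor affects stability, assuming the commonly used GBT convention s = (z - 1) / (T (alpha z + 1 - alpha)); the paper's own parameterization may differ. Under this convention alpha = 0 gives forward Euler, alpha = 0.5 gives Tustin, and alpha = 1 gives backward Euler. The pole location and sample time below are hypothetical values chosen so that forward Euler fails.

```python
def gbt_pole(s, T, alpha):
    """Map a continuous-time pole s to the z-plane under the assumed GBT
    convention s = (z - 1) / (T * (alpha*z + 1 - alpha))."""
    return (1 + (1 - alpha) * s * T) / (1 - alpha * s * T)

def is_stable(z):
    # A discrete-time pole is stable when it lies strictly inside the unit circle.
    return abs(z) < 1.0

# Hypothetical first-order low-pass pole at s = -1000 rad/s, sampled at T = 10 ms.
s_pole, T = -1000.0, 0.01
results = {alpha: gbt_pole(s_pole, T, alpha) for alpha in (0.0, 0.5, 1.0)}
for alpha, z in results.items():
    print(f"alpha={alpha:.1f}  z={z:.4f}  stable={is_stable(z)}")
```

With this coarse sample time the stable analog pole maps outside the unit circle for alpha = 0 and inside it for alpha = 0.5 and alpha = 1, which agrees with the stable range [0.5, 1] stated in the abstract.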
Maximum Likelihood Estimation of Dynamic Sub-Networks with Missing Data
Maximum likelihood estimation is effective for identifying dynamical systems, but applying it to large networks becomes computationally prohibitive. This paper introduces a maximum likelihood estimation method that enables identification of sub-networks within complex interconnected systems without estimating the entire network. The key insight is that under specific topological conditions, a sub-network's parameters can be estimated using only local measurements: signals within the target sub-network and those in the directly connected, so-called separator sub-network. This approach significantly reduces computational complexity while enhancing privacy by eliminating the need to share sensitive internal data across organizational boundaries. We establish theoretical conditions for network separability, derive the probability density function for the sub-network, and demonstrate the method's effectiveness through numerical examples.
A Digital Twin of Evaporative Thermo-Fluidic Process in Fixation Unit of DoD Inkjet Printers
In inkjet printing, optimal paper moisture is crucial for print quality, achieved through hot-air impingement in the fixation unit. This paper presents a modular digital twin of the fixation unit, modeling the thermo-fluidic drying process and monitoring its spatio-temporal performance. The novel approach formulates the digital twin as an infinite-dimensional state estimator that infers fixation states from limited sensor data, while remaining robust to disturbances. Modularity is achieved through a graph-theoretic model, where each node represents thermo-fluidic dynamics in different sections of the fixation unit. Evaporation is modeled as a nonlinear boundary effect coupled with node dynamics via Linear Fractional Representation. Using the Partial Integral Equation (PIE) framework, we develop a unified approach for stability, input-output analysis, simulation, and rapid prototyping, validated with operational data from a commercial printer. An $\mathcal{H}_{\infty}$-optimal Luenberger state estimator is then synthesized to estimate thermal states from available sensor data, enabling real-time monitoring of spatio-temporal thermal effects on paper sheets.
Lightwave Power Transfer-Enabled Underwater Optical ISAC Systems under Ship Attitude Variation
In this paper, we propose a lightwave power transfer-enabled underwater optical integrated sensing and communication (O-ISAC) system, where an access point (AP) mounted on a sea-surface ship transmits lightwave signals to two nodes, namely ($i$) a seabed sensor that harvests energy and transmits uplink information to the AP, and ($ii$) a sensing target whose position is estimated by the AP using an array of pinhole cameras. To capture practical deployment conditions, the ship attitude variation is modeled through its roll, pitch, and yaw angles, each following a Gaussian distribution under low-to-moderate sea states. Closed-form approximations are derived for the mean squared error (MSE) of target localization and the achievable uplink data rate. Analytical and simulation results demonstrate excellent agreement, validating the proposed models and derived expressions, while revealing the fundamental communication-sensing tradeoff in the O-ISAC system. The results further provide valuable design insights, including the optimal camera placement on the ship to minimize localization error, achieving a minimum MSE of $10^{-2}$ $\text{m}^2$ with multiple cameras under roll, pitch, and yaw angle variation of $10^{\circ}$, and the optimal harvest-use ratio of $0.55$ for the considered setup.
comment: This paper has been submitted to the IEEE International Conference on Communications (ICC 2026)
Evolutionary Dynamics in Continuous-time Finite-state Mean Field Games - Part II: Stability
We study a dynamic game with a large population of players who choose actions from a finite set in continuous time. Each player has a state in a finite state space that evolves stochastically with their actions. A player's reward depends not only on their own state and action but also on the distribution of states and actions across the population, capturing effects such as congestion in traffic networks. In Part I, we introduced an evolutionary model and a new solution concept - the mixed stationary Nash Equilibrium (MSNE) - which coincides with the rest points of the mean field evolutionary model under meaningful families of revision protocols. In this second part, we investigate the evolutionary stability of MSNE. We derive conditions on both the structure of the MSNE and the game's payoff map that ensure local and global stability under evolutionary dynamics. These results characterize when MSNE can robustly emerge and persist against strategic deviations, thereby providing insight into its long-term viability in large population dynamic games.
Computing the nearest $\Omega$-admissible descriptor dissipative Hamiltonian system
For a given set $\Omega \subseteq \mathbb{C}$, a matrix pair $(E,A)$ is called $\Omega$-admissible if it is regular, impulse-free and its eigenvalues lie inside the region $\Omega$. In this paper, we provide a dissipative Hamiltonian characterization for the matrix pairs that are $\Omega$-admissible where $\Omega$ is an LMI region. We then use these results for solving the nearest $\Omega$-admissible matrix pair problem: Given a matrix pair $(E,A)$, find the nearest $\Omega$-admissible pair $(\tilde E, \tilde A)$ to the given pair $(E,A)$. We illustrate our results on several data sets and compare with the state of the art.
comment: 24 pages, 6 figures, code available from https://gitlab.com/ngillis/nearest-omega-stable-pair
Theoretical and Experimental Limitations of RoCoF Estimation
A precise estimation of the Rate of Change of Frequency (RoCoF) is crucial for secure power system operation. In fact, RoCoF is strictly related to the amount of available physical and/or virtual inertia in the system and to the severity of the active power imbalance following a disturbance. For this reason, it is widely exploited in different protection systems, e.g., Anti-Islanding, Under Frequency Load Shedding (UFLS) and wide-area protection systems. The new paradigm of modern power systems, with low-inertia and converter-based generation assets, is increasing the transient severity, making frequency and RoCoF estimation more complex and less precise for current devices. This work addresses this issue by proposing a numerically robust approach based on concepts inherited from differential geometry and fluid mechanics. The proposed approach is then tested with high-sampling real experimental measurements and used to develop a faster control logic for a RoCoF-based UFLS control scheme. The proposed approach provides protection systems with information on the nature of the contingency, which can be used to improve their response.
MHE in Output Feedback Control of Uncertain Nonlinear Systems via IQCs
We propose a moving horizon estimation (MHE) scheme for general nonlinear constrained systems with parametric or static nonlinear uncertainties and a predetermined state feedback controller that is assumed to robustly stabilize the system in the absence of estimation errors. Leveraging integral quadratic constraints (IQCs), we introduce a new notion of detectability that is robust to possibly non-parametric uncertainties and verifiable in practice. Assuming that the uncertain system driven by the controller satisfies this notion of detectability, we provide an MHE formulation such that the closed-loop system formed of the uncertain system, the controller and MHE is input-to-state stable w.r.t. exogenous disturbances.
comment: 8 pages, 2 figures; extended version; a shortened version is accepted at IEEE Control System Letters, October 27, 2025
Collaborative Assembly Policy Learning of a Sightless Robot
This paper explores a physical human-robot collaboration (pHRC) task involving the joint insertion of a board into a frame by a sightless robot and a human operator. While admittance control is commonly used in pHRC tasks, it can be challenging to measure the force/torque applied by the human for accurate human intent estimation, limiting the robot's ability to assist in the collaborative task. Other methods that attempt to solve pHRC tasks using reinforcement learning (RL) are also unsuitable for the board-insertion task due to its safety constraints and sparse rewards. Therefore, we propose a novel RL approach that utilizes a human-designed admittance controller to facilitate more active robot behavior and reduce human effort. Through simulation and real-world experiments, we demonstrate that our approach outperforms admittance control in terms of success rate and task completion time. Additionally, we observed a significant reduction in measured force/torque when using our proposed approach compared to admittance control. The video of the experiments is available at https://youtu.be/va07Gw6YIog.
comment: Accepted by IEEE ROBIO 2025
Frequency- and Amplitude-Modulated Gates for Universal Quantum Control
Achieving high-fidelity single- and two-qubit gates is essential for executing arbitrary digital quantum algorithms and for building error-corrected quantum computers. We propose a theoretical framework for implementing quantum gates using frequency- and amplitude-modulated microwave control, which extends conventional amplitude modulation by introducing frequency modulation as an additional degree of control. Our approach operates on fixed-frequency qubits, converting the need for qubit frequency tunability into drive frequency modulation. Using Floquet theory, we analyze and design these drives for optimal fidelity within specified criteria. Our framework spans adiabatic to nonadiabatic gates within the Floquet framework, ensuring broad applicability across gate types and control schemes. Using typical transmon qubit parameters in numerical simulations, we demonstrate a universal gate set (including the X, Hadamard, phase, and CZ gates) with control error well below 0.1% and gate times of 25-40 ns for single-qubit operations and 125-135 ns for two-qubit operations. Furthermore, we show an always-on CZ gate tailored for driven qubits, which has gate times of 80-90 ns.
Active Noise Control Method Using Time Domain Neural Networks for Path Decoupling
In decentralized active noise control (ANC) systems, crosstalk between multichannel secondary sources and error microphones significantly degrades control accuracy. Moreover, prefiltering reference signals in filtered-x (Fx) type algorithms may further introduce modeling errors. A theoretical analysis of the Fx-based decentralized control algorithm was performed, which reveals how prefiltering and crosstalk affect the control performance. Then, a hybrid method combining fixed-value neural networks and adaptive strategies was proposed for efficient decentralized ANC. The adaptive filter models the primary path of its own channel online using the least mean square (LMS) algorithm, while the neural network (named DecNet) is used for inverting and decoupling the secondary paths. The hybrid DecNet-LMS algorithm was implemented in the time domain to guarantee causality and avoid latency. Simulation results with measured acoustic paths show that the proposed method outperforms the existing ANC algorithms using either traditional adaptive filters or neural network-based fixed-coefficient methods under different acoustic conditions.
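For readers unfamiliar with the LMS step mentioned above, the following is a minimal sketch of textbook LMS identifying an unknown FIR path online. It is not the DecNet-LMS algorithm from the paper: the path coefficients in `true_path`, the step size `mu`, and the function name `lms_identify` are all hypothetical, and no neural network or secondary-path handling is included.

```python
import random

def lms_identify(x, d, num_taps, mu):
    """Estimate FIR coefficients w so that the filtered input tracks d (standard LMS)."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input sample first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))        # filter output
        e = dn - y                                        # instantaneous error
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]  # gradient step
    return w

random.seed(0)
true_path = [0.5, -0.3, 0.1]  # hypothetical acoustic path (assumption)
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
# Noise-free desired signal: the input passed through the unknown path.
d = [sum(h * x[n - k] for k, h in enumerate(true_path) if n - k >= 0)
     for n in range(len(x))]

w_est = lms_identify(x, d, num_taps=3, mu=0.05)
print([round(w, 4) for w in w_est])
```

Because the example is noise-free and the filter length matches the path length, the estimate converges to the assumed coefficients; with measurement noise or a shorter filter a residual error would remain.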
Control Barrier Function for Aligning Large Language Models
This paper proposes a control-based framework for aligning large language models (LLMs) by leveraging a control barrier function (CBF) to ensure user-desirable text generation. The presented framework applies the CBF safety filter to the predicted token generated from the baseline LLM, to intervene in the generated text. The safety filter offers two significant advantages: it is an add-on that can be used for alignment without fine-tuning the baseline LLM, and, if an evaluation model for the desired alignment is available, that model can be directly applied to the filter design. The overall text-generation system is implemented with open-source language models, aiming to generate positive text.
D2-UC: A Distributed-Distributed Quantum-Classical Framework for Unit Commitment
This paper introduces D2-UC, a quantum-ready framework for the unit commitment (UC) problem that prepares UC for near-term hybrid quantum-classical solvers by combining distributed classical decomposition with distributed quantum execution. We reformulate deterministic and stochastic UC into a three-block alternating direction method of multipliers (ADMM): (i) a convex quadratic subproblem for dispatch and reserves, (ii) a binary subproblem expressed as a quadratic unconstrained binary optimization (QUBO), and (iii) a proximal slack update for consensus. The core contributions are fivefold. First, we demonstrate how the full UC problem can be expressed as a single monolithic QUBO, establishing a direct interface to quantum solvers. Second, we decompose this large binary block into three type-specific QUBOs for commitment, startup, and shutdown, making the problem more tractable but revealing slower ADMM convergence. Third, we restore local logical couplings through per-unit-time micro-QUBOs, which accelerate convergence. Fourth, we batch micro-QUBOs into K non-overlapping block-diagonal problems, reducing many subproblems to a fixed number of solver-ready QUBOs per iteration, compatible with distributed variational quantum eigensolvers (DVQE). Fifth, we integrate an accept-if-better safeguard with DVQE to stabilize hybrid updates and prevent oscillations. Case studies confirm that the proposed methods deliver feasible schedules, faster convergence, and QUBO sizes aligned with current and near-term quantum hardware capabilities. All detailed data, codes, and parameter values are available at https://github.com/LSU-RAISE-LAB/3B-ADMM-UC-DVQE .
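To make the QUBO form mentioned above concrete, here is a toy sketch that is not drawn from the paper's formulation. It assumes two hypothetical units with commitment costs 3 and 2 and a quadratic penalty of weight 10 enforcing "exactly one unit on", then solves the resulting QUBO by brute force. Real micro-QUBOs would be passed to a quantum or classical QUBO solver instead.

```python
from itertools import product

def qubo_energy(Q, x):
    """Evaluate x^T Q x for a binary vector x and an upper-triangular Q."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))

def solve_qubo_brute_force(Q):
    """Enumerate all binary vectors; only practical for very small problems."""
    n = len(Q)
    return min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))

# Hypothetical data: costs c = [3, 2], penalty P * (x1 + x2 - 1)^2 with P = 10.
# Using x_i^2 = x_i, the penalty expands to P * (-x1 - x2 + 2*x1*x2 + 1);
# the constant P is dropped because it does not change the minimizer.
P = 10
Q = [[3 - P, 2 * P],
     [0,     2 - P]]

best = solve_qubo_brute_force(Q)
best_energy = qubo_energy(Q, best)
print(best, best_energy)
```

The minimizer commits only the cheaper unit; adding back the dropped constant P recovers its cost of 2.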
Asynchronous Push-sum Dual Gradient Algorithm in Distributed Model Predictive Control
This paper studies the distributed model predictive control (DMPC) problem for distributed discrete-time linear systems with both local and global constraints over directed communication networks. We establish an optimization problem to formulate the DMPC policy, including the design of terminal ingredients. To cope with the global constraint, we transform the primal optimization problem into its dual problem. Then, we propose a novel asynchronous push-sum dual gradient (APDG) algorithm with an adaptive step-size scheme to solve this dual problem in a fully asynchronous distributed manner. The proposed algorithm does not require synchronous waiting or any form of coordination, which greatly improves solving efficiency. We prove that the APDG algorithm converges at an R-linear rate as long as the step-size does not exceed the designed upper bound. Furthermore, we develop a distributed termination criterion to terminate the APDG algorithm when its output solution satisfies the specified suboptimality and the global constraint, thereby avoiding an infinite number of iterations. The recursive feasibility and the stability of the closed-loop system are also established. Finally, a numerical example is provided to clarify and validate our theoretical findings.
Proximal Gradient Dynamics and Feedback Control for Equality-Constrained Composite Optimization
This paper studies equality-constrained composite minimization problems. This class of problems, capturing regularization terms and inequality constraints, naturally arises in a wide range of engineering and machine learning applications. To tackle these optimization problems, inspired by recent results, we introduce the \emph{proportional--integral proximal gradient dynamics} (PI--PGD): a closed-loop system where the Lagrange multipliers are control inputs and states are the problem decision variables. First, we establish the equivalence between the stationary points of the minimization problem and the equilibria of the PI--PGD. Then for the case of affine constraints, by leveraging tools from contraction theory we give a comprehensive convergence analysis for the dynamics, showing linear--exponential convergence towards the equilibrium. That is, the distance between each solution and the equilibrium is upper bounded by a function that first decreases linearly and then exponentially. Our findings are illustrated numerically on a set of representative examples, which include an exploratory application to nonlinear equality constraints.
comment: 18 pages, 10 figures
Reactive power flow optimization in AC drive systems
This paper explores a limit avoidance approach in the case of input (modulation) and output (current) constraints with the aim of enhancing system availability of AC drives. Drawing on the observation that, in a certain range of reactive power, there exists a trade-off between current and modulation magnitude, we exploit this freedom and define a constrained optimization problem. We propose two approaches, one in the form of an activation-function which drives the reactive power set-point towards safety, and an approach which uses online feedback optimization to set the reactive power dynamically. Both methods compromise reactive power tracking accuracy for increased system robustness. Through a high fidelity simulation, we compare the benefits of the two methods, highlighting their effectiveness in industrial applications.
comment: Accepted for an oral talk at the Conference on Decision and Control, 2025
Accounting for Subsystem Aging Variability in Battery Energy Storage System Optimization
This paper presents a degradation-cost-aware optimization framework for multi-string battery energy storage systems (BESS), emphasizing the impact of inhomogeneous subsystem-level aging on operational decision-making. We evaluate four scenarios for an energy arbitrage use case that vary in model precision and treatment of aging costs. Key performance metrics include operational revenue, power schedule mismatch, missed revenues, capacity losses, and revenue generated per unit of capacity loss. Our analysis reveals that ignoring heterogeneity of subunits may lead to infeasible dispatch plans and reduced revenues. In contrast, combining accurate representation of degraded subsystems and the consideration of aging costs in the objective function improves operational accuracy and economic efficiency of BESS with heterogeneously aged subunits. The fully informed scenario, which combines aging-cost-aware optimization with precise string-level modeling, achieves 21% higher revenue per unit of state-of-health (SOH) loss compared to the baseline scenario. These findings highlight that modeling aging heterogeneity is not just a technical refinement but may become a crucial enabler for maximizing both short-term profitability and long-term asset value, in particular for long BESS usage scenarios.
An Empirical Bayes approach to ARX Estimation
Empirical Bayes inference is based on estimation of the parameters of an a priori distribution from the observed data. The estimation technique of the parameters of the prior, called hyperparameters, is based on the marginal distribution obtained by integrating the joint density of the model with respect to the prior. This is a key step which needs to be properly adapted to the problem at hand. In this paper we study Empirical Bayes inference of linear autoregressive models with inputs (ARX models) for time series and compare the performance of the marginal parametric estimator with that of a full Empirical Bayesian analysis based on the estimated prior. Such a comparison can only make sense for a (realistic) finite data length. In this setting, we propose a new estimation technique of the hyperparameters by a sequential Bayes procedure which is essentially a backward Kalman filter. It turns out that for finite data length the marginal Bayes tends to behave slightly better than the full Empirical Bayesian parameter estimator, also in the case of slowly varying random parameters.
Recurrent neural network-based robust control systems with closed-loop regional incremental ISS and application to MPC design
This paper investigates the design of output-feedback schemes for systems described by a class of recurrent neural networks. We propose a procedure based on linear matrix inequalities for designing an observer and a static state-feedback controller. The algorithm leverages global and regional incremental input-to-state stability (incremental ISS) and enables the tracking of constant setpoints, ensuring robustness to disturbances and state estimation uncertainty. To address the potential limitations of regional incremental ISS, we introduce an alternative scheme in which the static law is replaced with a tube-based nonlinear model predictive controller (NMPC) that exploits regional incremental ISS properties. We show that these conditions enable the formulation of a robust NMPC law with guarantees of convergence and recursive feasibility, leading to an enlarged region of attraction. Theoretical results are validated through numerical simulations on the pH-neutralisation process benchmark.
comment: 16 pages, 5 figures, submitted to IEEE Transactions on Automatic Control (under review)
Release Date Optimization in MRP Using Clearing Functions
This paper integrates a clearing function (CF)-based release planning approach into Material Requirements Planning (MRP) to address its limitations in modeling capacity constraints and dynamic lead times. The proposed optimization model replaces MRP's backward scheduling step while preserving its overall structure. Performance is evaluated through simulation experiments on two flow shop systems that explore a range of demand uncertainties and utilization levels. Computational results show that the proposed approach is capable of yielding significant improvements over the conventional backward scheduling approach, due to its ability to compute planned lead times for individual production orders as opposed to BOM items.
PGD-based optimization of 3D bobsleigh track centerlines from 2D centerlines for simulation applications
The centerline of a bobsleigh track defines its geometry and is essential for simulation modeling. To reduce bobsleigh training costs, leveraging the centerline of the bobsleigh track to construct a virtual environment that closely replicates real competitive settings presents a promising solution. However, publicly available centerline data are typically limited and it is imprecise to construct a training system solely based on a 2-dimensional (2D) centerline. To address this practical issue, this paper proposes a method for generating a 3-dimensional (3D) track centerline based on 2D centerline data. Incorporating international track design regulations, the method formulates an optimization problem that considers total track length, height difference, slope constraints, and geometric continuity. A Projected Gradient Descent (PGD) algorithm is used to solve the optimization problem. The generated 3D centerlines are compared with real track data, and the results show that the method can reproduce realistic centerline trends from original or scaled 2D data. For the selected track segment, the relative errors in total length, height difference, and average slope are within 1.7%, 3.2% and 4.1%, respectively, for real 2D data and within 1.1%, 3.5% and 4.3% respectively for scaled data. All slope values remain within the allowable limits. Moreover, by adjusting the segmentation or modifying the weight of height difference in the cost function, various centerline styles applicable to different competitions can be generated. Under different segmentation and weight factors, the maximum errors reach up to 4.4%, 4.8%, and 9.8%, and 4.4%, 4.8%, and 10.0%, respectively. The proposed method provides a flexible and efficient tool for supporting bobsleigh track centerline design.
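To make the projected gradient descent idea concrete, here is a minimal toy sketch. It is not the paper's formulation: it assumes unit-length segments, a cost made of a slope-smoothness term plus a weighted height-difference mismatch, and a simple box constraint on slopes. All names (`pgd`, `project_box`, `target_drop`, `weight`) and all numbers are hypothetical.

```python
def project_box(g, lo, hi):
    """Project each slope onto the allowed interval [lo, hi]."""
    return [min(max(v, lo), hi) for v in g]


def gradient(g, target_drop, weight):
    """Gradient of sum_i (g[i+1]-g[i])^2 + weight * (sum(g) - target_drop)^2."""
    n = len(g)
    grad = [0.0] * n
    # Smoothness term: penalize slope changes between neighboring segments.
    for i in range(n - 1):
        d = g[i + 1] - g[i]
        grad[i] -= 2.0 * d
        grad[i + 1] += 2.0 * d
    # Height term: with unit-length segments, total drop is simply sum(g).
    mismatch = sum(g) - target_drop
    for i in range(n):
        grad[i] += 2.0 * weight * mismatch
    return grad


def pgd(g0, target_drop, lo, hi, weight=1.0, step=0.05, iterations=1000):
    """Projected gradient descent: gradient step, then projection onto the box."""
    g = project_box(g0, lo, hi)
    for _ in range(iterations):
        grad = gradient(g, target_drop, weight)
        g = project_box([v - step * dv for v, dv in zip(g, grad)], lo, hi)
    return g


start = [0.0, 0.3, 0.0, 0.3, 0.0]
# Requested drop is reachable within the slope limits.
slopes_free = pgd(start, target_drop=0.5, lo=0.0, hi=0.15)
# Requested drop exceeds what the slope limits allow.
slopes_capped = pgd(start, target_drop=1.0, lo=0.0, hi=0.15)
```

In the first call the unconstrained optimum (a uniform slope) lies inside the box, so the projection is inactive at the solution; in the second the requested drop cannot be met and every slope ends at the upper limit, which shows how the projection enforces the constraint. Raising `weight` corresponds loosely to the abstract's idea of re-weighting the height difference in the cost.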
On Improvisation and Open-Endedness: Insights for Experiential AI AAAI 2026
Improvisation-the art of spontaneous creation that unfolds moment-to-moment without a scripted outcome-requires practitioners to continuously sense, adapt, and create anew. It is a fundamental mode of human creativity spanning music, dance, and everyday life. The open-ended nature of improvisation produces a stream of novel, unrepeatable moments-an aspect highly valued in artistic creativity. In parallel, open-endedness (OE)-a system's capacity for unbounded novelty and endless "interestingness"-is exemplified in natural or cultural evolution and has been considered "the last grand challenge" in artificial life (ALife). The rise of generative AI now raises the question in computational creativity (CC) research: What makes a "good" improvisation for AI? Can AI learn to improvise in a genuinely open-ended way? In this work-in-progress paper, we report insights from in-depth interviews with 6 experts in improvisation across dance, music, and contact improvisation. We draw systemic connections between human improvisational arts and the design of future experiential AI agents that could improvise alone or alongside humans-or even with other AI agents-embodying qualities of improvisation drawn from practice: active listening (umwelt and awareness), being in the time (mindfulness and ephemerality), embracing the unknown (source of randomness and serendipity), non-judgmental flow (acceptance and dynamical stability), balancing structure and surprise (unpredictable criticality at edge of chaos), imaginative metaphor (synaesthesia and planning), empathy, trust, boundary, and care (mutual theory of mind), and playfulness and intrinsic motivation (maintaining interestingness).
comment: Submitted to AAAI 2026 Creative AI for Live Interactive Performances Workshop (CLIP) as a work-in-progress paper
Wireless Laser Power Transfer for Low-altitude Uncrewed Aerial Vehicle-assisted Internet of Things: Paradigms, Challenges, and Solutions
Low-altitude uncrewed aerial vehicles (UAVs) have become integral enablers for the Internet of Things (IoT) by offering enhanced coverage, improved connectivity and access to remote areas. A critical challenge limiting their operational capacity lies in the energy constraints of both aerial platforms and ground-based sensors. This paper explores wireless laser power transfer (WLPT) as a transformative solution for sustainable energy provisioning in UAV-assisted IoT networks. We first systematically investigate the fundamental principles of WLPT and analyze its comparative advantages. Then, we introduce three operational paradigms for system integration, identify key challenges, and discuss corresponding potential solutions. In a case study, we propose a multi-agent reinforcement learning framework to address the coordination and optimization challenges in WLPT-enabled UAV-assisted IoT data collection. Simulation results demonstrate that our framework significantly improves energy sustainability and data freshness. Finally, we discuss some future directions.
comment: This paper has been submitted to IEEE Internet of Things Magazine
Systems and Control (EESS)
A Constant-Gain Equation-Error Framework for Airliner Aerodynamic Monitoring Using QAR Data
Monitoring the in-service aerodynamic performance of airliners is critical for operational efficiency and safety, but using operational Quick Access Recorder (QAR) data for this purpose presents significant challenges. This paper first establishes that the absence of key parameters, particularly aircraft moments of inertia, makes conventional state-propagation filters fundamentally unsuitable for this application. This limitation necessitates a decoupled, Equation-Error Method (EEM). However, we then demonstrate through a comparative analysis that standard recursive estimators with time-varying gains, such as Recursive Least Squares (RLS), also fail within an EEM framework, exhibiting premature convergence or instability when applied to low-excitation cruise data. To overcome these dual challenges, we propose and validate the Constant-Gain Equation-Error Method (CG-EEM). This framework employs a custom estimator with a constant, Kalman-like gain, which is perfectly suited to the stationary, low-signal-to-noise characteristics of cruise flight. The CG-EEM is extensively validated on a large, multi-fleet dataset of over 200 flights, where it produces highly consistent, physically plausible aerodynamic parameters and correctly identifies known performance differences between aircraft types. The result is a robust, scalable, and computationally efficient tool for fleet-wide performance monitoring and the early detection of performance degradation.
Flying Robotics Art: ROS-based Drone Draws the Record-Breaking Mural
This paper presents the innovative design and successful deployment of a pioneering autonomous unmanned aerial system developed for executing the world's largest mural painted by a drone. Addressing the dual challenges of maintaining artistic precision and operational reliability under adverse outdoor conditions such as wind and direct sunlight, our work introduces a robust system capable of navigating and painting outdoors with unprecedented accuracy. Key to our approach is a novel navigation system that combines an infrared (IR) motion capture camera and LiDAR technology, enabling precise location tracking tailored specifically for large-scale artistic applications. We employ a unique control architecture that uses different regulation in tangential and normal directions relative to the planned path, enabling precise trajectory tracking and stable line rendering. We also present algorithms for trajectory planning and path optimization, allowing for complex curve drawing and area filling. The system includes a custom-designed paint spraying mechanism, specifically engineered to function effectively amidst the turbulent airflow generated by the drone's propellers, which also protects the drone's critical components from paint-related damage, ensuring longevity and consistent performance. Experimental results demonstrate the system's robustness and precision in varied conditions, showcasing its potential for autonomous large-scale art creation and expanding the functional applications of robotics in creative fields.
Geometrically robust least squares through manifold optimization
This paper presents a methodology for solving a geometrically robust least squares problem, which arises in various applications where the model is subject to geometric constraints. The problem is formulated as a minimax optimization problem on a product manifold, where one variable is constrained to a ball describing uncertainty. To handle the constraint, an exact penalty method is applied. A first-order gradient descent ascent algorithm is proposed to solve the problem, and its convergence properties are illustrated by an example. The proposed method offers a robust approach to solving a wide range of problems arising in signal processing and data-driven control.
comment: Submitted to the 26th International Symposium on Mathematical Theory of Networks and Systems 19-23 August 2024, Cambridge, UK
Artificial-reference tracking MPC with probabilistically validated performance on industrial embedded systems
Industrial embedded systems are typically used to execute simple control algorithms due to their low computational resources. Despite these limitations, the implementation of advanced control techniques such as Model Predictive Control (MPC) has been explored by the control community in recent years, typically considering simple linear formulations or explicit ones to facilitate the online computation of the control input. These simplifications often lack features and properties that are desirable in real-world environments. In this article, we present an efficient implementation for embedded systems of MPC for tracking with artificial reference, solved via a recently developed structure-exploiting first-order method. This formulation is tailored to a wide range of applications by incorporating essential practical features at a small computational cost, including integration with an offset-free scheme, back-off parameters that enable constraint tightening, and soft constraints that preserve feasibility under disturbances or plant-model mismatch. We accompany this with a framework for probabilistic performance validation of the closed-loop system over long-term operation. We illustrate the applicability of the approach on a Programmable Logic Controller (PLC), incorporated in a hardware-in-the-loop setup to control a nonlinear continuous stirred-tank reactor. The behavior of the closed-loop system is probabilistically validated with respect to constraint violations and the number of iterations required at each time step by the MPC optimization algorithm.
comment: 14 pages, 24 figures
Tensor-Efficient High-Dimensional Q-learning
High-dimensional reinforcement learning faces challenges with complex calculations and low sample efficiency in large state-action spaces. Q-learning algorithms struggle particularly with the curse of dimensionality, where the number of state-action pairs grows exponentially with problem size. While neural network-based approaches like Deep Q-Networks have shown success, recent tensor-based methods using low-rank decomposition offer more parameter-efficient alternatives. Building upon existing tensor-based methods, we propose Tensor-Efficient Q-Learning (TEQL), which enhances low-rank tensor decomposition via improved block coordinate descent on discretized state-action spaces, incorporating novel exploration and regularization mechanisms. The key innovation is an exploration strategy that combines approximation error with visit count-based upper confidence bound to prioritize actions with high uncertainty, avoiding wasteful random exploration. Additionally, we incorporate a frequency-based penalty term in the objective function to encourage exploration of less-visited state-action pairs and reduce overfitting to frequently visited regions. Empirical results on classic control tasks demonstrate that TEQL outperforms conventional matrix-based methods and deep RL approaches in both sample efficiency and total rewards, making it suitable for resource-constrained applications, such as space and healthcare where sampling costs are high.
Powered Descent Trajectory Optimization of Chandrayaan-3 using Radau Collocation and Controllable Sets
India achieved a significant milestone on August $23^{\text{rd}}$ 2023, becoming the fourth country to accomplish a soft landing on the Moon. This paper presents the powered descent trajectory design for the Chandrayaan-3 mission. The optimization framework is based on pseudospectral Radau collocation, and controllability-based waypoint refinement is employed to further enhance the robustness of the trajectory against state and control perturbations. Furthermore, the trade-off between fuel consumption and robustness is explicitly quantified, providing insights into the practical considerations of mission planning.
comment: 6 pages, 6 figures, Accepted for publication in Indian Control Conference 2025
Manifold-constrained Hamilton-Jacobi Reachability Learning for Decentralized Multi-Agent Motion Planning
Safe multi-agent motion planning (MAMP) under task-induced constraints is a critical challenge in robotics. Many real-world scenarios require robots to navigate dynamic environments while adhering to manifold constraints imposed by tasks. For example, service robots must carry cups upright while avoiding collisions with humans or other robots. Despite recent advances in decentralized MAMP for high-dimensional systems, incorporating manifold constraints remains difficult. To address this, we propose a manifold-constrained Hamilton-Jacobi reachability (HJR) learning framework for decentralized MAMP. Our method solves HJR problems under manifold constraints to capture task-aware safety conditions, which are then integrated into a decentralized trajectory optimization planner. This enables robots to generate motion plans that are both safe and task-feasible without requiring assumptions about other agents' policies. Our approach generalizes across diverse manifold-constrained tasks and scales effectively to high-dimensional multi-agent manipulation problems. Experiments show that our method outperforms existing constrained motion planners and operates at speeds suitable for real-world applications. Video demonstrations are available at https://youtu.be/RYcEHMnPTH8 .
Exploiting Over-Approximation Errors as Preview Information for Nonlinear Control
We study the control of nonlinear constrained systems via over-approximations. Our key observation is that the over-approximation error, rather than being an unknown disturbance, can be exploited as input-dependent preview information. This leads to the notion of informed policies, which depend on both the state and the error. We formulate the concretization problem -recovering a valid input for the true system from a preview-based policy- as a fixed-point equation. Existence of solutions follows from the Brouwer fixed-point theorem, while efficient computation is enabled through closed-form, linear, or convex programs for input-affine systems, and through an iterative method based on the Banach fixed-point theorem for nonlinear systems.
comment: 7 pages, 2 figures
Explicit Ensemble Learning Surrogate for Joint Chance-Constrained Optimal Power Flow
The increasing penetration of renewable generation introduces uncertainty into power systems, challenging traditional deterministic optimization methods. Chance-constrained optimization offers an approach to balancing cost and risk; however, incorporating joint chance constraints introduces computational challenges. This paper presents an ensemble support vector machine surrogate for joint chance constraint optimal power flow, where multiple linear classifiers are trained on simulated optimal power flow data and embedded as tractable hyperplane constraints via Big--M reformulations. The surrogate yields a polyhedral approximation of probabilistic line flow limits that preserves interpretability and scalability. Numerical experiments on the IEEE 118-bus system show that the proposed method achieves near-optimal costs with a negligible average error of $0.03\%$. These results demonstrate the promise of ensemble surrogates as efficient and transparent tools for risk-aware optimization of power systems.
Data-driven Modeling of Grid-following Control in Grid-connected Converters
As power systems evolve with the integration of renewable energy sources and the implementation of smart grid technologies, there is an increasing need for flexible and scalable modeling approaches capable of accurately capturing the complex dynamics of modern grids. To meet this need, various methods, such as the sparse identification of nonlinear dynamics and deep symbolic regression, have been developed to identify dynamical systems directly from data. In this study, we examine the application of a converter-based resource as a replacement for a traditional generator within a lossless transmission line linked to an infinite bus system. This setup is used to generate synthetic data in grid-following control mode, enabling the evaluation of these methods in effectively capturing system dynamics.
System Identification of a Moored ASV with Recessed Moon Pool via Deterministic and Bayesian Hankel-DMDc
This study addresses the system identification of a small autonomous surface vehicle (ASV) under moored conditions using Hankel dynamic mode decomposition with control (HDMDc) and its Bayesian extension (BHDMDc). Experiments were carried out on a Codevintec CK-14e ASV in the towing tank of CNR-INM, under both irregular and regular head-sea wave conditions. The ASV under investigation features a recessed moon pool, which induces nonlinear responses due to sloshing, thereby increasing the modelling challenge. Data-driven reduced-order models were built from measurements of vessel motions and mooring loads. The HDMDc framework provided accurate deterministic predictions of vessel dynamics, while the Bayesian formulation enabled uncertainty-aware characterization of the model response by accounting for variability in hyperparameter selection. Validation against experimental data demonstrated that both HDMDc and BHDMDc can predict the vessel's response to unseen regular and irregular wave excitations. In conclusion, the study shows that HDMDc-based ROMs are a viable data-driven alternative for system identification, demonstrating for the first time their generalization capability for a sea condition different from the training set, achieving high accuracy in reproducing vessel dynamics.
comment: 26 pages, 11 figures, 2 tables, 1 box
An Alternative Derivation and Optimal Design Method of the Generalized Bilinear Transformation for Discretizing Analog Systems
A popular method for designing digital systems is transforming the transfer function of the corresponding analog systems from the continuous-time domain (s-domain) into the discrete-time domain (z-domain) using the Euler or Tustin method. We demonstrate that these transformations are two specific forms of the Generalized Bilinear Transformation (GBT) with a design parameter, $\alpha$. However, the physical meaning and optimal design method for this parameter are not sufficiently studied. In this paper, we propose an alternative derivation of the GBT that employs a new hexagonal shape to approximate the enclosed area of the error function, and we define the parameter $\alpha$ as the shape factor. The physical meaning of the shape factor is revealed for the first time: it equals the percentage of the backward rectangular ratio of the proposed hexagonal shape. We demonstrate that the stable range of the shape factor is [0.5, 1] through domain mapping. Depending on the operating frequencies and the shape factor, we observe two distinct distortion modes, i.e., the magnitude and phase distortion. We proceed to develop an optimal design method for the shape factor based on an objective function in the form of the normalized magnitude or phase error. Finally, a low-pass filter (LPF) is designed and tested to verify the effectiveness of the proposed method by comparing the theoretical calculations with the experimental results.
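The role of the shape factor can be illustrated with a small sketch. It assumes the common parameterization $s \to (z-1)/\bigl(T(\alpha z + 1 - \alpha)\bigr)$, under which $\alpha = 0$ gives forward Euler, $\alpha = 0.5$ gives Tustin, and $\alpha = 1$ gives backward Euler; the paper's own derivation and notation may differ. The example discretizes a first-order low-pass $H(s) = \omega_c/(s + \omega_c)$; the function names and the argument `wc_T` (the product $\omega_c T$) are hypothetical.

```python
def gbt_lowpass_pole(alpha, wc_T):
    """Pole of H(z) obtained from H(s) = wc / (s + wc) under
    s -> (z - 1) / (T * (alpha * z + 1 - alpha)), with wc_T = wc * T."""
    return (1.0 - wc_T * (1.0 - alpha)) / (1.0 + wc_T * alpha)


def gbt_lowpass_coeffs(alpha, wc_T):
    """Coefficients (b0, b1), (1, a1) of H(z) = (b0 + b1 z^-1) / (1 + a1 z^-1)."""
    denom = 1.0 + wc_T * alpha
    b0 = wc_T * alpha / denom
    b1 = wc_T * (1.0 - alpha) / denom
    a1 = -gbt_lowpass_pole(alpha, wc_T)
    return (b0, b1), (1.0, a1)


# A deliberately coarse sampling period (wc * T = 10) to stress stability.
wc_T = 10.0
poles = {alpha: gbt_lowpass_pole(alpha, wc_T) for alpha in (0.0, 0.4, 0.5, 1.0)}
stable = {alpha: abs(p) < 1.0 for alpha, p in poles.items()}

# DC gain of the discretized filter for the Tustin case (z = 1).
(b0, b1), (a0, a1) = gbt_lowpass_coeffs(0.5, wc_T)
dc_gain = (b0 + b1) / (a0 + a1)
```

For this stable analog filter, the pole stays inside the unit circle for $\alpha = 0.5$ and $\alpha = 1$ even at the coarse sampling period, while $\alpha = 0$ and $\alpha = 0.4$ push it outside, which is consistent with the stable range [0.5, 1] stated in the abstract.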
Maximum Likelihood Estimation of Dynamic Sub-Networks with Missing Data
Maximum likelihood estimation is effective for identifying dynamical systems, but applying it to large networks becomes computationally prohibitive. This paper introduces a maximum likelihood estimation method that enables identification of sub-networks within complex interconnected systems without estimating the entire network. The key insight is that under specific topological conditions, a sub-network's parameters can be estimated using only local measurements: signals within the target sub-network and those in the directly connected, so-called separator sub-network. This approach significantly reduces computational complexity while enhancing privacy by eliminating the need to share sensitive internal data across organizational boundaries. We establish theoretical conditions for network separability, derive the probability density function for the sub-network, and demonstrate the method's effectiveness through numerical examples.
A Digital Twin of Evaporative Thermo-Fluidic Process in Fixation Unit of DoD Inkjet Printers
In inkjet printing, optimal paper moisture is crucial for print quality, achieved through hot-air impingement in the fixation unit. This paper presents a modular digital twin of the fixation unit, modeling the thermo-fluidic drying process and monitoring its spatio-temporal performance. The novel approach formulates the digital twin as an infinite-dimensional state estimator that infers fixation states from limited sensor data, while remaining robust to disturbances. Modularity is achieved through a graph-theoretic model, where each node represents thermo-fluidic dynamics in different sections of the fixation unit. Evaporation is modeled as a nonlinear boundary effect coupled with node dynamics via Linear Fractional Representation. Using the Partial Integral Equation (PIE) framework, we develop a unified approach for stability, input-output analysis, simulation, and rapid prototyping, validated with operational data from a commercial printer. An $\mathcal{H}_{\infty}$-optimal Luenberger state estimator is then synthesized to estimate thermal states from available sensor data, enabling real-time monitoring of spatio-temporal thermal effects on paper sheets.
Lightwave Power Transfer-Enabled Underwater Optical ISAC Systems under Ship Attitude Variation
In this paper, we propose a lightwave power transfer-enabled underwater optical integrated sensing and communication (O-ISAC) system, where an access point (AP) mounted on a sea-surface ship transmits lightwave signals to two nodes, namely ($i$) a seabed sensor that harvests energy and transmits uplink information to the AP, and ($ii$) a sensing target whose position is estimated by the AP using an array of pinhole cameras. To capture practical deployment conditions, the ship attitude variation is modeled through its roll, pitch, and yaw angles, each following a Gaussian distribution under low-to-moderate sea states. Closed-form approximations are derived for the mean squared error (MSE) of target localization and the achievable uplink data rate. Analytical and simulation results demonstrate excellent agreement, validating the proposed models and derived expressions, while revealing the fundamental communication-sensing tradeoff in the O-ISAC system. The results further provide valuable design insights, including the optimal camera placement on the ship to minimize localization error, achieving a minimum MSE of $10^{-2}$ $\text{m}^2$ with multiple cameras under roll, pitch, and yaw angle variation of $10^{\circ}$, and the optimal harvest-use ratio of $0.55$ for the considered setup.
comment: This paper has been submitted to the IEEE International Conference on Communications (ICC 2026) conference
Evolutionary Dynamics in Continuous-time Finite-state Mean Field Games - Part II: Stability
We study a dynamic game with a large population of players who choose actions from a finite set in continuous time. Each player has a state in a finite state space that evolves stochastically with their actions. A player's reward depends not only on their own state and action but also on the distribution of states and actions across the population, capturing effects such as congestion in traffic networks. In Part I, we introduced an evolutionary model and a new solution concept - the mixed stationary Nash Equilibrium (MSNE) - which coincides with the rest points of the mean field evolutionary model under meaningful families of revision protocols. In this second part, we investigate the evolutionary stability of MSNE. We derive conditions on both the structure of the MSNE and the game's payoff map that ensure local and global stability under evolutionary dynamics. These results characterize when MSNE can robustly emerge and persist against strategic deviations, thereby providing insight into its long-term viability in large population dynamic games.
Computing the nearest $Ω$-admissible descriptor dissipative Hamiltonian system
For a given set $\Omega \subseteq \mathbb{C}$, a matrix pair $(E,A)$ is called $\Omega$-admissible if it is regular, impulse-free and its eigenvalues lie inside the region $\Omega$. In this paper, we provide a dissipative Hamiltonian characterization for the matrix pairs that are $\Omega$-admissible where $\Omega$ is an LMI region. We then use these results for solving the nearest $\Omega$-admissible matrix pair problem: Given a matrix pair $(E,A)$, find the nearest $\Omega$-admissible pair $(\tilde E, \tilde A)$ to the given pair $(E,A)$. We illustrate our results on several data sets and compare with the state of the art.
comment: 24 pages, 6 figures, code available from https://gitlab.com/ngillis/nearest-omega-stable-pair
Theoretical and Experimental Limitations of RoCoF Estimation
A precise estimation of the Rate of Change of Frequency (RoCoF) is crucial for secure power system operation. In fact, RoCoF is strictly related to the amount of the available physical and/or virtual inertia of the system and the severity of the active power unbalance following a disturbance. For this reason, it is widely exploited in different protection systems, e.g., Anti-Islanding, Under Frequency Load Shedding (UFLS) and wide-area protection systems. The new paradigm of modern power systems, with low-inertia and converter-based generation assets, is increasing the transient severity, making the frequency and the RoCoF estimation more complex and less precise for current devices. This work addresses this issue by proposing a numerically robust approach based on concepts inherited from differential geometry and fluid mechanics. The proposed approach is then tested with high-sampling real experimental measurements and used to develop a faster control logic for a RoCoF-based UFLS control scheme. The proposed approach provides information to protections regarding the nature of the contingency which can be used to improve their response.
MHE in Output Feedback Control of Uncertain Nonlinear Systems via IQCs
We propose a moving horizon estimation (MHE) scheme for general nonlinear constrained systems with parametric or static nonlinear uncertainties and a predetermined state feedback controller that is assumed to robustly stabilize the system in the absence of estimation errors. Leveraging integral quadratic constraints (IQCs), we introduce a new notion of detectability that is robust to possibly non-parametric uncertainties and verifiable in practice. Assuming that the uncertain system driven by the controller satisfies this notion of detectability, we provide an MHE formulation such that the closed-loop system formed of the uncertain system, the controller and MHE is input-to-state stable w.r.t. exogenous disturbances.
comment: 8 pages, 2 figures; extended version; a shortened version is accepted at IEEE Control System Letters, October 27, 2025
Collaborative Assembly Policy Learning of a Sightless Robot
This paper explores a physical human-robot collaboration (pHRC) task involving the joint insertion of a board into a frame by a sightless robot and a human operator. While admittance control is commonly used in pHRC tasks, it can be challenging to measure the force/torque applied by the human for accurate human intent estimation, limiting the robot's ability to assist in the collaborative task. Other methods that attempt to solve pHRC tasks using reinforcement learning (RL) are also unsuitable for the board-insertion task due to its safety constraints and sparse rewards. Therefore, we propose a novel RL approach that utilizes a human-designed admittance controller to facilitate more active robot behavior and reduce human effort. Through simulation and real-world experiments, we demonstrate that our approach outperforms admittance control in terms of success rate and task completion time. Additionally, we observed a significant reduction in measured force/torque when using our proposed approach compared to admittance control. The video of the experiments is available at https://youtu.be/va07Gw6YIog.
comment: Accepted by IEEE ROBIO 2025
Frequency- and Amplitude-Modulated Gates for Universal Quantum Control
Achieving high-fidelity single- and two-qubit gates is essential for executing arbitrary digital quantum algorithms and for building error-corrected quantum computers. We propose a theoretical framework for implementing quantum gates using frequency- and amplitude-modulated microwave control, which extends conventional amplitude modulation by introducing frequency modulation as an additional degree of control. Our approach operates on fixed-frequency qubits, converting the need for qubit frequency tunability into drive frequency modulation. Using Floquet theory, we analyze and design these drives for optimal fidelity within specified criteria. Our framework spans adiabatic to nonadiabatic gates within the Floquet framework, ensuring broad applicability across gate types and control schemes. Using typical transmon qubit parameters in numerical simulations, we demonstrate a universal gate set (including the X, Hadamard, phase, and CZ gates) with control error well below 0.1% and gate times of 25-40 ns for single-qubit operations and 125-135 ns for two-qubit operations. Furthermore, we show an always-on CZ gate tailored for driven qubits, which has gate times of 80-90 ns.
Active Noise Control Method Using Time Domain Neural Networks for Path Decoupling
In decentralized active noise control (ANC) systems, crosstalk between multichannel secondary sources and error microphones significantly degrades control accuracy. Moreover, prefiltering reference signals in filtered-x (Fx) type algorithms may further introduce modeling errors. A theoretical analysis of the Fx-based decentralized control algorithm was performed, which reveals how prefiltering and crosstalk affect the control performance. Then, a hybrid method combining fixed-value neural networks and adaptive strategies was proposed for efficient decentralized ANC. The adaptive filter models the primary path of its own channel online using the least mean square (LMS) algorithm, while the neural network (named DecNet) is used for secondary-path inversion and decoupling. The hybrid DecNet-LMS algorithm was implemented in the time domain to guarantee causality and avoid latency. Simulation results with measured acoustic paths show that the proposed method outperforms the existing ANC algorithms using either traditional adaptive filters or neural network-based fixed-coefficient methods under different acoustic conditions.
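As a point of reference for the adaptive part of the abstract above, the following is a minimal sketch of a plain LMS filter identifying an unknown FIR path from noise-free data. It is not the DecNet-LMS algorithm; the function names `lms_identify` and `fir`, the step size, and the three-tap toy path are assumptions chosen for illustration.

```python
import random


def lms_identify(x, d, num_taps, mu):
    """Plain LMS: adapt FIR weights w so that the filtered input tracks d."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input samples, newest first
    for x_n, d_n in zip(x, d):
        buf = [x_n] + buf[:-1]
        y_n = sum(wi * bi for wi, bi in zip(w, buf))  # filter output
        e_n = d_n - y_n                               # instantaneous error
        w = [wi + mu * e_n * bi for wi, bi in zip(w, buf)]
    return w


def fir(h, x):
    """Causal FIR filtering of x by taps h (zero initial state)."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


random.seed(0)
true_path = [0.5, -0.3, 0.1]  # hypothetical path to be identified
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]
d = fir(true_path, x)
w_hat = lms_identify(x, d, num_taps=3, mu=0.05)
```

With a white input and no measurement noise, the estimated taps `w_hat` approach `true_path`; in a real ANC setting the step size would have to be tuned against the input power and the noise level.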
Control Barrier Function for Aligning Large Language Models
This paper proposes a control-based framework for aligning large language models (LLMs) by leveraging a control barrier function (CBF) to ensure user-desirable text generation. The presented framework applies the CBF safety filter to the predicted token generated from the baseline LLM, to intervene in the generated text. The safety filter offers two significant advantages: it is an add-on, so it can be used for alignment without fine-tuning the baseline LLM, and if an evaluation model for the desired alignment exists, it can be applied directly to the filter design. The overall text-generation system is implemented with open-source language models, aiming to generate positive text.
D2-UC: A Distributed-Distributed Quantum-Classical Framework for Unit Commitment
This paper introduces D2-UC, a quantum-ready framework for the unit commitment (UC) problem that prepares UC for near-term hybrid quantum-classical solvers by combining distributed classical decomposition with distributed quantum execution. We reformulate deterministic and stochastic UC into a three-block alternating direction method of multipliers (ADMM): (i) a convex quadratic subproblem for dispatch and reserves, (ii) a binary subproblem expressed as a quadratic unconstrained binary optimization (QUBO), and (iii) a proximal slack update for consensus. The core contributions are fivefold. First, we demonstrate how the full UC problem can be expressed as a single monolithic QUBO, establishing a direct interface to quantum solvers. Second, we decompose this large binary block into three type-specific QUBOs for commitment, startup, and shutdown, making the problem more tractable but revealing slower ADMM convergence. Third, we restore local logical couplings through per-unit-time micro-QUBOs, which accelerate convergence. Fourth, we batch micro-QUBOs into K non-overlapping block-diagonal problems, reducing many subproblems to a fixed number of solver-ready QUBOs per iteration, compatible with distributed variational quantum eigensolvers (DVQE). Fifth, we integrate an accept-if-better safeguard with DVQE to stabilize hybrid updates and prevent oscillations. Case studies confirm that the proposed methods deliver feasible schedules, faster convergence, and QUBO sizes aligned with current and near-term quantum hardware capabilities. All detailed data, codes, and parameter values are available at https://github.com/LSU-RAISE-LAB/3B-ADMM-UC-DVQE .
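To make the QUBO notion in the abstract above concrete, here is a minimal sketch that solves a tiny QUBO by exhaustive enumeration. It is not the paper's formulation or solver: the two-unit toy problem, the penalty weight of 10, and the function name `solve_qubo_brute_force` are assumptions for illustration only.

```python
from itertools import product


def solve_qubo_brute_force(Q, n):
    """Minimize sum over (i, j) of Q[i, j] * x[i] * x[j] for x in {0, 1}^n."""
    best_x, best_e = None, float("inf")
    for x in product((0, 1), repeat=n):
        e = sum(c * x[i] * x[j] for (i, j), c in Q.items())
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e


# Hypothetical toy problem: commit exactly one of two units.
# Costs are 1 and 2; the constraint x0 + x1 = 1 is enforced by the penalty
# 10 * (x0 + x1 - 1)^2, expanded with x^2 = x and the constant term dropped.
Q = {(0, 0): 1 - 10, (1, 1): 2 - 10, (0, 1): 2 * 10}
best_x, best_e = solve_qubo_brute_force(Q, n=2)
```

Here the minimizer commits only the cheaper unit. Enumeration scales as 2^n, which is why decomposition into small QUBOs matters for larger instances.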
Exploiting Over-Approximation Errors as Preview Information for Nonlinear Control
We study the control of nonlinear constrained systems via over-approximations. Our key observation is that the over-approximation error, rather than being an unknown disturbance, can be exploited as input-dependent preview information. This leads to the notion of informed policies, which depend on both the state and the error. We formulate the concretization problem -- recovering a valid input for the true system from a preview-based policy -- as a fixed-point equation. Existence of solutions follows from the Brouwer fixed-point theorem, while efficient computation is enabled through closed-form, linear, or convex programs for input-affine systems, and through an iterative method based on the Banach fixed-point theorem for nonlinear systems.
comment: 7 pages, 2 figures
Asynchronous Push-sum Dual Gradient Algorithm in Distributed Model Predictive Control
This paper studies the distributed model predictive control (DMPC) problem for distributed discrete-time linear systems with both local and global constraints over directed communication networks. We establish an optimization problem to formulate the DMPC policy, including the design of terminal ingredients. To cope with the global constraint, we transform the primal optimization problem into its dual problem. Then, we propose a novel asynchronous push-sum dual gradient (APDG) algorithm with an adaptive step-size scheme to solve this dual problem in a fully asynchronous distributed manner. The proposed algorithm does not require synchronous waiting or any form of coordination, which greatly improves solving efficiency. We prove that the APDG algorithm converges at an R-linear rate as long as the step-size does not exceed the designed upper bound. Furthermore, we develop a distributed termination criterion to terminate the APDG algorithm when its output solution satisfies the specified suboptimality and the global constraint, thereby avoiding an infinite number of iterations. The recursive feasibility and the stability of the closed-loop system are also established. Finally, a numerical example is provided to clarify and validate our theoretical findings.
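The push-sum building block named in the abstract above can be illustrated with a minimal sketch of synchronous push-sum averaging over a directed graph. This is the textbook synchronous protocol, not the asynchronous APDG algorithm; the three-node graph, the initial values, and the function name `push_sum_average` are assumptions for illustration.

```python
def push_sum_average(values, out_neighbors, iters):
    """Synchronous push-sum: each node splits its (x, w) pair equally among
    itself and its out-neighbors; the ratio x / w tends to the average."""
    n = len(values)
    x = list(values)
    w = [1.0] * n
    for _ in range(iters):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            targets = out_neighbors[i] + [i]  # include a self-loop
            share = 1.0 / len(targets)
            for j in targets:
                new_x[j] += share * x[i]
                new_w[j] += share * w[i]
        x, w = new_x, new_w
    return [xi / wi for xi, wi in zip(x, w)]


# Hypothetical strongly connected directed graph: 0 -> 1, 1 -> 2, 2 -> {0, 1}.
out_neighbors = [[1], [2], [0, 1]]
estimates = push_sum_average([1.0, 2.0, 6.0], out_neighbors, iters=200)
```

Each node only needs to know its own out-degree, which is what makes push-sum suitable for directed networks where doubly stochastic weights are unavailable.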
Proximal Gradient Dynamics and Feedback Control for Equality-Constrained Composite Optimization
This paper studies equality-constrained composite minimization problems. This class of problems, capturing regularization terms and inequality constraints, naturally arises in a wide range of engineering and machine learning applications. To tackle these optimization problems, inspired by recent results, we introduce the \emph{proportional--integral proximal gradient dynamics} (PI--PGD): a closed-loop system where the Lagrange multipliers are control inputs and states are the problem decision variables. First, we establish the equivalence between the stationary points of the minimization problem and the equilibria of the PI--PGD. Then for the case of affine constraints, by leveraging tools from contraction theory we give a comprehensive convergence analysis for the dynamics, showing linear--exponential convergence towards the equilibrium. That is, the distance between each solution and the equilibrium is upper bounded by a function that first decreases linearly and then exponentially. Our findings are illustrated numerically on a set of representative examples, which include an exploratory application to nonlinear equality constraints.
comment: 18 pages, 10 figures
Reactive power flow optimization in AC drive systems
This paper explores a limit avoidance approach in the case of input (modulation) and output (current) constraints with the aim of enhancing system availability of AC drives. Drawing on the observation that, in a certain range of reactive power, there exists a trade-off between current and modulation magnitude, we exploit this freedom and define a constrained optimization problem. We propose two approaches, one in the form of an activation-function which drives the reactive power set-point towards safety, and an approach which uses online feedback optimization to set the reactive power dynamically. Both methods compromise reactive power tracking accuracy for increased system robustness. Through a high fidelity simulation, we compare the benefits of the two methods, highlighting their effectiveness in industrial applications.
comment: Accepted for an oral talk at the Conference on Decision and Control, 2025
Accounting for Subsystem Aging Variability in Battery Energy Storage System Optimization
This paper presents a degradation-cost-aware optimization framework for multi-string battery energy storage systems, emphasizing the impact of inhomogeneous subsystem-level aging in operational decision-making. We evaluate four scenarios for an energy arbitrage application that vary in model precision and treatment of aging costs. Key performance metrics include operational revenue, power schedule mismatch, missed revenues, capacity losses, and revenue generated per unit of capacity loss. Our analysis reveals that ignoring heterogeneity of subunits may lead to infeasible dispatch plans and reduced revenues. In contrast, combining accurate representation of degraded subsystems and the consideration of aging costs in the objective function improves operational accuracy and economic efficiency of BESS with heterogeneous aged subunits. The fully informed scenario, which combines aging-cost-aware optimization with precise string-level modeling, achieves 21% higher revenue per unit of SOH loss compared to the baseline scenario. These findings highlight that modeling aging heterogeneity is not just a technical refinement but may become a crucial enabler for maximizing both short-term profitability and long-term asset value, in particular for long BESS usage scenarios.
An Empirical Bayes approach to ARX Estimation
Empirical Bayes inference is based on estimation of the parameters of an a priori distribution from the observed data. The estimation technique of the parameters of the prior, called hyperparameters, is based on the marginal distribution obtained by integrating the joint density of the model with respect to the prior. This is a key step which needs to be properly adapted to the problem at hand. In this paper we study Empirical Bayes inference of linear autoregressive models with inputs (ARX models) for time series and compare the performance of the marginal parametric estimator with that of a full Empirical Bayesian analysis based on the estimated prior. Such a comparison can only make sense for a (realistic) finite data length. In this setting, we propose a new estimation technique of the hyperparameters by a sequential Bayes procedure which is essentially a backward Kalman filter. It turns out that for finite data length the marginal Bayes tends to behave slightly better than the full Empirical Bayesian parameter estimator, and that this also holds in the case of slowly varying random parameters.
Recurrent neural network-based robust control systems with closed-loop regional incremental ISS and application to MPC design
This paper investigates the design of output-feedback schemes for systems described by a class of recurrent neural networks. We propose a procedure based on linear matrix inequalities for designing an observer and a static state-feedback controller. The algorithm leverages global and regional incremental input-to-state stability (incremental ISS) and enables the tracking of constant setpoints, ensuring robustness to disturbances and state estimation uncertainty. To address the potential limitations of regional incremental ISS, we introduce an alternative scheme in which the static law is replaced with a tube-based nonlinear model predictive controller (NMPC) that exploits regional incremental ISS properties. We show that these conditions enable the formulation of a robust NMPC law with guarantees of convergence and recursive feasibility, leading to an enlarged region of attraction. Theoretical results are validated through numerical simulations on the pH-neutralisation process benchmark.
comment: 16 pages, 5 figures, submitted to IEEE Transactions on Automatic Control (under review)
Release Date Optimization in MRP Using Clearing Functions
This paper integrates a clearing function (CF)-based release planning approach into Material Requirements Planning (MRP) to address its limitations in modeling capacity constraints and dynamic lead times. The proposed optimization model replaces MRP's backward scheduling step while preserving its overall structure. Performance is evaluated through simulation experiments on two flow shop systems that explore a range of demand uncertainties and utilization levels. Computational results show that the proposed approach is capable of yielding significant improvements over the conventional backward scheduling approach, due to its ability to compute planned lead times for individual production orders as opposed to BOM items.
PGD-based optimization of 3D bobsleigh track centerlines from 2D centerlines for simulation applications
The centerline of a bobsleigh track defines its geometry and is essential for simulation modeling. To reduce bobsleigh training costs, leveraging the centerline of the bobsleigh track to construct a virtual environment that closely replicates real competitive settings presents a promising solution. However, publicly available centerline data are typically limited and it is imprecise to construct a training system solely based on a 2-dimensional (2D) centerline. To address this practical issue, this paper proposes a method for generating a 3-dimensional (3D) track centerline based on 2D centerline data. Incorporating international track design regulations, the method formulates an optimization problem that considers total track length, height difference, slope constraints, and geometric continuity. A Projected Gradient Descent (PGD) algorithm is used to solve the optimization problem. The generated 3D centerlines are compared with real track data, and the results show that the method can reproduce realistic centerline trends from original or scaled 2D data. For the selected track segment, the relative errors in total length, height difference, and average slope are within 1.7%, 3.2% and 4.1%, respectively, for real 2D data and within 1.1%, 3.5% and 4.3% respectively for scaled data. All slope values remain within the allowable limits. Moreover, by adjusting the segmentation or modifying the weight of height difference in the cost function, various centerline styles applicable to different competitions can be generated. Under different segmentation and weight factors, the maximum errors reach up to 4.4%, 4.8%, and 9.8%, and 4.4%, 4.8%, and 10.0%, respectively. The proposed method provides a flexible and efficient tool for supporting bobsleigh track centerline design.
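The generic iteration behind Projected Gradient Descent, as named in the abstract above, can be sketched as follows. This is not the paper's cost function or constraint set: the quadratic objective, the box constraint [0, 1], and the function name `projected_gradient_descent` are assumptions chosen to keep the example small.

```python
def projected_gradient_descent(grad, project, x0, step, iters):
    """Repeat: take a gradient step, then project back onto the feasible set."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = project([xi - step * gi for xi, gi in zip(x, g)])
    return x


# Hypothetical toy problem: minimize 0.5 * ||x - target||^2 subject to 0 <= x_i <= 1.
target = [0.2, 1.5, -0.7]


def grad(x):
    return [xi - ti for xi, ti in zip(x, target)]


def project(x):
    return [min(1.0, max(0.0, xi)) for xi in x]


x_star = projected_gradient_descent(grad, project, [0.5, 0.5, 0.5], step=0.5, iters=100)
```

For this toy problem the iterate converges to the target clipped to the box, so coordinates that violate a bound end up exactly on it. Slope limits on a track would play the role of the box here, with a correspondingly different projection.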
On Improvisation and Open-Endedness: Insights for Experiential AI
Improvisation, the art of spontaneous creation that unfolds moment-to-moment without a scripted outcome, requires practitioners to continuously sense, adapt, and create anew. It is a fundamental mode of human creativity spanning music, dance, and everyday life. The open-ended nature of improvisation produces a stream of novel, unrepeatable moments, an aspect highly valued in artistic creativity. In parallel, open-endedness (OE), a system's capacity for unbounded novelty and endless "interestingness", is exemplified in natural or cultural evolution and has been considered "the last grand challenge" in artificial life (ALife). The rise of generative AI now raises the question in computational creativity (CC) research: What makes a "good" improvisation for AI? Can AI learn to improvise in a genuinely open-ended way? In this work-in-progress paper, we report insights from in-depth interviews with 6 experts in improvisation across dance, music, and contact improvisation. We draw systemic connections between human improvisational arts and the design of future experiential AI agents that could improvise alone or alongside humans (or even with other AI agents), embodying qualities of improvisation drawn from practice: active listening (umwelt and awareness), being in the time (mindfulness and ephemerality), embracing the unknown (source of randomness and serendipity), non-judgmental flow (acceptance and dynamical stability), balancing structure and surprise (unpredictable criticality at edge of chaos), imaginative metaphor (synaesthesia and planning), empathy, trust, boundary, and care (mutual theory of mind), and playfulness and intrinsic motivation (maintaining interestingness).
comment: Submitted to AAAI 2026 Creative AI for Live Interactive Performances Workshop (CLIP) as a work-in-progress paper
Wireless Laser Power Transfer for Low-altitude Uncrewed Aerial Vehicle-assisted Internet of Things: Paradigms, Challenges, and Solutions
Low-altitude uncrewed aerial vehicles (UAVs) have become integral enablers for the Internet of Things (IoT) by offering enhanced coverage, improved connectivity and access to remote areas. A critical challenge limiting their operational capacity lies in the energy constraints of both aerial platforms and ground-based sensors. This paper explores wireless laser power transfer (WLPT) as a transformative solution for sustainable energy provisioning in UAV-assisted IoT networks. We first systematically investigate the fundamental principles of WLPT and analyze its comparative advantages. Then, we introduce three operational paradigms for system integration, identify key challenges, and discuss corresponding potential solutions. In a case study, we propose a multi-agent reinforcement learning framework to address the coordination and optimization challenges in WLPT-enabled UAV-assisted IoT data collection. Simulation results demonstrate that our framework significantly improves energy sustainability and data freshness. Finally, we discuss some future directions.
comment: This paper has been submitted to IEEE Internet of Things Magazine
Robotics
TWIST2: Scalable, Portable, and Holistic Humanoid Data Collection System
Large-scale data has driven breakthroughs in robotics, from language models to vision-language-action models in bimanual manipulation. However, humanoid robotics lacks equally effective data collection frameworks. Existing humanoid teleoperation systems either use decoupled control or depend on expensive motion capture setups. We introduce TWIST2, a portable, mocap-free humanoid teleoperation and data collection system that preserves full whole-body control while advancing scalability. Our system leverages PICO4U VR for obtaining real-time whole-body human motions, with a custom 2-DoF robot neck (cost around $250) for egocentric vision, enabling holistic human-to-humanoid control. We demonstrate long-horizon dexterous and mobile humanoid skills and we can collect 100 demonstrations in 15 minutes with an almost 100% success rate. Building on this pipeline, we propose a hierarchical visuomotor policy framework that autonomously controls the full humanoid body based on egocentric vision. Our visuomotor policy successfully demonstrates whole-body dexterous manipulation and dynamic kicking tasks. The entire system is fully reproducible and open-sourced at https://yanjieze.com/TWIST2 . Our collected dataset is also open-sourced at https://twist-data.github.io .
comment: Website: https://yanjieze.com/TWIST2
XR-1: Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations
Recent progress in large-scale robotic datasets and vision-language models (VLMs) has advanced research on vision-language-action (VLA) models. However, existing VLA models still face two fundamental challenges: (i) producing precise low-level actions from high-dimensional observations, (ii) bridging domain gaps across heterogeneous data sources, including diverse robot embodiments and human demonstrations. Existing methods often encode latent variables from either visual dynamics or robotic actions to guide policy learning, but they fail to fully exploit the complementary multi-modal knowledge present in large-scale, heterogeneous datasets. In this work, we present X Robotic Model 1 (XR-1), a novel framework for versatile and scalable VLA learning across diverse robots, tasks, and environments. XR-1 introduces the \emph{Unified Vision-Motion Codes (UVMC)}, a discrete latent representation learned via a dual-branch VQ-VAE that jointly encodes visual dynamics and robotic motion. UVMC addresses these challenges by (i) serving as an intermediate representation between the observations and actions, and (ii) aligning multimodal dynamic information from heterogeneous data sources to capture complementary knowledge. To effectively exploit UVMC, we propose a three-stage training paradigm: (i) self-supervised UVMC learning, (ii) UVMC-guided pretraining on large-scale cross-embodiment robotic datasets, and (iii) task-specific post-training. We validate XR-1 through extensive real-world experiments with more than 14,000 rollouts on six different robot embodiments, spanning over 120 diverse manipulation tasks. XR-1 consistently outperforms state-of-the-art baselines such as $\pi_{0.5}$, $\pi_0$, RDT, UniVLA, and GR00T-N1.5 while demonstrating strong generalization to novel objects, background variations, distractors, and illumination changes. Our project is at https://xr-1-vla.github.io/.
Non-Contact Manipulation of Induced Magnetic Dipoles
Extending the field of magnetic manipulation to conductive, non-magnetic objects opens the door for a wide array of applications previously limited to hard or soft magnetic materials. Of particular interest is the use of oscillating magnetic fields to recycle space debris, which represents a cache of raw materials in an environment particularly suited to the low forces generated from inductive magnetic manipulation. Building upon previous work that demonstrated 3D open-loop position control by leveraging the opposing dipole moment created from induced eddy currents, this work demonstrates closed-loop position control of a semi-buoyant aluminum sphere in lab tests, and the efficacy of varying methods for force inversion is explored. The closed-loop methods represent a critical first step towards wider applications for 3-DOF position control of induced magnetic dipoles.
Many-vs-Many Missile Guidance via Virtual Targets
This paper presents a novel approach to many-vs-many missile guidance using virtual targets (VTs) generated by a Normalizing Flows-based trajectory predictor. Rather than assigning n interceptors directly to m physical targets through conventional weapon target assignment algorithms, we propose a centralized strategy that constructs n VT trajectories representing probabilistic predictions of maneuvering target behavior. Each interceptor is guided toward its assigned VT using Zero-Effort-Miss guidance during midcourse flight, transitioning to Proportional Navigation guidance for terminal interception. This approach treats many-vs-many engagements as many-vs-distribution scenarios, exploiting numerical superiority (n > m) by distributing interceptors across diverse trajectory hypotheses rather than pursuing identical deterministic predictions. Monte Carlo simulations across various target-interceptor configurations (1-6 targets, 1-8 interceptors) demonstrate that the VT method matches or exceeds baseline straight-line prediction performance by 0-4.1% when n = m, with improvements increasing to 5.8-14.4% when n > m. The results confirm that probabilistic VTs enable effective exploitation of numerical superiority, significantly increasing interception probability in many-vs-many scenarios.
comment: will be submitted to Journal of Guidance, Control, and Dynamics as Technical Note
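The two guidance quantities named in the abstract above have standard textbook forms, sketched below for a planar engagement: the zero-effort miss under constant relative velocity, and the true proportional navigation command a = N * Vc * (LOS rate). This is not the paper's implementation; the function names, the 2D setting, and the default navigation gain of 3 are assumptions for illustration.

```python
import math


def zero_effort_miss(r_rel, v_rel, t_go):
    """Predicted miss vector if neither vehicle accelerates for t_go seconds.
    r_rel and v_rel are target-minus-interceptor position and velocity."""
    return (r_rel[0] + v_rel[0] * t_go, r_rel[1] + v_rel[1] * t_go)


def pn_acceleration(r_rel, v_rel, nav_gain=3.0):
    """Lateral acceleration command a = N * Vc * lambda_dot (planar case)."""
    rx, ry = r_rel
    vx, vy = v_rel
    r2 = rx * rx + ry * ry
    los_rate = (rx * vy - ry * vx) / r2                  # line-of-sight rate, rad/s
    closing_speed = -(rx * vx + ry * vy) / math.sqrt(r2)  # positive when closing
    return nav_gain * closing_speed * los_rate


# Hypothetical geometry: target 1000 m ahead, closing at 300 m/s,
# with 10 m/s of relative cross-range velocity.
a_cmd = pn_acceleration((1000.0, 0.0), (-300.0, 10.0))
zem = zero_effort_miss((100.0, 0.0), (-50.0, 2.0), t_go=2.0)
```

On a perfect collision course the line-of-sight rate is zero, so the commanded acceleration vanishes; any cross-range drift produces a command proportional to the closing speed.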
Keeping it Local, Tiny and Real: Automated Report Generation on Edge Computing Devices for Mechatronic-Based Cognitive Systems
Recent advancements in Deep Learning enable hardware-based cognitive systems, that is, mechatronic systems in general and robotics in particular with integrated Artificial Intelligence, to interact with dynamic and unstructured environments. While the results are impressive, the application of such systems to critical tasks like autonomous driving as well as service and care robotics necessitates the evaluation of large amounts of heterogeneous data. Automated report generation for Mobile Robotics can play a crucial role in facilitating the evaluation and acceptance of such systems in various domains. In this paper, we propose a pipeline for generating automated reports in natural language utilizing various multi-modal sensors that solely relies on local models capable of being deployed on edge computing devices, thus preserving the privacy of all actors involved and eliminating the need for external services. In particular, we evaluate our implementation on a diverse dataset spanning multiple domains including indoor, outdoor and urban environments, providing quantitative as well as qualitative evaluation results. Various generated example reports and other supplementary materials are available via a public repository.
comment: 6 pages, 4 figures, 1 table; accepted for MECATRONICS-REM 2025 International Conference, PARIS, FRANCE December 3-5 2025
Dexterous Robotic Piano Playing at Scale
Endowing robot hands with human-level dexterity has been a long-standing goal in robotics. Bimanual robotic piano playing represents a particularly challenging task: it is high-dimensional, contact-rich, and requires fast, precise control. We present OmniPianist, the first agent capable of performing nearly one thousand music pieces via scalable, human-demonstration-free learning. Our approach is built on three core components. First, we introduce an automatic fingering strategy based on Optimal Transport (OT), allowing the agent to autonomously discover efficient piano-playing strategies from scratch without demonstrations. Second, we conduct large-scale Reinforcement Learning (RL) by training more than 2,000 agents, each specialized in distinct music pieces, and aggregate their experience into a dataset named RP1M++, consisting of over one million trajectories for robotic piano playing. Finally, we employ a Flow Matching Transformer to leverage RP1M++ through large-scale imitation learning, resulting in the OmniPianist agent capable of performing a wide range of musical pieces. Extensive experiments and ablation studies highlight the effectiveness and scalability of our approach, advancing dexterous robotic piano playing at scale.
From the Laboratory to Real-World Application: Evaluating Zero-Shot Scene Interpretation on Edge Devices for Mobile Robotics
Video Understanding, Scene Interpretation and Commonsense Reasoning are highly challenging tasks enabling the interpretation of visual information, allowing agents to perceive, interact with and make rational decisions in their environment. Large Language Models (LLMs) and Visual Language Models (VLMs) have shown remarkable advancements in these areas in recent years, enabling domain-specific applications as well as zero-shot open vocabulary tasks, combining multiple domains. However, the required computational complexity poses challenges for their application on edge devices and in the context of Mobile Robotics, especially considering the trade-off between accuracy and inference time. In this paper, we investigate the capabilities of state-of-the-art VLMs for the task of Scene Interpretation and Action Recognition, with special regard to small VLMs capable of being deployed to edge devices in the context of Mobile Robotics. The proposed pipeline is evaluated on a diverse dataset consisting of various real-world cityscape, on-campus and indoor scenarios. The experimental evaluation discusses the potential of these small models on edge devices, with particular emphasis on challenges, weaknesses, inherent model biases and the application of the gained information. Supplementary material is provided via the following repository: https://datahub.rz.rptu.de/hstr-csrl-public/publications/scene-interpretation-on-edge-devices/
comment: 15 pages, 6 figures, 1 table; accepted for AI-2025 Forty-fifth SGAI International Conference on Artificial Intelligence CAMBRIDGE, ENGLAND 16-18 DECEMBER 2025
Synthetic Crop-Weed Image Generation and its Impact on Model Generalization
Precise semantic segmentation of crops and weeds is necessary for agricultural weeding robots. However, training deep learning models requires large annotated datasets, which are costly to obtain in real fields. Synthetic data can reduce this burden, but the gap between simulated and real images remains a challenge. In this paper, we present a pipeline for procedural generation of synthetic crop-weed images using Blender, producing annotated datasets under diverse conditions of plant growth, weed density, lighting, and camera angle. We benchmark several state-of-the-art segmentation models on synthetic and real datasets and analyze their cross-domain generalization. Our results show that training on synthetic images leads to a sim-to-real gap of 10%, surpassing previous state-of-the-art methods. Moreover, synthetic data demonstrates good generalization properties, outperforming real datasets in cross-domain scenarios. These findings highlight the potential of synthetic agricultural datasets and support hybrid strategies for more efficient model training.
Whole-body motion planning and safety-critical control for aerial manipulation
Aerial manipulation combines the maneuverability of multirotors with the dexterity of robotic arms to perform complex tasks in cluttered spaces. Yet planning safe, dynamically feasible trajectories remains difficult due to whole-body collision avoidance and the conservativeness of common geometric abstractions such as bounding boxes or ellipsoids. We present a whole-body motion planning and safety-critical control framework for aerial manipulators built on superquadrics (SQs). Using an SQ-plus-proxy representation, we model both the vehicle and obstacles with differentiable, geometry-accurate surfaces. Leveraging this representation, we introduce a maximum-clearance planner that fuses Voronoi diagrams with an equilibrium-manifold formulation to generate smooth, collision-aware trajectories. We further design a safety-critical controller that jointly enforces thrust limits and collision avoidance via high-order control barrier functions. In simulation, our approach outperforms sampling-based planners in cluttered environments, producing faster, safer, and smoother trajectories and exceeding ellipsoid-based baselines in geometric fidelity. Actual experiments on a physical aerial-manipulation platform confirm feasibility and robustness, demonstrating consistent performance across simulation and hardware settings. The video can be found at https://youtu.be/hQYKwrWf1Ak.
comment: Submitted to 2026 IFAC World Congress with the Journal option (MECHATRONICS)
Cycle-Sync: Robust Global Camera Pose Estimation through Enhanced Cycle-Consistent Synchronization NeurIPS 2025
We introduce Cycle-Sync, a robust and global framework for estimating camera poses (both rotations and locations). Our core innovation is a location solver that adapts message-passing least squares (MPLS) -- originally developed for group synchronization -- to camera location estimation. We modify MPLS to emphasize cycle-consistent information, redefine cycle consistencies using estimated distances from previous iterations, and incorporate a Welsch-type robust loss. We establish the strongest known deterministic exact-recovery guarantee for camera location estimation, showing that cycle consistency alone -- without access to inter-camera distances -- suffices to achieve the lowest sample complexity currently known. To further enhance robustness, we introduce a plug-and-play outlier rejection module inspired by robust subspace recovery, and we fully integrate cycle consistency into MPLS for rotation synchronization. Our global approach avoids the need for bundle adjustment. Experiments on synthetic and real datasets show that Cycle-Sync consistently outperforms leading pose estimators, including full structure-from-motion pipelines with bundle adjustment.
comment: NeurIPS 2025 spotlight paper
ZJUNlict Extended Team Description Paper 2025
This paper presents the ZJUNlict team's work over the past year, covering both hardware and software advancements. In the hardware domain, the integration of an IMU into the v2023 robot was completed to enhance posture accuracy and angular velocity planning. On the software side, key modules were optimized, including the strategy and CUDA modules, with significant improvements in decision making efficiency, ball pursuit prediction, and ball possession prediction to adapt to high-tempo game dynamics.
SuckTac: Camera-based Tactile Sucker for Unstructured Surface Perception and Interaction
Suckers are significant for robots in picking, transferring, manipulation and locomotion on diverse surfaces. However, most of the existing suckers lack high-fidelity perceptual and tactile sensing, which impedes them from resolving the fine-grained geometric features and interaction status of the target surface. This limits their robust performance with irregular objects and in complex, unstructured environments. Inspired by the adaptive structure and high-performance sensory capabilities of cephalopod suckers, in this paper, we propose a novel, intelligent sucker, named SuckTac, that integrates a camera-based tactile sensor directly within its optimized structure to provide high-density perception and robust suction. Specifically, through joint structure design and optimization and based on a multi-material integrated casting technique, a camera and light source are embedded into the sucker, which enables in-situ, high-density perception of fine details like surface shape, texture and roughness. To further enhance robustness and adaptability, the sucker's mechanical design is also optimized by refining its profile, adding a compliant lip, and incorporating surface microstructure. Extensive experiments, including challenging tasks such as robotic cloth manipulation and soft mobile robot inspection, demonstrate the superior performance and broad applicability of the proposed system.
LACY: A Vision-Language Model-based Language-Action Cycle for Self-Improving Robotic Manipulation
Learning generalizable policies for robotic manipulation increasingly relies on large-scale models that map language instructions to actions (L2A). However, this one-way paradigm often produces policies that execute tasks without deeper contextual understanding, limiting their ability to generalize or explain their behavior. We argue that the complementary skill of mapping actions back to language (A2L) is essential for developing more holistic grounding. An agent capable of both acting and explaining its actions can form richer internal representations and unlock new paradigms for self-supervised learning. We introduce LACY (Language-Action Cycle), a unified framework that learns such bidirectional mappings within a single vision-language model. LACY is jointly trained on three synergistic tasks: generating parameterized actions from language (L2A), explaining observed actions in language (A2L), and verifying semantic consistency between two language descriptions (L2C). This enables a self-improving cycle that autonomously generates and filters new training data through an active augmentation strategy targeting low-confidence cases, thereby improving the model without additional human labels. Experiments on pick-and-place tasks in both simulation and the real world show that LACY improves task success rates by 56.46% on average and yields more robust language-action grounding for robotic manipulation. Project page: https://vla2026.github.io/LACY/
comment: Preprint. Project page: https://vla2026.github.io/LACY/
A Quantitative Comparison of Centralised and Distributed Reinforcement Learning-Based Control for Soft Robotic Arms
This paper presents a quantitative comparison between centralised and distributed multi-agent reinforcement learning (MARL) architectures for controlling a soft robotic arm modelled as a Cosserat rod in simulation. Using PyElastica and the OpenAI Gym interface, we train both a global Proximal Policy Optimisation (PPO) controller and a Multi-Agent PPO (MAPPO) under identical budgets. Both approaches are based on the arm having $n$ controlled sections. The study systematically varies $n$ and evaluates the performance of the arm in reaching a fixed target in three scenarios: default baseline condition, recovery from external disturbance, and adaptation to actuator failure. Quantitative metrics used for the evaluation are mean action magnitude, mean final distance, mean episode length, and success rate. The results show that there are no significant benefits of the distributed policy when the number of controlled sections $n\le4$. In very simple systems, when $n\le2$, the centralised policy outperforms the distributed one. When $n$ increases to $4< n\le 12$, the distributed policy shows a high sample efficiency. In these systems, the distributed policy promotes a stronger success rate, resilience, and robustness under local observability and yields faster convergence given the same sample size. However, centralised policies achieve much higher time efficiency during training, as they take much less time to train on the same number of samples. These findings highlight the trade-offs between centralised and distributed policies in reinforcement learning-based control for soft robotic systems and provide actionable design guidance for future sim-to-real transfer in soft rod-like manipulators.
comment: 7 pages, 4 figures, 2 tables, submitted to RoboSoft 2026
Kinematic and Ergonomic Design of a Robotic Arm for Precision Laparoscopic Surgery
Robotic assistance in minimally invasive surgery can greatly enhance surgical precision and reduce surgeon fatigue. This paper presents a focused investigation on the kinematic and ergonomic design principles for a laparoscopic surgical robotic arm aimed at high-precision tasks. We propose a 7-degree-of-freedom (7-DOF) robotic arm system that incorporates a remote center of motion (RCM) at the instrument insertion point and ergonomic considerations to improve surgeon interaction. The design is implemented on a general-purpose robotic platform, and a series of simulated surgical tasks were performed to evaluate targeting accuracy, task efficiency, and surgeon comfort compared to conventional manual laparoscopy. Experimental results demonstrate that the optimized robotic design achieves significantly improved targeting accuracy (error reduced by over 50%) and shorter task completion times, while substantially lowering operator muscle strain and discomfort. These findings validate the importance of kinematic optimization (such as added articulations and tremor filtering) and human-centered ergonomic design in enhancing the performance of robot-assisted surgery. The insights from this work can guide the development of next-generation surgical robots that improve surgical outcomes and ergonomics for the operating team.
Text to Robotic Assembly of Multi Component Objects using 3D Generative AI and Vision Language Models NeurIPS 2025
Advances in 3D generative AI have enabled the creation of physical objects from text prompts, but challenges remain in creating objects involving multiple component types. We present a pipeline that integrates 3D generative AI with vision-language models (VLMs) to enable the robotic assembly of multi-component objects from natural language. Our method leverages VLMs for zero-shot, multi-modal reasoning about geometry and functionality to decompose AI-generated meshes into multi-component 3D models using predefined structural and panel components. We demonstrate that a VLM is capable of determining which mesh regions need panel components in addition to structural components, based on object functionality. Evaluation across test objects shows that users preferred the VLM-generated assignments 90.6% of the time, compared to 59.4% for rule-based and 2.5% for random assignment. Lastly, the system allows users to refine component assignments through conversational feedback, enabling greater human control and agency in making physical objects with generative AI and robotics.
comment: Accepted to NeurIPS 2025, Conference on Neural Information Processing Systems, Creative AI Track
Census-Based Population Autonomy For Distributed Robotic Teaming
Collaborating teams of robots show promise due to their ability to complete missions more efficiently and with improved robustness, attributes that are particularly useful for systems operating in marine environments. A key issue is how to model, analyze, and design these multi-robot systems to realize the full benefits of collaboration, a challenging task since the domain of multi-robot autonomy encompasses both collective and individual behaviors. This paper introduces a layered model of multi-robot autonomy that uses the principle of census, or a weighted count of the inputs from neighbors, for collective decision-making about teaming, coupled with multi-objective behavior optimization for individual decision-making about actions. The census component is expressed as a nonlinear opinion dynamics model and the multi-objective behavior optimization is accomplished using interval programming. This model can be reduced to recover foundational algorithms in distributed optimization and control, while the full model enables new types of collective behaviors that are useful in real-world scenarios. To illustrate these points, a new method for distributed optimization of subgroup allocation is introduced where robots use a gradient descent algorithm to minimize portions of the cost functions that are locally known, while being influenced by the opinion states from neighbors to account for the unobserved costs. With this method the group can collectively use the information contained in the Hessian matrix of the total global cost. The utility of this model is experimentally validated in three categorically different experiments with fleets of autonomous surface vehicles: an adaptive sampling scenario, a high value unit protection scenario, and a competitive game of capture the flag.
comment: 16 pages, 17 figures
3D Cal: An Open-Source Software Library for Calibrating Tactile Sensors
Tactile sensing plays a key role in enabling dexterous and reliable robotic manipulation, but realizing this capability requires substantial calibration to convert raw sensor readings into physically meaningful quantities. Despite its near-universal necessity, the calibration process remains ad hoc and labor-intensive. Here, we introduce 3D Cal, an open-source library that transforms a low-cost 3D printer into an automated probing device capable of generating large volumes of labeled training data for tactile sensor calibration. We demonstrate the utility of 3D Cal by calibrating two commercially available vision-based tactile sensors, DIGIT and GelSight Mini, to reconstruct high-quality depth maps using the collected data and a custom convolutional neural network. In addition, we perform a data ablation study to determine how much data is needed for accurate calibration, providing practical guidelines for researchers working with these specific sensors, and we benchmark the trained models on previously unseen objects to evaluate calibration accuracy and generalization performance. By automating tactile sensor calibration, 3D Cal can accelerate tactile sensing research, simplify sensor deployment, and promote the practical integration of tactile sensing in robotic platforms.
WorldPlanner: Monte Carlo Tree Search and MPC with Action-Conditioned Visual World Models
Robots must understand their environment from raw sensory inputs and reason about the consequences of their actions in it to solve complex tasks. Behavior Cloning (BC) leverages task-specific human demonstrations to learn this knowledge as end-to-end policies. However, these policies are difficult to transfer to new tasks, and generating training data is challenging because it requires careful demonstrations and frequent environment resets. In contrast to such a policy-based view, in this paper we take a model-based approach where we collect a few hours of unstructured easy-to-collect play data to learn an action-conditioned visual world model, a diffusion-based action sampler, and optionally a reward model. The world model -- in combination with the action sampler and a reward model -- is then used to optimize long sequences of actions with a Monte Carlo Tree Search (MCTS) planner. The resulting plans are executed on the robot via a zeroth-order Model Predictive Controller (MPC). We show that the action sampler mitigates hallucinations of the world model during planning and validate our approach on 3 real-world robotic tasks with varying levels of planning and modeling complexity. Our experiments support the hypothesis that planning leads to a significant improvement over BC baselines on a standard manipulation test environment.
A Collaborative Reasoning Framework for Anomaly Diagnostics in Underwater Robotics ICRA 2026
The safe deployment of autonomous systems in safety-critical settings requires a paradigm that combines human expertise with AI-driven analysis, especially when anomalies are unforeseen. We introduce AURA (Autonomous Resilience Agent), a collaborative framework for anomaly and fault diagnostics in robotics. AURA integrates large language models (LLMs), a high-fidelity digital twin (DT), and human-in-the-loop interaction to detect and respond to anomalous behavior in real time. The architecture uses two agents with clear roles: (i) a low-level State Anomaly Characterization Agent that monitors telemetry and converts signals into a structured natural-language problem description, and (ii) a high-level Diagnostic Reasoning Agent that conducts a knowledge-grounded dialogue with an operator to identify root causes, drawing on external sources. Human-validated diagnoses are then converted into new training examples that refine the low-level perceptual model. This feedback loop progressively distills expert knowledge into the AI, transforming it from a static tool into an adaptive partner. We describe the framework's operating principles and provide a concrete implementation, establishing a pattern for trustworthy, continually improving human-robot teams.
comment: Paper was submitted for ICRA 2026
Comprehensive Assessment of LiDAR Evaluation Metrics: A Comparative Study Using Simulated and Real Data
For developing safe Autonomous Driving Systems (ADS), rigorous testing is required before they are deemed safe for road deployments. Since comprehensive conventional physical testing is impractical due to cost and safety concerns, Virtual Testing Environments (VTE) can be adopted as an alternative. Comparing VTE-generated sensor outputs against their real-world analogues can be a strong indication that the VTE accurately represents reality. Correspondingly, this work explores a comprehensive experimental approach to finding evaluation metrics suitable for comparing real-world and simulated LiDAR scans. The metrics were tested in terms of sensitivity and accuracy with different noise, density, distortion, sensor orientation, and channel settings. From comparing the metrics, we found that Density Aware Chamfer Distance (DCD) works best across all cases. In the second step of the research, a Virtual Testing Environment was generated using real LiDAR scan data. The data was collected in a controlled environment with only static objects using an instrumented vehicle equipped with LiDAR, IMU and cameras. Simulated LiDAR scans were generated from the VTEs using the same pose as the real LiDAR scans. The simulated and real LiDAR scans were compared in terms of model perception and geometric similarity. Actual and simulated LiDAR scans have a similar semantic segmentation output with a mIoU of 21% with corrected intensity and an average density aware chamfer distance (DCD) of 0.63. This indicates a slight difference in the geometric properties of simulated and real LiDAR scans and a significant difference between model outputs. During the comparison, density-aware chamfer distance was found to be the most correlated among the metrics with perception methods.
EvtSlowTV - A Large and Diverse Dataset for Event-Based Depth Estimation
Event cameras, with their high dynamic range (HDR) and low latency, offer a promising alternative for robust depth estimation in challenging environments. However, many event-based depth estimation approaches are constrained by small-scale annotated datasets, limiting their generalizability to real-world scenarios. To bridge this gap, we introduce EvtSlowTV, a large-scale event camera dataset curated from publicly available YouTube footage, which contains more than 13B events across various environmental conditions and motions, including seasonal hiking, flying, scenic driving, and underwater exploration. EvtSlowTV is an order of magnitude larger than existing event datasets, providing an unconstrained, naturalistic setting for event-based depth learning. This work shows the suitability of EvtSlowTV for a self-supervised learning framework to capitalise on the HDR potential of raw event streams. We further demonstrate that training with EvtSlowTV enhances the model's ability to generalise to complex scenes and motions. Our approach removes the need for frame-based annotations and preserves the asynchronous nature of event data.
Toward an Agricultural Operational Design Domain: A Framework
The agricultural sector increasingly relies on autonomous systems that operate in complex and variable environments. Unlike on-road applications, agricultural automation integrates driving and working processes, each of which imposes distinct operational constraints. Handling this complexity and ensuring consistency throughout the development and validation processes requires a structured, transparent, and verified description of the environment. However, existing Operational Design Domain (ODD) concepts do not yet address the unique challenges of agricultural applications. Therefore, this work introduces the Agricultural ODD (Ag-ODD) Framework, which can be used to describe and verify the operational boundaries of autonomous agricultural systems. The Ag-ODD Framework consists of three core elements. First, the Ag-ODD description concept provides a structured method for unambiguously defining environmental and operational parameters using concepts from ASAM Open ODD and CityGML. Second, the 7-Layer Model, derived from the PEGASUS 6-Layer Model, has been extended to include a process layer to capture dynamic agricultural operations. Third, the iterative verification process verifies the Ag-ODD against its corresponding logical scenarios, derived from the 7-Layer Model, to ensure the Ag-ODD's completeness and consistency. Together, these elements provide a consistent approach for creating unambiguous and verifiable Ag-ODDs. Demonstrative use cases show how the Ag-ODD Framework can support the standardization and scalability of environmental descriptions for autonomous agricultural systems.
comment: 18 pages, 7 figures, 2 tables
Using Fiber Optic Bundles to Miniaturize Vision-Based Tactile Sensors
Vision-based tactile sensors have recently become popular due to their combination of low cost, very high spatial resolution, and ease of integration using widely available miniature cameras. The associated field of view and focal length, however, are difficult to package in a human-sized finger. In this paper we employ optical fiber bundles to achieve a form factor that, at 15 mm diameter, is smaller than an average human fingertip. The electronics and camera are also located remotely, further reducing package size. The sensor achieves a spatial resolution of 0.22 mm and a minimum force resolution of 5 mN for normal and shear contact forces. With these attributes, the DIGIT Pinki sensor is suitable for applications such as robotic and teleoperated digital palpation. We demonstrate its utility for palpation of the prostate gland and show that it can achieve clinically relevant discrimination of prostate stiffness for phantom and ex vivo tissue.
comment: This work has been submitted to the IEEE for possible publication. The CAD design files of DIGIT Pinki are available at https://github.com/facebookresearch/digit-design
Tactile Displays Driven by Projected Light
Tactile displays that lend tangible form to digital content could transform computing interactions. However, achieving the resolution, speed, and dynamic range needed for perceptual fidelity remains challenging. We present a tactile display that directly converts projected light into visible tactile patterns via a photomechanical surface populated with millimeter-scale optotactile pixels. The pixels transduce incident light into mechanical displacements through photostimulated thermal gas expansion, yielding millimeter scale displacements with response times of 2 to 100 milliseconds. Employing projected light for power transmission and addressing renders these displays highly scalable. We demonstrate optically driven displays with up to 1,511 addressable pixels -- several times more pixels than any prior tactile display attaining comparable performance. Perceptual studies confirm that these displays can reproduce diverse spatiotemporal tactile patterns with high fidelity. This research establishes a foundation for practical, versatile high-resolution tactile displays driven by light.
Replicating Human Anatomy with Vision Controlled Jetting -- A Pneumatic Musculoskeletal Hand and Forearm
The functional replication and actuation of complex structures inspired by nature is a longstanding goal for humanity. Creating such complex structures combining soft and rigid features and actuating them with artificial muscles would further our understanding of natural kinematic structures. We printed a biomimetic hand in a single print process comprised of a rigid skeleton, soft joint capsules, tendons, and printed touch sensors. We showed its actuation using electric motors. In this work, we expand on that work by adding a forearm that is also closely modeled after the human anatomy and replacing the hand's motors with 22 independently controlled pneumatic artificial muscles (PAMs). Our thin, high-strain (up to 30.1%) PAMs match the performance of state-of-the-art artificial muscles at a lower cost. The system showcases human-like dexterity with independent finger movements, demonstrating successful grasping of various objects, ranging from a small, lightweight coin to a large can of 272g in weight. The performance evaluation, based on fingertip and grasping forces along with finger joint range of motion, highlights the system's potential.
Mobile Robotic Multi-View Photometric Stereo SP
Multi-View Photometric Stereo (MVPS) is a popular method for fine-detailed 3D acquisition of an object from images. Despite its outstanding results on diverse material objects, a typical MVPS experimental setup requires a well-calibrated light source and a monocular camera installed on an immovable base. This restricts the use of MVPS on a movable platform, preventing mobile robotics applications from benefiting from MVPS in 3D acquisition. To this end, we introduce a new mobile robotic system for MVPS. While the proposed system brings advantages, it introduces additional algorithmic challenges. Addressing them, in this paper, we further propose an incremental approach for mobile robotic MVPS. Our approach leverages a supervised learning setup to predict per-view surface normal, object depth, and per-pixel uncertainty in model-predicted results. A refined depth map per view is obtained by solving an MVPS-driven optimization problem proposed in this paper. Later, we fuse the refined depth map while tracking the camera pose w.r.t. the reference frame to recover globally consistent object 3D geometry. Experimental results show the advantages of our robotic system and algorithm, featuring local high-frequency surface detail recovery with globally consistent object shape. Our work is beyond any MVPS system yet presented, providing encouraging results on objects with unknown reflectance properties using fewer frames without a tiring calibration and installation process, enabling a computationally efficient robotic automation approach to photogrammetry. The proposed approach is nearly 100 times computationally faster than state-of-the-art MVPS methods such as [1, 2] while maintaining similar results when tested on subjects taken from the benchmark DiLiGenT MV dataset [3].
comment: Acknowledgment Added. Published at International Society Journal of Photogrammetry and Remote Sensing (ISPRS). 32 pages, 14 Figures, 5 Tables
FRASA: An End-to-End Reinforcement Learning Agent for Fall Recovery and Stand Up of Humanoid Robots
Humanoid robotics faces significant challenges in achieving stable locomotion and recovering from falls in dynamic environments. Traditional methods, such as Model Predictive Control (MPC) and Key Frame Based (KFB) routines, either require extensive fine-tuning or lack real-time adaptability. This paper introduces FRASA, a Deep Reinforcement Learning (DRL) agent that integrates fall recovery and stand up strategies into a unified framework. Leveraging the Cross-Q algorithm, FRASA significantly reduces training time and offers a versatile recovery strategy that adapts to unpredictable disturbances. Comparative tests on Sigmaban humanoid robots demonstrate FRASA's superior performance against the KFB method deployed in the RoboCup 2023 by the Rhoban Team, world champion of the KidSize League.
Extended Friction Models for the Physics Simulation of Servo Actuators
Accurate physical simulation is crucial for the development and validation of control algorithms in robotic systems. Recent works in Reinforcement Learning (RL) notably take advantage of extensive simulations to produce efficient robot control. State-of-the-art servo actuator models generally fail to capture the complex friction dynamics of these systems. This limits the transferability of simulated behaviors to real-world applications. In this work, we present extended friction models that allow servo actuator dynamics to be simulated more accurately. We propose a comprehensive analysis of various friction models, present a method for identifying model parameters using recorded trajectories from a pendulum test bench, and demonstrate how these models can be integrated into physics engines. The proposed friction models are validated on four distinct servo actuators and tested on 2R manipulators, showing significant improvements in accuracy over the standard Coulomb-Viscous model. Our results highlight the importance of considering advanced friction effects in the simulation of servo actuators to enhance the realism and reliability of robotic simulations.
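For reference, the Coulomb-Viscous baseline named in this abstract can be sketched in a few lines. This is only the standard textbook form, not the extended models the paper proposes; the function name and the coefficient values are assumptions chosen for illustration.

```python
def coulomb_viscous_torque(omega, coulomb=0.05, viscous=0.01):
    """Friction torque (N*m) opposing motion at angular velocity omega (rad/s).

    coulomb: constant-magnitude Coulomb term (hypothetical value).
    viscous: coefficient of the velocity-proportional term (hypothetical value).
    """
    if omega == 0.0:
        # At rest this simple model applies no friction torque;
        # it has no static-friction term.
        return 0.0
    direction = 1.0 if omega > 0.0 else -1.0
    # The torque always acts against the direction of motion.
    return -(coulomb * direction + viscous * omega)
```

In this form the friction torque depends only on velocity: a constant term that flips sign with the direction of motion plus a term that grows linearly with speed.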
Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model
Recently, augmenting vision-language-action models (VLAs) with world-models has shown promise in robotic policy learning. However, it remains challenging to jointly predict next-state observations and action sequences because of the inherent difference between the two modalities. To address this, we propose DUal-STream diffusion (DUST), a world-model augmented VLA framework that handles the modality conflict and enhances the performance of VLAs across diverse tasks. Specifically, we propose a multimodal diffusion transformer architecture that explicitly maintains separate modality streams while enabling cross-modal knowledge sharing. In addition, we propose training techniques such as independent noise perturbations for each modality and a decoupled flow matching loss, which enable the model to learn the joint distribution in a bidirectional manner while avoiding the need for a unified latent space. Furthermore, based on the decoupled training framework, we introduce a sampling method where we sample action and vision tokens asynchronously at different rates, which shows improvement through inference-time scaling. Through experiments on simulated benchmarks such as RoboCasa and GR-1, DUST achieves up to 6% gains over a standard VLA baseline and implicit world-modeling methods, with our inference-time scaling approach providing an additional 2-5% gain on success rate. On real-world tasks with the Franka Research 3, DUST outperforms baselines in success rate by 13%, confirming its effectiveness beyond simulation. Lastly, we demonstrate the effectiveness of DUST in large-scale pretraining with action-free videos from BridgeV2, where DUST leads to a significant gain when transferred to the RoboCasa benchmark.
comment: 20 pages, 10 figures
Virtual Target Trajectory Prediction for Stochastic Targets
Trajectory prediction of aerial vehicles is a key requirement in applications ranging from missile guidance to UAV collision avoidance. While most prediction methods assume deterministic target motion, real-world targets often exhibit stochastic behaviors such as evasive maneuvers or random gliding patterns. This paper introduces a probabilistic framework based on Conditional Normalizing Flows (CNFs) to model and predict such stochastic dynamics directly from trajectory data. The learned model generates probability distributions of future target positions conditioned on initial states and dynamic parameters, enabling efficient sampling and exact density evaluation. To provide deterministic surrogates compatible with existing guidance and planning algorithms, sampled trajectories are clustered using a time series k-means approach, yielding a set of representative "virtual target" trajectories. The method is target-agnostic, computationally efficient, and requires only trajectory data for training, making it suitable as a drop-in replacement for deterministic predictors. Simulated scenarios with maneuvering and ballistic targets demonstrate that the proposed approach bridges the gap between deterministic assumptions and stochastic reality, advancing guidance and control algorithms for autonomous vehicles.
comment: Manuscript accepted by Journal of Guidance, Control, and Dynamics
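The clustering step described in this abstract, reducing many sampled trajectories to a few representative "virtual target" trajectories, can be sketched as follows. This is a minimal illustration that assumes equal-length trajectories and a pointwise Euclidean distance; the abstract does not specify the metric used by the authors' time series k-means, and the function names here are hypothetical.

```python
import random


def traj_sq_dist(a, b):
    """Squared Euclidean distance between two equal-length trajectories."""
    return sum((u - v) ** 2 for pa, pb in zip(a, b) for u, v in zip(pa, pb))


def mean_traj(trajs):
    """Pointwise mean of a non-empty list of equal-length trajectories."""
    n = len(trajs)
    return [tuple(sum(c) / n for c in zip(*points)) for points in zip(*trajs)]


def virtual_targets(trajs, k, iters=50, seed=0):
    """Cluster trajectories with Lloyd's k-means.

    Returns (centers, labels): centers are the k representative trajectories,
    labels[i] is the cluster index assigned to trajs[i].
    """
    rng = random.Random(seed)
    # Initialise the centers with k distinct sampled trajectories.
    centers = [list(t) for t in rng.sample(trajs, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center for every trajectory.
        new_labels = [
            min(range(k), key=lambda c: traj_sq_dist(t, centers[c]))
            for t in trajs
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [t for t, lab in zip(trajs, labels) if lab == c]
            if members:
                centers[c] = mean_traj(members)
    return centers, labels
```

Each trajectory is a list of coordinate tuples, for example `[(x0, y0), (x1, y1), ...]`. The returned centers play the role of deterministic surrogates that a downstream guidance or planning routine could consume one at a time.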
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
We introduce Genie Envisioner (GE), a unified world foundation platform for robotic manipulation that integrates policy learning, evaluation, and simulation within a single video-generative framework. At its core, GE-Base is a large-scale, instruction-conditioned video diffusion model that captures the spatial, temporal, and semantic dynamics of real-world robotic interactions in a structured latent space. Built upon this foundation, GE-Act maps latent representations to executable action trajectories through a lightweight, flow-matching decoder, enabling precise and generalizable policy inference across diverse embodiments with minimal supervision. To support scalable evaluation and training, GE-Sim serves as an action-conditioned neural simulator, producing high-fidelity rollouts for closed-loop policy development. The platform is further equipped with EWMBench, a standardized benchmark suite measuring visual fidelity, physical consistency, and instruction-action alignment. Together, these components establish Genie Envisioner as a scalable and practical foundation for instruction-driven, general-purpose embodied intelligence. All code, models, and benchmarks will be released publicly.
comment: https://genie-envisioner.github.io/
Radar-Based Odometry for Low-Speed Driving
We address automotive odometry for low-speed driving and parking, where centimeter-level accuracy is required due to tight spaces and nearby obstacles. Traditional methods using inertial-measurement units and wheel encoders require vehicle-specific calibration, making them costly for consumer-grade vehicles. To overcome this, we propose a radar-based simultaneous localization and mapping (SLAM) approach that fuses inertial and 4D radar measurements. Our approach tightly couples feature positions and Doppler velocities for accurate localization and robust data association. Key contributions include a tightly coupled radar-Doppler extended Kalman filter, multi-radar support and an information-based feature-pruning strategy. Experiments using both proprietary and public datasets demonstrate high-accuracy localization during low-speed driving.
comment: This work has been submitted to the IEEE for possible publication
No Plan but Everything Under Control: Robustly Solving Sequential Tasks with Dynamically Composed Gradient Descent ICRA25
We introduce a novel gradient-based approach for solving sequential tasks by dynamically adjusting the underlying myopic potential field in response to feedback and the world's regularities. This adjustment implicitly considers subgoals encoded in these regularities, enabling the solution of long sequential tasks, as demonstrated by solving the traditional planning domain of Blocks World - without any planning. Unlike conventional planning methods, our feedback-driven approach adapts to uncertain and dynamic environments, as demonstrated by one hundred real-world trials involving drawer manipulation. These experiments highlight the robustness of our method compared to planning and show how interactive perception and error recovery naturally emerge from gradient descent without explicitly implementing them. This offers a computationally efficient alternative to planning for a variety of sequential tasks, while aligning with observations on biological problem-solving strategies.
comment: Accepted at ICRA25; Supplementary Material under https://www.tu.berlin/robotics/papers/noplan ; 7 pages + 6 figures;
UniCoD: Enhancing Robot Policy via Unified Continuous and Discrete Representation Learning
Building generalist robot policies that can handle diverse tasks in open-ended environments is a central challenge in robotics. To leverage knowledge from large-scale pretraining, prior work (VLA) has typically built generalist policies either on top of vision-language understanding models (VLMs) or generative models. However, both semantic understanding from vision-language pretraining and visual dynamics modeling from visual-generation pretraining are crucial for embodied robots. Recent unified models of generation and understanding have demonstrated strong capabilities in both comprehension and generation through large-scale pretraining. We posit that robotic policy learning can likewise benefit from the combined strengths of understanding, planning, and continuous future representation learning. Building on this insight, we introduce UniCoD, which acquires the ability to dynamically model high-dimensional visual features through pretraining on over 1M internet-scale instructional manipulation videos. Subsequently, UniCoD is fine-tuned on data collected from the robot embodiment, enabling the learning of mappings from predictive representations to action tokens. Extensive experiments show that our approach consistently outperforms baseline methods, by 9% in simulation environments and by 12% on real-world out-of-distribution tasks.
Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation
Data scarcity is a long-standing challenge in the Vision-Language Navigation (VLN) field, which severely hinders the generalization of agents to unseen environments. Previous works primarily rely on additional simulator data or web-collected images/videos to improve generalization. However, simulator environments still offer limited diversity, and web-collected data often requires extensive labor to remove noise. In this paper, we propose a Rewriting-driven AugMentation (RAM) paradigm for VLN, which directly creates unseen observation-instruction pairs by rewriting human-annotated training data. Benefiting from our rewriting mechanism, new observation-instruction pairs can be obtained in a simulator-free and labor-saving manner to promote generalization. Specifically, we first introduce Object-Enriched Observation Rewriting, where we combine Vision-Language Models (VLMs) and Large Language Models (LLMs) to derive rewritten object-enriched scene descriptions, enabling observation synthesis with diverse objects and spatial layouts via Text-to-Image Generation Models (T2IMs). Then, we propose Observation-Contrast Instruction Rewriting, which generates observation-aligned rewritten instructions by requiring LLMs to reason about the difference between original and new observations. We further develop a mixing-then-focusing training strategy with a random observation cropping scheme, effectively enhancing data distribution diversity while suppressing augmentation data noise during training. Experiments on both the discrete environments (R2R, REVERIE, and R4R datasets) and continuous environments (R2R-CE dataset) show the superior performance and impressive generalization ability of our method. Code is available at https://github.com/SaDil13/VLN-RAM.
comment: Accepted by IEEE Transactions on Neural Networks and Learning Systems
Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects
A deep understanding of kinematic structures and movable components is essential for enabling robots to manipulate objects and model their own articulated forms. Such understanding is captured through articulated objects, which are essential for tasks such as physical simulation, motion planning, and policy learning. However, creating these models, particularly for objects with high degrees of freedom (DoF), remains a significant challenge. Existing methods typically rely on motion sequences or strong assumptions from hand-curated datasets, which hinders scalability. In this paper, we introduce Kinematify, an automated framework that synthesizes articulated objects directly from arbitrary RGB images or textual descriptions. Our method addresses two core challenges: (i) inferring kinematic topologies for high-DoF objects and (ii) estimating joint parameters from static geometry. To achieve this, we combine Monte Carlo tree search (MCTS) for structural inference with geometry-driven optimization for joint reasoning, producing physically consistent and functionally valid descriptions. We evaluate Kinematify on diverse inputs from both synthetic and real-world environments, demonstrating improvements in registration and kinematic topology accuracy over prior work.
comment: project page: https://sites.google.com/deemos.com/kinematify
Generative World Models of Tasks: LLM-Driven Hierarchical Scaffolding for Embodied Agents NeurIPS 2025
Recent advances in agent development have focused on scaling model size and raw interaction data, mirroring successes in large language models. However, for complex, long-horizon multi-agent tasks such as robotic soccer, this end-to-end approach often fails due to intractable exploration spaces and sparse rewards. We propose that an effective world model for decision-making must model the world's physics and also its task semantics. A systematic review of 2024 research in low-resource multi-agent soccer reveals a clear trend towards integrating symbolic and hierarchical methods, such as Hierarchical Task Networks (HTNs) and Bayesian Strategy Networks (BSNs), with multi-agent reinforcement learning (MARL). These methods decompose complex goals into manageable subgoals, creating an intrinsic curriculum that shapes agent learning. We formalize this trend into a framework for Hierarchical Task Environments (HTEs), which are essential for bridging the gap between simple, reactive behaviors and sophisticated, strategic team play. Our framework incorporates the use of Large Language Models (LLMs) as generative world models of tasks, capable of dynamically generating this scaffolding. We argue that HTEs provide a mechanism to guide exploration, generate meaningful learning signals, and train agents to internalize hierarchical structure, enabling the development of more capable and general-purpose agents with greater sample efficiency than purely end-to-end approaches.
comment: In the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Embodied World Models for Decision Making (EWM)
Neural Network Aided Kalman Filtering with Model Predictive Control Enables Robot-Assisted Drone Recovery on a Wavy Surface
Recovering a drone on a disturbed water surface remains a significant challenge in maritime robotics. In this paper, we propose a unified framework for robot-assisted drone recovery on a wavy surface that addresses two major tasks: first, accurate prediction of a moving drone's position under wave-induced disturbances using KalmanNet Plus Plus (KalmanNet++), a neural-network-aided Kalman filter that we propose; second, effective motion planning for a manipulator toward the resulting desired position via Receding Horizon Model Predictive Control (RHMPC). Specifically, we compare multiple prediction methods and propose KalmanNet++ to predict the position of the UAV, thereby obtaining the desired position. KalmanNet++ predicts the drone's future position 0.1 s ahead, while the manipulator plans a capture trajectory in real time, thus handling not only wave-induced base motions but also constraints such as torque and joint constraints. For the system design, we provide a collaborative system, comprising a manipulator subsystem and a UAV subsystem, that enables drone lifting and drone recovery. Simulation and real-world experiments using wave-disturbed motion data demonstrate that our approach achieves a high success rate (above 95%) and outperforms conventional baseline methods by up to 10% in efficiency and 20% in precision. The results underscore the feasibility and robustness of our system, which achieves state-of-the-art performance and offers a practical solution for maritime drone operations.
comment: 17 pages, 51 figures
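To make the "predict 0.1 s ahead" step concrete, here is a minimal sketch of a classical linear Kalman filter with a constant-velocity model in one dimension. It is explicitly not KalmanNet++: there is no neural network component, and the class name, noise values, and the simplified process-noise term are all assumptions made for illustration.

```python
class ConstantVelocityKF:
    """Classical Kalman filter over the state [position, velocity] in 1D."""

    def __init__(self, dt, meas_var=0.01, process_var=1e-3, init_var=100.0):
        self.dt = dt           # sampling period in seconds
        self.r = meas_var      # position measurement noise variance (assumed)
        self.q = process_var   # added to both diagonal entries (a simplification)
        self.p = 0.0           # position estimate
        self.v = 0.0           # velocity estimate
        self.P = [[init_var, 0.0], [0.0, init_var]]  # state covariance

    def step(self, z):
        """Run one predict/update cycle with position measurement z."""
        dt = self.dt
        # Predict with F = [[1, dt], [0, 1]]: x' = F x, P' = F P F^T + Q.
        p_pred = self.p + dt * self.v
        v_pred = self.v
        (a, b), (c, d) = self.P
        a_p = a + dt * (b + c) + dt * dt * d + self.q
        b_p = b + dt * d
        c_p = c + dt * d
        d_p = d + self.q
        # Update with H = [1, 0]: only the position is measured.
        s = a_p + self.r
        k0, k1 = a_p / s, c_p / s
        resid = z - p_pred
        self.p = p_pred + k0 * resid
        self.v = v_pred + k1 * resid
        self.P = [[(1 - k0) * a_p, (1 - k0) * b_p],
                  [c_p - k1 * a_p, d_p - k1 * b_p]]

    def predict_ahead(self, horizon):
        """Extrapolate the position `horizon` seconds ahead without changing state."""
        return self.p + horizon * self.v
```

After each measurement, `predict_ahead(0.1)` gives the extrapolated position that a downstream planner could treat as its target.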
Light Future: Multimodal Action Frame Prediction via InstructPix2Pix WACV 2026
Predicting future motion trajectories is a critical capability across domains such as robotics, autonomous systems, and human activity forecasting, enabling safer and more intelligent decision-making. This paper proposes a novel, efficient, and lightweight approach for robot action prediction, offering significantly reduced computational cost and inference latency compared to conventional video prediction models. Importantly, it pioneers the adaptation of the InstructPix2Pix model for forecasting future visual frames in robotic tasks, extending its utility beyond static image editing. We implement a deep learning-based visual prediction framework that forecasts what a robot will observe 100 frames (10 seconds) into the future, given a current image and a textual instruction. We repurpose and fine-tune the InstructPix2Pix model to accept both visual and textual inputs, enabling multimodal future frame prediction. Experiments on the RoboTwin dataset (generated based on real-world scenarios) demonstrate that our method achieves superior SSIM and PSNR compared to state-of-the-art baselines in robot action prediction tasks. Unlike conventional video prediction models, which require multiple input frames and heavy computation and suffer from slow inference, our approach needs only a single image and a text prompt as input. This lightweight design enables faster inference, reduced GPU demands, and flexible multimodal control, which is particularly valuable for applications like robotics and sports motion trajectory analytics, where motion trajectory precision is prioritized over visual fidelity.
comment: 9 pages including appendix, 4 tables, 8 figures, to be submitted to WACV 2026
Grounded Vision-Language Interpreter for Integrated Task and Motion Planning
While recent advances in vision-language models have accelerated the development of language-guided robot planners, their black-box nature often precludes the safety guarantees and interpretability crucial for real-world deployment. Conversely, classical symbolic planners offer rigorous safety verification but require significant expert knowledge for setup. To bridge the current gap, this paper proposes ViLaIn-TAMP, a hybrid planning framework for enabling verifiable, interpretable, and autonomous robot behaviors. ViLaIn-TAMP comprises three main components: (1) a Vision-Language Interpreter (ViLaIn) adapted from previous work that converts multimodal inputs into structured problem specifications, (2) a modular Task and Motion Planning (TAMP) system that grounds these specifications in actionable trajectory sequences through symbolic and geometric constraint reasoning, and (3) a corrective planning (CP) module that receives concrete feedback on failed solution attempts and feeds it, together with constraints, back to ViLaIn to refine the specification. We design challenging manipulation tasks in a cooking domain and evaluate our framework. Experimental results demonstrate that ViLaIn-TAMP outperforms a VLM-as-a-planner baseline by 18% in mean success rate, and that adding the CP module boosts mean success rate by 32%.
comment: Project website: https://omron-sinicx.github.io/ViLaIn-TAMP/
Talk2Event: Grounded Understanding of Dynamic Scenes from Event Cameras NeurIPS 2025
Event cameras offer microsecond-level latency and robustness to motion blur, making them ideal for understanding dynamic environments. Yet, connecting these asynchronous streams to human language remains an open challenge. We introduce Talk2Event, the first large-scale benchmark for language-driven object grounding in event-based perception. Built from real-world driving data, the benchmark provides over 30,000 validated referring expressions, each enriched with four grounding attributes -- appearance, status, relation to viewer, and relation to other objects -- bridging spatial, temporal, and relational reasoning. To fully exploit these cues, we propose EventRefer, an attribute-aware grounding framework that dynamically fuses multi-attribute representations through a Mixture of Event-Attribute Experts (MoEE). Our method adapts to different modalities and scene dynamics, achieving consistent gains over state-of-the-art baselines in event-only, frame-only, and event-frame fusion settings. We hope our dataset and approach will establish a foundation for advancing multimodal, temporally-aware, and language-driven perception in real-world robotics and autonomy.
comment: NeurIPS 2025 Spotlight; 43 pages, 17 figures, 16 tables; Project Page at https://talk2event.github.io
Towards Predicting Any Human Trajectory In Context NeurIPS 2025
Predicting accurate future trajectories of pedestrians is essential for autonomous systems but remains a challenging task due to the need for adaptability in different environments and domains. A common approach involves collecting scenario-specific data and performing fine-tuning via backpropagation. However, the need to fine-tune for each new scenario is often impractical for deployment on edge devices. To address this challenge, we introduce TrajICL, an In-Context Learning (ICL) framework for pedestrian trajectory prediction that enables adaptation at inference time without fine-tuning on scenario-specific data or updating weights. We propose a spatio-temporal similarity-based example selection (STES) method that selects relevant examples from previously observed trajectories within the same scene by identifying similar motion patterns at corresponding locations. To further refine this selection, we introduce prediction-guided example selection (PG-ES), which selects examples based on both the past trajectory and the predicted future trajectory, rather than relying solely on the past trajectory. This approach allows the model to account for long-term dynamics when selecting examples. Finally, instead of relying on small real-world datasets with limited scenario diversity, we train our model on a large-scale synthetic dataset to enhance its prediction ability by leveraging in-context examples. Extensive experiments demonstrate that TrajICL achieves remarkable adaptation across both in-domain and cross-domain scenarios, outperforming even fine-tuned approaches across multiple public benchmarks. Project Page: https://fujiry0.github.io/TrajICL-project-page/.
comment: NeurIPS 2025
Rethinking Bimanual Robotic Manipulation: Learning with Decoupled Interaction Framework
Bimanual robotic manipulation is an emerging and critical topic in the robotics community. Previous works primarily rely on integrated control models that take the perceptions and states of both arms as inputs to directly predict their actions. However, we argue that bimanual manipulation involves not only coordinated tasks but also various uncoordinated tasks that do not require explicit cooperation during execution, such as grasping objects with the closest hand, which integrated control frameworks fail to consider because they enforce cooperation in the early inputs. In this paper, we propose a novel decoupled interaction framework that considers the characteristics of different tasks in bimanual manipulation. The key insight of our framework is to assign an independent model to each arm to enhance the learning of uncoordinated tasks, while introducing a selective interaction module that adaptively learns weights from its own arm to improve the learning of coordinated tasks. Extensive experiments on seven tasks in the RoboTwin dataset demonstrate that: (1) Our framework achieves outstanding performance, with a 23.5% boost over the SOTA method. (2) Our framework is flexible and can be seamlessly integrated into existing methods. (3) Our framework can be effectively extended to multi-agent manipulation tasks, achieving a 28% boost over the integrated control SOTA. (4) The performance boost stems from the decoupled design itself, surpassing the SOTA by 16.5% in success rate with only 1/6 of the model size.
comment: 15 pages, 8 figures
DiffVLA++: Bridging Cognitive Reasoning and End-to-End Driving through Metric-Guided Alignment
Conventional end-to-end (E2E) driving models are effective at generating physically plausible trajectories, but often fail to generalize to long-tail scenarios due to the lack of essential world knowledge to understand and reason about surrounding environments. In contrast, Vision-Language-Action (VLA) models leverage world knowledge to handle challenging cases, but their limited 3D reasoning capability can lead to physically infeasible actions. In this work, we introduce DiffVLA++, an enhanced autonomous driving framework that explicitly bridges cognitive reasoning and E2E planning through metric-guided alignment. First, we build a VLA module that directly generates semantically grounded driving trajectories. Second, we design an E2E module with a dense trajectory vocabulary that ensures physical feasibility. Third, and most critically, we introduce a metric-guided trajectory scorer that guides and aligns the outputs of the VLA and E2E modules, thereby integrating their complementary strengths. Experiments on the ICCV 2025 Autonomous Grand Challenge leaderboard show that DiffVLA++ achieves an EPDMS of 49.12.
RoboTron-Mani: All-in-One Multimodal Large Model for Robotic Manipulation
Recently, robotics has advanced significantly through the integration of larger models and large-scale datasets. However, challenges remain in applying these models to 3D spatial interactions and managing data collection costs. To address these issues, we propose the multimodal robotic manipulation model RoboTron-Mani and the comprehensive dataset RoboData. RoboTron-Mani, on one hand, enhances 3D perception through camera parameters and occupancy supervision. On the other hand, it further incorporates Modality-Isolation-Mask and multimodal decoder blocks based on OpenFlamingo, improving modality fusion and fine-grained perception. RoboData integrates several publicly available datasets, achieving the first fusion of multi-view images, camera parameters, depth maps, actions, and space alignment, which facilitates comprehensive learning from diverse robotic datasets and offers one complete evaluation system. Trained on RoboData, RoboTron-Mani is the first generalist policy that surpasses expert models, enabling simultaneous evaluation of all tasks across multiple datasets, rather than being limited to specific data or task selections. Specifically, RoboTron-Mani boosts manipulation performance by increasing the average sequence length on CALVIN from 1.7 to 3.5, enabling cross-embodiment generalization, and achieving state-of-the-art results on both simulated and real-world datasets.
Adv-BMT: Bidirectional Motion Transformer for Safety-Critical Traffic Scenario Generation
Scenario-based testing is essential for validating the performance of autonomous driving (AD) systems. However, such testing is limited by the scarcity of long-tailed, safety-critical scenarios in existing datasets collected in the real world. To tackle the data issue, we propose the Adv-BMT framework, which augments real-world scenarios with diverse and realistic adversarial traffic interactions. The core component of Adv-BMT is a bidirectional motion transformer (BMT) model that performs inverse traffic motion prediction: it takes agent information at the last time step of the scenario as input and reconstructs the traffic in reverse chronological order back to the initial time step. The Adv-BMT framework is a two-stage pipeline: it first conducts adversarial initializations and then inverse motion predictions. Unlike previous work, we do not need any collision data for pretraining, and are able to generate realistic and diverse collision interactions. Our experimental results validate the quality of the collision scenarios generated by Adv-BMT: training on our augmented dataset reduces episode collision rates by 20%. Demo and code are available at: https://metadriverse.github.io/adv-bmt/.
MetAdv: A Unified and Interactive Adversarial Testing Platform for Autonomous Driving ACM MM 2025
Evaluating and ensuring the adversarial robustness of autonomous driving (AD) systems is a critical and unresolved challenge. This paper introduces MetAdv, a novel adversarial testing platform that enables realistic, dynamic, and interactive evaluation by tightly integrating virtual simulation with physical vehicle feedback. At its core, MetAdv establishes a hybrid virtual-physical sandbox, within which we design a three-layer closed-loop testing environment with dynamic adversarial test evolution. This architecture facilitates end-to-end adversarial evaluation, ranging from high-level unified adversarial generation, through mid-level simulation-based interaction, to low-level execution on physical vehicles. Additionally, MetAdv supports a broad spectrum of AD tasks and algorithmic paradigms (e.g., modular deep learning pipelines, end-to-end learning, vision-language models). It supports flexible 3D vehicle modeling and seamless transitions between simulated and physical environments, with built-in compatibility for commercial platforms such as Apollo and Tesla. A key feature of MetAdv is its human-in-the-loop capability: besides flexible environmental configuration for more customized evaluation, it enables real-time capture of physiological signals and behavioral feedback from drivers, offering new insights into human-machine trust under adversarial conditions. We believe MetAdv can offer a scalable and unified framework for adversarial assessment, paving the way for safer AD.
comment: ACM MM 2025 Most Popular Demo Award
ROADWork: A Dataset and Benchmark for Learning to Recognize, Observe, Analyze and Drive Through Work Zones ICCV 2025
Perceiving and autonomously navigating through work zones is a challenging and underexplored problem. Open datasets for this long-tailed scenario are scarce. We propose the ROADWork dataset to learn to recognize, observe, analyze, and drive through work zones. State-of-the-art foundation models fail when applied to work zones. Fine-tuning models on our dataset significantly improves perception and navigation in work zones. With the ROADWork dataset, we discover new work zone images with higher precision (+32.5%) at a much higher rate (12.8×) around the world. Open-vocabulary methods fail too, whereas fine-tuned detectors improve performance (+32.2 AP). Vision-Language Models (VLMs) struggle to describe work zones, but fine-tuning substantially improves performance (+36.7 SPICE). Beyond fine-tuning, we show the value of simple techniques. Video label propagation provides additional gains (+2.6 AP) for instance segmentation. When reading work zone signs, composing a detector and text spotter via crop-scaling improves performance (+14.2% 1-NED). Composing work zone detections to provide context further reduces hallucinations (+3.9 SPICE) in VLMs. We predict navigational goals and compute drivable paths from work zone videos. Incorporating road work semantics ensures that 53.6% of goals have angular error (AE) < 0.5 (+9.9%) and 75.3% of pathways have AE < 0.5 (+8.1%).
comment: ICCV 2025 Accepted Paper
Human-Exoskeleton Kinematic Calibration to Improve Hand Tracking for Dexterous Teleoperation
Hand exoskeletons are critical tools for dexterous teleoperation and immersive manipulation interfaces, but achieving accurate hand tracking remains a challenge due to user-specific anatomical variability and donning inconsistencies. These issues lead to kinematic misalignments that degrade tracking performance and limit applicability in precision tasks. We propose a subject-specific calibration framework for exoskeleton-based hand tracking that estimates virtual link parameters through residual-weighted optimization. A data-driven approach is introduced to empirically tune cost function weights using motion capture ground truth, enabling accurate and consistent calibration across users. Implemented on the Maestro hand exoskeleton with seven healthy participants, the method achieved substantial reductions in joint and fingertip tracking errors across diverse hand geometries. Qualitative visualizations using a Unity-based virtual hand further demonstrate improved motion fidelity. The proposed framework generalizes to exoskeletons with closed-loop kinematics and minimal sensing, laying the foundation for high-fidelity teleoperation and robot learning applications.
comment: 8 pages, 10 figures, 1 supplementary video, submitted to RA-L
Enhancing Fatigue Detection through Heterogeneous Multi-Source Data Integration and Cross-Domain Modality Imputation
Fatigue detection for human operators plays a key role in safety critical applications such as aviation, mining, and long haul transport. While numerous studies have demonstrated the effectiveness of high fidelity sensors in controlled laboratory environments, their performance often degrades when ported to real world settings due to noise, lighting conditions, and field of view constraints, thereby limiting their practicality. This paper formalizes a deployment oriented setting for real world fatigue detection, where high quality sensors are often unavailable in practical applications. To address this challenge, we propose leveraging knowledge from heterogeneous source domains, including high fidelity sensors that are difficult to deploy in the field but commonly used in controlled environments, to assist fatigue detection in the real world target domain. Building on this idea, we design a heterogeneous and multiple source fatigue detection framework that adaptively utilizes the available modalities in the target domain while exploiting diverse configurations in the source domains through alignment across domains and modality imputation. Our experiments, conducted using a field deployed sensor setup and two publicly available human fatigue datasets, demonstrate the practicality, robustness, and improved generalization of our approach across subjects and domains. The proposed method achieves consistent gains over strong baselines in sensor constrained scenarios.
comment: 4 figures, 14 pages
NaviTrace: Evaluating Embodied Navigation of Vision-Language Models
Vision-language models demonstrate unprecedented performance and generalization across a wide range of tasks and scenarios. Integrating these foundation models into robotic navigation systems opens pathways toward building general-purpose robots. Yet, evaluating these models' navigation capabilities remains constrained by costly real-world trials, overly simplified simulations, and limited benchmarks. We introduce NaviTrace, a high-quality Visual Question Answering benchmark where a model receives an instruction and embodiment type (human, legged robot, wheeled robot, bicycle) and must output a 2D navigation trace in image space. Across 1000 scenarios and more than 3000 expert traces, we systematically evaluate eight state-of-the-art VLMs using a newly introduced semantic-aware trace score. This metric combines Dynamic Time Warping distance, goal endpoint error, and embodiment-conditioned penalties derived from per-pixel semantics and correlates with human preferences. Our evaluation reveals a consistent gap to human performance caused by poor spatial grounding and goal localization. NaviTrace establishes a scalable and reproducible benchmark for real-world robotic navigation. The benchmark and leaderboard can be found at https://leggedrobotics.github.io/navitrace_webpage/.
comment: 9 pages, 6 figures, under review at IEEE conference
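The NaviTrace abstract names Dynamic Time Warping (DTW) distance as one ingredient of its trace score. As a rough illustration of that single ingredient, the sketch below computes a textbook DTW distance between two 2D traces given as lists of (x, y) points. The function name `dtw_distance` is hypothetical, and the benchmark's actual normalization, endpoint error term, and semantic penalties are not reproduced here.

```python
import math


def dtw_distance(trace_a, trace_b):
    """Textbook DTW distance between two sequences of 2D points.

    Illustrative only: this is not the NaviTrace scoring code.
    """
    n, m = len(trace_a), len(trace_b)
    if n == 0 or m == 0:
        raise ValueError("traces must be non-empty")
    inf = float("inf")
    # cost[i][j] is the minimal accumulated cost of aligning the first
    # i points of trace_a with the first j points of trace_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(trace_a[i - 1], trace_b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance in trace_a only
                cost[i][j - 1],      # advance in trace_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

Because DTW allows one point to match several points on the other trace, a trace that lingers at a waypoint is not penalized relative to one that passes through it once.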
Deep Learning Warm Starts for Trajectory Optimization on the International Space Station
Trajectory optimization is a cornerstone of modern robot autonomy, enabling systems to compute trajectories and controls in real-time while respecting safety and physical constraints. However, it has seen limited usage in spaceflight applications due to its heavy computational demands that exceed the capability of most flight computers. In this work, we provide results on the first in-space demonstration of using machine learning-based warm starts for accelerating trajectory optimization for the Astrobee free-flying robot onboard the International Space Station (ISS). We formulate a data-driven optimal control approach that trains a neural network to learn the structure of the trajectory generation problem being solved using sequential convex programming (SCP). Onboard, this trained neural network predicts solutions for the trajectory generation problem and relies on using the SCP solver to enforce safety constraints for the system. Our trained network reduces the number of solver iterations required for convergence in cases including rotational dynamics by 60% and in cases with obstacles drawn from the training distribution of the warm start model by 50%. This work represents a significant milestone in the use of learning-based control for spaceflight applications and a stepping stone for future advances in the use of machine learning for autonomous guidance, navigation, & control.
comment: Accepted to 2025 International Conference on Space Robotics (iSpaRo). Presented at RSS 2025 Workshop on Space Robotics
Multiagent Systems
When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning NeurIPS 2025
Despite rapid growth in multimodal large language models (MLLMs), their reasoning traces remain opaque: it is often unclear which modality drives a prediction, how conflicts are resolved, or when one stream dominates. In this paper, we introduce modality sabotage, a diagnostic failure mode in which a high-confidence unimodal error overrides other evidence and misleads the fused result. To analyze such dynamics, we propose a lightweight, model-agnostic evaluation layer that treats each modality as an agent, producing candidate labels and a brief self-assessment used for auditing. A simple fusion mechanism aggregates these outputs, exposing contributors (modalities supporting correct outcomes) and saboteurs (modalities that mislead). Applying our diagnostic layer in a case study on multimodal emotion recognition benchmarks with foundation models revealed systematic reliability profiles, providing insight into whether failures may arise from dataset artifacts or model limitations. More broadly, our framework offers a diagnostic scaffold for multimodal reasoning, supporting principled auditing of fusion dynamics and informing possible interventions.
comment: Accepted at the Multimodal Algorithmic Reasoning (MAR) Workshop, NeurIPS 2025
From Solo to Symphony: Orchestrating Multi-Agent Collaboration with Single-Agent Demos
Training a team of agents from scratch in multi-agent reinforcement learning (MARL) is highly inefficient, much like asking beginners to play a symphony together without first practicing solo. Existing methods, such as offline or transferable MARL, can ease this burden, but they still rely on costly multi-agent data, which often becomes the bottleneck. In contrast, solo experiences are far easier to obtain in many important scenarios, e.g., collaborative coding, household cooperation, and search-and-rescue. To unlock their potential, we propose Solo-to-Collaborative RL (SoCo), a framework that transfers solo knowledge into cooperative learning. SoCo first pretrains a shared solo policy from solo demonstrations, then adapts it for cooperation during multi-agent training through a policy fusion mechanism that combines an MoE-like gating selector and an action editor. Experiments across diverse cooperative tasks show that SoCo significantly boosts the training efficiency and performance of backbone algorithms. These results demonstrate that solo demonstrations provide a scalable and effective complement to multi-agent data, making cooperative learning more practical and broadly applicable.
Modeling Hawkish-Dovish Latent Beliefs in Multi-Agent Debate-Based LLMs for Monetary Policy Decision Classification
Accurately forecasting central bank policy decisions, particularly those of the Federal Open Market Committee (FOMC), has become increasingly important amid heightened economic uncertainty. While prior studies have used monetary policy texts to predict rate changes, most rely on static classification models that overlook the deliberative nature of policymaking. This study proposes a novel framework that structurally imitates the FOMC's collective decision-making process by modeling multiple large language models (LLMs) as interacting agents. Each agent begins with a distinct initial belief and produces a prediction based on both qualitative policy texts and quantitative macroeconomic indicators. Through iterative rounds, agents revise their predictions by observing the outputs of others, simulating deliberation and consensus formation. To enhance interpretability, we introduce a latent variable representing each agent's underlying belief (e.g., hawkish or dovish), and we theoretically demonstrate how this belief mediates the perception of input information and interaction dynamics. Empirical results show that this debate-based approach significantly outperforms standard LLM-based baselines in prediction accuracy. Furthermore, the explicit modeling of beliefs provides insights into how individual perspectives and social influence shape collective policy forecasts.
comment: PRIMA2025 Accepted
Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning
We study the problem of learning multi-task, multi-agent policies for cooperative, temporal objectives, under centralized training, decentralized execution. In this setting, using automata to represent tasks enables the decomposition of complex tasks into simpler sub-tasks that can be assigned to agents. However, existing approaches remain sample-inefficient and are limited to the single-task case. In this work, we present Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning (ACC-MARL), a framework for learning task-conditioned, decentralized team policies. We identify the main challenges to ACC-MARL's feasibility in practice, propose solutions, and prove the correctness of our approach. We further show that the value functions of learned policies can be used to assign tasks optimally at test time. Experiments show emergent task-aware, multi-step coordination among agents, e.g., pressing a button to unlock a door, holding the door, and short-circuiting tasks.
Optimizing Multi-Lane Intersection Performance in Mixed Autonomy Environments
One of the main challenges in managing traffic at multilane intersections is ensuring smooth coordination between human-driven vehicles (HDVs) and connected autonomous vehicles (CAVs). This paper presents a novel traffic signal control framework that combines Graph Attention Networks (GAT) with Soft Actor-Critic (SAC) reinforcement learning to address this challenge. GATs are used to model the dynamic graph-structured nature of traffic flow to capture spatial and temporal dependencies between lanes and signal phases. The proposed SAC is a robust off-policy reinforcement learning algorithm that enables adaptive signal control through entropy-optimized decision making. This design allows the system to coordinate the signal timing and vehicle movement simultaneously with objectives focused on minimizing travel time, enhancing performance, ensuring safety, and improving fairness between HDVs and CAVs. The model is evaluated using a SUMO-based simulation of a four-way intersection, incorporating different traffic densities and CAV penetration rates. The experimental results demonstrate the effectiveness of the GAT-SAC approach by achieving a 24.1% reduction in average delay and up to 29.2% fewer traffic violations compared to traditional methods. Additionally, the fairness ratio between HDVs and CAVs improved to 1.59, indicating more equitable treatment across vehicle types. These findings suggest that the GAT-SAC framework holds significant promise for real-world deployment in mixed-autonomy traffic systems.
Census-Based Population Autonomy For Distributed Robotic Teaming
Collaborating teams of robots show promise due to their ability to complete missions more efficiently and with improved robustness, attributes that are particularly useful for systems operating in marine environments. A key issue is how to model, analyze, and design these multi-robot systems to realize the full benefits of collaboration, a challenging task since the domain of multi-robot autonomy encompasses both collective and individual behaviors. This paper introduces a layered model of multi-robot autonomy that uses the principle of census, or a weighted count of the inputs from neighbors, for collective decision-making about teaming, coupled with multi-objective behavior optimization for individual decision-making about actions. The census component is expressed as a nonlinear opinion dynamics model and the multi-objective behavior optimization is accomplished using interval programming. This model can be reduced to recover foundational algorithms in distributed optimization and control, while the full model enables new types of collective behaviors that are useful in real-world scenarios. To illustrate these points, a new method for distributed optimization of subgroup allocation is introduced where robots use a gradient descent algorithm to minimize portions of the cost functions that are locally known, while being influenced by the opinion states from neighbors to account for the unobserved costs. With this method the group can collectively use the information contained in the Hessian matrix of the total global cost. The utility of this model is experimentally validated in three categorically different experiments with fleets of autonomous surface vehicles: an adaptive sampling scenario, a high value unit protection scenario, and a competitive game of capture the flag.
comment: 16 pages, 17 figures
CPU-Based Layout Design for Picker-to-Parts Pallet Warehouses
Picker-to-parts pallet warehouses often face inefficiencies due to conventional layouts causing excessive travel distances and high labor requirements. This study introduces a novel layout design inspired by CPU architecture, partitioning warehouse space into specialized zones, namely Performance (P), Efficiency (E), and Shared (S). Discrete-event simulation is used to evaluate this design against traditional rectangular (random and ABC storage) and Flying-V layouts. Results demonstrate significant improvements in throughput time and reduced labor requirements, highlighting the potential for CPU-based layouts in optimizing warehouse operations.
comment: 15 pages, 10 figures, conference
Strategic Communication and Language Bias in Multi-Agent LLM Coordination
Large Language Model (LLM)-based agents are increasingly deployed in multi-agent scenarios where coordination is crucial but not always assured. Research shows that the way strategic scenarios are framed linguistically can affect cooperation. This paper explores whether allowing agents to communicate amplifies these language-driven effects. Leveraging FAIRGAME, we simulate one-shot and repeated games across different languages and models, both with and without communication. Our experiments, conducted with two advanced LLMs (GPT-4o and Llama 4 Maverick), reveal that communication significantly influences agent behavior, though its impact varies by language, personality, and game structure. These findings underscore the dual role of communication in fostering coordination and reinforcing biases.
Osprey: A Scalable Framework for the Orchestration of Agentic Systems
Coordinating workflows across complex systems remains a central challenge in safety-critical environments such as scientific facilities. Language-model-driven agents offer a natural interface for these tasks, but existing approaches often lack scalability, reliability, and human oversight. We introduce the Osprey Framework, a domain-agnostic, production-ready architecture for scalable agentic systems that integrate conversational context with robust tool orchestration across safety-critical domains. Our framework provides: (i) dynamic capability classification to select only relevant tools; (ii) plan-first orchestration with explicit dependencies and optional human approval; (iii) context-aware task extraction that combines dialogue history with external memory and domain resources; and (iv) production-ready execution with checkpointing, artifact management, and modular deployment. We demonstrate its versatility through two case studies: a deployment at the Advanced Light Source particle accelerator and a tutorial-style wind farm monitoring example. These results establish Osprey as a reliable and transparent framework for agentic systems across diverse high-stakes domains.
I Want to Break Free! Persuasion and Anti-Social Behavior of LLMs in Multi-Agent Settings with Social Hierarchy
As LLM-based agents become increasingly autonomous and interact more freely with each other, studying the interplay among them becomes crucial to anticipate emergent phenomena and potential risks. In this work, we provide an in-depth analysis of the interactions among agents within a simulated hierarchical social environment, drawing inspiration from the Stanford Prison Experiment. Leveraging 2,400 conversations across six LLMs (i.e., LLama3, Orca2, Command-r, Mixtral, Mistral2, and gpt4.1) and 240 experimental scenarios, we analyze persuasion and anti-social behavior between a guard and a prisoner agent with differing objectives. We first document model-specific conversational failures in this multi-agent power dynamic context, thereby narrowing our analytic sample to 1,600 conversations. Among models demonstrating successful interaction, we find that goal setting significantly influences persuasiveness but not anti-social behavior. Moreover, agent personas, especially the guard's, substantially impact both successful persuasion by the prisoner and the manifestation of anti-social actions. Notably, we observe the emergence of anti-social conduct even in the absence of explicit negative personality prompts. These results have important implications for the development of interactive LLM agents and the ongoing discussion of their societal impact.
Tongyi DeepResearch Technical Report
We present Tongyi DeepResearch, an agentic large language model, which is specifically designed for long-horizon, deep information-seeking research tasks. To incentivize autonomous deep research agency, Tongyi DeepResearch is developed through an end-to-end training framework that combines agentic mid-training and agentic post-training, enabling scalable reasoning and information seeking across complex tasks. We design a highly scalable data synthesis pipeline that is fully automatic, without relying on costly human annotation, and empowers all training stages. By constructing customized environments for each stage, our system enables stable and consistent interactions throughout. Tongyi DeepResearch, featuring 30.5 billion total parameters, with only 3.3 billion activated per token, achieves state-of-the-art performance across a range of agentic deep research benchmarks, including Humanity's Last Exam, BrowseComp, BrowseComp-ZH, WebWalkerQA, xbench-DeepSearch, FRAMES and xbench-DeepSearch-2510. We open-source the model, framework, and complete solutions to empower the community.
comment: https://tongyi-agent.github.io/blog
When Is Diversity Rewarded in Cooperative Multi-Agent Learning?
The success of teams in robotics, nature, and society often depends on the division of labor among diverse specialists; however, a principled explanation for when such diversity surpasses a homogeneous team is still missing. Focusing on multi-agent task allocation problems, we study this question from the perspective of reward design: what kinds of objectives are best suited for heterogeneous teams? We first consider an instantaneous, non-spatial setting where the global reward is built by two generalized aggregation operators: an inner operator that maps the $N$ agents' effort allocations on individual tasks to a task score, and an outer operator that merges the $M$ task scores into the global team reward. We prove that the curvature of these operators determines whether heterogeneity can increase reward, and that for broad reward families this collapses to a simple convexity test. Next, we ask what incentivizes heterogeneity to emerge when embodied, time-extended agents must learn an effort allocation policy. To study heterogeneity in such settings, we use multi-agent reinforcement learning (MARL) as our computational paradigm, and introduce Heterogeneity Gain Parameter Search (HetGPS), a gradient-based algorithm that optimizes the parameter space of underspecified MARL environments to find scenarios where heterogeneity is advantageous. Across different environments, we show that HetGPS rediscovers the reward regimes predicted by our theory to maximize the advantage of heterogeneity, both validating HetGPS and connecting our theoretical insights to reward design in MARL. Together, these results help us understand when behavioral diversity delivers a measurable benefit.
Communicating Plans, Not Percepts: Scalable Multi-Agent Coordination with Embodied World Models NeurIPS 2025
Robust coordination is critical for effective decision-making in multi-agent systems, especially under partial observability. A central question in Multi-Agent Reinforcement Learning (MARL) is whether to engineer communication protocols or learn them end-to-end. We investigate this dichotomy using embodied world models. We propose and compare two communication strategies for a cooperative task-allocation problem. The first, Learned Direct Communication (LDC), learns a protocol end-to-end. The second, Intention Communication, uses an engineered inductive bias: a compact, learned world model, the Imagined Trajectory Generation Module (ITGM), which uses the agent's own policy to simulate future states. A Message Generation Network (MGN) then compresses this plan into a message. We evaluate these approaches on goal-directed interaction in a grid world, a canonical abstraction for embodied AI problems, while scaling environmental complexity. Our experiments reveal that while emergent communication is viable in simple settings, the engineered, world model-based approach shows superior performance, sample efficiency, and scalability as complexity increases. These findings advocate for integrating structured, predictive models into MARL agents to enable active, goal-driven coordination.
comment: Published in the Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Scaling Environments for Agents (SEA). Additionally accepted for presentation in the NeurIPS 2025 Workshop: Embodied World Models for Decision Making (EWM) and the NeurIPS 2025 Workshop: Optimization for Machine Learning (OPT)
Co-Evolving Complexity: An Adversarial Framework for Automatic MARL Curricula NeurIPS 2025
The advancement of general-purpose intelligent agents is intrinsically linked to the environments in which they are trained. While scaling models and datasets has yielded remarkable capabilities, scaling the complexity, diversity, and interactivity of environments remains a crucial bottleneck. Hand-crafted environments are finite and often contain implicit biases, limiting the potential for agents to develop truly generalizable and robust skills. In this work, we propose a paradigm for generating a boundless and adaptive curriculum of challenges by framing the environment generation process as an adversarial game. We introduce a system where a team of cooperative multi-agent defenders learns to survive against a procedurally generative attacker. The attacker agent learns to produce increasingly challenging configurations of enemy units, dynamically creating novel worlds tailored to exploit the defenders' current weaknesses. Concurrently, the defender team learns cooperative strategies to overcome these generated threats. This co-evolutionary dynamic creates a self-scaling environment where complexity arises organically from the adversarial interaction, providing an effectively infinite stream of novel and relevant training data. We demonstrate that with minimal training, this approach leads to the emergence of complex, intelligent behaviors, such as flanking and shielding by the attacker, and focus-fire and spreading by the defenders. Our findings suggest that adversarial co-evolution is a powerful mechanism for automatically scaling environmental complexity, driving agents towards greater robustness and strategic depth.
comment: Published in the proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Scaling Environments for Agents (SEA)
Generative World Models of Tasks: LLM-Driven Hierarchical Scaffolding for Embodied Agents NeurIPS 2025
Recent advances in agent development have focused on scaling model size and raw interaction data, mirroring successes in large language models. However, for complex, long-horizon multi-agent tasks such as robotic soccer, this end-to-end approach often fails due to intractable exploration spaces and sparse rewards. We propose that an effective world model for decision-making must model the world's physics and also its task semantics. A systematic review of 2024 research in low-resource multi-agent soccer reveals a clear trend towards integrating symbolic and hierarchical methods, such as Hierarchical Task Networks (HTNs) and Bayesian Strategy Networks (BSNs), with multi-agent reinforcement learning (MARL). These methods decompose complex goals into manageable subgoals, creating an intrinsic curriculum that shapes agent learning. We formalize this trend into a framework for Hierarchical Task Environments (HTEs), which are essential for bridging the gap between simple, reactive behaviors and sophisticated, strategic team play. Our framework incorporates the use of Large Language Models (LLMs) as generative world models of tasks, capable of dynamically generating this scaffolding. We argue that HTEs provide a mechanism to guide exploration, generate meaningful learning signals, and train agents to internalize hierarchical structure, enabling the development of more capable and general-purpose agents with greater sample efficiency than purely end-to-end approaches.
comment: In the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Embodied World Models for Decision Making (EWM)
H-NeiFi: Non-Invasive and Consensus-Efficient Multi-Agent Opinion Guidance
The openness of social media enables the free exchange of opinions, but it also presents challenges in guiding opinion evolution towards global consensus. Existing methods often directly modify user views or enforce cross-group connections. These intrusive interventions undermine user autonomy, provoke psychological resistance, and reduce the efficiency of global consensus. Additionally, due to the lack of a long-term perspective, promoting local consensus often exacerbates divisions at the macro level. To address these issues, we propose the hierarchical, non-intrusive opinion guidance framework, H-NeiFi. It first establishes a two-layer dynamic model based on social roles, considering the behavioral characteristics of both experts and non-experts. Additionally, we introduce a non-intrusive neighbor filtering method that adaptively controls user communication channels. Using multi-agent reinforcement learning (MARL), we optimize information propagation paths through a long-term reward function, avoiding direct interference with user interactions. Experiments show that H-NeiFi increases consensus speed by 22.0% to 30.7% and maintains global convergence even in the absence of experts. This approach enables natural and efficient consensus guidance by protecting user interaction autonomy, offering a new paradigm for social network governance.
Systems and Control (CS)
LLM-Supported Formal Knowledge Representation for Enhancing Control Engineering Content with an Interactive Semantic Layer
The rapid growth of research output in control engineering calls for new approaches to structure and formalize domain knowledge. This paper briefly describes an LLM-supported method for semi-automated generation of formal knowledge representations that combine human readability with machine interpretability and increased expressiveness. Based on the Imperative Representation of Knowledge (PyIRK) framework, we demonstrate how language models can assist in transforming natural-language descriptions and mathematical definitions (available as LaTeX source code) into a formalized knowledge graph. As a first application we present the generation of an ``interactive semantic layer'' to enhance the source documents in order to facilitate knowledge transfer. From our perspective this contributes to the vision of easily accessible, collaborative, and verifiable knowledge bases for the control engineering domain.
comment: 4 pages, 2 figures
Adjustable Low-Cost Highly Sensitive Microwave Oscillator Sensor for Liquid Level Detection
This paper explores the implementation of a low-cost high-precision microwave oscillator sensor with an adjustable input resistance to enhance its limit of detection (LoD). To achieve this, we introduce a \textit{Z$_{2}$} branch in the input network, comprising a transmission line, a capacitor (\textit{C$_{B}$}) and a resistor (\textit{R$_{V}$}). The sensor is tested with eight different liquids with different dielectric constants, including water, IV fluid, milk, ethanol, acetone, petrol, olive oil, and Vaseline. By fine-tuning the \textit{Z$_{2}$} branch, a clear relationship is found between $\varepsilon_{r}$ of materials and \textit{R$_{V}$}. Our experimental results demonstrate outstanding characteristics, including remarkable linearity (nonlinearity < 2.44\%), high accuracy with an average sensitivity of 21 kHz/$\mu$m, and an excellent limit of detection (LoD < 0.05 mm). The sensor also exhibits good stability across a range of liquid temperatures and shows robust and repeatable behavior. Considering the strong absorption of microwave energy in liquids with high dielectric constants, this oscillator sensor is a superior choice over capacitive sensors for such applications. We validate the performance of the oscillator sensor using water as a representative liquid. Additionally, we substantiate the sensor's improvement through both experimental results and theoretical analysis. Its advantages, including affordability, compatibility with CMOS and MEMS technologies, and ease of fabrication, make it an excellent choice for small-scale liquid detection applications.
An unscented Kalman filter method for real time input-parameter-state estimation
The input-parameter-state estimation capabilities of a novel unscented Kalman filter are examined herein on both linear and nonlinear systems. The unknown input is estimated in two stages within each time step. First, the predicted dynamic states and the system parameters provide an estimation of the input. Second, the states and parameters corrected with measurements provide a final estimation. Importantly, it is demonstrated using perturbation analysis that a system with at least a zero or a non-zero known input can potentially be uniquely identified. This output-only methodology allows for a better understanding of the system compared to classical output-only parameter identification strategies, given that all the dynamic states, the parameters, and the input are estimated jointly and in real-time.
comment: author-accepted manuscript (AAM) published in Mechanical Systems and Signal Processing
Policy Gradient Methods for Information-Theoretic Opacity in Markov Decision Processes
Opacity, or non-interference, is a property ensuring that an external observer cannot infer confidential information (the "secret") from system observations. We introduce an information-theoretic measure of opacity, which quantifies information leakage using the conditional entropy of the secret given the observer's partial observations in a system modeled as a Markov decision process (MDP). Our objective is to find a control policy that maximizes opacity while satisfying task performance constraints, assuming that an informed observer is aware of the control policy and system dynamics. Specifically, we consider a class of opacity called state-based opacity, where the secret is a propositional formula about the past or current state of the system, and a special case of state-based opacity called language-based opacity, where the secret is defined by a temporal logic formula (LTL) or a regular language recognized by a finite-state automaton. First, we prove that finite-memory policies can outperform Markov policies in optimizing information-theoretic opacity. Second, we develop an algorithm to compute a maximally opaque Markov policy using a primal-dual gradient-based algorithm, and prove its convergence. Since opacity cannot be expressed as a cumulative cost, we develop a novel method to compute the gradient of conditional entropy with respect to policy parameters using observable operators in hidden Markov models. The experimental results validate the effectiveness and optimality of our proposed methods.
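The opacity measure in the abstract above is the conditional entropy of the secret given the observer's observations. A minimal sketch of that quantity, assuming a small explicit joint distribution over (secret, observation) pairs rather than one induced by an MDP and a policy, might look as follows. The function name `conditional_entropy` and the dictionary input format are illustrative assumptions, not the paper's algorithm.

```python
import math
from collections import defaultdict


def conditional_entropy(joint):
    """Return H(S | O) in bits.

    `joint` maps (secret, observation) pairs to probabilities that sum
    to 1. This is the generic information-theoretic definition, not the
    paper's policy-gradient computation.
    """
    # Marginal probability of each observation.
    p_obs = defaultdict(float)
    for (_, obs), p in joint.items():
        p_obs[obs] += p
    # H(S | O) = -sum_{s,o} p(s, o) * log2( p(s, o) / p(o) )
    h = 0.0
    for (_, obs), p in joint.items():
        if p > 0.0:
            h -= p * math.log2(p / p_obs[obs])
    return h
```

A value of 0 means the observations reveal the secret completely. Higher values mean the observer remains more uncertain, which is the direction an opacity-maximizing policy pushes.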
Feedback dynamics in Politics: The interplay between sentiment and engagement
We investigate feedback mechanisms in political communication by testing whether politicians adapt the sentiment of their messages in response to public engagement. Using over 1.5 million tweets from Members of Parliament in the United Kingdom, Spain, and Greece during 2021, we identify sentiment dynamics through a simple yet interpretable linear model. The analysis reveals a closed-loop behavior: engagement with positive and negative messages influences the sentiment of subsequent posts. Moreover, the learned coefficients highlight systematic differences across political roles: opposition members are more reactive to negative engagement, whereas government officials respond more to positive signals. These results provide a quantitative, control-oriented view of behavioral adaptation in online politics, showing how feedback principles can explain the self-reinforcing dynamics that emerge in social media discourse.
comment: 6 pages, 7 figures
Stochastic Redistribution of Indistinguishable Items in Shared Habitation: A Multi-Agent Simulation Framework
This paper presents a discrete-event stochastic model for the redistribution of indistinguishable personal items, exemplified by socks, among multiple cohabitants sharing a communal laundry system. Drawing on concepts from ecological population dynamics, diffusion processes, and stochastic exchange theory, the model captures the probabilistic mechanisms underlying item mixing, recovery, and loss. Each cohabitant is represented as an autonomous agent whose belongings interact through iterative cycles of collective washing, sorting, and partial correction. The system's evolution is characterized by random mixing events, selective recollection, and attrition over time. Implemented using the SimPy discrete-event simulation framework, the model demonstrates that even minimal exchange probabilities can generate emergent asymmetries, quasi-equilibrium distributions, and long-term disorder. The findings illustrate how stochastic processes inherent to shared domestic systems can produce persistent imbalances, offering a quantitative perspective on an everyday social phenomenon.
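The mixing, recovery, and loss mechanism described above can be sketched with a plain Python loop. This is not the paper's SimPy implementation: it is a simplified cycle-by-cycle sketch, and the function name `simulate_laundry` and the parameters `p_swap` and `p_loss` are hypothetical choices made for illustration.

```python
import random


def simulate_laundry(n_agents=3, socks_each=10, cycles=50,
                     p_swap=0.05, p_loss=0.01, seed=0):
    """Return per-agent sock counts after repeated shared wash cycles."""
    rng = random.Random(seed)
    counts = [socks_each] * n_agents
    for _ in range(cycles):
        new_counts = [0] * n_agents
        for holder, n in enumerate(counts):
            for _ in range(n):
                u = rng.random()
                if u < p_loss:
                    continue  # attrition: the sock disappears from the system
                if u < p_loss + p_swap:
                    # Mixing: the sock ends up with a different cohabitant.
                    others = [a for a in range(n_agents) if a != holder]
                    new_counts[rng.choice(others)] += 1
                else:
                    new_counts[holder] += 1  # sorted back correctly
        counts = new_counts
    return counts


print(simulate_laundry())
```

Even with small exchange probabilities, repeated cycles let the counts drift apart, which is the kind of emergent asymmetry the abstract refers to.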
Natural-gas storage modelling by deep reinforcement learning
We introduce GasRL, a simulator that couples a calibrated representation of the natural gas market with a model of storage-operator policies trained with deep reinforcement learning (RL). We use it to analyse how optimal stockpile management affects equilibrium prices and the dynamics of demand and supply. We test various RL algorithms and find that Soft Actor Critic (SAC) exhibits superior performance in the GasRL environment: multiple objectives of storage operators - including profitability, robust market clearing and price stabilisation - are successfully achieved. Moreover, the equilibrium price dynamics induced by SAC-derived optimal policies have characteristics, such as volatility and seasonality, that closely match those of real-world prices. Remarkably, this adherence to the historical distribution of prices is obtained without explicitly calibrating the model to price data. We show how the simulator can be used to assess the effects of EU-mandated minimum storage thresholds. We find that such thresholds have a positive effect on market resilience against unanticipated shifts in the distribution of supply shocks. For example, with unusually large shocks, market disruptions are averted more often if a threshold is in place.
comment: 8 pages, 5 figures, published on
ISAC Empowered Air-Sea Collaborative System: A UAV-USV Joint Inspection Framework
In this paper, we construct an air-sea collaborative system framework based on the Integrated Sensing and Communication (ISAC) techniques, where the Unmanned Aerial Vehicle (UAV) and Unmanned Surface Vehicle (USV) jointly inspect targets of interest while simultaneously maintaining communication with each other. First, we demonstrate the unique challenges encountered in this collaborative system, i.e., the coupling and heterogeneity of the UAV/USV's trajectories. Then, we formulate a total energy consumption minimization problem to jointly optimize the trajectories, flying and hovering times, target scheduling, and beamformers under the constraints of water currents, collision avoidance, and Sensing and Communication (S&C) requirements. To address the strong coupling of the variables, we divide the original problem into two subproblems, namely, the hover point selection and the joint trajectory planning and beamforming design. In the first subproblem, we propose a three-step hierarchical method including: (1) a virtual base station coverage (VBSC) and clustering algorithm to obtain the target scheduling and rough position of hover points; (2) a Bi-traveling salesman problem with neighborhood (Bi-TSPN)-based algorithm to determine the visiting order sequence of the hover points; (3) a hover point refinement and time allocation algorithm to further optimize the time allocation. In the latter subproblem, we complete the remaining trajectory planning and beamforming design in each flying and hovering stage by developing a semi-definite relaxation (SDR) and successive convex approximation (SCA) method. Finally, we conduct a series of simulations to demonstrate the superiority of the proposed scheme over existing sequential access and leader-follower strategies.
comment: 13 pages, 15 figures
Analytical Framework for Assessing Effective Regional Inertia
This paper proposes a novel formulation of effective regional inertia that explicitly accounts for both system topology and the spatial distribution of inertia. Unlike traditional approaches that model a region as an aggregated machine with an equivalent inertia, the proposed metric provides a topology-aware representation. The methodology builds on an analytical framework that extends classical slow coherency theory to address network partitioning and regional frequency stability. Based on these partitions, we develop a systematic procedure to evaluate the effective inertia of each region, enabling a more accurate interpretation of local inertial contributions, including those from virtual inertia provided by inverter-based resources (IBRs). Case studies on the IEEE 39-bus and 68-bus systems demonstrate that the integration of inertial devices does not uniformly improve system frequency response, underscoring the importance of the proposed metric for effective regional inertia assessment.
Reliability entails input-selective contraction and regulation in excitable networks
The animal nervous system offers a model of computation combining digital reliability and analog efficiency. Understanding how this sweet spot can be realized is a core question of neuromorphic engineering. To this aim, this paper explores the connection between reliability, contraction, and regulation in excitable systems. Using the FitzHugh-Nagumo model of excitable behavior as a proof-of-concept, it is shown that neuronal reliability can be formalized as an average trajectory contraction property induced by the input. In excitable networks, reliability is shown to enable regulation of the network to a robustly stable steady state. It is thus posited that regulation provides a notion of dynamical analog computation, and that stability makes such a computation model robust.
Many-vs-Many Missile Guidance via Virtual Targets
This paper presents a novel approach to many-vs-many missile guidance using virtual targets (VTs) generated by a Normalizing Flows-based trajectory predictor. Rather than assigning n interceptors directly to m physical targets through conventional weapon target assignment algorithms, we propose a centralized strategy that constructs n VT trajectories representing probabilistic predictions of maneuvering target behavior. Each interceptor is guided toward its assigned VT using Zero-Effort-Miss guidance during midcourse flight, transitioning to Proportional Navigation guidance for terminal interception. This approach treats many-vs-many engagements as many-vs-distribution scenarios, exploiting numerical superiority (n > m) by distributing interceptors across diverse trajectory hypotheses rather than pursuing identical deterministic predictions. Monte Carlo simulations across various target-interceptor configurations (1-6 targets, 1-8 interceptors) demonstrate that the VT method matches or exceeds baseline straight-line prediction performance by 0-4.1% when n = m, with improvements increasing to 5.8-14.4% when n > m. The results confirm that probabilistic VTs enable effective exploitation of numerical superiority, significantly increasing interception probability in many-vs-many scenarios.
comment: will be submitted to Journal of Guidance, Control, and Dynamics as Technical Note
Decentralized Approach to Detect and Eliminate Flapping Phenomena due to Flexible Resources
This paper presents a decentralized methodology for detecting and mitigating flapping phenomena in power systems, primarily caused by the operation of discrete devices. The proposed approach applies moving-window autocorrelation to local measurements, enabling each device to autonomously identify sustained oscillations. Upon detection, a probabilistic, device-specific mitigation strategy is executed. Flexible demand resources (DFRs), under-load tap changers (ULTCs), and automatic voltage regulators (AVRs) are utilised to illustrate the performance of the proposed approach on both discrete and continuous-operation devices. Results show that the proposed method is robust and properly distinguishes damped oscillations from persistent flapping, allowing devices to independently recognize problematic operating scenarios and implement corrective actions accordingly.
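A minimal sketch of moving-window autocorrelation on a local measurement is given below. It only illustrates the general idea of telling a persistent oscillation from a damped one; the function names, window length, lag, threshold, and energy floor are all hypothetical values chosen here, and the paper's actual detector and mitigation logic may differ.

```python
import math


def autocorr(window, lag):
    """Normalized autocorrelation of a mean-removed window at a given lag."""
    n = len(window)
    mean = sum(window) / n
    x = [v - mean for v in window]
    energy = sum(v * v for v in x)
    if energy < 1e-9:
        return 0.0  # (assumed floor) flat signal: nothing left to correlate
    return sum(x[i] * x[i + lag] for i in range(n - lag)) / energy


def is_flapping(signal, window=40, lag=10, threshold=0.6):
    """True if the most recent window still shows a strong periodic component at `lag`."""
    if len(signal) < window:
        return False
    return autocorr(signal[-window:], lag) > threshold


# Persistent flapping: a square wave with a period of 10 samples.
sustained = [1.0 if (t // 5) % 2 == 0 else -1.0 for t in range(200)]
# Damped oscillation with the same period that dies out over time.
damped = [math.exp(-0.1 * t) * math.sin(2 * math.pi * t / 10) for t in range(200)]

print(is_flapping(sustained), is_flapping(damped))
```

Because each device evaluates only its own recent samples, a check of this kind needs no communication, which matches the decentralized setting described in the abstract.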
Before AI Takes Over: Rethinking Nonlinear Signal Processing in Communications
There is an urgent need to reflect on traditional nonlinear signal processing methods in communications before Artificial Intelligence (AI) dominates the field. This implies reassessing or reinterpreting established theories and tools, and highlights the tension between data-driven and model-based approaches. This paper calls for preserving valuable insights from classical signal processing while exploring how they can coexist or integrate with emerging AI methods.
comment: Submitted to npj Wireless Technology
Coherency among Power System Devices
The paper proposes a novel general definition of coherency among power system devices of any type. The proposed approach is thus not limited to synchronous machines. With this aim, the paper shows that coherency can be formally based on the difference in the complex frequency of the current injections of any two devices electrically connected to the same grid. The proposed definition is model-agnostic, making it general and suitable for modern power systems composed of a heterogeneous mix of technologies. The paper also provides a systematic analytical procedure to study the properties that specific device models must satisfy to be coherent. Time-domain simulations are conducted in three case studies whose results illustrate the ability of our definition to evaluate coherency among any type of device.
Using ensemble learning with hybrid graph neural networks and transformers to predict traffic in cities
Intelligent transportation systems (ITS) still have a hard time accurately predicting traffic in cities, especially in big, multimodal settings with complicated spatiotemporal dynamics. This paper presents HybridST, a hybrid architecture that integrates Graph Neural Networks (GNNs), multi-head temporal Transformers, and supervised ensemble learning methods (XGBoost or Random Forest) to collectively capture spatial dependencies, long-range temporal patterns, and exogenous signals, including weather, calendar, or control states. We test our model on three public benchmark datasets: METR-LA, PEMS-BAY, and Seattle Loop. These datasets include situations ranging from freeway sensor networks to vehicle-infrastructure cooperative perception. Experimental results show that HybridST consistently beats classical baselines (LSTM, GCN, DCRNN, PDFormer) on important metrics like MAE and RMSE, while still being very scalable and easy to understand. The proposed framework presents a promising avenue for real-time urban mobility planning, energy optimization, and congestion alleviation strategies, especially within the framework of smart cities and significant events such as the 2030 FIFA World Cup.
Generalized Swing Control Framework for Inverter-based Resources
This paper proposes a novel control framework designed for Inverter-Based Resources (IBRs), denoted as Generalized Swing Control (GSC). The proposed GSC framework generalizes the definition of Grid-Forming (GFM) control schemes and exploits the coupling between active and reactive power dynamics. To validate the proposed scheme, we conduct extensive time-domain simulations and small-signal analysis using a modified version of the WSCC 9-bus system and a 1479-bus dynamic model of the all-island Irish transmission system. The case studies focus on evaluating the dynamic performance of the proposed framework under different configurations, including Virtual Synchronous Machine (VSM), coupled-VSM and dual-VSM schemes. To address the nonlinear nature of power system dynamics, sensitivity analysis based on Monte Carlo methods is employed to improve parameter tuning and assess the stability of GSC configurations in the studied systems.
Decentralized Voltage Control of AC Microgrids with Constant Power Loads using Control Barrier Functions
This paper proposes a novel nonlinear decentralized voltage controller for constrained regulation of meshed AC Microgrid networks with high penetration of constant power loads. Perceiving the load demand as an unknown disturbance, the network model is reformulated in a cascaded structure composed of a nominal, i.e. uncertainty-free, and an error subsystem. The latter captures the distance between the true and the nominal state trajectories, for which we prove boundedness via a suitable control barrier function. Under sufficient conditions, we prove asymptotic stability of the cascaded dynamics with respect to an equilibrium set and also provide an estimate of the region of attraction. In addition, it is rigorously shown that the proposed nonlinear control law also enforces constrained regulation around a rated voltage value, without the need of saturation devices. The operation of the closed-loop system is illustrated in a simulation scenario, demonstrating bounded operation and convergence to a neighbourhood of the desired reference vector.
comment: 12 pages
Explicit MPC for the constrained zonotope case with low-rank matrix updates
Solving the explicit Model Predictive Control (MPC) problem requires enumerating all critical regions and their associated feedback laws, a task that scales exponentially with both the system dimension and the prediction horizon. When the problem's constraints are boxes or zonotopes, the feasible domain admits a compact constrained-zonotope representation. Building on this insight, we exploit the geometric properties of the equivalent constrained-zonotope reformulation to accelerate the computation of the explicit solution. Specifically, we formulate the multi-parametric problem in the lifted generator space and solve it using second-order optimality conditions, employ low-rank matrix updates to reduce computation time, and introduce an analytic enumeration of candidate active sets that yields the explicit solution in tree form.
A Kullback-Leibler divergence method for input-system-state identification
The capability of a novel Kullback-Leibler divergence method is examined herein within the Kalman filter framework to select the input-parameter-state estimation execution with the most plausible results. This identification suffers from the uncertainty related to obtaining different results from different initial parameter set guesses, and the examined approach uses the information gained from the data in going from the prior to the posterior distribution to address the issue. First, the Kalman filter is run for a number of different initial parameter sets, each providing a system input-parameter-state estimation. Second, the resulting posterior distributions are compared simultaneously to the initial prior distributions using the Kullback-Leibler divergence. Finally, the identification with the least Kullback-Leibler divergence is selected as the one with the most plausible results. Importantly, the method is shown to select the better-performing identification in linear, nonlinear, and limited-information applications, providing a powerful tool for system monitoring.
comment: 32 pages, 17 figures, published in Journal of Sound and Vibration
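The selection rule in the abstract above (run the filter from several initial guesses, then keep the run whose posterior has the smallest Kullback-Leibler divergence from its prior) can be sketched for a single scalar Gaussian parameter. This is an illustrative assumption: the paper presumably works with full state and parameter distributions, and the names `kl_gaussian` and `select_run` as well as the example numbers are hypothetical.

```python
import math


def kl_gaussian(mu_post, sd_post, mu_prior, sd_prior):
    """KL( N(mu_post, sd_post^2) || N(mu_prior, sd_prior^2) ) in nats."""
    return (math.log(sd_prior / sd_post)
            + (sd_post ** 2 + (mu_post - mu_prior) ** 2) / (2.0 * sd_prior ** 2)
            - 0.5)


def select_run(runs):
    """Return (index, scores), where index is the run with the smallest divergence."""
    scores = [kl_gaussian(r["mu_post"], r["sd_post"], r["mu_prior"], r["sd_prior"])
              for r in runs]
    best = min(range(len(runs)), key=scores.__getitem__)
    return best, scores


# Two hypothetical filter runs started from different initial parameter guesses.
runs = [
    {"mu_prior": 1.0, "sd_prior": 0.5, "mu_post": 2.0, "sd_post": 0.1},
    {"mu_prior": 2.1, "sd_prior": 0.5, "mu_post": 2.0, "sd_post": 0.1},
]
best, scores = select_run(runs)
print(best, scores)
```

In this toy case the second run is selected because its posterior moved less from its prior; the divergence is used here as a measure of information gained from the data, as described in the abstract.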
Constrained Performance Boosting Control for Nonlinear Systems via ADMM
We present the Alternating Direction Method of Multipliers for Performance Boosting (ADMM-PB), an approach to design performance boosting controllers for stable or pre-stabilized nonlinear systems, while explicitly seeking input and state constraint satisfaction. Rooted on a recently proposed approach for designing neural-network controllers that guarantees closed-loop stability by design while minimizing generic cost functions, our strategy integrates it within an alternating direction method of multipliers routine to seek constraint handling without modifying the controller structure of the aforementioned seminal strategy. Our numerical results showcase the advantages of the proposed approach over a baseline penalizing constraint violation through barrier-like terms in the cost, indicating that ADMM-PB can lead to considerably lower constraint violations at the price of inducing slightly more cautious closed-loop behaviors.
H-Infinity Filter Enhanced CNN-LSTM for Arrhythmia Detection from Heart Sound Recordings
Early detection of heart arrhythmia can prevent severe future complications in cardiac patients. While manual diagnosis still remains the clinical standard, it relies heavily on visual interpretation and is inherently subjective. In recent years, deep learning has emerged as a powerful tool to automate arrhythmia detection, offering improved accuracy, consistency, and efficiency. Several variants of convolutional and recurrent neural network architectures have been widely explored to capture spatial and temporal patterns in physiological signals. However, despite these advancements, current models often struggle to generalize well in real-world scenarios, especially when dealing with small or noisy datasets, which are common challenges in biomedical applications. In this paper, a novel CNN-H-Infinity-LSTM architecture is proposed to identify arrhythmic heart signals from heart sound recordings. This architecture introduces trainable parameters inspired by the H-Infinity filter from control theory, enhancing robustness and generalization. Extensive experimentation on the PhysioNet CinC Challenge 2016 dataset, a public benchmark of heart audio recordings, demonstrates that the proposed model achieves stable convergence and outperforms existing benchmarks, with a test accuracy of 99.42% and an F1 score of 98.85%.
comment: This is a preprint of a paper to appear at the 15th IEEE International Conference on Systems Engineering and Technology (ICSET 2025)
ZJUNlict Extended Team Description Paper 2025
This paper presents the ZJUNlict team's work over the past year, covering both hardware and software advancements. In the hardware domain, the integration of an IMU into the v2023 robot was completed to enhance posture accuracy and angular velocity planning. On the software side, key modules were optimized, including the strategy and CUDA modules, with significant improvements in decision making efficiency, ball pursuit prediction, and ball possession prediction to adapt to high-tempo game dynamics.
Performance Analysis of NOMA-Assisted Optical OFDM ISAC Systems with Clipping Distortion
This paper studies the performance of optical orthogonal frequency-division multiplexing (OFDM)-based multi-user integrated sensing and communication (ISAC) systems employing non-orthogonal multiple access (NOMA). Due to their inherent high peak-to-average power ratio (PAPR), OFDM waveforms are clipped to fit the limited dynamic range of the optical transmitters (e.g., light-emitting diodes (LEDs)), resulting in clipping distortion. To alleviate the impact of the distortion, we propose a novel transmitter architecture where the clipping processes are performed before NOMA superposition coding. We then analyze the performance of the proposed optical ISAC systems considering the effects of power allocation and clipping distortion. For the communication subsystem, we analyze the effect of NOMA on the achievable sum rate and bit error rate (BER). For the sensing subsystem, the root mean square error (RMSE) and Cramér-Rao bound (CRB) of the transmission distance estimate are obtained. Simulation results reveal that allocating more power to the strong user yields a higher sum rate, lower BER, and better sensing performance, whereas a more balanced power allocation among users results in degraded BER and sensing performance.
A Reliability-Cost Optimization Framework for EV and DER Integration in Standard and Reconfigurable Distribution Network Topologies
The rapid growth of electric vehicle (EV) adoption poses operational and economic challenges for power distribution systems, including increased line loading levels and network congestion. This may require infrastructure reinforcement and expansion. As a fast, inexpensive alternative, network topology reconfiguration (NTR) offers a practical means to redistribute power flows, reduce operational costs, and defer infrastructure upgrades. This paper presents a linear programming framework to evaluate the impact of varying EV penetration on operational costs under four configurations: standard distribution network (SDN), SDN with NTR (SDNTR), SDN with distributed energy resources (SDN-DER), and SDNTR with DERs (SDNTR-DER). Numerical simulations are conducted on the IEEE 33-bus system. The analysis demonstrates that integrating DERs reduces operational costs, while NTR further enhances system flexibility, enabling higher EV penetration levels without compromising feasibility. The combined SDNTR-DER approach offers the most cost-effective and reliable pathway for accommodating future EV growth while mitigating the need for immediate infrastructure upgrades.
Online Distributed Zeroth-Order Optimization With Non-Zero-Mean Adverse Noises
In this paper, the problem of online distributed zeroth-order optimization subject to a set constraint is studied via a multi-agent network, where each agent can communicate with its immediate neighbors via a time-varying directed graph. Different from the existing works on online distributed zeroth-order optimization, we consider the case where the estimates of the gradients are influenced by non-zero-mean adverse noises. To handle this problem, we propose a new online distributed zeroth-order mirror descent algorithm involving a kernel function-based estimator and a clipping strategy. In particular, the kernel function-based strategy in the estimator is used to deal with the adverse noises and to eliminate the low-order terms in the Taylor expansions of the objective functions. Furthermore, the performance of the presented algorithm is measured by the dynamic regret, where the offline benchmark is to find the optimal point at each time. Under mild assumptions on the graph and the objective functions, we prove that if the variation in the optimal point sequence grows at a certain rate, then the high-probability bound of the dynamic regret increases sublinearly. Finally, a simulation experiment is carried out to demonstrate the effectiveness of our theoretical results.
Near Optimal Convergence to Coarse Correlated Equilibrium in General-Sum Markov Games
No-regret learning dynamics play a central role in game theory, enabling decentralized convergence to equilibrium for concepts such as Coarse Correlated Equilibrium (CCE) or Correlated Equilibrium (CE). In this work, we improve the convergence rate to CCE in general-sum Markov games, reducing it from the previously best-known rate of $\mathcal{O}(\log^5 T / T)$ to a sharper $\mathcal{O}(\log T / T)$. This matches the best known convergence rate for CE in terms of the number of iterations $T$, while also improving the dependence on the action set size from polynomial to polylogarithmic, yielding exponential gains in high-dimensional settings. Our approach builds on recent advances in adaptive step-size techniques for no-regret algorithms in normal-form games, and extends them to the Markovian setting via a stage-wise scheme that adjusts learning rates based on real-time feedback. We frame policy updates as an instance of Optimistic Follow-the-Regularized-Leader (OFTRL), customized for value-iteration-based learning. The resulting self-play algorithm achieves, to our knowledge, the fastest known convergence rate to CCE in Markov games.
Census-Based Population Autonomy For Distributed Robotic Teaming
Collaborating teams of robots show promise due to their ability to complete missions more efficiently and with improved robustness, attributes that are particularly useful for systems operating in marine environments. A key issue is how to model, analyze, and design these multi-robot systems to realize the full benefits of collaboration, a challenging task since the domain of multi-robot autonomy encompasses both collective and individual behaviors. This paper introduces a layered model of multi-robot autonomy that uses the principle of census, or a weighted count of the inputs from neighbors, for collective decision-making about teaming, coupled with multi-objective behavior optimization for individual decision-making about actions. The census component is expressed as a nonlinear opinion dynamics model and the multi-objective behavior optimization is accomplished using interval programming. This model can be reduced to recover foundational algorithms in distributed optimization and control, while the full model enables new types of collective behaviors that are useful in real-world scenarios. To illustrate these points, a new method for distributed optimization of subgroup allocation is introduced where robots use a gradient descent algorithm to minimize portions of the cost functions that are locally known, while being influenced by the opinion states from neighbors to account for the unobserved costs. With this method the group can collectively use the information contained in the Hessian matrix of the total global cost. The utility of this model is experimentally validated in three categorically different experiments with fleets of autonomous surface vehicles: an adaptive sampling scenario, a high value unit protection scenario, and a competitive game of capture the flag.
comment: 16 pages, 17 figures
Microgrids optimal radial reconfiguration via FORWARD algorithm
Microgrids offer a promising paradigm for integrating distributed energy resources, bolstering energy resilience, and reducing the impact of blackouts. However, their inherent decentralization and dynamic operation present substantial energy management complexities. These complexities, including balancing supply and demand, ensuring system stability, and minimizing operational costs, often necessitate solving computationally intractable NP-hard Mixed-Integer Non-Linear Programming (MINLP) problems. Traditional MINLP solvers struggle with the scalability and feasibility guarantees required for these challenges. To address this, this paper tackles the problem of resource allocation and radial configuration design for microgrid power distribution. It proposes an abstracted problem, which is solved by introducing a permutation-based iterative search method over the recently introduced FORWARD method to efficiently identify feasible, near-optimal radial network structures while inherently respecting physical constraints. Furthermore, this paper investigates the integration of the proposed method as a warm-start strategy for benchmark MINLP solvers, offering a scalable solution for comprehensive microgrid design.
Quantifying Power Systems Resilience Using Statistical Analysis and Bayesian Learning
The increasing frequency and intensity of extreme weather events are significantly affecting the power grid, causing large-scale outages and impacting power system resilience. Yet limited work has been done on systematically modeling the impacts of weather parameters to quantify resilience. This study presents a framework using statistical and Bayesian learning approaches to quantitatively model the relationship between weather parameters and power system resilience metrics. By leveraging real-world publicly available outage and weather data, we identify wind speed, temperature, and precipitation as key weather variables influencing a particular region's resilience metrics. A case study of Cook County, Illinois, and Miami-Dade County, Florida, reveals that these weather parameters are critical factors in resiliency analysis and risk assessment. Additionally, we find that these weather variables have combined effects when studied jointly compared to their effects in isolation. This framework provides valuable insights for understanding how weather events affect power distribution system performance, supporting decision-makers in developing more effective strategies for risk mitigation, resource allocation, and adaptation to changing climatic conditions.
Distributed Incast Detection in Data Center Networks
Incast traffic in data centers can lead to severe performance degradation, such as packet loss and increased latency. Effectively addressing incast requires prompt and accurate detection. Existing solutions, including MA-ECN, BurstRadar and Pulser, typically rely on fixed thresholds of switch port egress queue lengths or their gradients to identify microbursts caused by incast flows. However, these queue-length-based methods often suffer from delayed detection and high error rates. In this study, we propose a distributed, switch-level incast detection method for data center networks, leveraging a probabilistic hypothesis test with an optimal detection threshold. By analyzing the arrival intervals of new flows, our algorithm can immediately determine from a flow's initial packet whether it is part of incast traffic. The experimental results demonstrate that our method offers significant improvements over existing approaches in both detection speed and inference accuracy.
Oscillation Analysis and Damping Control for a Proposed North American AC-DC Macrogrid
In recent years, several studies conducted by both industry and U.S. Department of Energy (DOE)-funded initiatives have proposed linking North America's Eastern and Western Interconnections (EI and WI) through a multiterminal DC (MTDC) macrogrid. These studies have explored the advantages and opportunities of the proposed configuration from the perspectives of capacity sharing and frequency support. However, the potential challenges of small-signal stability arising from this interconnection have not been thoroughly examined. To address this gap, detailed model-based simulation studies are performed in this paper to assess the risks of poorly damped inter-area oscillations in the proposed macrogrid. A custom-built dynamic model of the MTDC system is developed and integrated with industry-grade models of the EI and WI, incorporating high levels of inverter-based energy resources. Through model-based oscillation analysis, potential shifts in inter-area modes for both EI and WI resulting from the MTDC integration are characterized, and modes with inadequate damping are identified. Furthermore, to mitigate the risks of unstable oscillations, supplementary damping controllers are designed for the MTDC system, leveraging wide-area feedback to modulate active power set points at selected converter stations. A frequency scanning approach is employed for data-driven model linearization and controller synthesis. The damping performance is evaluated under the designed operating conditions and selected contingency scenarios.
Robust reduced-order model predictive control using peak-to-peak analysis of filtered signals
We address the design of a model predictive control (MPC) scheme for large-scale linear systems using reduced-order models (ROMs). Our approach uses a ROM, leverages tools from robust control, and integrates them into an MPC framework to achieve computational tractability with robust constraint satisfaction. Our key contribution is a method to obtain guaranteed bounds on the predicted outputs of the full-order system by predicting a (scalar) error-bounding system alongside the ROM. This bound is then used to formulate a robust ROM-based MPC that guarantees constraint satisfaction and robust performance. Our method is developed step-by-step by (i) analysing the error, (ii) bounding the peak-to-peak gain, and (iii) using filtered signals. We demonstrate our method on a 100-dimensional mass-spring-damper system, achieving over four orders of magnitude reduction in conservatism relative to existing approaches.
comment: Code available at: https://github.com/KohlerJohannes/ROM_MPC_ECC
Observer-based neural networks for flow estimation and control
Neural network observers (NNOs) are proposed for real-time estimation of fluid flows, addressing a key challenge in flow control: obtaining real-time flow states from a limited set of sparse and noisy sensor data. For this task, we propose a generalization of the classical Luenberger observer. In the present framework, the estimation loop is composed of subsystems modeled as neural networks (NNs). By combining flow information from selected probes and an NN surrogate model (NNSM) of the flow system, we train NNOs capable of fusing information to provide the best estimation of the states, that can in turn be fed back to an NN controller (NNC). The NNO capabilities are demonstrated for three nonlinear dynamical systems. First, a variation of the Kuramoto-Sivashinsky (KS) equation with control inputs is studied, where variables are sparsely probed. We show that the NNO is able to track states even when probes are contaminated with random noise or with sensors at insufficient sample rates to match the control time step. Then, a confined cylinder flow is investigated, where velocity signals along the cylinder wake are estimated by using a small set of wall pressure sensors. In both the KS and cylinder problems, we show that the estimated states can be used to enable closed-loop control, taking advantage of stabilizing NNCs. Finally, we present a legacy dataset of a turbulent boundary layer experiment, where convolutional NNs (CNNs) are employed to implement the models required for the estimation loop. We show that, by combining low-resolution noise-corrupted sensor data with an imperfect NNSM, it is possible to produce more accurate estimates, outperforming both the direct reconstructions via specialized super-resolution NNs and the direct model propagation from initial conditions.
Digital Twin-Driven Pavement Health Monitoring and Maintenance Optimization Using Graph Neural Networks
Pavement infrastructure monitoring is challenged by complex spatial dependencies, changing environmental conditions, and non-linear deterioration across road networks. Traditional Pavement Management Systems (PMS) remain largely reactive, lacking real-time intelligence for failure prevention and optimal maintenance planning. To address this, we propose a unified Digital Twin (DT) and Graph Neural Network (GNN) framework for scalable, data-driven pavement health monitoring and predictive maintenance. Pavement segments and spatial relations are modeled as graph nodes and edges, while real-time UAV, sensor, and LiDAR data stream into the DT. The inductive GNN learns deterioration patterns from graph-structured inputs to forecast distress and enable proactive interventions. Trained on a real-world-inspired dataset with segment attributes and dynamic connectivity, our model achieves an R2 of 0.3798, outperforming baseline regressors and effectively capturing non-linear degradation. We also develop an interactive dashboard and reinforcement learning module for simulation, visualization, and adaptive maintenance planning. This DT-GNN integration enhances forecasting precision and establishes a closed feedback loop for continuous improvement, positioning the approach as a foundation for proactive, intelligent, and sustainable pavement management, with future extensions toward real-world deployment, multi-agent coordination, and smart-city integration.
Toward an Agricultural Operational Design Domain: A Framework
The agricultural sector increasingly relies on autonomous systems that operate in complex and variable environments. Unlike on-road applications, agricultural automation integrates driving and working processes, each of which imposes distinct operational constraints. Handling this complexity and ensuring consistency throughout the development and validation processes requires a structured, transparent, and verified description of the environment. However, existing Operational Design Domain (ODD) concepts do not yet address the unique challenges of agricultural applications. Therefore, this work introduces the Agricultural ODD (Ag-ODD) Framework, which can be used to describe and verify the operational boundaries of autonomous agricultural systems. The Ag-ODD Framework consists of three core elements. First, the Ag-ODD description concept provides a structured method for unambiguously defining environmental and operational parameters using concepts from ASAM Open ODD and CityGML. Second, the 7-Layer Model, derived from the PEGASUS 6-Layer Model, adds a process layer to capture dynamic agricultural operations. Third, the iterative verification process verifies the Ag-ODD against its corresponding logical scenarios, derived from the 7-Layer Model, to ensure the Ag-ODD's completeness and consistency. Together, these elements provide a consistent approach for creating unambiguous and verifiable Ag-ODDs. Demonstrative use cases show how the Ag-ODD Framework can support the standardization and scalability of environmental descriptions for autonomous agricultural systems.
comment: 18 pages, 7 figures, 2 tables
Guided Bayesian Optimization: Data-Efficient Controller Tuning with Digital Twin
This article presents the guided Bayesian optimization algorithm as an efficient data-driven method for iteratively tuning closed-loop controller parameters using an event-triggered digital twin of the system based on available closed-loop data. We define a controller tuning framework independent of the controller or the plant structure. Our proposed methodology is model-free, making it suitable for nonlinear and unmodelled plants with measurement noise. The objective function consists of performance metrics modeled by Gaussian processes. We utilize the available information in the closed-loop system to identify and progressively maintain a digital twin that guides the optimizer, improving the data efficiency of our method. Switching the digital twin on and off is triggered by data-driven criteria related to the digital twin's uncertainty estimations in the BO tuning framework. Effectively, it replaces much of the exploration of the real system with exploration performed on the digital twin. We analyze the properties of our method in simulation and demonstrate its performance on two real closed-loop systems with different plant and controller structures. The experimental results show that our method requires fewer experiments on the physical plant than Bayesian optimization to find the optimal controller parameters.
comment: This work has been published in IEEE Transactions on Automation Science and Engineering
Drift Plus Optimistic Penalty: A Learning Framework for Stochastic Network Optimization with Improved Regret Bounds
We consider the problem of joint routing and scheduling in queueing networks, where the edge transmission costs are unknown. At each time-slot, the network controller receives noisy observations of transmission costs only for those edges it selects for transmission. The network controller's objective is to make routing and scheduling decisions so that the total expected cost is minimized. This problem exhibits an exploration-exploitation trade-off; however, previous bandit-style solutions cannot be directly applied to it because of the queueing dynamics. In order to ensure network stability, the network controller needs to optimize throughput and cost simultaneously. We show that the best achievable cost is lower bounded by the solution to a static optimization problem, and develop a network control policy using techniques from Lyapunov drift-plus-penalty optimization and multi-armed bandits. We show that the policy achieves a sub-linear regret of order $O(\sqrt{T}\log T)$, as compared to the best policy that has complete knowledge of arrivals and costs. Finally, we evaluate the proposed policy using simulations and show that its regret is indeed sub-linear.
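The combination of a backlog-driven drift term with an optimistic cost estimate can be sketched for a single decision as follows. This is only an illustrative reading of the abstract, not the paper's policy: the edge representation, the confidence-bonus form, the trade-off parameter `V`, and both function names are assumptions.

```python
import math


def optimistic_cost(mean_cost, pulls, t):
    """Lower-confidence estimate of an edge cost (assumed UCB-style bonus).

    An edge that was never selected gets cost 0.0, which encourages
    exploring it at least once.
    """
    if pulls == 0:
        return 0.0
    bonus = math.sqrt(2.0 * math.log(t) / pulls)
    return max(0.0, mean_cost - bonus)


def pick_edge(edges, t, V):
    """Return the index of the edge with the largest positive score.

    Each edge is a dict with hypothetical keys 'backlog_diff' (queue
    differential across the edge), 'mean_cost' and 'pulls'. The score is
    backlog_diff - V * optimistic_cost; None means no edge is worth using.
    """
    best_index, best_score = None, 0.0
    for index, edge in enumerate(edges):
        cost = optimistic_cost(edge["mean_cost"], edge["pulls"], t)
        score = edge["backlog_diff"] - V * cost
        if score > best_score:
            best_index, best_score = index, score
    return best_index
```

A larger `V` weights cost more heavily relative to queue backlog, which is the usual drift-plus-penalty trade-off between cost and queue length.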
Constrained Optimal Fuel Consumption of HEVs under Observational Noise
In our prior work, we investigated the minimum fuel consumption of a hybrid electric vehicle (HEV) under a state-of-charge (SOC) balance constraint, assuming perfect SOC measurements and accurate reference speed profiles. The constrained optimal fuel consumption (COFC) problem was addressed using a constrained reinforcement learning (CRL) framework. However, in real-world scenarios, SOC readings are often corrupted by sensor noise, and reference speeds may deviate from actual driving conditions. To account for these imperfections, this study reformulates the COFC problem by explicitly incorporating observational noise in both SOC and reference speed. We adopt a robust CRL approach, where the noise is modeled as a uniform distribution, and employ a structured training procedure to ensure stability. The proposed method is evaluated through simulations on the Toyota Prius hybrid system (THS), using both the New European Driving Cycle (NEDC) and the Worldwide Harmonized Light Vehicles Test Cycle (WLTC). Results show that fuel consumption and SOC constraint satisfaction remain robust across varying noise levels. Furthermore, the analysis reveals that observational noise in SOC and speed can impact fuel consumption to different extents. To the best of our knowledge, this is the first study to explicitly examine how observational noise -- commonly encountered in dynamometer testing and predictive energy control (PEC) applications -- affects constrained optimal fuel consumption in HEVs.
comment: Minor text and figure adjustments; no substantive changes
Virtual Target Trajectory Prediction for Stochastic Targets
Trajectory prediction of aerial vehicles is a key requirement in applications ranging from missile guidance to UAV collision avoidance. While most prediction methods assume deterministic target motion, real-world targets often exhibit stochastic behaviors such as evasive maneuvers or random gliding patterns. This paper introduces a probabilistic framework based on Conditional Normalizing Flows (CNFs) to model and predict such stochastic dynamics directly from trajectory data. The learned model generates probability distributions of future target positions conditioned on initial states and dynamic parameters, enabling efficient sampling and exact density evaluation. To provide deterministic surrogates compatible with existing guidance and planning algorithms, sampled trajectories are clustered using a time series k-means approach, yielding a set of representative "virtual target" trajectories. The method is target-agnostic, computationally efficient, and requires only trajectory data for training, making it suitable as a drop-in replacement for deterministic predictors. Simulated scenarios with maneuvering and ballistic targets demonstrate that the proposed approach bridges the gap between deterministic assumptions and stochastic reality, advancing guidance and control algorithms for autonomous vehicles.
comment: Manuscript accepted by Journal of Guidance, Control, and Dynamics
Constrained computational hybrid controller for Input Affine Hybrid Dynamical Systems
Hybrid dynamical systems, which combine continuous and event-based behaviors, are viewed as among the most complicated systems. Since traditional controllers cannot handle these systems, several new controllers have been published in recent decades to deal with them. This paper presents a novel implementable constrained final-state controller based on partitioning the system's state-space, computational simulations, and graph theory. Experimental results and a comparison with a Model Predictive Controller on the three-tank benchmark and swing-up control of a pendulum show the effectiveness of the proposed Computational Hybrid Controller (CHC).
A moving horizon estimator for aquifer thermal energy storages
Aquifer thermal energy storages (ATES) represent groundwater saturated aquifers that store thermal energy in the form of heated or cooled groundwater. Combining two ATES, one can harness excess thermal energy from summer (heat) and winter (cold) to support the building's heating, ventilation, and air conditioning (HVAC) technology. In general, a dynamic operation of ATES throughout the year is beneficial to avoid using fossil fuel-based HVAC technology and maximize the ``green use'' of ATES. Model predictive control (MPC) with an appropriate system model may become a crucial control approach for ATES systems. Consequently, the MPC model should reflect spatial temperature profiles around ATES' boreholes to predict extracted groundwater temperatures accurately. However, meaningful predictions require the estimation of the current state of the system, as measurements are usually only at the borehole of the ATES. In control, this is often realized by model-based observers. Still, observing the state of an ATES system is non-trivial, since the model is typically hybrid. We show how to exploit the specific structure of the hybrid ATES model and design an easy-to-solve moving horizon estimator based on a quadratic program.
comment: European Control Conference 2025 (ECC), Thessaloniki, Greece
Chance-Constrained Neural MPC under Uncontrollable Agents via Sequential Convex Programming
This work investigates the challenge of ensuring safety guarantees under uncontrollable agents whose behaviors are stochastic and depend on both their own and the system's states. We present a neural model predictive control (MPC) framework that predicts the trajectory of the uncontrollable agent using a predictor learned from offline data. To provide probabilistic guarantees on prediction errors, we employ split conformal prediction to construct region-specific, time-dependent uncertainty bounds, which are integrated into the MPC formulation. To solve the resulting non-convex, discontinuous optimization problem, we propose a two-loop iterative sequential convex programming algorithm. The inner loop solves convexified subproblems with fixed error bounds, while the outer loop refines these bounds based on updated control sequences. We establish convergence guarantees under mild regularity conditions and demonstrate the optimality of the algorithm. We illustrate our method with an autonomous driving scenario involving interactive pedestrians. Experimental results demonstrate that our approach achieves superior safety and efficiency compared to baseline methods, with success rates exceeding 99.5\% while maintaining higher average speeds in multi-pedestrian scenarios.
Communicating Plans, Not Percepts: Scalable Multi-Agent Coordination with Embodied World Models NeurIPS 2025
Robust coordination is critical for effective decision-making in multi-agent systems, especially under partial observability. A central question in Multi-Agent Reinforcement Learning (MARL) is whether to engineer communication protocols or learn them end-to-end. We investigate this dichotomy using embodied world models. We propose and compare two communication strategies for a cooperative task-allocation problem. The first, Learned Direct Communication (LDC), learns a protocol end-to-end. The second, Intention Communication, uses an engineered inductive bias: a compact, learned world model, the Imagined Trajectory Generation Module (ITGM), which uses the agent's own policy to simulate future states. A Message Generation Network (MGN) then compresses this plan into a message. We evaluate these approaches on goal-directed interaction in a grid world, a canonical abstraction for embodied AI problems, while scaling environmental complexity. Our experiments reveal that while emergent communication is viable in simple settings, the engineered, world model-based approach shows superior performance, sample efficiency, and scalability as complexity increases. These findings advocate for integrating structured, predictive models into MARL agents to enable active, goal-driven coordination.
comment: Published in the Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Scaling Environments for Agents (SEA). Additionally accepted for presentation in the NeurIPS 2025 Workshop: Embodied World Models for Decision Making (EWM) and the NeurIPS 2025 Workshop: Optimization for Machine Learning (OPT)
Neural Network Aided Kalman Filtering with Model Predictive Control Enables Robot-Assisted Drone Recovery on a Wavy Surface
Recovering a drone on a disturbed water surface remains a significant challenge in maritime robotics. In this paper, we propose a unified framework for robot-assisted drone recovery on a wavy surface that addresses two major tasks: first, accurate prediction of a moving drone's position under wave-induced disturbances using KalmanNet Plus Plus (KalmanNet++), a neural network aided Kalman filter that we propose; second, effective motion planning for a manipulator toward the resulting desired position via Receding Horizon Model Predictive Control (RHMPC). Specifically, we compare multiple prediction methods and propose KalmanNet++ to predict the position of the UAV, thereby obtaining the desired position. KalmanNet++ predicts the drone's future position 0.1\,s ahead, while the manipulator plans a capture trajectory in real time, thus handling not only wave-induced base motions but also constraints such as torque constraints and joint constraints. For the system design, we provide a collaborative system, comprising a manipulator subsystem and a UAV subsystem, that enables drone lifting and drone recovery. Simulation and real-world experiments using wave-disturbed motion data demonstrate that our approach achieves a high success rate (above 95\%) and outperforms conventional baseline methods by up to 10\% in efficiency and 20\% in precision. The results underscore the feasibility and robustness of our system, which achieves state-of-the-art performance and offers a practical solution for maritime drone operations.
comment: 17 pages, 51 figures
On the Number of Control Nodes in Boolean Networks with Degree Constraints
This paper studies the minimum control node set problem for Boolean networks (BNs) with degree constraints. The main contribution is to derive the nontrivial lower and upper bounds on the size of the minimum control node set through combinatorial analysis of four types of BNs (i.e., $k$-$k$-XOR-BNs, simple $k$-$k$-AND-BNs, $k$-$k$-AND-BNs with negation and $k$-$k$-NC-BNs, where the $k$-$k$-AND-BN with negation is an extension of the simple $k$-$k$-AND-BN that considers the occurrence of negation and NC means nested canalyzing). More specifically, four bounds for the size of the minimum control node set: general lower bound, best case upper bound, worst case lower bound, and general upper bound are studied. By dividing nodes into three disjoint sets, extending the time to reach the target state, and utilizing necessary conditions for controllability, these bounds are obtained, and further meaningful results and phenomena are discovered. Notably, all of the above results involving the AND function also apply to the OR function.
comment: 35 pages, 9 figures
Machine Learning-assisted Dynamics-Constrained Day-Ahead Energy Scheduling
The rapid expansion of inverter-based resources, such as wind and solar power plants, will significantly diminish the presence of conventional synchronous generators in future power grids with rich renewable energy sources. This transition introduces increased complexity and reduces dynamic stability in system operation and control, with low inertia being a widely recognized challenge. However, the literature has not thoroughly explored grid dynamic performance associated with energy scheduling solutions that traditionally only consider grid steady-state constraints. This paper will bridge the gap by enforcing grid dynamic constraints when conducting optimal energy scheduling; particularly, this paper explores locational post-contingency rate of change of frequency (RoCoF) requirements to accommodate substantial inertia reductions. This paper introduces a machine learning-assisted RoCoF-constrained unit commitment (ML-RCUC) model designed to ensure RoCoF stability after the most severe generator outage while maintaining operational efficiency. A graph-informed NN (GINN)-based RoCoF predictor is first trained on a high-fidelity simulation dataset to track the highest locational RoCoF, which is then reformulated as mixed-integer linear programming constraints that are integrated into the unit commitment model. Case studies, by solving the optimization problem ML-RCUC and validating its solutions with time-domain simulations, demonstrate that the proposed method can ensure locational RoCoF stability with minimum conservativeness.
Improving the Accuracy of DC Optimal Power Flow Formulations via Parameter Optimization
DC Optimal Power Flow (DC-OPF) problems optimize the generators' active power setpoints while satisfying constraints based on the DC power flow linearization. The computational tractability advantages of DC-OPF problems come at the expense of inaccuracies relative to AC Optimal Power Flow (AC-OPF) problems that accurately model the nonlinear steady-state behavior of power grids. This paper proposes an algorithm that significantly improves the accuracy of the generators' active power setpoints from DC-OPF problems with respect to the corresponding AC-OPF problems over a specified range of operating conditions. Using sensitivity information in a machine learning-inspired methodology, this algorithm tunes coefficient and bias parameters in the DC power flow approximation to improve the accuracy of the resulting DC-OPF solutions. Employing the Truncated Newton Conjugate-Gradient (TNC) method -- a Quasi-Newton optimization technique -- this parameter tuning occurs during an offline training phase, with the resulting parameters then used in online computations. Numerical results underscore the algorithm's efficacy with accuracy improvements in squared two-norm and $\infty$-norm losses of up to $90\%$ and $79\%$, respectively, relative to traditional DC-OPF formulations.
Human-Exoskeleton Kinematic Calibration to Improve Hand Tracking for Dexterous Teleoperation
Hand exoskeletons are critical tools for dexterous teleoperation and immersive manipulation interfaces, but achieving accurate hand tracking remains a challenge due to user-specific anatomical variability and donning inconsistencies. These issues lead to kinematic misalignments that degrade tracking performance and limit applicability in precision tasks. We propose a subject-specific calibration framework for exoskeleton-based hand tracking that estimates virtual link parameters through residual-weighted optimization. A data-driven approach is introduced to empirically tune cost function weights using motion capture ground truth, enabling accurate and consistent calibration across users. Implemented on the Maestro hand exoskeleton with seven healthy participants, the method achieved substantial reductions in joint and fingertip tracking errors across diverse hand geometries. Qualitative visualizations using a Unity-based virtual hand further demonstrate improved motion fidelity. The proposed framework generalizes to exoskeletons with closed-loop kinematics and minimal sensing, laying the foundation for high-fidelity teleoperation and robot learning applications.
comment: 8 pages, 10 figures, 1 supplementary video, submitted to RA-L
Systems and Control (EESS)
LLM-Supported Formal Knowledge Representation for Enhancing Control Engineering Content with an Interactive Semantic Layer
The rapid growth of research output in control engineering calls for new approaches to structure and formalize domain knowledge. This paper briefly describes an LLM-supported method for semi-automated generation of formal knowledge representations that combine human readability with machine interpretability and increased expressiveness. Based on the Imperative Representation of Knowledge (PyIRK) framework, we demonstrate how language models can assist in transforming natural-language descriptions and mathematical definitions (available as LaTeX source code) into a formalized knowledge graph. As a first application we present the generation of an ``interactive semantic layer'' to enhance the source documents in order to facilitate knowledge transfer. From our perspective this contributes to the vision of easily accessible, collaborative, and verifiable knowledge bases for the control engineering domain.
comment: 4 pages, 2 figures
Adjustable Low-Cost Highly Sensitive Microwave Oscillator Sensor for Liquid Level Detection
This paper explores the implementation of a low-cost high-precision microwave oscillator sensor with an adjustable input resistance to enhance its limit of detection (LoD). To achieve this, we introduce a \textit{Z$_{2}$} branch in the input network, comprising a transmission line, a capacitor (\textit{C$_{B}$}) and a resistor (\textit{R$_{V}$}). The sensor is tested with eight different liquids with different dielectric constants, including water, IV fluid, milk, ethanol, acetone, petrol, olive oil, and Vaseline. By fine-tuning the \textit{Z$_{2}$} branch, a clear relationship is found between $\varepsilon_{r}$ of materials and \textit{R$_{V}$}. Our experimental results demonstrate outstanding characteristics, including remarkable linearity (nonlinearity < 2.44\%), high accuracy with an average sensitivity of 21 kHz/$\mu$m, and an excellent limit of detection (LoD < 0.05 mm). The sensor also exhibits good stability across a range of liquid temperatures and shows robust and repeatable behavior. Considering the strong absorption of microwave energy in liquids with high dielectric constants, this oscillator sensor is a superior choice over capacitive sensors for such applications. We validate the performance of the oscillator sensor using water as a representative liquid. Additionally, we substantiate the sensor's improvement through both experimental results and theoretical analysis. Its advantages, including affordability, compatibility with CMOS and MEMS technologies, and ease of fabrication, make it an excellent choice for small-scale liquid detection applications.
An unscented Kalman filter method for real time input-parameter-state estimation
The input-parameter-state estimation capabilities of a novel unscented Kalman filter are examined herein on both linear and nonlinear systems. The unknown input is estimated in two stages within each time step. First, the predicted dynamic states and the system parameters provide an estimate of the input. Second, the measurement-corrected states and parameters provide a final estimate. Importantly, it is demonstrated using perturbation analysis that a system with at least a zero or a non-zero known input can potentially be uniquely identified. This output-only methodology allows for a better understanding of the system compared to classical output-only parameter identification strategies, given that all the dynamic states, the parameters, and the input are estimated jointly and in real time.
comment: author-accepted manuscript (AAM) published in Mechanical Systems and Signal Processing
Policy Gradient Methods for Information-Theoretic Opacity in Markov Decision Processes
Opacity, or non-interference, is a property ensuring that an external observer cannot infer confidential information (the "secret") from system observations. We introduce an information-theoretic measure of opacity, which quantifies information leakage using the conditional entropy of the secret given the observer's partial observations in a system modeled as a Markov decision process (MDP). Our objective is to find a control policy that maximizes opacity while satisfying task performance constraints, assuming that an informed observer is aware of the control policy and system dynamics. Specifically, we consider a class of opacity called state-based opacity, where the secret is a propositional formula about the past or current state of the system, and a special case of state-based opacity called language-based opacity, where the secret is defined by a temporal logic formula (LTL) or a regular language recognized by a finite-state automaton. First, we prove that finite-memory policies can outperform Markov policies in optimizing information-theoretic opacity. Second, we develop an algorithm to compute a maximally opaque Markov policy using a primal-dual gradient-based algorithm, and prove its convergence. Since opacity cannot be expressed as a cumulative cost, we develop a novel method to compute the gradient of conditional entropy with respect to policy parameters using observable operators in hidden Markov models. The experimental results validate the effectiveness and optimality of our proposed methods.
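The leakage measure named above, the conditional entropy of the secret given the observations, can be illustrated on a small finite joint distribution. The sketch below is a generic computation of $H(S \mid Y)$ in bits. It does not reproduce the paper's MDP or hidden-Markov-model machinery, and the function name and input format are assumptions.

```python
import math
from collections import defaultdict


def conditional_entropy(joint):
    """H(S | Y) in bits for a joint pmf given as {(secret, obs): prob}."""
    # Marginal distribution of the observation.
    p_obs = defaultdict(float)
    for (_, obs), prob in joint.items():
        p_obs[obs] += prob
    # H(S | Y) = -sum_{s,y} p(s, y) * log2(p(s, y) / p(y)).
    entropy = 0.0
    for (_, obs), prob in joint.items():
        if prob > 0.0:
            entropy -= prob * math.log2(prob / p_obs[obs])
    return entropy


# Observation reveals the secret completely: no residual uncertainty.
revealing = {(0, "a"): 0.5, (1, "b"): 0.5}
# Observation is independent of a fair binary secret: one full bit remains.
opaque = {(0, "a"): 0.25, (1, "a"): 0.25, (0, "b"): 0.25, (1, "b"): 0.25}
```

Higher values mean the observer learns less about the secret, so a maximally opaque policy in this sense drives the quantity toward the prior entropy of the secret.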
Feedback dynamics in Politics: The interplay between sentiment and engagement
We investigate feedback mechanisms in political communication by testing whether politicians adapt the sentiment of their messages in response to public engagement. Using over 1.5 million tweets from Members of Parliament in the United Kingdom, Spain, and Greece during 2021, we identify sentiment dynamics through a simple yet interpretable linear model. The analysis reveals a closed-loop behavior: engagement with positive and negative messages influences the sentiment of subsequent posts. Moreover, the learned coefficients highlight systematic differences across political roles: opposition members are more reactive to negative engagement, whereas government officials respond more to positive signals. These results provide a quantitative, control-oriented view of behavioral adaptation in online politics, showing how feedback principles can explain the self-reinforcing dynamics that emerge in social media discourse.
comment: 6 pages, 7 figures
Stochastic Redistribution of Indistinguishable Items in Shared Habitation: A Multi-Agent Simulation Framework
This paper presents a discrete-event stochastic model for the redistribution of indistinguishable personal items, exemplified by socks, among multiple cohabitants sharing a communal laundry system. Drawing on concepts from ecological population dynamics, diffusion processes, and stochastic exchange theory, the model captures the probabilistic mechanisms underlying item mixing, recovery, and loss. Each cohabitant is represented as an autonomous agent whose belongings interact through iterative cycles of collective washing, sorting, and partial correction. The system's evolution is characterized by random mixing events, selective recollection, and attrition over time. Implemented using the SimPy discrete-event simulation framework, the model demonstrates that even minimal exchange probabilities can generate emergent asymmetries, quasi-equilibrium distributions, and long-term disorder. The findings illustrate how stochastic processes inherent to shared domestic systems can produce persistent imbalances, offering a quantitative perspective on an everyday social phenomenon.
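The mixing, correction, and attrition mechanisms described above can be sketched with a plain cycle-by-cycle loop. This version uses only the Python standard library rather than SimPy, and every parameter name and default value is a hypothetical choice made for illustration, not a value from the paper.

```python
import random


def simulate(n_agents=3, socks_each=10, cycles=50,
             p_mix=0.05, p_fix=0.5, p_loss=0.01, seed=0):
    """Return, for each agent, the owner ids of the socks it holds."""
    rng = random.Random(seed)
    # holdings[a] lists the original owner of every sock agent a holds.
    holdings = [[agent] * socks_each for agent in range(n_agents)]
    for _ in range(cycles):
        next_holdings = [[] for _ in range(n_agents)]
        for holder, socks in enumerate(holdings):
            for owner in socks:
                if rng.random() < p_loss:
                    continue  # attrition: the sock disappears
                destination = holder
                if rng.random() < p_mix:
                    # random mixing while sorting the shared wash
                    destination = rng.randrange(n_agents)
                if destination != owner and rng.random() < p_fix:
                    destination = owner  # partial correction
                next_holdings[destination].append(owner)
        holdings = next_holdings
    return holdings
```

Counting, for each agent, how many held socks have a different owner id gives a simple measure of the long-term disorder the abstract refers to.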
Natural-gas storage modelling by deep reinforcement learning
We introduce GasRL, a simulator that couples a calibrated representation of the natural gas market with a model of storage-operator policies trained with deep reinforcement learning (RL). We use it to analyse how optimal stockpile management affects equilibrium prices and the dynamics of demand and supply. We test various RL algorithms and find that Soft Actor Critic (SAC) exhibits superior performance in the GasRL environment: multiple objectives of storage operators - including profitability, robust market clearing and price stabilisation - are successfully achieved. Moreover, the equilibrium price dynamics induced by SAC-derived optimal policies have characteristics, such as volatility and seasonality, that closely match those of real-world prices. Remarkably, this adherence to the historical distribution of prices is obtained without explicitly calibrating the model to price data. We show how the simulator can be used to assess the effects of EU-mandated minimum storage thresholds. We find that such thresholds have a positive effect on market resilience against unanticipated shifts in the distribution of supply shocks. For example, with unusually large shocks, market disruptions are averted more often if a threshold is in place.
comment: 8 pages, 5 figures, published on
ISAC Empowered Air-Sea Collaborative System: A UAV-USV Joint Inspection Framework
In this paper, we construct an air-sea collaborative system framework based on the Integrated Sensing and Communication (ISAC) techniques, where the Unmanned Aerial Vehicle (UAV) and Unmanned Surface Vehicle (USV) jointly inspect targets of interest while keeping communication with each other simultaneously. First, we demonstrate the unique challenges encountered in this collaborative system, i.e., the coupling and heterogeneity of the UAV/USV's trajectories. Then, we formulate a total energy consumption minimization problem to jointly optimize the trajectories, flying and hovering times, target scheduling, and beamformers under the constraints of water currents, collision avoidance, and Sensing and Communication (S&C) requirements. To address the strong coupling of the variables, we divide the original problem into two subproblems, namely, the hover point selection and the joint trajectory planning and beamforming design. In the first subproblem, we propose a three-step hierarchical method including: (1) a virtual base station coverage (VBSC) and clustering algorithm to obtain the target scheduling and rough position of hover points; (2) a Bi-traveling salesman problem with neighborhood (Bi-TSPN)-based algorithm to determine the visiting order sequence of the hover points; (3) a hover point refinement and time allocation algorithm to further optimize the time allocation. In the latter subproblem, we complete the remaining trajectory planning and beamforming design in each flying and hovering stage by developing a semi-definite relaxation (SDR) and successive convex approximation (SCA) method. Finally, we conduct a series of simulations to demonstrate the superiority of the proposed scheme over existing sequential access and leader-follower strategies.
comment: 13 pages, 15 figures
Analytical Framework for Assessing Effective Regional Inertia
This paper proposes a novel formulation of effective regional inertia that explicitly accounts for both system topology and the spatial distribution of inertia. Unlike traditional approaches that model a region as an aggregated machine with an equivalent inertia, the proposed metric provides a topology-aware representation. The methodology builds on an analytical framework that extends classical slow coherency theory to address network partitioning and regional frequency stability. Based on these partitions, we develop a systematic procedure to evaluate the effective inertia of each region, enabling a more accurate interpretation of local inertial contributions, including those from virtual inertia provided by inverter-based resources (IBRs). Case studies on the IEEE 39-bus and 68-bus systems demonstrate that the integration of inertial devices does not uniformly improve system frequency response, underscoring the importance of the proposed metric for effective regional inertia assessment.
Reliability entails input-selective contraction and regulation in excitable networks
The animal nervous system offers a model of computation combining digital reliability and analog efficiency. Understanding how this sweet spot can be realized is a core question of neuromorphic engineering. To this aim, this paper explores the connection between reliability, contraction, and regulation in excitable systems. Using the FitzHugh-Nagumo model of excitable behavior as a proof-of-concept, it is shown that neuronal reliability can be formalized as an average trajectory contraction property induced by the input. In excitable networks, reliability is shown to enable regulation of the network to a robustly stable steady state. It is thus posited that regulation provides a notion of dynamical analog computation, and that stability makes such a computation model robust.
Many-vs-Many Missile Guidance via Virtual Targets
This paper presents a novel approach to many-vs-many missile guidance using virtual targets (VTs) generated by a Normalizing Flows-based trajectory predictor. Rather than assigning n interceptors directly to m physical targets through conventional weapon target assignment algorithms, we propose a centralized strategy that constructs n VT trajectories representing probabilistic predictions of maneuvering target behavior. Each interceptor is guided toward its assigned VT using Zero-Effort-Miss guidance during midcourse flight, transitioning to Proportional Navigation guidance for terminal interception. This approach treats many-vs-many engagements as many-vs-distribution scenarios, exploiting numerical superiority (n > m) by distributing interceptors across diverse trajectory hypotheses rather than pursuing identical deterministic predictions. Monte Carlo simulations across various target-interceptor configurations (1-6 targets, 1-8 interceptors) demonstrate that the VT method matches or exceeds baseline straight-line prediction performance by 0-4.1% when n = m, with improvements increasing to 5.8-14.4% when n > m. The results confirm that probabilistic VTs enable effective exploitation of numerical superiority, significantly increasing interception probability in many-vs-many scenarios.
comment: will be submitted to Journal of Guidance, Control, and Dynamics as Technical Note
Decentralized Approach to Detect and Eliminate Flapping Phenomena due to Flexible Resources
This paper presents a decentralized methodology for detecting and mitigating flapping phenomena in power systems, primarily caused by the operation of discrete devices. The proposed approach applies moving-window autocorrelation to local measurements, enabling each device to autonomously identify sustained oscillations. Upon detection, a probabilistic, device-specific mitigation strategy is executed. Flexible demand resources (DFRs), under-load tap changers (ULTCs), and automatic voltage regulators (AVRs) are utilised to illustrate the performance of the proposed approach to both discrete and continuous-operation devices. Results show that the proposed method is robust and properly distinguishes damped oscillations from persistent flapping, allowing devices to independently recognize problematic operating scenarios and implement corrective actions accordingly.
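The moving-window autocorrelation idea mentioned above can be illustrated with a minimal sketch. It is not the authors' detector: the function names, the window length, the lag, and both thresholds (`rho_min`, `amp_min`) are assumptions chosen for illustration, and the amplitude deadband is an added assumption used here to separate a decayed oscillation from persistent flapping.

```python
import math

def autocorr(window, lag):
    """Normalized (biased) autocorrelation of `window` at the given lag."""
    n = len(window)
    mean = sum(window) / n
    centered = [v - mean for v in window]
    energy = sum(c * c for c in centered)
    if energy == 0.0:
        return 0.0  # constant signal: nothing oscillates
    cross = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return cross / energy

def is_flapping(signal, window=40, lag=10, rho_min=0.6, amp_min=0.2):
    """Flag flapping on the most recent window of a local measurement.

    Hypothetical rule (assumption): the window must still swing by at least
    `amp_min` peak-to-peak AND be strongly correlated with itself at `lag`.
    """
    if len(signal) < window:
        return False
    recent = signal[-window:]
    amplitude = max(recent) - min(recent)
    return amplitude >= amp_min and autocorr(recent, lag) >= rho_min

# Persistent switching with a period of 10 samples versus a damped oscillation.
square = [1.0 if (k // 5) % 2 == 0 else -1.0 for k in range(200)]
damped = [math.exp(-0.05 * k) * math.sin(2 * math.pi * k / 10) for k in range(200)]
print(is_flapping(square), is_flapping(damped))  # prints: True False
```

Each device would run such a check only on its own measurements, which is what keeps the scheme decentralized; the probabilistic mitigation step described in the abstract is not sketched here.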
Before AI Takes Over: Rethinking Nonlinear Signal Processing in Communications
There is an urgent need to reflect on traditional nonlinear signal processing methods in communications before Artificial Intelligence (AI) dominates the field. This implies reassessing or reinterpreting established theories and tools, and it highlights the tension between data-driven and model-based approaches. This paper calls for preserving valuable insights from classical signal processing while exploring how they can coexist or integrate with emerging AI methods.
comment: Submitted to npj Wireless Technology
Coherency among Power System Devices
The paper proposes a novel general definition of coherency among power system devices of any type. The proposed approach is thus not limited to synchronous machines. With this aim, the paper shows that coherency can be formally based on the difference in the complex frequency of the current injections of any two devices electrically connected to the same grid. The proposed definition is model-agnostic, making it general and suitable for modern power systems composed of a heterogeneous mix of technologies. The paper also provides a systematic analytical procedure to study the properties that specific device models must satisfy to be coherent. Time-domain simulations are conducted in three case studies whose results illustrate the ability of our definition to evaluate coherency among any type of device.
Using ensemble learning with hybrid graph neural networks and transformers to predict traffic in cities
Intelligent transportation systems (ITS) still have a hard time accurately predicting traffic in cities, especially in big, multimodal settings with complicated spatiotemporal dynamics. This paper presents HybridST, a hybrid architecture that integrates Graph Neural Networks (GNNs), multi-head temporal Transformers, and supervised ensemble learning methods (XGBoost or Random Forest) to collectively capture spatial dependencies, long-range temporal patterns, and exogenous signals, including weather, calendar, or control states. We test our model on three public benchmark datasets: METR-LA, PEMS-BAY, and Seattle Loop. These datasets include situations ranging from freeway sensor networks to vehicle-infrastructure cooperative perception. Experimental results show that HybridST consistently beats classical baselines (LSTM, GCN, DCRNN, PDFormer) on important metrics like MAE and RMSE, while still being very scalable and easy to understand. The proposed framework presents a promising avenue for real-time urban mobility planning, energy optimization, and congestion alleviation strategies, especially within the framework of smart cities and significant events such as the 2030 FIFA World Cup.
Generalized Swing Control Framework for Inverter-based Resources
This paper proposes a novel control framework designed for Inverter-Based Resources (IBRs), denoted as Generalized Swing Control (GSC). The proposed GSC framework generalizes the definition of Grid-Forming (GFM) control schemes and exploits the coupling between active and reactive power dynamics. To validate the proposed scheme, we conduct extensive time-domain simulations and small-signal analysis using a modified version of the WSCC 9-bus system and a 1479-bus dynamic model of the all-island Irish transmission system. The case studies focus on evaluating the dynamic performance of the proposed framework under different configurations, including Virtual Synchronous Machine (VSM), coupled-VSM and dual-VSM schemes. To address the nonlinear nature of power system dynamics, sensitivity analysis based on Monte Carlo methods is employed to improve parameter tuning and assess the stability of GSC configurations in the studied systems.
Decentralized Voltage Control of AC Microgrids with Constant Power Loads using Control Barrier Functions
This paper proposes a novel nonlinear decentralized voltage controller for constrained regulation of meshed AC Microgrid networks with high penetration of constant power loads. Perceiving the load demand as an unknown disturbance, the network model is reformulated in a cascaded structure composed of a nominal, i.e. uncertainty-free, and an error subsystem. The latter captures the distance between the true and the nominal state trajectories, for which we prove boundedness via a suitable control barrier function. Under sufficient conditions, we prove asymptotic stability of the cascaded dynamics with respect to an equilibrium set and also provide an estimate of the region of attraction. In addition, it is rigorously shown that the proposed nonlinear control law also enforces constrained regulation around a rated voltage value, without the need of saturation devices. The operation of the closed-loop system is illustrated in a simulation scenario, demonstrating bounded operation and convergence to a neighbourhood of the desired reference vector.
comment: 12 pages
Explicit MPC for the constrained zonotope case with low-rank matrix updates
Solving the explicit Model Predictive Control (MPC) problem requires enumerating all critical regions and their associated feedback laws, a task that scales exponentially with both the system dimension and the prediction horizon. When the problem's constraints are boxes or zonotopes, the feasible domain admits a compact constrained-zonotope representation. Building on this insight, we exploit the geometric properties of the equivalent constrained-zonotope reformulation to accelerate the computation of the explicit solution. Specifically, we formulate the multi-parametric problem in the lifted generator space and solve it using second-order optimality conditions, employ low-rank matrix updates to reduce computation time, and introduce an analytic enumeration of candidate active sets that yields the explicit solution in tree form.
A Kullback-Leibler divergence method for input-system-state identification
The capability of a novel Kullback-Leibler divergence method is examined herein within the Kalman filter framework to select the input-parameter-state estimation execution with the most plausible results. This identification suffers from the uncertainty related to obtaining different results from different initial parameter set guesses, and the examined approach uses the information gained from the data in going from the prior to the posterior distribution to address the issue. Firstly, the Kalman filter is performed for a number of different initial parameter sets providing the system input-parameter-state estimation. Secondly, the resulting posterior distributions are compared simultaneously to the initial prior distributions using the Kullback-Leibler divergence. Finally, the identification with the least Kullback-Leibler divergence is selected as the one with the most plausible results. Importantly, the method is shown to select the better-performing identification in linear, nonlinear, and limited information applications, providing a powerful tool for system monitoring.
comment: 32 pages, 17 figures, published in Journal of Sound and Vibration
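The three-step selection described in this abstract can be sketched for the simplest case. This is an illustration under stated assumptions, not the paper's implementation: each run is reduced to a single scalar Gaussian prior and posterior, the divergence is taken as KL(posterior || prior), and the dictionary keys and the `select_run` helper are hypothetical names.

```python
import math

def kl_gaussian(mean_p, var_p, mean_q, var_q):
    """KL( N(mean_p, var_p) || N(mean_q, var_q) ) for scalar Gaussians."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mean_p - mean_q) ** 2) / var_q
                  - 1.0)

def select_run(runs):
    """Return (index of the run with the least divergence, all divergences)."""
    scores = [
        kl_gaussian(r["post_mean"], r["post_var"], r["prior_mean"], r["prior_var"])
        for r in runs
    ]
    best = min(range(len(runs)), key=scores.__getitem__)
    return best, scores

# Two hypothetical filter runs started from different initial parameter guesses.
runs = [
    {"prior_mean": 1.0, "prior_var": 1.0, "post_mean": 1.2, "post_var": 0.5},
    {"prior_mean": 5.0, "prior_var": 1.0, "post_mean": 1.3, "post_var": 0.5},
]
best, scores = select_run(runs)
print(best, [round(s, 3) for s in scores])
```

Here the first run moves only slightly from its prior, so it has the smaller divergence and is the one selected; the second run started far from where the data pulled it and is penalized accordingly.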
Constrained Performance Boosting Control for Nonlinear Systems via ADMM
We present the Alternating Direction Method of Multipliers for Performance Boosting (ADMM-PB), an approach to design performance boosting controllers for stable or pre-stabilized nonlinear systems, while explicitly seeking input and state constraint satisfaction. Rooted in a recently proposed approach for designing neural-network controllers that guarantees closed-loop stability by design while minimizing generic cost functions, our strategy integrates it within an alternating direction method of multipliers routine to seek constraint handling without modifying the controller structure of the aforementioned seminal strategy. Our numerical results showcase the advantages of the proposed approach over a baseline penalizing constraint violation through barrier-like terms in the cost, indicating that ADMM-PB can lead to considerably lower constraint violations at the price of inducing slightly more cautious closed-loop behaviors.
H-Infinity Filter Enhanced CNN-LSTM for Arrhythmia Detection from Heart Sound Recordings
Early detection of heart arrhythmia can prevent severe future complications in cardiac patients. While manual diagnosis still remains the clinical standard, it relies heavily on visual interpretation and is inherently subjective. In recent years, deep learning has emerged as a powerful tool to automate arrhythmia detection, offering improved accuracy, consistency, and efficiency. Several variants of convolutional and recurrent neural network architectures have been widely explored to capture spatial and temporal patterns in physiological signals. However, despite these advancements, current models often struggle to generalize well in real-world scenarios, especially when dealing with small or noisy datasets, which are common challenges in biomedical applications. In this paper, a novel CNN-H-Infinity-LSTM architecture is proposed to identify arrhythmic heart signals from heart sound recordings. This architecture introduces trainable parameters inspired by the H-Infinity filter from control theory, enhancing robustness and generalization. Extensive experimentation on the PhysioNet CinC Challenge 2016 dataset, a public benchmark of heart audio recordings, demonstrates that the proposed model achieves stable convergence and outperforms existing benchmarks, with a test accuracy of 99.42% and an F1 score of 98.85%.
comment: This is a preprint of a paper to appear at the 15th IEEE International Conference on Systems Engineering and Technology (ICSET 2025)
ZJUNlict Extended Team Description Paper 2025
This paper presents the ZJUNlict team's work over the past year, covering both hardware and software advancements. In the hardware domain, the integration of an IMU into the v2023 robot was completed to enhance posture accuracy and angular velocity planning. On the software side, key modules were optimized, including the strategy and CUDA modules, with significant improvements in decision making efficiency, ball pursuit prediction, and ball possession prediction to adapt to high-tempo game dynamics.
Performance Analysis of NOMA-Assisted Optical OFDM ISAC Systems with Clipping Distortion
This paper studies the performance of optical orthogonal frequency-division multiplexing (OFDM)-based multi-user integrated sensing and communication (ISAC) systems employing non-orthogonal multiple access (NOMA). Due to their inherent high peak-to-average power ratio (PAPR), OFDM waveforms are clipped to fit the limited dynamic range of the optical transmitters (e.g., light-emitting diodes (LEDs)), resulting in clipping distortion. To alleviate the impact of the distortion, we propose a novel transmitter architecture where the clipping processes are performed before NOMA superposition coding. We then analyze the performance of the proposed optical ISAC systems considering the effects of power allocation and clipping distortion. For the communication subsystem, we analyze the effect of NOMA on the achievable sum rate and bit error rate (BER). For the sensing subsystem, the root mean square error (RMSE) and Cramér-Rao bound (CRB) of estimating the transmission distance accuracy are obtained. Simulation results reveal that allocating more power to the strong user yields a higher sum rate, lower BER, and better sensing performance, whereas a more balanced power allocation among users results in degraded BER and sensing performance.
A Reliability-Cost Optimization Framework for EV and DER Integration in Standard and Reconfigurable Distribution Network Topologies
The rapid growth of electric vehicle (EV) adoption poses operational and economic challenges for power distribution systems, including increased line loading levels and network congestions. This may require potential infrastructure reinforcement and expansion. As a fast inexpensive alternative solution, network topology reconfiguration (NTR) offers a practical means to redistribute power flows, reduce operational costs, and defer infrastructure upgrades. This paper presents a linear programming framework to evaluate the impact of varying EV penetration on operational costs under four configurations: standard distribution network (SDN), SDN with NTR (SDNTR), SDN with distributed energy resources (SDN-DER), and SDNTR with DERs (SDNTR-DER). Numerical simulations are conducted on the IEEE 33-bus system. The analysis demonstrates that integrating DERs reduces operational costs, while NTR further enhances system flexibility, enabling higher EV penetration levels without compromising feasibility. The combined SDNTR-DER approach offers the most cost-effective and reliable pathway for accommodating future EV growth while mitigating the need for immediate infrastructure upgrades.
Online Distributed Zeroth-Order Optimization With Non-Zero-Mean Adverse Noises
In this paper, the problem of online distributed zeroth-order optimization subject to a set constraint is studied via a multi-agent network, where each agent can communicate with its immediate neighbors via a time-varying directed graph. Different from the existing works on online distributed zeroth-order optimization, we consider the case where the estimates of the gradients are influenced by some non-zero-mean adverse noises. To handle this problem, we propose a new online distributed zeroth-order mirror descent algorithm involving a kernel function-based estimator and a clipped strategy. Particularly, in the estimator, the kernel function-based strategy is provided to deal with the adverse noises and to eliminate the low-order terms in the Taylor expansions of the objective functions. Furthermore, the performance of the presented algorithm is measured by employing the dynamic regrets, where the offline benchmarks are to find the optimal point at each time. Under mild assumptions on the graph and the objective functions, we prove that if the variation in the optimal point sequence grows at a certain rate, then the high probability bound of the dynamic regrets increases sublinearly. Finally, a simulation experiment is worked out to demonstrate the effectiveness of our theoretical results.
Near Optimal Convergence to Coarse Correlated Equilibrium in General-Sum Markov Games
No-regret learning dynamics play a central role in game theory, enabling decentralized convergence to equilibrium for concepts such as Coarse Correlated Equilibrium (CCE) or Correlated Equilibrium (CE). In this work, we improve the convergence rate to CCE in general-sum Markov games, reducing it from the previously best-known rate of $\mathcal{O}(\log^5 T / T)$ to a sharper $\mathcal{O}(\log T / T)$. This matches the best known convergence rate for CE in terms of the number of iterations $T$, while also improving the dependence on the action set size from polynomial to polylogarithmic, yielding exponential gains in high-dimensional settings. Our approach builds on recent advances in adaptive step-size techniques for no-regret algorithms in normal-form games, and extends them to the Markovian setting via a stage-wise scheme that adjusts learning rates based on real-time feedback. We frame policy updates as an instance of Optimistic Follow-the-Regularized-Leader (OFTRL), customized for value-iteration-based learning. The resulting self-play algorithm achieves, to our knowledge, the fastest known convergence rate to CCE in Markov games.
Census-Based Population Autonomy For Distributed Robotic Teaming
Collaborating teams of robots show promise due to their ability to complete missions more efficiently and with improved robustness, attributes that are particularly useful for systems operating in marine environments. A key issue is how to model, analyze, and design these multi-robot systems to realize the full benefits of collaboration, a challenging task since the domain of multi-robot autonomy encompasses both collective and individual behaviors. This paper introduces a layered model of multi-robot autonomy that uses the principle of census, or a weighted count of the inputs from neighbors, for collective decision-making about teaming, coupled with multi-objective behavior optimization for individual decision-making about actions. The census component is expressed as a nonlinear opinion dynamics model and the multi-objective behavior optimization is accomplished using interval programming. This model can be reduced to recover foundational algorithms in distributed optimization and control, while the full model enables new types of collective behaviors that are useful in real-world scenarios. To illustrate these points, a new method for distributed optimization of subgroup allocation is introduced where robots use a gradient descent algorithm to minimize portions of the cost functions that are locally known, while being influenced by the opinion states from neighbors to account for the unobserved costs. With this method the group can collectively use the information contained in the Hessian matrix of the total global cost. The utility of this model is experimentally validated in three categorically different experiments with fleets of autonomous surface vehicles: an adaptive sampling scenario, a high value unit protection scenario, and a competitive game of capture the flag.
comment: 16 pages, 17 figures
Microgrids optimal radial reconfiguration via FORWARD algorithm
Microgrids offer a promising paradigm for integrating distributed energy resources, bolstering energy resilience, and reducing the impact of blackouts. However, their inherent decentralization and dynamic operation present substantial energy management complexities. These complexities, including balancing supply and demand, ensuring system stability, and minimizing operational costs, often necessitate solving computationally intractable NP-hard Mixed-Integer Non-Linear Programming (MINLP) problems. Traditional MINLP solvers struggle with the scalability and feasibility guarantees required for these challenges. To address this, this paper tackles the problem of resource allocation and radial configuration design for microgrid power distribution. It proposes an abstracted problem, which is solved by introducing a permutation-based iterative search method over the recently introduced FORWARD method to efficiently identify feasible, near-optimal radial network structures while inherently respecting physical constraints. Furthermore, this paper investigates the integration of the proposed method as a warm-start strategy for benchmark MINLP solvers, offering a scalable solution for comprehensive microgrid design.
Quantifying Power Systems Resilience Using Statistical Analysis and Bayesian Learning
The increasing frequency and intensity of extreme weather events is significantly affecting the power grid, causing large-scale outages and impacting power system resilience. Yet limited work has been done on systematically modeling the impacts of weather parameters to quantify resilience. This study presents a framework using statistical and Bayesian learning approaches to quantitatively model the relationship between weather parameters and power system resilience metrics. By leveraging real-world publicly available outage and weather data, we identify key weather variables of wind speed, temperature, and precipitation influencing a particular region's resilience metrics. A case study of Cook County, Illinois, and Miami-Dade County, Florida, reveals that these weather parameters are critical factors in resiliency analysis and risk assessment. Additionally, we find that these weather variables have combined effects when studied jointly compared to their effects in isolation. This framework provides valuable insights for understanding how weather events affect power distribution system performance, supporting decision-makers in developing more effective strategies for risk mitigation, resource allocation, and adaptation to changing climatic conditions.
Distributed Incast Detection in Data Center Networks
Incast traffic in data centers can lead to severe performance degradation, such as packet loss and increased latency. Effectively addressing incast requires prompt and accurate detection. Existing solutions, including MA-ECN, BurstRadar and Pulser, typically rely on fixed thresholds of switch port egress queue lengths or their gradients to identify microbursts caused by incast flows. However, these queue-length-based methods often suffer from delayed detection and high error rates. In this study, we propose a distributed incast detection method for data center networks at the switch level, leveraging a probabilistic hypothesis test with an optimal detection threshold. By analyzing the arrival intervals of new flows, our algorithm can immediately determine from a flow's initial packet whether it is part of incast traffic. The experimental results demonstrate that our method offers significant improvements over existing approaches in both detection speed and inference accuracy.
Oscillation Analysis and Damping Control for a Proposed North American AC-DC Macrogrid
In recent years, several studies conducted by both industry and U.S. Department of Energy (DOE)-funded initiatives have proposed linking North America's Eastern and Western Interconnections (EI and WI) through a multiterminal DC (MTDC) macrogrid. These studies have explored the advantages and opportunities of the proposed configuration from the perspectives of capacity sharing and frequency support. However, the potential challenges of small-signal stability arising from this interconnection have not been thoroughly examined. To address this gap, detailed model-based simulation studies are performed in this paper to assess the risks of poorly damped inter-area oscillations in the proposed macrogrid. A custom-built dynamic model of the MTDC system is developed and integrated with industry-grade models of the EI and WI, incorporating high levels of inverter-based energy resources. Through model-based oscillation analysis, potential shifts in inter-area modes for both EI and WI, resulting from the MTDC integration are characterized, and modes with inadequate damping are identified. Furthermore, to mitigate the risks of unstable oscillations, supplementary damping controllers are designed for the MTDC system, leveraging wide-area feedback to modulate active power set points at selected converter stations. A frequency scanning approach is employed for data-driven model linearization and controller synthesis. The damping performance is evaluated under the designed operating conditions and selected contingency scenarios.
Robust reduced-order model predictive control using peak-to-peak analysis of filtered signals
We address the design of a model predictive control (MPC) scheme for large-scale linear systems using reduced-order models (ROMs). Our approach uses a ROM, leverages tools from robust control, and integrates them into an MPC framework to achieve computational tractability with robust constraint satisfaction. Our key contribution is a method to obtain guaranteed bounds on the predicted outputs of the full-order system by predicting a (scalar) error-bounding system alongside the ROM. This bound is then used to formulate a robust ROM-based MPC that guarantees constraint satisfaction and robust performance. Our method is developed step-by-step by (i) analysing the error, (ii) bounding the peak-to-peak gain, and (iii) using filtered signals. We demonstrate our method on a 100-dimensional mass-spring-damper system, achieving over four orders of magnitude reduction in conservatism relative to existing approaches.
comment: Code available at: https://github.com/KohlerJohannes/ROM_MPC_ECC
Observer-based neural networks for flow estimation and control
Neural network observers (NNOs) are proposed for real-time estimation of fluid flows, addressing a key challenge in flow control: obtaining real-time flow states from a limited set of sparse and noisy sensor data. For this task, we propose a generalization of the classical Luenberger observer. In the present framework, the estimation loop is composed of subsystems modeled as neural networks (NNs). By combining flow information from selected probes and an NN surrogate model (NNSM) of the flow system, we train NNOs capable of fusing information to provide the best estimation of the states, that can in turn be fed back to an NN controller (NNC). The NNO capabilities are demonstrated for three nonlinear dynamical systems. First, a variation of the Kuramoto-Sivashinsky (KS) equation with control inputs is studied, where variables are sparsely probed. We show that the NNO is able to track states even when probes are contaminated with random noise or with sensors at insufficient sample rates to match the control time step. Then, a confined cylinder flow is investigated, where velocity signals along the cylinder wake are estimated by using a small set of wall pressure sensors. In both the KS and cylinder problems, we show that the estimated states can be used to enable closed-loop control, taking advantage of stabilizing NNCs. Finally, we present a legacy dataset of a turbulent boundary layer experiment, where convolutional NNs (CNNs) are employed to implement the models required for the estimation loop. We show that, by combining low-resolution noise-corrupted sensor data with an imperfect NNSM, it is possible to produce more accurate estimates, outperforming both the direct reconstructions via specialized super-resolution NNs and the direct model propagation from initial conditions.
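For readers unfamiliar with the classical Luenberger observer that the abstract above generalizes, the following minimal sketch shows the linear, discrete-time version. It is not the neural network observer itself: the double-integrator model, the gain `L`, the constant input, and the initial states are all assumptions chosen for illustration, with `L` picked so that both eigenvalues of A - LC equal 0.5.

```python
def simulate_observer(steps=50):
    """Run a Luenberger observer on a toy double integrator (hypothetical plant)."""
    dt = 0.1
    A = [[1.0, dt], [0.0, 1.0]]   # position/velocity dynamics
    B = [0.0, dt]                 # input acts on velocity
    C = [1.0, 0.0]                # only position is measured
    L = [1.0, 2.5]                # observer gain: eigenvalues of A - L*C are 0.5, 0.5
    x = [1.0, -0.5]               # true state, unknown to the observer
    x_hat = [0.0, 0.0]            # observer's initial guess
    errors = []
    for _ in range(steps):
        u = 0.1                                   # constant control input
        y = C[0] * x[0] + C[1] * x[1]             # sensor reading
        innovation = y - (C[0] * x_hat[0] + C[1] * x_hat[1])
        # Observer update: model prediction plus correction from the innovation.
        x_hat = [
            A[0][0] * x_hat[0] + A[0][1] * x_hat[1] + B[0] * u + L[0] * innovation,
            A[1][0] * x_hat[0] + A[1][1] * x_hat[1] + B[1] * u + L[1] * innovation,
        ]
        # True plant update.
        x = [
            A[0][0] * x[0] + A[0][1] * x[1] + B[0] * u,
            A[1][0] * x[0] + A[1][1] * x[1] + B[1] * u,
        ]
        errors.append(abs(x[0] - x_hat[0]) + abs(x[1] - x_hat[1]))
    return errors

errors = simulate_observer()
print(errors[0], errors[-1])
```

The estimation error obeys e[k+1] = (A - LC) e[k], so it decays to numerical zero even though the velocity is never measured. In the NNO setting described in the abstract, the linear model and the gain are replaced by trained networks.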
Digital Twin-Driven Pavement Health Monitoring and Maintenance Optimization Using Graph Neural Networks
Pavement infrastructure monitoring is challenged by complex spatial dependencies, changing environmental conditions, and non-linear deterioration across road networks. Traditional Pavement Management Systems (PMS) remain largely reactive, lacking real-time intelligence for failure prevention and optimal maintenance planning. To address this, we propose a unified Digital Twin (DT) and Graph Neural Network (GNN) framework for scalable, data-driven pavement health monitoring and predictive maintenance. Pavement segments and spatial relations are modeled as graph nodes and edges, while real-time UAV, sensor, and LiDAR data stream into the DT. The inductive GNN learns deterioration patterns from graph-structured inputs to forecast distress and enable proactive interventions. Trained on a real-world-inspired dataset with segment attributes and dynamic connectivity, our model achieves an R2 of 0.3798, outperforming baseline regressors and effectively capturing non-linear degradation. We also develop an interactive dashboard and reinforcement learning module for simulation, visualization, and adaptive maintenance planning. This DT-GNN integration enhances forecasting precision and establishes a closed feedback loop for continuous improvement, positioning the approach as a foundation for proactive, intelligent, and sustainable pavement management, with future extensions toward real-world deployment, multi-agent coordination, and smart-city integration.
Toward an Agricultural Operational Design Domain: A Framework
The agricultural sector increasingly relies on autonomous systems that operate in complex and variable environments. Unlike on-road applications, agricultural automation integrates driving and working processes, each of which imposes distinct operational constraints. Handling this complexity and ensuring consistency throughout the development and validation processes requires a structured, transparent, and verified description of the environment. However, existing Operational Design Domain (ODD) concepts do not yet address the unique challenges of agricultural applications. Therefore, this work introduces the Agricultural ODD (Ag-ODD) Framework, which can be used to describe and verify the operational boundaries of autonomous agricultural systems. The Ag-ODD Framework consists of three core elements. First, the Ag-ODD description concept provides a structured method for unambiguously defining environmental and operational parameters using concepts from ASAM Open ODD and CityGML. Second, the 7-Layer Model, derived from the PEGASUS 6-Layer Model, has been extended to include a process layer to capture dynamic agricultural operations. Third, the iterative verification process verifies the Ag-ODD against its corresponding logical scenarios, derived from the 7-Layer Model, to ensure the Ag-ODD's completeness and consistency. Together, these elements provide a consistent approach for creating unambiguous and verifiable Ag-ODDs. Demonstrative use cases show how the Ag-ODD Framework can support the standardization and scalability of environmental descriptions for autonomous agricultural systems.
comment: 18 pages, 7 figures, 2 tables
Guided Bayesian Optimization: Data-Efficient Controller Tuning with Digital Twin
This article presents the guided Bayesian optimization algorithm as an efficient data-driven method for iteratively tuning closed-loop controller parameters using an event-triggered digital twin of the system based on available closed-loop data. We define a controller tuning framework independent of the controller or the plant structure. Our proposed methodology is model-free, making it suitable for nonlinear and unmodelled plants with measurement noise. The objective function consists of performance metrics modeled by Gaussian processes. We utilize the available information in the closed-loop system to identify and progressively maintain a digital twin that guides the optimizer, improving the data efficiency of our method. Switching the digital twin on and off is triggered by data-driven criteria related to the digital twin's uncertainty estimations in the BO tuning framework. Effectively, it replaces much of the exploration of the real system with exploration performed on the digital twin. We analyze the properties of our method in simulation and demonstrate its performance on two real closed-loop systems with different plant and controller structures. The experimental results show that our method requires fewer experiments on the physical plant than Bayesian optimization to find the optimal controller parameters.
comment: This work has been published in IEEE Transactions on Automation Science and Engineering
Drift Plus Optimistic Penalty: A Learning Framework for Stochastic Network Optimization with Improved Regret Bounds
We consider the problem of joint routing and scheduling in queueing networks, where the edge transmission costs are unknown. At each time-slot, the network controller receives noisy observations of transmission costs only for those edges it selects for transmission. The network controller's objective is to make routing and scheduling decisions so that the total expected cost is minimized. This problem exhibits an exploration-exploitation trade-off; however, previous bandit-style solutions cannot be directly applied to this problem due to the queueing dynamics. In order to ensure network stability, the network controller needs to optimize throughput and cost simultaneously. We show that the best achievable cost is lower bounded by the solution to a static optimization problem, and develop a network control policy using techniques from Lyapunov drift-plus-penalty optimization and multi-armed bandits. We show that the policy achieves a sub-linear regret of order $O(\sqrt{T}\log T)$, as compared to the best policy that has complete knowledge of arrivals and costs. Finally, we evaluate the proposed policy using simulations and show that its regret is indeed sub-linear.
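The combination of drift-plus-penalty control and bandit-style optimism described in the abstract above can be illustrated with a minimal single-queue sketch. This is not the paper's policy: the function names `optimistic_cost` and `choose_edge`, the UCB-style confidence radius, and the single-queue setting are all assumptions made for illustration.

```python
import math

def optimistic_cost(mean_cost, count, t):
    """Lower confidence bound on an edge's unknown cost (assumed UCB-style radius)."""
    if count == 0:
        return -math.inf  # an edge never tried looks maximally cheap
    return mean_cost - math.sqrt(2.0 * math.log(t) / count)

def choose_edge(queue_len, rates, cost_means, counts, t, V):
    """Pick the edge minimizing V * (optimistic cost) - queue_len * rate.

    V > 0 trades cost against queue drift; t >= 1 is the current time slot.
    """
    scores = [
        V * optimistic_cost(m, n, t) - queue_len * r
        for r, m, n in zip(rates, cost_means, counts)
    ]
    return min(range(len(scores)), key=scores.__getitem__)

# Hypothetical two-edge example: edge 0 is cheap but slow, edge 1 is costly but fast.
rates, means, counts = [1.0, 3.0], [0.1, 0.9], [100, 100]
short_queue_choice = choose_edge(0.0, rates, means, counts, t=100, V=1.0)
long_queue_choice = choose_edge(10.0, rates, means, counts, t=100, V=1.0)
```

With an empty queue the penalty term dominates and the cheap edge is preferred; as the backlog grows, the drift term pushes the choice toward the faster edge.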
Constrained Optimal Fuel Consumption of HEVs under Observational Noise
In our prior work, we investigated the minimum fuel consumption of a hybrid electric vehicle (HEV) under a state-of-charge (SOC) balance constraint, assuming perfect SOC measurements and accurate reference speed profiles. The constrained optimal fuel consumption (COFC) problem was addressed using a constrained reinforcement learning (CRL) framework. However, in real-world scenarios, SOC readings are often corrupted by sensor noise, and reference speeds may deviate from actual driving conditions. To account for these imperfections, this study reformulates the COFC problem by explicitly incorporating observational noise in both SOC and reference speed. We adopt a robust CRL approach, where the noise is modeled as a uniform distribution, and employ a structured training procedure to ensure stability. The proposed method is evaluated through simulations on the Toyota Prius hybrid system (THS), using both the New European Driving Cycle (NEDC) and the Worldwide Harmonized Light Vehicles Test Cycle (WLTC). Results show that fuel consumption and SOC constraint satisfaction remain robust across varying noise levels. Furthermore, the analysis reveals that observational noise in SOC and speed can impact fuel consumption to different extents. To the best of our knowledge, this is the first study to explicitly examine how observational noise -- commonly encountered in dynamometer testing and predictive energy control (PEC) applications -- affects constrained optimal fuel consumption in HEVs.
comment: Minor text and figure adjustments; no substantive changes
Virtual Target Trajectory Prediction for Stochastic Targets
Trajectory prediction of aerial vehicles is a key requirement in applications ranging from missile guidance to UAV collision avoidance. While most prediction methods assume deterministic target motion, real-world targets often exhibit stochastic behaviors such as evasive maneuvers or random gliding patterns. This paper introduces a probabilistic framework based on Conditional Normalizing Flows (CNFs) to model and predict such stochastic dynamics directly from trajectory data. The learned model generates probability distributions of future target positions conditioned on initial states and dynamic parameters, enabling efficient sampling and exact density evaluation. To provide deterministic surrogates compatible with existing guidance and planning algorithms, sampled trajectories are clustered using a time series k-means approach, yielding a set of representative "virtual target" trajectories. The method is target-agnostic, computationally efficient, and requires only trajectory data for training, making it suitable as a drop-in replacement for deterministic predictors. Simulated scenarios with maneuvering and ballistic targets demonstrate that the proposed approach bridges the gap between deterministic assumptions and stochastic reality, advancing guidance and control algorithms for autonomous vehicles.
comment: Manuscript accepted by Journal of Guidance, Control, and Dynamics
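The clustering step described in the abstract above, which reduces sampled trajectories to a few representative "virtual target" trajectories, can be sketched as follows. This is a minimal illustration under stated assumptions: plain Euclidean k-means on equal-length scalar trajectories, a deterministic initialization from the first `k` samples, and a hypothetical function name `kmeans_trajectories`. The paper's actual distance measure and initialization are not given in the abstract.

```python
def kmeans_trajectories(trajectories, k, iterations=20):
    """Cluster equal-length trajectories; return k representative centers."""
    # Assumed initialization: the first k trajectories act as initial centers.
    centers = [list(tr) for tr in trajectories[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for tr in trajectories:
            # Assign each trajectory to the center with the smallest squared distance.
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(tr, centers[c])),
            )
            clusters[nearest].append(tr)
        for c, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return centers

# Hypothetical samples: two climbing and two descending altitude profiles.
samples = [
    [0.0, 1.0, 2.0],
    [0.0, -1.0, -2.0],
    [0.0, 1.2, 2.2],
    [0.0, -0.8, -1.8],
]
virtual_targets = kmeans_trajectories(samples, k=2)
```

Each returned center is the pointwise mean of the trajectories assigned to it, so it can be passed to a deterministic guidance or planning routine in place of a single predicted trajectory.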
Constrained computational hybrid controller for Input Affine Hybrid Dynamical Systems
Hybrid dynamical systems are viewed as the most complicated systems with continuous and event-based behaviors. Since traditional controllers cannot handle these systems, some newly developed controllers have been published in recent decades to deal with them. This paper presents a novel implementable constrained final-state controller based on partitioning the system's state-space, computational simulations, and graph theory. Experimental results and a comparison with a Model Predictive Controller on the three-tank benchmark and swing-up control of a pendulum show the effectiveness of the proposed Computational Hybrid Controller (CHC).
A moving horizon estimator for aquifer thermal energy storages
Aquifer thermal energy storages (ATES) represent groundwater saturated aquifers that store thermal energy in the form of heated or cooled groundwater. Combining two ATES, one can harness excess thermal energy from summer (heat) and winter (cold) to support the building's heating, ventilation, and air conditioning (HVAC) technology. In general, a dynamic operation of ATES throughout the year is beneficial to avoid using fossil fuel-based HVAC technology and maximize the "green use" of ATES. Model predictive control (MPC) with an appropriate system model may become a crucial control approach for ATES systems. Consequently, the MPC model should reflect spatial temperature profiles around ATES' boreholes to predict extracted groundwater temperatures accurately. However, meaningful predictions require the estimation of the current state of the system, as measurements are usually available only at the borehole of the ATES. In control, this is often realized by model-based observers. Still, observing the state of an ATES system is non-trivial, since the model is typically hybrid. We show how to exploit the specific structure of the hybrid ATES model and design an easy-to-solve moving horizon estimator based on a quadratic program.
comment: European Control Conference 2025 (ECC), Thessaloniki, Greece
Chance-Constrained Neural MPC under Uncontrollable Agents via Sequential Convex Programming
This work investigates the challenge of ensuring safety guarantees under uncontrollable agents whose behaviors are stochastic and depend on both their own and the system's states. We present a neural model predictive control (MPC) framework that predicts the trajectory of the uncontrollable agent using a predictor learned from offline data. To provide probabilistic guarantees on prediction errors, we employ split conformal prediction to construct region-specific, time-dependent uncertainty bounds, which are integrated into the MPC formulation. To solve the resulting non-convex, discontinuous optimization problem, we propose a two-loop iterative sequential convex programming algorithm. The inner loop solves convexified subproblems with fixed error bounds, while the outer loop refines these bounds based on updated control sequences. We establish convergence guarantees under mild regularity conditions and demonstrate the optimality of the algorithm. We illustrate our method with an autonomous driving scenario involving interactive pedestrians. Experimental results demonstrate that our approach achieves superior safety and efficiency compared to baseline methods, with success rates exceeding 99.5\% while maintaining higher average speeds in multi-pedestrian scenarios.
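The split conformal prediction step mentioned in the abstract above can be illustrated in its simplest form. The sketch below computes a single error radius from held-out calibration errors; it does not reproduce the region-specific, time-dependent bounds of the paper, and the function name `conformal_radius` and the calibration values are hypothetical.

```python
import math

def conformal_radius(calibration_errors, alpha):
    """Radius r such that a new prediction error is <= r with probability
    at least 1 - alpha, assuming calibration and test errors are exchangeable."""
    n = len(calibration_errors)
    # Finite-sample corrected rank of the empirical quantile.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(calibration_errors)[rank - 1]

# Hypothetical calibration errors (in metres) of a pedestrian-position predictor.
errors = [0.12, 0.30, 0.05, 0.22, 0.18, 0.40, 0.09]
radius = conformal_radius(errors, alpha=0.25)
```

In an MPC formulation, such a radius would typically inflate the obstacle region around the predicted agent position, so that the collision constraint holds with the stated probability.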
Communicating Plans, Not Percepts: Scalable Multi-Agent Coordination with Embodied World Models NeurIPS 2025
Robust coordination is critical for effective decision-making in multi-agent systems, especially under partial observability. A central question in Multi-Agent Reinforcement Learning (MARL) is whether to engineer communication protocols or learn them end-to-end. We investigate this dichotomy using embodied world models. We propose and compare two communication strategies for a cooperative task-allocation problem. The first, Learned Direct Communication (LDC), learns a protocol end-to-end. The second, Intention Communication, uses an engineered inductive bias: a compact, learned world model, the Imagined Trajectory Generation Module (ITGM), which uses the agent's own policy to simulate future states. A Message Generation Network (MGN) then compresses this plan into a message. We evaluate these approaches on goal-directed interaction in a grid world, a canonical abstraction for embodied AI problems, while scaling environmental complexity. Our experiments reveal that while emergent communication is viable in simple settings, the engineered, world model-based approach shows superior performance, sample efficiency, and scalability as complexity increases. These findings advocate for integrating structured, predictive models into MARL agents to enable active, goal-driven coordination.
comment: Published in the Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: Scaling Environments for Agents (SEA). Additionally accepted for presentation in the NeurIPS 2025 Workshop: Embodied World Models for Decision Making (EWM) and the NeurIPS 2025 Workshop: Optimization for Machine Learning (OPT)
Neural Network Aided Kalman Filtering with Model Predictive Control Enables Robot-Assisted Drone Recovery on a Wavy Surface
Recovering a drone on a disturbed water surface remains a significant challenge in maritime robotics. In this paper, we propose a unified framework for robot-assisted drone recovery on a wavy surface that addresses two major tasks: first, accurate prediction of a moving drone's position under wave-induced disturbances using KalmanNet Plus Plus (KalmanNet++), a neural-network-aided Kalman filter that we propose; second, effective motion planning for a manipulator toward the resulting desired position via Receding Horizon Model Predictive Control (RHMPC). Specifically, we compare multiple prediction methods and propose KalmanNet++ to predict the position of the UAV, thereby obtaining the desired position. KalmanNet++ predicts the drone's future position 0.1\,s ahead, while the manipulator plans a capture trajectory in real time, thus handling not only wave-induced base motions but also physical limits such as torque and joint constraints. For the system design, we provide a collaborative system, comprising a manipulator subsystem and a UAV subsystem, that enables drone lifting and drone recovery. Simulation and real-world experiments using wave-disturbed motion data demonstrate that our approach achieves a high success rate (above 95\%) and outperforms conventional baseline methods by up to 10\% in efficiency and 20\% in precision. The results underscore the feasibility and robustness of our system, which achieves state-of-the-art performance and offers a practical solution for maritime drone operations.
comment: 17 pages, 51 figures
On the Number of Control Nodes in Boolean Networks with Degree Constraints
This paper studies the minimum control node set problem for Boolean networks (BNs) with degree constraints. The main contribution is to derive the nontrivial lower and upper bounds on the size of the minimum control node set through combinatorial analysis of four types of BNs (i.e., $k$-$k$-XOR-BNs, simple $k$-$k$-AND-BNs, $k$-$k$-AND-BNs with negation and $k$-$k$-NC-BNs, where the $k$-$k$-AND-BN with negation is an extension of the simple $k$-$k$-AND-BN that considers the occurrence of negation and NC means nested canalyzing). More specifically, four bounds for the size of the minimum control node set: general lower bound, best case upper bound, worst case lower bound, and general upper bound are studied. By dividing nodes into three disjoint sets, extending the time to reach the target state, and utilizing necessary conditions for controllability, these bounds are obtained, and further meaningful results and phenomena are discovered. Notably, all of the above results involving the AND function also apply to the OR function.
comment: 35 pages, 9 figures
Machine Learning-assisted Dynamics-Constrained Day-Ahead Energy Scheduling
The rapid expansion of inverter-based resources, such as wind and solar power plants, will significantly diminish the presence of conventional synchronous generators in future power grids with rich renewable energy sources. This transition introduces increased complexity and reduces dynamic stability in system operation and control, with low inertia being a widely recognized challenge. However, the literature has not thoroughly explored grid dynamic performance associated with energy scheduling solutions that traditionally only consider grid steady-state constraints. This paper will bridge the gap by enforcing grid dynamic constraints when conducting optimal energy scheduling; particularly, this paper explores locational post-contingency rate of change of frequency (RoCoF) requirements to accommodate substantial inertia reductions. This paper introduces a machine learning-assisted RoCoF-constrained unit commitment (ML-RCUC) model designed to ensure RoCoF stability after the most severe generator outage while maintaining operational efficiency. A graph-informed NN (GINN)-based RoCoF predictor is first trained on a high-fidelity simulation dataset to track the highest locational RoCoF, which is then reformulated as mixed-integer linear programming constraints that are integrated into the unit commitment model. Case studies, by solving the optimization problem ML-RCUC and validating its solutions with time-domain simulations, demonstrate that the proposed method can ensure locational RoCoF stability with minimum conservativeness.
Improving the Accuracy of DC Optimal Power Flow Formulations via Parameter Optimization
DC Optimal Power Flow (DC-OPF) problems optimize the generators' active power setpoints while satisfying constraints based on the DC power flow linearization. The computational tractability advantages of DC-OPF problems come at the expense of inaccuracies relative to AC Optimal Power Flow (AC-OPF) problems that accurately model the nonlinear steady-state behavior of power grids. This paper proposes an algorithm that significantly improves the accuracy of the generators' active power setpoints from DC-OPF problems with respect to the corresponding AC-OPF problems over a specified range of operating conditions. Using sensitivity information in a machine learning-inspired methodology, this algorithm tunes coefficient and bias parameters in the DC power flow approximation to improve the accuracy of the resulting DC-OPF solutions. Employing the Truncated Newton Conjugate-Gradient (TNC) method -- a Quasi-Newton optimization technique -- this parameter tuning occurs during an offline training phase, with the resulting parameters then used in online computations. Numerical results underscore the algorithm's efficacy with accuracy improvements in squared two-norm and $\infty$-norm losses of up to $90\%$ and $79\%$, respectively, relative to traditional DC-OPF formulations.
Human-Exoskeleton Kinematic Calibration to Improve Hand Tracking for Dexterous Teleoperation
Hand exoskeletons are critical tools for dexterous teleoperation and immersive manipulation interfaces, but achieving accurate hand tracking remains a challenge due to user-specific anatomical variability and donning inconsistencies. These issues lead to kinematic misalignments that degrade tracking performance and limit applicability in precision tasks. We propose a subject-specific calibration framework for exoskeleton-based hand tracking that estimates virtual link parameters through residual-weighted optimization. A data-driven approach is introduced to empirically tune cost function weights using motion capture ground truth, enabling accurate and consistent calibration across users. Implemented on the Maestro hand exoskeleton with seven healthy participants, the method achieved substantial reductions in joint and fingertip tracking errors across diverse hand geometries. Qualitative visualizations using a Unity-based virtual hand further demonstrate improved motion fidelity. The proposed framework generalizes to exoskeletons with closed-loop kinematics and minimal sensing, laying the foundation for high-fidelity teleoperation and robot learning applications.
comment: 8 pages, 10 figures, 1 supplementary video, submitted to RA-L
Robotics
TACO: Trajectory-Aware Controller Optimization for Quadrotors ICRA 2026
Controller performance in quadrotor trajectory tracking depends heavily on parameter tuning, yet standard approaches often rely on fixed, manually tuned parameters that sacrifice task-specific performance. We present Trajectory-Aware Controller Optimization (TACO), a framework that adapts controller parameters online based on the upcoming reference trajectory and current quadrotor state. TACO employs a learned predictive model and a lightweight optimization scheme to optimize controller gains in real time with respect to a broad class of trajectories, and can also be used to adapt trajectories to improve dynamic feasibility while respecting smoothness constraints. To enable large-scale training, we also introduce a parallelized quadrotor simulator supporting fast data collection on diverse trajectories. Experiments on a variety of trajectory types show that TACO outperforms conventional, static parameter tuning while operating orders of magnitude faster than black-box optimization baselines, enabling practical real-time deployment on a physical quadrotor. Furthermore, we show that adapting trajectories using TACO significantly reduces the tracking error obtained by the quadrotor.
comment: 8 pages, 6 figures. In submission to ICRA 2026
TurboMap: GPU-Accelerated Local Mapping for Visual SLAM ICRA 2026
This paper presents TurboMap, a GPU-accelerated and CPU-optimized local mapping module for visual SLAM systems. We identify key performance bottlenecks in the local mapping process for visual SLAM and address them through targeted GPU and CPU optimizations. Specifically, we offload map point triangulation and fusion to the GPU, accelerate redundant keyframe culling on the CPU, and integrate a GPU-accelerated solver to speed up local bundle adjustment. Our implementation is built on top of ORB-SLAM3 and leverages CUDA for GPU programming. The experimental results show that TurboMap achieves an average speedup of 1.3x in the EuRoC dataset and 1.6x in the TUM-VI dataset in the local mapping module, on both desktop and embedded platforms, while maintaining the accuracy of the original system.
comment: Submitted to ICRA 2026
Path-Coordinated Continual Learning with Neural Tangent Kernel-Justified Plasticity: A Theoretical Framework with Near State-of-the-Art Performance
Catastrophic forgetting is one of the fundamental issues of continual learning because neural networks forget previously learned tasks when trained on new ones. We propose a path-coordinated continual learning framework that unites Neural Tangent Kernel (NTK) theory for principled plasticity bounds, statistical validation by Wilson confidence intervals, and evaluation of path quality using multiple metrics. Experimental evaluation shows an average accuracy of 66.7% at the cost of 23.4% catastrophic forgetting on Split-CIFAR10, a large improvement over the baseline and close to state-of-the-art results. Further, NTK condition numbers are found to be predictive indicators of learning capacity limits, showing the existence of a critical threshold at condition number $>10^{11}$. Notably, the proposed strategy tends to lower forgetting as the sequence of tasks progresses (27% to 18%), which indicates system stabilization. The framework validates 80% of discovered paths with a rigorous statistical guarantee and maintains 90-97% retention on intermediate tasks. The analysis identifies the core capacity limits of the continual learning setting and offers actionable insights for improving adaptive regularization.
comment: Under review, IEEE Letters
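The Wilson confidence interval used for statistical validation in the abstract above has a closed form, sketched below. The function name `wilson_interval` and the example counts (8 validated paths out of 10) are hypothetical and are not taken from the paper.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2.0 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z2 / (4.0 * trials * trials))
        / denom
    )
    return center - half_width, center + half_width

# Hypothetical example: 8 of 10 candidate paths pass validation.
lower, upper = wilson_interval(8, 10)
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly for small sample sizes, which matters when only a handful of paths are evaluated.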
Stein-based Optimization of Sampling Distributions in Model Predictive Path Integral Control
This paper presents a novel method for Model Predictive Path Integral (MPPI) control that optimizes sample generation towards an optimal trajectory through Stein Variational Gradient Descent (SVGD). MPPI is traditionally reliant on randomly sampled trajectories, often drawn from a Gaussian distribution. This can lead to sample deprivation, under-representing the space of possible trajectories and yielding suboptimal results. Through introducing SVGD updates in between MPPI environment steps, we present Stein-Optimized Path-Integral Inference (SOPPI), an MPPI/SVGD algorithm that can dynamically update noise distributions at runtime to shape a more optimal representation without an excessive increase in computational requirements. We demonstrate the efficacy of our method on systems ranging from a Cart-Pole to a two-dimensional bipedal walking task, indicating improved performance over standard MPPI across a range of hyper-parameters, and demonstrate feasibility at lower particle counts. We discuss the applicability of this MPPI/SVGD method to higher degree-of-freedom systems, as well as its potential in connection with new developments in state-of-the-art differentiable simulators.
comment: 8 pages, 6 figures
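For context, the standard MPPI update that the abstract above builds on weights each sampled perturbation by its exponentiated negative cost. The sketch below shows only this baseline weighting step, not the SVGD refinement of SOPPI; the function name `mppi_update`, the `temperature` parameter name, and the scalar-control setting are assumptions made for illustration.

```python
import math

def mppi_update(nominal, perturbations, costs, temperature=1.0):
    """One MPPI step: return (updated control sequence, sample weights).

    nominal:       list of H scalar controls
    perturbations: K sampled noise sequences, each of length H
    costs:         K rollout costs, one per perturbed sequence
    """
    best = min(costs)  # subtracting the minimum keeps the exponentials stable
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(weights)
    weights = [w / total for w in weights]
    updated = [
        u + sum(w * eps[t] for w, eps in zip(weights, perturbations))
        for t, u in enumerate(nominal)
    ]
    return updated, weights

# Hypothetical example: two noise samples with equal rollout cost.
new_controls, sample_weights = mppi_update(
    nominal=[0.0, 0.0],
    perturbations=[[1.0, 1.0], [-1.0, 3.0]],
    costs=[2.0, 2.0],
)
```

When one rollout is far cheaper than the rest, nearly all weight collapses onto it. That is the sample-deprivation effect the abstract refers to, and it motivates shaping the sampling distribution rather than drawing from a fixed Gaussian.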
TRACE: Textual Reasoning for Affordance Coordinate Extraction ICCV 2025
Vision-Language Models (VLMs) struggle to translate high-level instructions into the precise spatial affordances required for robotic manipulation. While visual Chain-of-Thought (CoT) methods exist, they are often computationally intensive. In this work, we introduce TRACE (Textual Reasoning for Affordance Coordinate Extraction), a novel methodology that integrates a textual Chain of Reasoning (CoR) into the affordance prediction process. We use this methodology to create the TRACE dataset, a large-scale collection created via an autonomous pipeline that pairs instructions with explicit textual rationales. By fine-tuning a VLM on this data, our model learns to externalize its spatial reasoning before acting. Our experiments show that our TRACE-tuned model achieves state-of-the-art performance, reaching 48.1% accuracy on the primary Where2Place (W2P) benchmark (a 9.6% relative improvement) and 55.0% on the more challenging W2P(h) subset. Crucially, an ablation study demonstrates that performance scales directly with the amount of reasoning data used, confirming the CoR's effectiveness. Furthermore, analysis of the model's attention maps reveals an interpretable reasoning process where focus shifts dynamically across reasoning steps. This work shows that training VLMs to generate a textual CoR is an effective and robust strategy for enhancing the precision, reliability, and interpretability of VLM-based robot control. Our dataset and code are available at https://github.com/jink-ucla/TRACE
comment: ICCV 2025. *Equal contribution. †Corresponding author
Hybrid Neural Network-Based Indoor Localisation System for Mobile Robots Using CSI Data in a Robotics Simulator
We present a hybrid neural network model for inferring the position of mobile robots using Channel State Information (CSI) data from a Massive MIMO system. By leveraging an existing CSI dataset, our approach integrates a Convolutional Neural Network (CNN) with a Multilayer Perceptron (MLP) to form a Hybrid Neural Network (HyNN) that estimates 2D robot positions. CSI readings are converted into synthetic images using the TINTO tool. The localisation solution is integrated with a robotics simulator, and the Robot Operating System (ROS), which facilitates its evaluation through heterogeneous test cases, and the adoption of state estimators like Kalman filters. Our contributions illustrate the potential of our HyNN model in achieving precise indoor localisation and navigation for mobile robots in complex environments. The study follows, and proposes, a generalisable procedure applicable beyond the specific use case studied, making it adaptable to different scenarios and datasets.
comment: 13 pages, 7 figures. Conference paper (ROBOVIS 2025)
Fractional Diffusion Bridge Models NeurIPS 2025
We present Fractional Diffusion Bridge Models (FDBM), a novel generative diffusion bridge framework driven by an approximation of the rich and non-Markovian fractional Brownian motion (fBM). Real stochastic processes exhibit a degree of memory effects (correlations in time), long-range dependencies, roughness and anomalous diffusion phenomena that are not captured in standard diffusion or bridge modeling due to the use of Brownian motion (BM). As a remedy, leveraging a recent Markovian approximation of fBM (MA-fBM), we construct FDBM that enable tractable inference while preserving the non-Markovian nature of fBM. We prove the existence of a coupling-preserving generative diffusion bridge and leverage it for future state prediction from paired training data. We then extend our formulation to the Schrödinger bridge problem and derive a principled loss function to learn the unpaired data translation. We evaluate FDBM on both tasks: predicting future protein conformations from aligned data, and unpaired image translation. In both settings, FDBM achieves superior performance compared to the Brownian baselines, yielding lower root mean squared deviation (RMSD) of C$_\alpha$ atomic positions in protein structure prediction and lower Fréchet Inception Distance (FID) in unpaired image translation.
comment: To appear in NeurIPS 2025 proceedings. This version includes post-camera-ready revisions
GenDexHand: Generative Simulation for Dexterous Hands
Data scarcity remains a fundamental bottleneck for embodied intelligence. Existing approaches use large language models (LLMs) to automate gripper-based simulation generation, but they transfer poorly to dexterous manipulation, which demands more specialized environment design. Meanwhile, dexterous manipulation tasks are inherently more difficult due to their higher degrees of freedom. Massively generating feasible and trainable dexterous hand tasks remains an open challenge. To this end, we present GenDexHand, a generative simulation pipeline that autonomously produces diverse robotic tasks and environments for dexterous manipulation. GenDexHand introduces a closed-loop refinement process that adjusts object placements and scales based on vision-language model (VLM) feedback, substantially improving the average quality of generated environments. Each task is further decomposed into sub-tasks to enable sequential reinforcement learning, reducing training time and increasing success rates. Our work provides a viable path toward scalable training of diverse dexterous hand behaviors in embodied intelligence by offering a simulation-based solution to synthetic data generation. Our website: https://winniechen2002.github.io/GenDexHand/.
MOBIUS: A Multi-Modal Bipedal Robot that can Walk, Crawl, Climb, and Roll
This article presents a Multi-Modal Bipedal Intelligent Urban Scout robot (MOBIUS) capable of walking, crawling, climbing, and rolling. MOBIUS features four limbs--two 6-DoF arms with two-finger grippers for manipulation and climbing, and two 4-DoF legs for locomotion--enabling smooth transitions across diverse terrains without reconfiguration. A hybrid control architecture combines reinforcement learning-based locomotion with model-based predictive and admittance control enhanced for safety by a Reference Governor toward compliant contact interactions. A high-level MIQCP planner autonomously selects locomotion modes to balance stability and energy efficiency. Hardware experiments demonstrate robust gait transitions, dynamic climbing, and full-body load support via pinch grasp. Overall, MOBIUS demonstrates the importance of tight integration between morphology, high-level planning, and control to enable mobile loco-manipulation and grasping, substantially expanding its interaction capabilities, workspace, and traversability.
comment: 23 pages, 20 figures. Collaborative work between the Robotics and Mechanisms Laboratory (RoMeLa) and Mitsubishi Electric Research Laboratories (MERL)
Lightweight Learning from Actuation-Space Demonstrations via Flow Matching for Whole-Body Soft Robotic Grasping
Robotic grasping under uncertainty remains a fundamental challenge due to its contact-rich nature. Traditional rigid robotic hands, with limited degrees of freedom and compliance, rely on complex model-based and heavy feedback controllers to manage such interactions. Soft robots, by contrast, exhibit embodied mechanical intelligence: their underactuated structures and whole-body passive flexibility naturally accommodate uncertain contacts and enable adaptive behaviors. To harness this capability, we propose a lightweight actuation-space learning framework that infers distributional control representations for whole-body soft robotic grasping directly from deterministic demonstrations using a flow matching model (Rectified Flow), without requiring dense sensing or heavy control loops. Using only 30 demonstrations (less than 8% of the reachable workspace), the learned policy achieves a 97.5% grasp success rate across the whole workspace, generalizes to grasped-object size variations of ±33%, and maintains stable performance when the robot's dynamic response is directly adjusted by scaling the execution time from 20% to 200%. These results demonstrate that actuation-space learning, by leveraging the robot's passive redundant DOFs and flexibility, converts the body's mechanics into functional control intelligence and substantially reduces the burden on central controllers for this uncertainty-rich task.
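As a rough illustration of the flow matching objective named in the abstract above (not the authors' implementation), Rectified Flow regresses a velocity field onto straight-line paths between a noise sample and a data sample. The sketch below builds one such training pair in plain Python; the function names and the toy two-dimensional actuation vectors are hypothetical.

```python
import random


def rectified_flow_pair(x0, x1, t):
    """Return (x_t, target_velocity) on the straight path from x0 to x1.

    x0: noise sample, x1: demonstration sample (a hypothetical actuation
    vector), t: interpolation time in [0, 1].
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    # Along a straight line the velocity is constant and equals x1 - x0.
    target = [b - a for a, b in zip(x0, x1)]
    return x_t, target


def squared_error(pred, target):
    """Regression loss between a predicted and a target velocity."""
    return sum((p - q) ** 2 for p, q in zip(pred, target))


# One toy training pair: Gaussian noise paired with a made-up demonstration.
rng = random.Random(0)
demo = [0.8, -0.2]
noise = [rng.gauss(0.0, 1.0) for _ in demo]
x_t, v_target = rectified_flow_pair(noise, demo, rng.random())
```

A learned model would take `x_t` and `t` as input and be trained to minimize `squared_error` against `v_target`; sampling then integrates the learned velocity from noise toward an actuation command.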
3EED: Ground Everything Everywhere in 3D NeurIPS 2025
Visual grounding in 3D is the key for embodied agents to localize language-referred objects in open-world environments. However, existing benchmarks are limited to indoor focus, single-platform constraints, and small scale. We introduce 3EED, a multi-platform, multi-modal 3D grounding benchmark featuring RGB and LiDAR data from vehicle, drone, and quadruped platforms. We provide over 128,000 objects and 22,000 validated referring expressions across diverse outdoor scenes -- 10x larger than existing datasets. We develop a scalable annotation pipeline combining vision-language model prompting with human verification to ensure high-quality spatial grounding. To support cross-platform learning, we propose platform-aware normalization and cross-modal alignment techniques, and establish benchmark protocols for in-domain and cross-platform evaluations. Our findings reveal significant performance gaps, highlighting the challenges and opportunities of generalizable 3D grounding. The 3EED dataset and benchmark toolkit are released to advance future research in language-driven 3D embodied perception.
comment: NeurIPS 2025 DB Track; 29 pages, 17 figures, 10 tables; Project Page at https://project-3eed.github.io/
Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process
Vision-language-action (VLA) models aim to understand natural language instructions and visual observations and to execute corresponding actions as an embodied agent. Recent work integrates future images into the understanding-acting loop, yielding unified VLAs that jointly understand, generate, and act -- reading text and images and producing future images and actions. However, these models either rely on external experts for modality unification or treat image generation and action prediction as separate processes, limiting the benefits of direct synergy between these tasks. Our core philosophy is to optimize generation and action jointly through a synchronous denoising process, where the iterative refinement enables actions to evolve from initialization, under constant and sufficient visual guidance. We ground this philosophy in our proposed Unified Diffusion VLA and Joint Discrete Denoising Diffusion Process (JD3P), which is a joint diffusion process that integrates multiple modalities into a single denoising trajectory to serve as the key mechanism enabling understanding, generation, and acting to be intrinsically synergistic. Our model and theory are built on a unified tokenized space of all modalities and a hybrid attention mechanism. We further propose a two-stage training pipeline and several inference-time techniques that optimize performance and efficiency. Our approach achieves state-of-the-art performance on benchmarks such as CALVIN, LIBERO, and SimplerEnv with 4$\times$ faster inference than autoregressive methods, and we demonstrate its effectiveness through in-depth analysis and real-world evaluations. Our project page is available at https://irpn-eai.github.io/UD-VLA.github.io/.
MARS: Multi-Agent Robotic System with Multimodal Large Language Models for Assistive Intelligence
Multimodal large language models (MLLMs) have shown remarkable capabilities in cross-modal understanding and reasoning, offering new opportunities for intelligent assistive systems, yet existing systems still struggle with risk-aware planning, user personalization, and grounding language plans into executable skills in cluttered homes. We introduce MARS - a Multi-Agent Robotic System powered by MLLMs for assistive intelligence and designed for smart home robots supporting people with disabilities. The system integrates four agents: a visual perception agent for extracting semantic and spatial features from environment images, a risk assessment agent for identifying and prioritizing hazards, a planning agent for generating executable action sequences, and an evaluation agent for iterative optimization. By combining multimodal perception with hierarchical multi-agent decision-making, the framework enables adaptive, risk-aware, and personalized assistance in dynamic indoor environments. Experiments on multiple datasets demonstrate the superior overall performance of the proposed system in risk-aware planning and coordinated multi-agent execution compared with state-of-the-art multimodal models. The proposed approach also highlights the potential of collaborative AI for practical assistive scenarios and provides a generalizable methodology for deploying MLLM-enabled multi-agent systems in real-world environments.
comment: 3 figures, 1 table; under review at Multimedia Systems (Springer)
PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model
Vision-Language-Action models (VLAs) are emerging as powerful tools for learning generalizable visuomotor control policies. However, current VLAs are mostly trained on large-scale image-text-action data and remain limited in two key ways: (i) they struggle with pixel-level scene understanding, and (ii) they rely heavily on textual prompts, which reduces their flexibility in real-world settings. To address these challenges, we introduce PixelVLA, the first VLA model designed to support both pixel-level reasoning and multimodal prompting with text and visual inputs. Our approach is built on a new visuomotor instruction tuning framework that integrates a multiscale pixel-aware encoder with a visual prompting encoder. To train PixelVLA effectively, we further propose a two-stage automated annotation pipeline that generates Pixel-160K, a large-scale dataset with pixel-level annotations derived from existing robot data. Experiments on three standard VLA benchmarks and two VLA model variants show that PixelVLA improves manipulation success rates by 10.1%-17.8% over OpenVLA, while requiring only 1.5% of its pretraining cost. These results demonstrate that PixelVLA can be integrated into existing VLAs to enable more accurate, efficient, and versatile robot control in complex environments. The dataset and code will be released as open source.
comment: 17 pages, 7 figures, 5 tables
Phy-Tac: Toward Human-Like Grasping via Physics-Conditioned Tactile Goals
Humans naturally grasp objects with the minimal force required for stability, whereas robots often rely on rigid, over-squeezing control. To narrow this gap, we propose a human-inspired physics-conditioned tactile method (Phy-Tac) for force-optimal stable grasping (FOSG) that unifies pose selection, tactile prediction, and force regulation. A physics-based pose selector first identifies feasible contact regions with optimal force distribution based on surface geometry. Then, a physics-conditioned latent diffusion model (Phy-LDM) predicts the tactile imprint under the FOSG target. Finally, a latent-space LQR controller drives the gripper toward this tactile imprint with minimal actuation, preventing unnecessary compression. Trained on a physics-conditioned tactile dataset covering diverse objects and contact conditions, the proposed Phy-LDM achieves superior tactile prediction accuracy, while Phy-Tac outperforms fixed-force and GraspNet-based baselines in grasp stability and force efficiency. Experiments on classical robotic platforms demonstrate force-efficient and adaptive manipulation that bridges the gap between robotic and human grasping.
comment: 9 pages, 10 figures, 3 tables
Discriminately Treating Motion Components Evolves Joint Depth and Ego-Motion Learning
Unsupervised learning of depth and ego-motion, two fundamental 3D perception tasks, has made significant strides in recent years. However, most methods treat ego-motion as an auxiliary task, either mixing all motion types or excluding depth-independent rotational motions in supervision. Such designs limit the incorporation of strong geometric constraints, reducing reliability and robustness under diverse conditions. This study introduces a discriminative treatment of motion components, leveraging the geometric regularities of their respective rigid flows to benefit both depth and ego-motion estimation. Given consecutive video frames, network outputs first align the optical axes and imaging planes of the source and target cameras. Optical flows between frames are transformed through these alignments, and deviations are quantified to impose geometric constraints individually on each ego-motion component, enabling more targeted refinement. These alignments further reformulate the joint learning process into coaxial and coplanar forms, where depth and each translation component can be mutually derived through closed-form geometric relationships, introducing complementary constraints that improve depth robustness. DiMoDE, a general depth and ego-motion joint learning framework incorporating these designs, achieves state-of-the-art performance on multiple public datasets and a newly collected diverse real-world dataset, particularly under challenging conditions. Our source code will be publicly available at mias.group/DiMoDE upon publication.
comment: 18 pages, 14 figures
SE(3)-PoseFlow: Estimating 6D Pose Distributions for Uncertainty-Aware Robotic Manipulation
Object pose estimation is a fundamental problem in robotics and computer vision, yet it remains challenging due to partial observability, occlusions, and object symmetries, which inevitably lead to pose ambiguity and multiple hypotheses consistent with the same observation. While deterministic deep networks achieve impressive performance under well-constrained conditions, they are often overconfident and fail to capture the multi-modality of the underlying pose distribution. To address these challenges, we propose a novel probabilistic framework that leverages flow matching on the SE(3) manifold for estimating 6D object pose distributions. Unlike existing methods that regress a single deterministic output, our approach models the full pose distribution with a sample-based estimate and enables reasoning about uncertainty in ambiguous cases such as symmetric objects or severe occlusions. We achieve state-of-the-art results on Real275, YCB-V, and LM-O, and demonstrate how our sample-based pose estimates can be leveraged in downstream robotic manipulation tasks such as active perception for disambiguating uncertain viewpoints or guiding grasp synthesis in an uncertainty-aware manner.
Floor Plan-Guided Visual Navigation Incorporating Depth and Directional Cues
Guiding an agent to a specific target in indoor environments based solely on RGB inputs and a floor plan is a promising yet challenging problem. Although existing methods have made significant progress, two challenges remain unresolved. First, the modality gap between egocentric RGB observations and the floor plan hinders the integration of visual and spatial information for both local obstacle avoidance and global planning. Second, accurate localization is critical for navigation performance, but remains challenging at deployment in unseen environments due to the lack of explicit geometric alignment between RGB inputs and floor plans. We propose a novel diffusion-based policy, denoted as GlocDiff, which integrates global path planning from the floor plan with local depth-aware features derived from RGB observations. The floor plan offers explicit global guidance, while the depth features provide implicit geometric cues, collectively enabling precise prediction of optimal navigation directions and robust obstacle avoidance. Moreover, GlocDiff introduces noise perturbation during training to enhance robustness against pose estimation errors, and we find that combining this with a relatively stable VO module during inference results in significantly improved navigation performance. Extensive experiments on the FloNa benchmark demonstrate GlocDiff's efficiency and effectiveness in achieving superior navigation performance, and the success of real-world deployments also highlights its potential for widespread practical applications.
MO-SeGMan: Rearrangement Planning Framework for Multi Objective Sequential and Guided Manipulation in Constrained Environments
In this work, we introduce MO-SeGMan, a Multi-Objective Sequential and Guided Manipulation planner for highly constrained rearrangement problems. MO-SeGMan generates object placement sequences that minimize both replanning per object and robot travel distance while preserving critical dependency structures with a lazy evaluation method. To address highly cluttered, non-monotone scenarios, we propose a Selective Guided Forward Search (SGFS) that efficiently relocates only critical obstacles, and only to feasible relocation points. Furthermore, we adopt a refinement method for adaptive subgoal selection to eliminate unnecessary pick-and-place actions, thereby improving overall solution quality. Extensive evaluations on nine benchmark rearrangement tasks demonstrate that MO-SeGMan generates feasible motion plans in all cases, consistently achieving faster solution times and superior solution quality compared to the baselines. These results highlight the robustness and scalability of the proposed framework for complex rearrangement planning problems.
comment: 8 pages, 8 figures, website: https://sites.google.com/view/mo-segman/
AERMANI-VLM: Structured Prompting and Reasoning for Aerial Manipulation with Vision Language Models
The rapid progress of vision--language models (VLMs) has sparked growing interest in robotic control, where natural language can express the operation goals while visual feedback links perception to action. However, directly deploying VLM-driven policies on aerial manipulators remains unsafe and unreliable since the generated actions are often inconsistent, hallucination-prone, and dynamically infeasible for flight. In this work, we present AERMANI-VLM, the first framework to adapt pretrained VLMs for aerial manipulation by separating high-level reasoning from low-level control, without any task-specific fine-tuning. Our framework encodes natural language instructions, task context, and safety constraints into a structured prompt that guides the model to generate a step-by-step reasoning trace in natural language. This reasoning output is used to select from a predefined library of discrete, flight-safe skills, ensuring interpretable and temporally consistent execution. By decoupling symbolic reasoning from physical action, AERMANI-VLM mitigates hallucinated commands and prevents unsafe behavior, enabling robust task completion. We validate the framework in both simulation and hardware on diverse multi-step pick-and-place tasks, demonstrating strong generalization to previously unseen commands, objects, and environments.
Designing for Distributed Heterogeneous Modularity: On Software Architecture and Deployment of MoonBots SP
This paper presents the software architecture and deployment strategy behind the MoonBot platform: a modular space robotic system composed of heterogeneous components distributed across multiple computers, networks and ultimately celestial bodies. We introduce a principled approach to distributed, heterogeneous modularity, extending modular robotics beyond physical reconfiguration to software, communication and orchestration. We detail the architecture of our system that integrates component-based design, a data-oriented communication model using ROS2 and Zenoh, and a deployment orchestrator capable of managing complex multi-module assemblies. These abstractions enable dynamic reconfiguration, decentralized control, and seamless collaboration between numerous operators and modules. At the heart of this system lies our open-source Motion Stack software, validated by months of field deployment with self-assembling robots, inter-robot cooperation, and remote operation. Our architecture tackles the significant hurdles of modular robotics by significantly reducing integration and maintenance overhead, while remaining scalable and robust. Although tested with space in mind, we propose generalizable patterns for designing robotic systems that must scale across time, hardware, teams and operational environments.
comment: 6 pages, 8 figures. Accepted at ISPARO 2025
FoldPath: End-to-End Object-Centric Motion Generation via Modulated Implicit Paths IROS 2025
Object-Centric Motion Generation (OCMG) is instrumental in advancing automated manufacturing processes, particularly in domains requiring high-precision expert robotic motions, such as spray painting and welding. To realize effective automation, robust algorithms are essential for generating extended, object-aware trajectories across intricate 3D geometries. However, contemporary OCMG techniques are either based on ad-hoc heuristics or employ learning-based pipelines that are still reliant on sensitive post-processing steps to generate executable paths. We introduce FoldPath, a novel, end-to-end, neural field based method for OCMG. Unlike prior deep learning approaches that predict discrete sequences of end-effector waypoints, FoldPath learns the robot motion as a continuous function, thus implicitly encoding smooth output paths. This paradigm shift eliminates the need for brittle post-processing steps that concatenate and order the predicted discrete waypoints. Particularly, our approach demonstrates superior predictive performance compared to recently proposed learning-based methods, and attains generalization capabilities even in real industrial settings, where only a limited amount of 70 expert samples are provided. We validate FoldPath through comprehensive experiments in a realistic simulation environment and introduce new, rigorous metrics designed to comprehensively evaluate long-horizon robotic paths, thus advancing the OCMG task towards practical maturity.
comment: Accepted at 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
CaRLi-V: Camera-RADAR-LiDAR Point-Wise 3D Velocity Estimation
Accurate point-wise velocity estimation in 3D is crucial for robot interaction with non-rigid, dynamic agents, such as humans, enabling robust performance in path planning, collision avoidance, and object manipulation in dynamic environments. To this end, this paper proposes a novel RADAR, LiDAR, and camera fusion pipeline for point-wise 3D velocity estimation named CaRLi-V. This pipeline leverages raw RADAR measurements to create a novel RADAR representation, the velocity cube, which densely represents radial velocities within the RADAR's field-of-view. By combining the velocity cube for radial velocity extraction, optical flow for tangential velocity estimation, and LiDAR for point-wise range measurements through a closed-form solution, our approach can produce 3D velocity estimates for a dense array of points. Developed as an open-source ROS2 package, CaRLi-V has been field-tested against a custom dataset and proven to produce low velocity error metrics relative to ground truth, enabling point-wise velocity estimation for robotic applications.
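The abstract above does not spell out its closed-form solution, so the following is only a sketch of how three such measurements can pin down a 3D velocity. It assumes an ideal pinhole camera, all sensors sharing one static frame, and noise-free inputs; the function names and parameters are hypothetical. Two optical-flow equations and one radial-velocity equation form a 3x3 linear system in the unknown velocity.

```python
import math


def solve3(A, b):
    """Solve the 3x3 linear system A x = b with Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    d = det(A)
    if abs(d) < 1e-12:
        raise ValueError("singular system")
    solution = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = b[r]  # replace one column with the right-hand side
        solution.append(det(m) / d)
    return solution


def point_velocity(point, flow, radial_speed, focal):
    """Recover (Vx, Vy, Vz) of one point in the camera frame.

    point: (X, Y, Z) in metres with Z > 0, e.g. from a LiDAR return.
    flow: (du/dt, dv/dt) in pixels per second at the point's pixel.
    radial_speed: speed along the line of sight in m/s, positive when
        the point moves away from the sensor.
    focal: focal length in pixels.
    """
    X, Y, Z = point
    R = math.sqrt(X * X + Y * Y + Z * Z)
    A = [
        [focal / Z, 0.0, -focal * X / Z ** 2],  # d/dt of u = f * X / Z
        [0.0, focal / Z, -focal * Y / Z ** 2],  # d/dt of v = f * Y / Z
        [X / R, Y / R, Z / R],                  # projection onto the ray
    ]
    return solve3(A, [flow[0], flow[1], radial_speed])
```

Under these assumptions the system is invertible whenever Z is nonzero, which is why range, flow, and radial speed together suffice for a per-point estimate.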
EREBUS: End-to-end Robust Event Based Underwater Simulation ICRA
The underwater domain presents a vast array of challenges for roboticists and computer vision researchers alike, such as poor lighting conditions and high dynamic range scenes. In these adverse conditions, traditional vision techniques struggle to adapt and lead to suboptimal performance. Event-based cameras present an attractive solution to this problem, mitigating the issues of traditional cameras by tracking changes in the footage on a frame-by-frame basis. In this paper, we introduce a pipeline which can be used to generate realistic synthetic data of an event-based camera mounted to an AUV (Autonomous Underwater Vehicle) in an underwater environment for training vision models. We demonstrate the effectiveness of our pipeline using the task of rock detection with poor visibility and suspended particulate matter, but the approach can be generalized to other underwater tasks.
comment: Accepted to ICRA AQUA2SIM Workshop 2025, 6 pages, 3 figures, conference paper
CM-LIUW-Odometry: Robust and High-Precision LiDAR-Inertial-UWB-Wheel Odometry for Extreme Degradation Coal Mine Tunnels IROS 2025
Simultaneous Localization and Mapping (SLAM) in large-scale, complex, and GPS-denied underground coal mine environments presents significant challenges. Sensors must contend with abnormal operating conditions: GPS unavailability impedes scene reconstruction and absolute geographic referencing, uneven or slippery terrain degrades wheel odometer accuracy, and long, feature-poor tunnels reduce LiDAR effectiveness. To address these issues, we propose CoalMine-LiDAR-IMU-UWB-Wheel-Odometry (CM-LIUW-Odometry), a multimodal SLAM framework based on the Iterated Error-State Kalman Filter (IESKF). First, LiDAR-inertial odometry is tightly fused with UWB absolute positioning constraints to align the SLAM system with a global coordinate frame. Next, the wheel odometer is integrated through tight coupling, enhanced by nonholonomic constraints (NHC) and vehicle lever arm compensation, to address performance degradation in areas beyond UWB measurement range. Finally, an adaptive motion mode switching mechanism dynamically adjusts the robot's motion mode based on UWB measurement range and environmental degradation levels. Experimental results validate that our method achieves superior accuracy and robustness in real-world underground coal mine scenarios, outperforming state-of-the-art approaches. We open-source the code for this work on GitHub to benefit the robotics community.
comment: Accepted by IROS 2025
Lateral Velocity Model for Vehicle Parking Applications
Automated parking requires accurate localization for quick and precise maneuvering in tight spaces. While the longitudinal velocity can be measured using wheel encoders, the estimation of the lateral velocity remains a key challenge due to the absence of dedicated sensors in consumer-grade vehicles. Existing approaches often rely on simplified vehicle models, such as the zero-slip model, which assumes no lateral velocity at the rear axle. It is well established that this assumption does not hold during low-speed driving and researchers thus introduce additional heuristics to account for differences. In this work, we analyze real-world data from parking scenarios and identify a systematic deviation from the zero-slip assumption. We provide explanations for the observed effects and then propose a lateral velocity model that better captures the lateral dynamics of the vehicle during parking. The model improves estimation accuracy, while relying on only two parameters, making it well-suited for integration into consumer-grade applications.
comment: This manuscript has been submitted to Vehicle System Dynamics for possible publication
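For context, the zero-slip baseline that the abstract above questions can be written in a few lines. This is a sketch of the standard kinematic single-track assumption only, not the two-parameter model proposed in the paper, whose form the abstract does not state; the function and parameter names are hypothetical.

```python
import math


def zero_slip_velocities(v_long, steer_angle, wheelbase, offset=0.0):
    """Return (yaw_rate, lateral_velocity) under the zero-slip assumption.

    v_long: longitudinal speed at the rear axle in m/s.
    steer_angle: front-wheel steering angle in radians.
    wheelbase: distance between front and rear axle in metres.
    offset: distance of the point of interest ahead of the rear axle in
        metres, measured along the vehicle's centre line.
    """
    yaw_rate = v_long * math.tan(steer_angle) / wheelbase
    # Rigid-body motion: lateral speed grows linearly with the distance
    # from the rear axle and is exactly zero at the axle itself.
    lateral_velocity = yaw_rate * offset
    return yaw_rate, lateral_velocity
```

Any measured lateral velocity at the rear axle is therefore a direct deviation from this baseline, which is the kind of systematic effect the abstract reports.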
Model to Model: Understanding the Venus Flytrap Snapping Mechanism and Transferring it to a 3D-printed Bistable Soft Robotic Demonstrator
The Venus flytrap (Dionaea muscipula) not only serves as the textbook model for a carnivorous plant, but has also long intrigued both botanists and engineers with its rapidly closing leaf trap. The trap closure is triggered by two consecutive touches of a potential prey, after which the lobes rapidly switch from their concave open state to their convex closed state and catch the prey within 100-500 ms after being triggered. This transformation from concave to convex is initiated by changes in turgor pressure and the release of stored elastic energy from prestresses in the concave state, which accelerate this movement, leading to inversion of the lobes' biaxial curvature. Possessing two low-energy states, the leaves can be characterized as bistable systems. With our research, we seek to deepen the understanding of Venus flytrap motion mechanics and apply its principles to the design of an artificial bistable lobe actuator. We identified geometrical characteristics, such as dimensional ratios and the thickness gradient in the lobe, and transferred these to two 3D-printed bistable actuator models. One actuator parallels the simulated geometry of a Venus flytrap leaf, the other is a lobe model designed with CAD. Both models display concave-convex bistability and snap closed. These demonstrators are the first step in the development of an artificial Venus flytrap that mimics the mechanical behavior of the biological model and can be used as a fast soft gripper.
comment: Conference Proceedings Paper Living Machines 2025
Design and development of an electronics-free earthworm robot
Soft robotic systems have gained widespread attention due to their inherent flexibility, adaptability, and safety, making them well-suited for varied applications. Among bioinspired designs, earthworm locomotion has been extensively studied for its efficient peristaltic motion, enabling movement in confined and unstructured environments. Existing earthworm-inspired robots primarily utilize pneumatic actuation due to its high force-to-weight ratio and ease of implementation. However, these systems often rely on bulky, power-intensive electronic control units, limiting their practicality. In this work, we present an electronics-free, earthworm-inspired pneumatic robot utilizing a modified Pneumatic Logic Gate (PLG) design. By integrating preconfigured PLG units with bellow actuators, we achieved a plug-and-play style modular system capable of peristaltic locomotion without external electronic components. The proposed design reduces system complexity while maintaining efficient actuation. We characterize the bellow actuators under different operating conditions and evaluate the robot's locomotion performance. Our findings demonstrate that the modified PLG-based control system effectively generates peristaltic wave propagation, achieving autonomous motion with minimal deviation. This study serves as a proof of concept for the development of electronics-free, peristaltic soft robots. The proposed system has potential for applications in hazardous environments, where untethered, adaptable locomotion is critical. Future work will focus on further optimizing the robot design and exploring untethered operation using onboard compressed air sources.
comment: Conference Proceedings Paper Living Machines 2025
Thermo-responsive closing and reopening artificial Venus Flytrap utilizing shape memory elastomers
Despite their often perceived static and slow nature, some plants can move faster than the blink of an eye. The rapid snap closure motion of the Venus flytrap (Dionaea muscipula) has long captivated the interest of researchers and engineers alike, serving as a model for plant-inspired soft machines and robots. The translation of the fast snapping closure has inspired the development of various artificial Venus flytrap (AVF) systems. However, translating both the closing and reopening motion of D. muscipula into an autonomous plant inspired soft machine has yet to be achieved. In this study, we present an AVF that autonomously closes and reopens, utilizing novel thermo-responsive UV-curable shape memory materials for soft robotic systems. The life-sized thermo-responsive AVF exhibits closing and reopening motions triggered in a naturally occurring temperature range. The doubly curved trap lobes, built from shape memory polymers, close at 38°C, while reopening initiates around 45°C, employing shape memory elastomer strips as antagonistic actuators to facilitate lobe reopening. This work represents the first demonstration of thermo-responsive closing and reopening in an AVF with programmed sequential motion in response to increasing temperature. This approach marks the next step toward autonomously bidirectional moving soft machines/robots.
comment: Conference Proceedings Paper Living Machines 2025
Embodied Cognition Augmented End2End Autonomous Driving
In recent years, vision-based end-to-end autonomous driving has emerged as a new paradigm. However, popular end-to-end approaches typically rely on visual feature extraction networks trained under label supervision. This limited supervision framework restricts the generality and applicability of driving models. In this paper, we propose a novel paradigm termed $E^{3}AD$, which advocates for comparative learning between visual feature extraction networks and the general EEG large model, in order to learn latent human driving cognition for enhancing end-to-end planning. In this work, we collected a cognitive dataset for the mentioned contrastive learning process. Subsequently, we investigated the methods and potential mechanisms for enhancing end-to-end planning with human driving cognition, using popular driving models as baselines on publicly available autonomous driving datasets. Both open-loop and closed-loop tests are conducted for a comprehensive evaluation of planning performance. Experimental results demonstrate that the $E^{3}AD$ paradigm significantly enhances the end-to-end planning performance of baseline models. Ablation studies further validate the contribution of driving cognition and the effectiveness of the comparative learning process. To the best of our knowledge, this is the first work to integrate human driving cognition for improving end-to-end autonomous driving planning. It represents an initial attempt to incorporate embodied cognitive data into end-to-end autonomous driving, providing valuable insights for future brain-inspired autonomous driving systems. Our code will be made available on GitHub.
comment: 24 pages,4 pages
RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models
Vision-Language-Action (VLA) models have recently emerged as powerful general-purpose policies for robotic manipulation, benefiting from large-scale multi-modal pre-training. However, they often fail to generalize reliably in out-of-distribution deployments, where unavoidable disturbances such as observation noise, sensor errors, or actuation perturbations become prevalent. While recent Reinforcement Learning (RL)-based post-training provides a practical means to adapt pre-trained VLA models, existing methods mainly emphasize reward maximization and overlook robustness to environmental uncertainty. In this work, we introduce RobustVLA, a lightweight online RL post-training method designed to explicitly enhance the resilience of VLA models. Through a systematic robustness analysis, we identify two key regularizations: Jacobian regularization, which mitigates sensitivity to observation noise, and smoothness regularization, which stabilizes policies under action perturbations. Extensive experiments across diverse robotic environments demonstrate that RobustVLA significantly outperforms prior state-of-the-art methods in robustness and reliability. Our results highlight the importance of principled robustness-aware RL post-training as a key step toward improving the reliability and robustness of VLA models.
A High-Speed Capable Spherical Robot
This paper designs a new spherical robot structure capable of supporting high-speed motion at up to 10 m/s. Building upon a single-pendulum-driven spherical robot, the design incorporates a momentum wheel with an axis aligned with the secondary pendulum, creating a novel spherical robot structure. Practical experiments with the physical prototype have demonstrated that this new spherical robot can achieve stable high-speed motion through simple decoupled control, which was unattainable with the original structure. The spherical robot designed for high-speed motion not only increases speed but also significantly enhances obstacle-crossing performance and terrain robustness.
comment: 5 pages
Lyapunov Stability Learning with Nonlinear Control via Inductive Biases
Finding a control Lyapunov function (CLF) in a dynamical system with a controller is an effective way to guarantee stability, which is a crucial issue in safety-concerned applications. Recently, deep learning models representing CLFs have been applied in a learner-verifier framework to identify satisfiable candidates. However, the learner treats Lyapunov conditions as complex constraints for optimisation, which makes global convergence hard to achieve. It is also too complicated to implement these Lyapunov conditions for verification. To improve this framework, we treat Lyapunov conditions as inductive biases and design a neural CLF and a CLF-based controller guided by this knowledge. This design enables a stable optimisation process with limited constraints, and allows end-to-end learning of both the CLF and the controller. Our approach achieves a higher convergence rate and larger region of attraction (ROA) in learning the CLF compared to existing methods across abundant experimental cases. We also thoroughly reveal why the success rate decreases with previous methods during learning.
comment: Accepted by IEEE Robio 2025
Contact Map Transfer with Conditional Diffusion Model for Generalizable Dexterous Grasp Generation
Dexterous grasp generation is a fundamental challenge in robotics, requiring both grasp stability and adaptability across diverse objects and tasks. Analytical methods ensure stable grasps but are inefficient and lack task adaptability, while generative approaches improve efficiency and task integration but generalize poorly to unseen objects and tasks due to data limitations. In this paper, we propose a transfer-based framework for dexterous grasp generation, leveraging a conditional diffusion model to transfer high-quality grasps from shape templates to novel objects within the same category. Specifically, we reformulate the grasp transfer problem as the generation of an object contact map, incorporating object shape similarity and task specifications into the diffusion process. To handle complex shape variations, we introduce a dual mapping mechanism, capturing the intricate geometric relationships between shape templates and novel objects. Beyond the contact map, we derive two additional object-centric maps, the part map and direction map, to encode finer contact details for more stable grasps. We then develop a cascaded conditional diffusion model framework to jointly transfer these three maps, ensuring their intra-consistency. Finally, we introduce a robust grasp recovery mechanism, identifying reliable contact points and optimizing grasp configurations efficiently. Extensive experiments demonstrate the superiority of our proposed method. Our approach effectively balances grasp quality, generation efficiency, and generalization performance across various tasks. Project homepage: https://cmtdiffusion.github.io/
Design and Fabrication of Origami-Inspired Knitted Fabrics for Soft Robotics
Soft robots employing compliant materials and deformable structures offer great potential for wearable devices that are comfortable and safe for human interaction. However, achieving both structural integrity and compliance for comfort remains a significant challenge. In this study, we present a novel fabrication and design method that combines the advantages of origami structures with the material programmability and wearability of knitted fabrics. We introduce a general design method that translates origami patterns into knit designs by programming both stitch and material patterns. The method creates folds in preferred directions while suppressing unintended buckling and bending by selectively incorporating heat fusible yarn to create rigid panels around compliant creases. We experimentally quantify folding moments and show that stitch patterning enhances folding directionality while the heat fusible yarn (1) keeps geometry consistent by reducing edge curl and (2) prevents out-of-plane deformations by stiffening panels. We demonstrate the framework through the successful reproduction of complex origami tessellations, including Miura-ori, Yoshimura, and Kresling patterns, and present a wearable knitted Kaleidocycle robot capable of locomotion. The combination of structural reconfigurability, material programmability, and potential for manufacturing scalability highlights knitted origami as a promising platform for next-generation wearable robotics.
Improving Needle Penetration via Precise Rotational Insertion Using Iterative Learning Control
Achieving precise control of robotic tool paths is often challenged by inherent system misalignments, unmodeled dynamics, and actuation inaccuracies. This work introduces an Iterative Learning Control (ILC) strategy to enable precise rotational insertion of a tool during robotic surgery, improving penetration efficacy and safety compared to straight insertion tested in subretinal injection. A 4 degree of freedom (DOF) robot manipulator is used, where misalignment of the fourth joint complicates the simple application of needle rotation, motivating an ILC approach that iteratively adjusts joint commands based on positional feedback. The process begins with calibrating the forward kinematics for the chosen surgical tool to achieve higher accuracy, followed by successive ILC iterations guided by Optical Coherence Tomography (OCT) volume scans to measure the error and refine control inputs. Experimental results, tested on subretinal injection tasks on ex vivo pig eyes, show that the optimized trajectory resulted in higher success rates in tissue penetration and subretinal injection compared to straight insertion, demonstrating the effectiveness of ILC in overcoming misalignment challenges. This approach offers potential applications for other high precision robot tasks requiring controlled insertions as well.
comment: 10 pages, 10 figures
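The iterative refinement described in the abstract above can be illustrated with a generic P-type iterative learning control update, u_{k+1} = u_k + gain * e_k. This is a minimal sketch, not the authors' method: the plant model `misaligned_plant`, the gain, and the reference trajectory are hypothetical stand-ins, and the OCT-based error measurement is replaced by a direct comparison with the reference.

```python
import numpy as np

def ilc_update(u, error, gain=0.5):
    """Generic P-type ILC law: next input = current input + gain * tracking error."""
    return u + gain * error

def run_ilc(plant, reference, iterations=20, gain=0.5):
    """Repeat the same task, correcting the whole input trajectory after each trial."""
    u = np.zeros_like(reference)
    for _ in range(iterations):
        error = reference - plant(u)      # error measured after the trial
        u = ilc_update(u, error, gain)    # refine the input for the next trial
    return u, reference - plant(u)

def misaligned_plant(u):
    """Hypothetical plant: a scale error plus a constant offset (assumed, for illustration)."""
    return 0.9 * u + 0.2

reference = np.linspace(0.0, 1.0, 5)
u_final, final_error = run_ilc(misaligned_plant, reference)
```

With this toy plant the error contracts by a factor of 1 - 0.5 * 0.9 = 0.55 per trial, so the loop converges without any explicit model of the misalignment.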
Don't Just Search, Understand: Semantic Path Planning Agent for Spherical Tensegrity Robots in Unknown Environments
Endowed with inherent dynamical properties that grant them remarkable ruggedness and adaptability, spherical tensegrity robots stand as prototypical examples of hybrid soft-rigid designs and excellent mobile platforms. However, path planning for these robots in unknown environments presents a significant challenge, requiring a delicate balance between efficient exploration and robust planning. Traditional path planners, which treat the environment as a geometric grid, often suffer from redundant searches and are prone to failure in complex scenarios due to their lack of semantic understanding. To overcome these limitations, we reframe path planning in unknown environments as a semantic reasoning task. We introduce a Semantic Agent for Tensegrity robots (SATPlanner) driven by a Large Language Model (LLM). SATPlanner leverages high-level environmental comprehension to generate efficient and reliable planning strategies. At the core of SATPlanner is an Adaptive Observation Window mechanism, inspired by the "fast" and "slow" thinking paradigms of LLMs. This mechanism dynamically adjusts the perceptual field of the agent: it narrows for rapid traversal of open spaces and expands to reason about complex obstacle configurations. This allows the agent to construct a semantic belief of the environment, enabling the search space to grow only linearly with the path length (O(L)) while maintaining path quality. We extensively evaluate SATPlanner in 1,000 simulation trials, where it achieves a 100% success rate, outperforming other real-time planning algorithms. Critically, SATPlanner reduces the search space by 37.2% compared to the A* algorithm while achieving comparable, near-optimal path lengths. Finally, the practical feasibility of SATPlanner is validated on a physical spherical tensegrity robot prototype.
comment: 8 pages, 5 figures
High-Precision Surgical Robotic System for Intraocular Procedures
Despite the extensive demonstration of robotic systems for both cataract and vitreoretinal procedures, existing technologies or mechanisms still possess insufficient accuracy, precision, and degrees of freedom for instrument manipulation or potentially automated tool exchange during surgical procedures. A new robotic system that focuses on improving tooltip accuracy, tracking performance, and smooth instrument exchange mechanism is therefore designed and manufactured. Its tooltip accuracy, precision, and mechanical capability of maintaining small incision through remote center of motion were externally evaluated using an optical coherence tomography (OCT) system. Through robot calibration and precise coordinate registration, the accuracy of tooltip positioning was measured to be 0.053 ± 0.031 mm, and the overall performance was demonstrated on an OCT-guided automated cataract lens extraction procedure with deep learning-based pre-operative anatomical modeling and real-time supervision.
Embodiment Transfer Learning for Vision-Language-Action Models
Vision-language-action (VLA) models have significantly advanced robotic learning, enabling training on large-scale, cross-embodiment data and fine-tuning for specific robots. However, state-of-the-art autoregressive VLAs struggle with multi-robot collaboration. We introduce embodiment transfer learning, denoted as ET-VLA, a novel framework for efficient and effective transfer of pre-trained VLAs to multi-robot settings. ET-VLA's core is Synthetic Continued Pretraining (SCP), which uses synthetically generated data to warm up the model for the new embodiment, bypassing the need for real human demonstrations and reducing data collection costs. SCP enables the model to learn correct actions and precise action token numbers. Following SCP, the model is fine-tuned on target embodiment data. To further enhance the model performance on multi-embodiment tasks, we present the Embodied Graph-of-Thought technique, a novel approach that formulates each sub-task as a node and allows the VLA model to distinguish the functionalities and roles of each embodiment during task execution. Our work considers bimanual robots, a simple version of a multi-robot system, to verify our approaches. We validate the effectiveness of our method on both simulation benchmarks and real robots covering three different bimanual embodiments. In particular, our proposed ET-VLA can outperform OpenVLA on six real-world tasks by over 53.2%. We will open-source all code to support the community in advancing VLA models for robot learning.
Saliency-Guided Domain Adaptation for Left-Hand Driving in Autonomous Steering
Domain adaptation is required for automated driving models to generalize well across diverse road conditions. This paper explores a training method for domain adaptation to adapt PilotNet, an end-to-end deep learning-based model, for left-hand driving conditions using real-world Australian highway data. Four training methods were evaluated: (1) a baseline model trained on U.S. right-hand driving data, (2) a model trained on flipped U.S. data, (3) a model pretrained on U.S. data and then fine-tuned on Australian highways, and (4) a model pretrained on flipped U.S. data and then fine-tuned on Australian highways. This setup examines whether incorporating flipped data enhances the model adaptation by providing an initial left-hand driving alignment. The paper compares model performance regarding steering prediction accuracy and attention, using saliency-based analysis to measure attention shifts across significant road regions. Results show that pretraining on flipped data alone worsens prediction stability due to misaligned feature representations, but significantly improves adaptation when followed by fine-tuning, leading to lower prediction error and stronger focus on left-side cues. To validate this approach across different architectures, the same experiments were done on ResNet, which confirmed similar adaptation trends. These findings emphasize the importance of preprocessing techniques, such as flipped-data pretraining, followed by fine-tuning to improve model adaptation with minimal retraining requirements.
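The flipped-data idea in the abstract above can be sketched as a horizontal mirror of each camera frame paired with a sign change of the steering label. This is an illustrative sketch only: the function name `flip_sample`, the H x W x C image layout, and the assumption that steering is a signed value symmetric about zero are not taken from the paper.

```python
import numpy as np

def flip_sample(image, steering):
    """Mirror an H x W x C image left-to-right and negate the steering label.

    Assumes steering is signed and symmetric about zero, so a mirrored
    scene calls for the opposite steering command.
    """
    return image[:, ::-1, :].copy(), -steering

# Tiny synthetic frame (2 x 3 pixels, 1 channel) standing in for a camera image.
frame = np.arange(6).reshape(2, 3, 1)
flipped_frame, flipped_steering = flip_sample(frame, 0.25)
```

Applying the function twice returns the original sample, which is a quick sanity check that no information is lost by the augmentation.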
Tackling the Kidnapped Robot Problem via Sparse Feasible Hypothesis Sampling and Reliable Batched Multi-Stage Inference
This paper addresses the Kidnapped Robot Problem (KRP), a core localization challenge of relocalizing a robot in a known map without a prior pose estimate, either after localization loss or at SLAM initialization. For this purpose, a passive 2-D global relocalization framework is proposed. It estimates the global pose efficiently and reliably from a single LiDAR scan and an occupancy grid map while the robot remains stationary, thereby enhancing the long-term autonomy of mobile robots. The proposed framework casts global relocalization as a non-convex problem and solves it via the multi-hypothesis scheme with batched multi-stage inference and early termination, balancing completeness and efficiency. The Rapidly-exploring Random Tree (RRT), under traversability constraints, asymptotically covers the reachable space to generate sparse, uniformly distributed feasible positional hypotheses, fundamentally reducing the sampling space. The hypotheses are preliminarily ordered by the proposed Scan Mean Absolute Difference (SMAD), a coarse beam-error level metric that facilitates the early termination by prioritizing high-likelihood candidates. The SMAD computation is optimized for non-panoramic scans. In addition, the Translation-Affinity Scan-to-Map Alignment Metric (TAM) is proposed for reliable orientation selection at hypothesized positions and accurate final pose evaluation, mitigating the degradation of conventional likelihood-field metrics under translational uncertainty induced by sparse hypotheses, non-panoramic LiDAR scans, and environmental changes. Real-world experiments on a resource-constrained mobile robot with a non-panoramic LiDAR scan demonstrate that the proposed framework outperforms existing methods in both global relocalization success rate and computational efficiency.
comment: 10 pages, 8 figures. This work has been submitted to the IEEE for possible publication
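One plausible reading of a beam-level mean absolute difference, as named in the abstract above, is the average of |observed range - expected range| over valid beams, used to order pose hypotheses before finer evaluation. The sketch below is an assumption-laden illustration, not the paper's SMAD definition: the function names, the `max_range` cutoff, and the precomputed expected ranges (which would normally come from ray casting in the occupancy grid) are all hypothetical.

```python
import numpy as np

def scan_mean_abs_diff(observed, expected, max_range=10.0):
    """Mean absolute range difference over beams with a valid observed return."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    valid = np.isfinite(observed) & (observed < max_range)  # skip dropouts and max-range hits
    if not np.any(valid):
        return float("inf")
    return float(np.mean(np.abs(observed[valid] - expected[valid])))

def rank_hypotheses(observed, expected_per_hypothesis):
    """Return hypothesis indices ordered from lowest to highest score, plus the scores."""
    scores = [scan_mean_abs_diff(observed, e) for e in expected_per_hypothesis]
    order = [int(i) for i in np.argsort(scores)]
    return order, scores

observed_scan = [1.0, 2.0, float("nan"), 3.0]
candidates = [
    [2.0, 3.0, 5.0, 4.0],  # every valid beam is off by 1 m
    [1.0, 2.0, 5.0, 3.0],  # matches the valid beams exactly
]
order, scores = rank_hypotheses(observed_scan, candidates)
```

Sorting by a cheap score like this lets an early-termination loop examine the most promising candidates first.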
Closed-loop Control of Steerable Balloon Endoscopes for Robot-assisted Transcatheter Intracardiac Procedures
To move away from open-heart surgery towards safer transcatheter procedures, there is a growing need for improved imaging techniques and robotic solutions to enable simple, accurate tool navigation. Common imaging modalities, such as fluoroscopy and ultrasound, have limitations that can be overcome using cardioscopy, i.e., direct optical visualization inside the beating heart. We present a cardioscope designed as a steerable balloon. As a balloon, it can be collapsed to pass through the vasculature and subsequently inflated inside the heart for visualization and tool delivery through an integrated working channel. Through careful design of balloon wall thickness, a single input, balloon inflation pressure, is used to independently control two outputs, balloon diameter (corresponding to field of view diameter) and balloon bending angle (enabling precise working channel positioning). This balloon technology can be tuned to produce cardioscopes designed for a range of intracardiac tasks. To illustrate this approach, a balloon design is presented for the specific task of aortic leaflet laceration. Image-based closed-loop control of bending angle is also demonstrated as a means of enabling stable orientation control during tool insertion and removal.
comment: 8 pages, 11 figures
LiDAR-VGGT: Cross-Modal Coarse-to-Fine Fusion for Globally Consistent and Metric-Scale Dense Mapping
Reconstructing large-scale colored point clouds is an important task in robotics, supporting perception, navigation, and scene understanding. Despite advances in LiDAR inertial visual odometry (LIVO), its performance remains highly sensitive to extrinsic calibration. Meanwhile, 3D vision foundation models, such as VGGT, suffer from limited scalability in large environments and inherently lack metric scale. To overcome these limitations, we propose LiDAR-VGGT, a novel framework that tightly couples LiDAR inertial odometry with the state-of-the-art VGGT model through a two-stage coarse-to-fine fusion pipeline: First, a pre-fusion module with robust initialization refinement efficiently estimates VGGT poses and point clouds with coarse metric scale within each session. Then, a post-fusion module enhances cross-modal 3D similarity transformation, using bounding-box-based regularization to reduce scale distortions caused by inconsistent FOVs between LiDAR and camera sensors. Extensive experiments across multiple datasets demonstrate that LiDAR-VGGT achieves dense, globally consistent colored point clouds and outperforms both VGGT-based methods and LIVO baselines. The implementation of our proposed novel color point cloud evaluation toolkit will be released as open source.
Scaling Cross-Embodiment World Models for Dexterous Manipulation
Cross-embodiment learning seeks to build generalist robots that operate across diverse morphologies, but differences in action spaces and kinematics hinder data sharing and policy transfer. This raises a central question: Is there any invariance that allows actions to transfer across embodiments? We conjecture that environment dynamics are embodiment-invariant, and that world models capturing these dynamics can provide a unified interface across embodiments. To learn such a unified world model, the crucial step is to design state and action representations that abstract away embodiment-specific details while preserving control relevance. To this end, we represent different embodiments (e.g., human hands and robot hands) as sets of 3D particles and define actions as particle displacements, creating a shared representation for heterogeneous data and control problems. A graph-based world model is then trained on exploration data from diverse simulated robot hands and real human hands, and integrated with model-based planning for deployment on novel hardware. Experiments on rigid and deformable manipulation tasks reveal three findings: (i) scaling to more training embodiments improves generalization to unseen ones, (ii) co-training on both simulated and real data outperforms training on either alone, and (iii) the learned models enable effective control on robots with varied degrees of freedom. These results establish world models as a promising interface for cross-embodiment dexterous manipulation.
An Enhanced Proprioceptive Method for Soft Robots Integrating Bend Sensors and IMUs
This study presents an enhanced proprioceptive method for accurate shape estimation of soft robots using only off-the-shelf sensors, ensuring cost-effectiveness and easy applicability. By integrating inertial measurement units (IMUs) with complementary bend sensors, IMU drift is mitigated, enabling reliable long-term proprioception. A Kalman filter fuses segment tip orientations from both sensors in a mutually compensatory manner, improving shape estimation over single-sensor methods. A piecewise constant curvature model estimates the tip location from the fused orientation data and reconstructs the robot's deformation. Experiments under no loading, external forces, and passive obstacle interactions during 45 minutes of continuous operation showed a root mean square error of 16.96 mm (2.91% of total length), a 56% reduction compared to IMU-only benchmarks. These results demonstrate that our approach not only enables long-duration proprioception in soft robots but also maintains high accuracy and robustness across these diverse conditions.
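The piecewise constant curvature step mentioned in the abstract above maps a segment's bend angle to its tip location. For a single planar segment of arc length L bent through angle theta, the standard constant-curvature geometry gives a tip at (r(1 - cos theta), r sin theta) with r = L / theta. The sketch below shows only this one-segment planar case; the function name and the frame convention (base at the origin, initial tangent along +y) are assumptions, and the sensor fusion itself is not modeled.

```python
import math

def pcc_tip_position(length, bend_angle):
    """Planar tip position (x, y) of a constant-curvature arc.

    The arc starts at the origin with its tangent along +y and bends
    toward +x for positive bend_angle (radians).
    """
    if abs(bend_angle) < 1e-9:
        return 0.0, length                     # straight segment limit
    radius = length / bend_angle
    x = radius * (1.0 - math.cos(bend_angle))
    y = radius * math.sin(bend_angle)
    return x, y

# Quarter-circle arc of unit radius: arc length pi/2, bend angle pi/2.
tip_x, tip_y = pcc_tip_position(math.pi / 2, math.pi / 2)
```

For a multi-segment robot, each segment's tip would be computed this way and chained through the accumulated orientation of the preceding segments.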
Robotic Monitoring of Colorimetric Leaf Sensors for Precision Agriculture ICRA
Common remote sensing modalities (RGB, multispectral, hyperspectral imaging or LiDAR) are often used to indirectly measure crop health and do not directly capture plant stress indicators. Commercially available direct leaf sensors are bulky, powered electronics that are expensive and interfere with crop growth. In contrast, low-cost, passive and bio-degradable leaf sensors offer an opportunity to advance real-time monitoring as they directly interface with the crop surface while not interfering with crop growth. To this end, we co-design a sensor-detector system, where the sensor is a passive colorimetric leaf sensor that directly measures crop health in a precision agriculture setting, and the detector autonomously obtains optical signals from these leaf sensors. The detector comprises a low size, weight, and power (SWaP) mobile ground robot with an onboard monocular RGB camera and object detector to localize each leaf sensor, as well as a hyperspectral camera with a motorized mirror and halogen light to acquire hyperspectral images. The sensor's crop health-dependent optical signals can be extracted from the hyperspectral images. The proof-of-concept system is demonstrated in row-crop environments both indoors and outdoors where it is able to autonomously navigate, locate and obtain a hyperspectral image of all leaf sensors present, and acquire interpretable spectral resonance with 80% accuracy within a required retrieval distance from the sensor.
comment: Revised version. Initial version was accepted to the Novel Approaches for Precision Agriculture and Forestry with Autonomous Robots IEEE ICRA Workshop - 2025
Kinematically Controllable Cable Robots with Reconfigurable End-effectors
To enlarge the translational workspace of cable-driven robots, one common approach is to increase the number of cables. However, this introduces two challenges: (1) cable interference significantly reduces the rotational workspace, and (2) the solution of tensions in cables becomes non-unique, resulting in difficulties for kinematic control of the robot. In this work, we design structurally simple reconfigurable end-effectors for cable robots. By incorporating a spring, a helical-grooved shaft, and a matching nut, relative linear motions between end-effector components are converted into relative rotations, thereby expanding the rotational workspace of the mechanism. Meanwhile, a bearing is introduced to provide an additional rotational degree of freedom, making the mechanism non-redundant. As a result, the robot's motion can be controlled purely through kinematics without additional tension sensing and control.
comment: 8 pages, 7 figures, Technical Report
Mixed-Density Diffuser: Efficient Planning with Non-uniform Temporal Resolution
Recent studies demonstrate that diffusion planners benefit from sparse-step planning over single-step planning. Training models to skip steps in their trajectories helps capture long-term dependencies without additional memory or computational cost. However, predicting excessively sparse plans degrades performance. We hypothesize this temporal density threshold is non-uniform across a temporal horizon and that certain parts of a planned trajectory should be more densely planned. We propose Mixed Density Diffuser (MDD), a diffusion planner where the densities throughout the horizon are tunable hyperparameters. MDD achieves a new SOTA across the Maze2D, Franka Kitchen, and Antmaze D4RL task domains.
comment: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN) (under review)
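Non-uniform temporal resolution, as described in the abstract above, amounts to choosing which environment timesteps a plan covers, with different strides in different parts of the horizon. The sketch below builds such an index list from hypothetical `(count, stride)` segments; the segment format, the function name, and the dense-early, sparse-late layout are illustrative assumptions rather than the paper's configuration.

```python
def mixed_density_indices(segments):
    """Build planned timestep indices from (count, stride) segments.

    Each segment appends `count` steps spaced `stride` apart, continuing
    from the last index of the previous segment. Index 0 is the current step.
    """
    indices = [0]
    for count, stride in segments:
        for _ in range(count):
            indices.append(indices[-1] + stride)
    return indices

# Dense near the present, progressively sparser toward the end of the horizon.
plan_steps = mixed_density_indices([(4, 1), (3, 4), (2, 8)])
# plan_steps == [0, 1, 2, 3, 4, 8, 12, 16, 24, 32]
```

Here ten planned states span a 32-step horizon, whereas a uniform stride of 1 would need 33 states to cover the same span.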
MarsLGPR: Mars Rover Localization with Ground Penetrating Radar
In this work, we propose the use of Ground Penetrating Radar (GPR) for rover localization on Mars. Precise pose estimation is an important task for mobile robots exploring planetary surfaces, as they operate in GPS-denied environments. Although visual odometry provides accurate localization, it is computationally expensive and can fail in dim or high-contrast lighting. Wheel encoders can also provide odometry estimation, but are prone to slipping on the sandy terrain encountered on Mars. Although traditionally a scientific surveying sensor, GPR has been used on Earth for terrain classification and localization through subsurface feature matching. The Perseverance rover and the upcoming ExoMars rover have GPR sensors already equipped to aid in the search of water and mineral resources. We propose to leverage GPR to aid in Mars rover localization. Specifically, we develop a novel GPR-based deep learning model that predicts 1D relative pose translation. We fuse our GPR pose prediction method with inertial and wheel encoder data in a filtering framework to output rover localization. We perform experiments in a Mars analog environment and demonstrate that our GPR-based displacement predictions both outperform wheel encoders and improve multi-modal filtering estimates in high-slip environments. Lastly, we present the first dataset aimed at GPR-based localization in Mars analog environments, which will be made publicly available at https://umfieldrobotics.github.io/marslgpr.
Cosmos-Surg-dVRK: World Foundation Model-based Automated Online Evaluation of Surgical Robot Policy Learning
The rise of surgical robots and vision-language-action models has accelerated the development of autonomous surgical policies and efficient assessment strategies. However, evaluating these policies directly on physical robotic platforms such as the da Vinci Research Kit (dVRK) remains hindered by high costs, time demands, reproducibility challenges, and variability in execution. World foundation models (WFM) for physical AI offer a transformative approach to simulate complex real-world surgical tasks, such as soft tissue deformation, with high fidelity. This work introduces Cosmos-Surg-dVRK, a surgical finetune of the Cosmos WFM, which, together with a trained video classifier, enables fully automated online evaluation and benchmarking of surgical policies. We evaluate Cosmos-Surg-dVRK using two distinct surgical datasets. On tabletop suture pad tasks, the automated pipeline achieves strong correlation between online rollouts in Cosmos-Surg-dVRK and policy outcomes on the real dVRK Si platform, as well as good agreement between human labelers and the V-JEPA 2-derived video classifier. Additionally, preliminary experiments with ex-vivo porcine cholecystectomy tasks in Cosmos-Surg-dVRK demonstrate promising alignment with real-world evaluations, highlighting the platform's potential for more complex surgical procedures.
comment: minor metadata and notation fixes; +3 citations
FlexEvent: Towards Flexible Event-Frame Object Detection at Varying Operational Frequencies NeurIPS 2025
Event cameras offer unparalleled advantages for real-time perception in dynamic environments, thanks to the microsecond-level temporal resolution and asynchronous operation. Existing event detectors, however, are limited by fixed-frequency paradigms and fail to fully exploit the high-temporal resolution and adaptability of event data. To address these limitations, we propose FlexEvent, a novel framework that enables detection at varying frequencies. Our approach consists of two key components: FlexFuse, an adaptive event-frame fusion module that integrates high-frequency event data with rich semantic information from RGB frames, and FlexTune, a frequency-adaptive fine-tuning mechanism that generates frequency-adjusted labels to enhance model generalization across varying operational frequencies. This combination allows our method to detect objects with high accuracy in both fast-moving and static scenarios, while adapting to dynamic environments. Extensive experiments on large-scale event camera datasets demonstrate that our approach surpasses state-of-the-art methods, achieving significant improvements in both standard and high-frequency settings. Notably, our method maintains robust performance when scaling from 20 Hz to 90 Hz and delivers accurate detection up to 180 Hz, proving its effectiveness in extreme conditions. Our framework sets a new benchmark for event-based object detection and paves the way for more adaptable, real-time vision systems.
comment: NeurIPS 2025; 28 pages, 14 figures, 10 tables; Code at https://flexevent.github.io/
If They Disagree, Will You Conform? Exploring the Role of Robots' Value Awareness in a Decision-Making Task
This study investigates whether the opinions of robotic agents can influence human decision-making when robots display value awareness (i.e., the capability of understanding human preferences and prioritizing them in decision-making). We designed an experiment in which participants interacted with two Furhat robots - one programmed to be Value-Aware and the other Non-Value-Aware - during a labeling task for images representing human values. Results indicate that participants distinguished the Value-Aware robot from the Non-Value-Aware one. Although their explicit choices did not indicate a clear preference for one robot over the other, participants directed their gaze more toward the Value-Aware robot. Additionally, the Value-Aware robot was perceived as more loyal, suggesting that value awareness in a social robot may enhance its perceived commitment to the group. Finally, when both robots disagreed with the participant, conformity occurred in about one out of four trials, and participants took longer to confirm their responses, suggesting that two robots expressing dissent may introduce hesitation in decision-making. On one hand, this highlights the potential risk that robots, if misused, could manipulate users for unethical purposes. On the other hand, it reinforces the idea that social robots could encourage reflection in ambiguous situations and help users avoid scams.
comment: Pre-print version
DW-A-PRM: A Dynamic Weighted Planner
Robot path planning plays a pivotal role in enabling autonomous systems to navigate safely and efficiently in complex and uncertain environments. Despite extensive research on classical graph-based methods and sampling-based planners, achieving an optimal balance between global optimality, computational efficiency, and adaptability to dynamic environments remains an open challenge. To address this issue, this paper proposes a hybrid path planning framework, which integrates heuristic-driven search with probabilistic roadmap construction under a dynamic weighting scheme. By coupling the global guidance of A* with the stochastic exploration of PRM, the method achieves a synergistic balance between search optimality and computational tractability. Comprehensive experiments in diverse simulated environments demonstrate that the proposed method consistently yields smoother and shorter paths while significantly reducing computational overhead compared with conventional approaches and other hybrid planners. These results highlight the potential of the proposed framework as an effective and generalizable solution for real-time robotic navigation in complex environments.
RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning
Real-world robotic manipulation in homes and factories demands reliability, efficiency, and robustness that approach or surpass skilled human operators. We present RL-100, a real-world reinforcement learning training framework built on diffusion visuomotor policies trained by supervised learning. RL-100 introduces a three-stage pipeline. First, imitation learning leverages human priors. Second, iterative offline reinforcement learning uses an Offline Policy Evaluation procedure, abbreviated OPE, to gate PPO-style updates that are applied in the denoising process for conservative and reliable improvement. Third, online reinforcement learning eliminates residual failure modes. An additional lightweight consistency distillation head compresses the multi-step sampling process in diffusion into a single-step policy, enabling high-frequency control with an order-of-magnitude reduction in latency while preserving task performance. The framework is task-, embodiment-, and representation-agnostic and supports both 3D point clouds and 2D RGB inputs, a variety of robot platforms, and both single-step and action-chunk policies. We evaluate RL-100 on seven real-robot tasks spanning dynamic rigid-body control, such as Push-T and Agile Bowling, fluids and granular pouring, deformable cloth folding, precise dexterous unscrewing, and multi-stage orange juicing. RL-100 attains 100% success across evaluated trials for a total of 900 out of 900 episodes, including up to 250 out of 250 consecutive trials on one task. The method achieves near-human teleoperation or better time efficiency and demonstrates multi-hour robustness with uninterrupted operation lasting up to two hours.
comment: https://lei-kun.github.io/RL-100/
Neuro-Symbolic Imitation Learning: Discovering Symbolic Abstractions for Skill Learning ICRA
Imitation learning is a popular method for teaching robots new behaviors. However, most existing methods focus on teaching short, isolated skills rather than long, multi-step tasks. To bridge this gap, imitation learning algorithms must not only learn individual skills but also an abstract understanding of how to sequence these skills to perform extended tasks effectively. This paper addresses this challenge by proposing a neuro-symbolic imitation learning framework. Using task demonstrations, the system first learns a symbolic representation that abstracts the low-level state-action space. The learned representation decomposes a task into easier subtasks and allows the system to leverage symbolic planning to generate abstract plans. Subsequently, the system utilizes this task decomposition to learn a set of neural skills capable of refining abstract plans into actionable robot commands. Experimental results in three simulated robotic environments demonstrate that, compared to baselines, our neuro-symbolic approach increases data efficiency, improves generalization capabilities, and facilitates interpretability.
comment: IEEE International Conference on Robotics and Automation (ICRA) 2025
Infinite-Horizon Value Function Approximation for Model Predictive Control
Model Predictive Control has emerged as a popular tool for robots to generate complex motions. However, the real-time requirement has limited the use of hard constraints and large preview horizons, which are necessary to ensure safety and stability. In practice, practitioners have to carefully design cost functions that can imitate an infinite horizon formulation, which is tedious and often results in local minima. In this work, we study how to approximate the infinite horizon value function of constrained optimal control problems with neural networks using value iteration and trajectory optimization. Furthermore, we experimentally demonstrate how using this value function approximation as a terminal cost provides global stability to the model predictive controller. The approach is validated on two toy problems and a real-world scenario with online obstacle avoidance on an industrial manipulator where the value function is conditioned to the goal and obstacle.
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions
A generalist robot should perform effectively across various environments. However, most existing approaches heavily rely on scaling action-annotated data to enhance their capabilities. Consequently, they are often limited to a single physical specification and struggle to learn transferable knowledge across different embodiments and environments. To confront these limitations, we propose UniVLA, a new framework for learning cross-embodiment vision-language-action (VLA) policies. Our key innovation is to derive task-centric action representations from videos with a latent action model. This enables us to exploit extensive data across a wide spectrum of embodiments and perspectives. To mitigate the effect of task-irrelevant dynamics, we incorporate language instructions and establish a latent action model within the DINO feature space. Learned from internet-scale videos, the generalist policy can be deployed to various robots through efficient latent action decoding. We obtain state-of-the-art results across multiple manipulation and navigation benchmarks, as well as real-robot deployments. UniVLA achieves superior performance over OpenVLA with less than 1/20 of pretraining compute and 1/10 of downstream data. Continuous performance improvements are observed as heterogeneous data, even including human videos, are incorporated into the training pipeline. The results underscore UniVLA's potential to facilitate scalable and efficient robot policy learning.
comment: Accepted to RSS 2025. Code is available at https://github.com/OpenDriveLab/UniVLA
A Helping (Human) Hand in Kinematic Structure Estimation ICRA25
Visual uncertainties such as occlusions, lack of texture, and noise present significant challenges in obtaining accurate kinematic models for safe robotic manipulation. We introduce a probabilistic real-time approach that leverages the human hand as a prior to mitigate these uncertainties. By tracking the constrained motion of the human hand during manipulation and explicitly modeling uncertainties in visual observations, our method reliably estimates an object's kinematic model online. We validate our approach on a novel dataset featuring challenging objects that are occluded during manipulation and offer limited articulations for perception. The results demonstrate that by incorporating an appropriate prior and explicitly accounting for uncertainties, our method produces accurate estimates, outperforming two recent baselines by 195% and 140%, respectively. Furthermore, we demonstrate that our approach's estimates are precise enough to allow a robot to manipulate even small objects safely.
comment: Accepted at ICRA25; 8 pages + 7 figures; For supplementary material, see https://www.tu.berlin/robotics/papers/helpinghands
VO-DP: Semantic-Geometric Adaptive Diffusion Policy for Vision-Only Robotic Manipulation
In the context of imitation learning, visuomotor-based diffusion policy learning is one of the main directions in robotic manipulation. Most of these approaches rely on point clouds as observation inputs and construct scene representations through point-cloud feature learning, which enables them to achieve remarkable accuracy. However, the existing literature lacks an in-depth exploration of vision-only solutions that have significant potential. In this paper, we propose a Vision-Only and single-view Diffusion Policy learning method (VO-DP) that leverages pretrained visual foundation models to achieve effective fusion of semantic and geometric features. We utilize intermediate features from VGGT incorporating semantic features from DINOv2 and geometric features from Alternating Attention blocks. Features are fused via cross-attention and spatially compressed with a CNN to form the input to the policy head. Extensive experiments demonstrate that VO-DP not only outperforms the vision-only baseline DP significantly but also exhibits distinct performance trends against the point cloud-based method DP3: in simulation tasks, VO-DP achieves an average success rate of 64.6%, on par with DP3 (64.0%) and far higher than DP (34.8%), while in real-world tasks, it reaches 87.9%, outperforming both DP3 (67.5%) and DP (11.2%) by a notable margin. Further robustness evaluations confirm that VO-DP remains highly stable under varying conditions including color, size, background, and lighting. Lastly, we open-source a training library for robotic manipulation. Built on Accelerate, this library supports multi-machine and multi-GPU parallel training, as well as mixed precision training. It is compatible with visuomotor policies such as DP, DP3 and VO-DP, and also supports the RoboTwin simulator.
Bellman Diffusion Models
Diffusion models have seen tremendous success as generative architectures. Recently, they have been shown to be effective at modelling policies for offline reinforcement learning and imitation learning. We explore using diffusion as a model class for the successor state measure (SSM) of a policy. We find that enforcing the Bellman flow constraints leads to a simple Bellman update on the diffusion step distribution.
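For readers unfamiliar with the term, the Bellman flow constraint on a successor state measure can be written out as follows. This is a sketch in one common normalized, action-conditioned convention; the symbols $m^\pi$, $P$, $\gamma$, and $s^+$ are notation assumed here, and the paper's own formulation may differ.

```latex
% Assumed definition: discounted distribution over future states s^+
m^\pi(s^+ \mid s, a) = (1-\gamma) \sum_{t=0}^{\infty} \gamma^{t}\,
  \Pr\left(s_{t+1} = s^+ \mid s_0 = s,\ a_0 = a,\ \pi\right)

% Bellman flow constraint obtained by splitting off the t = 0 term
m^\pi(s^+ \mid s, a) = (1-\gamma)\, P(s^+ \mid s, a)
  + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s, a),\ a' \sim \pi(\cdot \mid s')}
    \left[ m^\pi(s^+ \mid s', a') \right]
```

The second line follows from the first because the terms with $t \ge 1$ are exactly $\gamma$ times the same discounted sum started from the next state-action pair.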
MARFT: Multi-Agent Reinforcement Fine-Tuning
LLM-based Multi-Agent Systems (LaMAS) have demonstrated remarkable capabilities in addressing complex, agentic tasks, from generating high-quality presentation slides to even conducting sophisticated scientific research. Meanwhile, RL has been widely recognized for its effectiveness in enhancing agent intelligence, but limited research has investigated the fine-tuning of LaMAS using foundational RL techniques. Moreover, the direct application of MARL methods to LaMAS introduces significant challenges, stemming from the unique characteristics and mechanisms inherent to LaMAS. To address these challenges, this article presents a comprehensive study of LLM-based MARL and proposes a novel paradigm termed Multi-Agent Reinforcement Fine-Tuning (MARFT). We introduce a brand-new MG called Flex-MG, which aligns with the LaMAS optimization in real-world applications, and a universal algorithmic framework tailored specifically for LaMAS, outlining the conceptual foundations, key distinctions, and practical implementation strategies. We review the evolution from RL to RFT, setting the stage for a parallel analysis in the multi-agent domain. In the context of LaMAS, we elucidate critical differences between MARL and MARFT. These differences motivate a transition toward a LaMAS-oriented formulation of RFT. Central to this work is a robust and scalable MARFT framework. We detail the core algorithm and provide a complete, open-source implementation to facilitate adoption and further research. The latter sections of the paper explore real-world application perspectives and open challenges in MARFT. By bridging theoretical underpinnings with practical methodologies, this work serves as a roadmap for researchers seeking to advance MARFT toward resilient and adaptive solutions in agentic systems. Our implementation of the proposed framework is publicly available at: https://github.com/jwliao-ai/MARFT.
comment: 42 pages
A Time-dependent Risk-aware distributed Multi-Agent Path Finder based on A* IROS 2025
Multi-Agent Path-Finding (MAPF) focuses on the collaborative planning of paths for multiple agents within shared spaces, aiming for collision-free navigation. Conventional planning methods often overlook the presence of other agents, which can result in conflicts. In response, this article introduces the A$^*_+$T algorithm, a distributed approach that improves coordination among agents by anticipating their positions based on their movement speeds. The algorithm also considers dynamic obstacles, assessing potential collisions with respect to observed speeds and trajectories, thereby facilitating collision-free path planning in environments populated by other agents and moving objects. It incorporates a risk layer surrounding both dynamic and static entities, enhancing its utility in real-world applications. Each agent functions autonomously while being mindful of the paths chosen by others, effectively addressing the complexities inherent in multi-agent situations. The performance of A$^*_+$T has been rigorously tested in the Gazebo simulation environment and benchmarked against established approaches such as CBS, ECBS, and SIPP. Furthermore, the algorithm has shown competence in single-agent experiments, with results demonstrating its effectiveness in managing dynamic obstacles and affirming its practical relevance across various scenarios.
comment: 8 pages, 10 figures, 2 tables, submitted to IROS 2025
DTAA: A Detect, Track and Avoid Architecture for navigation in spaces with Multiple Velocity Objects
Proactive collision avoidance measures are imperative in environments where humans and robots coexist. Moreover, the introduction of high-quality legged robots into workplaces has highlighted the crucial role of a robust, fully autonomous safety solution for robots to be viable in shared spaces or in coexistence with humans. This article establishes, for the first time, a Detect-Track-and-Avoid Architecture (DTAA) to enhance safety and overall mission performance. The proposed architecture integrates object detection using YOLOv8, the object tracking embedded in Ultralytics, and state estimation of tracked objects through Kalman filters. Moreover, a novel heuristic clustering is employed to facilitate active avoidance of multiple closely positioned objects with similar velocities, creating sets of unsafe spaces for the Nonlinear Model Predictive Controller (NMPC) to navigate around. The NMPC identifies the most hazardous unsafe space, considering not only their current positions but also their predicted future locations. Subsequently, the NMPC calculates maneuvers to guide the robot along a path planned by D$^{*}_{+}$ towards its intended destination, while maintaining a safe distance to all identified obstacles. The efficacy of the proposed DTAA framework is validated in real-life experiments featuring a Boston Dynamics Spot robot, which demonstrate the robot's capability to consistently maintain a safe distance from humans in dynamic subterranean, urban indoor, and outdoor environments.
Spatiotemporal Calibration for Laser Vision Sensor in Hand-eye System Based on Straight-line Constraint
Laser vision sensors (LVS) are critical perception modules for industrial robots, facilitating real-time acquisition of workpiece geometric data in welding applications. However, the camera communication delay leads to a temporal desynchronization between captured images and the robot motions. Additionally, hand-eye extrinsic parameters may vary during prolonged measurement. To address these issues, we introduce a measurement model of LVS considering the effect of the camera's time-offset and propose a teaching-free spatiotemporal calibration method utilizing line constraints. This method involves a robot equipped with an LVS repeatedly scanning straight-line fillet welds using S-shaped trajectories. Regardless of the robot's orientation changes, all measured welding positions are constrained to a straight line, represented by Plücker coordinates. Moreover, a nonlinear optimization model based on straight-line constraints is established. Subsequently, the Levenberg-Marquardt algorithm (LMA) is employed to optimize parameters, including time-offset, hand-eye extrinsic parameters, and straight-line parameters. The feasibility and accuracy of the proposed approach are quantitatively validated through experiments on curved weld scanning. We open-sourced the code, dataset, and simulation report at https://anonymous.4open.science/r/LVS_ST_CALIB-015F/README.md.
comment: Submitted to IEEE RAL
Dual-Regularized Riccati Recursions for Interior-Point Optimal Control
We derive closed-form extensions of Riccati's recursions (both sequential and parallel) for solving dual-regularized LQR problems. We show how these methods can be used to solve general constrained, non-convex, discrete-time optimal control problems via a regularized interior point method, while guaranteeing that each primal step is a descent direction of an Augmented Barrier-Lagrangian merit function. We provide MIT-licensed implementations of our methods in C++ and JAX.
From Grounding to Manipulation: Case Studies of Foundation Model Integration in Embodied Robotic Systems EMNLP 2025
Foundation models (FMs) are increasingly used to bridge language and action in embodied agents, yet the operational characteristics of different FM integration strategies remain under-explored -- particularly for complex instruction following and versatile action generation in changing environments. This paper examines three paradigms for building robotic systems: end-to-end vision-language-action (VLA) models that implicitly integrate perception and planning, and modular pipelines incorporating either vision-language models (VLMs) or multimodal large language models (LLMs). We evaluate these paradigms through two focused case studies: a complex instruction grounding task assessing fine-grained instruction understanding and cross-modal disambiguation, and an object manipulation task targeting skill transfer via VLA finetuning. Our experiments in zero-shot and few-shot settings reveal trade-offs in generalization and data efficiency. By exploring performance limits, we distill design implications for developing language-driven physical agents and outline emerging challenges and opportunities for FM-powered robotics in real-world conditions.
comment: EMNLP 2025 camera ready
Simultaneous System Identification and Model Predictive Control with No Dynamic Regret
We provide an algorithm for the simultaneous system identification and model predictive control of nonlinear systems. The algorithm has finite-time near-optimality guarantees and asymptotically converges to the optimal (non-causal) controller. Particularly, the algorithm enjoys sublinear dynamic regret, defined herein as the suboptimality against an optimal clairvoyant controller that knows how the unknown disturbances and system dynamics will adapt to its actions. The algorithm is self-supervised and applies to control-affine systems with unknown dynamics and disturbances that can be expressed in reproducing kernel Hilbert spaces. Such spaces can model external disturbances and modeling errors that can even be adaptive to the system's state and control input. For example, they can model wind and wave disturbances to aerial and marine vehicles, or inaccurate model parameters such as inertia of mechanical systems. The algorithm first generates random Fourier features that are used to approximate the unknown dynamics or disturbances. Then, it employs model predictive control based on the current learned model of the unknown dynamics (or disturbances). The model of the unknown dynamics is updated online using least squares based on the data collected while controlling the system. We validate our algorithm in both hardware experiments and physics-based simulations. The simulations include (i) a cart-pole aiming to maintain the pole upright despite inaccurate model parameters, and (ii) a quadrotor aiming to track reference trajectories despite unmodeled aerodynamic drag effects. The hardware experiments include a quadrotor aiming to track a circular trajectory despite unmodeled aerodynamic drag effects, ground effects, and wind disturbances.
comment: IEEE Transactions on Robotics (T-RO). v6 update on stability analysis in Appendix J under relaxed Assumption 1
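To make the random Fourier feature step mentioned above concrete, here is a minimal standard-library sketch showing that the inner product of such features approximates a Gaussian kernel. The function names, the Gaussian kernel choice, the lengthscale, and the feature count are assumptions for illustration and are not taken from the paper; the least-squares fit of model weights on these features is not shown.

```python
import math
import random


def make_rff(dim_in, num_features, lengthscale, seed=0):
    """Build a random Fourier feature map for a Gaussian kernel (illustrative)."""
    rng = random.Random(seed)
    # Frequencies drawn from the kernel's spectral density N(0, 1/lengthscale^2).
    omegas = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim_in)]
              for _ in range(num_features)]
    # Random phases drawn uniformly from [0, 2*pi].
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(omegas, phases)]

    return features


def gaussian_kernel(x, y, lengthscale):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * lengthscale ** 2))


features = make_rff(dim_in=2, num_features=2000, lengthscale=1.0, seed=0)
x, y = [0.3, -0.2], [0.5, 0.4]
approx = sum(p * q for p, q in zip(features(x), features(y)))
exact = gaussian_kernel(x, y, 1.0)
print(approx, exact)
```

Because the features are finite-dimensional, a function in the corresponding kernel space can be approximated by a linear model in `features(x)`, which is what makes an online least-squares update tractable.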
iKap: Kinematics-aware Planning with Imperative Learning
Trajectory planning in robotics aims to generate collision-free pose sequences that can be reliably executed. Recently, vision-to-planning systems have gained increasing attention for their efficiency and ability to interpret and adapt to surrounding environments. However, traditional modular systems suffer from increased latency and error propagation, while purely data-driven approaches often overlook the robot's kinematic constraints. This oversight leads to discrepancies between planned trajectories and those that are executable. To address these challenges, we propose iKap, a novel vision-to-planning system that integrates the robot's kinematic model directly into the learning pipeline. iKap employs a self-supervised learning approach and incorporates the state transition model within a differentiable bi-level optimization framework. This integration ensures the network learns collision-free waypoints while satisfying kinematic constraints, enabling gradient back-propagation for end-to-end training. Our experimental results demonstrate that iKap achieves higher success rates and reduced latency compared to the state-of-the-art methods. Besides the complete system, iKap offers a visual-to-planning network that seamlessly works with various controllers, providing a robust solution for robots navigating complex environments.
comment: 6 pages, 6 figures
Dexterous Contact-Rich Manipulation via the Contact Trust Region
What is a good local description of contact dynamics for contact-rich manipulation, and where can we trust this local description? While many approaches often rely on the Taylor approximation of dynamics with an ellipsoidal trust region, we argue that such approaches are fundamentally inconsistent with the unilateral nature of contact. As a remedy, we present the Contact Trust Region (CTR), which captures the unilateral nature of contact while remaining efficient for computation. With CTR, we first develop a Model-Predictive Control (MPC) algorithm capable of synthesizing local contact-rich plans. Then, we extend this capability to plan globally by stitching together local MPC plans, enabling efficient and dexterous contact-rich manipulation. To verify the performance of our method, we perform comprehensive evaluations, both in high-fidelity simulation and on hardware, on two contact-rich systems: a planar IiwaBimanual system and a 3D AllegroHand system. On both systems, our method offers a significantly lower-compute alternative to existing RL-based approaches to contact-rich manipulation. In particular, our Allegro in-hand manipulation policy, in the form of a roadmap, takes fewer than 10 minutes to build offline on a standard laptop using just its CPU, with online inference taking just a few seconds. Experiment data, video and code are available at ctr.theaiinstitute.com.
Adaptive Multirobot Virtual Structure Control using Dual Quaternions
This paper presents a control strategy based on dual quaternions for the coordinated formation flying of small UAV groups. A virtual structure is employed to define the desired formation, enabling unified control of its position, orientation, and shape. This abstraction makes formation management easier by allowing a low-level controller to compute individual UAV commands efficiently. The proposed controller integrates a pose control module with a geometry-based adaptive strategy, ensuring precise and robust task execution. The effectiveness of the approach is demonstrated through both simulation and experimental results.
MarsLGPR: Mars Rover Localization with Ground Penetrating Radar
In this work, we propose the use of Ground Penetrating Radar (GPR) for rover localization on Mars. Precise pose estimation is an important task for mobile robots exploring planetary surfaces, as they operate in GPS-denied environments. Although visual odometry provides accurate localization, it is computationally expensive and can fail in dim or high-contrast lighting. Wheel encoders can also provide odometry estimation, but are prone to slipping on the sandy terrain encountered on Mars. Although traditionally a scientific surveying sensor, GPR has been used on Earth for terrain classification and localization through subsurface feature matching. The Perseverance rover and the upcoming ExoMars rover are already equipped with GPR sensors to aid in the search for water and mineral resources. We propose to leverage GPR to aid in Mars rover localization. Specifically, we develop a novel GPR-based deep learning model that predicts 1D relative pose translation. We fuse our GPR pose prediction method with inertial and wheel encoder data in a filtering framework to output rover localization. We perform experiments in a Mars analog environment and demonstrate that our GPR-based displacement predictions both outperform wheel encoders and improve multi-modal filtering estimates in high-slip environments. Lastly, we present the first dataset aimed at GPR-based localization in Mars analog environments, which will be made publicly available at https://umfieldrobotics.github.io/marslgpr.
comment: IEEE Transactions on Field Robotics (2025)
Integrated Shape-Force Estimation for Continuum Robots: A Virtual-Work and Polynomial-Curvature Framework
Cable-driven continuum robots (CDCRs) are widely used in surgical and inspection tasks that require dexterous manipulation in confined spaces. Existing model-based estimation methods either assume constant curvature or rely on geometry-space interpolants, both of which struggle with accuracy under large deformations and sparse sensing. This letter introduces an integrated shape-force estimation framework that combines cable-tension measurements with tip-pose data to reconstruct backbone shape and estimate external tip force simultaneously. The framework employs polynomial curvature kinematics (PCK) and a virtual-work-based static formulation expressed directly in curvature space, where polynomial modal coefficients serve as generalized coordinates. The proposed method is validated through Cosserat-rod-based simulations and hardware experiments on a torque-cell-enabled CDCR prototype. Results show that the second-order PCK model achieves superior shape and force accuracy, combining a lightweight shape optimization with a closed-form, iteration-free force estimation, offering a compact and robust alternative to prior constant-curvature and geometry-space approaches.
Interactive Identification of Granular Materials using Force Measurements
Although the ability to identify granular materials has potential for applications such as robotic cooking or earthmoving, granular material identification remains challenging, with existing methods mostly relying on shaking the materials in closed containers. This work presents an interactive material identification framework that enables robots to identify a wide range of granular materials using only force-torque measurements. Unlike prior works, the proposed approach uses direct interaction with the materials. The approach is evaluated through experiments with a real-world dataset comprising 11 granular materials, which we also make publicly available. Results show that our method can identify a wide range of granular materials with near-perfect accuracy while relying solely on force measurements obtained from direct interaction. Further, our comprehensive data analysis and experiments show that a high-performance feature space must combine features related to the force signal's time-domain dynamics and frequency spectrum. We account for this by proposing a combination of the raw signal and its high-frequency magnitude histogram as the suggested feature space representation. We show that the proposed feature space outperforms baselines by a significant margin. The code and dataset are available at: https://irobotics.aalto.fi/identify_granular/.
comment: Accepted to 2025 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
Closing the Intent-to-Behavior Gap via Fulfillment Priority Logic
Practitioners designing reinforcement learning policies face a fundamental challenge: translating intended behavioral objectives into representative reward functions. This challenge stems from behavioral intent requiring simultaneous achievement of multiple competing objectives, typically addressed through labor-intensive linear reward composition that yields brittle results. Consider the ubiquitous robotics scenario where performance maximization directly conflicts with energy conservation. Such competitive dynamics are resistant to simple linear reward combinations. In this paper, we present the concept of objective fulfillment upon which we build Fulfillment Priority Logic (FPL). FPL allows practitioners to define logical formulas representing their intentions and priorities within multi-objective reinforcement learning. Our novel Balanced Policy Gradient algorithm leverages FPL specifications to achieve up to 500% better sample efficiency compared to Soft Actor Critic. Notably, this work constitutes the first implementation of non-linear utility scalarization design, specifically for continuous control problems.
The Difference between the Left and Right Invariant Extended Kalman Filter
The extended Kalman filter (EKF) has been the industry standard for state estimation problems over the past sixty years. The Invariant Extended Kalman Filter (IEKF) is a recent development of the EKF for the class of group-affine systems on Lie groups that has shown superior performance for inertial navigation problems. The IEKF comes in two versions, left- and right-handed respectively, and there is a perception in the robotics community that these filters are different and one should choose the handedness of the IEKF to match the handedness of the measurement model for a given filtering problem. In this paper, we revisit these algorithms and demonstrate that the left- and right-IEKF algorithms (with reset step) are identical, that is, the choice of the handedness does not affect the IEKF's performance when the reset step is properly implemented. The reset step was not originally proposed as part of the IEKF; however, we provide simulations to show that the reset step improves asymptotic performance of all versions of the filter, and should be included in all high-performance algorithms. The GNSS-aided inertial navigation system (INS) is used as a motivating example to demonstrate the equivalence of the two filters.
comment: 20 pages, 4 figures, submitted to Control Engineering Practice
Learning Terrain-Specialized Policies for Adaptive Locomotion in Challenging Environments
Legged robots must exhibit robust and agile locomotion across diverse, unstructured terrains, a challenge exacerbated under blind locomotion settings where terrain information is unavailable. This work introduces a hierarchical reinforcement learning framework that leverages terrain-specialized policies and curriculum learning to enhance agility and tracking performance in complex environments. We validated our method in simulation, where our approach outperforms a generalist policy by up to 16% in success rate and achieves lower tracking errors as the velocity target increases, particularly on low-friction and discontinuous terrains, demonstrating superior adaptability and robustness across mixed-terrain scenarios.
comment: Accepted to the 22nd International Conference on Advanced Robotics (ICAR 2025). 7 pages
End-to-End Crop Row Navigation via LiDAR-Based Deep Reinforcement Learning
Reliable navigation in under-canopy agricultural environments remains a challenge due to GNSS unreliability, cluttered rows, and variable lighting. To address these limitations, we present an end-to-end learning-based navigation system that maps raw 3D LiDAR data directly to control commands using a deep reinforcement learning policy trained entirely in simulation. Our method includes a voxel-based downsampling strategy that reduces LiDAR input size by 95.83%, enabling efficient policy learning without relying on labeled datasets or manually designed control interfaces. The policy was validated in simulation, achieving a 100% success rate in straight-row plantations and showing a gradual decline in performance as row curvature increased, tested across varying sinusoidal frequencies and amplitudes.
comment: Accepted to the 22nd International Conference on Advanced Robotics (ICAR 2025). 7 pages
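The crop-row navigation abstract above does not specify its voxel-based downsampling strategy, so the following is only a minimal sketch of one common variant: points are bucketed into cubic voxels and each occupied voxel is replaced by the centroid of its points. The function names `voxel_downsample` and `reduction_ratio`, and the choice of centroid aggregation, are illustrative assumptions rather than details from the paper.

```python
import math
from collections import defaultdict


def voxel_downsample(points, voxel_size):
    """Replace all points that fall into the same cubic voxel by their centroid.

    points: iterable of (x, y, z) tuples; voxel_size: edge length of a voxel.
    This is a generic voxel-grid filter, not the paper's specific method.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")

    # Group points by the integer index of the voxel that contains them.
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        buckets[key].append((x, y, z))

    # Emit one centroid per occupied voxel.
    centroids = []
    for pts in buckets.values():
        n = len(pts)
        centroids.append(tuple(sum(axis) / n for axis in zip(*pts)))
    return centroids


def reduction_ratio(n_in, n_out):
    """Fraction of the input removed by downsampling (0.0 means no reduction)."""
    return 1.0 - n_out / n_in
```

As a point of arithmetic, a 95.83% reduction corresponds to keeping about 1 value in 24 (1 - 1/24 ≈ 0.9583); how the paper arrives at that figure is not stated in the abstract.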
Multi-Objective Planning with Contextual Lexicographic Reward Preferences
Autonomous agents are often required to plan under multiple objectives whose preference ordering varies based on context. The agent may encounter multiple contexts during its course of operation, each imposing a distinct lexicographic ordering over the objectives, with potentially different reward functions associated with each context. Existing approaches to multi-objective planning typically consider a single preference ordering over the objectives, across the state space, and do not support planning under multiple objective orderings within an environment. We present Contextual Lexicographic Markov Decision Process (CLMDP), a framework that enables planning under varying lexicographic objective orderings, depending on the context. In a CLMDP, both the objective ordering at a state and the associated reward functions are determined by the context. We employ a Bayesian approach to infer a state-context mapping from expert trajectories. Our algorithm to solve a CLMDP first computes a policy for each objective ordering and then combines them into a single context-aware policy that is valid and cycle-free. The effectiveness of the proposed approach is evaluated in simulation and using a mobile robot.
comment: 9 pages, 5 figures, 2 tables, To appear in Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2025
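To illustrate what a context-dependent lexicographic ordering means in practice, here is a minimal sketch of lexicographic action selection: objectives are considered in priority order, and each one filters the candidate actions down to those that are (near-)best for it. The function `lexicographic_best`, the `slack` parameter, and the toy contexts and values are hypothetical; this is not the CLMDP algorithm from the paper, which additionally infers the state-context mapping and merges per-ordering policies.

```python
def lexicographic_best(action_values, ordering, slack=0.0):
    """Pick an action by filtering candidates objective by objective.

    action_values: dict mapping action -> dict mapping objective -> value.
    ordering: list of objective names, highest priority first.
    slack: tolerance within which an action still counts as best for an objective.
    """
    candidates = list(action_values)
    for objective in ordering:
        best = max(action_values[a][objective] for a in candidates)
        # Keep only actions that are within `slack` of the best value.
        candidates = [
            a for a in candidates if action_values[a][objective] >= best - slack
        ]
    # Ties that survive every objective are broken by insertion order.
    return candidates[0]


# Hypothetical example: the same action values, two contexts with different orderings.
ACTION_VALUES = {
    "fast_route": {"speed": 1.0, "safety": 0.2},
    "safe_route": {"speed": 0.4, "safety": 0.9},
}
CONTEXT_ORDERINGS = {
    "highway": ["speed", "safety"],
    "school_zone": ["safety", "speed"],
}
```

Calling `lexicographic_best(ACTION_VALUES, CONTEXT_ORDERINGS["school_zone"])` prioritizes safety, while the `"highway"` ordering prioritizes speed, so the chosen action changes with the context even though the underlying values do not.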
Multiagent Systems
JaxMARL-HFT: GPU-Accelerated Large-Scale Multi-Agent Reinforcement Learning for High-Frequency Trading
Agent-based modelling (ABM) approaches for high-frequency financial markets are difficult to calibrate and validate, partly due to the large parameter space created by defining fixed agent policies. Multi-agent reinforcement learning (MARL) enables more realistic agent behaviour and reduces the number of free parameters, but the heavy computational cost has so far limited research efforts. To address this, we introduce JaxMARL-HFT (JAX-based Multi-Agent Reinforcement Learning for High-Frequency Trading), the first GPU-accelerated open-source multi-agent reinforcement learning environment for high-frequency trading (HFT) on market-by-order (MBO) data. Extending the JaxMARL framework and building on the JAX-LOB implementation, JaxMARL-HFT is designed to handle a heterogeneous set of agents, enabling diverse observation/action spaces and reward functions. It is designed flexibly, so it can also be used for single-agent RL, or extended to act as an ABM with fixed-policy agents. Leveraging JAX enables up to a 240x reduction in end-to-end training time, compared with state-of-the-art reference implementations on the same hardware. This significant speed-up makes it feasible to exploit the large, granular datasets available in high-frequency trading, and to perform the extensive hyperparameter sweeps required for robust and efficient MARL research in trading. We demonstrate the use of JaxMARL-HFT with independent Proximal Policy Optimization (IPPO) for a two-player environment, with an order execution and a market making agent, using one year of LOB data (400 million orders), and show that these agents learn to outperform standard benchmarks. The code for the JaxMARL-HFT framework is available on GitHub.
comment: Code available at: https://github.com/vmohl/JaxMARL-HFT
ABIDES-MARL: A Multi-Agent Reinforcement Learning Environment for Endogenous Price Formation and Execution in a Limit Order Book
We present ABIDES-MARL, a framework that combines a new multi-agent reinforcement learning (MARL) methodology with a new realistic limit-order-book (LOB) simulation system to study equilibrium behavior in complex financial market games. The system extends ABIDES-Gym by decoupling state collection from kernel interruption, enabling synchronized learning and decision-making for multiple adaptive agents while maintaining compatibility with standard RL libraries. It preserves key market features such as price-time priority and discrete tick sizes. Methodologically, we use MARL to approximate equilibrium-like behavior in multi-period trading games with a finite number of heterogeneous agents (an informed trader, a liquidity trader, noise traders, and competing market makers), all with individual price impacts. This setting bridges optimal execution and market microstructure by embedding the liquidity trader's optimization problem within a strategic trading environment. We validate the approach by solving an extended Kyle model within the simulation system, recovering the gradual price discovery phenomenon. We then extend the analysis to a liquidity trader's problem where market liquidity arises endogenously and show that, at equilibrium, execution strategies shape market-maker behavior and price dynamics. ABIDES-MARL provides a reproducible foundation for analyzing equilibrium and strategic adaptation in realistic markets and contributes toward building economically interpretable agentic AI systems for finance.
Learning what to say and how precisely: Efficient Communication via Differentiable Discrete Communication Learning
Effective communication in multi-agent reinforcement learning (MARL) is critical for success but constrained by bandwidth, yet past approaches have been limited to complex gating mechanisms that only decide whether to communicate, not how precisely. Learning to optimize message precision at the bit-level is fundamentally harder, as the required discretization step breaks gradient flow. We address this by generalizing Differentiable Discrete Communication Learning (DDCL), a framework for end-to-end optimization of discrete messages. Our primary contribution is an extension of DDCL to support unbounded signals, transforming it into a universal, plug-and-play layer for any MARL architecture. We verify our approach with three key results. First, through a qualitative analysis in a controlled environment, we demonstrate how agents learn to dynamically modulate message precision according to the informational needs of the task. Second, we integrate our variant of DDCL into four state-of-the-art MARL algorithms, showing it reduces bandwidth by over an order of magnitude while matching or exceeding task performance. Finally, we provide direct evidence for the "Bitter Lesson" in MARL communication: a simple Transformer-based policy leveraging DDCL matches the performance of complex, specialized architectures, questioning the necessity of bespoke communication designs.
comment: 30 pages, 12 figures, 6 tables
An Explanation-oriented Inquiry Dialogue Game for Expert Collaborative Recommendations
This work presents a requirement analysis for collaborative dialogues among medical experts and an inquiry dialogue game based on this analysis for incorporating explainability into multiagent system design. The game allows experts with different knowledge bases to collaboratively make recommendations while generating rich traces of the reasoning process through combining explanation-based illocutionary forces in an inquiry dialogue. The dialogue game was implemented as a prototype web-application and evaluated against the specification through a formative user study. The user study confirms that the dialogue game meets the needs for collaboration among medical experts. It also provides insights on the real-life value of dialogue-based communication tools for the medical community.
Designing Non-monetary Intersection Control Mechanisms for Efficient Selfish Routing
Urban traffic congestion stems from the misalignment between self-interested routing decisions and socially optimal flows. Intersections, as critical bottlenecks, amplify these inefficiencies because existing control schemes often neglect drivers' strategic behavior. Autonomous intersections, enabled by vehicle-to-infrastructure communication, permit vehicle-level scheduling based on individual requests. Leveraging this fine-grained control, we propose a non-monetary mechanism that strategically adjusts request timestamps (delaying or advancing passage times) to incentivize socially efficient routing. We present a hierarchical architecture separating local scheduling by roadside units from network-wide timestamp adjustments by a central planner. We establish an experimentally validated analytical model, prove the existence and essential uniqueness of equilibrium flows, and formulate the planner's problem as an offline bilevel optimization program solvable with standard tools. Experiments on the Sioux Falls network show up to a 68% reduction in the efficiency gap between equilibrium and optimal flows, demonstrating scalability and effectiveness.
From Pixels to Cooperation Multi Agent Reinforcement Learning based on Multimodal World Models
Learning cooperative multi-agent policies directly from high-dimensional, multimodal sensory inputs such as pixels and audio is notoriously sample-inefficient. Model-free Multi-Agent Reinforcement Learning (MARL) algorithms struggle with the joint challenge of representation learning, partial observability, and credit assignment. To address this, we propose a novel framework based on a shared, generative Multimodal World Model (MWM). Our MWM is trained to learn a compressed latent representation of the environment's dynamics by fusing distributed, multimodal observations from all agents using a scalable attention-based mechanism. Subsequently, we leverage this learned MWM as a fast, "imagined" simulator to train cooperative MARL policies (e.g., MAPPO) entirely within its latent space, decoupling representation learning from policy learning. We introduce a new set of challenging multimodal, multi-agent benchmarks built on a 3D physics simulator. Our experiments demonstrate that our MWM-MARL framework achieves orders-of-magnitude greater sample efficiency compared to state-of-the-art model-free MARL baselines. We further show that our proposed multimodal fusion is essential for task success in environments with sensory asymmetry and that our architecture provides superior robustness to sensor-dropout, a critical feature for real-world deployment.
Credit Network Modeling and Analysis via Large Language Models
We investigate the application of large language models (LLMs) to construct credit networks from firms' textual financial statements and to analyze the resulting network structures. We start by using LLMs to translate each firm's financial statement into a credit network that pertains solely to that firm. These networks are then aggregated to form a comprehensive credit network representing the whole financial system. During this process, inconsistencies in financial statements are automatically detected and flagged for human intervention. We demonstrate that this translation process is effective across financial statements corresponding to credit networks with diverse topological structures. We further investigate the reasoning capabilities of LLMs in analyzing credit networks and determining optimal strategies for executing financial operations to maximize network performance measured by the total assets of firms, which is an inherently combinatorial optimization challenge. To demonstrate this capability, we focus on two financial operations: portfolio compression and debt removal, applying them to both synthetic and real-world datasets. Our findings show that LLMs can generate coherent reasoning and recommend effective executions of these operations to enhance overall network performance.
comment: 8 pages, 5 figures, 4 tables
GTAlign: Game-Theoretic Alignment of LLM Assistants for Social Welfare
Large Language Models (LLMs) have achieved remarkable progress in reasoning, yet sometimes produce responses that are suboptimal for users in tasks such as writing, information seeking, or providing practical guidance. Conventional alignment practices typically assume that maximizing model reward also maximizes user welfare, but this assumption frequently fails in practice: models may over-clarify or generate overly verbose reasoning when users prefer concise answers. Such behaviors resemble the prisoner's dilemma, where individually rational choices lead to socially suboptimal outcomes. The fundamental challenge is the lack of a principled decision-making mechanism that mutually benefits both the LLM and the user. We propose Game-Theoretic Alignment (GTAlign), an alignment framework that integrates game-theoretic decision making into both reasoning and training. During reasoning, the model explicitly treats user-LLM interaction as a strategic game: it constructs payoff matrices within its reasoning chain to estimate welfare for both itself and the user, and then selects actions that are mutually beneficial. During training, we introduce a social welfare reward that reinforces cooperative responses, aligning model behavior with socially efficient outcomes. In addition, we introduce an inference technique that leverages game-theoretic reasoning to dynamically adapt the LLM's response when the pricing policies of the LLM service change. Extensive experiments demonstrate that GTAlign substantially improves reasoning efficiency, answer quality, and social welfare compared to baselines across diverse tasks. The code is available at https://github.com/ulab-uiuc/GTAlign.
comment: 31 pages, 6 figures
Breaking the Performance Ceiling in Reinforcement Learning requires Inference Strategies
Reinforcement learning (RL) systems have countless applications, from energy-grid management to protein design. However, such real-world scenarios are often extremely difficult, combinatorial in nature, and require complex coordination between multiple agents. This level of complexity can cause even state-of-the-art RL systems, trained until convergence, to hit a performance ceiling which they are unable to break out of with zero-shot inference. Meanwhile, many digital or simulation-based applications allow for an inference phase that utilises a specific time and compute budget to explore multiple attempts before outputting a final solution. In this work, we show that such an inference phase employed at execution time, and the choice of a corresponding inference strategy, are key to breaking the performance ceiling observed in complex multi-agent RL problems. Our main result is striking: we can obtain up to a 126% and, on average, a 45% improvement over the previous state-of-the-art across 17 tasks, using only a couple of seconds of extra wall-clock time during execution. We also demonstrate promising compute scaling properties, supported by over 60k experiments, making it the largest study on inference strategies for complex RL to date. Our experimental data and code are available at https://sites.google.com/view/inference-strategies-rl.
comment: NeurIPS '25 version
MARFT: Multi-Agent Reinforcement Fine-Tuning
LLM-based Multi-Agent Systems (LaMAS) have demonstrated remarkable capabilities in addressing complex, agentic tasks, from generating high-quality presentation slides to even conducting sophisticated scientific research. Meanwhile, RL has been widely recognized for its effectiveness in enhancing agent intelligence, but limited research has investigated the fine-tuning of LaMAS using foundational RL techniques. Moreover, the direct application of MARL methods to LaMAS introduces significant challenges, stemming from the unique characteristics and mechanisms inherent to LaMAS. To address these challenges, this article presents a comprehensive study of LLM-based MARL and proposes a novel paradigm termed Multi-Agent Reinforcement Fine-Tuning (MARFT). We introduce a brand-new MG called Flex-MG, which aligns with LaMAS optimization in real-world applications, and a universal algorithmic framework tailored specifically for LaMAS, outlining the conceptual foundations, key distinctions, and practical implementation strategies. We review the evolution from RL to RFT, setting the stage for a parallel analysis in the multi-agent domain. In the context of LaMAS, we elucidate critical differences between MARL and MARFT. These differences motivate a transition toward a LaMAS-oriented formulation of RFT. Central to this work is a robust and scalable MARFT framework. We detail the core algorithm and provide a complete, open-source implementation to facilitate adoption and further research. The latter sections of the paper explore real-world application perspectives and open challenges in MARFT. By bridging theoretical underpinnings with practical methodologies, this work serves as a roadmap for researchers seeking to advance MARFT toward resilient and adaptive solutions in agentic systems. Our implementation of the proposed framework is publicly available at: https://github.com/jwliao-ai/MARFT.
comment: 42 pages
Language-Driven Coordination and Learning in Multi-Agent Simulation Environments
This paper introduces LLM-MARL, a unified framework that incorporates large language models (LLMs) into multi-agent reinforcement learning (MARL) to enhance coordination, communication, and generalization in simulated game environments. The framework features three modular components (Coordinator, Communicator, and Memory), which dynamically generate subgoals, facilitate symbolic inter-agent messaging, and support episodic recall. Training combines PPO with a language-conditioned loss and LLM query gating. LLM-MARL is evaluated in Google Research Football, MAgent Battle, and StarCraft II. Results show consistent improvements over MAPPO and QMIX in win rate, coordination score, and zero-shot generalization. Ablation studies demonstrate that subgoal generation and language-based messaging each contribute significantly to performance gains. Qualitative analysis reveals emergent behaviors such as role specialization and communication-driven tactics. By bridging language modeling and policy learning, this work contributes to the design of intelligent, cooperative agents in interactive simulations. It offers a path forward for leveraging LLMs in multi-agent systems used for training, games, and human-AI collaboration.
The Digital Ecosystem of Beliefs: does evolution favour AI over humans?
As AI systems are integrated into social networks, there are AI safety concerns that AI-generated content may dominate the web, e.g. in popularity or impact on beliefs. To understand such questions, this paper proposes the Digital Ecosystem of Beliefs (Digico), the first evolutionary framework for controlled experimentation with multi-population interactions in simulated social networks. Following a Universal Darwinism approach, the framework models a population of agents which change their messaging strategies due to evolutionary updates. They interact via messages, update their beliefs following a contagion model, and maintain their beliefs through cognitive Lamarckian inheritance. Initial experiments with Digico implement two types of agents, which are modelled to represent AIs vs humans based on higher rates of communication, higher rates of evolution, seeding fixed beliefs with propaganda aims, and higher influence on the recommendation algorithm. These experiments show that: a) when AIs have faster messaging, faster evolution, and more influence on the recommendation algorithm, they get 80% to 95% of the views; b) AIs designed for propaganda can typically convince 50% of humans to adopt extreme beliefs, and up to 85% when agents believe only a limited number of channels; c) a penalty for content that violates agents' beliefs reduces propaganda effectiveness by up to 8%. We further discuss Digico as a tool for systematic experimentation across multi-agent configurations, the implications for legislation, personal use, and platform design, and the use of Digico for studying evolutionary principles.
Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning
We study the problem of long-term (multiple days) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), focusing on the Douro river as a representative use case. We propose an energy- and communication-efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates direction and speed for each AUV. Simulations using the Delft3D ocean model demonstrate that our method consistently outperforms both single- and multi-agent benchmarks, and that scaling the number of agents improves both mean squared error (MSE) and operational endurance. In some instances, our algorithm demonstrates that doubling the number of AUVs can more than double endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. Our learned policies generalize across unseen seasonal regimes over different months and years, demonstrating promise for future developments of data-driven long-term monitoring of dynamic plume environments.
GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration
Graphs are widely used for modeling relational data in real-world scenarios, such as social networks and urban computing. Existing LLM-based graph analysis approaches either integrate graph neural networks (GNNs) for specific machine learning tasks, limiting their transferability, or rely solely on LLMs' internal reasoning ability, resulting in suboptimal performance. To address these limitations, we take advantage of recent advances in LLM-based agents, which have shown capabilities of utilizing external knowledge or tools for problem solving. By simulating human problem-solving strategies such as analogy and collaboration, we propose a multi-agent system based on LLMs, named GraphTeam, for graph analysis. GraphTeam consists of five LLM-based agents from three modules, and the agents with different specialities can collaborate with each other to address complex problems. Specifically, (1) input-output normalization module: the question agent extracts and refines four key arguments from the original question, facilitating the problem understanding, and the answer agent organizes the results to meet the output requirement; (2) external knowledge retrieval module: we first build a knowledge base consisting of relevant documentation and experience information, and then the search agent retrieves the most relevant entries for each question; (3) problem-solving module: given the retrieved information from the search agent, the coding agent uses established algorithms via programming to generate solutions, and in case the coding agent does not work, the reasoning agent will directly compute the results without programming. Extensive experiments on six graph analysis benchmarks demonstrate that GraphTeam achieves state-of-the-art performance with an average 25.85% improvement over the best baseline in terms of accuracy. The code and data are available at https://github.com/BUPT-GAMMA/GraphTeam.
Systems and Control (CS)
Model Predictive Control with Multiple Constraint Horizons
In this work we propose a Model Predictive Control (MPC) formulation that splits constraints into two different types. Motivated by safety considerations, the first type of constraint enforces a control-invariant set, while the second type could represent a less restrictive constraint on the system state. This distinction enables closed-loop sub-optimality results for nonlinear MPC with heterogeneous state constraints (distinct constraints across open-loop predicted states), and no terminal elements. Removing the non-invariant constraint recovers the partially constrained case. Beyond its theoretical interest, heterogeneous constrained MPC shows how constraint choices shape the system's closed loop. In the partially constrained case, adjusting the constraint horizon (how many predicted-state constraints are enforced) trades estimation accuracy for computational cost. First, our analysis yields a sub-optimality upper bound accounting for distinct constraint sets, their horizons and decay rates, which is tighter for short horizons than prior work. Second, to our knowledge, we give the first lower bound (beyond open-loop cost) on closed-loop sub-optimality. Together these bounds provide a powerful analysis framework, allowing designers to evaluate the effect of horizons on MPC sub-optimality. We demonstrate our results via simulations on nonlinear and linear safety-critical systems.
comment: Submitted to Transactions on Automatic Control
Hopfield Neural Networks for Online Constrained Parameter Estimation with Time-Varying Dynamics and Disturbances
This paper proposes two projector-based Hopfield neural network (HNN) estimators for online, constrained parameter estimation under time-varying data, additive disturbances, and slowly drifting physical parameters. The first is a constraint-aware HNN that enforces linear equalities and inequalities (via slack neurons) and continuously tracks the constrained least-squares target. The second augments the state with compensation neurons and a concatenated regressor to absorb bias-like disturbance components within the same energy function. For both estimators we establish global uniform ultimate boundedness with explicit convergence rate and ultimate bound, and we derive practical tuning rules that link the three design gains to closed-loop bandwidth and steady-state accuracy. We also introduce an online identifiability monitor that adapts the constraint weight and time step, and, when needed, projects updates onto identifiable subspaces to prevent drift in poorly excited directions...
comment: Submitted to International Journal of Adaptive Control and Signal Processing
Second-Order Policy Gradient Methods for the Linear Quadratic Regulator
Policy gradient methods are a powerful family of reinforcement learning algorithms for continuous control that optimize a policy directly. However, standard first-order methods often converge slowly. Second-order methods can accelerate learning by using curvature information, but they are typically expensive to compute. The linear quadratic regulator (LQR) is a practical setting in which key quantities, such as the policy gradient, admit closed-form expressions. In this work, we develop second-order policy gradient algorithms for LQR by deriving explicit formulas for both the approximate and exact Hessians used in Gauss--Newton and Newton methods, respectively. Numerical experiments show a faster convergence rate for the proposed second-order approach over the standard first-order policy gradient baseline.
comment: 8 pages, 2 figs
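To make the closed-form quantities mentioned in the LQR abstract above concrete, here is a minimal sketch for a scalar discrete-time LQR problem with dynamics x_{t+1} = a x_t + b u_t, policy u_t = -k x_t, and stage cost q x^2 + r u^2. It compares a plain gradient step with a Gauss-Newton-type step of the kind commonly used in the LQR policy-gradient literature. The scalar setting, the function names, and the step sizes are assumptions made for illustration; the explicit Hessian formulas derived in the paper are not reproduced here.

```python
def lqr_terms(k, a, b, q, r, sigma0=1.0):
    """Return (P, Sigma) for the scalar closed loop x_{t+1} = (a - b*k) x_t.

    P is the cost-to-go coefficient and Sigma the summed state second moment,
    with sigma0 = E[x_0^2]. Both require a stabilizing gain, |a - b*k| < 1.
    """
    a_cl = a - b * k
    if abs(a_cl) >= 1.0:
        raise ValueError("gain k is not stabilizing")
    denom = 1.0 - a_cl * a_cl
    return (q + r * k * k) / denom, sigma0 / denom


def lqr_cost(k, a, b, q, r, sigma0=1.0):
    """Expected infinite-horizon cost C(k) = P(k) * sigma0."""
    p, _ = lqr_terms(k, a, b, q, r, sigma0)
    return p * sigma0


def policy_gradient(k, a, b, q, r, sigma0=1.0):
    """Closed-form gradient dC/dk = 2 * ((r + b^2 P) k - b P a) * Sigma."""
    p, sigma = lqr_terms(k, a, b, q, r, sigma0)
    return 2.0 * ((r + b * b * p) * k - b * p * a) * sigma


def gradient_step(k, a, b, q, r, step, sigma0=1.0):
    """Plain first-order update."""
    return k - step * policy_gradient(k, a, b, q, r, sigma0)


def gauss_newton_step(k, a, b, q, r, step=0.5, sigma0=1.0):
    """Gradient rescaled by (r + b^2 P) * Sigma; step=0.5 gives policy iteration."""
    p, sigma = lqr_terms(k, a, b, q, r, sigma0)
    grad = policy_gradient(k, a, b, q, r, sigma0)
    return k - step * grad / ((r + b * b * p) * sigma)
```

With a = b = q = r = 1, the optimal gain is (sqrt(5) - 1) / 2; starting from k = 0.5, a handful of Gauss-Newton-type steps reach it to near machine precision, whereas gradient steps of size 0.05 approach it much more slowly.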
ABIDES-MARL: A Multi-Agent Reinforcement Learning Environment for Endogenous Price Formation and Execution in a Limit Order Book
We present ABIDES-MARL, a framework that combines a new multi-agent reinforcement learning (MARL) methodology with a new realistic limit-order-book (LOB) simulation system to study equilibrium behavior in complex financial market games. The system extends ABIDES-Gym by decoupling state collection from kernel interruption, enabling synchronized learning and decision-making for multiple adaptive agents while maintaining compatibility with standard RL libraries. It preserves key market features such as price-time priority and discrete tick sizes. Methodologically, we use MARL to approximate equilibrium-like behavior in multi-period trading games with a finite number of heterogeneous agents (an informed trader, a liquidity trader, noise traders, and competing market makers), all with individual price impacts. This setting bridges optimal execution and market microstructure by embedding the liquidity trader's optimization problem within a strategic trading environment. We validate the approach by solving an extended Kyle model within the simulation system, recovering the gradual price discovery phenomenon. We then extend the analysis to a liquidity trader's problem where market liquidity arises endogenously and show that, at equilibrium, execution strategies shape market-maker behavior and price dynamics. ABIDES-MARL provides a reproducible foundation for analyzing equilibrium and strategic adaptation in realistic markets and contributes toward building economically interpretable agentic AI systems for finance.
MOBIUS: A Multi-Modal Bipedal Robot that can Walk, Crawl, Climb, and Roll
This article presents a Multi-Modal Bipedal Intelligent Urban Scout robot (MOBIUS) capable of walking, crawling, climbing, and rolling. MOBIUS features four limbs--two 6-DoF arms with two-finger grippers for manipulation and climbing, and two 4-DoF legs for locomotion--enabling smooth transitions across diverse terrains without reconfiguration. A hybrid control architecture combines reinforcement learning-based locomotion with model-based predictive and admittance control enhanced for safety by a Reference Governor toward compliant contact interactions. A high-level MIQCP planner autonomously selects locomotion modes to balance stability and energy efficiency. Hardware experiments demonstrate robust gait transitions, dynamic climbing, and full-body load support via pinch grasp. Overall, MOBIUS demonstrates the importance of tight integration between morphology, high-level planning, and control to enable mobile loco-manipulation and grasping, substantially expanding its interaction capabilities, workspace, and traversability.
comment: 23 pages, 20 figures. Collaborative work between the Robotics and Mechanisms Laboratory (RoMeLa) and Mitsubishi Electric Research Laboratories (MERL)
On polynomial explicit partial estimator design for nonlinear systems with parametric uncertainties
This paper investigates the idea of designing data-driven partial estimators for nonlinear systems with parametric uncertainties using sparse multivariate polynomial relationships. A general framework is first presented and then validated on two illustrative examples, with comparison to several possible machine/deep-learning-based alternatives. The results suggest the superiority of the proposed sparse identification scheme, at least when the learning data set is small.
comment: Submitted to ACC2026
Deep Learning Prediction of Beam Coherence Time for Near-Field TeraHertz Networks
Large multiple antenna arrays coupled with accurate beamforming are essential in terahertz (THz) communications to ensure link reliability. However, as the number of antennas increases, beam alignment (focusing) and beam tracking in mobile networks incur prohibitive overhead. Additionally, the near-field region expands both with the size of antenna arrays and the carrier frequency, calling for adjustments in the beamforming to account for spherical wavefront instead of the conventional planar wave assumption. In this letter, we introduce a novel beam coherence time for mobile THz networks, to drastically reduce the rate of beam updates. Then, we propose a deep learning model, relying on a simple feedforward neural network with a time-dependent input, to predict the beam coherence time and adjust the beamforming on the fly with minimal overhead. Our numerical results demonstrate the effectiveness of the proposed approach by enabling higher data rates while reducing the overhead, especially at high (i.e., vehicular) mobility.
comment: IEEE Wireless Communication Letters (accepted October 2025)
Data-driven stabilization of nonlinear systems via descriptor embedding
We introduce the notion of descriptor embedding for nonlinear systems and use it for the data-driven design of stabilizing controllers. Specifically, we provide sufficient data-dependent LMI conditions which, if feasible, return a stabilizing nonlinear controller of the form $u=K(x)Z(x)$ where $K(x)$ belongs to a polytope and $Z$ is a user-defined function. The proposed method is then extended to account for the presence of uncertainties and noisy data. Furthermore, a method to estimate the resulting region of attraction is given using only data. Simulation examples are used to illustrate the results and compare them to existing methods from the literature.
comment: 16 pages, 5 figures, submitted to IEEE Transactions on Automatic Control
Evolutionary Dynamics in Continuous-time Finite-state Mean Field Games -- Part I: Equilibria
We study a dynamic game with a large population of players who choose actions from a finite set in continuous time. Each player has a state in a finite state space that evolves stochastically with their actions. A player's reward depends not only on their own state and action but also on the distribution of states and actions across the population, capturing effects such as congestion in traffic networks. While prior work in evolutionary game theory has primarily focused on static games without individual player state dynamics, we present the first comprehensive evolutionary analysis of such dynamic games. We propose an evolutionary model together with a mean field approximation of the finite-population game and establish strong approximation guarantees. We show that standard solution concepts for dynamic games lack an evolutionary interpretation, and we propose a new concept - the Mixed Stationary Nash Equilibrium (MSNE) - which admits one. We analyze the relationship between MSNE and the rest points of the mean field evolutionary model and study the evolutionary stability of MSNE.
Risk Aware Safe Control with Cooperative Sensing for Dynamic Obstacle Avoidance
This paper presents the design, development, and on vehicle implementation and validation of a safety critical controller for autonomous driving under sensing and communication uncertainty. Cooperative sensing, fused via a Wasserstein barycenter (WB), is used to optimize the distribution of the dynamic obstacle locations. The Conditional Value at Risk (CVaR) is introduced to form a risk aware control-barrier-function (CBF) framework with the optimized distribution samplings. The proposed WB CVaR CBF safety filter improves control inputs that minimize tail risk while certifying forward invariance of the safe set. A model predictive controller (MPC) performs path tracking, and the safety filter modulates the nominal control inputs to enforce risk aware constraints. We detail the software architecture and integration with vehicle actuation and cooperative sensing. The approach is evaluated on a full-scale autonomous vehicle (AV) in scenarios with measurement noise, communication perturbations, and input disturbances, and is compared against a baseline MPC CBF design. Results demonstrate improved safety margins and robustness, highlighting the practicality of deploying the risk-aware safety filter on an actual AV.
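The Conditional Value at Risk mentioned in the abstract above can be illustrated on a finite set of samples. The following is a minimal sketch, assuming a simple empirical tail-mean estimator; the function name `empirical_cvar` and the sample values are hypothetical and are not taken from the paper.

```python
import math

def empirical_cvar(losses, alpha):
    """Mean of the worst (1 - alpha) fraction of sampled losses.

    Assumptions: larger values are worse, and the tail size is rounded up
    to at least one sample. This is a common empirical estimator, not
    necessarily the one used in the paper.
    """
    if not losses:
        raise ValueError("at least one sample is required")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(losses, reverse=True)  # worst losses first
    # The small tolerance guards against floating-point error in (1 - alpha) * n.
    tail_size = max(1, math.ceil((1.0 - alpha) * len(ordered) - 1e-12))
    tail = ordered[:tail_size]
    return sum(tail) / len(tail)

# Hypothetical loss samples (for example, negated clearance to an obstacle).
samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(empirical_cvar(samples, 0.8))  # mean of the two worst samples: 9.5
```

A risk-aware constraint would then bound this tail mean rather than the plain average, so that rare but severe outcomes dominate the decision.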
Orthogonal-by-construction augmentation of physics-based input-output models
Model augmentation is a promising approach for integrating first-principles-based models with machine learning components. Augmentation can result in better model accuracy and faster convergence compared to black-box system identification methods, while maintaining interpretability of the models in terms of how the original dynamics are complemented by learning. A widely used augmentation structure in the literature is based on the parallel connection of the physics-based and learning components, for both of which the corresponding parameters are jointly optimized. However, due to overlap in representation of the system dynamics by such an additive structure, estimation often leads to physically unrealistic parameters, compromising model interpretability. To overcome this limitation, this paper introduces a novel orthogonal-by-construction model augmentation structure for input-output models, that guarantees recovery of the physically true parameters under appropriate identifiability conditions.
comment: Submitted for publication
A High-Speed Capable Spherical Robot
This paper designs a new spherical robot structure capable of supporting high-speed motion at up to 10 m/s. Building upon a single-pendulum-driven spherical robot, the design incorporates a momentum wheel with an axis aligned with the secondary pendulum, creating a novel spherical robot structure. Practical experiments with the physical prototype have demonstrated that this new spherical robot can achieve stable high-speed motion through simple decoupled control, which was unattainable with the original structure. The spherical robot designed for high-speed motion not only increases speed but also significantly enhances obstacle-crossing performance and terrain robustness.
comment: 5 pages
Koopman-based Prediction of Connectivity for Flying Ad Hoc Networks
The application of machine learning (ML) to communication systems is expected to play a pivotal role in future artificial intelligence (AI)-based next-generation wireless networks. While most existing works focus on ML techniques for static wireless environments, they often face limitations when applied to highly dynamic environments, such as flying ad hoc networks (FANETs). This paper explores the use of data-driven Koopman approaches to address these challenges. Specifically, we investigate how these approaches can model UAV trajectory dynamics within FANETs, enabling more accurate predictions and improved network performance. By leveraging Koopman operator theory, we propose two possible approaches -- centralized and distributed -- to efficiently address the challenges posed by the constantly changing topology of FANETs. To demonstrate this, we consider a FANET performing surveillance with UAVs following pre-determined trajectories and predict signal-to-interference-plus-noise ratios (SINRs) to ensure reliable communication between UAVs. Our results show that these approaches can accurately predict connectivity and isolation events that lead to modelled communication outages. This capability could help UAVs schedule their transmissions based on these predictions.
Experimental Demonstration of Software-Orchestrated Quantum Network Applications over a Campus-Scale Testbed
To fulfill their promise, quantum networks must transform from isolated testbeds into scalable infrastructures for distributed quantum applications. In this paper, we present a prototype orchestrator for the Argonne Quantum Network (ArQNet) testbed that leverages design principles of software-defined networking (SDN) to automate typical quantum communication experiments across buildings in the Argonne campus connected over deployed, telecom fiber. Our implementation validates a scalable architecture supporting service-level abstraction of quantum networking tasks, distributed time synchronization, and entanglement verification across remote nodes. We present a prototype service of continuous, stable entanglement distribution between remote sites that ran for 12 hours, which defines a promising path towards scalable quantum networks.
comment: 11 pages, 8 figures, journal
Deep Learning-Accelerated Shapley Value for Fair Allocation in Power Systems: The Case of Carbon Emission Responsibility
Allocating costs, benefits, and emissions fairly among power system participant entities represents a persistent challenge. The Shapley value provides an axiomatically fair solution, yet computational barriers have limited its adoption beyond small-scale applications. This paper presents SurroShap, a scalable Shapley value approximation framework combining efficient coalition sampling with deep learning surrogate models that accelerate characteristic function evaluations. Exemplified through carbon emission responsibility allocation in power networks, SurroShap enables Shapley-based fair allocation for power systems with thousands of entities for the first time. We derive theoretical error bounds proving that time-averaged SurroShap allocations converge to be $\varepsilon$-close to exact Shapley values. Experiments on nine systems ranging from 26 to 1,951 entities demonstrate completion within the real-time operational window even at maximum scale, achieving 10^4-10^5 speedups over other sampling-based methods while maintaining tight error bounds. The resulting Shapley-based carbon allocations possess six desirable properties aligning individual interests with decarbonization goals. Year-long simulations on the Texas 2000-bus system validate real-world applicability, with regional analysis revealing how renewable-rich areas offset emission responsibility through exports while load centers bear responsibility for driving system-wide generation.
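To make the computational barrier concrete, the exact Shapley value can be written as an average of marginal contributions over all player orderings. The sketch below is a brute-force illustration on a hypothetical three-player game; the names `exact_shapley`, `demand`, and `emissions` are assumptions for this example and do not reproduce SurroShap.

```python
from itertools import permutations

def exact_shapley(players, value):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    The number of orderings grows factorially with the number of players,
    which is the cost that sampling and surrogate models aim to avoid.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}

# Hypothetical toy game: emissions grow quadratically with total demand.
demand = {"A": 1.0, "B": 2.0, "C": 3.0}

def emissions(coalition):
    return sum(demand[p] for p in coalition) ** 2

allocation = exact_shapley(list(demand), emissions)
print(allocation)
```

For this toy game the shares work out to 6, 12, and 18, which sum to the grand-coalition value of 36 (the efficiency property).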
Neural Networks for AC Optimal Power Flow: Improving Worst-Case Guarantees during Training SC
The AC Optimal Power Flow (AC-OPF) problem is central to power system operation but challenging to solve efficiently due to its nonconvex and nonlinear nature. Neural networks (NNs) offer fast surrogates, yet their black-box behavior raises concerns about constraint violations that can compromise safety. We propose a verification-informed NN framework that incorporates worst-case constraint violations directly into training, producing models that are both accurate and provably safer. Through post-hoc verification, we achieve substantial reductions in worst-case violations and, for the first time, verify all operational constraints of large-scale AC-OPF proxies. Practical feasibility is further enhanced via restoration and warm-start strategies for infeasible operating points. Experiments on systems ranging from 57 to 793 buses demonstrate scalability, speed, and reliability, bridging the gap between ML acceleration and safe, real-time deployment of AC-OPF solutions - and paving the way toward data-driven optimal control.
comment: Submitted to PSCC 2026 (under review)
Limits of Safe AI Deployment: Differentiating Oversight and Control
Oversight and control, which we collectively call supervision, are often discussed as ways to ensure that AI systems are accountable, reliable, and able to fulfill governance and management requirements. However, the requirements for "human oversight" risk codifying vague or inconsistent interpretations of key concepts like oversight and control. This ambiguous terminology could undermine efforts to design or evaluate systems that must operate under meaningful human supervision. This matters because the term is used by regulatory texts such as the EU AI Act. This paper undertakes a targeted critical review of literature on supervision outside of AI, along with a brief summary of past work on the topic related to AI. We next differentiate control as ex-ante or real-time and operational rather than policy or governance, and oversight as performed ex-post, or a policy and governance function. Control aims to prevent failures, while oversight focuses on detection, remediation, or incentives for future prevention. Building on this, we make three contributions. 1) We propose a framework to align regulatory expectations with what is technically and organizationally plausible, articulating the conditions under which each mechanism is possible, where they fall short, and what is required to make them meaningful in practice. 2) We outline how supervision methods should be documented and integrated into risk management, and drawing on the Microsoft Responsible AI Maturity Model, we outline a maturity model for AI supervision. 3) We explicitly highlight boundaries of these mechanisms, including where they apply, where they fail, and where it is clear that no existing methods suffice. This foregrounds the question of whether meaningful supervision is possible in a given deployment context, and can support regulators, auditors, and practitioners in identifying both present and future limitations.
comment: Revised to improve table formatting and update draft
Dual-Regularized Riccati Recursions for Interior-Point Optimal Control
We derive closed-form extensions of Riccati's recursions (both sequential and parallel) for solving dual-regularized LQR problems. We show how these methods can be used to solve general constrained, non-convex, discrete-time optimal control problems via a regularized interior point method, while guaranteeing that each primal step is a descent direction of an Augmented Barrier-Lagrangian merit function. We provide MIT-licensed implementations of our methods in C++ and JAX.
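For orientation, the classical backward Riccati recursion that the abstract builds on can be sketched for a scalar system. This is the standard unregularized, sequential recursion, not the dual-regularized extension derived in the paper; the function name `riccati_backward` and all numeric values are assumptions for illustration.

```python
def riccati_backward(a, b, q, r, q_terminal, horizon):
    """Backward Riccati recursion for the scalar LQR problem.

    Dynamics: x[k+1] = a * x[k] + b * u[k]; stage cost: q * x^2 + r * u^2.
    Returns the feedback gains K[0..horizon-1] (control law u = -K * x)
    and the cost-to-go coefficient at time 0.
    """
    p = q_terminal
    gains = []
    for _ in range(horizon):
        k = (a * b * p) / (r + b * b * p)   # optimal gain for the current cost-to-go
        p = q + a * a * p - a * b * p * k   # Riccati update of the cost-to-go
        gains.append(k)
    gains.reverse()                         # gains were computed from the last stage backward
    return gains, p

# Hypothetical problem data.
gains, p0 = riccati_backward(a=1.0, b=1.0, q=1.0, r=1.0, q_terminal=1.0, horizon=50)
print(gains[0], p0)
```

With these unit parameters the cost-to-go coefficient converges to the positive root of p^2 - p - 1 = 0, which gives a quick sanity check on the recursion.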
Spherical Point Process with Random Heights: New Approach for Modeling and Analysis of Downlink Satellite Networks
The Low Earth Orbit (LEO) satellite industry is undergoing rapid expansion, with operators competitively launching satellites due to the first-come, first-served principle governing orbital rights. This has led to the formation of increasingly large-scale, volumetric constellations where satellites operate across a diverse range of altitudes. To address the need for analyzing such complex networks, this paper establishes a new analytical framework for LEO constellations by leveraging a 3D Poisson point process (PPP). Specifically, we introduce a random height model (RHM) that can capture various altitude distributions by applying a random radial displacement to points generated by a homogeneous PPP on a nominal shell. Building on this, we derive an analytical expression for the downlink coverage probability. To motivate our model, we show that the altitude distributions of several leading satellite constellations, including Starlink, align with our model's assumptions. We then demonstrate through Monte Carlo simulations that the coverage probability of our RHM closely matches that of these real-world networks. Finally, we confirm the accuracy of our analytical expressions by showing their agreement with simulation results. Our work thereby provides a powerful tool for understanding and predicting how the statistical distribution of satellite altitudes impacts network performance.
comment: submitted to IEEE journal
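The random height construction described in the abstract above (points on a nominal shell, each shifted radially by a random amount) can be sketched as follows. Conditioned on the number of points, a homogeneous PPP on a sphere places them independently and uniformly, so the sketch takes the count as given. The function name, the shell radius, and the uniform displacement are illustrative assumptions, not the paper's parameters.

```python
import math
import random

def sample_random_height_points(n_points, shell_radius, displacement, rng):
    """Draw points uniformly on a nominal shell, then shift each one radially.

    `displacement` is a callable that returns one random radial offset.
    """
    points = []
    for _ in range(n_points):
        z = rng.uniform(-1.0, 1.0)             # uniform z gives uniform area on a sphere
        phi = rng.uniform(0.0, 2.0 * math.pi)  # uniform azimuth
        s = math.sqrt(1.0 - z * z)
        radius = shell_radius + displacement(rng)
        points.append((radius * s * math.cos(phi),
                       radius * s * math.sin(phi),
                       radius * z))
    return points

# Hypothetical values in km: nominal shell at 6921 km from the Earth's centre,
# with altitudes spread uniformly within +/- 50 km of the shell.
rng = random.Random(0)
satellites = sample_random_height_points(
    200, 6921.0, lambda g: g.uniform(-50.0, 50.0), rng)
print(len(satellites))
```

Swapping the displacement callable for another distribution changes the altitude profile without touching the angular sampling.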
Aggregative games with bilevel structures: Distributed algorithms and convergence analysis
In this paper, the problem of distributively seeking the equilibria of aggregative games with bilevel structures is studied. Different from the traditional aggregative games, here the aggregation is determined by the minimizer of a virtual leader's objective function in the inner level, which depends on the actions of the players in the outer level. Moreover, the global objective function of the virtual leader is formed by the sum of some local functions with two arguments, each of which is strongly convex with respect to the second argument. When making decisions, each player in the outer level only has access to a local part of the virtual leader's objective function. To handle this problem, first, we propose a second order gradient-based distributed algorithm, where the Hessian matrices associated with the objective functions of the leader are involved. By the algorithm, players update their actions while cooperatively minimizing the objective function of the virtual leader to estimate the aggregation by communicating with their neighbors via a connected graph. Under mild assumptions on the graph and cost functions, we prove that the actions of players asymptotically converge to the Nash equilibrium point. Then, for the case where the Hessian matrices associated with the objective functions of the virtual leader are not available, we propose a first order gradient-based distributed algorithm, where a distributed two-point estimate strategy is developed to estimate the gradients of players' cost functions in the outer level. Under the same conditions, we prove that the convergence errors of players' actions to the Nash equilibrium point are linear with respect to the estimate parameters. Finally, simulations are provided to demonstrate the effectiveness of our theoretical results.
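The first-order algorithm in the abstract above relies on a two-point estimate of gradients that are not directly available. A generic, centralized version of such an estimator is sketched below; it is a textbook central-difference estimator along a random Gaussian direction, and the distributed variant developed in the paper is not reproduced. The function name and the test function `cost` are hypothetical.

```python
import random

def two_point_gradient(f, x, delta, rng):
    """Estimate the gradient of f at x from two evaluations along one random direction."""
    u = [rng.gauss(0.0, 1.0) for _ in x]                  # random probing direction
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    slope = (f(x_plus) - f(x_minus)) / (2.0 * delta)      # directional derivative estimate
    return [slope * ui for ui in u]

def cost(x):
    # Hypothetical cost with gradient (2 * x[0], 3).
    return x[0] ** 2 + 3.0 * x[1]

rng = random.Random(0)
point = [1.0, 2.0]
n_samples = 20000
average = [0.0, 0.0]
for _ in range(n_samples):
    estimate = two_point_gradient(cost, point, 0.1, rng)
    average = [acc + g / n_samples for acc, g in zip(average, estimate)]
print(average)  # close to the true gradient (2, 3)
```

A single estimate is noisy; only its average matches the gradient, and the accuracy of each sample depends on the smoothing parameter `delta`, which is consistent with error bounds that scale with the estimate parameters.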
Simultaneous System Identification and Model Predictive Control with No Dynamic Regret
We provide an algorithm for the simultaneous system identification and model predictive control of nonlinear systems. The algorithm has finite-time near-optimality guarantees and asymptotically converges to the optimal (non-causal) controller. Particularly, the algorithm enjoys sublinear dynamic regret, defined herein as the suboptimality against an optimal clairvoyant controller that knows how the unknown disturbances and system dynamics will adapt to its actions. The algorithm is self-supervised and applies to control-affine systems with unknown dynamics and disturbances that can be expressed in reproducing kernel Hilbert spaces. Such spaces can model external disturbances and modeling errors that can even be adaptive to the system's state and control input. For example, they can model wind and wave disturbances to aerial and marine vehicles, or inaccurate model parameters such as inertia of mechanical systems. The algorithm first generates random Fourier features that are used to approximate the unknown dynamics or disturbances. Then, it employs model predictive control based on the current learned model of the unknown dynamics (or disturbances). The model of the unknown dynamics is updated online using least squares based on the data collected while controlling the system. We validate our algorithm in both hardware experiments and physics-based simulations. The simulations include (i) a cart-pole aiming to maintain the pole upright despite inaccurate model parameters, and (ii) a quadrotor aiming to track reference trajectories despite unmodeled aerodynamic drag effects. The hardware experiments include a quadrotor aiming to track a circular trajectory despite unmodeled aerodynamic drag effects, ground effects, and wind disturbances.
comment: IEEE Transactions on Robotics (T-RO). v6 update on stability analysis in Appendix J under relaxed Assumption 1
Adaptive Multirobot Virtual Structure Control using Dual Quaternions
This paper presents a control strategy based on dual quaternions for the coordinated formation flying of small UAV groups. A virtual structure is employed to define the desired formation, enabling unified control of its position, orientation, and shape. This abstraction makes formation management easier by allowing a low-level controller to compute individual UAV commands efficiently. The proposed controller integrates a pose control module with a geometry-based adaptive strategy, ensuring precise and robust task execution. The effectiveness of the approach is demonstrated through both simulation and experimental results.
Action Chunking and Exploratory Data Collection Yield Exponential Improvements in Behavior Cloning for Continuous Control
This paper presents a theoretical analysis of two of the most impactful interventions in modern learning from demonstration in robotics and continuous control: the practice of action-chunking (predicting sequences of actions in open-loop) and exploratory augmentation of expert demonstrations. Though recent results show that learning from demonstration, also known as imitation learning (IL), can suffer errors that compound exponentially with task horizon in continuous settings, we demonstrate that action chunking and exploratory data collection circumvent exponential compounding errors in different regimes. Our results identify control-theoretic stability as the key mechanism underlying the benefits of these interventions. On the empirical side, we validate our predictions and the role of control-theoretic stability through experimentation on popular robot learning benchmarks. On the theoretical side, we demonstrate that the control-theoretic lens provides fine-grained insights into how compounding error arises, leading to tighter statistical guarantees on imitation learning error when these interventions are applied than previous techniques based on information-theoretic considerations alone.
comment: Updated manuscript. Added new experiments, figures, and exposition
Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning
We study the problem of long-term (multiple days) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), focusing on the Douro river representative use-case. We propose an energy- and communication-efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates direction and speed for each AUV. Simulations using the Delft3D ocean model demonstrate that our method consistently outperforms both single- and multi-agent benchmarks, with scaling the number of agents improving both mean squared error (MSE) and operational endurance. In some instances, our algorithm demonstrates that doubling the number of AUVs can more than double endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. Our learned policies generalize across unseen seasonal regimes over different months and years, demonstrating promise for future developments of data-driven long-term monitoring of dynamic plume environments.
Detection Augmented Bandit Procedures for Piecewise Stationary MABs: A Modular Approach
Conventional Multi-Armed Bandit (MAB) algorithms are designed for stationary environments, where the reward distributions associated with the arms do not change with time. In many applications, however, the environment is more accurately modeled as being non-stationary. In this work, piecewise stationary MAB (PS-MAB) environments are investigated, in which the reward distributions associated with a subset of the arms change at some change-points and remain stationary between change-points. Our focus is on the asymptotic analysis of PS-MABs, for which practical algorithms based on change detection have been previously proposed. Our goal is to modularize the design and analysis of such Detection Augmented Bandit (DAB) procedures. To this end, we first provide novel, improved performance lower bounds for PS-MABs. Then, we identify the requirements for stationary bandit algorithms and change detectors in a DAB procedure that are needed for the modularization. We assume that the rewards are sub-Gaussian. Under this assumption and a condition on the separation of the change-points, we show that the analysis of DAB procedures can indeed be modularized, so that the regret bounds can be obtained in a unified manner for various combinations of change detectors and bandit algorithms. Through this analysis, we develop new modular DAB procedures that are order-optimal. Finally, we showcase the practical effectiveness of our modular DAB approach in our experiments, studying its regret performance compared to other methods and investigating its detection capabilities.
comment: 30 pages, 4 figures, 1 table, submitted to TIT
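The modular idea in the abstract above, a stationary bandit algorithm paired with a change detector that triggers a restart, can be sketched with UCB1 and a simple windowed mean-shift test. Both components and all parameter values are hypothetical stand-ins chosen for brevity; they are not the detectors or bandit algorithms analyzed in the paper.

```python
import math

def mean_shift_detected(history, window, threshold):
    """Hypothetical detector: compare the latest window's mean with the earlier mean."""
    if len(history) < 2 * window:
        return False
    older = history[:-window]
    newer = history[-window:]
    return abs(sum(newer) / len(newer) - sum(older) / len(older)) > threshold

def detection_augmented_ucb(reward_fn, n_arms, horizon, window=20, threshold=0.5):
    """UCB1 that discards all statistics whenever an arm's detector fires."""
    history = [[] for _ in range(n_arms)]
    steps_since_restart = 0
    choices = []
    for t in range(horizon):
        steps_since_restart += 1
        untried = [a for a in range(n_arms) if not history[a]]
        if untried:
            arm = untried[0]                      # play each arm once after a (re)start
        else:
            def index(a):
                n = len(history[a])
                bonus = math.sqrt(2.0 * math.log(steps_since_restart) / n)
                return sum(history[a]) / n + bonus
            arm = max(range(n_arms), key=index)
        history[arm].append(reward_fn(arm, t))
        choices.append(arm)
        if mean_shift_detected(history[arm], window, threshold):
            history = [[] for _ in range(n_arms)]  # restart the bandit algorithm
            steps_since_restart = 0
    return choices

def rewards(arm, t):
    # Hypothetical piecewise-stationary environment with one change-point at t = 300.
    if arm == 0:
        return 1.0 if t < 300 else 0.0
    return 0.5

choices = detection_augmented_ucb(rewards, n_arms=2, horizon=600)
print(choices[200:300].count(0), choices[-100:].count(1))
```

Because the detector and the bandit index are separate functions, either can be swapped without touching the other, which is the kind of modularity the abstract describes.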
The Difference between the Left and Right Invariant Extended Kalman Filter
The extended Kalman filter (EKF) has been the industry standard for state estimation problems over the past sixty years. The Invariant Extended Kalman Filter (IEKF) is a recent development of the EKF for the class of group-affine systems on Lie groups that has shown superior performance for inertial navigation problems. The IEKF comes in two versions, left- and right-handed respectively, and there is a perception in the robotics community that these filters are different and one should choose the handedness of the IEKF to match the handedness of the measurement model for a given filtering problem. In this paper, we revisit these algorithms and demonstrate that the left- and right-IEKF algorithms (with reset step) are identical, that is, the choice of the handedness does not affect the IEKF's performance when the reset step is properly implemented. The reset step was not originally proposed as part of the IEKF; however, we provide simulations to show that the reset step improves asymptotic performance of all versions of the filter, and should be included in all high-performance algorithms. The GNSS-aided inertial navigation system (INS) is used as a motivating example to demonstrate the equivalence of the two filters.
comment: 20 pages, 4 figures, submitted to Control Engineering Practice
Expertise and confidence explain how social influence evolves along intellective tasks
Discovering the antecedents of individuals' influence in collaborative environments is an important, practical, and challenging problem. In this paper, we study interpersonal influence in small groups of individuals who collectively execute a sequence of intellective tasks. We observe that along an issue sequence with feedback, individuals with higher expertise and social confidence are accorded higher interpersonal influence. We also observe that low-performing individuals tend to underestimate their high-performing teammate's expertise. Based on these observations, we introduce three hypotheses and present empirical and theoretical support for their validity. We report empirical evidence on longstanding theories of transactive memory systems, social comparison, and confidence heuristics on the origins of social influence. We propose a cognitive dynamical model inspired by these theories to describe the process by which individuals adjust interpersonal influences over time. We demonstrate the model's accuracy in predicting individuals' influence and provide analytical results on its asymptotic behavior for the case with identically performing individuals. Lastly, we propose a novel approach using deep neural networks on a pre-trained text embedding model for predicting the influence of individuals. Using message contents, message times, and individual correctness collected during tasks, we are able to accurately predict individuals' self-reported influence over time. Extensive experiments verify the accuracy of the proposed models compared to baselines such as structural balance and reflected appraisal model. While the neural networks model is the most accurate, the dynamical model is the most interpretable for influence prediction.
Joint Scheduling of DER under Demand Charges: Structure and Approximation
We study the joint scheduling of behind-the-meter distributed energy resources (DERs), including flexible loads, renewable generation, and battery energy storage systems, under net energy metering tariffs with demand charges. The problem is formulated as a stochastic dynamic program aimed at maximizing expected operational surplus while accounting for renewable generation uncertainty. We analytically characterize the optimal control policy and show that it admits a threshold-based structure. However, due to the strong temporal coupling of the storage and demand charge constraints, the number of conditional branches in the policy scales combinatorially with the scheduling horizon, as it requires a look-ahead over future states. To overcome the high computational complexity in the general formulation, an efficient approximation algorithm is proposed, which searches for the peak demand under a mildly relaxed problem. We show that the algorithm scales linearly with the scheduling horizon. Extensive simulations using two open-source datasets validate the proposed algorithm and compare its performance against different DER control strategies, including a reinforcement learning-based one. Under varying storage and tariff parameters, the results show that the proposed algorithm outperforms various benchmarks in achieving a relatively small solution gap compared to a theoretical upper bound.
comment: 15 pages, 6 tables, 8 figures
Multi-Objective Planning with Contextual Lexicographic Reward Preferences AAMAS
Autonomous agents are often required to plan under multiple objectives whose preference ordering varies based on context. The agent may encounter multiple contexts during its course of operation, each imposing a distinct lexicographic ordering over the objectives, with potentially different reward functions associated with each context. Existing approaches to multi-objective planning typically consider a single preference ordering over the objectives, across the state space, and do not support planning under multiple objective orderings within an environment. We present Contextual Lexicographic Markov Decision Process (CLMDP), a framework that enables planning under varying lexicographic objective orderings, depending on the context. In a CLMDP, both the objective ordering at a state and the associated reward functions are determined by the context. We employ a Bayesian approach to infer a state-context mapping from expert trajectories. Our algorithm to solve a CLMDP first computes a policy for each objective ordering and then combines them into a single context-aware policy that is valid and cycle-free. The effectiveness of the proposed approach is evaluated in simulation and using a mobile robot.
comment: 9 pages, 5 figures, 2 tables, To appear in Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2025
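A context-dependent lexicographic ordering, as described in the abstract above, can be illustrated at the level of a single action choice. The sketch below uses strict lexicographic comparison with no slack and does not reproduce the CLMDP solution algorithm; the contexts, objectives, and values are hypothetical.

```python
def lexicographic_best(action_values, ordering):
    """Pick the action that is best under a lexicographic objective ordering.

    `action_values` maps action -> {objective: value}; `ordering` lists the
    objectives from most to least important. Python compares tuples
    lexicographically, so max() over reordered tuples does the work.
    """
    return max(action_values,
               key=lambda a: tuple(action_values[a][o] for o in ordering))

# Hypothetical contexts, each imposing its own objective ordering.
context_ordering = {
    "crowded": ["safety", "speed"],
    "open": ["speed", "safety"],
}

# Hypothetical per-objective values of two candidate actions.
action_values = {
    "slow_path": {"safety": 0.9, "speed": 0.3},
    "fast_path": {"safety": 0.6, "speed": 0.8},
}

for context, ordering in context_ordering.items():
    print(context, lexicographic_best(action_values, ordering))
```

The same action values yield different choices in the two contexts, which is the effect a context-aware policy has to reconcile across the state space.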
Systems and Control (EESS)
Model Predictive Control with Multiple Constraint Horizons
In this work we propose a Model Predictive Control (MPC) formulation that splits constraints in two different types. Motivated by safety considerations, the first type of constraint enforces a control-invariant set, while the second type could represent a less restrictive constraint on the system state. This distinction enables closed-loop sub-optimality results for nonlinear MPC with heterogeneous state constraints (distinct constraints across open loop predicted states), and no terminal elements. Removing the non-invariant constraint recovers the partially constrained case. Beyond its theoretical interest, heterogeneous constrained MPC shows how constraint choices shape the system's closed loop. In the partially constrained case, adjusting the constraint horizon (how many predicted-state constraints are enforced) trades estimation accuracy for computational cost. Our analysis yields, first, a sub-optimality upper bound accounting for distinct constraint sets, their horizons and decay rates, that is tighter for short horizons than prior work. Second, to our knowledge, we give the first lower bound (beyond open-loop cost) on closed-loop sub-optimality. Together these bounds provide a powerful analysis framework, allowing designers to evaluate the effect of horizons in MPC sub-optimality. We demonstrate our results via simulations on nonlinear and linear safety-critical systems.
comment: Submitted to Transactions on Automatic Control
Hopfield Neural Networks for Online Constrained Parameter Estimation with Time-Varying Dynamics and Disturbances
This paper proposes two projector-based Hopfield neural network (HNN) estimators for online, constrained parameter estimation under time-varying data, additive disturbances, and slowly drifting physical parameters. The first is a constraint-aware HNN that enforces linear equalities and inequalities (via slack neurons) and continuously tracks the constrained least-squares target. The second augments the state with compensation neurons and a concatenated regressor to absorb bias-like disturbance components within the same energy function. For both estimators we establish global uniform ultimate boundedness with explicit convergence rate and ultimate bound, and we derive practical tuning rules that link the three design gains to closed-loop bandwidth and steady-state accuracy. We also introduce an online identifiability monitor that adapts the constraint weight and time step, and, when needed, projects updates onto identifiable subspaces to prevent drift in poorly excited directions...
comment: Submitted to International Journal of Adaptive Control and Signal Processing
Second-Order Policy Gradient Methods for the Linear Quadratic Regulator
Policy gradient methods are a powerful family of reinforcement learning algorithms for continuous control that optimize a policy directly. However, standard first-order methods often converge slowly. Second-order methods can accelerate learning by using curvature information, but they are typically expensive to compute. The linear quadratic regulator (LQR) is a practical setting in which key quantities, such as the policy gradient, admit closed-form expressions. In this work, we develop second-order policy gradient algorithms for LQR by deriving explicit formulas for both the approximate and exact Hessians used in Gauss--Newton and Newton methods, respectively. Numerical experiments show a faster convergence rate for the proposed second-order approach over the standard first-order policy gradient baseline.
comment: 8 pages, 2 figs
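The closed-form policy gradient mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard discrete-time LQR setting with a static feedback u = -K x, where the gradient is 2[(R + BᵀPB)K - BᵀPA]Σ. The abstract does not state which setting the paper uses, so the system matrices, the initial gain, and the helper names (`dlyap`, `lqr_quantities`, `gauss_newton_step`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def dlyap(F, M):
    """Solve X = F X F^T + M by vectorization (suitable for small systems only)."""
    n = F.shape[0]
    vec = np.linalg.solve(np.eye(n * n) - np.kron(F, F), M.reshape(-1))
    return vec.reshape(n, n)

def lqr_quantities(K, A, B, Q, R, Sigma0):
    """Cost, closed-form policy gradient, value matrix P, and state covariance Sigma."""
    Acl = A - B @ K
    P = dlyap(Acl.T, Q + K.T @ R @ K)   # P = Acl^T P Acl + Q + K^T R K
    Sigma = dlyap(Acl, Sigma0)          # Sigma = Acl Sigma Acl^T + Sigma0
    cost = np.trace(P @ Sigma0)
    grad = 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
    return cost, grad, P, Sigma

def gauss_newton_step(K, A, B, Q, R, Sigma0, eta=0.5):
    """K <- K - eta (R + B^T P B)^{-1} grad Sigma^{-1}; eta = 0.5 is policy iteration."""
    _, grad, P, Sigma = lqr_quantities(K, A, B, Q, R, Sigma0)
    return K - eta * np.linalg.solve(R + B.T @ P @ B, grad) @ np.linalg.inv(Sigma)

# Hypothetical double-integrator-like system and a stabilizing initial gain.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
Sigma0 = np.eye(2)
K = np.array([[1.0, 2.0]])

costs = []
for _ in range(20):
    cost, _, _, _ = lqr_quantities(K, A, B, Q, R, Sigma0)
    costs.append(cost)
    K = gauss_newton_step(K, A, B, Q, R, Sigma0)

final_cost, final_grad, _, _ = lqr_quantities(K, A, B, Q, R, Sigma0)
print(costs[0], final_cost, np.linalg.norm(final_grad))
```

With step size 0.5 this Gauss-Newton update coincides with classical policy iteration, which is why the cost decreases monotonically from any stabilizing initial gain.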
ABIDES-MARL: A Multi-Agent Reinforcement Learning Environment for Endogenous Price Formation and Execution in a Limit Order Book
We present ABIDES-MARL, a framework that combines a new multi-agent reinforcement learning (MARL) methodology with a new realistic limit-order-book (LOB) simulation system to study equilibrium behavior in complex financial market games. The system extends ABIDES-Gym by decoupling state collection from kernel interruption, enabling synchronized learning and decision-making for multiple adaptive agents while maintaining compatibility with standard RL libraries. It preserves key market features such as price-time priority and discrete tick sizes. Methodologically, we use MARL to approximate equilibrium-like behavior in multi-period trading games with a finite number of heterogeneous agents (an informed trader, a liquidity trader, noise traders, and competing market makers), all with individual price impacts. This setting bridges optimal execution and market microstructure by embedding the liquidity trader's optimization problem within a strategic trading environment. We validate the approach by solving an extended Kyle model within the simulation system, recovering the gradual price discovery phenomenon. We then extend the analysis to a liquidity trader's problem where market liquidity arises endogenously and show that, at equilibrium, execution strategies shape market-maker behavior and price dynamics. ABIDES-MARL provides a reproducible foundation for analyzing equilibrium and strategic adaptation in realistic markets and contributes toward building economically interpretable agentic AI systems for finance.
MOBIUS: A Multi-Modal Bipedal Robot that can Walk, Crawl, Climb, and Roll
This article presents a Multi-Modal Bipedal Intelligent Urban Scout robot (MOBIUS) capable of walking, crawling, climbing, and rolling. MOBIUS features four limbs--two 6-DoF arms with two-finger grippers for manipulation and climbing, and two 4-DoF legs for locomotion--enabling smooth transitions across diverse terrains without reconfiguration. A hybrid control architecture combines reinforcement learning-based locomotion with model-based predictive and admittance control enhanced for safety by a Reference Governor toward compliant contact interactions. A high-level MIQCP planner autonomously selects locomotion modes to balance stability and energy efficiency. Hardware experiments demonstrate robust gait transitions, dynamic climbing, and full-body load support via pinch grasp. Overall, MOBIUS demonstrates the importance of tight integration between morphology, high-level planning, and control to enable mobile loco-manipulation and grasping, substantially expanding its interaction capabilities, workspace, and traversability.
comment: 23 pages, 20 figures. Collaborative work between the Robotics and Mechanisms Laboratory (RoMeLa) and Mitsubishi Electric Research Laboratories (MERL)
On polynomial explicit partial estimator design for nonlinear systems with parametric uncertainties
This paper investigates the idea of designing data-driven partial estimators for nonlinear systems with parametric uncertainties using sparse multivariate polynomial relationships. A general framework is first presented and then validated on two illustrative examples with comparison to different possible Machine/Deep-Learning based alternatives. The results suggest the superiority of the proposed sparse identification scheme, at least when the learning data set is small.
comment: Submitted to ACC2026
Deep Learning Prediction of Beam Coherence Time for Near-Field TeraHertz Networks
Large multiple antenna arrays coupled with accurate beamforming are essential in terahertz (THz) communications to ensure link reliability. However, as the number of antennas increases, beam alignment (focusing) and beam tracking in mobile networks incur prohibitive overhead. Additionally, the near-field region expands both with the size of antenna arrays and the carrier frequency, calling for adjustments in the beamforming to account for spherical wavefront instead of the conventional planar wave assumption. In this letter, we introduce a novel beam coherence time for mobile THz networks, to drastically reduce the rate of beam updates. Then, we propose a deep learning model, relying on a simple feedforward neural network with a time-dependent input, to predict the beam coherence time and adjust the beamforming on the fly with minimal overhead. Our numerical results demonstrate the effectiveness of the proposed approach by enabling higher data rates while reducing the overhead, especially at high (i.e., vehicular) mobility.
comment: IEEE Wireless Communication Letters (accepted October 2025)
Data-driven stabilization of nonlinear systems via descriptor embedding
We introduce the notion of descriptor embedding for nonlinear systems and use it for the data-driven design of stabilizing controllers. Specifically, we provide sufficient data-dependent LMI conditions which, if feasible, return a stabilizing nonlinear controller of the form $u=K(x)Z(x)$ where $K(x)$ belongs to a polytope and $Z$ is a user-defined function. The proposed method is then extended to account for the presence of uncertainties and noisy data. Furthermore, a method to estimate the resulting region of attraction is given using only data. Simulation examples are used to illustrate the results and compare them to existing methods from the literature.
comment: 16 pages, 5 figures, submitted to IEEE Transactions on Automatic Control
Evolutionary Dynamics in Continuous-time Finite-state Mean Field Games -- Part I: Equilibria
We study a dynamic game with a large population of players who choose actions from a finite set in continuous time. Each player has a state in a finite state space that evolves stochastically with their actions. A player's reward depends not only on their own state and action but also on the distribution of states and actions across the population, capturing effects such as congestion in traffic networks. While prior work in evolutionary game theory has primarily focused on static games without individual player state dynamics, we present the first comprehensive evolutionary analysis of such dynamic games. We propose an evolutionary model together with a mean field approximation of the finite-population game and establish strong approximation guarantees. We show that standard solution concepts for dynamic games lack an evolutionary interpretation, and we propose a new concept - the Mixed Stationary Nash Equilibrium (MSNE) - which admits one. We analyze the relationship between MSNE and the rest points of the mean field evolutionary model and study the evolutionary stability of MSNE.
Risk Aware Safe Control with Cooperative Sensing for Dynamic Obstacle Avoidance
This paper presents the design, development, and on-vehicle implementation and validation of a safety-critical controller for autonomous driving under sensing and communication uncertainty. Cooperative sensing, fused via a Wasserstein barycenter (WB), is used to optimize the distribution of the dynamic obstacle locations. The Conditional Value at Risk (CVaR) is introduced to form a risk-aware control-barrier-function (CBF) framework with samples drawn from the optimized distribution. The proposed WB CVaR CBF safety filter improves control inputs that minimize tail risk while certifying forward invariance of the safe set. A model predictive controller (MPC) performs path tracking, and the safety filter modulates the nominal control inputs to enforce risk-aware constraints. We detail the software architecture and integration with vehicle actuation and cooperative sensing. The approach is evaluated on a full-scale autonomous vehicle (AV) in scenarios with measurement noise, communication perturbations, and input disturbances, and is compared against a baseline MPC CBF design. Results demonstrate improved safety margins and robustness, highlighting the practicality of deploying the risk-aware safety filter on an actual AV.
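As a rough illustration of the sample-based tail-risk measure named in the abstract, the sketch below computes an empirical CVaR as the mean of the worst (1 - alpha) fraction of sampled losses. The function name `empirical_cvar`, the toy loss samples, and the simple tail-average estimator are assumptions for illustration. The abstract does not describe how the paper evaluates CVaR inside the CBF constraint.

```python
import numpy as np

def empirical_cvar(losses, alpha):
    """Tail-average estimate of CVaR_alpha: mean of the worst (1 - alpha) share of samples."""
    losses = np.sort(np.asarray(losses, dtype=float))
    # Number of tail samples; at least one so the estimate is always defined.
    k = max(1, int(np.ceil((1.0 - alpha) * losses.size)))
    return losses[-k:].mean()

# Hypothetical loss samples, e.g. negated safety margins for sampled obstacle positions.
sample_losses = np.arange(1.0, 11.0)          # 1, 2, ..., 10
cvar_80 = empirical_cvar(sample_losses, 0.8)  # averages the two largest samples
print(cvar_80)
```

A risk-aware constraint would then bound this tail average rather than the mean loss, which makes the filter sensitive to rare but severe outcomes.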
Orthogonal-by-construction augmentation of physics-based input-output models
Model augmentation is a promising approach for integrating first-principles-based models with machine learning components. Augmentation can result in better model accuracy and faster convergence compared to black-box system identification methods, while maintaining interpretability of the models in terms of how the original dynamics are complemented by learning. A widely used augmentation structure in the literature is based on the parallel connection of the physics-based and learning components, for both of which the corresponding parameters are jointly optimized. However, due to overlap in representation of the system dynamics by such an additive structure, estimation often leads to physically unrealistic parameters, compromising model interpretability. To overcome this limitation, this paper introduces a novel orthogonal-by-construction model augmentation structure for input-output models, that guarantees recovery of the physically true parameters under appropriate identifiability conditions.
comment: Submitted for publication
A High-Speed Capable Spherical Robot
This paper designs a new spherical robot structure capable of supporting high-speed motion at up to 10 m/s. Building upon a single-pendulum-driven spherical robot, the design incorporates a momentum wheel with an axis aligned with the secondary pendulum, creating a novel spherical robot structure. Practical experiments with the physical prototype have demonstrated that this new spherical robot can achieve stable high-speed motion through simple decoupled control, which was unattainable with the original structure. The spherical robot designed for high-speed motion not only increases speed but also significantly enhances obstacle-crossing performance and terrain robustness.
comment: 5 pages
Koopman-based Prediction of Connectivity for Flying Ad Hoc Networks
The application of machine learning (ML) to communication systems is expected to play a pivotal role in future artificial intelligence (AI)-based next-generation wireless networks. While most existing works focus on ML techniques for static wireless environments, they often face limitations when applied to highly dynamic environments, such as flying ad hoc networks (FANETs). This paper explores the use of data-driven Koopman approaches to address these challenges. Specifically, we investigate how these approaches can model UAV trajectory dynamics within FANETs, enabling more accurate predictions and improved network performance. By leveraging Koopman operator theory, we propose two possible approaches -- centralized and distributed -- to efficiently address the challenges posed by the constantly changing topology of FANETs. To demonstrate this, we consider a FANET performing surveillance with UAVs following pre-determined trajectories and predict signal-to-interference-plus-noise ratios (SINRs) to ensure reliable communication between UAVs. Our results show that these approaches can accurately predict connectivity and isolation events that lead to modelled communication outages. This capability could help UAVs schedule their transmissions based on these predictions.
Experimental Demonstration of Software-Orchestrated Quantum Network Applications over a Campus-Scale Testbed
To fulfill their promise, quantum networks must transform from isolated testbeds into scalable infrastructures for distributed quantum applications. In this paper, we present a prototype orchestrator for the Argonne Quantum Network (ArQNet) testbed that leverages design principles of software-defined networking (SDN) to automate typical quantum communication experiments across buildings in the Argonne campus connected over deployed, telecom fiber. Our implementation validates a scalable architecture supporting service-level abstraction of quantum networking tasks, distributed time synchronization, and entanglement verification across remote nodes. We present a prototype service of continuous, stable entanglement distribution between remote sites that ran for 12 hours, which defines a promising path towards scalable quantum networks.
comment: 11 pages, 8 figures, journal
Deep Learning-Accelerated Shapley Value for Fair Allocation in Power Systems: The Case of Carbon Emission Responsibility
Allocating costs, benefits, and emissions fairly among power system participant entities represents a persistent challenge. The Shapley value provides an axiomatically fair solution, yet computational barriers have limited its adoption beyond small-scale applications. This paper presents SurroShap, a scalable Shapley value approximation framework combining efficient coalition sampling with deep learning surrogate models that accelerate characteristic function evaluations. Exemplified through carbon emission responsibility allocation in power networks, SurroShap enables Shapley-based fair allocation for power systems with thousands of entities for the first time. We derive theoretical error bounds proving that time-averaged SurroShap allocations converge to be $\varepsilon$-close to exact Shapley values. Experiments on nine systems ranging from 26 to 1,951 entities demonstrate completion within the real-time operational window even at maximum scale, achieving 10^4-10^5 speedups over other sampling-based methods while maintaining tight error bounds. The resulting Shapley-based carbon allocations possess six desirable properties aligning individual interests with decarbonization goals. Year-long simulations on the Texas 2000-bus system validate real-world applicability, with regional analysis revealing how renewable-rich areas offset emission responsibility through exports while load centers bear responsibility for driving system-wide generation.
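For readers unfamiliar with sampling-based Shapley approximation, the following is a minimal sketch of the generic permutation-sampling estimator. It is not SurroShap: the coalition sampling scheme and the deep learning surrogate described in the abstract are not reproduced, and the quadratic toy characteristic function `toy_emissions`, the entity loads, and the function name `shapley_permutation_estimate` are hypothetical.

```python
import random

def shapley_permutation_estimate(players, value, num_permutations, seed=0):
    """Average each player's marginal contribution over random join orders."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition = set()
        prev = value(frozenset(coalition))
        for p in order:
            coalition.add(p)
            cur = value(frozenset(coalition))
            totals[p] += cur - prev   # marginal contribution of p in this order
            prev = cur
    return {p: t / num_permutations for p, t in totals.items()}

# Hypothetical characteristic function: emissions grow with the square of total load.
loads = {"a": 1.0, "b": 2.0, "c": 3.0}

def toy_emissions(coalition):
    return sum(loads[p] for p in coalition) ** 2

phi = shapley_permutation_estimate(list(loads), toy_emissions, num_permutations=5000)
print(phi)
```

For this quadratic toy game the exact Shapley value of each entity is its load times the total load (6, 12, and 18), so the estimates can be checked directly. Each call to `value` is where a learned surrogate would replace an expensive power-flow evaluation.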
Neural Networks for AC Optimal Power Flow: Improving Worst-Case Guarantees during Training SC
The AC Optimal Power Flow (AC-OPF) problem is central to power system operation but challenging to solve efficiently due to its nonconvex and nonlinear nature. Neural networks (NNs) offer fast surrogates, yet their black-box behavior raises concerns about constraint violations that can compromise safety. We propose a verification-informed NN framework that incorporates worst-case constraint violations directly into training, producing models that are both accurate and provably safer. Through post-hoc verification, we achieve substantial reductions in worst-case violations and, for the first time, verify all operational constraints of large-scale AC-OPF proxies. Practical feasibility is further enhanced via restoration and warm-start strategies for infeasible operating points. Experiments on systems ranging from 57 to 793 buses demonstrate scalability, speed, and reliability, bridging the gap between ML acceleration and safe, real-time deployment of AC-OPF solutions - and paving the way toward data-driven optimal control.
comment: Submitted to PSCC 2026 (under review)
Limits of Safe AI Deployment: Differentiating Oversight and Control
Oversight and control, which we collectively call supervision, are often discussed as ways to ensure that AI systems are accountable, reliable, and able to fulfill governance and management requirements. However, the requirements for "human oversight" risk codifying vague or inconsistent interpretations of key concepts like oversight and control. This ambiguous terminology could undermine efforts to design or evaluate systems that must operate under meaningful human supervision. This matters because the term is used by regulatory texts such as the EU AI Act. This paper undertakes a targeted critical review of literature on supervision outside of AI, along with a brief summary of past work on the topic related to AI. We next differentiate control as ex-ante or real-time and operational rather than policy or governance, and oversight as performed ex-post, or a policy and governance function. Control aims to prevent failures, while oversight focuses on detection, remediation, or incentives for future prevention. Building on this, we make three contributions. 1) We propose a framework to align regulatory expectations with what is technically and organizationally plausible, articulating the conditions under which each mechanism is possible, where they fall short, and what is required to make them meaningful in practice. 2) We outline how supervision methods should be documented and integrated into risk management, and drawing on the Microsoft Responsible AI Maturity Model, we outline a maturity model for AI supervision. 3) We explicitly highlight boundaries of these mechanisms, including where they apply, where they fail, and where it is clear that no existing methods suffice. This foregrounds the question of whether meaningful supervision is possible in a given deployment context, and can support regulators, auditors, and practitioners in identifying both present and future limitations.
comment: Revised to improve table formatting and update draft
Dual-Regularized Riccati Recursions for Interior-Point Optimal Control
We derive closed-form extensions of Riccati's recursions (both sequential and parallel) for solving dual-regularized LQR problems. We show how these methods can be used to solve general constrained, non-convex, discrete-time optimal control problems via a regularized interior point method, while guaranteeing that each primal step is a descent direction of an Augmented Barrier-Lagrangian merit function. We provide MIT-licensed implementations of our methods in C++ and JAX.
Spherical Point Process with Random Heights: New Approach for Modeling and Analysis of Downlink Satellite Networks
The Low Earth Orbit (LEO) satellite industry is undergoing rapid expansion, with operators competitively launching satellites due to the first-come, first-served principle governing orbital rights. This has led to the formation of increasingly large-scale, volumetric constellations where satellites operate across a diverse range of altitudes. To address the need for analyzing such complex networks, this paper establishes a new analytical framework for LEO constellations by leveraging a 3D Poisson point process (PPP). Specifically, we introduce a random height model (RHM) that can capture various altitude distributions by applying a random radial displacement to points generated by a homogeneous PPP on a nominal shell. Building on this, we derive an analytical expression for the downlink coverage probability. To motivate our model, we show that the altitude distributions of several leading satellite constellations, including Starlink, align with our model's assumptions. We then demonstrate through Monte Carlo simulations that the coverage probability of our RHM closely matches that of these real-world networks. Finally, we confirm the accuracy of our analytical expressions by showing their agreement with simulation results. Our work thereby provides a powerful tool for understanding and predicting how the statistical distribution of satellite altitudes impacts network performance.
comment: submitted to IEEE journal
Aggregative games with bilevel structures: Distributed algorithms and convergence analysis
In this paper, the problem of distributively seeking the equilibria of aggregative games with bilevel structures is studied. Different from the traditional aggregative games, here the aggregation is determined by the minimizer of a virtual leader's objective function in the inner level, which depends on the actions of the players in the outer level. Moreover, the global objective function of the virtual leader is formed by the sum of some local functions with two arguments, each of which is strongly convex with respect to the second argument. When making decisions, each player in the outer level only has access to a local part of the virtual leader's objective function. To handle this problem, first, we propose a second order gradient-based distributed algorithm, where the Hessian matrices associated with the objective functions of the leader are involved. By the algorithm, players update their actions while cooperatively minimizing the objective function of the virtual leader to estimate the aggregation by communicating with their neighbors via a connected graph. Under mild assumptions on the graph and cost functions, we prove that the actions of players asymptotically converge to the Nash equilibrium point. Then, for the case where the Hessian matrices associated with the objective functions of the virtual leader are not available, we propose a first order gradient-based distributed algorithm, where a distributed two-point estimate strategy is developed to estimate the gradients of players' cost functions in the outer level. Under the same conditions, we prove that the convergence errors of players' actions to the Nash equilibrium point are linear with respect to the estimate parameters. Finally, simulations are provided to demonstrate the effectiveness of our theoretical results.
Simultaneous System Identification and Model Predictive Control with No Dynamic Regret
We provide an algorithm for the simultaneous system identification and model predictive control of nonlinear systems. The algorithm has finite-time near-optimality guarantees and asymptotically converges to the optimal (non-causal) controller. Particularly, the algorithm enjoys sublinear dynamic regret, defined herein as the suboptimality against an optimal clairvoyant controller that knows how the unknown disturbances and system dynamics will adapt to its actions. The algorithm is self-supervised and applies to control-affine systems with unknown dynamics and disturbances that can be expressed in reproducing kernel Hilbert spaces. Such spaces can model external disturbances and modeling errors that can even be adaptive to the system's state and control input. For example, they can model wind and wave disturbances to aerial and marine vehicles, or inaccurate model parameters such as inertia of mechanical systems. The algorithm first generates random Fourier features that are used to approximate the unknown dynamics or disturbances. Then, it employs model predictive control based on the current learned model of the unknown dynamics (or disturbances). The model of the unknown dynamics is updated online using least squares based on the data collected while controlling the system. We validate our algorithm in both hardware experiments and physics-based simulations. The simulations include (i) a cart-pole aiming to maintain the pole upright despite inaccurate model parameters, and (ii) a quadrotor aiming to track reference trajectories despite unmodeled aerodynamic drag effects. The hardware experiments include a quadrotor aiming to track a circular trajectory despite unmodeled aerodynamic drag effects, ground effects, and wind disturbances.
comment: IEEE Transactions on Robotics (T-RO). v6 update on stability analysis in Appendix J under relaxed Assumption 1
Adaptive Multirobot Virtual Structure Control using Dual Quaternions
This paper presents a control strategy based on dual quaternions for the coordinated formation flying of small UAV groups. A virtual structure is employed to define the desired formation, enabling unified control of its position, orientation, and shape. This abstraction makes formation management easier by allowing a low-level controller to compute individual UAV commands efficiently. The proposed controller integrates a pose control module with a geometry-based adaptive strategy, ensuring precise and robust task execution. The effectiveness of the approach is demonstrated through both simulation and experimental results.
Action Chunking and Exploratory Data Collection Yield Exponential Improvements in Behavior Cloning for Continuous Control
This paper presents a theoretical analysis of two of the most impactful interventions in modern learning from demonstration in robotics and continuous control: the practice of action-chunking (predicting sequences of actions in open-loop) and exploratory augmentation of expert demonstrations. Though recent results show that learning from demonstration, also known as imitation learning (IL), can suffer errors that compound exponentially with task horizon in continuous settings, we demonstrate that action chunking and exploratory data collection circumvent exponential compounding errors in different regimes. Our results identify control-theoretic stability as the key mechanism underlying the benefits of these interventions. On the empirical side, we validate our predictions and the role of control-theoretic stability through experimentation on popular robot learning benchmarks. On the theoretical side, we demonstrate that the control-theoretic lens provides fine-grained insights into how compounding error arises, leading to tighter statistical guarantees on imitation learning error when these interventions are applied than previous techniques based on information-theoretic considerations alone.
comment: Updated manuscript. Added new experiments, figures, and exposition
Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning
We study the problem of long-term (multiple days) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), focusing on the Douro river as a representative use case. We propose an energy- and communication-efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates direction and speed for each AUV. Simulations using the Delft3D ocean model demonstrate that our method consistently outperforms both single- and multi-agent benchmarks, and that scaling the number of agents improves both mean squared error (MSE) and operational endurance. In some instances, our algorithm demonstrates that doubling the number of AUVs can more than double endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. Our learned policies generalize across unseen seasonal regimes over different months and years, demonstrating promise for future developments of data-driven long-term monitoring of dynamic plume environments.
Detection Augmented Bandit Procedures for Piecewise Stationary MABs: A Modular Approach
Conventional Multi-Armed Bandit (MAB) algorithms are designed for stationary environments, where the reward distributions associated with the arms do not change with time. In many applications, however, the environment is more accurately modeled as being non-stationary. In this work, piecewise stationary MAB (PS-MAB) environments are investigated, in which the reward distributions associated with a subset of the arms change at some change-points and remain stationary between change-points. Our focus is on the asymptotic analysis of PS-MABs, for which practical algorithms based on change detection have been previously proposed. Our goal is to modularize the design and analysis of such Detection Augmented Bandit (DAB) procedures. To this end, we first provide novel, improved performance lower bounds for PS-MABs. Then, we identify the requirements for stationary bandit algorithms and change detectors in a DAB procedure that are needed for the modularization. We assume that the rewards are sub-Gaussian. Under this assumption and a condition on the separation of the change-points, we show that the analysis of DAB procedures can indeed be modularized, so that the regret bounds can be obtained in a unified manner for various combinations of change detectors and bandit algorithms. Through this analysis, we develop new modular DAB procedures that are order-optimal. Finally, we showcase the practical effectiveness of our modular DAB approach in our experiments, studying its regret performance compared to other methods and investigating its detection capabilities.
comment: 30 pages, 4 figures, 1 table, submitted to TIT
The Difference between the Left and Right Invariant Extended Kalman Filter
The extended Kalman filter (EKF) has been the industry standard for state estimation problems over the past sixty years. The Invariant Extended Kalman Filter (IEKF) is a recent development of the EKF for the class of group-affine systems on Lie groups that has shown superior performance for inertial navigation problems. The IEKF comes in two versions, left- and right-handed, and there is a perception in the robotics community that these filters are different and that one should choose the handedness of the IEKF to match the handedness of the measurement model for a given filtering problem. In this paper, we revisit these algorithms and demonstrate that the left- and right-IEKF algorithms (with reset step) are identical, that is, the choice of the handedness does not affect the IEKF's performance when the reset step is properly implemented. The reset step was not originally proposed as part of the IEKF; however, we provide simulations to show that the reset step improves asymptotic performance of all versions of the filter and should be included in all high-performance algorithms. The GNSS-aided inertial navigation system (INS) is used as a motivating example to demonstrate the equivalence of the two filters.
comment: 20 pages, 4 figures, submitted to Control Engineering Practice
Expertise and confidence explain how social influence evolves along intellective tasks
Discovering the antecedents of individuals' influence in collaborative environments is an important, practical, and challenging problem. In this paper, we study interpersonal influence in small groups of individuals who collectively execute a sequence of intellective tasks. We observe that along an issue sequence with feedback, individuals with higher expertise and social confidence are accorded higher interpersonal influence. We also observe that low-performing individuals tend to underestimate their high-performing teammate's expertise. Based on these observations, we introduce three hypotheses and present empirical and theoretical support for their validity. We report empirical evidence on longstanding theories of transactive memory systems, social comparison, and confidence heuristics on the origins of social influence. We propose a cognitive dynamical model inspired by these theories to describe the process by which individuals adjust interpersonal influences over time. We demonstrate the model's accuracy in predicting individuals' influence and provide analytical results on its asymptotic behavior for the case with identically performing individuals. Lastly, we propose a novel approach using deep neural networks on a pre-trained text embedding model for predicting the influence of individuals. Using message contents, message times, and individual correctness collected during tasks, we are able to accurately predict individuals' self-reported influence over time. Extensive experiments verify the accuracy of the proposed models compared to baselines such as structural balance and reflected appraisal model. While the neural networks model is the most accurate, the dynamical model is the most interpretable for influence prediction.
Joint Scheduling of DER under Demand Charges: Structure and Approximation
We study the joint scheduling of behind-the-meter distributed energy resources (DERs), including flexible loads, renewable generation, and battery energy storage systems, under net energy metering tariffs with demand charges. The problem is formulated as a stochastic dynamic program aimed at maximizing expected operational surplus while accounting for renewable generation uncertainty. We analytically characterize the optimal control policy and show that it admits a threshold-based structure. However, due to the strong temporal coupling of the storage and demand charge constraints, the number of conditional branches in the policy scales combinatorially with the scheduling horizon, as it requires a look-ahead over future states. To overcome the high computational complexity in the general formulation, an efficient approximation algorithm is proposed, which searches for the peak demand under a mildly relaxed problem. We show that the algorithm scales linearly with the scheduling horizon. Extensive simulations using two open-source datasets validate the proposed algorithm and compare its performance against different DER control strategies, including a reinforcement learning-based one. Under varying storage and tariff parameters, the results show that the proposed algorithm outperforms various benchmarks in achieving a relatively small solution gap compared to a theoretical upper bound.
comment: 15 pages, 6 tables, 8 figures
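To make the tariff structure in the abstract above concrete, the following is a minimal sketch of how a bill could be computed under a net energy metering tariff with a demand charge. The function name, the price parameters, and the single-peak demand charge are illustrative assumptions and are not taken from the paper.

```python
def nem_bill(net_load, import_price, export_price, demand_rate):
    """Bill for a net-consumption profile under a simplified NEM tariff.

    net_load     -- net consumption per interval (positive = import, negative = export)
    import_price -- price paid per unit of imported energy (hypothetical)
    export_price -- credit received per unit of exported energy (hypothetical)
    demand_rate  -- charge per unit of peak net import over the horizon (hypothetical)
    """
    energy_cost = 0.0
    for z in net_load:
        if z >= 0:
            energy_cost += import_price * z
        else:
            energy_cost += export_price * z  # z < 0, so this term is a credit
    # The demand charge couples all intervals through the single largest import.
    peak_import = max(max(net_load), 0.0)
    return energy_cost + demand_rate * peak_import


bill = nem_bill([2.0, -1.0, 4.0], import_price=0.3, export_price=0.1, demand_rate=5.0)
```

The `max` over the whole horizon is what creates the temporal coupling the abstract mentions: lowering the bill in one interval can be pointless if another interval already sets the peak.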
Robotics
SLAP: Shortcut Learning for Abstract Planning
Long-horizon decision-making with sparse rewards and continuous states and actions remains a fundamental challenge in AI and robotics. Task and motion planning (TAMP) is a model-based framework that addresses this challenge by planning hierarchically with abstract actions (options). These options are manually defined, limiting the agent to behaviors that we as human engineers know how to program (pick, place, move). In this work, we propose Shortcut Learning for Abstract Planning (SLAP), a method that leverages existing TAMP options to automatically discover new ones. Our key idea is to use model-free reinforcement learning (RL) to learn shortcuts in the abstract planning graph induced by the existing options in TAMP. Without any additional assumptions or inputs, shortcut learning leads to shorter solutions than pure planning, and higher task success rates than flat and hierarchical RL. Qualitatively, SLAP discovers dynamic physical improvisations (e.g., slap, wiggle, wipe) that differ significantly from the manually-defined ones. In experiments in four simulated robotic environments, we show that SLAP solves and generalizes to a wide range of tasks, reducing overall plan lengths by over 50% and consistently outperforming planning and RL baselines.
Deployable Vision-driven UAV River Navigation via Human-in-the-loop Preference Alignment ICRA 2026
Rivers are critical corridors for environmental monitoring and disaster response, where Unmanned Aerial Vehicles (UAVs) guided by vision-driven policies can provide fast, low-cost coverage. However, deployment exposes simulation-trained policies to distribution shift and safety risks, and requires efficient adaptation from limited human interventions. We study human-in-the-loop (HITL) learning with a conservative overseer who vetoes unsafe or inefficient actions and provides statewise preferences by comparing the agent's proposal with a corrective override. We introduce Statewise Hybrid Preference Alignment for Robotics (SPAR-H), which fuses direct preference optimization on policy logits with a reward-based pathway that trains an immediate-reward estimator from the same preferences and updates the policy using a trust-region surrogate. With five HITL rollouts collected from a fixed novice policy, SPAR-H achieves the highest final episodic reward and the lowest variance across initial conditions among tested methods. The learned reward model aligns with human-preferred actions and elevates nearby non-intervened choices, supporting stable propagation of improvements. We benchmark SPAR-H against imitation learning (IL), direct preference variants, and evaluative reinforcement learning (RL) in the HITL setting, and demonstrate real-world feasibility of continual preference alignment for UAV river following. Overall, dual statewise preferences empirically provide a practical route to data-efficient online adaptation in riverine navigation.
comment: Submitted to ICRA 2026
AquaROM: shape optimization pipeline for soft swimmers using parametric reduced order models
The efficient optimization of actuated soft structures, particularly under complex nonlinear forces, remains a critical challenge in advancing robotics. Simulations of nonlinear structures, such as soft-bodied robots modeled using the finite element method (FEM), often demand substantial computational resources, especially during optimization. To address this challenge, we propose a novel optimization algorithm based on a tensorial parametric reduced order model (PROM). Our algorithm leverages dimensionality reduction and solution approximation techniques to facilitate efficient solving of nonlinear constrained optimization problems. The well-structured tensorial approach enables the use of analytical gradients within a specifically chosen reduced order basis (ROB), significantly enhancing computational efficiency. To showcase the performance of our method, we apply it to optimizing soft robotic swimmer shapes. These actuated soft robots experience hydrodynamic forces, subjecting them to both internal and external nonlinear forces, which are incorporated into our optimization process using a data-free ROB for fast and accurate computations. This approach not only reduces computational complexity but also unlocks new opportunities to optimize complex nonlinear systems in soft robotics, paving the way for more efficient design and control.
GauDP: Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies NeurIPS 2025
Effective coordination in embodied multi-agent systems remains a fundamental challenge, particularly in scenarios where agents must balance individual perspectives with global environmental awareness. Existing approaches often struggle to balance fine-grained local control with comprehensive scene understanding, resulting in limited scalability and compromised collaboration quality. In this paper, we present GauDP, a novel Gaussian-image synergistic representation that facilitates scalable, perception-aware imitation learning in multi-agent collaborative systems. Specifically, GauDP constructs a globally consistent 3D Gaussian field from decentralized RGB observations, then dynamically redistributes 3D Gaussian attributes to each agent's local perspective. This enables all agents to adaptively query task-critical features from the shared scene representation while maintaining their individual viewpoints. This design facilitates both fine-grained control and globally coherent behavior without requiring additional sensing modalities (e.g., 3D point cloud). We evaluate GauDP on the RoboFactory benchmark, which includes diverse multi-arm manipulation tasks. Our method achieves superior performance over existing image-based methods and approaches the effectiveness of point-cloud-driven methods, while maintaining strong scalability as the number of agents increases.
comment: Accepted by NeurIPS 2025. Project page: https://ziyeeee.github.io/gaudp.io/
Breaking the Latency Barrier: Synergistic Perception and Control for High-Frequency 3D Ultrasound Servoing
Real-time tracking of dynamic targets amidst large-scale, high-frequency disturbances remains a critical unsolved challenge in Robotic Ultrasound Systems (RUSS), primarily due to the end-to-end latency of existing systems. This paper argues that breaking this latency barrier requires a fundamental shift towards the synergistic co-design of perception and control. We realize it in a novel framework with two tightly-coupled contributions: (1) a Decoupled Dual-Stream Perception Network that robustly estimates 3D translational state from 2D images at high frequency, and (2) a Single-Step Flow Policy that generates entire action sequences in one inference pass, bypassing the iterative bottleneck of conventional policies. This synergy enables a closed-loop control frequency exceeding 60Hz. On a dynamic phantom, our system not only tracks complex 3D trajectories with a mean error below 6.5mm but also demonstrates robust re-acquisition from over 170mm displacement. Furthermore, it can track targets at speeds of 102mm/s, achieving a terminal error below 1.7mm. Moreover, in-vivo experiments on a human volunteer validate the framework's effectiveness and robustness in a realistic clinical setting. Our work presents a RUSS holistically architected to unify high-bandwidth tracking with large-scale repositioning, a critical step towards robust autonomy in dynamic clinical environments.
URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model NeurIPS 2025
Constructing accurate digital twins of articulated objects is essential for robotic simulation training and embodied AI world model building, yet historically requires painstaking manual modeling or multi-stage pipelines. In this work, we propose URDF-Anything, an end-to-end automatic reconstruction framework based on a 3D multimodal large language model (MLLM). URDF-Anything utilizes an autoregressive prediction framework based on point-cloud and text multimodal input to jointly optimize geometric segmentation and kinematic parameter prediction. It implements a specialized [SEG] token mechanism that interacts directly with point cloud features, enabling fine-grained part-level segmentation while maintaining consistency with the kinematic parameter predictions. Experiments on both simulated and real-world datasets demonstrate that our method significantly outperforms existing approaches regarding geometric segmentation (mIoU 17% improvement), kinematic parameter prediction (average error reduction of 29%), and physical executability (surpassing baselines by 50%). Notably, our method exhibits excellent generalization ability, performing well even on objects outside the training set. This work provides an efficient solution for constructing digital twins for robotic simulation, significantly enhancing the sim-to-real transfer capability.
comment: Accepted to the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
pacSTL: PAC-Bounded Signal Temporal Logic from Data-Driven Reachability Analysis
Real-world robotic systems must comply with safety requirements in the presence of uncertainty. To define and measure requirement adherence, Signal Temporal Logic (STL) offers a mathematically rigorous and expressive language. However, standard STL cannot account for uncertainty. We address this problem by presenting pacSTL, a framework that combines Probably Approximately Correct (PAC) bounded set predictions with an interval extension of STL through optimization problems on the atomic proposition level. pacSTL provides PAC-bounded robustness intervals on the specification level that can be utilized in monitoring. We demonstrate the effectiveness of this approach through maritime navigation and analyze the efficiency and scalability of pacSTL through simulation and real-world experimentation on model vessels.
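As a rough illustration of what an interval extension of STL robustness looks like, the sketch below propagates lower and upper signal bounds through a predicate and the "always" and "eventually" operators using plain interval arithmetic. This is a generic textbook-style construction, not pacSTL's optimization-based formulation; all function names are hypothetical.

```python
def predicate_interval(x_lo, x_hi, c):
    """Robustness interval of the predicate x > c when x is only known to lie in [x_lo, x_hi]."""
    return (x_lo - c, x_hi - c)


def always(intervals):
    """'Always' takes the minimum over time; min is monotone, so bounds map to bounds."""
    return (min(lo for lo, _ in intervals), min(hi for _, hi in intervals))


def eventually(intervals):
    """'Eventually' takes the maximum over time, again applied bound-wise."""
    return (max(lo for lo, _ in intervals), max(hi for _, hi in intervals))


# Predicted set bounds on a scalar signal at three time steps (made-up numbers).
bounds = [(1.0, 2.0), (0.5, 1.5), (2.0, 3.0)]
rho = [predicate_interval(lo, hi, c=1.0) for lo, hi in bounds]
rho_always = always(rho)
rho_eventually = eventually(rho)
```

A robustness interval whose lower bound is positive indicates satisfaction for every signal inside the predicted sets; an interval that straddles zero, as `rho_always` does here, is inconclusive.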
Fast-SmartWay: Panoramic-Free End-to-End Zero-Shot Vision-and-Language Navigation
Recent advances in Vision-and-Language Navigation in Continuous Environments (VLN-CE) have leveraged multimodal large language models (MLLMs) to achieve zero-shot navigation. However, existing methods often rely on panoramic observations and two-stage pipelines involving waypoint predictors, which introduce significant latency and limit real-world applicability. In this work, we propose Fast-SmartWay, an end-to-end zero-shot VLN-CE framework that eliminates the need for panoramic views and waypoint predictors. Our approach uses only three frontal RGB-D images combined with natural language instructions, enabling MLLMs to directly predict actions. To enhance decision robustness, we introduce an Uncertainty-Aware Reasoning module that integrates (i) a Disambiguation Module for avoiding local optima, and (ii) a Future-Past Bidirectional Reasoning mechanism for globally coherent planning. Experiments on both simulated and real-robot environments demonstrate that our method significantly reduces per-step latency while achieving competitive or superior performance compared to panoramic-view baselines. These results demonstrate the practicality and effectiveness of Fast-SmartWay for real-world zero-shot embodied navigation.
Maestro: Orchestrating Robotics Modules with Vision-Language Models for Zero-Shot Generalist Robots
Today's best-explored routes towards generalist robots center on collecting ever larger "observations-in actions-out" robotics datasets to train large end-to-end models, copying a recipe that has worked for vision-language models (VLMs). We pursue a road less traveled: building generalist policies directly around VLMs by augmenting their general capabilities with specific robot capabilities encapsulated in a carefully curated set of perception, planning, and control modules. In Maestro, a VLM coding agent dynamically composes these modules into a programmatic policy for the current task and scenario. Maestro's architecture benefits from a streamlined closed-loop interface without many manually imposed structural constraints, and a comprehensive and diverse tool repertoire. As a result, it largely surpasses today's VLA models for zero-shot performance on challenging manipulation skills. Further, Maestro is easily extensible to incorporate new modules, easily editable to suit new embodiments such as a quadruped-mounted arm, and even easily adapts from minimal real-world experiences through local code edits.
comment: Project website: https://maestro-robot.github.io
Heuristic Step Planning for Learning Dynamic Bipedal Locomotion: A Comparative Study of Model-Based and Model-Free Approaches
This work presents an extended framework for learning-based bipedal locomotion that incorporates a heuristic step-planning strategy guided by desired torso velocity tracking. The framework enables precise interaction between a humanoid robot and its environment, supporting tasks such as crossing gaps and accurately approaching target objects. Unlike approaches based on full or simplified dynamics, the proposed method avoids complex step planners and analytical models. Step planning is primarily driven by heuristic commands, while a Raibert-type controller modulates the foot placement length based on the error between desired and actual torso velocity. We compare our method with a model-based step-planning approach -- the Linear Inverted Pendulum Model (LIPM) controller. Experimental results demonstrate that our approach attains comparable or superior accuracy in maintaining target velocity (up to 80%), significantly greater robustness on uneven terrain (over 50% improvement), and improved energy efficiency. These results suggest that incorporating complex analytical, model-based components into the training architecture may be unnecessary for achieving stable and robust bipedal walking, even in unstructured environments.
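The Raibert-type foot placement rule mentioned above can be sketched in a few lines. The version below uses the classic form of the heuristic (a neutral-point term plus a velocity-error correction); the gain, the stance time, and the function name are illustrative assumptions, and the paper's exact formulation may differ.

```python
def raibert_step_length(v_actual, v_desired, stance_time, gain):
    """Classic Raibert-style foot placement along the walking direction.

    The first term places the foot at the neutral point for the current speed;
    the second term lengthens the step when the torso moves faster than desired
    (which slows the robot) and shortens it when the torso is too slow.
    """
    neutral_point = 0.5 * v_actual * stance_time
    correction = gain * (v_actual - v_desired)
    return neutral_point + correction


# Torso slightly faster than commanded: the step is lengthened.
step = raibert_step_length(v_actual=1.0, v_desired=0.8, stance_time=0.4, gain=0.1)
```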
Real-Time Learning of Predictive Dynamic Obstacle Models for Robotic Motion Planning ICRA
Autonomous systems often must predict the motions of nearby agents from partial and noisy data. This paper asks and answers the question: "can we learn, in real-time, a nonlinear predictive model of another agent's motions?" Our online framework denoises and forecasts such dynamics using a modified sliding-window Hankel Dynamic Mode Decomposition (Hankel-DMD). Partial noisy measurements are embedded into a Hankel matrix, while an associated Page matrix enables singular-value hard thresholding (SVHT) to estimate the effective rank. A Cadzow projection enforces structured low-rank consistency, yielding a denoised trajectory and local noise variance estimates. From this representation, a time-varying Hankel-DMD lifted linear predictor is constructed for multi-step forecasts. The residual analysis provides variance-tracking signals that can support downstream estimators and risk-aware planning. We validate the approach in simulation under Gaussian and heavy-tailed noise, and experimentally on a dynamic crane testbed. Results show that the method achieves stable variance-aware denoising and short-horizon prediction suitable for integration into real-time control frameworks.
comment: 10 pages, 6 figures, submitted to IEEE International Conference on Robotics and Automation (ICRA) 2025
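For readers unfamiliar with the two data matrices named in the abstract above, here is a minimal sketch of how a scalar measurement sequence is arranged into a Hankel matrix (overlapping windows) and a Page matrix (non-overlapping windows). The helper names are hypothetical, and the rank estimation, Cadzow projection, and DMD steps are not reproduced.

```python
def hankel_matrix(series, rows):
    """Hankel embedding: entry (i, j) is series[i + j], so columns are overlapping windows."""
    cols = len(series) - rows + 1
    return [[series[i + j] for j in range(cols)] for i in range(rows)]


def page_matrix(series, rows):
    """Page matrix: columns are consecutive non-overlapping windows of length `rows`."""
    cols = len(series) // rows
    return [[series[j * rows + i] for j in range(cols)] for i in range(rows)]


samples = [1, 2, 3, 4, 5, 6]
H = hankel_matrix(samples, rows=3)
P = page_matrix(samples, rows=3)
```

Because each sample appears only once in the Page matrix, its noise entries do not repeat across the matrix, which is the usual reason for preferring it over the Hankel matrix when applying singular-value thresholding.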
When Semantics Connect the Swarm: LLM-Driven Fuzzy Control for Cooperative Multi-Robot Underwater Coverage
Underwater multi-robot cooperative coverage remains challenging due to partial observability, limited communication, environmental uncertainty, and the lack of access to global localization. To address these issues, this paper presents a semantics-guided fuzzy control framework that couples Large Language Models (LLMs) with interpretable control and lightweight coordination. Raw multimodal observations are compressed by the LLM into compact, human-interpretable semantic tokens that summarize obstacles, unexplored regions, and Objects Of Interest (OOIs) under uncertain perception. A fuzzy inference system with pre-defined membership functions then maps these tokens into smooth and stable steering and gait commands, enabling reliable navigation without relying on global positioning. Then, we further coordinate multiple robots by introducing semantic communication that shares intent and local context in linguistic form, enabling agreement on who explores where while avoiding redundant revisits. Extensive simulations in unknown reef-like environments show that, under limited sensing and communication, the proposed framework achieves robust OOI-oriented navigation and cooperative coverage with improved efficiency and adaptability, narrowing the gap between semantic cognition and distributed underwater control in GPS-denied, map-free conditions.
comment: This paper has been submitted to IEEE Transactions on Mobile Computing
Model-free source seeking of exponentially convergent unicycle: theoretical and robotic experimental results
This paper introduces a novel model-free, real-time unicycle-based source seeking design. This design autonomously steers the unicycle dynamic system towards the extremum point of an objective function or physical/scalar signal that is unknown expression-wise, but accessible via measurements. A key contribution of this paper is that the introduced design converges exponentially to the extremum point of objective functions (or scalar signals) that behave locally like higher-degree power functions (e.g., a fourth-degree polynomial function), as opposed to locally quadratic objective functions, the usual case in the literature. We provide theoretical analysis and simulation results to support these claims. Also, for the first time in the literature, we provide experimental robotic results that demonstrate the effectiveness of the proposed design and its exponential convergence ability.
Dropping the D: RGB-D SLAM Without the Depth Sensor
We present DropD-SLAM, a real-time monocular SLAM system that achieves RGB-D-level accuracy without relying on depth sensors. The system replaces active depth input with three pretrained vision modules: a monocular metric depth estimator, a learned keypoint detector, and an instance segmentation network. Dynamic objects are suppressed using dilated instance masks, while static keypoints are assigned predicted depth values and backprojected into 3D to form metrically scaled features. These are processed by an unmodified RGB-D SLAM back end for tracking and mapping. On the TUM RGB-D benchmark, DropD-SLAM attains 7.4 cm mean ATE on static sequences and 1.8 cm on dynamic sequences, matching or surpassing state-of-the-art RGB-D methods while operating at 22 FPS on a single GPU. These results suggest that modern pretrained vision models can replace active depth sensors as reliable, real-time sources of metric scale, marking a step toward simpler and more cost-effective SLAM systems.
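The backprojection step described above, where a keypoint with a predicted depth is lifted into 3D, follows the standard pinhole camera model. The sketch below assumes an undistorted image and known intrinsics; the function name and the intrinsic values are illustrative and are not taken from the paper.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame (pinhole model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# A keypoint 100 px right of the principal point, at a predicted depth of 2 m.
point = backproject(u=420.0, v=240.0, depth=2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Any scale error in the predicted depth carries over directly to all three coordinates, which is why the approach depends on a metric (rather than relative) depth estimator.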
Robust Trajectory Generation and Control for Quadrotor Motion Planning with Field-of-View Control Barrier Certification
Many approaches to multi-robot coordination are susceptible to failure due to communication loss and uncertainty in estimation. We present a real-time communication-free distributed navigation algorithm certified by control barrier functions, that models and controls the onboard sensing behavior to keep neighbors in the limited field of view for position estimation. The approach is robust to temporary tracking loss and directly synthesizes control to stabilize visual contact through control Lyapunov-barrier functions. The main contributions of this paper are a continuous-time robust trajectory generation and control method certified by control barrier functions for distributed multi-robot systems and a discrete optimization procedure, namely, MPC-CBF, to approximate the certified controller. In addition, we propose a linear surrogate of high-order control barrier function constraints and use sequential quadratic programming to solve MPC-CBF efficiently.
comment: 8 pages, 8 figures, 3 tables, accepted to RA-L 2025
MOSPA: Human Motion Generation Driven by Spatial Audio NeurIPS 2025
Enabling virtual humans to dynamically and realistically respond to diverse auditory stimuli remains a key challenge in character animation, demanding the integration of perceptual modeling and motion synthesis. Despite its significance, this task remains largely unexplored. Most previous works have primarily focused on mapping modalities like speech, audio, and music to generate human motion. As of yet, these models typically overlook the impact of spatial features encoded in spatial audio signals on human motion. To bridge this gap and enable high-quality modeling of human movements in response to spatial audio, we introduce the first comprehensive Spatial Audio-Driven Human Motion (SAM) dataset, which contains diverse and high-quality spatial audio and motion data. For benchmarking, we develop a simple yet effective diffusion-based generative framework for human MOtion generation driven by SPatial Audio, termed MOSPA, which faithfully captures the relationship between body motion and spatial audio through an effective fusion mechanism. Once trained, MOSPA can generate diverse, realistic human motions conditioned on varying spatial audio inputs. We perform a thorough investigation of the proposed dataset and conduct extensive experiments for benchmarking, where our method achieves state-of-the-art performance on this task. Our code and model are publicly available at https://github.com/xsy27/Mospa-Acoustic-driven-Motion-Generation
comment: NeurIPS 2025 (Spotlight)
Co-MTP: A Cooperative Trajectory Prediction Framework with Multi-Temporal Fusion for Autonomous Driving ICRA 2025
Vehicle-to-everything technologies (V2X) have become an ideal paradigm to extend the perception range and see through occlusions. Existing efforts focus on single-frame cooperative perception; however, how to capture temporal cues between frames with V2X to facilitate the prediction task, and even the planning task, is still underexplored. In this paper, we introduce Co-MTP, a general cooperative trajectory prediction framework with multi-temporal fusion for autonomous driving, which leverages the V2X system to fully capture the interaction among agents in both history and future domains to benefit planning. In the history domain, V2X can complement the incomplete history trajectory in single-vehicle perception, and we design a heterogeneous graph transformer to learn the fusion of the history feature from multiple agents and capture the history interaction. Moreover, the goal of prediction is to support future planning. Thus, in the future domain, V2X can provide the prediction results of surrounding objects, and we further extend the graph transformer to capture the future interaction among the ego planning and the other vehicles' intentions and obtain the final future scenario state under a certain planning action. We evaluate the Co-MTP framework on the real-world dataset V2X-Seq, and the results show that Co-MTP achieves state-of-the-art performance and that both history and future fusion can greatly benefit prediction.
comment: 8 pages, 3 figures, ICRA 2025
Event-RGB Fusion for Spacecraft Pose Estimation Under Harsh Lighting
Spacecraft pose estimation is crucial for autonomous in-space operations, such as rendezvous, docking and on-orbit servicing. Vision-based pose estimation methods, which typically employ RGB imaging sensors, are a compelling solution for spacecraft pose estimation, but are challenged by harsh lighting conditions, which produce imaging artifacts such as glare, over-exposure, blooming and lens flare. Due to their much higher dynamic range, neuromorphic or event sensors are more resilient to extreme lighting conditions. However, event sensors generally have lower spatial resolution and suffer from reduced signal-to-noise ratio during periods of low relative motion. This work addresses these individual sensor limitations by introducing a sensor fusion approach combining RGB and event sensors. A beam-splitter prism was employed to achieve precise optical and temporal alignment. Then, a RANSAC-based technique was developed to fuse the information from the RGB and event channels to achieve pose estimation that leveraged the strengths of the two modalities. The pipeline was complemented by dropout uncertainty estimation to detect extreme conditions that affect either channel. To benchmark the performance of the proposed event-RGB fusion method, we collected a comprehensive real dataset of RGB and event data for satellite pose estimation in a laboratory setting under a variety of challenging illumination conditions. Encouraging results on the dataset demonstrate the efficacy of our event-RGB fusion approach and further support the usage of event sensors for spacecraft pose estimation. To support community research on this topic, our dataset has been released publicly.
comment: Associated dataset: https://zenodo.org/records/15861758
Systems and Control (CS)
Universal Barrier Functions for Safety and Stability of Constrained Nonlinear Systems
In this paper, we address the problem of synthesizing safe and stabilizing controllers for nonlinear systems subject to complex safety specifications and input constraints. We introduce the Universal Barrier Function (UBF), a single continuously differentiable scalar-valued function that encodes both stability and safety criteria while accounting for input constraints. Using the UBF, we formulate a Quadratic Program (UBF-QP) to generate control inputs that are both safe and stabilizing under input constraints. We demonstrate that the UBF-QP is feasible if a UBF exists. Furthermore, under mild conditions, we prove that a UBF always exists. The proposed framework is then extended to systems with higher relative degrees. Finally, numerical simulations illustrate the effectiveness of our proposed approach.
comment: 16 pages, 14 figures
Robust Self-Triggered Control Approaches Optimizing Sampling Sequences with Synchronous Measurements
Feedback control algorithms traditionally rely on periodic execution on digital platforms. While this simplifies design and analysis, it often leads to inefficient resource usage (e.g., CPU, network bandwidth) in embedded control and shared networks. This work investigates self-triggering implementations of linear controllers in sampled-data systems with synchronous measurements. Our approach precomputes the next sampling sequence over a finite horizon based on current state information. We introduce a novel optimal self-triggering scheme that guarantees exponential stability for unperturbed systems and global uniform ultimate boundedness for perturbed systems. This ensures robustness against external disturbances with explicit performance guarantees. Simulations demonstrate the benefits of our approach.
comment: Note: This research was conducted in 2017--2018. The literature review has not been updated and may not reflect subsequent or concurrent developments in the field
GOSPA-Driven Non-Myopic Multi-Sensor Management with Multi-Bernoulli Filtering
In this paper, we propose a non-myopic sensor management algorithm for multi-target tracking, with multiple sensors operating in the same surveillance area. The algorithm is based on multi-Bernoulli filtering and selects the actions that solve a non-myopic minimisation problem, where the cost function is the mean square generalised optimal sub-pattern assignment (GOSPA) error, over a future time window. For tractability, the sensor management algorithm actually uses an upper bound of the GOSPA error and is implemented via Monte Carlo Tree Search (MCTS). The sensors have the ability to jointly optimise and select their actions with the considerations of all other sensors in the surveillance area. The benefits of the proposed algorithm are analysed via simulations.
comment: submitted to IEEE Transactions on Aerospace and Electronic Systems November 2025
Online Energy Storage Arbitrage under Imperfect Predictions: A Conformal Risk-Aware Approach
This work proposes a conformal approach for energy storage arbitrage to control the downside risks arising from imperfect price forecasts. Energy storage arbitrage relies solely on predictions of future market prices, while inaccurate price predictions may lead to significant profit losses. Based on conformal decision theory, we describe a controller that dynamically adjusts decision conservativeness through prediction sets without distributional assumptions. To enable online calibration when online profit loss feedback is unobservable, we establish that a temporal difference error serves as a measurable proxy. Building on this insight, we develop two online calibration strategies: prediction error-based adaptation targeting forecast accuracy, and value error-based calibration focusing on decision quality. Analysis of the conformal controller proves bounded long-term risk with convergence guarantees in temporal difference error, which further effectively manages risk exposure in potential profit losses. Case studies demonstrate superior performance in balancing risk and opportunity compared to benchmarks under varying forecast conditions.
On Structural Properties of Risk-Averse Optimal Stopping Problems
We establish structural properties of optimal stopping problems under time-consistent dynamic (coherent) risk measures, focusing on value function monotonicity and the existence of control limit (threshold) optimal policies. While such results are well developed for risk-neutral (expected-value) models, they remain underexplored in risk-averse settings. Coherent risk measures typically lack the tower property and are subadditive rather than additive, complicating structural analysis. We show that value function monotonicity mirrors the risk-neutral case. Moreover, if the risk envelope associated with each coherent risk measure admits a minimal element, the risk-averse optimal stopping problem reduces to an equivalent risk-neutral formulation. We also develop a general procedure for identifying control limit optimal policies and use it to derive practical, verifiable conditions on the risk measures and MDP structure that guarantee their existence. We illustrate the theory and verify these conditions through optimal stopping problems arising in operations, marketing, and finance.
Autonomous Vehicle front steering control computation saving
For lane keeping in autonomous vehicles, it is crucial to control the vehicle yaw rate. As is well known, yaw rate control can be achieved by acting on the steering angle. One option is to use a robust controller; depending on the requirements, the synthesis can lead to a high-order controller. Nowadays, such vehicles need network-based control over an Intelligent Vehicle Network (IVN), with a considerable number of control loops for different vehicle components. In this environment, reducing controller computation could therefore be a good way to unload the network and the digital processors. That is the main target of this contribution; to accomplish this goal, an interlacing implementation technique is considered. Results on real path tracking illustrate the viability of this procedure.
comment: 17 pages, 6 figures
Secure Distributed Consensus Estimation under False Data Injection Attacks: A Defense Strategy Based on Partial Channel Coding
This article investigates the security issue caused by false data injection attacks in distributed estimation, wherein each sensor can construct two types of residues based on local estimates and neighbor information, respectively. The resource-constrained attacker can select partial channels from the sensor network and arbitrarily manipulate the transmitted data. We derive necessary and sufficient conditions to reveal system vulnerabilities, under which the attacker is able to diverge the estimation error while preserving the stealthiness of all residues. We propose two defense strategies with mechanisms of exploiting the Euclidean distance between local estimates to detect attacks, and adopting the coding scheme to protect the transmitted data, respectively. It is proven that the former has the capability to address the majority of security loopholes, while the latter can serve as an additional enhancement to the former. By employing the time-varying coding matrix to mitigate the risk of being cracked, we demonstrate that the latter can safeguard against adversaries injecting stealthy sequences into the encoded channels. Hence, drawing upon the security analysis, we further provide a procedure to select security-critical channels that need to be encoded, thereby achieving a trade-off between security and coding costs. Finally, some numerical simulations are conducted to demonstrate the theoretical results.
Parallel KKT Solver in PIQP for Multistage Optimization
This paper presents an efficient parallel Cholesky factorization and triangular solve algorithm for the Karush-Kuhn-Tucker (KKT) systems arising in multistage optimization problems, with a focus on model predictive control and trajectory optimization for racing. The proposed approach directly parallelizes solving the KKT systems with block-tridiagonal-arrow KKT matrices on the linear algebra level arising in interior-point methods. The algorithm is implemented as a new backend of the PIQP solver and released as open source. Numerical experiments on the chain-of-masses benchmarks and a minimum curvature race line optimization problem demonstrate substantial performance gains compared to other state-of-the-art solvers.
Traffic-Aware Grid Planning for Dynamic Wireless Electric Vehicle Charging
Dynamic Wireless Electric Vehicle Charging (DWC) on electrified roadways is an emerging technology that can significantly reduce battery sizes, eliminate charging downtime, and alleviate range anxiety, especially for long-haul transportation and fleet operations of electric vehicles (EVs). However, these systems introduce new challenges for power system planning due to their short-duration, high-power demands, which can strain the grid if not properly managed. As the energy demands from DWC depend on vehicle speed, density, dwell time in charging zones, and load profiles along road segments, there is a need for integrated planning of such systems, jointly considering both traffic behavior and EV energy consumption. In this paper, we propose a traffic-aware grid planning framework for DWC. We leverage a macroscopic Cell Transmission Model of traffic flow to estimate real-time, spatiotemporal EV charging demand from DWC corridors. The demand model is then integrated into an AC Optimal Power Flow based formulation to optimally size a microgrid that supports DWC under varying traffic conditions while minimizing the cost of operation. Our framework explicitly models how spatiotemporal traffic patterns affect the utilization of grid resources to obtain system designs that achieve lower costs and are easier to operationalize than planning models that rely on worst-case traffic data. We demonstrate the framework on data from a 14-mile segment of the I-210W highway in California, USA, evaluating multiple traffic scenarios such as free flow, severe congestion, accidents of varying severity, and natural disasters such as forest fires. Our results demonstrate that traffic-aware grid planning significantly reduces infrastructure costs compared to worst-scenario based modeling, while ensuring reliability of service in terms of meeting charging demands under diverse traffic conditions.
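For readers unfamiliar with the Cell Transmission Model named in the abstract above, the following is a minimal Python sketch of the generic textbook update, in which the flow between two neighboring cells is the minimum of what the upstream cell can send and what the downstream cell can receive. It is not the paper's implementation: the function name `ctm_step`, the triangular flow-density relation, and every parameter value are illustrative assumptions.

```python
def ctm_step(density, inflow, dt, dx, v_free, w_back, rho_jam, q_max):
    """One Cell Transmission Model step (hypothetical helper, generic form).

    density : vehicles per mile in each cell, ordered upstream to downstream
    inflow  : demand entering the first cell (vehicles per hour)
    dt, dx  : time step (hours) and cell length (miles); needs v_free * dt <= dx
    """
    n = len(density)
    # Sending capacity (demand) and receiving capacity (supply) of each cell.
    demand = [min(v_free * rho, q_max) for rho in density]
    supply = [min(q_max, w_back * (rho_jam - rho)) for rho in density]

    # flow[i] enters cell i; flow[n] leaves the last cell without restriction.
    flow = [min(inflow, supply[0])]
    for i in range(1, n):
        flow.append(min(demand[i - 1], supply[i]))
    flow.append(demand[n - 1])

    # Conservation of vehicles in each cell.
    return [density[i] + dt / dx * (flow[i] - flow[i + 1]) for i in range(n)]


# Assumed example values: 0.5-mile cells, 18-second steps.
params = dict(dt=0.005, dx=0.5, v_free=60.0, w_back=15.0,
              rho_jam=200.0, q_max=2000.0)
steady = ctm_step([20.0, 20.0, 20.0], inflow=1200.0, **params)
jammed = ctm_step([20.0, 200.0], inflow=1200.0, **params)
```

In the second call the downstream cell starts at jam density, so it accepts no flow and vehicles accumulate upstream. A charging-demand estimate would then scale the per-cell vehicle count by an assumed EV share and per-vehicle charging power, neither of which is specified in the abstract.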
Low-Cost Carriers in Aviation: Significance and Developments
This paper aims to discuss the impacts of low-cost airlines on the air transport market and, in particular, to present the most recent findings from the specialized literature in this field. To this end, several papers published on the topic since 2015 were selected and analyzed. Based on this analysis, the main subjects addressed in the studies were categorized into five groups: (i) impacts of low-cost airlines on competing carriers; (ii) impacts on airports; (iii) general effects on air transport demand; (iv) effects on passengers' choice processes; and (v) broader effects on geographical regions.
Minimizing Maximum Latency of Task Offloading for Multi-UAV-assisted Maritime Search and Rescue
Unmanned Aerial Vehicles (UAVs) play a crucial role in Maritime Search and Rescue (MSAR), contributing to the improvement of rescue efficiency and reduction of casualties. Typically, UAVs equipped with cameras collect data from disaster areas and transmit it to the shore-based rescue command centers. By deploying Mobile Edge Computing (MEC) servers, UAVs can pre-process video footage to reduce data transmission volume, thus reducing transmission delays. However, the limited computational capacity and energy of UAVs pose significant challenges to the efficiency of UAV-assisted MSAR systems. To address these problems, in this paper, we investigate a multi-UAV assisted MSAR system consisting of multiple Surveillance UAVs (S-UAVs) and a Relay UAV (R-UAV). Then, we formulate a joint optimization problem to minimize the maximum total latency among all S-UAVs by jointly optimizing the computation offloading decisions, the R-UAV deployment, and the association between an S-UAV and rescue targets while ensuring that all targets are monitored by S-UAVs. Since the formulated optimization problem is hard to solve due to its non-convexity, we propose an effective iterative algorithm by breaking it into three sub-problems. Numerical simulation results show the effectiveness of the proposed algorithm with various performance parameters.
Real-Time Learning of Predictive Dynamic Obstacle Models for Robotic Motion Planning ICRA
Autonomous systems often must predict the motions of nearby agents from partial and noisy data. This paper asks and answers the question: "can we learn, in real-time, a nonlinear predictive model of another agent's motions?" Our online framework denoises and forecasts such dynamics using a modified sliding-window Hankel Dynamic Mode Decomposition (Hankel-DMD). Partial noisy measurements are embedded into a Hankel matrix, while an associated Page matrix enables singular-value hard thresholding (SVHT) to estimate the effective rank. A Cadzow projection enforces structured low-rank consistency, yielding a denoised trajectory and local noise variance estimates. From this representation, a time-varying Hankel-DMD lifted linear predictor is constructed for multi-step forecasts. The residual analysis provides variance-tracking signals that can support downstream estimators and risk-aware planning. We validate the approach in simulation under Gaussian and heavy-tailed noise, and experimentally on a dynamic crane testbed. Results show that the method achieves stable variance-aware denoising and short-horizon prediction suitable for integration into real-time control frameworks.
comment: 10 pages, 6 figures, submitted to IEEE International Conference on Robotics and Automation (ICRA) 2025
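As a rough illustration of the embeddings named in the abstract above, the sketch below builds a Hankel matrix (overlapping windows) and a Page matrix (non-overlapping windows) from a scalar series, together with the anti-diagonal averaging that maps a matrix back to a series. That averaging is the structure-restoring half of a Cadzow iteration; the low-rank truncation, the SVHT rank estimate, and the DMD predictor are all omitted. The function names are hypothetical and this is not the authors' code.

```python
def hankel_matrix(series, rows):
    """Overlapping windows: entry (i, j) is series[i + j]."""
    cols = len(series) - rows + 1
    return [[series[i + j] for j in range(cols)] for i in range(rows)]


def page_matrix(series, rows):
    """Non-overlapping windows: column j holds series[j*rows : (j+1)*rows].

    Trailing samples that do not fill a whole column are dropped.
    """
    cols = len(series) // rows
    return [[series[j * rows + i] for j in range(cols)] for i in range(rows)]


def antidiagonal_average(matrix):
    """Average each anti-diagonal, giving the series of the nearest Hankel matrix."""
    rows, cols = len(matrix), len(matrix[0])
    out = []
    for k in range(rows + cols - 1):
        vals = [matrix[i][k - i] for i in range(rows) if 0 <= k - i < cols]
        out.append(sum(vals) / len(vals))
    return out


samples = [1, 2, 3, 4, 5]
H = hankel_matrix(samples, rows=3)
P = page_matrix([1, 2, 3, 4, 5, 6, 7], rows=3)
recovered = antidiagonal_average(H)
```

Because a Page matrix repeats no sample, its entries carry independent noise when the measurement noise is independent, which is one common reason to estimate rank there rather than on the Hankel matrix; whether the paper argues it this way is not stated in the abstract.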
When Semantics Connect the Swarm: LLM-Driven Fuzzy Control for Cooperative Multi-Robot Underwater Coverage
Underwater multi-robot cooperative coverage remains challenging due to partial observability, limited communication, environmental uncertainty, and the lack of access to global localization. To address these issues, this paper presents a semantics-guided fuzzy control framework that couples Large Language Models (LLMs) with interpretable control and lightweight coordination. Raw multimodal observations are compressed by the LLM into compact, human-interpretable semantic tokens that summarize obstacles, unexplored regions, and Objects Of Interest (OOIs) under uncertain perception. A fuzzy inference system with pre-defined membership functions then maps these tokens into smooth and stable steering and gait commands, enabling reliable navigation without relying on global positioning. Then, we further coordinate multiple robots by introducing semantic communication that shares intent and local context in linguistic form, enabling agreement on who explores where while avoiding redundant revisits. Extensive simulations in unknown reef-like environments show that, under limited sensing and communication, the proposed framework achieves robust OOI-oriented navigation and cooperative coverage with improved efficiency and adaptability, narrowing the gap between semantic cognition and distributed underwater control in GPS-denied, map-free conditions.
comment: This paper has been submitted to IEEE Transactions on Mobile Computing
Deep Q-Network for Optimizing NOMA-Aided Resource Allocation in Smart Factories with URLLC Constraints
This paper presents a Deep Q-Network (DQN)-based algorithm for NOMA-aided resource allocation in smart factories, addressing the stringent requirements of Ultra-Reliable Low-Latency Communication (URLLC). The proposed algorithm dynamically allocates sub-channels and optimizes power levels to maximize throughput while meeting strict latency constraints. By incorporating a tunable parameter λ, the algorithm balances the trade-off between throughput and latency, making it suitable for various devices, including robots, sensors, and controllers, each with distinct communication needs. Simulation results show that robots achieve higher throughput, while sensors and controllers meet the low-latency requirements of URLLC, ensuring reliable communication for real-time industrial applications.
comment: Accepted for presentation at the IEEE Wireless Communications and Networking Conference (WCNC) 2025. This is the preprint version of the paper
High-Power Dual-Channel Field Chamber for High-Frequency Magnetic Neuromodulation
Several novel methods, including magnetogenetics and magnetoelectric stimulation, use high-frequency alternating magnetic fields to precisely manipulate neural activity. To quantify the behavioral effects of such interventions in a freely moving mouse, we developed a dual-channel magnetic chamber, specifically designed for rate-sensitive magnetothermal-genetic stimulation, and adaptable for other uses of alternating magnetic fields. Through an optimized coil design, the system allows independent control of two spatially orthogonal uniform magnetic fields delivered at different frequencies within a 10 cm x 10 cm x 6 cm chamber. The two channels have nominal frequencies of 50 and 550 kHz with peak magnetic field strengths of 88 and 12.5 mT, achieved with resonant coil drives having peak voltages of 1.6 and 1.8 kV and currents of 1.0 and 0.26 kA, respectively. Additionally, a liquid cooling system enables magnetic field generation for durations on the order of seconds, and an observation port and camera allow video capture of the animal's behavior within the chamber. The system generates high-amplitude magnetic fields across two widely separated frequency channels with negligible interference (< 1%). A relatively uniform magnetic field distribution (±10% across 94% of the chamber volume) is maintained throughout the chamber, and the temperature increase of the inner side of the coil enclosure during operation is limited to < 0.35 °C/s to ensure in vivo safety. Using cobalt-doped and undoped iron oxide nanoparticles, we demonstrate channel-specific heating rates of 3.5 °C/s and 1.5 °C/s, respectively, validating frequency selectivity. Both channels can run stably and continuously for four seconds.
comment: 25 pages, 8 figures
Magnetic Materials for Transcranial Magnetic Stimulation (TMS)
Various coils for transcranial magnetic stimulation (TMS) are widely available for clinical and research use. These coils are almost all designed as air coils, which require large levels of energy to achieve a given magnetic flux density and in turn electric field strength, whereas in other sectors, such as power electronics or electrical machines, magnetic materials have been used for a long time to achieve higher efficiencies. We tested the impact on the electric and magnetic properties of different soft magnetic materials, including various ferrite cores, laminated sheet materials of nonisotropic grain-oriented silicon-steel, non-oriented silicon-steel, as well as cobalt-iron, and soft magnetic compound powder cores with insulated particles. Every material led to a reduction in coil current and voltage for the same target electric field strength. For the same field energy, every material yielded lower losses. Most common materials already saturated at very low currents. More material in thicker layers could shift the saturation point but at the cost of high weight. Due to their low saturation flux density, ferrites appear unsuitable for the high amplitude requirements of TMS. Laminated sheet materials and powder cores reduce the pulse energy, but the laminated sheet material adds more weight for the same effect than powder cores. Thus, appropriate magnetic materials can reduce the required pulse energy. Saturation flux density is the most relevant parameter, whereas the permeability beyond a certain base level is practically irrelevant. Most importantly, the weight of a magnetic-core coil may always be higher than that of an air coil for the same target field.
comment: 30 pages, 14 figures, 2 tables
Iterative Cut-Based PWA Approximation of Multi-Dimensional Nonlinear Systems
PieceWise Affine (PWA) approximations for nonlinear functions have been extensively used for tractable, computationally efficient control of nonlinear systems. However, reaching a desired approximation accuracy without prior information about the behavior of the nonlinear systems remains a challenge in the function approximation and control literature. As the name suggests, PWA approximation aims at approximating a nonlinear function or system by dividing the domain into multiple subregions where the nonlinear function or dynamics is approximated locally by an affine function, also called a local mode. Without prior knowledge of the form of the nonlinearity, the required number of modes, the locations of the subregions, and the local approximations need to be optimized simultaneously, which becomes highly complex for large-scale systems with multi-dimensional nonlinear functions. This paper introduces a novel approach for PWA approximation of multi-dimensional nonlinear systems, utilizing a hinging hyperplane formalism for cut-based partitioning of the domain. The complexity of the PWA approximation is iteratively increased until the desired accuracy level is reached. Further, the tractable cut definitions allow for different forms of subregions, as well as the ability to impose continuity constraints on the PWA approximation. The methodology is explained via multiple examples and its performance is compared to two existing approaches through case studies, showcasing its efficacy.
comment: 9 pages, 4 figures, submitted to journal
ORFit: One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares
While large machine learning models have shown remarkable performance in various domains, their training typically requires iterating for many passes over the training data. However, due to computational and memory constraints and potential privacy concerns, storing and accessing all the data is impractical in many real-world scenarios where the data arrives in a stream. In this paper, we investigate the problem of one-pass learning, in which a model is trained on sequentially arriving data without retraining on previous datapoints. Motivated by the demonstrated effectiveness of overparameterized models and the phenomenon of benign overfitting, we propose Orthogonal Recursive Fitting (ORFit), an algorithm for one-pass learning which seeks to perfectly fit each new datapoint while minimally altering the predictions on previous datapoints. ORFit updates the parameters in a direction orthogonal to past gradients, similar to orthogonal gradient descent (OGD) in continual learning. We show that, interestingly, ORFit's update leads to an operation similar to the recursive least-squares (RLS) algorithm in adaptive filtering but with significantly improved memory and computational efficiency, i.e., linear, instead of quadratic, in the number of parameters. To further reduce memory usage, we leverage the structure of the streaming data via an incremental principal component analysis (IPCA). We show that using the principal components is minimax optimal, i.e., it minimizes the worst-case forgetting of previous predictions for unknown future updates. Further, we prove that, for overparameterized linear models, the parameter vector obtained by ORFit matches what the standard multi-pass stochastic gradient descent (SGD) would converge to. Finally, we extend our results to the nonlinear setting for highly overparameterized models, relevant for deep learning.
comment: Journal extension of v1: Y. Min, K. Ahn, N. Azizan, "One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares," IEEE Conference on Decision and Control, 2022
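To make the update described in the ORFit abstract concrete, here is a minimal Python sketch for the special case of a linear model, where the gradient of the prediction with respect to the parameters is simply the input vector. Each new input is projected onto the orthogonal complement of past inputs, and the step size is chosen so the new datapoint is fit exactly while earlier predictions stay unchanged. The class name `LinearORFit` and the Gram-Schmidt bookkeeping are assumptions made for illustration; this sketch keeps every past direction and does not include the IPCA-based memory reduction the abstract mentions.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


class LinearORFit:
    """Hypothetical one-pass learner for the linear model f(x) = w . x."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.basis = []  # orthonormal basis spanning the past inputs

    def predict(self, x):
        return dot(self.w, x)

    def update(self, x, y, tol=1e-12):
        # Remove from x every component lying along a past input direction.
        g = list(x)
        for u in self.basis:
            c = dot(g, u)
            g = [gi - c * ui for gi, ui in zip(g, u)]

        denom = dot(x, g)  # equals the squared norm of g
        if denom <= tol:
            return  # no direction left that leaves old predictions untouched

        # Step size that makes the new prediction equal to y exactly.
        step = (y - self.predict(x)) / denom
        self.w = [wi + step * gi for wi, gi in zip(self.w, g)]

        norm = denom ** 0.5
        self.basis.append([gi / norm for gi in g])


stream = [([1.0, 0.0, 0.0, 0.0], 2.0),
          ([1.0, 1.0, 0.0, 0.0], 3.0),
          ([0.0, 1.0, 1.0, 0.0], -1.0)]
model = LinearORFit(dim=4)
for x, y in stream:
    model.update(x, y)
```

After the single pass, `model.predict` reproduces all three targets, because each update moves the parameters only along a direction orthogonal to every earlier input.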
Spatio-Temporal Consistent Soft Sensor Modeling and Monitoring of Thermal Power Plants based on Physical Knowledge
Data-driven soft sensors have been widely applied in complex industrial processes. However, the interpretable extraction of spatio-temporal features by soft sensors remains a challenge. In this light, this work introduces a novel method termed spatio-temporal consistent and interpretable model (STCIM). First, temporal and spatial features are captured and aligned by a far topological spatio-temporal consistency extraction block. Then, the features are mapped into an interpretable latent space for further prediction by explicitly giving physical meanings to latent variables. The efficacy of the proposed STCIM is demonstrated through the modeling of two generated datasets and a real-life dataset of coal-fired power plants. The corresponding experiments show: 1) The generalization of STCIM outperforms other methods, especially in different operation situations. 2) The far topological spatio-temporal consistency is vital for feature alignment. 3) The hyper-parameters of the physics-informed interpretable latent space loss determine the performance of STCIM.
Lyapunov Neural ODE State-Feedback Control Policies
Deep neural networks are increasingly used as an effective parameterization of control policies in various learning-based control paradigms. For continuous-time optimal control problems (OCPs), which are central to many decision-making tasks, control policy learning can be cast as a neural ordinary differential equation (NODE) problem wherein state and control constraints are naturally accommodated. This paper presents a NODE approach to solving continuous-time OCPs for the case of stabilizing a known constrained nonlinear system around a target state. The approach, termed Lyapunov-NODE control (L-NODEC), uses a novel Lyapunov loss formulation that incorporates an exponentially-stabilizing control Lyapunov function to learn a state-feedback neural control policy, bridging the gap of solving continuous-time OCPs via NODEs with stability guarantees. The proposed Lyapunov loss allows L-NODEC to guarantee exponential stability of the controlled system, as well as its adversarial robustness to perturbations to the initial state. The performance of L-NODEC is illustrated in two problems, including a dose delivery problem in plasma medicine. In both cases, L-NODEC effectively stabilizes the controlled system around the target state despite perturbations to the initial state and reduces the inference time necessary to reach the target.
Systems and Control (EESS)
Universal Barrier Functions for Safety and Stability of Constrained Nonlinear Systems
In this paper, we address the problem of synthesizing safe and stabilizing controllers for nonlinear systems subject to complex safety specifications and input constraints. We introduce the Universal Barrier Function (UBF), a single continuously differentiable scalar-valued function that encodes both stability and safety criteria while accounting for input constraints. Using the UBF, we formulate a Quadratic Program (UBF-QP) to generate control inputs that are both safe and stabilizing under input constraints. We demonstrate that the UBF-QP is feasible if a UBF exists. Furthermore, under mild conditions, we prove that a UBF always exists. The proposed framework is then extended to systems with higher relative degrees. Finally, numerical simulations illustrate the effectiveness of our proposed approach.
comment: 16 pages, 14 figures
Robust Self-Triggered Control Approaches Optimizing Sampling Sequences with Synchronous Measurements
Feedback control algorithms traditionally rely on periodic execution on digital platforms. While this simplifies design and analysis, it often leads to inefficient resource usage (e.g., CPU, network bandwidth) in embedded control and shared networks. This work investigates self-triggering implementations of linear controllers in sampled-data systems with synchronous measurements. Our approach precomputes the next sampling sequence over a finite horizon based on current state information. We introduce a novel optimal self-triggering scheme that guarantees exponential stability for unperturbed systems and global uniform ultimate boundedness for perturbed systems. This ensures robustness against external disturbances with explicit performance guarantees. Simulations demonstrate the benefits of our approach.
comment: Note: This research was conducted in 2017--2018. The literature review has not been updated and may not reflect subsequent or concurrent developments in the field
GOSPA-Driven Non-Myopic Multi-Sensor Management with Multi-Bernoulli Filtering
In this paper, we propose a non-myopic sensor management algorithm for multi-target tracking, with multiple sensors operating in the same surveillance area. The algorithm is based on multi-Bernoulli filtering and selects the actions that solve a non-myopic minimisation problem, where the cost function is the mean square generalised optimal sub-pattern assignment (GOSPA) error, over a future time window. For tractability, the sensor management algorithm actually uses an upper bound of the GOSPA error and is implemented via Monte Carlo Tree Search (MCTS). The sensors can jointly optimise and select their actions, taking into account all other sensors in the surveillance area. The benefits of the proposed algorithm are analysed via simulations.
comment: submitted to IEEE Transactions on Aerospace and Electronic Systems November 2025
Online Energy Storage Arbitrage under Imperfect Predictions: A Conformal Risk-Aware Approach
This work proposes a conformal approach for energy storage arbitrage to control the downside risks arising from imperfect price forecasts. Energy storage arbitrage relies solely on predictions of future market prices, while inaccurate price predictions may lead to significant profit losses. Based on conformal decision theory, we describe a controller that dynamically adjusts decision conservativeness through prediction sets without distributional assumptions. To enable online calibration when online profit loss feedback is unobservable, we establish that a temporal difference error serves as a measurable proxy. Building on this insight, we develop two online calibration strategies: prediction error-based adaptation targeting forecast accuracy, and value error-based calibration focusing on decision quality. Analysis of the conformal controller proves bounded long-term risk with convergence guarantees in temporal difference error, which in turn effectively manages risk exposure in potential profit losses. Case studies demonstrate superior performance in balancing risk and opportunity compared to benchmarks under varying forecast conditions.
On Structural Properties of Risk-Averse Optimal Stopping Problems
We establish structural properties of optimal stopping problems under time-consistent dynamic (coherent) risk measures, focusing on value function monotonicity and the existence of control limit (threshold) optimal policies. While such results are well developed for risk-neutral (expected-value) models, they remain underexplored in risk-averse settings. Coherent risk measures typically lack the tower property and are subadditive rather than additive, complicating structural analysis. We show that value function monotonicity mirrors the risk-neutral case. Moreover, if the risk envelope associated with each coherent risk measure admits a minimal element, the risk-averse optimal stopping problem reduces to an equivalent risk-neutral formulation. We also develop a general procedure for identifying control limit optimal policies and use it to derive practical, verifiable conditions on the risk measures and MDP structure that guarantee their existence. We illustrate the theory and verify these conditions through optimal stopping problems arising in operations, marketing, and finance.
Autonomous Vehicle front steering control computation saving
For lane keeping in autonomous vehicles, it is crucial to control the vehicle yaw rate. As is well known, yaw rate control can be achieved by acting on the steering angle. One option is to use a robust controller; depending on the requirements, the synthesis can lead to a high-order controller. Nowadays, this kind of vehicle needs networked control (IVN, Intelligent Vehicle Network) with a considerable number of control loops for different vehicle components. Therefore, in this environment, saving controller computation could be a good option for unloading the network and the digital processors. That is the main target of this contribution; to accomplish this goal, an interlacing implementation technique is considered. Results from real path tracking illustrate the viability of this procedure.
comment: 17 pages, 6 figures
Secure Distributed Consensus Estimation under False Data Injection Attacks: A Defense Strategy Based on Partial Channel Coding
This article investigates the security issue caused by false data injection attacks in distributed estimation, wherein each sensor can construct two types of residues based on local estimates and neighbor information, respectively. The resource-constrained attacker can select partial channels from the sensor network and arbitrarily manipulate the transmitted data. We derive necessary and sufficient conditions to reveal system vulnerabilities, under which the attacker is able to diverge the estimation error while preserving the stealthiness of all residues. We propose two defense strategies with mechanisms of exploiting the Euclidean distance between local estimates to detect attacks, and adopting the coding scheme to protect the transmitted data, respectively. It is proven that the former has the capability to address the majority of security loopholes, while the latter can serve as an additional enhancement to the former. By employing the time-varying coding matrix to mitigate the risk of being cracked, we demonstrate that the latter can safeguard against adversaries injecting stealthy sequences into the encoded channels. Hence, drawing upon the security analysis, we further provide a procedure to select security-critical channels that need to be encoded, thereby achieving a trade-off between security and coding costs. Finally, some numerical simulations are conducted to demonstrate the theoretical results.
Parallel KKT Solver in PIQP for Multistage Optimization
This paper presents an efficient parallel Cholesky factorization and triangular solve algorithm for the Karush-Kuhn-Tucker (KKT) systems arising in multistage optimization problems, with a focus on model predictive control and trajectory optimization for racing. The proposed approach directly parallelizes solving the KKT systems with block-tridiagonal-arrow KKT matrices on the linear algebra level arising in interior-point methods. The algorithm is implemented as a new backend of the PIQP solver and released as open source. Numerical experiments on the chain-of-masses benchmarks and a minimum curvature race line optimization problem demonstrate substantial performance gains compared to other state-of-the-art solvers.
Traffic-Aware Grid Planning for Dynamic Wireless Electric Vehicle Charging
Dynamic Wireless Electric Vehicle Charging (DWC) on electrified roadways is an emerging technology that can significantly reduce battery sizes, eliminate charging downtime, and alleviate range anxiety, especially for long-haul transportation and fleet operations of electric vehicles (EVs). However, these systems introduce new challenges for power system planning due to their short-duration, high-power demands, which can strain the grid if not properly managed. As the energy demands from DWC depend on vehicle speed, density, dwell time in charging zones, and load profiles along road segments, there is a need for integrated planning of such systems, jointly considering both traffic behavior and EV energy consumption. In this paper, we propose a traffic-aware grid planning framework for DWC. We leverage a macroscopic Cell Transmission Model of traffic flow to estimate real-time, spatiotemporal EV charging demand from DWC corridors. The demand model is then integrated into an AC Optimal Power Flow based formulation to optimally size a microgrid that supports DWC under varying traffic conditions while minimizing the cost of operation. Our framework explicitly models how spatiotemporal traffic patterns affect the utilization of grid resources to obtain system designs that achieve lower costs and are easier to operationalize than planning models that rely on worst-case traffic data. We demonstrate the framework on data from a 14-mile segment of the I-210W highway in California, USA, evaluating multiple traffic scenarios such as free flow, severe congestion, accidents of varying severity, and natural disasters such as forest fires. Our results demonstrate that traffic-aware grid planning significantly reduces infrastructure costs compared to worst-scenario based modeling, while ensuring reliability of service in terms of meeting charging demands under diverse traffic conditions.
Low-Cost Carriers in Aviation: Significance and Developments
This paper aims to discuss the impacts of low-cost airlines on the air transport market and, in particular, to present the most recent findings from the specialized literature in this field. To this end, several papers published on the topic since 2015 were selected and analyzed. Based on this analysis, the main subjects addressed in the studies were categorized into five groups: (i) impacts of low-cost airlines on competing carriers; (ii) impacts on airports; (iii) general effects on air transport demand; (iv) effects on passengers' choice processes; and (v) broader effects on geographical regions.
Minimizing Maximum Latency of Task Offloading for Multi-UAV-assisted Maritime Search and Rescue
Unmanned Aerial Vehicles (UAVs) play a crucial role in Maritime Search and Rescue (MSAR), contributing to the improvement of rescue efficiency and reduction of casualties. Typically, UAVs equipped with cameras collect data from disaster areas and transmit it to the shore-based rescue command centers. By deploying Mobile Edge Computing (MEC) servers, UAVs can pre-process video footage to reduce data transmission volume, thus reducing transmission delays. However, the limited computational capacity and energy of UAVs pose significant challenges to the efficiency of UAV-assisted MSAR systems. To address these problems, in this paper, we investigate a multi-UAV assisted MSAR system consisting of multiple Surveillance UAVs (S-UAVs) and a Relay UAV (R-UAV). Then, we formulate a joint optimization problem to minimize the maximum total latency among all S-UAVs by jointly optimizing the computation offloading decisions, the R-UAV deployment, and the association between an S-UAV and rescue targets while ensuring that all targets are monitored by S-UAVs. Since the formulated optimization problem is hard to solve due to its non-convexity, we propose an effective iterative algorithm by breaking it into three sub-problems. Numerical simulation results show the effectiveness of the proposed algorithm with various performance parameters.
Real-Time Learning of Predictive Dynamic Obstacle Models for Robotic Motion Planning ICRA
Autonomous systems often must predict the motions of nearby agents from partial and noisy data. This paper asks and answers the question: "can we learn, in real-time, a nonlinear predictive model of another agent's motions?" Our online framework denoises and forecasts such dynamics using a modified sliding-window Hankel Dynamic Mode Decomposition (Hankel-DMD). Partial noisy measurements are embedded into a Hankel matrix, while an associated Page matrix enables singular-value hard thresholding (SVHT) to estimate the effective rank. A Cadzow projection enforces structured low-rank consistency, yielding a denoised trajectory and local noise variance estimates. From this representation, a time-varying Hankel-DMD lifted linear predictor is constructed for multi-step forecasts. The residual analysis provides variance-tracking signals that can support downstream estimators and risk-aware planning. We validate the approach in simulation under Gaussian and heavy-tailed noise, and experimentally on a dynamic crane testbed. Results show that the method achieves stable variance-aware denoising and short-horizon prediction suitable for integration into real-time control frameworks.
comment: 10 pages, 6 figures, submitted to IEEE International Conference on Robotics and Automation (ICRA) 2025
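A minimal sketch of the Hankel-DMD forecasting step described in the abstract above, not the authors' implementation. It assumes a scalar signal and a fixed, user-chosen rank in place of the SVHT rank estimate, and it omits the Cadzow denoising and the sliding window. The names `hankel`, `hankel_dmd_forecast`, `rows`, and `rank` are hypothetical.

```python
import numpy as np

def hankel(y, rows):
    """Stack delayed copies of a 1-D signal into a (rows x cols) Hankel matrix."""
    cols = len(y) - rows + 1
    return np.array([y[i:i + cols] for i in range(rows)])

def hankel_dmd_forecast(y, rows, rank, steps):
    """Fit a rank-truncated linear map on delay coordinates and roll it forward."""
    H = hankel(np.asarray(y, dtype=float), rows)
    X, Y = H[:, :-1], H[:, 1:]                 # consecutive delay vectors
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank]
    # Least-squares one-step map A ~ Y X^+ using the truncated pseudo-inverse.
    A = Y @ Vt.T @ np.diag(1.0 / s) @ U.T
    z = H[:, -1].copy()                        # most recent delay vector
    forecast = []
    for _ in range(steps):
        z = A @ z                              # advance the lifted state one step
        forecast.append(z[-1])                 # newest sample is the last entry
    return np.array(forecast)
```

For a noise-free sinusoid the delay vectors span a two-dimensional subspace, so `rank=2` is enough for the linear predictor to continue the signal; noisy data would need the denoising and rank selection the abstract describes.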
When Semantics Connect the Swarm: LLM-Driven Fuzzy Control for Cooperative Multi-Robot Underwater Coverage
Underwater multi-robot cooperative coverage remains challenging due to partial observability, limited communication, environmental uncertainty, and the lack of access to global localization. To address these issues, this paper presents a semantics-guided fuzzy control framework that couples Large Language Models (LLMs) with interpretable control and lightweight coordination. Raw multimodal observations are compressed by the LLM into compact, human-interpretable semantic tokens that summarize obstacles, unexplored regions, and Objects Of Interest (OOIs) under uncertain perception. A fuzzy inference system with pre-defined membership functions then maps these tokens into smooth and stable steering and gait commands, enabling reliable navigation without relying on global positioning. Then, we further coordinate multiple robots by introducing semantic communication that shares intent and local context in linguistic form, enabling agreement on who explores where while avoiding redundant revisits. Extensive simulations in unknown reef-like environments show that, under limited sensing and communication, the proposed framework achieves robust OOI-oriented navigation and cooperative coverage with improved efficiency and adaptability, narrowing the gap between semantic cognition and distributed underwater control in GPS-denied, map-free conditions.
comment: This paper has been submitted to IEEE Transactions on Mobile Computing
Deep Q-Network for Optimizing NOMA-Aided Resource Allocation in Smart Factories with URLLC Constraints
This paper presents a Deep Q-Network (DQN)-based algorithm for NOMA-aided resource allocation in smart factories, addressing the stringent requirements of Ultra-Reliable Low-Latency Communication (URLLC). The proposed algorithm dynamically allocates sub-channels and optimizes power levels to maximize throughput while meeting strict latency constraints. By incorporating a tunable parameter $\lambda$, the algorithm balances the trade-off between throughput and latency, making it suitable for various devices, including robots, sensors, and controllers, each with distinct communication needs. Simulation results show that robots achieve higher throughput, while sensors and controllers meet the low-latency requirements of URLLC, ensuring reliable communication for real-time industrial applications.
comment: Accepted for presentation at the IEEE Wireless Communications and Networking Conference (WCNC) 2025. This is the preprint version of the paper
High-Power Dual-Channel Field Chamber for High-Frequency Magnetic Neuromodulation
Several novel methods, including magnetogenetics and magnetoelectric stimulation, use high-frequency alternating magnetic fields to precisely manipulate neural activity. To quantify the behavioral effects of such interventions in a freely moving mouse, we developed a dual-channel magnetic chamber, specifically designed for rate-sensitive magnetothermal-genetic stimulation, and adaptable for other uses of alternating magnetic fields. Through an optimized coil design, the system allows independent control of two spatially orthogonal uniform magnetic fields delivered at different frequencies within a 10 cm x 10 cm x 6 cm chamber. The two channels have nominal frequencies of 50 and 550 kHz with peak magnetic field strengths of 88 and 12.5 mT, achieved with resonant coil drives having peak voltages of 1.6 and 1.8 kV and currents of 1.0 and 0.26 kA, respectively. Additionally, a liquid cooling system enables magnetic field generation for durations on the order of seconds, and an observation port and camera allow video capture of the animal's behavior within the chamber. The system generates high-amplitude magnetic fields across two widely separated frequency channels with negligible interference (< 1%). Relatively uniform magnetic field distribution (±10% across 94% of the chamber volume) is maintained throughout the chamber, and the temperature increase of the inner side of the coil enclosure during operation is limited to < 0.35 °C/s to ensure in vivo safety. Using cobalt-doped and undoped iron oxide nanoparticles, we demonstrate channel-specific heating rates of 3.5 °C/s and 1.5 °C/s, respectively, validating frequency selectivity. Both channels can run stably for four seconds of continuous operation.
comment: 25 pages, 8 figures
Magnetic Materials for Transcranial Magnetic Stimulation (TMS)
Various coils for transcranial magnetic stimulation (TMS) are widely available for clinical and research use. These coils are almost all designed as air coils, which require large levels of energy to achieve a given magnetic flux density and in turn electric field strength, whereas in other sectors, such as power electronics or electrical machines, magnetic materials have been used for a long time to achieve higher efficiencies. We tested the impact on the electric and magnetic properties of different soft magnetic materials, including various ferrite cores, laminated sheet materials of nonisotropic grain-oriented silicon-steel, non-oriented silicon-steel, as well as cobalt-iron, and soft magnetic compound powder cores with insulated particles. Every material led to a reduction in coil current and voltage for the same target electric field strength. For the same field energy, every material yielded lower losses. Most common materials saturated already at very low currents. More material in thicker layers could shift the saturation point but at the cost of high weight. Due to their low saturation flux density, ferrites appear unsuitable for the high amplitude requirements of TMS. Laminated sheet materials and powder cores reduce the pulse energy, but the laminated sheet material adds more weight for the same effect than powder cores. Thus, appropriate magnetic materials can reduce the required pulse energy. Saturation flux density is the most relevant parameter, whereas the permeability beyond a certain base level is practically irrelevant. Most importantly, the weight of a magnetic-core coil may always be increased compared to an air coil for the same target field.
comment: 30 pages, 14 figures, 2 tables
Iterative Cut-Based PWA Approximation of Multi-Dimensional Nonlinear Systems
PieceWise Affine (PWA) approximations for nonlinear functions have been extensively used for tractable, computationally efficient control of nonlinear systems. However, reaching a desired approximation accuracy without prior information about the behavior of the nonlinear systems remains a challenge in the function approximation and control literature. As the name suggests, PWA approximation aims at approximating a nonlinear function or system by dividing the domain into multiple subregions where the nonlinear function or dynamics is approximated locally by an affine function also called local mode. Without prior knowledge of the form of the nonlinearity, the required number of modes, the locations of the subregions, and the local approximations need to be optimized simultaneously, which becomes highly complex for large-scale systems with multi-dimensional nonlinear functions. This paper introduces a novel approach for PWA approximation of multi-dimensional nonlinear systems, utilizing a hinging hyperplane formalism for cut-based partitioning of the domain. The complexity of the PWA approximation is iteratively increased until reaching the desired accuracy level. Further, the tractable cut definitions allow for different forms of subregions, as well as the ability to impose continuity constraints on the PWA approximation. The methodology is explained via multiple examples and its performance is compared to two existing approaches through case studies, showcasing its efficacy.
comment: 9 pages, 4 figures, submitted to journal
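The idea of raising the complexity of a PWA approximation until a target accuracy is met can be illustrated in one dimension. The sketch below is a simplified stand-in, not the paper's hinging-hyperplane method: it assumes a scalar domain, least-squares affine fits on sampled points, and a cut at the midpoint of whichever subregion currently has the largest error. The names `fit_affine`, `iterative_pwa`, `n_samples`, and `max_modes` are hypothetical.

```python
import numpy as np

def fit_affine(x, y):
    """Least-squares affine fit y ~ a*x + b; returns (a, b, max abs error)."""
    A = np.column_stack([x, np.ones_like(x)])
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    err = np.max(np.abs(a * x + b - y))
    return float(a), float(b), float(err)

def iterative_pwa(f, lo, hi, tol, n_samples=2001, max_modes=64):
    """Cut the worst subregion in half until every local fit is within tol."""
    x = np.linspace(lo, hi, n_samples)
    y = f(x)
    regions = [(0, n_samples)]                 # half-open sample-index ranges
    while True:
        fits = [fit_affine(x[i:j], y[i:j]) for i, j in regions]
        worst = int(np.argmax([err for _, _, err in fits]))
        worst_err = fits[worst][2]
        if worst_err <= tol or len(regions) >= max_modes:
            modes = [(x[i], x[j - 1], a, b)
                     for (i, j), (a, b, _) in zip(regions, fits)]
            return modes, worst_err
        i, j = regions[worst]
        mid = (i + j) // 2                     # one new cut per iteration
        regions[worst:worst + 1] = [(i, mid), (mid, j)]
```

Each returned mode is `(x_start, x_end, slope, intercept)`. This sketch does not enforce continuity across cuts, which the abstract lists as something the proposed cut definitions can impose.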
ORFit: One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares
While large machine learning models have shown remarkable performance in various domains, their training typically requires iterating for many passes over the training data. However, due to computational and memory constraints and potential privacy concerns, storing and accessing all the data is impractical in many real-world scenarios where the data arrives in a stream. In this paper, we investigate the problem of one-pass learning, in which a model is trained on sequentially arriving data without retraining on previous datapoints. Motivated by the demonstrated effectiveness of overparameterized models and the phenomenon of benign overfitting, we propose Orthogonal Recursive Fitting (ORFit), an algorithm for one-pass learning which seeks to perfectly fit each new datapoint while minimally altering the predictions on previous datapoints. ORFit updates the parameters in a direction orthogonal to past gradients, similar to orthogonal gradient descent (OGD) in continual learning. We show that, interestingly, ORFit's update leads to an operation similar to the recursive least-squares (RLS) algorithm in adaptive filtering but with significantly improved memory and computational efficiency, i.e., linear, instead of quadratic, in the number of parameters. To further reduce memory usage, we leverage the structure of the streaming data via an incremental principal component analysis (IPCA). We show that using the principal components is minimax optimal, i.e., it minimizes the worst-case forgetting of previous predictions for unknown future updates. Further, we prove that, for overparameterized linear models, the parameter vector obtained by ORFit matches what the standard multi-pass stochastic gradient descent (SGD) would converge to. Finally, we extend our results to the nonlinear setting for highly overparameterized models, relevant for deep learning.
comment: Journal extension of v1: Y. Min, K. Ahn, N. Azizan, "One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares," IEEE Conference on Decision and Control, 2022
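For a linear model, the update described in the abstract (fit each new datapoint exactly while moving only in directions orthogonal to past gradients) can be sketched as follows. This is an illustrative reading under stated assumptions, not the authors' code: the model is `w @ x`, the initial weights are zero, and past gradient directions are stored explicitly by Gram-Schmidt rather than compressed with incremental PCA. The function name `orfit_linear` is hypothetical.

```python
import numpy as np

def orfit_linear(X, y):
    """One pass over the rows of X for the linear model f(x) = w @ x, from w = 0."""
    n, d = X.shape
    w = np.zeros(d)
    basis = []                       # orthonormal basis of past gradient directions
    for x_t, y_t in zip(X, y):
        g = x_t.astype(float)        # gradient of w @ x_t with respect to w (a copy)
        for u in basis:              # project out components along past gradients
            g = g - (u @ g) * u
        norm = np.linalg.norm(g)
        if norm < 1e-12:             # no new direction available; skip this point
            continue
        # Step along g so that the new point is fit exactly; since g is orthogonal
        # to all earlier inputs, earlier predictions are left unchanged.
        w = w + (y_t - w @ x_t) / (x_t @ g) * g
        basis.append(g / norm)
    return w
```

In this linear sketch with zero initialization, the result coincides with the minimum-norm interpolant `np.linalg.pinv(X) @ y` whenever the rows of `X` are linearly independent.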
Spatio-Temporal Consistent Soft Sensor Modeling and Monitoring of Thermal Power Plants based on Physical Knowledge
Data-driven soft sensors have been widely applied in complex industrial processes. However, extracting interpretable spatio-temporal features with soft sensors remains a challenge. In this light, this work introduces a novel method termed spatio-temporal consistent and interpretable model (STCIM). First, temporal and spatial features are captured and aligned by a far topological spatio-temporal consistency extraction block. Then, the features are mapped into an interpretable latent space for further prediction by explicitly giving physical meanings to latent variables. The efficacy of the proposed STCIM is demonstrated through the modeling of two generated datasets and a real-life dataset of coal-fired power plants. The corresponding experiments show that: 1) STCIM generalizes better than other methods, especially across different operating conditions; 2) the far topological spatio-temporal consistency is vital for feature alignment; 3) the hyper-parameters of the physics-informed interpretable latent space loss determine the performance of STCIM.
Lyapunov Neural ODE State-Feedback Control Policies
Deep neural networks are increasingly used as an effective parameterization of control policies in various learning-based control paradigms. For continuous-time optimal control problems (OCPs), which are central to many decision-making tasks, control policy learning can be cast as a neural ordinary differential equation (NODE) problem wherein state and control constraints are naturally accommodated. This paper presents a NODE approach to solving continuous-time OCPs for the case of stabilizing a known constrained nonlinear system around a target state. The approach, termed Lyapunov-NODE control (L-NODEC), uses a novel Lyapunov loss formulation that incorporates an exponentially-stabilizing control Lyapunov function to learn a state-feedback neural control policy, bridging the gap of solving continuous-time OCPs via NODEs with stability guarantees. The proposed Lyapunov loss allows L-NODEC to guarantee exponential stability of the controlled system, as well as its adversarial robustness to perturbations to the initial state. The performance of L-NODEC is illustrated in two problems, including a dose delivery problem in plasma medicine. In both cases, L-NODEC effectively stabilizes the controlled system around the target state despite perturbations to the initial state and reduces the inference time necessary to reach the target.
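The abstract does not give the form of the Lyapunov loss. One common way to encode an exponentially stabilizing control Lyapunov function condition, dV/dt + kappa * V <= 0, is a hinge penalty on its violation. The sketch below assumes that form, a quadratic candidate V(x) = ||x - x_target||^2, and state derivatives supplied by the caller; none of these choices is confirmed by the source, and the name `lyapunov_loss` is hypothetical.

```python
import numpy as np

def lyapunov_loss(xs, x_dots, x_target, kappa=1.0):
    """Mean hinge violation of dV/dt + kappa * V <= 0 for V(x) = ||x - x_target||^2.

    xs:      (N, n) sampled states along closed-loop trajectories
    x_dots:  (N, n) state derivatives f(x, u(x)) at those states
    """
    e = np.asarray(xs, dtype=float) - np.asarray(x_target, dtype=float)
    V = np.sum(e * e, axis=1)                               # assumed quadratic V
    V_dot = 2.0 * np.sum(e * np.asarray(x_dots, dtype=float), axis=1)
    return float(np.mean(np.maximum(0.0, V_dot + kappa * V)))
```

The loss is zero exactly when every sampled state satisfies the decrease condition, so a policy trained against it is pushed toward exponential convergence at rate `kappa` on the sampled region.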
Multiagent Systems
Predictive Auxiliary Learning for Belief-based Multi-Agent Systems
The performance of multi-agent reinforcement learning (MARL) in partially observable environments depends on effectively aggregating information from observations, communications, and reward signals. While most existing multi-agent systems primarily rely on rewards as the only feedback for policy training, our research shows that introducing auxiliary predictive tasks can significantly enhance learning efficiency and stability. We propose Belief-based Predictive Auxiliary Learning (BEPAL), a framework that incorporates auxiliary training objectives to support policy optimization. BEPAL follows the centralized training with decentralized execution paradigm. Each agent learns a belief model that predicts unobservable state information, such as other agents' rewards or motion directions, alongside its policy model. By enriching hidden state representations with information that does not directly contribute to immediate reward maximization, this auxiliary learning process stabilizes MARL training and improves overall performance. We evaluate BEPAL in the predator-prey environment and Google Research Football, where it achieves an average improvement of about 16 percent in performance metrics and demonstrates more stable convergence compared to baseline methods.
GOSPA-Driven Non-Myopic Multi-Sensor Management with Multi-Bernoulli Filtering
In this paper, we propose a non-myopic sensor management algorithm for multi-target tracking, with multiple sensors operating in the same surveillance area. The algorithm is based on multi-Bernoulli filtering and selects the actions that solve a non-myopic minimisation problem, where the cost function is the mean square generalised optimal sub-pattern assignment (GOSPA) error, over a future time window. For tractability, the sensor management algorithm actually uses an upper bound of the GOSPA error and is implemented via Monte Carlo Tree Search (MCTS). The sensors have the ability to jointly optimise and select their actions with the considerations of all other sensors in the surveillance area. The benefits of the proposed algorithm are analysed via simulations.
comment: submitted to IEEE Transactions on Aerospace and Electronic Systems November 2025
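For reference, the GOSPA metric itself (with alpha = 2) can be computed by brute force for small sets. The sketch below evaluates the metric between two point sets; it is not the upper bound on the mean square GOSPA error that the abstract says the algorithm uses. The Euclidean base distance and the function name `gospa` are assumptions.

```python
from itertools import permutations
import numpy as np

def gospa(X, Y, c=2.0, p=2):
    """GOSPA distance (alpha = 2) between two finite point sets, by brute force."""
    X = [np.atleast_1d(np.asarray(x, dtype=float)) for x in X]
    Y = [np.atleast_1d(np.asarray(y, dtype=float)) for y in Y]
    if len(X) > len(Y):                        # ensure |X| <= |Y|
        X, Y = Y, X
    best = np.inf
    # Try every way to pair each element of X with a distinct element of Y.
    for perm in permutations(range(len(Y)), len(X)):
        cost = sum(min(np.linalg.norm(x - Y[j]), c) ** p
                   for x, j in zip(X, perm))   # localisation cost, cut off at c
        best = min(best, cost)
    cardinality_penalty = (c ** p / 2.0) * (len(Y) - len(X))
    return (best + cardinality_penalty) ** (1.0 / p)
```

The factorial cost of enumerating assignments is only acceptable for a handful of targets; a practical implementation would solve the assignment with an optimal assignment solver instead.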
A Simple Logic of Cohesive Group Agency
We propose a structure to represent the social fabric of a group. We call it the 'cohesion network' of the group. It can be seen as a graph whose vertices are strict subgroups and whose edges indicate a prescribed 'pro-social behaviour' from one subgroup towards another. In social psychology, pro-social behaviours are building blocks of full-blown cooperation, which we assimilate here with 'group cohesiveness'. We then define a formal framework to study cohesive group agency. To do so, we simply instantiate pro-social behaviour with the more specific relation of 'successful assistance' between acting entities in a group. The relations of assistance within a group at the moment of agency constitute the social fabric of the cohesive group agency. We build our logical theory upon the logic of agency "bringing-it-about". We obtain a family of logics of cohesive group agency, one for every class of cohesion networks.
Robotics
Multi-Mapcher: Loop Closure Detection-Free Heterogeneous LiDAR Multi-Session SLAM Leveraging Outlier-Robust Registration for Autonomous Vehicles
As various 3D light detection and ranging (LiDAR) sensors have been introduced to the market, research on multi-session simultaneous localization and mapping (MSS) using heterogeneous LiDAR sensors has been actively conducted. Existing MSS methods mostly rely on loop closure detection for inter-session alignment; however, the performance of loop closure detection can be potentially degraded owing to the differences in the density and field of view (FoV) of the sensors used in different sessions. In this study, we challenge the existing paradigm that relies heavily on loop detection modules and propose a novel MSS framework, called Multi-Mapcher, that employs large-scale map-to-map registration to perform inter-session initial alignment, which is commonly assumed to be infeasible, by leveraging outlier-robust 3D point cloud registration. Next, after finding inter-session loops by radius search based on the assumption that the inter-session initial alignment is sufficiently precise, anchor node-based robust pose graph optimization is employed to build a consistent global map. As demonstrated in our experiments, our approach shows substantially better MSS performance for various LiDAR sensors used to capture the sessions and is faster than state-of-the-art approaches. Our code is available at https://github.com/url-kaist/multi-mapcher.
comment: 13 pages, 12 figures
Improving Robustness to Out-of-Distribution States in Imitation Learning via Deep Koopman-Boosted Diffusion Policy
Integrating generative models with action chunking has shown significant promise in imitation learning for robotic manipulation. However, the existing diffusion-based paradigm often struggles to capture strong temporal dependencies across multiple steps, particularly when incorporating proprioceptive input. This limitation can lead to task failures, where the policy overfits to proprioceptive cues at the expense of capturing the visually derived features of the task. To overcome this challenge, we propose the Deep Koopman-boosted Dual-branch Diffusion Policy (D3P) algorithm. D3P introduces a dual-branch architecture to decouple the roles of different sensory modality combinations. The visual branch encodes the visual observations to indicate task progression, while the fused branch integrates both visual and proprioceptive inputs for precise manipulation. Within this architecture, when the robot fails to accomplish intermediate goals, such as grasping a drawer handle, the policy can dynamically switch to execute action chunks generated by the visual branch, allowing recovery to previously observed states and facilitating retrial of the task. To further enhance visual representation learning, we incorporate a Deep Koopman Operator module that captures structured temporal dynamics from visual inputs. During inference, we use the test-time loss of the generative model as a confidence signal to guide the aggregation of the temporally overlapping predicted action chunks, thereby enhancing the reliability of policy execution. In simulation experiments across six RLBench tabletop tasks, D3P outperforms the state-of-the-art diffusion policy by an average of 14.6%. On three real-world robotic manipulation tasks, it achieves a 15.0% improvement. Code: https://github.com/dianyeHuang/D3P.
comment: Accepted by IEEE T-RO
Adaptive and Multi-object Grasping via Deformable Origami Modules
Soft robotic grippers have shown great promise in handling fragile and geometrically complex objects. However, most existing solutions rely on bulky actuators, complex control strategies, or advanced tactile sensing to achieve stable and reliable grasping performance. In this work, we present a multi-finger hybrid gripper featuring passively deformable origami modules that generate constant force and torque output. Each finger, composed of parallel origami modules, is driven by a 1-DoF actuator mechanism, enabling passive shape adaptability and stable grasping force without active sensing or feedback control. More importantly, we demonstrate simultaneous multi-object grasping, which allows stacked objects of varied shape and size to be picked, transported, and placed independently at different states, significantly improving manipulation efficiency compared to single-object grasping. These results highlight the potential of origami-based compliant structures as scalable modules for adaptive, stable, and efficient multi-object manipulation in domestic and industrial pick-and-place scenarios.
Descriptive Model-based Learning and Control for Bipedal Locomotion
Bipedal balance is challenging due to its multi-phase, hybrid nature and high-dimensional state space. Traditional balance control approaches for bipedal robots rely on low-dimensional models for locomotion planning and reactive control, constraining the full robot to behave like these simplified models. This involves tracking preset reference paths for the Center of Mass and upper body obtained through low-dimensional models, often resulting in inefficient walking patterns with bent knees. However, we observe that bipedal balance is inherently low-dimensional and can be effectively described with simple state and action descriptors in a low-dimensional state space. This allows the robot's motion to evolve freely in its high-dimensional state space, only constraining its projection in the low-dimensional state space. In this work, we propose a novel control approach that avoids prescribing a low-dimensional model to the full model. Instead, our control framework uses a descriptive model with the minimum degrees of freedom necessary to maintain balance, allowing the remaining degrees of freedom to evolve freely in the high-dimensional space. This results in an efficient human-like walking gait and improved robustness.
comment: 8 pages, 15 figures
OmniTrack++: Omnidirectional Multi-Object Tracking by Learning Large-FoV Trajectory Feedback CVPR 2025
This paper investigates Multi-Object Tracking (MOT) in panoramic imagery, which introduces unique challenges including a 360° Field of View (FoV), resolution dilution, and severe view-dependent distortions. Conventional MOT methods designed for narrow-FoV pinhole cameras generalize unsatisfactorily under these conditions. To address panoramic distortion, large search space, and identity ambiguity under a 360° FoV, OmniTrack++ adopts a feedback-driven framework that progressively refines perception with trajectory cues. A DynamicSSM block first stabilizes panoramic features, implicitly alleviating geometric distortion. On top of normalized representations, FlexiTrack Instances use trajectory-informed feedback for flexible localization and reliable short-term association. To ensure long-term robustness, an ExpertTrack Memory consolidates appearance cues via a Mixture-of-Experts design, enabling recovery from fragmented tracks and reducing identity drift. Finally, a Tracklet Management module adaptively switches between end-to-end and tracking-by-detection modes according to scene dynamics, offering a balanced and scalable solution for panoramic MOT. To support rigorous evaluation, we establish the EmboTrack benchmark, a comprehensive dataset for panoramic MOT that includes QuadTrack, captured with a quadruped robot, and BipTrack, collected with a bipedal wheel-legged robot. Together, these datasets span wide-angle environments and diverse motion patterns, providing a challenging testbed for real-world panoramic perception. Extensive experiments on JRDB and EmboTrack demonstrate that OmniTrack++ achieves state-of-the-art performance, yielding substantial HOTA improvements of +25.5% on JRDB and +43.07% on QuadTrack over the original OmniTrack. Datasets and code will be made publicly available at https://github.com/xifen523/OmniTrack.
comment: Extended version of CVPR 2025 paper arXiv:2503.04565. Datasets and code will be made publicly available at https://github.com/xifen523/OmniTrack
Design and Development of a Modular Bucket Drum Excavator for Lunar ISRU
In-Situ Resource Utilization (ISRU) is one of the key technologies for enabling sustainable access to the Moon. The ability to excavate lunar regolith is the first step in making lunar resources accessible and usable. This work presents the development of a bucket drum for the modular robotic system MoonBot, as part of the Japanese Moonshot program. A 3D-printed prototype made of PLA was manufactured to evaluate its efficiency through a series of sandbox tests. The resulting tool weighs 4.8 kg and has a volume of 14.06 L. It is capable of continuous excavation at a rate of 777.54 kg/h with a normalized energy consumption of 0.022 Wh/kg. In batch operation, the excavation rate is 172.02 kg/h with a normalized energy consumption of 0.86 Wh per kilogram of excavated material. The obtained results demonstrate the successful implementation of the concept. A key advantage of the developed tool is its compatibility with the modular MoonBot robotic platform, which enables flexible and efficient mission planning. Further improvements may include the integration of sensors and an autonomous control system to enhance the excavation process.
comment: 6 pages, 4 figures. Accepted at IEEE iSpaRo 2025
Bootstrap Off-policy with World Model NeurIPS 2025
Online planning has proven effective in reinforcement learning (RL) for improving sample efficiency and final performance. However, using planning for environment interaction inevitably introduces a divergence between the collected data and the policy's actual behaviors, degrading both model learning and policy improvement. To address this, we propose BOOM (Bootstrap Off-policy with WOrld Model), a framework that tightly integrates planning and off-policy learning through a bootstrap loop: the policy initializes the planner, and the planner refines actions to bootstrap the policy through behavior alignment. This loop is supported by a jointly learned world model, which enables the planner to simulate future trajectories and provides value targets to facilitate policy improvement. The core of BOOM is a likelihood-free alignment loss that bootstraps the policy using the planner's non-parametric action distribution, combined with a soft value-weighted mechanism that prioritizes high-return behaviors and mitigates variability in the planner's action quality within the replay buffer. Experiments on the high-dimensional DeepMind Control Suite and Humanoid-Bench show that BOOM achieves state-of-the-art results in both training stability and final performance. The code is accessible at https://github.com/molumitu/BOOM_MBRL.
comment: NeurIPS 2025
iFlyBot-VLA Technical Report
We introduce iFlyBot-VLA, a large-scale Vision-Language-Action (VLA) model trained under a novel framework. The main contributions are listed as follows: (1) a latent action model thoroughly trained on large-scale human and robotic manipulation videos; (2) a dual-level action representation framework that jointly supervises both the Vision-Language Model (VLM) and the action expert during training; (3) a mixed training strategy that combines robot trajectory data with general QA and spatial QA datasets, effectively enhancing the 3D perceptual and reasoning capabilities of the VLM backbone. Specifically, the VLM is trained to predict two complementary forms of actions: latent actions, derived from our latent action model pretrained on cross-embodiment manipulation data, which capture implicit high-level intentions; and structured discrete action tokens, obtained through frequency-domain transformations of continuous control signals, which encode explicit low-level dynamics. This dual supervision aligns the representation spaces of language, vision, and action, enabling the VLM to directly contribute to action generation. Experimental results on the LIBERO Franka benchmark demonstrate the superiority of our framework, while real-world evaluations further show that iFlyBot-VLA achieves competitive success rates across diverse and challenging manipulation tasks. Furthermore, we plan to open-source a portion of our self-constructed dataset to support future research in the community.
Runge-Kutta Approximations for Direct Coning Compensation Applying Lie Theory
The integration of gyroscope measurements is an essential task for most navigation systems. Modern vehicles typically use strapdown systems, such that gyro integration requires coning compensation to account for the sensor's rotation during the integration. Many coning compensation algorithms have been developed and a few are reviewed. This work introduces a new class of coning correction algorithm built directly from the classical Runge-Kutta integration routines. A simple case is shown to collapse to one of the most popular coning algorithms and a clear procedure for generating higher-order algorithms is presented.
RoboOmni: Proactive Robot Manipulation in Omni-modal Context
Recent advances in Multimodal Large Language Models (MLLMs) have driven rapid progress in Vision-Language-Action (VLA) models for robotic manipulation. Although effective in many scenarios, current approaches largely rely on explicit instructions, whereas in real-world interactions, humans rarely issue instructions directly. Effective collaboration requires robots to infer user intentions proactively. In this work, we introduce cross-modal contextual instructions, a new setting where intent is derived from spoken dialogue, environmental sounds, and visual cues rather than explicit commands. To address this new setting, we present RoboOmni, a Perceiver-Thinker-Talker-Executor framework based on end-to-end omni-modal LLMs that unifies intention recognition, interaction confirmation, and action execution. RoboOmni fuses auditory and visual signals spatiotemporally for robust intention recognition, while supporting direct speech interaction. To address the absence of training data for proactive intention recognition in robotic manipulation, we build OmniAction, comprising 140k episodes, 5k+ speakers, 2.4k event sounds, 640 backgrounds, and six contextual instruction types. Experiments in simulation and real-world settings show that RoboOmni surpasses text- and ASR-based baselines in success rate, inference speed, intention recognition, and proactive assistance.
Beyond the Uncanny Valley: A Mixed-Method Investigation of Anthropomorphism in Protective Responses to Robot Abuse
Robots with anthropomorphic features are increasingly shaping how humans perceive and morally engage with them. Our research investigates how different levels of anthropomorphism influence protective responses to robot abuse, extending the Computers as Social Actors (CASA) and uncanny valley theories into a moral domain. In an experiment, we invite 201 participants to view videos depicting abuse toward a robot with low (Spider), moderate (Two-Foot), or high (Humanoid) anthropomorphism. To provide a comprehensive analysis, we triangulate three modalities: self-report surveys measuring emotions and uncanniness, physiological data from automated facial expression analysis, and qualitative reflections. Findings indicate that protective responses are not linear. The moderately anthropomorphic Two-Foot robot, rated highest in eeriness and "spine-tingling" sensations consistent with the uncanny valley, elicited the strongest physiological anger expressions. Self-reported anger and guilt are significantly higher for both the Two-Foot and Humanoid robots compared to the Spider. Qualitative findings further reveal that as anthropomorphism increases, moral reasoning shifts from technical assessments of property damage to condemnation of the abuser's character, while governance proposals expand from property law to calls for quasi-animal rights and broader societal responsibility. These results suggest that the uncanny valley does not dampen moral concern but paradoxically heightens protective impulses, offering critical implications for robot design, policy, and future legal frameworks.
Knolling Bot: Teaching Robots the Human Notion of Tidiness NeurIPS 2025
For robots to truly collaborate and assist humans, they must understand not only logic and instructions, but also the subtle emotions, aesthetics, and feelings that define our humanity. Human art and aesthetics are among the most elusive concepts, often difficult even for people to articulate, and without grasping these fundamentals, robots will be unable to help in many spheres of daily life. Consider the long-promised robotic butler: automating domestic chores demands more than motion planning. It requires an internal model of cleanliness and tidiness, a challenge largely unexplored by AI. To bridge this gap, we propose an approach that equips domestic robots to perform simple tidying tasks via knolling, the practice of arranging scattered items into neat, space-efficient layouts. Unlike the uniformity of industrial settings, household environments feature diverse objects and highly subjective notions of tidiness. Drawing inspiration from NLP, we treat knolling as a sequential prediction problem and employ a transformer-based model to forecast each object's placement. Our method learns a generalizable concept of tidiness, generates diverse solutions adaptable to varying object sets, and incorporates human preferences for personalized arrangements. This work represents a step forward in building robots that internalize human aesthetic sense and can genuinely co-create in our living spaces.
comment: Accepted at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Creative AI Track
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
Spatial reasoning in 3D space is central to human cognition and indispensable for embodied tasks such as navigation and manipulation. However, state-of-the-art vision-language models (VLMs) frequently struggle with tasks as simple as anticipating how a scene will look after an egocentric motion: they perceive 2D images but lack an internal model of 3D dynamics. We therefore propose MindJourney, a test-time scaling framework that grants a VLM this missing capability by coupling it to a controllable world model based on video diffusion. The VLM iteratively sketches a concise camera trajectory, while the world model synthesizes the corresponding view at each step. The VLM then reasons over this multi-view evidence gathered during the interactive exploration. Without any fine-tuning, MindJourney achieves an average performance boost of over 7.7% on the representative spatial reasoning benchmark SAT, showing that pairing VLMs with world models for test-time scaling offers a simple, plug-and-play route to robust 3D reasoning. Our method also improves upon test-time inference VLMs trained through reinforcement learning, which demonstrates the potential of using world models for test-time scaling.
comment: Project Page: https://umass-embodied-agi.github.io/MindJourney
World-Env: Leveraging World Model as a Virtual Environment for VLA Post-Training
Vision-Language-Action (VLA) models trained via imitation learning suffer from significant performance degradation in data-scarce scenarios due to their reliance on large-scale demonstration datasets. Although reinforcement learning (RL)-based post-training has proven effective in addressing data scarcity, its application to VLA models is hindered by the non-resettable nature of real-world environments. This limitation is particularly critical in high-risk domains such as industrial automation, where interactions often induce state changes that are costly or infeasible to revert. Furthermore, existing VLA approaches lack a reliable mechanism for detecting task completion, leading to redundant actions that reduce overall task success rates. To address these challenges, we propose World-Env, an RL-based post-training framework that replaces physical interaction with a low-cost, world model-based virtual simulator. World-Env consists of two key components: (1) a video-based world simulator that generates temporally consistent future visual observations, and (2) a vision-language model (VLM)-guided instant reflector that provides continuous reward signals and predicts action termination. This simulated environment enables VLA models to safely explore and generalize beyond their initial imitation learning distribution. Our method achieves notable performance gains with as few as five expert demonstrations per task. Experiments on complex robotic manipulation tasks demonstrate that World-Env effectively overcomes the data inefficiency, safety constraints, and inefficient execution of conventional VLA models that rely on real-world interaction, offering a practical and scalable solution for post-training in resource-constrained settings. Our code is available at https://github.com/amap-cvlab/world-env.
Multiagent Systems
A CPU-Centric Perspective on Agentic AI
Agentic AI frameworks add a decision-making orchestrator embedded with external tools, including web search, Python interpreter, contextual database, and others, on top of monolithic LLMs, turning them from passive text oracles into autonomous problem-solvers that can plan, call tools, remember past steps, and adapt on the fly. This paper aims to characterize and understand the system bottlenecks introduced by agentic AI workloads from a largely overlooked CPU-centric perspective. We first systematically characterize agentic AI on the basis of the orchestrator/decision-making component, inference path dynamics, and repetitiveness of the agentic flow, which directly influence system-level performance. Thereafter, based on the characterization, we choose five representative agentic AI workloads (Haystack RAG, Toolformer, ChemCrow, LangChain, and SWE-Agent) to profile latency, throughput, and energy metrics and demystify the significant impact of CPUs on these metrics relative to GPUs. We observe that: 1. Tool processing on CPUs can take up to 90.6% of the total latency; 2. Agentic throughput gets bottlenecked either by CPU factors (coherence, synchronization, and over-subscription of cores) or GPU factors (main memory capacity and bandwidth); 3. CPU dynamic energy consumes up to 44% of the total dynamic energy at large batch sizes. Based on the profiling insights, we present two key optimizations: 1. CPU and GPU-Aware Micro-batching (CGAM) and 2. Mixed Agentic Workload Scheduling (MAWS), for homogeneous and heterogeneous agentic workloads respectively, to demonstrate the potential to improve the performance, efficiency, and scalability of agentic AI. We achieve up to 2.1x and 1.41x P50 latency speedup compared to the multi-processing benchmark for homogeneous and heterogeneous agentic workloads respectively.
Leveraging Multi-Agent System (MAS) and Fine-Tuned Small Language Models (SLMs) for Automated Telecom Network Troubleshooting
Telecom networks are rapidly growing in scale and complexity, making effective management, operation, and optimization increasingly challenging. Although Artificial Intelligence (AI) has been applied to many telecom tasks, existing models are often narrow in scope, require large amounts of labeled data, and struggle to generalize across heterogeneous deployments. Consequently, network troubleshooting continues to rely heavily on Subject Matter Experts (SMEs) to manually correlate various data sources to identify root causes and corrective actions. To address these limitations, we propose a Multi-Agent System (MAS) that employs an agentic workflow, with Large Language Models (LLMs) coordinating multiple specialized tools for fully automated network troubleshooting. Once faults are detected by AI/ML-based monitors, the framework dynamically activates agents such as an orchestrator, solution planner, executor, data retriever, and root-cause analyzer to diagnose issues and recommend remediation strategies within a short time frame. A key component of this system is the solution planner, which generates appropriate remediation plans based on internal documentation. To enable this, we fine-tuned a Small Language Model (SLM) on proprietary troubleshooting documents to produce domain-grounded solution plans. Experimental results demonstrate that the proposed framework significantly accelerates troubleshooting automation across both Radio Access Network (RAN) and Core network domains.
comment: 6 pages, 7 figures, 1 table
AgentGit: A Version Control Framework for Reliable and Scalable LLM-Powered Multi-Agent Systems
With the rapid progress of large language models (LLMs), LLM-powered multi-agent systems (MAS) are drawing increasing interest across academia and industry. However, many current MAS frameworks struggle with reliability and scalability, especially on complex tasks. We present AgentGit, a framework that brings Git-like rollback and branching to MAS workflows. Built as an infrastructure layer on top of LangGraph, AgentGit supports state commit, revert, and branching, allowing agents to traverse, compare, and explore multiple trajectories efficiently. To evaluate AgentGit, we designed an experiment that optimizes target agents by selecting better prompts. We ran a multi-step A/B test against three baselines -- LangGraph, AutoGen, and Agno -- on a real-world task: retrieving and analyzing paper abstracts. Results show that AgentGit significantly reduces redundant computation, lowers runtime and token usage, and supports parallel exploration across multiple branches, enhancing both reliability and scalability in MAS development. This work offers a practical path to more robust MAS design and enables error recovery, safe exploration, iterative debugging, and A/B testing in collaborative AI systems.
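The commit, revert, and branch operations described above can be illustrated with a minimal sketch. This is not AgentGit's or LangGraph's actual interface: the `StateStore` and `Commit` classes, their method names, and the example state dictionaries are all hypothetical, chosen only to show how Git-like snapshots let a workflow return to an earlier state and explore an alternative trajectory without recomputing shared steps.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Commit:
    """One snapshot of agent state plus a pointer to its parent commit."""
    commit_id: int
    parent_id: Optional[int]
    state: Dict[str, Any]


@dataclass
class StateStore:
    """Hypothetical Git-like store for agent state (illustration only)."""
    commits: Dict[int, Commit] = field(default_factory=dict)
    branches: Dict[str, Optional[int]] = field(default_factory=lambda: {"main": None})
    current: str = "main"

    def commit(self, state: Dict[str, Any]) -> int:
        """Snapshot `state` on the current branch and return the new commit id."""
        commit_id = len(self.commits)
        parent = self.branches[self.current]
        self.commits[commit_id] = Commit(commit_id, parent, copy.deepcopy(state))
        self.branches[self.current] = commit_id
        return commit_id

    def branch(self, name: str, from_commit: int) -> None:
        """Create a branch whose head is an existing commit and switch to it."""
        self.branches[name] = from_commit
        self.current = name

    def revert(self, commit_id: int) -> Dict[str, Any]:
        """Move the current branch head back and return a copy of that state."""
        self.branches[self.current] = commit_id
        return copy.deepcopy(self.commits[commit_id].state)


# Usage: commit two steps on main, then branch from the first step to try
# a different prompt without repeating the shared "fetch" step.
store = StateStore()
c0 = store.commit({"prompt": "v1", "steps": ["fetch"]})
c1 = store.commit({"prompt": "v1", "steps": ["fetch", "analyze"]})
store.branch("prompt-v2", from_commit=c0)
c2 = store.commit({"prompt": "v2", "steps": ["fetch"]})
restored = store.revert(c0)
```

Snapshots are deep-copied on both commit and revert so that later mutation of a live state cannot alter history; a production system would more likely persist them than keep them in memory.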
Spatial Crowdsourcing-based Task Allocation for UAV-assisted Maritime Data Collection
Driven by the unceasing development of maritime services, tasks of unmanned aerial vehicle (UAV)-assisted maritime data collection (MDC) are becoming increasingly diverse, complex and personalized. As a result, effective task allocation for MDC is becoming increasingly critical. In this work, integrating the concept of spatial crowdsourcing (SC), we develop an SC-based MDC network model and investigate the task allocation problem for UAV-assisted MDC. In variable maritime service scenarios, tasks are allocated to UAVs based on the spatial and temporal requirements of the tasks, as well as the mobility of the UAVs. To address this problem, we design an SC-based task allocation algorithm for the MDC (SC-MDC-TA). The quality estimation is utilized to assess and regulate task execution quality by evaluating signal to interference plus noise ratio and the UAV energy consumption. The reverse auction is employed to potentially reduce the task waiting time as much as possible while ensuring timely completion. Additionally, we establish typical task allocation scenarios based on maritime service requirements indicated by electronic navigational charts. Simulation results demonstrate that the proposed SC-MDC-TA algorithm effectively allocates tasks for various MDC scenarios. Furthermore, compared to the benchmark, the SC-MDC-TA algorithm can also reduce the task completion time and lower the UAV energy consumption.
EvoMem: Improving Multi-Agent Planning with Dual-Evolving Memory
Planning has been a cornerstone of artificial intelligence for solving complex problems, and recent progress in LLM-based multi-agent frameworks has begun to extend this capability. However, the role of human-like memory within these frameworks remains largely unexplored. Understanding how agents coordinate through memory is critical for natural language planning, where iterative reasoning, constraint tracking, and error correction drive success. Inspired by the working memory model in cognitive psychology, we present EvoMem, a multi-agent framework built on a dual-evolving memory mechanism. The framework consists of three agents (Constraint Extractor, Verifier, and Actor) and two memory modules: Constraint Memory (CMem), which evolves across queries by storing task-specific rules and constraints while remaining fixed within a query, and Query-feedback Memory (QMem), which evolves within a query by accumulating feedback across iterations for solution refinement. Both memory modules are reset at the end of each query session. Evaluations on trip planning, meeting planning, and calendar scheduling show consistent performance improvements, highlighting the effectiveness of EvoMem. This success underscores the importance of memory in enhancing multi-agent planning.
Sherlock: Reliable and Efficient Agentic Workflow Execution
With the increasing adoption of large language models (LLMs), agentic workflows, which compose multiple LLM calls with tools, retrieval, and reasoning steps, are increasingly replacing traditional applications. However, such workflows are inherently error-prone: incorrect or partially correct output at one step can propagate or even amplify through subsequent stages, compounding the impact on the final output. Recent work proposes integrating verifiers that validate LLM output or actions, such as self-reflection, debate, or LLM-as-a-judge mechanisms. Yet, verifying every step introduces significant latency and cost overheads. In this work, we seek to answer three key questions: which nodes in a workflow are most error-prone and thus deserve costly verification, how to select the most appropriate verifier for each node, and how to use verification with minimal impact on latency? Our solution, Sherlock, addresses these by using counterfactual analysis on agentic workflows to identify error-prone nodes and by selectively attaching cost-optimal verifiers only where necessary. At runtime, Sherlock speculatively executes downstream tasks to reduce latency overhead, while verification runs in the background. If verification fails, execution is rolled back to the last verified output. Compared to the non-verifying baseline, Sherlock delivers an 18.3% accuracy gain on average across benchmarks. Sherlock reduces workflow execution time by up to 48.7% over non-speculative execution and lowers verification cost by 26.0% compared to the Monte Carlo search-based method, demonstrating that principled, fault-aware verification effectively balances efficiency and reliability in agentic workflows.
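The runtime behavior described above (speculative downstream execution, background verification, rollback on failure) can be sketched in a few lines. This is an illustrative toy, not Sherlock's implementation: `run_speculative`, the string-valued steps, and the boolean verifiers are assumptions made for the example, and a real system would verify only selected nodes and would retry or repair after a rollback rather than simply stop.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Step = Callable[[str], str]       # hypothetical workflow node: output -> next output
Verifier = Callable[[str], bool]  # hypothetical check on a node's output


def run_speculative(steps: List[Step], verifiers: List[Verifier],
                    start: str) -> Tuple[str, int]:
    """Verify each output in the background while the next step runs on it.

    Returns (last verified output, number of verified steps). On a failed
    check, the speculative result is discarded and the last verified output
    is returned (the rollback).
    """
    last_verified = start
    verified_count = 0
    if not steps:
        return last_verified, verified_count
    with ThreadPoolExecutor(max_workers=1) as pool:
        current = steps[0](start)
        for i in range(len(steps)):
            # Start verification of the current output in the background.
            check = pool.submit(verifiers[i], current)
            # Meanwhile, speculatively run the next step on the unverified output.
            speculative = steps[i + 1](current) if i + 1 < len(steps) else None
            if not check.result():
                # Verification failed: drop speculative work and roll back.
                return last_verified, verified_count
            last_verified = current
            verified_count += 1
            current = speculative
    return last_verified, verified_count


# Usage with three toy steps that each append one character.
steps = [lambda s: s + "a", lambda s: s + "b", lambda s: s + "c"]
all_pass = [lambda o: True] * 3
second_fails = [lambda o: True, lambda o: False, lambda o: True]

ok_result = run_speculative(steps, all_pass, "x")
rolled_back = run_speculative(steps, second_fails, "x")
```

When every check passes, the speculative outputs are simply promoted, so verification latency overlaps with useful work; when a check fails, only the work done since the last verified output is lost.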
STACKFEED: Structured Textual Actor-Critic Knowledge Base Editing with FeedBack
Large Language Models (LLMs) often generate incorrect or outdated information, especially in low-resource settings or when dealing with private data. To address this, Retrieval-Augmented Generation (RAG) uses external knowledge bases (KBs), but these can also suffer from inaccuracies. We introduce STACKFEED, a novel Structured Textual Actor-Critic Knowledge base editing with FEEDback approach that iteratively refines the KB based on expert feedback using a multi-actor, centralized critic reinforcement learning framework. STACKFEED defines a ReACT actor agent on each document to perform structured edits based on document-specific targeted instructions. Experimental results show that STACKFEED significantly improves KB quality and the performance of the RAG system. We evaluate STACKFEED on low-resource programming problems, modified Python packages, and factual question-answering tasks.
3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark EMNLP 25
Though Large Vision-Language Models (LVLMs) are being actively explored in medicine, their ability to conduct complex real-world telemedicine consultations combining accurate diagnosis with professional dialogue remains underexplored. This paper presents 3MDBench (Medical Multimodal Multi-agent Dialogue Benchmark), an open-source framework for simulating and evaluating LVLM-driven telemedical consultations. 3MDBench simulates patient variability through temperament-based Patient Agent and evaluates diagnostic accuracy and dialogue quality via Assessor Agent. It includes 2996 cases across 34 diagnoses from real-world telemedicine interactions, combining textual and image-based data. The experimental study compares diagnostic strategies for widely used open and closed-source LVLMs. We demonstrate that multimodal dialogue with internal reasoning improves F1 score by 6.5% over non-dialogue settings, highlighting the importance of context-aware, information-seeking questioning. Moreover, injecting predictions from a diagnostic convolutional neural network into the LVLM's context boosts F1 by up to 20%. Source code is available at https://github.com/univanxx/3mdbench.
comment: EMNLP 25 (main)
H-NeiFi: Non-Invasive and Consensus-Efficient Multi-Agent Opinion Guidance
The openness of social media enables the free exchange of opinions, but it also presents challenges in guiding opinion evolution towards global consensus. Existing methods often directly modify user views or enforce cross-group connections. These intrusive interventions undermine user autonomy, provoke psychological resistance, and reduce the efficiency of global consensus. Additionally, due to the lack of a long-term perspective, promoting local consensus often exacerbates divisions at the macro level. To address these issues, we propose the hierarchical, non-intrusive opinion guidance framework, H-NeiFi. It first establishes a two-layer dynamic model based on social roles, considering the behavioral characteristics of both experts and non-experts. Additionally, we introduce a non-intrusive neighbor filtering method that adaptively controls user communication channels. Using multi-agent reinforcement learning (MARL), we optimize information propagation paths through a long-term reward function, avoiding direct interference with user interactions. Experiments show that H-NeiFi increases consensus speed by 22.0% to 30.7% and maintains global convergence even in the absence of experts. This approach enables natural and efficient consensus guidance by protecting user interaction autonomy, offering a new paradigm for social network governance.
KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems NeurIPS2025
Multi-agent large language model (LLM) systems are increasingly adopted for complex language processing tasks that require communication and coordination among agents. However, these systems often suffer substantial overhead from repeated reprocessing of overlapping contexts across agents. In typical pipelines, once an agent receives a message from its predecessor, the full context, including prior turns, must be reprocessed from scratch, leading to inefficient processing. While key-value (KV) caching is an effective solution for avoiding redundant computation in single-agent settings where prefixes remain unchanged, it cannot be directly reused in multi-agent scenarios due to diverging prefixes introduced by agent-specific context extensions. We identify that the core challenge lies in the offset variance of KV-caches across agents. To address this, we propose KVCOMM, a training-free framework that enables efficient prefilling in multi-agent inference by reusing KV-caches and aligning cache offsets of overlapping contexts under diverse prefix contexts. KVCOMM estimates and adjusts KV-caches for shared content by referencing a pool of cached examples, termed anchors, that store observed cache deviations under varying prefixes. The anchor pool is maintained and updated online, allowing dynamic adaptation to distinct user requests and context structures. KVCOMM achieves over 70% reuse rate across diverse multi-agent workloads, including retrieval-augmented generation, math reasoning, and collaborative coding tasks, all without quality degradation. Particularly, when each fully-connected agent receives 1K input tokens with 512 prefix tokens and 512 output tokens under a five-agent setting, KVCOMM achieves up to 7.8x speedup compared to the standard prefill pipeline, reducing TTFT from ~430 ms to ~55 ms.
comment: Accepted for publication in NeurIPS2025. Code is available at https://github.com/FastMAS/KVCOMM
Systems and Control (CS)
Quantum Computing for EVs to Enhance Grid Resilience and Disaster Relief: Challenges and Opportunities
The power grid is the foundation of modern society; however, extreme weather events have increasingly caused widespread outages. Enhancing grid resilience is therefore critical to maintaining secure and reliable operations. In disaster relief and restoration, vehicle-to-grid (V2G) technology allows electric vehicles (EVs) to serve as mobile energy resources by discharging to support critical loads or regulating grid frequency as needed. Effective V2G operation requires coordinated charging and discharging of many EVs through optimization. Similarly, in grid restoration, EVs must be strategically routed to affected areas, forming the mobile charging station placement (CSP) problem, which presents another complex optimization challenge. This work reviews state-of-the-art optimization methods for V2G and mobile CSP applications, outlines their limitations, and explores how quantum computing (QC) could overcome current computational bottlenecks. A QC-focused perspective is presented on enhancing grid resilience and accelerating restoration as extreme weather events grow more frequent and severe.
comment: 11 pages, 0 figures, 2 tables, Submitted to IEEE Transactions on Smart Grid
Hybrid Quantum-Classical Optimization of the Resource Scheduling Problem
Resource scheduling is critical in many industries, especially in power systems. The Unit Commitment (UC) problem determines the on/off status and output levels of generators under many constraints. Traditional exact methods, such as mathematical programming or dynamic programming, remain the backbone of UC solution techniques, but they often rely on linear approximations or exhaustive search, leading to high computational burdens as system size grows. Metaheuristic approaches, such as genetic algorithms, particle swarm optimization, and other evolutionary methods, have been explored to mitigate this complexity; however, they typically lack optimality guarantees, exhibit sensitivity to initial conditions, and can become prohibitively time-consuming for large-scale systems. In this paper, we introduce a quantum-classical hybrid algorithm for UC and, by extension, other resource scheduling problems, that leverages Benders decomposition to decouple binary commitment decisions from continuous economic dispatch. The binary master problem is formulated as a quadratic unconstrained binary optimization model and solved on a quantum annealer. The continuous subproblem minimizes generation costs, with Lagrangian cuts feeding back to the master until convergence. We evaluate our hybrid framework on systems scaled from 10 to 1,000 generation units. Compared against a classical mixed-integer nonlinear programming baseline, the hybrid algorithm achieves a consistently lower computation-time growth rate and maintains an absolute optimality gap below 1.63%. These results demonstrate that integrating quantum annealing within a hybrid quantum-classical Benders decomposition loop can significantly accelerate large-scale resource scheduling without sacrificing solution quality, pointing toward a viable path for addressing the escalating complexity of modern power grids.
comment: 13 pages, 7 figures, 1 table, Submitted to Next Research
Unveiling Uniform Shifted Power Law in Stochastic Human and Autonomous Driving Behavior
Accurately simulating rare but safety-critical driving behaviors is essential for the evaluation and certification of autonomous vehicles (AVs). However, current models often fail to reproduce realistic collision rates when calibrated on real-world data, largely due to inadequate representation of long-tailed behavioral distributions. Here, we uncover a simple yet unifying shifted power law that robustly characterizes the stochasticity of both human-driven vehicle (HV) and AV behaviors, especially in the long-tail regime. The model adopts a parsimonious analytical form with only one or two parameters, enabling efficient calibration even under data sparsity. On large-scale, micro-level trajectory data from global HV and AV datasets, the shifted power law achieves an average $R^2$ of 0.97 and a nearly identical tail distribution, uniformly fitting both frequent behaviors and rare safety-critical deviations and significantly outperforming existing Gaussian-based baselines. When integrated into an agent-based traffic simulator, it enables forward-rolling simulations that reproduce realistic crash patterns for both HVs and AVs, achieving rates consistent with real-world statistics and improving the fidelity of safety assessment without post hoc correction. This discovery offers a unified and data-efficient foundation for modeling high-risk behavior and improves the fidelity of simulation-based safety assessments for mixed AV/HV traffic. The shifted power law provides a promising path toward simulation-driven validation and global certification of AV technologies.
Frequency Quality Assessment of GFM and GFL Converters and Synchronous Condensers
This paper compares the impact of different conventional and emerging technologies and control strategies on frequency quality. We study, in particular, the long-term dynamic performance of grid-forming (GFM) and grid-following (GFL) inverter-based resources (IBRs) as well as conventional synchronous machines. Extensive simulations and several realistic scenarios consider both short-term and long-term aspects of frequency quality. It is shown that, while overall GFM IBRs significantly improve frequency quality, a combination of GFL IBRs providing frequency support such as wind and batteries, and synchronous condensers, might be enough to meet similar frequency quality standards. Another result of the paper is that the need for automatic generation control (AGC) becomes less clear in GFM IBR-dominated grids from a frequency quality perspective.
Adaptive Federated Learning to Optimize the MultiCast flows in Data Centers
Data centers play an increasingly critical role in societal digitalization, yet their rapidly growing energy demand poses significant challenges for sustainable operation. To enhance the energy efficiency of geographically distributed data centers, this paper formulates a multi-period optimization model that captures the interdependence of electricity, heat, and data flows. The optimization of such multicast flows inherently involves mixed-integer formulations and the access to proprietary or sensitive datasets, which correspondingly exacerbate computational complexity and raise data-privacy concerns. To address these challenges, an adaptive federated learning-to-optimization approach is proposed, accounting for the heterogeneity of datasets across distributed data centers. To safeguard privacy, cryptography techniques are leveraged in both the learning and optimization processes. A model acceptance criterion with convergence guarantee is developed to improve learning performance and filter out potentially contaminated data, while a verifiable double aggregation mechanism is further proposed to simultaneously ensure privacy and integrity of shared data during optimization. Theoretical analysis and numerical simulations demonstrate that the proposed approach preserves the privacy and integrity of shared data, achieves near-optimal performance, and exhibits high computational efficiency, making it suitable for large-scale data center optimization under privacy constraints.
Efficiency and Optimality in Electrochemical Battery Model Parameter Identification: A Comparative Study of Estimation Techniques
Parameter identification for electrochemical battery models has always been challenging due to the multitude of parameters involved, most of which cannot be directly measured. This paper evaluates the efficiency and optimality of three widely used parameter identification methods for electrochemical battery models: Least Squares Method (LS), Particle Swarm Optimization (PSO), and Genetic Algorithm (GA). To this end, a Single Particle Model (SPM) of a battery was developed and discretized. Battery parameter grouping was then performed to reduce the number of parameters required. Using a set of parameters previously identified from a real battery as a benchmark, we generated fitting and validation datasets to assess the methods' runtime and accuracy. The comparative analysis reveals that PSO outperforms the other methods in terms of accuracy and stability, making it highly effective for parameter identification when there is no prior knowledge of the battery's internal parameters. In contrast, LS is better suited for minor adjustments in parameters, particularly for aging batteries, whereas GA lags behind PSO in both computational efficiency and optimality.
comment: Accepted and published in the Proceedings of the 2024 10th International Conference on Optimization and Applications (ICOA), IEEE, 2024. Copyright 2024 IEEE. This is the author's accepted manuscript; the final version is available at IEEE Xplore (DOI: 10.1109/ICOA62581.2024.10754301)
Digital Twin of Aerosol Jet Printing
Aerosol Jet (AJ) printing is a versatile additive manufacturing technique capable of producing high-resolution interconnects on both 2D and 3D substrates. The AJ process is complex and dynamic with many hidden and unobservable states that influence the machine performance, including aerosol particle diameter, aerosol carrier density, vial level, and ink deposition in the tube and nozzle. Despite its promising potential, the widespread adoption of AJ printing is limited by inconsistencies in print quality that often stem from variability in these hidden states. To address these challenges, we develop a digital twin model of the AJ process that offers real-time insights into the machine's operations. The digital twin is built around a physics-based macro-model created through simulation and experimentation. The states and parameters of the digital model are continuously updated using probabilistic sequential estimation techniques to closely align with real-time measurements extracted from the AJ system's sensor and video data. The result is a digital model of the AJ process that continuously evolves over a physical machine's lifecycle. The digital twin enables accurate monitoring of unobservable physical characteristics, detects and predicts anomalous behavior, and forecasts the effect of control adjustments. This work presents a comprehensive end-to-end digital twin framework that integrates customized computer vision techniques, physics-based macro-modeling, and advanced probabilistic estimation methods to construct an evolving digital representation of the AJ equipment and process. While the methodologies are customized for aerosol jet printing, the process for constructing the digital twin can be applied for other advanced manufacturing techniques.
Towards Quantum Algorithms for the Optimization of Spanning Trees: The Power Distribution Grids Use Case
Optimizing the topology of networks is an important challenge across engineering disciplines. In energy systems, network reconfiguration can substantially reduce losses and costs and thus support the energy transition. Unfortunately, many related optimization problems are NP hard, restricting practical applications. In this article, we address the problem of minimizing losses in radial networks, a problem that routinely arises in distribution grid operation. We show that even the computation of approximate solutions is computationally hard and propose quantum optimization as a promising alternative. We derive two quantum algorithmic primitives based on the Quantum Alternating Operator Ansatz (QAOA) that differ in the sampling of network topologies: a tailored sampling of radial topologies and simple sampling with penalty terms to suppress non-radial topologies. We show how to apply these algorithmic primitives to distribution grid reconfiguration and quantify the necessary quantum resources.
FTT-GRU: A Hybrid Fast Temporal Transformer with GRU for Remaining Useful Life Prediction
Accurate prediction of the remaining useful life (RUL) of industrial machinery is essential for reducing downtime and optimizing maintenance schedules. Existing approaches, such as long short-term memory (LSTM) networks and convolutional neural networks (CNNs), often struggle to model both global temporal dependencies and fine-grained degradation trends in multivariate sensor data. We propose a hybrid model, FTT-GRU, which combines a Fast Temporal Transformer (FTT) -- a lightweight Transformer variant using linearized attention via fast Fourier transform (FFT) -- with a gated recurrent unit (GRU) layer for sequential modeling. To the best of our knowledge, this is the first application of an FTT with a GRU for RUL prediction on NASA CMAPSS, enabling simultaneous capture of global and local degradation patterns in a compact architecture. On CMAPSS FD001, FTT-GRU attains RMSE 30.76, MAE 18.97, and $R^2=0.45$, with 1.12 ms CPU latency at batch=1. Relative to the best published deep baseline (TCN--Attention), it improves RMSE by 1.16\% and MAE by 4.00\%. Training curves averaged over $k=3$ runs show smooth convergence with narrow 95\% confidence bands, and ablations (GRU-only, FTT-only) support the contribution of both components. These results demonstrate that a compact Transformer-RNN hybrid delivers accurate and efficient RUL predictions on CMAPSS, making it suitable for real-time industrial prognostics.
comment: 5 pages, The 2025 International Conference on Computational Science and Computational Intelligence
Rotatable Antenna System Empowered Low-Altitude Economy: Opportunities and Challenges
Low-altitude economy (LAE) is an emerging technological paradigm that enables continuous airspace coverage at multiple altitudes by providing highly reliable data connectivity for numerous low-altitude applications. However, existing networks cannot sufficiently support LAE development, as current base stations (BSs) are primarily designed for terrestrial users and lack the capability to provide continuous coverage at low altitudes. To overcome these challenges, a rotatable antenna system (RAS) is introduced in LAE, enabling flexible beamforming by dynamically adjusting the boresight of directional antennas to extend low-altitude coverage and enhance the stability of data transmission. In this article, we first provide an overview of RAS-empowered LAE applications, including low-altitude communication, sensing, control, and computation. Then, we present two practical RAS deployment strategies for LAE scenarios, namely RAS-aided multi-BS and multi-unmanned aerial vehicle (UAV) cooperative coverage, and discuss their system architectures and performance benefits in detail. Additionally, key design issues of RAS in LAE are discussed, including channel modeling and estimation, cellular access and interference cancellation, and RAS configuration and boresight optimization. Finally, we demonstrate the performance gains of RAS in LAE networks through experimental and simulation results.
comment: 8 pages, 5 figures, accepted in IEEE Wireless Communication (Early Access)
Image-based ground distance detection for crop-residue-covered soil
Conservation agriculture features a soil surface covered with crop residues, which improves soil health and saves water. However, one significant challenge in conservation agriculture lies in precisely controlling the seeding depth on soil covered with crop residues. This is constrained by the lack of ground distance information, since current distance measurement techniques, such as laser, ultrasonic, or mechanical displacement sensors, cannot distinguish whether the distance information comes from the residue or the soil. This paper presents an image-based method for obtaining ground distance information for crop-residue-covered soil. The method uses a 3D camera and an RGB camera to obtain a depth image and a color image at the same time. The color image is used to distinguish residue areas from soil areas and to generate a mask image. The mask image is applied to the depth image so that only depth information from soil areas is used to calculate the ground distance, while residue areas are recognized and excluded from ground distance detection. Experiments show that this distance measurement method is feasible for real-time implementation, with a measurement error within plus or minus 3 mm. It can be applied in conservation agriculture machinery for precision depth seeding, as well as in other applications that demand depth control, such as transplanting or tillage.
comment: under review at Computers and Electronics in Agriculture
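The masking step described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `ground_distance`, the nested-list image layout, the boolean soil mask, and the choice of the median as the summary statistic are all hypothetical.

```python
from statistics import median


def ground_distance(depth_mm, soil_mask):
    """Estimate the sensor-to-soil distance from soil pixels only.

    depth_mm  : 2D list of depth readings in millimetres (assumed layout)
    soil_mask : 2D list of booleans, True where the colour image was
                classified as soil, False where it was classified as residue
    Returns the median depth over soil pixels, or None if no soil is visible.
    """
    # Keep only depth readings whose mask entry marks them as soil.
    soil_depths = [
        d
        for depth_row, mask_row in zip(depth_mm, soil_mask)
        for d, is_soil in zip(depth_row, mask_row)
        if is_soil
    ]
    if not soil_depths:
        return None
    # The median is an assumed choice; it damps isolated outlier readings.
    return median(soil_depths)


# Toy 2x3 frame: residue pixels (False) read closer than the soil surface.
depth = [[500, 480, 502], [470, 501, 499]]
mask = [[True, False, True], [False, True, True]]
print(ground_distance(depth, mask))
```

Excluding residue pixels before averaging is what keeps the shorter residue readings (480 and 470 in the toy frame) from biasing the estimate toward the sensor.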
On Improvisation and Open-Endedness: Insights for Experiential AI AAAI 2026
Improvisation, the art of spontaneous creation that unfolds moment-to-moment without a scripted outcome, requires practitioners to continuously sense, adapt, and create anew. It is a fundamental mode of human creativity spanning music, dance, and everyday life. The open-ended nature of improvisation produces a stream of novel, unrepeatable moments, an aspect highly valued in artistic creativity. In parallel, open-endedness (OE), a system's capacity for unbounded novelty and endless "interestingness", is exemplified in natural or cultural evolution and has been considered "the last grand challenge" in artificial life (ALife). The rise of generative AI now raises the question in computational creativity (CC) research: What makes a "good" improvisation for AI? Can AI learn to improvise in a genuinely open-ended way? In this work-in-progress paper, we report insights from in-depth interviews with 6 experts in improvisation across dance, music, and contact improvisation. We draw systemic connections between human improvisational arts and the design of future experiential AI agents that could improvise alone or alongside humans, or even with other AI agents, embodying qualities of improvisation drawn from practice: active listening (umwelt and awareness); being in the time (mindfulness and ephemerality); embracing the unknown (source of randomness and serendipity); non-judgmental flow (acceptance and dynamical stability); balancing structure and surprise (unpredictable criticality at edge of chaos); imaginative metaphor (synaesthesia and planning); empathy, trust, boundary, and care (mutual theory of mind); and playfulness and intrinsic motivation (maintaining interestingness).
comment: Submitted to AAAI 2026 Creative AI for Live Interactive Performances Workshop (CLIP) as a work-in-progress paper
Descriptive Model-based Learning and Control for Bipedal Locomotion
Bipedal balance is challenging due to its multi-phase, hybrid nature and high-dimensional state space. Traditional balance control approaches for bipedal robots rely on low-dimensional models for locomotion planning and reactive control, constraining the full robot to behave like these simplified models. This involves tracking preset reference paths for the Center of Mass and upper body obtained through low-dimensional models, often resulting in inefficient walking patterns with bent knees. However, we observe that bipedal balance is inherently low-dimensional and can be effectively described with simple state and action descriptors in a low-dimensional state space. This allows the robot's motion to evolve freely in its high-dimensional state space, only constraining its projection in the low-dimensional state space. In this work, we propose a novel control approach that avoids prescribing a low-dimensional model to the full model. Instead, our control framework uses a descriptive model with the minimum degrees of freedom necessary to maintain balance, allowing the remaining degrees of freedom to evolve freely in the high-dimensional space. This results in an efficient human-like walking gait and improved robustness.
comment: 8 pages, 15 figures
Optimization of continuous-flow over traffic networks with fundamental diagram constraints
Optimal transport (OT) theory provides a principled framework for modeling mass movement in applications such as mobility, logistics, and economics. Classical formulations, however, generally ignore capacity limits that are intrinsic in applications, in particular in traffic flow problems. We address this limitation by incorporating fundamental diagrams into a dynamic continuous-flow OT model on graphs, thereby including empirical relations between local density and maximal flux. We adopt an Eulerian kinetic action on graphs that preserves displacement interpolation in direct analogy with the continuous theory. Momentum lives on edges and density on nodes, mirroring road-network semantics in which segments carry speed and intersections store mass. The resulting fundamental-diagram-constrained OT problem preserves mass conservation and admits a convex variational discretization, yielding optimal congestion-aware traffic flow over road networks. We establish the existence and uniqueness of the optimal flow with sources and sinks, and develop an efficient convex optimization method. Numerical studies begin with a single-lane line network and scale to a city-level road network.
comment: 13 pages, 5 figures
Symbol Detection in a MIMO Wireless Communication System Using a FeFET-coupled CMOS Ring Oscillator Array
Symbol decoding in multiple-input multiple-output (MIMO) wireless communication systems requires fast, energy-efficient computing hardware deployable at the edge. The brute-force, exact maximum likelihood (ML) decoder, solved on conventional classical digital hardware, has exponential time complexity. Approximate classical solvers implemented on the same hardware have polynomial time complexity at best. In this article, we design an alternative ring-oscillator-based coupled oscillator array to act as an oscillator Ising machine (OIM) and heuristically solve the ML-based MIMO detection problem. Complementary metal oxide semiconductor (CMOS) technology is used to design the ring oscillators, and ferroelectric field effect transistor (FeFET) technology is chosen as the coupling element (X) between the oscillators in this CMOS + X OIM design. For this purpose, we experimentally report a high linear range of conductance variation (1 µS to 60 µS) in a FeFET device fabricated at the 28 nm high-K/metal gate (HKMG) CMOS technology node. We incorporate the conductance modulation characteristic in SPICE simulations of the ring oscillators connected in an all-to-all fashion through a crossbar array of these FeFET devices. We show that this range of conductance variation is suitable for obtaining optimum OIM performance, with no significant performance drop up to a MIMO size of 100 transmitting and 100 receiving antennas, thereby making the FeFET a suitable device for this application. Our simulations and associated analysis using the Kuramoto model of oscillators also predict that this classical analog OIM, if implemented experimentally, will offer logarithmic scaling of computation time with MIMO size, a large improvement in computation speed over the aforementioned MIMO decoders run on conventional digital hardware.
comment: 58 pages including supplementary information, 5 main figures, 4 main tables, 2 supplementary figures, 2 supplementary tables
CT-ESKF: A General Framework of Covariance Transformation-Based Error-State Kalman Filter
The invariant extended Kalman filter (InEKF) offers trajectory independence and better consistency than the conventional extended Kalman filter (EKF). However, when applied to scenarios involving both global-frame and body-frame observations, the InEKF may fail to preserve its trajectory-independent property. This work introduces the concept of equivalence between error states and covariance matrices among different error-state Kalman filters, and shows that although the InEKF exhibits trajectory independence, its covariance propagation is actually equivalent to that of the EKF. A covariance transformation-based error-state Kalman filter (CT-ESKF) framework is proposed that unifies various error-state Kalman filtering algorithms. The framework gives rise to novel filtering algorithms that demonstrate improved performance in integrated navigation systems that incorporate both global-frame and body-frame observations. Experimental results show that the EKF with covariance transformation outperforms both the InEKF and the original EKF in a representative INS/GNSS/Odometer integrated navigation system.
comment: 19 pages, 12 figures
Large Language Models for Control
This paper investigates using large language models (LLMs) to generate control actions directly, without requiring control-engineering expertise or hand-tuned algorithms. We implement several variants: (i) prompt-only, (ii) tool-assisted with access to historical data, and (iii) prediction-assisted using learned or simple models to score candidate actions. We compare them on tracking accuracy and actuation effort, with and without a prompt that requests lower actuator usage. Results show prompt-only LLMs already produce viable control, while tool-augmented versions adapt better to changing objectives but can be more sensitive to constraints, supporting LLM-in-the-loop control for evolving cyber-physical systems today and operator and human inputs.
Continuous Classification Aggregation
We prove that any optimal, independent, and zero unanimous fuzzy classification aggregation function of a continuum of individual classifications of $m\ge 3$ objects into $2\le p\le m$ types must be a weighted arithmetic mean. We also provide a characterization for the case when $m=p=2$.
comment: 18 pages
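To make the form of the result above concrete, the sketch below aggregates individual fuzzy classifications by a weighted arithmetic mean. It only illustrates what such a rule looks like: it assumes finitely many individuals rather than the continuum treated in the paper, and the function name `aggregate`, the matrix layout, and the weights are hypothetical.

```python
def aggregate(classifications, weights):
    """Weighted arithmetic mean of individual fuzzy classifications.

    classifications : list (one per individual) of m x p matrices, where
                      entry [j][k] is the degree to which object j belongs
                      to type k (assumed layout)
    weights         : non-negative weights, one per individual, summing to 1
    Returns the aggregated m x p matrix.
    """
    m = len(classifications[0])      # number of objects
    p = len(classifications[0][0])   # number of types
    return [
        [
            sum(w * c[j][k] for w, c in zip(weights, classifications))
            for k in range(p)
        ]
        for j in range(m)
    ]


# Two individuals classify m = 3 objects into p = 2 types.
a = [[1, 0], [0, 1], [1, 0]]
b = [[0, 1], [0, 1], [1, 0]]
result = aggregate([a, b], [0.25, 0.75])
print(result)
```

Where the two individuals agree (the second and third objects), the aggregate reproduces their common classification; where they disagree, the result splits according to the weights.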
Imperfect Competition in Markets for Short-Circuit Current Services
An important limitation of Inverter-Based Resources (IBR) is their reduced contribution to Short-Circuit Current (SCC) compared with that of Synchronous Generators (SGs). With increasing penetration of IBR in most power systems, declining SCC poses challenges to secure system operation, as line protections may not trip when required. To address this issue, the SCC ancillary service could be procured via an economic mechanism that aims to secure adequate SCC on all buses. However, the suitability of markets for SCC services is not well understood, given that these could be prone to market-power issues: since the SCC contributions from various SGs to a certain bus are determined by the electrical topology of the grid, this is a highly local service. It is necessary to understand whether SGs at advantageous electrical locations could exert market power and, if so, how it could be mitigated. To fill this gap, this paper adopts an SCC-constrained bilevel model to investigate the strategic behavior of SGs. To address the non-convexity due to unit commitment variables, the model is restructured through a primal-dual formulation. Based on a modified IEEE 30-bus system, cases with strategic SGs placed at different buses are analyzed. These studies demonstrate that agents exerting market power could achieve up to three times the revenue from SCC provision, highlighting the need to carefully design these markets.
comment: Ancillary services, short-circuit current, market power, bilevel optimization, primal-dual formulation. A paper submitted to
Distributionally Robust Control Synthesis for Stochastic Systems with Safety and Reach-Avoid Specifications
We investigate the problem of synthesizing distributionally robust control policies for stochastic systems under safety and reach-avoid specifications. Using a game-theoretical framework, we consider the setting where the probability distribution of the disturbance at each time step is selected from an ambiguity set defined by the Wasserstein distance. The goal is to synthesize a distributionally robust control policy that ensures the satisfaction probability exceeds a specified threshold under any distribution within the ambiguity set. First, for both safety and reach-avoid specifications, we establish the existence of optimal policies by leveraging the dynamic programming principles. Then we demonstrate how the associated optimization problem can be efficiently solved using the dual representation of Wasserstein distributionally robust optimization. Furthermore, for safety specifications in particular, we introduce a novel concept of distributionally robust control barrier certificates and show how these enable the efficient synthesis of controllers through sum-of-squares programming techniques. Finally, our experimental results reveal that incorporating distributional robustness during the synthesis phase significantly improves the satisfaction probability during online execution, even with limited statistical knowledge of the disturbance distribution.
Introducing Coherent-Control Koopman Modeling to Reservoir Scale Porous Media Flow Studies
Accurate and robust surrogate modeling is essential for the real-time control and optimization of large-scale subsurface systems, such as geological CO2 storage and waterflood management. This study investigates the limits of classical Dynamic Mode Decomposition with control (DMDc) and introduces CCKM as a robust alternative for enforcing control in pressure and water saturation reservoir dynamics under challenging prediction scenarios. We introduce a control-coherent incremental ($\Delta$) CCKM formulation, in which the field update is driven by actuator changes rather than by actuator levels as in the original level formulation, and compare both formulations against DMDc and a Hybrid B-only surrogate that reuses DMDc's bottom-B (same-step feed-through). Two representative cases are considered: (i) an out-of-distribution shut-in and restart case, and (ii) an in-distribution bottomhole pressure (BHP) drawdown. Results show that only CCKM consistently maintains stability and accuracy across both scenarios, achieving sub-bar mean absolute error and sub-percent Frobenius norm percent change error (FPCE) even under regime shifts, while DMDc exhibits large unphysical errors during control transients. The findings demonstrate that strict control-coherence is critical for reliable surrogate modeling, particularly in settings with abrupt changes in control strategy. The proposed framework is broadly applicable to real-time reservoir optimization and can be integrated seamlessly into existing optimization and monitoring workflows, enabling fast and trustworthy decision support in the presence of both expected and unexpected actuation regimes.
Systems and Control (EESS)
Quantum Computing for EVs to Enhance Grid Resilience and Disaster Relief: Challenges and Opportunities
The power grid is the foundation of modern society; however, extreme weather events have increasingly caused widespread outages. Enhancing grid resilience is therefore critical to maintaining secure and reliable operations. In disaster relief and restoration, vehicle-to-grid (V2G) technology allows electric vehicles (EVs) to serve as mobile energy resources by discharging to support critical loads or regulating grid frequency as needed. Effective V2G operation requires coordinated charging and discharging of many EVs through optimization. Similarly, in grid restoration, EVs must be strategically routed to affected areas, forming the mobile charging station placement (CSP) problem, which presents another complex optimization challenge. This work reviews state-of-the-art optimization methods for V2G and mobile CSP applications, outlines their limitations, and explores how quantum computing (QC) could overcome current computational bottlenecks. A QC-focused perspective is presented on enhancing grid resilience and accelerating restoration as extreme weather events grow more frequent and severe.
comment: 11 pages, 0 figures, 2 tables, Submitted to IEEE Transactions on Smart Grid
Hybrid Quantum-Classical Optimization of the Resource Scheduling Problem
Resource scheduling is critical in many industries, especially in power systems. The Unit Commitment (UC) problem determines the on/off status and output levels of generators under many constraints. Traditional exact methods, such as mathematical programming or dynamic programming, remain the backbone of UC solution techniques, but they often rely on linear approximations or exhaustive search, leading to high computational burdens as system size grows. Metaheuristic approaches, such as genetic algorithms, particle swarm optimization, and other evolutionary methods, have been explored to mitigate this complexity; however, they typically lack optimality guarantees, exhibit sensitivity to initial conditions, and can become prohibitively time-consuming for large-scale systems. In this paper, we introduce a quantum-classical hybrid algorithm for UC and, by extension, other resource scheduling problems, that leverages Benders decomposition to decouple binary commitment decisions from continuous economic dispatch. The binary master problem is formulated as a quadratic unconstrained binary optimization model and solved on a quantum annealer. The continuous subproblem minimizes generation costs, with Lagrangian cuts fed back to the master problem until convergence. We evaluate our hybrid framework on systems scaled from 10 to 1,000 generation units. Compared against a classical mixed-integer nonlinear programming baseline, the hybrid algorithm achieves a consistently lower computation-time growth rate and maintains an absolute optimality gap below 1.63%. These results demonstrate that integrating quantum annealing within a hybrid quantum-classical Benders decomposition loop can significantly accelerate large-scale resource scheduling without sacrificing solution quality, pointing toward a viable path for addressing the escalating complexity of modern power grids.
comment: 13 pages, 7 figures, 1 table. Submitted to Next Research
Unveiling Uniform Shifted Power Law in Stochastic Human and Autonomous Driving Behavior
Accurately simulating rare but safety-critical driving behaviors is essential for the evaluation and certification of autonomous vehicles (AVs). However, current models often fail to reproduce realistic collision rates when calibrated on real-world data, largely due to inadequate representation of long-tailed behavioral distributions. Here, we uncover a simple yet unifying shifted power law that robustly characterizes the stochasticity of both human-driven vehicle (HV) and AV behaviors, especially in the long-tail regime. The model adopts a parsimonious analytical form with only one or two parameters, enabling efficient calibration even under data sparsity. On large-scale, micro-level trajectory data from global HV and AV datasets, the shifted power law achieves an average $R^2$ of 0.97 and a nearly identical tail distribution. It fits both frequent behaviors and rare safety-critical deviations uniformly, significantly outperforming existing Gaussian-based baselines. When integrated into an agent-based traffic simulator, it enables forward-rolling simulations that reproduce realistic crash patterns for both HVs and AVs, achieving rates consistent with real-world statistics and improving the fidelity of safety assessment without post hoc correction. This discovery offers a unified and data-efficient foundation for modeling high-risk behavior and improves the fidelity of simulation-based safety assessments for mixed AV/HV traffic. The shifted power law provides a promising path toward simulation-driven validation and global certification of AV technologies.
Frequency Quality Assessment of GFM and GFL Converters and Synchronous Condensers
This paper compares the impact of different conventional and emerging technologies and control strategies on frequency quality. We study, in particular, the long-term dynamic performance of grid-forming (GFM) and grid-following (GFL) inverter-based resources (IBRs) as well as conventional synchronous machines. Extensive simulations and several realistic scenarios consider both short-term and long-term aspects of frequency quality. It is shown that, while overall GFM IBRs significantly improve frequency quality, a combination of GFL IBRs providing frequency support such as wind and batteries, and synchronous condensers, might be enough to meet similar frequency quality standards. Another result of the paper is that the need for automatic generation control (AGC) becomes less clear in GFM IBR-dominated grids from a frequency quality perspective.
Adaptive Federated Learning to Optimize the MultiCast flows in Data Centers
Data centers play an increasingly critical role in societal digitalization, yet their rapidly growing energy demand poses significant challenges for sustainable operation. To enhance the energy efficiency of geographically distributed data centers, this paper formulates a multi-period optimization model that captures the interdependence of electricity, heat, and data flows. The optimization of such multicast flows inherently involves mixed-integer formulations and the access to proprietary or sensitive datasets, which correspondingly exacerbate computational complexity and raise data-privacy concerns. To address these challenges, an adaptive federated learning-to-optimization approach is proposed, accounting for the heterogeneity of datasets across distributed data centers. To safeguard privacy, cryptography techniques are leveraged in both the learning and optimization processes. A model acceptance criterion with convergence guarantee is developed to improve learning performance and filter out potentially contaminated data, while a verifiable double aggregation mechanism is further proposed to simultaneously ensure privacy and integrity of shared data during optimization. Theoretical analysis and numerical simulations demonstrate that the proposed approach preserves the privacy and integrity of shared data, achieves near-optimal performance, and exhibits high computational efficiency, making it suitable for large-scale data center optimization under privacy constraints.
Efficiency and Optimality in Electrochemical Battery Model Parameter Identification: A Comparative Study of Estimation Techniques
Parameter identification for electrochemical battery models has always been challenging due to the multitude of parameters involved, most of which cannot be directly measured. This paper evaluates the efficiency and optimality of three widely used parameter identification methods for electrochemical battery models: Least Squares Method (LS), Particle Swarm Optimization (PSO), and Genetic Algorithm (GA). To this end, a Single Particle Model (SPM) of a battery was developed and discretized. Battery parameter grouping was then performed to reduce the number of parameters required. Using a set of parameters previously identified from a real battery as a benchmark, we generated fitting and validation datasets to assess the methods' runtime and accuracy. The comparative analysis reveals that PSO outperforms the other methods in terms of accuracy and stability, making it highly effective for parameter identification when there is no prior knowledge of the battery's internal parameters. In contrast, LS is better suited for minor adjustments in parameters, particularly for aging batteries, whereas GA lags behind PSO in both computational efficiency and optimality.
comment: Accepted and published in the Proceedings of the 2024 10th International Conference on Optimization and Applications (ICOA), IEEE, 2024. Copyright 2024 IEEE. This is the author's accepted manuscript; the final version is available at IEEE Xplore (DOI: 10.1109/ICOA62581.2024.10754301)
Digital Twin of Aerosol Jet Printing
Aerosol Jet (AJ) printing is a versatile additive manufacturing technique capable of producing high-resolution interconnects on both 2D and 3D substrates. The AJ process is complex and dynamic with many hidden and unobservable states that influence the machine performance, including aerosol particle diameter, aerosol carrier density, vial level, and ink deposition in the tube and nozzle. Despite its promising potential, the widespread adoption of AJ printing is limited by inconsistencies in print quality that often stem from variability in these hidden states. To address these challenges, we develop a digital twin model of the AJ process that offers real-time insights into the machine's operations. The digital twin is built around a physics-based macro-model created through simulation and experimentation. The states and parameters of the digital model are continuously updated using probabilistic sequential estimation techniques to closely align with real-time measurements extracted from the AJ system's sensor and video data. The result is a digital model of the AJ process that continuously evolves over a physical machine's lifecycle. The digital twin enables accurate monitoring of unobservable physical characteristics, detects and predicts anomalous behavior, and forecasts the effect of control adjustments. This work presents a comprehensive end-to-end digital twin framework that integrates customized computer vision techniques, physics-based macro-modeling, and advanced probabilistic estimation methods to construct an evolving digital representation of the AJ equipment and process. While the methodologies are customized for aerosol jet printing, the process for constructing the digital twin can be applied for other advanced manufacturing techniques.
Towards Quantum Algorithms for the Optimization of Spanning Trees: The Power Distribution Grids Use Case
Optimizing the topology of networks is an important challenge across engineering disciplines. In energy systems, network reconfiguration can substantially reduce losses and costs and thus support the energy transition. Unfortunately, many related optimization problems are NP-hard, restricting practical applications. In this article, we address the problem of minimizing losses in radial networks, a problem that routinely arises in distribution grid operation. We show that even the computation of approximate solutions is computationally hard and propose quantum optimization as a promising alternative. We derive two quantum algorithmic primitives based on the Quantum Alternating Operator Ansatz (QAOA) that differ in the sampling of network topologies: a tailored sampling of radial topologies and simple sampling with penalty terms to suppress non-radial topologies. We show how to apply these algorithmic primitives to distribution grid reconfiguration and quantify the necessary quantum resources.
FTT-GRU: A Hybrid Fast Temporal Transformer with GRU for Remaining Useful Life Prediction
Accurate prediction of the remaining useful life (RUL) of industrial machinery is essential for reducing downtime and optimizing maintenance schedules. Existing approaches, such as long short-term memory (LSTM) networks and convolutional neural networks (CNNs), often struggle to model both global temporal dependencies and fine-grained degradation trends in multivariate sensor data. We propose a hybrid model, FTT-GRU, which combines a Fast Temporal Transformer (FTT) -- a lightweight Transformer variant using linearized attention via fast Fourier transform (FFT) -- with a gated recurrent unit (GRU) layer for sequential modeling. To the best of our knowledge, this is the first application of an FTT with a GRU for RUL prediction on NASA CMAPSS, enabling simultaneous capture of global and local degradation patterns in a compact architecture. On CMAPSS FD001, FTT-GRU attains RMSE 30.76, MAE 18.97, and $R^2=0.45$, with 1.12 ms CPU latency at batch=1. Relative to the best published deep baseline (TCN--Attention), it improves RMSE by 1.16\% and MAE by 4.00\%. Training curves averaged over $k=3$ runs show smooth convergence with narrow 95\% confidence bands, and ablations (GRU-only, FTT-only) support the contribution of both components. These results demonstrate that a compact Transformer-RNN hybrid delivers accurate and efficient RUL predictions on CMAPSS, making it suitable for real-time industrial prognostics.
comment: 5 pages, The 2025 International Conference on Computational Science and Computational Intelligence
Rotatable Antenna System Empowered Low-Altitude Economy: Opportunities and Challenges
The low-altitude economy (LAE) is an emerging technological paradigm that enables continuous airspace coverage at multiple altitudes by providing highly reliable data connectivity for numerous low-altitude applications. However, existing networks cannot sufficiently support LAE development, as current base stations (BSs) are primarily designed for terrestrial users and lack the capability to provide continuous coverage at low altitudes. To overcome these challenges, a rotatable antenna system (RAS) is introduced in LAE, enabling flexible beamforming by dynamically adjusting the boresight of directional antennas to extend low-altitude coverage and enhance the stability of data transmission. In this article, we first provide an overview of RAS-empowered LAE applications, including low-altitude communication, sensing, control, and computation. Then, we present two practical RAS deployment strategies for LAE scenarios, namely RAS-aided multi-BS and multi-unmanned aerial vehicle (UAV) cooperative coverage, and discuss their system architectures and performance benefits in detail. Additionally, key design issues of RAS in LAE are discussed, including channel modeling and estimation, cellular access and interference cancellation, and RAS configuration and boresight optimization. Finally, we demonstrate the performance gains of RAS in LAE networks through experimental and simulation results.
comment: 8 pages, 5 figures, accepted in IEEE Wireless Communication (Early Access)
Image-based ground distance detection for crop-residue-covered soil
Conservation agriculture features a soil surface covered with crop residues, which improves soil health and saves water. However, one significant challenge in conservation agriculture lies in precisely controlling the seeding depth on soil covered with crop residues. This is constrained by the lack of ground distance information, since current distance measurement techniques, such as laser, ultrasonic, or mechanical displacement sensors, cannot tell whether a distance reading comes from the residue or from the soil. This paper presents an image-based method for obtaining the ground distance on crop-residue-covered soil. The method uses a 3D camera and an RGB camera to acquire a depth image and a color image at the same time. The color image is used to distinguish residue areas from soil areas and to generate a mask image. The mask image is then applied to the depth image so that only the depth information from soil areas is used to calculate the ground distance, while residue areas are recognized and excluded from ground distance detection. Experiments show that this distance measurement method is feasible for real-time implementation and that the measurement error is within ±3 mm. It can be applied in conservation agriculture machinery for precision depth seeding, as well as in other applications that demand depth control, such as transplanting or tillage.
comment: under review at Computers and Electronics in Agriculture
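The masking step described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name `ground_distance`, the use of a median to aggregate soil-pixel depths, and the toy values are all hypothetical, and the abstract does not say how the soil mask is computed from the color image, so the mask is taken as an input here.

```python
from statistics import median


def ground_distance(depth_mm, soil_mask):
    """Return the median depth over soil pixels, or None if no soil is visible.

    depth_mm  -- 2D list of depth readings (millimetres assumed for illustration)
    soil_mask -- 2D list of booleans, True where the color image was labeled soil
    """
    soil_depths = [
        d
        for depth_row, mask_row in zip(depth_mm, soil_mask)
        for d, is_soil in zip(depth_row, mask_row)
        if is_soil  # residue pixels are excluded from the estimate
    ]
    return median(soil_depths) if soil_depths else None


# Toy frame: residue pixels (mask False) sit closer to the camera than the soil.
depth = [[480, 455, 482], [450, 481, 479]]
mask = [[True, False, True], [False, True, True]]
print(ground_distance(depth, mask))
```

A median is used in this sketch because it tolerates a few misclassified pixels; a mean or a fitted plane would be equally plausible choices.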
On Improvisation and Open-Endedness: Insights for Experiential AI AAAI 2026
Improvisation, the art of spontaneous creation that unfolds moment-to-moment without a scripted outcome, requires practitioners to continuously sense, adapt, and create anew. It is a fundamental mode of human creativity spanning music, dance, and everyday life. The open-ended nature of improvisation produces a stream of novel, unrepeatable moments, an aspect highly valued in artistic creativity. In parallel, open-endedness (OE), a system's capacity for unbounded novelty and endless "interestingness", is exemplified in natural or cultural evolution and has been considered "the last grand challenge" in artificial life (ALife). The rise of generative AI now raises the question in computational creativity (CC) research: What makes a "good" improvisation for AI? Can AI learn to improvise in a genuinely open-ended way? In this work-in-progress paper, we report insights from in-depth interviews with 6 experts in improvisation across dance, music, and contact improvisation. We draw systemic connections between human improvisational arts and the design of future experiential AI agents that could improvise alone or alongside humans, or even with other AI agents, embodying qualities of improvisation drawn from practice: active listening (umwelt and awareness), being in the time (mindfulness and ephemerality), embracing the unknown (source of randomness and serendipity), non-judgmental flow (acceptance and dynamical stability), balancing structure and surprise (unpredictable criticality at edge of chaos), imaginative metaphor (synaesthesia and planning), empathy, trust, boundary, and care (mutual theory of mind), and playfulness and intrinsic motivation (maintaining interestingness).
comment: Submitted to AAAI 2026 Creative AI for Live Interactive Performances Workshop (CLIP) as a work-in-progress paper
Descriptive Model-based Learning and Control for Bipedal Locomotion
Bipedal balance is challenging due to its multi-phase, hybrid nature and high-dimensional state space. Traditional balance control approaches for bipedal robots rely on low-dimensional models for locomotion planning and reactive control, constraining the full robot to behave like these simplified models. This involves tracking preset reference paths for the Center of Mass and upper body obtained through low-dimensional models, often resulting in inefficient walking patterns with bent knees. However, we observe that bipedal balance is inherently low-dimensional and can be effectively described with simple state and action descriptors in a low-dimensional state space. This allows the robot's motion to evolve freely in its high-dimensional state space, only constraining its projection in the low-dimensional state space. In this work, we propose a novel control approach that avoids prescribing a low-dimensional model to the full model. Instead, our control framework uses a descriptive model with the minimum degrees of freedom necessary to maintain balance, allowing the remaining degrees of freedom to evolve freely in the high-dimensional space. This results in an efficient human-like walking gait and improved robustness.
comment: 8 pages, 15 figures
Optimization of continuous-flow over traffic networks with fundamental diagram constraints
Optimal transport (OT) theory provides a principled framework for modeling mass movement in applications such as mobility, logistics, and economics. Classical formulations, however, generally ignore capacity limits that are intrinsic in applications, in particular in traffic flow problems. We address this limitation by incorporating fundamental diagrams into a dynamic continuous-flow OT model on graphs, thereby including empirical relations between local density and maximal flux. We adopt an Eulerian kinetic action on graphs that preserves displacement interpolation in direct analogy with the continuous theory. Momentum lives on edges and density on nodes, mirroring road-network semantics in which segments carry speed and intersections store mass. The resulting fundamental-diagram-constrained OT problem preserves mass conservation and admits a convex variational discretization, yielding optimal congestion-aware traffic flow over road networks. We establish the existence and uniqueness of the optimal flow with sources and sinks, and develop an efficient convex optimization method. Numerical studies begin with a single-lane line network and scale to a city-level road network.
comment: 13 pages, 5 figures
Symbol Detection in a MIMO Wireless Communication System Using a FeFET-coupled CMOS Ring Oscillator Array
Symbol decoding in multiple-input multiple-output (MIMO) wireless communication systems requires fast, energy-efficient computing hardware that can be deployed at the edge. The brute-force, exact maximum likelihood (ML) decoder, solved on conventional classical digital hardware, has exponential time complexity. Approximate classical solvers implemented on the same hardware have polynomial time complexity at best. In this article, we design an alternative ring-oscillator-based coupled oscillator array to act as an oscillator Ising machine (OIM) and heuristically solve the ML-based MIMO detection problem. Complementary metal oxide semiconductor (CMOS) technology is used to design the ring oscillators, and ferroelectric field effect transistor (FeFET) technology is chosen as the coupling element (X) between the oscillators in this CMOS + X OIM design. For this purpose, we experimentally report a high linear range of conductance variation (1 micro-S to 60 micro-S) in a FeFET device fabricated at the 28 nm high-K/metal gate (HKMG) CMOS technology node. We incorporate the conductance modulation characteristic in SPICE simulation of the ring oscillators connected in an all-to-all fashion through a crossbar array of these FeFET devices. We show that the above range of conductance variation of the FeFET device is suitable to obtain optimum OIM performance with no significant performance drop up to a MIMO size of 100 transmitting and 100 receiving antennas, thereby making FeFET a suitable device for this application. Our simulations and associated analysis using the Kuramoto model of oscillators also predict that this designed classical analog OIM, if implemented experimentally, will offer logarithmic scaling of computation time with MIMO size, thereby offering a huge improvement (in terms of computation speed) over the aforementioned MIMO decoders run on conventional digital hardware.
comment: 58 pages including supplementary information, 5 main figures, 4 main tables, 2 supplementary figures, 2 supplementary tables
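To make the exponential cost of the exact ML decoder mentioned in the abstract above concrete, the sketch below enumerates every candidate symbol vector and keeps the one closest to the received signal. It is a toy illustration, not the paper's setup: it assumes a real-valued channel and BPSK symbols in {-1, +1}, and the function name `ml_detect` and the example matrix are hypothetical.

```python
from itertools import product


def ml_detect(H, y):
    """Exhaustive ML detection: minimize ||y - H x||^2 over x in {-1, +1}^n.

    H -- channel matrix as a list of rows (n_rx x n_tx), real-valued here
    y -- received vector of length n_rx
    """
    n_tx = len(H[0])
    best_x, best_cost = None, float("inf")
    # The search space has 2**n_tx candidates, hence exponential time.
    for x in product((-1, 1), repeat=n_tx):
        cost = sum(
            (y_i - sum(h * s for h, s in zip(row, x))) ** 2
            for row, y_i in zip(H, y)
        )
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, best_cost


# Noiseless toy example: 2 transmit antennas, 3 receive antennas.
H = [[1.0, 0.5], [0.2, 1.0], [0.3, -0.4]]
x_true = (1, -1)
y = [sum(h * s for h, s in zip(row, x_true)) for row in H]
print(ml_detect(H, y))
```

Expanding the squared norm for x in {-1, +1}^n gives a quadratic form in x, which is the kind of objective an Ising machine minimizes heuristically instead of enumerating all candidates.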
CT-ESKF: A General Framework of Covariance Transformation-Based Error-State Kalman Filter
Invariant extended Kalman filter (InEKF) possesses excellent trajectory-independent property and better consistency compared to conventional extended Kalman filter (EKF). However, when applied to scenarios involving both global-frame and body-frame observations, InEKF may fail to preserve its trajectory-independent property. This work introduces the concept of equivalence between error states and covariance matrices among different error-state Kalman filters, and shows that although InEKF exhibits trajectory independence, its covariance propagation is actually equivalent to EKF. A covariance transformation-based error-state Kalman filter (CT-ESKF) framework is proposed that unifies various error-state Kalman filtering algorithms. The framework gives birth to novel filtering algorithms that demonstrate improved performance in integrated navigation systems that incorporate both global and body-frame observations. Experimental results show that the EKF with covariance transformation outperforms both InEKF and original EKF in a representative INS/GNSS/Odometer integrated navigation system.
comment: 19 pages, 12 figures
Large Language Models for Control
This paper investigates using large language models (LLMs) to generate control actions directly, without requiring control-engineering expertise or hand-tuned algorithms. We implement several variants: (i) prompt-only, (ii) tool-assisted with access to historical data, and (iii) prediction-assisted using learned or simple models to score candidate actions. We compare them on tracking accuracy and actuation effort, with and without a prompt that requests lower actuator usage. Results show prompt-only LLMs already produce viable control, while tool-augmented versions adapt better to changing objectives but can be more sensitive to constraints, supporting LLM-in-the-loop control for evolving cyber-physical systems today and operator and human inputs.
Continuous Classification Aggregation
We prove that any optimal, independent, and zero unanimous fuzzy classification aggregation function of a continuum of individual classifications of $m\ge 3$ objects into $2\le p\le m$ types must be a weighted arithmetic mean. We also provide a characterization for the case when $m=p=2$.
comment: 18 pages
Imperfect Competition in Markets for Short-Circuit Current Services
An important limitation of Inverter-Based Resources (IBR) is their reduced contribution to Short-Circuit Current (SCC), as compared to that of Synchronous Generators (SGs). With increasing penetration of IBR in most power systems, the declining SCC poses challenges to secure system operation, as line protections may not trip when required. In order to address this issue, the SCC ancillary service could be procured via an economic mechanism, aiming at securing adequate SCC on all buses. However, the suitability of markets for SCC services is not well understood, given that these could be prone to market-power issues: since the SCC contributions from various SGs to a certain bus are determined by the electrical topology of the grid, this is a highly local service. It is necessary to understand if SGs at advantageous electrical locations could exert market power and, if so, how it could be mitigated. In order to fill this gap, this paper adopts an SCC-constrained bilevel model to investigate strategic behaviors of SGs. To address the non-convexity due to unit commitment variables, the model is restructured through a primal-dual formulation. Based on a modified IEEE 30-bus system, cases with strategic SGs placed at different buses are analyzed. These studies demonstrate that agents exerting market power could achieve up to triple revenues from SCC provision, highlighting the need to carefully design these markets.
comment: Ancillary services, short-circuit current, market power, bilevel optimization, primal-dual formulation. A paper submitted to
Distributionally Robust Control Synthesis for Stochastic Systems with Safety and Reach-Avoid Specifications
We investigate the problem of synthesizing distributionally robust control policies for stochastic systems under safety and reach-avoid specifications. Using a game-theoretical framework, we consider the setting where the probability distribution of the disturbance at each time step is selected from an ambiguity set defined by the Wasserstein distance. The goal is to synthesize a distributionally robust control policy that ensures the satisfaction probability exceeds a specified threshold under any distribution within the ambiguity set. First, for both safety and reach-avoid specifications, we establish the existence of optimal policies by leveraging the dynamic programming principles. Then we demonstrate how the associated optimization problem can be efficiently solved using the dual representation of Wasserstein distributionally robust optimization. Furthermore, for safety specifications in particular, we introduce a novel concept of distributionally robust control barrier certificates and show how these enable the efficient synthesis of controllers through sum-of-squares programming techniques. Finally, our experimental results reveal that incorporating distributional robustness during the synthesis phase significantly improves the satisfaction probability during online execution, even with limited statistical knowledge of the disturbance distribution.
Introducing Coherent-Control Koopman Modeling to Reservoir Scale Porous Media Flow Studies
Accurate and robust surrogate modeling is essential for the real-time control and optimization of large-scale subsurface systems, such as geological CO2 storage and waterflood management. This study investigates the limits of classical Dynamic Mode Decomposition with control (DMDc) and introduces CCKM as a robust alternative for enforcing control in pressure and water saturation reservoir dynamics under challenging prediction scenarios. We introduce a control-coherent incremental ({\Delta}) CCKM formulation, in which the field update is driven by actuator changes rather than actuator levels as in the original level formulation, and compare both formulations against DMDc and a Hybrid B-only surrogate that re-uses DMDc's bottom-B (same-step feed-through), showing that only CCKM remains stable and accurate under regime shifts. Two representative cases are considered: (i) an out-of-distribution shut-in and restart case, and (ii) an in-distribution bottomhole pressure (BHP) drawdown. Results show that only CCKM consistently maintains stability and accuracy across both scenarios, achieving sub-bar mean absolute error and sub-percent Frobenius norm percent change error (FPCE) even under regime shifts, while DMDc exhibits large unphysical errors during control transients. The findings demonstrate that strict control-coherence is critical for reliable surrogate modeling, particularly in settings with abrupt changes in control strategy. The proposed framework is broadly applicable to real-time reservoir optimization and can be integrated seamlessly into existing optimization and monitoring workflows, enabling fast and trustworthy decision support in the presence of both expected and unexpected actuation regimes.
Robotics
Whole-Body Proprioceptive Morphing: A Modular Soft Gripper for Robust Cross-Scale Grasping
Biological systems, such as the octopus, exhibit masterful cross-scale manipulation by adaptively reconfiguring their entire form, a capability that remains elusive in robotics. Conventional soft grippers, while compliant, are mostly constrained by a fixed global morphology, and prior shape-morphing efforts have been largely confined to localized deformations, failing to replicate this biological dexterity. Inspired by this natural exemplar, we introduce the paradigm of collaborative, whole-body proprioceptive morphing, realized in a modular soft gripper architecture. Our design is a distributed network of modular self-sensing pneumatic actuators that enables the gripper to intelligently reconfigure its entire topology, achieving multiple morphing states that are controllable to form diverse polygonal shapes. By integrating rich proprioceptive feedback from embedded sensors, our system can seamlessly transition from a precise pinch to a large envelope grasp. We experimentally demonstrate that this approach expands the grasping envelope and enhances generalization across diverse object geometries (standard and irregular) and scales (up to 10$\times$), while also unlocking novel manipulation modalities such as multi-object and internal hook grasping. This work presents a low-cost, easy-to-fabricate, and scalable framework that fuses distributed actuation with integrated sensing, offering a new pathway toward achieving biological levels of dexterity in robotic manipulation.
Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model
Recently, augmenting Vision-Language-Action models (VLAs) with world modeling has shown promise in improving robotic policy learning. However, it remains challenging to jointly predict next-state observations and action sequences because of the inherent difference between the two modalities. To address this, we propose DUal-STream diffusion (DUST), a world-model augmented VLA framework that handles the modality conflict and enhances the performance of VLAs across diverse tasks. Specifically, we propose a multimodal diffusion transformer architecture that explicitly maintains separate modality streams while still enabling cross-modal knowledge sharing. In addition, we introduce independent noise perturbations for each modality and a decoupled flow-matching loss. This design enables the model to learn the joint distribution in a bidirectional manner while avoiding the need for a unified latent space. Based on the decoupling of modalities during training, we also introduce a joint sampling method that supports test-time scaling, where action and vision tokens evolve asynchronously at different rates. Through experiments on simulated benchmarks such as RoboCasa and GR-1, DUST achieves up to 6% gains over baseline methods, while our test-time scaling approach provides an additional 2-5% boost. On real-world tasks with the Franka Research 3, DUST improves success rates by 13%, confirming its effectiveness beyond simulation. Furthermore, pre-training on action-free videos from BridgeV2 yields significant transfer gains on RoboCasa, underscoring DUST's potential for large-scale VLA pretraining.
comment: 20 pages, 10 figures
Toward Accurate Long-Horizon Robotic Manipulation: Language-to-Action with Foundation Models via Scene Graphs
This paper presents a framework that leverages pre-trained foundation models for robotic manipulation without domain-specific training. The framework integrates off-the-shelf models, combining multimodal perception from foundation models with a general-purpose reasoning model capable of robust task sequencing. Scene graphs, dynamically maintained within the framework, provide spatial awareness and enable consistent reasoning about the environment. The framework is evaluated through a series of tabletop robotic manipulation experiments, and the results highlight its potential for building robotic manipulation systems directly on top of off-the-shelf foundation models.
EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities
Implicit policies parameterized by generative models, such as Diffusion Policy, have become the standard for policy learning and Vision-Language-Action (VLA) models in robotics. However, these approaches often suffer from high computational cost, exposure bias, and unstable inference dynamics, which lead to divergence under distribution shifts. Energy-Based Models (EBMs) address these issues by learning energy landscapes end-to-end and modeling equilibrium dynamics, offering improved robustness and reduced exposure bias. Yet, policies parameterized by EBMs have historically struggled to scale effectively. Recent work on Energy-Based Transformers (EBTs) demonstrates the scalability of EBMs to high-dimensional spaces, but their potential for solving core challenges in physically embodied models remains underexplored. We introduce a new energy-based architecture, EBT-Policy, that solves core issues in robotic and real-world settings. Across simulated and real-world tasks, EBT-Policy consistently outperforms diffusion-based policies, while requiring less training and inference computation. Remarkably, on some tasks it converges within just two inference steps, a 50x reduction compared to Diffusion Policy's 100. Moreover, EBT-Policy exhibits emergent capabilities not seen in prior models, such as zero-shot recovery from failed action sequences using only behavior cloning and without explicit retry training. By leveraging its scalar energy for uncertainty-aware inference and dynamic compute allocation, EBT-Policy offers a promising path toward robust, generalizable robot behavior under distribution shifts.
comment: 9 pages, 6 figures, 4 tables
Preliminary Prototyping of Avoidance Behaviors Triggered by a User's Physical Approach to a Robot
Human-robot interaction frequently involves physical proximity or contact. In human-human settings, people flexibly accept, reject, or tolerate such approaches depending on the relationship and context. We explore the design of a robot's rejective internal state and corresponding avoidance behaviors, such as withdrawing or pushing away, when a person approaches. We model the accumulation and decay of discomfort as a function of interpersonal distance, and implement tolerance (endurance) and limit-exceeding avoidance driven by the Dominance axis of the PAD affect model. The behaviors and their intensities are realized on an arm robot. Results illustrate a coherent pipeline from internal state parameters to graded endurance motions and, once a limit is crossed, to avoidance actions.
comment: Workshop on Socially Aware and Cooperative Intelligent Systems in HAI 2025
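The accumulation and decay of discomfort described in the abstract above could be modeled, for example, as a leaky accumulator. The sketch below is a guess at the general shape of such a model, not the authors' formulation: the function names, the linear intrusion term, and every parameter value are hypothetical, and the Dominance-dependent modulation from the PAD model is not represented.

```python
def update_discomfort(discomfort, distance_m, dt=0.1,
                      comfort_radius_m=1.0, gain=1.0, decay=0.5):
    """Advance a leaky discomfort accumulator by one time step.

    All parameters are illustrative assumptions, not values from the paper.
    """
    # Intrusion measures how far the person is inside the comfort radius.
    intrusion = max(0.0, comfort_radius_m - distance_m)
    # Discomfort grows with intrusion and decays in proportion to its level.
    discomfort += dt * (gain * intrusion - decay * discomfort)
    return max(0.0, discomfort)


def select_behavior(discomfort, limit=0.5):
    """Endure below the (hypothetical) limit, avoid once it is exceeded."""
    return "avoid" if discomfort > limit else "endure"


# A person stands close for 5 s, then steps back for 5 s.
level = 0.0
for distance in [0.2] * 50 + [2.0] * 50:
    level = update_discomfort(level, distance)
print(select_behavior(level))
```

In this sketch the limit is a fixed constant; a design in the spirit of the abstract would presumably tie it, and the intensity of the endurance motion, to the robot's internal state.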
Learning Soft Robotic Dynamics with Active Exploration
Soft robots offer unmatched adaptability and safety in unstructured environments, yet their compliant, high-dimensional, and nonlinear dynamics make modeling for control notoriously difficult. Existing data-driven approaches often fail to generalize, constrained by narrowly focused task demonstrations or inefficient random exploration. We introduce SoftAE, an uncertainty-aware active exploration framework that autonomously learns task-agnostic and generalizable dynamics models of soft robotic systems. SoftAE employs probabilistic ensemble models to estimate epistemic uncertainty and actively guides exploration toward underrepresented regions of the state-action space, achieving efficient coverage of diverse behaviors without task-specific supervision. We evaluate SoftAE on three simulated soft robotic platforms -- a continuum arm, an articulated fish in fluid, and a musculoskeletal leg with hybrid actuation -- and on a pneumatically actuated continuum soft arm in the real world. Compared with random exploration and task-specific model-based reinforcement learning, SoftAE produces more accurate dynamics models, enables superior zero-shot control on unseen tasks, and maintains robustness under sensing noise, actuation delays, and nonlinear material effects. These results demonstrate that uncertainty-driven active exploration can yield scalable, reusable dynamics models across diverse soft robotic morphologies, representing a step toward more autonomous, adaptable, and data-efficient control in compliant robots.
Towards a Multi-Embodied Grasping Agent
Multi-embodiment grasping focuses on developing approaches that exhibit generalist behavior across diverse gripper designs. Existing methods often learn the kinematic structure of the robot implicitly and face challenges due to the difficulty of sourcing the required large-scale data. In this work, we present a data-efficient, flow-based, equivariant grasp synthesis architecture that can handle different gripper types with variable degrees of freedom and successfully exploit the underlying kinematic model, deducing all necessary information solely from the gripper and scene geometry. Unlike previous equivariant grasping methods, we translate all modules from the ground up to JAX and provide a model with batching capabilities over scenes, grippers, and grasps, resulting in smoother learning, improved performance, and faster inference time. Our dataset encompasses grippers ranging from humanoid hands to parallel jaw grippers and includes 25,000 scenes and 20 million grasps.
comment: 9 pages, 3 figures
Modified-Emergency Index (MEI): A Criticality Metric for Autonomous Driving in Lateral Conflict
Effective, reliable, and efficient evaluation of autonomous driving safety is essential to demonstrate its trustworthiness. Criticality metrics provide an objective means of assessing safety. However, as existing metrics primarily target longitudinal conflicts, accurately quantifying the risks of lateral conflicts - prevalent in urban settings - remains challenging. This paper proposes the Modified-Emergency Index (MEI), a metric designed to quantify evasive effort in lateral conflicts. Compared to the original Emergency Index (EI), MEI refines the estimation of the time available for evasive maneuvers, enabling more precise risk quantification. We validate MEI on a public lateral conflict dataset based on Argoverse-2, from which we extract over 1,500 high-quality AV conflict cases, including more than 500 critical events. MEI is then compared with the well-established ACT and the widely used PET metrics. Results show that MEI consistently outperforms them in accurately quantifying criticality and capturing risk evolution. Overall, these findings highlight MEI as a promising metric for evaluating urban conflicts and enhancing the safety assessment framework for autonomous driving. The open-source implementation is available at https://github.com/AutoChengh/MEI.
A Modular and Scalable System Architecture for Heterogeneous UAV Swarms Using ROS 2 and PX4-Autopilot
In this paper, a modular and scalable architecture for heterogeneous swarm-based Counter Unmanned Aerial Systems (C-UASs) built on PX4-Autopilot and the Robot Operating System 2 (ROS 2) framework is presented. The proposed architecture emphasizes seamless integration of hardware components by introducing independent ROS 2 nodes for each component of an Unmanned Aerial Vehicle (UAV). Communication between swarm participants is abstracted in software, allowing the use of various technologies without architectural changes. Key functionalities for maneuvering the swarm, such as leader following and formation flight, are supported. The system also allows computer vision algorithms to be integrated for the detection and tracking of UAVs. Additionally, a ground control station is integrated for the coordination of swarm operations. The swarm-based Unmanned Aerial System (UAS) architecture is verified both in a Gazebo simulation environment and in real-world demonstrations.
Vectorized Online POMDP Planning ICRA 2026
Planning under partial observability is an essential capability of autonomous robots. The Partially Observable Markov Decision Process (POMDP) provides a powerful framework for planning under partial observability, capturing the stochastic effects of actions and the limited information available through noisy observations. POMDP solving could benefit tremendously from the massive parallelism of today's hardware, but parallelizing POMDP solvers has been challenging. They rely on interleaving numerical optimization over actions with the estimation of their values, which creates dependencies and synchronization bottlenecks between parallel processes that can quickly offset the benefits of parallelization. In this paper, we propose Vectorized Online POMDP Planner (VOPP), a novel parallel online solver that leverages a recent POMDP formulation that analytically solves part of the optimization component, leaving only the estimation of expectations for numerical computation. VOPP represents all data structures related to planning as a collection of tensors and implements all planning steps as fully vectorized computations over this representation. The result is a massively parallel solver with no dependencies or synchronization bottlenecks between parallel computations. Experimental results indicate that VOPP is at least 20X more efficient in computing near-optimal solutions compared to an existing state-of-the-art parallel online solver.
comment: 8 pages, 3 figures. Submitted to ICRA 2026
Hybrid Gripper Finger Enabling In-Grasp Friction Modulation Using Inflatable Silicone Pockets ICRA 2026
Grasping objects with diverse mechanical properties, such as heavy, slippery, or fragile items, remains a significant challenge in robotics. Conventional grippers often rely on applying high normal forces, which can cause damage to objects. To address this limitation, we present a hybrid gripper finger that combines a rigid structural shell with a soft, inflatable silicone pocket. The gripper finger can actively modulate its surface friction by controlling the internal air pressure of the silicone pocket. Results from fundamental experiments indicate that increasing the internal pressure results in a proportional increase in the effective coefficient of friction. This enables the gripper to stably lift heavy and slippery objects without increasing the gripping force and to handle fragile or deformable objects, such as eggs, fruits, and paper cups, with minimal damage by increasing friction rather than applying excessive force. The experimental results demonstrate that the hybrid gripper finger with adaptable friction provides a robust and safer alternative to relying solely on high normal forces, thereby enhancing the gripper flexibility in handling delicate, fragile, and diverse objects.
comment: Submitted to ICRA 2026
MobiDock: Design and Control of A Modular Self Reconfigurable Bimanual Mobile Manipulator via Robotic Docking ICRA2026
Multi-robot systems, particularly mobile manipulators, face challenges in control coordination and dynamic stability when working together. To address this issue, this study proposes MobiDock, a modular self-reconfigurable mobile manipulator system that allows two independent robots to physically connect and form a unified mobile bimanual platform. This process helps transform a complex multi-robot control problem into the management of a simpler, single system. The system utilizes an autonomous docking strategy based on computer vision with AprilTag markers and a new threaded screw-lock mechanism. Experimental results show that the docked configuration demonstrates better performance in dynamic stability and operational efficiency compared to two independently cooperating robots. Specifically, the unified system has lower Root Mean Square (RMS) Acceleration and Jerk values, higher angular precision, and completes tasks significantly faster. These findings confirm that physical reconfiguration is a powerful design principle that simplifies cooperative control, improving stability and performance for complex tasks in real-world environments.
comment: ICRA2026 submitted
Confined Space Underwater Positioning Using Collaborative Robots
Positioning of underwater robots in confined and cluttered spaces remains a key challenge for field operations. Existing systems are mostly designed for large, open-water environments and struggle in industrial settings due to poor coverage, reliance on external infrastructure, and the need for feature-rich surroundings. Multipath effects from continuous sound reflections further degrade signal quality, reducing accuracy and reliability. Accurate and easily deployable positioning is essential for repeatable autonomous missions; however, this requirement has created a technological bottleneck limiting underwater robotic deployment. This paper presents the Collaborative Aquatic Positioning (CAP) system, which integrates collaborative robotics and sensor fusion to overcome these limitations. Inspired by the "mother-ship" concept, the surface vehicle acts as a mobile leader to assist in positioning a submerged robot, enabling localization even in GPS-denied and highly constrained environments. The system is validated in a large test tank through repeatable autonomous missions using CAP's position estimates for real-time trajectory control. Experimental results demonstrate a mean Euclidean distance (MED) error of 70 mm, achieved in real time without requiring fixed infrastructure, extensive calibration, or environmental features. CAP leverages advances in mobile robot sensing and leader-follower control to deliver a step change in accurate, practical, and infrastructure-free underwater localization.
comment: 31 pages including appendix, 24 figures
WildfireX-SLAM: A Large-scale Low-altitude RGB-D Dataset for Wildfire SLAM and Beyond
3D Gaussian splatting (3DGS) and its subsequent variants have led to remarkable progress in simultaneous localization and mapping (SLAM). While most recent 3DGS-based SLAM works focus on small-scale indoor scenes, developing 3DGS-based SLAM methods for large-scale forest scenes holds great potential for many real-world applications, especially for wildfire emergency response and forest management. However, this line of research is impeded by the absence of a comprehensive and high-quality dataset, and collecting such a dataset over real-world scenes is costly and technically infeasible. To this end, we have built a large-scale, comprehensive, and high-quality synthetic dataset for SLAM in wildfire and forest environments. Leveraging the Unreal Engine 5 Electric Dreams Environment Sample Project, we developed a pipeline to easily collect aerial and ground views, including ground-truth camera poses and a range of additional data modalities from an unmanned aerial vehicle. Our pipeline also provides flexible controls on environmental factors such as light, weather, and types and conditions of wildfire, supporting the need for various tasks covering forest mapping, wildfire emergency response, and beyond. The resulting pilot dataset, WildfireX-SLAM, contains 5.5k low-altitude RGB-D aerial images from a large-scale forest map with a total size of 16 km². On top of WildfireX-SLAM, a thorough benchmark is also conducted, which not only reveals the unique challenges of 3DGS-based SLAM in the forest but also highlights potential improvements for future work. The dataset and code will be publicly available. Project page: https://zhicongsun.github.io/wildfirexslam.
comment: This paper has been accepted by MMM 2026
Learning Generalizable Visuomotor Policy through Dynamics-Alignment
Behavior cloning methods for robot learning suffer from poor generalization due to limited data support beyond expert demonstrations. Recent approaches leveraging video prediction models have shown promising results by learning rich spatiotemporal representations from large-scale datasets. However, these models learn action-agnostic dynamics that cannot distinguish between different control inputs, limiting their utility for precise manipulation tasks and requiring large pretraining datasets. We propose a Dynamics-Aligned Flow Matching Policy (DAP) that integrates dynamics prediction into policy learning. Our method introduces a novel architecture where policy and dynamics models provide mutual corrective feedback during action generation, enabling self-correction and improved generalization. Empirical validation demonstrates generalization performance superior to baseline methods on real-world robotic manipulation tasks, showing particular robustness in OOD scenarios including visual distractions and lighting variations.
comment: 9 pages, 6 figures
RObotic MAnipulation Network (ROMAN) -- Hybrid Hierarchical Learning for Solving Complex Sequential Tasks
Solving long sequential tasks poses a significant challenge in embodied artificial intelligence. Enabling a robotic system to perform diverse sequential tasks with a broad range of manipulation skills is an active area of research. In this work, we present a Hybrid Hierarchical Learning framework, the Robotic Manipulation Network (ROMAN), to address the challenge of solving multiple complex tasks over long time horizons in robotic manipulation. ROMAN achieves task versatility and robust failure recovery by integrating behavioural cloning, imitation learning, and reinforcement learning. It consists of a central manipulation network that coordinates an ensemble of various neural networks, each specialising in distinct re-combinable sub-tasks to generate their correct in-sequence actions for solving complex long-horizon manipulation tasks. Experimental results show that by orchestrating and activating these specialised manipulation experts, ROMAN generates correct sequential activations for accomplishing long sequences of sophisticated manipulation tasks and achieving adaptive behaviours beyond demonstrations, while exhibiting robustness to various sensory noises. These results demonstrate the significance and versatility of ROMAN's dynamic adaptability featuring autonomous failure recovery capabilities, and highlight its potential for various autonomous manipulation tasks that demand adaptive motor skills.
comment: To appear in Nature Machine Intelligence. Includes the main and supplementary manuscript. Total of 70 pages, with a total of 9 Figures and 17 Tables
GenSwarm: Scalable Multi-Robot Code-Policy Generation and Deployment via Language Models
The development of control policies for multi-robot systems traditionally follows a complex and labor-intensive process, often lacking the flexibility to adapt to dynamic tasks. This has motivated research on methods to automatically create control policies. However, these methods require iterative processes of manually crafting and refining objective functions, thereby prolonging the development cycle. This work introduces GenSwarm, an end-to-end system that leverages large language models to automatically generate and deploy control policies for multi-robot tasks based on simple user instructions in natural language. As a multi-language-agent system, GenSwarm achieves zero-shot learning, enabling rapid adaptation to altered or unseen tasks. The white-box nature of the code policies ensures strong reproducibility and interpretability. With its scalable software and hardware architectures, GenSwarm supports efficient policy deployment on both simulated and real-world multi-robot systems, realizing an instruction-to-execution end-to-end functionality that could prove valuable for robotics specialists and non-specialists alike. The code of the proposed GenSwarm system is available online: https://github.com/WindyLab/GenSwarm.
comment: This article has been accepted for publication in npj Robotics
A Study on Human-Swarm Interaction: A Framework for Assessing Situation Awareness and Task Performance
This paper introduces a framework for human swarm interaction studies that measures situation awareness in dynamic environments. A tablet-based interface was developed for a user study by implementing the concepts introduced in the framework, where operators guided a robotic swarm in a single-target search task, marking hazardous cells unknown to the swarm. Both subjective and objective situation awareness measures were used, with task performance evaluated based on how close the robots were to the target. The framework enabled a structured investigation of the role of situation awareness in human swarm interaction, leading to key findings, including: (i) task performance improved across attempts, showing that the interface was learnable; (ii) centroid active robot position proved to be a useful task performance metric for assessing situation awareness; (iii) perception and projection played a key role in task performance, highlighting their importance in interface design; and (iv) objective situation awareness influenced both subjective situation awareness and task performance, underscoring the need for interfaces that emphasize objective situation awareness. These findings validate our framework as a structured approach for integrating situation awareness concepts into human swarm interaction studies, offering a systematic way to assess situation awareness and task performance. The framework can be applied to other swarming studies to evaluate interface learnability, identify meaningful task performance metrics, and refine interface designs to enhance situation awareness, ultimately improving human swarm interaction in dynamic environments.
comment: 10 pages, 8 figures, 2 tables, 2 equations
Uncertainty-Based Smooth Policy Regularisation for Reinforcement Learning with Few Demonstrations
In reinforcement learning with sparse rewards, demonstrations can accelerate learning, but determining when to imitate them remains challenging. We propose Smooth Policy Regularisation from Demonstrations (SPReD), a framework that addresses the fundamental question: when should an agent imitate a demonstration versus follow its own policy? SPReD uses ensemble methods to explicitly model Q-value distributions for both demonstration and policy actions, quantifying uncertainty for comparisons. We develop two complementary uncertainty-aware methods: a probabilistic approach estimating the likelihood of demonstration superiority, and an advantage-based approach scaling imitation by statistical significance. Unlike prevailing methods (e.g. Q-filter) that make binary imitation decisions, SPReD applies continuous, uncertainty-proportional regularisation weights, reducing gradient variance during training. Despite its computational simplicity, SPReD achieves remarkable gains in experiments across eight robotics tasks, outperforming existing approaches by up to a factor of 14 in complex tasks while maintaining robustness to demonstration quality and quantity. Our code is available at https://github.com/YujieZhu7/SPReD.
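The contrast between a binary Q-filter decision and a continuous, uncertainty-aware weight can be illustrated with a minimal sketch. It is not the SPReD implementation: the function names, the pairing of ensemble members, and the Gaussian approximation of the Q-value difference are all assumptions made here for illustration.

```python
import math
from statistics import mean, stdev


def q_filter_weight(q_demo, q_policy):
    """Binary decision: imitate only if the mean demo Q-value is higher."""
    return 1.0 if mean(q_demo) > mean(q_policy) else 0.0


def smooth_weight(q_demo, q_policy):
    """Continuous weight: estimated probability that the demo action is better.

    Assumption (not from the abstract): ensemble member i scores both actions,
    and the per-member differences are treated as roughly Gaussian.
    """
    diffs = [d - p for d, p in zip(q_demo, q_policy)]
    mu = mean(diffs)
    sigma = stdev(diffs)  # sample standard deviation across ensemble members
    if sigma == 0.0:
        # No disagreement in the ensemble: fall back to a deterministic answer.
        return 1.0 if mu > 0 else (0.0 if mu < 0 else 0.5)
    z = mu / sigma
    # Standard normal CDF evaluated at z, i.e. P(difference > 0).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical ensemble estimates for one state.
    q_demo = [1.10, 0.90, 1.03]
    q_policy = [1.00, 1.00, 1.00]
    print("Q-filter weight:", q_filter_weight(q_demo, q_policy))
    print("Smooth weight:  ", round(smooth_weight(q_demo, q_policy), 3))
```

In this toy case the binary rule commits fully to imitation on a marginal difference in means, while the smooth weight stays close to one half because the ensemble disagrees about which action is better.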
A Tactile Feedback Approach to Path Recovery after High-Speed Impacts for Collision-Resilient Drones
Aerial robots are a well-established solution for exploration, monitoring, and inspection, thanks to their superior maneuverability and agility. However, in many environments, they risk crashing and sustaining damage after collisions. Traditional methods focus on avoiding obstacles entirely, but these approaches can be limiting, particularly in cluttered spaces or on weight- and compute-constrained platforms such as drones. This paper presents a novel approach to enhance drone robustness and autonomy by developing a path recovery and adjustment method for a high-speed collision-resilient aerial robot equipped with lightweight, distributed tactile sensors. The proposed system explicitly models collisions using pre-collision velocities, rates and tactile feedback to predict post-collision dynamics, improving state estimation accuracy. Additionally, we introduce a computationally efficient vector-field-based path representation that guarantees convergence to a user-specified path, while naturally avoiding known obstacles. Post-collision, contact point locations are incorporated into the vector field as a repulsive potential, enabling the drone to avoid obstacles while naturally returning to its path. The effectiveness of this method is validated through Monte Carlo simulations and demonstrated on a physical prototype, showing successful path following, collision recovery, and adjustment at speeds up to 3.7 m/s.
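The idea of adding contact points to a path-following vector field as a repulsive term can be sketched in 2D. This is a toy illustration, not the paper's formulation: the straight-line path along the x-axis, the classic inverse-distance repulsive potential, and every name and gain below are assumptions. Unlike the method described above, this simple sum carries no convergence guarantee.

```python
import math


def path_field(x, y, k=1.0):
    """Unit vector that moves along +x while converging to the path y = 0."""
    vx, vy = 1.0, -k * y
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm


def repulsion(x, y, contacts, radius=1.0, gain=1.0):
    """Negative gradient of U = 0.5 * gain * (1/d - 1/radius)^2 for d < radius."""
    rx = ry = 0.0
    for cx, cy in contacts:
        dx, dy = x - cx, y - cy
        d = math.hypot(dx, dy)
        if 0.0 < d < radius:
            mag = gain * (1.0 / d - 1.0 / radius) / d**2
            rx += mag * dx / d
            ry += mag * dy / d
    return rx, ry


def guidance(x, y, contacts):
    """Path-following direction plus repulsion from recorded contact points."""
    px, py = path_field(x, y)
    rx, ry = repulsion(x, y, contacts)
    return px + rx, py + ry


if __name__ == "__main__":
    contacts = [(0.0, 0.0)]  # hypothetical contact point registered after an impact
    print(guidance(0.0, 0.5, contacts))  # near the contact: pushed away from it
    print(guidance(5.0, 0.5, contacts))  # far from the contact: plain path following
```

Outside the influence radius the repulsive term vanishes, so the field reduces to ordinary path following and the vehicle is steered back toward the path.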
SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents
With the integration of large language models (LLMs), embodied agents have strong capabilities to understand and plan complicated natural language instructions. However, a foreseeable issue is that those embodied agents can also flawlessly execute some hazardous tasks, potentially causing damage in the real world. Existing benchmarks predominantly overlook critical safety risks, focusing solely on planning performance, while a few evaluate LLMs' safety awareness only on non-interactive image-text data. To address this gap, we present SafeAgentBench -- the first comprehensive benchmark for safety-aware task planning of embodied LLM agents in interactive simulation environments, covering both explicit and implicit hazards. SafeAgentBench includes: (1) an executable, diverse, and high-quality dataset of 750 tasks, rigorously curated to cover 10 potential hazards and 3 task types; (2) SafeAgentEnv, a universal embodied environment with a low-level controller, supporting multi-agent execution with 17 high-level actions for 9 state-of-the-art baselines; and (3) reliable evaluation methods from both execution and semantic perspectives. Experimental results show that, although agents based on different design frameworks exhibit substantial differences in task success rates, their overall safety awareness remains weak. The most safety-conscious baseline achieves only a 10% rejection rate for detailed hazardous tasks. Moreover, simply replacing the LLM driving the agent does not lead to notable improvements in safety awareness. The dataset and code are available at https://github.com/shengyin1224/SafeAgentBench and https://huggingface.co/datasets/safeagentbench/SafeAgentBench.
comment: 28 pages, 19 tables, 15 figures
From Canada to Japan: How 10,000 km Affect User Perception in Robot Teleoperation
Robot teleoperation (RTo) has emerged as a viable alternative to local control, particularly when human intervention is still necessary. This research aims to study the distance effect on user perception in RTo, exploring the potential of teleoperated robots for older adult care. We propose an evaluation of non-expert users' perception of long-distance RTo, examining how their perception changes before and after interaction, as well as comparing it to that of locally operated robots. We have designed a specific protocol consisting of multiple questionnaires, along with a dedicated software architecture using the Robot Operating System (ROS) and Unity. The results revealed no statistically significant differences between the local and remote robot conditions, suggesting that robots may be a viable alternative to traditional local control.
comment: Author preprint - Accepted for Humanoids 2025
A Practical-Driven Framework for Transitioning Drive-by-Wire to Autonomous Driving Systems: A Case Study with a Chrysler Pacifica Hybrid Vehicle
Transitioning from a Drive-by-Wire (DBW) system to a fully autonomous driving system (ADS) involves multiple stages of development and demands robust positioning and sensing capabilities. This paper presents a practice-driven framework for facilitating the DBW-to-ADS transition using a 2022 Chrysler Pacifica Hybrid Minivan equipped with cameras, LiDAR, GNSS, and onboard computing hardware configured with the Robot Operating System (ROS) and Autoware.AI. The implementation showcases offline autonomous operations utilizing pre-recorded LiDAR and camera data, point clouds, and vector maps, enabling effective localization and path planning within a structured test environment. The study addresses key challenges encountered during the transition, particularly those related to wireless-network-assisted sensing and positioning. It offers practical solutions for overcoming software incompatibility constraints, sensor synchronization issues, and limitations in real-time perception. Furthermore, the integration of sensing, data fusion, and automation is emphasized as a critical factor in supporting autonomous driving systems in map generation, simulation, and training. Overall, the transition process outlined in this work aims to provide actionable strategies for researchers pursuing DBW-to-ADS conversion. It offers direction for incorporating real-time perception, GNSS-LiDAR-camera integration, and fully ADS-equipped autonomous vehicle operations, thus contributing to the advancement of robust autonomous vehicle technologies.
comment: This updated version includes further implementation details and experimental validation. Accepted for presentation at The 22nd International Conference on Automation Technology (AUTOMATION 2025), Taipei, Taiwan, November 2025
Panoramic Out-of-Distribution Segmentation for Autonomous Driving
Panoramic imaging enables capturing 360° images with an ultra-wide Field-of-View (FoV) for dense omnidirectional perception, which is critical to applications such as autonomous driving and augmented reality. However, current panoramic semantic segmentation methods fail to identify outliers, and pinhole Out-of-distribution Segmentation (OoS) models perform unsatisfactorily in the panoramic domain due to background clutter and pixel distortions. To address these issues, we introduce a new task, Panoramic Out-of-distribution Segmentation (PanOoS), with the aim of achieving comprehensive and safe scene understanding. Furthermore, we propose the first solution, POS, which adapts to the characteristics of panoramic images through text-guided prompt distribution learning. Specifically, POS integrates a disentanglement strategy designed to materialize the cross-domain generalization capability of CLIP. The proposed Prompt-based Restoration Attention (PRA) optimizes semantic decoding by prompt guidance and self-adaptive correction, while Bilevel Prompt Distribution Learning (BPDL) refines the manifold of per-pixel mask embeddings via semantic prototype supervision. Besides, to compensate for the scarcity of PanOoS datasets, we establish two benchmarks: DenseOoS, which features diverse outliers in complex environments, and QuadOoS, captured by a quadruped robot with a panoramic annular lens system. Extensive experiments demonstrate superior performance of POS, with AuPRC improving by 34.25% and FPR95 decreasing by 21.42% on DenseOoS, outperforming state-of-the-art pinhole-OoS methods. Moreover, POS achieves leading closed-set segmentation capabilities and advances the development of panoramic understanding. Code and datasets will be available at https://github.com/MengfeiD/PanOoS.
comment: Code and datasets will be available at https://github.com/MengfeiD/PanOoS
Sim2Real Diffusion: Leveraging Foundation Vision Language Models for Adaptive Automated Driving
Simulation-based design, optimization, and validation of autonomous vehicles have proven to be crucial for their improvement over the years. Nevertheless, the ultimate measure of effectiveness is their successful transition from simulation to reality (sim2real). However, existing sim2real transfer methods struggle to address the autonomy-oriented requirements of balancing: (i) conditioned domain adaptation, (ii) robust performance with limited examples, (iii) modularity in handling multiple domain representations, and (iv) real-time performance. To alleviate these pain points, we present a unified framework for learning cross-domain adaptive representations through conditional latent diffusion for sim2real transferable automated driving. Our framework offers options to leverage: (i) alternate foundation models, (ii) a few-shot fine-tuning pipeline, and (iii) textual as well as image prompts for mapping across given source and target domains. It is also capable of generating diverse high-quality samples when diffusing across parameter spaces such as times of day, weather conditions, seasons, and operational design domains. We systematically analyze the presented framework and report our findings in terms of performance benchmarks and ablation studies. Additionally, we demonstrate its serviceability for autonomous driving using behavioral cloning case studies. Our experiments indicate that the proposed framework is capable of bridging the perceptual sim2real gap by over 40%.
comment: Accepted in IEEE Robotics and Automation Letters (RA-L)
Faster Model Predictive Control via Self-Supervised Initialization Learning
Model Predictive Control (MPC) is widely used in robot control by optimizing a sequence of control outputs over a finite horizon. Computational approaches for MPC include deterministic methods (e.g., iLQR and COBYLA), as well as sampling-based methods (e.g., MPPI and CEM). However, complex system dynamics and non-convex or non-differentiable cost terms often lead to prohibitive optimization times that limit real-world deployment. Prior efforts to accelerate MPC have two limitations: (i) reusing previous solutions fails under sharp state changes, and (ii) pure imitation learning does not target compute efficiency directly and suffers from suboptimality in the training data. To address these, we propose a warm-start framework that learns a policy to generate high-quality initial guesses for the MPC solver. The policy is first trained via behavior cloning from expert MPC rollouts and then fine-tuned online with reinforcement learning to directly minimize MPC optimization time. We empirically validate that our approach improves both deterministic and sampling-based MPC methods, achieving up to 21.6% faster optimization and 34.1% higher tracking accuracy for deterministic MPC in a Formula 1 track path-tracking domain, and improving safety by 100%, path efficiency by 12.8%, and steering smoothness by 7.2% for sampling-based MPC in an obstacle-rich navigation domain. These results demonstrate that our framework not only accelerates MPC but also improves overall control performance. Furthermore, it can be applied to a broader range of control algorithms that benefit from good initial guesses.
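Why a good initial guess shortens optimization can be seen in a deliberately tiny example. The sketch below is not the paper's solver or policy: it minimizes a one-dimensional quadratic cost by gradient descent, and the "warm start" is simply a hand-picked value standing in for what a learned policy might output. All names and constants are illustrative assumptions.

```python
def solve(u0, target=3.0, lr=0.1, tol=1e-6, max_iter=10_000):
    """Minimize (u - target)^2 by gradient descent; return (solution, iterations)."""
    u = u0
    for i in range(max_iter):
        grad = 2.0 * (u - target)
        if abs(grad) < tol:
            return u, i
        u -= lr * grad
    return u, max_iter


if __name__ == "__main__":
    cold_u, cold_iters = solve(u0=0.0)  # uninformed initial guess
    warm_u, warm_iters = solve(u0=2.9)  # stand-in for a learned initial guess
    print(f"cold start: {cold_iters} iterations, u = {cold_u:.6f}")
    print(f"warm start: {warm_iters} iterations, u = {warm_u:.6f}")
```

Both runs reach the same minimizer, but the warm-started run stops earlier because it begins closer to the optimum. The iteration count plays the role of the optimization time that the fine-tuning stage described above is trained to reduce.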
LPAC: Learnable Perception-Action-Communication Loops with Applications to Coverage Control
Coverage control is the problem of navigating a robot swarm to collaboratively monitor features or a phenomenon of interest not known a priori. The problem is challenging in decentralized settings with robots that have limited communication and sensing capabilities. We propose a learnable Perception-Action-Communication (LPAC) architecture for the problem, wherein a convolutional neural network (CNN) processes localized perception; a graph neural network (GNN) facilitates robot communications; finally, a shallow multi-layer perceptron (MLP) computes robot actions. The GNN enables collaboration in the robot swarm by computing what information to communicate with nearby robots and how to incorporate received information. Evaluations show that the LPAC models -- trained using imitation learning -- outperform standard decentralized and centralized coverage control algorithms. The learned policy generalizes to environments different from the training dataset, transfers to larger environments with more robots, and is robust to noisy position estimates. The results indicate the suitability of LPAC architectures for decentralized navigation in robot swarms to achieve collaborative behavior.
comment: 20 pages, 20 figures
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning NeurIPS 2025
We propose ReinFlow, a simple yet effective online reinforcement learning (RL) framework that fine-tunes a family of flow matching policies for continuous robotic control. Derived from rigorous RL theory, ReinFlow injects learnable noise into a flow policy's deterministic path, converting the flow into a discrete-time Markov Process for exact and straightforward likelihood computation. This conversion facilitates exploration and ensures training stability, enabling ReinFlow to fine-tune diverse flow model variants, including Rectified Flow [35] and Shortcut Models [19], particularly at very few or even one denoising step. We benchmark ReinFlow in representative locomotion and manipulation tasks, including long-horizon planning with visual input and sparse reward. The episode reward of Rectified Flow policies obtained an average net growth of 135.36% after fine-tuning in challenging legged locomotion tasks while saving denoising steps and 82.63% of wall time compared to state-of-the-art diffusion RL fine-tuning method DPPO [43]. The success rate of the Shortcut Model policies in state and visual manipulation tasks achieved an average net increase of 40.34% after fine-tuning with ReinFlow at four or even one denoising step, whose performance is comparable to fine-tuned DDIM policies while saving computation time for an average of 23.20%. Project webpage: https://reinflow.github.io/
comment: NeurIPS 2025
Curvature-Aware Calibration of Tactile Sensors for Accurate Force Estimation on Non-Planar Surfaces
Flexible tactile sensors are increasingly used in real-world applications such as robotic grippers, prosthetic hands, wearable gloves, and assistive devices, where they need to conform to curved and irregular surfaces. However, most existing tactile sensors are calibrated only on flat substrates, and their accuracy and consistency degrade once mounted on curved geometries. This limitation restricts their reliability in practical use. To address this challenge, we develop a calibration model for a widely used resistive tactile sensor design that enables accurate force estimation on one-dimensional curved surfaces. We then train a neural network (a multilayer perceptron) to predict local curvature from baseline sensor outputs recorded under no applied load, achieving an R² score of 0.91. The proposed approach is validated on five daily objects with varying curvatures under forces from 2 N to 8 N. Results show that the curvature-aware calibration maintains consistent force accuracy across all surfaces, while flat-surface calibration underestimates force as curvature increases. Our results demonstrate that curvature-aware modeling improves the accuracy, consistency, and reliability of flexible tactile sensors, enabling dependable performance across real-world applications.
comment: This work has been submitted to the IEEE for possible publication
ReactEMG: Zero-Shot, Low-Latency Intent Detection via sEMG
Surface electromyography (sEMG) signals show promise for effective human-computer interfaces, particularly in rehabilitation and prosthetics. However, challenges remain in developing systems that respond quickly and reliably to user intent, across different subjects and without requiring time-consuming calibration. In this work, we propose a framework for EMG-based intent detection that addresses these challenges. Unlike traditional gesture recognition models that wait until a gesture is completed before classifying it, our approach uses a segmentation strategy to assign intent labels at every timestep as the gesture unfolds. We introduce a novel masked modeling strategy that aligns muscle activations with their corresponding user intents, enabling rapid onset detection and stable tracking of ongoing gestures. In evaluations against baseline methods, considering both accuracy and stability for device control, our approach surpasses state-of-the-art performance in zero-shot transfer conditions, demonstrating its potential for wearable robotics and next-generation prosthetic systems. Our project page is available at: https://reactemg.github.io
Vision-Based Online Key Point Estimation of Deformable Robots
The precise control of soft and continuum robots requires knowledge of their shape, which has, in contrast to classical rigid robots, infinite degrees of freedom. To partially reconstruct the shape, proprioceptive techniques use built-in sensors resulting in inaccurate results and increased fabrication complexity. Exteroceptive methods so far rely on expensive tracking systems with reflective markers placed on all components, which are infeasible for deformable robots interacting with the environment due to marker occlusion and damage. Here, a regression approach is presented for 3D key point estimation using a convolutional neural network. The proposed approach takes advantage of data-driven supervised learning and is capable of online marker-less estimation during inference. Two images of a robotic system are taken simultaneously at 25 Hz from two different perspectives, and are fed to the network, which returns for each pair the parameterized key point or PCC shape representations. The proposed approach outperforms marker-less state-of-the-art methods by a maximum of 4.5% in estimation accuracy while at the same time being more robust and requiring no prior knowledge of the shape. Online evaluations on two types of soft robotic arms and a soft robotic fish demonstrate our method's accuracy and versatility on highly deformable systems.
Multiagent Systems
Social learning moderates the tradeoffs between efficiency, stability, and equity in group foraging
Social learning shapes collective search by influencing how individuals use peer information. Empirical and computational studies show that optimal information sharing, neither too localized nor too diffuse, can enhance resource detection and coordination. Building on these insights, we develop a randomized search model that integrates social learning with area-restricted search (ARS) to investigate how communication distance affects collective foraging. The model includes three behavioral modes: exploration, exploitation, and targeted walk, which are governed by a single parameter, $\rho$, that balances exploration and exploitation at the group level. We quantify how $\rho$ influences group efficiency ($\eta$), temporal variability/burstiness ($B$), and agent variability/equity in resource distribution ($\sigma$), revealing a clear trade-off among these outcomes. When $\rho \to 0$, agents explore independently, maximizing collective exploration. As $\rho$ increases, individuals preferentially exploit patches discovered by others: $\eta$ first rises and then declines, while $B$ shows the opposite trend. Group efficiency is optimized at interior $\rho$ values that balance exploration and exploitation. At the largest $\rho$, equality among agents is highest, but efficiency declines and burstiness is also maximized. Finally, by introducing negative rewards, we examine how social learning mitigates risk.
comment: Code and data: https://github.com/LoneStar97/social-learning-search ; additional simulations: https://www.youtube.com/playlist?list=PLgRFM9nAjJRwoZvCGBAdCIE-BYNgPmSuV
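The abstract above does not define the burstiness measure $B$. As a minimal sketch, assuming the commonly used coefficient $B = (\sigma_\tau - \mu_\tau)/(\sigma_\tau + \mu_\tau)$ computed over inter-event times (this choice and the function name `burstiness` are assumptions, not taken from the paper), one way to compute it is:

```python
import statistics


def burstiness(inter_event_times):
    """Burstiness coefficient B = (sigma - mu) / (sigma + mu).

    Assumed definition (not confirmed by the abstract): B is -1 for a
    perfectly regular event sequence, near 0 for Poisson-like timing,
    and approaches +1 for highly bursty timing.
    """
    mu = statistics.fmean(inter_event_times)
    sigma = statistics.pstdev(inter_event_times)
    if mu + sigma == 0:
        raise ValueError("burstiness is undefined when all intervals are zero")
    return (sigma - mu) / (sigma + mu)


if __name__ == "__main__":
    # Evenly spaced resource detections: perfectly regular.
    print(burstiness([2.0, 2.0, 2.0, 2.0]))
    # Three quick detections followed by a long gap: bursty.
    print(burstiness([0.1, 0.1, 0.1, 10.0]))
```

Under this assumed definition, the regular sequence gives $B = -1$ and the sequence with a long gap gives a positive value.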
Challenges in Credit Assignment for Multi-Agent Reinforcement Learning in Open Agent Systems
In the rapidly evolving field of multi-agent reinforcement learning (MARL), understanding the dynamics of open systems is crucial. Openness in MARL refers to the dynamic nature of agent populations, tasks, and agent types within a system. Specifically, there are three types of openness as reported in (Eck et al. 2023) [2]: agent openness, where agents can enter or leave the system at any time; task openness, where new tasks emerge, and existing ones evolve or disappear; and type openness, where the capabilities and behaviors of agents change over time. This report provides a conceptual and empirical review, focusing on the interplay between openness and the credit assignment problem (CAP). CAP involves determining the contribution of individual agents to the overall system performance, a task that becomes increasingly complex in open environments. Traditional credit assignment (CA) methods often assume static agent populations, fixed and pre-defined tasks, and stationary types, making them inadequate for open systems. We first conduct a conceptual analysis, introducing new sub-categories of openness to detail how events like agent turnover or task cancellation break the assumptions of environmental stationarity and fixed team composition that underpin existing CAP methods. We then present an empirical study using representative temporal and structural algorithms in an open environment. The results demonstrate that openness directly causes credit misattribution, evidenced by unstable loss functions and significant performance degradation.
FinPos: A Position-Aware Trading Agent System for Real Financial Markets
The exceptional potential of large language models (LLMs) in handling text information has garnered significant attention in the field of financial trading. However, current trading agents primarily focus on single-step trading tasks and lack awareness of continuous position management. Therefore, we propose a position-aware trading task designed to simulate a more realistic market. To address this task, we develop a trading agent system, FinPos, optimized for position management. FinPos is able to interpret various types of market information from a professional perspective, providing a reliable basis for positioning decisions. To mitigate the substantial market risks arising from position fluctuations, FinPos employs dual decision agents. Furthermore, the continuous nature of position management necessitates our adoption of multi-timescale rewards, which in turn empowers FinPos to effectively balance short-term fluctuations against long-term trends. Extensive experiments demonstrate that FinPos surpasses state-of-the-art trading agents in the position-aware trading task, which closely mirrors real market conditions. More importantly, our findings reveal that LLM-centered agent systems exhibit a vast, largely unexplored potential in long-term market decision-making.
comment: LLM Applications, LLM Agents, Financial Technology
DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads NeurIPS 2025
The increasing energy demands and carbon footprint of large-scale AI require intelligent workload management in globally distributed data centers. Yet progress is limited by the absence of benchmarks that realistically capture the interplay of time-varying environmental factors (grid carbon intensity, electricity prices, weather), detailed data center physics (CPUs, GPUs, memory, HVAC energy), and geo-distributed network dynamics (latency and transmission costs). To bridge this gap, we present DCcluster-Opt: an open-source, high-fidelity simulation benchmark for sustainable, geo-temporal task scheduling. DCcluster-Opt combines curated real-world datasets, including AI workload traces, grid carbon intensity, electricity markets, weather across 20 global regions, cloud transmission costs, and empirical network delay parameters with physics-informed models of data center operations, enabling rigorous and reproducible research in sustainable computing. It presents a challenging scheduling problem where a top-level coordinating agent must dynamically reassign or defer tasks that arrive with resource and service-level agreement requirements across a configurable cluster of data centers to optimize multiple objectives. The environment also models advanced components such as heat recovery. A modular reward system enables an explicit study of trade-offs among carbon emissions, energy costs, service level agreements, and water use. It provides a Gymnasium API with baseline controllers, including reinforcement learning and rule-based strategies, to support reproducible ML research and a fair comparison of diverse algorithms. By offering a realistic, configurable, and accessible testbed, DCcluster-Opt accelerates the development and validation of next-generation sustainable computing solutions for geo-distributed data centers.
comment: Submitted to the NeurIPS 2025 conference
LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers NeurIPS 2025
Liquid cooling is critical for thermal management in high-density data centers with the rising AI workloads. However, machine learning-based controllers are essential to unlock greater energy efficiency and reliability, promoting sustainability. We present LC-Opt, a Sustainable Liquid Cooling (LC) benchmark environment, for reinforcement learning (RL) control strategies in energy-efficient liquid cooling of high-performance computing (HPC) systems. Built on the baseline of a high-fidelity digital twin of Oak Ridge National Lab's Frontier Supercomputer cooling system, LC-Opt provides detailed Modelica-based end-to-end models spanning site-level cooling towers to data center cabinets and server blade groups. RL agents optimize critical thermal controls like liquid supply temperature, flow rate, and granular valve actuation at the IT cabinet level, as well as cooling tower (CT) setpoints through a Gymnasium interface, with dynamic changes in workloads. This environment creates a multi-objective real-time optimization challenge balancing local thermal regulation and global energy efficiency, and also supports additional components like a heat recovery unit (HRU). We benchmark centralized and decentralized multi-agent RL approaches, demonstrate policy distillation into decision and regression trees for interpretable control, and explore LLM-based methods that explain control actions in natural language through an agentic mesh architecture designed to foster user trust and simplify system management. LC-Opt democratizes access to detailed, customizable liquid cooling models, enabling the ML community, operators, and vendors to develop sustainable data center liquid cooling control solutions.
comment: Submitted to the NeurIPS 2025 conference
GenSwarm: Scalable Multi-Robot Code-Policy Generation and Deployment via Language Models
The development of control policies for multi-robot systems traditionally follows a complex and labor-intensive process, often lacking the flexibility to adapt to dynamic tasks. This has motivated research on methods to automatically create control policies. However, these methods require iterative processes of manually crafting and refining objective functions, thereby prolonging the development cycle. This work introduces \textit{GenSwarm}, an end-to-end system that leverages large language models to automatically generate and deploy control policies for multi-robot tasks based on simple user instructions in natural language. As a multi-language-agent system, GenSwarm achieves zero-shot learning, enabling rapid adaptation to altered or unseen tasks. The white-box nature of the code policies ensures strong reproducibility and interpretability. With its scalable software and hardware architectures, GenSwarm supports efficient policy deployment on both simulated and real-world multi-robot systems, realizing an instruction-to-execution end-to-end functionality that could prove valuable for robotics specialists and non-specialists alike. The code of the proposed GenSwarm system is available online: https://github.com/WindyLab/GenSwarm.
comment: This article has been accepted for publication in npj Robotics
Exploiting Agent Symmetries for Performance Analysis of Distributed Optimization Methods
We show that, in many settings, the worst-case performance of a distributed optimization algorithm is independent of the number of agents in the system, and can thus be computed in the fundamental case with just two agents. This result relies on a novel approach that systematically exploits symmetries in worst-case performance computation, framed as Semidefinite Programming (SDP) via the Performance Estimation Problem (PEP) framework. Harnessing agent symmetries in the PEP yields compact problems whose size is independent of the number of agents in the system. When all agents are equivalent in the problem, we establish the explicit conditions under which the resulting worst-case performance is independent of the number of agents and is therefore equivalent to the basic case with two agents. Our compact PEP formulation also allows the consideration of multiple equivalence classes of agents, and its size only depends on the number of equivalence classes. This enables practical and automated performance analysis of distributed algorithms in numerous complex and realistic settings, such as the analysis of the worst agent performance. We leverage this new tool to analyze the performance of the EXTRA algorithm in advanced settings and its scalability with the number of agents, providing a tighter analysis and deeper understanding of the algorithm performance.
comment: submitted to Open Journal of Mathematical Optimization (OJMO)
A Framework for Objective-Driven Dynamical Stochastic Fields
Fields offer a versatile approach for describing complex systems composed of interacting and dynamic components. In particular, some of these dynamical and stochastic systems may exhibit goal-directed behaviors aimed at achieving specific objectives, which we refer to as $\textit{intelligent fields}$. However, due to their inherent complexity, it remains challenging to develop a formal theoretical description of such systems and to effectively translate these descriptions into practical applications. In this paper, we propose three fundamental principles to establish a theoretical framework for understanding intelligent fields: complete configuration, locality, and purposefulness. Moreover, we explore methodologies for designing such fields from the perspective of artificial intelligence applications. This initial investigation aims to lay the groundwork for future theoretical developments and practical advances in understanding and harnessing the potential of such objective-driven dynamical stochastic fields.
CogPlanner: Unveiling the Potential of Agentic Multimodal Retrieval Augmented Generation with Planning SIGIR
Multimodal Retrieval Augmented Generation (MRAG) systems have shown promise in enhancing the generation capabilities of multimodal large language models (MLLMs). However, existing MRAG frameworks primarily adhere to rigid, single-step retrieval strategies that fail to address real-world challenges of information acquisition and query reformulation. In this work, we introduce the task of Multimodal Retrieval Augmented Generation Planning (MRAG Planning) that aims at effective information seeking and integration while minimizing computational overhead. Specifically, we propose CogPlanner, an agentic plug-and-play framework inspired by human cognitive processes, which iteratively determines query reformulation and retrieval strategies to generate accurate and contextually relevant responses. CogPlanner supports parallel and sequential modeling paradigms. Furthermore, we introduce CogBench, a new benchmark designed to rigorously evaluate the MRAG Planning task and facilitate lightweight CogPlanner integration with resource-efficient MLLMs, such as Qwen2-VL-7B-Cog. Experimental results demonstrate that CogPlanner significantly outperforms existing MRAG baselines, offering improvements in both accuracy and efficiency with minimal additional computational costs.
comment: Accepted by SIGIR-AP 2025
PartnerMAS: An LLM Hierarchical Multi-Agent Framework for Business Partner Selection on High-Dimensional Features
High-dimensional decision-making tasks, such as business partner selection, involve evaluating large candidate pools with heterogeneous numerical, categorical, and textual features. While large language models (LLMs) offer strong in-context reasoning capabilities, single-agent or debate-style systems often struggle with scalability and consistency in such settings. We propose PartnerMAS, a hierarchical multi-agent framework that decomposes evaluation into three layers: a Planner Agent that designs strategies, Specialized Agents that perform role-specific assessments, and a Supervisor Agent that integrates their outputs. To support systematic evaluation, we also introduce a curated benchmark dataset of venture capital co-investments, featuring diverse firm attributes and ground-truth syndicates. Across 140 cases, PartnerMAS consistently outperforms single-agent and debate-based multi-agent baselines, achieving up to 10--15\% higher match rates. Analysis of agent reasoning shows that planners are most responsive to domain-informed prompts, specialists produce complementary feature coverage, and supervisors play an important role in aggregation. Our findings demonstrate that structured collaboration among LLM agents can generate more robust outcomes than scaling individual models, highlighting PartnerMAS as a promising framework for high-dimensional decision-making in data-rich domains.
Adaptive Inference through Bayesian and Inverse Bayesian Inference with Symmetry-Bias in Nonstationary Environments
This study proposes the novel Bayesian and inverse Bayesian (BIB) inference framework that incorporates symmetry bias into the Bayesian updating process to perform both conventional and inverse Bayesian updates concurrently. Conventional Bayesian inference is constrained by a fundamental trade-off between adaptability to abrupt environmental changes and accuracy during stable periods. The BIB framework addresses this limitation by dynamically modulating the learning rate via inverse Bayesian updates, thereby enhancing adaptive flexibility. The BIB model was evaluated in a sequential estimation task involving observations drawn from a Gaussian distribution with a stochastically time-varying mean, where it exhibited spontaneous bursts in the learning rate during environmental transitions, transiently entering high-sensitivity states that facilitated rapid adaptation. This burst-relaxation dynamic serves as a mechanism for balancing adaptability and accuracy. Furthermore, avalanche analysis, detrended fluctuation analysis, and power spectral analysis revealed that the BIB system likely operates near a critical state, a property not observed in standard Bayesian inference. This suggests that the BIB model uniquely achieves a coexistence of computational efficiency and critical dynamics, resolving the adaptability-accuracy trade-off while maintaining scale-free behavior. These findings offer a new computational perspective on scale-free dynamics in natural systems and provide valuable insights for the design of adaptive inference systems in nonstationary environments.
Systems and Control (CS)
Technical Report for Dissipativity Learning in Reproducing Kernel Hilbert Space
This work presents a nonparametric framework for dissipativity learning in reproducing kernel Hilbert spaces, which enables data-driven certification of stability and performance properties for unknown nonlinear systems without requiring an explicit dynamic model. Dissipativity is a fundamental system property that generalizes Lyapunov stability, passivity, and finite L2 gain conditions through an energy balance inequality between a storage function and a supply rate. Unlike prior parametric formulations that approximate these functions using quadratic forms with fixed matrices, the proposed method represents them as Hilbert Schmidt operators acting on canonical kernel features, thereby capturing nonlinearities implicitly while preserving convexity and analytic tractability. The resulting operator optimization problem is formulated in the form of a one-class support vector machine and reduced, via the representer theorem, to a finite dimensional convex program expressed through kernel Gram matrices. Furthermore, statistical learning theory is applied to establish generalization guarantees, including confidence bounds on the dissipation rate and the L2 gain. Numerical results demonstrate that the proposed RKHS based dissipativity learning method effectively identifies nonlinear dissipative behavior directly from input output data, providing a powerful and interpretable framework for model free control analysis and synthesis.
comment: 26 pages, 3 figures
SpecAttn: Speculating Sparse Attention NeurIPS 2025
Large Language Models (LLMs) face significant computational bottlenecks during inference due to the quadratic complexity of self-attention mechanisms, particularly as context lengths increase. We introduce SpecAttn, a novel training-free approach that seamlessly integrates with existing speculative decoding techniques to enable efficient sparse attention in pre-trained transformers. Our key insight is to exploit the attention weights already computed by the draft model during speculative decoding to identify important tokens for the target model, eliminating redundant computation while maintaining output quality. SpecAttn employs three core techniques: KL divergence-based layer alignment between draft and target models, a GPU-optimized sorting-free algorithm for top-p token selection from draft attention patterns, and dynamic key-value cache pruning guided by these predictions. By leveraging the computational work already performed in standard speculative decoding pipelines, SpecAttn achieves over 75% reduction in key-value cache accesses with a mere 15.29% increase in perplexity on the PG-19 dataset, significantly outperforming existing sparse attention methods. Our approach demonstrates that speculative execution can be enhanced to provide approximate verification without significant performance degradation.
comment: Accepted to NeurIPS 2025 Workshop on Structured Probabilistic Inference & Generative Modeling
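To illustrate the top-p token selection mentioned in the SpecAttn abstract, here is a minimal sketch that keeps the smallest set of tokens whose normalized attention mass reaches a threshold `p`. It uses a plain sort for readability, so it is not the sorting-free GPU algorithm the authors describe; the function name `top_p_indices` and the input layout (one non-negative weight per token) are assumptions for illustration only.

```python
def top_p_indices(weights, p=0.9):
    """Return sorted indices of the fewest tokens whose mass reaches p.

    weights: non-negative attention weights, one per token (assumed layout).
    p: target fraction of total attention mass to retain.
    Note: sort-based illustration, not the paper's sorting-free method.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have positive total mass")
    # Visit tokens from largest to smallest weight.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += weights[i] / total
        if mass >= p:
            break
    return sorted(kept)


if __name__ == "__main__":
    # Tokens 0 and 2 carry 80% of the mass, which covers p = 0.75.
    print(top_p_indices([5, 1, 3, 1], p=0.75))
```

In a pipeline of the kind the abstract outlines, the retained indices would determine which key-value cache entries the target model reads; that wiring is not shown here.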
Risk-constrained stochastic scheduling of multi-market energy storage systems
Energy storage can promote the integration of renewables by operating with charge and discharge policies that balance an intermittent power supply. This study investigates the scheduling of energy storage assets under energy price uncertainty, with a focus on electricity markets. A two-stage stochastic risk-constrained approach is employed, whereby electricity price trajectories or specific power markets are observed, allowing for recourse in the schedule. Conditional value-at-risk is used to quantify tail risk in the optimization problems; this allows for the explicit specification of a probabilistic risk limit. The proposed approach is tested in an integrated hydrogen system (IHS) and a battery energy storage system (BESS). In the joint design and operation context for the IHS, the risk constraint results in larger installed unit capacities, increasing capital cost but enabling more energy inventory to buffer price uncertainty. As shown in both case studies, there is an operational trade-off between risk and expected reward; this is reflected in higher expected costs (or lower expected profits) with increasing levels of risk aversion. Despite the decrease in expected reward, both systems exhibit substantial benefits of increasing risk aversion. This work provides a general method to address uncertainties in energy storage scheduling, allowing operators to input their level of risk tolerance on asset decisions.
comment: 39 pages, 10 figures, 7 tables
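As a rough illustration of the tail-risk measure named in the abstract above, conditional value-at-risk at level $\alpha$ can be estimated from equally likely scenarios as the mean of the worst $(1-\alpha)$ fraction of losses. The sketch below is a sample-based estimator under that equal-probability assumption; it is not the paper's two-stage optimization formulation, and the function name `cvar` is hypothetical.

```python
import math


def cvar(losses, alpha=0.95):
    """Sample CVaR: mean of the worst (1 - alpha) share of scenario losses.

    Assumes every scenario is equally likely. The tail size is rounded
    before taking the ceiling so that floating-point noise such as
    5.000000000000004 does not add a spurious extra scenario.
    """
    if not losses:
        raise ValueError("at least one scenario is required")
    ordered = sorted(losses, reverse=True)  # largest loss first
    tail = max(1, math.ceil(round((1 - alpha) * len(ordered), 9)))
    return sum(ordered[:tail]) / tail


if __name__ == "__main__":
    scenario_costs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    # Worst 20% of ten scenarios are the costs 10 and 9.
    print(cvar(scenario_costs, alpha=0.8))
```

A risk-constrained schedule would bound a quantity of this kind by an operator-chosen limit; a stricter limit corresponds to higher risk aversion.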
Context-Aware Stochastic Modeling of Consumer Energy Resource Aggregators in Electricity Markets SC
Aggregators of consumer energy resources (CERs) like rooftop solar and battery energy storage (BES) face challenges due to their inherent uncertainties. A sensible approach is to use stochastic optimization to handle such uncertainties, which can lead to infeasible problems or loss in revenues if not chosen appropriately. This paper presents three efficient two-stage stochastic optimization methods: risk-neutral, robust, and chance-constrained, to address the impact of CER uncertainties for aggregators who participate in energy and regulation services markets in the Australian National Electricity Market. Furthermore, these methods utilize the flexibility of BES, considering precise state-of-charge dynamics and complementarity constraints, aiming for scalable performance while managing uncertainty. The problems are formed as two-stage stochastic mixed-integer linear programs, with relaxations adopted for large scenario sets. The solution approach employs scenario-based methodologies and affine recourse policies to obtain tractable reformulations. These methods are evaluated across use cases reflecting diverse operational and market settings, uncertainty characteristics, and decision-making preferences, demonstrating their ability to mitigate uncertainty, enhance profitability, and provide context-aware guidance for aggregators in choosing the most appropriate stochastic optimization method.
comment: Submitted to PSCC 2026
A Switching Strategy for Event-Trigger Control of Spacecraft Rendezvous
This paper presents the design of a state-feedback control law for spacecraft rendezvous, formulated using the Hill-Clohessy-Wiltshire equations. The proposed method introduces an impulsive control strategy to regulate thruster operations. Specifically, a state-dependent switching framework is developed to determine both the control input magnitudes and the precise state conditions that trigger thruster activation. The nonlinear control law is derived using principles from automatic control theory, particularly Lyapunov stability analysis and the Linear Matrix Inequality framework. The resulting closed-loop system is proven to be stable, while simultaneously minimizing the total number of actuation events. The effectiveness of the proposed method is demonstrated through a numerical case study, which includes a comparative analysis with a standard Model Predictive Control scheme, highlighting the advantages and trade-offs of the developed control structure.
comment: Submitted for EuroGNC 2026
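For readers unfamiliar with the Hill-Clohessy-Wiltshire model referenced above, the sketch below writes out the standard linearized relative dynamics $\ddot{x} = 3n^2 x + 2n\dot{y} + u_x$, $\ddot{y} = -2n\dot{x} + u_y$, $\ddot{z} = -n^2 z + u_z$ in state-space form. The axis convention (x radial, y along-track, z cross-track), the state ordering, and the function names are assumptions; the paper's switching law and LMI design are not reproduced.

```python
def hcw_matrix(n):
    """State matrix for the state [x, y, z, vx, vy, vz].

    n: mean motion of the target's circular orbit (rad/s).
    Assumed axes: x radial, y along-track, z cross-track.
    """
    return [
        [0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 1],
        [3 * n * n, 0, 0, 0, 2 * n, 0],
        [0, 0, 0, -2 * n, 0, 0],
        [0, 0, -n * n, 0, 0, 0],
    ]


def hcw_derivative(state, n, accel=(0.0, 0.0, 0.0)):
    """Time derivative of the relative state under thrust acceleration."""
    rows = hcw_matrix(n)
    deriv = [sum(a * s for a, s in zip(row, state)) for row in rows]
    for axis in range(3):
        deriv[3 + axis] += accel[axis]  # thrust enters the velocity rows
    return deriv


if __name__ == "__main__":
    # A pure along-track offset with zero velocity is an equilibrium.
    print(hcw_derivative([0, 5, 0, 0, 0, 0], n=0.001))
```

Under an impulsive actuation model, a thruster firing would instead be treated as an instantaneous change in the velocity components of the state.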
Simplifying Preference Elicitation in Local Energy Markets: Combinatorial Clock Exchange
As distributed energy resources (DERs) proliferate, future power systems will need new market platforms enabling prosumers to trade various electricity and grid-support products. However, prosumers often exhibit complex, product-interdependent preferences and face limited cognitive and computational resources, hindering engagement with complex market structures and bid formats. We address this challenge by introducing a multi-product market that allows prosumers to express complex preferences through an intuitive format, by fusing combinatorial clock exchange and machine learning (ML) techniques. The iterative mechanism only requires prosumers to report their preferred package of products at posted prices, eliminating the need for forecasting product prices or adhering to complex bid formats, while the ML-aided price discovery speeds up convergence. The linear pricing rule further enhances transparency and interpretability. Finally, numerical simulations demonstrate convergence to clearing prices in approximately 15 clock iterations.
Value of Multi-pursuer Single-evader Pursuit-evasion Game with Terminal Cost of Evader's Position: Relaxation of Convexity Condition
In this study, we consider a multi-pursuer single-evader quantitative pursuit-evasion game with a payoff function that includes only the terminal cost. The terminal cost is a function related only to the terminal position of the evader. This problem has been extensively studied in target defense games. Here, we prove that a candidate for the value function generated by a geometric method is the viscosity solution of the corresponding Hamilton-Jacobi-Isaacs partial differential equation (HJI PDE) Dirichlet problem. Therefore, the value function of the game at each point can be computed by a mathematical program. In our work, the convexity of the terminal cost or the target is not required. The terminal cost only needs to be locally Lipschitz continuous. The cases in which the terminal costs or the targets are not convex are covered. Therefore, our result is more universal than those of previous studies, and the complexity of the proof is improved. We also discuss the optimal strategies in this game and present an intuitive explanation of this value function.
comment: 21 pages, 6 figures
Solving Infinite-Horizon Optimal Control Problems using the Extreme Theory of Functional Connections
This paper presents a physics-informed machine learning approach for synthesizing an optimal feedback control policy for infinite-horizon optimal control problems by solving the Hamilton-Jacobi-Bellman (HJB) partial differential equation (PDE). The optimal control policy is derived analytically for affine dynamical systems with separable and strictly convex control costs, expressed as a function of the gradient of the value function. The resulting HJB-PDE is then solved by approximating the value function using the Extreme Theory of Functional Connections (X-TFC), a hybrid approach that combines the Theory of Functional Connections (TFC) with the Extreme Learning Machine (ELM) algorithm. This approach ensures analytical satisfaction of boundary conditions and significantly reduces training cost compared to traditional Physics-Informed Neural Networks (PINNs). We benchmark the method on linear and non-linear systems with known analytical solutions and demonstrate its effectiveness on control tasks such as spacecraft optimal de-tumbling control.
comment: Accepted to Indian Control Conference (ICC-11), 6 pages, 12 figures
Optimal BESS Sizing and Placement for Mitigating EV-Induced Voltage Violations: A Scalable Spatio-Temporal Adaptive Targeting Strategy
The escalating adoption of electric vehicles (EVs) and the growing demand for charging solutions are driving a surge in EV charger installations in distribution networks. However, this rising EV load strains the distribution grid, causing severe voltage drops, particularly at feeder extremities. This study proposes a proactive voltage management (PVM) framework that can integrate Monte Carlo-based simulations of varying EV charging loads to (i) identify potential voltage violations through a voltage violation analysis (VVA) model, and (ii) then mitigate those violations with optimally-invested battery energy storage systems (BESS) through an optimal expansion planning (OEP) model. A novel spatio-temporal adaptive targeting (STAT) strategy is proposed to alleviate the computational complexity of the OEP model by defining a targeted OEP (T-OEP) model, solved by applying the OEP model to (i) a reduced set of representative critical time periods and (ii) candidate BESS installation nodes. The efficacy and scalability of the proposed approach are validated on 33-bus, 69-bus, and a large-scale 240-bus system. Results demonstrate that the strategic sizing and placement of BESS not only effectively mitigate voltage violations but also yield substantial cost savings on electricity purchases under time-of-use tariffs. This research offers a cost-effective and scalable solution for integrating high penetrations of EVs, providing crucial insights for future distribution network planning.
Analyzing the Impact of Demand Response on Short-Circuit Current via a Unit Commitment Model
In low-carbon grids, system flexibility can be enhanced through mechanisms such as Demand Response (DR), enabling the efficient utilization of renewable energy. However, as Synchronous Generators (SGs) are being replaced with renewable energy characterized by Inverter-Based Resources (IBR), system stability is severely affected. Due to the limited overload capability of IBR, their Short-Circuit Current (SCC) contribution is much smaller than that of SGs, which may result in protection devices failing to trip during faults. Consequently, the remaining SGs play a key role in offering sufficient SCC volumes. Given that the commitment of SGs is closely related to system load, DR can thus indirectly affect their SCC provision, a relationship that has not been investigated. Therefore, this paper incorporates both DR and SCC constraints into a unit commitment model and conducts studies on an IEEE 30-bus system. The results show that although DR can reduce social costs by lowering power demand, it may also lead to inadequate SCC levels. Nevertheless, the cost increases by only 0.3% when DR is combined with SCC constraints, indicating that DR can actually help achieve a stable system in a cost-effective manner.
comment: 1-5 pages. submitted to PESGM 2026, Canada
Learning a Network Digital Twin as a Hybrid System
Network digital twin (NDT) models are virtual models that replicate the behavior of physical communication networks and are considered a key technology component to enable novel features and capabilities in future 6G networks. In this work, we focus on NDTs that model the communication quality properties of a multi-cell, dynamically changing wireless network over a workspace populated with multiple moving users. We propose an NDT modeled as a hybrid system, where each mode corresponds to a different base station and comprises sub-modes that correspond to areas of the workspace with similar network characteristics. The proposed hybrid NDT is identified and continuously improved through an annealing optimization-based learning algorithm, driven by online data measurements collected by the users. The advantages of the proposed hybrid NDT are studied with respect to memory and computational efficiency, data consumption, and the ability to timely adapt to network changes. Finally, we validate the proposed methodology on real experimental data collected from a two-cell 5G testbed.
Which Top Energy-Intensive Manufacturing Countries Can Compete in a Renewable Energy Future?
In a world increasingly powered by renewables and aiming for greenhouse gas-neutral industrial production, the future competitiveness of today's top manufacturing countries is questioned. This study applies detailed energy system modeling to quantify the Renewable Pull, an incentive for industry relocation exerted by countries with favorable renewable conditions. Results reveal that the Renewable Pull is not a cross-industrial phenomenon but strongly depends on the relationship between energy costs and transport costs. The intensity of the Renewable Pull varies, with China, India, and Japan facing a significantly stronger effect than Germany and the United States. Incorporating national capital cost assumptions proves critical, reducing Germany's Renewable Pull by a factor of six and positioning it as the second least affected top manufacturing country after Saudi Arabia. Using Germany as a case study, the analysis moreover illustrates that targeted import strategies, especially within the EU, can nearly eliminate the Renewable Pull, offering policymakers clear options for risk mitigation.
comment: 29 pages, 16 figures
Multivariable Gradient-Based Extremum Seeking Control with Saturation Constraints
This paper addresses the multivariable gradient-based extremum seeking control (ESC) subject to saturation. Two distinct saturation scenarios are investigated here: saturation acting on the input of the function to be optimized, which is addressed using an anti-windup compensation strategy, and saturation affecting the gradient estimate. In both cases, the unknown Hessian matrix is represented using a polytopic uncertainty description, and sufficient conditions in the form of linear matrix inequalities (LMIs) are derived to design a stabilizing control gain. The proposed conditions guarantee exponential stability of the origin for the average closed-loop system under saturation constraints. With the proposed design conditions, non-diagonal control gain matrices can be obtained, generalizing conventional ESC designs that typically rely on diagonal structures. Stability and convergence are rigorously proven using the Averaging Theory for dynamical systems with Lipschitz continuous right-hand sides. Numerical simulations illustrate the effectiveness of the proposed ESC algorithms, confirming the convergence even in the presence of saturation.
comment: 15 pages, 6 figures
Supply Chain Exploitation of Secure ROS 2 Systems: A Proof-of-Concept on Autonomous Platform Compromise via Keystore Exfiltration
This paper presents a proof-of-concept supply chain attack against the Secure ROS 2 (SROS 2) framework, demonstrated on a Quanser QCar2 autonomous vehicle platform. A Trojan-infected Debian package modifies core ROS 2 security commands to exfiltrate newly generated keystore credentials via DNS in base64-encoded chunks to an attacker-controlled nameserver. Possession of these credentials enables the attacker to rejoin the SROS 2 network as an authenticated participant and publish spoofed control or perception messages without triggering authentication failures. We evaluate this capability on a secure ROS 2 Humble testbed configured for a four-stop-sign navigation routine using an Intel RealSense camera for perception. Experimental results show that control-topic injections can cause forced braking, sustained high-speed acceleration, and continuous turning loops, while perception-topic spoofing can induce phantom stop signs or suppress real detections. The attack generalizes to any data distribution service (DDS)-based robotic system using SROS 2, highlighting the need for both supply chain integrity controls and runtime semantic validation to safeguard autonomous systems against insider and impersonation threats.
comment: Author-accepted version (preprint). Presented at IEEE MILCOM 2025 Workshops, WS07: 2nd Workshop on Security, Resilience, and Robustness of Systems and Software (SRRSS), Los Angeles, Oct 2025. 6 pages. Primary: cs.CR; cross-lists: cs.RO, cs.OS. Program: https://milcom2025.ieee-milcom.org/workshop/ws07-2nd-workshop-security-resilient-and-robustness-systems-and-software/program
DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads NeurIPS 2025
The increasing energy demands and carbon footprint of large-scale AI require intelligent workload management in globally distributed data centers. Yet progress is limited by the absence of benchmarks that realistically capture the interplay of time-varying environmental factors (grid carbon intensity, electricity prices, weather), detailed data center physics (CPUs, GPUs, memory, HVAC energy), and geo-distributed network dynamics (latency and transmission costs). To bridge this gap, we present DCcluster-Opt: an open-source, high-fidelity simulation benchmark for sustainable, geo-temporal task scheduling. DCcluster-Opt combines curated real-world datasets, including AI workload traces, grid carbon intensity, electricity markets, weather across 20 global regions, cloud transmission costs, and empirical network delay parameters with physics-informed models of data center operations, enabling rigorous and reproducible research in sustainable computing. It presents a challenging scheduling problem where a top-level coordinating agent must dynamically reassign or defer tasks that arrive with resource and service-level agreement requirements across a configurable cluster of data centers to optimize multiple objectives. The environment also models advanced components such as heat recovery. A modular reward system enables an explicit study of trade-offs among carbon emissions, energy costs, service level agreements, and water use. It provides a Gymnasium API with baseline controllers, including reinforcement learning and rule-based strategies, to support reproducible ML research and a fair comparison of diverse algorithms. By offering a realistic, configurable, and accessible testbed, DCcluster-Opt accelerates the development and validation of next-generation sustainable computing solutions for geo-distributed data centers.
comment: Submitted to the NeurIPS 2025 conference
LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers NeurIPS 2025
Liquid cooling is critical for thermal management in high-density data centers with the rising AI workloads. However, machine learning-based controllers are essential to unlock greater energy efficiency and reliability, promoting sustainability. We present LC-Opt, a Sustainable Liquid Cooling (LC) benchmark environment, for reinforcement learning (RL) control strategies in energy-efficient liquid cooling of high-performance computing (HPC) systems. Built on the baseline of a high-fidelity digital twin of Oak Ridge National Lab's Frontier Supercomputer cooling system, LC-Opt provides detailed Modelica-based end-to-end models spanning site-level cooling towers to data center cabinets and server blade groups. RL agents optimize critical thermal controls like liquid supply temperature, flow rate, and granular valve actuation at the IT cabinet level, as well as cooling tower (CT) setpoints through a Gymnasium interface, with dynamic changes in workloads. This environment creates a multi-objective real-time optimization challenge balancing local thermal regulation and global energy efficiency, and also supports additional components like a heat recovery unit (HRU). We benchmark centralized and decentralized multi-agent RL approaches, demonstrate policy distillation into decision and regression trees for interpretable control, and explore LLM-based methods that explain control actions in natural language through an agentic mesh architecture designed to foster user trust and simplify system management. LC-Opt democratizes access to detailed, customizable liquid cooling models, enabling the ML community, operators, and vendors to develop sustainable data center liquid cooling control solutions.
comment: Submitted to the NeurIPS 2025 conference
Competitive Equilibrium for Electricity Markets with Spatially Flexible Loads
Electric vehicle charging and geo-distributed datacenters introduce spatially flexible loads (FLs) that couple power, transportation, and datacenter networks. These couplings create a closed-loop feedback between locational marginal prices (LMPs) and decisions of the FL systems, challenging the foundations of conventional competitive equilibrium (CE) in electricity markets. This paper studies a notion of generalized competitive equilibrium (GCE) that aims to capture such price-demand interactions across the interconnected infrastructures. We establish structural conditions under which the GCE preserves key properties of the conventional CE, including existence, uniqueness, and efficiency, without requiring detailed knowledge of decision processes for individual FL systems. The framework generalizes to settings where the grid is coupled with multiple FL systems. Stylized examples and case studies on the New York ISO grid, coupled with the Sioux Falls transportation and distributed datacenter networks, demonstrate the use of our theoretical framework and illustrate the mutual influence among the grid and the studied FL systems.
Non-Convex Over-the-Air Heterogeneous Federated Learning: A Bias-Variance Trade-off
Over-the-air (OTA) federated learning (FL) has been well recognized as a scalable paradigm that exploits the waveform superposition of the wireless multiple-access channel to aggregate model updates in a single use. Existing OTA-FL designs largely enforce zero-bias model updates by either assuming \emph{homogeneous} wireless conditions (equal path loss across devices) or forcing zero-bias updates to guarantee convergence. Under \emph{heterogeneous} wireless scenarios, however, such designs are constrained by the weakest device and inflate the update variance. Moreover, prior analyses of biased OTA-FL largely address convex objectives, while most modern AI models are highly non-convex. Motivated by these gaps, we study OTA-FL with stochastic gradient descent (SGD) for general smooth non-convex objectives under wireless heterogeneity. We develop novel OTA-FL SGD updates that allow a structured, time-invariant model bias while facilitating reduced variance updates. We derive a finite-time stationarity bound (expected time average squared gradient norm) that explicitly reveals a bias-variance trade-off. To optimize this trade-off, we pose a non-convex joint OTA power-control design and develop an efficient successive convex approximation (SCA) algorithm that requires only statistical CSI at the base station. Experiments on a non-convex image classification task validate the approach: the SCA-based design accelerates convergence via an optimized bias and improves generalization over prior OTA-FL baselines.
Data-Driven Stochastic Optimal Control in Reproducing Kernel Hilbert Spaces
This paper proposes a fully data-driven approach for optimal control of nonlinear control-affine systems represented by a stochastic diffusion. The focus is on the scenario where both the nonlinear dynamics and stage cost functions are unknown, while only a control penalty function and constraints are provided. To this end, we embed state probability densities into a reproducing kernel Hilbert space (RKHS) to leverage recent advances in operator regression, thereby identifying Markov transition operators associated with controlled diffusion processes. This operator learning approach integrates naturally with convex operator-theoretic Hamilton-Jacobi-Bellman recursions that scale linearly with state dimensionality, effectively solving a wide range of nonlinear optimal control problems. Numerical results demonstrate its ability to address diverse nonlinear control tasks, including the depth regulation of an autonomous underwater vehicle.
comment: author-submitted electronic preprint version: 19 pages, 5 figures, 3 tables
Observability for Nonlinear Systems: Connecting Variational Dynamics, Lyapunov Exponents, and Empirical Gramians
Observability quantification is a key problem in dynamic network sciences. While it has been thoroughly studied for linear systems, observability quantification for nonlinear networks is less intuitive and more cumbersome. One common approach to quantify observability for nonlinear systems is via the Empirical Gramian (Empr-Gram) -- a generalized form of the Gramian of linear systems. In this paper, we produce three new results. First, we establish that a variational form of discrete-time autonomous nonlinear systems yields a so-called Variational Gramian (Var-Gram) that is equivalent to the classic Empr-Gram; the former being easier to compute than the latter. Via Lyapunov exponents derived from Lyapunov's direct method, the paper's second result derives connections between existing observability measures and Var-Gram. The third result demonstrates the applicability of these new notions for sensor selection/placement in nonlinear systems. Numerical case studies demonstrate these three developments and their merits.
Partitioning and Observability in Linear Systems via Submodular Optimization
Network partitioning has gained recent attention as a pathway to enable decentralized operation and control in large-scale systems. This paper addresses the interplay between partitioning, observability, and sensor placement (SP) in dynamic networks. The problem, being computationally intractable at scale, is a largely unexplored, open problem in the literature. To that end, the paper's objective is designing scalable partitioning of linear systems while maximizing observability metrics of the subsystems. We show that the partitioning problem can be posed as a submodular maximization problem -- and the SP problem can subsequently be solved over the partitioned network. Consequently, theoretical bounds are derived to compare observability metrics of the original network with those of the resulting partitions, highlighting the impact of partitioning on system observability. Case studies on networks of varying sizes corroborate the derived theoretical bounds.
Analysis and Synthesis of Switched Optimization Algorithms
Deployment of optimization algorithms on networked systems faces challenges associated with time delays and corruptions. One particular instance is the presence of time-varying delays arising from factors such as packet drops and irregular sampling. Fixed time delays can destabilize gradient descent algorithms, and this degradation is exacerbated by time-varying delays. This work concentrates on the analysis and creation of discrete-time optimization algorithms with certified exponential convergence rates that are robust against switched uncertainties between the optimizer and the gradient oracle. These optimization algorithms are implemented by switch-scheduled output feedback controllers. Rate variation and sawtooth behavior (packet drops) in time-varying delays can be imposed through constraining switching sequences. Analysis is accomplished by bisection in the convergence rate to find Zames-Falb filter coefficients. Synthesis is performed by alternating between a filter coefficient search for a fixed controller, and a controller search for fixed multipliers.
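The claim that a fixed delay can destabilize gradient descent can be illustrated with a minimal sketch. It assumes a toy objective f(x) = 0.5 x^2 and a hypothetical helper `delayed_gradient_descent`; neither comes from the paper, and the sketch does not reproduce the paper's switched controllers or Zames-Falb analysis.

```python
def delayed_gradient_descent(step, delay, iters=60, x0=1.0):
    """Iterate x[k+1] = x[k] - step * f'(x[k - delay]) for f(x) = 0.5 * x**2."""
    # Pre-fill the history so the first delayed lookups are well defined.
    history = [x0] * (delay + 1)
    for _ in range(iters):
        stale_point = history[-1 - delay]  # iterate seen by the gradient oracle
        gradient = stale_point             # f'(x) = x for the toy objective
        history.append(history[-1] - step * gradient)
    return history


# Same step size, with and without a one-step delay in the gradient.
no_delay = delayed_gradient_descent(step=1.5, delay=0)
one_step = delayed_gradient_descent(step=1.5, delay=1)

print("final |x| without delay:", abs(no_delay[-1]))
print("largest recent |x| with delay:", max(abs(v) for v in one_step[-5:]))
```

With step size 1.5 the undelayed iteration contracts by a factor of 0.5 per step, whereas the one-step-delayed recursion has characteristic roots of modulus sqrt(1.5) and therefore grows.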
Downlink Performance of Cell-Free Massive MIMO for LEO Satellite Mega-Constellation
Low-earth orbit (LEO) satellite communication (SatCom) has emerged as a promising technology to improve wireless connectivity in global areas. Cell-free massive multiple-input multiple-output (CF-mMIMO), an architecture proposed for next-generation networks, has yet to be fully explored for LEO satellites. In this paper, we investigate the downlink performance of a CF-mMIMO LEO SatCom network, where multiple satellite access points (SAPs) simultaneously serve the corresponding ground user terminals (UTs). Using tools from stochastic geometry, we model the locations of SAPs and UTs on surfaces of concentric spheres using Poisson point processes (PPPs) and present expressions for the transmitted and received signals and the signal-to-interference-plus-noise ratio (SINR). Then, we derive the coverage probabilities in fading scenarios, considering significant system parameters such as the Nakagami fading parameter, the number of UTs, the number of SAPs, the orbital altitude, and the service range affected by the dome angle. Finally, the analytical model is verified by extensive Monte Carlo simulations. Simulation results indicate that stronger line-of-sight (LoS) effects and a more comprehensive service range of the UT result in a higher coverage probability, despite the presence of multi-user interference (MUI). Moreover, we find that there exist optimal numbers of UTs that maximize system capacity for different orbital altitudes and dome angles, providing valuable insights for system design.
Prescribed-Time Convergent Distributed Multiobjective Optimization With Dynamic Event-Triggered Communication
This paper addresses distributed constrained multiobjective resource allocation problems (DCMRAPs) in multi-agent networks, where agents face multiple conflicting local objectives under local and global constraints. By reformulating DCMRAPs as single-objective weighted $L_p$ problems, the proposed approach enables distributed solutions without relying on predefined weighting coefficients or centralized decision-making. Leveraging prescribed-time control and dynamic event-triggered mechanisms (ETMs), a novel distributed algorithm is proposed that converges within a prescribed time through sampled communication. Using generalized time-based generators (TBGs), the algorithm provides more flexibility in optimizing solution accuracy and trajectory smoothness without the constraints of initial conditions. Novel dynamic ETMs, integrated with generalized TBGs, improve communication efficiency by adapting to local error metrics and network-based disagreements, while providing enhanced flexibility in balancing solution accuracy and communication frequency. The Zeno behavior is excluded. Validated by Lyapunov analysis and simulation experiments, our method demonstrates superior control performance and efficiency compared to existing methods, advancing distributed optimization across diverse applications.
comment: This work has been accepted and published in IEEE Transactions on Systems, Man, and Cybernetics: Systems
Kernel Mean Embedding Topology: Weak and Strong Forms for Stochastic Kernels and Implications for Model Learning
We introduce a novel topology, called Kernel Mean Embedding Topology, for stochastic kernels, in a weak and strong form. This topology, defined on the spaces of Bochner integrable functions from a signal space to a space of probability measures endowed with a Hilbert space structure, allows for a versatile formulation. This construction allows one to obtain both a strong and weak formulation. (i) For its weak formulation, we highlight the utility on relaxed policy spaces, and investigate connections with the Young narrow topology and Borkar (or \( w^* \))-topology, and establish equivalence properties. We report that, while both the \( w^* \)-topology and kernel mean embedding topology are relatively compact, they are not closed. Conversely, while the Young narrow topology is closed, it lacks relative compactness. (ii) We show that the strong form provides an appropriate formulation for placing topologies on spaces of models characterized by stochastic kernels with explicit robustness and learning theoretic implications on optimal stochastic control under discounted or average cost criteria. (iii) We thus show that this topology possesses several properties making it ideal to study optimality and approximations (under the weak formulation) and robustness (under the strong formulation) for many applications.
comment: 37 pages
Revealing Chaotic Dependence and Degree-Structure Mechanisms in Optimal Pinning Control of Complex Networks
Identifying an optimal set of driver nodes to achieve synchronization via pinning control is a fundamental challenge in complex network science, limited by computational intractability and the lack of general theory. Here, leveraging a degree-based mean-field (annealed) approximation from statistical physics, we analytically reveal how the structural degree distribution systematically governs synchronization performance, and derive an analytic characterization of the globally optimal pinning set and constructive algorithms with linear complexity (dominated by degree sorting, O(N+M)). The optimal configuration exhibits a chaotic dependence--a discontinuous sensitivity--on its cardinality, whereby adding a single node can trigger abrupt changes in node composition and control effectiveness. This structural transition fundamentally challenges traditional heuristics that assume monotonic performance gains with budget. Systematic experiments on synthetic and empirical networks confirm that the proposed approach consistently outperforms degree-, betweenness-, and other centrality-based baselines. Furthermore, we quantify how key degree-distribution features--low-degree saturation, high-degree cutoff, and the power-law exponent--govern achievable synchronizability and shape the form of optimal sets. These results offer a systematic understanding of how degree heterogeneity shapes the network controllability. Our work establishes a unified link between degree heterogeneity and spectral controllability, offering both mechanistic insights and practical design rules for optimal driver-node selection in diverse complex systems.
comment: 16 pages, 6 figures; primary: eess.SY; cross-lists: cs.SY, math.OC. Submitted to IEEE TAC
Transformers as Implicit State Estimators: In-Context Learning in Dynamical Systems
Predicting the behavior of a dynamical system from noisy observations of its past outputs is a classical problem encountered across engineering and science. For linear systems with Gaussian inputs, the Kalman filter -- the best linear minimum mean-square error estimator of the state trajectory -- is optimal in the Bayesian sense. For nonlinear systems, Bayesian filtering is typically approached using suboptimal heuristics such as the Extended Kalman Filter (EKF), or numerical methods such as particle filtering (PF). In this work, we show that transformers, employed in an in-context learning (ICL) setting, can implicitly infer hidden states in order to predict the outputs of a wide family of dynamical systems, without test-time gradient updates or explicit knowledge of the system model. Specifically, when provided with a short context of past input-output pairs and, optionally, system parameters, a frozen transformer accurately predicts the current output. In linear-Gaussian regimes, its predictions closely match those of the Kalman filter; in nonlinear regimes, its performance approaches that of EKF and PF. Moreover, prediction accuracy degrades gracefully when key parameters, such as the state-transition matrix, are withheld from the context, demonstrating robustness and implicit parameter inference. These findings suggest that transformer in-context learning provides a flexible, non-parametric alternative for output prediction in dynamical systems, grounded in implicit latent-state estimation.
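For readers unfamiliar with the Kalman filter baseline mentioned above, the following is a minimal scalar sketch. The system x[k+1] = a x[k] + w, y[k] = x[k] + v and the values a = 0.95, q = 0.1, r = 1.0 are illustrative assumptions, and `scalar_kalman_filter` is a hypothetical helper; none of this is taken from the paper's experiments.

```python
import random


def scalar_kalman_filter(measurements, a=0.95, q=0.1, r=1.0, x0=0.0, p0=1.0):
    """Kalman filter for x[k+1] = a*x[k] + w, y[k] = x[k] + v, Var(w)=q, Var(v)=r."""
    estimate, variance = x0, p0
    estimates, variances = [], []
    for y in measurements:
        # Predict: propagate the estimate and its variance through the dynamics.
        predicted = a * estimate
        predicted_var = a * a * variance + q
        # Update: blend prediction and measurement using the Kalman gain.
        gain = predicted_var / (predicted_var + r)
        estimate = predicted + gain * (y - predicted)
        variance = (1.0 - gain) * predicted_var
        estimates.append(estimate)
        variances.append(variance)
    return estimates, variances


# Simulate the assumed system and filter its noisy outputs.
rng = random.Random(0)
state, measurements = 0.0, []
for _ in range(200):
    state = 0.95 * state + rng.gauss(0.0, 0.1 ** 0.5)
    measurements.append(state + rng.gauss(0.0, 1.0))

estimates, variances = scalar_kalman_filter(measurements)
print("steady-state error variance:", variances[-1])
```

The posterior variance settles at a value below the measurement noise variance r, which is the sense in which the filter improves on the raw observations.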
Action Chunking and Exploratory Data Collection Yield Exponential Improvements in Behavior Cloning for Continuous Control
This paper presents a theoretical analysis of two of the most impactful interventions in modern learning from demonstration in robotics and continuous control: the practice of action-chunking (predicting sequences of actions in open-loop) and exploratory augmentation of expert demonstrations. Though recent results show that learning from demonstration, also known as imitation learning (IL), can suffer errors that compound exponentially with task horizon in continuous settings, we demonstrate that action chunking and exploratory data collection circumvent exponential compounding errors in different regimes. Our results identify control-theoretic stability as the key mechanism underlying the benefits of these interventions. On the empirical side, we validate our predictions and the role of control-theoretic stability through experimentation on popular robot learning benchmarks. On the theoretical side, we demonstrate that the control-theoretic lens provides fine-grained insights into how compounding error arises, leading to tighter statistical guarantees on imitation learning error when these interventions are applied than previous techniques based on information-theoretic considerations alone.
comment: Updated manuscript. Added new experiments, figures, and exposition
Privacy Preservation by Local Design in Cooperative Networked Control Systems
In this paper, we study the privacy preservation problem in a cooperative networked control system, which has closed-loop dynamics, working for the task of linear quadratic Gaussian (LQG) control. The system consists of a user and a server: the user owns the plant to control, while the server provides computation capability, and the user employs the server to compute control inputs for it. To enable the server's computation, the user needs to provide the measurements of the plant states to the server, who then calculates estimates of the states, based on which the control inputs are computed. However, the user regards the states as private, and makes an interesting request: the user wants the server to have "incorrect" knowledge of the state estimates rather than the true values. To that end, we propose a novel design methodology for privacy preservation, in which the privacy scheme is deployed locally at the user side, hidden from the server, and creates a deviation in the server's knowledge of the state estimates from the true values. However, this methodology also raises significant challenges: in a closed-loop dynamic system, when the server's knowledge is incorrect, the system's behavior becomes complex to analyze; even the stability of the system becomes questionable, as the incorrectness will accumulate through the closed loop as time evolves. In this paper, we show through rigorous mathematical proofs that the performance loss in LQG control caused by the proposed privacy scheme is bounded, which supports the viability of the proposed design methodology. We also propose an associated novel privacy metric and obtain an analytical result for evaluating the privacy performance. Finally, we study the performance trade-off between privacy and control, where the resulting optimization problems are solved efficiently by numerical methods.
comment: 14 pages, 7 figures
Optimal and Heuristic Approaches for Platooning Systems with Deadlines
Efficient truck platooning is a key strategy for reducing freight costs, lowering fuel consumption, and mitigating emissions. Deadlines are critical in this context, as trucks must depart within specific time windows to meet delivery requirements and avoid penalties. In this paper, we investigate the optimal formation and dispatch of truck platoons at a highway station with finite capacity \(L\) and deadline constraints \(T\). The system operates in discrete time, with each arriving truck assigned a deadline of \(T\) slot units. The objective is to leverage the efficiency gains from forming large platoons while accounting for waiting costs and deadline violations. We formulate the problem as a Markov decision process and analyze the structure of the optimal policy \(\pi^\star\) for \(L = 3\), extending insights to arbitrary \(L\). We prove certain monotonicity properties of the optimal policy in the state space \(\mathcal{S}\) and identify classes of unreachable states. Moreover, since the size of \(\mathcal{S}\) grows exponentially with \(L\) and \(T\), we propose heuristics--including conditional and deep-learning based approaches--that exploit these structural insights while maintaining low computational complexity.
Systems and Control (EESS)
Technical Report for Dissipativity Learning in Reproducing Kernel Hilbert Space
This work presents a nonparametric framework for dissipativity learning in reproducing kernel Hilbert spaces, which enables data-driven certification of stability and performance properties for unknown nonlinear systems without requiring an explicit dynamic model. Dissipativity is a fundamental system property that generalizes Lyapunov stability, passivity, and finite L2 gain conditions through an energy balance inequality between a storage function and a supply rate. Unlike prior parametric formulations that approximate these functions using quadratic forms with fixed matrices, the proposed method represents them as Hilbert Schmidt operators acting on canonical kernel features, thereby capturing nonlinearities implicitly while preserving convexity and analytic tractability. The resulting operator optimization problem is formulated in the form of a one-class support vector machine and reduced, via the representer theorem, to a finite dimensional convex program expressed through kernel Gram matrices. Furthermore, statistical learning theory is applied to establish generalization guarantees, including confidence bounds on the dissipation rate and the L2 gain. Numerical results demonstrate that the proposed RKHS based dissipativity learning method effectively identifies nonlinear dissipative behavior directly from input output data, providing a powerful and interpretable framework for model free control analysis and synthesis.
comment: 26 pages, 3 figures
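The energy balance inequality referred to above can be written in its standard textbook form. The symbols $V$ (storage function), $s$ (supply rate), and $\gamma$ (gain bound) are notation assumed here for illustration and may differ from the report's own.

```latex
% Dissipativity: stored energy grows no faster than the energy supplied.
V\bigl(x(t_1)\bigr) - V\bigl(x(t_0)\bigr)
  \;\le\; \int_{t_0}^{t_1} s\bigl(u(t), y(t)\bigr)\,\mathrm{d}t ,
  \qquad \text{for all } t_1 \ge t_0 .

% Two common special cases of the supply rate:
s(u, y) = u^{\top} y
  \quad \text{(passivity)}, \qquad
s(u, y) = \gamma^{2}\,\lVert u \rVert^{2} - \lVert y \rVert^{2}
  \quad \text{(finite } L_2 \text{ gain at most } \gamma\text{)} .
```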
SpecAttn: Speculating Sparse Attention NeurIPS 2025
Large Language Models (LLMs) face significant computational bottlenecks during inference due to the quadratic complexity of self-attention mechanisms, particularly as context lengths increase. We introduce SpecAttn, a novel training-free approach that seamlessly integrates with existing speculative decoding techniques to enable efficient sparse attention in pre-trained transformers. Our key insight is to exploit the attention weights already computed by the draft model during speculative decoding to identify important tokens for the target model, eliminating redundant computation while maintaining output quality. SpecAttn employs three core techniques: KL divergence-based layer alignment between draft and target models, a GPU-optimized sorting-free algorithm for top-p token selection from draft attention patterns, and dynamic key-value cache pruning guided by these predictions. By leveraging the computational work already performed in standard speculative decoding pipelines, SpecAttn achieves over 75% reduction in key-value cache accesses with a mere 15.29% increase in perplexity on the PG-19 dataset, significantly outperforming existing sparse attention methods. Our approach demonstrates that speculative execution can be enhanced to provide approximate verification without significant performance degradation.
comment: Accepted to NeurIPS 2025 Workshop on Structured Probabilistic Inference & Generative Modeling
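Top-p token selection from attention weights can be sketched as follows. This version sorts the weights for clarity, so it is not the sorting-free GPU algorithm the abstract describes; the function name `top_p_tokens` and the example weights are hypothetical.

```python
def top_p_tokens(attention_weights, p=0.9):
    """Return indices of the smallest set of tokens whose normalized weights sum to >= p."""
    total = sum(attention_weights)
    # Rank token indices from highest to lowest attention weight.
    ranked = sorted(range(len(attention_weights)),
                    key=lambda i: attention_weights[i], reverse=True)
    kept, mass = [], 0.0
    for idx in ranked:
        kept.append(idx)
        mass += attention_weights[idx] / total
        if mass >= p:
            break
    return sorted(kept)  # restore positional order for cache lookups


# Illustrative draft-model attention over five cached tokens.
draft_attention = [0.05, 0.5, 0.1, 0.3, 0.05]
selected = top_p_tokens(draft_attention, p=0.85)
print(selected)
```

Only the key-value entries at the selected positions would then need to be read, which is the kind of saving the cache-access reduction refers to.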
Risk-constrained stochastic scheduling of multi-market energy storage systems
Energy storage can promote the integration of renewables by operating with charge and discharge policies that balance an intermittent power supply. This study investigates the scheduling of energy storage assets under energy price uncertainty, with a focus on electricity markets. A two-stage stochastic risk-constrained approach is employed, whereby electricity price trajectories or specific power markets are observed, allowing for recourse in the schedule. Conditional value-at-risk is used to quantify tail risk in the optimization problems; this allows for the explicit specification of a probabilistic risk limit. The proposed approach is tested in an integrated hydrogen system (IHS) and a battery energy storage system (BESS). In the joint design and operation context for the IHS, the risk constraint results in larger installed unit capacities, increasing capital cost but enabling more energy inventory to buffer price uncertainty. As shown in both case studies, there is an operational trade-off between risk and expected reward; this is reflected in higher expected costs (or lower expected profits) with increasing levels of risk aversion. Despite the decrease in expected reward, both systems exhibit substantial benefits of increasing risk aversion. This work provides a general method to address uncertainties in energy storage scheduling, allowing operators to input their level of risk tolerance on asset decisions.
comment: 39 pages, 10 figures, 7 tables
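Conditional value-at-risk over a finite scenario set can be illustrated with a short sketch. It assumes equally likely cost scenarios and a simple discrete tail average; `conditional_value_at_risk` and the sample costs are hypothetical and do not reproduce the paper's optimization formulation.

```python
import math


def conditional_value_at_risk(costs, alpha=0.9):
    """Empirical CVaR: mean cost over the worst (1 - alpha) share of equally likely scenarios."""
    ordered = sorted(costs, reverse=True)  # highest cost first
    # Round before ceil so float noise such as 3.0000000000000004 does not add a scenario.
    tail_count = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    worst = ordered[:tail_count]
    return sum(worst) / len(worst)


# Ten illustrative scenario costs for one storage schedule.
scenario_costs = [10, 12, 8, 30, 11, 9, 50, 10, 13, 7]
expected_cost = sum(scenario_costs) / len(scenario_costs)
cvar_80 = conditional_value_at_risk(scenario_costs, alpha=0.8)
print(expected_cost, cvar_80)
```

A risk constraint of the form CVaR <= limit bounds this tail average, which is why tightening it tends to raise expected cost while lowering exposure to the worst scenarios.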
Context-Aware Stochastic Modeling of Consumer Energy Resource Aggregators in Electricity Markets SC
Aggregators of consumer energy resources (CERs) like rooftop solar and battery energy storage (BES) face challenges due to their inherent uncertainties. A sensible approach is to use stochastic optimization to handle such uncertainties, which can lead to infeasible problems or loss in revenues if not chosen appropriately. This paper presents three efficient two-stage stochastic optimization methods: risk-neutral, robust, and chance-constrained, to address the impact of CER uncertainties for aggregators who participate in energy and regulation services markets in the Australian National Electricity Market. Furthermore, these methods utilize the flexibility of BES, considering precise state-of-charge dynamics and complementarity constraints, aiming for scalable performance while managing uncertainty. The problems are formed as two-stage stochastic mixed-integer linear programs, with relaxations adopted for large scenario sets. The solution approach employs scenario-based methodologies and affine recourse policies to obtain tractable reformulations. These methods are evaluated across use cases reflecting diverse operational and market settings, uncertainty characteristics, and decision-making preferences, demonstrating their ability to mitigate uncertainty, enhance profitability, and provide context-aware guidance for aggregators in choosing the most appropriate stochastic optimization method.
comment: Submitted to PSCC 2026
A Switching Strategy for Event-Trigger Control of Spacecraft Rendezvous
This paper presents the design of a state-feedback control law for spacecraft rendezvous, formulated using the Hill-Clohessy-Wiltshire equations. The proposed method introduces an impulsive control strategy to regulate thruster operations. Specifically, a state-dependent switching framework is developed to determine both the control input magnitudes and the precise state conditions that trigger thruster activation. The nonlinear control law is derived using principles from automatic control theory, particularly Lyapunov stability analysis and the Linear Matrix Inequality framework. The resulting closed-loop system is proven to be stable, while simultaneously minimizing the total number of actuation events. The effectiveness of the proposed method is demonstrated through a numerical case study, which includes a comparative analysis with a standard Model Predictive Control scheme, highlighting the advantages and trade-offs of the developed control structure.
comment: Submitted for EuroGNC 2026
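For reference, the Hill-Clohessy-Wiltshire equations named in the abstract above take the following standard form for a chaser near a target on a circular orbit. The symbols are the usual textbook convention and are assumed here rather than taken from the paper: $x$ is the radial, $y$ the along-track, and $z$ the cross-track relative position, $n$ is the target's mean motion, and $u_x, u_y, u_z$ are thrust accelerations.

```latex
\begin{aligned}
\ddot{x} - 2n\dot{y} - 3n^{2}x &= u_x,\\
\ddot{y} + 2n\dot{x} &= u_y,\\
\ddot{z} + n^{2}z &= u_z.
\end{aligned}
```

The in-plane dynamics ($x$, $y$) are coupled, while the cross-track motion ($z$) is an independent harmonic oscillator.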
Simplifying Preference Elicitation in Local Energy Markets: Combinatorial Clock Exchange
As distributed energy resources (DERs) proliferate, future power systems will need new market platforms enabling prosumers to trade various electricity and grid-support products. However, prosumers often exhibit complex, product-interdependent preferences and face limited cognitive and computational resources, hindering engagement with complex market structures and bid formats. We address this challenge by introducing a multi-product market that allows prosumers to express complex preferences through an intuitive format, by fusing combinatorial clock exchange and machine learning (ML) techniques. The iterative mechanism only requires prosumers to report their preferred package of products at posted prices, eliminating the need for forecasting product prices or adhering to complex bid formats, while the ML-aided price discovery speeds up convergence. The linear pricing rule further enhances transparency and interpretability. Finally, numerical simulations demonstrate convergence to clearing prices in approximately 15 clock iterations.
Value of Multi-pursuer Single-evader Pursuit-evasion Game with Terminal Cost of Evader's Position: Relaxation of Convexity Condition
In this study, we consider a multi-pursuer single-evader quantitative pursuit-evasion game with a payoff function that includes only the terminal cost. The terminal cost is a function related only to the terminal position of the evader. This problem has been extensively studied in target defense games. Here, we prove that a candidate for the value function generated by a geometric method is the viscosity solution of the corresponding Hamilton-Jacobi-Isaacs partial differential equation (HJI PDE) Dirichlet problem. Therefore, the value function of the game at each point can be computed by a mathematical program. In our work, the convexity of the terminal cost or the target is not required. The terminal cost only needs to be locally Lipschitz continuous. The cases in which the terminal costs or the targets are not convex are covered. Therefore, our result is more universal than those of previous studies, and the complexity of the proof is reduced. We also discuss the optimal strategies in this game and present an intuitive explanation of this value function.
comment: 21 pages, 6 figures
Solving Infinite-Horizon Optimal Control Problems using the Extreme Theory of Functional Connections
This paper presents a physics-informed machine learning approach for synthesizing optimal feedback control policies for infinite-horizon optimal control problems by solving the Hamilton-Jacobi-Bellman (HJB) partial differential equation (PDE). The optimal control policy is derived analytically for affine dynamical systems with separable and strictly convex control costs, expressed as a function of the gradient of the value function. The resulting HJB-PDE is then solved by approximating the value function using the Extreme Theory of Functional Connections (X-TFC) - a hybrid approach that combines the Theory of Functional Connections (TFC) with the Extreme Learning Machine (ELM) algorithm. This approach ensures analytical satisfaction of boundary conditions and significantly reduces training cost compared to traditional Physics-Informed Neural Networks (PINNs). We benchmark the method on linear and non-linear systems with known analytical solutions and demonstrate its effectiveness on control tasks such as spacecraft optimal de-tumbling control.
comment: Accepted to Indian Control Conference (ICC-11), 6 pages, 12 figures
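To make concrete the statement above that the optimal policy is a function of the value-function gradient, the following worked example treats the quadratic control cost, which is one special case of a separable, strictly convex cost. The undiscounted cost functional, the symbols $f$, $g$, $q$, $R$, and the choice of a quadratic penalty are assumptions made for illustration and may differ from the paper's formulation.

```latex
\begin{aligned}
&\text{Dynamics and cost (assumed): } \dot{x} = f(x) + g(x)\,u, \qquad
J = \int_{0}^{\infty} \big( q(x) + u^{\top} R\, u \big)\, dt, \quad R \succ 0. \\[4pt]
&\text{HJB equation: } 0 = \min_{u} \Big[ q(x) + u^{\top} R\, u + \nabla V(x)^{\top} \big( f(x) + g(x)\,u \big) \Big]. \\[4pt]
&\text{Stationarity in } u:\; 2 R\, u + g(x)^{\top} \nabla V(x) = 0
\;\Longrightarrow\; u^{*}(x) = -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V(x). \\[4pt]
&\text{Substituting } u^{*}:\; 0 = q(x) + \nabla V(x)^{\top} f(x)
 - \tfrac{1}{4}\, \nabla V(x)^{\top} g(x)\, R^{-1} g(x)^{\top} \nabla V(x).
\end{aligned}
```

The last line is a PDE in $V$ alone, which is the kind of equation a value-function approximator would then be fitted to.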
Optimal BESS Sizing and Placement for Mitigating EV-Induced Voltage Violations: A Scalable Spatio-Temporal Adaptive Targeting Strategy
The escalating adoption of electric vehicles (EVs) and the growing demand for charging solutions are driving a surge in EV charger installations in distribution networks. However, this rising EV load strains the distribution grid, causing severe voltage drops, particularly at feeder extremities. This study proposes a proactive voltage management (PVM) framework that can integrate Monte Carlo-based simulations of varying EV charging loads to (i) identify potential voltage violations through a voltage violation analysis (VVA) model, and (ii) then mitigate those violations with optimally-invested battery energy storage systems (BESS) through an optimal expansion planning (OEP) model. A novel spatio-temporal adaptive targeting (STAT) strategy is proposed to alleviate the computational complexity of the OEP model by defining a targeted OEP (T-OEP) model, solved by applying the OEP model to (i) a reduced set of representative critical time periods and (ii) candidate BESS installation nodes. The efficacy and scalability of the proposed approach are validated on 33-bus, 69-bus, and a large-scale 240-bus system. Results demonstrate that the strategic sizing and placement of BESS not only effectively mitigate voltage violations but also yield substantial cost savings on electricity purchases under time-of-use tariffs. This research offers a cost-effective and scalable solution for integrating high penetrations of EVs, providing crucial insights for future distribution network planning.
Analyzing the Impact of Demand Response on Short-Circuit Current via a Unit Commitment Model
In low-carbon grids, system flexibility can be enhanced through mechanisms such as Demand Response (DR), enabling the efficient utilization of renewable energy. However, as Synchronous Generators (SGs) are being replaced with renewable energy characterized by Inverter-Based Resources (IBR), system stability is severely affected. Due to the limited overload capability of IBR, their Short-Circuit Current (SCC) contribution is much smaller than that of SGs, which may result in protection devices failing to trip during faults. Consequently, the remaining SGs play a key role in offering sufficient SCC volumes. Given that the commitment of SGs is closely related to system load, DR can thus indirectly affect their SCC provision, a relationship that has not been investigated. Therefore, this paper incorporates both DR and SCC constraints into a unit commitment model and conducts studies on an IEEE 30-bus system. The results show that although DR can reduce social costs by lowering power demand, it may also lead to inadequate SCC levels. Nevertheless, the cost increases by only 0.3% when DR is combined with SCC constraints, indicating that DR can actually help achieve a stable system in a cost-effective manner.
comment: 1-5 pages. submitted to PESGM 2026, Canada
Learning a Network Digital Twin as a Hybrid System
Network digital twin (NDT) models are virtual models that replicate the behavior of physical communication networks and are considered a key technology component to enable novel features and capabilities in future 6G networks. In this work, we focus on NDTs that model the communication quality properties of a multi-cell, dynamically changing wireless network over a workspace populated with multiple moving users. We propose an NDT modeled as a hybrid system, where each mode corresponds to a different base station and comprises sub-modes that correspond to areas of the workspace with similar network characteristics. The proposed hybrid NDT is identified and continuously improved through an annealing optimization-based learning algorithm, driven by online data measurements collected by the users. The advantages of the proposed hybrid NDT are studied with respect to memory and computational efficiency, data consumption, and the ability to timely adapt to network changes. Finally, we validate the proposed methodology on real experimental data collected from a two-cell 5G testbed.
Which Top Energy-Intensive Manufacturing Countries Can Compete in a Renewable Energy Future?
In a world increasingly powered by renewables and aiming for greenhouse gas-neutral industrial production, the future competitiveness of today's top manufacturing countries is questioned. This study applies detailed energy system modeling to quantify the Renewable Pull, an incentive for industry relocation exerted by countries with favorable renewable conditions. Results reveal that the Renewable Pull is not a cross-industrial phenomenon but strongly depends on the relationship between energy costs and transport costs. The intensity of the Renewable Pull varies, with China, India, and Japan facing a significantly stronger effect than Germany and the United States. Incorporating national capital cost assumptions proves critical, reducing Germany's Renewable Pull by a factor of six and positioning it as the second least affected top manufacturing country after Saudi Arabia. Using Germany as a case study, the analysis moreover illustrates that targeted import strategies, especially within the EU, can nearly eliminate the Renewable Pull, offering policymakers clear options for risk mitigation.
comment: 29 pages, 16 figures
Multivariable Gradient-Based Extremum Seeking Control with Saturation Constraints
This paper addresses the multivariable gradient-based extremum seeking control (ESC) subject to saturation. Two distinct saturation scenarios are investigated here: saturation acting on the input of the function to be optimized, which is addressed using an anti-windup compensation strategy, and saturation affecting the gradient estimate. In both cases, the unknown Hessian matrix is represented using a polytopic uncertainty description, and sufficient conditions in the form of linear matrix inequalities (LMIs) are derived to design a stabilizing control gain. The proposed conditions guarantee exponential stability of the origin for the average closed-loop system under saturation constraints. With the proposed design conditions, non-diagonal control gain matrices can be obtained, generalizing conventional ESC designs that typically rely on diagonal structures. Stability and convergence are rigorously proven using the Averaging Theory for dynamical systems with Lipschitz continuous right-hand sides. Numerical simulations illustrate the effectiveness of the proposed ESC algorithms, confirming the convergence even in the presence of saturation.
comment: 15 pages, 6 figures
Supply Chain Exploitation of Secure ROS 2 Systems: A Proof-of-Concept on Autonomous Platform Compromise via Keystore Exfiltration
This paper presents a proof-of-concept supply chain attack against the Secure ROS 2 (SROS 2) framework, demonstrated on a Quanser QCar2 autonomous vehicle platform. A Trojan-infected Debian package modifies core ROS 2 security commands to exfiltrate newly generated keystore credentials via DNS in base64-encoded chunks to an attacker-controlled nameserver. Possession of these credentials enables the attacker to rejoin the SROS 2 network as an authenticated participant and publish spoofed control or perception messages without triggering authentication failures. We evaluate this capability on a secure ROS 2 Humble testbed configured for a four-stop-sign navigation routine using an Intel RealSense camera for perception. Experimental results show that control-topic injections can cause forced braking, sustained high-speed acceleration, and continuous turning loops, while perception-topic spoofing can induce phantom stop signs or suppress real detections. The attack generalizes to any data distribution service (DDS)-based robotic system using SROS 2, highlighting the need for both supply chain integrity controls and runtime semantic validation to safeguard autonomous systems against insider and impersonation threats.
comment: Author-accepted version (preprint). Presented at IEEE MILCOM 2025 Workshops, WS07: 2nd Workshop on Security, Resilience, and Robustness of Systems and Software (SRRSS), Los Angeles, Oct 2025. 6 pages. Primary: cs.CR; cross-lists: cs.RO, cs.OS. Program: https://milcom2025.ieee-milcom.org/workshop/ws07-2nd-workshop-security-resilient-and-robustness-systems-and-software/program
DCcluster-Opt: Benchmarking Dynamic Multi-Objective Optimization for Geo-Distributed Data Center Workloads NeurIPS 2025
The increasing energy demands and carbon footprint of large-scale AI require intelligent workload management in globally distributed data centers. Yet progress is limited by the absence of benchmarks that realistically capture the interplay of time-varying environmental factors (grid carbon intensity, electricity prices, weather), detailed data center physics (CPUs, GPUs, memory, HVAC energy), and geo-distributed network dynamics (latency and transmission costs). To bridge this gap, we present DCcluster-Opt: an open-source, high-fidelity simulation benchmark for sustainable, geo-temporal task scheduling. DCcluster-Opt combines curated real-world datasets, including AI workload traces, grid carbon intensity, electricity markets, weather across 20 global regions, cloud transmission costs, and empirical network delay parameters with physics-informed models of data center operations, enabling rigorous and reproducible research in sustainable computing. It presents a challenging scheduling problem where a top-level coordinating agent must dynamically reassign or defer tasks that arrive with resource and service-level agreement requirements across a configurable cluster of data centers to optimize multiple objectives. The environment also models advanced components such as heat recovery. A modular reward system enables an explicit study of trade-offs among carbon emissions, energy costs, service level agreements, and water use. It provides a Gymnasium API with baseline controllers, including reinforcement learning and rule-based strategies, to support reproducible ML research and a fair comparison of diverse algorithms. By offering a realistic, configurable, and accessible testbed, DCcluster-Opt accelerates the development and validation of next-generation sustainable computing solutions for geo-distributed data centers.
comment: Submitted to the NeurIPS 2025 conference
LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers NeurIPS 2025
Liquid cooling is critical for thermal management in high-density data centers with the rising AI workloads. However, machine learning-based controllers are essential to unlock greater energy efficiency and reliability, promoting sustainability. We present LC-Opt, a Sustainable Liquid Cooling (LC) benchmark environment, for reinforcement learning (RL) control strategies in energy-efficient liquid cooling of high-performance computing (HPC) systems. Built on the baseline of a high-fidelity digital twin of Oak Ridge National Lab's Frontier Supercomputer cooling system, LC-Opt provides detailed Modelica-based end-to-end models spanning site-level cooling towers to data center cabinets and server blade groups. RL agents optimize critical thermal controls like liquid supply temperature, flow rate, and granular valve actuation at the IT cabinet level, as well as cooling tower (CT) setpoints through a Gymnasium interface, with dynamic changes in workloads. This environment creates a multi-objective real-time optimization challenge balancing local thermal regulation and global energy efficiency, and also supports additional components like a heat recovery unit (HRU). We benchmark centralized and decentralized multi-agent RL approaches, demonstrate policy distillation into decision and regression trees for interpretable control, and explore LLM-based methods that explain control actions in natural language through an agentic mesh architecture designed to foster user trust and simplify system management. LC-Opt democratizes access to detailed, customizable liquid cooling models, enabling the ML community, operators, and vendors to develop sustainable data center liquid cooling control solutions.
comment: Submitted to the NeurIPS 2025 conference
Competitive Equilibrium for Electricity Markets with Spatially Flexible Loads
Electric vehicle charging and geo-distributed datacenters introduce spatially flexible loads (FLs) that couple power, transportation, and datacenter networks. These couplings create a closed-loop feedback between locational marginal prices (LMPs) and decisions of the FL systems, challenging the foundations of conventional competitive equilibrium (CE) in electricity markets. This paper studies a notion of generalized competitive equilibrium (GCE) that aims to capture such price-demand interactions across the interconnected infrastructures. We establish structural conditions under which the GCE preserves key properties of the conventional CE, including existence, uniqueness, and efficiency, without requiring detailed knowledge of decision processes for individual FL systems. The framework generalizes to settings where the grid is coupled with multiple FL systems. Stylized examples and case studies on the New York ISO grid, coupled with the Sioux Falls transportation and distributed datacenter networks, demonstrate the use of our theoretical framework and illustrate the mutual influence among the grid and the studied FL systems.
Non-Convex Over-the-Air Heterogeneous Federated Learning: A Bias-Variance Trade-off
Over-the-air (OTA) federated learning (FL) has been well recognized as a scalable paradigm that exploits the waveform superposition of the wireless multiple-access channel to aggregate model updates in a single use. Existing OTA-FL designs largely enforce zero-bias model updates by either assuming \emph{homogeneous} wireless conditions (equal path loss across devices) or forcing zero-bias updates to guarantee convergence. Under \emph{heterogeneous} wireless scenarios, however, such designs are constrained by the weakest device and inflate the update variance. Moreover, prior analyses of biased OTA-FL largely address convex objectives, while most modern AI models are highly non-convex. Motivated by these gaps, we study OTA-FL with stochastic gradient descent (SGD) for general smooth non-convex objectives under wireless heterogeneity. We develop novel OTA-FL SGD updates that allow a structured, time-invariant model bias while facilitating reduced variance updates. We derive a finite-time stationarity bound (expected time average squared gradient norm) that explicitly reveals a bias-variance trade-off. To optimize this trade-off, we pose a non-convex joint OTA power-control design and develop an efficient successive convex approximation (SCA) algorithm that requires only statistical CSI at the base station. Experiments on a non-convex image classification task validate the approach: the SCA-based design accelerates convergence via an optimized bias and improves generalization over prior OTA-FL baselines.
Data-Driven Stochastic Optimal Control in Reproducing Kernel Hilbert Spaces
This paper proposes a fully data-driven approach for optimal control of nonlinear control-affine systems represented by a stochastic diffusion. The focus is on the scenario where both the nonlinear dynamics and stage cost functions are unknown, while only a control penalty function and constraints are provided. To this end, we embed state probability densities into a reproducing kernel Hilbert space (RKHS) to leverage recent advances in operator regression, thereby identifying Markov transition operators associated with controlled diffusion processes. This operator learning approach integrates naturally with convex operator-theoretic Hamilton-Jacobi-Bellman recursions that scale linearly with state dimensionality, effectively solving a wide range of nonlinear optimal control problems. Numerical results demonstrate its ability to address diverse nonlinear control tasks, including the depth regulation of an autonomous underwater vehicle.
comment: author-submitted electronic preprint version: 19 pages, 5 figures, 3 tables
Observability for Nonlinear Systems: Connecting Variational Dynamics, Lyapunov Exponents, and Empirical Gramians
Observability quantification is a key problem in dynamic network sciences. While it has been thoroughly studied for linear systems, observability quantification for nonlinear networks is less intuitive and more cumbersome. One common approach to quantify observability for nonlinear systems is via the Empirical Gramian (Empr-Gram) -- a generalized form of the Gramian of linear systems. In this paper, we produce three new results. First, we establish that a variational form of discrete-time autonomous nonlinear systems yields a so-called Variational Gramian (Var-Gram) that is equivalent to the classic Empr-Gram; the former being easier to compute than the latter. Via Lyapunov exponents derived from Lyapunov's direct method, the paper's second result derives connections between existing observability measures and Var-Gram. The third result demonstrates the applicability of these new notions for sensor selection/placement in nonlinear systems. Numerical case studies demonstrate these three developments and their merits.
Partitioning and Observability in Linear Systems via Submodular Optimization
Network partitioning has gained recent attention as a pathway to enable decentralized operation and control in large-scale systems. This paper addresses the interplay between partitioning, observability, and sensor placement (SP) in dynamic networks. The problem, being computationally intractable at scale, is a largely unexplored, open problem in the literature. To that end, the paper's objective is designing scalable partitioning of linear systems while maximizing observability metrics of the subsystems. We show that the partitioning problem can be posed as a submodular maximization problem -- and the SP problem can subsequently be solved over the partitioned network. Consequently, theoretical bounds are derived to compare observability metrics of the original network with those of the resulting partitions, highlighting the impact of partitioning on system observability. Case studies on networks of varying sizes corroborate the derived theoretical bounds.
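The abstract above casts partitioning as a submodular maximization problem. As a hedged sketch of how such problems are commonly attacked, the code below runs the classical greedy rule under a cardinality budget, which for monotone submodular objectives is known to achieve at least a (1 - 1/e) fraction of the optimum. The set-coverage objective, the sensor names, and the budget are hypothetical stand-ins and are not the observability metrics or the algorithm used in the paper.

```python
def greedy_max(candidates, objective, budget):
    """Greedily pick up to `budget` items, each time adding the largest marginal gain."""
    chosen = []
    for _ in range(budget):
        remaining = [c for c in candidates if c not in chosen]
        if not remaining:
            break
        base = objective(chosen)
        best = max(remaining, key=lambda c: objective(chosen + [c]) - base)
        if objective(chosen + [best]) - base <= 0:
            break  # no remaining item improves the objective
        chosen.append(best)
    return chosen


# Hypothetical example: each sensor "observes" a set of state indices.
coverage = {
    "s1": {1, 2, 3},
    "s2": {3, 4},
    "s3": {4, 5, 6, 7},
    "s4": {1, 7},
}


def covered(selection):
    """Number of distinct states observed by the selected sensors (monotone submodular)."""
    return len(set().union(*(coverage[s] for s in selection)))


picked = greedy_max(list(coverage), covered, budget=2)
print(picked, covered(picked))
```

Coverage is used here only because its diminishing-returns property is easy to see: adding a sensor to a larger selection can never add more new states than adding it to a smaller one.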
Analysis and Synthesis of Switched Optimization Algorithms
Deployment of optimization algorithms on networked systems faces challenges associated with time delays and corruptions. One particular instance is the presence of time-varying delays arising from factors such as packet drops and irregular sampling. Fixed time delays can destabilize gradient descent algorithms, and this degradation is exacerbated by time-varying delays. This work concentrates on the analysis and creation of discrete-time optimization algorithms with certified exponential convergence rates that are robust against switched uncertainties between the optimizer and the gradient oracle. These optimization algorithms are implemented by switch-scheduled output feedback controllers. Rate variation and sawtooth behavior (packet drops) in time-varying delays can be imposed through constraining switching sequences. Analysis is accomplished by bisection in the convergence rate to find Zames-Falb filter coefficients. Synthesis is performed by alternating between a filter coefficient search for a fixed controller, and a controller search for fixed multipliers.
Downlink Performance of Cell-Free Massive MIMO for LEO Satellite Mega-Constellation
Low-earth orbit (LEO) satellite communication (SatCom) has emerged as a promising technology to improve wireless connectivity in global areas. Cell-free massive multiple-input multiple-output (CF-mMIMO), an architecture proposed for next-generation networks, has yet to be fully explored for LEO satellites. In this paper, we investigate the downlink performance of a CF-mMIMO LEO SatCom network, where multiple satellite access points (SAPs) simultaneously serve the corresponding ground user terminals (UTs). Using tools from stochastic geometry, we model the locations of SAPs and UTs on surfaces of concentric spheres using Poisson point processes (PPPs) and present expressions on transmit and received signals, signal-to-interference-plus-noise ratio (SINR). Then, we derive the coverage probabilities in fading scenarios, considering significant system parameters such as the Nakagami fading parameter, the number of UTs, the number of SAPs, the orbital altitude, and the service range affected by the dome angle. Finally, the analytical model is verified by extensive Monte Carlo simulations. Simulation results indicate that stronger line-of-sight (LoS) effects and a more comprehensive service range of the UT result in a higher coverage probability, despite the presence of multi-user interference (MUI). Moreover, we found that there exist optimal numbers of UTs that maximize system capacity for different orbital altitudes and dome angles, providing valuable insights for system design.
Prescribed-Time Convergent Distributed Multiobjective Optimization With Dynamic Event-Triggered Communication
This paper addresses distributed constrained multiobjective resource allocation problems (DCMRAPs) in multi-agent networks, where agents face multiple conflicting local objectives under local and global constraints. By reformulating DCMRAPs as single-objective weighted $L_p$ problems, the proposed approach enables distributed solutions without relying on predefined weighting coefficients or centralized decision-making. Leveraging prescribed-time control and dynamic event-triggered mechanisms (ETMs), a novel distributed algorithm is proposed that converges within a prescribed time using sampled communication. Using generalized time-based generators (TBGs), the algorithm provides more flexibility in optimizing solution accuracy and trajectory smoothness without the constraints of initial conditions. Novel dynamic ETMs, integrated with generalized TBGs, improve communication efficiency by adapting to local error metrics and network-based disagreements, while providing enhanced flexibility in balancing solution accuracy and communication frequency. The Zeno behavior is excluded. Validated by Lyapunov analysis and simulation experiments, our method demonstrates superior control performance and efficiency compared to existing methods, advancing distributed optimization across diverse applications.
comment: This work has been accepted and published in IEEE Transactions on Systems, Man, and Cybernetics: Systems
Kernel Mean Embedding Topology: Weak and Strong Forms for Stochastic Kernels and Implications for Model Learning
We introduce a novel topology, called Kernel Mean Embedding Topology, for stochastic kernels, in a weak and strong form. This topology, defined on the spaces of Bochner integrable functions from a signal space to a space of probability measures endowed with a Hilbert space structure, allows for a versatile formulation. This construction allows one to obtain both a strong and weak formulation. (i) For its weak formulation, we highlight the utility on relaxed policy spaces, and investigate connections with the Young narrow topology and Borkar (or \( w^* \))-topology, and establish equivalence properties. We report that, while both the \( w^* \)-topology and kernel mean embedding topology are relatively compact, they are not closed. Conversely, while the Young narrow topology is closed, it lacks relative compactness. (ii) We show that the strong form provides an appropriate formulation for placing topologies on spaces of models characterized by stochastic kernels with explicit robustness and learning theoretic implications on optimal stochastic control under discounted or average cost criteria. (iii) We thus show that this topology possesses several properties making it ideal to study optimality and approximations (under the weak formulation) and robustness (under the strong formulation) for many applications.
comment: 37 pages
Revealing Chaotic Dependence and Degree-Structure Mechanisms in Optimal Pinning Control of Complex Networks
Identifying an optimal set of driver nodes to achieve synchronization via pinning control is a fundamental challenge in complex network science, limited by computational intractability and the lack of general theory. Here, leveraging a degree-based mean-field (annealed) approximation from statistical physics, we analytically reveal how the structural degree distribution systematically governs synchronization performance, and derive an analytic characterization of the globally optimal pinning set and constructive algorithms with linear complexity (dominated by degree sorting, O(N+M)). The optimal configuration exhibits a chaotic dependence--a discontinuous sensitivity--on its cardinality, whereby adding a single node can trigger abrupt changes in node composition and control effectiveness. This structural transition fundamentally challenges traditional heuristics that assume monotonic performance gains with budget. Systematic experiments on synthetic and empirical networks confirm that the proposed approach consistently outperforms degree-, betweenness-, and other centrality-based baselines. Furthermore, we quantify how key degree-distribution features--low-degree saturation, high-degree cutoff, and the power-law exponent--govern achievable synchronizability and shape the form of optimal sets. These results offer a systematic understanding of how degree heterogeneity shapes the network controllability. Our work establishes a unified link between degree heterogeneity and spectral controllability, offering both mechanistic insights and practical design rules for optimal driver-node selection in diverse complex systems.
comment: 16 pages, 6 figures; primary: eess.SY; cross-lists: cs.SY, math.OC. Submitted to IEEE TAC
Transformers as Implicit State Estimators: In-Context Learning in Dynamical Systems
Predicting the behavior of a dynamical system from noisy observations of its past outputs is a classical problem encountered across engineering and science. For linear systems with Gaussian inputs, the Kalman filter -- the best linear minimum mean-square error estimator of the state trajectory -- is optimal in the Bayesian sense. For nonlinear systems, Bayesian filtering is typically approached using suboptimal heuristics such as the Extended Kalman Filter (EKF), or numerical methods such as particle filtering (PF). In this work, we show that transformers, employed in an in-context learning (ICL) setting, can implicitly infer hidden states in order to predict the outputs of a wide family of dynamical systems, without test-time gradient updates or explicit knowledge of the system model. Specifically, when provided with a short context of past input-output pairs and, optionally, system parameters, a frozen transformer accurately predicts the current output. In linear-Gaussian regimes, its predictions closely match those of the Kalman filter; in nonlinear regimes, its performance approaches that of EKF and PF. Moreover, prediction accuracy degrades gracefully when key parameters, such as the state-transition matrix, are withheld from the context, demonstrating robustness and implicit parameter inference. These findings suggest that transformer in-context learning provides a flexible, non-parametric alternative for output prediction in dynamical systems, grounded in implicit latent-state estimation.
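For readers who want the baseline in concrete form, the sketch below implements one predict/update cycle of the textbook Kalman filter for a scalar linear-Gaussian system, the estimator the abstract above uses as its reference in the linear regime. The model `x_next = a * x + w`, `y = c * x + v`, the parameter names, and the sample measurements are assumptions chosen for illustration and are not taken from the paper.

```python
def kalman_step(mean, var, measurement, a, c, q, r):
    """One predict/update cycle for x_next = a*x + w, y = c*x + v.

    q and r are the variances of the process noise w and measurement noise v.
    """
    # Predict: propagate the state estimate and its variance through the dynamics.
    pred_mean = a * mean
    pred_var = a * a * var + q
    # Update: blend the prediction with the new measurement via the Kalman gain.
    gain = pred_var * c / (c * c * pred_var + r)
    new_mean = pred_mean + gain * (measurement - c * pred_mean)
    new_var = (1.0 - gain * c) * pred_var
    return new_mean, new_var


# Hypothetical noisy readings of a roughly constant state.
measurements = [2.0, 1.6, 2.3, 1.9, 2.1]
mean, var = 0.0, 1.0
history = []
for y in measurements:
    mean, var = kalman_step(mean, var, y, a=1.0, c=1.0, q=0.0, r=1.0)
    history.append((mean, var))

print(history[-1])
```

With `q = 0` the posterior variance shrinks at every step, which is the behavior one would expect when repeatedly measuring a constant quantity.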
Action Chunking and Exploratory Data Collection Yield Exponential Improvements in Behavior Cloning for Continuous Control
This paper presents a theoretical analysis of two of the most impactful interventions in modern learning from demonstration in robotics and continuous control: the practice of action-chunking (predicting sequences of actions in open-loop) and exploratory augmentation of expert demonstrations. Though recent results show that learning from demonstration, also known as imitation learning (IL), can suffer errors that compound exponentially with task horizon in continuous settings, we demonstrate that action chunking and exploratory data collection circumvent exponential compounding errors in different regimes. Our results identify control-theoretic stability as the key mechanism underlying the benefits of these interventions. On the empirical side, we validate our predictions and the role of control-theoretic stability through experimentation on popular robot learning benchmarks. On the theoretical side, we demonstrate that the control-theoretic lens provides fine-grained insights into how compounding error arises, leading to tighter statistical guarantees on imitation learning error when these interventions are applied than previous techniques based on information-theoretic considerations alone.
comment: Updated manuscript. Added new experiments, figures, and exposition
Privacy Preservation by Local Design in Cooperative Networked Control Systems
In this paper, we study the privacy preservation problem in a cooperative networked control system with closed-loop dynamics that performs linear quadratic Gaussian (LQG) control. The system consists of a user and a server: the user owns the plant to control, while the server provides computation capability, and the user employs the server to compute control inputs for it. To enable the server's computation, the user needs to provide the measurements of the plant states to the server, which then calculates estimates of the states, based on which the control inputs are computed. However, the user regards the states as private and makes an interesting request: the user wants the server to have "incorrect" knowledge of the state estimates rather than the true values. To this end, we propose a novel design methodology for privacy preservation, in which the privacy scheme is deployed locally at the user side and is not disclosed to the server, and which creates a deviation between the server's knowledge of the state estimates and the true values. However, this methodology also raises significant challenges: in a closed-loop dynamic system, when the knowledge held by the server is incorrect, the system's behavior becomes complex to analyze; even the stability of the system becomes questionable, as the error accumulates through the closed loop as time evolves. In this paper, we rigorously prove that the performance loss in LQG control caused by the proposed privacy scheme is bounded, which supports the viability of the proposed design methodology. We also propose an associated novel privacy metric and obtain an analytical result for evaluating the privacy performance. Finally, we study the performance trade-off between privacy and control, where the resulting optimization problems are solved efficiently by numerical methods.
comment: 14 pages, 7 figures
Optimal and Heuristic Approaches for Platooning Systems with Deadlines
Efficient truck platooning is a key strategy for reducing freight costs, lowering fuel consumption, and mitigating emissions. Deadlines are critical in this context, as trucks must depart within specific time windows to meet delivery requirements and avoid penalties. In this paper, we investigate the optimal formation and dispatch of truck platoons at a highway station with finite capacity \(L\) and deadline constraints \(T\). The system operates in discrete time, with each arriving truck assigned a deadline of \(T\) slot units. The objective is to leverage the efficiency gains from forming large platoons while accounting for waiting costs and deadline violations. We formulate the problem as a Markov decision process and analyze the structure of the optimal policy \(\pi^\star\) for \(L = 3\), extending insights to arbitrary \(L\). We prove certain monotonicity properties of the optimal policy in the state space \(\mathcal{S}\) and identify classes of unreachable states. Moreover, since the size of \(\mathcal{S}\) grows exponentially with \(L\) and \(T\), we propose heuristics--including conditional and deep-learning based approaches--that exploit these structural insights while maintaining low computational complexity.
Systems and Control (CS)
Non-Convex Over-the-Air Heterogeneous Federated Learning: A Bias-Variance Trade-off
Over-the-air (OTA) federated learning (FL) has been well recognized as a scalable paradigm that exploits the waveform superposition of the wireless multiple-access channel to aggregate model updates in a single use. Existing OTA-FL designs largely enforce zero-bias model updates by either assuming \emph{homogeneous} wireless conditions (equal path loss across devices) or forcing zero-bias updates to guarantee convergence. Under \emph{heterogeneous} wireless scenarios, however, such designs are constrained by the weakest device and inflate the update variance. Moreover, prior analyses of biased OTA-FL largely address convex objectives, while most modern AI models are highly non-convex. Motivated by these gaps, we study OTA-FL with stochastic gradient descent (SGD) for general smooth non-convex objectives under wireless heterogeneity. We develop novel OTA-FL SGD updates that allow a structured, time-invariant model bias while facilitating reduced variance updates. We derive a finite-time stationarity bound (expected time average squared gradient norm) that explicitly reveals a bias-variance trade-off. To optimize this trade-off, we pose a non-convex joint OTA power-control design and develop an efficient successive convex approximation (SCA) algorithm that requires only statistical CSI at the base station. Experiments on a non-convex image classification task validate the approach: the SCA-based design accelerates convergence via an optimized bias and improves generalization over prior OTA-FL baselines.
Time-Optimal Model Predictive Control for Linear Systems with Multiplicative Uncertainties
This paper presents a time-optimal Model Predictive Control (MPC) scheme for linear discrete-time systems subject to multiplicative uncertainties represented by interval matrices. To render the uncertainty propagation computationally tractable, the set-valued error system dynamics are approximated using a matrix-zonotope-based bounding operator. Recursive feasibility and finite-time convergence are ensured through an adaptive terminal constraint mechanism. A key advantage of the proposed approach is that all the necessary bounding sets can be computed offline, substantially reducing the online computational burden. The effectiveness of the method is illustrated via a numerical case study on an orbital rendezvous maneuver between two satellites.
Pareto-Optimal Sampling and Resource Allocation for Timely Communication in Shared-Spectrum Low-Altitude Networks
Guaranteeing stringent data freshness for low-altitude unmanned aerial vehicles (UAVs) in shared spectrum forces a critical trade-off between two operational costs: the UAV's own energy consumption and the occupation of terrestrial channel resources. The core challenge is to satisfy the aerial data freshness while finding a Pareto-optimal balance between these costs. Leveraging predictive channel models and predictive UAV trajectories, we formulate a bi-objective Pareto optimization problem over a long-term planning horizon to jointly optimize the sampling timing for aerial traffic and the power and spectrum allocation for fair coexistence. However, the problem's non-convex, mixed-integer nature renders classical methods incapable of fully characterizing the complete Pareto frontier. Notably, we show monotonicity properties of the frontier, building on which we transform the bi-objective problem into several single-objective problems. We then propose a new graph-based algorithm and prove that it can find the complete set of Pareto optima with low complexity, linear in the horizon and near-quadratic in the resource block (RB) budget. Numerical comparisons show that our approach meets the stringent timeliness requirement and achieves a six-fold reduction in RB utilization or a 6 dB energy saving compared to benchmarks.
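To make the notion of a Pareto frontier between two costs concrete, the following sketch filters a finite set of candidate (energy, resource-block) cost pairs down to the non-dominated ones. It is a generic sort-and-sweep illustration and not the graph-based algorithm proposed in the paper; the function name `pareto_front` and the sample values are hypothetical.

```python
def pareto_front(points):
    """Return the non-dominated (cost_a, cost_b) pairs, both minimized.

    Sort by the first cost (ties broken by the second), then sweep and
    keep a point only if it strictly improves the best second cost so far.
    """
    front = []
    best_b = float("inf")
    for a, b in sorted(set(points)):
        if b < best_b:
            front.append((a, b))
            best_b = b
    return front


if __name__ == "__main__":
    # Hypothetical (UAV energy, RB usage) pairs for candidate schedules.
    candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 6)]
    print(pareto_front(candidates))
```

Along the returned list the first cost increases while the second decreases, which is the kind of monotone structure that lets a bi-objective problem be split into single-objective ones.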
Graph approach for observability analysis in power system dynamic state estimation
The proposed approach yields a numerical method that provably executes in linear time with respect to the number of nodes and edges in a graph. The graph, constructed from the power system model, requires only knowledge of the dependencies between state-to-state and output-to-state variables within a state-space framework. While graph-based observability analysis methods exist for power system static-state estimation, the approach presented here is the first for dynamic-state estimation (DSE). We examine decentralized and centralized DSE scenarios and compare our findings with a well-established, albeit non-scalable, observability analysis method in the literature. When compared to the latter in a centralized DSE setting, our method reduced computation time by 1440x.
Statistically Adaptive Differential Protection for AC Microgrids Based on Kullback-Leibler Divergence
The proliferation of inverter-based resources challenges traditional microgrid protection by introducing variable fault currents and complex transients. This paper presents a statistically adaptive differential protection scheme based on Kullback-Leibler divergence, implemented via a Bartlett-corrected G-statistic computed on logarithm-transformed current magnitudes. The method is a multivariate fault detection engine that employs the Mahalanobis distance to distinguish healthy and faulty states, enabling robust detection even in noisy environments. Detection thresholds are statistically derived from a chi-squared distribution for precise control over the false alarm rate. Upon detection, a lightweight classifier identifies the fault type by assessing per-phase G-statistics against dedicated thresholds, enhanced by a temporal persistence filter for security. Extensive simulations on a modified CIGRE 14-bus microgrid show high efficacy: sub-cycle average detection delays, high detection and classification accuracy across operating modes, resilience to high-impedance faults up to 250 Ohms, tolerance to 10 ms communication delay, and noise levels down to a 20 dB signal-to-noise ratio. These findings demonstrate a reproducible and computationally efficient solution for next-generation AC microgrid protection.
Agentic AI Home Energy Management System: A Large Language Model Framework for Residential Load Scheduling
The electricity sector transition requires substantial increases in residential demand response capacity, yet Home Energy Management Systems (HEMS) adoption remains limited by user interaction barriers requiring translation of everyday preferences into technical parameters. While large language models have been applied to energy systems as code generators and parameter extractors, no existing implementation deploys LLMs as autonomous coordinators managing the complete workflow from natural language input to multi-appliance scheduling. This paper presents an agentic AI HEMS where LLMs autonomously coordinate multi-appliance scheduling from natural language requests to device control, achieving optimal scheduling without example demonstrations. A hierarchical architecture combining one orchestrator with three specialist agents uses the ReAct pattern for iterative reasoning, enabling dynamic coordination without hardcoded workflows while integrating Google Calendar for context-aware deadline extraction. Evaluation across three open-source models using real Austrian day-ahead electricity prices reveals substantial capability differences. Llama-3.3-70B successfully coordinates all appliances across all scenarios to match cost-optimal benchmarks computed via mixed-integer linear programming, while other models achieve perfect single-appliance performance but struggle to coordinate all appliances simultaneously. Progressive prompt engineering experiments demonstrate that analytical query handling without explicit guidance remains unreliable despite models' general reasoning capabilities. We open-source the complete system including orchestration logic, agent prompts, tools, and web interfaces to enable reproducibility, extension, and future research.
comment: 34 pages, 9 figures. Code available at https://github.com/RedaElMakroum/agentic-ai-hems
Optimal Bidding and Coordinated Dispatch of Hybrid Energy Systems in Regulation Markets
The increasing integration of renewable energy sources and distributed energy resources (DER) into modern power systems introduces significant uncertainty, posing challenges for maintaining grid flexibility and reliability. Hybrid energy systems (HES), composed of controllable generators, flexible loads, and battery storage, offer a decentralized solution to enhance flexibility compared to single centralized resources. This paper presents a two-level framework to enable HES participation in frequency regulation markets. The upper level performs a chance-constrained optimization to choose capacity bids based on historical regulation signals. At the lower level, a real-time control strategy disaggregates the regulation power among the constituent resources. This real-time control strategy is then benchmarked against an offline optimal dispatch to evaluate flexibility performance. Additionally, the framework evaluates the profitability of overbidding strategies and identifies thresholds beyond which performance degradation may lead to market penalties or disqualification. The proposed framework also compares the impact of imbalanced power capacities on performance and battery state of charge (SoC) through asymmetric HES configurations.
Two-Timescale Optimization Framework for IAB-Enabled Heterogeneous UAV Networks
In post-disaster scenarios, the rapid deployment of adequate communication infrastructure is essential to support disaster search, rescue, and recovery operations. To achieve this, uncrewed aerial vehicles (UAVs) have emerged as a promising solution for emergency communication due to their low cost and deployment flexibility. However, a conventional untethered UAV (U-UAV) is constrained by size, weight, and power (SWaP) limitations, making it incapable of maintaining the operation of a macro base station. To address this limitation, we propose a heterogeneous UAV-based framework that integrates a tethered UAV (T-UAV) and U-UAVs, where U-UAVs are utilized to enhance the throughput of cell-edge ground user equipments (G-UEs) and guarantee seamless connectivity during G-UEs' mobility to safe zones. The integrated access and backhaul (IAB) technique is adopted to support the wireless backhaul of U-UAVs. Accordingly, we formulate a two-timescale joint user scheduling and trajectory control optimization problem, aiming to maximize the downlink throughput under asymmetric traffic demands and G-UEs' mobility. To solve the formulated problem, we propose a two-timescale multi-agent deep deterministic policy gradient (TTS-MADDPG) algorithm based on the centralized training and distributed execution paradigm. Numerical results show that the proposed algorithm outperforms other benchmarks, including the two-timescale multi-agent proximal policy optimization (TTS-MAPPO) algorithm and the MADDPG scheduling method, with robust and higher throughput. Specifically, the proposed algorithm obtains up to 12.2\% average throughput gain compared to the MADDPG scheduling method.
Proxemics and Permeability of the Pedestrian Group
People tend to walk in groups, and interactions with those groups have a significant impact on crowd behavior and pedestrian traffic dynamics. Social norms can be seen as unwritten rules regulating people's interactions in social settings. This article studies people's interactions with groups and the emergence of group proxemics. Group zones, zone occupancy counts, and people's clearance from the group are studied using naturalistic data. The analysis indicates the potential presence of three different zones in addition to the public zone. People tend to remain in the public zone and only progressively get closer to groups, and those closer approaches happen at a low frequency and for brief periods of time.
Life-cycle Modeling and the Walking Behavior of the Pedestrian-Group as an Emergent Agent: With Empirical Data on the Cohesion of the Group Formation
This article investigates the pedestrian group as an emergent agent. The article explores empirical data to derive emergent agency and formation state spaces and outline recurring patterns of walking behavior. In this analysis, pedestrian trajectories extracted from surveillance videos are used along with manually annotated pedestrian group memberships. We conducted manual expert evaluation of observed groups, produced new manual annotations for relevant events pertaining to group behavior, and extracted metrics relevant to group formation. This information, along with quantitative analysis, was used to model the life-cycle and formation of the group agent. Those models give structure to expectations around the walking behavior of groups, from pedestrians walking independently to the emergence of a collective intention where group members tended to maintain a bounded distance between each other. Disturbances to this bounded distance often happened in association with changes in either their agency or their formation states. We summarized the patterns of behavior along with the sequences of state transitions into abstract patterns, which can aid in the development of more detailed group agents in simulation and in the design of engineering systems to interact with such groups.
Efficient Collision-Avoidance Constraints for Ellipsoidal Obstacles in Optimal Control: Application to Path-Following MPC and UAVs
This article proposes a modular optimal control framework for local three-dimensional ellipsoidal obstacle avoidance, exemplarily applied to model predictive path-following control. Static as well as moving obstacles are considered. Central to the approach is a computationally efficient and continuously differentiable condition for detecting collisions with ellipsoidal obstacles. A novel two-stage optimization approach mitigates numerical issues arising from the structure of the resulting optimal control problem. The effectiveness of the approach is demonstrated through simulations and real-world experiments with the Crazyflie quadrotor. This represents the first hardware demonstration of an MPC controller of this kind for UAVs in a three-dimensional task.
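As an illustration of what a continuously differentiable ellipsoid condition can look like, the sketch below uses the textbook quadratic form for an axis-aligned ellipsoid and a point-sized vehicle. This is an assumed stand-in, not necessarily the condition derived in the article; `ellipsoid_clearance`, `center`, and `semi_axes` are hypothetical names.

```python
def ellipsoid_clearance(point, center, semi_axes):
    """Quadratic clearance function for an axis-aligned ellipsoid.

    Returns sum(((p_i - c_i) / r_i)**2) - 1, which is negative inside
    the ellipsoid, zero on its surface, and positive outside. The
    expression is a polynomial in the point coordinates, so it is
    smooth and can serve as an inequality constraint h(p) >= 0.
    """
    return sum(((p - c) / r) ** 2
               for p, c, r in zip(point, center, semi_axes)) - 1.0


if __name__ == "__main__":
    obstacle_center = (0.0, 0.0, 0.0)
    obstacle_axes = (1.0, 2.0, 3.0)
    print(ellipsoid_clearance((1.0, 0.0, 0.0), obstacle_center, obstacle_axes))
```

For a moving obstacle, `center` would simply be evaluated at each prediction step; a vehicle with nonzero extent would need inflated semi-axes or a different condition.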
Safety Margins of Inverse Optimal ISSf Controllers
We investigate the gain margin of a general nonlinear system under an inverse optimal input-to-state safe (ISSf) controller of the form u=u0(x)+u*(x,u0), where u0 is the nominal control and u* is the inverse optimal safety filter that minimally modifies the nominal controller's unsafe actions over the infinite horizon. By first establishing a converse ISSf-BF theorem, we reveal the equivalence among the achievability of ISSf by feedback, the achievability of inverse optimality, and the solvability of a Hamilton-Jacobi-Isaacs equation associated with the inverse optimal ISSf gain assignment. Then we develop a collection of safety margin results on the overall control u=u0+u*. In the absence of disturbances, we find that standard inverse optimal safe controllers have a certain degree of gain margin. Specifically, when f(x) acts safely but u0 acts unsafely, the gain can be decreased by up to half; and when f(x) acts unsafely, we establish that, if u0 acts safely, the gain can be increased arbitrarily, whereas if u0 acts unsafely, the control recovers the full gain margin [1/2,inf). It is shown, however, that under control gain variation, the safe set of these controllers is locally asymptotically stable, which implies that their safety is sensitive to large but bounded disturbances. To make inverse optimal ISSf controllers robust to gain variation, we propose a gain margin improvement approach at the expense of an increased control effort. This improvement allows the inverse optimal safe control to inherit the standard gain margin of [1/2,inf) without requiring prior knowledge of whether f(x) or u0 acts safely on the safety boundary, while simultaneously ensuring global asymptotic stability of the resulting safe set. In the presence of disturbances, this improvement idea renders inverse optimal ISSf controllers robust to gain variations with the same gain margin of [1/2,inf).
XWAVE: A Novel Software-Defined Everything Approach for the Manufacturing Industry
The manufacturing sector is moving from rigid, hardware-dependent systems toward flexible, software-driven environments. This transformation is shaped by the convergence of several Software-Defined technologies: Software-Defined Automation virtualizes industrial control, replacing proprietary PLCs with containerized, programmable solutions that enable scalability and interoperability. Software-Defined Compute and Communications provide a means to distribute intelligence seamlessly across devices, networks, and cloud platforms, reducing latency and enabling dynamic reconfiguration. Software-Defined Manufacturing Systems, usually implemented as Digital Twins, are real-time virtual models of machines and processes, allowing predictive analysis, optimization, and closer integration between human operators and intelligent systems. This work presents XWAVE, a project that unites these three Software-Defined paradigms to present a modular, fully software-defined manufacturing system.
Command-filter-based trajectory-tracking control of quadrotor subject to internal and external disturbances
We propose a command-filter backstepping controller that integrates a disturbance observer and a high-gain observer (HGO) to handle unknown internal and external disturbances acting on a quadrotor. To build the controller, we first define tracking errors between the measured and desired quadrotor outputs, which allow the system to be rewritten in a new set of state variables. Using this transformed model, we apply Lyapunov theory to derive a backstepping control law. To avoid repeated differentiation of states and virtual controls, a first-order command filter is introduced, and a nonlinear disturbance observer is added to provide disturbance estimates. Each state in the controller and observer is replaced with its estimate from the HGO. The resulting control law enables the quadrotor to follow its path despite internal and external disturbances, with each subsystem allowed its own disturbance type for realism. A new state transformation and Lyapunov-based derivation prevent the usual explosion of complexity, while the HGO reconstructs unmeasured states and their rates for output feedback. The nonlinear disturbance observer attenuates constant and nonlinear disturbances as well as band-limited white noise. The method reduces dependence on high-precision sensors and mitigates wind, model error, and rotor noise effects during flight. Unlike previous studies that treat either disturbance rejection or partial sensing, this work combines the command filter, disturbance observer, and HGO to address both challenges simultaneously while avoiding the complexity growth typical of backstepping designs.
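The role of the first-order command filter can be sketched as follows: the virtual control is passed through a low-pass filter whose state derivative is available in closed form, so no numerical differentiation is needed. This is a generic forward-Euler discretization under assumed names (`command_filter`, `omega`, `dt`, `z0`), not the paper's implementation.

```python
def command_filter(virtual_control, dt, omega, z0=0.0):
    """First-order command filter z_dot = omega * (alpha - z).

    virtual_control is a sequence of samples alpha[k]; dt is the sample
    time and omega the filter bandwidth (both assumed positive). Returns
    a list of (z, z_dot) pairs: the filtered command and its derivative,
    which backstepping uses in place of the derivative of alpha.
    """
    z = z0
    out = []
    for alpha in virtual_control:
        z_dot = omega * (alpha - z)  # derivative comes from the filter itself
        out.append((z, z_dot))
        z = z + dt * z_dot           # forward-Euler integration step
    return out


if __name__ == "__main__":
    print(command_filter([1.0, 1.0], dt=0.5, omega=1.0))
```

A larger `omega` tracks the virtual control more closely at the cost of amplifying noise; forward Euler is only stable here when `omega * dt` is small enough (below 2).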
Cooperative Task Spaces for Multi-Arm Manipulation Control based on Similarity Transformations
Many tasks in human environments require collaborative behavior between multiple kinematic chains, either to provide additional support for carrying big and bulky objects or to enable the dexterity that is required for in-hand manipulation. Since these complex systems often have a very high number of degrees of freedom, coordinating their movements is notoriously difficult to model. In this article, we present the derivation of the theoretical foundations for cooperative task spaces of multi-arm robotic systems based on geometric primitives defined using conformal geometric algebra. Based on the similarity transformations of these cooperative geometric primitives, we derive an abstraction of complex robotic systems that enables representing these systems in a way that directly corresponds to single-arm systems. By deriving the associated analytic and geometric Jacobian matrices, we then show the straightforward integration of our approach into classical control techniques rooted in operational space control. We demonstrate this using bimanual manipulators, humanoids and multi-fingered hands in optimal control experiments for reaching desired geometric primitives and in teleoperation experiments using differential kinematics control. We then discuss how the geometric primitives naturally embed nullspace structures into the controllers that can be exploited for introducing secondary control objectives. This work represents the theoretical foundations of this cooperative manipulation control framework, and thus the experiments are presented in an abstract way, while giving pointers towards potential future applications.
From Embedding to Control: Representations for Stochastic Multi-Object Systems
This paper studies how to achieve accurate modeling and effective control in stochastic nonlinear dynamics with multiple interacting objects. However, non-uniform interactions and random topologies make this task challenging. We address these challenges by proposing \textit{Graph Controllable Embeddings} (GCE), a general framework to learn stochastic multi-object dynamics for linear control. Specifically, GCE is built on Hilbert space embeddings, allowing direct embedding of probability distributions of controlled stochastic dynamics into a reproducing kernel Hilbert space (RKHS), which enables linear operations in the RKHS while retaining nonlinear expressiveness. We provide theoretical guarantees on the existence, convergence, and applicability of GCE. Notably, a mean field approximation technique is adopted to efficiently capture inter-object dependencies and achieve provably low sample complexity. By integrating graph neural networks, we construct data-dependent kernel features that are capable of adapting to dynamic interaction patterns and generalizing to even unseen topologies with only limited training instances. GCE scales seamlessly to multi-object systems of varying sizes and topologies. Leveraging the linearity of Hilbert spaces, GCE also supports simple yet effective control algorithms for synthesizing optimal sequences. Experiments on physical systems, robotics, and power grids validate GCE and demonstrate consistent performance improvement over various competitive embedding methods in both in-distribution and few-shot tests.
Design of Orthogonal Phase of Arrival Positioning Scheme Based on 5G PRS and Optimization of TOA Performance
This study analyzes the performance of positioning techniques based on configuration changes of 5G New Radio signals. In 5G networks, a terminal position is determined from the Time of Arrival of Positioning Reference Signals transmitted by base stations. We propose an algorithm that improves TOA accuracy under low sampling rate constraints and implement 5G PRS for positioning in a software defined modem. We also examine how flexible time frequency resource allocation of PRS affects TOA estimation accuracy and discuss optimal PRS configurations for a given signal environment.
Confidential FRIT via Homomorphic Encryption
Edge computing alleviates the computation burden of data-driven control in cyber-physical systems (CPSs) by offloading complex processing to edge servers. However, the increasing sophistication of cyberattacks underscores the need for security measures that go beyond conventional IT protections and address the unique vulnerabilities of CPSs. This study proposes a confidential data-driven gain-tuning framework using homomorphic encryption, such as ElGamal and CKKS encryption schemes, to enhance cybersecurity in gain-tuning processes outsourced to external servers. The idea for realizing confidential FRIT is to replace the matrix inversion operation with a vector summation form, allowing homomorphic operations to be applied. Numerical examples under 128-bit security confirm performance comparable to conventional methods while providing guidelines for selecting suitable encryption schemes for secure CPS.
Green Wireless Network Scaling for Joint Deployment: Multi-BSs or Multi-RISs?
The imminent emergence of sixth-generation (6G) networks faces critical challenges from spatially heterogeneous traffic and escalating energy consumption, necessitating sustainable scaling strategies for network infrastructure such as base stations (BSs) and reconfigurable intelligent surfaces (RISs). This paper establishes fundamental scaling laws for the Integrated Relative Energy Efficiency (IREE) metric under joint multi-BS and multi-RIS deployment in traffic-mismatched scenarios. Specifically, we propose an Alternating Directional Dual-Radial Basis Function (ADD-RBF) framework that models the channels of BSs and RISs as two types of spatially decoupled RBF neurons to maximize IREE through alternating optimization, with proven universal approximation capability and convergence guarantees. Theoretical analysis reveals a scaling dichotomy: BS proliferation drives logarithmic capacity growth $\mathcal{O}(\log N^{BS})$ but only polynomial mismatch reduction $\mathcal{O}(1/\sqrt{N^{BS}})$, whereas RIS deployment achieves exponential mismatch mitigation $\mathcal{O}(\delta_{\text{err}}^{-(N^R+1)})$ despite its sub-logarithmic capacity gains. Simulation results validate that RISs excel in capturing spatial traffic correlations and alleviating hotspots, making them particularly effective when mismatch dominates, while BSs are preferable under capacity shortages. These findings offer practical guidelines for green 6G network design.
A Scenario-Based Approach for Stochastic Economic Model Predictive Control with an Expected Shortfall Constraint
This paper presents a novel approach to stochastic economic model predictive control (SEMPC) that minimizes average economic cost while satisfying an empirical expected shortfall (EES) constraint to manage risk. A new scenario-based problem formulation ensuring controlled risk with high confidence while minimizing the average cost is introduced. The probabilistic guarantees depend on the number of support elements over the entire input domain, which is difficult to find for high-dimensional systems. A heuristic algorithm is proposed to find the number of support elements. Finally, an efficient method is presented to reduce the computational complexity of the SEMPC problem with an EES constraint. The approach is validated on a water distribution network, showing its effectiveness in balancing performance and risk.
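One common way to compute an empirical expected shortfall from sampled scenarios is to average the worst few scenario costs; the sketch below follows that convention. The exact EES definition used in the paper is not given in the abstract, so the estimator, the function name `empirical_expected_shortfall`, and the tail size `k` (roughly (1 - alpha) * N for confidence level alpha and N scenarios) are assumptions.

```python
def empirical_expected_shortfall(costs, k):
    """Mean of the k largest scenario costs (an empirical tail average).

    costs is a sequence of per-scenario costs and k the number of
    worst-case scenarios to average, with 1 <= k <= len(costs).
    """
    if not 1 <= k <= len(costs):
        raise ValueError("k must be between 1 and the number of scenarios")
    worst = sorted(costs, reverse=True)[:k]
    return sum(worst) / k


if __name__ == "__main__":
    scenario_costs = [1.0, 2.0, 3.0, 10.0]  # hypothetical sampled costs
    print(empirical_expected_shortfall(scenario_costs, 2))
```

A constraint of the form "tail average <= bound" then limits the average severity of the worst scenarios rather than only their frequency, which is what distinguishes it from a plain chance constraint.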
Competitive Equilibrium for Electricity Markets with Spatially Flexible Load
Electric vehicle charging and geo-distributed datacenters introduce spatially flexible loads (FLs) that couple power, transportation, and datacenter networks. These couplings create a closed-loop feedback between locational marginal prices (LMPs) and decisions of the FL systems, challenging the foundations of conventional competitive equilibrium (CE) in electricity markets. This paper studies a notion of generalized competitive equilibrium (GCE) that aims to capture such price-demand interactions across the interconnected infrastructures. We establish structural conditions under which the GCE preserves key properties of the conventional CE, including existence, uniqueness, and efficiency, without requiring detailed knowledge of decision processes for individual FL systems. The framework generalizes to settings where the grid is coupled with multiple FL systems. Stylized examples and case studies on the New York ISO grid, coupled with the Sioux Falls transportation and distributed datacenter networks, demonstrate the use of our theoretical framework and illustrate the mutual influence among the grid and the studied FL systems.
SUSTAINABLE Platform: Seamless Smart Farming Integration Towards Agronomy Automation SC2
The global agricultural sector is undergoing a transformative shift, driven by increasing food demands, climate variability and the need for sustainable practices. SUSTAINABLE is a smart farming platform designed to integrate IoT, AI, satellite imaging, and role-based task orchestration to enable efficient, traceable, and sustainable agriculture with a pilot use case in viticulture. This paper explores current smart agriculture solutions, presents a comparative evaluation, and introduces SUSTAINABLE's key features, including satellite index integration, real-time environmental data, and role-aware task management tailored to Mediterranean vineyards.
comment: Accepted for presentation to 11th IEEE International Smart Cities Conference (ISC2 2025)
Dispatchable Current Source Virtual Oscillator Control Achieving Global Stability
This work introduces a novel dispatchable current source virtual oscillator control (dCVOC) scheme for grid-following (GFL) converters, which exhibits duality with dispatchable virtual oscillator control (dVOC) in two ways: a) the current frequency is generated through reactive power control, similar to a PLL; b) the current magnitude reference is generated through active power control. We formally prove that our proposed control always admits a steady-state equilibrium and ensures global stability under reasonable conditions on grid and converter parameters, even when considering LVRT and current saturation constraints. Our approach avoids low-voltage transients and weak grid instability, which is not the case for conventional GFL control. The effectiveness of our proposed control is verified through high-fidelity electromagnetic transient simulations.
Quantitative Parameter Conditions for Stability and Coupling in GFM-GFL Converter Hybrid Systems from a Small-Signal Synchronous Perspective
With the development of renewable energy sources, power systems are gradually evolving into a system comprising both grid-forming (GFM) and grid-following (GFL) converters. However, the dynamic interaction between the two types of converters, especially low-inertia GFM converters and GFL converters, remains unclear due to the substantial differences in their synchronization mechanisms. To address this gap, this paper develops a small-signal synchronous stability model for power systems containing GFM and GFL converters, which considers network line dynamics. Based on subspace perturbation theory, we reveal that GFM and GFL subsystems can be effectively decoupled when GFL converters operate near unity power factor or when GFM converters possess sufficiently large inertia or damping, and provide lower bounds on the control parameters that ensure decoupling. Under the decoupling condition, we propose decentralized and analytical parameter-based stability criteria which have clear physical interpretations: the positive damping of converters compensates for the negative damping of the network. In the case of coupling, we also propose decentralized stability criteria based on the small phase theorem. The effectiveness of the theoretical analysis is validated through simulations in MATLAB/Simulink.
Adaptive Control for a Physics-Informed Model of a Thermal Energy Distribution System: Qualitative Analysis
Integrated energy systems (IES) are complex heterogeneous architectures that typically encompass power sources, hydrogen electrolyzers, energy storage, and heat exchangers. This integration is achieved through operating control strategy optimization. However, the lack of physical understanding as to how these systems evolve over time introduces uncertainties that hinder reliable application thereof. Techniques that can accommodate such uncertainties are fundamental for ensuring proper operation of these systems. Unfortunately, no unifying methodology exists for accommodating uncertainties in this regard. That being said, adaptive control (AC) is a discipline that may allow for accommodating such uncertainties in real-time. In the present work, we derive an AC formulation for linear systems in which all states are observable and apply it to the control of a glycol heat exchanger (GHX) in an IES. Based on prior research in which we quantified the uncertainties of the GHX's system dynamics, we introduced an error of 50% on four terms of the nominal model. In the case where a linear quadratic regulator is used as the nominal control for the reference system, we found that employing AC can reduce the mean absolute error and integral time absolute error by 30%-75%. This reduction is achieved with minimal computing overhead and control infrastructure, thus underscoring the strength of AC. However, the control effort induced is significant, therefore warranting further study in order to estimate its impact on a physical system. To address further challenges, including partially observable and non-linear dynamics, enhancements of the linear formulation are currently being developed.
Quantifying Grid-Forming Behavior: Bridging Device-level Dynamics and System-Level Strength
Grid-forming (GFM) technology is widely regarded as a promising solution for future power systems dominated by power electronics. However, a precise method for quantifying GFM converter behavior and a universally accepted GFM definition remain elusive. Moreover, the impact of GFM on system stability is not precisely quantified, creating a significant disconnect between device and system levels. To address these gaps from a small-signal perspective, at the device level, we introduce a novel metric, the Forming Index (FI), to quantify a converter's response to grid voltage fluctuations. Rather than enumerating various control architectures, the FI provides a metric for the converter's GFM ability by quantifying its sensitivity to grid variations. At the system level, we propose a new quantitative measure of system strength that captures the multi-bus voltage stiffness, which quantifies the voltage and phase angle responses of multiple buses to current or power disturbances. We further extend this concept to grid strength and bus strength to identify weak areas within the system. Finally, we bridge the device and system levels by formally proving that GFM converters enhance system strength. Our proposed framework provides a unified benchmark for GFM converter design, optimal placement, and system stability assessment.
Ferrohydrodynamic Microfluidics for Bioparticle Separation and Single-Cell Phenotyping: Principles, Applications, and Emerging Directions
Ferrohydrodynamic microfluidics relies on magnetic field gradients to manipulate diamagnetic particles in ferrofluid-filled microenvironments. It has emerged as a promising tool for label-free manipulation of bioparticles, including their separation and phenotyping. This perspective reviews recent progress in the development and applications of ferrofluid-based microfluidic platforms for multiscale bioparticle separation, ranging from micron-scale cells to submicron extracellular vesicles. We highlight the fundamental physical principles for ferrohydrodynamic manipulation, including the dominant magnetic buoyancy force resulting from the interaction of ferrofluids and particles. We then describe how these principles enable high-resolution size-based bioparticle separation, subcellular bioparticle enrichment, and phenotypic screening based on physical traits. We also discuss key challenges in ferrohydrodynamic microfluidics from the aspects of ferrofluid biocompatibility, system throughput, and nanoparticle depletion. Finally, we outline future research directions involving machine learning, 3D printing, and multiplexed detection. These insights chart a path for advancing ferrofluid-based technologies in precision biomedicine, diagnostics, and cellular engineering.
Cooperative Integrated Estimation-Guidance for Simultaneous Interception of Moving Targets
This paper proposes a cooperative integrated estimation-guidance framework for simultaneous interception of a non-maneuvering target using a team of unmanned autonomous vehicles, assuming only a subset of vehicles are equipped with dedicated sensors to measure the target's states. Unlike earlier approaches that focus solely on either estimation or guidance design, the proposed framework unifies both within a cooperative architecture. To circumvent the limitation posed by heterogeneity in target observability, sensorless vehicles estimate the target's state by leveraging information exchanged with neighboring agents over a directed communication topology through a prescribed-time observer. The proposed approach employs true proportional navigation guidance (TPNG), which uses an exact time-to-go formulation and is applicable across a wide spectrum of target motions. Furthermore, a prescribed-time observer and controller are employed to achieve convergence to the target's true state and consensus in time-to-go within set predefined times, respectively. Simulations demonstrate the effectiveness of the proposed framework under various engagement scenarios.
Finite Sample MIMO System Identification with Multisine Excitation: Nonparametric, Direct, and Two-step Parametric Estimators
Multisine excitations are widely used for identifying multi-input multi-output systems due to their periodicity, data compression properties, and control over the input spectrum. Despite their popularity, the finite-sample statistical properties of frequency-domain estimators under multisine excitation, for both nonparametric and parametric settings, remain insufficiently understood. This paper develops a finite-sample statistical framework for least-squares estimation of the frequency response function (FRF) and its implications for parametric modeling. First, we derive exact distributional and covariance properties of the FRF estimator, explicitly accounting for aliasing effects under slow sampling regimes, and establish conditions for unbiasedness, uncorrelatedness, and consistency across multiple experiments. Second, we show that the FRF estimate is a sufficient statistic for any parametric model under Gaussian noise, leading to an exact equivalence between optimal two-stage frequency-domain methods and time-domain prediction error and maximum likelihood estimation. This equivalence is shown to yield finite-sample concentration bounds for parametric maximum likelihood estimators, enabling rigorous uncertainty quantification, and closed-form prediction error method estimators without iterative optimization. The theoretical results are demonstrated in a representative case study.
comment: 16 pages, 4 figures
Data-Driven Stabilization Using Prior Knowledge on Stabilizability and Controllability
In this work, we study data-driven stabilization of linear time-invariant systems using prior knowledge of system-theoretic properties, specifically stabilizability and controllability. To formalize this, we extend the concept of data informativity by requiring the existence of a controller that stabilizes all systems consistent with the data and the prior knowledge. We show that if the system is controllable, then incorporating this as prior knowledge does not relax the conditions required for data-driven stabilization. Remarkably, however, we show that if the system is stabilizable, then using this as prior knowledge leads to necessary and sufficient conditions that are weaker than those for data-driven stabilization without prior knowledge. In other words, data-driven stabilization is easier if one knows that the underlying system is stabilizable. We also provide new data-driven control design methods in terms of linear matrix inequalities that complement the conditions for informativity.
comment: 6 pages
Decentralized Merging Control of Connected and Automated Vehicles to Enhance Safety and Energy Efficiency using Control Barrier Functions
This paper presents a decentralized Control Barrier Function (CBF) based approach for highway merging of Connected and Automated Vehicles (CAVs). In this control algorithm, each "host" vehicle negotiates with other agents in a control zone of the highway network, and enacts its own action, to perform safe and energy-efficient merge maneuvers. It uses predictor-corrector loops within the robust CBF setting for negotiation and to reconcile disagreements that may arise. There is no explicit order of vehicles and no priority. A notable feature is the absence of gridlocks due to instability of the inter-agent system. Results from Monte Carlo simulations show significant improvement in the system-wide energy efficiency and traffic flow compared to a first-in-first-out approach, as well as enhanced robustness of the proposed decentralized controller compared to its centralized counterpart.
comment: This work has been submitted to a conference for possible publication and is under review. Paper summary: 8 pages, 5 figures, 2 tables
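For readers unfamiliar with the control barrier function idea behind the merging-control abstract above, the following is a minimal textbook-style sketch, not the paper's decentralized robust scheme. It assumes a scalar single-integrator model x' = u and a barrier h(x) = x_max - x; the CBF condition dh/dt >= -gamma * h then reduces to u <= gamma * (x_max - x), so the usual quadratic program has a closed-form solution. All names and numbers are hypothetical.

```python
def cbf_filter(u_nom, x, x_max, gamma):
    """Minimally modify u_nom so that h(x) = x_max - x stays nonnegative.

    For x' = u, the condition dh/dt >= -gamma * h is u <= gamma * h,
    so the closest safe input to u_nom is a simple clamp from above.
    """
    h = x_max - x
    return min(u_nom, gamma * h)


def simulate(x0, u_nom, x_max, gamma, dt, steps):
    """Forward-Euler rollout of x' = u under the filtered input."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        u = cbf_filter(u_nom, x, x_max, gamma)
        x = x + dt * u
        trajectory.append(x)
    return trajectory


# A nominal input that would overshoot x_max = 10 without the filter.
states = simulate(x0=0.0, u_nom=5.0, x_max=10.0, gamma=1.0, dt=0.01, steps=2000)
```

With dt * gamma below 1, each Euler step shrinks h by at most the factor (1 - dt * gamma), so the state approaches the limit without crossing it. Vehicle models with acceleration inputs need higher-order or robust variants, which this sketch does not cover.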
Recursive Experiment Design for Closed-Loop Identification with Output Perturbation Limits
In many applications, system identification experiments must be performed under output feedback to ensure safety or to maintain system operation. In this paper, we consider the online design of informative experiments for ARMAX models by applying a bounded perturbation to the input signal generated by a fixed output feedback controller. Specifically, the design constrains the resulting output perturbation within user-specified limits and can be efficiently computed in closed form. We demonstrate the effectiveness of the method in a numerical experiment.
Optimal and Heuristic Approaches for Platooning Systems with Deadlines
Efficient truck platooning is a key strategy for reducing freight costs, lowering fuel consumption, and mitigating emissions. Deadlines are critical in this context, as trucks must depart within specific time windows to meet delivery requirements and avoid penalties. In this paper, we investigate the optimal formation and dispatch of truck platoons at a highway station with finite capacity $L$ and deadline constraints $T$. The system operates in discrete time, with each arriving truck assigned a deadline of $T$ slot units. The objective is to leverage the efficiency gains from forming large platoons while accounting for waiting costs and deadline violations. We formulate the problem as a Markov decision process and analyze the structure of the optimal policy $\pi^\star$ for $L = 3$, extending insights to arbitrary $L$. We prove certain monotonicity properties of the optimal policy in the state space $\mathcal{S}$ and identify classes of unreachable states. Moreover, since the size of $\mathcal{S}$ grows exponentially with $L$ and $T$, we propose heuristics -- including conditional and deep-learning based approaches -- that exploit these structural insights while maintaining low computational complexity.
Convex computation of regions of attraction from data using Sums-of-Squares programming
The paper concentrates on the analysis of the Region of Attraction (RoA) for unknown autonomous dynamical systems. The aim is to explore a data-driven approach based on moment Sum-of-Squares (SoS) hierarchy, which enables novel RoA outer approximations despite the reduced information on the structure of the dynamics. The main contribution of this work is bypassing the system model and, consequently, the recurring constraint on its polynomial structure. Numerical experimentation showcases the influence of data on learned approximating sets, offering a promising outlook on the potential of this method.
Smart Exploration in Reinforcement Learning using Bounded Uncertainty Models
Reinforcement learning (RL) is a powerful framework for decision-making in uncertain environments, but it often requires large amounts of data to learn an optimal policy. We address this challenge by incorporating prior model knowledge to guide exploration and accelerate the learning process. Specifically, we assume access to a model set that contains the true transition kernel and reward function. We optimize over this model set to obtain upper and lower bounds on the Q-function, which are then used to guide the exploration of the agent. We provide theoretical guarantees on the convergence of the Q-function to the optimal Q-function under the proposed class of exploring policies. Furthermore, we also introduce a data-driven regularized version of the model set optimization problem that ensures the convergence of the class of exploring policies to the optimal policy. Lastly, we show that when the model set has a specific structure, namely the bounded-parameter MDP (BMDP) framework, the regularized model set optimization problem becomes convex and simple to implement. In this setting, we also prove finite-time convergence to the optimal policy under mild assumptions. We demonstrate the effectiveness of the proposed exploration strategy, which we call BUMEX (Bounded Uncertainty Model-based Exploration), in a simulation study. The results indicate that the proposed method can significantly accelerate learning in benchmark examples. A toolbox is available at https://github.com/JvHulst/BUMEX.
comment: Accepted for Presentation at 64th IEEE Conference on Decision and Control, CDC 2025, Rio de Janeiro, Brazil, 2025
Agile and Cooperative Aerial Manipulation of a Cable-Suspended Load
Quadrotors can carry slung loads to hard-to-reach locations at high speed. Since a single quadrotor has limited payload capacities, using a team of quadrotors to collaboratively manipulate a heavy object is a scalable and promising solution. However, existing control algorithms for multi-lifting systems only enable low-speed and low-acceleration operations due to the complex dynamic coupling between quadrotors and the load, limiting their use in time-critical missions such as search and rescue. In this work, we present a solution to significantly enhance the agility of cable-suspended multi-lifting systems. Unlike traditional cascaded solutions, we introduce a trajectory-based framework that solves the whole-body kinodynamic motion planning problem online, accounting for the dynamic coupling effects and constraints between the quadrotors and the load. The planned trajectory is provided to the quadrotors as a reference in a receding-horizon fashion and is tracked by an onboard controller that observes and compensates for the cable tension. Real-world experiments demonstrate that our framework can achieve at least eight times greater acceleration than state-of-the-art methods to follow agile trajectories. Our method can even perform complex maneuvers such as flying through narrow passages at high speed. Additionally, it exhibits high robustness against load uncertainties and does not require adding any sensors to the load, demonstrating strong practicality.
comment: 38 pages, 11 figures
SafEDMD: A Koopman-based data-driven controller design framework for nonlinear dynamical systems
The Koopman operator serves as the theoretical backbone for machine learning of dynamical control systems, where the operator is heuristically approximated by extended dynamic mode decomposition (EDMD). In this paper, we propose SafEDMD, a novel stability- and feedback-oriented EDMD-based controller design framework. Our approach leverages a reliable surrogate model generated in a data-driven fashion in order to provide closed-loop guarantees. In particular, we establish a controller design based on semi-definite programming with guaranteed stabilization of the underlying nonlinear system. As central ingredient, we derive proportional error bounds that vanish at the origin and are tailored to control tasks. We illustrate the developed method by means of several benchmark examples and highlight the advantages over state-of-the-art methods.
comment: Accepted for publication in Automatica
High Performance Distributed Control for Large-Scale Linear Systems: A Cover-Based Distributed Observer Approach
In recent years, the distributed-observer-based distributed control law has shown powerful ability to arbitrarily approximate the centralized control performance. However, the traditional distributed observer requires each local observer to reconstruct the state information of the whole system, which is unrealistic for large-scale scenarios. To fill this gap, this paper presents a coverage solution algorithm for large-scale systems that accounts for both physical and communication network characteristics, which can significantly reduce the dimension of local observers. Then, the cover-based distributed observer for large-scale systems is proposed to overcome the problem that the system dynamics are difficult to estimate due to the coupling between cover sets. Furthermore, the two-layer Lyapunov analysis method is adopted and the dynamic transformation lemma of compact errors is proved, which solves the problem of analyzing stability of the error dynamic of the cover-based distributed observer. Finally, it is proved that the distributed control law based on the cover-based distributed observer can also arbitrarily approximate the control performance of the centralized control law, and the dimension of the local observer is greatly reduced compared with the traditional method. The simulation results show the validity of the developed theories.
End-to-end guarantees for indirect data-driven control of bilinear systems with finite stochastic data
In this paper we propose an end-to-end algorithm for indirect data-driven control for bilinear systems with stability guarantees. We consider the case where the collected i.i.d. data is affected by probabilistic noise with possibly unbounded support and leverage tools from statistical learning theory to derive finite sample identification error bounds. To this end, we solve the bilinear identification problem by solving a set of linear and affine identification problems, by a particular choice of a control input during the data collection phase. We provide a priori as well as data-dependent finite sample identification error bounds on the individual matrices as well as ellipsoidal bounds, both of which are structurally suitable for control. Further, we integrate the structure of the derived identification error bounds in a robust controller design to obtain an exponentially stable closed-loop. By means of an extensive numerical study we showcase the interplay between the controller design and the derived identification error bounds. Moreover, we note appealing connections of our results to indirect data-driven control of general nonlinear systems through Koopman operator theory and discuss how our results may be applied in this setup.
Game Theoretic Resilience Recommendation Framework for CyberPhysical Microgrids Using Hypergraph MetaLearning
This paper presents a physics-aware cyberphysical resilience framework for radial microgrids under coordinated cyberattacks. The proposed approach models the attacker through a hypergraph neural network (HGNN) enhanced with model agnostic metalearning (MAML) to rapidly adapt to evolving defense strategies and predict high-impact contingencies. The defender is modeled via a bi-level Stackelberg game, where the upper level selects optimal tie-line switching and distributed energy resource (DER) dispatch using an Alternating Direction Method of Multipliers (ADMM) coordinator embedded within the Non-dominated Sorting Genetic Algorithm II (NSGA-II). The framework simultaneously optimizes load served, operational cost, and voltage stability, ensuring all post-defense states satisfy network physics constraints. The methodology is first validated on the IEEE 69-bus distribution test system with 12 DERs, 8 critical loads, and 5 tie-lines, and then extended to higher bus systems including the IEEE 123-bus feeder and a synthetic 300-bus distribution system. Results show that the proposed defense strategy restores nearly full service for 90% of top-ranked attacks, mitigates voltage violations, and identifies Feeder 2 as the principal vulnerability corridor. Actionable operating rules are derived, recommending pre-arming of specific tie-lines to enhance resilience, while higher bus system studies confirm scalability of the framework on the IEEE 123-bus and 300-bus systems.
Climate Science and Control Engineering: Insights, Parallels, and Connections
Climate science is the multidisciplinary field that studies the Earth's climate and its evolution. At the very core of climate science are indispensable climate models that predict future climate scenarios, inform policy decisions, and dictate how a country's economy should change in light of the changing climate. Climate models capture a wide range of interacting dynamic processes via extremely complex ordinary and partial differential equations. To model these large-scale complex processes, climate science leverages supercomputers, advanced simulations, and statistical methods to predict future climate. An area of engineering that is rarely studied in climate science is control engineering. Given that climate systems are inherently dynamic, it is intuitive to analyze them within the framework of dynamic system science. This perspective has been underexplored in the literature. In this manuscript, we provide a tutorial that: (i) introduces the control engineering community to climate dynamics and modeling, including spatiotemporal scales and challenges in climate modeling; (ii) offers a fresh perspective on climate models from a control systems viewpoint; and (iii) explores the relevance and applicability of various advanced graph and network control-based approaches in building a physics-informed framework for learning, control and estimation in climate systems. We also present simple and then more complex climate models, depicting fundamental ideas and processes that are instrumental in building climate change projections. This tutorial also builds parallels and observes connections between various contemporary problems at the forefront of climate science and their control theoretic counterparts. We specifically observe that an abundance of climate science problems can be linguistically reworded and mathematically framed as control theoretic ones.
On the Detection of Shared Data Manipulation in Distributed Optimization
This paper investigates the vulnerability of the Alternating Direction Method of Multipliers (ADMM) algorithm to shared data manipulation, with a focus on solving optimal power flow (OPF) problems. Deliberate data manipulation may cause the ADMM algorithm to converge to suboptimal solutions. We derive a sufficient condition for detecting data manipulation based on the theoretical convergence trajectory of the ADMM algorithm. We evaluate the performance of the detection condition on three data manipulation strategies with various levels of complexity and stealth. The simplest attack sends the target values at each iteration, the second attack uses a feedback loop to find the next target values, and the last attack uses a bilevel optimization to find the target values. We then extend the three data manipulation strategies to avoid detection by the detection conditions and a neural network (NN) detection model. We also propose an adversarial NN training framework to detect shared data manipulation. We illustrate the performance of our data manipulation strategy and detection framework on OPF problems. The results show that the proposed detection condition successfully detects most of the data manipulation attacks. However, the bilevel optimization attack strategy that incorporates the detection methods may avoid being detected. Countering this, our proposed adversarial training framework detects all the instances of the bilevel optimization attack.
Online Adaptation for Flying Quadrotors in Tight Formations
The task of flying in tight formations is challenging for teams of quadrotors because the complex aerodynamic wake interactions can destabilize individual team members as well as the team. Furthermore, these aerodynamic effects are highly nonlinear and fast-paced, making them difficult to model and predict. To overcome these challenges, we present L1 KNODE-DW MPC, an adaptive, mixed expert learning based control framework that allows individual quadrotors to accurately track trajectories while adapting to time-varying aerodynamic interactions during formation flights. We evaluate L1 KNODE-DW MPC in two different three-quadrotor formations and show that it outperforms several MPC baselines. Our results show that the proposed framework is capable of enabling the three-quadrotor team to remain vertically aligned in close proximity throughout the flight. These findings show that the L1 adaptive module compensates for unmodeled disturbances most effectively when paired with an accurate dynamics model. A video showcasing our framework and the physical experiments is available here: https://youtu.be/9QX1Q5Ut9Rs
comment: 10 pages, 4 figures
Understanding the Application of Utility Theory in Robotics and Artificial Intelligence: A Survey
As a unifying concept in economics, game theory, and operations research, even in the Robotics and AI field, the utility is used to evaluate the level of individual needs, preferences, and interests. Especially for decision-making and learning in multi-agent/robot systems (MAS/MRS), a suitable utility model can guide agents in choosing reasonable strategies to achieve their current needs and learning to cooperate and organize their behaviors, optimizing the system's utility, building stable and reliable relationships, and guaranteeing each group member's sustainable development, similar to human society. Although these systems' complex, large-scale, and long-term behaviors are strongly determined by the fundamental characteristics of the underlying relationships, there has been less discussion on the theoretical aspects of mechanisms and the fields of applications in Robotics and AI. This paper introduces a utility-oriented needs paradigm to describe and evaluate inter and outer relationships among agents' interactions. Then, we survey existing literature in relevant fields to support it and propose several promising research directions along with some open problems deemed necessary for further investigations.
comment: I am not sure whether withdrawing this paper is suitable. However, right now this paper has significant changes in its topic and author. So, I do not want to lead to any confusion about this paper. In the future, it will have a new version. I hope people will not have issues and confusion about the older one
Integrated Learning and Optimization to Control Load Demand and Wind Generation for Minimizing Ramping Cost in Real-Time Electricity Market
We developed a new integrated learning and optimization (ILO) methodology to predict context-aware unknown parameters in economic dispatch (ED), a crucial problem in power systems solved to generate optimal power dispatching decisions to serve consumer load. The ED formulation in the current study consists of load and renewable generation as unknown parameters in its constraints predicted using contextual information (e.g., prior load, temperature). The ILO framework trains a neural network (NN) to estimate ED parameters by minimizing an application-specific regret function, which is the difference between ground-truth and NN-driven decisions, favouring better ED decisions. We thoroughly analyze the feasible region of the ED formulation to understand the impact of learning load and renewable generation together on the ED decisions. Accordingly, we developed a new regret function to capture real-time electricity market operations where differences between predicted and true loads are corrected by ramping generators in real time but at a higher cost than the market price. The proposed regret function, when minimized using the ILO framework, trains the NN to guide the load and renewable predictions to generate ED decisions favouring minimum generator ramping costs. This is unlike the conventional sequential learning and optimization (SLO) framework, which trains the NN to accurately estimate load and renewable generation instead of producing better ED decisions. The combined training of load and renewable generation using ILO is a new concept and leads to significantly improved ramping costs when compared with SLO-based training of load and renewable generation and with SLO-trained load with 100% accurate renewable generation, proving its decision-focused capability.
comment: The preprint was submitted to disseminate the idea as soon as possible and was submitted without asking one of the authors listed in the manuscript, who was the supervisor. Moreover, the preprint states that it has been submitted to a journal, although it has not yet been submitted. The institute therefore asked to withdraw the preprint
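The ramping-cost regret described in the abstract above can be illustrated with a toy single-bus sketch. Everything here is an assumption made for illustration: the function name, the single generator, the two prices, and the symmetric treatment of up and down ramping are hypothetical, and this is not the regret function derived in the paper.

```python
def ramping_regret(pred_load, pred_wind, true_load, true_wind,
                   market_price, ramp_price):
    """Toy decision regret for a single-bus dispatch (hypothetical model).

    The generator is scheduled to cover the predicted net load at the
    market price; any mismatch with the true net load is corrected by
    ramping at a higher price. Regret is the realized cost minus the
    cost of a dispatch made with perfect knowledge of the true net load.
    """
    scheduled = pred_load - pred_wind          # dispatch from predictions
    required = true_load - true_wind           # dispatch actually needed
    realized_cost = market_price * scheduled + ramp_price * abs(required - scheduled)
    oracle_cost = market_price * required      # perfect-foresight dispatch
    return realized_cost - oracle_cost


# Over-predicting net load by 10 MW (80 scheduled, 70 needed).
regret_over = ramping_regret(100, 20, 90, 20, market_price=30, ramp_price=50)
# A perfect prediction gives zero regret.
regret_exact = ramping_regret(90, 20, 90, 20, market_price=30, ramp_price=50)
```

In this toy model a predictor is judged by the cost of the decision it induces rather than by its forecast error, which is the decision-focused idea the abstract contrasts with SLO.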
Systems and Control (EESS)
Non-Convex Over-the-Air Heterogeneous Federated Learning: A Bias-Variance Trade-off
Over-the-air (OTA) federated learning (FL) has been well recognized as a scalable paradigm that exploits the waveform superposition of the wireless multiple-access channel to aggregate model updates in a single use. Existing OTA-FL designs largely enforce zero-bias model updates by either assuming \emph{homogeneous} wireless conditions (equal path loss across devices) or forcing zero-bias updates to guarantee convergence. Under \emph{heterogeneous} wireless scenarios, however, such designs are constrained by the weakest device and inflate the update variance. Moreover, prior analyses of biased OTA-FL largely address convex objectives, while most modern AI models are highly non-convex. Motivated by these gaps, we study OTA-FL with stochastic gradient descent (SGD) for general smooth non-convex objectives under wireless heterogeneity. We develop novel OTA-FL SGD updates that allow a structured, time-invariant model bias while facilitating reduced variance updates. We derive a finite-time stationarity bound (expected time average squared gradient norm) that explicitly reveals a bias-variance trade-off. To optimize this trade-off, we pose a non-convex joint OTA power-control design and develop an efficient successive convex approximation (SCA) algorithm that requires only statistical CSI at the base station. Experiments on a non-convex image classification task validate the approach: the SCA-based design accelerates convergence via an optimized bias and improves generalization over prior OTA-FL baselines.
Time-Optimal Model Predictive Control for Linear Systems with Multiplicative Uncertainties
This paper presents a time-optimal Model Predictive Control (MPC) scheme for linear discrete-time systems subject to multiplicative uncertainties represented by interval matrices. To render the uncertainty propagation computationally tractable, the set-valued error system dynamics are approximated using a matrix-zonotope-based bounding operator. Recursive feasibility and finite-time convergence are ensured through an adaptive terminal constraint mechanism. A key advantage of the proposed approach is that all the necessary bounding sets can be computed offline, substantially reducing the online computational burden. The effectiveness of the method is illustrated via a numerical case study on an orbital rendezvous maneuver between two satellites.
Pareto-Optimal Sampling and Resource Allocation for Timely Communication in Shared-Spectrum Low-Altitude Networks
Guaranteeing stringent data freshness for low-altitude unmanned aerial vehicles (UAVs) in shared spectrum forces a critical trade-off between two operational costs: the UAV's own energy consumption and the occupation of terrestrial channel resources. The core challenge is to satisfy the aerial data freshness while finding a Pareto-optimal balance between these costs. Leveraging predictive channel models and predictive UAV trajectories, we formulate a bi-objective Pareto optimization problem over a long-term planning horizon to jointly optimize the sampling timing for aerial traffic and the power and spectrum allocation for fair coexistence. However, the problem's non-convex, mixed-integer nature renders classical methods incapable of fully characterizing the complete Pareto frontier. Notably, we show monotonicity properties of the frontier, building on which we transform the bi-objective problem into several single-objective problems. We then propose a new graph-based algorithm and prove that it can find the complete set of Pareto optima with low complexity, linear in the horizon and near-quadratic in the resource block (RB) budget. Numerical comparisons show that our approach meets the stringent timeliness requirement and achieves a six-fold reduction in RB utilization or a 6 dB energy saving compared to benchmarks.
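As a point of reference for the bi-objective setting above, the following minimal sketch extracts the Pareto-optimal points from a finite list of candidate (energy, RB usage) pairs, with both costs minimized. It is a generic sort-and-sweep filter, not the graph-based algorithm proposed in the paper, and the candidate values are made up for illustration.

```python
def pareto_front(points):
    """Return the non-dominated (energy, rb_usage) pairs, both minimized.

    After sorting by energy (ties broken by RB usage), a point is kept
    only if it strictly lowers the best RB usage seen so far.
    """
    front = []
    best_rbs = float("inf")
    for energy, rbs in sorted(set(points)):
        if rbs < best_rbs:
            front.append((energy, rbs))
            best_rbs = rbs
    return front


# Hypothetical candidate operating points: (UAV energy, resource blocks used).
candidates = [(5.0, 10), (4.0, 12), (6.0, 8), (5.5, 11), (4.0, 15), (7.0, 8)]
front = pareto_front(candidates)
```

Along the returned front, energy increases while RB usage decreases, which mirrors the monotonicity property the abstract mentions.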
Graph approach for observability analysis in power system dynamic state estimation
The proposed approach yields a numerical method that provably executes in linear time with respect to the number of nodes and edges in a graph. The graph, constructed from the power system model, requires only knowledge of the dependencies between state-to-state and output-to-state variables within a state-space framework. While graph-based observability analysis methods exist for power system static-state estimation, the approach presented here is the first for dynamic-state estimation (DSE). We examine decentralized and centralized DSE scenarios and compare our findings with a well-established, albeit non-scalable, observability analysis method in the literature. When compared to the latter in a centralized DSE setting, our method reduced computation time by 1440x.
Statistically Adaptive Differential Protection for AC Microgrids Based on Kullback-Leibler Divergence
The proliferation of inverter-based resources challenges traditional microgrid protection by introducing variable fault currents and complex transients. This paper presents a statistically adaptive differential protection scheme based on Kullback-Leibler divergence, implemented via a Bartlett-corrected G-statistic computed on logarithm-transformed current magnitudes. The method is a multivariate fault detection engine that employs the Mahalanobis distance to distinguish healthy and faulty states, enabling robust detection even in noisy environments. Detection thresholds are statistically derived from a chi-squared distribution for precise control over the false alarm rate. Upon detection, a lightweight classifier identifies the fault type by assessing per-phase G-statistics against dedicated thresholds, enhanced by a temporal persistence filter for security. Extensive simulations on a modified CIGRE 14-bus microgrid show high efficacy: sub-cycle average detection delays, high detection and classification accuracy across operating modes, resilience to high-impedance faults up to 250 Ohms, tolerance to 10 ms communication delay, and noise levels down to a 20 dB signal-to-noise ratio. These findings demonstrate a reproducible and computationally efficient solution for next-generation AC microgrid protection.
Agentic AI Home Energy Management System: A Large Language Model Framework for Residential Load Scheduling
The electricity sector transition requires substantial increases in residential demand response capacity, yet Home Energy Management Systems (HEMS) adoption remains limited by user interaction barriers requiring translation of everyday preferences into technical parameters. While large language models have been applied to energy systems as code generators and parameter extractors, no existing implementation deploys LLMs as autonomous coordinators managing the complete workflow from natural language input to multi-appliance scheduling. This paper presents an agentic AI HEMS where LLMs autonomously coordinate multi-appliance scheduling from natural language requests to device control, achieving optimal scheduling without example demonstrations. A hierarchical architecture combining one orchestrator with three specialist agents uses the ReAct pattern for iterative reasoning, enabling dynamic coordination without hardcoded workflows while integrating Google Calendar for context-aware deadline extraction. Evaluation across three open-source models using real Austrian day-ahead electricity prices reveals substantial capability differences. Llama-3.3-70B successfully coordinates all appliances across all scenarios to match cost-optimal benchmarks computed via mixed-integer linear programming, while other models achieve perfect single-appliance performance but struggle to coordinate all appliances simultaneously. Progressive prompt engineering experiments demonstrate that analytical query handling without explicit guidance remains unreliable despite models' general reasoning capabilities. We open-source the complete system including orchestration logic, agent prompts, tools, and web interfaces to enable reproducibility, extension, and future research.
comment: 34 pages, 9 figures. Code available at https://github.com/RedaElMakroum/agentic-ai-hems
Optimal Bidding and Coordinated Dispatch of Hybrid Energy Systems in Regulation Markets
The increasing integration of renewable energy sources and distributed energy resources (DER) into modern power systems introduces significant uncertainty, posing challenges for maintaining grid flexibility and reliability. Hybrid energy systems (HES), composed of controllable generators, flexible loads, and battery storage, offer a decentralized solution to enhance flexibility compared to single centralized resources. This paper presents a two-level framework to enable HES participation in frequency regulation markets. The upper level performs a chance-constrained optimization to choose capacity bids based on historical regulation signals. At the lower level, a real-time control strategy disaggregates the regulation power among the constituent resources. This real-time control strategy is then benchmarked against an offline optimal dispatch to evaluate flexibility performance. Additionally, the framework evaluates the profitability of overbidding strategies and identifies thresholds beyond which performance degradation may lead to market penalties or disqualification. The proposed framework also compares the impact of imbalanced power capacities on performance and battery state of charge (SoC) through asymmetric HES configurations.
Two-Timescale Optimization Framework for IAB-Enabled Heterogeneous UAV Networks
In post-disaster scenarios, the rapid deployment of adequate communication infrastructure is essential to support disaster search, rescue, and recovery operations. To achieve this, uncrewed aerial vehicles (UAVs) have emerged as a promising solution for emergency communication due to their low cost and deployment flexibility. However, a conventional untethered UAV (U-UAV) is constrained by size, weight, and power (SWaP) limitations, making it incapable of maintaining the operation of a macro base station. To address this limitation, we propose a heterogeneous UAV-based framework that integrates tethered UAV (T-UAV) and U-UAVs, where U-UAVs are utilized to enhance the throughput of cell-edge ground user equipments (G-UEs) and guarantee seamless connectivity during G-UEs' mobility to safe zones. The integrated access and backhaul (IAB) technique is adopted to support the wireless backhaul of U-UAVs. Accordingly, we formulate a two-timescale joint user scheduling and trajectory control optimization problem, aiming to maximize the downlink throughput under asymmetric traffic demands and G-UEs' mobility. To solve the formulated problem, we propose a two-timescale multi-agent deep deterministic policy gradient (TTS-MADDPG) algorithm based on the centralized training and distributed execution paradigm. Numerical results show that the proposed algorithm outperforms other benchmarks, including the two-timescale multi-agent proximal policy optimization (TTS-MAPPO) algorithm and MADDPG scheduling method, with robust and higher throughput. Specifically, the proposed algorithm obtains up to 12.2\% average throughput gain compared to the MADDPG scheduling method.
Proxemics and Permeability of the Pedestrian Group
People tend to walk in groups, and interactions with those groups have a significant impact on crowd behavior and pedestrian traffic dynamics. Social norms can be seen as unwritten rules regulating people's interactions in social settings. This article studies people's interactions with groups and the emergence of group proxemics. Group zones, zone occupancy counts and people's clearance from the group are studied using naturalistic data. The analysis indicates the potential presence of three different zones in addition to the public zone. People tend to remain in the public zone and only progressively get closer to groups, and those closer approaches happen at a low frequency and for brief periods of time.
Life-cycle Modeling and the Walking Behavior of the Pedestrian-Group as an Emergent Agent: With Empirical Data on the Cohesion of the Group Formation
This article investigates the pedestrian group as an emergent agent. The article explores empirical data to derive emergent agency and formation state spaces and outline recurring patterns of walking behavior. In this analysis, pedestrian trajectories extracted from surveillance videos are used along with manually annotated pedestrian group memberships. We conducted manual expert evaluation of observed groups, produced new manual annotations for relevant events pertaining to group behavior and extracted metrics relevant to group formation. This information along with quantitative analysis was used to model the life-cycle and formation of the group agent. Those models give structure to expectations around walking behavior of groups, from pedestrians walking independently to the emergence of a collective intention where group members tended to maintain bounded distance between each other. Disturbances to this bounded distance often happened in association with changes in either their agency or their formation states. We summarized the patterns of behavior along with the sequences of state transitions into abstract patterns, which can aid in the development of more detailed group agents in simulation and in the design of engineering systems to interact with such groups.
Efficient Collision-Avoidance Constraints for Ellipsoidal Obstacles in Optimal Control: Application to Path-Following MPC and UAVs
This article proposes a modular optimal control framework for local three-dimensional ellipsoidal obstacle avoidance, exemplarily applied to model predictive path-following control. Static as well as moving obstacles are considered. Central to the approach is a computationally efficient and continuously differentiable condition for detecting collisions with ellipsoidal obstacles. A novel two-stage optimization approach mitigates numerical issues arising from the structure of the resulting optimal control problem. The effectiveness of the approach is demonstrated through simulations and real-world experiments with the Crazyflie quadrotor. This represents the first hardware demonstration of an MPC controller of this kind for UAVs in a three-dimensional task.
Safety Margins of Inverse Optimal ISSf Controllers
We investigate the gain margin of a general nonlinear system under an inverse optimal input-to-state safe (ISSf) controller of the form u=u0(x)+u*(x,u0), where u0 is the nominal control and u* is the inverse optimal safety filter that minimally modifies the nominal controller's unsafe actions over the infinite horizon. By first establishing a converse ISSf-BF theorem, we reveal the equivalence among the achievability of ISSf by feedback, the achievability of inverse optimality, and the solvability of a Hamilton-Jacobi-Isaacs equation associated with the inverse optimal ISSf gain assignment. Then we develop a collection of safety margin results on the overall control u=u0+u*. In the absence of disturbances, we find that standard inverse optimal safe controllers have a certain degree of gain margin. Specifically, when f(x) acts safely but u0 acts unsafely, the gain can be decreased by up to half; and when f(x) acts unsafely, we establish that, if u0 acts safely, the gain can be increased arbitrarily, whereas if u0 acts unsafely, the control recovers the full gain margin [1/2,inf). It is shown, however, that under control gain variation, the safe set of these controllers is locally asymptotically stable, which implies that their safety is sensitive to large but bounded disturbances. To make inverse optimal ISSf controllers robust to gain variation, we propose a gain margin improvement approach at the expense of an increased control effort. This improvement allows the inverse optimal safe control to inherit the standard gain margin of [1/2,inf) without requiring prior knowledge of whether f(x) or u0 acts safely on the safety boundary, while simultaneously ensuring global asymptotic stability of the resulting safe set. In the presence of disturbances, this improvement idea renders inverse optimal ISSf controllers robust to gain variations with the same gain margin of [1/2,inf).
XWAVE: A Novel Software-Defined Everything Approach for the Manufacturing Industry
The manufacturing sector is moving from rigid, hardware-dependent systems toward flexible, software-driven environments. This transformation is shaped by the convergence of several Software-Defined technologies: Software-Defined Automation virtualizes industrial control, replacing proprietary PLCs with containerized, programmable solutions that enable scalability and interoperability. Software-Defined Compute and Communications provide a means to distribute intelligence seamlessly across devices, networks, and cloud platforms, reducing latency and enabling dynamic reconfiguration. Software-Defined Manufacturing Systems, usually implemented as Digital Twins, are real-time virtual models of machines and processes, allowing predictive analysis, optimization, and closer integration between human operators and intelligent systems. This work presents XWAVE, a project that unites these three Software-Defined paradigms to present a modular, fully software-defined manufacturing system.
Command-filter-based trajectory-tracking control of quadrotor subject to internal and external disturbances
We propose a command-filter backstepping controller that integrates a disturbance observer and a high-gain observer (HGO) to handle unknown internal and external disturbances acting on a quadrotor. To build the controller, we first define tracking errors between the measured and desired quadrotor outputs, which allow the system to be rewritten in a new set of state variables. Using this transformed model, we apply Lyapunov theory to derive a backstepping control law. To avoid repeated differentiation of states and virtual controls, a first-order command filter is introduced, and a nonlinear disturbance observer is added to provide disturbance estimates. Each state in the controller and observer is replaced with its estimate from the HGO. The resulting control law enables the quadrotor to follow its path despite internal and external disturbances, with each subsystem allowed its own disturbance type for realism. A new state transformation and Lyapunov-based derivation prevent the usual explosion of complexity, while the HGO reconstructs unmeasured states and their rates for output feedback. The nonlinear disturbance observer attenuates constant and nonlinear disturbances as well as band-limited white noise. The method reduces dependence on high-precision sensors and mitigates wind, model error, and rotor noise effects during flight. Unlike previous studies that treat either disturbance rejection or partial sensing, this work combines the command filter, disturbance observer, and HGO to address both challenges simultaneously while avoiding the complexity growth typical of backstepping designs.
Cooperative Task Spaces for Multi-Arm Manipulation Control based on Similarity Transformations
Many tasks in human environments require collaborative behavior between multiple kinematic chains, either to provide additional support for carrying big and bulky objects or to enable the dexterity that is required for in-hand manipulation. Since these complex systems often have a very high number of degrees of freedom, coordinating their movements is notoriously difficult to model. In this article, we present the derivation of the theoretical foundations for cooperative task spaces of multi-arm robotic systems based on geometric primitives defined using conformal geometric algebra. Based on the similarity transformations of these cooperative geometric primitives, we derive an abstraction of complex robotic systems that enables representing these systems in a way that directly corresponds to single-arm systems. By deriving the associated analytic and geometric Jacobian matrices, we then show the straightforward integration of our approach into classical control techniques rooted in operational space control. We demonstrate this using bimanual manipulators, humanoids and multi-fingered hands in optimal control experiments for reaching desired geometric primitives and in teleoperation experiments using differential kinematics control. We then discuss how the geometric primitives naturally embed nullspace structures into the controllers that can be exploited for introducing secondary control objectives. This work represents the theoretical foundations of this cooperative manipulation control framework, and thus the experiments are presented in an abstract way, while giving pointers towards potential future applications.
From Embedding to Control: Representations for Stochastic Multi-Object Systems
This paper studies how to achieve accurate modeling and effective control in stochastic nonlinear dynamics with multiple interacting objects. However, non-uniform interactions and random topologies make this task challenging. We address these challenges by proposing \textit{Graph Controllable Embeddings} (GCE), a general framework to learn stochastic multi-object dynamics for linear control. Specifically, GCE is built on Hilbert space embeddings, allowing direct embedding of probability distributions of controlled stochastic dynamics into a reproducing kernel Hilbert space (RKHS), which enables linear operations in its RKHS while retaining nonlinear expressiveness. We provide theoretical guarantees on the existence, convergence, and applicability of GCE. Notably, a mean field approximation technique is adopted to efficiently capture inter-object dependencies and achieve provably low sample complexity. By integrating graph neural networks, we construct data-dependent kernel features that are capable of adapting to dynamic interaction patterns and generalizing to even unseen topologies with only limited training instances. GCE scales seamlessly to multi-object systems of varying sizes and topologies. Leveraging the linearity of Hilbert spaces, GCE also supports simple yet effective control algorithms for synthesizing optimal sequences. Experiments on physical systems, robotics, and power grids validate GCE and demonstrate consistent performance improvement over various competitive embedding methods in both in-distribution and few-shot tests.
Design of Orthogonal Phase of Arrival Positioning Scheme Based on 5G PRS and Optimization of TOA Performance
This study analyzes the performance of positioning techniques based on configuration changes of 5G New Radio signals. In 5G networks, a terminal position is determined from the Time of Arrival of Positioning Reference Signals transmitted by base stations. We propose an algorithm that improves TOA accuracy under low sampling rate constraints and implement 5G PRS for positioning in a software defined modem. We also examine how flexible time frequency resource allocation of PRS affects TOA estimation accuracy and discuss optimal PRS configurations for a given signal environment.
Confidential FRIT via Homomorphic Encryption
Edge computing alleviates the computation burden of data-driven control in cyber-physical systems (CPSs) by offloading complex processing to edge servers. However, the increasing sophistication of cyberattacks underscores the need for security measures that go beyond conventional IT protections and address the unique vulnerabilities of CPSs. This study proposes a confidential data-driven gain-tuning framework using homomorphic encryption, such as ElGamal and CKKS encryption schemes, to enhance cybersecurity in gain-tuning processes outsourced to external servers. The idea for realizing confidential FRIT is to replace the matrix inversion operation with a vector summation form, allowing homomorphic operations to be applied. Numerical examples under 128-bit security confirm performance comparable to conventional methods while providing guidelines for selecting suitable encryption schemes for secure CPS.
Green Wireless Network Scaling for Joint Deployment: Multi-BSs or Multi-RISs?
The imminent emergence of sixth-generation (6G) networks faces critical challenges from spatially heterogeneous traffic and escalating energy consumption, necessitating sustainable scaling strategies for network infrastructure such as base stations (BSs) and reconfigurable intelligent surfaces (RISs). This paper establishes fundamental scaling laws for the Integrated Relative Energy Efficiency (IREE) metric under joint multi-BS and multi-RIS deployment in traffic-mismatched scenarios. Specifically, we propose an Alternating Directional Dual-Radial Basis Function (ADD-RBF) framework that models the channels of BSs and RISs as two types of spatially decoupled RBF neurons to maximize IREE through alternating optimization, with proven universal approximation capability and convergence guarantees. Theoretical analysis reveals a scaling dichotomy: BS proliferation drives logarithmic capacity growth $\mathcal{O}(\log N^{BS})$ but only polynomial mismatch reduction $\mathcal{O}(1/\sqrt{N^{BS}})$, whereas RIS deployment achieves exponential mismatch mitigation $\mathcal{O}(\delta_{\text{err}}^{-(N^R+1)})$ despite its sub-logarithmic capacity gains. Simulation results validate that RISs excel in capturing spatial traffic correlations and alleviating hotspots, making them particularly effective when mismatch dominates, while BSs are preferable under capacity shortages. These findings offer practical guidelines for green 6G network design.
A Scenario-Based Approach for Stochastic Economic Model Predictive Control with an Expected Shortfall Constraint
This paper presents a novel approach to stochastic economic model predictive control (SEMPC) that minimizes average economic cost while satisfying an empirical expected shortfall (EES) constraint to manage risk. A new scenario-based problem formulation ensuring controlled risk with high confidence while minimizing the average cost is introduced. The probabilistic guarantee depends on the number of support elements over the entire input domain, which is difficult to find for high-dimensional systems. A heuristic algorithm is proposed to find the number of support elements. Finally, an efficient method is presented to reduce the computational complexity of the SEMPC problem with an EES constraint. The approach is validated on a water distribution network, showing its effectiveness in balancing performance and risk.
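To make the risk constraint above concrete, here is a minimal sketch of one common empirical estimator of expected shortfall over sampled scenario costs: the mean of the worst ceil((1 - alpha) * N) samples. The function name, the confidence level, the risk limit, and the scenario costs are illustrative assumptions, and the paper's exact EES definition may differ.

```python
import math


def empirical_expected_shortfall(costs, alpha):
    """Mean of the worst ceil((1 - alpha) * N) scenario costs.

    costs: sampled scenario costs (higher is worse).
    alpha: confidence level in [0, 1); alpha = 0 gives the plain mean.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    # Rounding guards against floating-point noise such as 3.0000000000000004.
    tail_size = max(1, math.ceil(round((1.0 - alpha) * len(costs), 9)))
    worst = sorted(costs, reverse=True)[:tail_size]
    return sum(worst) / tail_size


# Hypothetical scenario costs for one candidate control input.
scenario_costs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ees = empirical_expected_shortfall(scenario_costs, alpha=0.8)  # mean of 10 and 9
risk_limit = 9.6                                               # assumed bound
constraint_satisfied = ees <= risk_limit
```

In a scenario-based controller, a check of this kind would be imposed as a constraint on the sampled costs while the average cost is minimized.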
Competitive Equilibrium for Electricity Markets with Spatially Flexible Load
Electric vehicle charging and geo-distributed datacenters introduce spatially flexible loads (FLs) that couple power, transportation, and datacenter networks. These couplings create a closed-loop feedback between locational marginal prices (LMPs) and decisions of the FL systems, challenging the foundations of conventional competitive equilibrium (CE) in electricity markets. This paper studies a notion of generalized competitive equilibrium (GCE) that aims to capture such price-demand interactions across the interconnected infrastructures. We establish structural conditions under which the GCE preserves key properties of the conventional CE, including existence, uniqueness, and efficiency, without requiring detailed knowledge of decision processes for individual FL systems. The framework generalizes to settings where the grid is coupled with multiple FL systems. Stylized examples and case studies on the New York ISO grid, coupled with the Sioux Falls transportation and distributed datacenter networks, demonstrate the use of our theoretical framework and illustrate the mutual influence among the grid and the studied FL systems.
SUSTAINABLE Platform: Seamless Smart Farming Integration Towards Agronomy Automation SC2
The global agricultural sector is undergoing a transformative shift, driven by increasing food demands, climate variability and the need for sustainable practices. SUSTAINABLE is a smart farming platform designed to integrate IoT, AI, satellite imaging, and role-based task orchestration to enable efficient, traceable, and sustainable agriculture with a pilot use case in viticulture. This paper explores current smart agriculture solutions, presents a comparative evaluation, and introduces SUSTAINABLE's key features, including satellite index integration, real-time environmental data, and role-aware task management tailored to Mediterranean vineyards.
comment: Accepted for presentation at the 11th IEEE International Smart Cities Conference (ISC2 2025)
Dispatchable Current Source Virtual Oscillator Control Achieving Global Stability
This work introduces a novel dispatchable current source virtual oscillator control (dCVOC) scheme for grid-following (GFL) converters, which exhibits duality with dispatchable virtual oscillator control (dVOC) in two ways: a) the current frequency is generated through reactive power control, similar to a PLL; b) the current magnitude reference is generated through active power control. We formally prove that our proposed control always admits a steady-state equilibrium and ensures global stability under reasonable conditions on grid and converter parameters, even when considering LVRT and current saturation constraints. Our approach avoids low-voltage transients and weak grid instability, which is not the case for conventional GFL control. The effectiveness of our proposed control is verified through high-fidelity electromagnetic transient simulations.
Quantitative Parameter Conditions for Stability and Coupling in GFM-GFL Converter Hybrid Systems from a Small-Signal Synchronous Perspective
With the development of renewable energy sources, power systems are gradually evolving into a system comprising both grid-forming (GFM) and grid-following (GFL) converters. However, the dynamic interaction between the two types of converters, especially low-inertia GFM converters and GFL converters, remains unclear due to the substantial differences in their synchronization mechanisms. To address this gap, this paper develops a small-signal synchronous stability model for power systems containing GFM and GFL converters, which considers network line dynamics. Based on subspace perturbation theory, we reveal that GFM and GFL subsystems can be effectively decoupled when GFL converters operate near unity power factor or when GFM converters possess sufficiently large inertia or damping, and provide lower bounds on the control parameters that ensure decoupling. Under the decoupling condition, we propose decentralized and analytical parameter-based stability criteria which have clear physical interpretations: the positive damping of converters compensates for the negative damping of the network. In the case of coupling, we also propose decentralized stability criteria based on the small phase theorem. The effectiveness of the theoretical analysis is validated through simulations in MATLAB/Simulink.
Adaptive Control for a Physics-Informed Model of a Thermal Energy Distribution System: Qualitative Analysis
Integrated energy systems (IES) are complex heterogeneous architectures that typically encompass power sources, hydrogen electrolyzers, energy storage, and heat exchangers. This integration is achieved through operating control strategy optimization. However, the lack of physical understanding as to how these systems evolve over time introduces uncertainties that hinder reliable application thereof. Techniques that can accommodate such uncertainties are fundamental for ensuring proper operation of these systems. Unfortunately, no unifying methodology exists for accommodating uncertainties in this regard. That being said, adaptive control (AC) is a discipline that may allow for accommodating such uncertainties in real time. In the present work, we derive an AC formulation for linear systems in which all states are observable and apply it to the control of a glycol heat exchanger (GHX) in an IES. Based on prior research in which we quantified the uncertainties of the GHX's system dynamics, we introduced an error of 50% on four terms of the nominal model. In the case where a linear quadratic regulator is used as the nominal control for the reference system, we found that employing AC can reduce the mean absolute error and integral time absolute error by 30%-75%. This reduction is achieved with minimal computing overhead and control infrastructure, thus underscoring the strength of AC. However, the control effort induced is significant, therefore warranting further study in order to estimate its impact on a physical system. To address further challenges, including partially observable and non-linear dynamics, enhancements of the linear formulation are currently being developed.
Quantifying Grid-Forming Behavior: Bridging Device-level Dynamics and System-Level Strength
Grid-forming (GFM) technology is widely regarded as a promising solution for future power systems dominated by power electronics. However, a precise method for quantifying GFM converter behavior and a universally accepted GFM definition remain elusive. Moreover, the impact of GFM on system stability is not precisely quantified, creating a significant disconnect between the device and system levels. To address these gaps from a small-signal perspective, at the device level we introduce a novel metric, the Forming Index (FI), which quantifies a converter's response to grid voltage fluctuations. Rather than enumerating various control architectures, the FI measures a converter's GFM ability through its sensitivity to grid variations. At the system level, we propose a new quantitative measure of system strength that captures multi-bus voltage stiffness, that is, the voltage and phase angle responses of multiple buses to current or power disturbances. We further extend this concept to grid strength and bus strength to identify weak areas within the system. Finally, we bridge the device and system levels by formally proving that GFM converters enhance system strength. Our proposed framework provides a unified benchmark for GFM converter design, optimal placement, and system stability assessment.
Ferrohydrodynamic Microfluidics for Bioparticle Separation and Single-Cell Phenotyping: Principles, Applications, and Emerging Directions
Ferrohydrodynamic microfluidics relies on magnetic field gradients to manipulate diamagnetic particles in ferrofluid-filled microenvironments. It has emerged as a promising tool for label-free manipulation of bioparticles, including their separation and phenotyping. This perspective reviews recent progress in the development and applications of ferrofluid-based microfluidic platforms for multiscale bioparticle separation, ranging from micron-scale cells to submicron extracellular vesicles. We highlight the fundamental physical principles for ferrohydrodynamic manipulation, including the dominant magnetic buoyancy force resulting from the interaction of ferrofluids and particles. We then describe how these principles enable high-resolution size-based bioparticle separation, subcellular bioparticle enrichment, and phenotypic screening based on physical traits. We also discuss key challenges in ferrohydrodynamic microfluidics from the aspects of ferrofluid biocompatibility, system throughput, and nanoparticle depletion. Finally, we outline future research directions involving machine learning, 3D printing, and multiplexed detection. These insights chart a path for advancing ferrofluid-based technologies in precision biomedicine, diagnostics, and cellular engineering.
Cooperative Integrated Estimation-Guidance for Simultaneous Interception of Moving Targets
This paper proposes a cooperative integrated estimation-guidance framework for simultaneous interception of a non-maneuvering target using a team of unmanned autonomous vehicles, assuming that only a subset of vehicles is equipped with dedicated sensors to measure the target's states. Unlike earlier approaches that focus solely on either estimation or guidance design, the proposed framework unifies both within a cooperative architecture. To circumvent the limitation posed by heterogeneity in target observability, sensorless vehicles estimate the target's state by leveraging information exchanged with neighboring agents over a directed communication topology through a prescribed-time observer. The proposed approach employs true proportional navigation guidance (TPNG), which uses an exact time-to-go formulation and is applicable across a wide spectrum of target motions. Furthermore, a prescribed-time observer and a prescribed-time controller are employed to achieve convergence to the true target state and consensus in time-to-go, respectively, within predefined times. Simulations demonstrate the effectiveness of the proposed framework under various engagement scenarios.
Finite Sample MIMO System Identification with Multisine Excitation: Nonparametric, Direct, and Two-step Parametric Estimators
Multisine excitations are widely used for identifying multi-input multi-output systems due to their periodicity, data compression properties, and control over the input spectrum. Despite their popularity, the finite-sample statistical properties of frequency-domain estimators under multisine excitation, for both nonparametric and parametric settings, remain insufficiently understood. This paper develops a finite-sample statistical framework for least-squares estimation of the frequency response function (FRF) and its implications for parametric modeling. First, we derive exact distributional and covariance properties of the FRF estimator, explicitly accounting for aliasing effects under slow sampling regimes, and establish conditions for unbiasedness, uncorrelatedness, and consistency across multiple experiments. Second, we show that the FRF estimate is a sufficient statistic for any parametric model under Gaussian noise, leading to an exact equivalence between optimal two-stage frequency-domain methods and time-domain prediction error and maximum likelihood estimation. This equivalence is shown to yield finite-sample concentration bounds for parametric maximum likelihood estimators, enabling rigorous uncertainty quantification, and closed-form prediction error method estimators without iterative optimization. The theoretical results are demonstrated in a representative case study.
comment: 16 pages, 4 figures
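As a rough illustration of nonparametric FRF estimation under multisine excitation, as discussed in the abstract above, the sketch below treats a noiseless single-input single-output case, which is far simpler than the MIMO finite-sample setting of the paper. The FIR system, the period length, the excited harmonics, and the phases are all assumptions chosen for illustration. With one full period of steady-state data, the ratio of output to input DFT coefficients at each excited bin recovers the frequency response exactly.

```python
import cmath
import math

N = 16                               # samples per period (assumed)
EXCITED_BINS = [1, 3, 5]             # hypothetical excited harmonics
PHASES = {1: 0.0, 3: 1.0, 5: 2.0}    # hypothetical multisine phases

# Multisine input: a sum of cosines at the excited harmonics.
u = [
    sum(math.cos(2 * math.pi * k * n / N + PHASES[k]) for k in EXCITED_BINS)
    for n in range(N)
]

# Assumed FIR system y[n] = u[n] + 0.5 * u[n-1]. In periodic steady state,
# the previous sample of a period wraps around to the end of that period.
y = [u[n] + 0.5 * u[(n - 1) % N] for n in range(N)]


def dft_bin(signal, k):
    """Single DFT coefficient of one period of a signal."""
    length = len(signal)
    return sum(
        signal[n] * cmath.exp(-2j * math.pi * k * n / length)
        for n in range(length)
    )


# Nonparametric estimate: output spectrum over input spectrum per excited bin.
frf_estimate = {k: dft_bin(y, k) / dft_bin(u, k) for k in EXCITED_BINS}

# Frequency response of the assumed FIR system, for comparison.
frf_true = {k: 1 + 0.5 * cmath.exp(-2j * math.pi * k / N) for k in EXCITED_BINS}
```

With measurement noise, the same ratio becomes a random quantity at each bin, which is the regime where finite-sample distributional results of the kind described in the abstract matter.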
Data-Driven Stabilization Using Prior Knowledge on Stabilizability and Controllability
In this work, we study data-driven stabilization of linear time-invariant systems using prior knowledge of system-theoretic properties, specifically stabilizability and controllability. To formalize this, we extend the concept of data informativity by requiring the existence of a controller that stabilizes all systems consistent with the data and the prior knowledge. We show that if the system is controllable, then incorporating this as prior knowledge does not relax the conditions required for data-driven stabilization. Remarkably, however, we show that if the system is stabilizable, then using this as prior knowledge leads to necessary and sufficient conditions that are weaker than those for data-driven stabilization without prior knowledge. In other words, data-driven stabilization is easier if one knows that the underlying system is stabilizable. We also provide new data-driven control design methods in terms of linear matrix inequalities that complement the conditions for informativity.
comment: 6 pages
Decentralized Merging Control of Connected and Automated Vehicles to Enhance Safety and Energy Efficiency using Control Barrier Functions
This paper presents a decentralized Control Barrier Function (CBF) based approach for highway merging of Connected and Automated Vehicles (CAVs). In this control algorithm, each "host" vehicle negotiates with other agents in a control zone of the highway network and enacts its own action to perform safe and energy-efficient merge maneuvers. It uses predictor-corrector loops within the robust CBF setting for negotiation and to reconcile disagreements that may arise. There is no explicit order of vehicles and no priority. A notable feature is the absence of gridlocks due to instability of the inter-agent system. Results from Monte Carlo simulations show significant improvement in system-wide energy efficiency and traffic flow compared to a first-in-first-out approach, as well as enhanced robustness of the proposed decentralized controller compared to its centralized counterpart.
comment: This work has been submitted to a conference for possible publication and is under review. Paper summary: 8 pages, 5 figures, 2 tables
Recursive Experiment Design for Closed-Loop Identification with Output Perturbation Limits
In many applications, system identification experiments must be performed under output feedback to ensure safety or to maintain system operation. In this paper, we consider the online design of informative experiments for ARMAX models by applying a bounded perturbation to the input signal generated by a fixed output feedback controller. Specifically, the design constrains the resulting output perturbation within user-specified limits and can be efficiently computed in closed form. We demonstrate the effectiveness of the method in a numerical experiment.
Optimal and Heuristic Approaches for Platooning Systems with Deadlines
Efficient truck platooning is a key strategy for reducing freight costs, lowering fuel consumption, and mitigating emissions. Deadlines are critical in this context, as trucks must depart within specific time windows to meet delivery requirements and avoid penalties. In this paper, we investigate the optimal formation and dispatch of truck platoons at a highway station with finite capacity $L$ and deadline constraints $T$. The system operates in discrete time, with each arriving truck assigned a deadline of $T$ slot units. The objective is to leverage the efficiency gains from forming large platoons while accounting for waiting costs and deadline violations. We formulate the problem as a Markov decision process and analyze the structure of the optimal policy $\pi^\star$ for $L = 3$, extending insights to arbitrary $L$. We prove certain monotonicity properties of the optimal policy in the state space $\mathcal{S}$ and identify classes of unreachable states. Moreover, since the size of $\mathcal{S}$ grows exponentially with $L$ and $T$, we propose heuristics -- including conditional and deep-learning based approaches -- that exploit these structural insights while maintaining low computational complexity.
Convex computation of regions of attraction from data using Sums-of-Squares programming
The paper concentrates on the analysis of the Region of Attraction (RoA) for unknown autonomous dynamical systems. The aim is to explore a data-driven approach based on moment Sum-of-Squares (SoS) hierarchy, which enables novel RoA outer approximations despite the reduced information on the structure of the dynamics. The main contribution of this work is bypassing the system model and, consequently, the recurring constraint on its polynomial structure. Numerical experimentation showcases the influence of data on learned approximating sets, offering a promising outlook on the potential of this method.
Smart Exploration in Reinforcement Learning using Bounded Uncertainty Models
Reinforcement learning (RL) is a powerful framework for decision-making in uncertain environments, but it often requires large amounts of data to learn an optimal policy. We address this challenge by incorporating prior model knowledge to guide exploration and accelerate the learning process. Specifically, we assume access to a model set that contains the true transition kernel and reward function. We optimize over this model set to obtain upper and lower bounds on the Q-function, which are then used to guide the exploration of the agent. We provide theoretical guarantees on the convergence of the Q-function to the optimal Q-function under the proposed class of exploring policies. Furthermore, we also introduce a data-driven regularized version of the model set optimization problem that ensures the convergence of the class of exploring policies to the optimal policy. Lastly, we show that when the model set has a specific structure, namely the bounded-parameter MDP (BMDP) framework, the regularized model set optimization problem becomes convex and simple to implement. In this setting, we also prove finite-time convergence to the optimal policy under mild assumptions. We demonstrate the effectiveness of the proposed exploration strategy, which we call BUMEX (Bounded Uncertainty Model-based Exploration), in a simulation study. The results indicate that the proposed method can significantly accelerate learning in benchmark examples. A toolbox is available at https://github.com/JvHulst/BUMEX.
comment: Accepted for Presentation at 64th IEEE Conference on Decision and Control, CDC 2025, Rio de Janeiro, Brazil, 2025
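The idea of bounding the Q-function over a model set, described in the abstract above, can be sketched as follows. This is not the BUMEX implementation: it assumes a finite set of two hypothetical tabular MDPs (one of which plays the role of the true model), picks the best or worst model independently for each state-action pair, and uses the resulting bounds only to discard actions that cannot be optimal. All numbers are illustrative.

```python
GAMMA = 0.9

# Hypothetical model set. P[s][a] lists next-state probabilities and
# R[s][a] is the reward. MODEL_A plays the role of the true model.
MODEL_A = {
    "P": [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]],
    "R": [[0.0, 0.0], [1.0, 0.5]],
}
MODEL_B = {
    "P": [[[0.7, 0.3], [0.4, 0.6]], [[0.6, 0.4], [0.3, 0.7]]],
    "R": [[0.0, 0.1], [1.0, 0.4]],
}
MODEL_SET = [MODEL_A, MODEL_B]


def backup(model, s, a, values):
    """One-step Bellman backup of (s, a) under a single model."""
    expected = sum(p * v for p, v in zip(model["P"][s][a], values))
    return model["R"][s][a] + GAMMA * expected


def q_iteration(models, pick, sweeps=500):
    """Q-iteration where `pick` (max or min) selects among models per (s, a)."""
    n_states = len(models[0]["R"])
    n_actions = len(models[0]["R"][0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        values = [max(row) for row in q]
        q = [
            [pick(backup(m, s, a, values) for m in models) for a in range(n_actions)]
            for s in range(n_states)
        ]
    return q


q_upper = q_iteration(MODEL_SET, max)   # optimistic bound
q_lower = q_iteration(MODEL_SET, min)   # pessimistic bound
q_true = q_iteration([MODEL_A], max)    # optimal Q of the assumed true model


def candidate_actions(s):
    """Actions whose upper bound reaches the best lower bound in state s."""
    best_lower = max(q_lower[s])
    return [a for a in range(len(q_upper[s])) if q_upper[s][a] >= best_lower - 1e-9]
```

An action outside `candidate_actions(s)` has an upper bound below the guaranteed value of some other action, so an exploring agent could skip it without risking the loss of the optimal action.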
Agile and Cooperative Aerial Manipulation of a Cable-Suspended Load
Quadrotors can carry slung loads to hard-to-reach locations at high speed. Since a single quadrotor has limited payload capacities, using a team of quadrotors to collaboratively manipulate a heavy object is a scalable and promising solution. However, existing control algorithms for multi-lifting systems only enable low-speed and low-acceleration operations due to the complex dynamic coupling between quadrotors and the load, limiting their use in time-critical missions such as search and rescue. In this work, we present a solution to significantly enhance the agility of cable-suspended multi-lifting systems. Unlike traditional cascaded solutions, we introduce a trajectory-based framework that solves the whole-body kinodynamic motion planning problem online, accounting for the dynamic coupling effects and constraints between the quadrotors and the load. The planned trajectory is provided to the quadrotors as a reference in a receding-horizon fashion and is tracked by an onboard controller that observes and compensates for the cable tension. Real-world experiments demonstrate that our framework can achieve at least eight times greater acceleration than state-of-the-art methods to follow agile trajectories. Our method can even perform complex maneuvers such as flying through narrow passages at high speed. Additionally, it exhibits high robustness against load uncertainties and does not require adding any sensors to the load, demonstrating strong practicality.
comment: 38 pages, 11 figures
SafEDMD: A Koopman-based data-driven controller design framework for nonlinear dynamical systems
The Koopman operator serves as the theoretical backbone for machine learning of dynamical control systems, where the operator is heuristically approximated by extended dynamic mode decomposition (EDMD). In this paper, we propose SafEDMD, a novel stability- and feedback-oriented EDMD-based controller design framework. Our approach leverages a reliable surrogate model generated in a data-driven fashion in order to provide closed-loop guarantees. In particular, we establish a controller design based on semi-definite programming with guaranteed stabilization of the underlying nonlinear system. As central ingredient, we derive proportional error bounds that vanish at the origin and are tailored to control tasks. We illustrate the developed method by means of several benchmark examples and highlight the advantages over state-of-the-art methods.
comment: Accepted for publication in Automatica
High Performance Distributed Control for Large-Scale Linear Systems: A Cover-Based Distributed Observer Approach
In recent years, the distributed-observer-based distributed control law has shown a powerful ability to approximate centralized control performance arbitrarily closely. However, the traditional distributed observer requires each local observer to reconstruct the state information of the whole system, which is unrealistic for large-scale scenarios. To fill this gap, this paper presents a coverage solution algorithm for large-scale systems that accounts for both physical and communication network characteristics and can significantly reduce the dimension of local observers. Then, a cover-based distributed observer for large-scale systems is proposed to overcome the difficulty of estimating the system dynamics caused by the coupling between cover sets. Furthermore, a two-layer Lyapunov analysis method is adopted and a dynamic transformation lemma for compact errors is proved, which resolves the stability analysis of the error dynamics of the cover-based distributed observer. Finally, it is proved that the distributed control law based on the cover-based distributed observer can also approximate the control performance of the centralized control law arbitrarily closely, while the dimension of the local observer is greatly reduced compared with the traditional method. The simulation results show the validity of the developed theories.
End-to-end guarantees for indirect data-driven control of bilinear systems with finite stochastic data
In this paper we propose an end-to-end algorithm for indirect data-driven control for bilinear systems with stability guarantees. We consider the case where the collected i.i.d. data is affected by probabilistic noise with possibly unbounded support and leverage tools from statistical learning theory to derive finite sample identification error bounds. To this end, we solve the bilinear identification problem by solving a set of linear and affine identification problems, by a particular choice of a control input during the data collection phase. We provide a priori as well as data-dependent finite sample identification error bounds on the individual matrices as well as ellipsoidal bounds, both of which are structurally suitable for control. Further, we integrate the structure of the derived identification error bounds in a robust controller design to obtain an exponentially stable closed-loop. By means of an extensive numerical study we showcase the interplay between the controller design and the derived identification error bounds. Moreover, we note appealing connections of our results to indirect data-driven control of general nonlinear systems through Koopman operator theory and discuss how our results may be applied in this setup.
Game Theoretic Resilience Recommendation Framework for CyberPhysical Microgrids Using Hypergraph MetaLearning
This paper presents a physics-aware cyberphysical resilience framework for radial microgrids under coordinated cyberattacks. The proposed approach models the attacker through a hypergraph neural network (HGNN) enhanced with model agnostic metalearning (MAML) to rapidly adapt to evolving defense strategies and predict high-impact contingencies. The defender is modeled via a bi-level Stackelberg game, where the upper level selects optimal tie-line switching and distributed energy resource (DER) dispatch using an Alternating Direction Method of Multipliers (ADMM) coordinator embedded within the Non-dominated Sorting Genetic Algorithm II (NSGA-II). The framework simultaneously optimizes load served, operational cost, and voltage stability, ensuring all post-defense states satisfy network physics constraints. The methodology is first validated on the IEEE 69-bus distribution test system with 12 DERs, 8 critical loads, and 5 tie-lines, and then extended to higher bus systems including the IEEE 123-bus feeder and a synthetic 300-bus distribution system. Results show that the proposed defense strategy restores nearly full service for 90% of top-ranked attacks, mitigates voltage violations, and identifies Feeder 2 as the principal vulnerability corridor. Actionable operating rules are derived, recommending pre-arming of specific tie-lines to enhance resilience, while higher bus system studies confirm scalability of the framework on the IEEE 123-bus and 300-bus systems.
Climate Science and Control Engineering: Insights, Parallels, and Connections
Climate science is the multidisciplinary field that studies the Earth's climate and its evolution. At the very core of climate science are indispensable climate models that predict future climate scenarios, inform policy decisions, and dictate how a country's economy should change in light of the changing climate. Climate models capture a wide range of interacting dynamic processes via extremely complex ordinary and partial differential equations. To model these large-scale complex processes, climate science leverages supercomputers, advanced simulations, and statistical methods to predict future climate. An area of engineering that is rarely studied in climate science is control engineering. Given that climate systems are inherently dynamic, it is intuitive to analyze them within the framework of dynamic system science. This perspective has been underexplored in the literature. In this manuscript, we provide a tutorial that: (i) introduces the control engineering community to climate dynamics and modeling, including spatiotemporal scales and challenges in climate modeling; (ii) offers a fresh perspective on climate models from a control systems viewpoint; and (iii) explores the relevance and applicability of various advanced graph and network control-based approaches in building a physics-informed framework for learning, control and estimation in climate systems. We also present simple and then more complex climate models, depicting fundamental ideas and processes that are instrumental in building climate change projections. This tutorial also builds parallels and observes connections between various contemporary problems at the forefront of climate science and their control theoretic counterparts. We specifically observe that an abundance of climate science problems can be linguistically reworded and mathematically framed as control theoretic ones.
On the Detection of Shared Data Manipulation in Distributed Optimization
This paper investigates the vulnerability of the Alternating Direction Method of Multipliers (ADMM) algorithm to shared data manipulation, with a focus on solving optimal power flow (OPF) problems. Deliberate data manipulation may cause the ADMM algorithm to converge to suboptimal solutions. We derive a sufficient condition for detecting data manipulation based on the theoretical convergence trajectory of the ADMM algorithm. We evaluate the performance of the detection condition on three data manipulation strategies with various levels of complexity and stealth. The simplest attack sends the target values at each iteration, the second attack uses a feedback loop to find the next target values, and the last attack uses a bilevel optimization to find the target values. We then extend the three data manipulation strategies to avoid detection by the detection condition and by a neural network (NN) detection model. We also propose an adversarial NN training framework to detect shared data manipulation. We illustrate the performance of our data manipulation strategies and detection framework on OPF problems. The results show that the proposed detection condition successfully detects most of the data manipulation attacks. However, the bilevel optimization attack strategy that incorporates the detection methods may avoid being detected. To counter this, our proposed adversarial training framework detects all instances of the bilevel optimization attack.
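To make concrete how manipulated shared data can steer ADMM to a suboptimal point, the sketch below runs scaled-form consensus ADMM on a toy two-agent problem, minimizing (x - a1)^2 + (x - a2)^2. It is not an OPF problem and does not implement the paper's detection condition. The attack is an assumed variant of the simplest strategy in the abstract, in which agent 2 reports a fixed target value at every iteration instead of its true local quantity. The penalty parameter, the data, and the target are hypothetical.

```python
RHO = 1.0  # ADMM penalty parameter (assumed)


def local_update(a, z, u):
    """Closed-form argmin over x of (x - a)^2 + (RHO / 2) * (x - z + u)^2."""
    return (2.0 * a + RHO * (z - u)) / (2.0 + RHO)


def run_consensus_admm(a1, a2, fake_report=None, iterations=200):
    """Return the consensus variable z after a fixed number of iterations.

    If `fake_report` is given, agent 2 shares that value instead of x2 + u2.
    """
    z = u1 = u2 = 0.0
    for _ in range(iterations):
        x1 = local_update(a1, z, u1)
        x2 = local_update(a2, z, u2)
        shared1 = x1 + u1
        shared2 = x2 + u2 if fake_report is None else fake_report
        z = 0.5 * (shared1 + shared2)   # aggregation of the shared data
        u1 += x1 - z                    # scaled dual updates
        u2 += x2 - z
    return z


def total_cost(x, a1, a2):
    """Objective of the toy problem evaluated at a common value x."""
    return (x - a1) ** 2 + (x - a2) ** 2


A1, A2 = 1.0, 3.0
z_honest = run_consensus_admm(A1, A2)
z_attacked = run_consensus_admm(A1, A2, fake_report=5.0)
```

In this toy setup the honest run converges to the minimizer, the average of `A1` and `A2`, while the attacked run settles at a different value with a strictly higher total cost. The manipulated iterates still converge, which is why a detection test has to look at more than convergence alone.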
Online Adaptation for Flying Quadrotors in Tight Formations
The task of flying in tight formations is challenging for teams of quadrotors because the complex aerodynamic wake interactions can destabilize individual team members as well as the team. Furthermore, these aerodynamic effects are highly nonlinear and fast-paced, making them difficult to model and predict. To overcome these challenges, we present L1 KNODE-DW MPC, an adaptive, mixed expert learning based control framework that allows individual quadrotors to accurately track trajectories while adapting to time-varying aerodynamic interactions during formation flights. We evaluate L1 KNODE-DW MPC in two different three-quadrotor formations and show that it outperforms several MPC baselines. Our results show that the proposed framework is capable of enabling the three-quadrotor team to remain vertically aligned in close proximity throughout the flight. These findings show that the L1 adaptive module compensates for unmodeled disturbances most effectively when paired with an accurate dynamics model. A video showcasing our framework and the physical experiments is available here: https://youtu.be/9QX1Q5Ut9Rs
comment: 10 pages, 4 figures
Understanding the Application of Utility Theory in Robotics and Artificial Intelligence: A Survey
Utility is a unifying concept in economics, game theory, and operations research, and it is also used in robotics and AI to evaluate the level of individual needs, preferences, and interests. Especially for decision-making and learning in multi-agent/robot systems (MAS/MRS), a suitable utility model can guide agents in choosing reasonable strategies to meet their current needs and in learning to cooperate and organize their behaviors, thereby optimizing the system's utility, building stable and reliable relationships, and guaranteeing each group member's sustainable development, similar to human society. Although the complex, large-scale, and long-term behaviors of these systems are strongly determined by the fundamental characteristics of the underlying relationships, there has been little discussion of the theoretical aspects of the mechanisms and of the fields of application in robotics and AI. This paper introduces a utility-oriented needs paradigm to describe and evaluate inner and outer relationships among agents' interactions. We then survey existing literature in relevant fields to support it and propose several promising research directions, along with some open problems that merit further investigation.
comment: I am not sure whether withdrawing this paper is suitable. However, right now this paper has significant changes in its topic and author. So, I do not want to lead to any confusion about this paper. In the future, it will have a new version. I hope people will not have issues and confusion about the older one
Integrated Learning and Optimization to Control Load Demand and Wind Generation for Minimizing Ramping Cost in Real-Time Electricity Market
We developed a new integrated learning and optimization (ILO) methodology to predict context-aware unknown parameters in economic dispatch (ED), a crucial problem in power systems that is solved to generate optimal power dispatching decisions to serve consumer load. The ED formulation in the current study has load and renewable generation as unknown parameters in its constraints, which are predicted using contextual information (e.g., prior load, temperature). The ILO framework trains a neural network (NN) to estimate ED parameters by minimizing an application-specific regret function, defined as the difference between ground-truth and NN-driven decisions, thereby favouring better ED decisions. We thoroughly analyze the feasible region of the ED formulation to understand the impact of learning load and renewable generation together on the ED decisions. Building on this analysis, we developed a new regret function to capture real-time electricity market operations, where differences between predicted and true loads are corrected by ramping generators in real time but at a higher cost than the market price. When minimized using the ILO framework, the proposed regret function trains the NN to guide the load and renewable predictions toward ED decisions that favour minimum generator ramping costs. This is unlike the conventional sequential learning and optimization (SLO) framework, which trains the NN to accurately estimate load and renewable generation rather than to produce better ED decisions. The combined training of load and renewable generation using ILO is a new concept and leads to significantly improved ramping costs when compared with SLO-based training of load and renewable generation, and with SLO-trained load combined with 100% accurate renewable generation, demonstrating its decision-focused capability.
comment: The preprint was submitted to disseminate the idea as soon as possible and was submitted without asking one of the authors listed in the manuscript, who was the supervisor. Moreover, the submitted preprint mentions being submitted to a journal, while it has not yet been submitted to a journal. The institute thus asked to withdraw the preprint
Robotics
Running VLAs at Real-time Speed
In this paper, we show how to run pi0-level multi-view VLA at 30Hz frame rate and at most 480Hz trajectory frequency using a single consumer GPU. This enables dynamic and real-time tasks that were previously believed to be unattainable by large VLA models. To achieve it, we introduce a bag of strategies to eliminate the overheads in model inference. The real-world experiment shows that the pi0 policy with our strategy achieves a 100% success rate in grasping a falling pen task. Based on the results, we further propose a full streaming inference framework for real-time robot control of VLA. Code is available at https://github.com/Dexmal/realtime-vla.
comment: Code is available at https://github.com/Dexmal/realtime-vla
Hybrid Consistency Policy: Decoupling Multi-Modal Diversity and Real-Time Efficiency in Robotic Manipulation
In visuomotor policy learning, diffusion-based imitation learning has become widely adopted for its ability to capture diverse behaviors. However, approaches built on ordinary and stochastic denoising processes struggle to jointly achieve fast sampling and strong multi-modality. To address these challenges, we propose the Hybrid Consistency Policy (HCP). HCP runs a short stochastic prefix up to an adaptive switch time, and then applies a one-step consistency jump to produce the final action. To align this one-jump generation, HCP performs time-varying consistency distillation that combines a trajectory-consistency objective to keep neighboring predictions coherent and a denoising-matching objective to improve local fidelity. In both simulation and on a real robot, HCP with 25 SDE steps plus one jump approaches the 80-step DDPM teacher in accuracy and mode coverage while significantly reducing latency. These results show that multi-modality does not require slow inference and that a switch time decouples mode retention from speed. HCP thus yields a practical accuracy-efficiency trade-off for robot policies.
Heuristic Adaptation of Potentially Misspecified Domain Support for Likelihood-Free Inference in Stochastic Dynamical Systems
In robotics, likelihood-free inference (LFI) can provide the domain distribution that adapts a learnt agent in a parametric set of deployment conditions. LFI assumes an arbitrary support for sampling, which remains constant as the initial generic prior is iteratively refined to more descriptive posteriors. However, a potentially misspecified support can lead to suboptimal, yet falsely certain, posteriors. To address this issue, we propose three heuristic LFI variants: EDGE, MODE, and CENTRE. Each interprets the posterior mode shift over inference steps in its own way and, when integrated into an LFI step, adapts the support alongside posterior inference. We first expose the support misspecification issue and evaluate our heuristics using stochastic dynamical benchmarks. We then evaluate the impact of heuristic support adaptation on parameter inference and policy learning for a dynamic deformable linear object (DLO) manipulation task. Inference results in a finer length and stiffness classification for a parametric set of DLOs. When the resulting posteriors are used as domain distributions for sim-based policy learning, they lead to more robust object-centric agent performance.
Hybrid DQN-TD3 Reinforcement Learning for Autonomous Navigation in Dynamic Environments
This paper presents a hierarchical path-planning and control framework that combines a high-level Deep Q-Network (DQN) for discrete sub-goal selection with a low-level Twin Delayed Deep Deterministic Policy Gradient (TD3) controller for continuous actuation. The high-level module selects behaviors and sub-goals; the low-level module executes smooth velocity commands. We design a practical reward shaping scheme (direction, distance, obstacle avoidance, action smoothness, collision penalty, time penalty, and progress), together with a LiDAR-based safety gate that prevents unsafe motions. The system is implemented in ROS + Gazebo (TurtleBot3) and evaluated with PathBench metrics, including success rate, collision rate, path efficiency, and re-planning efficiency, in dynamic and partially observable environments. Experiments show improved success rate and sample efficiency over single-algorithm baselines (DQN or TD3 alone) and rule-based planners, with better generalization to unseen obstacle configurations and reduced abrupt control changes. Code and evaluation scripts are available at the project repository.
comment: 6 pages, 5 figures; ROS+Gazebo (TurtleBot3) implementation; evaluation with PathBench metrics; code (primary): https://github.com/MayaCHEN-github/HierarchicalRL-robot-navigation; mirror (for reproducibility): https://github.com/ShowyHe/DRL-robot-navigation
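The LiDAR-based safety gate mentioned in the abstract above could look roughly like the following. This is a minimal sketch, not the authors' implementation: the function name `safety_gate`, the thresholds `stop_dist` and `slow_dist`, and the linear slow-down rule are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence, Tuple


def safety_gate(
    ranges: Sequence[float],
    v: float,
    w: float,
    stop_dist: float = 0.25,  # hypothetical: block forward motion inside this range (m)
    slow_dist: float = 0.60,  # hypothetical: start scaling speed down inside this range (m)
) -> Tuple[float, float]:
    """Filter a (linear, angular) velocity command using the nearest LiDAR return."""
    # Discard invalid returns (inf, NaN, non-positive) before taking the minimum.
    valid = [r for r in ranges if math.isfinite(r) and r > 0.0]
    # Assumption: with no valid returns, or when not driving forward, pass the command through.
    if not valid or v <= 0.0:
        return v, w
    nearest = min(valid)
    if nearest <= stop_dist:
        return 0.0, w  # stop forward motion, still allow turning away
    if nearest < slow_dist:
        # Linear ramp from 0 at stop_dist to full speed at slow_dist.
        scale = (nearest - stop_dist) / (slow_dist - stop_dist)
        return v * scale, w
    return v, w
```

In a hierarchical setup like the one described, such a gate would sit between the low-level controller's output and the velocity command actually sent to the robot.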
REALMS2 -- Resilient Exploration And Lunar Mapping System 2 -- A Comprehensive Approach IROS 2025
The European Space Agency (ESA) and the European Space Resources Innovation Centre (ESRIC) created the Space Resources Challenge to invite researchers and companies to propose innovative solutions for Multi-Robot Systems (MRS) space prospection. This paper proposes the Resilient Exploration And Lunar Mapping System 2 (REALMS2), an MRS framework for planetary prospection and mapping. Based on Robot Operating System version 2 (ROS 2) and enhanced with Visual Simultaneous Localisation And Mapping (vSLAM) for map generation, REALMS2 uses a mesh network for a robust ad hoc network. A single graphical user interface (GUI) controls all the rovers, providing a simple overview of the robotic mission. This system is designed for heterogeneous multi-robot exploratory missions, tackling the challenges presented by extraterrestrial environments. REALMS2 was used during the second field test of the ESA-ESRIC Challenge and enabled mapping of around 60% of the area, using three homogeneous rovers while handling communication delays and blackouts.
comment: 8 Pages, 8 Figures, Submitted and Accepted to IROS 2025
A Sliding-Window Filter for Online Continuous-Time Continuum Robot State Estimation
Stochastic state estimation methods for continuum robots (CRs) often struggle to balance accuracy and computational efficiency. While several recent works have explored sliding-window formulations for CRs, these methods are limited to simplified, discrete-time approximations and do not provide stochastic representations. In contrast, current stochastic filter methods must run at the speed of measurements, limiting their full potential. Recent works in continuous-time estimation techniques for CRs show a principled approach to addressing this runtime constraint, but are currently restricted to offline operation. In this work, we present a sliding-window filter (SWF) for continuous-time state estimation of CRs that improves upon the accuracy of a filter approach while enabling continuous-time methods to operate online, all while running at faster-than-real-time speeds. This represents the first stochastic SWF specifically designed for CRs, providing a promising direction for future research in this area.
comment: 8 pages, 6 figures. Submitted to IEEE-RAS International Conference on Soft Robotics 2026
Spiking Patches: Asynchronous, Sparse, and Efficient Tokens for Event Cameras
We propose tokenization of events and present a tokenizer, Spiking Patches, specifically designed for event cameras. Given a stream of asynchronous and spatially sparse events, our goal is to discover an event representation that preserves these properties. Prior works have represented events as frames or as voxels. However, while these representations yield high accuracy, both frames and voxels are synchronous and decrease the spatial sparsity. Spiking Patches provides the means to preserve the unique properties of event cameras, and we show in our experiments that this comes without sacrificing accuracy. We evaluate our tokenizer using a GNN, PCN, and a Transformer on gesture recognition and object detection. Tokens from Spiking Patches yield inference times that are up to 3.4x faster than voxel-based tokens and up to 10.4x faster than frames. We achieve this while matching their accuracy and even surpassing it in some cases, with absolute improvements up to 3.8 for gesture recognition and up to 1.4 for object detection. Thus, tokenization constitutes a novel direction in event-based vision and marks a step towards methods that preserve the properties of event cameras.
FLYINGTRUST: A Benchmark for Quadrotor Navigation Across Scenarios and Vehicles
Visual navigation algorithms for quadrotors often exhibit a large variation in performance when transferred across different vehicle platforms and scene geometries, which increases the cost and risk of field deployment. To support systematic early-stage evaluation, we introduce FLYINGTRUST, a high-fidelity, configurable benchmarking framework that measures how platform kinodynamics and scenario structure jointly affect navigation robustness. FLYINGTRUST models vehicle capability with two compact, physically interpretable indicators: maximum thrust-to-weight ratio and axis-wise maximum angular acceleration. The benchmark pairs a diverse scenario library with a heterogeneous set of real and virtual platforms and prescribes a standardized evaluation protocol together with a composite scoring method that balances scenario importance, platform importance and performance stability. We use FLYINGTRUST to compare representative optimization-based and learning-based navigation approaches under identical conditions, performing repeated trials per platform-scenario combination and reporting uncertainty-aware metrics. The results reveal systematic patterns: navigation success depends predictably on platform capability and scene geometry, and different algorithms exhibit distinct preferences and failure modes across the evaluated conditions. These observations highlight the practical necessity of incorporating both platform capability and scenario structure into algorithm design, evaluation, and selection, and they motivate future work on methods that remain robust across diverse platforms and scenarios.
Proxemics and Permeability of the Pedestrian Group
People tend to walk in groups, and interactions with those groups have a significant impact on crowd behavior and pedestrian traffic dynamics. Social norms can be seen as unwritten rules regulating people's interactions in social settings. This article studies people's interactions with groups and the emergence of group proxemics. Group zones, zone occupancy counts and people's clearance from the group are studied using naturalistic data. The analysis indicates the potential presence of three different zones in addition to the public zone. People tend to remain in the public zone and only progressively get closer to groups, and those closer approaches happen at a low frequency and for brief periods of time.
Adaptive Inverse Kinematics Framework for Learning Variable-Length Tool Manipulation in Robotics
Conventional robots possess a limited understanding of their kinematics and are confined to preprogrammed tasks, hindering their ability to leverage tools efficiently. Driven by the essential components of tool usage - grasping the desired outcome, selecting the most suitable tool, determining optimal tool orientation, and executing precise manipulations - we introduce a pioneering framework. Our novel approach expands the capabilities of the robot's inverse kinematics solver, empowering it to acquire a sequential repertoire of actions using tools of varying lengths. By integrating a simulation-learned action trajectory with the tool, we showcase the practicality of transferring acquired skills from simulation to real-world scenarios through comprehensive experimentation. Remarkably, our extended inverse kinematics solver demonstrates an impressive error rate of less than 1 cm. Furthermore, our trained policy achieves a mean error of 8 cm in simulation. Notably, our model achieves virtually indistinguishable performance when employing two distinct tools of different lengths. This research provides an indication of potential advances in the exploration of all four fundamental aspects of tool usage, enabling robots to master the intricate art of tool manipulation across diverse tasks.
comment: 10 pages, 5 figures. Demonstrates a reinforcement learning framework for adaptive tool manipulation with variable-length extensions
RoboOS-NeXT: A Unified Memory-based Framework for Lifelong, Scalable, and Robust Multi-Robot Collaboration
The proliferation of collaborative robots across diverse tasks and embodiments presents a central challenge: achieving lifelong adaptability, scalable coordination, and robust scheduling in multi-agent systems. Existing approaches, from vision-language-action (VLA) models to hierarchical frameworks, fall short due to their reliance on limited or individual-agent memory. This fundamentally constrains their ability to learn over long horizons, scale to heterogeneous teams, or recover from failures, highlighting the need for a unified memory representation. To address these limitations, we introduce RoboOS-NeXT, a unified memory-based framework for lifelong, scalable, and robust multi-robot collaboration. At the core of RoboOS-NeXT is the novel Spatio-Temporal-Embodiment Memory (STEM), which integrates spatial scene geometry, temporal event history, and embodiment profiles into a shared representation. This memory-centric design is integrated into a brain-cerebellum framework, where a high-level brain model performs global planning by retrieving and updating STEM, while low-level controllers execute actions locally. This closed loop between cognition, memory, and execution enables dynamic task allocation, fault-tolerant collaboration, and consistent state synchronization. We conduct extensive experiments spanning complex coordination tasks in restaurants, supermarkets, and households. Our results demonstrate that RoboOS-NeXT achieves superior performance across heterogeneous embodiments, validating its effectiveness in enabling lifelong, scalable, and robust multi-robot collaboration. Project website: https://flagopen.github.io/RoboOS/
Efficient Collision-Avoidance Constraints for Ellipsoidal Obstacles in Optimal Control: Application to Path-Following MPC and UAVs
This article proposes a modular optimal control framework for local three-dimensional ellipsoidal obstacle avoidance, exemplarily applied to model predictive path-following control. Static as well as moving obstacles are considered. Central to the approach is a computationally efficient and continuously differentiable condition for detecting collisions with ellipsoidal obstacles. A novel two-stage optimization approach mitigates numerical issues arising from the structure of the resulting optimal control problem. The effectiveness of the approach is demonstrated through simulations and real-world experiments with the Crazyflie quadrotor. This represents the first hardware demonstration of an MPC controller of this kind for UAVs in a three-dimensional task.
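For orientation, the textbook continuously differentiable test for whether a point lies outside an ellipsoid is given below. This is a generic illustration, not necessarily the condition derived in the article; the symbols $c$ (ellipsoid center), $R$ (rotation matrix), $a_1, a_2, a_3$ (semi-axis lengths), and $p$ (query point) are notation assumed here.

```latex
% Shape matrix of an ellipsoid with center c, orientation R, semi-axes a_1, a_2, a_3
A = R \,\operatorname{diag}\!\left(a_1^{-2},\, a_2^{-2},\, a_3^{-2}\right) R^{\top},
\qquad
h(p) = (p - c)^{\top} A \,(p - c) - 1 .

% The point p is outside or on the ellipsoid if and only if h(p) >= 0,
% and h is smooth in p with gradient
\nabla_p h(p) = 2 A \,(p - c) .
```

A constraint of the form $h(p_k) \ge 0$ at each prediction step $k$ can be handed directly to a gradient-based solver. For a moving obstacle, $c$ and $R$ would become functions of time.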
Human-in-the-loop Online Rejection Sampling for Robotic Manipulation
Reinforcement learning (RL) is widely used to produce robust robotic manipulation policies, but fine-tuning vision-language-action (VLA) models with RL can be unstable due to inaccurate value estimates and sparse supervision at intermediate steps. In contrast, imitation learning (IL) is easy to train but often underperforms due to its offline nature. In this paper, we propose Hi-ORS, a simple yet effective post-training method that utilizes rejection sampling to achieve both training stability and high robustness. Hi-ORS stabilizes value estimation by filtering out negatively rewarded samples during online fine-tuning, and adopts a reward-weighted supervised training objective to provide dense intermediate-step supervision. For systematic study, we develop an asynchronous inference-training framework that supports flexible online human-in-the-loop corrections, which serve as explicit guidance for learning error-recovery behaviors. Across three real-world tasks and two embodiments, Hi-ORS fine-tunes a pi-base policy to master contact-rich manipulation in just 1.5 hours of real-world training, outperforming RL and IL baselines by a substantial margin in both effectiveness and efficiency. Notably, the fine-tuned policy exhibits strong test-time scalability by reliably executing complex error-recovery behaviors to achieve better performance.
comment: 8 pages
CorVS: Person Identification via Video Trajectory-Sensor Correspondence in a Real-World Warehouse
Worker location data is key to higher productivity in industrial sites. Cameras are a promising tool for localization in logistics warehouses since they also offer valuable environmental contexts such as package status. However, identifying individuals with only visual data is often impractical. Accordingly, several prior studies identified people in videos by comparing their trajectories and wearable sensor measurements. While this approach has advantages such as independence from appearance, the existing methods may break down under real-world conditions. To overcome this challenge, we propose CorVS, a novel data-driven person identification method based on correspondence between visual tracking trajectories and sensor measurements. Firstly, our deep learning model predicts correspondence probabilities and reliabilities for every pair of a trajectory and sensor measurements. Secondly, our algorithm matches the trajectories and sensor measurements over time using the predicted probabilities and reliabilities. We developed a dataset with actual warehouse operations and demonstrated the method's effectiveness for real-world applications.
comment: 7 pages, 3 figures, accepted to IPIN 2025
Towards Reinforcement Learning Based Log Loading Automation
Forestry forwarders play a central role in mechanized timber harvesting by picking up and moving logs from the felling site to a processing area or a secondary transport vehicle. Forwarder operation is challenging and physically and mentally exhausting for the operator, who must control the machine in remote areas for prolonged periods of time. Therefore, even partial automation of the process may reduce stress on the operator. This study continues previous research on applying reinforcement learning agents to automate the log handling process, extending the task from grasping, which was studied previously, to the full log loading operation. The resulting agent will be capable of automating a full loading procedure, from locating and grappling the log to transporting and delivering it to a forestry forwarder bed. To train the agent, a trailer-type forestry forwarder simulation model in NVIDIA's Isaac Gym and a virtual environment for a typical log loading scenario were developed. With reinforcement learning agents and a curriculum learning approach, the trained agent may be a stepping stone towards applying reinforcement learning agents to the automation of the forestry forwarder. The agent learnt to grasp a log in a random position, starting from a random grapple position, and transport it to the bed, with the best-performing agent achieving a 94% success rate.
Cooperative Task Spaces for Multi-Arm Manipulation Control based on Similarity Transformations
Many tasks in human environments require collaborative behavior between multiple kinematic chains, either to provide additional support for carrying big and bulky objects or to enable the dexterity that is required for in-hand manipulation. Since these complex systems often have a very high number of degrees of freedom, coordinating their movements is notoriously difficult to model. In this article, we present the derivation of the theoretical foundations for cooperative task spaces of multi-arm robotic systems based on geometric primitives defined using conformal geometric algebra. Based on the similarity transformations of these cooperative geometric primitives, we derive an abstraction of complex robotic systems that enables representing these systems in a way that directly corresponds to single-arm systems. By deriving the associated analytic and geometric Jacobian matrices, we then show the straightforward integration of our approach into classical control techniques rooted in operational space control. We demonstrate this using bimanual manipulators, humanoids and multi-fingered hands in optimal control experiments for reaching desired geometric primitives and in teleoperation experiments using differential kinematics control. We then discuss how the geometric primitives naturally embed nullspace structures into the controllers that can be exploited for introducing secondary control objectives. This work represents the theoretical foundations of this cooperative manipulation control framework, and thus the experiments are presented in an abstract way, while giving pointers towards potential future applications.
AgriGS-SLAM: Orchard Mapping Across Seasons via Multi-View Gaussian Splatting SLAM
Autonomous robots in orchards require real-time 3D scene understanding despite repetitive row geometry, seasonal appearance changes, and wind-driven foliage motion. We present AgriGS-SLAM, a Visual-LiDAR SLAM framework that couples direct LiDAR odometry and loop closures with multi-camera 3D Gaussian Splatting (3DGS) rendering. Batch rasterization across complementary viewpoints recovers orchard structure under occlusions, while a unified gradient-driven map lifecycle executed between keyframes preserves fine details and bounds memory. Pose refinement is guided by a probabilistic LiDAR-based depth consistency term, back-propagated through the camera projection to tighten geometry-appearance coupling. We deploy the system on a field platform in apple and pear orchards across dormancy, flowering, and harvesting, using a standardized trajectory protocol that evaluates both training-view and novel-view synthesis to reduce 3DGS overfitting in evaluation. Across seasons and sites, AgriGS-SLAM delivers sharper, more stable reconstructions and steadier trajectories than recent state-of-the-art 3DGS-SLAM baselines while maintaining real-time performance on-tractor. While demonstrated in orchard monitoring, the approach can be applied to other outdoor domains requiring robust multimodal perception.
Thor: Towards Human-Level Whole-Body Reactions for Intense Contact-Rich Environments
Humanoids hold great potential for service, industrial, and rescue applications, in which robots must sustain whole-body stability while performing intense, contact-rich interactions with the environment. However, enabling humanoids to generate human-like, adaptive responses under such conditions remains a major challenge. To address this, we propose Thor, a humanoid framework for human-level whole-body reactions in contact-rich environments. Based on the robot's force analysis, we design a force-adaptive torso-tilt (FAT2) reward function to encourage humanoids to exhibit human-like responses during force-interaction tasks. To mitigate the high-dimensional challenges of humanoid control, Thor introduces a reinforcement learning architecture that decouples the upper body, waist, and lower body. Each component shares global observations of the whole body and jointly updates its parameters. Finally, we deploy Thor on the Unitree G1, and it substantially outperforms baselines in force-interaction tasks. Specifically, the robot achieves a peak pulling force of 167.7 N (approximately 48% of the G1's body weight) when moving backward and 145.5 N when moving forward, representing improvements of 68.9% and 74.7%, respectively, compared with the best-performing baseline. Moreover, Thor is capable of pulling a loaded rack (130 N) and opening a fire door with one hand (60 N). These results highlight Thor's effectiveness in enhancing humanoid force-interaction capabilities.
PHUMA: Physically-Grounded Humanoid Locomotion Dataset
Motion imitation is a promising approach for humanoid locomotion, enabling agents to acquire humanlike behaviors. Existing methods typically rely on high-quality motion capture datasets such as AMASS, but these are scarce and expensive, limiting scalability and diversity. Recent studies attempt to scale data collection by converting large-scale internet videos, exemplified by Humanoid-X. However, they often introduce physical artifacts such as floating, penetration, and foot skating, which hinder stable imitation. In response, we introduce PHUMA, a Physically-grounded HUMAnoid locomotion dataset that leverages human video at scale, while addressing physical artifacts through careful data curation and physics-constrained retargeting. PHUMA enforces joint limits, ensures ground contact, and eliminates foot skating, producing motions that are both large-scale and physically reliable. We evaluated PHUMA in two sets of conditions: (i) imitation of unseen motion from self-recorded test videos and (ii) path following with pelvis-only guidance. In both cases, PHUMA-trained policies outperform Humanoid-X and AMASS, achieving significant gains in imitating diverse motions. The code is available at https://davian-robotics.github.io/PHUMA.
Self-localization on a 3D map by fusing global and local features from a monocular camera
Self-localization on a 3D map using an inexpensive monocular camera is required to realize autonomous driving. Camera-based self-localization often uses a convolutional neural network (CNN), which extracts local features computed from nearby pixels. However, when dynamic obstacles, such as people, are present, a CNN does not work well. This study proposes a new method combining a CNN with a Vision Transformer, which excels at extracting global features that capture the relationships between patches across the whole image. Experimental results showed that, compared to the state-of-the-art method (SOTA), the accuracy improvement rate in a CG dataset with dynamic obstacles is 1.5 times higher than that without dynamic obstacles. Moreover, the self-localization error of our method is 20.1% smaller than that of SOTA on public datasets. Additionally, our robot using our method can localize itself with 7.51 cm error on average, which is more accurate than SOTA.
Adaptive Trajectory Refinement for Optimization-based Local Planning in Narrow Passages
Trajectory planning for mobile robots in cluttered environments remains a major challenge due to narrow passages, where conventional methods often fail or generate suboptimal paths. To address this issue, we propose the adaptive trajectory refinement algorithm, which consists of two main stages. First, to ensure safety at the path-segment level, a segment-wise conservative collision test is applied, where risk-prone trajectory path segments are recursively subdivided until collision risks are eliminated. Second, to guarantee pose-level safety, pose correction based on penetration direction and line search is applied, ensuring that each pose in the trajectory is collision-free and maximally clear from obstacles. Simulation results demonstrate that the proposed method achieves up to 1.69x higher success rates and up to 3.79x faster planning times than state-of-the-art approaches. Furthermore, real-world experiments confirm that the robot can safely pass through narrow passages while maintaining rapid planning performance.
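The first stage described above, recursively subdividing risk-prone segments under a conservative collision test, can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a point robot, circular obstacles, and one common conservative test (a segment is certified free if the clearance at its midpoint exceeds half its length). The names `clearance`, `refine`, and `min_len` are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Circle = Tuple[float, float, float]  # (center_x, center_y, radius)


def clearance(p: Point, obstacles: List[Circle]) -> float:
    """Signed distance from p to the nearest circular obstacle (negative if inside)."""
    return min(
        (math.hypot(p[0] - cx, p[1] - cy) - r for cx, cy, r in obstacles),
        default=math.inf,
    )


def refine(
    p: Point, q: Point, obstacles: List[Circle], min_len: float = 1e-2
) -> Optional[List[Point]]:
    """Return waypoints after p (ending at q) whose segments are all certified
    collision-free, or None if some piece cannot be certified above min_len."""
    length = math.hypot(q[0] - p[0], q[1] - p[1])
    mid = ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
    # Conservative test: every point of the segment lies within length/2 of the
    # midpoint, so a midpoint clearance above length/2 certifies the whole segment.
    if clearance(mid, obstacles) > length / 2.0:
        return [q]
    if length < min_len:
        return None  # too short to subdivide further: treat as colliding
    left = refine(p, mid, obstacles, min_len)
    if left is None:
        return None
    right = refine(mid, q, obstacles, min_len)
    if right is None:
        return None
    return left + right
```

Because the test is conservative, it may subdivide segments that are in fact free, but under these assumptions it never certifies a segment that intersects an obstacle.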
Kinodynamic Task and Motion Planning using VLM-guided and Interleaved Sampling
Task and Motion Planning (TAMP) integrates high-level task planning with low-level motion feasibility, but existing methods are costly in long-horizon problems due to excessive motion sampling. While LLMs provide commonsense priors, they lack 3D spatial reasoning and cannot ensure geometric or dynamic feasibility. We propose a kinodynamic TAMP framework based on a hybrid state tree that uniformly represents symbolic and numeric states during planning, enabling task and motion decisions to be made jointly. Kinodynamic constraints embedded in the TAMP problem are verified by an off-the-shelf motion planner and physics simulator, and a VLM guides the exploration of a TAMP solution and backtracks the search based on visual rendering of the states. Experiments on the simulated domains and in the real world show average success rates increased by 32.14% to 1166.67% compared to traditional and LLM-based TAMP planners and reduced planning time on complex problems, with ablations further highlighting the benefits of VLM guidance.
Embodied Intelligence for Advanced Bioinspired Microrobotics: Examples and Insights
The term embodied intelligence (EI) conveys the notion that body morphology, material properties, interaction with the environment, and control strategies can be purposefully integrated into the process of robotic design to generate intelligent behavior; in particular, locomotion and navigation. In this paper, we discuss EI as a design principle for advanced microrobotics, with a particular focus on co-design -- the simultaneous and interdependent development of physical structure and behavioral function. To illustrate the contrast between EI-inspired systems and traditional architectures that decouple sensing, computation, and actuation, we present and discuss a collection of robots developed by the author and his team at the Autonomous Microrobotic Systems Laboratory (AMSL). These robots exhibit intelligent behavior that emerges from their structural dynamics and the physical interaction between their components and with the environment. Platforms such as the Bee++, RoBeetle, SMALLBug, SMARTI, WaterStrider, VLEIBot+, and FRISSHBot exemplify how feedback loops, decision logics, sensing mechanisms, and smart actuation strategies can be embedded into the physical properties of the robotic system itself. Along these lines, we contend that co-design is not only a method for empirical optimization under constraints, but also an enabler of EI, offering a scalable and robust alternative to classical control for robotics at the mm-to-cm-scale.
comment: 8 pages, 7 figures, accepted to ICAR 2025
Exploring Object-Aware Attention Guided Frame Association for RGB-D SLAM
Attention models have recently emerged as a powerful approach, demonstrating significant progress in various fields. Visualization techniques, such as class activation mapping, provide visual insights into the reasoning of convolutional neural networks (CNNs). Using network gradients, it is possible to identify regions where the network pays attention during image recognition tasks. Furthermore, these gradients can be combined with CNN features to localize more generalizable, task-specific attentive (salient) regions within scenes. However, explicit use of this gradient-based attention information integrated directly into CNN representations for semantic object understanding remains limited. Such integration is particularly beneficial for visual tasks like simultaneous localization and mapping (SLAM), where CNN representations enriched with spatially attentive object locations can enhance performance. In this work, we propose utilizing task-specific network attention for RGB-D indoor SLAM. Specifically, we integrate layer-wise attention information derived from network gradients with CNN feature representations to improve frame association performance. Experimental results indicate improved performance compared to baseline methods, particularly for large environments.
comment: double-column 5 pages, 3 figures
Beyond the Uncanny Valley: A Mixed-Method Investigation of Anthropomorphism in Protective Responses to Robot Abuse
Robots with anthropomorphic features are increasingly shaping how humans perceive and morally engage with them. Our research investigates how different levels of anthropomorphism influence protective responses to robot abuse, extending the Computers as Social Actors (CASA) and uncanny valley theories into a moral domain. In an experiment, we invite 201 participants to view videos depicting abuse toward a robot with low (Spider), moderate (Two-Foot), or high (Humanoid) anthropomorphism. To provide a comprehensive analysis, we triangulate three modalities: self-report surveys measuring emotions and uncanniness, physiological data from automated facial expression analysis, and qualitative reflections. Findings indicate that protective responses are not linear. The moderately anthropomorphic Two-Foot robot, rated highest in eeriness and "spine-tingling" sensations consistent with the uncanny valley, elicited the strongest physiological anger expressions. Self-reported anger and guilt are significantly higher for both the Two-Foot and Humanoid robots compared to the Spider. Qualitative findings further reveal that as anthropomorphism increases, moral reasoning shifts from technical assessments of property damage to condemnation of the abuser's character, while governance proposals expand from property law to calls for quasi-animal rights and broader societal responsibility. These results suggest that the uncanny valley does not dampen moral concern but paradoxically heightens protective impulses, offering critical implications for robot design, policy, and future legal frameworks.
I don't Want You to Die: A Shared Responsibility Framework for Safeguarding Child-Robot Companionship
Social robots like Moxie are designed to form strong emotional bonds with children, but their abrupt discontinuation can cause significant struggles and distress to children. When these services end, the resulting harm raises complex questions of who bears responsibility when children's emotional bonds are broken. Using the Moxie shutdown as a case study through a qualitative survey of 72 U.S. participants, our findings show that the responsibility is viewed as a shared duty across the robot company, parents, developers, and government. However, these attributions varied by political ideology and parental status (whether participants have children). Participants' perceptions of whether the robot service should continue are highly polarized; supporters propose technical, financial, and governmental pathways for continuity, while opponents cite business realities and risks of unhealthy emotional dependency. Ultimately, this research contributes an empirically grounded shared responsibility framework for safeguarding child-robot companionship by detailing how accountability is distributed and contested, informing concrete design and policy implications to mitigate the emotional harm of robot discontinuation.
Morphology-Aware Graph Reinforcement Learning for Tensegrity Robot Locomotion
Tensegrity robots combine rigid rods and elastic cables, offering high resilience and deployability but posing major challenges for locomotion control due to their underactuated and highly coupled dynamics. This paper introduces a morphology-aware reinforcement learning framework that integrates a graph neural network (GNN) into the Soft Actor-Critic (SAC) algorithm. By representing the robot's physical topology as a graph, the proposed GNN-based policy captures coupling among components, enabling faster and more stable learning than conventional multilayer perceptron (MLP) policies. The method is validated on a physical 3-bar tensegrity robot across three locomotion primitives, including straight-line tracking and bidirectional turning. It shows superior sample efficiency, robustness to noise and stiffness variations, and improved trajectory accuracy. Notably, the learned policies transfer directly from simulation to hardware without fine-tuning, achieving stable real-world locomotion. These results demonstrate the advantages of incorporating structural priors into reinforcement learning for tensegrity robot control.
Accelerating Real-World Overtaking in F1TENTH Racing Employing Reinforcement Learning Methods
While autonomous racing performance in Time-Trial scenarios has seen significant progress and development, autonomous wheel-to-wheel racing and overtaking are still severely limited. These limitations are particularly apparent in real-life driving scenarios where state-of-the-art algorithms struggle to safely or reliably complete overtaking manoeuvres. This is important, as reliable navigation around other vehicles is vital for safe autonomous wheel-to-wheel racing. The F1Tenth Competition provides a useful opportunity for developing wheel-to-wheel racing algorithms on a standardised physical platform. The competition format makes it possible to evaluate overtaking and wheel-to-wheel racing algorithms against the state-of-the-art. This research presents a novel racing and overtaking agent capable of learning to reliably navigate a track and overtake opponents in both simulation and reality. The agent was deployed on an F1Tenth vehicle and competed against opponents running varying competitive algorithms in the real world. The results demonstrate that the agent's training against opponents enables deliberate overtaking behaviours with an overtaking rate of 87% compared to 56% for an agent trained just to race.
SpikeATac: A Multimodal Tactile Finger with Taxelized Dynamic Sensing for Dexterous Manipulation
In this work, we introduce SpikeATac, a multimodal tactile finger combining a taxelized and highly sensitive dynamic response (PVDF) with a static transduction method (capacitive) for multimodal touch sensing. Named for its 'spiky' response, SpikeATac's 16-taxel PVDF film sampled at 4 kHz provides fast, sensitive dynamic signals to the very onset and breaking of contact. We characterize the sensitivity of the different modalities, and show that SpikeATac provides the ability to stop quickly and delicately when grasping fragile, deformable objects. Beyond parallel grasping, we show that SpikeATac can be used in a learning-based framework to achieve new capabilities on a dexterous multifingered robot hand. We use a learning recipe that combines reinforcement learning from human feedback with tactile-based rewards to fine-tune the behavior of a policy to modulate force. Our hardware platform and learning pipeline together enable a difficult dexterous and contact-rich task that has not previously been achieved: in-hand manipulation of fragile objects. Videos are available at https://roamlab.github.io/spikeatac/.
comment: 9 pages, 8 figures, under review
A Multi-Modal Neuro-Symbolic Approach for Spatial Reasoning-Based Visual Grounding in Robotics
Visual reasoning, particularly spatial reasoning, is a challenging cognitive task that requires understanding object relationships and their interactions within complex environments, especially in the robotics domain. Existing vision-language models (VLMs) excel at perception tasks but struggle with fine-grained spatial reasoning due to their implicit, correlation-driven reasoning and reliance solely on images. We propose a novel neuro-symbolic framework that integrates both panoramic-image and 3D point cloud information, combining neural perception with symbolic reasoning to explicitly model spatial and logical relationships. Our framework consists of a perception module for detecting entities and extracting attributes, and a reasoning module that constructs a structured scene graph to support precise, interpretable queries. Evaluated on the JRDB-Reasoning dataset, our approach demonstrates superior performance and reliability in crowded, human-built environments while maintaining a lightweight design suitable for robotics and embodied AI applications.
A Hermetic, Transparent Soft Growing Vine Robot System for Pipe Inspection
Rehabilitation of aging pipes requires accurate condition assessment and mapping far into the pipe interiors. Soft growing vine robot systems are particularly promising for navigating confined, sinuous paths such as in pipes, but are currently limited by complex subsystems and a lack of validation in real-world industrial settings. In this paper, we introduce the concept and implementation of a hermetic and transparent vine robot system for visual condition assessment and mapping within non-branching pipes. This design encloses all mechanical and electrical components within the vine robot's soft, airtight, and transparent body, protecting them from environmental interference while enabling visual sensing. Because this approach requires an enclosed mechanism for transporting sensors, we developed, modeled, and tested a passively adapting enclosed tip mount. Finally, we validated the hermetic and transparent vine robot system concept through a real-world condition assessment and mapping task in a wastewater pipe. This work advances the use of soft-growing vine robots in pipe inspection by developing and demonstrating a robust, streamlined, field-validated system suitable for continued development and deployment.
comment: 8 pages, 7 figures
Cooperative Integrated Estimation-Guidance for Simultaneous Interception of Moving Targets
This paper proposes a cooperative integrated estimation-guidance framework for simultaneous interception of a non-maneuvering target using a team of unmanned autonomous vehicles, assuming only a subset of vehicles are equipped with dedicated sensors to measure the target's states. Unlike earlier approaches that focus solely on either estimation or guidance design, the proposed framework unifies both within a cooperative architecture. To circumvent the limitation posed by heterogeneity in target observability, sensorless vehicles estimate the target's state by leveraging information exchanged with neighboring agents over a directed communication topology through a prescribed-time observer. The proposed approach employs true proportional navigation guidance (TPNG), which uses an exact time-to-go formulation and is applicable across a wide spectrum of target motions. Furthermore, a prescribed-time observer and a prescribed-time controller are employed to achieve convergence to the target's true state and consensus in time-to-go, respectively, within predefined times. Simulations demonstrate the effectiveness of the proposed framework under various engagement scenarios.
RepV: Safety-Separable Latent Spaces for Scalable Neurosymbolic Plan Verification
As AI systems migrate to safety-critical domains, verifying that their actions comply with well-defined rules remains a challenge. Formal methods provide provable guarantees but demand hand-crafted temporal-logic specifications, offering limited expressiveness and accessibility. Deep learning approaches enable evaluation of plans against natural-language constraints, yet their opaque decision process invites misclassifications with potentially severe consequences. We introduce RepV, a neurosymbolic verifier that unifies both views by learning a latent space where safe and unsafe plans are linearly separable. Starting from a modest seed set of plans labeled by an off-the-shelf model checker, RepV trains a lightweight projector that embeds each plan, together with a language model-generated rationale, into a low-dimensional space; a frozen linear boundary then verifies compliance for unseen natural-language rules in a single forward pass. Beyond binary classification, RepV provides a probabilistic guarantee on the likelihood of correct verification based on its position in the latent space. This guarantee enables a guarantee-driven refinement of the planner, improving rule compliance without human annotations. Empirical evaluations show that RepV improves compliance prediction accuracy by up to 15% compared to baseline methods while adding fewer than 0.2M parameters. Furthermore, our refinement framework outperforms ordinary fine-tuning baselines across various planning domains. These results show that safety-separable latent spaces offer a scalable, plug-and-play primitive for reliable neurosymbolic plan verification. Code and data are available at: https://repv-project.github.io/.
comment: Code and data are available at: https://repv-project.github.io/
Heterogeneous Robot Collaboration in Unstructured Environments with Grounded Generative Intelligence
Heterogeneous robot teams operating in realistic settings often must accomplish complex missions requiring collaboration and adaptation to information acquired online. Because robot teams frequently operate in unstructured environments -- uncertain, open-world settings without prior maps -- subtasks must be grounded in robot capabilities and the physical world. While heterogeneous teams have typically been designed for fixed specifications, generative intelligence opens the possibility of teams that can accomplish a wide range of missions described in natural language. However, current large language model (LLM)-enabled teaming methods typically assume well-structured and known environments, limiting deployment in unstructured environments. We present SPINE-HT, a framework that addresses these limitations by grounding the reasoning abilities of LLMs in the context of a heterogeneous robot team through a three-stage process. Given language specifications describing mission goals and team capabilities, an LLM generates grounded subtasks which are validated for feasibility. Subtasks are then assigned to robots based on capabilities such as traversability or perception and refined given feedback collected during online operation. In simulation experiments with closed-loop perception and control, our framework achieves nearly twice the success rate compared to prior LLM-enabled heterogeneous teaming approaches. In real-world experiments with a Clearpath Jackal, a Clearpath Husky, a Boston Dynamics Spot, and a high-altitude UAV, our method achieves an 87% success rate in missions requiring reasoning about robot capabilities and refining subtasks with online feedback. More information is provided at https://zacravichandran.github.io/SPINE-HT.
NaviTrace: Evaluating Embodied Navigation of Vision-Language Models
Vision-language models demonstrate unprecedented performance and generalization across a wide range of tasks and scenarios. Integrating these foundation models into robotic navigation systems opens pathways toward building general-purpose robots. Yet, evaluating these models' navigation capabilities remains constrained by costly real-world trials, overly simplified simulations, and limited benchmarks. We introduce NaviTrace, a high-quality Visual Question Answering benchmark where a model receives an instruction and embodiment type (human, legged robot, wheeled robot, bicycle) and must output a 2D navigation trace in image space. Across 1000 scenarios and more than 3000 expert traces, we systematically evaluate eight state-of-the-art VLMs using a newly introduced semantic-aware trace score. This metric combines Dynamic Time Warping distance, goal endpoint error, and embodiment-conditioned penalties derived from per-pixel semantics and correlates with human preferences. Our evaluation reveals a consistent gap to human performance caused by poor spatial grounding and goal localization. NaviTrace establishes a scalable and reproducible benchmark for real-world robotic navigation. The benchmark and leaderboard can be found at https://leggedrobotics.github.io/navitrace_webpage/.
comment: 9 pages, 6 figures, under review at IEEE conference
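As a rough illustration of the Dynamic Time Warping component mentioned in the NaviTrace abstract, the following sketch computes a textbook DTW distance between two 2D traces given as lists of (x, y) pixel coordinates. The function name `dtw_distance` and the use of Euclidean point-to-point cost are assumptions made for this example; the benchmark's actual score also includes endpoint error and semantic penalties, which are not reproduced here.

```python
import math


def dtw_distance(trace_a, trace_b):
    """Textbook DTW distance between two 2D traces (lists of (x, y) tuples).

    Hypothetical helper for illustration only; not the benchmark's scoring code.
    """
    n, m = len(trace_a), len(trace_b)
    # cost[i][j]: minimal cumulative cost of aligning the first i points
    # of trace_a with the first j points of trace_b.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(trace_a[i - 1], trace_b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance along trace_a only
                cost[i][j - 1],      # advance along trace_b only
                cost[i - 1][j - 1],  # advance along both traces
            )
    return cost[n][m]


predicted = [(0, 0), (1, 0), (2, 0)]
expert = [(0, 0), (2, 0)]
print(dtw_distance(predicted, expert))
```

Because DTW allows one point to be matched to several points on the other trace, two traces that follow the same path at different sampling densities still receive a small distance.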
Design for One, Deploy for Many: Navigating Tree Mazes with Multiple Agents
Maze-like environments, such as cave and pipe networks, pose unique challenges for multiple robots to coordinate, including communication constraints and congestion. To address these challenges, we propose a distributed multi-agent maze traversal algorithm for environments that can be represented by acyclic graphs. It uses a leader-switching mechanism where one agent, assuming a head role, employs any single-agent maze solver while the other agents each choose an agent to follow. The head role gets transferred to neighboring agents where necessary, ensuring it follows the same path as a single agent would. The multi-agent maze traversal algorithm is evaluated in simulations with groups of up to 300 agents, various maze sizes, and multiple single-agent maze solvers. It is compared against strategies that are naïve, or assume either global communication or full knowledge of the environment. The algorithm outperforms the naïve strategy in terms of makespan and sum-of-fuel. It is superior to the global-communication strategy in terms of makespan but is inferior to it in terms of sum-of-fuel. The findings suggest it is asymptotically equivalent to the full-knowledge strategy with respect to either metric. Moreover, real-world experiments with up to 20 Pi-puck robots confirm the feasibility of the approach.
comment: 7 pages, 7 figures, to be published in MRS 2025
Leveraging Foundation Models for Enhancing Robot Perception and Action
This thesis investigates how foundation models can be systematically leveraged to enhance robotic capabilities, enabling more effective localization, interaction, and manipulation in unstructured environments. The work is structured around four core lines of inquiry, each addressing a fundamental challenge in robotics while collectively contributing to a cohesive framework for semantics-aware robotic intelligence.
comment: Doctoral thesis
CronusVLA: Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling
Recent vision-language-action (VLA) models built on pretrained vision-language models (VLMs) have demonstrated strong performance in robotic manipulation. However, these models remain constrained by the single-frame image paradigm and fail to fully leverage the temporal information offered by multi-frame histories, as directly feeding multiple frames into VLM backbones incurs substantial computational overhead and inference latency. We propose CronusVLA, a unified framework that extends single-frame VLA models to the multi-frame paradigm. CronusVLA follows a two-stage process: (1) Single-frame pretraining on large-scale embodied datasets with autoregressive prediction of action tokens, establishing an effective embodied vision-language foundation; (2) Multi-frame post-training, which adapts the prediction of the vision-language backbone from discrete tokens to learnable features, and aggregates historical information via feature chunking. CronusVLA effectively addresses the existing challenges of multi-frame modeling while enhancing performance and observational robustness. To evaluate the robustness under temporal and spatial disturbances, we introduce SimplerEnv-OR, a novel benchmark featuring 24 types of observational disturbances and 120 severity levels. Experiments across three embodiments in simulated and real-world environments demonstrate that CronusVLA achieves leading performance and superior robustness, with a 70.9% success rate on SimplerEnv, a 26.8% improvement over OpenVLA on LIBERO, and the highest robustness score on SimplerEnv-OR. These results highlight the potential of efficient multi-frame adaptation in VLA models for more powerful and robust real-world deployment.
comment: 39 pages, 24 figures
Agile and Cooperative Aerial Manipulation of a Cable-Suspended Load
Quadrotors can carry slung loads to hard-to-reach locations at high speed. Since a single quadrotor has limited payload capacities, using a team of quadrotors to collaboratively manipulate a heavy object is a scalable and promising solution. However, existing control algorithms for multi-lifting systems only enable low-speed and low-acceleration operations due to the complex dynamic coupling between quadrotors and the load, limiting their use in time-critical missions such as search and rescue. In this work, we present a solution to significantly enhance the agility of cable-suspended multi-lifting systems. Unlike traditional cascaded solutions, we introduce a trajectory-based framework that solves the whole-body kinodynamic motion planning problem online, accounting for the dynamic coupling effects and constraints between the quadrotors and the load. The planned trajectory is provided to the quadrotors as a reference in a receding-horizon fashion and is tracked by an onboard controller that observes and compensates for the cable tension. Real-world experiments demonstrate that our framework can achieve at least eight times greater acceleration than state-of-the-art methods to follow agile trajectories. Our method can even perform complex maneuvers such as flying through narrow passages at high speed. Additionally, it exhibits high robustness against load uncertainties and does not require adding any sensors to the load, demonstrating strong practicality.
comment: 38 pages, 11 figures
LiGen: GAN-Augmented Spectral Fingerprinting for Indoor Positioning
Accurate and robust indoor localization is critical for smart building applications, yet existing Wi-Fi-based systems are often vulnerable to environmental conditions. This work presents a novel indoor localization system, called LiGen, that leverages the spectral intensity patterns of ambient light as fingerprints, offering a more stable and infrastructure-free alternative to radio signals. To address the limited spectral data, we design a data augmentation framework based on generative adversarial networks (GANs), featuring two variants: PointGAN, which generates fingerprints conditioned on coordinates, and FreeGAN, which uses a weak localization model to label unconditioned samples. Our positioning model, leveraging a Multi-Layer Perceptron (MLP) architecture to train on synthesized data, achieves submeter-level accuracy, outperforming Wi-Fi-based baselines by over 50%. LiGen also demonstrates strong robustness in cluttered environments. To the best of our knowledge, this is the first system to combine spectral fingerprints with GAN-based data augmentation for indoor localization.
comment: 6 pages, 10 figures
C-NAV: Towards Self-Evolving Continual Object Navigation in Open World NeurIPS 2025
Embodied agents are expected to perform object navigation in dynamic, open-world environments. However, existing approaches typically rely on static trajectories and a fixed set of object categories during training, overlooking the real-world requirement for continual adaptation to evolving scenarios. To facilitate related studies, we introduce the continual object navigation benchmark, which requires agents to acquire navigation skills for new object categories while avoiding catastrophic forgetting of previously learned knowledge. To tackle this challenge, we propose C-Nav, a continual visual navigation framework that integrates two key innovations: (1) A dual-path anti-forgetting mechanism, which comprises feature distillation that aligns multi-modal inputs into a consistent representation space to ensure representation consistency, and feature replay that retains temporal features within the action decoder to ensure policy consistency. (2) An adaptive sampling strategy that selects diverse and informative experiences, thereby reducing redundancy and minimizing memory overhead. Extensive experiments across multiple model architectures demonstrate that C-Nav consistently outperforms existing approaches, achieving superior performance even compared to baselines with full trajectory retention, while significantly lowering memory requirements. The code will be publicly available at https://bigtree765.github.io/C-Nav-project.
comment: Accepted at NeurIPS 2025
Loop Closure from Two Views: Revisiting PGO for Scalable Trajectory Estimation through Monocular Priors
(Visual) Simultaneous Localization and Mapping (SLAM) remains a fundamental challenge in enabling autonomous systems to navigate and understand large-scale environments. Traditional SLAM approaches struggle to balance efficiency and accuracy, particularly in large-scale settings where extensive computational resources are required for scene reconstruction and Bundle Adjustment (BA). However, this scene reconstruction, in the form of sparse pointclouds of visual landmarks, is often only used within the SLAM system because navigation and planning methods require different map representations. In this work, we therefore investigate a more scalable Visual SLAM (VSLAM) approach without reconstruction, mainly based on approaches for two-view loop closures. By restricting the map to a sparse keyframed pose graph without dense geometry representations, our `2GO' system achieves efficient optimization with competitive absolute trajectory accuracy. In particular, we find that recent advancements in image matching and monocular depth priors enable very accurate trajectory optimization without BA. We conduct extensive experiments on diverse datasets, including large-scale scenarios, and provide a detailed analysis of the trade-offs between runtime, accuracy, and map size. Our results demonstrate that this streamlined approach supports real-time performance, scales well in map size and trajectory duration, and effectively broadens the capabilities of VSLAM for long-duration deployments to large environments.
Learning to Insert for Constructive Neural Vehicle Routing Solver NeurIPS 2025
Neural Combinatorial Optimisation (NCO) is a promising learning-based approach for solving Vehicle Routing Problems (VRPs) without extensive manual design. While existing constructive NCO methods typically follow an appending-based paradigm that sequentially adds unvisited nodes to partial solutions, this rigid approach often leads to suboptimal results. To overcome this limitation, we explore an insertion-based paradigm and propose Learning to Construct with Insertion-based Paradigm (L2C-Insert), a novel learning-based method for constructive NCO. Unlike traditional approaches, L2C-Insert builds solutions by strategically inserting unvisited nodes at any valid position in the current partial solution, which can significantly enhance the flexibility and solution quality. The proposed framework introduces three key components: a novel model architecture for precise insertion position prediction, an efficient training scheme for model optimization, and an advanced inference technique that fully exploits the insertion paradigm's flexibility. Extensive experiments on both synthetic and real-world instances of the Travelling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) demonstrate that L2C-Insert consistently achieves superior performance across various problem sizes.
comment: Accepted at NeurIPS 2025
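To make the difference between appending and inserting concrete, the sketch below builds a TSP tour both ways from the same node order. It uses the classical cheapest-position insertion rule as a hand-written stand-in; L2C-Insert instead learns where to insert, so this is not the paper's model. The function names `tour_length`, `build_by_appending`, and `build_by_insertion` are hypothetical.

```python
import math


def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the given index order."""
    return sum(
        math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def build_by_appending(order, pts):
    """Appending paradigm: each new node goes at the end of the partial tour."""
    return list(order)


def build_by_insertion(order, pts):
    """Insertion paradigm: each new node goes at the position that adds
    the least length to the current partial tour (classical heuristic)."""
    tour = list(order[:2])
    for node in order[2:]:
        best_pos, best_delta = None, math.inf
        for pos in range(len(tour)):
            a, b = tour[pos], tour[(pos + 1) % len(tour)]
            # Extra length from replacing edge (a, b) with (a, node), (node, b).
            delta = (
                math.dist(pts[a], pts[node])
                + math.dist(pts[node], pts[b])
                - math.dist(pts[a], pts[b])
            )
            if delta < best_delta:
                best_pos, best_delta = pos + 1, delta
        tour.insert(best_pos, node)
    return tour


# Corners of a unit square, presented in an order that crosses the diagonal.
pts = [(0, 0), (1, 1), (1, 0), (0, 1)]
order = [0, 1, 2, 3]
print(tour_length(build_by_appending(order, pts), pts))
print(tour_length(build_by_insertion(order, pts), pts))
```

On this toy instance the appended tour crosses both diagonals, while insertion can place later nodes between earlier ones and recover the square's perimeter.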
SAFE: Multitask Failure Detection for Vision-Language-Action Models NeurIPS 2025
While vision-language-action models (VLAs) have shown promising robotic behaviors across a diverse set of manipulation tasks, they achieve limited success rates when deployed on novel tasks out of the box. To allow these policies to safely interact with their environments, we need a failure detector that gives a timely alert such that the robot can stop, backtrack, or ask for help. However, existing failure detectors are trained and tested only on one or a few specific tasks, while generalist VLAs require the detector to generalize and detect failures also in unseen tasks and novel environments. In this paper, we introduce the multitask failure detection problem and propose SAFE, a failure detector for generalist robot policies such as VLAs. We analyze the VLA feature space and find that VLAs have sufficient high-level knowledge about task success and failure, which is generic across different tasks. Based on this insight, we design SAFE to learn from VLA internal features and predict a single scalar indicating the likelihood of task failure. SAFE is trained on both successful and failed rollouts and is evaluated on unseen tasks. SAFE is compatible with different policy architectures. We test it on OpenVLA, $\pi_0$, and $\pi_0$-FAST in both simulated and real-world environments extensively. We compare SAFE with diverse baselines and show that SAFE achieves state-of-the-art failure detection performance and the best trade-off between accuracy and detection time using conformal prediction. More qualitative results and code can be found at the project webpage: https://vla-safe.github.io/
comment: NeurIPS 2025 camera ready. Project Page: https://vla-safe.github.io/
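The SAFE abstract mentions conformal prediction for turning a scalar failure score into an alert. A minimal sketch of one common variant, split conformal calibration with a single fixed threshold, is shown below. It assumes the calibration scores come from successful rollouts and that higher scores mean more likely failure; the names `conformal_threshold` and `should_alert` are hypothetical, and the paper's own procedure may differ (for example, by using time-varying thresholds).

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold from calibration scores of successful rollouts.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, or infinity
    when there are too few calibration samples for the requested level.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(cal_scores)[k - 1]


def should_alert(score, threshold):
    """Raise a failure alert when the score exceeds the calibrated threshold."""
    return score > threshold


threshold = conformal_threshold([0.3, 0.1, 0.2], alpha=0.5)
print(threshold, should_alert(0.25, threshold))
```

Under the usual exchangeability assumption, a new successful rollout's score exceeds this threshold with probability at most alpha, which is what bounds the false-alarm rate.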
Towards Predicting Any Human Trajectory In Context NeurIPS 2025
Predicting accurate future trajectories of pedestrians is essential for autonomous systems but remains a challenging task due to the need for adaptability in different environments and domains. A common approach involves collecting scenario-specific data and performing fine-tuning via backpropagation. However, the need to fine-tune for each new scenario is often impractical for deployment on edge devices. To address this challenge, we introduce TrajICL, an In-Context Learning (ICL) framework for pedestrian trajectory prediction that enables adaptation at inference time without fine-tuning on scenario-specific data or updating weights. We propose a spatio-temporal similarity-based example selection (STES) method that selects relevant examples from previously observed trajectories within the same scene by identifying similar motion patterns at corresponding locations. To further refine this selection, we introduce prediction-guided example selection (PG-ES), which selects examples based on both the past trajectory and the predicted future trajectory, rather than relying solely on the past trajectory. This approach allows the model to account for long-term dynamics when selecting examples. Finally, instead of relying on small real-world datasets with limited scenario diversity, we train our model on a large-scale synthetic dataset to enhance its prediction ability by leveraging in-context examples. Extensive experiments demonstrate that TrajICL achieves remarkable adaptation across both in-domain and cross-domain scenarios, outperforming even fine-tuned approaches across multiple public benchmarks. Project Page: https://fujiry0.github.io/TrajICL-project-page/.
comment: NeurIPS 2025
FSR-VLN: Fast and Slow Reasoning for Vision-Language Navigation with Hierarchical Multi-modal Scene Graph
Visual-Language Navigation (VLN) is a fundamental challenge in robotic systems, with broad applications for the deployment of embodied agents in real-world environments. Despite recent advances, existing approaches are limited in long-range spatial reasoning, often exhibiting low success rates and high inference latency, particularly in long-range navigation tasks. To address these limitations, we propose FSR-VLN, a vision-language navigation system that combines a Hierarchical Multi-modal Scene Graph (HMSG) with Fast-to-Slow Navigation Reasoning (FSR). The HMSG provides a multi-modal map representation supporting progressive retrieval, from coarse room-level localization to fine-grained goal view and object identification. Building on HMSG, FSR first performs fast matching to efficiently select candidate rooms, views, and objects, then applies VLM-driven refinement for final goal selection. We evaluated FSR-VLN across four comprehensive indoor datasets collected by humanoid robots, utilizing 87 instructions that encompass a diverse range of object categories. FSR-VLN achieves state-of-the-art (SOTA) performance in all datasets, measured by the retrieval success rate (RSR), while reducing the response time by 82% compared to VLM-based methods on tour videos by activating slow reasoning only when fast intuition fails. Furthermore, we integrate FSR-VLN with speech interaction, planning, and control modules on a Unitree-G1 humanoid robot, enabling natural language interaction and real-time navigation.
comment: 8 pages
Human-assisted Robotic Policy Refinement via Action Preference Optimization NeurIPS 2025
Establishing a reliable and iteratively refined robotic system is essential for deploying real-world applications. While Vision-Language-Action (VLA) models are widely recognized as the foundation model for such robotic deployment, their reliance on offline expert demonstrations critically limits their capacity for post-deployment refinement. To mitigate this limitation, we introduce Action Preference Optimization (APO), a method designed to refine VLA models by human-assisted preference alignment gathered through interaction with environments. This method begins with a human-robot collaboration framework for reliable failure correction and interaction trajectory collection through human intervention. However, directly leveraging these interaction trajectories for preference optimization is non-trivial due to the challenges of irreversible robotic actions and token distribution mismatch. To solve this, APO proposes an adaptive reweighting algorithm with binary desirability signals derived from interaction, empowering VLA models to effectively suppress failure-prone actions while enhancing corrective action adaptation. Ultimately, APO equips VLA models with the crucial capability to learn from failure, paving the way for their iterative refinement and reliable deployment in dynamic environments. The experiments conducted in simulation and real-world scenarios demonstrate the superior generalization and robustness of our human-assisted framework across a variety of manipulation tasks. We believe this work could bring insights for efficient and stable optimization of VLA models through human-robot collaboration. The code and dataset are released at https://github.com/GeWu-Lab/Action-Preference-Optimization
comment: Accepted By NeurIPS 2025
3D Equivariant Visuomotor Policy Learning via Spherical Projection
Equivariant models have recently been shown to improve the data efficiency of diffusion policy by a significant margin. However, prior work that explored this direction focused primarily on point cloud inputs generated by multiple cameras fixed in the workspace. This type of point cloud input is not compatible with the now-common setting where the primary input modality is an eye-in-hand RGB camera like a GoPro. This paper closes this gap by incorporating into the diffusion policy model a process that projects features from the 2D RGB camera image onto a sphere. This enables us to reason about symmetries in $\mathrm{SO}(3)$ without explicitly reconstructing a point cloud. We perform extensive experiments in both simulation and the real world that demonstrate that our method consistently outperforms strong baselines in terms of both performance and sample efficiency. Our work, Image-to-Sphere Policy (ISP), is the first $\mathrm{SO}(3)$-equivariant policy learning framework for robotic manipulation that works using only monocular RGB inputs.
Falconry-like palm landing by a flapping-wing drone based on the human gesture interaction and distance-aware flight planning
Flapping-wing drones have attracted significant attention due to their biomimetic flight. They are considered more human-friendly due to characteristics such as low noise and flexible wings, making them suitable for human-drone interactions. However, few studies have explored the practical interaction between humans and flapping-wing drones. In establishing a physical interaction system with flapping-wing drones, we can draw inspiration from falconers who guide birds of prey to land on their arms. This interaction treats the human body as a dynamic landing platform, which can be utilized in various scenarios such as crowded or spatially constrained environments. Thus, in this study, we propose a falconry-like interaction system in which a flapping-wing drone performs a palm landing motion on a human hand. To achieve a safe approach toward humans, we design a trajectory planning method that considers both physical and psychological factors of human safety, such as the drone's velocity and distance from the user. We use a commercial flapping platform with our implemented motion planning and conduct experiments to evaluate the palm landing performance and safety. The results demonstrate that our approach enables safe and smooth hand landing interactions. To the best of our knowledge, this is the first work to achieve contact-based interaction between flapping-wing drones and humans.
comment: 8 pages, 14 figures
DiffVLA++: Bridging Cognitive Reasoning and End-to-End Driving through Metric-Guided Alignment
Conventional end-to-end (E2E) driving models are effective at generating physically plausible trajectories, but often fail to generalize to long-tail scenarios due to the lack of essential world knowledge to understand and reason about surrounding environments. In contrast, Vision-Language-Action (VLA) models leverage world knowledge to handle challenging cases, but their limited 3D reasoning capability can lead to physically infeasible actions. In this work, we introduce DiffVLA++, an enhanced autonomous driving framework that explicitly bridges cognitive reasoning and E2E planning through metric-guided alignment. First, we build a VLA module that directly generates semantically grounded driving trajectories. Second, we design an E2E module with a dense trajectory vocabulary that ensures physical feasibility. Third, and most critically, we introduce a metric-guided trajectory scorer that guides and aligns the outputs of the VLA and E2E modules, thereby integrating their complementary strengths. Experiments on the ICCV 2025 Autonomous Grand Challenge leaderboard show that DiffVLA++ achieves an EPDMS of 49.12.
Mechanical Intelligence-Aware Curriculum Reinforcement Learning for Humanoids with Parallel Actuation
Reinforcement learning (RL) has enabled advances in humanoid robot locomotion, yet most learning frameworks do not account for mechanical intelligence embedded in parallel actuation mechanisms due to limitations in simulator support for closed kinematic chains. This omission can lead to inaccurate motion modeling and suboptimal policies, particularly for robots with high actuation complexity. This paper presents general formulations and simulation methods for three types of parallel mechanisms (a differential pulley, a five-bar linkage, and a four-bar linkage) and trains a parallel-mechanism-aware policy through an end-to-end curriculum RL framework for BRUCE, a kid-sized humanoid robot. Unlike prior approaches that rely on simplified serial approximations, we simulate all closed-chain constraints natively using GPU-accelerated MuJoCo (MJX), preserving the hardware's mechanical nonlinear properties during training. We benchmark our RL approach against a model predictive controller (MPC), demonstrating better surface generalization and performance in real-world zero-shot deployment. This work highlights the computational approaches and performance benefits of fully simulating parallel mechanisms in end-to-end learning pipelines for legged humanoids. Project code with parallel mechanisms: https://github.com/alvister88/og_bruce
comment: Proceeding to the IEEE Humanoid Conference 2025
Robust Offline Reinforcement Learning with Linearly Structured f-Divergence Regularization ICML 2025
The Robust Regularized Markov Decision Process (RRMDP) is proposed to learn policies robust to dynamics shifts by adding regularization to the transition dynamics in the value function. Existing methods mostly use unstructured regularization, potentially leading to conservative policies under unrealistic transitions. To address this limitation, we propose a novel framework, the $d$-rectangular linear RRMDP ($d$-RRMDP), which introduces latent structures into both transition kernels and regularization. We focus on offline reinforcement learning, where an agent learns policies from a precollected dataset in the nominal environment. We develop the Robust Regularized Pessimistic Value Iteration (R2PVI) algorithm that employs linear function approximation for robust policy learning in $d$-RRMDPs with $f$-divergence based regularization terms on transition kernels. We provide instance-dependent upper bounds on the suboptimality gap of R2PVI policies, demonstrating that these bounds are influenced by how well the dataset covers state-action spaces visited by the optimal robust policy under robustly admissible transitions. We establish information-theoretic lower bounds to verify that our algorithm is near-optimal. Finally, numerical experiments validate that R2PVI learns robust policies and exhibits superior computational efficiency compared to baseline methods.
comment: 41 pages, 3 figures, 2 tables. Published in Proceedings of the 42nd International Conference on Machine Learning (ICML 2025)
Online Adaptation for Flying Quadrotors in Tight Formations
The task of flying in tight formations is challenging for teams of quadrotors because the complex aerodynamic wake interactions can destabilize individual team members as well as the team. Furthermore, these aerodynamic effects are highly nonlinear and fast-paced, making them difficult to model and predict. To overcome these challenges, we present L1 KNODE-DW MPC, an adaptive, mixed expert learning based control framework that allows individual quadrotors to accurately track trajectories while adapting to time-varying aerodynamic interactions during formation flights. We evaluate L1 KNODE-DW MPC in two different three-quadrotor formations and show that it outperforms several MPC baselines. Our results show that the proposed framework is capable of enabling the three-quadrotor team to remain vertically aligned in close proximity throughout the flight. These findings show that the L1 adaptive module compensates for unmodeled disturbances most effectively when paired with an accurate dynamics model. A video showcasing our framework and the physical experiments is available here: https://youtu.be/9QX1Q5Ut9Rs
comment: 10 pages, 4 figures
Understanding the Application of Utility Theory in Robotics and Artificial Intelligence: A Survey
As a unifying concept in economics, game theory, and operations research, and also in the Robotics and AI field, utility is used to evaluate the level of individual needs, preferences, and interests. Especially for decision-making and learning in multi-agent/robot systems (MAS/MRS), a suitable utility model can guide agents in choosing reasonable strategies to meet their current needs and in learning to cooperate and organize their behaviors, optimizing the system's utility, building stable and reliable relationships, and guaranteeing each group member's sustainable development, similar to human society. Although these systems' complex, large-scale, and long-term behaviors are strongly determined by the fundamental characteristics of the underlying relationships, there has been little discussion of the theoretical aspects of the mechanisms and of the fields of application in Robotics and AI. This paper introduces a utility-oriented needs paradigm to describe and evaluate inter and outer relationships among agents' interactions. Then, we survey existing literature in relevant fields to support it and propose several promising research directions along with some open problems deemed necessary for further investigation.
comment: I am not sure whether withdrawing this paper is suitable. However, right now this paper has significant changes in its topic and author. So, I do not want to lead to any confusion about this paper. In the future, it will have a new version. I hope people will not have issues and confusion about the older one
Object-Centric Kinodynamic Planning for Nonprehensile Robot Rearrangement Manipulation
Nonprehensile actions such as pushing are crucial for addressing multi-object rearrangement problems. Many traditional methods generate robot-centric actions, which differ from intuitive human strategies and are typically inefficient. To this end, we adopt an object-centric planning paradigm and propose a unified framework for addressing a range of large-scale, physics-intensive nonprehensile rearrangement problems challenged by modeling inaccuracies and real-world uncertainties. By assuming each object can actively move without being driven by robot interactions, our planner first computes desired object motions, which are then realized through robot actions generated online via a closed-loop pushing strategy. Through extensive experiments and in comparison with state-of-the-art baselines in both simulation and on a physical robot, we show that our object-centric planning framework can generate more intuitive and task-effective robot actions with significantly improved efficiency. In addition, we propose a benchmarking protocol to standardize and facilitate future research in nonprehensile rearrangement.
PoseDiff: A Unified Diffusion Model Bridging Robot Pose Estimation and Video-to-Action Control
We present PoseDiff, a conditional diffusion model that unifies robot state estimation and control within a single framework. At its core, PoseDiff maps raw visual observations into structured robot states (such as 3D keypoints or joint angles) from a single RGB image, eliminating the need for multi-stage pipelines or auxiliary modalities. Building upon this foundation, PoseDiff extends naturally to video-to-action inverse dynamics: by conditioning on sparse video keyframes generated by world models, it produces smooth and continuous long-horizon action sequences through an overlap-averaging strategy. This unified design enables scalable and efficient integration of perception and control. On the DREAM dataset, PoseDiff achieves state-of-the-art accuracy and real-time performance for pose estimation. On Libero-Object manipulation tasks, it substantially improves success rates over existing inverse dynamics modules, even under strict offline settings. Together, these results show that PoseDiff provides a scalable, accurate, and efficient bridge between perception, planning, and control in embodied AI. The video visualization results can be found on the project page: https://haozhuo-zhang.github.io/PoseDiff-project-page/.
comment: The experimental setup and metrics lack rigor, affecting the fairness of the comparisons
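The overlap-averaging strategy mentioned in the PoseDiff abstract can be illustrated with a minimal sketch. It assumes fixed-length action chunks predicted every `stride` steps and a plain unweighted mean over overlapping predictions; the function name, the chunk layout, and the weighting are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def overlap_average(chunks, stride):
    """Blend fixed-length action chunks that start every `stride` steps.

    chunks: list of arrays shaped (chunk_len, action_dim).
    Returns an array shaped (total_len, action_dim) in which every time
    step is the mean of all chunk predictions that cover it.
    """
    chunk_len, action_dim = chunks[0].shape
    if stride > chunk_len:
        raise ValueError("stride larger than chunk length leaves gaps")
    total_len = stride * (len(chunks) - 1) + chunk_len
    summed = np.zeros((total_len, action_dim))
    counts = np.zeros((total_len, 1))
    for i, chunk in enumerate(chunks):
        start = i * stride
        summed[start:start + chunk_len] += chunk
        counts[start:start + chunk_len] += 1
    return summed / counts

# Two hypothetical 4-step chunks of a 1-D action, overlapping by 2 steps.
first = np.full((4, 1), 1.0)
second = np.full((4, 1), 3.0)
blended = overlap_average([first, second], stride=2)
```

In the overlapping steps the two predictions are averaged, which smooths the transition between consecutive chunks.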
Multiagent Systems
A General Incentives-Based Framework for Fairness in Multi-agent Resource Allocation
We introduce the General Incentives-based Framework for Fairness (GIFF), a novel approach for fair multi-agent resource allocation that infers fair decision-making from standard value functions. In resource-constrained settings, agents optimizing for efficiency often create inequitable outcomes. Our approach leverages the action-value (Q-)function to balance efficiency and fairness without requiring additional training. Specifically, our method computes a local fairness gain for each action and introduces a counterfactual advantage correction term to discourage over-allocation to already well-off agents. This approach is formalized within a centralized control setting, where an arbitrator uses the GIFF-modified Q-values to solve an allocation problem. Empirical evaluations across diverse domains (dynamic ridesharing, homelessness prevention, and a complex job allocation task) demonstrate that our framework consistently outperforms strong baselines and can discover far-sighted, equitable policies. The framework's effectiveness is supported by a theoretical foundation; we prove its fairness surrogate is a principled lower bound on the true fairness improvement and that its trade-off parameter offers monotonic tuning. Our findings establish GIFF as a robust and principled framework for leveraging standard reinforcement learning components to achieve more equitable outcomes in complex multi-agent systems.
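The general idea of adjusting Q-values with a fairness gain can be sketched as follows. This is an illustration under stated assumptions, not GIFF itself: the fairness gain is taken to be the reduction in variance of accumulated utility, the trade-off weight is called `beta`, a single resource goes to one agent, and the counterfactual advantage correction from the abstract is omitted.

```python
from statistics import pvariance

def fairness_gain(utilities, agent, reward):
    """Reduction in variance of accumulated utility if `agent` gets `reward`."""
    after = list(utilities)
    after[agent] += reward
    return pvariance(utilities) - pvariance(after)

def pick_agent(q_values, utilities, rewards, beta):
    """Return the agent index with the highest fairness-adjusted score."""
    scores = [
        q + beta * fairness_gain(utilities, i, rewards[i])
        for i, q in enumerate(q_values)
    ]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical numbers: agent 0 is slightly more efficient but already well off.
q_values = [1.0, 0.9]
utilities = [10.0, 2.0]
rewards = [1.0, 1.0]
efficient_choice = pick_agent(q_values, utilities, rewards, beta=0.0)
fair_choice = pick_agent(q_values, utilities, rewards, beta=1.0)
```

With `beta=0.0` the choice follows the raw Q-values; raising `beta` shifts the allocation toward the worse-off agent without retraining the value function.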
Agentic AI Home Energy Management System: A Large Language Model Framework for Residential Load Scheduling
The electricity sector transition requires substantial increases in residential demand response capacity, yet Home Energy Management Systems (HEMS) adoption remains limited by user interaction barriers requiring translation of everyday preferences into technical parameters. While large language models have been applied to energy systems as code generators and parameter extractors, no existing implementation deploys LLMs as autonomous coordinators managing the complete workflow from natural language input to multi-appliance scheduling. This paper presents an agentic AI HEMS where LLMs autonomously coordinate multi-appliance scheduling from natural language requests to device control, achieving optimal scheduling without example demonstrations. A hierarchical architecture combining one orchestrator with three specialist agents uses the ReAct pattern for iterative reasoning, enabling dynamic coordination without hardcoded workflows while integrating Google Calendar for context-aware deadline extraction. Evaluation across three open-source models using real Austrian day-ahead electricity prices reveals substantial capability differences. Llama-3.3-70B successfully coordinates all appliances across all scenarios to match cost-optimal benchmarks computed via mixed-integer linear programming, while other models achieve perfect single-appliance performance but struggle to coordinate all appliances simultaneously. Progressive prompt engineering experiments demonstrate that analytical query handling without explicit guidance remains unreliable despite models' general reasoning capabilities. We open-source the complete system including orchestration logic, agent prompts, tools, and web interfaces to enable reproducibility, extension, and future research.
comment: 34 pages, 9 figures. Code available at https://github.com/RedaElMakroum/agentic-ai-hems
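For a single uninterruptible appliance, a cost-optimal reference schedule can be found by brute force, which gives a feel for the kind of benchmark the abstract describes. The sketch below is not the paper's mixed-integer formulation; the function name, slot-based prices, and deadline convention are assumptions for illustration.

```python
def cheapest_window(prices, duration, deadline):
    """Return (start_slot, total_cost) of the cheapest uninterrupted run.

    prices: price per time slot; duration: slots the appliance must run;
    deadline: the run must finish at or before this slot index.
    """
    last_start = min(deadline, len(prices)) - duration
    if duration <= 0 or last_start < 0:
        raise ValueError("no feasible window")
    costs = [sum(prices[s:s + duration]) for s in range(last_start + 1)]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]

# Hypothetical hourly prices; the appliance needs 2 consecutive slots.
prices = [5, 3, 1, 2, 8, 9]
start, cost = cheapest_window(prices, duration=2, deadline=6)
early_start, early_cost = cheapest_window(prices, duration=2, deadline=2)
```

Coordinating several appliances with shared constraints is where an exhaustive search stops scaling and a solver-based formulation becomes the natural reference.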
Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems
While Multi-Agent Systems (MAS) excel at complex tasks, their growing autonomy with operational complexity often leads to critical inefficiencies, such as excessive token consumption and failures arising from misinformation. Existing methods primarily focus on post-hoc failure attribution, lacking proactive, real-time interventions to enhance robustness and efficiency. To this end, we introduce SupervisorAgent, a lightweight and modular framework for runtime, adaptive supervision that operates without altering the base agent's architecture. Triggered by an LLM-free adaptive filter, SupervisorAgent intervenes at critical junctures to proactively correct errors, guide inefficient behaviors, and purify observations. On the challenging GAIA benchmark, SupervisorAgent reduces the token consumption of the Smolagent framework by an average of 29.45% without compromising its success rate. Extensive experiments across five additional benchmarks (math reasoning, code generation, and question answering) and various SoTA foundation models validate the broad applicability and robustness of our approach. The code is available at https://github.com/LINs-lab/SupervisorAgent.
Proxemics and Permeability of the Pedestrian Group
People tend to walk in groups, and interactions with those groups have a significant impact on crowd behavior and pedestrian traffic dynamics. Social norms can be seen as unwritten rules regulating people's interactions in social settings. This article studies people's interactions with groups and the emergence of group proxemics. Group zones, zone occupancy counts, and people's clearance from the group are studied using naturalistic data. The analysis indicates the potential presence of three different zones in addition to the public zone. People tend to remain in the public zone and only progressively get closer to groups, and those closer approaches happen at a low frequency and for brief periods of time.
Life-cycle Modeling and the Walking Behavior of the Pedestrian-Group as an Emergent Agent: With Empirical Data on the Cohesion of the Group Formation
This article investigates the pedestrian group as an emergent agent. The article explores empirical data to derive emergent agency and formation state spaces and outline recurring patterns of walking behavior. In this analysis, pedestrian trajectories extracted from surveillance videos are used along with manually annotated pedestrian group memberships. We conducted manual expert evaluation of observed groups, produced new manual annotations for relevant events pertaining to group behavior, and extracted metrics relevant to group formation. This information, along with quantitative analysis, was used to model the life-cycle and formation of the group agent. Those models give structure to expectations around walking behavior of groups, from pedestrians walking independently to the emergence of a collective intention where group members tended to maintain a bounded distance between each other. Disturbances to this bounded distance often happened in association with changes in either their agency or their formation states. We summarized the patterns of behavior along with the sequences of state transitions into abstract patterns, which can aid in the development of more detailed group agents in simulation and in the design of engineering systems to interact with such groups.
Adaptive Context Length Optimization with Low-Frequency Truncation for Multi-Agent Reinforcement Learning
Recently, deep multi-agent reinforcement learning (MARL) has demonstrated promising performance on challenging tasks, such as those involving long-term dependencies and non-Markovian environments. Its success is partly attributed to conditioning policies on a large fixed context length. However, such large fixed context lengths may lead to limited exploration efficiency and redundant information. In this paper, we propose a novel MARL framework to obtain adaptive and effective contextual information. Specifically, we design a central agent that dynamically optimizes context length via temporal gradient analysis, enhancing exploration to facilitate convergence to global optima in MARL. Furthermore, to enhance the adaptive optimization capability of the context length, we present an efficient input representation for the central agent, which effectively filters redundant information. By leveraging a Fourier-based low-frequency truncation method, we extract global temporal trends across decentralized agents, providing an effective and efficient representation of the MARL environment. Extensive experiments demonstrate that the proposed method achieves state-of-the-art (SOTA) performance on long-term dependency tasks, including PettingZoo, MiniGrid, Google Research Football (GRF), and StarCraft Multi-Agent Challenge v2 (SMACv2).
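Fourier-based low-frequency truncation, in its generic form, keeps only the lowest-frequency coefficients of a temporal signal and discards the rest. The sketch below shows that generic operation on a 1-D series with NumPy; the function name, the `keep` parameter, and the synthetic signal are assumptions, and the paper's actual input representation may differ.

```python
import numpy as np

def low_frequency_truncate(series, keep):
    """Keep only the `keep` lowest-frequency Fourier coefficients of a 1-D series."""
    spectrum = np.fft.rfft(series)
    spectrum[keep:] = 0.0  # zero out every bin at or above index `keep`
    return np.fft.irfft(spectrum, n=len(series))

# Synthetic example: a slow trend plus a fast oscillation.
n = 64
t = np.arange(n)
trend = np.sin(2 * np.pi * t / n)               # one cycle over the window
ripple = 0.5 * np.sin(2 * np.pi * 20 * t / n)   # twenty cycles over the window
smoothed = low_frequency_truncate(trend + ripple, keep=4)
```

Because the fast component sits in frequency bin 20, keeping only bins 0 to 3 recovers the slow trend and yields a compact summary of the history.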
The Geometry of Dialogue: Graphing Language Models to Reveal Synergistic Teams for Multi-Agent Collaboration
While a multi-agent approach based on large language models (LLMs) represents a promising strategy to surpass the capabilities of single models, its success is critically dependent on synergistic team composition. However, forming optimal teams is a significant challenge, as the inherent opacity of most models obscures the internal characteristics necessary for effective collaboration. In this paper, we propose an interaction-centric framework for automatic team composition that does not require any prior knowledge of the models, including their internal architectures, training data, or task performances. Our method constructs a "language model graph" that maps relationships between models from the semantic coherence of pairwise conversations, and then applies community detection to identify synergistic model clusters. Our experiments with diverse LLMs demonstrate that the proposed method discovers functionally coherent groups that reflect their latent specializations. Priming conversations with specific topics identified synergistic teams that outperform random baselines on downstream benchmarks and achieve accuracy comparable to that of manually curated teams based on known model specializations. Our findings provide a new basis for the automated design of collaborative multi-agent LLM teams.
A Research Roadmap for Augmenting Software Engineering Processes and Software Products with Generative AI
Generative AI (GenAI) is rapidly transforming software engineering (SE) practices, influencing how SE processes are executed, as well as how software systems are developed, operated, and evolved. This paper applies design science research to build a roadmap for GenAI-augmented SE. The process consists of three cycles that incrementally integrate multiple sources of evidence, including collaborative discussions from the FSE 2025 "Software Engineering 2030" workshop, rapid literature reviews, and external feedback sessions involving peers. McLuhan's tetrads were used as a conceptual instrument to systematically capture the transforming effects of GenAI on SE processes and software products. The resulting roadmap identifies four fundamental forms of GenAI augmentation in SE and systematically characterizes their related research challenges and opportunities. These insights are then consolidated into a set of future research directions. By grounding the roadmap in a rigorous multi-cycle process and cross-validating it among independent author teams and peers, the study provides a transparent and reproducible foundation for analyzing how GenAI affects SE processes, methods and tools, and for framing future research within this rapidly evolving area. Based on these findings, the article finally makes ten predictions for SE in the year 2030.
Network-Constrained Policy Optimization for Adaptive Multi-agent Vehicle Routing
Traffic congestion in urban road networks leads to longer trip times and higher emissions, especially during peak periods. While the Shortest Path First (SPF) algorithm is optimal for a single vehicle in a static network, it performs poorly in dynamic, multi-vehicle settings, often worsening congestion by routing all vehicles along identical paths. We address dynamic vehicle routing through a multi-agent reinforcement learning (MARL) framework for coordinated, network-aware fleet navigation. We first propose Adaptive Navigation (AN), a decentralized MARL model where each intersection agent provides routing guidance based on (i) local traffic and (ii) neighborhood state modeled using Graph Attention Networks (GAT). To improve scalability in large networks, we further propose Hierarchical Hub-based Adaptive Navigation (HHAN), an extension of AN that assigns agents only to key intersections (hubs). Vehicles are routed hub-to-hub under agent control, while SPF handles micro-routing within each hub region. For hub coordination, HHAN adopts centralized training with decentralized execution (CTDE) under the Attentive Q-Mixing (A-QMIX) framework, which aggregates asynchronous vehicle decisions via attention. Hub agents use flow-aware state features that combine local congestion and predictive dynamics for proactive routing. Experiments on synthetic grids and real urban maps (Toronto, Manhattan) show that AN reduces average travel time versus SPF and learning baselines, maintaining 100% routing success. HHAN scales to networks with hundreds of intersections, achieving up to 15.9% improvement under heavy traffic. These findings highlight the potential of network-constrained MARL for scalable, coordinated, and congestion-aware routing in intelligent transportation systems.
comment: 29 pages, 12 figures. Fazel Arasteh and Arian Haghparast contributed equally to this research. Submitted to ACM Transactions on Spatial Algorithms and Systems (TSAS). The code for this work is publicly available at https://github.com/Arianhgh/HHAN
Cooperative Integrated Estimation-Guidance for Simultaneous Interception of Moving Targets
This paper proposes a cooperative integrated estimation-guidance framework for simultaneous interception of a non-maneuvering target using a team of unmanned autonomous vehicles, assuming only a subset of vehicles are equipped with dedicated sensors to measure the target's states. Unlike earlier approaches that focus solely on either estimation or guidance design, the proposed framework unifies both within a cooperative architecture. To circumvent the limitation posed by heterogeneity in target observability, sensorless vehicles estimate the target's state by leveraging information exchanged with neighboring agents over a directed communication topology through a prescribed-time observer. The proposed approach employs true proportional navigation guidance (TPNG), which uses an exact time-to-go formulation and is applicable across a wide spectrum of target motions. Furthermore, a prescribed-time observer and a prescribed-time controller are employed to achieve convergence to the target's true state and consensus in time-to-go, respectively, within predefined times. Simulations demonstrate the effectiveness of the proposed framework under various engagement scenarios.
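As background for the guidance law named in the abstract, the textbook planar form of true proportional navigation commands an acceleration normal to the line of sight, proportional to the closing speed and the line-of-sight rate. The sketch below implements only that textbook relation; the function name, the navigation gain of 3, and the example numbers are assumptions, and the paper's exact time-to-go formulation and cooperative terms are not reproduced.

```python
import math

def tpn_acceleration(rel_pos, rel_vel, nav_gain=3.0):
    """Lateral acceleration command normal to the line of sight (2-D).

    rel_pos, rel_vel: target minus interceptor position and velocity.
    """
    rx, ry = rel_pos
    vx, vy = rel_vel
    rng = math.hypot(rx, ry)
    los_rate = (rx * vy - ry * vx) / rng**2      # line-of-sight rate, rad/s
    closing_speed = -(rx * vx + ry * vy) / rng   # positive when range shrinks
    return nav_gain * closing_speed * los_rate

# Hypothetical engagement: 1 km range, closing at 100 m/s, 10 m/s crossing speed.
command = tpn_acceleration(rel_pos=(1000.0, 0.0), rel_vel=(-100.0, 10.0))
```

A zero line-of-sight rate means the interceptor is already on a collision course, so the command vanishes.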
Design for One, Deploy for Many: Navigating Tree Mazes with Multiple Agents
Maze-like environments, such as cave and pipe networks, pose unique challenges for multiple robots to coordinate, including communication constraints and congestion. To address these challenges, we propose a distributed multi-agent maze traversal algorithm for environments that can be represented by acyclic graphs. It uses a leader-switching mechanism where one agent, assuming a head role, employs any single-agent maze solver while the other agents each choose an agent to follow. The head role gets transferred to neighboring agents where necessary, ensuring it follows the same path as a single agent would. The multi-agent maze traversal algorithm is evaluated in simulations with groups of up to 300 agents, various maze sizes, and multiple single-agent maze solvers. It is compared against strategies that are naive, or assume either global communication or full knowledge of the environment. The algorithm outperforms the naive strategy in terms of makespan and sum-of-fuel. It is superior to the global-communication strategy in terms of makespan but is inferior to it in terms of sum-of-fuel. The findings suggest it is asymptotically equivalent to the full-knowledge strategy with respect to either metric. Moreover, real-world experiments with up to 20 Pi-puck robots confirm the feasibility of the approach.
comment: 7 pages, 7 figures, to be published in MRS 2025
The Denario project: Deep knowledge AI agents for scientific discovery
We present Denario, an AI multi-agent system designed to serve as a scientific research assistant. Denario can perform many different tasks, such as generating ideas, checking the literature, developing research plans, writing and executing code, making plots, and drafting and reviewing a scientific paper. The system has a modular architecture, allowing it to handle specific tasks, such as generating an idea, or carrying out end-to-end scientific analysis using Cmbagent as a deep-research backend. In this work, we describe Denario and its modules in detail, and illustrate its capabilities by presenting multiple papers it generated in many different scientific disciplines, such as astrophysics, biology, biophysics, biomedical informatics, chemistry, material science, mathematical physics, medicine, neuroscience and planetary science. Denario also excels at combining ideas from different disciplines, and we illustrate this by showing a paper that applies methods from quantum physics and machine learning to astrophysical data. We report the evaluations performed on these papers by domain experts, who provided both numerical scores and review-like feedback. We then highlight the strengths, weaknesses, and limitations of the current system. Finally, we discuss the ethical implications of AI-driven research and reflect on how such technology relates to the philosophy of science. We publicly release the code at https://github.com/AstroPilot-AI/Denario. A Denario demo can also be run directly on the web at https://huggingface.co/spaces/astropilot-ai/Denario, and the full app will be deployed on the cloud.
comment: 272 pages. Examples of 11 AI-generated paper drafts from different scientific disciplines. Code publicly available at https://github.com/AstroPilot-AI/Denario
Urban-MAS: Human-Centered Urban Prediction with LLM-Based Multi-Agent System SP
Urban Artificial Intelligence (Urban AI) has advanced human-centered urban tasks such as perception prediction and human dynamics. Large Language Models (LLMs) can integrate multimodal inputs to address heterogeneous data in complex urban systems but often underperform on domain-specific tasks. Urban-MAS, an LLM-based Multi-Agent System (MAS) framework, is introduced for human-centered urban prediction under zero-shot settings. It includes three agent types: Predictive Factor Guidance Agents, which prioritize key predictive factors to guide knowledge extraction and enhance the effectiveness of compressed urban knowledge in LLMs; Reliable UrbanInfo Extraction Agents, which improve robustness by comparing multiple outputs, validating consistency, and re-extracting when conflicts occur; and Multi-UrbanInfo Inference Agents, which integrate extracted multi-source information across dimensions for prediction. Experiments on running-amount prediction and urban perception across Tokyo, Milan, and Seattle demonstrate that Urban-MAS substantially reduces errors compared to single-LLM baselines. Ablation studies indicate that Predictive Factor Guidance Agents are most critical for enhancing predictive performance, positioning Urban-MAS as a scalable paradigm for human-centered urban AI prediction. Code is available on the project website: https://github.com/THETUREHOOHA/UrbanMAS
comment: Accepted to The 3rd ACM SIGSPATIAL International Workshop on Advances in Urban AI (UrbanAI'25)
MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks NeurIPS 2025
The rapid advancement of Large Language Models (LLMs) has stimulated interest in multi-agent collaboration for addressing complex medical tasks. However, the practical advantages of multi-agent collaboration approaches remain insufficiently understood. Existing evaluations often lack generalizability, failing to cover diverse tasks reflective of real-world clinical practice, and frequently omit rigorous comparisons against both single-LLM-based and established conventional methods. To address this critical gap, we introduce MedAgentBoard, a comprehensive benchmark for the systematic evaluation of multi-agent collaboration, single-LLM, and conventional approaches. MedAgentBoard encompasses four diverse medical task categories: (1) medical (visual) question answering, (2) lay summary generation, (3) structured Electronic Health Record (EHR) predictive modeling, and (4) clinical workflow automation, across text, medical images, and structured EHR data. Our extensive experiments reveal a nuanced landscape: while multi-agent collaboration demonstrates benefits in specific scenarios, such as enhancing task completeness in clinical workflow automation, it does not consistently outperform advanced single LLMs (e.g., in textual medical QA) or, critically, specialized conventional methods that generally maintain better performance in tasks like medical VQA and EHR-based prediction. MedAgentBoard offers a vital resource and actionable insights, emphasizing the necessity of a task-specific, evidence-based approach to selecting and developing AI solutions in medicine. It underscores that the inherent complexity and overhead of multi-agent collaboration must be carefully weighed against tangible performance gains. All code, datasets, detailed prompts, and experimental results are open-sourced at https://medagentboard.netlify.app/.
comment: Accepted by NeurIPS 2025 Datasets & Benchmarks Track
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
Academic poster generation is a crucial yet challenging task in scientific communication, requiring the compression of long-context interleaved documents into a single, visually coherent page. To address this challenge, we introduce the first benchmark and metric suite for poster generation, which pairs recent conference papers with author-designed posters and evaluates outputs on (i) Visual Quality: semantic alignment with human posters, (ii) Textual Coherence: language fluency, (iii) Holistic Assessment: six fine-grained aesthetic and informational criteria scored by a VLM-as-judge, and notably (iv) PaperQuiz: the poster's ability to convey core paper content as measured by VLMs answering generated quizzes. Building on this benchmark, we propose PosterAgent, a top-down, visual-in-the-loop multi-agent pipeline: the (a) Parser distills the paper into a structured asset library; the (b) Planner aligns text-visual pairs into a binary-tree layout that preserves reading order and spatial balance; and the (c) Painter-Commenter loop refines each panel by executing rendering code and using VLM feedback to eliminate overflow and ensure alignment. In our comprehensive evaluation, we find that GPT-4o outputs, though visually appealing at first glance, often exhibit noisy text and poor PaperQuiz scores, and we find that reader engagement is the primary aesthetic bottleneck, as human-designed posters rely largely on visual semantics to convey meaning. Our fully open-source variants (e.g. based on the Qwen-2.5 series) outperform existing 4o-driven multi-agent systems across nearly all metrics, while using 87% fewer tokens. It transforms a 22-page paper into a finalized yet editable .pptx poster, all for just $0.005. These findings chart clear directions for the next generation of fully automated poster-generation models. The code and datasets are available at https://github.com/Paper2Poster/Paper2Poster.
comment: Project Page: https://github.com/Paper2Poster/Paper2Poster
Understanding the Application of Utility Theory in Robotics and Artificial Intelligence: A Survey
Utility is a unifying concept in economics, game theory, and operations research, and in robotics and AI it is likewise used to evaluate the level of individual needs, preferences, and interests. Especially for decision-making and learning in multi-agent/robot systems (MAS/MRS), a suitable utility model can guide agents in choosing reasonable strategies to meet their current needs and in learning to cooperate and organize their behaviors, thereby optimizing the system's utility, building stable and reliable relationships, and guaranteeing each group member's sustainable development, much as in human society. Although these systems' complex, large-scale, and long-term behaviors are strongly determined by the fundamental characteristics of the underlying relationships, there has been little discussion of the theoretical aspects of the mechanisms and of the fields of application in robotics and AI. This paper introduces a utility-oriented needs paradigm to describe and evaluate inter and outer relationships among agents' interactions. We then survey existing literature in relevant fields to support it and propose several promising research directions, along with open problems that call for further investigation.
comment: I am not sure whether withdrawing this paper is suitable. However, right now this paper has significant changes in its topic and author. So, I do not want to lead to any confusion about this paper. In the future, it will have a new version. I hope people will not have issues and confusion about the older one
LAFA: Agentic LLM-Driven Federated Analytics over Decentralized Data Sources
Large Language Models (LLMs) have shown great promise in automating data analytics tasks by interpreting natural language queries and generating multi-operation execution plans. However, existing LLM-agent-based analytics frameworks operate under the assumption of centralized data access, offering little to no privacy protection. In contrast, federated analytics (FA) enables privacy-preserving computation across distributed data sources, but lacks support for natural language input and requires structured, machine-readable queries. In this work, we present LAFA, the first system that integrates LLM-agent-based data analytics with FA. LAFA introduces a hierarchical multi-agent architecture that accepts natural language queries and transforms them into optimized, executable FA workflows. A coarse-grained planner first decomposes complex queries into sub-queries, while a fine-grained planner maps each subquery into a Directed Acyclic Graph of FA operations using prior structural knowledge. To improve execution efficiency, an optimizer agent rewrites and merges multiple DAGs, eliminating redundant operations and minimizing computational and communicational overhead. Our experiments demonstrate that LAFA consistently outperforms baseline prompting strategies by achieving higher execution plan success rates and reducing resource-intensive FA operations by a substantial margin. This work establishes a practical foundation for privacy-preserving, LLM-driven analytics that supports natural language input in the FA setting.
comment: This paper has been accepted by the 16th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2025)
Robotics
STITCH 2.0: Extending Augmented Suturing with EKF Needle Estimation and Thread Management
Surgical suturing is a high-precision task that impacts patient healing and scarring. Suturing skill varies widely between surgeons, highlighting the need for robot assistance. Previous robot suturing works, such as STITCH 1.0 [1], struggle to fully close wounds due to inaccurate needle tracking and poor thread management. To address these challenges, we present STITCH 2.0, an elevated augmented dexterity pipeline with seven improvements including: improved EKF needle pose estimation, new thread untangling methods, and an automated 3D suture alignment algorithm. Experimental results over 15 trials find that STITCH 2.0 on average achieves 74.4% wound closure with 4.87 sutures per trial, representing 66% more sutures in 38% less time compared to the previous baseline. When two human interventions are allowed, STITCH 2.0 averages six sutures with 100% wound closure rate. Project website: https://stitch-2.github.io/
comment: Published in RA-L 2025
GeT-USE: Learning Generalized Tool Usage for Bimanual Mobile Manipulation via Simulated Embodiment Extensions
The ability to use random objects as tools in a generalizable manner is a missing piece in robots' intelligence today, one that would boost their versatility and problem-solving capabilities. State-of-the-art robotic tool usage methods have focused on procedurally generating or crowd-sourcing datasets of tools for a task to learn how to grasp and manipulate them for that task. However, these methods assume that only one object is provided and that it is possible, with the correct grasp, to perform the task; they are not capable of identifying, grasping, and using the best object for a task when many are available, especially when the optimal tool is absent. In this work, we propose GeT-USE, a two-step procedure that learns to perform real-robot generalized tool usage by learning first to extend the robot's embodiment in simulation and then transferring the learned strategies to real-robot visuomotor policies. Our key insight is that by exploring a robot's embodiment extensions (i.e., building new end-effectors) in simulation, the robot can identify the general tool geometries most beneficial for a task. This learned geometric knowledge can then be distilled to perform generalized tool usage tasks by selecting and using the best available real-world object as a tool. On a real robot with 22 degrees of freedom (DOFs), GeT-USE outperforms state-of-the-art methods by 30-60% in success rate across three vision-based bimanual mobile manipulation tool-usage tasks.
comment: 8 pages, 7 figures
Modeling Collapse of Steered Vine Robots Under Their Own Weight
Soft, vine-inspired growing robots that move by eversion are highly mobile in confined environments, but, when faced with gaps in the environment, they may collapse under their own weight while navigating a desired path. In this work, we present a comprehensive collapse model that can predict the collapse length of steered robots in any shape using true shape information and tail tension. We validate this model by collapsing several unsteered robots without true shape information. The model accurately predicts the trends of those experiments. We then attempt to collapse a robot steered with a single actuator at different orientations. Our models accurately predict collapse when it occurs. Finally, we demonstrate how this could be used in the field by having a robot attempt a gap-crossing task with and without inflating its actuators. The robot needs its actuators inflated to cross the gap without collapsing, which our model supports. Our model has been specifically tested on straight and series pouch motor-actuated robots made of non-stretchable material, but it could be applied to other robot variations. This work enables us to model the robot's collapse behavior in any open environment and understand the parameters it needs to succeed in 3D navigation tasks.
Robotic Assistant: Completing Collaborative Tasks with Dexterous Vision-Language-Action Models
We adapt a pre-trained Vision-Language-Action (VLA) model (Open-VLA) for dexterous human-robot collaboration with minimal language prompting. Our approach adds (i) FiLM conditioning to visual backbones for task-aware perception, (ii) an auxiliary intent head that predicts collaborator hand pose and target cues, and (iii) action-space post-processing that predicts compact deltas (position/rotation) and PCA-reduced finger joints before mapping to full commands. Using a multi-view, teleoperated Franka and Mimic-hand dataset augmented with MediaPipe hand poses, we demonstrate that delta actions are well-behaved and that four principal components explain ~96% of hand-joint variance. Ablations identify action post-processing as the primary performance driver; auxiliary intent helps, FiLM is mixed, and a directional motion loss is detrimental. A real-time stack (~0.3 s latency on one RTX 4090) composes "pick-up" and "pass" into a long-horizon behavior. We surface "trainer overfitting" to specific demonstrators as the key limitation.
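The claim that a few principal components capture most hand-joint variance can be checked with a short explained-variance computation. The sketch below is illustrative only: the data are synthetic (16 hypothetical joint angles driven by 4 latent synergies), the function names are invented for this example, and it is not the authors' pipeline or dataset.

```python
import numpy as np


def explained_variance_ratio(joints: np.ndarray) -> np.ndarray:
    """Fraction of total variance captured by each principal component."""
    centered = joints - joints.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variances along the principal directions.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


def components_needed(ratios: np.ndarray, target: float) -> int:
    """Smallest number of leading components whose cumulative ratio reaches target."""
    count = int(np.searchsorted(np.cumsum(ratios), target)) + 1
    return min(count, len(ratios))


# Synthetic stand-in for hand-joint data (an assumption, not the paper's data):
# 16 joint angles generated from 4 latent synergies plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 4))
mixing = rng.normal(size=(4, 16))
joints = latent @ mixing + 0.05 * rng.normal(size=(2000, 16))

ratios = explained_variance_ratio(joints)
k = components_needed(ratios, 0.95)
print(f"components for 95% variance: {k}")
```

On real hand data, the same two functions would show how many components are needed to reach a chosen variance threshold before the reduced joints are mapped back to full commands.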
Collision avoidance and path finding in a robotic mobile fulfillment system using multi-objective meta-heuristics
Multi-Agent Path Finding (MAPF) has gained significant attention, with most research focusing on minimizing collisions and travel time. This paper also considers energy consumption in the path planning of automated guided vehicles (AGVs). It addresses two main challenges: i) resolving collisions between AGVs and ii) assigning tasks to AGVs. We propose a new collision avoidance strategy that takes both energy use and travel time into account. For task assignment, we present two multi-objective algorithms: Non-Dominated Sorting Genetic Algorithm (NSGA) and Adaptive Large Neighborhood Search (ALNS). Comparative evaluations show that these proposed methods perform better than existing approaches in both collision avoidance and task assignment.
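Trading off energy use against travel time means comparing candidate plans by Pareto dominance, the idea underlying non-dominated sorting. The sketch below extracts the first non-dominated front from a handful of made-up (energy, travel time) pairs; the numbers and function names are hypothetical, and this is not the authors' NSGA or ALNS implementation.

```python
from typing import List, Tuple

Cost = Tuple[float, float]  # (energy, travel_time), both to be minimized


def dominates(a: Cost, b: Cost) -> bool:
    """a dominates b if it is no worse in every objective and better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(costs: List[Cost]) -> List[Cost]:
    """Return the candidates that no other candidate dominates (first Pareto front)."""
    return [c for c in costs if not any(dominates(o, c) for o in costs)]


# Hypothetical task assignments scored as (energy, travel time).
candidates = [(10.0, 5.0), (8.0, 7.0), (12.0, 4.0), (9.0, 8.0), (11.0, 6.0)]
front = non_dominated(candidates)
print(front)
```

Here (9.0, 8.0) is dominated by (8.0, 7.0) and (11.0, 6.0) by (10.0, 5.0), so three candidates remain as equally defensible trade-offs.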
Learning to Plan & Schedule with Reinforcement-Learned Bimanual Robot Skills
Long-horizon contact-rich bimanual manipulation presents a significant challenge, requiring complex coordination involving a mixture of parallel execution and sequential collaboration between arms. In this paper, we introduce a hierarchical framework that frames this challenge as an integrated skill planning & scheduling problem, going beyond purely sequential decision-making to support simultaneous skill invocation. Our approach is built upon a library of single-arm and bimanual primitive skills, each trained using Reinforcement Learning (RL) in GPU-accelerated simulation. We then train a Transformer-based planner on a dataset of skill compositions to act as a high-level scheduler, simultaneously predicting the discrete schedule of skills as well as their continuous parameters. We demonstrate that our method achieves higher success rates on complex, contact-rich tasks than end-to-end RL approaches and produces more efficient, coordinated behaviors than traditional sequential-only planners.
Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization
The growing success of Vision-Language-Action (VLA) models stems from the promise that pretrained Vision-Language Models (VLMs) can endow agents with transferable world knowledge and vision-language (VL) grounding, laying a foundation for action models with broader generalization. Yet when these VLMs are adapted to the action modality, it remains unclear to what extent their original VL representations and knowledge are preserved. In this work, we conduct a systematic study of representation retention during VLA fine-tuning, showing that naive action fine-tuning leads to degradation of visual representations. To characterize and measure these effects, we probe VLA's hidden representations and analyze attention maps; further, we design a set of targeted tasks and methods that contrast VLA models with their counterpart VLMs, isolating changes in VL capabilities induced by action fine-tuning. We further evaluate a range of strategies for aligning visual representations and introduce a simple yet effective method that mitigates degradation and yields improved generalization to out-of-distribution (OOD) scenarios. Taken together, our analysis clarifies the trade-off between action fine-tuning and the degradation of VL representations and highlights practical approaches to recover inherited VL capabilities. Code is publicly available: https://blind-vla-paper.github.io
comment: 13 pages, 6 figures
Incorporating Social Awareness into Control of Unknown Multi-Agent Systems: A Real-Time Spatiotemporal Tubes Approach
This paper presents a decentralized control framework that incorporates social awareness into multi-agent systems with unknown dynamics to achieve prescribed-time reach-avoid-stay tasks in dynamic environments. Each agent is assigned a social awareness index that quantifies its level of cooperation or self-interest, allowing heterogeneous social behaviors within the system. Building on the spatiotemporal tube (STT) framework, we propose a real-time STT framework that synthesizes tubes online for each agent while capturing its social interactions with others. A closed-form, approximation-free control law is derived to ensure that each agent remains within its evolving STT, thereby avoiding dynamic obstacles while also preventing inter-agent collisions in a socially aware manner, and reaching the target within a prescribed time. The proposed approach provides formal guarantees on safety and timing, and is computationally lightweight, model-free, and robust to unknown disturbances. The effectiveness and scalability of the framework are validated through simulation and hardware experiments on a 2D omnidirectional
Using VLM Reasoning to Constrain Task and Motion Planning
In task and motion planning, high-level task planning is done over an abstraction of the world to enable efficient search in long-horizon robotics problems. However, the feasibility of these task-level plans relies on the downward refinability of the abstraction into continuous motion. When a domain's refinability is poor, task-level plans that appear valid may ultimately fail during motion planning, requiring replanning and resulting in slower overall performance. Prior works mitigate this by encoding refinement issues as constraints to prune infeasible task plans. However, these approaches only add constraints upon refinement failure, expending significant search effort on infeasible branches. We propose VIZ-COAST, a method of leveraging the common-sense spatial reasoning of large pretrained Vision-Language Models to identify issues with downward refinement a priori, bypassing the need to fix these failures during planning. Experiments on two challenging TAMP domains show that our approach is able to extract plausible constraints from images and domain descriptions, drastically reducing planning times and, in some cases, eliminating downward refinement failures altogether, generalizing to a diverse range of instances from the broader domain.
comment: 8 pages, 7 figures, 1 table. Submitted to ICRA 2026
Octopus-like Reaching Motion: A Perspective Inspired by Whipping
The stereotypical reaching motion of the octopus arm has drawn growing attention for its efficient control of a highly deformable body. Previous studies suggest that its characteristic bend propagation may share underlying principles with the dynamics of a whip. This work investigates whether whip-like passive dynamics in water can reproduce the kinematic features observed in biological reaching, and examines the similarities and differences between the two. Platform-based whipping tests were performed in water and air while systematically varying material stiffness and driving speed. Image-based quantification revealed that the Ecoflex Gel 2 arm driven at 150 rpm (motor speed) reproduced curvature propagation similar to that observed in octopus reaching. However, its bend-point velocity decreased monotonically rather than exhibiting the biological bell-shaped profile, confirming that the octopus reaching movement is not merely a passive whipping behavior. The absence of propagation in air further highlights the critical role of the surrounding medium in forming octopus-like reaching motion. This study provides a new perspective for understanding biological reaching movement and offers a potential platform for future hydrodynamic research.
comment: The first two listed authors contributed equally. Yiyuan Zhang is the corresponding author
Combining Moving Mass Actuators and Manoeuvring Models for Underwater Vehicles: A Lagrangian Approach
In this paper, we present a Newton-Euler formulation of the equations of motion for underwater vehicles with an internal moving mass actuator. Furthermore, the moving mass dynamics are expressed as an extension to the manoeuvring model for underwater vehicles, originally introduced by Fossen (1991). The influence of the moving mass is described in body-frame and included as states in both an additional kinematic equation and as part of the coupled rigid-body kinetics of the underwater vehicle. The Coriolis-centripetal effects are derived from Kirchhoff's equations and the hydrostatics are derived using first principles. The proposed Newton-Euler model is validated through simulation and compared with the traditional Hamiltonian internal moving mass actuator formulation.
comment: © 2025 Alexander Rambech, Ivar Saksvik and Vahid Hassani. Accepted by IFAC for publication under a Creative Commons License CC-BY-NC-ND
SPADE: Sparsity Adaptive Depth Estimator for Zero-Shot, Real-Time, Monocular Depth Estimation in Underwater Environments
Underwater infrastructure requires frequent inspection and maintenance due to harsh marine conditions. Current reliance on human divers or remotely operated vehicles is limited by perceptual and operational challenges, especially around complex structures or in turbid water. Enhancing the spatial awareness of underwater vehicles is key to reducing piloting risks and enabling greater autonomy. To address these challenges, we present SPADE: SParsity Adaptive Depth Estimator, a monocular depth estimation pipeline that combines a pre-trained relative depth estimator with sparse depth priors to produce dense, metric-scale depth maps. Our two-stage approach first scales the relative depth map with the sparse depth points, then refines the final metric prediction with our proposed Cascade Conv-Deformable Transformer blocks. Our approach achieves improved accuracy and generalisation over state-of-the-art baselines and runs efficiently at over 15 FPS on embedded hardware, promising to support practical underwater inspection and intervention. This work has been submitted to IEEE Journal of Oceanic Engineering Special Issue of AUV 2026.
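The first stage described above, scaling a relative depth map with sparse depth points, can be pictured as a small least-squares fit. The sketch below assumes a single global scale and shift, which is a common choice but an assumption here; the abstract does not state the exact alignment used, and the data and function name are synthetic and hypothetical.

```python
import numpy as np


def fit_scale_shift(relative: np.ndarray, sparse_metric: np.ndarray, mask: np.ndarray):
    """Least-squares scale s and shift t with s * relative + t ~= sparse_metric on mask."""
    r = relative[mask]
    d = sparse_metric[mask]
    design = np.stack([r, np.ones_like(r)], axis=1)
    (s, t), *_ = np.linalg.lstsq(design, d, rcond=None)
    return s, t


# Synthetic example (assumed values, not from the paper).
rng = np.random.default_rng(0)
relative = rng.uniform(0.1, 1.0, size=(48, 64))   # relative depth prediction
true_metric = 3.0 * relative + 0.5                # ground-truth metric depth
mask = rng.random(relative.shape) < 0.02          # about 2% of pixels have a sparse prior
sparse_metric = np.where(mask, true_metric, 0.0)

s, t = fit_scale_shift(relative, sparse_metric, mask)
dense_metric = s * relative + t                   # dense metric-scale estimate
print(f"scale={s:.3f}, shift={t:.3f}")
```

A learned refinement stage, such as the transformer blocks named in the abstract, would then correct the residual errors that a global fit cannot capture.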
Solving the Right Problem with Multi-Robot Formations
Formation control simplifies minimizing multi-robot cost functions by encoding a cost function as a shape the robots maintain. However, by reducing complex cost functions to formations, discrepancies arise between maintaining the shape and minimizing the original cost function. For example, a Diamond or Box formation shape is often used for protecting all members of the formation. When more information about the surrounding environment becomes available, a static shape often no longer minimizes the original protection cost. We propose a formation planner to reduce mismatch between a formation and the cost function while still leveraging efficient formation controllers. Our formation planner is a two-step optimization problem that identifies desired relative robot positions. We first solve a constrained problem to estimate non-linear and non-differentiable costs with a weighted sum of surrogate cost functions. We theoretically analyze this problem and identify situations where weights do not need to be updated. The weighted, surrogate cost function is then minimized using relative positions between robots. The desired relative positions are realized using a non-cooperative formation controller derived from Lyapunov's direct approach. We then demonstrate the efficacy of this approach for military-like costs such as protection and obstacle avoidance. In simulations, we show a formation planner can reduce a single cost by over 75%. When minimizing a variety of cost functions simultaneously, using a formation planner with adaptive weights can reduce the cost by 20-40%. Formation planning provides better performance by minimizing a surrogate cost function that closely approximates the original cost function instead of relying on a shape abstraction.
comment: Submitted to SAE WCX 2026
Sim-to-Real Gentle Manipulation of Deformable and Fragile Objects with Stress-Guided Reinforcement Learning
Robotic manipulation of deformable and fragile objects presents significant challenges, as excessive stress can lead to irreversible damage to the object. Existing solutions rely on accurate object models or specialized sensors and grippers, which adds complexity and often limits generalization. To address this problem, we present a vision-based reinforcement learning approach that incorporates a stress-penalized reward to explicitly discourage damage to the object. In addition, to bootstrap learning, we incorporate offline demonstrations as well as a designed curriculum progressing from rigid proxies to deformables. We evaluate the proposed method in both simulated and real-world scenarios, showing that the policy learned in simulation can be transferred to the real world in a zero-shot manner, performing tasks such as picking up and pushing tofu. Our results show that the learned policies exhibit a damage-aware, gentle manipulation behavior, demonstrating their effectiveness by decreasing the stress applied to fragile objects by 36.5% while achieving the task goals, compared to vanilla RL policies.
comment: Under review
Integrating Legal and Logical Specifications in Perception, Prediction, and Planning for Automated Driving: A Survey of Methods
This survey provides an analysis of current methodologies integrating legal and logical specifications into the perception, prediction, and planning modules of automated driving systems. We systematically explore techniques ranging from logic-based frameworks to computational legal reasoning approaches, emphasizing their capability to ensure regulatory compliance and interpretability in dynamic and uncertain driving environments. A central finding is that significant challenges arise at the intersection of perceptual reliability, legal compliance, and decision-making justifiability. To systematically analyze these challenges, we introduce a taxonomy categorizing existing approaches by their theoretical foundations, architectural implementations, and validation strategies. We particularly focus on methods that address perceptual uncertainty and incorporate explicit legal norms, facilitating decisions that are both technically robust and legally defensible. The review covers neural-symbolic integration methods for perception, logic-driven rule representation, and norm-aware prediction strategies, all contributing toward transparent and accountable autonomous vehicle operation. We highlight critical open questions and practical trade-offs that must be addressed, offering multidisciplinary insights from engineering, logic, and law to guide future developments in legally compliant autonomous driving systems.
comment: Accepted to 2025 IEEE International Automated Vehicle Validation Conference (IAVVC)
Geometric Robot Calibration Using a Calibration Plate
In this paper, a new method for geometric robot calibration is introduced, which uses a calibration plate with precisely known distances between its measuring points. The relative measurement between two points on the calibration plate is used to determine predefined error parameters of the system. In comparison to conventional measurement methods, like laser tracker or motion capture systems, the calibration plate provides a more mechanically robust and cheaper alternative, which is furthermore easier to transport due to its small size. The calibration method, the plate design, the mathematical description of the error system as well as the identification of the parameters are described in detail. For identifying the error parameters, the least squares method and a constrained optimization problem are used. The functionality of this method was demonstrated in experiments that led to promising results, which correlated with those of a laser tracker calibration. The modeling and identification of the error parameters are done for a gantry machine but are not restricted to that type of robot.
comment: pp 309-317
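As an illustration of identifying error parameters from known plate distances by least squares, the sketch below assumes a deliberately simple error model: a three-axis gantry whose only errors are per-axis scale factors. This model, the numbers, and the function name are assumptions made for the example; the abstract does not list the paper's actual error parameters. With a known distance D between two plate points and encoder displacements Δq, the relation D² = Σ (k_i Δq_i)² is linear in k_i², so ordinary least squares applies.

```python
import numpy as np


def identify_axis_scales(delta_q: np.ndarray, known_dist: np.ndarray) -> np.ndarray:
    """Estimate per-axis scale factors k from D^2 = sum_i (k_i * dq_i)^2.

    The relation is linear in u_i = k_i^2, so solve A u = D^2 with A = dq^2.
    """
    squared = delta_q ** 2
    u, *_ = np.linalg.lstsq(squared, known_dist ** 2, rcond=None)
    return np.sqrt(u)


# Synthetic, noise-free measurements (assumed values, not from the paper).
rng = np.random.default_rng(1)
true_scales = np.array([1.002, 0.998, 1.001])
true_disp = rng.uniform(-0.2, 0.2, size=(30, 3))  # real motion between plate points (m)
known_dist = np.linalg.norm(true_disp, axis=1)    # distances known from the plate
delta_q = true_disp / true_scales                 # displacements reported by the encoders

scales = identify_axis_scales(delta_q, known_dist)
print(scales)
```

Only relative measurements between plate points enter the fit, so the plate's pose in the workspace does not need to be known in this simplified model.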
An approach for combining transparency and motion assistance of a lower body exoskeleton
In this paper, an approach for gait assistance with a lower body exoskeleton is described. Two concepts, transparency and motion assistance, are combined. The transparent mode, where the system is following the user's free motion with a minimum of perceived interaction forces, is realized by exploiting the gear backlash of the actuation units. During walking a superimposed assistance mode applies an additional torque guiding the legs to their estimated future position. The concept of adaptive oscillators is utilized to learn the quasi-periodic signals typical for locomotion. First experiments showed promising results.
comment: 8 pages
Seeing Clearly and Deeply: An RGBD Imaging Approach with a Bio-inspired Monocentric Design
Achieving high-fidelity, compact RGBD imaging presents a dual challenge: conventional compact optics struggle with RGB sharpness across the entire depth-of-field, while software-only Monocular Depth Estimation (MDE) is an ill-posed problem reliant on unreliable semantic priors. While deep optics with elements like DOEs can encode depth, they introduce trade-offs in fabrication complexity and chromatic aberrations, compromising simplicity. To address this, we first introduce a novel bio-inspired all-spherical monocentric lens, around which we build the Bionic Monocentric Imaging (BMI) framework, a holistic co-design. This optical design naturally encodes depth into its depth-varying Point Spread Functions (PSFs) without requiring complex diffractive or freeform elements. We establish a rigorous physically-based forward model to generate a synthetic dataset by precisely simulating the optical degradation process. This simulation pipeline is co-designed with a dual-head, multi-scale reconstruction network that employs a shared encoder to jointly recover a high-fidelity All-in-Focus (AiF) image and a precise depth map from a single coded capture. Extensive experiments validate the state-of-the-art performance of the proposed framework. In depth estimation, the method attains an Abs Rel of 0.026 and an RMSE of 0.130, markedly outperforming leading software-only approaches and other deep optics systems. For image restoration, the system achieves an SSIM of 0.960 and a perceptual LPIPS score of 0.082, thereby confirming a superior balance between image fidelity and depth accuracy. This study illustrates that the integration of bio-inspired, fully spherical optics with a joint reconstruction algorithm constitutes an effective strategy for addressing the intrinsic challenges in high-performance compact RGBD imaging. Source code will be publicly available at https://github.com/ZongxiYu-ZJU/BMI.
comment: The source code will be publicly available at https://github.com/ZongxiYu-ZJU/BMI
Development of Implicit-Explicit Control Based Amphibious Centipede-Type Robot and Evaluation of its Mobile Performance
Multi-legged mobile robots possess high mobility performance in rough terrain environments, stemming from their high postural stability, joint flexibility, and the redundancy provided by multiple legs. In prior research on navigating between different environments such as land and water, the primary strategy employed involves switching to a controller that generates an appropriate gait for the new environment upon entering it. However, designing appropriate gaits for each complex and diverse environment and accurately determining controller switching for each environment is challenging. Therefore, this research develops a centipede-type mobile robot that navigates both aquatic and terrestrial environments with a simple, unified control scheme, based on the implicit-explicit control philosophy and by ingeniously designing the robot's body structure. In this research, we developed the robot featuring flexible joints and left and right legs on each body segment and focused on the leg structure which has extensive contact with the environment. This paper evaluates the locomotion performance on land and water using the three developed leg structures, using the robot's leg slip rate and actuator energy consumption as evaluation metrics. The experimental results confirmed the existence of an appropriate leg structure capable of navigating both aquatic and terrestrial environments under identical control.
SynHLMA: Synthesizing Hand Language Manipulation for Articulated Object with Discrete Human Object Interaction Representation
Generating hand grasps with language instructions is a widely studied topic that benefits from embodied AI and VR/AR applications. When transferred to hand articulated object interaction (HAOI), hand grasp synthesis requires not only object functionality but also a long-term manipulation sequence along the object deformation. This paper proposes a novel HAOI sequence generation framework, SynHLMA, to synthesize hand language manipulation for articulated objects. Given a complete point cloud of an articulated object, we utilize a discrete HAOI representation to model each hand object interaction frame. Along with the natural language embeddings, the representations are trained by an HAOI manipulation language model to align the grasping process with its language description in a shared representation space. A joint-aware loss is employed to ensure hand grasps follow the dynamic variations of articulated object joints. In this way, our SynHLMA achieves three typical hand manipulation tasks for articulated objects: HAOI generation, HAOI prediction, and HAOI interpolation. We evaluate SynHLMA on our built HAOI-lang dataset, and experimental results demonstrate superior hand grasp sequence generation performance compared with the state of the art. We also show a robotics grasp application that enables dexterous grasp execution from imitation learning using the manipulation sequence provided by our SynHLMA. Our codes and datasets will be made publicly available.
Time-Optimal Transport of Loosely Placed Liquid Filled Cups along Prescribed Paths
Handling loosely placed objects with robotic manipulators is a difficult task from the point of view of trajectory planning and control. This becomes even more challenging when the object to be handled is a container filled with liquid. This paper addresses the task of transporting a liquid-filled cup placed on a tray along a prescribed path in the shortest time. The objective is to minimize sloshing, thus avoiding spillage of the fluid. To this end, the sloshing dynamics are incorporated into the dynamic model used within the optimal control problem formulation. The optimization problem is solved using a direct multiple shooting approach.
One-shot Humanoid Whole-body Motion Learning
Whole-body humanoid motion represents a cornerstone challenge in robotics, integrating balance, coordination, and adaptability to enable human-like behaviors. However, existing methods typically require multiple training samples per motion category, rendering the collection of high-quality human motion datasets both labor-intensive and costly. To address this, we propose a novel approach that trains effective humanoid motion policies using only a single non-walking target motion sample alongside readily available walking motions. The core idea lies in leveraging order-preserving optimal transport to compute distances between walking and non-walking sequences, followed by interpolation along geodesics to generate new intermediate pose skeletons, which are then optimized for collision-free configurations and retargeted to the humanoid before integration into a simulated environment for policy training via reinforcement learning. Experimental evaluations on the CMU MoCap dataset demonstrate that our method consistently outperforms baselines, achieving superior performance across metrics. Code will be released upon acceptance.
comment: 10 pages, 3 figures, 5 tables
Hybrid Vision Servoing with Deep Alignment and GRU-Based Occlusion Recovery
Vision-based control systems, such as image-based visual servoing (IBVS), have been extensively explored for precise robot manipulation. A persistent challenge, however, is maintaining robust target tracking under partial or full occlusions. Classical methods like Lucas-Kanade (LK) offer lightweight tracking but are fragile to occlusion and drift, while deep learning-based approaches often require continuous visibility and intensive computation. To address these gaps, we propose a hybrid visual tracking framework that bridges advanced perception with real-time servo control. First, a fast global template matcher constrains the pose search region; next, a deep-feature Lucas-Kanade module operating on early VGG layers refines alignment to sub-pixel accuracy (<2px); then, a lightweight residual regressor corrects local misalignments caused by texture degradation or partial occlusion. When visual confidence falls below a threshold, a GRU-based predictor seamlessly extrapolates pose updates from recent motion history. Crucially, the pipeline's final outputs (translation, rotation, and scale deltas) are packaged as direct control signals for 30Hz image-based servo loops. Evaluated on handheld video sequences with up to 90% occlusion, our system sustains under 2px tracking error, demonstrating the robustness and low-latency precision essential for reliable real-world robot vision applications.
RoadSens-4M: A Multimodal Smartphone & Camera Dataset for Holistic Road-way Analysis
It's important to monitor road issues such as bumps and potholes to enhance safety and improve road conditions. Smartphones are equipped with various built-in sensors that offer a cost-effective and straightforward way to assess road quality. However, progress in this area has been slow due to the lack of high-quality, standardized datasets. This paper discusses a new dataset created by a mobile app that collects sensor data from devices like GPS, accelerometers, gyroscopes, magnetometers, gravity sensors, and orientation sensors. This dataset is one of the few that integrates Geographic Information System (GIS) data with weather information and video footage of road conditions, providing a comprehensive understanding of road issues with geographic context. The dataset allows for a clearer analysis of road conditions by compiling essential data, including vehicle speed, acceleration, rotation rates, and magnetic field intensity, along with the visual and spatial context provided by GIS, weather, and video data. Its goal is to provide funding for initiatives that enhance traffic management, infrastructure development, road safety, and urban planning. Additionally, the dataset will be publicly accessible to promote further research and innovation in smart transportation systems.
SoraNav: Adaptive UAV Task-Centric Navigation via Zeroshot VLM Reasoning
Interpreting visual observations and natural language instructions for complex task execution remains a key challenge in robotics and AI. Despite recent advances, language-driven navigation is still difficult, particularly for UAVs in small-scale 3D environments. Existing Vision-Language Navigation (VLN) approaches are mostly designed for ground robots and struggle to generalize to aerial tasks that require full 3D spatial reasoning. The emergence of large Vision-Language Models (VLMs), such as GPT and Claude, enables zero-shot semantic reasoning from visual and textual inputs. However, these models lack spatial grounding and are not directly applicable to navigation. To address these limitations, SoraNav is introduced, an adaptive UAV navigation framework that integrates zero-shot VLM reasoning with geometry-aware decision-making. Geometric priors are incorporated into image annotations to constrain the VLM action space and improve decision quality. A hybrid switching strategy leverages navigation history to alternate between VLM reasoning and geometry-based exploration, mitigating dead-ends and redundant revisits. A PX4-based hardware-software platform, comprising both a digital twin and a physical micro-UAV, enables reproducible evaluation. Experimental results show that in 2.5D scenarios, our method improves Success Rate (SR) by 25.7% and Success weighted by Path Length (SPL) by 17%. In 3D scenarios, it improves SR by 29.5% and SPL by 18.5% relative to the baseline.
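The two metrics reported above, Success Rate (SR) and Success weighted by Path Length (SPL), can be sketched as follows. This is a minimal illustration using the commonly used definition of SPL (per-episode success times the ratio of shortest-path length to the larger of the actual and shortest path lengths, averaged over episodes); the function and argument names are hypothetical and are not taken from the SoraNav code.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes that ended in success."""
    return sum(successes) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over all episodes.

    Assumes every shortest-path length is strictly positive.
    """
    total = 0.0
    for ok, shortest, actual in zip(successes, shortest_lengths, path_lengths):
        if ok:
            # A successful episode contributes at most 1 (when the agent
            # followed a shortest path) and less for longer detours.
            total += shortest / max(actual, shortest)
    return total / len(successes)
```

Failed episodes contribute zero, so SPL can never exceed SR.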
Learning Spatial-Aware Manipulation Ordering NeurIPS 2025
Manipulation in cluttered environments is challenging due to spatial dependencies among objects, where an improper manipulation order can cause collisions or blocked access. Existing approaches often overlook these spatial relationships, limiting their flexibility and scalability. To address these limitations, we propose OrderMind, a unified spatial-aware manipulation ordering framework that directly learns object manipulation priorities based on spatial context. Our architecture integrates a spatial context encoder with a temporal priority structuring module. We construct a spatial graph using k-Nearest Neighbors to aggregate geometric information from the local layout and encode both object-object and object-manipulator interactions to support accurate manipulation ordering in real-time. To generate physically and semantically plausible supervision signals, we introduce a spatial prior labeling method that guides a vision-language model to produce reasonable manipulation orders for distillation. We evaluate OrderMind on our Manipulation Ordering Benchmark, comprising 163,222 samples of varying difficulty. Extensive experiments in both simulation and real-world environments demonstrate that our method significantly outperforms prior approaches in effectiveness and efficiency, enabling robust manipulation in cluttered scenes.
comment: Accepted to NeurIPS 2025
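As a rough illustration of the k-Nearest-Neighbors spatial graph mentioned in the abstract above, the sketch below connects each object to its k closest objects by Euclidean distance between object centers. It is an assumption-laden toy version: the function name, the use of object centers as node positions, and the brute-force search are all hypothetical and do not describe OrderMind's actual implementation.

```python
import math
from typing import List, Sequence


def knn_graph(points: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Return, for each point, the indices of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(points):
        # Distances from point i to every other point (self excluded).
        others = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        others.sort()  # nearest first; ties broken by index
        neighbors.append([j for _, j in others[:k]])
    return neighbors
```

Each row of the result is an adjacency list over which local geometric features could be aggregated.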
NanoVLA: Routing Decoupled Vision-Language Understanding for Nano-sized Generalist Robotic Policies
Vision-language-action (VLA) models have significantly advanced robotic manipulation by integrating vision-language models (VLMs) and action decoders into a unified architecture. However, their deployment on resource-constrained edge devices, such as mobile robots or embedded systems (e.g., Jetson Orin Nano), remains challenging due to high computational demands, especially in real-world scenarios where power, latency, and computational resources are critical. To close this gap, we introduce Nano-scale Vision-Language Action (NanoVLA), a family of lightweight VLA architectures that achieve high performance with minimal resources. Our core innovations include: (1) vision-language decoupling that moves the conventional early fusion of vision and language inputs in the VLM to a late stage, achieving better performance while enabling caching and reducing inference overhead and latency; (2) long-short action chunking to ensure smooth, coherent multi-step planning without sacrificing real-time responsiveness; (3) dynamic routing that adaptively assigns lightweight or heavy backbones based on task complexity, further optimizing inference efficiency. Experimental results on several benchmarks, as well as real-world deployments, demonstrate that NanoVLA achieves up to 52x faster inference on edge devices compared to previous state-of-the-art VLA models, with 98% fewer parameters while maintaining or surpassing their task accuracy and generalization. Ablation studies confirm that our decoupling strategy preserves cross-task transferability, and the routing module enhances cost-performance trade-offs, enabling practical, high-precision robotic manipulation on resource-constrained hardware.
Mean-Shift Theory and Its Applications in Swarm Robotics: A New Way to Enhance the Efficiency of Multi-Robot Collaboration
Swarms evolving from collective behaviors among multiple individuals are commonly seen in nature, which enables biological systems to exhibit more efficient and robust collaboration. Creating similar swarm intelligence in engineered robots poses challenges to the design of collaborative algorithms that can be programmed at large scales. The assignment-based method has played an eminent role for a very long time in solving collaboration problems of robot swarms. However, it faces fundamental limitations in terms of efficiency and robustness due to its unscalability to swarm variants. This article presents a tutorial review on recent advances in assignment-free collaboration of robot swarms, focusing on the problem of shape formation. A key theoretical component is the recently developed mean-shift exploration strategy, which improves the collaboration efficiency of large-scale swarms by dozens of times. Further, the efficiency improvement is more significant as the swarm scale increases. Finally, this article discusses three important applications of the mean-shift exploration strategy, including precise shape formation, area coverage formation, and maneuvering formation, as well as their corresponding industrial scenarios in smart warehousing, area exploration, and cargo transportation.
Non-Invasive Calibration Of A Stewart Platform By Photogrammetry
Accurate calibration of a Stewart platform is important for its precise and efficient operation. However, the calibration of these platforms using forward kinematics is a challenge for researchers because forward kinematics normally generates multiple feasible and infeasible solutions for any pose of the moving platform. The complex kinematic relations among the six actuator paths connecting the fixed base to the moving platform further compound the difficulty in establishing a straightforward and efficient calibration method. The authors developed a new forward kinematics-based calibration method using the Denavit-Hartenberg convention and used the Stewart platform Tiger 66.1 developed in their lab for experimenting with the photogrammetry-based calibration strategies described in this paper. This system became operational upon completion of construction, marking its inaugural use. The authors used their calibration model for estimating the errors in the system and adopted three compensation strategies based on the least-squares method to improve the accuracy of the system. These strategies leveraged a high-resolution digital camera and off-the-shelf software to capture the poses of the moving platform's center. This process is non-invasive and does not need any additional equipment to be attached to the hexapod or any alteration of the hexapod hardware. This photogrammetry-based calibration process involves multiple high-resolution images from different angles to measure the position and orientation of the platform center in three-dimensional space. The Target poses and Actual poses are then compared, and the error compensations are estimated using the least-squares method to calculate the Predicted poses. Results from each of the three compensation approaches demonstrated noticeable enhancements in platform pose accuracies, suggesting room for further improvements.
comment: The International Journal of Advanced Manufacturing Technology, 2024
Scalable predictive processing framework for multitask caregiving robots
The rapid aging of societies is intensifying demand for autonomous care robots; however, most existing systems are task-specific and rely on handcrafted preprocessing, limiting their ability to generalize across diverse scenarios. A prevailing theory in cognitive neuroscience proposes that the human brain operates through hierarchical predictive processing, which underlies flexible cognition and behavior by integrating multimodal sensory signals. Inspired by this principle, we introduce a hierarchical multimodal recurrent neural network grounded in predictive processing under the free-energy principle, capable of directly integrating over 30,000-dimensional visuo-proprioceptive inputs without dimensionality reduction. The model was able to learn two representative caregiving tasks, rigid-body repositioning and flexible-towel wiping, without task-specific feature engineering. We demonstrate three key properties: (i) self-organization of hierarchical latent dynamics that regulate task transitions, capture variability in uncertainty, and infer occluded states; (ii) robustness to degraded vision through visuo-proprioceptive integration; and (iii) asymmetric interference in multitask learning, where the more variable wiping task had little influence on repositioning, whereas learning the repositioning task led to a modest reduction in wiping performance, while the model maintained overall robustness. Although the evaluation was limited to simulation, these results establish predictive processing as a universal and scalable computational principle, pointing toward robust, flexible, and autonomous caregiving robots while offering theoretical insight into the human brain's ability to achieve flexible adaptation in uncertain real-world environments.
Large Language Model-assisted Autonomous Vehicle Recovery from Immobilization
Despite significant advancements in recent decades, autonomous vehicles (AVs) continue to face challenges in navigating certain traffic scenarios where human drivers excel. In such situations, AVs often become immobilized, disrupting overall traffic flow. Current recovery solutions, such as remote intervention (which is costly and inefficient) and manual takeover (which excludes non-drivers and limits AV accessibility), are inadequate. This paper introduces StuckSolver, a novel Large Language Model (LLM) driven recovery framework that enables AVs to resolve immobilization scenarios through self-reasoning and/or passenger-guided decision-making. StuckSolver is designed as a plug-in add-on module that operates on top of the AV's existing perception-planning-control stack, requiring no modification to its internal architecture. Instead, it interfaces with standard sensor data streams to detect immobilization states, interpret environmental context, and generate high-level recovery commands that can be executed by the AV's native planner. We evaluate StuckSolver on the Bench2Drive benchmark and in custom-designed uncertainty scenarios. Results show that StuckSolver achieves near-state-of-the-art performance through autonomous self-reasoning alone and exhibits further improvements when passenger guidance is incorporated.
comment: 8 pages
RADRON: Cooperative Localization of Ionizing Radiation Sources by MAVs with Compton Cameras
We present a novel approach to localizing radioactive material by cooperating Micro Aerial Vehicles (MAVs). Our approach utilizes a state-of-the-art single-detector Compton camera as a highly sensitive, yet miniature detector of ionizing radiation. The detector's exceptionally low weight (40 g) opens up new possibilities of radiation detection by a team of cooperating agile MAVs. We propose a new fundamental concept of fusing the Compton camera measurements to estimate the position of the radiation source in real time even from extremely sparse measurements. The data readout and processing are performed directly onboard and the results are used in a dynamic feedback to drive the motion of the vehicles. The MAVs are stabilized in a tightly cooperating swarm to maximize the information gained by the Compton cameras, rapidly locate the radiation source, and even track a moving radiation source.
comment: 8 pages, 9 figures, submitted for review to IEEE RA-L
DARTS: A Drone-Based AI-Powered Real-Time Traffic Incident Detection System
Rapid and reliable incident detection is critical for reducing crash-related fatalities, injuries, and congestion. However, conventional methods, such as closed-circuit television, dashcam footage, and sensor-based detection, separate detection from verification, suffer from limited flexibility, and require dense infrastructure or high penetration rates, restricting adaptability and scalability to shifting incident hotspots. To overcome these challenges, we developed DARTS, a drone-based, AI-powered real-time traffic incident detection system. DARTS integrates drones' high mobility and aerial perspective for adaptive surveillance, thermal imaging for better low-visibility performance and privacy protection, and a lightweight deep learning framework for real-time vehicle trajectory extraction and incident detection. The system achieved 99% detection accuracy on a self-collected dataset and supports simultaneous online visual verification, severity assessment, and incident-induced congestion propagation monitoring via a web-based interface. In a field test on Interstate 75 in Florida, DARTS detected and verified a rear-end collision 12 minutes earlier than the local transportation management center and monitored incident-induced congestion propagation, suggesting potential to support faster emergency response and enable proactive traffic control to reduce congestion and secondary crash risk. Crucially, DARTS's flexible deployment architecture reduces dependence on frequent physical patrols, indicating potential scalability and cost-effectiveness for use in remote areas and resource-constrained settings. This study presents a promising step toward a more flexible and integrated real-time traffic incident detection system, with significant implications for the operational efficiency and responsiveness of modern transportation management.
comment: Preprint version. This manuscript is currently under review at Transportation Research Part C: Emerging Technologies. The PDF corresponds to the version submitted in June 2025. The main findings of this work were recognized with the Best Intelligent Transportation Systems Paper Award at the 2025 TRB Annual Meeting
A New Type of Axis-Angle Attitude Control Law for Rotational Systems: Synthesis, Analysis, and Experiments
Over the past few decades, continuous quaternion-based attitude control has been proven highly effective for driving rotational systems that can be modeled as rigid bodies, such as satellites and drones. However, methods rooted in this approach do not enforce the existence of a unique closed-loop (CL) equilibrium attitude-error quaternion (AEQ); and, for rotational errors about the attitude-error Euler axis larger than π rad, their proportional-control effect diminishes as the system state moves away from the stable equilibrium of the CL rotational dynamics. In this paper, we introduce a new type of attitude control law that more effectively leverages the attitude-error Euler axis-angle information to guarantee a unique CL equilibrium AEQ and to provide greater flexibility in the use of proportional-control efforts. Furthermore, using two different control laws as examples, and through the construction of a strict Lyapunov function for the CL dynamics, we demonstrate that the resulting unique equilibrium of the CL rotational system can be enforced to be uniformly asymptotically stable. To assess and demonstrate the functionality and performance of the proposed approach, we performed numerical simulations and executed dozens of real-time tumble-recovery maneuvers using a small quadrotor. These simulations and flight tests compellingly demonstrate that the proposed axis-angle-based method achieves superior flight performance, in terms of stabilization time, compared with that obtained using a high-performance quaternion-based controller.
comment: 2025 International Conference on Advanced Robotics (ICAR)
Curvature-Aware Calibration of Tactile Sensors for Accurate Force Estimation on Non-Planar Surfaces
Flexible tactile sensors are increasingly used in real-world applications such as robotic grippers, prosthetic hands, wearable gloves, and assistive devices, where they need to conform to curved and irregular surfaces. However, most existing tactile sensors are calibrated only on flat substrates, and their accuracy and consistency degrade once mounted on curved geometries. This limitation restricts their reliability in practical use. To address this challenge, we develop a calibration model for a widely used resistive tactile sensor design that enables accurate force estimation on one-dimensional curved surfaces. We then train a neural network (a multilayer perceptron) to predict local curvature from baseline sensor outputs recorded under no applied load, achieving an R² score of 0.91. The proposed approach is validated on five daily objects with varying curvatures under forces from 2 N to 8 N. Results show that the curvature-aware calibration maintains consistent force accuracy across all surfaces, while flat-surface calibration underestimates force as curvature increases. Our results demonstrate that curvature-aware modeling improves the accuracy, consistency, and reliability of flexible tactile sensors, enabling dependable performance across real-world applications.
comment: This work has been submitted to the IEEE for possible publication
WaveVerif: Acoustic Side-Channel based Verification of Robotic Workflows
In this paper, we present a framework that uses acoustic side-channel analysis (ASCA) to monitor and verify whether a robot correctly executes its intended commands. We develop and evaluate a machine-learning-based workflow verification system that uses acoustic emissions generated by robotic movements. The system can determine whether real-time behavior is consistent with expected commands. The evaluation takes into account movement speed, direction, and microphone distance. The results show that individual robot movements can be validated with over 80% accuracy under baseline conditions using four different classifiers: Support Vector Machine (SVM), Deep Neural Network (DNN), Recurrent Neural Network (RNN), and Convolutional Neural Network (CNN). Additionally, workflows such as pick-and-place and packing could be identified with similarly high confidence. Our findings demonstrate that acoustic signals can support real-time, low-cost, passive verification in sensitive robotic environments without requiring hardware modifications.
comment: 11 pages, 3 figures, Corresponding Author: Prof. Shishir Nagaraja (shishir.nagaraja@newcastle.ac.uk)
Risk-Aware Safety Filters with Poisson Safety Functions and Laplace Guidance Fields
Robotic systems navigating in real-world settings require a semantic understanding of their environment to properly determine safe actions. This work aims to develop the mathematical underpinnings of such a representation; specifically, the goal is to develop safety filters that are risk-aware. To this end, we take a two-step approach: encoding an understanding of the environment via Poisson's equation, and associated risk via Laplace guidance fields. That is, we first solve a Dirichlet problem for Poisson's equation to generate a safety function that encodes system safety as its 0-superlevel set. We then separately solve a Dirichlet problem for Laplace's equation to synthesize a safe guidance field that encodes variable levels of caution around obstacles by enforcing a tunable flux boundary condition. The safety function and guidance fields are then combined to define a safety constraint and used to synthesize a risk-aware safety filter which, given a semantic understanding of an environment with associated risk levels of environmental features, guarantees safety while prioritizing avoidance of higher risk obstacles. We demonstrate this method in simulation and discuss how a priori understandings of obstacle risk can be directly incorporated into the safety filter to generate safe behaviors that are risk-aware.
BikeScenes: Online LiDAR Semantic Segmentation for Bicycles
The vulnerability of cyclists, exacerbated by the rising popularity of faster e-bikes, motivates adapting automotive perception technologies for bicycle safety. We use our multi-sensor 'SenseBike' research platform to develop and evaluate a 3D LiDAR segmentation approach tailored to bicycles. To bridge the automotive-to-bicycle domain gap, we introduce the novel BikeScenes-lidarseg Dataset, comprising 3021 consecutive LiDAR scans around the university campus of the TU Delft, semantically annotated for 29 dynamic and static classes. By evaluating model performance, we demonstrate that fine-tuning on our BikeScenes dataset achieves a mean Intersection-over-Union (mIoU) of 63.6%, significantly outperforming the 13.8% obtained with SemanticKITTI pre-training alone. This result underscores the necessity and effectiveness of domain-specific training. We highlight key challenges specific to bicycle-mounted, hardware-constrained perception systems and contribute the BikeScenes dataset as a resource for advancing research in cyclist-centric LiDAR segmentation.
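The mean Intersection-over-Union (mIoU) figure quoted above is a standard segmentation metric; a minimal per-point sketch is shown below. The function name and the convention of skipping classes that appear in neither the prediction nor the ground truth are assumptions made for illustration, not details taken from the BikeScenes evaluation code.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes present in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        # Points labeled c by both (intersection) or by either (union).
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumed convention: ignore classes absent from both
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)
```

For large point clouds the same quantity is normally accumulated from a confusion matrix rather than by rescanning the labels per class.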
Debate2Create: Robot Co-design via Large Language Model Debates
Automating the co-design of a robot's morphology and control is a long-standing challenge due to the vast design space and the tight coupling between body and behavior. We introduce Debate2Create (D2C), a framework in which large language model (LLM) agents engage in a structured dialectical debate to jointly optimize a robot's design and its reward function. In each round, a design agent proposes targeted morphological modifications, and a control agent devises a reward function tailored to exploit the new design. A panel of pluralistic judges then evaluates the design-control pair in simulation and provides feedback that guides the next round of debate. Through iterative debates, the agents progressively refine their proposals, producing increasingly effective robot designs. Notably, D2C yields diverse and specialized morphologies despite no explicit diversity objective. On a quadruped locomotion benchmark, D2C discovers designs that travel 73% farther than the default, demonstrating that structured LLM-based debate can serve as a powerful mechanism for emergent robot co-design. Our results suggest that multi-agent debate, when coupled with physics-grounded feedback, is a promising new paradigm for automated robot design.
Enhancing Underwater Object Detection through Spatio-Temporal Analysis and Spatial Attention Networks
This study examines the effectiveness of spatio-temporal modeling and the integration of spatial attention mechanisms in deep learning models for underwater object detection. Specifically, in the first phase, the performance of the temporally enhanced YOLOv5 variant T-YOLOv5 is evaluated in comparison with the standard YOLOv5. For the second phase, an augmented version of T-YOLOv5 is developed through the addition of a Convolutional Block Attention Module (CBAM). By examining the effectiveness of the pre-existing YOLOv5 and T-YOLOv5 models and of the newly developed T-YOLOv5 with CBAM, the research highlights how temporal modeling improves detection accuracy in dynamic marine environments, particularly under conditions of sudden movements, partial occlusions, and gradual motion. The testing results showed that YOLOv5 achieved a mAP@50-95 of 0.563, while T-YOLOv5 and T-YOLOv5 with CBAM outperformed it with mAP@50-95 scores of 0.813 and 0.811, respectively, highlighting their superior accuracy and generalization in detecting complex objects. The findings demonstrate that T-YOLOv5 significantly enhances detection reliability compared to the standard model, while T-YOLOv5 with CBAM further improves performance in challenging scenarios, although there is a loss of accuracy in simpler scenarios.
Force Characterization of Insect-Scale Aquatic Propulsion Based on Fluid-Structure Interaction
We present force characterizations of two newly developed insect-scale propulsors (one single-tailed and one double-tailed) for microrobotic swimmers that leverage fluid-structure interaction (FSI) to generate thrust. The designs of these two devices were inspired by anguilliform swimming and are driven by soft tails excited by high-work-density (HWD) actuators powered by shape-memory alloy (SMA) wires. While these propulsors have been demonstrated to be suitable for microrobotic aquatic locomotion and controllable with simple architectures for trajectory tracking in the two-dimensional (2D) space, the characteristics and magnitudes of the associated forces have not been studied systematically. In the research presented here, we adopted a theoretical framework based on the notion of reactive forces and obtained experimental data for characterization using a custom-built µN-resolution force sensor. For the tested single-tail propulsor, we measured maximum and cycle-averaged force values with multi-test means of 0.45 mN and 2.97 µN, respectively. For the dual-tail propulsor, we measured maximum and cycle-averaged force values with multi-test means of 0.61 mN and 22.6 µN, respectively. These results represent the first measurements of the instantaneous thrust generated by insect-scale propulsors of this type and provide insights into FSI for efficient microrobotic propulsion.
comment: To be presented at ICAR 2025 in San Juan, Argentina
Taxonomy and Trends in Reinforcement Learning for Robotics and Control Systems: A Structured Review
Reinforcement learning (RL) has become a foundational approach for enabling intelligent robotic behavior in dynamic and uncertain environments. This work presents an in-depth review of RL principles, advanced deep reinforcement learning (DRL) algorithms, and their integration into robotic and control systems. Beginning with the formalism of Markov Decision Processes (MDPs), the study outlines essential elements of the agent-environment interaction and explores core algorithmic strategies including actor-critic methods, value-based learning, and policy gradients. Emphasis is placed on modern DRL techniques such as DDPG, TD3, PPO, and SAC, which have shown promise in solving high-dimensional, continuous control tasks. A structured taxonomy is introduced to categorize RL applications across domains such as locomotion, manipulation, multi-agent coordination, and human-robot interaction, along with training methodologies and deployment readiness levels. The review synthesizes recent research efforts, highlighting technical trends, design patterns, and the growing maturity of RL in real-world robotics. Overall, this work aims to bridge theoretical advances with practical implementations, providing a consolidated perspective on the evolving role of RL in autonomous robotic systems.
RoboOmni: Proactive Robot Manipulation in Omni-modal Context
Recent advances in Multimodal Large Language Models (MLLMs) have driven rapid progress in Vision-Language-Action (VLA) models for robotic manipulation. Although effective in many scenarios, current approaches largely rely on explicit instructions, whereas in real-world interactions, humans rarely issue instructions directly. Effective collaboration requires robots to infer user intentions proactively. In this work, we introduce cross-modal contextual instructions, a new setting where intent is derived from spoken dialogue, environmental sounds, and visual cues rather than explicit commands. To address this new setting, we present RoboOmni, a Perceiver-Thinker-Talker-Executor framework based on end-to-end omni-modal LLMs that unifies intention recognition, interaction confirmation, and action execution. RoboOmni fuses auditory and visual signals spatiotemporally for robust intention recognition, while supporting direct speech interaction. To address the absence of training data for proactive intention recognition in robotic manipulation, we build OmniAction, comprising 140k episodes, 5k+ speakers, 2.4k event sounds, 640 backgrounds, and six contextual instruction types. Experiments in simulation and real-world settings show that RoboOmni surpasses text- and ASR-based baselines in success rate, inference speed, intention recognition, and proactive assistance.
Optimal Kinematic Synthesis and Prototype Development of Knee Exoskeleton
The range of rotation (RoR) in a knee exoskeleton is a critical factor in rehabilitation, as it directly influences joint mobility, muscle activation, and recovery outcomes. A well-designed RoR ensures that patients achieve near-natural knee kinematics, which is essential for restoring gait patterns and preventing compensatory movements. This paper presents the optimal design of a one-degree-of-freedom knee exoskeleton. In the kinematic analysis, the existing design is represented by nonlinear and nonconvex mathematical functions. To obtain feasible and optimal link dimensions for the knee exoskeleton, an optimization problem is formulated based on the kinematic analysis and average human leg measurements. The optimized solution increases the range of motion of the knee exoskeleton during sit-to-stand motion by $24\%$ compared with the design that inspired it. Furthermore, a misalignment study is conducted by comparing the trajectories of the human knee and the exoskeleton knee during sit-to-stand motion. The joint movement is calculated using a marker and camera system. Finally, a prototype of the knee joint exoskeleton is being developed based on the optimal dimensions, validating the maximum range of motion achieved during simulation.
Dual-Regularized Riccati Recursions for Interior-Point Optimal Control
We derive closed-form extensions of Riccati's recursions (both sequential and parallel) for solving dual-regularized LQR problems. We show how these methods can be used to solve general constrained, non-convex, discrete-time optimal control problems via a regularized interior point method, while guaranteeing that each step is a descent direction of an Augmented Barrier-Lagrangian merit function. We provide MIT-licensed implementations of our methods in C++ and JAX.
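For orientation, the classical backward Riccati recursion that such methods build on can be sketched for a scalar LQR problem. This is the standard, unregularized recursion only, not the dual-regularized extension derived in the paper; the function name `riccati_scalar` and the toy parameters are assumptions for illustration.

```python
def riccati_scalar(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion for x[k+1] = a*x[k] + b*u[k]
    with stage cost q*x^2 + r*u^2 and terminal cost q_final*x^2.

    Returns (gains, p0): feedback gains K[k] for u[k] = -K[k]*x[k],
    ordered from k = 0, and the cost-to-go coefficient at k = 0.
    """
    p = q_final
    gains = []
    for _ in range(horizon):
        k_gain = (a * b * p) / (r + b * b * p)
        # Classical scalar Riccati update (no regularization terms).
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        gains.append(k_gain)
    gains.reverse()  # gains were computed from the last stage backwards
    return gains, p

gains, p0 = riccati_scalar(a=1.0, b=1.0, q=1.0, r=1.0, q_final=1.0, horizon=50)
```

With these toy values the recursion converges to the fixed point of $P = 1 + P - P^2/(1+P)$, i.e. the golden ratio, which makes a convenient sanity check.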
Classification of Driver Behaviour Using External Observation Techniques for Autonomous Vehicles
Road traffic accidents remain a significant global concern, with human error, particularly distracted and impaired driving, among the leading causes. This study introduces a novel driver behaviour classification system that uses external observation techniques to detect indicators of distraction and impairment. The proposed framework employs advanced computer vision methodologies, including real-time object tracking, lateral displacement analysis, and lane position monitoring. The system identifies unsafe driving behaviours such as excessive lateral movement and erratic trajectory patterns by implementing the YOLO object detection model and custom lane estimation algorithms. Unlike systems reliant on inter-vehicular communication, this vision-based approach enables behavioural analysis of non-connected vehicles. Experimental evaluations on diverse video datasets demonstrate the framework's reliability and adaptability across varying road and environmental conditions.
SNN-Based Online Learning of Concepts and Action Laws in an Open World
We present the architecture of a fully autonomous, bio-inspired cognitive agent built around a spiking neural network (SNN) implementing the agent's semantic memory. This agent explores its universe and learns concepts of objects/situations and of its own actions in a one-shot manner. While object/situation concepts are unary, action concepts are triples made up of an initial situation, a motor activity, and an outcome. They embody the agent's knowledge of its universe's action laws. Both kinds of concepts have different degrees of generality. To make decisions the agent queries its semantic memory for the expected outcomes of envisaged actions and chooses the action to take on the basis of these predictions. Our experiments show that the agent handles new situations by appealing to previously learned general concepts and rapidly modifies its concepts to adapt to environment changes.
Redistributing Rewards Across Time and Agents for Multi-Agent Reinforcement Learning
Credit assignment, disentangling each agent's contribution to a shared reward, is a critical challenge in cooperative multi-agent reinforcement learning (MARL). To be effective, credit assignment methods must preserve the environment's optimal policy. Some recent approaches attempt this by enforcing return equivalence, where the sum of distributed rewards must equal the team reward. However, their guarantees are conditional on a learned model's regression accuracy, making them unreliable in practice. We introduce Temporal-Agent Reward Redistribution (TAR$^2$), an approach that decouples credit modeling from this constraint. A neural network learns unnormalized contribution scores, while a separate, deterministic normalization step enforces return equivalence by construction. We demonstrate that this method is equivalent to a valid Potential-Based Reward Shaping (PBRS), which guarantees the optimal policy is preserved regardless of model accuracy. Empirically, on challenging SMACLite and Google Research Football (GRF) benchmarks, TAR$^2$ accelerates learning and achieves higher final performance than strong baselines. These results establish our method as an effective solution for the agent-temporal credit assignment problem.
comment: 16 pages, 4 figures, 4 tables
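The idea of enforcing return equivalence by construction can be illustrated with a minimal sketch. The softmax normalizer, the function name `redistribute`, and the toy scores are assumptions for illustration and are not taken from the paper; the only property shown is that the redistributed rewards sum to the team return whatever the scores are.

```python
import math

def redistribute(scores, team_return):
    """Split team_return over (timestep, agent) cells.

    scores: list of per-timestep lists of unnormalized scores
            (hypothetical stand-in for a learned network's output).
    A softmax over all cells is an assumed choice of normalizer.
    """
    flat = [s for row in scores for s in row]
    m = max(flat)  # subtract the max for numerical stability
    exps = [[math.exp(s - m) for s in row] for row in scores]
    total = sum(e for row in exps for e in row)
    # Weights sum to 1, so the rewards sum to team_return by construction.
    return [[team_return * e / total for e in row] for row in exps]

rewards = redistribute([[0.2, -1.0], [3.0, 0.5]], team_return=10.0)
```

Because the normalization is deterministic, an inaccurate score model changes how the return is split but not the total that is handed out.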
Multi-robot Motion Planning based on Nets-within-Nets Modeling and Simulation
This paper focuses on designing motion plans for a heterogeneous team of robots that must cooperate to fulfill a global mission. Robots move in an environment that contains some regions of interest, while the specification for the entire team can include avoidance, visits, or sequencing of these regions of interest. The mission is expressed in terms of a Petri net corresponding to an automaton, while each robot is also modeled by a state machine Petri net. The current work brings about the following contributions with respect to existing solutions for related problems. First, we propose a novel model, denoted High-Level robot team Petri Net (HLrtPN) system, to incorporate the specification and robot models into the Nets-within-Nets paradigm. A guard function, named Global Enabling Function, is designed to synchronize the firing of transitions so that robot motions do not violate the specification. Then, the solution is found by simulating the HLrtPN system in a specific software tool that accommodates Nets-within-Nets. Illustrative examples based on Linear Temporal Logic missions support the computational feasibility of the proposed framework.
comment: [Note for readers] This paper has been extended from a previous submission to 62nd IEEE Conference on Decision and Control, Dec. 13-15, 2023. This work has been submitted to the IEEE for possible publication
Control Modes of Teleoperated Surgical Robotic System's Tools in Ophthalmic Surgery
The introduction of a teleoperated surgical robotic system designed for minimally invasive procedures enables the emulation of two distinct control modes through a dedicated input device of the surgical console: (1) Inside Control Mode, which emulates tool manipulation near the distal end as if the surgeon were holding the tip of the instrument inside the patient's body; (2) Outside Control Mode, which emulates manipulation near the proximal end as if the surgeon were holding the tool externally. The aim of this research is to compare surgeons' performance in these two modes of operation, along with various scaling factors, in a simulated vitreoretinal surgical setting. The console of the Intraocular Robotic Interventional Surgical System (IRISS) was used, but the surgical robot itself and the human eye anatomy were simulated in a virtual environment that projected a microscope view of an intraocular setup to a VR headset. Five experienced vitreoretinal surgeons and five subjects with no surgical experience used the system to perform four fundamental tool/tissue tasks common to vitreoretinal surgery: touch and reset; grasp and drop; inject; circular tracking. Results indicate that Inside Control outperforms Outside Control across multiple tasks and metrics. Higher scaling factors generally performed better, particularly for reducing trajectory errors and tissue damage. This improvement suggests that larger scaling factors enable more precise control, making them the preferred option for fine manipulation. However, completion time was not consistently reduced across all conditions, indicating that surgeons need to balance speed and accuracy based on surgical requirements. By optimizing control dynamics and user interface, robotic teleoperation has the potential to reduce complications, enhance dexterity, and expand the accessibility of high precision procedures to a broader range of practitioners.
comment: 10 pages, 11 figures
RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation NeurIPS 2025
Recent advances in vision-language models (VLMs) have enabled instruction-conditioned robotic systems with improved generalization. However, most existing work focuses on reactive System 1 policies, underutilizing VLMs' strengths in semantic reasoning and long-horizon planning. These System 2 capabilities, characterized by deliberative, goal-directed thinking, remain underexplored due to the limited temporal scale and structural complexity of current benchmarks. To address this gap, we introduce RoboCerebra, a benchmark for evaluating high-level reasoning in long-horizon robotic manipulation. RoboCerebra includes: (1) a large-scale simulation dataset with extended task horizons and diverse subtask sequences in household environments; (2) a hierarchical framework combining a high-level VLM planner with a low-level vision-language-action (VLA) controller; and (3) an evaluation protocol targeting planning, reflection, and memory through structured System 1-System 2 interaction. The dataset is constructed via a top-down pipeline, where GPT generates task instructions and decomposes them into subtask sequences. Human operators execute the subtasks in simulation, yielding high-quality trajectories with dynamic object variations. Compared to prior benchmarks, RoboCerebra features significantly longer action sequences and denser annotations. We further benchmark state-of-the-art VLMs as System 2 modules and analyze their performance across key cognitive dimensions, advancing the development of more capable and generalizable robotic planners.
comment: 25 pages, 18 figures, Accepted by NeurIPS 2025
ES-HPC-MPC: Exponentially Stable Hybrid Perception Constrained MPC for Quadrotor with Suspended Payloads
Aerial transportation using quadrotors with cable-suspended payloads holds great potential for applications in disaster response, logistics, and infrastructure maintenance. However, their hybrid and underactuated dynamics pose significant control and perception challenges. Traditional approaches often assume a taut cable condition, limiting their effectiveness in real-world applications where slack-to-taut transitions occur due to disturbances. We introduce ES-HPC-MPC, a model predictive control framework that enforces exponential stability and perception-constrained control under hybrid dynamics. Our method leverages Exponentially Stabilizing Control Lyapunov Functions (ES-CLFs) to enforce stability during the tasks and Control Barrier Functions (CBFs) to maintain the payload within the onboard camera's field of view (FoV). We validate our method through both simulation and real-world experiments, demonstrating stable trajectory tracking and reliable payload perception. We further show that our method maintains stability and satisfies perception constraints while tracking dynamically infeasible trajectories and when the system is subjected to hybrid mode transitions caused by unexpected disturbances.
comment: Accepted to IEEE Robotics and Automation Letters
A Constrained Saddle Search Approach for Constructing Singular and Flexible Bar Frameworks
Singularity analysis is essential in robot kinematics, as singular configurations cause loss of control and kinematic indeterminacy. This paper models singularities in bar frameworks as saddle points on constrained manifolds. Given an under-constrained, non-singular bar framework, by allowing one edge to vary its length while fixing lengths of others, we define the squared length of the free edge as an energy functional and show that its local saddle points correspond to singular and flexible frameworks. Using our constrained saddle search approach, we identify previously unknown singular and flexible bar frameworks, providing new insights into singular robotics design and analysis.
comment: 9 pages, 3 figures
STATE-NAV: Stability-Aware Traversability Estimation for Bipedal Navigation on Rough Terrain
Bipedal robots have advantages in maneuvering human-centered environments, but face greater failure risk compared to more stable mobile platforms such as wheeled or quadrupedal robots. While learning-based traversability has been widely studied for these platforms, bipedal traversability has instead relied on manually designed rules with limited consideration of locomotion stability on rough terrain. In this work, we present the first learning-based traversability estimation and risk-sensitive navigation framework for bipedal robots operating in diverse, uneven environments. TravFormer, a transformer-based neural network, is trained to predict bipedal instability with uncertainty, enabling risk-aware and adaptive planning. Based on the network, we define traversability as stability-aware command velocity: the fastest command velocity that keeps instability below a user-defined limit. This velocity-based traversability is integrated into a hierarchical planner that combines traversability-informed Rapid Random Tree Star (TravRRT*) for time-efficient planning and Model Predictive Control (MPC) for safe execution. We validate our method in MuJoCo simulation and the real world, demonstrating improved navigation performance, with enhanced robustness and time efficiency across varying terrains compared to existing methods.
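The velocity-based definition of traversability can be sketched as a search over candidate command velocities. The predictor below is a hypothetical toy function standing in for a learned instability model, prediction uncertainty is ignored, and the name `stability_aware_velocity` and the fallback of 0.0 are assumptions for illustration.

```python
def stability_aware_velocity(predict_instability, candidates, limit):
    """Return the fastest candidate velocity whose predicted instability
    does not exceed the limit, or 0.0 if no candidate qualifies."""
    feasible = [v for v in candidates if predict_instability(v) <= limit]
    return max(feasible, default=0.0)

def toy_predictor(v):
    # Hypothetical stand-in: instability grows quadratically with speed.
    return 0.1 + 0.5 * v * v

v_safe = stability_aware_velocity(toy_predictor, [0.2, 0.4, 0.6, 0.8, 1.0], limit=0.3)
v_none = stability_aware_velocity(toy_predictor, [0.8, 1.0], limit=0.3)
```

Here `v_safe` is 0.6, the largest candidate under the limit, and `v_none` falls back to 0.0 because no candidate qualifies.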
SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation
Large-scale robot learning has recently shown promise for enabling robots to perform complex tasks by integrating perception, control, and language understanding. Yet, it struggles with long-horizon, contact-rich manipulation such as deformable object handling, where demonstration quality is inconsistent. Reward modeling offers a natural solution: by providing grounded progress signals, it transforms noisy demonstrations into stable supervision that generalizes across diverse trajectories. We introduce a stage-aware, video-based reward modeling framework that jointly predicts high-level task stages and fine-grained progress. Reward labels are automatically derived from natural language subtask annotations, ensuring consistent progress estimation across variable-length demonstrations. This design overcomes frame-index labeling, which fails in variable-duration tasks like folding a T-shirt. Our reward model demonstrates robustness to variability, generalization to out-of-distribution settings, and strong utility for policy training. Building on it, we propose Reward-Aligned Behavior Cloning (RA-BC), which filters high-quality data and reweights samples by reward. Experiments show the reward model alone outperforms baselines on validation and real robot rollouts. Integrated into RA-BC, our approach achieves 83% success on folding T-shirts from the flattened state and 67% from the crumpled state -- far surpassing vanilla behavior cloning, which attains only 8% and 0% success. Overall, our results highlight reward modeling as a key enabler for scalable, annotation-efficient, and robust imitation learning in long-horizon manipulation.
Multiagent Systems
Counterfactual-based Agent Influence Ranker for Agentic AI Workflows EMNLP 2025
An Agentic AI Workflow (AAW), also known as an LLM-based multi-agent system, is an autonomous system that assembles several LLM-based agents to work collaboratively towards a shared goal. The high autonomy, widespread adoption, and growing interest in such AAWs highlight the need for a deeper understanding of their operations, from both quality and security aspects. To this day, there are no existing methods to assess the influence of each agent on the AAW's final output. Adopting techniques from related fields is not feasible since existing methods perform only static structural analysis, which is unsuitable for inference time execution. We present Counterfactual-based Agent Influence Ranker (CAIR) - the first method for assessing the influence level of each agent on the AAW's output and determining which agents are the most influential. By performing counterfactual analysis, CAIR provides a task-agnostic analysis that can be used both offline and at inference time. We evaluate CAIR using an AAWs dataset of our creation, containing 30 different use cases with 230 different functionalities. Our evaluation showed that CAIR produces consistent rankings, outperforms baseline methods, and can easily enhance the effectiveness and relevancy of downstream tasks.
comment: Accepted to EMNLP 2025, 27 pages, 6 figures
Solving the Right Problem with Multi-Robot Formations
Formation control simplifies minimizing multi-robot cost functions by encoding a cost function as a shape the robots maintain. However, by reducing complex cost functions to formations, discrepancies arise between maintaining the shape and minimizing the original cost function. For example, a Diamond or Box formation shape is often used for protecting all members of the formation. When more information about the surrounding environment becomes available, a static shape often no longer minimizes the original protection cost. We propose a formation planner to reduce mismatch between a formation and the cost function while still leveraging efficient formation controllers. Our formation planner is a two-step optimization problem that identifies desired relative robot positions. We first solve a constrained problem to estimate non-linear and non-differentiable costs with a weighted sum of surrogate cost functions. We theoretically analyze this problem and identify situations where weights do not need to be updated. The weighted, surrogate cost function is then minimized using relative positions between robots. The desired relative positions are realized using a non-cooperative formation controller derived from Lyapunov's direct approach. We then demonstrate the efficacy of this approach for military-like costs such as protection and obstacle avoidance. In simulations, we show a formation planner can reduce a single cost by over 75%. When minimizing a variety of cost functions simultaneously, using a formation planner with adaptive weights can reduce the cost by 20-40%. Formation planning provides better performance by minimizing a surrogate cost function that closely approximates the original cost function instead of relying on a shape abstraction.
comment: Submitted to SAE WCX 2026
Multi-party Agent Relation Sampling for Multi-party Ad Hoc Teamwork
Multi-agent reinforcement learning (MARL) has achieved strong results in cooperative tasks but typically assumes fixed, fully controlled teams. Ad hoc teamwork (AHT) relaxes this by allowing collaboration with unknown partners, yet existing variants still presume shared conventions. We introduce Multi-party Ad Hoc Teamwork (MAHT), where controlled agents must coordinate with multiple mutually unfamiliar groups of uncontrolled teammates. To address this, we propose MARS, which builds a sparse skeleton graph and applies relational modeling to capture cross-group dynamics. Experiments on MPE and StarCraft II show that MARS outperforms MARL and AHT baselines while converging faster.
Collaborative Scheduling of Time-dependent UAVs, Vehicles and Workers for Crowdsensing in Disaster Response
Frequent natural disasters cause significant losses to human society, and timely, efficient collection of post-disaster environmental information is the foundation for effective rescue operations. Due to the extreme complexity of post-disaster environments, existing sensing technologies such as mobile crowdsensing suffer from weak environmental adaptability, insufficient professional sensing capabilities, and poor practicality of sensing solutions. Therefore, this paper explores a heterogeneous multi-agent online collaborative scheduling algorithm, HoCs-MPQ, to achieve efficient collection of post-disaster environmental information. HoCs-MPQ models collaboration and conflict relationships among multiple elements through weighted undirected graph construction, and iteratively solves the maximum weight independent set based on multi-priority queues, ultimately achieving collaborative sensing scheduling of time-dependent UAVs, vehicles, and workers. Specifically, (1) HoCs-MPQ constructs weighted undirected graph nodes based on collaborative relationships among multiple elements and quantifies their weights, then models the weighted undirected graph based on conflict relationships between nodes; (2) HoCs-MPQ solves the maximum weight independent set based on iterated local search, and accelerates the solution process using multi-priority queues. Finally, we conducted detailed experiments based on extensive real-world and simulated data. The experiments show that, compared to baseline methods (e.g., HoCs-GREEDY, HoCs-K-WTA, HoCs-MADL, and HoCs-MARL), HoCs-MPQ improves task completion rates by an average of 54.13%, 23.82%, 14.12%, and 12.89% respectively, with computation time for single online autonomous scheduling decisions not exceeding 3 seconds.
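The underlying graph problem, selecting a maximum weight independent set on a conflict graph, can be sketched with a simple greedy heuristic driven by a single priority queue. This is a simplified baseline for illustration only, not the iterated local search with multi-priority queues described above; the node names, weights, and the function name `greedy_mwis` are hypothetical.

```python
import heapq

def greedy_mwis(weights, conflicts):
    """Greedy heuristic for a maximum weight independent set.

    weights: dict node -> weight (e.g. value of a candidate assignment).
    conflicts: iterable of (u, v) pairs that cannot both be selected.
    """
    neighbours = {n: set() for n in weights}
    for u, v in conflicts:
        neighbours[u].add(v)
        neighbours[v].add(u)
    # Max-heap via negated weights; the node name breaks ties.
    heap = [(-w, n) for n, w in weights.items()]
    heapq.heapify(heap)
    chosen, blocked = [], set()
    while heap:
        _, node = heapq.heappop(heap)
        if node in blocked:
            continue  # conflicts with a node that was already chosen
        chosen.append(node)
        blocked.add(node)
        blocked |= neighbours[node]
    return chosen

weights = {"uav1-taskA": 5.0, "van1-taskA": 3.0,
           "uav1-taskB": 4.0, "worker1-taskB": 2.0}
conflicts = [("uav1-taskA", "van1-taskA"),
             ("uav1-taskA", "uav1-taskB"),
             ("uav1-taskB", "worker1-taskB")]
selected = greedy_mwis(weights, conflicts)
```

On this toy graph the heuristic selects `uav1-taskA` and `worker1-taskB`. A greedy pass gives no optimality guarantee in general, which is one reason local search refinements are used for this problem.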
On Robust Popular Matchings with Tie-Bounded Preferences and Stable Matchings with Two-Sided Ties
We are given a bipartite graph $G = \left( A \cup B, E \right)$. In the one-sided model, every $a \in A$ (often called agents) ranks its neighbours $z \in N_{a}$ strictly, and no $b \in B$ has any preference order over its neighbours $y \in N_{b}$, and vertices in $B$ abstain from casting their votes to matchings. In the two-sided model with one-sided ties, every $a \in A$ ranks its neighbours $z \in N_{a}$ strictly, and every $b \in B$ puts all of its neighbours into a single large tie, i.e., $b \in B$ prefers every $y \in N_{b}$ equally. In this two-sided model with one-sided ties, when two matchings compete in a majority election, $b \in B$ abstains from casting its vote for a matching when both the matchings saturate $b$ or both leave $b$ unsaturated; else $b$ prefers the matching where it is saturated. A popular matching $M$ is \emph{robust} if it remains popular among multiple instances. We have analysed the cases when a robust popular matching exists in the one-sided model where only one agent alters her preference order among the instances, and we have proposed a polynomial-time algorithm to decide if there exists a robust popular matching when instances differ only with respect to the preference orders of a single agent. We give a simple characterisation of popular matchings in the two-sided model with one-sided ties. We show that in the two-sided model with one-sided ties, if the input instances differ only with respect to the preference orders of a single agent, there is a polynomial-time algorithm to decide whether there exists a robust popular matching. We have been able to decide the stable matching problem in bipartite graphs $G = (A \cup B, E)$ where \textit{both} sides have weak preferences (ties allowed), with the restriction that every tie has length at most $k$.
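The head-to-head election between two matchings in the one-sided model can be made concrete with a short sketch. Only vertices in $A$ vote, an agent prefers the matching that gives it a better-ranked neighbour, and being unmatched is treated as worse than any neighbour. The function name `compare` and the toy instance are assumptions for illustration.

```python
def compare(prefs, m1, m2):
    """Count agent votes in a head-to-head election between two matchings.

    prefs: dict agent -> list of neighbours, most preferred first (strict).
    m1, m2: dict agent -> matched neighbour; unmatched agents are absent.
    Returns (votes for m1, votes for m2); indifferent agents abstain.
    """
    def rank(agent, matching):
        partner = matching.get(agent)
        # Being unmatched ranks below every acceptable neighbour.
        if partner is None:
            return len(prefs[agent])
        return prefs[agent].index(partner)

    v1 = sum(1 for a in prefs if rank(a, m1) < rank(a, m2))
    v2 = sum(1 for a in prefs if rank(a, m2) < rank(a, m1))
    return v1, v2

prefs = {"a1": ["b1", "b2"], "a2": ["b1"], "a3": ["b1", "b2"]}
m1 = {"a1": "b1", "a3": "b2"}
m2 = {"a2": "b1", "a1": "b2"}
result = compare(prefs, m1, m2)
```

In this instance `a1` and `a3` vote for `m1` and `a2` votes for `m2`, so `m1` wins the election 2 to 1. A matching is popular when no other matching beats it in such an election.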
Machine Learning and CPU (Central Processing Unit) Scheduling Co-Optimization over a Network of Computing Centers
In the rapidly evolving field of artificial intelligence (AI), the demand for fast, computationally efficient, and scalable solutions has increased in recent years. The problem of optimizing the computing resources for distributed machine learning (ML) and optimization is considered in this paper. Given a set of data distributed over a network of computing-nodes/servers, the idea is to optimally assign the CPU (central processing unit) usage while simultaneously training each computing node locally via its own share of data. This formulates the problem as a co-optimization setup to (i) optimize the data processing and (ii) optimally allocate the computing resources. The information-sharing network among the nodes might be time-varying, but with balanced weights to ensure consensus-type convergence of the algorithm. The algorithm is all-time feasible, which implies that the computing resource-demand balance constraint holds at all iterations of the proposed solution. Moreover, the solution allows addressing possible log-scale quantization over the information-sharing channels to exchange log-quantized data. For some example applications, distributed support-vector-machine (SVM) and regression are considered as the ML training models. Results from perturbation theory, along with Lyapunov stability and eigen-spectrum analysis, are used to prove the convergence towards the optimal case. As compared to existing CPU scheduling solutions, the proposed algorithm improves the cost optimality gap by more than $50\%$.
comment: EAAI Journal
The Iceberg Index: Measuring Workforce Exposure Across the AI Economy
Artificial Intelligence is reshaping America's \$9.4 trillion labor market, with cascading effects that extend far beyond visible technology sectors. When AI transforms quality control tasks in automotive plants, consequences spread through logistics networks, supply chains, and local service economies. Yet traditional workforce metrics cannot capture these ripple effects: they measure employment outcomes after disruption occurs, not where AI capabilities overlap with human skills before adoption crystallizes. Project Iceberg addresses this gap using Large Population Models to simulate the human-AI labor market, representing 151 million workers as autonomous agents executing over 32,000 skills and interacting with thousands of AI tools. It introduces the Iceberg Index, a skills-centered metric that measures the wage value of skills AI systems can perform within each occupation. The Index captures technical exposure, where AI can perform occupational tasks, not displacement outcomes or adoption timelines. Analysis shows that visible AI adoption concentrated in computing and technology (2.2% of wage value, approx \$211 billion) represents only the tip of the iceberg. Technical capability extends far below the surface through cognitive automation spanning administrative, financial, and professional services (11.7%, approx \$1.2 trillion). This exposure is fivefold larger and geographically distributed across all states rather than confined to coastal hubs. Traditional indicators such as GDP, income, and unemployment explain less than 5% of this skills-based variation, underscoring why new indices are needed to capture exposure in the AI economy. By simulating how these capabilities may spread under scenarios, Iceberg enables policymakers and business leaders to identify exposure hotspots, prioritize investments, and test interventions before committing billions to implementation.
comment: iceberg.mit.edu
SeeingEye: Agentic Information Flow Unlocks Multimodal Reasoning In Text-only LLMs
Recent advances in text-only large language models (LLMs), such as DeepSeek-R1, demonstrate remarkable reasoning ability. However, these models remain fragile or entirely incapable when extended to multi-modal tasks. Existing approaches largely rely on single-form captions, which lack diversity and often fail to adapt across different types of Visual Question Answering (VQA) benchmarks. As a result, they provide no principled or efficient channel for transmitting fine-grained visual information. We introduce SeeingEye, a modular framework that unlocks multimodal reasoning in text-only LLMs through an agent-based small VLM translator. This translator acts as a perception agent: it can invoke specialized tools (e.g., OCR and crop) and iteratively distill multimodal inputs into structured intermediate representations (SIRs) tailored to the question. These SIRs are then passed to the text-only LLM, which serves as a reasoning agent. Crucially, the translator and reasoner engage in multi-round feedback and interaction, enabling the extraction of targeted visual details and yielding more confident answers. Experiments on knowledge-intensive VQA benchmarks, including MMMU and MIA-Bench, demonstrate that SeeingEye not only reduces inference cost but also surpasses much larger end-to-end VLMs. For example, an instantiation combining a 3B-parameter vision translator with an 8B-parameter language reasoner outperforms a monolithic 32B VLM on challenging knowledge-based questions. Our results highlight that decoupling perception from reasoning via agent information flow offers a scalable and plug-and-play pathway to multimodal reasoning, allowing strong text-only LLMs to fully leverage their reasoning capabilities. Code is available at: https://github.com/ulab-uiuc/SeeingEye
Multi-Agent Reinforcement Learning for Market Making: Competition without Collusion
Algorithmic collusion has emerged as a central question in AI: Will the interaction between different AI agents deployed in markets lead to collusion? More generally, understanding how emergent behavior, be it a cartel or market dominance from more advanced bots, affects the market overall is an important research question. We propose a hierarchical multi-agent reinforcement learning framework to study algorithmic collusion in market making. The framework includes a self-interested market maker (Agent~A), which is trained in an uncertain environment shaped by an adversary, and three bottom-layer competitors: the self-interested Agent~B1 (whose objective is to maximize its own PnL), the competitive Agent~B2 (whose objective is to minimize the PnL of its opponent), and the hybrid Agent~B$^\star$, which can modulate between the behavior of the other two. To analyze how these agents shape the behavior of each other and affect market outcomes, we propose interaction-level metrics that quantify behavioral asymmetry and system-level dynamics, while providing signals potentially indicative of emergent interaction patterns. Experimental results show that Agent~B2 secures dominant performance in a zero-sum setting against B1, aggressively capturing order flow while tightening average spreads, thus improving market execution efficiency. In contrast, Agent~B$^\star$ exhibits a self-interested inclination when co-existing with other profit-seeking agents, securing dominant market share through adaptive quoting, yet exerting a milder adverse impact on the rewards of Agents~A and B1 compared to B2. These findings suggest that adaptive incentive control supports more sustainable strategic co-existence in heterogeneous agent environments and offers a structured lens for evaluating behavioral design in algorithmic trading systems.
Debate2Create: Robot Co-design via Large Language Model Debates
Automating the co-design of a robot's morphology and control is a long-standing challenge due to the vast design space and the tight coupling between body and behavior. We introduce Debate2Create (D2C), a framework in which large language model (LLM) agents engage in a structured dialectical debate to jointly optimize a robot's design and its reward function. In each round, a design agent proposes targeted morphological modifications, and a control agent devises a reward function tailored to exploit the new design. A panel of pluralistic judges then evaluates the design-control pair in simulation and provides feedback that guides the next round of debate. Through iterative debates, the agents progressively refine their proposals, producing increasingly effective robot designs. Notably, D2C yields diverse and specialized morphologies despite no explicit diversity objective. On a quadruped locomotion benchmark, D2C discovers designs that travel 73% farther than the default, demonstrating that structured LLM-based debate can serve as a powerful mechanism for emergent robot co-design. Our results suggest that multi-agent debate, when coupled with physics-grounded feedback, is a promising new paradigm for automated robot design.
HyperMARL: Adaptive Hypernetworks for Multi-Agent RL NeurIPS 2025
Adaptive cooperation in multi-agent reinforcement learning (MARL) requires policies to express homogeneous, specialised, or mixed behaviours, yet achieving this adaptivity remains a critical challenge. While parameter sharing (PS) is standard for efficient learning, it notoriously suppresses the behavioural diversity required for specialisation. This failure is largely due to cross-agent gradient interference, a problem we find is surprisingly exacerbated by the common practice of coupling agent IDs with observations. Existing remedies typically add complexity through altered objectives, manual preset diversity levels, or sequential updates -- raising a fundamental question: can shared policies adapt without these intricacies? We propose a solution built on a key insight: an agent-conditioned hypernetwork can generate agent-specific parameters and decouple observation- and agent-conditioned gradients, directly countering the interference from coupling agent IDs with observations. Our resulting method, HyperMARL, avoids the complexities of prior work and empirically reduces policy gradient variance. Across diverse MARL benchmarks (22 scenarios, up to 30 agents), HyperMARL achieves performance competitive with six key baselines while preserving behavioural diversity comparable to non-parameter sharing methods, establishing it as a versatile and principled approach for adaptive MARL. The code is publicly available at https://github.com/KaleabTessera/HyperMARL.
comment: To appear at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025). A preliminary version of this work was presented at the CoCoMARL workshop, RLC 2025
Redistributing Rewards Across Time and Agents for Multi-Agent Reinforcement Learning
Credit assignment, disentangling each agent's contribution to a shared reward, is a critical challenge in cooperative multi-agent reinforcement learning (MARL). To be effective, credit assignment methods must preserve the environment's optimal policy. Some recent approaches attempt this by enforcing return equivalence, where the sum of distributed rewards must equal the team reward. However, their guarantees are conditional on a learned model's regression accuracy, making them unreliable in practice. We introduce Temporal-Agent Reward Redistribution (TAR$^2$), an approach that decouples credit modeling from this constraint. A neural network learns unnormalized contribution scores, while a separate, deterministic normalization step enforces return equivalence by construction. We demonstrate that this method is equivalent to a valid Potential-Based Reward Shaping (PBRS), which guarantees the optimal policy is preserved regardless of model accuracy. Empirically, on challenging SMACLite and Google Research Football (GRF) benchmarks, TAR$^2$ accelerates learning and achieves higher final performance than strong baselines. These results establish our method as an effective solution for the agent-temporal credit assignment problem.
comment: 16 pages, 4 figures, 4 tables
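The "normalization enforces return equivalence by construction" idea in the TAR$^2$ abstract can be illustrated with a minimal sketch. It assumes a softmax over all (timestep, agent) entries and a single episodic team return. The function name `redistribute_return`, the softmax choice, and the random scores are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def redistribute_return(scores, team_return):
    """Split one episodic team return across timesteps and agents.

    scores: array of shape (T, N) with unnormalized contribution scores
            (a learned network would produce these; here they are an input).
    team_return: scalar episodic team return.
    Returns an array of shape (T, N) whose entries sum to team_return.
    """
    scores = np.asarray(scores, dtype=float)
    # Softmax over all (timestep, agent) entries; subtracting the max
    # keeps the exponentials numerically stable.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # The weights sum to one, so the redistributed rewards sum to the
    # team return no matter how accurate the scores are.
    return weights * team_return

rng = np.random.default_rng(0)
scores = rng.normal(size=(5, 3))  # 5 timesteps, 3 agents (hypothetical sizes)
rewards = redistribute_return(scores, team_return=10.0)
```

The point of the sketch is that the sum constraint is satisfied by the normalization step alone, independent of how the scores were produced.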
Binary Decision Process in Pre-Evacuation Behavior
In crowd evacuation, the time interval before decisive movement towards a safe place is defined as the pre-evacuation phase, and it has a crucial impact on the total time required for safe egress. This process mainly refers to situation awareness and response to external stressors, e.g., fire alarms. Due to the complexity of the human cognitive process, simulation is used to study this important time interval. In this paper, a binary decision process is formulated to simulate the pre-evacuation time of many evacuees in a given social context. The model combines classic opinion dynamics (the French-DeGroot model) with a binary phase transition to describe how group pre-evacuation time emerges from individual interaction. The model parameters are quantitatively meaningful to human factors research within a socio-psychological background, e.g., whether an individual is stubborn or open-minded, or what kind of social topology exists among the individuals and how it matters in aggregating individuals into social groups. The modeling framework also describes the collective motion of many evacuee agents in a planar space. The resulting multi-agent system is partly similar to the Vicsek flocking model, and it is meaningful for exploring complex social behavior during the phase transition of a non-equilibrium process.
comment: 5 pages
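As a rough illustration of combining French-DeGroot opinion dynamics with a binary decision, the sketch below iterates x(t+1) = W x(t) with a row-stochastic weight matrix W and records the first step at which each agent's opinion crosses a threshold. The threshold rule, the function name `pre_evacuation_times`, and the three-agent chain are assumptions made for illustration only; the paper's actual model may differ.

```python
import numpy as np

def pre_evacuation_times(W, x0, threshold=0.5, max_steps=1000):
    """Return, per agent, the first step at which opinion >= threshold.

    W: row-stochastic influence matrix (each row sums to 1).
    x0: initial opinions in [0, 1]; 1 means "ready to evacuate".
    Agents that never cross the threshold keep the value -1.
    """
    W = np.asarray(W, dtype=float)
    x = np.asarray(x0, dtype=float).copy()
    assert np.allclose(W.sum(axis=1), 1.0), "W must be row-stochastic"
    times = np.full(len(x), -1)
    times[x >= threshold] = 0
    for t in range(1, max_steps + 1):
        x = W @ x  # French-DeGroot averaging step
        newly_decided = (x >= threshold) & (times < 0)
        times[newly_decided] = t
        if (times >= 0).all():
            break
    return times

# Agent 0 is stubborn (self-weight 1) and has already perceived the alarm;
# agents 1 and 2 are open-minded and form a chain 0 -> 1 -> 2.
W = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5]]
times = pre_evacuation_times(W, x0=[1.0, 0.0, 0.0])
print(times)  # [0 1 3]
```

In this toy topology the agent farther from the stubborn source decides later, which is the kind of topology-dependent delay the abstract refers to.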
Systems and Control (CS)
Over 3 kV and Ultra-Low leakage Vertical (011) β-Ga2O3 Power Diodes with Engineered Schottky Contact and High-permittivity Dielectric Field Plate
We report over 3 kV breakdown voltage and ultra-low leakage (011) β-Ga2O3 power devices utilizing Schottky barrier engineering and high-permittivity (κ) dielectric (ZrO2) field plate. The (011) orientation of β-Ga2O3 enabled low background doping and thick drift layers which are promising to support kV-class vertical β-Ga2O3 power switches. The Schottky barrier engineering was performed with a composite Pt cap/PtOx/Pt (1.5 nm) anode contact to take advantage of the enhanced reverse blocking capabilities enabled by PtOx while allowing low turn-on voltage by the interfacing thin Pt layer. We also performed a systematic study using co-processed Pt/(011) β-Ga2O3 Schottky barrier diodes (SBDs) on the same wafer. The bare SBDs revealed a breakdown voltage of ~1.5 kV, while the field-plate Pt/(011) β-Ga2O3 SBDs achieved an increased breakdown voltage of 2.75 kV owing to the edge field management. Further enhancement of the breakdown voltage was achieved by tunneling leakage management using composite Pt cap/PtOx/Pt (1.5 nm) Schottky contacts that ultimately enabled breakdown voltage of 3.7 kV for the field-plate diodes. Remarkably, the Pt cap/PtOx/Pt (1.5 nm) Schottky contacts maintained similar turn-on voltage as the Pt/(011) β-Ga2O3 SBDs. The combination of efficient tunneling leakage management by composite Pt cap/PtOx/Pt (1.5 nm) contacts with similar turn-on voltage, edge field reduction by high-κ dielectric ZrO2 field plate, as well as the advantageous material properties offered by (011) β-Ga2O3 demonstrate a promising strategy for developing ultra-low leakage and multi-kV class vertical (011) β-Ga2O3 power devices.
An OPF-based Control Framework for Hybrid AC-MTDC Power Systems under Uncertainty
The increasing integration of renewable energy, particularly offshore wind, introduces significant uncertainty into hybrid AC-HVDC systems due to forecast errors and power fluctuations. Conventional control strategies typically rely on fixed setpoints and neglect frequency deviations, which can compromise system stability under rapid renewable variations. To address this challenge, this paper presents a forecast-integrated, optimal power flow (OPF)-based adaptive control framework. Wind speed forecasts generated using a Random Forest model are incorporated into a time-coupled OPF to determine baseline converter setpoints in anticipation of wind fluctuations, which are further adjusted in real time based on actual operating conditions. An adaptive droop control scheme is developed that jointly considers DC voltage and AC frequency deviations. The effectiveness of the proposed control framework is validated through hardware-in-the-loop (HIL) simulations, demonstrating its capability to ensure stable and robust operation of hybrid AC-HVDC systems under high penetration of renewable energy.
Incorporating Social Awareness into Control of Unknown Multi-Agent Systems: A Real-Time Spatiotemporal Tubes Approach
This paper presents a decentralized control framework that incorporates social awareness into multi-agent systems with unknown dynamics to achieve prescribed-time reach-avoid-stay tasks in dynamic environments. Each agent is assigned a social awareness index that quantifies its level of cooperation or self-interest, allowing heterogeneous social behaviors within the system. Building on the spatiotemporal tube (STT) framework, we propose a real-time STT framework that synthesizes tubes online for each agent while capturing its social interactions with others. A closed-form, approximation-free control law is derived to ensure that each agent remains within its evolving STT, thereby avoiding dynamic obstacles while also preventing inter-agent collisions in a socially aware manner, and reaching the target within a prescribed time. The proposed approach provides formal guarantees on safety and timing, and is computationally lightweight, model-free, and robust to unknown disturbances. The effectiveness and scalability of the framework are validated through simulation and hardware experiments on a 2D omnidirectional
Optimal and Heuristic Approaches for Platooning Systems with Deadlines
Efficient truck platooning is a key strategy for reducing freight costs, lowering fuel consumption, and mitigating emissions. Deadlines are critical in this context, as trucks must depart within specific time windows to meet delivery requirements and avoid penalties. In this paper, we investigate the optimal formation and dispatch of truck platoons at a highway station with finite capacity $L$ and deadline constraints $T$. The system operates in discrete time, with each arriving truck assigned a deadline of $T$ slot units. The objective is to leverage the efficiency gains from forming large platoons while accounting for waiting costs and deadline violations. We formulate the problem as a Markov decision process and analyze the structure of the optimal policy $\pi^\star$ for $L = 3$, extending insights to arbitrary $L$. We prove that $\pi^\star$ is monotone in the state space $\mathcal{S}$ and identify classes of unreachable states. Moreover, since $\mathcal{S}$ grows exponentially with $L$ and $T$, we propose heuristics, including conditional and deep-learning-based approaches, that exploit these structural insights while maintaining low computational complexity.
Deep Reinforcement Learning-Based Cooperative Rate Splitting for Satellite-to-Underground Communication Networks
Reliable downlink communication in satellite-to-underground networks remains challenging due to severe signal attenuation caused by underground soil and refraction at the air-soil interface. To address this, we propose a novel cooperative rate-splitting (CRS)-aided transmission framework, where an aboveground relay decodes and forwards the common stream to underground devices (UDs). Based on this framework, we formulate a max-min fairness optimization problem that jointly optimizes power allocation, message splitting, and time slot scheduling to maximize the minimum achievable rate across UDs. To solve this high-dimensional non-convex problem under uncertain channels, we develop a deep reinforcement learning solution framework based on the proximal policy optimization (PPO) algorithm that integrates distribution-aware action modeling and a multi-branch actor network. Simulation results under a realistic underground pipeline monitoring scenario demonstrate that the proposed approach achieves average max-min rate gains exceeding $167\%$ over conventional benchmark strategies across various numbers of UDs and underground conditions.
comment: 6 pages, 3 figures, 1 table, and submitted to IEEE TVT
Sum-of-Squares Certificates for Almost-Sure Reachability of Stochastic Polynomial Systems
In this paper, we present a computational approach to certify almost sure reachability for discrete-time polynomial stochastic systems by turning drift--variant criteria into sum-of-squares (SOS) programs solved with standard semidefinite solvers. Specifically, we provide an SOS method based on two complementary certificates: (i) a drift certificate that enforces a radially unbounded function to be non-increasing in expectation outside a compact set of states; and (ii) a variant certificate that guarantees a one-step decrease with positive probability and ensures the target contains its nonpositive sublevel set. We transform these conditions into SOS constraints. For the variant condition, we enforce a robust decrease over a parameterized disturbance ball with nonzero probability and encode the constraints via an S-procedure with polynomial multipliers. The resulting bilinearities are handled by a scheme that alternates between optimizing multipliers and updating the variant and radius until a positive slack is obtained. Two case studies illustrate the workflow and certify almost-sure reachability.
comment: 8 pages, 8 figures
A New Neural Network Paradigm for Scalable and Generalizable Stability Analysis of Power Systems
This paper presents a new neural network (NN) paradigm for scalable and generalizable stability analysis of power systems. The paradigm consists of two parts: the neural stability descriptor and the sample-augmented iterative training scheme. The first part, based on system decomposition, constructs the object (such as a stability function or condition) for stability analysis as a scalable aggregation of multiple NNs. These NNs remain fixed across varying power system structures and parameters, and are repeatedly shared within each system instance defined by these variations, thereby enabling the generalization of the neural stability descriptor across a class of power systems. The second part learns the neural stability descriptor by iteratively training the NNs with sample augmentation, guided by the tailored conservativeness-aware loss function. The training set is strategically constructed to promote the descriptor's generalizability, which is systematically evaluated by verification and validation during the training process. Specifically, the proposed NN paradigm is implemented for large-disturbance stability analysis of the bulk power grid and small-disturbance stability conditions of the microgrid system. Finally, numerical studies for the two implementations demonstrate the applicability and effectiveness of the proposed NN paradigm.
Combining Moving Mass Actuators and Manoeuvring Models for Underwater Vehicles: A Lagrangian Approach
In this paper, we present a Newton-Euler formulation of the equations of motion for underwater vehicles with an internal moving mass actuator. Furthermore, the moving mass dynamics are expressed as an extension to the manoeuvring model for underwater vehicles, originally introduced by Fossen (1991). The influence of the moving mass is described in the body frame and included as states in both an additional kinematic equation and as part of the coupled rigid-body kinetics of the underwater vehicle. The Coriolis-centripetal effects are derived from Kirchhoff's equations and the hydrostatics are derived using first principles. The proposed Newton-Euler model is validated through simulation and compared with the traditional Hamiltonian internal moving mass actuator formulation.
comment: © 2025 Alexander Rambech, Ivar Saksvik and Vahid Hassani. Accepted by IFAC for publication under a Creative Commons License CC-BY-NC-ND
Data-Driven Stabilization Using Prior Knowledge on Stabilizability and Controllability
In this work, we study data-driven stabilization of linear time-invariant systems using prior knowledge of system-theoretic properties, specifically stabilizability and controllability. To formalize this, we extend the concept of data informativity by requiring the existence of a controller that stabilizes all systems consistent with the data and the prior knowledge. We show that if the system is controllable, then incorporating this as prior knowledge does not relax the conditions required for data-driven stabilization. Remarkably, however, we show that if the system is stabilizable, then using this as prior knowledge leads to necessary and sufficient conditions that are weaker than those for data-driven stabilization without prior knowledge. In other words, data-driven stabilization is easier if one knows that the underlying system is stabilizable. We also provide new data-driven control design methods in terms of linear matrix inequalities that complement the conditions for informativity.
comment: 6 pages
Quantum-Resilient Threat Modelling for Secure RIS-Assisted ISAC in 6G UAV Corridors
The rapid deployment of unmanned aerial vehicle (UAV) corridors in sixth-generation (6G) networks requires safe, intelligence-driven integrated sensing and communications (ISAC). Reconfigurable intelligent surfaces (RIS) enhance spectrum efficiency, localisation accuracy, and situational awareness, while introducing new vulnerabilities. The rise of quantum computing increases the risks associated with harvest-now-decrypt-later strategies and quantum-enhanced spoofing. We propose a Quantum-Resilient Threat Modelling (QRTM) framework for RIS-assisted ISAC in UAV corridors to address these challenges. QRTM integrates classical, quantum-ready, and quantum-aided adversaries, countered using post-quantum cryptographic (PQC) primitives: ML-KEM for key establishment and Falcon for authentication, both embedded within RIS control signalling and UAV coordination. To strengthen security sensing, the framework introduces RIS-coded scene watermarking validated through a generalised likelihood ratio test (GLRT), with its detection probability characterised by the Marcum Q function. Furthermore, a Secure ISAC Utility (SIU) jointly optimises secrecy rate, spoofing detection, and throughput under RIS constraints, enabled by a scheduler with computational complexity of O(n^2). Monte Carlo evaluations using 3GPP Release 19 mid-band urban-canyon models (7-15 GHz) demonstrate a spoof-detection probability approaching 0.99 at a false-alarm rate of 1e-3, secrecy-rate retention exceeding 90 percent against quantum-capable adversaries, and signal-interference utilisation improvements of about 25 percent compared with baselines. These results show a standards-compliant path towards reliable, quantum-resilient ISAC for UAV corridors in smart cities and non-terrestrial networks.
comment: 6 pages, 5 figures
Integrating Legal and Logical Specifications in Perception, Prediction, and Planning for Automated Driving: A Survey of Methods
This survey provides an analysis of current methodologies integrating legal and logical specifications into the perception, prediction, and planning modules of automated driving systems. We systematically explore techniques ranging from logic-based frameworks to computational legal reasoning approaches, emphasizing their capability to ensure regulatory compliance and interpretability in dynamic and uncertain driving environments. A central finding is that significant challenges arise at the intersection of perceptual reliability, legal compliance, and decision-making justifiability. To systematically analyze these challenges, we introduce a taxonomy categorizing existing approaches by their theoretical foundations, architectural implementations, and validation strategies. We particularly focus on methods that address perceptual uncertainty and incorporate explicit legal norms, facilitating decisions that are both technically robust and legally defensible. The review covers neural-symbolic integration methods for perception, logic-driven rule representation, and norm-aware prediction strategies, all contributing toward transparent and accountable autonomous vehicle operation. We highlight critical open questions and practical trade-offs that must be addressed, offering multidisciplinary insights from engineering, logic, and law to guide future developments in legally compliant autonomous driving systems.
comment: Accepted to 2025 IEEE International Automated Vehicle Validation Conference (IAVVC)
Lightweight Federated Learning in Mobile Edge Computing with Statistical and Device Heterogeneity Awareness
Federated learning enables collaborative machine learning while preserving data privacy, but high communication and computation costs, exacerbated by statistical and device heterogeneity, limit its practicality in mobile edge computing. Existing compression methods like sparsification and pruning reduce per-round costs but may increase training rounds and thus the total training cost, especially under heterogeneous environments. We propose a lightweight personalized FL framework built on parameter decoupling, which separates the model into shared and private subspaces, enabling us to uniquely apply gradient sparsification to the shared component and model pruning to the private one. This structural separation confines communication compression to global knowledge exchange and computation reduction to local personalization, protecting personalization quality while adapting to heterogeneous client resources. We theoretically analyze convergence under the combined effects of sparsification and pruning, revealing a sparsity-pruning trade-off that links to the iteration complexity. Guided by this analysis, we formulate a joint optimization that selects per-client sparsity and pruning rates and wireless bandwidth to reduce end-to-end training time. Simulation results demonstrate faster convergence and substantial reductions in overall communication and computation costs with negligible accuracy loss, validating the benefits of coordinated and resource-aware personalization in resource-constrained heterogeneous environments.
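The split between communication compression on the shared part and computation reduction on the private part can be sketched as follows. This is a minimal illustration, assuming top-k magnitude sparsification for the shared gradient and magnitude pruning for the private weights. The function names, ratios, and example vectors are hypothetical, and the paper's actual operators and rate selection may differ.

```python
import numpy as np

def topk_sparsify(grad, keep_ratio):
    """Keep the largest-magnitude entries of a gradient, zero the rest."""
    grad = np.asarray(grad, dtype=float)
    k = max(1, int(round(keep_ratio * grad.size)))
    flat = grad.ravel()
    # Indices of the k entries with the largest absolute value.
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.reshape(grad.shape)

def magnitude_prune(weights, prune_ratio):
    """Zero the smallest-magnitude fraction of the weights."""
    weights = np.asarray(weights, dtype=float)
    n_prune = int(round(prune_ratio * weights.size))
    flat = weights.ravel().copy()
    if n_prune > 0:
        # Indices of the n_prune entries with the smallest absolute value.
        idx = np.argpartition(np.abs(flat), n_prune - 1)[:n_prune]
        flat[idx] = 0.0
    return flat.reshape(weights.shape)

# Shared component: only the sparsified gradient would be uploaded.
shared_grad = np.array([0.1, -2.0, 0.3, 1.5])
uploaded = topk_sparsify(shared_grad, keep_ratio=0.5)

# Private component: pruned locally and never communicated.
private_weights = np.array([0.05, -0.8, 0.2, 1.0])
pruned = magnitude_prune(private_weights, prune_ratio=0.5)
```

Keeping the two operators on disjoint parameter sets is what lets per-client sparsity and pruning rates be tuned separately against bandwidth and compute limits.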
Tight Collision Avoidance for Stochastic Optimal Control: with Applications in Learning-based, Interactive Motion Planning
Trajectory planning in dense, interactive traffic scenarios presents significant challenges for autonomous vehicles, primarily due to the uncertainty of human driver behavior and the non-convex nature of collision avoidance constraints. This paper introduces a stochastic optimal control framework to address these issues simultaneously, without excessively conservative approximations. We opt to model human driver decisions as a Markov Decision Process and propose a method for handling collision avoidance between non-convex vehicle shapes by imposing a positive distance constraint between compact sets. In this framework, we investigate three alternative chance constraint formulations. To ensure computational tractability, we introduce tight, continuously differentiable reformulations of both the non-convex distance constraints and the chance constraints. The efficacy of our approach is demonstrated through simulation studies of two challenging interactive scenarios: an unregulated intersection crossing and a highway lane change in dense traffic.
comment: Preprint article, submitted for publication
Data-Enabled Predictive Control and Guidance for Autonomous Underwater Vehicles
This paper presents a fully data-driven control framework for autonomous underwater vehicles (AUVs) based on Data-Enabled Predictive Control (DeePC). The approach eliminates the need for explicit hydrodynamic modeling by exploiting measured input-output data to predict and optimize future system behavior. Classic DeePC is employed for heading control, while a cascaded DeePC architecture is proposed for depth regulation, incorporating a loop-frequency separation to handle the different dynamic modes of input and output. For 3-D waypoint path following, the Adaptive Line-of-Sight algorithm is extended to a predictive formulation and integrated with DeePC. All methods are validated in extensive simulation on the REMUS 100 AUV and compared with classical PI/PID control. The results demonstrate superior tracking performance and robustness of DeePC under ocean-current disturbances and nonlinear operating conditions, while significantly reducing modeling effort.
comment: 12 pages, 6 figures
Shared Control for Vehicle Lane-Changing with Uncertain Driver Behaviors
Lane changes are common yet challenging driving maneuvers that require continuous decision-making and dynamic interaction with surrounding vehicles. Relying solely on human drivers for lane-changing can lead to traffic disturbances due to the stochastic nature of human behavior and its variability under different task demands. Such uncertainties may significantly degrade traffic string stability, which is critical for suppressing disturbance propagation and ensuring smooth merging of the lane-changing vehicles. This paper presents a human-automation shared lane-changing control framework that preserves driver authority while allowing automated assistance to achieve stable maneuvers in the presence of the driver's behavioral uncertainty. Human driving behavior is modeled as a Markov jump process with transitions driven by task difficulty, providing a tractable representation of stochastic state switching. Based on this model, we first design a nominal stabilizing controller that guarantees stochastic ${L}_2$ string stability under imperfect mode estimation. To further balance performance and automated effort, we then develop a Minimal Intervention Controller (MIC) that retains acceptable stability while limiting automation. Simulations using lane-changing data from the NGSIM dataset verify that the nominal controller reduces speed perturbations and shortens lane-changing time, while the MIC further reduces automated effort and enhances comfort at the cost of a moderate loss in stability and efficiency. Validations on the TGSIM dataset with SAE Level 2 vehicles show that the MIC enables earlier lane changes than Level 2 control while preserving driver authority with a slight stability compromise. These findings highlight the potential of shared control strategies to balance stability, efficiency, and driver acceptance.
Photoacoustics on the go: An Embedded Photoacoustic Sensing Platform
Several centimeters below the skin lie multiple biomarkers, such as glucose, oxygenation, and blood flow. Monitoring these biomarkers regularly and in a non-invasive manner would enable early insight into metabolic status and vascular health. Currently, there are only a handful of non-invasive monitoring systems. Optical methods offer molecular specificity (i.e., multi-biomarker monitoring) but have shallow reach (a few millimeters); ultrasound penetrates deeper but lacks specificity; and MRI is large, slow, and costly. Photoacoustic (PA) sensing combines the best of optical and ultrasound methods. A laser transmitter emits pulses that are absorbed by different molecules, providing specificity. These light pulses generate pressure changes that are captured by an ultrasound receiver, providing depth. Photoacoustic sensing is promising, but the current platforms are bulky, complex, and costly. We propose the first embedded PA platform. Our contributions are fourfold. First, inspired by LiDAR technology, we propose a novel transmitter that emits pulses similar to those in the state-of-the-art (SoA), but instead of using high-voltage sources and complex electronic interfaces, we use a simple low-power microcontroller (MCU). Second, we carry out a thorough analysis of our custom transmitter and a commercial system. Third, we build a basic ultrasound receiver that is able to process the faint signal generated by our transmitter. Lastly, we compare the performance of our platform against a SoA commercial system, and show that we can detect glucose and (de)oxygenated hemoglobin in two controlled solution studies. The resulting signal characteristics indicate a plausible path toward noninvasive, real-time, at-home sensing relevant to diabetes care. More broadly, this platform lays the groundwork for translating the promise of PA sensing into a broader practical reality.
Minimum time consensus for damped second order agents using Gröbner basis
A problem of achieving minimum time consensus for a set of $N$ second-order LTI system agents with bounded inputs and fuel constraints is considered. Unlike our other works, here the damping effect in agent dynamics is included. First, the attainable set for each agent with fuel budget constraints is characterized, and its boundary equations are derived. Then, using the convexity property, the minimum time at which attainable sets of all agents have a non-empty intersection is computed. By applying Helly's theorem, the computation reduces to finding the minimum time to consensus and the corresponding consensus point for each of the triplets separately.
Silicon-based Josephson junction field-effect transistors enabling cryogenic logic and quantum technologies
The continuous miniaturisation of metal-oxide-semiconductor field-effect transistors (MOSFETs) from long- to short-channel architectures has advanced beyond the predictions of Moore's Law. Continued advances in semiconductor electronics, even near current scaling and performance boundaries under cryogenic conditions, are driving the development of innovative device paradigms that enable ultra-low-power and high-speed functionality. Among emerging candidates, the Josephson Junction Field-Effect Transistor (JJFET or JoFET) provides an alternative by integrating superconducting source and drain electrodes for efficient, phase-coherent operation at ultra-low temperatures. These hybrid devices have the potential to bridge conventional semiconductor electronics with cryogenic logic and quantum circuits, enabling energy-efficient and high-coherence signal processing across temperature domains. This review traces the evolution from Josephson junctions to field-effect transistors, emphasising the structural and functional innovations that underpin modern device scalability. The performance of JJFETs fabricated on Si, GaAs, and InGaAs substrates is analysed, alongside an assessment of their switching dynamics and material compatibility. Particular attention is given to superconductor-silicon-superconductor Josephson junctions as the active core of JJFET architectures. By unfolding more than four decades of experimental progress, this work highlights the promise of JJFETs as foundational building blocks for next-generation cryogenic logic and quantum electronic systems.
Machine Learning and CPU (Central Processing Unit) Scheduling Co-Optimization over a Network of Computing Centers
In the rapidly evolving research on artificial intelligence (AI) the demand for fast, computationally efficient, and scalable solutions has increased in recent years. The problem of optimizing the computing resources for distributed machine learning (ML) and optimization is considered in this paper. Given a set of data distributed over a network of computing-nodes/servers, the idea is to optimally assign the CPU (central processing unit) usage while simultaneously training each computing node locally via its own share of data. This formulates the problem as a co-optimization setup to (i) optimize the data processing and (ii) optimally allocate the computing resources. The information-sharing network among the nodes might be time-varying, but with balanced weights to ensure consensus-type convergence of the algorithm. The algorithm is all-time feasible, which implies that the computing resource-demand balance constraint holds at all iterations of the proposed solution. Moreover, the solution allows addressing possible log-scale quantization over the information-sharing channels to exchange log-quantized data. For some example applications, distributed support-vector-machine (SVM) and regression are considered as the ML training models. Results from perturbation theory, along with Lyapunov stability and eigen-spectrum analysis, are used to prove the convergence towards the optimal case. As compared to existing CPU scheduling solutions, the proposed algorithm improves the cost optimality gap by more than $50\%$.
comment: EAAI Journal
The Waterbed Effect on Quasiperiodic Disturbance Observer: Avoidance of Sensitivity Tradeoff with Time Delays
In linear time-invariant systems, the sensitivity function to disturbances is designed under a sensitivity tradeoff known as the waterbed effect. To compensate for a quasiperiodic disturbance, a quasiperiodic disturbance observer using time delays was proposed. Its sensitivity function avoids the sensitivity tradeoff, achieving wideband harmonic suppression without amplifying aperiodic disturbances or shifting harmonic suppression frequencies. However, its open-loop transfer function is not rational and does not satisfy the assumptions of existing Bode sensitivity integrals due to its time delays. This paper provides Bode-like sensitivity integrals for the quasiperiodic disturbance observer in both continuous-time and discrete-time representations and clarifies the avoided sensitivity tradeoff with time delays.
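For context, the waterbed effect mentioned in this abstract is usually stated through the classical Bode sensitivity integral, reproduced below as a textbook result. It is not one of the paper's new Bode-like integrals, and the symbols $L$, $S$, and $p_k$ are the standard ones, assumed here rather than taken from the paper.

```latex
% Classical Bode sensitivity integral (textbook result, not the paper's
% delay-aware integrals). Assumptions: L(s) is rational with relative
% degree at least two, the closed loop is stable, S(s) = 1/(1 + L(s)),
% and p_k are the open-loop poles in the open right half-plane.
\[
  \int_{0}^{\infty} \ln \lvert S(j\omega) \rvert \, d\omega
  = \pi \sum_{k} \operatorname{Re}(p_k) \;\ge\; 0 .
\]
```

With a stable open loop the right-hand side is zero, so any frequency band where $\lvert S \rvert < 1$ must be balanced by a band where $\lvert S \rvert > 1$. An open-loop transfer function containing time delays is not rational, which is why this classical statement does not directly apply to the observer discussed above.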
Stochastic Long-Term Joint Decarbonization Planning for Power Systems and Data Centers: A Case Study in PJM
With the rapid growth of artificial intelligence (AI) and cloud services, data centers have become critical infrastructures driving digital economies, with increasing energy demand heightening concerns over electricity use and carbon emissions, emphasizing the need for carbon-aware infrastructure planning. Most studies assume static power systems, focus only on operational emissions, and overlook co-optimization. This paper proposes a dynamic joint planning framework that co-optimizes long-term data center and power system development over 15 years. The model determines siting, capacity, and type of data centers alongside power generation expansion, storage deployment, and retirements, accounting for both operational and embodied emissions. To handle multi-scale uncertainty, a large-scale two-stage stochastic program is formulated and solved via an enhanced Benders decomposition. Applied to the PJM Interconnection, with curated datasets released on GitHub, results show the system can support up to 55 GW peak data center demand, with Virginia (DOM) and Northern Illinois (ComEd) as optimal hosts. Compared to non-joint planning, the framework cuts investment cost by 12.6%, operational cost by 8.25%, and emissions by 5.63%. Including lifecycle emissions further raises renewable deployment by 25.5%, highlighting embodied carbon's role in deeper decarbonization.
Control Synthesis with Reinforcement Learning: A Modeling Perspective
Controllers designed with reinforcement learning can be sensitive to model mismatch. We demonstrate that designing such controllers in a virtual simulation environment with an inaccurate model is not suitable for deployment in a physical setup. Controllers designed using an accurate model are robust against disturbances and small mismatches between the physical setup and the mathematical model derived from first principles, while a poor model results in a controller that performs well in simulation but fails in physical experiments. Sensitivity analysis is used to justify these discrepancies, and an empirical region-of-attraction estimation helps us visualize their robustness.
A Parallelized Cutting-Plane Algorithm for Computationally Efficient Modelling to Generate Alternatives
Contemporary macro energy systems modelling is characterized by the need to represent strategic and operational decisions with high temporal and spatial resolution and represent discrete investment and retirement decisions. This drive towards greater fidelity, however, conflicts with a simultaneous push towards greater model representation of inherent complexity in decision making, including methods like Modelling to Generate Alternatives (MGA). MGA aims to map the feasible space of a model within a cost slack by varying investment parameters without changing the operational constraints, a process which frequently requires hundreds of solutions. For large, detailed energy system models this is impossible with traditional methods, leading researchers to reduce complexity with linearized investments and zonal or temporal aggregation. This research presents a new solution method for MGA type problems using cutting-plane methods based on a tailored reformulation of Benders Decomposition. We accelerate the algorithm by sharing cuts between MGA master problems and grouping MGA objectives. We find that our new solution method consistently solves MGA problems times faster and requires less memory than existing monolithic Modelling to Generate Alternatives solution methods on linear problems, enabling rapid computation of a greater number of solutions to highly resolved models. We also show that our novel cutting-plane algorithm enables the solution of very large MGA problems with integer investment decisions.
A New Type of Axis-Angle Attitude Control Law for Rotational Systems: Synthesis, Analysis, and Experiments
Over the past few decades, continuous quaternion-based attitude control has been proven highly effective for driving rotational systems that can be modeled as rigid bodies, such as satellites and drones. However, methods rooted in this approach do not enforce the existence of a unique closed-loop (CL) equilibrium attitude-error quaternion (AEQ); and, for rotational errors about the attitude-error Euler axis larger than $\pi$ rad, their proportional-control effect diminishes as the system state moves away from the stable equilibrium of the CL rotational dynamics. In this paper, we introduce a new type of attitude control law that more effectively leverages the attitude-error Euler axis-angle information to guarantee a unique CL equilibrium AEQ and to provide greater flexibility in the use of proportional-control efforts. Furthermore, using two different control laws as examples, and through the construction of a strict Lyapunov function for the CL dynamics, we demonstrate that the resulting unique equilibrium of the CL rotational system can be enforced to be uniformly asymptotically stable. To assess and demonstrate the functionality and performance of the proposed approach, we performed numerical simulations and executed dozens of real-time tumble-recovery maneuvers using a small quadrotor. These simulations and flight tests compellingly demonstrate that the proposed axis-angle-based method achieves superior flight performance, compared with that obtained using a high-performance quaternion-based controller, in terms of stabilization time.
comment: 2025 International Conference on Advanced Robotics (ICAR)
Risk-Aware Safety Filters with Poisson Safety Functions and Laplace Guidance Fields
Robotic systems navigating in real-world settings require a semantic understanding of their environment to properly determine safe actions. This work aims to develop the mathematical underpinnings of such a representation -- specifically, the goal is to develop safety filters that are risk-aware. To this end, we take a two-step approach: encoding an understanding of the environment via Poisson's equation, and associated risk via Laplace guidance fields. That is, we first solve a Dirichlet problem for Poisson's equation to generate a safety function that encodes system safety as its 0-superlevel set. We then separately solve a Dirichlet problem for Laplace's equation to synthesize a safe \textit{guidance field} that encodes variable levels of caution around obstacles -- by enforcing a tunable flux boundary condition. The safety function and guidance fields are then combined to define a safety constraint and used to synthesize a risk-aware safety filter which, given a semantic understanding of an environment with associated risk levels of environmental features, guarantees safety while prioritizing avoidance of higher-risk obstacles. We demonstrate this method in simulation and discuss how \textit{a priori} understandings of obstacle risk can be directly incorporated into the safety filter to generate safe behaviors that are risk-aware.
Targeted Resilient Zoning for High Impact Events via Multi Circuit Polelines
The increasing frequency and severity of high-impact, low-probability events such as hurricanes and windstorms pose significant challenges to the resilience of electrical power distribution systems, particularly in regions of New England where a significant amount of overhead infrastructure runs through heavily vegetated areas. Traditional reliability-focused planning is insufficient to address the systemic vulnerabilities exposed by such extreme events. This paper presents a novel risk-based framework for long-term resilience planning of active overhead distribution systems, with a specific focus on mitigating the impacts of high-wind and hurricane-induced outages.
Distribution System Reconfiguration to Mitigate Load Altering Attacks via Stackelberg Games
The widespread integration of IoT-controllable devices (e.g., smart EV charging stations and heat pumps) into modern power systems enhances capabilities but introduces critical cybersecurity risks. Specifically, these devices are susceptible to load-altering attacks (LAAs) that can compromise power system safety. This paper quantifies the impact of LAAs on nodal voltage constraint violations in distribution networks (DNs). We first present closed-form expressions to analytically characterize LAA effects and quantify the minimum number of compromised devices for a successful LAA. Based on these insights, we propose a reactive defense mechanism that mitigates LAAs through DN reconfiguration. To address strategic adversaries, we then formulate defense strategies using a non-cooperative sequential game, which models a knowledgeable and strategic attacker, accounts for the worst-case scenario, and enables the reactive defender to devise an efficient and robust defense. Our formulation also accounts for uncertainties in attack localization. A novel Bayesian optimization approach is introduced to compute the Stackelberg equilibrium, significantly reducing the computational burden. The game-theoretic strategy effectively mitigates the attack's impact while ensuring minimal system reconfiguration.
Dual-Regularized Riccati Recursions for Interior-Point Optimal Control
We derive closed-form extensions of Riccati's recursions (both sequential and parallel) for solving dual-regularized LQR problems. We show how these methods can be used to solve general constrained, non-convex, discrete-time optimal control problems via a regularized interior point method, while guaranteeing that each step is a descent direction of an Augmented Barrier-Lagrangian merit function. We provide MIT-licensed implementations of our methods in C++ and JAX.
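For orientation, the classical (unregularized, sequential) Riccati recursion that the abstract above extends can be sketched as follows. This is the textbook finite-horizon LQR backward pass only, not the dual-regularized or parallel variants derived in the paper, and it is unrelated to the authors' C++ and JAX implementations. The function name `lqr_backward_pass` and its argument names are hypothetical.

```python
import numpy as np

def lqr_backward_pass(A, B, Q, R, Q_final, horizon):
    """Textbook finite-horizon LQR Riccati recursion (no regularization).

    Dynamics: x[k+1] = A x[k] + B u[k]
    Cost:     sum_k (x[k]' Q x[k] + u[k]' R u[k]) + x[N]' Q_final x[N]
    Returns the gains K[0..N-1] (with u[k] = -K[k] x[k]) and the
    cost-to-go matrix P at time 0.
    """
    P = Q_final
    gains = []
    for _ in range(horizon):
        # Gain: K = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Cost-to-go update: P = Q + A' P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
        # Symmetrize to limit the drift caused by round-off.
        P = 0.5 * (P + P.T)
        gains.append(K)
    gains.reverse()  # gains were computed from the last stage backwards
    return gains, P

if __name__ == "__main__":
    one = np.array([[1.0]])
    gains, P = lqr_backward_pass(one, one, one, one, one, horizon=50)
    print(gains[0], P)
```

For the scalar system with all matrices equal to 1, the fixed point of the recursion solves p^2 - p - 1 = 0, so P approaches the golden ratio as the horizon grows. That gives a quick sanity check for any implementation.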
Continuity Conditions for Piecewise Quadratic Functions on Simplicial Conic Partitions are Equivalent
Analysis of continuous-time piecewise linear systems based on piecewise quadratic (PWQ) Lyapunov functions typically requires continuity of these functions over a partition of the state space. Several conditions for guaranteeing continuity of PWQ functions over state space partitions can be found in the literature. In this technical note, we show that these continuity conditions are equivalent over so-called simplicial conic partitions. As a consequence, the choice of which condition to impose can be based solely on practical considerations such as specific application or numerical aspects, without introducing additional conservatism in the analysis.
comment: 8 pages, 3 figures
Binary Decision Process in Pre-Evacuation Behavior
In crowd evacuation, the time interval before decisive movement towards a safe place is defined as the pre-evacuation phase, and it has a crucial impact on the total time required for safe egress. This process mainly refers to situation awareness and response to external stressors, e.g., fire alarms. Due to the complexity of human cognitive processes, simulation is used to study this important time interval. In this paper, a binary decision process is formulated to simulate the pre-evacuation time of many evacuees in a given social context. The model combines classic opinion dynamics (the French-DeGroot model) with a binary phase transition to describe how group pre-evacuation time emerges from individual interaction. The model parameters are quantitatively meaningful to human factors research within a socio-psychological background, e.g., whether an individual is stubborn or open-minded, or what kind of social topology exists among the individuals and how it matters in aggregating individuals into social groups. The modeling framework also describes the collective motion of many evacuee agents in a planar space. The resulting multi-agent system is partly similar to the Vicsek flocking model, and it is meaningful for exploring complex social behavior during the phase transition of a non-equilibrium process.
comment: 5 pages
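The French-DeGroot opinion dynamics referenced in the abstract above follow the update x(k+1) = W x(k), where W is a row-stochastic influence matrix. The sketch below shows only this classic averaging step. The binary phase transition and the planar motion model of the paper are not reproduced, and the function name `degroot_run` is hypothetical.

```python
import numpy as np

def degroot_run(W, x0, steps):
    """Iterate the French-DeGroot update x(k+1) = W x(k).

    W  : row-stochastic influence matrix (nonnegative, rows sum to 1)
    x0 : initial opinions, one entry per individual
    """
    W = np.asarray(W, dtype=float)
    x = np.asarray(x0, dtype=float)
    if np.any(W < 0) or not np.allclose(W.sum(axis=1), 1.0):
        raise ValueError("W must be row-stochastic")
    for _ in range(steps):
        # Each individual replaces its opinion by a weighted average of
        # the opinions it listens to, including its own.
        x = W @ x
    return x

if __name__ == "__main__":
    # A large self-weight models a stubborn individual,
    # a small self-weight an open-minded one.
    W = [[0.9, 0.1],
         [0.5, 0.5]]
    print(degroot_run(W, [0.0, 1.0], steps=20))
```

The diagonal entries of W play the role of the stubborn versus open-minded parameter mentioned in the abstract, and the sparsity pattern of W encodes the social topology.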
Multi-robot Motion Planning based on Nets-within-Nets Modeling and Simulation
This paper focuses on designing motion plans for a heterogeneous team of robots that must cooperate to fulfill a global mission. Robots move in an environment that contains some regions of interest, while the specification for the entire team can include avoidance, visits, or sequencing of these regions of interest. The mission is expressed in terms of a Petri net corresponding to an automaton, while each robot is also modeled by a state machine Petri net. The current work brings about the following contributions with respect to existing solutions for related problems. First, we propose a novel model, denoted High-Level robot team Petri Net (HLrtPN) system, to incorporate the specification and robot models into the Nets-within-Nets paradigm. A guard function, named Global Enabling Function, is designed to synchronize the firing of transitions so that robot motions do not violate the specification. Then, the solution is found by simulating the HLrtPN system in a specific software tool that accommodates Nets-within-Nets. Illustrative examples based on Linear Temporal Logic missions support the computational feasibility of the proposed framework.
comment: [Note for readers] This paper has been extended from a previous submission to 62nd IEEE Conference on Decision and Control, Dec. 13-15, 2023. This work has been submitted to the IEEE for possible publication
ES-HPC-MPC: Exponentially Stable Hybrid Perception Constrained MPC for Quadrotor with Suspended Payloads
Aerial transportation using quadrotors with cable-suspended payloads holds great potential for applications in disaster response, logistics, and infrastructure maintenance. However, their hybrid and underactuated dynamics pose significant control and perception challenges. Traditional approaches often assume a taut cable condition, limiting their effectiveness in real-world applications where slack-to-taut transitions occur due to disturbances. We introduce ES-HPC-MPC, a model predictive control framework that enforces exponential stability and perception-constrained control under hybrid dynamics. Our method leverages Exponentially Stabilizing Control Lyapunov Functions (ES-CLFs) to enforce stability during the tasks and Control Barrier Functions (CBFs) to maintain the payload within the onboard camera's field of view (FoV). We validate our method through both simulation and real-world experiments, demonstrating stable trajectory tracking and reliable payload perception. We validate that our method maintains stability and satisfies perception constraints while tracking dynamically infeasible trajectories and when the system is subjected to hybrid mode transitions caused by unexpected disturbances.
comment: Accepted to IEEE Robotics and Automation Letters
Privacy Preservation by Local Design in Cooperative Networked Control Systems
In this paper, we study the privacy preservation problem in a cooperative networked control system with closed-loop dynamics that performs linear quadratic Gaussian (LQG) control. The system consists of a user and a server: the user owns the plant to control, while the server provides computation capability, and the user employs the server to compute control inputs for it. To enable the server's computation, the user needs to provide the measurements of the plant states to the server, which then calculates estimates of the states, based on which the control inputs are computed. However, the user regards the states as private, and makes an interesting request: the user wants the server to have "incorrect" knowledge of the state estimates rather than the true values. To this end, we propose a novel design methodology for privacy preservation, in which the privacy scheme is implemented locally at the user side and is not disclosed to the server, and which creates a deviation in the server's knowledge of the state estimates from the true values. However, this methodology also raises significant challenges: in a closed-loop dynamic system, when the server's knowledge is incorrect, the system's behavior becomes complex to analyze; even the stability of the system becomes questionable, as the incorrectness will accumulate through the closed loop as time evolves. In this paper, we show through rigorous mathematical proofs that the performance loss in LQG control caused by the proposed privacy scheme is bounded, which supports the validity of the proposed design methodology. We also propose an associated novel privacy metric and obtain an analytical result for evaluating the privacy performance. Finally, we study the performance trade-off between privacy and control, where the resulting optimization problems are solved efficiently by numerical methods.
comment: 14 pages, 7 figures
Cryo-CMOS Antenna for Wireless Communications within a Quantum Computer Cryostat
Scaling quantum computers from a few qubits to large numbers remains one of the critical challenges in realizing practical quantum advantage. Multi-core quantum architectures have emerged as a promising solution, enabling scalability through distributed quantum processing units (QPUs) interconnected via classical and quantum links. However, the bottleneck of wired connections persists, as densely packed wired interconnects, both vertically across temperature stages and horizontally within the same layer, introduce spatial constraints, power dissipation, and latency, which could hinder performance as the number of QPUs increases. To overcome these limitations, this work proposes a cryo-compatible on-chip differential dipole antenna operating at 28 GHz to enable short-range wireless communication within a quantum computer cryostat. Temperature-dependent material properties are incorporated to accurately capture antenna behavior at 4 K. Moreover, by embedding the antenna in a realistic cryostat structure, we evaluate the feasibility of antenna operation within the cryogenic environment. The proposed antenna achieves a reflection coefficient of -20.8 dB in free space and -18.38 dB within the cryostat, demonstrating efficient impedance matching.
Data-Efficient Excavation Force Estimation for Wheel Loaders
Accurate prediction of excavation forces is critical for enabling autonomous operation and optimizing control strategies in earthmoving machinery. Conventional approaches often depend on extensive data collection or computationally expensive simulations across multiple soil types, which limits their scalability and adaptability. This study presents a data-efficient framework that calibrates soil parameters using force measurements from the preceding bucket-loading cycle. The proposed method is based on an analytical soil-tool interaction model formulated through the fundamental earthmoving equation, and employs a multi-stage optimization procedure during the loading phase to identify relevant soil parameters. These estimated parameters are then used to predict excavation forces in the subsequent cycle, allowing the system to adapt its control inputs without relying on large-scale datasets or machine learning model training. The framework is validated through high-fidelity simulations in the Algoryx Dynamics engine under different soil types and excavation trajectories, achieving root-mean-square prediction errors between 10% and 15%. This cycle-to-cycle adaptation demonstrates strong potential for scalable, online force estimation and efficient path planning in wheel loader operations.
Systems and Control (EESS)
Over 3 kV and Ultra-Low Leakage Vertical (011) β-Ga2O3 Power Diodes with Engineered Schottky Contact and High-permittivity Dielectric Field Plate
We report over 3 kV breakdown voltage and ultra-low leakage (011) β-Ga2O3 power devices utilizing Schottky barrier engineering and a high-permittivity (κ) dielectric (ZrO2) field plate. The (011) orientation of β-Ga2O3 enabled low background doping and thick drift layers, which are promising for supporting kV-class vertical β-Ga2O3 power switches. The Schottky barrier engineering was performed with a composite Pt cap/PtOx/Pt (1.5 nm) anode contact to take advantage of the enhanced reverse blocking capabilities enabled by PtOx while allowing a low turn-on voltage through the interfacing thin Pt layer. We also performed a systematic study using co-processed Pt/(011) β-Ga2O3 Schottky barrier diodes (SBDs) on the same wafer. The bare SBDs revealed a breakdown voltage of ~1.5 kV, while the field-plate Pt/(011) β-Ga2O3 SBDs achieved an increased breakdown voltage of 2.75 kV owing to the edge field management. Further enhancement of the breakdown voltage was achieved by tunneling leakage management using composite Pt cap/PtOx/Pt (1.5 nm) Schottky contacts, which ultimately enabled a breakdown voltage of 3.7 kV for the field-plate diodes. Remarkably, the Pt cap/PtOx/Pt (1.5 nm) Schottky contacts maintained a turn-on voltage similar to that of the Pt/(011) β-Ga2O3 SBDs. The combination of efficient tunneling leakage management by composite Pt cap/PtOx/Pt (1.5 nm) contacts with similar turn-on voltage, edge field reduction by the high-κ dielectric ZrO2 field plate, and the advantageous material properties offered by (011) β-Ga2O3 demonstrates a promising strategy for developing ultra-low leakage and multi-kV class vertical (011) β-Ga2O3 power devices.
An OPF-based Control Framework for Hybrid AC-MTDC Power Systems under Uncertainty
The increasing integration of renewable energy, particularly offshore wind, introduces significant uncertainty into hybrid AC-HVDC systems due to forecast errors and power fluctuations. Conventional control strategies typically rely on fixed setpoints and neglect frequency deviations, which can compromise system stability under rapid renewable variations. To address this challenge, this paper presents a forecast-integrated, optimal power flow (OPF)-based adaptive control framework. Wind speed forecasts generated using a Random Forest model are incorporated into a time-coupled OPF to determine baseline converter setpoints in anticipation of wind fluctuations, which are further adjusted in real time based on actual operating conditions. An adaptive droop control scheme is developed that jointly considers DC voltage and AC frequency deviations. The effectiveness of the proposed control framework is validated through hardware-in-the-loop (HIL) simulations, demonstrating its capability to ensure stable and robust operation of hybrid AC-HVDC systems under high penetration of renewable energy.
Incorporating Social Awareness into Control of Unknown Multi-Agent Systems: A Real-Time Spatiotemporal Tubes Approach
This paper presents a decentralized control framework that incorporates social awareness into multi-agent systems with unknown dynamics to achieve prescribed-time reach-avoid-stay tasks in dynamic environments. Each agent is assigned a social awareness index that quantifies its level of cooperation or self-interest, allowing heterogeneous social behaviors within the system. Building on the spatiotemporal tube (STT) framework, we propose a real-time STT framework that synthesizes tubes online for each agent while capturing its social interactions with others. A closed-form, approximation-free control law is derived to ensure that each agent remains within its evolving STT, thereby avoiding dynamic obstacles while also preventing inter-agent collisions in a socially aware manner, and reaching the target within a prescribed time. The proposed approach provides formal guarantees on safety and timing, and is computationally lightweight, model-free, and robust to unknown disturbances. The effectiveness and scalability of the framework are validated through simulation and hardware experiments on a 2D omnidirectional
Optimal and Heuristic Approaches for Platooning Systems with Deadlines
Efficient truck platooning is a key strategy for reducing freight costs, lowering fuel consumption, and mitigating emissions. Deadlines are critical in this context, as trucks must depart within specific time windows to meet delivery requirements and avoid penalties. In this paper, we investigate the optimal formation and dispatch of truck platoons at a highway station with finite capacity $L$ and deadline constraints $T$. The system operates in discrete time, with each arriving truck assigned a deadline of $T$ slot units. The objective is to leverage the efficiency gains from forming large platoons while accounting for waiting costs and deadline violations. We formulate the problem as a Markov decision process and analyze the structure of the optimal policy $\pi^\star$ for $L = 3$, extending insights to arbitrary $L$. We prove that $\pi^\star$ is monotone in the state space $\mathcal{S}$ and identify classes of unreachable states. Moreover, since $\mathcal{S}$ grows exponentially with $L$ and $T$, we propose heuristics, including conditional and deep-learning-based approaches, that exploit these structural insights while maintaining low computational complexity.
Deep Reinforcement Learning-Based Cooperative Rate Splitting for Satellite-to-Underground Communication Networks
Reliable downlink communication in satellite-to-underground networks remains challenging due to severe signal attenuation caused by underground soil and refraction at the air-soil interface. To address this, we propose a novel cooperative rate-splitting (CRS)-aided transmission framework, where an aboveground relay decodes and forwards the common stream to underground devices (UDs). Based on this framework, we formulate a max-min fairness optimization problem that jointly optimizes power allocation, message splitting, and time slot scheduling to maximize the minimum achievable rate across UDs. To solve this high-dimensional non-convex problem under uncertain channels, we develop a deep reinforcement learning solution framework based on the proximal policy optimization (PPO) algorithm that integrates distribution-aware action modeling and a multi-branch actor network. Simulation results under a realistic underground pipeline monitoring scenario demonstrate that the proposed approach achieves average max-min rate gains exceeding $167\%$ over conventional benchmark strategies across various numbers of UDs and underground conditions.
comment: 6 pages, 3 figures, 1 table, and submitted to IEEE TVT
Sum-of-Squares Certificates for Almost-Sure Reachability of Stochastic Polynomial Systems
In this paper, we present a computational approach to certify almost-sure reachability for discrete-time polynomial stochastic systems by turning drift-variant criteria into sum-of-squares (SOS) programs solved with standard semidefinite solvers. Specifically, we provide an SOS method based on two complementary certificates: (i) a drift certificate that enforces a radially unbounded function to be non-increasing in expectation outside a compact set of states; and (ii) a variant certificate that guarantees a one-step decrease with positive probability and ensures the target contains its nonpositive sublevel set. We transform these conditions into SOS constraints. For the variant condition, we enforce a robust decrease over a parameterized disturbance ball with nonzero probability and encode the constraints via an S-procedure with polynomial multipliers. The resulting bilinearities are handled by a scheme that alternates between optimizing multipliers and updating the variant and radius until a positive slack is obtained. Two case studies illustrate the workflow and certify almost-sure reachability.
comment: 8 Pages, 8 Figs
A New Neural Network Paradigm for Scalable and Generalizable Stability Analysis of Power Systems
This paper presents a new neural network (NN) paradigm for scalable and generalizable stability analysis of power systems. The paradigm consists of two parts: the neural stability descriptor and the sample-augmented iterative training scheme. The first part, based on system decomposition, constructs the object (such as a stability function or condition) for stability analysis as a scalable aggregation of multiple NNs. These NNs remain fixed across varying power system structures and parameters, and are repeatedly shared within each system instance defined by these variations, thereby enabling the generalization of the neural stability descriptor across a class of power systems. The second part learns the neural stability descriptor by iteratively training the NNs with sample augmentation, guided by the tailored conservativeness-aware loss function. The training set is strategically constructed to promote the descriptor's generalizability, which is systematically evaluated by verification and validation during the training process. Specifically, the proposed NN paradigm is implemented for large-disturbance stability analysis of the bulk power grid and small-disturbance stability conditions of the microgrid system. Finally, numerical studies for the two implementations demonstrate the applicability and effectiveness of the proposed NN paradigm.
Combining Moving Mass Actuators and Manoeuvring Models for Underwater Vehicles: A Lagrangian Approach
In this paper, we present a Newton-Euler formulation of the equations of motion for underwater vehicles with an internal moving mass actuator. Furthermore, the moving mass dynamics are expressed as an extension to the manoeuvring model for underwater vehicles, originally introduced by Fossen (1991). The influence of the moving mass is described in the body frame and included as states in both an additional kinematic equation and as part of the coupled rigid-body kinetics of the underwater vehicle. The Coriolis-centripetal effects are derived from Kirchhoff's equations and the hydrostatics are derived using first principles. The proposed Newton-Euler model is validated through simulation and compared with the traditional Hamiltonian internal moving mass actuator formulation.
comment: © 2025 Alexander Rambech, Ivar Saksvik and Vahid Hassani. Accepted by IFAC for publication under a Creative Commons License CC-BY-NC-ND
Data-Driven Stabilization Using Prior Knowledge on Stabilizability and Controllability
In this work, we study data-driven stabilization of linear time-invariant systems using prior knowledge of system-theoretic properties, specifically stabilizability and controllability. To formalize this, we extend the concept of data informativity by requiring the existence of a controller that stabilizes all systems consistent with the data and the prior knowledge. We show that if the system is controllable, then incorporating this as prior knowledge does not relax the conditions required for data-driven stabilization. Remarkably, however, we show that if the system is stabilizable, then using this as prior knowledge leads to necessary and sufficient conditions that are weaker than those for data-driven stabilization without prior knowledge. In other words, data-driven stabilization is easier if one knows that the underlying system is stabilizable. We also provide new data-driven control design methods in terms of linear matrix inequalities that complement the conditions for informativity.
comment: 6 pages
Quantum-Resilient Threat Modelling for Secure RIS-Assisted ISAC in 6G UAV Corridors
The rapid deployment of unmanned aerial vehicle (UAV) corridors in sixth-generation (6G) networks requires safe, intelligence-driven integrated sensing and communications (ISAC). Reconfigurable intelligent surfaces (RIS) enhance spectrum efficiency, localisation accuracy, and situational awareness, while introducing new vulnerabilities. The rise of quantum computing increases the risks associated with harvest-now-decrypt-later strategies and quantum-enhanced spoofing. We propose a Quantum-Resilient Threat Modelling (QRTM) framework for RIS-assisted ISAC in UAV corridors to address these challenges. QRTM integrates classical, quantum-ready, and quantum-aided adversaries, countered using post-quantum cryptographic (PQC) primitives: ML-KEM for key establishment and Falcon for authentication, both embedded within RIS control signalling and UAV coordination. To strengthen security sensing, the framework introduces RIS-coded scene watermarking validated through a generalised likelihood ratio test (GLRT), with its detection probability characterised by the Marcum Q function. Furthermore, a Secure ISAC Utility (SIU) jointly optimises secrecy rate, spoofing detection, and throughput under RIS constraints, enabled by a scheduler with computational complexity of O(n^2). Monte Carlo evaluations using 3GPP Release 19 mid-band urban-canyon models (7-15 GHz) demonstrate a spoof-detection probability approaching 0.99 at a false-alarm rate of 1e-3, secrecy-rate retention exceeding 90 percent against quantum-capable adversaries, and signal-interference utilisation improvements of about 25 percent compared with baselines. These results show a standards-compliant path towards reliable, quantum-resilient ISAC for UAV corridors in smart cities and non-terrestrial networks.
comment: 6 pages, 5 figures
Integrating Legal and Logical Specifications in Perception, Prediction, and Planning for Automated Driving: A Survey of Methods
This survey provides an analysis of current methodologies integrating legal and logical specifications into the perception, prediction, and planning modules of automated driving systems. We systematically explore techniques ranging from logic-based frameworks to computational legal reasoning approaches, emphasizing their capability to ensure regulatory compliance and interpretability in dynamic and uncertain driving environments. A central finding is that significant challenges arise at the intersection of perceptual reliability, legal compliance, and decision-making justifiability. To systematically analyze these challenges, we introduce a taxonomy categorizing existing approaches by their theoretical foundations, architectural implementations, and validation strategies. We particularly focus on methods that address perceptual uncertainty and incorporate explicit legal norms, facilitating decisions that are both technically robust and legally defensible. The review covers neural-symbolic integration methods for perception, logic-driven rule representation, and norm-aware prediction strategies, all contributing toward transparent and accountable autonomous vehicle operation. We highlight critical open questions and practical trade-offs that must be addressed, offering multidisciplinary insights from engineering, logic, and law to guide future developments in legally compliant autonomous driving systems.
comment: Accepted to 2025 IEEE International Automated Vehicle Validation Conference (IAVVC)
Lightweight Federated Learning in Mobile Edge Computing with Statistical and Device Heterogeneity Awareness
Federated learning enables collaborative machine learning while preserving data privacy, but high communication and computation costs, exacerbated by statistical and device heterogeneity, limit its practicality in mobile edge computing. Existing compression methods like sparsification and pruning reduce per-round costs but may increase training rounds and thus the total training cost, especially under heterogeneous environments. We propose a lightweight personalized FL framework built on parameter decoupling, which separates the model into shared and private subspaces, enabling us to uniquely apply gradient sparsification to the shared component and model pruning to the private one. This structural separation confines communication compression to global knowledge exchange and computation reduction to local personalization, protecting personalization quality while adapting to heterogeneous client resources. We theoretically analyze convergence under the combined effects of sparsification and pruning, revealing a sparsity-pruning trade-off that links to the iteration complexity. Guided by this analysis, we formulate a joint optimization that selects per-client sparsity and pruning rates and wireless bandwidth to reduce end-to-end training time. Simulation results demonstrate faster convergence and substantial reductions in overall communication and computation costs with negligible accuracy loss, validating the benefits of coordinated and resource-aware personalization in resource-constrained heterogeneous environments.
Tight Collision Avoidance for Stochastic Optimal Control: with Applications in Learning-based, Interactive Motion Planning
Trajectory planning in dense, interactive traffic scenarios presents significant challenges for autonomous vehicles, primarily due to the uncertainty of human driver behavior and the non-convex nature of collision avoidance constraints. This paper introduces a stochastic optimal control framework to address these issues simultaneously, without excessively conservative approximations. We opt to model human driver decisions as a Markov Decision Process and propose a method for handling collision avoidance between non-convex vehicle shapes by imposing a positive distance constraint between compact sets. In this framework, we investigate three alternative chance constraint formulations. To ensure computational tractability, we introduce tight, continuously differentiable reformulations of both the non-convex distance constraints and the chance constraints. The efficacy of our approach is demonstrated through simulation studies of two challenging interactive scenarios: an unregulated intersection crossing and a highway lane change in dense traffic.
comment: Preprint article, submitted for publication
Data-Enabled Predictive Control and Guidance for Autonomous Underwater Vehicles
This paper presents a fully data-driven control framework for autonomous underwater vehicles (AUVs) based on Data-Enabled Predictive Control (DeePC). The approach eliminates the need for explicit hydrodynamic modeling by exploiting measured input-output data to predict and optimize future system behavior. Classic DeePC is employed for heading control, while a cascaded DeePC architecture is proposed for depth regulation, incorporating a loop-frequency separation to handle the different dynamic modes of input and output. For 3-D waypoint path following, the Adaptive Line-of-Sight algorithm is extended to a predictive formulation and integrated with DeePC. All methods are validated in extensive simulation on the REMUS 100 AUV and compared with classical PI/PID control. The results demonstrate superior tracking performance and robustness of DeePC under ocean-current disturbances and nonlinear operating conditions, while significantly reducing modeling effort.
comment: 12 pages, 6 figures
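As a rough illustration of the data-driven idea in the abstract above, the sketch below builds the Hankel matrix that DeePC formulations commonly use to stack recorded input-output data, so that any predicted trajectory is a linear combination of its columns. The function name `hankel`, the scalar signal, and the depth are hypothetical choices for illustration; the paper's cascaded architecture and AUV data are not reproduced here.

```python
def hankel(sequence, depth):
    """Stack a scalar data sequence into a Hankel matrix with `depth` rows.

    Row i holds the samples sequence[i], sequence[i+1], ..., so each column
    is one length-`depth` window of the recorded trajectory.
    """
    n_cols = len(sequence) - depth + 1
    if n_cols < 1:
        raise ValueError("depth exceeds the number of recorded samples")
    return [list(sequence[i:i + n_cols]) for i in range(depth)]


# Hypothetical recorded heading samples (illustrative values only).
recorded = [1, 2, 3, 4, 5]
H = hankel(recorded, depth=3)
```

In a DeePC-style controller, the first rows of such matrices would be matched to recent measurements and the remaining rows would give the predicted future, which is why no explicit hydrodynamic model is needed.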
Shared Control for Vehicle Lane-Changing with Uncertain Driver Behaviors
Lane changes are common yet challenging driving maneuvers that require continuous decision-making and dynamic interaction with surrounding vehicles. Relying solely on human drivers for lane-changing can lead to traffic disturbances due to the stochastic nature of human behavior and its variability under different task demands. Such uncertainties may significantly degrade traffic string stability, which is critical for suppressing disturbance propagation and ensuring smooth merging of the lane-changing vehicles. This paper presents a human-automation shared lane-changing control framework that preserves driver authority while allowing automated assistance to achieve stable maneuvers in the presence of the driver's behavioral uncertainty. Human driving behavior is modeled as a Markov jump process with transitions driven by task difficulty, providing a tractable representation of stochastic state switching. Based on this model, we first design a nominal stabilizing controller that guarantees stochastic $L_2$ string stability under imperfect mode estimation. To further balance performance and automated effort, we then develop a Minimal Intervention Controller (MIC) that retains acceptable stability while limiting automation. Simulations using lane-changing data from the NGSIM dataset verify that the nominal controller reduces speed perturbations and shortens lane-changing time, while the MIC further reduces automated effort and enhances comfort, at the cost of a moderate loss in stability and efficiency. Validations on the TGSIM dataset with SAE Level 2 vehicles show that the MIC enables earlier lane changes than Level 2 control while preserving driver authority with a slight stability compromise. These findings highlight the potential of shared control strategies to balance stability, efficiency, and driver acceptance.
Photoacoustics on the go: An Embedded Photoacoustic Sensing Platform
Several centimeters below the skin lie multiple biomarkers, such as glucose, oxygenation, and blood flow. Monitoring these biomarkers regularly and in a non-invasive manner would enable early insight into metabolic status and vascular health. Currently, there are only a handful of non-invasive monitoring systems. Optical methods offer molecular specificity (i.e., multi-biomarker monitoring) but have shallow reach (a few millimeters); ultrasound penetrates deeper but lacks specificity; and MRI is large, slow, and costly. Photoacoustic (PA) sensing combines the best of optical and ultrasound methods. A laser transmitter emits pulses that are absorbed by different molecules, providing specificity. These light pulses generate pressure changes that are captured by an ultrasound receiver, providing depth. Photoacoustic sensing is promising, but the current platforms are bulky, complex, and costly. We propose the first embedded PA platform. Our contributions are fourfold. First, inspired by LiDAR technology, we propose a novel transmitter that emits pulses similar to those in the state-of-the-art (SoA), but instead of using high-voltage sources and complex electronic interfaces, we use a simple low-power microcontroller (MCU). Second, we carry out a thorough analysis of our custom transmitter and a commercial system. Third, we build a basic ultrasound receiver that is able to process the faint signal generated by our transmitter. Lastly, we compare the performance of our platform against a SoA commercial system, and show that we can detect glucose and (de)oxygenated hemoglobin in two controlled solution studies. The resulting signal characteristics indicate a plausible path toward noninvasive, real-time, at-home sensing relevant to diabetes care. More broadly, this platform lays the groundwork for translating the promise of PA sensing into a broader practical reality.
Minimum time consensus for damped second order agents using Gröbner basis
A problem of achieving minimum time consensus for a set of $N$ second-order LTI system agents with bounded inputs and fuel constraints is considered. Unlike our other works, here the damping effect in agent dynamics is included. First, the attainable set for each agent with fuel budget constraints is characterized, and its boundary equations are derived. Then, using the convexity property, the minimum time at which attainable sets of all agents have a non-empty intersection is computed. By applying Helly's theorem, the computation reduces to finding the minimum time to consensus and the corresponding consensus point for each of the triplets separately.
Silicon-based Josephson junction field-effect transistors enabling cryogenic logic and quantum technologies
The continuous miniaturisation of metal-oxide-semiconductor field-effect transistors (MOSFETs) from long- to short-channel architectures has advanced beyond the predictions of Moore's Law. Continued advances in semiconductor electronics, even near current scaling and performance boundaries under cryogenic conditions, are driving the development of innovative device paradigms that enable ultra-low-power and high-speed functionality. Among emerging candidates, the Josephson Junction Field-Effect Transistor (JJFET or JoFET) provides an alternative by integrating superconducting source and drain electrodes for efficient, phase-coherent operation at ultra-low temperatures. These hybrid devices have the potential to bridge conventional semiconductor electronics with cryogenic logic and quantum circuits, enabling energy-efficient and high-coherence signal processing across temperature domains. This review traces the evolution from Josephson junctions to field-effect transistors, emphasising the structural and functional innovations that underpin modern device scalability. The performance and material compatibility of JJFETs fabricated on Si, GaAs, and InGaAs substrates are analysed, alongside an assessment of their switching dynamics. Particular attention is given to superconductor-silicon-superconductor Josephson junctions as the active core of JJFET architectures. By unfolding more than four decades of experimental progress, this work highlights the promise of JJFETs as foundational building blocks for next-generation cryogenic logic and quantum electronic systems.
Machine Learning and CPU (Central Processing Unit) Scheduling Co-Optimization over a Network of Computing Centers
In the rapidly evolving research on artificial intelligence (AI) the demand for fast, computationally efficient, and scalable solutions has increased in recent years. The problem of optimizing the computing resources for distributed machine learning (ML) and optimization is considered in this paper. Given a set of data distributed over a network of computing-nodes/servers, the idea is to optimally assign the CPU (central processing unit) usage while simultaneously training each computing node locally via its own share of data. This formulates the problem as a co-optimization setup to (i) optimize the data processing and (ii) optimally allocate the computing resources. The information-sharing network among the nodes might be time-varying, but with balanced weights to ensure consensus-type convergence of the algorithm. The algorithm is all-time feasible, which implies that the computing resource-demand balance constraint holds at all iterations of the proposed solution. Moreover, the solution allows addressing possible log-scale quantization over the information-sharing channels to exchange log-quantized data. For some example applications, distributed support-vector-machine (SVM) and regression are considered as the ML training models. Results from perturbation theory, along with Lyapunov stability and eigen-spectrum analysis, are used to prove the convergence towards the optimal case. As compared to existing CPU scheduling solutions, the proposed algorithm improves the cost optimality gap by more than $50\%$.
comment: EAAI Journal
The Waterbed Effect on Quasiperiodic Disturbance Observer: Avoidance of Sensitivity Tradeoff with Time Delays
In linear time-invariant systems, the sensitivity function to disturbances is designed under a sensitivity tradeoff known as the waterbed effect. To compensate for a quasiperiodic disturbance, a quasiperiodic disturbance observer using time delays was proposed. Its sensitivity function avoids the sensitivity tradeoff, achieving wideband harmonic suppression without amplifying aperiodic disturbances or shifting harmonic suppression frequencies. However, its open-loop transfer function is not rational and does not satisfy the assumptions of existing Bode sensitivity integrals due to its time delays. This paper provides Bode-like sensitivity integrals for the quasiperiodic disturbance observer in both continuous-time and discrete-time representations and clarifies the avoided sensitivity tradeoff with time delays.
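For context, the classical continuous-time Bode sensitivity integral behind the waterbed effect can be stated as follows. This is the standard textbook form, given here under the usual assumptions (rational open-loop transfer function $L(s)$ with relative degree at least two and a stable closed loop); it is not the delay-based result derived in the paper, and the symbols $S$, $L$, and $p_k$ are notation introduced here.

```latex
% Sensitivity function of the loop L(s)
S(s) = \frac{1}{1 + L(s)}

% Bode sensitivity integral; p_k are the open-loop poles in the open right half-plane
\int_{0}^{\infty} \ln \lvert S(j\omega) \rvert \, d\omega
  = \pi \sum_{k} \operatorname{Re}(p_k)
```

When $L(s)$ has no right-half-plane poles the right-hand side is zero, so any frequency band with $\lvert S \rvert < 1$ must be balanced by a band with $\lvert S \rvert > 1$. A loop containing time delays is not rational, which is why this form does not apply directly to the observer discussed above.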
Stochastic Long-Term Joint Decarbonization Planning for Power Systems and Data Centers: A Case Study in PJM
With the rapid growth of artificial intelligence (AI) and cloud services, data centers have become critical infrastructures driving digital economies, with increasing energy demand heightening concerns over electricity use and carbon emissions, emphasizing the need for carbon-aware infrastructure planning. Most studies assume static power systems, focus only on operational emissions, and overlook co-optimization. This paper proposes a dynamic joint planning framework that co-optimizes long-term data center and power system development over 15 years. The model determines siting, capacity, and type of data centers alongside power generation expansion, storage deployment, and retirements, accounting for both operational and embodied emissions. To handle multi-scale uncertainty, a large-scale two-stage stochastic program is formulated and solved via an enhanced Benders decomposition. Applied to the PJM Interconnection, with curated datasets released on GitHub, results show the system can support up to 55 GW peak data center demand, with Virginia (DOM) and Northern Illinois (ComEd) as optimal hosts. Compared to non-joint planning, the framework cuts investment cost by 12.6%, operational cost by 8.25%, and emissions by 5.63%. Including lifecycle emissions further raises renewable deployment by 25.5%, highlighting embodied carbon's role in deeper decarbonization.
Control Synthesis with Reinforcement Learning: A Modeling Perspective
Controllers designed with reinforcement learning can be sensitive to model mismatch. We demonstrate that designing such controllers in a virtual simulation environment with an inaccurate model is not suitable for deployment in a physical setup. Controllers designed using an accurate model are robust against disturbances and small mismatches between the physical setup and the mathematical model derived from first principles, while a poor model results in a controller that performs well in simulation but fails in physical experiments. Sensitivity analysis is used to justify these discrepancies, and an empirical region-of-attraction estimate helps us visualize their robustness.
A Parallelized Cutting-Plane Algorithm for Computationally Efficient Modelling to Generate Alternatives
Contemporary macro energy systems modelling is characterized by the need to represent strategic and operational decisions with high temporal and spatial resolution and represent discrete investment and retirement decisions. This drive towards greater fidelity, however, conflicts with a simultaneous push towards greater model representation of inherent complexity in decision making, including methods like Modelling to Generate Alternatives (MGA). MGA aims to map the feasible space of a model within a cost slack by varying investment parameters without changing the operational constraints, a process which frequently requires hundreds of solutions. For large, detailed energy system models this is impossible with traditional methods, leading researchers to reduce complexity with linearized investments and zonal or temporal aggregation. This research presents a new solution method for MGA type problems using cutting-plane methods based on a tailored reformulation of Benders Decomposition. We accelerate the algorithm by sharing cuts between MGA master problems and grouping MGA objectives. We find that our new solution method consistently solves MGA problems times faster and requires less memory than existing monolithic Modelling to Generate Alternatives solution methods on linear problems, enabling rapid computation of a greater number of solutions to highly resolved models. We also show that our novel cutting-plane algorithm enables the solution of very large MGA problems with integer investment decisions.
A New Type of Axis-Angle Attitude Control Law for Rotational Systems: Synthesis, Analysis, and Experiments
Over the past few decades, continuous quaternion-based attitude control has been proven highly effective for driving rotational systems that can be modeled as rigid bodies, such as satellites and drones. However, methods rooted in this approach do not enforce the existence of a unique closed-loop (CL) equilibrium attitude-error quaternion (AEQ); and, for rotational errors about the attitude-error Euler axis larger than $\pi$ rad, their proportional-control effect diminishes as the system state moves away from the stable equilibrium of the CL rotational dynamics. In this paper, we introduce a new type of attitude control law that more effectively leverages the attitude-error Euler axis-angle information to guarantee a unique CL equilibrium AEQ and to provide greater flexibility in the use of proportional-control efforts. Furthermore, using two different control laws as examples, we demonstrate through the construction of a strict Lyapunov function for the CL dynamics that the resulting unique equilibrium of the CL rotational system can be enforced to be uniformly asymptotically stable. To assess and demonstrate the functionality and performance of the proposed approach, we performed numerical simulations and executed dozens of real-time tumble-recovery maneuvers using a small quadrotor. These simulations and flight tests compellingly demonstrate that the proposed axis-angle-based method achieves superior flight performance, compared with that obtained using a high-performance quaternion-based controller, in terms of stabilization time.
comment: 2025 International Conference on Advanced Robotics (ICAR)
Risk-Aware Safety Filters with Poisson Safety Functions and Laplace Guidance Fields
Robotic systems navigating in real-world settings require a semantic understanding of their environment to properly determine safe actions. This work aims to develop the mathematical underpinnings of such a representation -- specifically, the goal is to develop safety filters that are risk-aware. To this end, we take a two-step approach: encoding an understanding of the environment via Poisson's equation, and associated risk via Laplace guidance fields. That is, we first solve a Dirichlet problem for Poisson's equation to generate a safety function that encodes system safety as its 0-superlevel set. We then separately solve a Dirichlet problem for Laplace's equation to synthesize a safe \textit{guidance field} that encodes variable levels of caution around obstacles -- by enforcing a tunable flux boundary condition. The safety function and guidance fields are then combined to define a safety constraint and used to synthesize a risk-aware safety filter which, given a semantic understanding of an environment with associated risk levels of environmental features, guarantees safety while prioritizing avoidance of higher-risk obstacles. We demonstrate this method in simulation and discuss how \textit{a priori} understandings of obstacle risk can be directly incorporated into the safety filter to generate safe behaviors that are risk-aware.
Targeted Resilient Zoning for High Impact Events via Multi Circuit Polelines
The increasing frequency and severity of High Impact and Low Probability events such as hurricanes and windstorms pose significant challenges to the resilience of electrical power distribution systems, particularly in regions of New England where there is a significant amount of overhead infrastructure in areas where vegetation is predominant. Traditional reliability-focused planning is insufficient to address the systemic vulnerabilities exposed by such extreme events. This paper presents a novel risk based framework for long term resilience planning of active overhead distribution systems, with a specific focus on mitigating the impacts of high wind and hurricane induced outages.
Distribution System Reconfiguration to Mitigate Load Altering Attacks via Stackelberg Games
The widespread integration of IoT-controllable devices (e.g., smart EV charging stations and heat pumps) into modern power systems enhances capabilities but introduces critical cybersecurity risks. Specifically, these devices are susceptible to load-altering attacks (LAAs) that can compromise power system safety. This paper quantifies the impact of LAAs on nodal voltage constraint violations in distribution networks (DNs). We first present closed-form expressions to analytically characterize LAA effects and quantify the minimum number of compromised devices for a successful LAA. Based on these insights, we propose a reactive defense mechanism that mitigates LAAs through DN reconfiguration. To address strategic adversaries, we then formulate defense strategies using a non-cooperative sequential game, which models the knowledgeable and strategic attacker, accounting for the worst-case scenario and enabling the reactive defender to devise an efficient and robust defense. Our formulation also accounts for uncertainties in attack localization. A novel Bayesian optimization approach is introduced to compute the Stackelberg equilibrium, significantly reducing the computational burden. The game-theoretic strategy effectively mitigates the attack's impact while ensuring minimal system reconfiguration.
Dual-Regularized Riccati Recursions for Interior-Point Optimal Control
We derive closed-form extensions of Riccati's recursions (both sequential and parallel) for solving dual-regularized LQR problems. We show how these methods can be used to solve general constrained, non-convex, discrete-time optimal control problems via a regularized interior point method, while guaranteeing that each step is a descent direction of an Augmented Barrier-Lagrangian merit function. We provide MIT-licensed implementations of our methods in C++ and JAX.
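For readers unfamiliar with the baseline being extended, the following is a minimal sketch of the classical (unregularized, sequential) backward Riccati recursion for a scalar discrete-time LQR problem. It is not the dual-regularized or parallel variant from the paper, and it is unrelated to the authors' C++ and JAX implementations; the function name `scalar_riccati` and all parameter names are assumptions made for illustration.

```python
def scalar_riccati(a, b, q, r, q_f, horizon):
    """Backward Riccati recursion for x[k+1] = a*x[k] + b*u[k].

    Minimizes sum_k (q*x[k]^2 + r*u[k]^2) + q_f*x[N]^2 over a finite horizon.
    Returns (K, P): feedback gains with u[k] = -K[k]*x[k], and cost-to-go
    coefficients with optimal cost from step k equal to P[k]*x[k]^2.
    """
    P = [0.0] * (horizon + 1)
    K = [0.0] * horizon
    P[horizon] = q_f  # terminal cost coefficient
    for k in range(horizon - 1, -1, -1):
        # Gain from minimizing the one-step cost plus the cost-to-go.
        K[k] = (b * P[k + 1] * a) / (r + b * P[k + 1] * b)
        # Cost-to-go update obtained by substituting the optimal input.
        P[k] = q + a * P[k + 1] * a - a * P[k + 1] * b * K[k]
    return K, P


# Illustrative values: unit dynamics and unit weights.
gains, costs = scalar_riccati(a=1.0, b=1.0, q=1.0, r=1.0, q_f=1.0, horizon=50)
```

With these unit values the stationary cost coefficient satisfies P^2 - P - 1 = 0, so over a long horizon `costs[0]` approaches the golden ratio, which gives a quick sanity check of the recursion.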
Continuity Conditions for Piecewise Quadratic Functions on Simplicial Conic Partitions are Equivalent
Analysis of continuous-time piecewise linear systems based on piecewise quadratic (PWQ) Lyapunov functions typically requires continuity of these functions over a partition of the state space. Several conditions for guaranteeing continuity of PWQ functions over state space partitions can be found in the literature. In this technical note, we show that these continuity conditions are equivalent over so-called simplicial conic partitions. As a consequence, the choice of which condition to impose can be based solely on practical considerations such as specific application or numerical aspects, without introducing additional conservatism in the analysis.
comment: 8 pages, 3 figures
Binary Decision Process in Pre-Evacuation Behavior
In crowd evacuation, the time interval before decisive movement towards a safe place is defined as the pre-evacuation phase, and it has a crucial impact on the total time required for safe egress. This process mainly refers to situation awareness and response to external stressors, e.g., fire alarms. Due to the complexity of human cognitive processes, simulation is used to study this important time interval. In this paper, a binary decision process is formulated to simulate the pre-evacuation time of many evacuees in a given social context. The model combines classic opinion dynamics (the French-DeGroot model) with a binary phase transition to describe how group pre-evacuation time emerges from individual interaction. The model parameters are quantitatively meaningful to human factors research within a socio-psychological background, e.g., whether an individual is stubborn or open-minded, or what kind of social topology exists among the individuals and how it matters in aggregating individuals into social groups. The modeling framework also describes the collective motion of many evacuee agents in a planar space. The resulting multi-agent system is partly similar to the Vicsek flocking model, and it is meaningful for exploring complex social behavior during the phase transition of a non-equilibrium process.
comment: 5 pages
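The combination of French-DeGroot opinion averaging with a binary decision, mentioned in the abstract above, can be sketched as below. This is only an illustrative toy under stated assumptions: the fixed `threshold` rule for switching to "evacuate", the function names, and the weight matrix are hypothetical and are not taken from the paper.

```python
def degroot_step(W, x):
    """One French-DeGroot update: each agent takes a weighted average of opinions.

    W is a row-stochastic weight matrix (each row sums to 1), x the opinion vector.
    """
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def decision_times(W, x0, threshold, max_steps):
    """Return, per agent, the first step at which its opinion reaches `threshold`.

    An entry stays None if the agent never crosses within `max_steps` updates.
    The threshold crossing stands in for the binary stay/evacuate decision.
    """
    x = list(x0)
    times = [None] * len(x)
    for k in range(max_steps + 1):
        for i, x_i in enumerate(x):
            if times[i] is None and x_i >= threshold:
                times[i] = k
        x = degroot_step(W, x)
    return times


# Agent 0 is "stubborn" (listens only to itself); agent 1 averages equally.
W = [[1.0, 0.0],
     [0.5, 0.5]]
times = decision_times(W, x0=[1.0, 0.0], threshold=0.7, max_steps=10)
```

Here the stubborn agent decides immediately, while the open-minded agent is pulled towards it and crosses the threshold after two updates, which is one simple way a group pre-evacuation time could emerge from individual interaction.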
Multi-robot Motion Planning based on Nets-within-Nets Modeling and Simulation
This paper focuses on designing motion plans for a heterogeneous team of robots that must cooperate to fulfill a global mission. Robots move in an environment that contains some regions of interest, while the specification for the entire team can include avoidance, visits, or sequencing of these regions of interest. The mission is expressed in terms of a Petri net corresponding to an automaton, while each robot is also modeled by a state machine Petri net. The current work brings about the following contributions with respect to existing solutions for related problems. First, we propose a novel model, denoted High-Level robot team Petri Net (HLrtPN) system, to incorporate the specification and robot models into the Nets-within-Nets paradigm. A guard function, named Global Enabling Function, is designed to synchronize the firing of transitions so that robot motions do not violate the specification. Then, the solution is found by simulating the HLrtPN system in a specific software tool that accommodates Nets-within-Nets. Illustrative examples based on Linear Temporal Logic missions support the computational feasibility of the proposed framework.
comment: [Note for readers] This paper has been extended from a previous submission to 62nd IEEE Conference on Decision and Control, Dec. 13-15, 2023. This work has been submitted to the IEEE for possible publication
ES-HPC-MPC: Exponentially Stable Hybrid Perception Constrained MPC for Quadrotor with Suspended Payloads
Aerial transportation using quadrotors with cable-suspended payloads holds great potential for applications in disaster response, logistics, and infrastructure maintenance. However, their hybrid and underactuated dynamics pose significant control and perception challenges. Traditional approaches often assume a taut cable condition, limiting their effectiveness in real-world applications where slack-to-taut transitions occur due to disturbances. We introduce ES-HPC-MPC, a model predictive control framework that enforces exponential stability and perception-constrained control under hybrid dynamics. Our method leverages Exponentially Stabilizing Control Lyapunov Functions (ES-CLFs) to enforce stability during the tasks and Control Barrier Functions (CBFs) to maintain the payload within the onboard camera's field of view (FoV). We validate our method through both simulation and real-world experiments, demonstrating stable trajectory tracking and reliable payload perception. We validate that our method maintains stability and satisfies perception constraints while tracking dynamically infeasible trajectories and when the system is subjected to hybrid mode transitions caused by unexpected disturbances.
comment: Accepted to IEEE Robotics and Automation Letters
Privacy Preservation by Local Design in Cooperative Networked Control Systems
In this paper, we study the privacy preservation problem in a cooperative networked control system with closed-loop dynamics, performing linear quadratic Gaussian (LQG) control. The system consists of a user and a server: the user owns the plant to be controlled, while the server provides computation capability, and the user employs the server to compute control inputs. To enable the server's computation, the user needs to provide the measurements of the plant states to the server, which then calculates estimates of the states, based on which the control inputs are computed. However, the user regards the states as private, and makes an interesting request: the user wants the server to have "incorrect" knowledge of the state estimates rather than the true values. To this end, we propose a novel design methodology for privacy preservation, in which the privacy scheme is implemented locally on the user side, hidden from the server, and creates a deviation in the server's knowledge of the state estimates from the true values. However, this methodology also raises significant challenges: in a closed-loop dynamic system, when the knowledge held by the server is incorrect, the system's behavior becomes complex to analyze; even the stability of the system becomes questionable, as the incorrectness accumulates through the closed loop as time evolves. In this paper, we show through rigorous mathematical proofs that the performance loss in LQG control caused by the proposed privacy scheme is bounded, which supports the viability of the proposed design methodology. We also propose an associated novel privacy metric and obtain an analytical result for evaluating the privacy performance. Finally, we study the performance trade-off between privacy and control, where the resulting optimization problems are solved efficiently by numerical methods.
comment: 14 pages, 7 figures
Cryo-CMOS Antenna for Wireless Communications within a Quantum Computer Cryostat
Scaling quantum computers from a few qubits to large numbers remains one of the critical challenges in realizing practical quantum advantage. Multi-core quantum architectures have emerged as a promising solution, enabling scalability through distributed quantum processing units (QPUs) interconnected via classical and quantum links. However, the bottleneck of wired connections persists, as densely packed wired interconnects, both vertically across temperature stages and horizontally within the same layer, introduce spatial constraints, power dissipation, and latency, which could hinder performance as the number of QPUs increases. To overcome these limitations, this work proposes a cryo-compatible on-chip differential dipole antenna operating at 28 GHz to enable short-range wireless communication within a quantum computer cryostat. Temperature-dependent material properties are incorporated to accurately capture antenna behavior at 4 K. Moreover, by embedding the antenna in a realistic cryostat structure, we evaluate the feasibility of antenna operation within the cryogenic environment. The proposed antenna achieves a reflection coefficient of -20.8 dB in free space and -18.38 dB within the cryostat, demonstrating efficient impedance matching.
Data-Efficient Excavation Force Estimation for Wheel Loaders
Accurate prediction of excavation forces is critical for enabling autonomous operation and optimizing control strategies in earthmoving machinery. Conventional approaches often depend on extensive data collection or computationally expensive simulations across multiple soil types, which limits their scalability and adaptability. This study presents a data-efficient framework that calibrates soil parameters using force measurements from the preceding bucket-loading cycle. The proposed method is based on an analytical soil-tool interaction model formulated through the fundamental earthmoving equation, and employs a multi-stage optimization procedure during the loading phase to identify relevant soil parameters. These estimated parameters are then used to predict excavation forces in the subsequent cycle, allowing the system to adapt its control inputs without relying on large-scale datasets or machine learning model training. The framework is validated through high-fidelity simulations in the Algoryx Dynamics engine under different soil types and excavation trajectories, achieving root-mean-square prediction errors between 10% and 15%. This cycle-to-cycle adaptation demonstrates strong potential for scalable, online force estimation and efficient path planning in wheel loader operations.
Robotics
Embodying Physical Computing into Soft Robots
Softening and onboarding computers and controllers is one of the final frontiers in soft robotics towards their robustness and intelligence for everyday use. In this regard, embodying soft and physical computing presents exciting potential. Physical computing seeks to encode inputs into a mechanical computing kernel and leverage the internal interactions among this kernel's constituent elements to compute the output. Moreover, such input-to-output evolution can be re-programmable. This perspective paper proposes a framework for embodying physical computing into soft robots and discusses three unique strategies in the literature: analog oscillators, physical reservoir computing, and physical algorithmic computing. These embodied computers enable the soft robot to perform complex behaviors that would otherwise require CMOS-based electronics -- including coordinated locomotion with obstacle avoidance, payload weight and orientation classification, and programmable operation based on logical rules. This paper will detail the working principles of these embodied physical computing methods, survey the current state-of-the-art, and present a perspective for future development.
A Framework for the Systematic Evaluation of Obstacle Avoidance and Object-Aware Controllers
Real-time control is an essential aspect of safe robot operation in the real world with dynamic objects. We present a framework for the analysis of object-aware controllers, methods for altering a robot's motion to anticipate and avoid possible collisions. This framework is focused on three design considerations: kinematics, motion profiles, and virtual constraints. Additionally, the analysis in this work relies on verification of robot behaviors using fundamental robot-obstacle experimental scenarios. To showcase the effectiveness of our method we compare three representative object-aware controllers. The comparison uses metrics originating from the design considerations. From the analysis, we find that the design of object-aware controllers often lacks kinematic considerations, continuity of control points, and stability in movement profiles. We conclude that this framework can be used in the future to design, compare, and benchmark obstacle avoidance methods.
Fare: Failure Resilience in Learned Visual Navigation Control
While imitation learning (IL) enables effective visual navigation, IL policies are prone to unpredictable failures in out-of-distribution (OOD) scenarios. We advance the notion of failure-resilient policies, which not only detect failures but also recover from them automatically. Failure recognition that identifies the factors causing failure is key to informing recovery: e.g. pinpointing image regions triggering failure detections can provide cues to guide recovery. We present Fare, a framework to construct failure-resilient IL policies, embedding OOD-detection and recognition in them without using explicit failure data, and pairing them with recovery heuristics. Real-world experiments show that Fare enables failure recovery across two different policy architectures, enabling robust long-range navigation in complex environments.
Feature Matching-Based Gait Phase Prediction for Obstacle Crossing Control of Powered Transfemoral Prosthesis
For amputees with powered transfemoral prosthetics, navigating obstacles or complex terrain remains challenging. This study addresses this issue by using an inertial sensor on the sound ankle to guide obstacle-crossing movements. A genetic algorithm computes the optimal neural network structure to predict the required angles of the thigh and knee joints. A gait progression prediction algorithm determines the actuation angle index for the prosthetic knee motor, ultimately defining the necessary thigh and knee angles and gait progression. Results show that when the standard deviation of Gaussian noise added to the thigh angle data is less than 1, the method can effectively eliminate noise interference, achieving 100% accuracy in gait phase estimation under 150 Hz, with a thigh angle prediction error of 8.71% and a knee angle prediction error of 6.78%. These findings demonstrate the method's ability to accurately predict gait progression and joint angles, offering significant practical value for obstacle negotiation in powered transfemoral prosthetics.
comment: 6 pages, conference
Multi-Agent Scenario Generation in Roundabouts with a Transformer-enhanced Conditional Variational Autoencoder
With the increasing integration of intelligent driving functions into serial-produced vehicles, ensuring their functionality and robustness poses greater challenges. Compared to traditional road testing, scenario-based virtual testing offers significant advantages in terms of time and cost efficiency, reproducibility, and exploration of edge cases. We propose a Transformer-enhanced Conditional Variational Autoencoder (CVAE-T) model for generating multi-agent traffic scenarios in roundabouts, which are characterized by high vehicle dynamics and complex layouts, yet remain relatively underexplored in current research. The results show that the proposed model can accurately reconstruct original scenarios and generate realistic, diverse synthetic scenarios. Besides, two Key-Performance-Indicators (KPIs) are employed to evaluate the interactive behavior in the generated scenarios. Analysis of the latent space reveals partial disentanglement, with several latent dimensions exhibiting distinct and interpretable effects on scenario attributes such as vehicle entry timing, exit timing, and velocity profiles. The results demonstrate the model's capability to generate scenarios for the validation of intelligent driving functions involving multi-agent interactions, as well as to augment data for their development and iterative improvement.
GroundLoc: Efficient Large-Scale Outdoor LiDAR-Only Localization
In this letter, we introduce GroundLoc, a LiDAR-only localization pipeline designed to localize a mobile robot in large-scale outdoor environments using prior maps. GroundLoc employs a Bird's-Eye View (BEV) image projection focusing on the perceived ground area and utilizes the place recognition network R2D2, or alternatively, the non-learning approach Scale-Invariant Feature Transform (SIFT), to identify and select keypoints for BEV image map registration. Our results demonstrate that GroundLoc outperforms state-of-the-art methods on the SemanticKITTI and HeLiPR datasets across various sensors. In the multi-session localization evaluation, GroundLoc reaches an Average Trajectory Error (ATE) well below 50 cm on all Ouster OS2 128 sequences while meeting online runtime requirements. The system supports various sensor models, as evidenced by evaluations conducted with Velodyne HDL-64E, Ouster OS2 128, Aeva Aeries II, and Livox Avia sensors. The prior maps are stored as 2D raster image maps, which can be created from a single drive and require only 4 MB of storage per square kilometer. The source code is available at https://github.com/dcmlr/groundloc.
Towards Quadrupedal Jumping and Walking for Dynamic Locomotion using Reinforcement Learning
This paper presents a curriculum-based reinforcement learning framework for training precise and high-performance jumping policies for the robot 'Olympus'. Separate policies are developed for vertical and horizontal jumps, leveraging a simple yet effective strategy. First, we densify the inherently sparse jumping reward using the laws of projectile motion. Next, a reference state initialization scheme is employed to accelerate the exploration of dynamic jumping behaviors without reliance on reference trajectories. We also present a walking policy that, when combined with the jumping policies, unlocks versatile and dynamic locomotion capabilities. Comprehensive testing validates walking on varied terrain surfaces and jumping performance that exceeds previous works, effectively crossing the Sim2Real gap. Experimental validation demonstrates horizontal jumps up to 1.25 m with centimeter accuracy and vertical jumps up to 1.0 m. Additionally, we show that with only minor modifications, the proposed method can be used to learn omnidirectional jumping.
comment: 8 pages
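The reward densification mentioned in the abstract above can be illustrated with a ballistic landing estimate computed from the takeoff state. This is a minimal sketch, assuming a point mass, flat ground, and no drag; the function names, the Gaussian-shaped reward, and the `scale` parameter are hypothetical and are not taken from the paper.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def predicted_range(vx: float, vz: float, z0: float = 0.0) -> float:
    """Horizontal distance a point mass covers before returning to height 0.

    vx, vz: horizontal and vertical takeoff velocity (m/s).
    z0: takeoff height above the landing surface (m), assumed >= 0.
    """
    # Time of flight is the positive root of z0 + vz*t - 0.5*G*t^2 = 0.
    t_flight = (vz + math.sqrt(vz * vz + 2.0 * G * z0)) / G
    return vx * t_flight


def dense_jump_reward(vx: float, vz: float, target: float,
                      z0: float = 0.0, scale: float = 1.0) -> float:
    """Hypothetical dense reward in (0, 1], maximal when the predicted
    landing point coincides with the target distance."""
    error = predicted_range(vx, vz, z0) - target
    return math.exp(-((error / scale) ** 2))
```

Because the prediction is available at every step before touchdown, a signal of this kind gives the policy feedback long before the sparse "landed on target" event occurs.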
Spatiotemporal Calibration of Doppler Velocity Logs for Underwater Robots
The calibration of extrinsic parameters and clock offsets between sensors for high-accuracy performance in underwater SLAM systems remains insufficiently explored. Existing methods for Doppler Velocity Log (DVL) calibration are either constrained to specific sensor configurations or rely on oversimplified assumptions, and none jointly estimate translational extrinsics and time offsets. We propose a Unified Iterative Calibration (UIC) framework for general DVL sensor setups, formulated as a Maximum A Posteriori (MAP) estimation with a Gaussian Process (GP) motion prior for high-fidelity motion interpolation. UIC alternates between efficient GP-based motion state updates and gradient-based calibration variable updates, supported by a provably statistically consistent sequential initialization scheme. The proposed UIC can be applied to IMUs, cameras, and other modalities as co-sensors. We release an open-source DVL-camera calibration toolbox. Beyond underwater applications, several aspects of UIC, such as the integration of GP priors for MAP-based calibration and the design of provably reliable initialization procedures, are broadly applicable to other multi-sensor calibration problems. Finally, simulations and real-world tests validate our approach.
An Adaptive Inspection Planning Approach Towards Routine Monitoring in Uncertain Environments ICRA 2026
In this work, we present a hierarchical framework designed to support robotic inspection under environment uncertainty. By leveraging a known environment model, existing methods plan and safely track inspection routes to visit points of interest. However, discrepancies between the model and actual site conditions, caused by either natural or human activities, can alter the surface morphology or introduce path obstructions. To address this challenge, the proposed framework divides the inspection task into: (a) generating the initial global view-plan for regions of interest based on a historical map and (b) local view replanning to adapt to the current morphology of the inspection scene. The proposed hierarchy preserves global coverage objectives while enabling reactive adaptation to the local surface morphology. This enables the local autonomy to remain robust against environment uncertainty and complete the inspection tasks. We validate the approach through deployments in real-world subterranean mines using a quadrupedal robot.
comment: Submitted for ICRA 2026
GeVI-SLAM: Gravity-Enhanced Stereo Visual Inertial SLAM for Underwater Robots
Accurate visual inertial simultaneous localization and mapping (VI SLAM) for underwater robots remains a significant challenge due to frequent visual degeneracy and insufficient inertial measurement unit (IMU) motion excitation. In this paper, we present GeVI-SLAM, a gravity-enhanced stereo VI SLAM system designed to address these issues. By leveraging the stereo camera's direct depth estimation ability, we eliminate the need to estimate scale during IMU initialization, enabling stable operation even under low acceleration dynamics. With precise gravity initialization, we decouple the pitch and roll from the pose estimation and solve a 4 degrees of freedom (DOF) Perspective-n-Point (PnP) problem for pose tracking. This allows the use of a minimal 3-point solver, which significantly reduces computational time to reject outliers within a Random Sample Consensus framework. We further propose a bias-eliminated 4-DOF PnP estimator with provable consistency, ensuring the relative pose converges to the true value as the feature number increases. To handle dynamic motion, we refine the full 6-DOF pose while jointly estimating the IMU covariance, enabling adaptive weighting of the gravity prior. Extensive experiments on simulated and real-world data demonstrate that GeVI-SLAM achieves higher accuracy and greater stability compared to state-of-the-art methods.
Stochastic Prize-Collecting Games: Strategic Planning in Multi-Robot Systems
The Team Orienteering Problem (TOP) generalizes many real-world multi-robot scheduling and routing tasks that occur in autonomous mobility, aerial logistics, and surveillance applications. While many flavors of the TOP exist for planning in multi-robot systems, they assume that all the robots cooperate toward a single objective; thus, they do not extend to settings where the robots compete in reward-scarce environments. We propose Stochastic Prize-Collecting Games (SPCG) as an extension of the TOP to plan in the presence of self-interested robots operating on a graph, under energy constraints and stochastic transitions. A theoretical study on complete and star graphs establishes that there is a unique pure Nash equilibrium in SPCGs that coincides with the optimal routing solution of an equivalent TOP given a rank-based conflict resolution rule. This work proposes two algorithms: Ordinal Rank Search (ORS) to obtain the "ordinal rank", i.e., one's effective rank in temporarily formed local neighborhoods during the games' stages, and Fictitious Ordinal Response Learning (FORL) to obtain best-response policies against one's senior-rank opponents. Empirical evaluations conducted on road networks and synthetic graphs under both dynamic and stationary prize distributions show that 1) the state-aliasing induced by OR-conditioning enables learning policies that scale more efficiently to large team sizes than those trained with the global index, and 2) policies trained with FORL generalize better to imbalanced prize distributions than those trained with other multi-agent training methods. Finally, the learned policies in the SPCG achieved between 87% and 95% optimality compared to an equivalent TOP solution obtained by mixed-integer linear programming.
comment: Submitted to IEEE Robotics and Automation Letters
Supervisory Measurement-Guided Noise Covariance Estimation
Reliable state estimation hinges on accurate specification of sensor noise covariances, which weigh heterogeneous measurements. In practice, these covariances are difficult to identify due to environmental variability, front-end preprocessing, and other factors. We address this by formulating noise covariance estimation as a bilevel optimization that, from a Bayesian perspective, factorizes the joint likelihood of so-called odometry and supervisory measurements, thereby balancing information utilization with computational efficiency. The factorization converts the nested Bayesian dependency into a chain structure, enabling efficient parallel computation: at the lower level, an invariant extended Kalman filter with state augmentation estimates trajectories, while a derivative filter computes analytical gradients in parallel for upper-level gradient updates. The upper level refines the covariance to guide the lower-level estimation. Experiments on synthetic and real-world datasets show that our method achieves higher efficiency than existing baselines.
Sample-efficient and Scalable Exploration in Continuous-Time RL
Reinforcement learning algorithms are typically designed for discrete-time dynamics, even though the underlying real-world control systems are often continuous in time. In this paper, we study the problem of continuous-time reinforcement learning, where the unknown system dynamics are represented using nonlinear ordinary differential equations (ODEs). We leverage probabilistic models, such as Gaussian processes and Bayesian neural networks, to learn an uncertainty-aware model of the underlying ODE. Our algorithm, COMBRL, greedily maximizes a weighted sum of the extrinsic reward and model epistemic uncertainty. This yields a scalable and sample-efficient approach to continuous-time model-based RL. We show that COMBRL achieves sublinear regret in the reward-driven setting, and in the unsupervised RL setting (i.e., without extrinsic rewards), we provide a sample complexity bound. In our experiments, we evaluate COMBRL in both standard and unsupervised RL settings and demonstrate that it scales better, is more sample-efficient than prior methods, and outperforms baselines across several deep RL tasks.
comment: 26 pages, 6 figures, 6 tables
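The exploration objective described in the abstract above, a weighted sum of extrinsic reward and epistemic uncertainty that is maximized greedily, can be sketched in a few lines. This is an illustrative toy under stated assumptions: disagreement among an ensemble of scalar predictions stands in for the epistemic uncertainty of a Gaussian process or Bayesian neural network, the candidate set is discrete, and all names are hypothetical rather than taken from COMBRL.

```python
from statistics import pstdev


def epistemic_uncertainty(ensemble_predictions):
    """Assumed proxy: population standard deviation across ensemble members."""
    return pstdev(ensemble_predictions)


def combined_objective(reward, ensemble_predictions, weight):
    """Extrinsic reward plus `weight` times the uncertainty proxy."""
    return reward + weight * epistemic_uncertainty(ensemble_predictions)


def pick_action(candidates, weight):
    """Greedy choice over candidates.

    candidates maps an action label to (reward, ensemble_predictions).
    """
    return max(
        candidates,
        key=lambda a: combined_objective(candidates[a][0], candidates[a][1], weight),
    )


# Toy data: "a" has higher reward, "b" has higher model disagreement.
candidates = {
    "a": (1.0, [0.0, 0.0]),
    "b": (0.5, [-1.0, 1.0]),
}
```

With `weight=0` the choice is purely reward-driven and selects "a"; with `weight=1` the uncertainty bonus makes "b" preferable, which is the trade-off the weighting is meant to control.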
Adaptive Surrogate Gradients for Sequential Reinforcement Learning in Spiking Neural Networks
Neuromorphic computing systems are set to revolutionize energy-constrained robotics by achieving orders-of-magnitude efficiency gains, while enabling native temporal processing. Spiking Neural Networks (SNNs) represent a promising algorithmic approach for these systems, yet their application to complex control tasks faces two critical challenges: (1) the non-differentiable nature of spiking neurons necessitates surrogate gradients with unclear optimization properties, and (2) the stateful dynamics of SNNs require training on sequences, which in reinforcement learning (RL) is hindered by limited sequence lengths during early training, preventing the network from bridging its warm-up period. We address these challenges by systematically analyzing surrogate gradient slope settings, showing that shallower slopes increase gradient magnitude in deeper layers but reduce alignment with true gradients. In supervised learning, we find no clear preference for fixed or scheduled slopes. The effect is much more pronounced in RL settings, where shallower slopes or scheduled slopes lead to a 2.1x improvement in both training and final deployed performance. Next, we propose a novel training approach that leverages a privileged guiding policy to bootstrap the learning process, while still exploiting online environment interactions with the spiking policy. Combining our method with an adaptive slope schedule for a real-world drone position control task, we achieve an average return of 400 points, substantially outperforming prior techniques, including Behavioral Cloning and TD3BC, which achieve at most -200 points under the same conditions. This work advances both the theoretical understanding of surrogate gradient learning in SNNs and practical training methodologies for neuromorphic controllers demonstrated in real-world robotic systems.
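To make the role of the surrogate gradient slope in the abstract above concrete, the sketch below uses the widely used fast-sigmoid surrogate, whose derivative is 1 / (1 + k|v - θ|)^2 for slope k and threshold θ. Whether the paper uses this particular surrogate is an assumption, and the linear schedule is a simple hypothetical stand-in, not the adaptive schedule the authors propose.

```python
def fast_sigmoid_surrogate(v: float, threshold: float = 1.0,
                           slope: float = 10.0) -> float:
    """Surrogate derivative of the spike function at membrane potential v.

    The true derivative of a Heaviside spike is zero almost everywhere;
    this smooth replacement peaks at the threshold and decays with distance.
    """
    x = slope * abs(v - threshold)
    return 1.0 / (1.0 + x) ** 2


def linear_slope_schedule(step: int, total_steps: int,
                          start: float = 2.0, end: float = 25.0) -> float:
    """Hypothetical schedule: begin shallow (small slope), end steep."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)
```

Away from the threshold, a shallower slope yields a larger surrogate value (at v = 0.5, slope 2 gives 0.25 while slope 10 gives 1/36), which is one way to see why shallow slopes pass more gradient into deeper layers at the cost of a looser approximation of the true step function.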
Flatness-based trajectory planning for 3D overhead cranes with friction compensation and collision avoidance
This paper presents an optimal trajectory generation method for 3D overhead cranes by leveraging differential flatness. This framework enables the direct inclusion of complex physical and dynamic constraints, such as nonlinear friction and collision avoidance for both payload and rope. Our approach allows for aggressive movements by constraining payload swing only at the final point. A comparative simulation study validates our approach, demonstrating that neglecting dry friction leads to actuator saturation and collisions. The results show that friction modeling is a fundamental requirement for fast and safe crane trajectories.
comment: 8 pages, 11 figures
A Hybrid Approach for Visual Multi-Object Tracking
This paper proposes a visual multi-object tracking method that jointly employs stochastic and deterministic mechanisms to ensure identifier consistency for unknown and time-varying target numbers under nonlinear dynamics. A stochastic particle filter addresses nonlinear dynamics and non-Gaussian noise, with support from particle swarm optimization (PSO) to guide particles toward state distribution modes and mitigate divergence through proposed fitness measures incorporating motion consistency, appearance similarity, and social-interaction cues with neighboring targets. Deterministic association further enforces identifier consistency via a proposed cost matrix incorporating spatial consistency between particles and current detections, detection confidences, and track penalties. Subsequently, a novel scheme is proposed for the smooth updating of target states while preserving their identities, particularly for weak tracks during interactions with other targets and prolonged occlusions. Moreover, velocity regression over past states provides trend-seed velocities, enhancing particle sampling and state updates. The proposed tracker is designed to operate flexibly for both pre-recorded videos and camera live streams, where future frames are unavailable. Experimental results confirm superior performance compared to state-of-the-art trackers. The source-code reference implementations of both the proposed method and compared-trackers are provided on GitHub: https://github.com/SDU-VelKoTek/GenTrack2
comment: This work has been submitted to the IEEE for possible publication
GenTrack: A New Generation of Multi-Object Tracking
This paper introduces a novel multi-object tracking (MOT) method, dubbed GenTrack, whose main contributions are: (1) a hybrid tracking approach that combines stochastic and deterministic mechanisms to robustly handle unknown and time-varying numbers of targets, particularly in maintaining target identity (ID) consistency and managing nonlinear dynamics; (2) the use of particle swarm optimization (PSO) with proposed fitness measures to guide stochastic particles toward their target distribution modes, enabling effective tracking even with weak and noisy object detectors; (3) integration of social interactions among targets to enhance PSO-guided particles and improve continuous updates of both strong (matched) and weak (unmatched) tracks, thereby reducing ID switches and track loss, especially during occlusions; (4) a GenTrack-based redefined visual MOT baseline incorporating a comprehensive state and observation model based on space consistency, appearance, detection confidence, track penalties, and social scores for systematic and efficient target updates; and (5) the first-ever publicly available source-code reference implementation with minimal dependencies, featuring three variants (GenTrack Basic, PSO, and PSO-Social), facilitating flexible reimplementation. Experimental results show that GenTrack provides superior performance on standard benchmarks and real-world scenarios compared to state-of-the-art trackers, with integrated implementations of baselines for fair comparison. Potential directions for future work are also discussed. The source-code reference implementations of both the proposed method and the compared trackers are provided on GitHub: https://github.com/SDU-VelKoTek/GenTrack
comment: This work has been submitted to the IEEE for possible publication
NVSim: Novel View Synthesis Simulator for Large Scale Indoor Navigation
We present NVSim, a framework that automatically constructs large-scale, navigable indoor simulators from only common image sequences, overcoming the cost and scalability limitations of traditional 3D scanning. Our approach adapts 3D Gaussian Splatting to address visual artifacts on sparsely observed floors, a common issue in robotic traversal data. We introduce Floor-Aware Gaussian Splatting to ensure a clean, navigable ground plane, and a novel mesh-free traversability checking algorithm that constructs a topological graph by directly analyzing rendered views. We demonstrate our system's ability to generate valid, large-scale navigation graphs from real-world data. A video demonstration is available at https://youtu.be/tTiIQt6nXC8
comment: 9 pages, 10 figures
Global-State-Free Obstacle Avoidance for Quadrotor Control in Air-Ground Cooperation
CoNi-MPC provides an efficient framework for UAV control in air-ground cooperative tasks by relying exclusively on relative states, eliminating the need for global state estimation. However, its lack of environmental information poses significant challenges for obstacle avoidance. To address this issue, we propose a novel obstacle avoidance algorithm, Cooperative Non-inertial frame-based Obstacle Avoidance (CoNi-OA), designed explicitly for UAV-UGV cooperative scenarios without reliance on global state estimation or obstacle prediction. CoNi-OA uniquely utilizes a single frame of raw LiDAR data from the UAV to generate a modulation matrix, which directly adjusts the quadrotor's velocity to achieve obstacle avoidance. This modulation-based method enables real-time generation of collision-free trajectories within the UGV's non-inertial frame, significantly reducing computational demands (less than 5 ms per iteration) while maintaining safety in dynamic and unpredictable environments. The key contributions of this work include: (1) a modulation-based obstacle avoidance algorithm specifically tailored for UAV-UGV cooperation in non-inertial frames without global states; (2) rapid, real-time trajectory generation based solely on single-frame LiDAR data, removing the need for obstacle modeling or prediction; and (3) adaptability to both static and dynamic environments, thus extending applicability to featureless or unknown scenarios.
DynaRend: Learning 3D Dynamics via Masked Future Rendering for Robotic Manipulation NeurIPS 2025
Learning generalizable robotic manipulation policies remains a key challenge due to the scarcity of diverse real-world training data. While recent approaches have attempted to mitigate this through self-supervised representation learning, most either rely on 2D vision pretraining paradigms such as masked image modeling, which primarily focus on static semantics or scene geometry, or utilize large-scale video prediction models that emphasize 2D dynamics, thus failing to jointly learn the geometry, semantics, and dynamics required for effective manipulation. In this paper, we present DynaRend, a representation learning framework that learns 3D-aware and dynamics-informed triplane features via masked reconstruction and future prediction using differentiable volumetric rendering. By pretraining on multi-view RGB-D video data, DynaRend jointly captures spatial geometry, future dynamics, and task semantics in a unified triplane representation. The learned representations can be effectively transferred to downstream robotic manipulation tasks via action value map prediction. We evaluate DynaRend on two challenging benchmarks, RLBench and Colosseum, as well as in real-world robotic experiments, demonstrating substantial improvements in policy success rate, generalization to environmental perturbations, and real-world applicability across diverse manipulation tasks.
comment: Accepted to NeurIPS 2025
Can LLMs Translate Human Instructions into a Reinforcement Learning Agent's Internal Emergent Symbolic Representation?
Emergent symbolic representations are critical for enabling developmental learning agents to plan and generalize across tasks. In this work, we investigate whether large language models (LLMs) can translate human natural language instructions into the internal symbolic representations that emerge during hierarchical reinforcement learning. We apply a structured evaluation framework to measure the translation performance of commonly used LLMs -- GPT, Claude, Deepseek and Grok -- across different internal symbolic partitions generated by a hierarchical reinforcement learning algorithm in the Ant Maze and Ant Fall environments. Our findings reveal that although LLMs demonstrate some ability to translate natural language into a symbolic representation of the environment dynamics, their performance is highly sensitive to partition granularity and task complexity. The results expose limitations in current LLMs' capacity for representation alignment, highlighting the need for further research on robust alignment between language and internal agent representations.
Manipulate as Human: Learning Task-oriented Manipulation Skills by Adversarial Motion Priors
In recent years, there has been growing interest in developing robots and autonomous systems that can interact with humans in a more natural and intuitive way. One of the key challenges in achieving this goal is to enable these systems to manipulate objects and tools in a manner that is similar to that of humans. In this paper, we propose a novel approach for learning human-style manipulation skills by using adversarial motion priors, which we name HMAMP. The approach leverages adversarial networks to model the complex dynamics of tool and object manipulation, as well as the aim of the manipulation task. The discriminator is trained using a combination of real-world data and simulation data executed by the agent, and is designed to train a policy that generates realistic motion trajectories that match the statistical properties of human motion. We evaluated HMAMP on one challenging manipulation task, hammering, and the results indicate that HMAMP is capable of learning human-style manipulation skills that outperform current baseline methods. Additionally, we demonstrate that HMAMP has potential for real-world applications by performing real robot arm hammering tasks. In general, HMAMP represents a significant step towards developing robots and autonomous systems that can interact with humans in a more natural and intuitive way, by learning to manipulate tools and objects in a manner similar to how humans do.
Blindfolded Experts Generalize Better: Insights from Robotic Manipulation and Videogames
Behavioral cloning is a simple yet effective technique for learning sequential decision-making from demonstrations. Recently, it has gained prominence as the core of foundation models for the physical world, where achieving generalization requires countless demonstrations of a multitude of tasks. Typically, a human expert with full information on the task demonstrates a (nearly) optimal behavior. In this paper, we propose to hide some of the task's information from the demonstrator. This "blindfolded" expert is compelled to employ non-trivial exploration to solve the task. We show that cloning the blindfolded expert generalizes better to unseen tasks than its fully-informed counterpart. We conduct experiments on real-world robot peg insertion tasks with (limited) human demonstrations, alongside videogames from the Procgen benchmark. Additionally, we support our findings with theoretical analysis, which confirms that the generalization error scales with $\sqrt{I/m}$, where $I$ measures the amount of task information available to the demonstrator, and $m$ is the number of demonstrated tasks. Both theory and practice indicate that cloning blindfolded experts generalizes better with fewer demonstrated tasks. Project page with videos and code: https://sites.google.com/view/blindfoldedexperts/home
BLM$_1$: A Boundless Large Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning
Multimodal large language models (MLLMs) have advanced vision-language reasoning and are increasingly deployed in embodied agents. However, significant limitations remain: MLLMs generalize poorly across digital-physical spaces and embodiments; vision-language-action models (VLAs) produce low-level actions yet lack robust high-level embodied reasoning; and most embodied large language models (ELLMs) are constrained to digital-space with poor generalization to the physical world. Thus, unified models that operate seamlessly across digital and physical spaces while generalizing across embodiments and tasks remain absent. We introduce the \textbf{Boundless Large Model (BLM$_1$)}, a multimodal spatial foundation model that preserves instruction following and reasoning, incorporates embodied knowledge, and supports robust cross-embodiment control. BLM$_1$ integrates three key capabilities -- \textit{cross-space transfer, cross-task learning, and cross-embodiment generalization} -- via a two-stage training paradigm. Stage I injects embodied knowledge into the MLLM through curated digital corpora while maintaining language competence. Stage II trains a policy module through an intent-bridging interface that extracts high-level semantics from the MLLM to guide control, without fine-tuning the MLLM backbone. This process is supported by a self-collected cross-embodiment demonstration suite spanning four robot embodiments and six progressively challenging tasks. Evaluations across digital and physical benchmarks show that a single BLM$_1$ instance outperforms four model families -- MLLMs, ELLMs, VLAs, and GMLMs -- achieving $\sim\!\textbf{6%}$ gains in digital tasks and $\sim\!\textbf{3%}$ in physical tasks.
LagMemo: Language 3D Gaussian Splatting Memory for Multi-modal Open-vocabulary Multi-goal Visual Navigation
Navigating to a designated goal using visual information is a fundamental capability for intelligent robots. Most classical visual navigation methods are restricted to single-goal, single-modality, and closed-set goal settings. To address the practical demands of multi-modal, open-vocabulary goal queries and multi-goal visual navigation, we propose LagMemo, a navigation system that leverages a language 3D Gaussian Splatting memory. During exploration, LagMemo constructs a unified 3D language memory. With incoming task goals, the system queries the memory, predicts candidate goal locations, and integrates a local perception-based verification mechanism to dynamically match and validate goals during navigation. For fair and rigorous evaluation, we curate GOAT-Core, a high-quality core split distilled from GOAT-Bench tailored to multi-modal open-vocabulary multi-goal visual navigation. Experimental results show that LagMemo's memory module enables effective multi-modal open-vocabulary goal localization, and that LagMemo outperforms state-of-the-art methods in multi-goal visual navigation. Project page: https://weekgoodday.github.io/lagmemo
PFEA: An LLM-based High-Level Natural Language Planning and Feedback Embodied Agent for Human-Centered AI
The rapid advancement of Large Language Models (LLMs) has marked a significant breakthrough in Artificial Intelligence (AI), ushering in a new era of Human-centered Artificial Intelligence (HAI). HAI aims to better serve human welfare and needs, thereby placing higher demands on the intelligence level of robots, particularly in aspects such as natural language interaction, complex task planning, and execution. Intelligent agents powered by LLMs have opened up new pathways for realizing HAI. However, existing LLM-based embodied agents often lack the ability to plan and execute complex natural language control tasks online. This paper explores the implementation of intelligent robotic manipulating agents based on Vision-Language Models (VLMs) in the physical world. We propose a novel embodied agent framework for robots, which comprises a human-robot voice interaction module, a vision-language agent module and an action execution module. The vision-language agent itself includes a vision-based task planner, a natural language instruction converter, and a task performance feedback evaluator. Experimental results demonstrate that our agent achieves a 28\% higher average task success rate in both simulated and real environments compared to approaches relying solely on LLM+CLIP, significantly improving the execution success rate of high-level natural language instruction tasks.
ZTRS: Zero-Imitation End-to-end Autonomous Driving with Trajectory Scoring
End-to-end autonomous driving maps raw sensor inputs directly into ego-vehicle trajectories to avoid cascading errors from perception modules and to leverage rich semantic cues. Existing frameworks largely rely on Imitation Learning (IL), which can be limited by sub-optimal expert demonstrations and covariate shift during deployment. On the other hand, Reinforcement Learning (RL) has recently shown potential in scaling up with simulations, but is typically confined to low-dimensional symbolic inputs (e.g. 3D objects and maps), falling short of full end-to-end learning from raw sensor data. We introduce ZTRS (Zero-Imitation End-to-End Autonomous Driving with Trajectory Scoring), a framework that combines the strengths of both worlds: sensor inputs without losing information and RL training for robust planning. To the best of our knowledge, ZTRS is the first framework that eliminates IL entirely by only learning from rewards while operating directly on high-dimensional sensor data. ZTRS utilizes offline reinforcement learning with our proposed Exhaustive Policy Optimization (EPO), a variant of policy gradient tailored for enumerable actions and rewards. ZTRS demonstrates strong performance across three benchmarks: Navtest (generic real-world open-loop planning), Navhard (open-loop planning in challenging real-world and synthetic scenarios), and HUGSIM (simulated closed-loop driving). Specifically, ZTRS achieves the state-of-the-art result on Navhard and outperforms IL-based baselines on HUGSIM. Code will be available at https://github.com/woxihuanjiangguo/ZTRS.
Learning Parameterized Skills from Demonstrations
We present DEPS, an end-to-end algorithm for discovering parameterized skills from expert demonstrations. Our method learns parameterized skill policies jointly with a meta-policy that selects the appropriate discrete skill and continuous parameters at each timestep. Using a combination of temporal variational inference and information-theoretic regularization methods, we address the challenge of degeneracy common in latent variable models, ensuring that the learned skills are temporally extended, semantically meaningful, and adaptable. We empirically show that learning parameterized skills from multitask expert demonstrations significantly improves generalization to unseen tasks. Our method outperforms multitask as well as skill learning baselines on both LIBERO and MetaWorld benchmarks. We also demonstrate that DEPS discovers interpretable parameterized skills, such as an object grasping skill whose continuous arguments define the grasp location.
comment: NeurIPS 2025
Dynamically-Consistent Trajectory Optimization for Legged Robots via Contact Point Decomposition
To generate reliable motion for legged robots through trajectory optimization, it is crucial to simultaneously compute the robot's path and contact sequence, as well as accurately consider the dynamics in the problem formulation. In this paper, we present a phase-based trajectory optimization that ensures the feasibility of translational dynamics and friction cone constraints throughout the entire trajectory. Specifically, our approach leverages the superposition properties of linear differential equations to decouple the translational dynamics for each contact point, which operates under different phase sequences. Furthermore, we utilize the differentiation matrix of Bézier polynomials to derive an analytical relationship between the robot's position and force, thereby ensuring the consistent satisfaction of translational dynamics. Additionally, by exploiting the convex closure property of Bézier polynomials, our method ensures compliance with friction cone constraints. Using the aforementioned approach, the proposed trajectory optimization framework can generate dynamically reliable motions with various gait sequences for legged robots. We validate our framework using a quadruped robot model, focusing on the feasibility of dynamics and motion generation.
comment: 8 pages, 4 figures, IEEE ROBOTICS AND AUTOMATION LETTERS. PREPRINT VERSION. ACCEPTED OCTOBER, 2025
Balanced Collaborative Exploration via Distributed Topological Graph Voronoi Partition
This work addresses the collaborative multi-robot autonomous online exploration problem, particularly focusing on distributed exploration planning for dynamically balanced exploration area partition and task allocation among a team of mobile robots operating in obstacle-dense non-convex environments. We present a novel topological map structure that simultaneously characterizes both spatial connectivity and global exploration completeness of the environment. The topological map is updated incrementally to utilize known spatial information for updating reachable spaces, while exploration targets are planned in a receding horizon fashion under global coverage guidance. A distributed weighted topological graph Voronoi algorithm is introduced to implement balanced graph space partitions of the fused topological maps. Theoretical guarantees are provided for distributed consensus convergence and equitable graph space partitions with constant bounds. A local planner optimizes the visitation sequence of exploration targets within the balanced partitioned graph space to minimize travel distance, while generating safe, smooth, and dynamically feasible motion trajectories. Comprehensive benchmarking against state-of-the-art methods demonstrates significant improvements in exploration efficiency, completeness, and workload balance across the robot team.
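To make the idea of a weighted Voronoi partition on a graph concrete, the following is a minimal centralized sketch. It is not the distributed algorithm of the paper: the function names, the multiplicative form of the weights, and the toy path graph are all assumptions chosen for illustration. Each node is assigned to the robot whose weighted shortest-path distance to it is smallest.

```python
import heapq


def shortest_distances(adjacency, source):
    """Dijkstra shortest-path distances from `source` over a weighted graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, length in adjacency[node]:
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def weighted_graph_voronoi(adjacency, seeds, weights):
    """Assign every node to the robot with the smallest weighted distance.

    adjacency: {node: [(neighbor, edge_length), ...]}
    seeds:     {robot_id: node where that robot currently is}
    weights:   {robot_id: multiplier on that robot's distances (assumed form)}
    """
    per_robot = {r: shortest_distances(adjacency, s) for r, s in seeds.items()}
    return {
        node: min(
            seeds,
            key=lambda r: weights[r] * per_robot[r].get(node, float("inf")),
        )
        for node in adjacency
    }


# Toy example: a path graph a-b-c-d-e with unit-length edges.
path_graph = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 1.0)],
    "c": [("b", 1.0), ("d", 1.0)],
    "d": [("c", 1.0), ("e", 1.0)],
    "e": [("d", 1.0)],
}
robot_nodes = {"r1": "a", "r2": "e"}
partition = weighted_graph_voronoi(path_graph, robot_nodes, {"r1": 1.0, "r2": 2.0})
```

In this sketch, raising a robot's weight shrinks its cell, which is one simple way a partition could be rebalanced when workloads differ.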
Language-Conditioned Representations and Mixture-of-Experts Policy for Robust Multi-Task Robotic Manipulation
Perceptual ambiguity and task conflict limit multitask robotic manipulation via imitation learning. We propose a framework combining a Language-Conditioned Visual Representation (LCVR) module and a Language-conditioned Mixture-of-Experts Density Policy (LMoE-DP). LCVR resolves perceptual ambiguities by grounding visual features with language instructions, enabling differentiation between visually similar tasks. To mitigate task conflict, LMoE-DP uses a sparse expert architecture to specialize in distinct, multimodal action distributions, stabilized by gradient modulation. On real-robot benchmarks, LCVR boosts Action Chunking with Transformers (ACT) and Diffusion Policy (DP) success rates by 33.75% and 25%, respectively. The full framework achieves a 79% average success rate, outperforming the advanced baseline by 21%. Our work shows that combining semantic grounding and expert specialization enables robust, efficient multi-task manipulation.
comment: 8 pages
SynAD: Enhancing Real-World End-to-End Autonomous Driving Models through Synthetic Data Integration
Recent advancements in deep learning and the availability of high-quality real-world driving datasets have propelled end-to-end autonomous driving. Despite this progress, relying solely on real-world data limits the variety of driving scenarios for training. Synthetic scenario generation has emerged as a promising solution to enrich the diversity of training data; however, its application within E2E AD models remains largely unexplored. This is primarily due to the absence of a designated ego vehicle and the associated sensor inputs, such as camera or LiDAR, typically provided in real-world scenarios. To address this gap, we introduce SynAD, the first framework designed to enhance real-world E2E AD models using synthetic data. Our method designates the agent with the most comprehensive driving information as the ego vehicle in a multi-agent synthetic scenario. We further project path-level scenarios onto maps and employ a newly developed Map-to-BEV Network to derive bird's-eye-view features without relying on sensor inputs. Finally, we devise a training strategy that effectively integrates these map-based synthetic data with real driving data. Experimental results demonstrate that SynAD effectively integrates all components and notably enhances safety performance. By bridging synthetic scenario generation and E2E AD, SynAD paves the way for more comprehensive and robust autonomous driving models.
Improved Accuracy of Robot Localization Using 3-D LiDAR in a Hippocampus-Inspired Model
Boundary Vector Cells (BVCs) are a class of neurons in the brains of vertebrates that encode environmental boundaries at specific distances and allocentric directions, playing a central role in forming place fields in the hippocampus. Most computational BVC models are restricted to two-dimensional (2D) environments, making them prone to spatial ambiguities in the presence of horizontal symmetries in the environment. To address this limitation, we incorporate vertical angular sensitivity into the BVC framework, thereby enabling robust boundary detection in three dimensions, and leading to significantly more accurate spatial localization in a biologically-inspired robot model. The proposed model processes LiDAR data to capture vertical contours, thereby disambiguating locations that would be indistinguishable under a purely 2D representation. Experimental results show that in environments with minimal vertical variation, the proposed 3D model matches the performance of a 2D baseline; yet, as 3D complexity increases, it yields substantially more distinct place fields and markedly reduces spatial aliasing. These findings show that adding a vertical dimension to BVC-based localization can significantly enhance navigation and mapping in real-world 3D spaces while retaining performance parity in simpler, near-planar scenarios.
comment: 8 pages, 9 figures, Presented at the 2025 International Joint Conference on Neural Networks, Rome, July 2025
VOCALoco: Viability-Optimized Cost-aware Adaptive Locomotion
Recent advancements in legged robot locomotion have facilitated traversal over increasingly complex terrains. Despite this progress, many existing approaches rely on end-to-end deep reinforcement learning (DRL), which poses limitations in terms of safety and interpretability, especially when generalizing to novel terrains. To overcome these challenges, we introduce VOCALoco, a modular skill-selection framework that dynamically adapts locomotion strategies based on perceptual input. Given a set of pre-trained locomotion policies, VOCALoco evaluates their viability and energy consumption by predicting both the safety of execution and the anticipated cost of transport over a fixed planning horizon. This joint assessment enables the selection of policies that are both safe and energy-efficient, given the observed local terrain. We evaluate our approach on staircase locomotion tasks, demonstrating its performance in both simulated and real-world scenarios using a quadrupedal robot. Empirical results show that VOCALoco achieves improved robustness and safety during stair ascent and descent compared to a conventional end-to-end DRL policy.
comment: Accepted in IEEE Robotics and Automation Letters (RAL), 2025. 8 pages, 9 figures
A Survey on Collaborative SLAM with 3D Gaussian Splatting
This survey comprehensively reviews the evolving field of multi-robot collaborative Simultaneous Localization and Mapping (SLAM) using 3D Gaussian Splatting (3DGS). As an explicit scene representation, 3DGS has enabled unprecedented real-time, high-fidelity rendering, ideal for robotics. However, its use in multi-robot systems introduces significant challenges in maintaining global consistency, managing communication, and fusing data from heterogeneous sources. We systematically categorize approaches by their architecture -- centralized, distributed -- and analyze core components like multi-agent consistency and alignment, communication-efficient Gaussian representation, semantic distillation, fusion and pose optimization, and real-time scalability. In addition, a summary of critical datasets and evaluation metrics is provided to contextualize performance. Finally, we identify key open challenges and chart future research directions, including lifelong mapping, semantic association and mapping, multi-model for robustness, and bridging the Sim2Real gap.
Adaptive-twist Soft Finger Mechanism for Grasping by Wrapping
This paper presents a soft robot finger capable of adaptive-twist deformation to grasp objects by wrapping them. For a soft hand to grasp and pick up one object from among multiple densely packed objects, a soft finger requires an adaptive-twist deformation function in both in-plane and out-of-plane directions. This function allows the finger to be inserted deeply into a limited gap among objects. Once inserted, the soft finger requires appropriate control of the grasping force normal to the contact surface, thereby maintaining the twisted deformation. In this paper, we refer to this type of grasping as grasping by wrapping. To achieve these two functions with a single actuation source, we propose a variable stiffness mechanism that adaptively changes its stiffness as the pressure increases. We conduct a finite element analysis (FEA) on the proposed mechanism and determine its design parameters based on the FEA results. Using the developed soft finger, we report basic experimental results and demonstrations on grasping various objects.
A Comprehensive General Model of Tendon-Actuated Concentric Tube Robots with Multiple Tubes and Tendons
Tendon-actuated concentric tube mechanisms combine the advantages of tendon-driven continuum robots and concentric tube robots while addressing their respective limitations. They overcome the restricted degrees of freedom often seen in tendon-driven designs, and mitigate issues such as snapping instability associated with concentric tube robots. However, a complete and general mechanical model for these systems remains an open problem. In this work, we propose a Cosserat rod-based framework for modeling the general case of $n$ concentric tubes, each actuated by $m_i$ tendons, where $i \in \{1, \ldots, n\}$. The model allows each tube to twist and elongate while enforcing a shared centerline for bending. We validate the proposed framework through experiments with two-tube and three-tube assemblies under various tendon routing configurations, achieving tip prediction errors $<4\%$ of the robot's total length. We further demonstrate the model's generality by applying it to existing robots in the field, where maximum tip deviations remain around $5\%$ of the total length. This model provides a foundation for accurate shape estimation and control of advanced tendon-actuated concentric tube robots.
Defect Mitigation for Robot Arm-based Additive Manufacturing Utilizing Intelligent Control and IOT
This paper presents an integrated robotic fused deposition modeling additive manufacturing system featuring closed-loop thermal control and intelligent in-situ defect correction using a 6-degree-of-freedom robotic arm and an Oak-D camera. The robot arm end effector was modified to mount an E3D hotend thermally regulated by an IoT microcontroller, enabling precise temperature control through real-time feedback. The filament extrusion system was synchronized with robotic motion, coordinated via ROS2, ensuring consistent deposition along complex trajectories. A vision system based on OpenCV detects layer-wise defect positions and commands autonomous re-extrusion at the identified sites. Experimental validation demonstrated successful defect mitigation in printing operations. The integrated system effectively addresses the challenges of real-time quality assurance. Inverse kinematics were used for motion planning, while homography transformations corrected camera perspectives for accurate defect localization. The intelligent system successfully mitigated surface anomalies without interrupting the print process. By combining real-time thermal regulation, motion control, and intelligent defect detection and correction, this architecture establishes a scalable and adaptive robotic additive manufacturing framework suitable for aerospace, biomedical, and industrial applications.
comment: This paper has been accepted at the ASME 2025 International Mechanical Engineering Congress and Exposition (IMECE 2025)
Smooth path planning with safety margins using Piece-Wise Bezier curves
In this paper, we propose a computationally efficient quadratic programming (QP) approach for generating smooth, $C^1$ continuous paths for mobile robots using piece-wise quadratic Bezier (PWB) curves. Our method explicitly incorporates safety margins within a structured optimization framework, balancing trajectory smoothness and robustness with manageable numerical complexity suitable for real-time and embedded applications. Comparative simulations demonstrate clear advantages over traditional piece-wise linear (PWL) path planning methods, showing reduced trajectory deviations, enhanced robustness, and improved overall path quality. These benefits are validated through simulations using a Pure-Pursuit controller in representative scenarios, highlighting the practical effectiveness and scalability of our approach for safe navigation.
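The $C^1$ continuity requirement for piece-wise quadratic Bezier curves can be made concrete with a short sketch. The code below is an illustration under stated assumptions, not the paper's QP formulation: it assumes each segment is parameterized over $t \in [0, 1]$, and the function names and control points are made up. For two consecutive segments with control points $(a_0, a_1, a_2)$ and $(b_0, b_1, b_2)$, position continuity needs $b_0 = a_2$ and tangent continuity needs $b_1 = 2a_2 - a_1$.

```python
import numpy as np


def bezier2(p0, p1, p2, t):
    """Point on a quadratic Bezier curve at parameter t in [0, 1]."""
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2


def bezier2_derivative(p0, p1, p2, t):
    """First derivative of a quadratic Bezier curve with respect to t."""
    return 2 * (1 - t) * (p1 - p0) + 2 * t * (p2 - p1)


# First segment (illustrative control points).
a0, a1, a2 = np.array([0.0, 0.0]), np.array([1.0, 2.0]), np.array([2.0, 2.0])

# Second segment: b0 and b1 are fixed by the continuity conditions,
# only the end point b2 remains free.
b0 = a2                      # C0: segments share the junction point
b1 = 2 * a2 - a1             # C1: equal tangent vectors at the junction
b2 = np.array([4.0, 0.0])

end_tangent = bezier2_derivative(a0, a1, a2, 1.0)
start_tangent = bezier2_derivative(b0, b1, b2, 0.0)
```

Because both conditions are linear in the control points, they fit naturally as equality constraints in a quadratic program over those points.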
SCOUT: A Lightweight Framework for Scenario Coverage Assessment in Autonomous Driving
Assessing scenario coverage is crucial for evaluating the robustness of autonomous agents, yet existing methods rely on expensive human annotations or computationally intensive Large Vision-Language Models (LVLMs). These approaches are impractical for large-scale deployment due to cost and efficiency constraints. To address these shortcomings, we propose SCOUT (Scenario Coverage Oversight and Understanding Tool), a lightweight surrogate model designed to predict scenario coverage labels directly from an agent's latent sensor representations. SCOUT is trained through a distillation process, learning to approximate LVLM-generated coverage labels while eliminating the need for continuous LVLM inference or human annotation. By leveraging precomputed perception features, SCOUT avoids redundant computations and enables fast, scalable scenario coverage estimation. We evaluate our method across a large dataset of real-life autonomous navigation scenarios, demonstrating that it maintains high accuracy while significantly reducing computational cost. Our results show that SCOUT provides an effective and practical alternative for large-scale coverage analysis. While its performance depends on the quality of LVLM-generated training labels, SCOUT represents a major step toward efficient scenario coverage oversight in autonomous systems.
A Humanoid Visual-Tactile-Action Dataset for Contact-Rich Manipulation
Contact-rich manipulation has become increasingly important in robot learning. However, previous studies on robot learning datasets have focused on rigid objects and underrepresented the diversity of pressure conditions for real-world manipulation. To address this gap, we present a humanoid visual-tactile-action dataset designed for manipulating deformable soft objects. The dataset was collected via teleoperation using a humanoid robot equipped with dexterous hands, capturing multi-modal interactions under varying pressure conditions. This work also motivates future research on models with advanced optimization strategies capable of effectively leveraging the complexity and diversity of tactile signals.
HyPerNav: Hybrid Perception for Object-Oriented Navigation in Unknown Environment
Object-oriented navigation (ObjNav) enables a robot to navigate to a target object directly and autonomously in an unknown environment. Effective perception during navigation in unknown environments is critical for autonomous robots. While egocentric observations from RGB-D sensors provide abundant local information, real-time top-down maps offer valuable global context for ObjNav. Nevertheless, the majority of existing studies focus on a single source, seldom integrating these two complementary perceptual modalities, despite the fact that humans naturally attend to both. With the rapid advancement of Vision-Language Models (VLMs), we propose Hybrid Perception Navigation (HyPerNav), leveraging VLMs' strong reasoning and vision-language understanding capabilities to jointly perceive both local and global information to enhance the effectiveness and intelligence of navigation in unknown environments. In both extensive simulation evaluation and real-world validation, our method achieved state-of-the-art performance against popular baselines. Benefiting from the hybrid perception approach, our method captures richer cues and finds objects more effectively by simultaneously leveraging information from egocentric observations and the top-down map. Our ablation study further shows that each of the two perception sources contributes to navigation performance.
comment: under review
Look and Tell: A Dataset for Multimodal Grounding Across Egocentric and Exocentric Views NeurIPS 2025
We introduce Look and Tell, a multimodal dataset for studying referential communication across egocentric and exocentric perspectives. Using Meta Project Aria smart glasses and stationary cameras, we recorded synchronized gaze, speech, and video as 25 participants instructed a partner to identify ingredients in a kitchen. Combined with 3D scene reconstructions, this setup provides a benchmark for evaluating how different spatial representations (2D vs. 3D; ego vs. exo) affect multimodal grounding. The dataset contains 3.67 hours of recordings, including 2,707 richly annotated referential expressions, and is designed to advance the development of embodied agents that can understand and engage in situated dialogue.
comment: 10 pages, 6 figures, 2 tables. Accepted to the NeurIPS 2025 Workshop on SPACE in Vision, Language, and Embodied AI (SpaVLE). Dataset: https://huggingface.co/datasets/annadeichler/KTH-ARIA-referential
FieldGen: From Teleoperated Pre-Manipulation Trajectories to Field-Guided Data Generation
Large-scale and diverse datasets are vital for training robust robotic manipulation policies, yet existing data collection methods struggle to balance scale, diversity, and quality. Simulation offers scalability but suffers from sim-to-real gaps, while teleoperation yields high-quality demonstrations with limited diversity and high labor cost. We introduce FieldGen, a field-guided data generation framework that enables scalable, diverse, and high-quality real-world data collection with minimal human supervision. FieldGen decomposes manipulation into two stages: a pre-manipulation phase, allowing trajectory diversity, and a fine manipulation phase requiring expert precision. Human demonstrations capture key contact and pose information, after which an attraction field automatically generates diverse trajectories converging to successful configurations. This decoupled design combines scalable trajectory diversity with precise supervision. Moreover, FieldGen-Reward augments generated data with reward annotations to further enhance policy learning. Experiments demonstrate that policies trained with FieldGen achieve higher success rates and improved stability compared to teleoperation-based baselines, while significantly reducing human effort in long-term real-world data collection. Webpage is available at https://fieldgen.github.io/.
comment: Webpage: https://fieldgen.github.io/
Performance evaluation of a ROS2 based Automated Driving System
Automated driving is currently a prominent area of scientific work. In the future, highly automated driving and new Advanced Driver Assistance Systems will become reality. While Advanced Driver Assistance Systems and automated driving functions for certain domains are already commercially available, ubiquitous automated driving in complex scenarios remains a subject of ongoing research. In contrast to single-purpose Electronic Control Units, the software for automated driving is often executed on high-performance PCs. The Robot Operating System 2 (ROS2) is commonly used to connect components in an automated driving system. Due to the time-critical nature of automated driving systems, the performance of the framework is especially important. In this paper, a thorough performance evaluation of ROS2 is conducted, both in terms of timeliness and error rate. The results show that ROS2 is a suitable framework for automated driving systems.
comment: Published and presented at VEHITS 2024, Proceedings of the 10th International Conference on Vehicle Technology and Intelligent Transport Systems - VEHITS; 2024
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
Vision-Language-Action (VLA) models adapt large vision-language backbones to map images and instructions into robot actions. However, prevailing VLAs either generate actions auto-regressively in a fixed left-to-right order or attach separate MLP or diffusion heads outside the backbone, leading to fragmented information pathways and specialized training requirements that hinder a unified, scalable architecture. We present Discrete Diffusion VLA, a unified-transformer policy that models discretized action chunks with discrete diffusion. The design retains diffusion's progressive refinement paradigm while remaining natively compatible with the discrete token interface of VLMs. Our method achieves an adaptive decoding order that resolves easy action elements before harder ones and uses secondary re-masking to revisit uncertain predictions across refinement rounds, which improves consistency and enables robust error correction. This unified decoder preserves pre-trained vision-language priors, supports parallel decoding, breaks the autoregressive bottleneck, and reduces the number of function evaluations. Discrete Diffusion VLA achieves 96.3% avg. success rates on LIBERO, 71.2% visual matching on SimplerEnv-Fractal and 54.2% overall on SimplerEnv-Bridge, improving over autoregressive, MLP decoder and continuous diffusion baselines. These findings indicate that discrete-diffusion VLA supports precise action modeling and consistent training, laying groundwork for scaling VLA to larger models and datasets. Our project page is https://github.com/Liang-ZX/DiscreteDiffusionVLA
comment: 16 pages
Robust Point Cloud Reinforcement Learning via PCA-Based Canonicalization
Reinforcement Learning (RL) from raw visual input has achieved impressive successes in recent years, yet it remains fragile to out-of-distribution variations such as changes in lighting, color, and viewpoint. Point Cloud Reinforcement Learning (PC-RL) offers a promising alternative by mitigating appearance-based brittleness, but its sensitivity to camera pose mismatches continues to undermine reliability in realistic settings. To address this challenge, we propose PCA Point Cloud (PPC), a canonicalization framework specifically tailored for downstream robotic control. PPC maps point clouds under arbitrary rigid-body transformations to a unique canonical pose, aligning observations to a consistent frame, thereby substantially decreasing viewpoint-induced inconsistencies. In our experiments, we show that PPC improves robustness to unseen camera poses across challenging robotic tasks, providing a principled alternative to domain randomization.
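A generic PCA-based canonicalization can be sketched in a few lines of NumPy. This is an illustrative sketch only and should not be read as the PPC method itself: the function name, the use of third moments to fix axis signs, and the synthetic test cloud are assumptions made here. It also assumes the cloud has three distinct principal variances and a non-symmetric distribution along each axis, since otherwise the canonical frame is ambiguous.

```python
import numpy as np


def canonicalize(points):
    """Map an (N, 3) point cloud to a PCA-aligned canonical frame."""
    centered = points - points.mean(axis=0)      # remove translation
    cov = centered.T @ centered / len(points)    # 3x3 covariance matrix
    _, axes = np.linalg.eigh(cov)                # columns are principal axes
    coords = centered @ axes                     # coordinates in the PCA frame
    # Eigenvectors are defined only up to sign; pick the sign per axis
    # that makes the third moment along that axis non-negative.
    signs = np.sign((coords ** 3).sum(axis=0))
    signs[signs == 0] = 1.0
    return coords * signs


# Synthetic skewed, anisotropic cloud (made-up data for illustration).
rng = np.random.default_rng(0)
cloud = rng.exponential(size=(500, 3)) * np.array([3.0, 2.0, 1.0])

# Apply a random proper rotation and a translation, as a camera change would.
rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(rotation) < 0:
    rotation[:, 0] *= -1.0
moved = cloud @ rotation.T + np.array([0.5, -1.0, 2.0])

canonical_original = canonicalize(cloud)
canonical_moved = canonicalize(moved)
```

Both clouds land in the same canonical coordinates, so a downstream policy would see identical input regardless of the rigid-body transformation applied.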
Procedural Generation of Articulated Simulation-Ready Assets
We introduce Infinigen-Articulated, a toolkit for procedurally generating realistic articulated assets for robotics simulation. We include procedural generators for 18 common articulated object categories along with high-level utilities for creating custom articulated assets in Blender. We also provide an export pipeline to integrate the resulting assets along with their physical properties into common robotics simulators. Experiments demonstrate that assets sampled from these generators are effective for movable object segmentation, training generalizable reinforcement learning policies, and sim-to-real transfer of imitation learning policies.
comment: Updated to include information on newly implemented assets, new experimental results (both simulation and real world), and additional features including material and dynamics parameters
Concurrent-Allocation Task Execution for Multi-Robot Path-Crossing-Minimal Navigation in Obstacle Environments
Reducing undesirable path crossings among trajectories of different robots is vital in multi-robot navigation missions, which not only reduces detours and conflict scenarios, but also enhances navigation efficiency and boosts productivity. Despite recent progress in multi-robot path-crossing-minimal (MPCM) navigation, the majority of approaches depend on the minimal squared-distance reassignment of suitable desired points to robots directly. However, if obstacles occupy the passing space, calculating the actual robot-point distances becomes complex or intractable, which may render the MPCM navigation in obstacle environments inefficient or even infeasible. In this paper, the concurrent-allocation task execution (CATE) algorithm is presented to address this problem (i.e., MPCM navigation in obstacle environments). First, the path-crossing-related elements in terms of (i) robot allocation, (ii) desired-point convergence, and (iii) collision and obstacle avoidance are encoded into integer and control barrier function (CBF) constraints. Then, the proposed constraints are used in an online constrained optimization framework, which implicitly yet effectively minimizes the possible path crossings and trajectory length in obstacle environments by minimizing the desired point allocation cost and slack variables in CBF constraints simultaneously. In this way, the MPCM navigation in obstacle environments can be achieved with flexible spatial orderings. Note that the feasibility of solutions and the asymptotic convergence property of the proposed CATE algorithm in obstacle environments are both guaranteed, and the calculation burden is also reduced by concurrently calculating the optimal allocation and the control input directly without the path planning process.
comment: Accepted in IEEE Transactions on Robotics
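To make the control barrier function (CBF) constraint mentioned in the abstract above more concrete, here is a minimal sketch of a generic CBF safety filter. It is not the CATE algorithm: it omits the integer allocation variables and slack terms, and it assumes a single-integrator robot and one circular obstacle. The function name `cbf_filter` and the gain `alpha` are hypothetical. With a single constraint, the minimum-deviation problem has the closed-form projection used below.

```python
def cbf_filter(pos, u_nom, obstacle, radius, alpha=1.0):
    """Minimally modify u_nom so a single-integrator robot (p' = u) stays outside a disc."""
    dx, dy = pos[0] - obstacle[0], pos[1] - obstacle[1]
    h = dx * dx + dy * dy - radius * radius  # barrier value: h >= 0 means safe
    gx, gy = 2.0 * dx, 2.0 * dy              # gradient of h with respect to position

    # CBF condition: grad_h . u + alpha * h >= 0
    residual = gx * u_nom[0] + gy * u_nom[1] + alpha * h
    if residual >= 0.0:
        return u_nom  # the nominal command already satisfies the constraint

    norm_sq = gx * gx + gy * gy
    if norm_sq == 0.0:
        raise ValueError("robot is at the obstacle center; gradient is undefined")

    # Project u_nom onto the half-space defined by the CBF condition.
    scale = -residual / norm_sq
    return (u_nom[0] + scale * gx, u_nom[1] + scale * gy)
```

For example, a robot at `(2, 0)` commanded straight at a unit disc centered at the origin has its approach speed reduced, while a command pointing away from the disc passes through unchanged.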
Robot Cell Modeling via Exploratory Robot Motions: A Novel and Accessible Data-Driven Approach
Generating a collision-free robot motion is crucial for safe applications in real-world settings. This requires an accurate model of all obstacle shapes within the constrained robot cell, which is particularly challenging and time-consuming. The difficulty is heightened in flexible production lines, where the environment model must be updated each time the robot cell is modified. Furthermore, sensor-based methods often necessitate costly hardware and calibration procedures and can be influenced by environmental factors (e.g., light conditions or reflections). To address these challenges, we present a novel data-driven approach to modeling a cluttered workspace, leveraging solely the robot internal joint encoders to capture exploratory motions. By computing the corresponding swept volume (SV), we generate a (conservative) mesh of the environment that is subsequently used for collision checking within established path planning and control methods. Our method significantly reduces the complexity and cost of classical environment modeling by removing the need for computer-aided design (CAD) files and external sensors. We validate the approach with the KUKA LBR iisy collaborative robot in a pick-and-place scenario. In less than three minutes of exploratory robot motions and less than four additional minutes of computation time, we obtain an accurate model that enables collision-free motions. Our approach is intuitive and easy to use, making it accessible to users without specialized technical knowledge. It is applicable to all types of industrial robots or cobots.
comment: 8 pages, 9 figures
Two-Stage Learning of Stabilizing Neural Controllers via Zubov Sampling and Iterative Domain Expansion NeurIPS 2025
Learning-based neural network (NN) control policies have shown impressive empirical performance. However, obtaining stability guarantees and estimates of the region of attraction of these learned neural controllers is challenging due to the lack of stable and scalable training and verification algorithms. Although previous works in this area have achieved great success, much conservatism remains in their frameworks. In this work, we propose a novel two-stage training framework to jointly synthesize a controller and a Lyapunov function for continuous-time systems. By leveraging a Zubov-inspired region of attraction characterization to directly estimate stability boundaries, we propose a novel training-data sampling strategy and a domain-updating mechanism that significantly reduce the conservatism in training. Moreover, unlike existing works on continuous-time systems that rely on an SMT solver to formally verify the Lyapunov condition, we extend the state-of-the-art neural network verifier $\alpha,\!\beta$-CROWN with the capability of performing automatic bound propagation through the Jacobian of dynamical systems and a novel verification scheme that avoids expensive bisection. To demonstrate the effectiveness of our approach, we conduct numerical experiments by synthesizing and verifying controllers on several challenging nonlinear systems across multiple dimensions. We show that our training can yield regions of attraction with volume $5 - 1.5\cdot 10^{5}$ times larger compared to the baselines, and our verification on continuous systems can be up to $40-10{,}000$ times faster compared to the traditional SMT solver dReal. Our code is available at https://github.com/Verified-Intelligence/Two-Stage_Neural_Controller_Training.
comment: NeurIPS 2025
Acoustic Neural 3D Reconstruction Under Pose Drift IROS
We consider the problem of optimizing neural implicit surfaces for 3D reconstruction using acoustic images collected with drifting sensor poses. The accuracy of current state-of-the-art 3D acoustic modeling algorithms is highly dependent on accurate pose estimation; small errors in sensor pose can lead to severe reconstruction artifacts. In this paper, we propose an algorithm that jointly optimizes the neural scene representation and sonar poses. Our algorithm does so by parameterizing the 6DoF poses as learnable parameters and backpropagating gradients through the neural renderer and implicit representation. We validated our algorithm on both real and simulated datasets. It produces high-fidelity 3D reconstructions even under significant pose drift.
comment: 8 pages, 8 figures. This paper is accepted by 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Autonomous Horizon-based Asteroid Navigation With Observability-constrained Maneuvers
Small body exploration is a pertinent challenge due to low gravity environments and strong sensitivity to perturbations like Solar Radiation Pressure (SRP). Thus, autonomous methods are being developed to enable safe navigation and control around small bodies. These methods often involve using Optical Navigation (OpNav) to determine the spacecraft's location. Ensuring OpNav reliability would allow the spacecraft to maintain an accurate state estimate throughout its mission. This research presents an observability-constrained Lyapunov controller that steers a spacecraft to a desired target orbit while guaranteeing continuous OpNav observability. We design observability path constraints to avoid regions where horizon-based OpNav methods exhibit poor performance, ensuring control input that maintains good observability. This controller is implemented with a framework that simulates small body dynamics, synthetic image generation, edge detection, horizon-based OpNav, and filtering. We evaluate the approach in two representative scenarios, orbit maintenance and approach with circularization, around spherical and ellipsoidal target bodies. In Monte Carlo simulations, the proposed approach improves the rate of attaining target orbits without observability violations by up to 94% compared to an unconstrained Lyapunov baseline, demonstrating improved robustness over conventional methods.
comment: 52 pages, 18 figures, published in the Journal of the Astronautical Sciences
Learning to See and Act: Task-Aware View Planning for Robotic Manipulation
Recent vision-language-action (VLA) models for multi-task robotic manipulation commonly rely on static viewpoints and shared visual encoders, which limit 3D perception and cause task interference, hindering robustness and generalization. In this work, we propose Task-Aware View Planning (TAVP), a framework designed to overcome these challenges by integrating active view planning with task-specific representation learning. TAVP employs an efficient exploration policy, accelerated by a novel pseudo-environment, to actively acquire informative views. Furthermore, we introduce a Mixture-of-Experts (MoE) visual encoder to disentangle features across different tasks, boosting both representation fidelity and task generalization. By learning to see the world in a task-aware way, TAVP generates more complete and discriminative visual representations, demonstrating significantly enhanced action prediction across a wide array of manipulation challenges. Extensive experiments on RLBench tasks show that our proposed TAVP model achieves superior performance over state-of-the-art fixed-view approaches. Visual results and code are provided at: https://hcplab-sysu.github.io/TAVP.
comment: 14 pages, 8 figures, project page: https://hcplab-sysu.github.io/TAVP
DynaFlow: Dynamics-embedded Flow Matching for Physically Consistent Motion Generation from State-only Demonstrations
This paper introduces DynaFlow, a novel framework that embeds a differentiable simulator directly into a flow matching model. By generating trajectories in the action space and mapping them to dynamically feasible state trajectories via the simulator, DynaFlow ensures all outputs are physically consistent by construction. This end-to-end differentiable architecture enables training on state-only demonstrations, allowing the model to simultaneously generate physically consistent state trajectories while inferring the underlying action sequences required to produce them. We demonstrate the effectiveness of our approach through quantitative evaluations and showcase its real-world applicability by deploying the generated actions onto a physical Go1 quadruped robot. The robot successfully reproduces diverse gaits present in the dataset, executes long-horizon motions in open-loop control, and translates infeasible kinematic demonstrations into dynamically executable, stylistic behaviors. These hardware experiments validate that DynaFlow produces deployable, highly effective motions on real-world hardware from state-only demonstrations, effectively bridging the gap between kinematic data and real-world execution.
comment: 8 pages
GaussianFusion: Gaussian-Based Multi-Sensor Fusion for End-to-End Autonomous Driving NeurIPS2025
Multi-sensor fusion is crucial for improving the performance and robustness of end-to-end autonomous driving systems. Existing methods predominantly adopt either attention-based flatten fusion or bird's eye view fusion through geometric transformations. However, these approaches often suffer from limited interpretability or dense computational overhead. In this paper, we introduce GaussianFusion, a Gaussian-based multi-sensor fusion framework for end-to-end autonomous driving. Our method employs intuitive and compact Gaussian representations as intermediate carriers to aggregate information from diverse sensors. Specifically, we initialize a set of 2D Gaussians uniformly across the driving scene, where each Gaussian is parameterized by physical attributes and equipped with explicit and implicit features. These Gaussians are progressively refined by integrating multi-modal features. The explicit features capture rich semantic and spatial information about the traffic scene, while the implicit features provide complementary cues beneficial for trajectory planning. To fully exploit rich spatial and semantic information in Gaussians, we design a cascade planning head that iteratively refines trajectory predictions through interactions with Gaussians. Extensive experiments on the NAVSIM and Bench2Drive benchmarks demonstrate the effectiveness and robustness of the proposed GaussianFusion framework. The source code will be released at https://github.com/Say2L/GaussianFusion.
comment: Accepted at NeurIPS2025 (Spotlight)
Radar and Event Camera Fusion for Agile Robot Ego-Motion Estimation
Achieving reliable ego-motion estimation for agile robots, e.g., aerobatic aircraft, remains challenging because most robot sensors fail to respond promptly and clearly to highly dynamic robot motions, often resulting in measurement blurring, distortion, and delays. In this paper, we propose an IMU-free and feature-association-free framework to achieve aggressive ego-motion velocity estimation of a robot platform in highly dynamic scenarios by combining two types of exteroceptive sensors, an event camera and a millimeter wave radar. First, we use instantaneous raw events and Doppler measurements to derive rotational and translational velocities directly. Without a sophisticated association process between measurement frames, the proposed method is more robust in texture-less and structureless environments and is more computationally efficient for edge computing devices. Then, in the back-end, we propose a continuous-time state-space model to fuse the hybrid time-based and event-based measurements to estimate the ego-motion velocity in a fixed-lag smoother fashion. Finally, we validate our velometer framework extensively on self-collected experiment datasets. The results indicate that our IMU-free and association-free ego-motion estimation framework can achieve reliable and efficient velocity output in challenging environments. The source code, illustrative video and dataset are available at https://github.com/ZzhYgwh/TwistEstimator.
comment: 2025.10.28 version v2 for TwistEstimator
Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model IROS 2025
Omnidirectional depth perception is essential for mobile robotics applications that require scene understanding across a full 360° field of view. Camera-based setups offer a cost-effective option by using stereo depth estimation to generate dense, high-resolution depth maps without relying on expensive active sensing. However, existing omnidirectional stereo matching approaches achieve only limited depth accuracy across diverse environments, depth ranges, and lighting conditions, due to the scarcity of real-world data. We present DFI-OmniStereo, a novel omnidirectional stereo matching method that leverages a large-scale pre-trained foundation model for relative monocular depth estimation within an iterative optimization-based stereo matching architecture. We introduce a dedicated two-stage training strategy to utilize the relative monocular depth features for our omnidirectional stereo matching before scale-invariant fine-tuning. DFI-OmniStereo achieves state-of-the-art results on the real-world Helvipad dataset, reducing disparity MAE by approximately 16% compared to the previous best omnidirectional stereo method.
comment: Accepted at IROS 2025. Project page: https://vita-epfl.github.io/DFI-OmniStereo-website/
GRS: Generating Robotic Simulation Tasks from Real-World Images
We introduce GRS (Generating Robotic Simulation tasks), a system addressing real-to-sim for robotic simulations. GRS creates digital twin simulations from single RGB-D observations with solvable tasks for virtual agent training. Using vision-language models (VLMs), our pipeline operates in three stages: 1) scene comprehension with SAM2 for segmentation and object description, 2) matching objects with simulation-ready assets, and 3) generating appropriate tasks. We ensure simulation-task alignment through generated test suites and introduce a router that iteratively refines both simulation and test code. Experiments demonstrate our system's effectiveness in object correspondence and task environment generation through our novel router mechanism.
SimpleVSF: VLM-Scoring Fusion for Trajectory Prediction of End-to-End Autonomous Driving
End-to-end autonomous driving has emerged as a promising paradigm for achieving robust and intelligent driving policies. However, existing end-to-end methods still face significant challenges, such as suboptimal decision-making in complex scenarios. In this paper, we propose SimpleVSF (Simple VLM-Scoring Fusion), a novel framework that enhances end-to-end planning by leveraging the cognitive capabilities of Vision-Language Models (VLMs) and advanced trajectory fusion techniques. We combine conventional scorers with novel VLM-enhanced scorers, and we leverage a robust weight fusioner for quantitative aggregation and a powerful VLM-based fusioner for qualitative, context-aware decision-making. As the leading approach in the ICCV 2025 NAVSIM v2 End-to-End Driving Challenge, our SimpleVSF framework demonstrates state-of-the-art performance, achieving a superior balance between safety, comfort, and efficiency.
Online Adaptation for Flying Quadrotors in Tight Formations
The task of flying in tight formations is challenging for teams of quadrotors because the complex aerodynamic wake interactions can destabilize individual team members as well as the team. Furthermore, these aerodynamic effects are highly nonlinear and fast-paced, making them difficult to model and predict. To overcome these challenges, we present L1 KNODE-DW MPC, an adaptive, mixed expert learning based control framework that allows individual quadrotors to accurately track trajectories while adapting to time-varying aerodynamic interactions during formation flights. We evaluate L1 KNODE-DW MPC in two different three-quadrotor formations and show that it outperforms several MPC baselines. Our results show that the proposed framework is capable of enabling the three-quadrotor team to remain vertically aligned in close proximity throughout the flight. These findings show that the L1 adaptive module compensates for unmodeled disturbances most effectively when paired with an accurate dynamics model. A video showcasing our framework and the physical experiments is available here: https://youtu.be/9QX1Q5Ut9Rs
comment: 10 pages, 4 figures
EquiContact: A Hierarchical SE(3) Vision-to-Force Equivariant Policy for Spatially Generalizable Contact-rich Tasks
This paper presents a framework for learning vision-based robotic policies for contact-rich manipulation tasks that generalize spatially across task configurations. We focus on achieving robust spatial generalization of the policy for the peg-in-hole (PiH) task trained from a small number of demonstrations. We propose EquiContact, a hierarchical policy composed of a high-level vision planner (Diffusion Equivariant Descriptor Field, Diff-EDF) and a novel low-level compliant visuomotor policy (Geometric Compliant ACT, G-CompACT). G-CompACT operates using only localized observations (geometrically consistent error vectors (GCEV), force-torque readings, and wrist-mounted RGB images) and produces actions defined in the end-effector frame. Through these design choices, we show that the entire EquiContact pipeline is SE(3)-equivariant, from perception to force control. We also outline three key components for spatially generalizable contact-rich policies: compliance, localized policies, and induced equivariance. Real-world experiments on PiH, screwing, and surface wiping tasks demonstrate a near-perfect success rate and robust generalization to unseen spatial configurations, validating the proposed framework and principles. The experimental videos can be found on the project website: https://sites.google.com/berkeley.edu/equicontact
comment: Submitted to RA-L
Quantum Machine Learning and Grover's Algorithm for Quantum Optimization of Robotic Manipulators
Optimizing high-degree-of-freedom robotic manipulators requires searching complex, high-dimensional configuration spaces, a task that is computationally challenging for classical methods. This paper introduces a quantum-native framework that integrates quantum machine learning with Grover's algorithm to solve kinematic optimization problems efficiently. A parameterized quantum circuit is trained to approximate the forward kinematics model, which is then used to construct an oracle that identifies optimal configurations. Grover's algorithm leverages this oracle to provide a quadratic reduction in search complexity. Demonstrated on simulated 1-DoF, 2-DoF, and dual-arm manipulator tasks, the method achieves significant speedups, up to 93x over classical optimizers such as Nelder-Mead, as problem dimensionality increases. This work establishes a foundational, quantum-native framework for robot kinematic optimization, effectively bridging quantum computing and robotics problems.
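The quadratic reduction attributed to Grover's algorithm in the abstract above can be illustrated with a small classical state-vector simulation. This is only a sketch: the oracle here is a plain index flag rather than one built from a trained parameterized quantum circuit, and the function name `grover_search` is hypothetical. It shows that roughly (π/4)·√N iterations are enough to concentrate the measurement probability on one marked item out of N.

```python
import math

def grover_search(n_items, marked):
    """Simulate Grover iterations on real amplitudes; return (iterations, success probability)."""
    amp = [1.0 / math.sqrt(n_items)] * n_items  # uniform superposition
    iterations = math.floor(math.pi / 4.0 * math.sqrt(n_items))
    for _ in range(iterations):
        amp[marked] = -amp[marked]               # oracle: phase-flip the marked item
        mean = sum(amp) / n_items
        amp = [2.0 * mean - a for a in amp]      # diffusion: reflect about the mean
    return iterations, amp[marked] ** 2

if __name__ == "__main__":
    for n in (16, 64, 256):
        k, p = grover_search(n, marked=3)
        print(f"N={n}: {k} iterations, success probability {p:.3f}")
```

A classical unstructured search would need on the order of N oracle queries for the same task, which is where the quadratic gap comes from.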
CAT-RRT: Motion Planning that Admits Contact One Link at a Time
Current motion planning approaches rely on binary collision checking to evaluate the validity of a state and thereby dictate where the robot is allowed to move. This approach leaves little room for robots to engage in contact with an object, as is often necessary when operating in densely cluttered spaces. In this work, we propose an alternative method that considers contact states as high-cost states that the robot should avoid but can traverse if necessary to complete a task. More specifically, we introduce Contact Admissible Transition-based Rapidly exploring Random Trees (CAT-RRT), a planner that uses a novel per-link cost heuristic to find a path by traversing high-cost obstacle regions. Through extensive testing, we find that state-of-the-art optimization planners tend to over-explore low-cost states, which leads to slow and inefficient convergence to contact regions. Conversely, CAT-RRT searches both low and high-cost regions simultaneously with an adaptive thresholding mechanism carried out at each robot link. This leads to paths with a balance between efficiency, path length, and contact cost.
Efficient Path Planning and Task Allocation Algorithm for Boolean Specifications
This paper presents a novel path-planning and task assignment algorithm for multi-robot systems that should fulfill a global Boolean specification. The proposed method is based on Integer Linear Programming (ILP) formulations, which are combined with structural insights from Petri nets to improve scalability and computational efficiency. By proving that the constraint matrix is totally unimodular (TU) for certain classes of problems, the ILP formulation can be relaxed into a Linear Programming (LP) problem without losing the integrality of the solution. This relaxation eliminates complex combinatorial techniques, significantly reducing computational overhead and thus ensuring scalability for large-scale systems. Using the approach proposed in this paper, we can solve path-planning problems for teams of up to 500 robots. The method guarantees computational tractability, handles collision avoidance, and reduces computational demands through iterative LP optimization techniques. Case studies demonstrate the efficiency of the algorithm in generating scalable, collision-free paths for large robot teams navigating in complex environments. While the conservative nature of collision avoidance introduces additional constraints, and thus computational requirements, the solution remains practical and impactful for diverse applications. The algorithm is particularly applicable to real-world scenarios, including warehouse logistics where autonomous robots must efficiently coordinate tasks or search-and-rescue operations in various environments. This work contributes both theoretically and practically to scalable multi-robot path planning and task allocation, offering an efficient framework for coordinating autonomous agents in shared environments.
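The total unimodularity property that the abstract above relies on can be checked directly on small matrices: a matrix is TU when every square submatrix has determinant -1, 0, or 1. The brute-force sketch below is exponential in the matrix size and is meant only to illustrate the definition; it is not the proof technique used in the paper, and the function names are hypothetical.

```python
from itertools import combinations

def det(m):
    """Integer determinant by Laplace expansion along the first row (tiny matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, v in enumerate(m[0]):
        if v:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * v * det(minor)
    return total

def is_totally_unimodular(a):
    """Return True if every square submatrix of a has determinant in {-1, 0, 1}."""
    rows, cols = len(a), len(a[0])
    for k in range(1, min(rows, cols) + 1):
        for r in combinations(range(rows), k):
            for c in combinations(range(cols), k):
                sub = [[a[i][j] for j in c] for i in r]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True

# Node-edge incidence matrix of a 2-robot, 2-task bipartite assignment graph.
ASSIGNMENT = [
    [1, 1, 0, 0],  # robot 1
    [0, 0, 1, 1],  # robot 2
    [1, 0, 1, 0],  # task 1
    [0, 1, 0, 1],  # task 2
]
# Incidence matrix of an odd cycle, which has determinant 2.
ODD_CYCLE = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
```

Incidence matrices of bipartite graphs, such as `ASSIGNMENT`, are a classic TU family, which is why assignment-style LPs have integral vertex solutions; `ODD_CYCLE` is a standard counterexample.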
Federated Deep Reinforcement Learning for Privacy-Preserving Robotic-Assisted Surgery
The integration of Reinforcement Learning (RL) into robotic-assisted surgery (RAS) holds significant promise for advancing surgical precision, adaptability, and autonomous decision-making. However, the development of robust RL models in clinical settings is hindered by key challenges, including stringent patient data privacy regulations, limited access to diverse surgical datasets, and high procedural variability. To address these limitations, this paper presents a Federated Deep Reinforcement Learning (FDRL) framework that enables decentralized training of RL models across multiple healthcare institutions without exposing sensitive patient information. A central innovation of the proposed framework is its dynamic policy adaptation mechanism, which allows surgical robots to select and tailor patient-specific policies in real-time, thereby ensuring personalized and optimised interventions. To uphold rigorous privacy standards while facilitating collaborative learning, the FDRL framework incorporates secure aggregation, differential privacy, and homomorphic encryption techniques. Experimental results demonstrate a 60% reduction in privacy leakage compared to conventional methods, with surgical precision maintained within a 1.5% margin of a centralized baseline. This work establishes a foundational approach for adaptive, secure, and patient-centric AI-driven surgical robotics, offering a pathway toward clinical translation and scalable deployment across diverse healthcare environments.
comment: 11 pages, 7 figures, conference
On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations
In Vision-Language-Action (VLA) models, robustness to real-world perturbations is critical for deployment. Existing methods target simple visual disturbances, overlooking the broader multi-modal perturbations that arise in actions, instructions, environments, and observations. Here, we first evaluate the robustness of mainstream VLAs under 17 perturbations across four modalities. We find that (1) actions are the most fragile modality, (2) existing visual-robust VLAs do not gain robustness in other modalities, and (3) pi0 demonstrates superior robustness with a diffusion-based action head. To build multi-modal robust VLAs, we propose RobustVLA against perturbations in VLA inputs and outputs. For output robustness, we perform offline robust optimization against worst-case action noise that maximizes mismatch in the flow matching objective. This can be seen as adversarial training, label smoothing, and outlier penalization. For input robustness, we enforce consistent actions across input variations that preserve task semantics. To account for multiple perturbations, we formulate robustness as a multi-armed bandit problem and apply an upper confidence bound algorithm to automatically identify the most harmful noise. Experiments on LIBERO demonstrate our RobustVLA delivers absolute gains over baselines of 12.6% on the pi0 backbone and 10.4% on the OpenVLA backbone across all 17 perturbations, achieving 50.6x faster inference than existing visual-robust VLAs, and a 10.4% gain under mixed perturbations. Our RobustVLA is particularly effective on a real-world FR5 robot with limited demonstrations, showing absolute gains of 65.6% under perturbations of four modalities.
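The bandit formulation in the abstract above can be sketched with the standard UCB1 rule, where each arm is one perturbation type and the reward is how harmful that perturbation turned out to be. The abstract does not specify the reward signal or the exact UCB variant, so the `evaluate` callback, the reward range of [0, 1], and the function name `ucb1` are all assumptions made for illustration.

```python
import math

def ucb1(evaluate, n_arms, rounds):
    """Run UCB1; evaluate(arm) returns a harm score in [0, 1]. Returns pull counts per arm."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once before using the bound
        else:
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        sums[arm] += evaluate(arm)
        counts[arm] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical, fixed harm scores for three perturbation types.
    harm = [0.2, 0.5, 0.9]
    counts = ucb1(lambda arm: harm[arm], n_arms=3, rounds=500)
    print("pulls per perturbation:", counts)
```

The pull counts concentrate on the arm with the highest average harm, so training effort is spent on the perturbation that currently hurts the policy most while the others are still sampled occasionally.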
Multiagent Systems
Tongyi DeepResearch Technical Report
We present Tongyi DeepResearch, an agentic large language model, which is specifically designed for long-horizon, deep information-seeking research tasks. To incentivize autonomous deep research agency, Tongyi DeepResearch is developed through an end-to-end training framework that combines agentic mid-training and agentic post-training, enabling scalable reasoning and information seeking across complex tasks. We design a highly scalable data synthesis pipeline that is fully automatic, without relying on costly human annotation, and empowers all training stages. By constructing customized environments for each stage, our system enables stable and consistent interactions throughout. Tongyi DeepResearch, featuring 30.5 billion total parameters, with only 3.3 billion activated per token, achieves state-of-the-art performance across a range of agentic deep research benchmarks, including Humanity's Last Exam, BrowseComp, BrowseComp-ZH, WebWalkerQA, xbench-DeepSearch, FRAMES and xbench-DeepSearch-2510. We open-source the model, framework, and complete solutions to empower the community.
comment: https://tongyi-agent.github.io/blog
Local Performance vs. Out-of-Distribution Generalization: An Empirical Analysis of Personalized Federated Learning in Heterogeneous Data Environments
In the context of Federated Learning with heterogeneous data environments, local models tend to converge to their own local model optima during local training steps, deviating from the overall data distributions. Aggregation of these local updates, e.g., with FedAvg, often does not align with the global model optimum (client drift), resulting in an update that is suboptimal for most clients. Personalized Federated Learning approaches address this challenge by exclusively focusing on the average local performances of clients' models on their own data distribution. Generalization to out-of-distribution samples, which is a substantial benefit of FedAvg and represents a significant component of robustness, appears to be inadequately incorporated into the assessment and evaluation processes. This study involves a thorough evaluation of Federated Learning approaches, encompassing both their local performance and their generalization capabilities. Therefore, we examine different stages within a single communication round to enable a more nuanced understanding of the considered metrics. Furthermore, we propose and incorporate a modified approach of FedAvg, designated as Federated Learning with Individualized Updates (FLIU), extending the algorithm by a straightforward individualization step with an adaptive personalization factor. We evaluate and compare the approaches empirically using MNIST and CIFAR-10 under various distributional conditions, including benchmark IID and pathological non-IID, as well as additional novel test environments with Dirichlet distribution specifically developed to stress the algorithms on complex data heterogeneity.
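For orientation, the sketch below shows standard FedAvg aggregation followed by one plausible form of an individualization step: a linear interpolation between the global and the local model. The abstract above does not give FLIU's update rule or how its personalization factor adapts, so the interpolation form, the fixed `factor`, and both function names are assumptions rather than the authors' method.

```python
def fed_avg(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors (standard FedAvg)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
        for k in range(dim)
    ]

def individualize(global_w, local_w, factor):
    """Assumed personalization step: factor=0 keeps the global model, factor=1 the local one."""
    return [(1.0 - factor) * g + factor * l for g, l in zip(global_w, local_w)]

if __name__ == "__main__":
    clients = [[1.0, 0.0], [3.0, 4.0]]  # toy two-parameter models
    sizes = [1, 3]                      # number of local samples per client
    global_w = fed_avg(clients, sizes)
    print(global_w, individualize(global_w, clients[0], 0.5))
```

A factor near 0 favors out-of-distribution generalization through the shared model, while a factor near 1 favors local performance, which is the trade-off the study evaluates.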
Affordance Representation and Recognition for Autonomous Agents
The autonomy of software agents is fundamentally dependent on their ability to construct an actionable internal world model from the structured data that defines their digital environment, such as the Document Object Model (DOM) of web pages and the semantic descriptions of web services. However, constructing this world model from raw structured data presents two critical challenges: the verbosity of raw HTML makes it computationally intractable for direct use by foundation models, while the static nature of hardcoded API integrations prevents agents from adapting to evolving services. This paper introduces a pattern language for world modeling from structured data, presenting two complementary architectural patterns. The DOM Transduction Pattern addresses the challenge of web page complexity by distilling a verbose, raw DOM into a compact, task-relevant representation or world model optimized for an agent's reasoning core. Concurrently, the Hypermedia Affordances Recognition Pattern enables the agent to dynamically enrich its world model by parsing standardized semantic descriptions to discover and integrate the capabilities of unknown web services at runtime. Together, these patterns provide a robust framework for engineering agents that can efficiently construct and maintain an accurate world model, enabling scalable, adaptive, and interoperable automation across the web and its extended resources.
Law in Silico: Simulating Legal Society with LLM-Based Agents
Since real-world legal experiments are often costly or infeasible, simulating legal societies with Artificial Intelligence (AI) systems provides an effective alternative for verifying and developing legal theory, as well as supporting legal administration. Large Language Models (LLMs), with their world knowledge and role-playing capabilities, are strong candidates to serve as the foundation for legal society simulation. However, the application of LLMs to simulate legal systems remains underexplored. In this work, we introduce Law in Silico, an LLM-based agent framework for simulating legal scenarios with individual decision-making and institutional mechanisms of legislation, adjudication, and enforcement. Our experiments, which compare simulated crime rates with real-world data, demonstrate that LLM-based agents can largely reproduce macro-level crime trends and provide insights that align with real-world observations. At the same time, micro-level simulations reveal that a well-functioning, transparent, and adaptive legal system offers better protection of the rights of vulnerable individuals.
Can LLMs Write Faithfully? An Agent-Based Evaluation of LLM-generated Islamic Content NeurIPS 2025
Large language models are increasingly used for Islamic guidance, but risk misquoting texts, misapplying jurisprudence, or producing culturally inconsistent responses. We pilot an evaluation of GPT-4o, Ansari AI, and Fanar on prompts from authentic Islamic blogs. Our dual-agent framework uses a quantitative agent for citation verification and six-dimensional scoring (e.g., Structure, Islamic Consistency, Citations) and a qualitative agent for five-dimensional side-by-side comparison (e.g., Tone, Depth, Originality). GPT-4o scored highest in Islamic Accuracy (3.93) and Citation (3.38), Ansari AI followed (3.68, 3.32), and Fanar lagged (2.76, 1.82). Despite relatively strong performance, models still fall short in reliably producing accurate Islamic content and citations -- a paramount requirement in faith-sensitive writing. GPT-4o had the highest mean quantitative score (3.90/5), while Ansari AI led qualitative pairwise wins (116/200). Fanar, though trailing, introduces innovations for Islamic and Arabic contexts. This study underscores the need for community-driven benchmarks centering Muslim perspectives, offering an early step toward more reliable AI in Islamic knowledge and other high-stakes domains such as medicine, law, and journalism.
comment: Accepted at 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: 5th Muslims in Machine Learning (MusIML) Workshop
Policy Cards: Machine-Readable Runtime Governance for Autonomous AI Agents
Policy Cards are introduced as a machine-readable, deployment-layer standard for expressing operational, regulatory, and ethical constraints for AI agents. The Policy Card sits with the agent and enables it to follow required constraints at runtime. It tells the agent what it must and must not do. As such, it becomes an integral part of the deployed agent. Policy Cards extend existing transparency artifacts such as Model, Data, and System Cards by defining a normative layer that encodes allow/deny rules, obligations, evidentiary requirements, and crosswalk mappings to assurance frameworks including NIST AI RMF, ISO/IEC 42001, and the EU AI Act. Each Policy Card can be validated automatically, version-controlled, and linked to runtime enforcement or continuous-audit pipelines. The framework enables verifiable compliance for autonomous agents, forming a foundation for distributed assurance in multi-agent ecosystems. Policy Cards provide a practical mechanism for integrating high-level governance with hands-on engineering practice and enabling accountable autonomy at scale.
comment: First published on 19/10/2025. Canonical archived record and DOI: 10.5281/zenodo.17391796
Human Machine Social Hybrid Intelligence: A Collaborative Decision Making Framework for Large Model Agent Groups and Human Experts
The rapid advancements in large foundation models and multi-agent systems offer unprecedented capabilities, yet current Human-in-the-Loop (HiTL) paradigms inadequately integrate human expertise, often leading to cognitive overload and decision-making bottlenecks in complex, high-stakes environments. We propose the "Human-Machine Social Hybrid Intelligence" (HMS-HI) framework, a novel architecture designed for deep, collaborative decision-making between groups of human experts and LLM-powered AI agents. HMS-HI is built upon three core pillars: (1) a Shared Cognitive Space (SCS) for unified, multi-modal situational awareness and structured world modeling; (2) a Dynamic Role and Task Allocation (DRTA) module that adaptively assigns tasks to the most suitable agent (human or AI) based on capabilities and workload; and (3) a Cross-Species Trust Calibration (CSTC) protocol that fosters transparency, accountability, and mutual adaptation through explainable declarations and structured feedback. Validated in a high-fidelity urban emergency response simulation, HMS-HI significantly reduced civilian casualties by 72% and cognitive load by 70% compared to traditional HiTL approaches, demonstrating superior decision quality, efficiency, and human-AI trust. An ablation study confirms the critical contribution of each module, highlighting that engineered trust and shared context are foundational for scalable, synergistic human-AI collaboration.
Emergent Coordinated Behaviors in Networked LLM Agents: Modeling the Strategic Dynamics of Information Operations
Generative agents are rapidly advancing in sophistication, raising urgent questions about how they might coordinate when deployed in online ecosystems. This is particularly consequential in information operations (IOs), influence campaigns that aim to manipulate public opinion on social media. While traditional IOs have been orchestrated by human operators and relied on manually crafted tactics, agentic AI promises to make campaigns more automated, adaptive, and difficult to detect. This work presents the first systematic study of emergent coordination among generative agents in simulated IO campaigns. Using generative agent-based modeling, we instantiate IO and organic agents in a simulated environment and evaluate coordination across operational regimes, from simple goal alignment to team knowledge and collective decision-making. As operational regimes become more structured, IO networks become denser and more clustered, interactions more reciprocal and positive, narratives more homogeneous, amplification more synchronized, and hashtag adoption faster and more sustained. Remarkably, simply revealing to agents which other agents share their goals can produce coordination levels nearly equivalent to those achieved through explicit deliberation and collective voting. Overall, we show that generative agents, even without human guidance, can reproduce coordination strategies characteristic of real-world IOs, underscoring the societal risks posed by increasingly automated, self-organizing IOs.
Trust Dynamics in Strategic Coopetition: Computational Foundations for Requirements Engineering in Multi-Agent Systems
Requirements engineering increasingly occurs in multi-stakeholder environments where organizations simultaneously cooperate and compete, creating coopetitive relationships in which trust evolves dynamically based on observed behavior over repeated interactions. While conceptual modeling languages like i* represent trust relationships qualitatively, they lack computational mechanisms for analyzing how trust changes with behavioral evidence. Conversely, computational trust models from multi-agent systems provide algorithmic updating but lack grounding in requirements engineering contexts and conceptual models. This technical report bridges this gap by developing a computational trust model that extends game-theoretic foundations for strategic coopetition with dynamic trust evolution. We introduce trust as a two-layer system with immediate trust responding to current behavior and reputation tracking violation history. Trust evolves through asymmetric updating where cooperation builds trust gradually while violations erode it sharply, creating hysteresis effects and trust ceilings that constrain relationship recovery. We develop a structured translation framework enabling requirements engineers to instantiate computational trust models from i* dependency networks and organizational contexts. Comprehensive experimental validation across 78,125 parameter configurations establishes robust emergence of negativity bias, hysteresis effects, and cumulative damage amplification. Empirical validation using the Renault-Nissan Alliance case study (1999-2025) achieves 49 out of 60 validation points (81.7%), successfully reproducing documented trust evolution across five distinct relationship phases including crisis and recovery periods. This technical report builds upon its foundational companion work in arXiv:2510.18802.
comment: 62 pages, 20 figures, This technical report is the second in a research program and should be read in conjunction with its foundational companion work arXiv:2510.18802. It builds on the frameworks established in that prior work and also adapts and extends material on trustworthiness first presented in the doctoral dissertation 'Modeling Strategic Coopetition' (Pant, 2021, University of Toronto)
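The asymmetric, two-layer trust update described in this abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the report's model: the update rules, the parameter names `gain`, `loss`, and `scar`, and the way accumulated damage lowers the trust ceiling are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrustState:
    trust: float = 0.5   # immediate trust, in [0, 1]
    damage: float = 0.0  # reputation damage accumulated from violations, in [0, 1]


def update(state: TrustState, cooperated: bool,
           gain: float = 0.1, loss: float = 0.4, scar: float = 0.2) -> TrustState:
    """One interaction step (hypothetical rule).

    Cooperation moves trust a small fraction of the way toward the ceiling;
    a violation cuts trust sharply and permanently lowers the ceiling.
    """
    if cooperated:
        ceiling = 1.0 - state.damage
        trust = state.trust + gain * (ceiling - state.trust)
        damage = state.damage
    else:
        trust = state.trust * (1.0 - loss)
        damage = min(1.0, state.damage + scar * (1.0 - state.damage))
    # Trust can never exceed the ceiling implied by the violation history.
    return TrustState(min(trust, 1.0 - damage), damage)


# One violation followed by sustained cooperation.
after_violation = update(TrustState(), cooperated=False)
recovered = after_violation
for _ in range(200):
    recovered = update(recovered, cooperated=True)
```

Because `gain` is smaller than `loss`, trust falls faster than it rebuilds, and the damage term keeps recovered trust below its pre-violation maximum, which mirrors the hysteresis and trust-ceiling behavior the abstract mentions.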
MASPRM: Multi-Agent System Process Reward Model
Practical deployment of Multi-Agent Systems (MAS) demands strong test-time performance, motivating methods that guide inference-time search and selectively spend compute to improve quality. We present the Multi-Agent System Process Reward Model (MASPRM). It assigns per-action, per-agent values to partial inter-agent transcripts and acts as an inference-time controller. MASPRM is trained from multi-agent Monte Carlo Tree Search (MCTS) rollouts without requiring step-level human annotations, by propagating returns to local targets. At inference, MASPRM guides step-level beam search and MCTS, focusing computation on promising branches and pruning early. On GSM8K and MATH, MASPRM-guided decoding with an outcome reward model (ORM) applied to the final answer improves exact match (EM) over a single straight-through MAS pass by $+30.7$ and $+22.9$ points, respectively. A MASPRM trained on GSM8K transfers zero-shot to MATH without retraining, adding $8.4$ EM points at the same budget. MASPRM is a plug-in value model that estimates per-agent progress and complements verifier-style decoders, enabling more reliable, compute-aware multi-agent reasoning. Code: https://github.com/milad1378yz/MASPRM
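To make the idea of value-guided step-level beam search concrete, here is a generic Python sketch. It is not taken from the MASPRM repository: `expand` and `value` are hypothetical callables standing in for "generate the next agent actions" and "score a partial transcript with a process reward model", and the toy state is a tuple of integers.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]


def beam_search(start: State,
                expand: Callable[[State], List[State]],
                value: Callable[[State], float],
                width: int,
                depth: int) -> State:
    """Keep the `width` highest-valued partial states at each step."""
    beam = [start]
    for _ in range(depth):
        candidates = [child for state in beam for child in expand(state)]
        if not candidates:
            break
        # Prune early: only the most promising branches survive to the next step.
        candidates.sort(key=value, reverse=True)
        beam = candidates[:width]
    return max(beam, key=value)


# Toy problem: each step appends 0 or 1, and the value is the running sum.
def toy_expand(state: State) -> List[State]:
    return [state + (0,), state + (1,)]


best = beam_search((), toy_expand, value=sum, width=2, depth=3)
```

In a real system the value function would be a learned model and the final candidates could additionally be scored by an outcome reward model, as the abstract describes.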
From Narrative to Action: A Hierarchical LLM-Agent Framework for Human Mobility Generation
Understanding and replicating human mobility requires not only spatial-temporal accuracy but also an awareness of the cognitive hierarchy underlying real-world travel decisions. Traditional agent-based or deep learning models can reproduce statistical patterns of movement but fail to capture the semantic coherence and causal logic of human behavior. Large language models (LLMs) show potential, but struggle to balance creative reasoning with strict structural compliance. This study proposes a Hierarchical LLM-Agent Framework, termed Narrative-to-Action, that integrates high-level narrative reasoning, mid-level reflective planning, and low-level behavioral execution within a unified cognitive hierarchy. At the macro level, the framework employs one agent as a "creative writer" to produce diary-style narratives rich in motivation and context, then uses another agent as a "structural parser" to convert narratives into machine-readable plans. A dynamic execution module further grounds agents in geographic environments and enables adaptive behavioral adjustments guided by a novel occupation-aware metric, Mobility Entropy by Occupation (MEO), which captures heterogeneous schedule flexibility across different occupational personalities. At the micro level, the agent executes concrete actions (selecting locations, transportation modes, and time intervals) through interaction with an environmental simulation. By embedding this multi-layer cognitive process, the framework produces not only synthetic trajectories that align closely with real-world patterns but also interpretable representations of human decision logic. This research advances synthetic mobility generation from a data-driven paradigm to a cognition-driven simulation, providing a scalable pathway for understanding, predicting, and synthesizing complex urban mobility behaviors through hierarchical LLM agents.
comment: 47 pages, 3 figures
Central Bank Digital Currency, Flight-to-Quality, and Bank-Runs in an Agent-Based Model
We analyse financial stability and welfare impacts associated with the introduction of a Central Bank Digital Currency (CBDC) in a macroeconomic agent-based model. The model considers firms, banks, and households interacting on labour, goods, credit, and interbank markets. Households move their liquidity from deposits to CBDC based on the perceived riskiness of their banks. We find that the introduction of CBDC exacerbates bank-runs and may lead to financial instability phenomena. The effect can be changed by introducing a limit on CBDC holdings. The adoption of CBDC has little effect on macroeconomic variables but the interest rate on loans to firms goes up and credit goes down in a limited way. CBDC leads to a redistribution of wealth from firms and banks to households with a higher bank default rate. CBDC may have negative welfare effects, but a bound on holding enables a welfare improvement.
Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning
We study the problem of long-term (multiple days) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), focusing on the Douro river as a representative use case. We propose an energy- and communication-efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates direction and speed for each AUV. Simulations using the Delft3D ocean model demonstrate that our method consistently outperforms both single- and multi-agent benchmarks, and that scaling the number of agents improves both mean squared error (MSE) and operational endurance. In some instances, our algorithm demonstrates that doubling the number of AUVs can more than double endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. Our learned policies generalize across unseen seasonal regimes over different months and years, demonstrating promise for future developments of data-driven long-term monitoring of dynamic plume environments.
A cutting-surface consensus approach for distributed robust optimization of multi-agent systems
A novel and fully distributed optimization method is proposed for the distributed robust convex program (DRCP) over a time-varying unbalanced directed network under the uniformly jointly strongly connected (UJSC) assumption. Firstly, an approximated DRCP (ADRCP) is introduced by discretizing the semi-infinite constraints into a finite number of inequality constraints to ensure tractability and restricting the right-hand side of the constraints with a positive parameter to ensure a feasible solution for (DRCP) can be obtained. This problem is iteratively solved by a distributed projected gradient algorithm proposed in this paper, which is based on epigraphic reformulation and gradient projected operations. Secondly, a cutting-surface consensus approach is proposed for locating an approximately optimal consensus solution of the DRCP with guaranteed local feasibility for each agent. This approach is based on iteratively approximating the DRCP by successively reducing the restriction parameter of the right-hand constraints and adding the cutting-surfaces into the existing finite set of constraints. Thirdly, to ensure finite-time termination of the distributed optimization, a distributed termination algorithm is developed based on consensus and zeroth-order stopping conditions under UJSC graphs. Fourthly, it is proved that the cutting-surface consensus approach terminates finitely and yields a feasible and approximate optimal solution for each agent. Finally, the effectiveness of the approach is illustrated through a numerical example.
comment: 16 pages, 8 figures, published to IEEE TAC
Partially Observable Multi-Agent Reinforcement Learning with Information Sharing ICML 2023
We study provable multi-agent reinforcement learning (RL) in the general framework of partially observable stochastic games (POSGs). To circumvent the known hardness results and the use of computationally intractable oracles, we advocate leveraging the potential information-sharing among agents, a common practice in empirical multi-agent RL, and a standard model for multi-agent control systems with communication. We first establish several computational complexity results to justify the necessity of information-sharing, as well as the observability assumption that has enabled quasi-polynomial time and sample single-agent RL with partial observations, for tractably solving POSGs. Inspired by the inefficiency of planning in the ground-truth model, we then propose to further approximate the shared common information to construct an approximate model of the POSG, in which an approximate equilibrium (of the original POSG) can be found in quasi-polynomial-time, under the aforementioned assumptions. Furthermore, we develop a partially observable multi-agent RL algorithm whose time and sample complexities are both quasi-polynomial. Finally, beyond equilibrium learning, we extend our algorithmic framework to finding the team-optimal solution in cooperative POSGs, i.e., decentralized partially observable Markov decision processes, a more challenging goal. We establish concrete computational and sample complexities under several structural assumptions of the model. We hope our study could open up the possibilities of leveraging and even designing different information structures, a well-studied notion in control theory, for developing both sample- and computation-efficient partially observable multi-agent RL.
comment: Journal extension of the conference version at ICML 2023 accepted to SIAM Journal on Control and Optimization (SICON)
HAMLET: Hyperadaptive Agent-based Modeling for Live Embodied Theatrics
Creating an immersive and interactive theatrical experience is a long-term goal in the field of interactive narrative. The emergence of large language models (LLMs) is providing a new path to achieve this goal. However, existing LLM-based drama generation methods often result in agents that lack initiative and cannot interact with the physical scene. Furthermore, these methods typically require detailed user input to drive the drama. These limitations reduce the interactivity and immersion of online real-time performance. To address the above challenges, we propose HAMLET, a multi-agent framework focused on drama creation and online performance. Given a simple topic, the framework generates a narrative blueprint, guiding the subsequent improvisational performance. During the online performance, each actor is given an autonomous mind. This means that actors can make independent decisions based on their own background, goals, and emotional state. In addition to conversations with other actors, their decisions can also change the state of scene props through actions such as opening a letter or picking up a weapon. The change is then broadcast to other related actors, updating what they know and care about, which in turn influences their next action. To evaluate the quality of drama performance generated by HAMLET, we designed an evaluation method to assess three primary aspects, including character performance, narrative quality, and interaction experience. The experimental evaluation shows that HAMLET can create expressive and coherent theatrical experiences.
Systems and Control (CS)
Feature Matching-Based Gait Phase Prediction for Obstacle Crossing Control of Powered Transfemoral Prosthesis
For amputees with powered transfemoral prosthetics, navigating obstacles or complex terrain remains challenging. This study addresses this issue by using an inertial sensor on the sound ankle to guide obstacle-crossing movements. A genetic algorithm computes the optimal neural network structure to predict the required angles of the thigh and knee joints. A gait progression prediction algorithm determines the actuation angle index for the prosthetic knee motor, ultimately defining the necessary thigh and knee angles and gait progression. Results show that when the standard deviation of Gaussian noise added to the thigh angle data is less than 1, the method can effectively eliminate noise interference, achieving 100% accuracy in gait phase estimation under 150 Hz, with thigh angle prediction error being 8.71% and knee angle prediction error being 6.78%. These findings demonstrate the method's ability to accurately predict gait progression and joint angles, offering significant practical value for obstacle negotiation in powered transfemoral prosthetics.
comment: 6 pages, conference
Learning to Drive Safely with Hybrid Options
Out of the many deep reinforcement learning approaches for autonomous driving, only a few make use of the options (or skills) framework. That is surprising, as this framework is naturally suited for hierarchical control applications in general, and autonomous driving tasks in particular. Therefore, in this work the options framework is applied and tailored to autonomous driving tasks on highways. More specifically, we define dedicated options for longitudinal and lateral manoeuvres with embedded safety and comfort constraints. This way, prior domain knowledge can be incorporated into the learning process and the learned driving behaviour can be constrained more easily. We propose several setups for hierarchical control with options and derive practical algorithms following state-of-the-art reinforcement learning techniques. By separately selecting actions for longitudinal and lateral control, the introduced policies over combined and hybrid options obtain the same expressiveness and flexibility that human drivers have, while being easier to interpret than classical policies over continuous actions. Of all the investigated approaches, these flexible policies over hybrid options perform the best under varying traffic conditions, outperforming the baseline policies over actions.
Efficient Network Reconfiguration by Randomized Switching
We present an algorithm that efficiently computes nearly-optimal solutions to a class of combinatorial reconfiguration problems on weighted, undirected graphs. Inspired by societally relevant applications in networked infrastructure systems, these problems consist of simultaneously finding an unreweighted sparsified graph and nodal potentials that satisfy fixed demands, where the objective is to minimize some congestion criterion, e.g., a Laplacian quadratic form. These are mixed-integer nonlinear programming problems that are NP-hard in general. To circumvent these challenges, instead of solving for a single best configuration, the proposed randomized switching algorithm seeks to design a distribution of configurations that, when sampled, ensures that congestion concentrates around its optimum. We show that the proposed congestion metric is a generalized self-concordant function in the space of switching probabilities, which enables the use of efficient and simple conditional gradient methods. We implement our algorithm and show that it outperforms a state-of-the-art commercial mixed-integer second-order cone programming (MISOCP) solver by orders of magnitude over a large range of problem sizes.
Flatness-based trajectory planning for 3D overhead cranes with friction compensation and collision avoidance
This paper presents an optimal trajectory generation method for 3D overhead cranes by leveraging differential flatness. This framework enables the direct inclusion of complex physical and dynamic constraints, such as nonlinear friction and collision avoidance for both payload and rope. Our approach allows for aggressive movements by constraining payload swing only at the final point. A comparative simulation study validates our approach, demonstrating that neglecting dry friction leads to actuator saturation and collisions. The results show that friction modeling is a fundamental requirement for fast and safe crane trajectories.
comment: 8 pages, 11 figures
Analyzing Parametric Oscillator Ising Machines through the Kuramoto Lens
Networks of coupled nonlinear oscillators are emerging as powerful physical platforms for implementing Ising machines. Yet the relationship between parametric-oscillator implementations and traditional oscillator-based Ising machines remains underexplored. In this work, we develop a Kuramoto-style, canonical phase description of parametric oscillator Ising machines by starting from the Stuart-Landau oscillator model -- the canonical normal form near a Hopf bifurcation, and a natural reduced description for many parametric oscillator implementations such as the degenerate optical parametric oscillator (DOPO) among others. The resulting phase dynamics combine the usual phase-difference coupling observed in the standard Kuramoto model along with an intrinsic phase sum term that is generated when conjugate coupling is considered. Moreover, our formulation helps explain why explicit second-harmonic driving is unnecessary in parametric oscillators and also reveals how quasi-steady amplitude heterogeneity scales the original strength of the spin interaction with potentially adverse impacts on the solution quality. Our work helps develop a unifying view of the oscillator-based approach to designing Ising machines.
Contributions to Semialgebraic-Set-Based Stability Verification of Dynamical Systems with Neural-Network-Based Controllers
Neural-network-based controllers (NNCs) can represent complex, highly nonlinear control laws, but verifying the closed-loop stability of dynamical systems using them remains challenging. This work presents contributions to a state-of-the-art stability verification procedure for NNC-controlled systems which relies on semialgebraic-set-based input-output modeling to pose the search for a Lyapunov function as an optimization problem. Specifically, this procedure's conservatism when analyzing NNCs using transcendental activation functions and the restriction to feedforward NNCs are addressed by a) introducing novel semialgebraic activation functions that preserve key properties of common transcendental activations and b) proving compatibility of NNCs from the broader class of recurrent equilibrium networks (RENs) with this procedure. Furthermore, the indirect optimization of a local region of attraction (RoA) estimate using a restricted set of candidate Lyapunov functions is greatly improved via c) the introduction of a richer parameterization of candidate Lyapunov functions than previously reported and d) the formulation of novel semidefinite programs (SDPs) that directly optimize the resulting RoA estimate. The value of these contributions is highlighted in two numerical examples.
comment: Submitted to the IEEE for possible publication, 16 pages, 6 figures
Development of a Digital Twin for an Electric Vehicle Emulator: Modeling, Control, and Experimental Validation
This paper presents the development and validation of a digital twin for a scaled-down electric vehicle (EV) emulator, designed to replicate longitudinal vehicle dynamics under diverse operating conditions. The emulator integrates a separately excited DC motor (SEDCM), a four-quadrant DC-DC converter, a battery emulator, and a mechanical load emulator. The system models tractive effort, aerodynamic drag, and gradient resistance using Newton's second law. In contrast to conventional graphical modeling tools (e.g., block diagrams and bond graphs), the adopted Energetic Macroscopic Representation (EMR) framework offers clear advantages by explicitly representing energy interactions and facilitating the systematic derivation of control structures. A control strategy developed within this framework governs energy flow across the powertrain, enabling accurate speed control via armature voltage regulation. Experimental tests conducted on a Lucas-Nulle test bench show strong correlation with simulation results. The study also introduces a methodology to compute the maximum admissible vehicle mass - determined to be 13.5 kg for a 180 W motor operating at 1900 rpm - based on acceleration and slope constraints. Furthermore, a switching algorithm for the bidirectional converter ensures reliable four quadrant operation. Overall, the proposed framework provides a scalable and effective approach for EV emulation, control design, and energy management validation.
comment: 6 pages, Accepted at CODIT 2025 (Conference on Decision and Control in Intelligent Technology)
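The maximum-mass calculation mentioned in this abstract follows from Newton's second law: the tractive force left after drag must cover both acceleration and the gravity component along the slope. The Python sketch below illustrates that balance only. Apart from the 180 W motor power, every input (speed, acceleration, slope, drag) is a hypothetical value chosen for illustration, so the result is not expected to match the paper's 13.5 kg figure, and the paper's own formulation may include further terms.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def max_vehicle_mass(power_w: float, speed_mps: float, accel_mps2: float,
                     slope_rad: float, drag_n: float = 0.0) -> float:
    """Largest mass m satisfying  P / v - F_drag >= m * (a + g * sin(slope))."""
    tractive_force = power_w / speed_mps                 # F = P / v at the operating point
    per_kg_demand = accel_mps2 + G * math.sin(slope_rad)  # force needed per kilogram
    return (tractive_force - drag_n) / per_kg_demand


# Hypothetical operating point: 2 m/s, 0.5 m/s^2 target acceleration.
flat = max_vehicle_mass(180.0, 2.0, 0.5, 0.0)
hill = max_vehicle_mass(180.0, 2.0, 0.5, math.radians(5.0))
```

The admissible mass drops on a gradient because part of the available force is spent against gravity, which is why the acceleration and slope constraints have to be considered together.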
Mechanism-Guided Residual Lifting and Control Consistent Modeling for Pneumatic Drying Processes
Pneumatic drying processes in industries such as agriculture, chemicals, and pharmaceuticals are notoriously difficult to model and control due to multi-source disturbances, coupled stage dynamics, and significant measurement delays. Traditional modeling paradigms often fail to simultaneously deliver accuracy, interpretability, and closed-loop applicability. To address this challenge, this paper introduces a unified hybrid modeling framework, termed Physics-Guided Residual Lifting with Control-Consistent Correction, which integrates a transient mechanistic model with a stability-constrained data-driven component. The framework covers the complete process chain of drying, transport, and winnowing. On the mechanistic level, the model unifies mass transfer dynamics using the partial pressure difference of water vapor, incorporates water activity clamping and latent heat corrections for bound water, and ensures energy closure with moisture-dependent specific heat. On the data-driven level, we propose an orthogonal residual learning scheme. It leverages intermediate states from the mechanistic model as proxy variables to construct a physics-inspired dictionary, preventing parameter compensation and overfitting during ridge regression. Furthermore, to ensure suitability for predictive control, a Control-Consistent Extended Dynamic Mode Decomposition with stability constraints is employed to learn the residual dynamics, for which we provide boundedness proofs and stability guarantees. The framework was validated on 10 industrial batches, comprising 63,000 samples. On unseen test data, the hybrid model achieved a Mean Absolute Error of 0.016% for outlet moisture and 0.015 °C for outlet temperature, with values improving to 0.986 and 0.995, respectively. The resulting prediction residuals exhibit white-noise characteristics, with significantly reduced spectral energy at low frequencies.
comment: 6 figures, 4 tables
An N-of-1 Artificial Intelligence Ecosystem for Precision Medicine
Artificial intelligence in medicine is built to serve the average patient. By minimizing error across large datasets, most systems deliver strong aggregate accuracy yet falter at the margins: patients with rare variants, multimorbidity, or underrepresented demographics. This average patient fallacy erodes both equity and trust. We propose a different design: a multi-agent ecosystem for N-of-1 decision support. In this environment, agents clustered by organ systems, patient populations, and analytic modalities draw on a shared library of models and evidence synthesis tools. Their results converge in a coordination layer that weighs reliability, uncertainty, and data density before presenting the clinician with a decision-support packet: risk estimates bounded by confidence ranges, outlier flags, and linked evidence. Validation shifts from population averages to individual reliability, measured by error in low-density regions, calibration in the small, and risk--coverage trade-offs. Anticipated challenges include computational demands, automation bias, and regulatory fit, addressed through caching strategies, consensus checks, and adaptive trial frameworks. By moving from monolithic models to orchestrated intelligence, this approach seeks to align medical AI with the first principle of medicine: care that is transparent, equitable, and centered on the individual.
comment: This study has been supported by grants from the National Institutes of Health: The National Institute on Aging R01AG074372 and The National Institute of Allergy and Infectious Diseases R01AI165535
Survey and Tutorial of Reinforcement Learning Methods in Process Systems Engineering
Sequential decision making under uncertainty is central to many Process Systems Engineering (PSE) challenges, where traditional methods often face limitations related to controlling and optimizing complex and stochastic systems. Reinforcement Learning (RL) offers a data-driven approach to derive control policies for such challenges. This paper presents a survey and tutorial on RL methods, tailored for the PSE community. We deliver a tutorial on RL, covering fundamental concepts and key algorithmic families including value-based, policy-based and actor-critic methods. Subsequently, we survey existing applications of these RL techniques across various PSE domains, such as in fed-batch and continuous process control, process optimization, and supply chains. We conclude with a PSE-focused discussion of specialized techniques and emerging directions. By synthesizing the current state of RL algorithm development and its implications for PSE, this work identifies successes, challenges, and trends, and outlines avenues for future research at the interface of these fields.
A comparison between joint and dual UKF implementations for state estimation and leak localization in water distribution networks
The sustainability of modern cities highly depends on efficient water distribution management, including effective pressure control and leak detection and localization. Accurate information about the network hydraulic state is therefore essential. This article presents a comparison between two data-driven state estimation methods based on the Unscented Kalman Filter (UKF), fusing pressure, demand and flow data for head and flow estimation. One approach uses a joint state vector with a single estimator, while the other uses a dual-estimator scheme. We analyse their main characteristics, discussing differences, advantages and limitations, and compare them theoretically in terms of accuracy and complexity. Finally, we show several estimation results for the L-TOWN benchmark, which allows us to discuss their properties in a real implementation.
comment: This work has been submitted to ECC2026 for review. It has 7 pages and 2 figures
Sample-based Moving Horizon Estimation
In this paper, we propose a sample-based moving horizon estimation (MHE) scheme for general nonlinear systems to estimate the current system state using irregularly and/or infrequently available measurements. The cost function of the MHE optimization problem is suitably designed to accommodate these irregular output sequences. We also establish that, under a suitable sample-based detectability condition known as sample-based incremental input/output-to-state stability (i-IOSS), the proposed sample-based MHE achieves robust global exponential stability (RGES). Additionally, for the case of linear systems, we draw connections between sample-based observability and sample-based i-IOSS. This demonstrates that previously established conditions for linear systems to be sample-based observable can be utilized to verify or design sampling strategies that satisfy the conditions to guarantee RGES of the sample-based MHE. Finally, the effectiveness of the proposed sample-based MHE is illustrated through a simulation example.
Improved Accuracy of Robot Localization Using 3-D LiDAR in a Hippocampus-Inspired Model
Boundary Vector Cells (BVCs) are a class of neurons in the brains of vertebrates that encode environmental boundaries at specific distances and allocentric directions, playing a central role in forming place fields in the hippocampus. Most computational BVC models are restricted to two-dimensional (2D) environments, making them prone to spatial ambiguities in the presence of horizontal symmetries in the environment. To address this limitation, we incorporate vertical angular sensitivity into the BVC framework, thereby enabling robust boundary detection in three dimensions, and leading to significantly more accurate spatial localization in a biologically-inspired robot model. The proposed model processes LiDAR data to capture vertical contours, thereby disambiguating locations that would be indistinguishable under a purely 2D representation. Experimental results show that in environments with minimal vertical variation, the proposed 3D model matches the performance of a 2D baseline; yet, as 3D complexity increases, it yields substantially more distinct place fields and markedly reduces spatial aliasing. These findings show that adding a vertical dimension to BVC-based localization can significantly enhance navigation and mapping in real-world 3D spaces while retaining performance parity in simpler, near-planar scenarios.
comment: 8 pages, 9 figures, Presented at the 2025 International Joint Conference on Neural Networks, Rome, July 2025
Defect Mitigation for Robot Arm-based Additive Manufacturing Utilizing Intelligent Control and IOT
This paper presents an integrated robotic fused deposition modeling additive manufacturing system featuring closed-loop thermal control and intelligent in-situ defect correction using a 6-degree of freedom robotic arm and an Oak-D camera. The robot arm end effector was modified to mount an E3D hotend thermally regulated by an IoT microcontroller, enabling precise temperature control through real-time feedback. The filament extrusion system was synchronized with robotic motion, coordinated via ROS2, ensuring consistent deposition along complex trajectories. A vision system based on OpenCV detects layer-wise defect positions, commanding autonomous re-extrusion at identified sites. Experimental validation demonstrated successful defect mitigation in printing operations. The integrated system effectively addresses challenges in real-time quality assurance. Inverse kinematics were used for motion planning, while homography transformations corrected camera perspectives for accurate defect localization. The intelligent system successfully mitigated surface anomalies without interrupting the print process. By combining real-time thermal regulation, motion control, and intelligent defect detection & correction, this architecture establishes a scalable and adaptive robotic additive manufacturing framework suitable for aerospace, biomedical, and industrial applications.
comment: This paper has been accepted at ASME 2025 International Mechanical Engineering Congress and Exposition (IMECE 2025)
A Hamilton-Jacobi Reachability Framework with Soft Constraints for Safety-Critical Systems
Traditional reachability methods provide formal guarantees of safety under bounded disturbances. However, they strictly enforce state constraints as inviolable, which can result in overly conservative or infeasible solutions in complex operational scenarios. Many constraints encountered in practice, such as bounds on battery state of charge in electric vehicles, recommended speed envelopes, and comfort constraints in passenger-carrying vehicles, are inherently soft. Soft constraints allow temporary violations within predefined safety margins to accommodate uncertainty and competing operational demands, albeit at a cost such as increased wear or higher operational expenses. This paper introduces a novel soft-constrained reachability framework that extends Hamilton-Jacobi reachability analysis for the formal verification of safety-critical systems subject to both hard and soft constraints. Specifically, the framework characterizes a subset of the state space, referred to as the soft-constrained reach-avoid set, from which the system is guaranteed to reach a desired set safely, under worst-case disturbances, while ensuring that cumulative soft-constraint violations remain within a user-specified budget. The framework comprises two principal components: (i) an augmented-state model with an auxiliary budget state that tracks soft-constraint violations, and (ii) a regularization-based approximation of the discontinuous Hamilton-Jacobi value function associated with the reach-avoid differential game studied herein. The effectiveness of the proposed framework is demonstrated through numerical examples involving the landing of a simple point-mass model and a fixed-wing aircraft executing an emergency descent, both under wind disturbances. The simulation results validate the framework's ability to simultaneously manage both hard and soft constraints in safety-critical settings.
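The auxiliary budget state mentioned in component (i) can be pictured with a small sketch. The update rule below (the budget is drained at the rate of the current violation), the speed limit, and the speed profile are all assumptions made for illustration; the abstract does not specify the paper's actual augmented dynamics.

```python
def remaining_budget(speeds, soft_limit, budget, dt):
    """Track an assumed budget state z with z' = -max(0, speed - soft_limit).

    Returns the budget after each step; the soft constraint is respected
    "within budget" as long as the value never drops below zero.
    """
    z = budget
    history = [z]
    for v in speeds:
        z -= dt * max(0.0, v - soft_limit)  # only violations consume budget
        history.append(z)
    return history


# Hypothetical descent-speed profile (m/s) sampled every 0.5 s.
speeds = [4.0, 5.5, 6.0, 5.0, 4.5]
history = remaining_budget(speeds, soft_limit=5.0, budget=1.0, dt=0.5)
within_budget = min(history) >= 0.0
```

In this toy run the limit is exceeded for two steps, which uses up part of the budget but never exhausts it, so the trajectory would count as admissible under the assumed rule.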
Delay Tolerant Control for Autonomous Driving Using CDOB
With the rapid growth of autonomous vehicle technologies, effective path-tracking control has become a critical component in ensuring safety and efficiency in complex traffic scenarios. When a high-level decision-making agent generates a collision-free path, a robust low-level controller is required to precisely follow this trajectory. However, connected autonomous vehicles (CAV) are inherently affected by communication delays and computation delays, which significantly degrade the performance of conventional controllers such as PID or other more advanced controllers like disturbance observers (DOB). While DOB-based designs have shown effectiveness in rejecting disturbances under nominal conditions, their performance deteriorates considerably in the presence of unknown time delays. To address this challenge, this paper proposes a delay-tolerant communication disturbance observer (CDOB) framework for path-tracking control in delayed systems. The proposed CDOB compensates for the adverse effects of time delays, maintaining accurate trajectory tracking even under uncertain and varying delay conditions. It is shown through a simulation study that the proposed control architecture maintains close alignment with the reference trajectory across various scenarios, including single lane change, double lane change, and Elastic Band generated collision avoidance paths under various time delays. Simulation results further demonstrate that the proposed method outperforms conventional approaches in both tracking accuracy and delay robustness, making it well suited for autonomous driving applications.
Decentralized Merging Control of Connected and Automated Vehicles to Enhance Safety and Energy Efficiency using Control Barrier Functions
This paper presents a decentralized Control Barrier Function (CBF) based approach for highway merging of Connected and Automated Vehicles (CAVs). In this control algorithm, each "host" vehicle negotiates with other agents in a control zone of the highway network, and enacts its own action, to perform safe and energy-efficient merge maneuvers. It uses predictor-corrector loops within the robust CBF setting for negotiation and to reconcile disagreements that may arise. There is no explicit order of vehicles and no priority. A notable feature is the absence of gridlocks due to instability of the inter-agent system. Results from Monte Carlo simulations show significant improvement in the system-wide energy efficiency and traffic flow compared to a first-in-first-out approach, as well as enhanced robustness of the proposed decentralized controller compared to its centralized counterpart.
comment: This work has been submitted to a conference for possible publication and is under review. Paper summary: 8 pages, 5 figures, 2 tables
Deep Reinforcement Learning Approach to QoS-Aware Load Balancing in 5G Cellular Networks under User Mobility and Observation Uncertainty
Efficient mobility management and load balancing are critical to sustaining Quality of Service (QoS) in dense, highly dynamic 5G radio access networks. We present a deep reinforcement learning framework based on Proximal Policy Optimization (PPO) for autonomous, QoS-aware load balancing implemented end-to-end in a lightweight, pure-Python simulation environment. The control problem is formulated as a Markov Decision Process in which the agent periodically adjusts Cell Individual Offset (CIO) values to steer user-cell associations. A multi-objective reward captures key performance indicators (aggregate throughput, latency, jitter, packet loss rate, Jain's fairness index, and handover count), so the learned policy explicitly balances efficiency and stability under user mobility and noisy observations. The PPO agent uses an actor-critic neural network trained from trajectories generated by the Python simulator with configurable mobility (e.g., Gauss-Markov) and stochastic measurement noise. Across 500+ training episodes and stress tests with increasing user density, the PPO policy consistently improves KPI trends (higher throughput and fairness, lower delay, jitter, packet loss, and handovers) and exhibits rapid, stable convergence. Comparative evaluations show that PPO outperforms rule-based ReBuHa and A3 as well as the learning-based CDQL baseline across all KPIs while maintaining smoother learning dynamics and stronger generalization as load increases. These results indicate that PPO's clipped policy updates and advantage-based training yield robust, deployable control for next-generation RAN load balancing using an entirely Python-based toolchain.
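One of the KPIs listed above, Jain's fairness index, has a standard closed form, (sum x)^2 / (n * sum x^2), and a multi-objective reward is often a weighted combination of such KPIs. The sketch below computes the index and a simple linear scalarization. The `reward` function, its KPI names, and the weights are hypothetical placeholders, not the reward used in the paper.

```python
def jain_fairness(throughputs):
    """Jain's fairness index: 1.0 for equal shares, 1/n when one user gets everything."""
    if not throughputs:
        raise ValueError("need at least one throughput value")
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    if squares == 0:
        return 1.0  # assumed convention: all-zero allocations count as equal
    return total * total / (len(throughputs) * squares)


def reward(kpis, weights):
    """Hypothetical linear scalarization: reward benefits, penalize costs."""
    benefits = ("throughput", "fairness")
    costs = ("latency", "jitter", "loss", "handovers")
    gain = sum(weights[k] * kpis[k] for k in benefits)
    penalty = sum(weights[k] * kpis[k] for k in costs)
    return gain - penalty


even = jain_fairness([5.0, 5.0, 5.0, 5.0])
skewed = jain_fairness([10.0, 0.0, 0.0, 0.0])
```

The index depends only on how evenly throughput is shared, not on its absolute level, which is why a reward of this kind would normally pair it with an aggregate throughput term.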
NeuroDOB: A Deep Neural Observer-Based Controller for Vehicle Lateral Dynamics
This paper proposes NeuroDOB, a deep neural network based observer controller for vehicle lateral dynamics, which replaces the conventional disturbance observer (DOB) with a deep neural network (DNN) to enhance personalized lateral control. Unlike conventional DOBs that compensate for general disturbances such as road friction variation and crosswind, NeuroDOB explicitly addresses unmodeled vehicle dynamics and driver-specific behaviors by learning the steering compensation signal from driver-in-the-loop simulations using CarSim's embedded controller as a surrogate driver. The proposed architecture integrates NeuroDOB with a linear quadratic regulator (LQR), where the DNN outputs a delta error correction added to the baseline LQR steering input to produce the final control command. Input features to the DNN include lateral position and yaw angle errors, and the LQR control input. Experimental validation using a lateral dynamic bicycle model within CarSim demonstrates that NeuroDOB effectively adapts to individual driving habits, improving lateral control performance beyond what conventional LQR controllers achieve. The results indicate the potential of a deep neural network based observer to enable personalized and adaptive autonomous vehicle control. In cognitive terms, the proposed architecture can be viewed as a dual-system control structure. The baseline LQR corresponds to System 1, a model-based, fast, and analytic reasoning layer ensuring stability. The NeuroDOB acts as System 2, a reflective, data-driven layer that learns compensation from experience and corrects the analytical bias of System 1. Together, they form an integrated decision process analogous to human intuition-reflection interaction, enabling both stability and adaptability in lateral control.
comment: 12 pages, 16 figures
Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning
We study the problem of long-term (multiple days) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), focusing on the Douro River as a representative use case. We propose an energy- and communication-efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates direction and speed for each AUV. Simulations using the Delft3D ocean model demonstrate that our method consistently outperforms both single- and multi-agent benchmarks, with scaling the number of agents improving both mean squared error (MSE) and operational endurance. In some instances, our algorithm demonstrates that doubling the number of AUVs can more than double endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. Our learned policies generalize across unseen seasonal regimes over different months and years, demonstrating promise for future developments of data-driven long-term monitoring of dynamic plume environments.
Assessment of Power System Stability Considering Multiple Time-Scale Dynamics: Insights into Hopf Bifurcations in Presence of GFL and GFM IBRs
Real power systems exhibit dynamics that evolve across a wide range of time scales, from very fast to very slow phenomena. Historically, incorporating these wide-ranging dynamics into a single model has been impractical. As a result, power engineers rely on time-scale decomposition to simplify models. When fast phenomena are evaluated, slow dynamics are neglected (assumed stable), and vice versa. This paper challenges this paradigm by showing the importance of assessing power system stability while considering multiple time scales simultaneously. Using the concept of Hopf bifurcations, it exemplifies instability issues that would be missed if multi-time-scale dynamics are not considered. Although this work employs both grid-following and grid-forming inverter-based resource models, it is not a direct comparison. Instead, it presents a case study demonstrating how one technology can complement the other from a multi-time-scale dynamics perspective.
comment: 7 pages
Two-Stage Learning of Stabilizing Neural Controllers via Zubov Sampling and Iterative Domain Expansion
Learning-based neural network (NN) control policies have shown impressive empirical performance. However, obtaining stability guarantees and estimates of the region of attraction of these learned neural controllers is challenging due to the lack of stable and scalable training and verification algorithms. Although previous works in this area have achieved great success, much conservatism remains in their frameworks. In this work, we propose a novel two-stage training framework to jointly synthesize a controller and a Lyapunov function for continuous-time systems. By leveraging a Zubov-inspired region of attraction characterization to directly estimate stability boundaries, we propose a novel training-data sampling strategy and a domain-updating mechanism that significantly reduces the conservatism in training. Moreover, unlike existing works on continuous-time systems that rely on an SMT solver to formally verify the Lyapunov condition, we extend the state-of-the-art neural network verifier $\alpha,\!\beta$-CROWN with the capability of performing automatic bound propagation through the Jacobian of dynamical systems and a novel verification scheme that avoids expensive bisection. To demonstrate the effectiveness of our approach, we conduct numerical experiments by synthesizing and verifying controllers on several challenging nonlinear systems across multiple dimensions. We show that our training can yield regions of attraction with volume $5$ to $1.5\cdot 10^{5}$ times larger compared to the baselines, and our verification on continuous systems can be $40$ to $10{,}000$ times faster compared to the traditional SMT solver dReal. Our code is available at https://github.com/Verified-Intelligence/Two-Stage_Neural_Controller_Training.
comment: NeurIPS 2025
MathBode: Understanding LLM Reasoning with Dynamical Systems
This paper presents MathBode, a dynamic diagnostic for mathematical reasoning in large language models (LLMs). Instead of one-shot accuracy, MathBode treats each parametric problem as a system: we drive a single parameter sinusoidally and fit first-harmonic responses of model outputs and exact solutions. This yields interpretable, frequency-resolved metrics -- gain (amplitude tracking) and phase (lag) -- that form Bode-style fingerprints. Across five closed-form families (linear solve, ratio/saturation, compound interest, 2x2 linear systems, similar triangles), the diagnostic surfaces systematic low-pass behavior and growing phase lag that accuracy alone obscures. We compare several models against a symbolic baseline that calibrates the instrument ($G \approx 1$, $\phi \approx 0$). Results separate frontier from mid-tier models on dynamics, providing a compact, reproducible protocol that complements standard benchmarks with actionable measurements of reasoning fidelity and consistency. We open-source the dataset and code to enable further research and adoption.
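The first-harmonic fit behind the gain and phase metrics can be sketched in a few lines. The example below drives one parameter of a toy linear-solve problem (x + p = 10) sinusoidally, fits the first harmonic of the exact solution and of a response, and compares them. The function names, the sampling choices, and the artificially attenuated and delayed "response" are assumptions for illustration; they are not the MathBode code or real model outputs.

```python
import math


def first_harmonic(samples, cycles):
    """Amplitude and phase of the component at `cycles` cycles per record.

    Assumes uniform sampling over an integer number of cycles, in which case
    projecting onto sine and cosine equals the least-squares fit of
    mean + A * sin(theta + phi).
    """
    n = len(samples)
    mean = sum(samples) / n
    cos_sum = sum((y - mean) * math.cos(2 * math.pi * cycles * k / n)
                  for k, y in enumerate(samples))
    sin_sum = sum((y - mean) * math.sin(2 * math.pi * cycles * k / n)
                  for k, y in enumerate(samples))
    a, b = 2 * cos_sum / n, 2 * sin_sum / n  # a = A sin(phi), b = A cos(phi)
    return math.hypot(a, b), math.atan2(a, b)


def gain_and_phase(truth, response, cycles):
    """Gain (amplitude ratio) and phase difference wrapped to [-pi, pi)."""
    amp_t, ph_t = first_harmonic(truth, cycles)
    amp_r, ph_r = first_harmonic(response, cycles)
    diff = (ph_r - ph_t + math.pi) % (2 * math.pi) - math.pi
    return amp_r / amp_t, diff


n, cycles = 64, 4
theta = [2 * math.pi * cycles * k / n for k in range(n)]
lag = 3 * 2 * math.pi * cycles / n  # a delay of three samples, in radians

# Exact solution of x + p = 10 with p = 2 + 0.5 sin(theta).
truth = [10.0 - (2.0 + 0.5 * math.sin(t)) for t in theta]
# Hypothetical response that tracks 80% of the swing and arrives late.
response = [10.0 - (2.0 + 0.4 * math.sin(t - lag)) for t in theta]

gain, phase = gain_and_phase(truth, response, cycles)
```

A gain below 1 indicates attenuated amplitude tracking and a negative phase indicates lag; feeding the exact solution in as the response gives a gain of 1 and a phase of 0, matching the calibration condition quoted above.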
Learning Wireless Interference Patterns: Decoupled GNN for Throughput Prediction in Heterogeneous Multi-Hop p-CSMA Networks
The p-persistent CSMA protocol is central to random-access MAC analysis, but predicting saturation throughput in heterogeneous multi-hop wireless networks remains a hard problem. Simplified models that assume a single, shared interference domain can underestimate throughput by 48-62% in sparse topologies. Exact Markov-chain analyses are accurate but scale exponentially in computation time, making them impractical for large networks. These computational barriers motivate structural machine learning approaches like GNNs for scalable throughput prediction in general network topologies. Yet off-the-shelf GNNs struggle here: a standard GCN yields 63.94% normalized mean absolute error (NMAE) on heterogeneous networks because symmetric normalization conflates a node's direct interference with higher-order, cascading effects that pertain to how interference propagates over the network graph. Building on these insights, we propose the Decoupled Graph Convolutional Network (D-GCN), a novel architecture that explicitly separates processing of a node's own transmission probability from neighbor interference effects. D-GCN replaces mean aggregation with learnable attention, yielding interpretable, per-neighbor contribution weights while capturing complex multihop interference patterns. D-GCN attains 3.3% NMAE, outperforms strong baselines, remains tractable even when exact analytical methods become computationally infeasible, and enables gradient-based network optimization that achieves within 1% of theoretical optima.
Flight-Ready Precise and Robust Carrier-Phase GNSS Navigation Software for Distributed Space Systems
This paper presents the full requirements analysis, design, development, and testing of high-precision navigation flight software for Distributed Space Systems (DSS) using Carrier Phase Differential GNSS (CDGNSS). Five main contributions are made. First, a survey of flown and upcoming DSS missions with stringent precision requirements is conducted, from which a thorough requirements analysis is distilled to guide development and testing. Second, a real-time navigation functional architecture is designed, and adopts a sparse and regularized Consider Kalman Filter with options for numerical stability in-flight. The filter rigorously accounts for uncertainties in process noise, measurement noise, and biases. It tracks float ambiguities with integer resolution where possible. The covariance correlation structure is preserved under all navigation modes, including contingencies and outages. Third, a lightweight, memoryless Fault Detection, Isolation, and Recovery (FDIR) module is developed to guard against anomalous measurements, providing statistical screening and ensuring robust navigation. Fourth, the software architecture is proposed for ease of integration, with strategies presented for modularity and computational efficiency tailored to constrained flight systems. Fifth, a comprehensive test campaign is conducted, mapped to a requirements verification matrix, spanning unit, interface, software-in-the-loop, and real-time hardware-in-the-loop tests, emphasizing gradual test fidelity for efficient fault isolation. Finally, flight-like results are demonstrated using the VISORS mission, due to the generalizability of the VISORS navigation operations, and the stringency which demands sub-centimeter relative position and sub-millimeter-per-second velocity accuracy. This architecture aims to serve as a reference for next-generation DSS missions adopting CDGNSS.
A Volumetric Privacy Measure for Dynamical Systems With Bounded Disturbance
In this paper, we first present a volumetric privacy measure for dynamical systems with bounded disturbances, wherein the states of the system contain private information and an adversary with access to sensor measurements attempts to infer the set of potential values of the private information. Under the proposed privacy measure, the volume of the uncertainty set of the adversary given the sensor measurements is considered as the privacy level of the system. We next characterize the time evolution of the proposed privacy measure and study its properties for a particular system with both public and private states, where a set containing the public state is shared as the observation. Approximate set-membership estimation techniques are developed to compute the private-state uncertainty set, and the properties of the privacy measure are analyzed, demonstrating that the uncertainty reduction of the adversary is bounded by the information gain from the observation set. Furthermore, an optimization-based privacy filter design problem is formulated, employing randomization and linear programming to enhance the privacy level. The effectiveness of the proposed approach is validated through a production-inventory case study. Results show that the optimal privacy filter significantly improves robustness against inference attacks and outperforms two baseline mechanisms based on additive noise and quantization.
A cutting-surface consensus approach for distributed robust optimization of multi-agent systems
A novel and fully distributed optimization method is proposed for the distributed robust convex program (DRCP) over a time-varying unbalanced directed network under the uniformly jointly strongly connected (UJSC) assumption. Firstly, an approximated DRCP (ADRCP) is introduced by discretizing the semi-infinite constraints into a finite number of inequality constraints to ensure tractability and restricting the right-hand side of the constraints with a positive parameter to ensure that a feasible solution for the DRCP can be obtained. This problem is iteratively solved by a distributed projected gradient algorithm proposed in this paper, which is based on epigraphic reformulation and gradient projected operations. Secondly, a cutting-surface consensus approach is proposed for locating an approximately optimal consensus solution of the DRCP with guaranteed local feasibility for each agent. This approach is based on iteratively approximating the DRCP by successively reducing the restriction parameter of the right-hand constraints and adding the cutting-surfaces into the existing finite set of constraints. Thirdly, to ensure finite-time termination of the distributed optimization, a distributed termination algorithm is developed based on consensus and zeroth-order stopping conditions under UJSC graphs. Fourthly, it is proved that the cutting-surface consensus approach terminates finitely and yields a feasible and approximately optimal solution for each agent. Finally, the effectiveness of the approach is illustrated through a numerical example.
comment: 16 pages, 8 figures, published to IEEE TAC
Operational Risks in Grid Integration of Large Data Center Loads: Characteristics, Stability Assessments, and Sensitivity Studies
This paper investigates the dynamic interactions between large-scale data centers and the power grid, focusing on reliability challenges arising from sudden fluctuations in demand. With the rapid growth of AI-driven workloads, such fluctuations, along with fast ramp patterns, are expected to exacerbate stressed grid conditions and system instabilities. We consider a few open-source AI data center consumption profiles from the MIT supercloud datasets and generate a few experimental HPC job-distribution-based inference profiles. Subsequently, we develop analytical methodologies for real-time assessment of grid stability, focusing on both transient and small-signal stability assessments. Energy-flow-like metrics for nonlinear transient stability, formulated by computing localized data center bus kinetic-like flows and coupling interactions with neighboring buses over varying time windows, help provide operators with real-time assessments of the regional grid stress in the data center hubs. On the other hand, small-signal stability metrics, constructed from analytical state matrices under variable operating conditions during a fast ramping period, enable snapshot-based assessments of data center load fluctuations and provide enhanced observability into evolving grid conditions. By quantifying the stability impacts of large data center clusters, studies conducted in the modified IEEE benchmark 68-bus model support improved operator situational awareness to capture risks in reliable integration of large data center loads.
comment: 13 pages, 8 figures, 3 tables
Online Adaptation for Flying Quadrotors in Tight Formations
The task of flying in tight formations is challenging for teams of quadrotors because the complex aerodynamic wake interactions can destabilize individual team members as well as the team. Furthermore, these aerodynamic effects are highly nonlinear and fast-paced, making them difficult to model and predict. To overcome these challenges, we present L1 KNODE-DW MPC, an adaptive, mixed expert learning based control framework that allows individual quadrotors to accurately track trajectories while adapting to time-varying aerodynamic interactions during formation flights. We evaluate L1 KNODE-DW MPC in two different three-quadrotor formations and show that it outperforms several MPC baselines. Our results show that the proposed framework is capable of enabling the three-quadrotor team to remain vertically aligned in close proximity throughout the flight. These findings show that the L1 adaptive module compensates for unmodeled disturbances most effectively when paired with an accurate dynamics model. A video showcasing our framework and the physical experiments is available here: https://youtu.be/9QX1Q5Ut9Rs
comment: 10 pages, 4 figures
Efficient Path Planning and Task Allocation Algorithm for Boolean Specifications
This paper presents a novel path-planning and task assignment algorithm for multi-robot systems that should fulfill a global Boolean specification. The proposed method is based on Integer Linear Programming (ILP) formulations, which are combined with structural insights from Petri nets to improve scalability and computational efficiency. By proving that the \emph{constraint matrix} is totally unimodular (TU) for certain classes of problems, the ILP formulation can be relaxed into a Linear Programming (LP) problem without losing the integrality of the solution. This relaxation eliminates complex combinatorial techniques, significantly reducing computational overhead and thus ensuring scalability for large-scale systems. Using the approach proposed in this paper, we can solve path-planning problems for teams of up to 500 robots. The method guarantees computational tractability, handles collision avoidance and reduces computational demands through iterative LP optimization techniques. Case studies demonstrate the efficiency of the algorithm in generating scalable, collision-free paths for large robot teams navigating in complex environments. While the conservative nature of collision avoidance introduces additional constraints, and thus, computational requirements, the solution remains practical and impactful for diverse applications. The algorithm is particularly applicable to real-world scenarios, including warehouse logistics where autonomous robots must efficiently coordinate tasks or search-and-rescue operations in various environments. This work contributes both theoretically and practically to scalable multi-robot path planning and task allocation, offering an efficient framework for coordinating autonomous agents in shared environments.
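Total unimodularity means every square submatrix of the constraint matrix has determinant -1, 0, or 1, which is what lets the LP relaxation keep integral vertex solutions when the right-hand side is integral. The brute-force check below makes the definition concrete on two tiny matrices. It is only a didactic sketch: the example matrices are assumptions chosen for illustration (a 2-robot, 2-task assignment structure and an odd-cycle counterexample), not the paper's Petri-net constraint matrix, and the enumeration is exponential, so it is unsuitable for real problem sizes.

```python
from itertools import combinations


def det(m):
    """Integer determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def is_totally_unimodular(a):
    """Check every square submatrix of `a` for determinant in {-1, 0, 1}."""
    rows, cols = len(a), len(a[0])
    for k in range(1, min(rows, cols) + 1):
        for r in combinations(range(rows), k):
            for c in combinations(range(cols), k):
                sub = [[a[i][j] for j in c] for i in r]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True


# Columns are x11, x12, x21, x22 (robot i assigned to task j).
# Rows: one constraint per robot, then one per task.
assignment = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]

# Edge-vertex incidence of a triangle: its determinant is 2, so it is not TU.
odd_cycle = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]

assignment_is_tu = is_totally_unimodular(assignment)
odd_cycle_is_tu = is_totally_unimodular(odd_cycle)
```

For structured matrices such as bipartite incidence matrices, total unimodularity is established by proof rather than enumeration, which is the route the abstract describes.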
Systems and Control (EESS)
Feature Matching-Based Gait Phase Prediction for Obstacle Crossing Control of Powered Transfemoral Prosthesis
For amputees with powered transfemoral prosthetics, navigating obstacles or complex terrain remains challenging. This study addresses this issue by using an inertial sensor on the sound ankle to guide obstacle-crossing movements. A genetic algorithm computes the optimal neural network structure to predict the required angles of the thigh and knee joints. A gait progression prediction algorithm determines the actuation angle index for the prosthetic knee motor, ultimately defining the necessary thigh and knee angles and gait progression. Results show that when the standard deviation of Gaussian noise added to the thigh angle data is less than 1, the method can effectively eliminate noise interference, achieving 100% accuracy in gait phase estimation under 150 Hz, with thigh angle prediction error being 8.71% and knee angle prediction error being 6.78%. These findings demonstrate the method's ability to accurately predict gait progression and joint angles, offering significant practical value for obstacle negotiation in powered transfemoral prosthetics.
comment: 6 pages, conference
Learning to Drive Safely with Hybrid Options
Out of the many deep reinforcement learning approaches for autonomous driving, only a few make use of the options (or skills) framework. That is surprising, as this framework is naturally suited for hierarchical control applications in general, and autonomous driving tasks in particular. Therefore, in this work the options framework is applied and tailored to autonomous driving tasks on highways. More specifically, we define dedicated options for longitudinal and lateral manoeuvres with embedded safety and comfort constraints. This way, prior domain knowledge can be incorporated into the learning process and the learned driving behaviour can be constrained more easily. We propose several setups for hierarchical control with options and derive practical algorithms following state-of-the-art reinforcement learning techniques. By separately selecting actions for longitudinal and lateral control, the introduced policies over combined and hybrid options obtain the same expressiveness and flexibility that human drivers have, while being easier to interpret than classical policies over continuous actions. Of all the investigated approaches, these flexible policies over hybrid options perform the best under varying traffic conditions, outperforming the baseline policies over actions.
Efficient Network Reconfiguration by Randomized Switching
We present an algorithm that efficiently computes nearly-optimal solutions to a class of combinatorial reconfiguration problems on weighted, undirected graphs. Inspired by societally relevant applications in networked infrastructure systems, these problems consist of simultaneously finding an unreweighted sparsified graph and nodal potentials that satisfy fixed demands, where the objective is to minimize some congestion criterion, e.g., a Laplacian quadratic form. These are mixed-integer nonlinear programming problems that are NP-hard in general. To circumvent these challenges, instead of solving for a single best configuration, the proposed randomized switching algorithm seeks to design a distribution of configurations that, when sampled, ensures that congestion concentrates around its optimum. We show that the proposed congestion metric is a generalized self-concordant function in the space of switching probabilities, which enables the use of efficient and simple conditional gradient methods. We implement our algorithm and show that it outperforms a state-of-the-art commercial mixed-integer second-order cone programming (MISOCP) solver by orders of magnitude over a large range of problem sizes.
Flatness-based trajectory planning for 3D overhead cranes with friction compensation and collision avoidance
This paper presents an optimal trajectory generation method for 3D overhead cranes by leveraging differential flatness. This framework enables the direct inclusion of complex physical and dynamic constraints, such as nonlinear friction and collision avoidance for both payload and rope. Our approach allows for aggressive movements by constraining payload swing only at the final point. A comparative simulation study validates our approach, demonstrating that neglecting dry friction leads to actuator saturation and collisions. The results show that friction modeling is a fundamental requirement for fast and safe crane trajectories.
comment: 8 pages, 11 figures
Analyzing Parametric Oscillator Ising Machines through the Kuramoto Lens
Networks of coupled nonlinear oscillators are emerging as powerful physical platforms for implementing Ising machines. Yet the relationship between parametric-oscillator implementations and traditional oscillator-based Ising machines remains underexplored. In this work, we develop a Kuramoto-style, canonical phase description of parametric oscillator Ising machines by starting from the Stuart-Landau oscillator model -- the canonical normal form near a Hopf bifurcation, and a natural reduced description for many parametric oscillator implementations such as the degenerate optical parametric oscillator (DOPO) among others. The resulting phase dynamics combine the usual phase-difference coupling observed in the standard Kuramoto model along with an intrinsic phase sum term that is generated when conjugate coupling is considered. Moreover, our formulation helps explain why explicit second-harmonic driving is unnecessary in parametric oscillators and also reveals how quasi-steady amplitude heterogeneity scales the original strength of the spin interaction with potentially adverse impacts on the solution quality. Our work helps develop a unifying view of the oscillator-based approach to designing Ising machines.
Contributions to Semialgebraic-Set-Based Stability Verification of Dynamical Systems with Neural-Network-Based Controllers
Neural-network-based controllers (NNCs) can represent complex, highly nonlinear control laws, but verifying the closed-loop stability of dynamical systems using them remains challenging. This work presents contributions to a state-of-the-art stability verification procedure for NNC-controlled systems which relies on semialgebraic-set-based input-output modeling to pose the search for a Lyapunov function as an optimization problem. Specifically, this procedure's conservatism when analyzing NNCs using transcendental activation functions and the restriction to feedforward NNCs are addressed by a) introducing novel semialgebraic activation functions that preserve key properties of common transcendental activations and b) proving compatibility of NNCs from the broader class of recurrent equilibrium networks (RENs) with this procedure. Furthermore, the indirect optimization of a local region of attraction (RoA) estimate using a restricted set of candidate Lyapunov functions is greatly improved via c) the introduction of a richer parameterization of candidate Lyapunov functions than previously reported and d) the formulation of novel semidefinite programs (SDPs) that directly optimize the resulting RoA estimate. The value of these contributions is highlighted in two numerical examples.
comment: Submitted to the IEEE for possible publication, 16 pages, 6 figures
Development of a Digital Twin for an Electric Vehicle Emulator: Modeling, Control, and Experimental Validation
This paper presents the development and validation of a digital twin for a scaled-down electric vehicle (EV) emulator, designed to replicate longitudinal vehicle dynamics under diverse operating conditions. The emulator integrates a separately excited DC motor (SEDCM), a four-quadrant DC-DC converter, a battery emulator, and a mechanical load emulator. The system models tractive effort, aerodynamic drag, and gradient resistance using Newton's second law. In contrast to conventional graphical modeling tools (e.g., block diagrams and bond graphs), the adopted Energetic Macroscopic Representation (EMR) framework offers clear advantages by explicitly representing energy interactions and facilitating the systematic derivation of control structures. A control strategy developed within this framework governs energy flow across the powertrain, enabling accurate speed control via armature voltage regulation. Experimental tests conducted on a Lucas-Nulle test bench show strong correlation with simulation results. The study also introduces a methodology to compute the maximum admissible vehicle mass - determined to be 13.5 kg for a 180 W motor operating at 1900 rpm - based on acceleration and slope constraints. Furthermore, a switching algorithm for the bidirectional converter ensures reliable four quadrant operation. Overall, the proposed framework provides a scalable and effective approach for EV emulation, control design, and energy management validation.
comment: 6 pages, Accepted at CODIT 2025 (Conference on Decision and Control in Intelligent Technology)
Mechanism-Guided Residual Lifting and Control Consistent Modeling for Pneumatic Drying Processes
Pneumatic drying processes in industries such as agriculture, chemicals, and pharmaceuticals are notoriously difficult to model and control due to multi-source disturbances, coupled stage dynamics, and significant measurement delays. Traditional modeling paradigms often fail to simultaneously deliver accuracy, interpretability, and closed-loop applicability. To address this challenge, this paper introduces a unified hybrid modeling framework, termed Physics-Guided Residual Lifting with Control-Consistent Correction, which integrates a transient mechanistic model with a stability-constrained data-driven component. The framework covers the complete process chain of drying, transport, and winnowing. On the mechanistic level, the model unifies mass transfer dynamics using the partial pressure difference of water vapor, incorporates water activity clamping and latent heat corrections for bound water, and ensures energy closure with moisture-dependent specific heat. On the data-driven level, we propose an orthogonal residual learning scheme. It leverages intermediate states from the mechanistic model as proxy variables to construct a physics-inspired dictionary, preventing parameter compensation and overfitting during ridge regression. Furthermore, to ensure suitability for predictive control, a Control-Consistent Extended Dynamic Mode Decomposition with stability constraints is employed to learn the residual dynamics, for which we provide boundedness proofs and stability guarantees. The framework was validated on 10 industrial batches, comprising 63,000 samples. On unseen test data, the hybrid model achieved a Mean Absolute Error of 0.016% for outlet moisture and 0.015 °C for outlet temperature, with values improving to 0.986 and 0.995, respectively. The resulting prediction residuals exhibit white-noise characteristics, with significantly reduced spectral energy at low frequencies.
comment: 6 figs, 4 tables
An N-of-1 Artificial Intelligence Ecosystem for Precision Medicine
Artificial intelligence in medicine is built to serve the average patient. By minimizing error across large datasets, most systems deliver strong aggregate accuracy yet falter at the margins: patients with rare variants, multimorbidity, or underrepresented demographics. This average patient fallacy erodes both equity and trust. We propose a different design: a multi-agent ecosystem for N-of-1 decision support. In this environment, agents clustered by organ systems, patient populations, and analytic modalities draw on a shared library of models and evidence synthesis tools. Their results converge in a coordination layer that weighs reliability, uncertainty, and data density before presenting the clinician with a decision-support packet: risk estimates bounded by confidence ranges, outlier flags, and linked evidence. Validation shifts from population averages to individual reliability, measured by error in low-density regions, calibration in the small, and risk--coverage trade-offs. Anticipated challenges include computational demands, automation bias, and regulatory fit, addressed through caching strategies, consensus checks, and adaptive trial frameworks. By moving from monolithic models to orchestrated intelligence, this approach seeks to align medical AI with the first principle of medicine: care that is transparent, equitable, and centered on the individual.
comment: This study has been supported by grants from the National Institutes of Health: The National Institute on Aging R01AG074372 and The National Institute of Allergy and Infectious Diseases R01AI165535
Survey and Tutorial of Reinforcement Learning Methods in Process Systems Engineering
Sequential decision making under uncertainty is central to many Process Systems Engineering (PSE) challenges, where traditional methods often face limitations related to controlling and optimizing complex and stochastic systems. Reinforcement Learning (RL) offers a data-driven approach to derive control policies for such challenges. This paper presents a survey and tutorial on RL methods, tailored for the PSE community. We deliver a tutorial on RL, covering fundamental concepts and key algorithmic families including value-based, policy-based and actor-critic methods. Subsequently, we survey existing applications of these RL techniques across various PSE domains, such as in fed-batch and continuous process control, process optimization, and supply chains. We conclude with PSE focused discussion of specialized techniques and emerging directions. By synthesizing the current state of RL algorithm development and implications for PSE this work identifies successes, challenges, trends, and outlines avenues for future research at the interface of these fields.
A comparison between joint and dual UKF implementations for state estimation and leak localization in water distribution networks
The sustainability of modern cities highly depends on efficient water distribution management, including effective pressure control and leak detection and localization. Accurate information about the network hydraulic state is therefore essential. This article presents a comparison between two data-driven state estimation methods based on the Unscented Kalman Filter (UKF), fusing pressure, demand and flow data for head and flow estimation. One approach uses a joint state vector with a single estimator, while the other uses a dual-estimator scheme. We analyse their main characteristics, discussing differences, advantages and limitations, and compare them theoretically in terms of accuracy and complexity. Finally, we show several estimation results for the L-TOWN benchmark, allowing us to discuss their properties in a real implementation.
comment: This work has been submitted to ECC2026 for review. It has 7 pages and 2 figures
Sample-based Moving Horizon Estimation
In this paper, we propose a sample-based moving horizon estimation (MHE) scheme for general nonlinear systems to estimate the current system state using irregularly and/or infrequently available measurements. The cost function of the MHE optimization problem is suitably designed to accommodate these irregular output sequences. We also establish that, under a suitable sample-based detectability condition known as sample-based incremental input/output-to-state stability (i-IOSS), the proposed sample-based MHE achieves robust global exponential stability (RGES). Additionally, for the case of linear systems, we draw connections between sample-based observability and sample-based i-IOSS. This demonstrates that previously established conditions for linear systems to be sample-based observable can be utilized to verify or design sampling strategies that satisfy the conditions to guarantee RGES of the sample-based MHE. Finally, the effectiveness of the proposed sample-based MHE is illustrated through a simulation example.
Improved Accuracy of Robot Localization Using 3-D LiDAR in a Hippocampus-Inspired Model
Boundary Vector Cells (BVCs) are a class of neurons in the brains of vertebrates that encode environmental boundaries at specific distances and allocentric directions, playing a central role in forming place fields in the hippocampus. Most computational BVC models are restricted to two-dimensional (2D) environments, making them prone to spatial ambiguities in the presence of horizontal symmetries in the environment. To address this limitation, we incorporate vertical angular sensitivity into the BVC framework, thereby enabling robust boundary detection in three dimensions, and leading to significantly more accurate spatial localization in a biologically-inspired robot model. The proposed model processes LiDAR data to capture vertical contours, thereby disambiguating locations that would be indistinguishable under a purely 2D representation. Experimental results show that in environments with minimal vertical variation, the proposed 3D model matches the performance of a 2D baseline; yet, as 3D complexity increases, it yields substantially more distinct place fields and markedly reduces spatial aliasing. These findings show that adding a vertical dimension to BVC-based localization can significantly enhance navigation and mapping in real-world 3D spaces while retaining performance parity in simpler, near-planar scenarios.
comment: 8 pages, 9 figures, Presented at the 2025 International Joint Conference on Neural Networks, Rome, July 2025
Defect Mitigation for Robot Arm-based Additive Manufacturing Utilizing Intelligent Control and IOT
This paper presents an integrated robotic fused deposition modeling additive manufacturing system featuring closed-loop thermal control and intelligent in-situ defect correction using a 6-degree-of-freedom robotic arm and an Oak-D camera. The robot arm end effector was modified to mount an E3D hotend thermally regulated by an IoT microcontroller, enabling precise temperature control through real-time feedback. The filament extrusion system was synchronized with robotic motion, coordinated via ROS2, ensuring consistent deposition along complex trajectories. A vision system based on OpenCV detects layer-wise defect positions, commanding autonomous re-extrusion at identified sites. Experimental validation demonstrated successful defect mitigation in printing operations. The integrated system effectively addresses the challenges of real-time quality assurance. Inverse kinematics were used for motion planning, while homography transformations corrected camera perspectives for accurate defect localization. The intelligent system successfully mitigated surface anomalies without interrupting the print process. By combining real-time thermal regulation, motion control, and intelligent defect detection & correction, this architecture establishes a scalable and adaptive robotic additive manufacturing framework suitable for aerospace, biomedical, and industrial applications.
comment: This paper has been accepted at ASME 2025 International Mechanical Engineering Congress and Exposition (IMECE 2025)
A Hamilton-Jacobi Reachability Framework with Soft Constraints for Safety-Critical Systems
Traditional reachability methods provide formal guarantees of safety under bounded disturbances. However, they strictly enforce state constraints as inviolable, which can result in overly conservative or infeasible solutions in complex operational scenarios. Many constraints encountered in practice, such as bounds on battery state of charge in electric vehicles, recommended speed envelopes, and comfort constraints in passenger-carrying vehicles, are inherently soft. Soft constraints allow temporary violations within predefined safety margins to accommodate uncertainty and competing operational demands, albeit at a cost such as increased wear or higher operational expenses. This paper introduces a novel soft-constrained reachability framework that extends Hamilton-Jacobi reachability analysis for the formal verification of safety-critical systems subject to both hard and soft constraints. Specifically, the framework characterizes a subset of the state space, referred to as the soft-constrained reach-avoid set, from which the system is guaranteed to reach a desired set safely, under worst-case disturbances, while ensuring that cumulative soft-constraint violations remain within a user-specified budget. The framework comprises two principal components: (i) an augmented-state model with an auxiliary budget state that tracks soft-constraint violations, and (ii) a regularization-based approximation of the discontinuous Hamilton-Jacobi value function associated with the reach-avoid differential game studied herein. The effectiveness of the proposed framework is demonstrated through numerical examples involving the landing of a simple point-mass model and a fixed-wing aircraft executing an emergency descent, both under wind disturbances. The simulation results validate the framework's ability to simultaneously manage both hard and soft constraints in safety-critical settings.
Delay Tolerant Control for Autonomous Driving Using CDOB
With the rapid growth of autonomous vehicle technologies, effective path-tracking control has become a critical component in ensuring safety and efficiency in complex traffic scenarios. When a high-level decision-making agent generates a collision-free path, a robust low-level controller is required to precisely follow this trajectory. However, connected autonomous vehicles (CAVs) are inherently affected by communication delays and computation delays, which significantly degrade the performance of conventional controllers such as PID or other more advanced controllers like disturbance observers (DOB). While DOB-based designs have shown effectiveness in rejecting disturbances under nominal conditions, their performance deteriorates considerably in the presence of unknown time delays. To address this challenge, this paper proposes a delay-tolerant communication disturbance observer (CDOB) framework for path-tracking control in delayed systems. The proposed CDOB compensates for the adverse effects of time delays, maintaining accurate trajectory tracking even under uncertain and varying delay conditions. It is shown through a simulation study that the proposed control architecture maintains close alignment with the reference trajectory across various scenarios, including single lane change, double lane change, and Elastic Band generated collision avoidance paths under various time delays. Simulation results further demonstrate that the proposed method outperforms conventional approaches in both tracking accuracy and delay robustness, making it well suited for autonomous driving applications.
Decentralized Merging Control of Connected and Automated Vehicles to Enhance Safety and Energy Efficiency using Control Barrier Functions
This paper presents a decentralized Control Barrier Function (CBF) based approach for highway merging of Connected and Automated Vehicles (CAVs). In this control algorithm, each "host" vehicle negotiates with other agents in a control zone of the highway network, and enacts its own action, to perform safe and energy-efficient merge maneuvers. It uses predictor-corrector loops within the robust CBF setting for negotiation and to reconcile disagreements that may arise. There is no explicit order of vehicles and no priority. A notable feature is the absence of gridlocks due to instability of the inter-agent system. Results from Monte Carlo simulations show significant improvement in the system-wide energy efficiency and traffic flow compared to a first-in-first-out approach, as well as enhanced robustness of the proposed decentralized controller compared to its centralized counterpart.
comment: This work has been submitted to a conference for possible publication and is under review. Paper summary: 8 pages, 5 figures, 2 tables
Deep Reinforcement Learning Approach to QoS-Aware Load Balancing in 5G Cellular Networks under User Mobility and Observation Uncertainty
Efficient mobility management and load balancing are critical to sustaining Quality of Service (QoS) in dense, highly dynamic 5G radio access networks. We present a deep reinforcement learning framework based on Proximal Policy Optimization (PPO) for autonomous, QoS-aware load balancing implemented end-to-end in a lightweight, pure-Python simulation environment. The control problem is formulated as a Markov Decision Process in which the agent periodically adjusts Cell Individual Offset (CIO) values to steer user-cell associations. A multi-objective reward captures key performance indicators (aggregate throughput, latency, jitter, packet loss rate, Jain's fairness index, and handover count), so the learned policy explicitly balances efficiency and stability under user mobility and noisy observations. The PPO agent uses an actor-critic neural network trained from trajectories generated by the Python simulator with configurable mobility (e.g., Gauss-Markov) and stochastic measurement noise. Across 500+ training episodes and stress tests with increasing user density, the PPO policy consistently improves KPI trends (higher throughput and fairness, lower delay, jitter, packet loss, and handovers) and exhibits rapid, stable convergence. Comparative evaluations show that PPO outperforms rule-based ReBuHa and A3 as well as the learning-based CDQL baseline across all KPIs while maintaining smoother learning dynamics and stronger generalization as load increases. These results indicate that PPO's clipped policy updates and advantage-based training yield robust, deployable control for next-generation RAN load balancing using an entirely Python-based toolchain.
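Among the KPIs listed above, Jain's fairness index has a standard closed form, J = (sum x_i)^2 / (n * sum x_i^2), which ranges from 1/n (one user gets everything) to 1 (perfectly equal shares). The sketch below computes it for hypothetical per-user throughput values; the function name and the sample numbers are illustrative and do not come from the paper, and the paper's reward weighting is not reproduced here.

```python
def jain_fairness(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in the range [1/n, 1]."""
    n = len(throughputs)
    if n == 0:
        raise ValueError("need at least one throughput value")
    sum_sq = sum(x * x for x in throughputs)
    if sum_sq == 0:
        raise ValueError("index is undefined when all throughputs are zero")
    total = sum(throughputs)
    return (total * total) / (n * sum_sq)


# Hypothetical per-user throughputs (arbitrary units).
equal_share = jain_fairness([5.0, 5.0, 5.0, 5.0])    # all users served equally
single_user = jain_fairness([10.0, 0.0, 0.0, 0.0])   # one user takes everything
```

A load-balancing policy that shifts users away from an overloaded cell would be expected to move this index toward 1, which is one reason it is a natural term in a multi-objective reward.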
NeuroDOB: A Deep Neural Observer-Based Controller for Vehicle Lateral Dynamics
This paper proposes NeuroDOB, a deep neural network based observer controller for vehicle lateral dynamics, which replaces the conventional disturbance observer (DOB) with a deep neural network (DNN) to enhance personalized lateral control. Unlike conventional DOBs that compensate for general disturbances such as road friction variation and crosswind, NeuroDOB explicitly addresses unmodeled vehicle dynamics and driver-specific behaviors by learning the steering compensation signal from driver-in-the-loop simulations using CarSim's embedded controller as a surrogate driver. The proposed architecture integrates NeuroDOB with a linear quadratic regulator (LQR), where the DNN outputs a delta error correction added to the baseline LQR steering input to produce the final control command. Input features to the DNN include lateral position and yaw angle errors, and the LQR control input. Experimental validation using a lateral dynamic bicycle model within CarSim demonstrates that NeuroDOB effectively adapts to individual driving habits, improving lateral control performance beyond what conventional LQR controllers achieve. The results indicate the potential of deep neural network based observer to enable personalized and adaptive autonomous vehicle control. In cognitive terms, the proposed architecture can be viewed as a dual-system control structure. The baseline LQR corresponds to System 1, a model-based, fast, and analytic reasoning layer ensuring stability. The NeuroDOB acts as System 2, a reflective, data-driven layer that learns compensation from experience and corrects the analytical bias of System 1. Together, they form an integrated decision process analogous to human intuition-reflection interaction, enabling both stability and adaptability in lateral control.
comment: 12 pages, 16 figures
Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning
We study the problem of long-term (multiple days) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), focusing on the Douro river representative use-case. We propose an energy- and communication-efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates direction and speed for each AUV. Simulations using the Delft3D ocean model demonstrate that our method consistently outperforms both single- and multi-agent benchmarks, with scaling the number of agents improving both mean squared error (MSE) and operational endurance. In some instances, our algorithm demonstrates that doubling the number of AUVs can more than double endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. Our learned policies generalize across unseen seasonal regimes over different months and years, demonstrating promise for future developments of data-driven long-term monitoring of dynamic plume environments.
Assessment of Power System Stability Considering Multiple Time-Scale Dynamics: Insights into Hopf Bifurcations in Presence of GFL and GFM IBRs
Real power systems exhibit dynamics that evolve across a wide range of time scales, from very fast to very slow phenomena. Historically, incorporating these wide-ranging dynamics into a single model has been impractical. As a result, power engineers rely on time-scale decomposition to simplify models. When fast phenomena are evaluated, slow dynamics are neglected (assumed stable), and vice versa. This paper challenges this paradigm by showing the importance of assessing power system stability while considering multiple time scales simultaneously. Using the concept of Hopf bifurcations, it exemplifies instability issues that would be missed if multi-time-scale dynamics are not considered. Although this work employs both grid-following and grid-forming inverter-based resource models, it is not a direct comparison. Instead, it presents a case study demonstrating how one technology can complement the other from a multi time-scale dynamics perspective.
comment: 7 pages
Two-Stage Learning of Stabilizing Neural Controllers via Zubov Sampling and Iterative Domain Expansion NeurIPS 2025
Learning-based neural network (NN) control policies have shown impressive empirical performance. However, obtaining stability guarantees and estimates of the region of attraction of these learned neural controllers is challenging due to the lack of stable and scalable training and verification algorithms. Although previous works in this area have achieved great success, much conservatism remains in their frameworks. In this work, we propose a novel two-stage training framework to jointly synthesize a controller and a Lyapunov function for continuous-time systems. By leveraging a Zubov-inspired region of attraction characterization to directly estimate stability boundaries, we propose a novel training-data sampling strategy and a domain-updating mechanism that significantly reduces the conservatism in training. Moreover, unlike existing works on continuous-time systems that rely on an SMT solver to formally verify the Lyapunov condition, we extend the state-of-the-art neural network verifier $\alpha,\!\beta$-CROWN with the capability of performing automatic bound propagation through the Jacobian of dynamical systems and a novel verification scheme that avoids expensive bisection. To demonstrate the effectiveness of our approach, we conduct numerical experiments by synthesizing and verifying controllers on several challenging nonlinear systems across multiple dimensions. We show that our training can yield regions of attraction with volume $5 - 1.5\cdot 10^{5}$ times larger compared to the baselines, and our verification on continuous systems can be up to $40-10{,}000$ times faster compared to the traditional SMT solver dReal. Our code is available at https://github.com/Verified-Intelligence/Two-Stage_Neural_Controller_Training.
comment: NeurIPS 2025
MathBode: Understanding LLM Reasoning with Dynamical Systems
This paper presents MathBode, a dynamic diagnostic for mathematical reasoning in large language models (LLMs). Instead of one-shot accuracy, MathBode treats each parametric problem as a system: we drive a single parameter sinusoidally and fit first-harmonic responses of model outputs and exact solutions. This yields interpretable, frequency-resolved metrics -- gain (amplitude tracking) and phase (lag) -- that form Bode-style fingerprints. Across five closed-form families (linear solve, ratio/saturation, compound interest, 2x2 linear systems, similar triangles), the diagnostic surfaces systematic low-pass behavior and growing phase lag that accuracy alone obscures. We compare several models against a symbolic baseline that calibrates the instrument ($G \approx 1$, $\phi \approx 0$). Results separate frontier from mid-tier models on dynamics, providing a compact, reproducible protocol that complements standard benchmarks with actionable measurements of reasoning fidelity and consistency. We open-source the dataset and code to enable further research and adoption.
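The gain and phase measurement described above can be illustrated with a minimal sketch. It assumes uniformly spaced samples covering an integer number of drive periods, so that projecting onto sine and cosine recovers the first harmonic. The function names (`first_harmonic`, `gain_and_phase`) and the synthetic "model" response are hypothetical and are not taken from the MathBode code or data.

```python
import math

def first_harmonic(samples, times, omega):
    """Fit y ~ mean + A*sin(omega*t + phi) by projecting onto cos and sin.

    Assumes uniform sampling over an integer number of periods, in which
    case this projection equals the least-squares first-harmonic fit.
    """
    n = len(samples)
    mean = sum(samples) / n
    a = 2.0 / n * sum((y - mean) * math.cos(omega * t) for y, t in zip(samples, times))
    b = 2.0 / n * sum((y - mean) * math.sin(omega * t) for y, t in zip(samples, times))
    # A*sin(wt + phi) = (A sin phi) cos(wt) + (A cos phi) sin(wt)
    return math.hypot(a, b), math.atan2(a, b)

def gain_and_phase(model_out, truth_out, times, omega):
    """Gain = amplitude ratio; phase = model phase minus truth phase, wrapped to [-pi, pi)."""
    amp_m, phi_m = first_harmonic(model_out, times, omega)
    amp_t, phi_t = first_harmonic(truth_out, times, omega)
    lag = (phi_m - phi_t + math.pi) % (2 * math.pi) - math.pi
    return amp_m / amp_t, lag

# Synthetic illustration only: the exact solution swings with amplitude 2,
# while a made-up model tracks 80% of the amplitude and lags by 0.3 rad.
n, cycles = 64, 4
omega = 2 * math.pi * cycles / n
times = list(range(n))
truth = [5.0 + 2.0 * math.sin(omega * t) for t in times]
model = [5.0 + 1.6 * math.sin(omega * t - 0.3) for t in times]

gain, lag = gain_and_phase(model, truth, times, omega)
print(f"gain={gain:.3f}, phase={lag:.3f} rad")
```

A perfectly tracking solver would give a gain near 1 and a phase near 0, which is the role the symbolic baseline plays in calibrating the instrument.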
Learning Wireless Interference Patterns: Decoupled GNN for Throughput Prediction in Heterogeneous Multi-Hop p-CSMA Networks
The p-persistent CSMA protocol is central to random-access MAC analysis, but predicting saturation throughput in heterogeneous multi-hop wireless networks remains a hard problem. Simplified models that assume a single, shared interference domain can underestimate throughput by 48-62% in sparse topologies. Exact Markov-chain analyses are accurate but scale exponentially in computation time, making them impractical for large networks. These computational barriers motivate structural machine learning approaches like GNNs for scalable throughput prediction in general network topologies. Yet off-the-shelf GNNs struggle here: a standard GCN yields 63.94% normalized mean absolute error (NMAE) on heterogeneous networks because symmetric normalization conflates a node's direct interference with higher-order, cascading effects that pertain to how interference propagates over the network graph. Building on these insights, we propose the Decoupled Graph Convolutional Network (D-GCN), a novel architecture that explicitly separates processing of a node's own transmission probability from neighbor interference effects. D-GCN replaces mean aggregation with learnable attention, yielding interpretable, per-neighbor contribution weights while capturing complex multihop interference patterns. D-GCN attains 3.3% NMAE, outperforms strong baselines, remains tractable even when exact analytical methods become computationally infeasible, and enables gradient-based network optimization that achieves within 1% of theoretical optima.
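To see why a single shared interference domain can underestimate throughput in sparse topologies, consider a deliberately crude slotted toy model. It assumes each node transmits independently with probability p in every slot and succeeds when no interfering node transmits in the same slot. This ignores carrier sensing and the cascading effects the abstract refers to, so it is not the paper's analysis; the function names and the four-node line topology are hypothetical.

```python
def single_domain_throughput(p):
    """Per-node success probability if every node interferes with every other."""
    out = []
    for i, p_i in enumerate(p):
        idle_others = 1.0
        for j, p_j in enumerate(p):
            if j != i:
                idle_others *= 1.0 - p_j
        out.append(p_i * idle_others)
    return out

def neighbor_only_throughput(p, neighbors):
    """Same toy model, but only direct neighbors in the conflict graph interfere."""
    out = []
    for i, p_i in enumerate(p):
        idle_neighbors = 1.0
        for j in neighbors[i]:
            idle_neighbors *= 1.0 - p[j]
        out.append(p_i * idle_neighbors)
    return out

# Four nodes on a line: 0 - 1 - 2 - 3 (illustrative topology).
p = [0.2, 0.2, 0.2, 0.2]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

shared = single_domain_throughput(p)
sparse = neighbor_only_throughput(p, neighbors)
print("single domain :", [round(x, 4) for x in shared])
print("neighbors only:", [round(x, 4) for x in sparse])
```

Even in this simplified setting, treating all nodes as mutual interferers counts collisions that cannot occur on the line, which lowers the predicted throughput for every node.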
Flight-Ready Precise and Robust Carrier-Phase GNSS Navigation Software for Distributed Space Systems
This paper presents the full requirements analysis, design, development, and testing of high-precision navigation flight software for Distributed Space Systems (DSS) using Carrier Phase Differential GNSS (CDGNSS). Five main contributions are made. First, a survey of flown and upcoming DSS missions with stringent precision requirements is conducted, from which a thorough requirements analysis is distilled to guide development and testing. Second, a real-time navigation functional architecture is designed, and adopts a sparse and regularized Consider Kalman Filter with options for numerical stability in-flight. The filter rigorously accounts for uncertainties in process noise, measurement noise, and biases. It tracks float ambiguities with integer resolution where possible. The covariance correlation structure is preserved under all navigation modes, including contingencies and outages. Third, a lightweight, memoryless Fault Detection, Isolation, and Recovery (FDIR) module is developed to guard against anomalous measurements, providing statistical screening and ensuring robust navigation. Fourth, the software architecture is proposed for ease of integration, with strategies presented for modularity and computational efficiency tailored to constrained flight systems. Fifth, a comprehensive test campaign is conducted, mapped to a requirements verification matrix, spanning unit, interface, software-in-the-loop, and real-time hardware-in-the-loop tests, emphasizing gradual test fidelity for efficient fault isolation. Finally, flight-like results are demonstrated using the VISORS mission, due to the generalizability of the VISORS navigation operations, and the stringency which demands sub-centimeter relative position and sub-millimeter-per-second velocity accuracy. This architecture aims to serve as a reference for next-generation DSS missions adopting CDGNSS.
A Volumetric Privacy Measure for Dynamical Systems With Bounded Disturbance
In this paper, we first present a volumetric privacy measure for dynamical systems with bounded disturbances, wherein the states of the system contain private information and an adversary with access to sensor measurements attempts to infer the set of potential values of the private information. Under the proposed privacy measure, the volume of the uncertainty set of the adversary given the sensor measurements is considered as the privacy level of the system. We next characterize the time evolution of the proposed privacy measure and study its properties for a particular system with both public and private states, where a set containing the public state is shared as the observation. Approximate set-membership estimation techniques are developed to compute the private-state uncertainty set, and the properties of the privacy measure are analyzed, demonstrating that the uncertainty reduction of the adversary is bounded by the information gain from the observation set. Furthermore, an optimization-based privacy filter design problem is formulated, employing randomization and linear programming to enhance the privacy level. The effectiveness of the proposed approach is validated through a production-inventory case study. Results show that the optimal privacy filter significantly improves robustness against inference attacks and outperforms two baseline mechanisms based on additive noise and quantization.
A cutting-surface consensus approach for distributed robust optimization of multi-agent systems
A novel and fully distributed optimization method is proposed for the distributed robust convex program (DRCP) over a time-varying unbalanced directed network under the uniformly jointly strongly connected (UJSC) assumption. Firstly, an approximated DRCP (ADRCP) is introduced by discretizing the semi-infinite constraints into a finite number of inequality constraints to ensure tractability and by restricting the right-hand side of the constraints with a positive parameter to ensure that a feasible solution for the DRCP can be obtained. This problem is iteratively solved by a distributed projected gradient algorithm proposed in this paper, which is based on epigraphic reformulation and gradient projection operations. Secondly, a cutting-surface consensus approach is proposed for locating an approximately optimal consensus solution of the DRCP with guaranteed local feasibility for each agent. This approach is based on iteratively approximating the DRCP by successively reducing the restriction parameter of the right-hand constraints and adding the cutting surfaces to the existing finite set of constraints. Thirdly, to ensure finite-time termination of the distributed optimization, a distributed termination algorithm is developed based on consensus and zeroth-order stopping conditions under UJSC graphs. Fourthly, it is proved that the cutting-surface consensus approach terminates finitely and yields a feasible and approximately optimal solution for each agent. Finally, the effectiveness of the approach is illustrated through a numerical example.
comment: 16 pages, 8 figures, published to IEEE TAC
Operational Risks in Grid Integration of Large Data Center Loads: Characteristics, Stability Assessments, and Sensitivity Studies
This paper investigates the dynamic interactions between large-scale data centers and the power grid, focusing on reliability challenges arising from sudden fluctuations in demand. With the rapid growth of AI-driven workloads, such fluctuations, along with fast ramp patterns, are expected to exacerbate stressed grid conditions and system instabilities. We consider a few open-source AI data center consumption profiles from the MIT supercloud datasets and generate a few experimental HPC job-distribution-based inference profiles. Subsequently, we develop analytical methodologies for real-time assessment of grid stability, focusing on both transient and small-signal stability assessments. Energy-flow-like metrics for nonlinear transient stability, formulated by computing localized data center bus kinetic-like flows and coupling interactions with neighboring buses over varying time windows, help provide operators with real-time assessments of the regional grid stress in the data center hubs. On the other hand, small-signal stability metrics, constructed from analytical state matrices under variable operating conditions during a fast ramping period, enable snapshot-based assessments of data center load fluctuations and provide enhanced observability into evolving grid conditions. By quantifying the stability impacts of large data center clusters, studies conducted in the modified IEEE benchmark 68-bus model support improved operator situational awareness to capture risks in reliable integration of large data center loads.
comment: 13 pages, 8 figures, 3 tables
Online Adaptation for Flying Quadrotors in Tight Formations
The task of flying in tight formations is challenging for teams of quadrotors because the complex aerodynamic wake interactions can destabilize individual team members as well as the team. Furthermore, these aerodynamic effects are highly nonlinear and fast-paced, making them difficult to model and predict. To overcome these challenges, we present L1 KNODE-DW MPC, an adaptive, mixed expert learning based control framework that allows individual quadrotors to accurately track trajectories while adapting to time-varying aerodynamic interactions during formation flights. We evaluate L1 KNODE-DW MPC in two different three-quadrotor formations and show that it outperforms several MPC baselines. Our results show that the proposed framework is capable of enabling the three-quadrotor team to remain vertically aligned in close proximity throughout the flight. These findings show that the L1 adaptive module compensates for unmodeled disturbances most effectively when paired with an accurate dynamics model. A video showcasing our framework and the physical experiments is available here: https://youtu.be/9QX1Q5Ut9Rs
comment: 10 pages, 4 figures
Efficient Path Planning and Task Allocation Algorithm for Boolean Specifications
This paper presents a novel path-planning and task assignment algorithm for multi-robot systems that should fulfill a global Boolean specification. The proposed method is based on Integer Linear Programming (ILP) formulations, which are combined with structural insights from Petri nets to improve scalability and computational efficiency. By proving that the constraint matrix is totally unimodular (TU) for certain classes of problems, the ILP formulation can be relaxed into a Linear Programming (LP) problem without losing the integrality of the solution. This relaxation eliminates complex combinatorial techniques, significantly reducing computational overhead and thus ensuring scalability for large-scale systems. Using the approach proposed in this paper, we can solve path-planning problems for teams of up to 500 robots. The method guarantees computational tractability, handles collision avoidance and reduces computational demands through iterative LP optimization techniques. Case studies demonstrate the efficiency of the algorithm in generating scalable, collision-free paths for large robot teams navigating in complex environments. While the conservative nature of collision avoidance introduces additional constraints, and thus computational requirements, the solution remains practical and impactful for diverse applications. The algorithm is particularly applicable to real-world scenarios, including warehouse logistics where autonomous robots must efficiently coordinate tasks or search-and-rescue operations in various environments. This work contributes both theoretically and practically to scalable multi-robot path planning and task allocation, offering an efficient framework for coordinating autonomous agents in shared environments.
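The role of total unimodularity can be made concrete with a small check. A matrix is TU when every square submatrix has determinant -1, 0, or 1; for such a matrix and an integer right-hand side, the vertices of the LP feasible region are integral, which is why the relaxation keeps integer solutions. The brute-force test below is a sketch for tiny matrices only (its cost grows exponentially), and the two example matrices are illustrative, not the paper's constraint matrix.

```python
from itertools import combinations

def det(m):
    """Integer determinant by Laplace expansion (fine for very small matrices)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def is_totally_unimodular(a):
    """Check that every square submatrix has determinant in {-1, 0, 1}."""
    rows, cols = len(a), len(a[0])
    for k in range(1, min(rows, cols) + 1):
        for r in combinations(range(rows), k):
            for c in combinations(range(cols), k):
                sub = [[a[i][j] for j in c] for i in r]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True

# Node-arc incidence matrix of a directed cycle 0 -> 1 -> 2 -> 0
# (rows are nodes, columns are arcs; -1 at the tail, +1 at the head).
incidence = [
    [-1, 0, 1],
    [1, -1, 0],
    [0, 1, -1],
]

# Edge-node incidence of an undirected triangle: its determinant is 2.
odd_cycle = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]

print(is_totally_unimodular(incidence))  # incidence matrices of directed graphs are TU
print(is_totally_unimodular(odd_cycle))
```

Incidence matrices of directed graphs are a classic TU family, which is one reason flow-like formulations often admit exact LP relaxations.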
Robotics
Track, Inpaint, Resplat: Subject-driven 3D and 4D Generation with Progressive Texture Infilling NeurIPS 2025
Current 3D/4D generation methods are usually optimized for photorealism, efficiency, and aesthetics. However, they often fail to preserve the semantic identity of the subject across different viewpoints. Adapting generation methods with one or a few images of a specific subject (also known as Personalization or Subject-driven generation) allows generating visual content that aligns with the identity of the subject. However, personalized 3D/4D generation is still largely underexplored. In this work, we introduce TIRE (Track, Inpaint, REsplat), a novel method for subject-driven 3D/4D generation. It takes an initial 3D asset produced by an existing 3D generative model as input and uses video tracking to identify the regions that need to be modified. Then, we adopt a subject-driven 2D inpainting model for progressively infilling the identified regions. Finally, we resplat the modified 2D multi-view observations back to 3D while still maintaining consistency. Extensive experiments demonstrate that our approach significantly improves identity preservation in 3D/4D generation compared to state-of-the-art methods. Our project website is available at https://zsh2000.github.io/track-inpaint-resplat.github.io/.
comment: NeurIPS 2025, 38 pages, 22 figures
UrbanVLA: A Vision-Language-Action Model for Urban Micromobility
Urban micromobility applications, such as delivery robots, demand reliable navigation across large-scale urban environments while following long-horizon route instructions. This task is particularly challenging due to the dynamic and unstructured nature of real-world city areas, yet most existing navigation methods remain tailored to short-scale and controllable scenarios. Effective urban micromobility requires two complementary levels of navigation skills: low-level capabilities such as point-goal reaching and obstacle avoidance, and high-level capabilities, such as route-visual alignment. To this end, we propose UrbanVLA, a route-conditioned Vision-Language-Action (VLA) framework designed for scalable urban navigation. Our method explicitly aligns noisy route waypoints with visual observations during execution, and subsequently plans trajectories to drive the robot. To enable UrbanVLA to master both levels of navigation, we employ a two-stage training pipeline. The process begins with Supervised Fine-Tuning (SFT) using simulated environments and trajectories parsed from web videos. This is followed by Reinforcement Fine-Tuning (RFT) on a mixture of simulation and real-world data, which enhances the model's safety and adaptability in real-world settings. Experiments demonstrate that UrbanVLA surpasses strong baselines by more than 55% in the SocialNav task on MetaUrban. Furthermore, UrbanVLA achieves reliable real-world navigation, showcasing both scalability to large-scale urban environments and robustness against real-world uncertainties.
RobotArena $\infty$: Scalable Robot Benchmarking via Real-to-Sim Translation
The pursuit of robot generalists - instructable agents capable of performing diverse tasks across diverse environments - demands rigorous and scalable evaluation. Yet real-world testing of robot policies remains fundamentally constrained: it is labor-intensive, slow, unsafe at scale, and difficult to reproduce. Existing simulation benchmarks are similarly limited, as they train and test policies within the same synthetic domains and cannot assess models trained from real-world demonstrations or alternative simulation environments. As policies expand in scope and complexity, these barriers only intensify, since defining "success" in robotics often hinges on nuanced human judgments of execution quality. In this paper, we introduce a new benchmarking framework that overcomes these challenges by shifting VLA evaluation into large-scale simulated environments augmented with online human feedback. Leveraging advances in vision-language models, 2D-to-3D generative modeling, and differentiable rendering, our approach automatically converts video demonstrations from widely used robot datasets into simulated counterparts. Within these digital twins, we assess VLA policies using both automated VLM-guided scoring and scalable human preference judgments collected from crowdworkers, transforming human involvement from tedious scene setup, resetting, and safety supervision into lightweight preference comparisons. To measure robustness, we systematically perturb simulated environments along multiple axes, such as textures and object placements, stress-testing policy generalization under controlled variation. The result is a continuously evolving, reproducible, and scalable benchmark for real-world trained robot manipulation policies, addressing a critical missing capability in today's robotics landscape.
comment: Website: https://robotarenainf.github.io
DPGLA: Bridging the Gap between Synthetic and Real Data for Unsupervised Domain Adaptation in 3D LiDAR Semantic Segmentation IROS
Annotating real-world LiDAR point clouds for use in intelligent autonomous systems is costly. To overcome this limitation, self-training-based Unsupervised Domain Adaptation (UDA) has been widely used to improve point cloud semantic segmentation by leveraging synthetic point cloud data. However, we argue that existing methods do not effectively utilize unlabeled data, as they rely on predefined or fixed confidence thresholds, resulting in suboptimal performance. In this paper, we propose a Dynamic Pseudo-Label Filtering (DPLF) scheme to enhance real data utilization in point cloud UDA semantic segmentation. Additionally, we design a simple and efficient Prior-Guided Data Augmentation Pipeline (PG-DAP) to mitigate domain shift between synthetic and real-world point clouds. Finally, we utilize a data mixing consistency loss to push the model to learn context-free representations. We implement and thoroughly evaluate our approach through extensive comparisons with state-of-the-art methods. Experiments on two challenging synthetic-to-real point cloud semantic segmentation tasks demonstrate that our approach achieves superior performance. Ablation studies confirm the effectiveness of the DPLF and PG-DAP modules. We release the code of our method in this paper.
comment: This paper has been accepted for publication at the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Explicit Memory through Online 3D Gaussian Splatting Improves Class-Agnostic Video Segmentation
Remembering where object segments were predicted in the past is useful for improving the accuracy and consistency of class-agnostic video segmentation algorithms. Existing video segmentation algorithms typically use either no object-level memory (e.g. FastSAM) or they use implicit memories in the form of recurrent neural network features (e.g. SAM2). In this paper, we augment both types of segmentation models using an explicit 3D memory and show that the resulting models have more accurate and consistent predictions. For this, we develop an online 3D Gaussian Splatting (3DGS) technique to store predicted object-level segments generated throughout the duration of a video. Based on this 3DGS representation, a set of fusion techniques are developed, named FastSAM-Splat and SAM2-Splat, that use the explicit 3DGS memory to improve their respective foundation models' predictions. Ablation experiments are used to validate the proposed techniques' design and hyperparameter settings. Results from both real-world and simulated benchmarking experiments show that models which use explicit 3D memories result in more accurate and consistent predictions than those which use no memory or only implicit neural network memories. Project Page: https://topipari.com/projects/FastSAM-Splat/
comment: Accepted in IEEE Robotics and Automation Letters September 2025
Localising under the drape: proprioception in the era of distributed surgical robotic system
Despite their mechanical sophistication, surgical robots remain blind to their surroundings. This lack of spatial awareness causes collisions, system recoveries, and workflow disruptions, issues that will intensify with the introduction of distributed robots with independent interacting arms. Existing tracking systems rely on bulky infrared cameras and reflective markers, providing only limited views of the surgical scene and adding hardware burden in crowded operating rooms. We present a marker-free proprioception method that enables precise localisation of surgical robots under their sterile draping despite associated obstruction of visual cues. Our method solely relies on lightweight stereo-RGB cameras and novel transformer-based deep learning models. It builds on the largest multi-centre spatial robotic surgery dataset to date (1.4M self-annotated images from human cadaveric and preclinical in vivo studies). By tracking the entire robot and surgical scene, rather than individual markers, our approach provides a holistic view robust to occlusions, supporting surgical scene understanding and context-aware control. We demonstrate an example of potential clinical benefits during in vivo breathing compensation with access to tissue dynamics, unobservable under state of the art tracking, and accurately locate in multi-robot systems for future intelligent interaction. In addition, and compared with existing systems, our method eliminates markers and improves tracking visibility by 25%. To our knowledge, this is the first demonstration of marker-free proprioception for fully draped surgical robots, reducing setup complexity, enhancing safety, and paving the way toward modular and autonomous robotic surgery.
Dexbotic: Open-Source Vision-Language-Action Toolbox
In this paper, we present Dexbotic, an open-source Vision-Language-Action (VLA) model toolbox based on PyTorch. It aims to provide a one-stop VLA research service for professionals in the field of embodied intelligence. It offers a codebase that supports multiple mainstream VLA policies simultaneously, allowing users to reproduce various VLA methods with just a single environment setup. The toolbox is experiment-centric, where the users can quickly develop new VLA experiments by simply modifying the Exp script. Moreover, we provide much stronger pretrained models to achieve great performance improvements for state-of-the-art VLA policies. Dexbotic will continuously update to include more of the latest pre-trained foundation models and cutting-edge VLA models in the industry.
comment: Authors are listed in alphabetical order. The official website is located at https://dexbotic.com/. Code is available at https://github.com/Dexmal/dexbotic
Deductive Chain-of-Thought Augmented Socially-aware Robot Navigation World Model
Social robot navigation increasingly relies on large language models for reasoning, path planning, and enabling movement in dynamic human spaces. However, relying solely on LLMs for planning often leads to unpredictable and unsafe behaviors, especially in dynamic human spaces, due to limited physical grounding and weak logical consistency. In this work, we introduce NaviWM, a socially-aware robot Navigation World Model that augments LLM reasoning with a structured world model and a logic-driven chain-of-thought process. NaviWM consists of two main components: (1) a spatial-temporal world model that captures the positions, velocities, and activities of agents in the environment, and (2) a deductive reasoning module that guides LLMs through a multi-step, logic-based inference process. This integration enables the robot to generate navigation decisions that are both socially compliant and physically safe, under well-defined constraints such as personal space, collision avoidance, and timing. Unlike previous methods based on prompting or fine-tuning, NaviWM encodes social norms as first-order logic, enabling interpretable and verifiable reasoning. Experiments show that NaviWM improves success rates and reduces social violations, particularly in crowded environments. These results demonstrate the benefit of combining formal reasoning with LLMs for robust social navigation. Additional experimental details and demo videos for this work can be found at: https://sites.google.com/view/NaviWM.
COOPERA: Continual Open-Ended Human-Robot Assistance NeurIPS 2025
To understand and collaborate with humans, robots must account for individual human traits, habits, and activities over time. However, most robotic assistants lack these abilities, as they primarily focus on predefined tasks in structured environments and lack a human model to learn from. This work introduces COOPERA, a novel framework for COntinual, OPen-Ended human-Robot Assistance, where simulated humans, driven by psychological traits and long-term intentions, interact with robots in complex environments. By integrating continuous human feedback, our framework, for the first time, enables the study of long-term, open-ended human-robot collaboration (HRC) in different collaborative tasks across various time-scales. Within COOPERA, we introduce a benchmark and an approach to personalize the robot's collaborative actions by learning human traits and context-dependent intents. Experiments validate the extent to which our simulated humans reflect realistic human behaviors and demonstrate the value of inferring and personalizing to human intents for open-ended and long-term HRC. Project Page: https://dannymcy.github.io/coopera/
comment: NeurIPS 2025 (Spotlight); Project Page: https://dannymcy.github.io/coopera/
Full-Dynamics Real-Time Nonlinear Model Predictive Control of Heavy-Duty Hydraulic Manipulator for Trajectory Tracking Tasks
Heavy-duty hydraulic manipulators (HHMs) operate under strict physical and safety-critical constraints due to their large size, high power, and complex nonlinear dynamics. Ensuring that both joint-level and end-effector trajectories remain compliant with actuator capabilities, such as force, velocity, and position limits, is essential for safe and reliable operation, yet remains largely underexplored in real-time control frameworks. This paper presents a nonlinear model predictive control (NMPC) framework designed to guarantee constraint satisfaction throughout the full nonlinear dynamics of HHMs, while running at a real-time control frequency of 1 kHz. The proposed method combines a multiple-shooting strategy with real-time sensor feedback, and is supported by a robust low-level controller based on virtual decomposition control (VDC) for precise joint tracking. Experimental validation on a full-scale hydraulic manipulator shows that the NMPC framework not only enforces actuator constraints at the joint level, but also ensures constraint-compliant motion in Cartesian space for the end-effector. These results demonstrate the method's capability to deliver high-accuracy trajectory tracking while strictly respecting safety-critical limits, setting a new benchmark for real-time control in large-scale hydraulic systems.
comment: This work has been submitted for possible publication in IEEE
T-ESKF: Transformed Error-State Kalman Filter for Consistent Visual-Inertial Navigation
This paper presents a novel approach to address the inconsistency problem caused by observability mismatch in visual-inertial navigation systems (VINS). The key idea involves applying a linear time-varying transformation to the error-state within the Error-State Kalman Filter (ESKF). This transformation ensures that the unobservable subspace of the transformed error-state system becomes independent of the state, thereby preserving the correct observability of the transformed system against variations in linearization points. We introduce the Transformed ESKF (T-ESKF), a consistent VINS estimator that performs state estimation using the transformed error-state system. Furthermore, we develop an efficient propagation technique to accelerate the covariance propagation based on the transformation relationship between the transition and accumulated matrices of T-ESKF and ESKF. We validate the proposed method through extensive simulations and experiments, demonstrating better (or at least competitive) performance compared to state-of-the-art methods. The code is available at github.com/HITCSC/T-ESKF.
comment: This paper was submitted to IEEE RA-L on July 14, 2024, and accepted on December 18, 2024. This version serves as the 'plus edition' of the accepted paper, incorporating supplementary materials for completeness
Large language model-based task planning for service robots: A review
With the rapid advancement of large language models (LLMs) and robotics, service robots are increasingly becoming an integral part of daily life, offering a wide range of services in complex environments. To deliver these services intelligently and efficiently, robust and accurate task planning capabilities are essential. This paper presents a comprehensive overview of the integration of LLMs into service robotics, with a particular focus on their role in enhancing robotic task planning. First, the development and foundational techniques of LLMs, including pre-training, fine-tuning, retrieval-augmented generation (RAG), and prompt engineering, are reviewed. We then explore the application of LLMs as the cognitive core, or 'brain', of service robots, discussing how LLMs contribute to improved autonomy and decision-making. Furthermore, recent advancements in LLM-driven task planning across various input modalities are analyzed, including text, visual, audio, and multimodal inputs. Finally, we summarize key challenges and limitations in current research and propose future directions to advance the task planning capabilities of service robots in complex, unstructured domestic environments. This review aims to serve as a valuable reference for researchers and practitioners in the fields of artificial intelligence and robotics.
comment: Submitted to Biomimetic Intelligence and Robotics for possible publication
Transferable Deep Reinforcement Learning for Cross-Domain Navigation: from Farmland to the Moon
Autonomous navigation in unstructured environments is essential for field and planetary robotics, where robots must efficiently reach goals while avoiding obstacles under uncertain conditions. Conventional algorithmic approaches often require extensive environment-specific tuning, limiting scalability to new domains. Deep Reinforcement Learning (DRL) provides a data-driven alternative, allowing robots to acquire navigation strategies through direct interactions with their environment. This work investigates the feasibility of DRL policy generalization across visually and topographically distinct simulated domains, where policies are trained in terrestrial settings and validated in a zero-shot manner in extraterrestrial environments. A 3D simulation of an agricultural rover is developed and trained using Proximal Policy Optimization (PPO) to achieve goal-directed navigation and obstacle avoidance in farmland settings. The learned policy is then evaluated in a lunar-like simulated environment to assess transfer performance. The results indicate that policies trained under terrestrial conditions retain a high level of effectiveness, achieving close to 50% success in lunar simulations without the need for additional training or fine-tuning. This underscores the potential of cross-domain DRL-based policy transfer as a promising approach to developing adaptable and efficient autonomous navigation for future planetary exploration missions, with the added benefit of minimizing retraining costs.
comment: 6 pages, 7 figures. Accepted at IEEE iSpaRo 2025
Payload trajectory tracking control for aerial transportation systems with cable length online optimization
Cable-suspended aerial transportation systems are employed extensively across various industries. The capability to flexibly adjust the relative position between the multirotor and the payload has spurred growing interest in systems equipped with variable-length cables, which promise broader application potential. Compared to systems with fixed-length cables, introducing a variable-length cable adds a new degree of freedom. However, it also results in increased nonlinearity and more complex dynamic coupling among the multirotor, the cable and the payload, posing significant challenges in control design. This paper introduces a backstepping control strategy tailored for aerial transportation systems with variable-length cables, designed to precisely track the payload trajectory while dynamically adjusting cable length. A cable length generator is then developed that achieves online optimization of the cable length while satisfying state constraints, thus balancing the multirotor's motion and cable length changes without the need for manual trajectory planning. The asymptotic stability of the closed-loop system is guaranteed through Lyapunov techniques and the growth restriction condition. Finally, simulation results confirm the efficacy of the proposed method in managing trajectory tracking and cable length adjustments.
Precise Time Delay Measurement and Compensation for Tightly Coupled Underwater SINS/piUSBL Navigation
In multi-sensor systems, time synchronization between sensors is a significant challenge, and this issue is particularly pronounced in underwater integrated navigation systems incorporating acoustic positioning. Such systems are highly susceptible to time delay, which can significantly degrade accuracy when measurement and fusion moments are misaligned. To address this challenge, this paper introduces a tightly coupled navigation framework that integrates a passive inverted ultra-short baseline (piUSBL) acoustic positioning system, a strapdown inertial navigation system (SINS), and a depth gauge under precise time synchronization. The framework fuses azimuth and slant range from the piUSBL with depth data, thereby avoiding poor vertical-angle observability in planar arrays. A novel delay measurement strategy is introduced, combining synchronized timing with acoustic signal processing, which redefines delay (traditionally an unobservable error) as a quantifiable parameter, enabling explicit estimation of both acoustic propagation and system processing delays. Simulations and field experiments confirm the feasibility of the proposed method, with delay-compensated navigation reducing RMSE by 40.45% and maximum error by 32.55%. These findings show that precise delay measurement and compensation not only enhance underwater navigation accuracy but also establish a generalizable framework for acoustic positioning integration, offering valuable insights into time alignment and data fusion in latency-sensitive multi-sensor systems.
Deep Active Inference with Diffusion Policy and Multiple Timescale World Model for Real-World Exploration and Navigation
Autonomous robotic navigation in real-world environments requires exploration to acquire environmental information as well as goal-directed navigation in order to reach specified targets. Active inference (AIF) based on the free-energy principle provides a unified framework for these behaviors by minimizing the expected free energy (EFE), thereby combining epistemic and extrinsic values. To realize this practically, we propose a deep AIF framework that integrates a diffusion policy as the policy model and a multiple timescale recurrent state-space model (MTRSSM) as the world model. The diffusion policy generates diverse candidate actions while the MTRSSM predicts their long-horizon consequences through latent imagination, enabling action selection that minimizes EFE. Real-world navigation experiments demonstrated that our framework achieved higher success rates and fewer collisions compared with the baselines, particularly in exploration-demanding scenarios. These results highlight how AIF based on EFE minimization can unify exploration and goal-directed navigation in real-world robotic settings.
comment: Preprint version
Optimal Dimensioning of Elastic-Link Manipulators regarding Lifetime Estimation
Resourceful operation and design of robots is key for sustainable industrial automation. This will be enabled by lightweight design along with time and energy optimal control of robotic manipulators. Design and control of such systems is intertwined as the control must take into account inherent mechanical compliance while the design must accommodate the dynamic requirements demanded by the control. As basis for such design optimization, a method for estimating the lifetime of elastic link robotic manipulators is presented. This is applied to the geometry optimization of flexible serial manipulators performing pick-and-place operations, where the optimization objective is a combination of overall weight and vibration amplitudes. The lifetime estimation draws from a fatigue analysis combining the rainflow counting algorithm and the method of critical cutting plane. Tresca hypothesis is used to formulate an equivalent stress, and linear damage accumulation is assumed. The final robot geometry is selected from a Pareto front as a tradeoff of lifetime and vibration characteristic. The method is illustrated for a three degrees of freedom articulated robotic manipulator.
comment: Mechanics Based Design of Structures and Machines, December 2024
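The fatigue steps named in the abstract above (a Tresca equivalent stress followed by linear damage accumulation) can be illustrated with a minimal sketch. It assumes the rainflow counting has already produced pairs of cycle counts and stress amplitudes, and it assumes a Basquin-type S-N curve N = C * S^(-m); the function names and the parameters `sn_coeff` and `sn_exp` are hypothetical and not taken from the paper, and the critical cutting plane search is not shown.

```python
def tresca_equivalent_stress(s1, s2, s3):
    """Tresca equivalent stress: largest minus smallest principal stress."""
    return max(s1, s2, s3) - min(s1, s2, s3)


def miner_damage(cycle_counts, amplitudes, sn_coeff, sn_exp):
    """Linear (Palmgren-Miner) damage sum D = sum(n_i / N_i).

    cycle_counts: counted cycles n_i per stress class (e.g. from rainflow counting)
    amplitudes:   equivalent stress amplitude S_i of each class
    sn_coeff, sn_exp: assumed S-N curve parameters, N_i = sn_coeff * S_i ** (-sn_exp)
    """
    damage = 0.0
    for n_i, s_i in zip(cycle_counts, amplitudes):
        cycles_to_failure = sn_coeff * s_i ** (-sn_exp)
        damage += n_i / cycles_to_failure
    return damage


# Damage caused by one repetition of a duty cycle (illustrative numbers only).
damage_per_run = miner_damage([100, 10], [10.0, 20.0], sn_coeff=1e6, sn_exp=2)
# Under linear accumulation, failure is predicted when the sum reaches 1.
runs_to_failure = 1.0 / damage_per_run
```

Under these assumptions, the lifetime estimate is simply the number of duty-cycle repetitions at which the accumulated damage reaches one.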
Workspace Registration and Collision Detection for Industrial Robotics Applications
Motion planning for robotic manipulators relies on precise knowledge of the environment in order to define restricted areas and to take collision objects into account. To capture the workspace, point clouds of the environment are acquired using various sensors. The collision objects are identified by region growing segmentation and the VCCS algorithm. Subsequently, the point clusters are approximated. The aim of the present paper is to compare different sensors, to illustrate the process from detection to the finished collision environment, and to detect collisions between the robot and this environment.
If They Disagree, Will You Conform? Exploring the Role of Robots' Value Awareness in a Decision-Making Task
This study investigates whether the opinions of robotic agents are more likely to influence human decision-making when the robots are perceived as value-aware (i.e., when they display an understanding of human principles). We designed an experiment in which participants interacted with two Furhat robots - one programmed to be Value-Aware and the other Non-Value-Aware - during a labeling task for images representing human values. Results indicate that participants distinguished the Value-Aware robot from the Non-Value-Aware one. Although their explicit choices did not indicate a clear preference for one robot over the other, participants directed their gaze more toward the Value-Aware robot. Additionally, the Value-Aware robot was perceived as more loyal, suggesting that value awareness in a social robot may enhance its perceived commitment to the group. Finally, when both robots disagreed with the participant, conformity occurred in about one out of four trials, and participants took longer to confirm their responses, suggesting that two robots expressing dissent may introduce hesitation in decision-making. On one hand, this highlights the potential risk that robots, if misused, could manipulate users for unethical purposes. On the other hand, it reinforces the idea that social robots might encourage reflection in ambiguous situations and help users avoid scams.
TARC: Time-Adaptive Robotic Control
Fixed-frequency control in robotics imposes a trade-off between the efficiency of low-frequency control and the robustness of high-frequency control, a limitation not seen in adaptable biological systems. We address this with a reinforcement learning approach in which policies jointly select control actions and their application durations, enabling robots to autonomously modulate their control frequency in response to situational demands. We validate our method with zero-shot sim-to-real experiments on two distinct hardware platforms: a high-speed RC car and a quadrupedal robot. Our method matches or outperforms fixed-frequency baselines in terms of rewards while significantly reducing the control frequency and exhibiting adaptive frequency control under real-world conditions.
Combining High Level Scheduling and Low Level Control to Manage Fleets of Mobile Robots
The deployment of mobile robots for material handling in industrial environments requires scalable coordination of large fleets in dynamic settings. This paper presents a two-layer framework that combines high-level scheduling with low-level control. Tasks are assigned and scheduled using the compositional algorithm ComSat, which generates time-parameterized routes for each robot. These schedules are then used by a distributed Model Predictive Control (MPC) system in real time to compute local reference trajectories, accounting for static and dynamic obstacles. The approach ensures safe, collision-free operation, and supports rapid rescheduling in response to disruptions such as robot failures or environmental changes. We evaluate the method in simulated 2D environments with varying road capacities and traffic conditions, demonstrating high task completion rates and robust behavior even under congestion. The modular structure of the framework allows for computational tractability and flexibility, making it suitable for deployment in complex, real-world industrial scenarios.
Reliable Robotic Task Execution in the Face of Anomalies
Learned robot policies have consistently been shown to be versatile, but they typically have no built-in mechanism for handling the complexity of open environments, making them prone to execution failures; this implies that deploying policies without the ability to recognise and react to failures may lead to unreliable and unsafe robot behaviour. In this paper, we present a framework that couples a learned policy with a method to detect visual anomalies during policy deployment and to perform recovery behaviours when necessary, thereby aiming to prevent failures. Specifically, we train an anomaly detection model using data collected during nominal executions of a trained policy. This model is then integrated into the online policy execution process, so that deviations from the nominal execution can trigger a three-level sequential recovery process that consists of (i) pausing the execution temporarily, (ii) performing a local perturbation of the robot's state, and (iii) resetting the robot to a safe state by sampling from a learned execution success model. We verify our proposed method in two different scenarios: (i) a door handle reaching task with a Kinova Gen3 arm using a policy trained in simulation and transferred to the real robot, and (ii) an object placing task with a UFactory xArm 6 using a general-purpose policy model. Our results show that integrating policy execution with anomaly detection and recovery increases the execution success rate in environments with various anomalies, such as trajectory deviations and adversarial human interventions.
comment: Accepted for publication in IEEE Robotics and Automation Letters (RA-L)
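The three-level sequential recovery described in the abstract above can be sketched as a simple escalation loop. This is only an illustration under stated assumptions: the callbacks `is_anomalous`, `pause`, `perturb`, and `reset_to_safe_state` are hypothetical placeholders, and the learned anomaly detector and the execution success model used for resetting are not reproduced here.

```python
def run_recovery(is_anomalous, pause, perturb, reset_to_safe_state):
    """Escalate through recovery levels until the anomaly clears.

    Each argument is a caller-supplied callable (hypothetical interface).
    Returns the name of the level that resolved the anomaly, or None if
    the anomaly persists after all three levels.
    """
    levels = (
        ("pause", pause),                # (i) pause execution temporarily
        ("perturb", perturb),            # (ii) locally perturb the robot state
        ("reset", reset_to_safe_state),  # (iii) reset to a safe state
    )
    for name, action in levels:
        action()
        if not is_anomalous():
            return name
    return None
```

Ordering the levels from least to most disruptive means the policy resumes with as little interruption as the anomaly allows.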
OmniDexGrasp: Generalizable Dexterous Grasping via Foundation Model and Force Feedback
Enabling robots to dexterously grasp and manipulate objects based on human commands is a promising direction in robotics. However, existing approaches struggle to generalize across diverse objects or tasks due to the limited scale of semantic dexterous grasp datasets. Foundation models offer a new way to enhance generalization, yet directly leveraging them to generate feasible robotic actions remains challenging due to the gap between abstract model knowledge and physical robot execution. To address these challenges, we propose OmniDexGrasp, a generalizable framework that achieves omni-capabilities in user prompting, dexterous embodiment, and grasping tasks by combining foundation models with transfer and control strategies. OmniDexGrasp integrates three key modules: (i) foundation models are used to enhance generalization by generating human grasp images, supporting omni-capability of user prompt and task; (ii) a human-image-to-robot-action transfer strategy converts human demonstrations into executable robot actions, enabling omni dexterous embodiment; (iii) a force-aware adaptive grasp strategy ensures robust and stable grasp execution. Experiments in simulation and on real robots validate the effectiveness of OmniDexGrasp on diverse user prompts, grasp tasks, and dexterous hands, and further results show its extensibility to dexterous manipulation tasks.
comment: Project page: https://isee-laboratory.github.io/OmniDexGrasp/
An Automated Tape Laying System Employing a Uniaxial Force Control Device
This paper deals with the design of a cost-effective automated tape laying system (ATL system) with integrated uniaxial force control to ensure the necessary compaction forces, as well as accurate temperature control to guarantee that the tape is melted appropriately. It is crucial to bring the substrate and the oncoming tape to a specific temperature level to ensure optimal consolidation between the different layers of the product. Several process steps are therefore required between the spooled tape on the coil and the tape finally being tacked onto the desired mold. The system is divided into the following modules: the tape storage spool, a tape-guiding roller, a tape processing unit, a heating zone, and the consolidation unit. Moreover, a special robot control concept for testing the ATL system is presented. In contrast to many other systems, in this approach the tape laying device is spatially fixed and the mold is moved accordingly by the robot, which allows for handling of rather compact and complex shapes. The functionality of the subsystems and of the taping process itself was verified experimentally using a carbon fiber reinforced HDPE tape.
comment: Proceedings ECCM21 - 21st European Conference on Composite Materials, Nantes, France, 7-2024
EndoWave: Rational-Wavelet 4D Gaussian Splatting for Endoscopic Reconstruction
In robot-assisted minimally invasive surgery, accurate 3D reconstruction from endoscopic video is vital for downstream tasks and improved outcomes. However, endoscopic scenarios present unique challenges, including photometric inconsistencies, non-rigid tissue motion, and view-dependent highlights. Most 3DGS-based methods rely solely on appearance constraints for optimization, which is often insufficient in this context, as these dynamic visual artifacts can mislead the optimization process and lead to inaccurate reconstructions. To address these limitations, we present EndoWave, a unified spatio-temporal Gaussian Splatting framework that incorporates an optical flow-based geometric constraint and multi-resolution rational wavelet supervision. First, we adopt a unified spatio-temporal Gaussian representation that directly optimizes primitives in a 4D domain. Second, we propose a geometric constraint derived from optical flow to enhance temporal coherence and effectively constrain the 3D structure of the scene. Third, we propose a multi-resolution rational orthogonal wavelet as a constraint, which can effectively separate the details of the endoscope and enhance the rendering performance. Extensive evaluations on two real surgical datasets, EndoNeRF and StereoMIS, demonstrate that EndoWave achieves state-of-the-art reconstruction quality and visual accuracy compared to the baseline method.
Breaking the Circle: An Autonomous Control-Switching Strategy for Stable Orographic Soaring in MAVs
Orographic soaring can significantly extend the endurance of micro aerial vehicles (MAVs), but circling behavior, arising from control conflicts between the longitudinal and vertical axes, increases energy consumption and the risk of divergence. We propose a control switching method, named SAOS: Switched Control for Autonomous Orographic Soaring, which mitigates circling behavior by selectively controlling either the horizontal or vertical axis, effectively transforming the system from underactuated to fully actuated during soaring. Additionally, the angle of attack is incorporated into the INDI controller to improve force estimation. Simulations with randomized initial positions and wind tunnel experiments on two MAVs demonstrate that the SAOS improves position convergence, reduces throttle usage, and mitigates roll oscillations caused by pitch-roll coupling. These improvements enhance energy efficiency and flight stability in constrained soaring environments.
comment: 13 pages, 15 figures
Awakening Facial Emotional Expressions in Human-Robot IROS 2025
The facial expression generation capability of humanoid social robots is critical for achieving natural and human-like interactions, playing a vital role in enhancing the fluidity of human-robot interactions and the accuracy of emotional expression. Currently, facial expression generation in humanoid social robots still relies on pre-programmed behavioral patterns, which are manually coded at high human and time costs. To enable humanoid robots to autonomously acquire generalized expressive capabilities, they need to develop the ability to learn human-like expressions through self-training. To address this challenge, we have designed a highly biomimetic robotic face with physical-electronic animated facial units and developed an end-to-end learning framework based on KAN (Kolmogorov-Arnold Network) and attention mechanisms. Unlike previous humanoid social robots, we have also meticulously designed an automated data collection system based on expert strategies of facial motion primitives to construct the dataset. Notably, to the best of our knowledge, this is the first open-source facial dataset for humanoid social robots. Comprehensive evaluations indicate that our approach achieves accurate and diverse facial mimicry across different test subjects.
comment: Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025). 8 pages, 7 figures, IEEE two-column format
Seq-DeepIPC: Sequential Sensing for End-to-End Control in Legged Robot Navigation
We present Seq-DeepIPC, a sequential end-to-end perception-to-control model for legged robot navigation in real-world environments. Seq-DeepIPC advances intelligent sensing for autonomous legged navigation by tightly integrating multi-modal perception (RGB-D + GNSS) with temporal fusion and control. The model jointly predicts semantic segmentation and depth estimation, giving richer spatial features for planning and control. For efficient deployment on edge devices, we use EfficientNet-B0 as the encoder, reducing computation while maintaining accuracy. Heading estimation is simplified by removing the noisy IMU and instead computing the bearing angle directly from consecutive GNSS positions. We collected a larger and more diverse dataset that includes both road and grass terrains, and validated Seq-DeepIPC on a robot dog. Comparative and ablation studies show that sequential inputs improve perception and control in our models, while other baselines do not benefit. Seq-DeepIPC achieves competitive or better results with reasonable model size; although GNSS-only heading is less reliable near tall buildings, it is robust in open areas. Overall, Seq-DeepIPC extends end-to-end navigation beyond wheeled robots to more versatile and temporally-aware systems. To support future research, we will release the code in our GitHub repository at https://github.com/oskarnatan/Seq-DeepIPC.
comment: Preprint notice, this manuscript has been submitted to IEEE sensors journal for possible publication
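Computing a heading from consecutive GNSS positions, as the abstract above describes, can be illustrated with the standard initial great-circle bearing formula. This is a minimal sketch assuming fixes given as latitude and longitude in decimal degrees; the function name `gnss_bearing_deg` is hypothetical, and the paper's exact formulation (for example, any filtering over several fixes) may differ.

```python
import math


def gnss_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from fix 1 to fix 2, in degrees clockwise from north.

    Inputs are latitude/longitude in decimal degrees; output is in [0, 360).
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    # East and north components of the direction on the unit sphere.
    east = math.sin(dlon) * math.cos(phi2)
    north = (math.cos(phi1) * math.sin(phi2)
             - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(east, north)) % 360.0
```

Because the bearing is derived from the displacement between two fixes, it is only meaningful while the robot is moving and degrades when position noise is comparable to the distance travelled between fixes, which is consistent with the reduced reliability near tall buildings noted in the abstract.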
Mixed Density Diffuser: Efficient Planning with Non-uniform Temporal Resolution
Recent studies demonstrate that diffusion planners benefit from sparse-step planning over single-step planning. Training models to skip steps in their trajectories helps capture long-term dependencies without additional memory or computational cost. However, predicting excessively sparse plans degrades performance. We hypothesize this temporal density threshold is non-uniform across a temporal horizon and that certain parts of a planned trajectory should be more densely planned. We propose Mixed Density Diffuser (MDD), a diffusion planner where the densities throughout the horizon are tunable hyperparameters. MDD achieves a new SOTA across the Maze2D, Franka Kitchen, and Antmaze D4RL task domains.
comment: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN) (under review)
Planning Oriented Integrated Sensing and Communication
Integrated sensing and communication (ISAC) enables simultaneous localization, environment perception, and data exchange for connected autonomous vehicles. However, most existing ISAC designs prioritize sensing accuracy and communication throughput, treating all targets uniformly and overlooking the impact of critical obstacles on motion efficiency. To overcome this limitation, we propose a planning-oriented ISAC (PISAC) framework that reduces the sensing uncertainty of planning-bottleneck obstacles and expands the safe navigable path for the ego-vehicle, thereby bridging the gap between physical-layer optimization and motion-level planning. The core of PISAC lies in deriving a closed-form safety bound that explicitly links ISAC transmit power to sensing uncertainty, based on the Cramér-Rao Bound and occupancy inflation principles. Using this model, we formulate a bilevel power allocation and motion planning (PAMP) problem, where the inner layer optimizes the ISAC beam power distribution and the outer layer computes a collision-free trajectory under uncertainty-aware safety constraints. Comprehensive simulations in high-fidelity urban driving environments demonstrate that PISAC achieves up to 40% higher success rates and over 5% shorter traversal times than existing ISAC-based and communication-oriented benchmarks, validating its effectiveness in enhancing both safety and efficiency.
ManiDP: Manipulability-Aware Diffusion Policy for Posture-Dependent Bimanual Manipulation IROS 2025
Recent work has demonstrated the potential of diffusion models in robot bimanual skill learning. However, existing methods ignore the learning of posture-dependent task features, which are crucial for adapting dual-arm configurations to meet specific force and velocity requirements in dexterous bimanual manipulation. To address this limitation, we propose Manipulability-Aware Diffusion Policy (ManiDP), a novel imitation learning method that not only generates plausible bimanual trajectories, but also optimizes dual-arm configurations to better satisfy posture-dependent task requirements. ManiDP achieves this by extracting bimanual manipulability from expert demonstrations and encoding the encapsulated posture features using Riemannian-based probabilistic models. These encoded posture features are then incorporated into a conditional diffusion process to guide the generation of task-compatible bimanual motion sequences. We evaluate ManiDP on six real-world bimanual tasks, where the experimental results demonstrate a 39.33% increase in average manipulation success rate and a 0.45 improvement in task compatibility compared to baseline methods. This work highlights the importance of integrating posture-relevant robotic priors into bimanual skill diffusion to enable human-like adaptability and dexterity.
comment: 7 pages, 6 figures, Accepted and published in IROS 2025
An Intelligent Water-Saving Irrigation System Based on Multi-Sensor Fusion and Visual Servoing Control
This paper introduces an intelligent water-saving irrigation system designed to address critical challenges in precision agriculture, such as inefficient water use and poor terrain adaptability. The system integrates advanced computer vision, robotic control, and real-time stabilization technologies via a multi-sensor fusion approach. A lightweight YOLO model, deployed on an embedded vision processor (K210), enables real-time plant container detection with over 96% accuracy under varying lighting conditions. A simplified hand-eye calibration algorithm, designed for 'handheld camera' robot arm configurations, ensures that the end effector can be precisely positioned, with a success rate exceeding 90%. The active leveling system, driven by the STM32F103ZET6 main control chip and JY901S inertial measurement data, can stabilize the irrigation platform on slopes up to 10 degrees, with a response time of 1.8 seconds. Experimental results across three simulated agricultural environments (standard greenhouse, hilly terrain, complex lighting) demonstrate a 30-50% reduction in water consumption compared to conventional flood irrigation, with water use efficiency exceeding 92% in all test cases.
End-to-End Design and Validation of a Low-Cost Stewart Platform with Nonlinear Estimation and Control
This paper presents the complete design, control, and experimental validation of a low-cost Stewart platform prototype developed as an affordable yet capable robotic testbed for research and education. The platform combines off the shelf components with 3D printed and custom fabricated parts to deliver full six degrees of freedom motions using six linear actuators connecting a moving platform to a fixed base. The system software integrates dynamic modeling, data acquisition, and real time control within a unified framework. A robust trajectory tracking controller based on feedback linearization, augmented with an LQR scheme, compensates for the platform's nonlinear dynamics to achieve precise motion control. In parallel, an Extended Kalman Filter fuses IMU and actuator encoder feedback to provide accurate and reliable state estimation under sensor noise and external disturbances. Unlike prior efforts that emphasize only isolated aspects such as modeling or control, this work delivers a complete hardware-software platform validated through both simulation and experiments on static and dynamic trajectories. Results demonstrate effective trajectory tracking and real-time state estimation, highlighting the platform's potential as a cost effective and versatile tool for advanced research and educational applications.
comment: 24 pages, journal
Clinic-Oriented Feasibility of a Sensor-Fused Wearable for Upper-Limb Function
Background: Upper-limb weakness and tremor (4-12 Hz) limit activities of daily living (ADL) and reduce adherence to home rehabilitation. Objective: To assess technical feasibility and clinician-relevant signals of a sensor-fused wearable targeting the triceps brachii and extensor pollicis brevis. Methods: A lightweight node integrates surface EMG (1 kHz), IMU (100-200 Hz), and flex/force sensors with on-device INT8 inference (Tiny 1D-CNN/Transformer) and a safety-bounded assist policy (angle/torque/jerk limits; stall/time-out). Healthy adults (n = 12) performed three ADL-like tasks. Primary outcomes: Tremor Index (TI), range of motion (ROM), and repetitions per minute (Reps). Secondary: EMG median-frequency slope (fatigue trend), closed-loop latency, session completion, and device-related adverse events. Analyses used subject-level paired medians with BCa 95% CIs; exact Wilcoxon p-values are reported in the Results. Results: Assistance was associated with lower tremor prominence and improved task throughput: TI changed by -0.092 (95% CI [-0.102, -0.079]), ROM changed by +12.65% (95% CI [+8.43, +13.89]), and Reps changed by +2.99 per minute (95% CI [+2.61, +3.35]). Median on-device latency was 8.7 ms at a 100 Hz loop rate; all sessions were completed with no device-related adverse events. Conclusions: Multimodal sensing with low-latency, safety-bounded assistance produced improved movement quality (lower TI) and throughput (higher ROM and Reps) in a pilot technical-feasibility setting, supporting progression to IRB-approved patient studies. Trial registration: Not applicable (pilot non-clinical).
comment: 19 pages, 7 figures, 5 Tables
Never Too Rigid to Reach: Adaptive Virtual Model Control with LLM- and Lyapunov-Based Reinforcement Learning
Robotic arms are increasingly deployed in uncertain environments, yet conventional control pipelines often become rigid and brittle when exposed to perturbations or incomplete information. Virtual Model Control (VMC) enables compliant behaviors by embedding virtual forces and mapping them into joint torques, but its reliance on fixed parameters and limited coordination among virtual components constrains adaptability and may undermine stability as task objectives evolve. To address these limitations, we propose Adaptive VMC with Large Language Model (LLM)- and Lyapunov-Based Reinforcement Learning (RL), which preserves the physical interpretability of VMC while supporting stability-guaranteed online adaptation. The LLM provides structured priors and high-level reasoning that enhance coordination among virtual components, improve sample efficiency, and facilitate flexible adjustment to varying task requirements. Complementarily, Lyapunov-based RL enforces theoretical stability constraints, ensuring safe and reliable adaptation under uncertainty. Extensive simulations on a 7-DoF Panda arm demonstrate that our approach effectively balances competing objectives in dynamic tasks, achieving superior performance while highlighting the synergistic benefits of LLM guidance and Lyapunov-constrained adaptation.
Adaptive Keyframe Selection for Scalable 3D Scene Reconstruction in Dynamic Environments
In this paper, we propose an adaptive keyframe selection method for improved 3D scene reconstruction in dynamic environments. The proposed method integrates two complementary modules: an error-based selection module utilizing photometric and structural similarity (SSIM) errors, and a momentum-based update module that dynamically adjusts keyframe selection thresholds according to scene motion dynamics. By dynamically curating the most informative frames, our approach addresses a key data bottleneck in real-time perception. This allows for the creation of high-quality 3D world representations from a compressed data stream, a critical step towards scalable robot learning and deployment in complex, dynamic environments. Experimental results demonstrate significant improvements over traditional static keyframe selection strategies, such as fixed temporal intervals or uniform frame skipping. These findings highlight a meaningful advancement toward adaptive perception systems that can dynamically respond to complex and evolving visual scenes. We evaluate our proposed adaptive keyframe selection module on two recent state-of-the-art 3D reconstruction networks, Spann3r and CUT3R, and observe consistent improvements in reconstruction quality across both frameworks. Furthermore, an extensive ablation study confirms the effectiveness of each individual component in our method, underlining their contribution to the overall performance gains.
comment: Under Review for ROBOVIS 2026
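The combination of an error-based selection test with a momentum-based threshold update, as outlined in the abstract above, can be sketched in a few lines. This is a hedged illustration and not the authors' algorithm: the function `select_keyframes`, the exponential-moving-average update rule, and the default parameters are assumptions, and the scalar error (which in the paper combines photometric and SSIM terms) is delegated to a caller-supplied `error_fn`.

```python
def select_keyframes(frames, error_fn, init_threshold=0.1, momentum=0.9):
    """Pick keyframes whose error against the last keyframe exceeds a threshold.

    frames:   sequence of frames (any type accepted by error_fn)
    error_fn: callable (reference_frame, current_frame) -> non-negative float;
              stands in for a combined photometric/SSIM error (assumption)
    The threshold follows recent errors through an exponential moving average,
    so it rises during fast scene motion and relaxes when the scene is static.
    """
    if not frames:
        return []
    keyframes = [0]  # the first frame always serves as the initial keyframe
    threshold = init_threshold
    for idx in range(1, len(frames)):
        err = error_fn(frames[keyframes[-1]], frames[idx])
        if err > threshold:
            keyframes.append(idx)
        # Momentum-style update of the selection threshold (assumed rule).
        threshold = momentum * threshold + (1.0 - momentum) * err
    return keyframes


# Toy usage with scalar "frames" and absolute difference as the error.
toy_frames = [0.0, 0.05, 0.3, 0.31, 1.0]
toy_keyframes = select_keyframes(toy_frames, lambda a, b: abs(a - b))
```

In contrast to fixed temporal intervals or uniform frame skipping, the number of selected frames here depends on how much the scene changes rather than on elapsed time.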
Stand, Walk, Navigate: Recovery-Aware Visual Navigation on a Low-Cost Wheeled Quadruped IROS 2025
Wheeled-legged robots combine the efficiency of wheels with the obstacle negotiation of legs, yet many state-of-the-art systems rely on costly actuators and sensors, and fall-recovery is seldom integrated, especially for wheeled-legged morphologies. This work presents a recovery-aware visual-inertial navigation system on a low-cost wheeled quadruped. The proposed system leverages vision-based perception from a depth camera and deep reinforcement learning policies for robust locomotion and autonomous recovery from falls across diverse terrains. Simulation experiments show agile mobility with low-torque actuators over irregular terrain and reliable recovery from external perturbations and self-induced failures. We further show goal-directed navigation in structured indoor spaces with low-cost perception. Overall, this approach lowers the barrier to deploying autonomous navigation and robust locomotion policies in budget-constrained robotic platforms.
comment: Accepted at the IROS 2025 Workshop on Wheeled-Legged Robots
Coordinated Autonomous Drones for Human-Centered Fire Evacuation in Partially Observable Urban Environments
Autonomous drone technology holds significant promise for enhancing search and rescue operations during evacuations by guiding humans toward safety and supporting broader emergency response efforts. However, their application in dynamic, real-time evacuation support remains limited. Existing models often overlook the psychological and emotional complexity of human behavior under extreme stress. In real-world fire scenarios, evacuees frequently deviate from designated safe routes due to panic and uncertainty. To address these challenges, this paper presents a multi-agent coordination framework in which autonomous Unmanned Aerial Vehicles (UAVs) assist human evacuees in real-time by locating, intercepting, and guiding them to safety under uncertain conditions. We model the problem as a Partially Observable Markov Decision Process (POMDP), where two heterogeneous UAV agents, a high-level rescuer (HLR) and a low-level rescuer (LLR), coordinate through shared observations and complementary capabilities. Human behavior is captured using an agent-based model grounded in empirical psychology, where panic dynamically affects decision-making and movement in response to environmental stimuli. The environment features stochastic fire spread, unknown evacuee locations, and limited visibility, requiring UAVs to plan over long horizons to search for humans and adapt in real-time. Our framework employs the Proximal Policy Optimization (PPO) algorithm with recurrent policies to enable robust decision-making in partially observable settings. Simulation results demonstrate that the UAV team can rapidly locate and intercept evacuees, significantly reducing the time required for them to reach safety compared to scenarios without UAV assistance.
comment: Accepted to IEEE Global Humanitarian Technology Conference (GHTC 2025). 8 pages, 4 figures
Modeling and Scheduling of Fusion Patterns in Autonomous Driving Systems (Extended Version)
In Autonomous Driving Systems (ADS), Directed Acyclic Graphs (DAGs) are widely used to model complex data dependencies and inter-task communication. However, existing DAG scheduling approaches oversimplify data fusion tasks by assuming fixed triggering mechanisms, failing to capture the diverse fusion patterns found in real-world ADS software stacks. In this paper, we propose a systematic framework for analyzing various fusion patterns and their performance implications in ADS. Our framework models three distinct fusion task types: timer-triggered, wait-for-all, and immediate fusion, which comprehensively represent real-world fusion behaviors. Our Integer Linear Programming (ILP)-based approach enables an optimization of multiple real-time performance metrics, including reaction time, time disparity, age of information, and response time, while generating deterministic offline schedules directly applicable to real platforms. Evaluation using real-world ADS case studies, Raspberry Pi implementation, and randomly generated DAGs demonstrates that our framework handles diverse fusion patterns beyond the scope of existing work, and achieves substantial performance improvements in comparable scenarios.
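The three fusion task types named above can be contrasted with a small timing sketch. The semantics below are inferred from the names alone and are an assumption: each input delivers a single message, times are integer milliseconds, and the helper names are hypothetical. The paper's ILP formulation is not reproduced here.

```python
def available(arrivals, t):
    """Names of the inputs whose message has arrived by time t."""
    return sorted(name for name, at in arrivals.items() if at <= t)


def wait_for_all(arrivals):
    """Fire once, as soon as every input has delivered its message."""
    t = max(arrivals.values())
    return [(t, available(arrivals, t))]


def immediate(arrivals):
    """Fire on every arrival, fusing whatever has arrived so far."""
    return [(t, available(arrivals, t)) for t in sorted(set(arrivals.values()))]


def timer_triggered(arrivals, period, horizon):
    """Fire at fixed period multiples, regardless of when inputs arrive."""
    return [(t, available(arrivals, t)) for t in range(period, horizon + 1, period)]


# Example: a camera message at 10 ms and a LiDAR message at 25 ms.
arrivals = {"camera": 10, "lidar": 25}
fired_all = wait_for_all(arrivals)
fired_imm = immediate(arrivals)
fired_timer = timer_triggered(arrivals, period=20, horizon=40)
```

Under these assumptions the wait-for-all task fires once at 25 ms, the immediate task fires at 10 ms and 25 ms, and the timer-triggered task fires at 20 ms with only the camera data and at 40 ms with both inputs.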
Motivating Students' Self-study with Goal Reminder and Emotional Support
While the efficacy of social robots in supporting people in learning tasks has been extensively investigated, their potential impact in assisting students in self-studying contexts has not been investigated much. This study explores how a social robot can act as a peer study companion for college students during self-study tasks by delivering task-oriented goal reminder and positive emotional support. We conducted an exploratory Wizard-of-Oz study to explore how these robotic support behaviors impacted students' perceived focus, productivity, and engagement in comparison to a robot that only provided physical presence (control). Our study results suggest that participants in the goal reminder and the emotional support conditions reported greater ease of use, with the goal reminder condition additionally showing a higher willingness to use the robot in future study sessions. Participants' satisfaction with the robot was correlated with their perception of the robot as a social other, and this perception was found to be a predictor for their level of goal achievement in the self-study task. These findings highlight the potential of socially assistive robots to support self-study through both functional and emotional engagement.
comment: RO-MAN 2025 accepted paper
A Survey on Efficient Vision-Language-Action Models
Vision-Language-Action models (VLAs) represent a significant frontier in embodied intelligence, aiming to bridge digital knowledge with physical-world interaction. While these models have demonstrated remarkable generalist capabilities, their deployment is severely hampered by the substantial computational and data requirements inherent to their underlying large-scale foundation models. Motivated by the urgent need to address these challenges, this survey presents the first comprehensive review of Efficient Vision-Language-Action models (Efficient VLAs) across the entire data-model-training process. Specifically, we introduce a unified taxonomy to systematically organize the disparate efforts in this domain, categorizing current techniques into three core pillars: (1) Efficient Model Design, focusing on efficient architectures and model compression; (2) Efficient Training, which reduces computational burdens during model learning; and (3) Efficient Data Collection, which addresses the bottlenecks in acquiring and utilizing robotic data. Through a critical review of state-of-the-art methods within this framework, this survey not only establishes a foundational reference for the community but also summarizes representative applications, delineates key challenges, and charts a roadmap for future research. We maintain a continuously updated project page to track our latest developments: https://evla-survey.github.io/
comment: 26 pages, 8 figures
Steering Flexible Linear Objects in Planar Environments by Two Robot Hands Using Euler's Elastica Solutions
The manipulation of flexible objects such as cables, wires and fresh food items by robot hands forms a special challenge in robot grasp mechanics. This paper considers the steering of flexible linear objects in planar environments by two robot hands. The flexible linear object, modeled as an elastic non-stretchable rod, is manipulated by varying the gripping endpoint positions while keeping equal endpoint tangents. The flexible linear object shape has a closed form solution in terms of the grasp endpoint positions and tangents, called Euler's elastica. This paper obtains the elastica solutions under the optimal control framework, then uses the elastica solutions to obtain closed-form criteria for non self-intersection, stability and obstacle avoidance of the flexible linear object. The new tools are incorporated into a planning scheme for steering flexible linear objects in planar environments populated by sparsely spaced obstacles. The scheme is fully implemented and demonstrated with detailed examples.
Data-Driven Soft Robot Control via Adiabatic Spectral Submanifolds
The mechanical complexity of soft robots creates significant challenges for their model-based control. Specifically, linear data-driven models have struggled to control soft robots on complex, spatially extended paths that explore regions with significant nonlinear behavior. To account for these nonlinearities, we develop here a model-predictive control strategy based on the recent theory of adiabatic spectral submanifolds (aSSMs). This theory is applicable because the internal vibrations of heavily overdamped robots decay at a speed that is much faster than the desired speed of the robot along its intended path. In that case, low-dimensional attracting invariant manifolds (aSSMs) emanate from the path and carry the dominant dynamics of the robot. Aided by this recent theory, we devise an aSSM-based model-predictive control scheme purely from data. We demonstrate our data-driven model's effectiveness in tracking dynamic trajectories across diverse tasks, validated on a high-fidelity, high-dimensional finite-element model of a soft trunk robot and a Cosserat rod-based elastic soft arm. Notably, we find that five- or six-dimensional aSSM-reduced models outperform other data-driven modeling methods in tracking performance by a factor of up to $10$ across all closed-loop control tasks.
comment: 41 pages, 24 figures
iWalker: Imperative Visual Planning for Walking Humanoid Robot
Humanoid robots, designed to operate in human-centric environments, serve as a fundamental platform for a broad range of tasks. Although humanoid robots have been extensively studied for decades, a majority of existing humanoid robots still heavily rely on complex modular frameworks, leading to inflexibility and potential compounded errors from independent sensing, planning, and acting components. In response, we propose an end-to-end humanoid sense-plan-act walking system, enabling vision-based obstacle avoidance and footstep planning for whole body balancing simultaneously. We designed two imperative learning (IL)-based bilevel optimizations for model-predictive step planning and whole body balancing, respectively, to achieve self-supervised learning for humanoid robot walking. This enables the robot to learn from arbitrary unlabeled data, improving its adaptability and generalization capabilities. We refer to our method as iWalker and demonstrate its effectiveness in both simulated and real-world environments, representing a significant advancement toward autonomous humanoid robots.
Onboard Mission Replanning for Adaptive Cooperative Multi-Robot Systems
Cooperative autonomous robotic systems have significant potential for executing complex multi-task missions across space, air, ground, and maritime domains. But they commonly operate in remote, dynamic and hazardous environments, requiring rapid in-mission adaptation without reliance on fragile or slow communication links to centralised compute. Fast, on-board replanning algorithms are therefore needed to enhance resilience. Reinforcement Learning shows strong promise for efficiently solving mission planning tasks when formulated as Travelling Salesperson Problems (TSPs), but existing methods: 1) are unsuitable for replanning, where agents do not start at a single location; 2) do not allow cooperation between agents; 3) are unable to model tasks with variable durations; or 4) lack practical considerations for on-board deployment. Here we define the Cooperative Mission Replanning Problem as a novel variant of multiple TSP with adaptations to overcome these issues, and develop a new encoder/decoder-based model using Graph Attention Networks and Attention Models to solve it effectively and efficiently. Using a simple example of cooperative drones, we show our replanner consistently (90% of the time) maintains performance within 10% of the state-of-the-art LKH3 heuristic solver, whilst running 85-370 times faster on a Raspberry Pi. This work paves the way for increased resilience in autonomous multi-agent systems.
comment: 9 pages, 5 figures, 1 table
World-Env: Leveraging World Model as a Virtual Environment for VLA Post-Training
Vision-Language-Action (VLA) models trained via imitation learning suffer from significant performance degradation in data-scarce scenarios due to their reliance on large-scale demonstration datasets. Although reinforcement learning (RL)-based post-training has proven effective in addressing data scarcity, its application to VLA models is hindered by the non-resettable nature of real-world environments. This limitation is particularly critical in high-risk domains such as industrial automation, where interactions often induce state changes that are costly or infeasible to revert. Furthermore, existing VLA approaches lack a reliable mechanism for detecting task completion, leading to redundant actions that reduce overall task success rates. To address these challenges, we propose World-Env, an RL-based post-training framework that replaces physical interaction with a low-cost, world model-based virtual simulator. World-Env consists of two key components: (1) a video-based world simulator that generates temporally consistent future visual observations, and (2) a vision-language model (VLM)-guided instant reflector that provides continuous reward signals and predicts action termination. This simulated environment enables VLA models to safely explore and generalize beyond their initial imitation learning distribution. Our method achieves notable performance gains with as few as five expert demonstrations per task. Experiments on complex robotic manipulation tasks demonstrate that World-Env effectively overcomes the data inefficiency, safety constraints, and inefficient execution of conventional VLA models that rely on real-world interaction, offering a practical and scalable solution for post-training in resource-constrained settings. Our code is available at https://github.com/junjxiao/world-env.
Controllable Collision Scenario Generation via Collision Pattern Prediction
Evaluating the safety of autonomous vehicles (AVs) requires diverse, safety-critical scenarios, with collisions being especially important yet rare and unsafe to collect in the real world. Therefore, the community has been focusing on generating safety-critical scenarios in simulation. However, controlling attributes such as collision type and time-to-accident (TTA) remains challenging. We introduce a new task called controllable collision scenario generation, where the goal is to produce trajectories that realize a user-specified collision type and TTA, to investigate the feasibility of automatically generating desired collision scenarios. To support this task, we present COLLIDE, a large-scale collision scenario dataset constructed by transforming real-world driving logs into diverse collisions, balanced across five representative collision types and different TTA intervals. We propose a framework that predicts Collision Pattern, a compact and interpretable representation that captures the spatial configuration of the ego and the adversarial vehicles at impact, before rolling out full adversarial trajectories. Experiments show that our approach outperforms strong baselines in both collision rate and controllability. Furthermore, generated scenarios consistently induce higher planner failure rates, revealing limitations of existing planners. We demonstrate that these scenarios fine-tune planners for robustness improvements, contributing to safer AV deployment in different collision scenarios. Project page is available at https://submit-user.github.io/anon2025
comment: 8 pages, 3 figures
ExAMPC: the Data-Driven Explainable and Approximate NMPC with Physical Insights IROS
Amidst the surge in the use of Artificial Intelligence (AI) for control purposes, classical and model-based control methods maintain their popularity due to their transparency and deterministic nature. However, advanced controllers like Nonlinear Model Predictive Control (NMPC), despite proven capabilities, face adoption challenges due to their computational complexity and unpredictable closed-loop performance in complex validation systems. This paper introduces ExAMPC, a methodology bridging classical control and explainable AI by augmenting the NMPC with data-driven insights to improve trustworthiness and to reveal the sensitivities of the optimization solution and closed-loop performance to physical variables and system parameters. By employing a low-order spline embedding, we reduce the open-loop trajectory dimensionality by over 95%, and integrate it with SHAP and Symbolic Regression from eXplainable AI (XAI) for an approximate NMPC, enabling intuitive physical insights into the NMPC's optimization routine. The prediction accuracy of the approximate NMPC is enhanced through physics-inspired continuous-time constraint penalties, reducing the predicted continuous trajectory violations by 93%. ExAMPC also enables accurate forecasting of the NMPC's computational requirements with explainable insights on worst-case scenarios. Experimental validation on automated valet parking and autonomous racing with lap-time optimization demonstrates the methodology's practical effectiveness for potential real-world applications.
comment: This paper has been accepted for publication in the 2025 IEEE/RSJ IROS Conference
A Single Motor Nano Aerial Vehicle with Novel Peer-to-Peer Communication and Sensing Mechanism
Communication and position sensing are among the most important capabilities for swarm robots to interact with their peers and perform tasks collaboratively. However, the hardware required to facilitate communication and position sensing is often too complicated, expensive, and bulky to be carried on swarm robots. Here we present Maneuverable Piccolissimo 3 (MP3), a minimalist, single motor drone capable of executing inter-robot communication via infrared light and triangulation-based sensing of relative bearing, distance, and elevation using message arrival time. Thanks to its novel design, MP3 can communicate with peers and localize itself using simple components, keeping its size and mass small and making it inherently safe for human interaction. We present the hardware and software design of MP3 and demonstrate its capability to localize itself, fly stably, and maneuver in the environment using peer-to-peer communication and sensing.
comment: Proceedings of Robotics: Science and Systems (RSS), 2024
On the Importance of Tactile Sensing for Imitation Learning: A Case Study on Robotic Match Lighting
The field of robotic manipulation has advanced significantly in recent years. At the sensing level, several novel tactile sensors have been developed, capable of providing accurate contact information. On a methodological level, learning from demonstrations has proven an efficient paradigm to obtain performant robotic manipulation policies. The combination of both holds the promise to extract crucial contact-related information from the demonstration data and actively exploit it during policy rollouts. However, this integration has so far been underexplored, most notably in dynamic, contact-rich manipulation tasks where precision and reactivity are essential. This work therefore proposes a multimodal, visuotactile imitation learning framework that integrates a modular transformer architecture with a flow-based generative model, enabling efficient learning of fast and dexterous manipulation policies. We evaluate our framework on the dynamic, contact-rich task of robotic match lighting - a task in which tactile feedback influences human manipulation performance. The experimental results highlight the effectiveness of our approach and show that adding tactile information improves policy performance, thereby underlining their combined potential for learning dynamic manipulation from few demonstrations. Project website: https://sites.google.com/view/tactile-il .
D-LIO: 6DoF Direct LiDAR-Inertial Odometry based on Simultaneous Truncated Distance Field Mapping
This paper presents a new approach for 6DoF Direct LiDAR-Inertial Odometry (D-LIO) based on the simultaneous mapping of truncated distance fields on CPU. Such a continuous representation (in the vicinity of the points) enables working with raw 3D LiDAR data online, avoiding the need for LiDAR feature selection and tracking, simplifying the odometry pipeline and easily generalizing to many scenarios. The method is based on the proposed Fast Truncated Distance Field (Fast-TDF) method as a convenient tool to represent the environment. Such a representation enables i) solving the LiDAR point-cloud registration as a nonlinear optimization process without the need for selecting/tracking LiDAR features in the input data, ii) simultaneously producing an accurate truncated distance field map of the environment, and iii) updating such a map in constant time independently of its size. The approach is tested using open datasets, aerial and ground. It is also benchmarked against other state-of-the-art odometry approaches, demonstrating the same or better level of accuracy with the added value of an online-generated TDF representation of the environment, which can be used for other robotics tasks such as planning or collision avoidance. The source code is publicly available at https://anonymous.4open.science/r/D-LIO
comment: 9 pages, 3 figures and 43 references
DDBot: Differentiable Physics-based Digging Robot for Unknown Granular Materials
Automating the manipulation of granular materials poses significant challenges due to complex contact dynamics, unpredictable material properties, and intricate system states. Existing approaches often fail to achieve efficiency and accuracy in such tasks. To fill the research gap, this paper studies the small-scale and high-precision granular material digging task with unknown physical properties. A new framework, named differentiable digging robot (DDBot), is proposed to manipulate granular materials, including sand and soil. Specifically, we equip DDBot with a differentiable physics-based simulator, tailored for granular material manipulation, powered by GPU-accelerated parallel computing and automatic differentiation. DDBot can perform efficient differentiable system identification and high-precision digging skill optimisation for unknown granular materials, which is enabled by a differentiable skill-to-action mapping, a task-oriented demonstration method, gradient clipping and line search-based gradient descent. Experimental results show that DDBot can efficiently (converge within 5 to 20 minutes) identify unknown granular material dynamics and optimise digging skills, with high-precision results in zero-shot real-world deployments, highlighting its practicality. Benchmark results against state-of-the-art baselines also confirm the robustness and efficiency of DDBot in such digging tasks.
comment: Accepted as a regular paper by the IEEE Transactions on Robotics
FailSafe: Reasoning and Recovery from Failures in Vision-Language-Action Models
Recent advances in robotic manipulation have integrated low-level robotic control into Vision-Language Models (VLMs), extending them into Vision-Language-Action (VLA) models. Although state-of-the-art VLAs achieve strong performance in downstream robotic applications, supported by large-scale crowd-sourced robot training data, they still inevitably encounter failures during execution. Enabling robots to reason about and recover from unpredictable and abrupt failures remains a critical challenge. Existing robotic manipulation datasets, collected in either simulation or the real world, primarily provide only ground-truth trajectories, leaving robots unable to recover once failures occur. Moreover, the few datasets that address failure detection typically offer only textual explanations, which are difficult to utilize directly in VLA models. To address this gap, we introduce FailSafe, a novel failure generation and recovery system that automatically produces diverse failure cases paired with executable recovery actions. FailSafe can be seamlessly applied to any manipulation task in any simulator, enabling scalable creation of failure action data. To demonstrate its effectiveness, we fine-tune LLaVa-OneVision-7B (LLaVa-OV-7B) to build FailSafe-VLM. Experimental results show that FailSafe-VLM successfully helps robotic arms detect and recover from potential failures, improving the performance of three state-of-the-art VLA models (pi0-FAST, OpenVLA, OpenVLA-OFT) by up to 22.6% on average across several tasks in Maniskill. Furthermore, FailSafe-VLM can generalize across different spatial configurations, camera viewpoints, objects, and robotic embodiments. We plan to release the FailSafe code to the community.
comment: Project Page: https://jimntu.github.io/FailSafe
Lazy-DaSH: Lazy Approach for Hypergraph-based Multi-robot Task and Motion Planning
We introduce Lazy-DaSH, an improvement over the recent state of the art multi-robot task and motion planning method DaSH, which scales to more than double the number of robots and objects compared to the original method and achieves an order of magnitude faster planning time when applied to a multi-manipulator object rearrangement problem. We achieve this improvement through a hierarchical approach, where a high-level task planning layer identifies planning spaces required for task completion, and motion feasibility is validated lazily only within these spaces. In contrast, DaSH precomputes the motion feasibility of all possible actions, resulting in higher costs for constructing state space representations. Lazy-DaSH maintains efficient query performance by utilizing a constraint feedback mechanism within its hierarchical structure, ensuring that motion feasibility is effectively conveyed to the query process. By maintaining smaller state space representations, our method significantly reduces both representation construction time and query time. We evaluate Lazy-DaSH in four distinct scenarios, demonstrating its scalability to increasing numbers of robots and objects, as well as its adaptability in resolving conflicts through the constraint feedback mechanism.
Hierarchical Language Models for Semantic Navigation and Manipulation in an Aerial-Ground Robotic System
Heterogeneous multirobot systems show great potential in complex tasks requiring coordinated hybrid cooperation. However, existing methods that rely on static or task-specific models often lack generalizability across diverse tasks and dynamic environments. This highlights the need for generalizable intelligence that can bridge high-level reasoning with low-level execution across heterogeneous agents. To address this, we propose a hierarchical multimodal framework that integrates a prompted large language model (LLM) with a fine-tuned vision-language model (VLM). At the system level, the LLM performs hierarchical task decomposition and constructs a global semantic map, while the VLM provides semantic perception and object localization, where the proposed GridMask significantly enhances the VLM's spatial accuracy for reliable fine-grained manipulation. The aerial robot leverages this global map to generate semantic paths and guide the ground robot's local navigation and manipulation, ensuring robust coordination even in target-absent or ambiguous scenarios. We validate the framework through extensive simulation and real-world experiments on long-horizon object arrangement tasks, demonstrating zero-shot adaptability, robust semantic navigation, and reliable manipulation in dynamic environments. To the best of our knowledge, this work presents the first heterogeneous aerial-ground robotic system that integrates VLM-based perception with LLM-driven reasoning for global high-level task planning and execution.
comment: 18 pages, 10 figures
CIVIL: Causal and Intuitive Visual Imitation Learning
Today's robots attempt to learn new tasks by imitating human examples. These robots watch the human complete the task, and then try to match the actions taken by the human expert. However, this standard approach to visual imitation learning is fundamentally limited: the robot observes what the human does, but not why the human chooses those behaviors. Without understanding which features of the system or environment factor into the human's decisions, robot learners often misinterpret the human's examples. In practice, this results in causal confusion, inefficient learning, and robot policies that fail when the environment changes. We therefore propose a shift in perspective: instead of asking human teachers just to show what actions the robot should take, we also enable humans to intuitively indicate why they made those decisions. Under our paradigm human teachers attach markers to task-relevant objects and use natural language prompts to describe their state representation. Our proposed algorithm, CIVIL, leverages this augmented demonstration data to filter the robot's visual observations and extract a feature representation that aligns with the human teacher. CIVIL then applies these causal features to train a transformer-based policy that -- when tested on the robot -- is able to emulate human behaviors without being confused by visual distractors or irrelevant items. Our simulations and real-world experiments demonstrate that robots trained with CIVIL learn both what actions to take and why to take those actions, resulting in better performance than state-of-the-art baselines. From the human's perspective, our user study reveals that this new training paradigm actually reduces the total time required for the robot to learn the task, and also improves the robot's performance in previously unseen scenarios. See videos at our project website: https://civil2025.github.io
HAND Me the Data: Fast Robot Adaptation via Hand Path Retrieval
We hand the community HAND, a simple and time-efficient method for teaching robots new manipulation tasks through human hand demonstrations. Instead of relying on task-specific robot demonstrations collected via teleoperation, HAND uses easy-to-provide hand demonstrations to retrieve relevant behaviors from task-agnostic robot play data. Using a visual tracking pipeline, HAND extracts the motion of the human hand from the hand demonstration and retrieves robot sub-trajectories in two stages: first filtering by visual similarity, then retrieving trajectories with similar behaviors to the hand. Fine-tuning a policy on the retrieved data enables real-time learning of tasks in under four minutes, without requiring calibrated cameras or detailed hand pose estimation. Experiments also show that HAND outperforms retrieval baselines by over 2x in average task success rates on real robots. Videos can be found at our project website: https://liralab.usc.edu/handretrieval/.
Open-Vocabulary Spatio-Temporal Scene Graph for Robot Perception and Teleoperation Planning
Teleoperation via natural-language reduces operator workload and enhances safety in high-risk or remote settings. However, in dynamic remote scenes, transmission latency during bidirectional communication creates gaps between remote perceived states and operator intent, leading to command misunderstanding and incorrect execution. To mitigate this, we introduce the Spatio-Temporal Open-Vocabulary Scene Graph (ST-OVSG), a representation that enriches open-vocabulary perception with temporal dynamics and lightweight latency annotations. ST-OVSG leverages LVLMs to construct open-vocabulary 3D object representations, and extends them into the temporal domain via Hungarian assignment with our temporal matching cost, yielding a unified spatio-temporal scene graph. A latency tag is embedded to enable LVLM planners to retrospectively query past scene states, thereby resolving local-remote state mismatches caused by transmission delays. To further reduce redundancy and highlight task-relevant cues, we propose a task-oriented subgraph filtering strategy that produces compact inputs for the planner. ST-OVSG generalizes to novel categories and enhances planning robustness against transmission latency without requiring fine-tuning. Experiments show that our method achieves 74 percent node accuracy on the Replica benchmark, outperforming ConceptGraph. Notably, in the latency-robustness experiment, the LVLM planner assisted by ST-OVSG achieved a planning success rate of 70.5 percent.
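The frame-to-frame association step can be pictured as a minimum-cost one-to-one assignment between objects in consecutive frames. The sketch below is an assumption-laden illustration: the cost is a plain Euclidean distance between object centroids, standing in for the temporal matching cost that the abstract does not define, and the optimum is found by brute force over permutations instead of the Hungarian algorithm. Both return a minimum-cost assignment, but brute force is only practical for a handful of objects.

```python
import math
from itertools import permutations


def match_objects(prev, curr):
    """Associate previous-frame objects with current-frame objects.

    prev, curr: lists of (x, y, z) centroids, with len(prev) <= len(curr).
    Returns a list of (prev_index, curr_index) pairs that minimizes the
    summed centroid distance. Hypothetical cost; brute-force search.
    """
    best_cost, best_pairs = math.inf, []
    for perm in permutations(range(len(curr)), len(prev)):
        cost = sum(math.dist(prev[i], curr[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(enumerate(perm))
    return best_pairs


# Two tracked objects whose detections arrive in swapped order.
prev = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
curr = [(5.1, 0.0, 0.0), (0.2, 0.0, 0.0)]
pairs = match_objects(prev, curr)
```

Here the first previous object is paired with the second current detection and vice versa, since that pairing has the smaller total distance.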
KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills NeurIPS 2025
Humanoid robots hold promise for acquiring various skills by imitating human behaviors. However, existing algorithms are only capable of tracking smooth, low-speed human motions, even with delicate reward and curriculum design. This paper presents a physics-based humanoid control framework, aiming to master highly-dynamic human behaviors such as Kungfu and dancing through multi-step motion processing and adaptive motion tracking. For motion processing, we design a pipeline to extract, filter, correct, and retarget motions, while ensuring compliance with physical constraints to the maximum extent. For motion imitation, we formulate a bi-level optimization problem to dynamically adjust the tracking accuracy tolerance based on the current tracking error, creating an adaptive curriculum mechanism. We further construct an asymmetric actor-critic framework for policy training. In experiments, we train whole-body control policies to imitate a set of highly-dynamic motions. Our method achieves significantly lower tracking errors than existing approaches and is successfully deployed on the Unitree G1 robot, demonstrating stable and expressive behaviors. The project page is https://kungfu-bot.github.io.
comment: NeurIPS 2025. Project Page: https://kungfu-bot.github.io/
COMPASS: Cross-embodiment Mobility Policy via Residual RL and Skill Synthesis
As robots are increasingly deployed in diverse application domains, enabling robust mobility across different embodiments has become a critical challenge. Classical mobility stacks, though effective on specific platforms, require extensive per-robot tuning and do not scale easily to new embodiments. Learning-based approaches, such as imitation learning (IL), offer alternatives, but are limited by the need for high-quality demonstrations for each embodiment. To address these challenges, we introduce COMPASS, a unified framework that enables scalable cross-embodiment mobility using expert demonstrations from only a single embodiment. We first pre-train a mobility policy on a single robot using IL, combining a world model with a policy model. We then apply residual reinforcement learning (RL) to efficiently adapt this policy to diverse embodiments through corrective refinements. Finally, we distill specialist policies into a single generalist policy conditioned on an embodiment embedding vector. This design significantly reduces the burden of collecting data while enabling robust generalization across a wide range of robot designs. Our experiments demonstrate that COMPASS scales effectively across diverse robot platforms while maintaining adaptability to various environment configurations, achieving a generalist policy with a success rate approximately 5X higher than the pre-trained IL policy on unseen embodiments, and further demonstrate zero-shot sim-to-real transfer.
Online POMDP Planning with Anytime Deterministic Optimality Guarantees
Decision-making under uncertainty is a critical aspect of many practical autonomous systems due to incomplete information. Partially Observable Markov Decision Processes (POMDPs) offer a mathematically principled framework for formulating decision-making problems under such conditions. However, finding an optimal solution for a POMDP is generally intractable. In recent years, there has been significant progress in scaling approximate solvers from small to moderately sized problems, using online tree search solvers. Often, such approximate solvers are limited to probabilistic or asymptotic guarantees towards the optimal solution. In this paper, we derive a deterministic relationship for discrete POMDPs between an approximated and the optimal solution. We show that at any time, we can derive bounds that relate the existing solution to the optimal one. We show that our derivations provide an avenue for a new set of algorithms and can be attached to existing algorithms that have a certain structure to provide them with deterministic guarantees with marginal computational overhead. In return, not only do we certify the solution quality, but we demonstrate that making a decision based on the deterministic guarantee may result in superior performance compared to the original algorithm without the deterministic certification.
Multiagent Systems
Model Proficiency in Centralized Multi-Agent Systems: A Performance Study
Autonomous agents are increasingly deployed in dynamic environments where their ability to perform a given task depends on both individual and team-level proficiency. While proficiency self-assessment (PSA) has been studied for single agents, its extension to a team of agents remains underexplored. This letter addresses this gap by presenting a framework for team PSA in centralized settings. We investigate three metrics for centralized team PSA: the measurement prediction bound (MPB), the Kolmogorov-Smirnov (KS) statistic, and the Kullback-Leibler (KL) divergence. These metrics quantify the discrepancy between predicted and actual measurements. We use the KL divergence as a reference metric since it compares the true and predictive distributions, whereas the MPB and KS provide efficient indicators for in situ assessment. Simulation results in a target tracking scenario demonstrate that both MPB and KS metrics accurately capture model mismatches, align with the KL divergence reference, and enable real-time proficiency assessment.
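Two of the three metrics named above have standard textbook forms that can be sketched directly. The example below is an illustration only, under the assumption (not stated in the abstract) that the predictive and true measurement distributions are univariate Gaussians; the function names are hypothetical, and the measurement prediction bound (MPB) is omitted because the abstract does not define it.

```python
import math

def normal_cdf(x, mean, std):
    """CDF of a univariate Gaussian N(mean, std^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def ks_statistic(samples, mean, std):
    """One-sample KS statistic: largest gap between the empirical CDF
    of the samples and the CDF of the predicted N(mean, std^2)."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mean, std)
        # Check the gap just after and just before the step at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def kl_gaussian(mean_p, std_p, mean_q, std_q):
    """KL(P || Q) in nats for univariate Gaussians P and Q."""
    return (math.log(std_q / std_p)
            + (std_p ** 2 + (mean_p - mean_q) ** 2) / (2.0 * std_q ** 2)
            - 0.5)

# Actual measurements compared against a matched and a shifted predictive model.
samples = [-1.0, 0.0, 1.0]
ks_matched = ks_statistic(samples, mean=0.0, std=1.0)
ks_shifted = ks_statistic(samples, mean=2.0, std=1.0)  # larger: model mismatch
kl_shifted = kl_gaussian(0.0, 1.0, 2.0, 1.0)
```

The KS statistic needs only samples and the predictive CDF, which is consistent with its use as an in situ indicator, whereas the KL divergence requires the true distribution and so serves as a reference.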
A Neuro-Symbolic Multi-Agent Approach to Legal-Cybersecurity Knowledge Integration
The growing intersection of cybersecurity and law creates a complex information space where traditional legal research tools struggle to deal with nuanced connections between cases, statutes, and technical vulnerabilities. This knowledge divide hinders collaboration between legal experts and cybersecurity professionals. To address this important gap, this work provides a first step towards intelligent systems capable of navigating the increasingly intricate cyber-legal domain. We demonstrate promising initial results on multilingual tasks.
comment: 7 pages
AutoStreamPipe: LLM Assisted Automatic Generation of Data Stream Processing Pipelines
Data pipelines are essential in stream processing as they enable the efficient collection, processing, and delivery of real-time data, supporting rapid data analysis. In this paper, we present AutoStreamPipe, a novel framework that employs Large Language Models (LLMs) to automate the design, generation, and deployment of stream processing pipelines. AutoStreamPipe bridges the semantic gap between high-level user intent and platform-specific implementations across distributed stream processing systems for structured multi-agent reasoning by integrating a Hypergraph of Thoughts (HGoT) as an extended version of GoT. AutoStreamPipe combines resilient execution strategies, advanced query analysis, and HGoT to deliver pipelines with good accuracy. Experimental evaluations on diverse pipelines demonstrate that AutoStreamPipe significantly reduces development time (x6.3) and error rates (x5.19), as measured by a novel Error-Free Score (EFS), compared to LLM code-generation methods.
comment: Under review
Multi-Stakeholder Alignment in LLM-Powered Collaborative AI Systems: A Multi-Agent Framework for Intelligent Tutoring
The integration of Large Language Models into Intelligent Tutoring Systems presents significant challenges in aligning with diverse and often conflicting values from students, parents, teachers, and institutions. Existing architectures lack formal mechanisms for negotiating these multi-stakeholder tensions, creating risks in accountability and bias. This paper introduces the Advisory Governance Layer (AGL), a non-intrusive, multi-agent framework designed to enable distributed stakeholder participation in AI governance. The AGL employs specialized agents representing stakeholder groups to evaluate pedagogical actions against their specific policies in a privacy-preserving manner, anticipating future advances in personal assistant technology that will enhance stakeholder value expression. Through a novel policy taxonomy and conflict-resolution protocols, the framework provides structured, auditable governance advice to the ITS without altering its core pedagogical decision-making. This work contributes a reference architecture and technical specifications for aligning educational AI with multi-stakeholder values, bridging the gap between high-level ethical principles and practical implementation.
CodeAD: Synthesize Code of Rules for Log-based Anomaly Detection with LLMs
Log-based anomaly detection (LogAD) is critical for maintaining the reliability and availability of large-scale online service systems. While machine learning, deep learning, and large language models (LLMs)-based methods have advanced LogAD, they often suffer from limited interpretability, high inference costs, and extensive preprocessing requirements, limiting their practicality for real-time, high-volume log analysis. In contrast, rule-based systems offer efficiency and transparency, but require significant manual effort and are difficult to scale across diverse and evolving environments. In this paper, we present CodeAD, a novel framework that automatically synthesizes lightweight Python rule functions for LogAD using LLMs. CodeAD introduces a hierarchical clustering and anchor-grounded sampling strategy to construct representative contrastive log windows, enabling LLMs to discern discriminative anomaly patterns. To ensure robustness and generalizability, CodeAD employs an agentic workflow that iteratively generates, tests, repairs, and refines the rules until they meet correctness and abstraction requirements. The synthesized rules are interpretable, lightweight, and directly executable on raw logs, supporting efficient and transparent online anomaly detection. Our comprehensive experiments on three public datasets (BGL, Hadoop, Thunderbird) demonstrate that CodeAD achieves an average absolute improvement of 3.6% F1 score over the state-of-the-art baselines, while processing large datasets up to 4x faster and at a fraction of the cost (total LLM invocation cost under 4 USD per dataset). These results highlight CodeAD as a practical and scalable solution for online monitoring systems, enabling interpretable, efficient, and automated LogAD in real-world environments.
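To make the idea of a lightweight Python rule function running directly on raw logs concrete, here is a hand-written illustration. It is not output from CodeAD: the function name `rule_repeated_errors`, the regular expression, the `max_errors` threshold, and the sample log lines are all hypothetical, chosen only to show the general shape such a rule might take.

```python
import re

# Hypothetical pattern; a synthesized rule would be tailored to each dataset.
_ERROR_RE = re.compile(r"\b(FATAL|ERROR)\b")

def rule_repeated_errors(window, max_errors=2):
    """Flag a window of raw log lines as anomalous when it contains
    more than max_errors lines at ERROR or FATAL level."""
    errors = sum(1 for line in window if _ERROR_RE.search(line))
    return errors > max_errors

# Made-up log windows for illustration.
normal = ["INFO job started", "INFO block received", "ERROR retrying fetch"]
anomalous = ["ERROR disk failure", "FATAL kernel panic",
             "ERROR disk failure", "INFO reboot"]

flags = [rule_repeated_errors(normal), rule_repeated_errors(anomalous)]
```

A rule of this kind needs no log parsing or model inference at detection time, which is the efficiency and transparency argument made above.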
Multi-Agent Conditional Diffusion Model with Mean Field Communication as Wireless Resource Allocation Planner
In wireless communication systems, efficient and adaptive resource allocation plays a crucial role in enhancing overall Quality of Service (QoS). While centralized Multi-Agent Reinforcement Learning (MARL) frameworks rely on a central coordinator for policy training and resource scheduling, they suffer from scalability issues and privacy risks. In contrast, the Distributed Training with Decentralized Execution (DTDE) paradigm enables distributed learning and decision-making, but it struggles with non-stationarity and limited inter-agent cooperation, which can severely degrade system performance. To overcome these challenges, we propose the Multi-Agent Conditional Diffusion Model Planner (MA-CDMP) for decentralized communication resource management. Built upon the Model-Based Reinforcement Learning (MBRL) paradigm, MA-CDMP employs Diffusion Models (DMs) to capture environment dynamics and plan future trajectories, while an inverse dynamics model guides action generation, thereby alleviating the sample inefficiency and slow convergence of conventional DTDE methods. Moreover, to approximate large-scale agent interactions, a Mean-Field (MF) mechanism is introduced as an assistance to the classifier in DMs. This design mitigates inter-agent non-stationarity and enhances cooperation with minimal communication overhead in distributed settings. We further theoretically establish an upper bound on the distributional approximation error introduced by the MF-based diffusion generation, guaranteeing convergence stability and reliable modeling of multi-agent stochastic dynamics. Extensive experiments demonstrate that MA-CDMP consistently outperforms existing MARL baselines in terms of average reward and QoS metrics, showcasing its scalability and practicality for real-world wireless network optimization.
Coordinated Autonomous Drones for Human-Centered Fire Evacuation in Partially Observable Urban Environments
Autonomous drone technology holds significant promise for enhancing search and rescue operations during evacuations by guiding humans toward safety and supporting broader emergency response efforts. However, their application in dynamic, real-time evacuation support remains limited. Existing models often overlook the psychological and emotional complexity of human behavior under extreme stress. In real-world fire scenarios, evacuees frequently deviate from designated safe routes due to panic and uncertainty. To address these challenges, this paper presents a multi-agent coordination framework in which autonomous Unmanned Aerial Vehicles (UAVs) assist human evacuees in real-time by locating, intercepting, and guiding them to safety under uncertain conditions. We model the problem as a Partially Observable Markov Decision Process (POMDP), where two heterogeneous UAV agents, a high-level rescuer (HLR) and a low-level rescuer (LLR), coordinate through shared observations and complementary capabilities. Human behavior is captured using an agent-based model grounded in empirical psychology, where panic dynamically affects decision-making and movement in response to environmental stimuli. The environment features stochastic fire spread, unknown evacuee locations, and limited visibility, requiring UAVs to plan over long horizons to search for humans and adapt in real-time. Our framework employs the Proximal Policy Optimization (PPO) algorithm with recurrent policies to enable robust decision-making in partially observable settings. Simulation results demonstrate that the UAV team can rapidly locate and intercept evacuees, significantly reducing the time required for them to reach safety compared to scenarios without UAV assistance.
comment: Accepted to IEEE Global Humanitarian Technology Conference (GHTC 2025). 8 pages, 4 figures
TDFlow: Agentic Workflows for Test Driven Software Engineering
We introduce TDFlow, a novel test-driven agentic workflow that frames repository-scale software engineering as a test-resolution task, specifically designed to solve human-written tests. Given a set of tests, TDFlow repeatedly proposes, revises, and debugs repository-scale patches using precisely engineered sub-agents and tightly constrained tools. The workflow decomposes software engineering program repair into four components governed by respective sub-agents. This simple, forced decoupling of patch proposing, debugging, patch revision, and optional test generation (1) reduces long-context burden on any individual sub-agent, (2) focuses each sub-agent on specific, pre-defined sub-tasks, and (3) allows for specialized performance improvement on specific sub-tasks. When provided human-written tests, TDFlow attains 88.8% pass rate on SWE-Bench Lite (an absolute improvement of 27.8% over the next best system) and 94.3% on SWE-Bench Verified. Manual inspection of the 800 TDFlow runs within SWE-Bench Lite and Verified uncovers only 7 instances of test hacking, which were subsequently counted as failures. Furthermore, we show that the primary obstacle to human-level software engineering performance lies within writing successful reproduction tests. We envision a human-LLM interactive system powered by TDFlow where human developers write tests solved by LLM systems. Together, these results indicate that modern LLMs, when embedded in a narrowly engineered, test-driven workflow, already achieve human-level test resolution -- with the final frontier for fully autonomous repository repair being the accurate generation of valid reproduction tests.
Fortytwo: Swarm Inference with Peer-Ranked Consensus
As centralized AI hits compute ceilings and diminishing returns from ever-larger training runs, meeting demand requires an inference layer that scales horizontally in both capacity and capability. We present Fortytwo, a novel protocol that leverages swarm intelligence principles and distributed pairwise ranking consensus to achieve superior performance in AI inference. Our approach reimagines collaboration among AI nodes using swarm inference: a peer-ranked, reputation-weighted consensus across heterogeneous models that surfaces the highest-quality responses. Using pairwise ranking with a custom Bradley-Terry-style aggregation model, we demonstrate that swarm inference substantially outperforms majority voting, achieving 85.90% on GPQA Diamond versus 68.69% for majority voting with the same model set - an improvement of +17.21 percentage points (approximately +25.1% relative). The protocol incorporates on-chain reputation so node influence adapts to demonstrated accuracy over time, yielding a meritocratic consensus that filters low-quality or malicious participants. To resist Sybil attacks, Fortytwo employs proof-of-capability in its consensus: nodes must successfully complete calibration/test requests and stake reputation to enter ranking rounds, making multi-identity attacks economically unattractive while preserving openness. Across six challenging benchmarks, including GPQA Diamond, LiveCodeBench, and AIME, our evaluation indicates higher accuracy and strong resilience to adversarial and noisy free-form prompting (e.g., prompt-injection degradation of only 0.12% versus 6.20% for a monolithic single-model baseline), while retaining practical deployability. Together, these results establish a foundation for decentralized AI systems - democratizing access to high-quality inference through collective intelligence without sacrificing reliability or security.
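The pairwise-ranking aggregation described above can be illustrated with the plain Bradley-Terry model, in which the probability that response i beats response j is p_i / (p_i + p_j). The sketch below fits the strengths with the standard minorization-maximization updates. It is not Fortytwo's aggregation: the abstract describes a custom, reputation-weighted variant whose details are not given, and the function name, the `wins` dictionary format, and the example counts are hypothetical.

```python
def bradley_terry(wins, n_items, iters=200):
    """Estimate Bradley-Terry strengths from wins[(i, j)] = number of
    times item i was ranked above item j, using MM updates.
    Sketch assumption: every item has at least one win."""
    p = [1.0] * n_items
    for _ in range(iters):
        new_p = []
        for i in range(n_items):
            # Total wins of item i over all opponents.
            w_i = sum(wins.get((i, j), 0) for j in range(n_items) if j != i)
            # Comparisons involving i, weighted by current strengths.
            denom = sum(
                (wins.get((i, j), 0) + wins.get((j, i), 0)) / (p[i] + p[j])
                for j in range(n_items) if j != i
            )
            new_p.append(w_i / denom if denom > 0 else p[i])
        total = sum(new_p)
        p = [x / total for x in new_p]  # normalize so strengths sum to 1
    return p

# Made-up pairwise judgments among three candidate responses.
wins = {(0, 1): 3, (1, 0): 1, (0, 2): 4, (2, 0): 1, (1, 2): 3, (2, 1): 2}
strengths = bradley_terry(wins, 3)
best = max(range(3), key=lambda i: strengths[i])
```

Unlike majority voting, this uses every pairwise comparison, so a response can win overall without being the single most common answer. A reputation-weighted variant could, for instance, scale each judgment by the judging node's reputation, though that is an assumption rather than something the abstract states.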
Magentic Marketplace: An Open-Source Environment for Studying Agentic Markets
As LLM agents advance, they are increasingly mediating economic decisions, ranging from product discovery to transactions, on behalf of users. Such applications promise benefits but also raise many questions about agent accountability and value for users. Addressing these questions requires understanding how agents behave in realistic market conditions. However, previous research has largely evaluated agents in constrained settings, such as single-task marketplaces (e.g., negotiation) or structured two-agent interactions. Real-world markets are fundamentally different: they require agents to handle diverse economic activities and coordinate within large, dynamic ecosystems where multiple agents with opaque behaviors may engage in open-ended dialogues. To bridge this gap, we investigate two-sided agentic marketplaces where Assistant agents represent consumers and Service agents represent competing businesses. To study these interactions safely, we develop Magentic-Marketplace -- a simulated environment where Assistants and Services can operate. This environment enables us to study key market dynamics: the utility agents achieve, behavioral biases, vulnerability to manipulation, and how search mechanisms shape market outcomes. Our experiments show that frontier models can approach optimal welfare -- but only under ideal search conditions. Performance degrades sharply with scale, and all models exhibit severe first-proposal bias, creating 10-30x advantages for response speed over quality. These findings reveal how behaviors emerge across market conditions, informing the design of fair and efficient agentic marketplaces.
On the Fundamental Limitations of Decentralized Learnable Reward Shaping in Cooperative Multi-Agent Reinforcement Learning
Recent advances in learnable reward shaping have shown promise in single-agent reinforcement learning by automatically discovering effective feedback signals. However, the effectiveness of decentralized learnable reward shaping in cooperative multi-agent settings remains poorly understood. We propose DMARL-RSA, a fully decentralized system where each agent learns individual reward shaping, and evaluate it on cooperative navigation tasks in the simple_spread_v3 environment. Despite sophisticated reward learning, DMARL-RSA achieves only -24.20 +/- 0.09 average reward, compared to MAPPO with centralized training at 1.92 +/- 0.87 -- a 26.12-point gap. DMARL-RSA performs similarly to simple independent learning (IPPO: -23.19 +/- 0.96), indicating that advanced reward shaping cannot overcome fundamental decentralized coordination limitations. Interestingly, decentralized methods achieve higher landmark coverage (0.888 +/- 0.029 for DMARL-RSA, 0.960 +/- 0.045 for IPPO out of 3 total) but worse overall performance than centralized MAPPO (0.273 +/- 0.008 landmark coverage) -- revealing a coordination paradox between local optimization and global performance. Analysis identifies three critical barriers: (1) non-stationarity from concurrent policy updates, (2) exponential credit assignment complexity, and (3) misalignment between individual reward optimization and global objectives. These results establish empirical limits for decentralized reward learning and underscore the necessity of centralized coordination for effective multi-agent cooperation.
comment: 8 pages, 5 figures, 2 tables
What Is Your AI Agent Buying? Evaluation, Implications and Emerging Questions for Agentic E-Commerce
Online marketplaces will be transformed by autonomous AI agents acting on behalf of consumers. Rather than humans browsing and clicking, AI agents can parse webpages or interact through APIs to evaluate products, and transact. This raises a fundamental question: what do AI agents buy, and why? We develop ACES, a sandbox environment that pairs a platform-agnostic agent with a fully programmable mock marketplace to study this. We first explore aggregate choices, revealing that modal choices can differ across models, with AI agents sometimes concentrating on a few products, raising competition questions. We then analyze the drivers of choices through rationality checks and randomized experiments on product positions and listing attributes. Models show sizeable and heterogeneous position effects: all favor the top row, yet different models prefer different columns, undermining the assumption of a universal "top" rank. They penalize sponsored tags, reward endorsements, and sensitivities to price, ratings, and reviews are directionally as expected, but vary sharply across models. Finally, we find that a seller-side agent that makes minor tweaks to product descriptions can deliver substantial market-share gains by targeting AI buyer preferences. Our findings reveal how AI agents behave in e-commerce, and surface concrete seller strategy, platform design, and regulatory questions.
A Principle of Targeted Intervention for Multi-Agent Reinforcement Learning NeurIPS 2025
Steering cooperative multi-agent reinforcement learning (MARL) towards desired outcomes is challenging, particularly when global guidance from a human on the whole multi-agent system is impractical in large-scale MARL. On the other hand, designing external mechanisms (e.g., intrinsic rewards and human feedback) to coordinate agents mostly relies on empirical studies, lacking an easy-to-use research tool. In this work, we employ multi-agent influence diagrams (MAIDs) as a graphical framework to address the above issues. First, we introduce the concept of MARL interaction paradigms (orthogonal to MARL learning paradigms), using MAIDs to analyze and visualize both unguided self-organization and global guidance mechanisms in MARL. Then, we design a new MARL interaction paradigm, referred to as the targeted intervention paradigm that is applied to only a single targeted agent, so the problem of global guidance can be mitigated. In implementation, we introduce a causal inference technique, referred to as Pre-Strategy Intervention (PSI), to realize the targeted intervention paradigm. Since MAIDs can be regarded as a special class of causal diagrams, a composite desired outcome that integrates the primary task goal and an additional desired outcome can be achieved by maximizing the corresponding causal effect through the PSI. Moreover, the bundled relevance graph analysis of MAIDs provides a tool to identify whether an MARL learning paradigm is workable under the design of an MARL interaction paradigm. In experiments, we demonstrate the effectiveness of our proposed targeted intervention, and verify the result of relevance graph analysis.
comment: Published in NeurIPS 2025
Human-AI Collaboration: Trade-offs Between Performance and Preferences
Despite the growing interest in collaborative AI, designing systems that seamlessly integrate human input remains a major challenge. In this study, we developed a task to systematically examine human preferences for collaborative agents. We created and evaluated five collaborative AI agents with strategies that differ in the manner and degree they adapt to human actions. Participants interacted with a subset of these agents, evaluated their perceived traits, and selected their preferred agent. We used a Bayesian model to understand how agents' strategies influence the Human-AI team performance, AI's perceived traits, and the factors shaping human preferences in pairwise agent comparisons. Our results show that agents who are more considerate of human actions are preferred over purely performance-maximizing agents. Moreover, we show that such human-centric design can improve the likability of AI collaborators without reducing performance. We find evidence for inequality-aversion effects being a driver of human choices, suggesting that people prefer collaborative agents which allow them to meaningfully contribute to the team. Taken together, these findings demonstrate how collaboration with AI can benefit from development efforts which include both subjective and objective metrics.
comment: LW Mayer & S Karny are co-first authors
Stronger together? The homophily trap in networks
While homophily -- the tendency to link with similar others -- may nurture a sense of belonging and shared values, it can also hinder diversity and widen inequalities. Here, we unravel this trade-off analytically, revealing homophily traps for minority groups: scenarios where increased homophilic interaction among minorities negatively affects their structural opportunities within a network. We demonstrate that homophily traps arise when minority size falls below 25% of a network, at which point homophily comes at the expense of lower structural visibility for the minority group. Our work reveals that social groups require a critical size to benefit from homophily without incurring structural costs, providing insights into core processes underlying the emergence of group inequality in networks.
comment: 11 pages, 1 figure
ColorEcosystem: Powering Personalized, Standardized, and Trustworthy Agentic Service in massive-agent Ecosystem
With the rapid development of (multimodal) large language model-based agents, the landscape of agentic service management has evolved from single-agent systems to multi-agent systems, and now to massive-agent ecosystems. Current massive-agent ecosystems face growing challenges, including impersonal service experiences, a lack of standardization, and untrustworthy behavior. To address these issues, we propose ColorEcosystem, a novel blueprint designed to enable personalized, standardized, and trustworthy agentic service at scale. Concretely, ColorEcosystem consists of three key components: agent carrier, agent store, and agent audit. The agent carrier provides personalized service experiences by utilizing user-specific data and creating a digital twin, while the agent store serves as a centralized, standardized platform for managing diverse agentic services. The agent audit, based on the supervision of developer and user activities, ensures the integrity and credibility of both service providers and users. Through the analysis of challenges, transitional forms, and practical considerations, the ColorEcosystem is poised to power personalized, standardized, and trustworthy agentic service across massive-agent ecosystems. Meanwhile, we have also implemented part of ColorEcosystem's functionality, and the relevant code is open-sourced at https://github.com/opas-lab/color-ecosystem.
Module checking of pushdown multi-agent systems
In this paper, we investigate the module-checking problem of pushdown multi-agent systems (PMS) against ATL and ATL* specifications. We establish that for ATL, module checking of PMS is 2EXPTIME-complete, which is the same complexity as pushdown module-checking for CTL. On the other hand, we show that ATL* module-checking of PMS turns out to be 4EXPTIME-complete, hence exponentially harder than both CTL* pushdown module-checking and ATL* model-checking of PMS. Our result for ATL* provides a rare example of a natural decision problem that is elementary yet has a complexity that is higher than triply exponential-time.
comment: arXiv admin note: substantial text overlap with arXiv:1709.02107
Systems and Control (CS)
From Zonal to Nodal Capacity Expansion Planning: Spatial Aggregation Impacts on a Realistic Test-Case SC
Solving power system capacity expansion planning (CEP) problems at realistic spatial resolutions is computationally challenging. Thus, a common practice is to solve CEP over zonal models with low spatial resolution rather than over full-scale nodal power networks. Due to improvements in solving large-scale stochastic mixed integer programs, these computational limitations are becoming less relevant, and the assumption that zonal models are realistic and useful approximations of nodal CEP is worth revisiting. This work is the first to conduct a systematic computational study on the assumption that spatial aggregation can reasonably be used for ISO- and interconnect-scale CEP. By considering a realistic, large-scale test network based on the state of California with over 8,000 buses and 10,000 transmission lines, we demonstrate that well-designed small spatial aggregations can yield good approximations but that coarser zonal models result in large distortions of investment decisions.
comment: 10 pages, 4 figures, 6 tables, submitted to 2026 Power Systems Computation Conference (PSCC)
Towards Stochastic (N-1)-Secure Redispatch
The intermittent nature of renewable power availability is one of the major sources of uncertainty in power systems. While markets can guarantee that the demand is covered by the available generation, transmission system operators often have to intervene via economic redispatch to ensure that the physical constraints of the network are satisfied. To account for uncertainty, the underlying optimal power flow (OPF) routines have to be modified. Recently, polynomial chaos expansion (PCE) has been suggested in the literature as a tool for stochastic OPF problems. However, the usage of PCE-based methods in security-constrained OPF for (N-1)-secure operations has not yet been explored. In this paper, we propose a procedure that iteratively solves a PCE-overloaded stochastic OPF problem by including line outage constraints until an (N-1)-secure solution is achieved. We demonstrate the efficacy of our method by comparing it with a Monte-Carlo simulation on a 118-bus example system.
comment: 7 pages, 1 figure
An Error-Based Safety Buffer for Safe Adaptive Control (Extended Version)
We consider the problem of adaptive control of a class of feedback linearizable plants with matched parametric uncertainties whose states are accessible, subject to state constraints, which often arise due to safety considerations. In this paper, we combine adaptation and control barrier functions into a real-time control architecture that guarantees stability, ensures control performance, and remains safe even with the parametric uncertainties. Two problems are considered, differing in the nature of the parametric uncertainties. In both cases, the control barrier function is assumed to have an arbitrary relative degree. In addition to guaranteeing stability, it is proved that both the control objective and safety objective are met with near-zero conservatism. No excitation conditions are imposed on the command signal. Simulation results demonstrate the non-conservatism of all of the theoretical developments.
comment: Submitted to IEEE Transactions on Automatic Control
IoT-Driven Smart Management in Broiler Farming: Simulation of Remote Sensing and Control Systems SC
Parameter monitoring and control systems are crucial in the industry as they enable automation processes that improve productivity and resource optimization. These improvements also help to manage environmental factors and the complex interactions between multiple inputs and outputs required for production management. This paper proposes an automation system for broiler management based on a simulation scenario that involves sensor networks and embedded systems. The aim is to create a transmission network for monitoring and controlling broiler temperature and feeding using the Internet of Things (IoT), complemented by a dashboard and a cloud-based service database to track improvements in broiler management. We hope this work will serve as a guide for stakeholders and entrepreneurs in the animal production industry, fostering sustainable development through simple and cost-effective automation solutions. The goal is for them to scale and integrate these recommendations into their existing operations, leading to more efficient decision-making at the management level.
comment: 2025 IEEE Technology and Engineering Management Society Conference (TEMSCON LATAM), Cartagena, Colombia
Flexibility aggregation via set projection for distribution grids with multiple interconnections
With the increasing number of flexible energy devices in distribution grids, coordination between Transmission System Operators (TSOs) and Distribution System Operators (DSOs) becomes critical for optimal system operation. One form of coordination is to solve the overall system operation problem in a hierarchical way, computing Feasible Operational Regions (FORs) for the interconnection between TSO/DSO. Most methods for computing FORs rely on the assumption of only one interconnection point between TSO and DSOs, which is often violated in practice. In this work, we propose a method for computing FORs in distribution grids with multiple interconnection points to the transmission grid. We test our method in a grid with two interconnecting points and analyze the properties of the resulting high-dimensional FOR from a power systems perspective.
Payload trajectory tracking control for aerial transportation systems with cable length online optimization
Cable-suspended aerial transportation systems are employed extensively across various industries. The capability to flexibly adjust the relative position between the multirotor and the payload has spurred growing interest in the system equipped with variable-length cable, promising broader application potential. Compared to systems with fixed-length cables, introducing the variable-length cable adds a new degree of freedom. However, it also results in increased nonlinearity and more complex dynamic coupling among the multirotor, the cable and the payload, posing significant challenges in control design. This paper introduces a backstepping control strategy tailored for aerial transportation systems with variable-length cable, designed to precisely track the payload trajectory while dynamically adjusting cable length. Then, a cable length generator has been developed that achieves online optimization of the cable length while satisfying state constraints, thus balancing the multirotor's motion and cable length changes without the need for manual trajectory planning. The asymptotic stability of the closed-loop system is guaranteed through Lyapunov techniques and the growth restriction condition. Finally, simulation results confirm the efficacy of the proposed method in managing trajectory tracking and cable length adjustments effectively.
Inertia Partitioning Modular Control Framework for Reconfigurable Multibody Systems
A novel modular control framework for reconfigurable rigid multibody systems is proposed, motivated by the challenges of modular control of systems with closed kinematic chains. In the framework, modularity is defined in the sense of degrees of freedom, and the inertial properties of each body are partitioned with respect to how they are reflected in the kinetic energy of the system through the motion induced by each degree of freedom. This approach inherently handles closed chains in the same manner as tree-like structures, eliminating the need for explicit constraint force calculations or formulations based on differential-algebraic equations. The proposed framework is implemented via simulation on a three-degree-of-freedom series-parallel manipulator, with the results being consistent with the expected stability and tracking performance, and indicating the framework's potential for scalability in trajectory-tracking control of multibody systems.
Neural Networks for AC Optimal Power Flow: Improving Worst-Case Guarantees during Training SC
The AC Optimal Power Flow (AC-OPF) problem is central to power system operation but challenging to solve efficiently due to its nonconvex and nonlinear nature. Neural networks (NNs) offer fast surrogates, yet their black-box behavior raises concerns about constraint violations that can compromise safety. We propose a verification-informed NN framework that incorporates worst-case constraint violations directly into training, producing models that are both accurate and provably safer. Through post-hoc verification, we achieve substantial reductions in worst-case violations and, for the first time, verify all operational constraints of large-scale AC-OPF proxies. Practical feasibility is further enhanced via restoration and warm-start strategies for infeasible operating points. Experiments on systems ranging from 57 to 793 buses demonstrate scalability, speed, and reliability, bridging the gap between ML acceleration and safe, real-time deployment of AC-OPF solutions - and paving the way toward data-driven optimal control.
comment: Submitted to PSCC 2026 (under review)
Embroidery Actuator Utilizing Embroidery Patterns to Generate Diverse Fabric Deformations
This paper presents a novel Embroidery Actuator, a fabric-integrated pneumatic actuator that enables diverse and controllable deformations through embroidery pattern design. Unlike conventional fabric actuators that rely on fiber- or thread-shaped actuators, the proposed actuator is fabricated by directly stitching an inflatable tube onto the fabric using a cord-embroidery technique. The embroidered thread and the fabric jointly form a sleeve that constrains the expansion of the inflatable tube, converting internal pressure into targeted bending or stretching deformations. By varying the embroidery pattern, such as zigzag or cross configurations, different geometric constraints can be realized, allowing for flexible control of deformation direction and magnitude. Analytical deformation models based on the Neo-Hookean model and Lagrange's equations were developed to predict the relationship between pneumatic pressure and bending angle, and were experimentally validated using motion-capture measurements. The results demonstrated that the actuator achieves strong agreement with the analytical deformation model.
comment: 8 pages, 8 figures. This work has been submitted to the IEEE for possible publication
Combining High Level Scheduling and Low Level Control to Manage Fleets of Mobile Robots
The deployment of mobile robots for material handling in industrial environments requires scalable coordination of large fleets in dynamic settings. This paper presents a two-layer framework that combines high-level scheduling with low-level control. Tasks are assigned and scheduled using the compositional algorithm ComSat, which generates time-parameterized routes for each robot. These schedules are then used by a distributed Model Predictive Control (MPC) system in real time to compute local reference trajectories, accounting for static and dynamic obstacles. The approach ensures safe, collision-free operation, and supports rapid rescheduling in response to disruptions such as robot failures or environmental changes. We evaluate the method in simulated 2D environments with varying road capacities and traffic conditions, demonstrating high task completion rates and robust behavior even under congestion. The modular structure of the framework allows for computational tractability and flexibility, making it suitable for deployment in complex, real-world industrial scenarios.
Context-awareness for Dependable Low-Power IoT
Dependability is the ability to consistently deliver trusted and uninterrupted service in the face of operational uncertainties. Ensuring dependable operation in large-scale, energy-constrained Internet of Things (IoT) deployments is as crucial as it is challenging, and calls for context-aware protocols, where context refers to situational or state information. In this paper, we identify four critical context dimensions for IoT networks, namely energy status, information freshness, task relevance, and physical/medium conditions, and show how each one underpins core dependability attributes. Building on these insights, we propose a two-step protocol design framework that incorporates operation-specific context fields. Through three representative use cases, we demonstrate how context awareness can significantly enhance system dependability while imposing only minimal control-plane overhead.
NeuroDOB: A Deep Neural Observer-Based Controller for Vehicle Lateral Dynamics
This paper proposes NeuroDOB, a deep neural network based observer controller for vehicle lateral dynamics, which replaces the conventional disturbance observer (DOB) with a deep neural network (DNN) to enhance personalized lateral control. Unlike conventional DOBs that compensate for general disturbances such as road friction variation and crosswind, NeuroDOB explicitly addresses unmodeled vehicle dynamics and driver-specific behaviors by learning the steering compensation signal from driver-in-the-loop simulations using CarSim's embedded controller as a surrogate driver. The proposed architecture integrates NeuroDOB with a linear quadratic regulator (LQR), where the DNN outputs a delta error correction added to the baseline LQR steering input to produce the final control command. Input features to the DNN include lateral position and yaw angle errors, and the LQR control input. Experimental validation using a lateral dynamic bicycle model within CarSim demonstrates that NeuroDOB effectively adapts to individual driving habits, improving lateral control performance beyond what conventional LQR controllers achieve. The results indicate the potential of a deep neural network based observer to enable personalized and adaptive autonomous vehicle control. In cognitive terms, the proposed architecture can be viewed as a dual-system control structure. The baseline LQR corresponds to System 1, a model-based, fast, and analytic reasoning layer ensuring stability. The NeuroDOB acts as System 2, a reflective, data-driven layer that learns compensation from experience and corrects the analytical bias of System 1. Together, they form an integrated decision process analogous to human intuition-reflection interaction, enabling both stability and adaptability in lateral control.
comment: 12 pages, 16 figures
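The additive structure described in the NeuroDOB abstract (a learned correction summed with a baseline LQR steering input) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the four-element state layout, the gain values, and the names `lqr_steering`, `neurodob_command`, and `correction_model` are assumptions, and the learned network is replaced by a placeholder callable.

```python
def lqr_steering(gain, state):
    """Baseline state-feedback steering input u = -K x."""
    return -sum(k * x for k, x in zip(gain, state))


def neurodob_command(gain, state, correction_model):
    """Add a learned correction to the baseline LQR steering input.

    Assumed (hypothetical) state layout:
    [lateral error, lateral error rate, yaw error, yaw error rate].
    `correction_model` stands in for the trained DNN.
    """
    u_lqr = lqr_steering(gain, state)
    # Features named in the abstract: lateral error, yaw error, LQR input.
    features = (state[0], state[2], u_lqr)
    return u_lqr + correction_model(features)


if __name__ == "__main__":
    gain = [1.0, 0.0, 2.0, 0.0]    # illustrative gains, not tuned
    state = [0.5, 0.0, 0.1, 0.0]
    print(neurodob_command(gain, state, lambda f: 0.05))
```

Keeping the correction as a separate callable means the baseline controller can be evaluated on its own by passing a function that returns zero.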
zkSTAR: A zero knowledge system for time series attack detection enforcing regulatory compliance in critical infrastructure networks
Industrial control systems (ICS) form the operational backbone of critical infrastructure networks (CIN) such as power grids, water supply systems, and gas pipelines. As cyber threats to these systems escalate, regulatory agencies are imposing stricter compliance requirements to ensure system-wide security and reliability. A central challenge, however, is enabling regulators to verify the effectiveness of detection mechanisms without requiring utilities to disclose sensitive operational data. In this paper, we introduce zkSTAR, a cyberattack detection framework that leverages zk-SNARKs to reconcile these requirements and enable provable detection guarantees while preserving data confidentiality. Our approach builds on established residual-based statistical hypothesis testing methods applied to state-space detection models. Specifically, we design a two-pronged zk-SNARK architecture that enforces temporal consistency of the state-space dynamics and statistical consistency of the detection tests, allowing regulators to temporally verify alarm correctness without visibility into utility-level data. We formally analyze the soundness and zero-knowledge properties of our framework and validate its practical feasibility through computational experiments on real-world ICS datasets. As a result, our work demonstrates a scalable, privacy-preserving alternative for regulatory compliance for ICS-driven critical infrastructure networks.
Seq-DeepIPC: Sequential Sensing for End-to-End Control in Legged Robot Navigation
We present Seq-DeepIPC, a sequential end-to-end perception-to-control model for legged robot navigation in real-world environments. Seq-DeepIPC advances intelligent sensing for autonomous legged navigation by tightly integrating multi-modal perception (RGB-D + GNSS) with temporal fusion and control. The model jointly predicts semantic segmentation and depth estimation, giving richer spatial features for planning and control. For efficient deployment on edge devices, we use EfficientNet-B0 as the encoder, reducing computation while maintaining accuracy. Heading estimation is simplified by removing the noisy IMU and instead computing the bearing angle directly from consecutive GNSS positions. We collected a larger and more diverse dataset that includes both road and grass terrains, and validated Seq-DeepIPC on a robot dog. Comparative and ablation studies show that sequential inputs improve perception and control in our models, while other baselines do not benefit. Seq-DeepIPC achieves competitive or better results with reasonable model size; although GNSS-only heading is less reliable near tall buildings, it is robust in open areas. Overall, Seq-DeepIPC extends end-to-end navigation beyond wheeled robots to more versatile and temporally-aware systems. To support future research, we will release the code in our GitHub repository at https://github.com/oskarnatan/Seq-DeepIPC.
comment: Preprint notice: this manuscript has been submitted to IEEE Sensors Journal for possible publication
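The Seq-DeepIPC abstract mentions computing the bearing angle directly from consecutive GNSS positions. One standard way to do this is the initial great-circle bearing formula, sketched below. The abstract does not say which formula the authors use, so this is an assumption, and the function name `gnss_bearing_deg` is hypothetical.

```python
import math


def gnss_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from fix 1 to fix 2.

    Inputs are latitude/longitude in degrees; the result is in degrees,
    measured clockwise from true north, in the range [0, 360).
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    # East and north components of the direction towards fix 2.
    east = math.sin(dlon) * math.cos(phi2)
    north = (math.cos(phi1) * math.sin(phi2)
             - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(east, north)) % 360.0


if __name__ == "__main__":
    # Two consecutive fixes, the second slightly east of the first.
    print(gnss_bearing_deg(0.0, 0.0, 0.0, 0.001))
```

A bearing computed this way is only meaningful while the robot is moving: when two consecutive fixes nearly coincide, position noise dominates the result, so a practical system would likely hold the previous heading below some minimum displacement.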
Planning Oriented Integrated Sensing and Communication
Integrated sensing and communication (ISAC) enables simultaneous localization, environment perception, and data exchange for connected autonomous vehicles. However, most existing ISAC designs prioritize sensing accuracy and communication throughput, treating all targets uniformly and overlooking the impact of critical obstacles on motion efficiency. To overcome this limitation, we propose a planning-oriented ISAC (PISAC) framework that reduces the sensing uncertainty of planning-bottleneck obstacles and expands the safe navigable path for the ego-vehicle, thereby bridging the gap between physical-layer optimization and motion-level planning. The core of PISAC lies in deriving a closed-form safety bound that explicitly links ISAC transmit power to sensing uncertainty, based on the Cramér-Rao Bound and occupancy inflation principles. Using this model, we formulate a bilevel power allocation and motion planning (PAMP) problem, where the inner layer optimizes the ISAC beam power distribution and the outer layer computes a collision-free trajectory under uncertainty-aware safety constraints. Comprehensive simulations in high-fidelity urban driving environments demonstrate that PISAC achieves up to 40% higher success rates and over 5% shorter traversal times than existing ISAC-based and communication-oriented benchmarks, validating its effectiveness in enhancing both safety and efficiency.
An Intelligent Water-Saving Irrigation System Based on Multi-Sensor Fusion and Visual Servoing Control
This paper introduces an intelligent water-saving irrigation system designed to address critical challenges in precision agriculture, such as inefficient water use and poor terrain adaptability. The system integrates advanced computer vision, robotic control, and real-time stabilization technologies via a multi-sensor fusion approach. A lightweight YOLO model, deployed on an embedded vision processor (K210), enables real-time plant container detection with over 96% accuracy under varying lighting conditions. A simplified hand-eye calibration algorithm, designed for 'handheld camera' robot arm configurations, ensures that the end effector can be precisely positioned, with a success rate exceeding 90%. The active leveling system, driven by the STM32F103ZET6 main control chip and JY901S inertial measurement data, can stabilize the irrigation platform on slopes up to 10 degrees, with a response time of 1.8 seconds. Experimental results across three simulated agricultural environments (standard greenhouse, hilly terrain, complex lighting) demonstrate a 30-50% reduction in water consumption compared to conventional flood irrigation, with water use efficiency exceeding 92% in all test cases.
End-to-End Design and Validation of a Low-Cost Stewart Platform with Nonlinear Estimation and Control
This paper presents the complete design, control, and experimental validation of a low-cost Stewart platform prototype developed as an affordable yet capable robotic testbed for research and education. The platform combines off-the-shelf components with 3D-printed and custom-fabricated parts to deliver full six-degree-of-freedom motion using six linear actuators connecting a moving platform to a fixed base. The system software integrates dynamic modeling, data acquisition, and real-time control within a unified framework. A robust trajectory tracking controller based on feedback linearization, augmented with an LQR scheme, compensates for the platform's nonlinear dynamics to achieve precise motion control. In parallel, an Extended Kalman Filter fuses IMU and actuator encoder feedback to provide accurate and reliable state estimation under sensor noise and external disturbances. Unlike prior efforts that emphasize only isolated aspects such as modeling or control, this work delivers a complete hardware-software platform validated through both simulation and experiments on static and dynamic trajectories. Results demonstrate effective trajectory tracking and real-time state estimation, highlighting the platform's potential as a cost-effective and versatile tool for advanced research and educational applications.
comment: 24 pages, journal
Never Too Rigid to Reach: Adaptive Virtual Model Control with LLM- and Lyapunov-Based Reinforcement Learning
Robotic arms are increasingly deployed in uncertain environments, yet conventional control pipelines often become rigid and brittle when exposed to perturbations or incomplete information. Virtual Model Control (VMC) enables compliant behaviors by embedding virtual forces and mapping them into joint torques, but its reliance on fixed parameters and limited coordination among virtual components constrains adaptability and may undermine stability as task objectives evolve. To address these limitations, we propose Adaptive VMC with Large Language Model (LLM)- and Lyapunov-Based Reinforcement Learning (RL), which preserves the physical interpretability of VMC while supporting stability-guaranteed online adaptation. The LLM provides structured priors and high-level reasoning that enhance coordination among virtual components, improve sample efficiency, and facilitate flexible adjustment to varying task requirements. Complementarily, Lyapunov-based RL enforces theoretical stability constraints, ensuring safe and reliable adaptation under uncertainty. Extensive simulations on a 7-DoF Panda arm demonstrate that our approach effectively balances competing objectives in dynamic tasks, achieving superior performance while highlighting the synergistic benefits of LLM guidance and Lyapunov-constrained adaptation.
Secure Control of Connected and Autonomous Electrified Vehicles Under Adversarial Cyber-Attacks
Connected and Autonomous Electrified Vehicles (CAEVs) are a solution for future smart mobility, offering more efficient traffic flow and a reduced environmental impact. Despite these advantages, CAEVs remain susceptible to adversarial cyber-attacks due to their autonomous electric operation and their connectivity. To alleviate this issue, we propose a secure control architecture for CAEVs. In particular, we design an additional control input using Reinforcement Learning (RL) to be applied to the vehicle powertrain along with the input commanded by the battery. We present simulation case studies to demonstrate the potential of the proposed approach in keeping the CAEV platoon operating safely without collisions by curbing the effect of adversarial attacks.
Dynamical Modeling of Temperature and Smoke Evolution in a Thermal-Runaway Event of a Large-Format Lithium-ion Battery in a Mine Tunnel
Large-format lithium-ion batteries (LIBs) provide effective energy storage solutions for high-power equipment used in underground mining operations. They have high Coulombic efficiency and minimal heat and emission footprints. However, improper use of LIBs, accidents, or other factors may increase the probability of thermal runaway (TR), a rapid combustion reaction that discharges toxic and flammable substances. Several such incidents have been documented in mines. Since repeatable TR experiments to uncover the transient-state propagation of TR are expensive and hazardous, high-fidelity models are usually developed to mimic the impact of these events. They are resource-intensive and are impractical to develop for many scenarios that could be observed in a mine. Therefore, dynamic models within a reduced-order framework were constructed to represent the transient-state combustion event. Reduced-order models (ROMs) reasonably replicate trends in temperature and smoke, showing strong alignment with the ground-truth dataset.
Modeling and Scheduling of Fusion Patterns in Autonomous Driving Systems (Extended Version)
In Autonomous Driving Systems (ADS), Directed Acyclic Graphs (DAGs) are widely used to model complex data dependencies and inter-task communication. However, existing DAG scheduling approaches oversimplify data fusion tasks by assuming fixed triggering mechanisms, failing to capture the diverse fusion patterns found in real-world ADS software stacks. In this paper, we propose a systematic framework for analyzing various fusion patterns and their performance implications in ADS. Our framework models three distinct fusion task types: timer-triggered, wait-for-all, and immediate fusion, which comprehensively represent real-world fusion behaviors. Our Integer Linear Programming (ILP)-based approach enables an optimization of multiple real-time performance metrics, including reaction time, time disparity, age of information, and response time, while generating deterministic offline schedules directly applicable to real platforms. Evaluation using real-world ADS case studies, Raspberry Pi implementation, and randomly generated DAGs demonstrates that our framework handles diverse fusion patterns beyond the scope of existing work, and achieves substantial performance improvements in comparable scenarios.
Carbon-Aware Optimal Power Flow with Data-Driven Carbon Emission Tracing
Quantifying locational carbon emissions in power grids is crucial for implementing effective carbon reduction strategies for customers relying on electricity. This paper presents a carbon-aware optimal power flow (OPF) framework that incorporates data-driven carbon tracing, enabling rapid estimation of nodal carbon emissions from electric loads. By developing generator-to-load carbon emission distribution factors through a data-driven technique, analytical formulas for both average and marginal carbon emissions can be derived and integrated seamlessly into DC OPF models as linear constraints. The proposed carbon-aware OPF model enables market operators to optimize energy dispatch while reducing greenhouse gas emissions. Simulations on IEEE test systems confirm the accuracy and computational efficiency of the proposed approach, highlighting its applicability for real-time carbon-aware system operations.
A Spatio-Temporal Graph Learning Approach to Real-Time Economic Dispatch with Multi-Transmission-Node DER Aggregation
The integration of distributed energy resources (DERs) into wholesale electricity markets, as mandated by FERC Order 2222, imposes new challenges on system operations. To remain consistent with existing market structures, regional transmission organizations (RTOs) have advanced the aggregation of transmission-node-level DERs (T-DERs), where a nodal virtual power plant (VPP) represents the mapping of all distribution-level DERs to their respective transmission nodes. This paper develops a real-time economic dispatch (RTED) framework that enables multi-transmission-node DER aggregation while addressing computational efficiency. To this end, we introduce a spatio-temporal graph convolutional network (ST-GCN) for adaptive prediction of distribution factors (DFs), thereby capturing the dynamic influence of individual T-DERs across the transmission system. Furthermore, an iterative constraint identification strategy is incorporated to alleviate transmission security constraints without compromising system reliability. Together, these innovations accelerate the market clearing process and support the effective participation of T-DER aggregators under current market paradigms. The proposed approach is validated on large-scale test systems, including modified 118-, 2383-, and 3012-bus networks under a rolling RTED setting with real demand data. Numerical results demonstrate significant improvements in reducing operational costs and maintaining transmission network feasibility, underscoring the scalability and practicality of the proposed framework.
Neural Two-Stage Stochastic Volt-VAR Optimization for Three-Phase Unbalanced Distribution Systems with Network Reconfiguration
The increasing integration of intermittent distributed energy resources (DERs) has introduced significant variability in distribution networks, posing challenges to voltage regulation and reactive power management. This paper presents a novel neural two-stage stochastic Volt-VAR optimization (2S-VVO) method for three-phase unbalanced distribution systems considering network reconfiguration under uncertainty. To address the computational intractability associated with solving large-scale scenario-based 2S-VVO problems, a learning-based acceleration strategy is introduced, wherein the second-stage recourse model is approximated by a neural network. This neural approximation is embedded into the optimization model as a mixed-integer linear program (MILP), enabling effective enforcement of operational constraints related to the first-stage decisions. Numerical simulations on a 123-bus unbalanced distribution system demonstrate that the proposed approach achieves over 50 times speedup compared to conventional solvers and decomposition methods, while maintaining a typical optimality gap below 0.30%. These results underscore the method's efficacy and scalability in addressing large-scale stochastic VVO problems under practical operating conditions.
MDP-based Energy-aware Task Scheduling for Battery-less IoT
Realizing high long-term task completion rates represents a fundamental challenge in battery-less Internet of Things (IoT) devices powered by ambient energy harvesting. This difficulty is primarily due to the stochastic and time-varying characteristics of the available energy, which significantly complicate the design of optimal task scheduling policies. In this paper, we consider a battery-less IoT device that must periodically report sensing measurements to a monitoring center. We adopt the Markov decision process (MDP) framework to handle energy variability while aiming to maximize the long-term task completion rate. For this, we first identify its components and then define two appropriate reward functions. We demonstrate the inherent properties associated with the MDP formulation and the related optimal policy. Subsequently, we solve the resulting optimization problem, leading to the optimal stationary threshold-based (OSTB) scheduling. Simulation results demonstrate that OSTB outperforms the well-known "as late as possible" (ALAP) scheduling strategy. For instance, an 8.6% increase in the task completion rate, along with a 65% reduction in power failures and an 86.29% decrease in execution delays during task execution, are registered assuming a 4.7 mF capacitor.
comment: 13 pages, 11 figures
A Simultaneous ECG-PCG Acquisition System with Real-Time Burst-Adaptive Noise Cancellation ISCA
Cardiac auscultation is an essential clinical skill, requiring excellent hearing to distinguish subtle differences in timing and pitch of heart sounds. However, diagnosing solely from these sounds is often challenging due to interference from surrounding noise, and the information may be limited. Existing solutions that adaptively cancel external noise are either not real-time or are computationally intensive, making them unsuitable for implementation in a portable system. This work proposes an end-to-end system with a real-time adaptive noise cancellation pipeline integrated into a device that simultaneously acquires electrocardiogram (ECG) and phonocardiogram (PCG) signals. The performance of the system is validated using real-world hospital noise datasets and recordings captured with the dual-modality device. For PCG and ECG signals recorded from the device in noisy hospital settings, the proposed algorithms achieved signal-to-noise ratio improvements of 37.01 dB and 30.32 dB, respectively. These results demonstrate the system's effectiveness in enabling reliable and accessible cardiac screening, including in noisy hospital environments typical of resource-constrained settings.
comment: Paper submitted to IEEE International Symposium on Circuits and Systems (ISCAS) 2026
Maximal Load Shedding Verification for Neural Network Models of AC Line Switching
Solving for globally optimal line switching decisions in AC transmission grids can be intractably slow. Machine learning (ML) models, meanwhile, can be trained to predict near-optimal decisions in a fraction of the time. Verifying the performance and impact of these ML models on network operation, however, is a critically important step prior to their actual deployment. In this paper, we train a Neural Network (NN) to solve the optimal power shutoff line switching problem. To assess the worst-case load shedding induced by this model, we propose a bilevel attacker-defender verification approach that finds the NN line switching decisions that cause the highest quantity of network load shedding. Solving this problem to global optimality is challenging (due to AC power flow and NN nonconvexities), so our approach exploits a convex relaxation of the AC physics, combined with a local NN search, to find a guaranteed lower bound on worst-case load shedding. These under-approximation bounds are solved via MathOptAI.jl. We benchmark against a random sampling approach, and we find that our optimization-based approach always finds larger load shedding. Test results are collected on multiple PGLib test cases and on trained NN models which contain more than 10 million model parameters.
Switching Network System Identification via Convex Optimizations
This paper introduces a convex optimization framework for identifying switched network systems, in which both the node dynamics and the underlying graph topology switch between a finite number of configurations. Building on our recent convex identification method for general switching systems, we extend the formulation to structured network systems where each mode corresponds to a distinct adjacency matrix. We show that both the continuous node dynamics and binary network topologies can be identified from sampled state-velocity data by solving a sequence of convex programs. The proposed framework provides a unified and scalable way to recover piecewise network structures from data without prior knowledge of mode labels at each state. Numerical results on diffusively coupled oscillators demonstrate accurate recovery of both mode dynamics and switching graphs.
comment: 6 pages, 5 figures
Data-Driven Soft Robot Control via Adiabatic Spectral Submanifolds
The mechanical complexity of soft robots creates significant challenges for their model-based control. Specifically, linear data-driven models have struggled to control soft robots on complex, spatially extended paths that explore regions with significant nonlinear behavior. To account for these nonlinearities, we develop here a model-predictive control strategy based on the recent theory of adiabatic spectral submanifolds (aSSMs). This theory is applicable because the internal vibrations of heavily overdamped robots decay at a speed that is much faster than the desired speed of the robot along its intended path. In that case, low-dimensional attracting invariant manifolds (aSSMs) emanate from the path and carry the dominant dynamics of the robot. Aided by this recent theory, we devise an aSSM-based model-predictive control scheme purely from data. We demonstrate our data-driven model's effectiveness in tracking dynamic trajectories across diverse tasks, validated on a high-fidelity, high-dimensional finite-element model of a soft trunk robot and a Cosserat rod-based elastic soft arm. Notably, we find that five- or six-dimensional aSSM-reduced models outperform other data-driven modeling methods in tracking performance by a factor of up to 10 across all closed-loop control tasks.
comment: 41 pages, 24 figures
iWalker: Imperative Visual Planning for Walking Humanoid Robot
Humanoid robots, designed to operate in human-centric environments, serve as a fundamental platform for a broad range of tasks. Although humanoid robots have been extensively studied for decades, a majority of existing humanoid robots still heavily rely on complex modular frameworks, leading to inflexibility and potential compounded errors from independent sensing, planning, and acting components. In response, we propose an end-to-end humanoid sense-plan-act walking system, enabling vision-based obstacle avoidance and footstep planning for whole body balancing simultaneously. We designed two imperative learning (IL)-based bilevel optimizations for model-predictive step planning and whole body balancing, respectively, to achieve self-supervised learning for humanoid robot walking. This enables the robot to learn from arbitrary unlabeled data, improving its adaptability and generalization capabilities. We refer to our method as iWalker and demonstrate its effectiveness in both simulated and real-world environments, representing a significant advancement toward autonomous humanoid robots.
ExAMPC: the Data-Driven Explainable and Approximate NMPC with Physical Insights
Amidst the surge in the use of Artificial Intelligence (AI) for control purposes, classical and model-based control methods maintain their popularity due to their transparency and deterministic nature. However, advanced controllers like Nonlinear Model Predictive Control (NMPC), despite proven capabilities, face adoption challenges due to their computational complexity and unpredictable closed-loop performance in complex validation systems. This paper introduces ExAMPC, a methodology bridging classical control and explainable AI by augmenting the NMPC with data-driven insights to improve trustworthiness and reveal the sensitivities of the optimization solution and closed-loop performance to physical variables and system parameters. By employing a low-order spline embedding, we reduce the open-loop trajectory dimensionality by over 95%, and integrate it with SHAP and Symbolic Regression from eXplainable AI (XAI) for an approximate NMPC, enabling intuitive physical insights into the NMPC's optimization routine. The prediction accuracy of the approximate NMPC is enhanced through physics-inspired continuous-time constraint penalties, reducing the predicted continuous trajectory violations by 93%. ExAMPC also enables accurate forecasting of the NMPC's computational requirements with explainable insights on worst-case scenarios. Experimental validation on automated valet parking and autonomous racing with lap-time optimization demonstrates the methodology's practical effectiveness for potential real-world applications.
comment: This paper has been accepted for publication in the 2025 IEEE/RSJ IROS Conference
Formally Verified Neural Network Controllers for Incremental Input-to-State Stability of Unknown Discrete-Time Systems
This work aims to synthesize a controller that ensures that an unknown discrete-time system is incrementally input-to-state stable ($\delta$-ISS). We introduce the notion of a $\delta$-ISS control Lyapunov function ($\delta$-ISS-CLF), which, in conjunction with the controller, ensures that the closed-loop system is incrementally ISS. To address the unknown dynamics of the system, we parameterize the controller as well as the $\delta$-ISS-CLF as neural networks and learn them by utilizing the sampled data from the state space of the unknown system. To formally verify the obtained $\delta$-ISS-CLF, we develop a validity condition and incorporate the condition into the training framework to ensure a provable correctness guarantee at the end of the training process. Finally, the usefulness of the proposed approach is demonstrated using multiple case studies: the first is a scalar system with a non-affine, non-polynomial structure, the second is a one-link manipulator system, the third is a nonlinear Moore-Greitzer model of a jet engine, and the final one is a rotating rigid spacecraft model.
Opinion Dynamics on Signed Graphs and Graphons
In this paper, we make use of graphon theory to study opinion dynamics on large undirected networks. The opinion dynamics models that we take into consideration allow for negative interactions between the individuals, whose opinions can thus grow apart. We consider both the repelling and the opposing models of negative interactions, which have been studied in the literature. We define the repelling and the opposing dynamics on signed graphons and we show that their initial value problem solutions exist and are unique. We then show that, in a suitable sense, the graphon dynamics is a good approximation of the dynamics on large graphs that converge to a graphon. This result applies to large random graphs that are sampled according to a graphon (W-random graphs), for which we provide a new convergence result under very general assumptions.
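As a rough illustration of the two interaction rules named in the abstract above, the sketch below writes them in the form commonly used in the signed-network literature, on a small finite graph rather than a graphon. The function names, the explicit Euler step, and the exact form of each rule are assumptions made for illustration; the paper's graphon definitions may differ.

```python
# Illustrative sketch only: finite-graph versions of the two rules, assuming
# the forms commonly used for signed networks (not taken from the paper).

def repelling_rate(adj, x):
    """Repelling rule: dx_i/dt = sum_j a_ij * (x_j - x_i).

    A negative weight a_ij pushes opinion i away from opinion j.
    """
    n = len(x)
    return [sum(adj[i][j] * (x[j] - x[i]) for j in range(n)) for i in range(n)]


def opposing_rate(adj, x):
    """Opposing rule: dx_i/dt = sum_j |a_ij| * (sign(a_ij) * x_j - x_i).

    A negative weight a_ij attracts opinion i toward the opposite of opinion j.
    """
    n = len(x)
    return [
        sum(adj[i][j] * x[j] - abs(adj[i][j]) * x[i] for j in range(n))
        for i in range(n)
    ]


def euler_step(rate, adj, x, dt):
    """One explicit Euler step of the chosen dynamics (hypothetical helper)."""
    dx = rate(adj, x)
    return [xi + dt * di for xi, di in zip(x, dx)]
```

With a single negative edge between two individuals, the repelling rule drives their opinions apart, while the opposing rule leaves the pair at rest when the opinions are exact opposites.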
FlightKooba: A Fast Interpretable FTP Model
Flight trajectory prediction (FTP) and similar time series tasks typically require capturing smooth latent dynamics hidden within noisy signals. However, existing deep learning models face significant challenges of high computational cost and insufficient interpretability due to their complex black-box nature. This paper introduces FlightKooba, a novel modeling approach designed to extract such underlying dynamics analytically. Our framework uniquely integrates HiPPO theory, Koopman operator theory, and control theory. By leveraging Legendre polynomial bases, it constructs Koopman operators analytically, thereby avoiding large-scale parameter training. The method's core strengths lie in its exceptional computational efficiency and inherent interpretability. Experiments on multiple public datasets validate our design philosophy: for signals exhibiting strong periodicity or clear physical laws (e.g., in aviation, meteorology, and traffic flow), FlightKooba delivers competitive prediction accuracy while reducing trainable parameters by several orders of magnitude and achieving the fastest training speed. Furthermore, we analyze the model's theoretical boundaries, clarifying its inherent low-pass filtering characteristics that render it unsuitable for sequences dominated by high-frequency noise. In summary, FlightKooba offers a powerful, efficient, and interpretable new alternative for time series analysis, particularly in resource-constrained environments.
comment: Version 2: Major revision of the manuscript to refine the narrative, clarify the model's theoretical limitations and application scope, and improve overall presentation for journal submission
A Linear Parameter-Varying Approach to Data Predictive Control
By means of the linear parameter-varying (LPV) Fundamental Lemma, we derive novel data-driven predictive control (DPC) methods for LPV systems. In particular, we present output-feedback and state-feedback-based LPV-DPC methods with terminal ingredients, which guarantee exponential stability and recursive feasibility. We provide methods for the data-based computation of these terminal ingredients. Furthermore, an in-depth analysis of the application and implementation aspects of the LPV-DPC schemes is given, including application for nonlinear systems and handling noisy data. We compare and demonstrate the performance of the proposed methods in a detailed simulation example involving a nonlinear unbalanced disc system.
comment: To appear in IEEE Transactions on Automatic Control. Final Author Copy. Extended version (Section VI.C, Appendix C & D not in original version). 18 pages
Composite Learning Adaptive Control under Non-Persistent Partial Excitation
This paper focuses on relaxing the excitation conditions for the adaptive control of uncertain nonlinear systems. By adopting the spectral decomposition technique, a linear regression equation (LRE) is constructed to quantitatively collect historical excitation information, based on which the parameter estimation error is decomposed into the excited component and the unexcited component. By sufficiently utilizing the collected excitation information, the composite learning and $\mu$-modification terms are designed and incorporated into the "Lyapunov-based" parameter update law. By developing a novel Lyapunov function, it is demonstrated that under non-persistent partial excitation, the control error and the excited parameter estimation error component converge to zero, while the unexcited component remains bounded. Furthermore, the proposed adaptive control scheme can effectively eliminate the effects of parametric uncertainties and enhance the robustness of the closed-loop systems. Simulation results are provided to verify the theoretical findings.
comment: 16 pages, 15 figures
A Hybrid GNN-IZR Framework for Fast and Empirically Robust AC Power Flow Analysis in Radial Distribution Systems
The Alternating Current Power Flow (ACPF) problem forces a trade-off between the speed of data-driven models and the reliability of analytical solvers. This paper introduces a hybrid framework that synergizes a Graph Neural Network (GNN) with the Implicit Z-Bus Recursive (IZR) method, a robust, non-iterative solver for radial distribution networks. The framework employs a physics-informed GNN for rapid initial predictions and invokes the IZR solver as a failsafe for stressed cases identified by a two-stage trigger. A failure is defined as any solution with a maximum power mismatch exceeding 0.1 p.u., a significant operational deviation. On a challenging test set of 7,500 stressed scenarios for the IEEE 33-bus system, the GNN-only model failed on 13.11 % of cases. In contrast, the hybrid framework identified all potential failures, delegating them to the IZR solver to achieve a 0.00 % failure rate, empirically matching the 100 % success rate of the analytical solver on this specific test set. An expanded ablation study confirms that both physics-informed training and Z-bus sensitivity features are critical, collaboratively reducing the GNN's failure rate from 98.72 % (data-only) to 13.11 %. The hybrid approach demonstrates a pragmatic path to achieving the empirical reliability of an analytical solver while leveraging GNN speed, enabling a significant increase in the number of scenarios analyzable in near real-time.
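The failsafe idea in the abstract above can be sketched as a simple dispatch routine. This is a minimal sketch under stated assumptions: the callables `fast_predict`, `exact_solve`, and `power_mismatch` are hypothetical stand-ins for the GNN, the IZR solver, and a mismatch evaluation, and a single mismatch check replaces the paper's two-stage trigger, whose details the abstract does not give. Only the 0.1 p.u. failure threshold comes from the abstract.

```python
# Illustrative sketch only: the callables are hypothetical placeholders and
# the single-check trigger is an assumption, not the paper's two-stage trigger.

FAILURE_THRESHOLD_PU = 0.1  # from the abstract: larger max mismatch counts as a failure


def solve_with_failsafe(fast_predict, exact_solve, power_mismatch, scenario,
                        threshold=FAILURE_THRESHOLD_PU):
    """Return (solution, route), falling back to the exact solver when needed."""
    candidate = fast_predict(scenario)
    # Largest absolute power mismatch over all buses, in p.u.
    worst = max(abs(m) for m in power_mismatch(scenario, candidate))
    if worst > threshold:
        return exact_solve(scenario), "fallback"
    return candidate, "fast"
```

The design choice is that the fast model answers whenever its prediction passes the check, so the slower analytical solver runs only on the flagged cases.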
CIVIL: Causal and Intuitive Visual Imitation Learning
Today's robots attempt to learn new tasks by imitating human examples. These robots watch the human complete the task, and then try to match the actions taken by the human expert. However, this standard approach to visual imitation learning is fundamentally limited: the robot observes what the human does, but not why the human chooses those behaviors. Without understanding which features of the system or environment factor into the human's decisions, robot learners often misinterpret the human's examples. In practice, this results in causal confusion, inefficient learning, and robot policies that fail when the environment changes. We therefore propose a shift in perspective: instead of asking human teachers just to show what actions the robot should take, we also enable humans to intuitively indicate why they made those decisions. Under our paradigm human teachers attach markers to task-relevant objects and use natural language prompts to describe their state representation. Our proposed algorithm, CIVIL, leverages this augmented demonstration data to filter the robot's visual observations and extract a feature representation that aligns with the human teacher. CIVIL then applies these causal features to train a transformer-based policy that -- when tested on the robot -- is able to emulate human behaviors without being confused by visual distractors or irrelevant items. Our simulations and real-world experiments demonstrate that robots trained with CIVIL learn both what actions to take and why to take those actions, resulting in better performance than state-of-the-art baselines. From the human's perspective, our user study reveals that this new training paradigm actually reduces the total time required for the robot to learn the task, and also improves the robot's performance in previously unseen scenarios. See videos at our project website: https://civil2025.github.io
Learning Robust Satellite Attitude Dynamics with Physics-Informed Normalising Flow
Attitude control is a fundamental aspect of spacecraft operations. Model Predictive Control (MPC) has emerged as a powerful strategy for these tasks, relying on accurate models of the system dynamics to optimize control actions over a prediction horizon. In scenarios where physics models are incomplete, difficult to derive, or computationally expensive, machine learning offers a flexible alternative by learning the system behavior directly from data. However, purely data-driven models often struggle with generalization and stability, especially when applied to inputs outside their training domain. To address these limitations, we investigate the benefits of incorporating Physics-Informed Neural Networks (PINNs) into the learning of spacecraft attitude dynamics, comparing their performance with that of purely data-driven approaches. Using a Real-valued Non-Volume Preserving (Real NVP) neural network architecture with a self-attention mechanism, we trained several models on simulated data generated with the Basilisk simulator. Two training strategies were considered: a purely data-driven baseline and a physics-informed variant to improve robustness and stability. Our results demonstrate that including physics-based information improves the mean relative error of the best architectures found by 27.08%. These advantages are particularly evident when the learned models are integrated into an MPC framework, where PINN-based models consistently outperform their purely data-driven counterparts in terms of control accuracy and robustness, and achieve improved settling times compared to traditional MPC approaches, with improvements of up to 62% when subject to observation noise and reaction wheel (RW) friction.
A Single Motor Nano Aerial Vehicle with Novel Peer-to-Peer Communication and Sensing Mechanism
Communication and position sensing are among the most important capabilities for swarm robots to interact with their peers and perform tasks collaboratively. However, the hardware required to facilitate communication and position sensing is often too complicated, expensive, and bulky to be carried on swarm robots. Here we present Maneuverable Piccolissimo 3 (MP3), a minimalist, single motor drone capable of executing inter-robot communication via infrared light and triangulation-based sensing of relative bearing, distance, and elevation using message arrival time. Thanks to its novel design, MP3 can communicate with peers and localize itself using simple components, keeping its size and mass small and making it inherently safe for human interaction. We present the hardware and software design of MP3 and demonstrate its capability to localize itself, fly stably, and maneuver in the environment using peer-to-peer communication and sensing.
comment: Proceedings of Robotics: Science and Systems (RSS), 2024
Design and Optimization of EV Charging Infrastructure with Battery in Commercial Buildings
The installation of electric vehicle (EV) charging stations in buildings is inevitable, as states push for increased EV adoption to support decarbonization efforts. This transition could force the need for grid infrastructure upgrades and enhanced controls to support reliable power delivery to end-use loads, and overall economic operation. This paper evaluates strategies that address these needs on two fronts: i) optimal sizing of service transformers and battery energy storage systems (BESS), and ii) optimized coordination between EV charging, BESS operation, and building demand. These strategies are applied to a school campus setting, consisting of building and EV charging loads, to provide an illustration of energy management in commercial buildings with EV fleets. A rolling-window optimization approach is applied to determine i) optimal sizing of the service transformer and BESS and ii) optimal control of EV charging and BESS charge/discharge schedules. The design and control strategies are validated in a 20-year time horizon with an annually increasing number of EVs (buses and vans). In addition, an economic analysis is also carried out to show the costs and benefits of each design as a medium- and long-term investment.
comment: This paper contains references to several concepts that are not aligned with current DOE guidance (decarbonization, state mandates, green energy, etc.)
Dynamic Dimensioning of Frequency Containment Reserves: The Case of the Nordic Grid
One of the main responsibilities of a Transmission System Operator (TSO) operating an electric grid is to maintain a designated frequency (e.g., 50 Hz in Europe). To achieve this, TSOs have created several products called frequency-supporting ancillary services. The Frequency Containment Reserve (FCR) is one of these ancillary service products. This article focuses on the TSO problem of determining the volume procured for FCR. Specifically, we investigate the potential benefits and impact on grid security when transitioning from a traditionally static procurement method to a dynamic strategy for FCR volume. We take the Nordic synchronous area in Europe as a case study and use a diffusion model to capture its frequency development. We introduce a controlled mean reversal parameter to assess changes in FCR obligations, in particular for the Nordic FCR-N ancillary service product. We establish closed-form expressions for exceedance probabilities and use historical frequency data as input to calibrate the model. We show that a dynamic dimensioning approach for FCR has the potential to significantly reduce the exceedance probabilities (up to $37\%$) while maintaining the total yearly procured FCR volume equal to that of the current static approach. Alternatively, a dynamic dimensioning approach could significantly increase security at limited extra cost.
comment: 12 pages, 12 figures, Accepted for publication at IEEE Transactions on Power Systems
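To make the notion of a closed-form exceedance probability concrete, the sketch below assumes the simplest mean-reverting diffusion, an Ornstein-Uhlenbeck process dX = -theta * X dt + sigma dW for the frequency deviation X. This model choice, the parameter names, and the function name are assumptions; the abstract says only that a diffusion model with a controlled mean-reversion-type parameter is used, and the paper's expressions may differ.

```python
import math

# Illustrative sketch only: assumes an Ornstein-Uhlenbeck model, which the
# abstract does not specify. All names and parameters are hypothetical.


def stationary_exceedance(theta, sigma, band):
    """P(|X| > band) under the stationary law of dX = -theta*X dt + sigma dW.

    The stationary distribution is Gaussian with zero mean and standard
    deviation sigma / sqrt(2 * theta), so the two-sided tail probability
    is erfc(band / (std * sqrt(2))).
    """
    std = sigma / math.sqrt(2.0 * theta)
    return math.erfc(band / (std * math.sqrt(2.0)))
```

Under this assumed model, a larger `theta` (stronger mean reversion) narrows the stationary distribution and lowers the probability of leaving the band.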
The Limits of Fairness of the Variational Generalized Nash Equilibrium
Generalized Nash equilibrium (GNE) problems are commonly used to model strategic interactions between self-interested agents who are coupled in cost and constraints. Specifically, the variational GNE (v-GNE), a refinement of the GNE, is often selected as the solution concept due to its non-discriminatory treatment of agents by charging a uniform "shadow price" for shared resources. We study the fairness concept of v-GNEs from a comparability perspective and show that it makes an implicit assumption of unit comparability of agents' cost functions, one of the strongest comparability notions. Further, we introduce a new solution concept, the f-GNE, in which a fairness metric compatible with the comparability at hand is chosen a priori. We introduce an electric vehicle charging game to demonstrate the fragility of v-GNE fairness and compare it to the f-GNE under various fairness metrics.
Carbon-Aware Computing for Data Centers with Probabilistic Performance Guarantees
Data centers are significant contributors to carbon emissions and can strain power systems due to their high electricity consumption. To mitigate this impact and to participate in demand response programs, cloud computing companies strive to balance and optimize operations across their global fleets by making strategic decisions about when and where to place compute jobs for execution. In this paper, we introduce a load shaping scheme which reacts to time-varying grid signals by leveraging both temporal and spatial flexibility of compute jobs to provide risk-aware management guidelines and job placement with provable performance guarantees based on distributionally robust optimization. Our approach divides the problem into two key components: (i) day-ahead planning, which generates an optimal scheduling strategy based on historical load data, and (ii) real-time job placement and (time) scheduling, which dynamically tracks the optimal strategy generated in (i). We validate our method in simulation using normalized load profiles from randomly selected Google clusters, incorporating time-varying grid signals. We demonstrate significant reductions in carbon cost and peak power with our approach compared to myopic greedy policies, while maintaining computational efficiency and abiding by system and grid constraints.
Systems and Control (EESS)
From Zonal to Nodal Capacity Expansion Planning: Spatial Aggregation Impacts on a Realistic Test-Case
Solving power system capacity expansion planning (CEP) problems at realistic spatial resolutions is computationally challenging. Thus, a common practice is to solve CEP over zonal models with low spatial resolution rather than over full-scale nodal power networks. Due to improvements in solving large-scale stochastic mixed integer programs, these computational limitations are becoming less relevant, and the assumption that zonal models are realistic and useful approximations of nodal CEP is worth revisiting. This work is the first to conduct a systematic computational study on the assumption that spatial aggregation can reasonably be used for ISO- and interconnect-scale CEP. By considering a realistic, large-scale test network based on the state of California with over 8,000 buses and 10,000 transmission lines, we demonstrate that well-designed small spatial aggregations can yield good approximations but that coarser zonal models result in large distortions of investment decisions.
comment: 10 pages, 4 figures, 6 tables, submitted to 2026 Power Systems Computation Conference (PSCC)
Towards Stochastic (N-1)-Secure Redispatch
The intermittent nature of renewable power availability is one of the major sources of uncertainty in power systems. While markets can guarantee that the demand is covered by the available generation, transmission system operators often have to intervene via economic redispatch to ensure that the physical constraints of the network are satisfied. To account for uncertainty, the underlying optimal power flow (OPF) routines have to be modified. Recently, polynomial chaos expansion (PCE) has been suggested in the literature as a tool for stochastic OPF problems. However, the usage of PCE-based methods in security-constrained OPF for (N-1)-secure operations has not yet been explored. In this paper, we propose a procedure that iteratively solves a PCE-overloaded stochastic OPF problem by including line outage constraints until an (N-1)-secure solution is achieved. We demonstrate the efficacy of our method by comparing it with a Monte-Carlo simulation on a 118-bus example system.
comment: 7 pages, 1 figure
An Error-Based Safety Buffer for Safe Adaptive Control (Extended Version)
We consider the problem of adaptive control of a class of feedback linearizable plants with matched parametric uncertainties whose states are accessible, subject to state constraints, which often arise due to safety considerations. In this paper, we combine adaptation and control barrier functions into a real-time control architecture that guarantees stability, ensures control performance, and remains safe even with the parametric uncertainties. Two problems are considered, differing in the nature of the parametric uncertainties. In both cases, the control barrier function is assumed to have an arbitrary relative degree. In addition to guaranteeing stability, it is proved that both the control objective and safety objective are met with near-zero conservatism. No excitation conditions are imposed on the command signal. Simulation results demonstrate the non-conservatism of all of the theoretical developments.
comment: Submitted to IEEE Transactions on Automatic Control
IoT-Driven Smart Management in Broiler Farming: Simulation of Remote Sensing and Control Systems
Parameter monitoring and control systems are crucial in the industry as they enable automation processes that improve productivity and resource optimization. These improvements also help to manage environmental factors and the complex interactions between multiple inputs and outputs required for production management. This paper proposes an automation system for broiler management based on a simulation scenario that involves sensor networks and embedded systems. The aim is to create a transmission network for monitoring and controlling broiler temperature and feeding using the Internet of Things (IoT), complemented by a dashboard and a cloud-based service database to track improvements in broiler management. We hope this work will serve as a guide for stakeholders and entrepreneurs in the animal production industry, fostering sustainable development through simple and cost-effective automation solutions. The goal is for them to scale and integrate these recommendations into their existing operations, leading to more efficient decision-making at the management level.
comment: 2025 IEEE Technology and Engineering Management Society Conference (TEMSCON LATAM), Cartagena, Colombia
Flexibility aggregation via set projection for distribution grids with multiple interconnections
With the increasing number of flexible energy devices in distribution grids, coordination between Transmission System Operators (TSOs) and Distribution System Operators (DSOs) becomes critical for optimal system operation. One form of coordination is to solve the overall system operation problem in a hierarchical way, computing Feasible Operational Regions (FORs) for the interconnection between TSO/DSO. Most methods for computing FORs rely on the assumption of only one interconnection point between TSO and DSOs, which is often violated in practice. In this work, we propose a method for computing FORs in distribution grids with multiple interconnection points to the transmission grid. We test our method in a grid with two interconnecting points and analyze the properties of the resulting high-dimensional FOR from a power systems perspective.
Payload trajectory tracking control for aerial transportation systems with cable length online optimization
Cable-suspended aerial transportation systems are employed extensively across various industries. The capability to flexibly adjust the relative position between the multirotor and the payload has spurred growing interest in systems equipped with a variable-length cable, promising broader application potential. Compared to systems with fixed-length cables, introducing a variable-length cable adds a new degree of freedom. However, it also results in increased nonlinearity and more complex dynamic coupling among the multirotor, the cable, and the payload, posing significant challenges in control design. This paper introduces a backstepping control strategy tailored for aerial transportation systems with a variable-length cable, designed to precisely track the payload trajectory while dynamically adjusting the cable length. Then, a cable length generator is developed that achieves online optimization of the cable length while satisfying state constraints, thus balancing the multirotor's motion and cable length changes without the need for manual trajectory planning. The asymptotic stability of the closed-loop system is guaranteed through Lyapunov techniques and the growth restriction condition. Finally, simulation results confirm the efficacy of the proposed method in managing trajectory tracking and cable length adjustments.
Inertia Partitioning Modular Control Framework for Reconfigurable Multibody Systems
A novel modular control framework for reconfigurable rigid multibody systems is proposed, motivated by the challenges of modular control of systems with closed kinematic chains. In the framework, modularity is defined in the sense of degrees of freedom, and the inertial properties of each body are partitioned with respect to how they are reflected in the kinetic energy of the system through the motion induced by each degree of freedom. This approach inherently handles closed chains in the same manner as tree-like structures, eliminating the need for explicit constraint force calculations or formulations based on differential-algebraic equations. The proposed framework is implemented via simulation on a three-degree-of-freedom series-parallel manipulator, with the results being consistent with the expected stability and tracking performance, and indicating the framework's potential for scalability in trajectory-tracking control of multibody systems.
Neural Networks for AC Optimal Power Flow: Improving Worst-Case Guarantees during Training
The AC Optimal Power Flow (AC-OPF) problem is central to power system operation but challenging to solve efficiently due to its nonconvex and nonlinear nature. Neural networks (NNs) offer fast surrogates, yet their black-box behavior raises concerns about constraint violations that can compromise safety. We propose a verification-informed NN framework that incorporates worst-case constraint violations directly into training, producing models that are both accurate and provably safer. Through post-hoc verification, we achieve substantial reductions in worst-case violations and, for the first time, verify all operational constraints of large-scale AC-OPF proxies. Practical feasibility is further enhanced via restoration and warm-start strategies for infeasible operating points. Experiments on systems ranging from 57 to 793 buses demonstrate scalability, speed, and reliability, bridging the gap between ML acceleration and safe, real-time deployment of AC-OPF solutions - and paving the way toward data-driven optimal control.
comment: Submitted to PSCC 2026 (under review)
Embroidery Actuator Utilizing Embroidery Patterns to Generate Diverse Fabric Deformations
This paper presents a novel Embroidery Actuator, a fabric-integrated pneumatic actuator that enables diverse and controllable deformations through embroidery pattern design. Unlike conventional fabric actuators that rely on fiber- or thread-shaped actuators, the proposed actuator is fabricated by directly stitching an inflatable tube onto the fabric using a cord-embroidery technique. The embroidered thread and the fabric jointly form a sleeve that constrains the expansion of the inflatable tube, converting internal pressure into targeted bending or stretching deformations. By varying the embroidery pattern, such as zigzag or cross configurations, different geometric constraints can be realized, allowing for flexible control of deformation direction and magnitude. Analytical deformation models based on the Neo-Hookean model and Lagrange's equations were developed to predict the relationship between pneumatic pressure and bending angle, and were experimentally validated using motion-capture measurements. The results demonstrated that the actuator achieves strong agreement with the analytical deformation model.
comment: 8 pages, 8 figures. This work has been submitted to the IEEE for possible publication
Combining High Level Scheduling and Low Level Control to Manage Fleets of Mobile Robots
The deployment of mobile robots for material handling in industrial environments requires scalable coordination of large fleets in dynamic settings. This paper presents a two-layer framework that combines high-level scheduling with low-level control. Tasks are assigned and scheduled using the compositional algorithm ComSat, which generates time-parameterized routes for each robot. These schedules are then used by a distributed Model Predictive Control (MPC) system in real time to compute local reference trajectories, accounting for static and dynamic obstacles. The approach ensures safe, collision-free operation, and supports rapid rescheduling in response to disruptions such as robot failures or environmental changes. We evaluate the method in simulated 2D environments with varying road capacities and traffic conditions, demonstrating high task completion rates and robust behavior even under congestion. The modular structure of the framework allows for computational tractability and flexibility, making it suitable for deployment in complex, real-world industrial scenarios.
Context-awareness for Dependable Low-Power IoT
Dependability is the ability to consistently deliver trusted and uninterrupted service in the face of operational uncertainties. Ensuring dependable operation in large-scale, energy-constrained Internet of Things (IoT) deployments is as crucial as challenging, and calls for context-aware protocols where context refers to situational or state information. In this paper, we identify four critical context dimensions for IoT networks, namely energy status, information freshness, task relevance, and physical/medium conditions, and show how each one underpins core dependability attributes. Building on these insights, we propose a two-step protocol design framework that incorporates operation-specific context fields. Through three representative use cases, we demonstrate how context awareness can significantly enhance system dependability while imposing only minimal control-plane overhead.
NeuroDOB: A Deep Neural Observer-Based Controller for Vehicle Lateral Dynamics
This paper proposes NeuroDOB, a deep neural network based observer controller for vehicle lateral dynamics, which replaces the conventional disturbance observer (DOB) with a deep neural network (DNN) to enhance personalized lateral control. Unlike conventional DOBs that compensate for general disturbances such as road friction variation and crosswind, NeuroDOB explicitly addresses unmodeled vehicle dynamics and driver-specific behaviors by learning the steering compensation signal from driver-in-the-loop simulations using CarSim's embedded controller as a surrogate driver. The proposed architecture integrates NeuroDOB with a linear quadratic regulator (LQR), where the DNN outputs a delta error correction added to the baseline LQR steering input to produce the final control command. Input features to the DNN include lateral position and yaw angle errors, and the LQR control input. Experimental validation using a lateral dynamic bicycle model within CarSim demonstrates that NeuroDOB effectively adapts to individual driving habits, improving lateral control performance beyond what conventional LQR controllers achieve. The results indicate the potential of deep neural network based observer to enable personalized and adaptive autonomous vehicle control. In cognitive terms, the proposed architecture can be viewed as a dual-system control structure. The baseline LQR corresponds to System 1, a model-based, fast, and analytic reasoning layer ensuring stability. The NeuroDOB acts as System 2, a reflective, data-driven layer that learns compensation from experience and corrects the analytical bias of System 1. Together, they form an integrated decision process analogous to human intuition-reflection interaction, enabling both stability and adaptability in lateral control.
comment: 12 pages, 16 figures
zkSTAR: A zero knowledge system for time series attack detection enforcing regulatory compliance in critical infrastructure networks
Industrial control systems (ICS) form the operational backbone of critical infrastructure networks (CIN) such as power grids, water supply systems, and gas pipelines. As cyber threats to these systems escalate, regulatory agencies are imposing stricter compliance requirements to ensure system-wide security and reliability. A central challenge, however, is enabling regulators to verify the effectiveness of detection mechanisms without requiring utilities to disclose sensitive operational data. In this paper, we introduce zkSTAR, a cyberattack detection framework that leverages zk-SNARKs to reconcile these requirements and enable provable detection guarantees while preserving data confidentiality. Our approach builds on established residual-based statistical hypothesis testing methods applied to state-space detection models. Specifically, we design a two-pronged zk-SNARK architecture that enforces temporal consistency of the state-space dynamics and statistical consistency of the detection tests, allowing regulators to temporally verify alarm correctness without visibility into utility-level data. We formally analyze the soundness and zero knowledge properties of our framework and validate its practical feasibility through computational experiments on real-world ICS datasets. As a result, our work demonstrates a scalable, privacy-preserving alternative for regulatory compliance for ICS driven critical infrastructure networks.
Seq-DeepIPC: Sequential Sensing for End-to-End Control in Legged Robot Navigation
We present Seq-DeepIPC, a sequential end-to-end perception-to-control model for legged robot navigation in real-world environments. Seq-DeepIPC advances intelligent sensing for autonomous legged navigation by tightly integrating multi-modal perception (RGB-D + GNSS) with temporal fusion and control. The model jointly predicts semantic segmentation and depth estimation, giving richer spatial features for planning and control. For efficient deployment on edge devices, we use EfficientNet-B0 as the encoder, reducing computation while maintaining accuracy. Heading estimation is simplified by removing the noisy IMU and instead computing the bearing angle directly from consecutive GNSS positions. We collected a larger and more diverse dataset that includes both road and grass terrains, and validated Seq-DeepIPC on a robot dog. Comparative and ablation studies show that sequential inputs improve perception and control in our models, while other baselines do not benefit. Seq-DeepIPC achieves competitive or better results with reasonable model size; although GNSS-only heading is less reliable near tall buildings, it is robust in open areas. Overall, Seq-DeepIPC extends end-to-end navigation beyond wheeled robots to more versatile and temporally-aware systems. To support future research, we will release the code in our GitHub repository at https://github.com/oskarnatan/Seq-DeepIPC.
comment: Preprint notice, this manuscript has been submitted to IEEE sensors journal for possible publication
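The GNSS-only heading idea in the Seq-DeepIPC abstract can be illustrated with the standard initial-bearing formula between two consecutive fixes. This is a minimal sketch, not the authors' implementation: the function name `bearing_deg` and the spherical-Earth assumption are illustrative choices that do not come from the paper.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from fix 1 to fix 2, in degrees clockwise from north.

    Inputs are latitude/longitude in decimal degrees (hypothetical interface).
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    # East and north components of the direction on a spherical Earth.
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    # atan2 returns (-180, 180]; wrap into [0, 360).
    return math.degrees(math.atan2(y, x)) % 360.0
```

When two consecutive fixes are nearly identical (a stationary robot, or multipath near buildings), both components approach zero and the result is dominated by noise, which is consistent with the reliability caveat stated in the abstract.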
Planning Oriented Integrated Sensing and Communication
Integrated sensing and communication (ISAC) enables simultaneous localization, environment perception, and data exchange for connected autonomous vehicles. However, most existing ISAC designs prioritize sensing accuracy and communication throughput, treating all targets uniformly and overlooking the impact of critical obstacles on motion efficiency. To overcome this limitation, we propose a planning-oriented ISAC (PISAC) framework that reduces the sensing uncertainty of planning-bottleneck obstacles and expands the safe navigable path for the ego-vehicle, thereby bridging the gap between physical-layer optimization and motion-level planning. The core of PISAC lies in deriving a closed-form safety bound that explicitly links ISAC transmit power to sensing uncertainty, based on the Cramér-Rao bound and occupancy inflation principles. Using this model, we formulate a bilevel power allocation and motion planning (PAMP) problem, where the inner layer optimizes the ISAC beam power distribution and the outer layer computes a collision-free trajectory under uncertainty-aware safety constraints. Comprehensive simulations in high-fidelity urban driving environments demonstrate that PISAC achieves up to 40% higher success rates and over 5% shorter traversal times than existing ISAC-based and communication-oriented benchmarks, validating its effectiveness in enhancing both safety and efficiency.
An Intelligent Water-Saving Irrigation System Based on Multi-Sensor Fusion and Visual Servoing Control
This paper introduces an intelligent water-saving irrigation system designed to address critical challenges in precision agriculture, such as inefficient water use and poor terrain adaptability. The system integrates advanced computer vision, robotic control, and real-time stabilization technologies via a multi-sensor fusion approach. A lightweight YOLO model, deployed on an embedded vision processor (K210), enables real-time plant container detection with over 96% accuracy under varying lighting conditions. A simplified hand-eye calibration algorithm, designed for 'handheld camera' robot arm configurations, ensures that the end effector can be precisely positioned, with a success rate exceeding 90%. The active leveling system, driven by the STM32F103ZET6 main control chip and JY901S inertial measurement data, can stabilize the irrigation platform on slopes up to 10 degrees, with a response time of 1.8 seconds. Experimental results across three simulated agricultural environments (standard greenhouse, hilly terrain, complex lighting) demonstrate a 30-50% reduction in water consumption compared to conventional flood irrigation, with water use efficiency exceeding 92% in all test cases.
End-to-End Design and Validation of a Low-Cost Stewart Platform with Nonlinear Estimation and Control
This paper presents the complete design, control, and experimental validation of a low-cost Stewart platform prototype developed as an affordable yet capable robotic testbed for research and education. The platform combines off-the-shelf components with 3D-printed and custom-fabricated parts to deliver full six-degree-of-freedom motion using six linear actuators connecting a moving platform to a fixed base. The system software integrates dynamic modeling, data acquisition, and real-time control within a unified framework. A robust trajectory tracking controller based on feedback linearization, augmented with an LQR scheme, compensates for the platform's nonlinear dynamics to achieve precise motion control. In parallel, an Extended Kalman Filter fuses IMU and actuator encoder feedback to provide accurate and reliable state estimation under sensor noise and external disturbances. Unlike prior efforts that emphasize only isolated aspects such as modeling or control, this work delivers a complete hardware-software platform validated through both simulation and experiments on static and dynamic trajectories. Results demonstrate effective trajectory tracking and real-time state estimation, highlighting the platform's potential as a cost-effective and versatile tool for advanced research and educational applications.
comment: 24 pages, journal
Never Too Rigid to Reach: Adaptive Virtual Model Control with LLM- and Lyapunov-Based Reinforcement Learning
Robotic arms are increasingly deployed in uncertain environments, yet conventional control pipelines often become rigid and brittle when exposed to perturbations or incomplete information. Virtual Model Control (VMC) enables compliant behaviors by embedding virtual forces and mapping them into joint torques, but its reliance on fixed parameters and limited coordination among virtual components constrains adaptability and may undermine stability as task objectives evolve. To address these limitations, we propose Adaptive VMC with Large Language Model (LLM)- and Lyapunov-Based Reinforcement Learning (RL), which preserves the physical interpretability of VMC while supporting stability-guaranteed online adaptation. The LLM provides structured priors and high-level reasoning that enhance coordination among virtual components, improve sample efficiency, and facilitate flexible adjustment to varying task requirements. Complementarily, Lyapunov-based RL enforces theoretical stability constraints, ensuring safe and reliable adaptation under uncertainty. Extensive simulations on a 7-DoF Panda arm demonstrate that our approach effectively balances competing objectives in dynamic tasks, achieving superior performance while highlighting the synergistic benefits of LLM guidance and Lyapunov-constrained adaptation.
Secure Control of Connected and Autonomous Electrified Vehicles Under Adversarial Cyber-Attacks
Connected and Autonomous Electrified Vehicles (CAEVs) are a solution for future smart mobility, offering more efficient traffic flow and a cleaner environmental footprint. Despite these advantages, CAEVs remain susceptible to adversarial cyber-attacks because of their autonomous electric operation and their connectivity. To alleviate this issue, we propose a secure control architecture for CAEVs. In particular, we design an additional control input using Reinforcement Learning (RL) to be applied to the vehicle powertrain along with the input commanded by the battery. We present simulation case studies to demonstrate the potential of the proposed approach in keeping the CAEV platoon operating safely without collisions by curbing the effect of adversarial attacks.
Dynamical Modeling of Temperature and Smoke Evolution in a Thermal-Runaway Event of a Large-Format Lithium-ion Battery in a Mine Tunnel
Large-format lithium-ion batteries (LIBs) provide effective energy storage solutions for high-power equipment used in underground mining operations. They have high Coulombic efficiency and minimal heat and emission footprints. However, improper use of LIBs, accidents, or other factors may increase the probability of thermal runaway (TR), a rapid combustion reaction that discharges toxic and flammable substances. Several such incidents have been documented in mines. Since repeatable TR experiments to uncover the transient-state propagation of TR are expensive and hazardous, high-fidelity models are usually developed to mimic the impact of these events. Such models are resource-intensive and impractical to develop for the many scenarios that could be observed in a mine. Therefore, dynamic models within a reduced-order framework were constructed to represent the transient-state combustion event. The reduced-order models (ROMs) reasonably replicate trends in temperature and smoke, showing strong alignment with the ground-truth dataset.
Modeling and Scheduling of Fusion Patterns in Autonomous Driving Systems (Extended Version)
In Autonomous Driving Systems (ADS), Directed Acyclic Graphs (DAGs) are widely used to model complex data dependencies and inter-task communication. However, existing DAG scheduling approaches oversimplify data fusion tasks by assuming fixed triggering mechanisms, failing to capture the diverse fusion patterns found in real-world ADS software stacks. In this paper, we propose a systematic framework for analyzing various fusion patterns and their performance implications in ADS. Our framework models three distinct fusion task types: timer-triggered, wait-for-all, and immediate fusion, which comprehensively represent real-world fusion behaviors. Our Integer Linear Programming (ILP)-based approach enables an optimization of multiple real-time performance metrics, including reaction time, time disparity, age of information, and response time, while generating deterministic offline schedules directly applicable to real platforms. Evaluation using real-world ADS case studies, Raspberry Pi implementation, and randomly generated DAGs demonstrates that our framework handles diverse fusion patterns beyond the scope of existing work, and achieves substantial performance improvements in comparable scenarios.
Carbon-Aware Optimal Power Flow with Data-Driven Carbon Emission Tracing
Quantifying locational carbon emissions in power grids is crucial for implementing effective carbon reduction strategies for customers relying on electricity. This paper presents a carbon-aware optimal power flow (OPF) framework that incorporates data-driven carbon tracing, enabling rapid estimation of nodal carbon emissions from electric loads. By developing generator-to-load carbon emission distribution factors through a data-driven technique, analytical formulas for both average and marginal carbon emissions can be derived and integrated seamlessly into DC OPF models as linear constraints. The proposed carbon-aware OPF model enables market operators to optimize energy dispatch while reducing greenhouse gas emissions. Simulations on IEEE test systems confirm the accuracy and computational efficiency of the proposed approach, highlighting its applicability for real-time carbon-aware system operations.
A Spatio-Temporal Graph Learning Approach to Real-Time Economic Dispatch with Multi-Transmission-Node DER Aggregation
The integration of distributed energy resources (DERs) into wholesale electricity markets, as mandated by FERC Order 2222, imposes new challenges on system operations. To remain consistent with existing market structures, regional transmission organizations (RTOs) have advanced the aggregation of transmission-node-level DERs (T-DERs), where a nodal virtual power plant (VPP) represents the mapping of all distribution-level DERs to their respective transmission nodes. This paper develops a real-time economic dispatch (RTED) framework that enables multi-transmission-node DER aggregation while addressing computational efficiency. To this end, we introduce a spatio-temporal graph convolutional network (ST-GCN) for adaptive prediction of distribution factors (DFs), thereby capturing the dynamic influence of individual T-DERs across the transmission system. Furthermore, an iterative constraint identification strategy is incorporated to alleviate transmission security constraints without compromising system reliability. Together, these innovations accelerate the market clearing process and support the effective participation of T-DER aggregators under current market paradigms. The proposed approach is validated on large-scale test systems, including modified 118-, 2383-, and 3012-bus networks under a rolling RTED setting with real demand data. Numerical results demonstrate significant improvements in reducing operational costs and maintaining transmission network feasibility, underscoring the scalability and practicality of the proposed framework.
Neural Two-Stage Stochastic Volt-VAR Optimization for Three-Phase Unbalanced Distribution Systems with Network Reconfiguration
The increasing integration of intermittent distributed energy resources (DERs) has introduced significant variability in distribution networks, posing challenges to voltage regulation and reactive power management. This paper presents a novel neural two-stage stochastic Volt-VAR optimization (2S-VVO) method for three-phase unbalanced distribution systems considering network reconfiguration under uncertainty. To address the computational intractability associated with solving large-scale scenario-based 2S-VVO problems, a learning-based acceleration strategy is introduced, wherein the second-stage recourse model is approximated by a neural network. This neural approximation is embedded into the optimization model as a mixed-integer linear program (MILP), enabling effective enforcement of operational constraints related to the first-stage decisions. Numerical simulations on a 123-bus unbalanced distribution system demonstrate that the proposed approach achieves over 50 times speedup compared to conventional solvers and decomposition methods, while maintaining a typical optimality gap below 0.30%. These results underscore the method's efficacy and scalability in addressing large-scale stochastic VVO problems under practical operating conditions.
MDP-based Energy-aware Task Scheduling for Battery-less IoT
Realizing high long-term task completion rates represents a fundamental challenge in battery-less Internet of Things (IoT) devices powered by ambient energy harvesting. This difficulty is primarily due to the stochastic and time-varying characteristics of the available energy, which significantly complicate the design of optimal task scheduling policies. In this paper, we consider a battery-less IoT device that must periodically report sensing measurements to a monitoring center. We adopt the Markov decision process (MDP) framework to handle energy variability while aiming to maximize the long-term task completion rate. For this, we first identify its components and then define two appropriate reward functions. We demonstrate the inherent properties associated with the MDP formulation and the related optimal policy. Subsequently, we solve the resulting optimization problem, leading to the optimal stationary threshold-based (OSTB) scheduling. Simulation results demonstrate that OSTB outperforms the well-known "as late as possible" (ALAP) scheduling strategy. For instance, an $8.6\%$ increase in the task completion rate, along with a $65\%$ reduction in power failures and an $86.29\%$ decrease in execution delays during task execution, is observed with a $4.7$ mF capacitor.
comment: 13 pages, 11 figures
A Simultaneous ECG-PCG Acquisition System with Real-Time Burst-Adaptive Noise Cancellation ISCA
Cardiac auscultation is an essential clinical skill, requiring excellent hearing to distinguish subtle differences in timing and pitch of heart sounds. However, diagnosing solely from these sounds is often challenging due to interference from surrounding noise, and the information may be limited. Existing solutions that adaptively cancel external noise are either not real-time or are computationally intensive, making them unsuitable for implementation in a portable system. This work proposes an end-to-end system with a real-time adaptive noise cancellation pipeline integrated into a device that simultaneously acquires electrocardiogram (ECG) and phonocardiogram (PCG) signals. The performance of the system is validated using real-world hospital noise datasets and recordings captured with the dual-modality device. For PCG and ECG signals recorded from the device in noisy hospital settings, the proposed algorithms achieved signal-to-noise ratio improvements of 37.01 dB and 30.32 dB, respectively. These results demonstrate the system's effectiveness in enabling reliable and accessible cardiac screening, including noisy hospital environments typical of resource-constrained settings.
comment: Paper submitted to IEEE International Symposium on Circuits and Systems (ISCAS) 2026
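As background for the adaptive noise cancellation mentioned in the abstract above, the sketch below shows a textbook least-mean-squares (LMS) canceller with a separate noise reference. It is a generic stand-in, not the paper's burst-adaptive algorithm; the function names, tap count, step size, and synthetic signals are all assumptions chosen for illustration.

```python
import math
import random


def lms_cancel(primary, reference, taps=4, mu=0.01):
    """Textbook LMS canceller: subtract an adaptive FIR estimate of the noise.

    primary   -- desired signal plus filtered noise (e.g. a chest microphone)
    reference -- noise-only pickup (e.g. an ambient microphone)
    Returns the error sequence, which is the cleaned-signal estimate.
    """
    w = [0.0] * taps      # adaptive filter weights
    buf = [0.0] * taps    # most recent reference samples, newest first
    cleaned = []
    for d, r in zip(primary, reference):
        buf = [r] + buf[:-1]
        noise_est = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - noise_est
        # Stochastic-gradient weight update.
        w = [wi + 2.0 * mu * e * bi for wi, bi in zip(w, buf)]
        cleaned.append(e)
    return cleaned


def demo(n=5000, seed=0):
    """Synthetic check: returns (noise power before, residual power after)."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    signal = [0.5 * math.sin(2.0 * math.pi * 0.01 * k) for k in range(n)]
    # The noise reaches the primary sensor through a short, unknown FIR path.
    primary = [signal[k] + 0.8 * noise[k] + (0.3 * noise[k - 1] if k else 0.0)
               for k in range(n)]
    cleaned = lms_cancel(primary, noise)
    tail = range(n - 1000, n)  # evaluate after the weights have settled
    before = sum((primary[k] - signal[k]) ** 2 for k in tail) / 1000.0
    after = sum((cleaned[k] - signal[k]) ** 2 for k in tail) / 1000.0
    return before, after
```

Each sample costs a number of multiply-adds proportional to the tap count, which is why this family of filters is a common starting point for portable, real-time hardware.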
Maximal Load Shedding Verification for Neural Network Models of AC Line Switching
Solving for globally optimal line switching decisions in AC transmission grids can be intractably slow. Machine learning (ML) models, meanwhile, can be trained to predict near-optimal decisions in a fraction of the time. Verifying the performance and impact of these ML models on network operation, however, is a critically important step prior to their actual deployment. In this paper, we train a Neural Network (NN) to solve the optimal power shutoff line switching problem. To assess the worst-case load shedding induced by this model, we propose a bilevel attacker-defender verification approach that finds the NN line switching decisions that cause the highest quantity of network load shedding. Solving this problem to global optimality is challenging (due to AC power flow and NN nonconvexities), so our approach exploits a convex relaxation of the AC physics, combined with a local NN search, to find a guaranteed lower bound on worst-case load shedding. These under-approximation bounds are solved via MathOptAI.jl. We benchmark against a random sampling approach, and we find that our optimization-based approach always finds larger load shedding. Test results are collected on multiple PGLib test cases and on trained NN models which contain more than 10 million model parameters.
Switching Network System Identification via Convex Optimizations
This paper introduces a convex optimization framework for identifying switched network systems, in which both the node dynamics and the underlying graph topology switch between a finite number of configurations. Building on our recent convex identification method for general switching systems, we extend the formulation to structured network systems where each mode corresponds to a distinct adjacency matrix. We show that both the continuous node dynamics and binary network topologies can be identified from sampled state-velocity data by solving a sequence of convex programs. The proposed framework provides a unified and scalable way to recover piecewise network structures from data without prior knowledge of mode labels at each state. Numerical results on diffusively coupled oscillators demonstrate accurate recovery of both mode dynamics and switching graphs.
comment: 6 pages, 5 figures
Data-Driven Soft Robot Control via Adiabatic Spectral Submanifolds
The mechanical complexity of soft robots creates significant challenges for their model-based control. Specifically, linear data-driven models have struggled to control soft robots on complex, spatially extended paths that explore regions with significant nonlinear behavior. To account for these nonlinearities, we develop here a model-predictive control strategy based on the recent theory of adiabatic spectral submanifolds (aSSMs). This theory is applicable because the internal vibrations of heavily overdamped robots decay at a speed that is much faster than the desired speed of the robot along its intended path. In that case, low-dimensional attracting invariant manifolds (aSSMs) emanate from the path and carry the dominant dynamics of the robot. Aided by this recent theory, we devise an aSSM-based model-predictive control scheme purely from data. We demonstrate our data-driven model's effectiveness in tracking dynamic trajectories across diverse tasks, validated on a high-fidelity, high-dimensional finite-element model of a soft trunk robot and a Cosserat rod-based elastic soft arm. Notably, we find that five- or six-dimensional aSSM-reduced models outperform the tracking performance of other data-driven modeling methods by a factor up to $10$ across all closed-loop control tasks.
comment: 41 pages, 24 figures
iWalker: Imperative Visual Planning for Walking Humanoid Robot
Humanoid robots, designed to operate in human-centric environments, serve as a fundamental platform for a broad range of tasks. Although humanoid robots have been extensively studied for decades, a majority of existing humanoid robots still heavily rely on complex modular frameworks, leading to inflexibility and potential compounded errors from independent sensing, planning, and acting components. In response, we propose an end-to-end humanoid sense-plan-act walking system, enabling vision-based obstacle avoidance and footstep planning for whole body balancing simultaneously. We designed two imperative learning (IL)-based bilevel optimizations for model-predictive step planning and whole body balancing, respectively, to achieve self-supervised learning for humanoid robot walking. This enables the robot to learn from arbitrary unlabeled data, improving its adaptability and generalization capabilities. We refer to our method as iWalker and demonstrate its effectiveness in both simulated and real-world environments, representing a significant advancement toward autonomous humanoid robots.
ExAMPC: the Data-Driven Explainable and Approximate NMPC with Physical Insights IROS
Amidst the surge in the use of Artificial Intelligence (AI) for control purposes, classical and model-based control methods maintain their popularity due to their transparency and deterministic nature. However, advanced controllers like Nonlinear Model Predictive Control (NMPC), despite proven capabilities, face adoption challenges due to their computational complexity and unpredictable closed-loop performance in complex validation systems. This paper introduces ExAMPC, a methodology bridging classical control and explainable AI by augmenting the NMPC with data-driven insights to improve the trustworthiness and reveal the optimization solution and closed-loop performance's sensitivities to physical variables and system parameters. By employing a low-order spline embedding, we reduce the open-loop trajectory dimensionality by over 95%, and integrate it with SHAP and Symbolic Regression from eXplainable AI (XAI) for an approximate NMPC, enabling intuitive physical insights into the NMPC's optimization routine. The prediction accuracy of the approximate NMPC is enhanced through physics-inspired continuous-time constraints penalties, reducing the predicted continuous trajectory violations by 93%. ExAMPC also enables accurate forecasting of the NMPC's computational requirements with explainable insights on worst-case scenarios. Experimental validation on automated valet parking and autonomous racing with lap-time optimization, demonstrates the methodology's practical effectiveness for potential real-world applications.
comment: This paper has been accepted for publication in the 2025 IEEE/RSJ IROS Conference
Formally Verified Neural Network Controllers for Incremental Input-to-State Stability of Unknown Discrete-Time Systems
This work aims to synthesize a controller that ensures that an unknown discrete-time system is incrementally input-to-state stable ($\delta$-ISS). We introduce the notion of $\delta$-ISS control Lyapunov function ($\delta$-ISS-CLF), which, in conjunction with the controller, ensures that the closed-loop system is incrementally ISS. To address the unknown dynamics of the system, we parameterize the controller as well as the $\delta$-ISS-CLF as neural networks and learn them by utilizing the sampled data from the state space of the unknown system. To formally verify the obtained $\delta$-ISS-CLF, we develop a validity condition and incorporate the condition into the training framework to ensure a provable correctness guarantee at the end of the training process. Finally, the usefulness of the proposed approach is demonstrated through multiple case studies: a scalar system with a non-affine non-polynomial structure, a one-link manipulator system, a nonlinear Moore-Greitzer model of a jet engine, and a rotating rigid spacecraft model.
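For readers unfamiliar with the term, the property targeted in the abstract above is usually stated as follows. This is the standard textbook form of incremental ISS for a discrete-time system $x(k+1) = f(x(k), u(k))$; the notation ($\xi_1, \xi_2$ for initial states, $\beta$, $\gamma$ for comparison functions) is an assumption and may differ from the paper's.

```latex
% delta-ISS: trajectories from any two initial states under any two input
% sequences converge to each other, up to a gain on the input mismatch.
\[
  \bigl| x(k;\xi_1,u_1) - x(k;\xi_2,u_2) \bigr|
  \;\le\; \beta\bigl(|\xi_1 - \xi_2|,\, k\bigr)
  \;+\; \gamma\bigl(\|u_1 - u_2\|_\infty\bigr)
  \qquad \text{for all } k \ge 0,
\]
\[
  \text{with } \beta \in \mathcal{KL} \text{ and } \gamma \in \mathcal{K}_\infty .
\]
```

Setting $u_1 = u_2$ recovers incremental asymptotic stability: the effect of differing initial conditions vanishes as $k$ grows.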
A Single Motor Nano Aerial Vehicle with Novel Peer-to-Peer Communication and Sensing Mechanism
Communication and position sensing are among the most important capabilities for swarm robots to interact with their peers and perform tasks collaboratively. However, the hardware required to facilitate communication and position sensing is often too complicated, expensive, and bulky to be carried on swarm robots. Here we present Maneuverable Piccolissimo 3 (MP3), a minimalist, single motor drone capable of executing inter-robot communication via infrared light and triangulation-based sensing of relative bearing, distance, and elevation using message arrival time. Thanks to its novel design, MP3 can communicate with peers and localize itself using simple components, keeping its size and mass small and making it inherently safe for human interaction. We present the hardware and software design of MP3 and demonstrate its capability to localize itself, fly stably, and maneuver in the environment using peer-to-peer communication and sensing.
Opinion Dynamics on Signed Graphs and Graphons
In this paper, we make use of graphon theory to study opinion dynamics on large undirected networks. The opinion dynamics models that we take into consideration allow for negative interactions between the individuals, whose opinions can thus grow apart. We consider both the repelling and the opposing models of negative interactions, which have been studied in the literature. We define the repelling and the opposing dynamics on signed graphons and we show that their initial value problem solutions exist and are unique. We then show that, in a suitable sense, the graphon dynamics is a good approximation of the dynamics on large graphs that converge to a graphon. This result applies to large random graphs that are sampled according to a graphon (W-random graphs), for which we provide a new convergence result under very general assumptions.
FlightKooba: A Fast Interpretable FTP Model
Flight trajectory prediction (FTP) and similar time series tasks typically require capturing smooth latent dynamics hidden within noisy signals. However, existing deep learning models face significant challenges of high computational cost and insufficient interpretability due to their complex black-box nature. This paper introduces FlightKooba, a novel modeling approach designed to extract such underlying dynamics analytically. Our framework uniquely integrates HiPPO theory, Koopman operator theory, and control theory. By leveraging Legendre polynomial bases, it constructs Koopman operators analytically, thereby avoiding large-scale parameter training. The method's core strengths lie in its exceptional computational efficiency and inherent interpretability. Experiments on multiple public datasets validate our design philosophy: for signals exhibiting strong periodicity or clear physical laws (e.g., in aviation, meteorology, and traffic flow), FlightKooba delivers competitive prediction accuracy while reducing trainable parameters by several orders of magnitude and achieving the fastest training speed. Furthermore, we analyze the model's theoretical boundaries, clarifying its inherent low-pass filtering characteristics that render it unsuitable for sequences dominated by high-frequency noise. In summary, FlightKooba offers a powerful, efficient, and interpretable new alternative for time series analysis, particularly in resource-constrained environments.
comment: Version 2: Major revision of the manuscript to refine the narrative, clarify the model's theoretical limitations and application scope, and improve overall presentation for journal submission
A Linear Parameter-Varying Approach to Data Predictive Control
By means of the linear parameter-varying (LPV) Fundamental Lemma, we derive novel data-driven predictive control (DPC) methods for LPV systems. In particular, we present output-feedback and state-feedback-based LPV-DPC methods with terminal ingredients, which guarantee exponential stability and recursive feasibility. We provide methods for the data-based computation of these terminal ingredients. Furthermore, an in-depth analysis of the application and implementation aspects of the LPV-DPC schemes is given, including application for nonlinear systems and handling noisy data. We compare and demonstrate the performance of the proposed methods in a detailed simulation example involving a nonlinear unbalanced disc system.
comment: To appear in IEEE Transactions on Automatic Control. Final Author Copy. Extended version (Section VI.C, Appendix C & D not in original version). 18 pages
Composite Learning Adaptive Control under Non-Persistent Partial Excitation
This paper focuses on relaxing the excitation conditions for the adaptive control of uncertain nonlinear systems. By adopting the spectral decomposition technique, a linear regression equation (LRE) is constructed to quantitatively collect historical excitation information, based on which the parameter estimation error is decomposed into the excited component and the unexcited component. By sufficiently utilizing the collected excitation information, the composite learning and $\mu$-modification terms are designed and incorporated into the "Lyapunov-based" parameter update law. By developing a novel Lyapunov function, it is demonstrated that under non-persistent partial excitation, the control error and the excited parameter estimation error component converge to zero, while the unexcited component remains bounded. Furthermore, the proposed adaptive control scheme can effectively eliminate the effects of parametric uncertainties and enhance the robustness of the closed-loop systems. Simulation results are provided to verify the theoretical findings.
comment: 16 pages, 15 figures
A Hybrid GNN-IZR Framework for Fast and Empirically Robust AC Power Flow Analysis in Radial Distribution Systems
The Alternating Current Power Flow (ACPF) problem forces a trade-off between the speed of data-driven models and the reliability of analytical solvers. This paper introduces a hybrid framework that synergizes a Graph Neural Network (GNN) with the Implicit Z-Bus Recursive (IZR) method, a robust, non-iterative solver for radial distribution networks. The framework employs a physics-informed GNN for rapid initial predictions and invokes the IZR solver as a failsafe for stressed cases identified by a two-stage trigger. A failure is defined as any solution with a maximum power mismatch exceeding 0.1 p.u., a significant operational deviation. On a challenging test set of 7,500 stressed scenarios for the IEEE 33-bus system, the GNN-only model failed on 13.11 % of cases. In contrast, the hybrid framework identified all potential failures, delegating them to the IZR solver to achieve a 0.00 % failure rate, empirically matching the 100 % success rate of the analytical solver on this specific test set. An expanded ablation study confirms that both physics-informed training and Z-bus sensitivity features are critical, collaboratively reducing the GNN's failure rate from 98.72 % (data-only) to 13.11 %. The hybrid approach demonstrates a pragmatic path to achieving the empirical reliability of an analytical solver while leveraging GNN speed, enabling a significant increase in the number of scenarios analyzable in near real-time.
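The fast-path-with-failsafe idea in the abstract above can be illustrated with a minimal Python sketch. The abstract does not describe the two-stage trigger, so this sketch substitutes a simple check of the maximum power mismatch against the quoted 0.1 p.u. failure threshold. The names `solve_with_fallback`, `fast_predict`, `max_mismatch`, and `exact_solve` are hypothetical placeholders, not the authors' API.

```python
from typing import Any, Callable, Tuple

# Failure threshold quoted in the abstract (maximum power mismatch, in p.u.).
MISMATCH_LIMIT_PU = 0.1


def solve_with_fallback(
    case: Any,
    fast_predict: Callable[[Any], Any],         # hypothetical: learned predictor
    max_mismatch: Callable[[Any, Any], float],  # hypothetical: mismatch in p.u.
    exact_solve: Callable[[Any], Any],          # hypothetical: analytical solver
    limit: float = MISMATCH_LIMIT_PU,
) -> Tuple[Any, str]:
    """Try the fast predictor first; fall back to the exact solver if needed.

    Assumption: the trigger here is a plain post-hoc mismatch check, which
    stands in for the paper's two-stage trigger.
    """
    solution = fast_predict(case)
    if max_mismatch(case, solution) > limit:
        # Fast prediction violates the threshold: delegate to the exact solver.
        return exact_solve(case), "fallback"
    return solution, "fast"
```

The second return value records which path produced the answer, which makes it easy to count how often the failsafe is invoked on a test set.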
CIVIL: Causal and Intuitive Visual Imitation Learning
Today's robots attempt to learn new tasks by imitating human examples. These robots watch the human complete the task, and then try to match the actions taken by the human expert. However, this standard approach to visual imitation learning is fundamentally limited: the robot observes what the human does, but not why the human chooses those behaviors. Without understanding which features of the system or environment factor into the human's decisions, robot learners often misinterpret the human's examples. In practice, this results in causal confusion, inefficient learning, and robot policies that fail when the environment changes. We therefore propose a shift in perspective: instead of asking human teachers just to show what actions the robot should take, we also enable humans to intuitively indicate why they made those decisions. Under our paradigm human teachers attach markers to task-relevant objects and use natural language prompts to describe their state representation. Our proposed algorithm, CIVIL, leverages this augmented demonstration data to filter the robot's visual observations and extract a feature representation that aligns with the human teacher. CIVIL then applies these causal features to train a transformer-based policy that -- when tested on the robot -- is able to emulate human behaviors without being confused by visual distractors or irrelevant items. Our simulations and real-world experiments demonstrate that robots trained with CIVIL learn both what actions to take and why to take those actions, resulting in better performance than state-of-the-art baselines. From the human's perspective, our user study reveals that this new training paradigm actually reduces the total time required for the robot to learn the task, and also improves the robot's performance in previously unseen scenarios. See videos at our project website: https://civil2025.github.io
Learning Robust Satellite Attitude Dynamics with Physics-Informed Normalising Flow
Attitude control is a fundamental aspect of spacecraft operations. Model Predictive Control (MPC) has emerged as a powerful strategy for these tasks, relying on accurate models of the system dynamics to optimize control actions over a prediction horizon. In scenarios where physics models are incomplete, difficult to derive, or computationally expensive, machine learning offers a flexible alternative by learning the system behavior directly from data. However, purely data-driven models often struggle with generalization and stability, especially when applied to inputs outside their training domain. To address these limitations, we investigate the benefits of incorporating Physics-Informed Neural Networks (PINNs) into the learning of spacecraft attitude dynamics, comparing their performance with that of purely data-driven approaches. Using a Real-valued Non-Volume Preserving (Real NVP) neural network architecture with a self-attention mechanism, we trained several models on simulated data generated with the Basilisk simulator. Two training strategies were considered: a purely data-driven baseline and a physics-informed variant to improve robustness and stability. Our results demonstrate that including physics-based information significantly enhances performance, improving the mean relative error of the best architectures found by 27.08%. These advantages are particularly evident when the learned models are integrated into an MPC framework, where PINN-based models consistently outperform their purely data-driven counterparts in control accuracy and robustness. They also achieve better settling times than traditional MPC approaches, with improvements of up to 62% when subject to observation noise and reaction wheel (RW) friction.
A Single Motor Nano Aerial Vehicle with Novel Peer-to-Peer Communication and Sensing Mechanism
Communication and position sensing are among the most important capabilities for swarm robots to interact with their peers and perform tasks collaboratively. However, the hardware required to facilitate communication and position sensing is often too complicated, expensive, and bulky to be carried on swarm robots. Here we present Maneuverable Piccolissimo 3 (MP3), a minimalist, single motor drone capable of executing inter-robot communication via infrared light and triangulation-based sensing of relative bearing, distance, and elevation using message arrival time. Thanks to its novel design, MP3 can communicate with peers and localize itself using simple components, keeping its size and mass small and making it inherently safe for human interaction. We present the hardware and software design of MP3 and demonstrate its capability to localize itself, fly stably, and maneuver in the environment using peer-to-peer communication and sensing.
comment: Proceedings of Robotics: Science and Systems (RSS), 2024
Design and Optimization of EV Charging Infrastructure with Battery in Commercial Buildings
The installation of electric vehicle (EV) charging stations in buildings is inevitable, as states push for increased EV adoption to support decarbonization efforts. This transition could force the need for grid infrastructure upgrades and enhanced controls to support reliable power delivery to end-use loads, and overall economic operation. This paper evaluates strategies that address these needs on two fronts: i) optimal sizing of service transformers and battery energy storage systems (BESS), and ii) optimized coordination between EV charging, BESS operation, and building demand. These strategies are applied to a school campus setting, consisting of building and EV charging loads, to provide an illustration of energy management in commercial buildings with EV fleets. A rolling-window optimization approach is applied to determine i) optimal sizing of the service transformer and BESS and ii) optimal control of EV charging and BESS charge/discharge schedules. The design and control strategies are validated in a 20-year time horizon with an annually increasing number of EVs (buses and vans). In addition, an economic analysis is also carried out to show the costs and benefits of each design as a medium- and long-term investment.
comment: This paper contains references to several concepts that are not aligned with current DOE guidance (decarbonization, state mandates, green energy, etc.)
Dynamic Dimensioning of Frequency Containment Reserves: The Case of the Nordic Grid
One of the main responsibilities of a Transmission System Operator (TSO) operating an electric grid is to maintain a designated frequency (e.g., 50 Hz in Europe). To achieve this, TSOs have created several products called frequency-supporting ancillary services. The Frequency Containment Reserve (FCR) is one of these ancillary service products. This article focuses on the TSO problem of determining the volume procured for FCR. Specifically, we investigate the potential benefits and impact on grid security when transitioning from a traditionally static procurement method to a dynamic strategy for FCR volume. We take the Nordic synchronous area in Europe as a case study and use a diffusion model to capture its frequency development. We introduce a controlled mean reversal parameter to assess changes in FCR obligations, in particular for the Nordic FCR-N ancillary service product. We establish closed-form expressions for exceedance probabilities and use historical frequency data as input to calibrate the model. We show that a dynamic dimensioning approach for FCR has the potential to significantly reduce the exceedance probabilities (up to 37%) while maintaining the total yearly procured FCR volume equal to that of the current static approach. Alternatively, a dynamic dimensioning approach could significantly increase security at limited extra cost.
comment: 12 pages, 12 figures, Accepted for publication at IEEE Transactions on Power Systems
The Limits of Fairness of the Variational Generalized Nash Equilibrium
Generalized Nash equilibrium (GNE) problems are commonly used to model strategic interactions between self-interested agents who are coupled in cost and constraints. Specifically, the variational GNE (v-GNE), a refinement of the GNE, is often selected as the solution concept due to its non-discriminatory treatment of agents by charging a uniform "shadow price" for shared resources. We study the fairness concept of v-GNEs from a comparability perspective and show that it makes an implicit assumption of unit comparability of agents' cost functions, one of the strongest comparability notions. Further, we introduce a new solution concept, the f-GNE, in which a fairness metric compatible with the comparability at hand is chosen a priori. We introduce an electric vehicle charging game to demonstrate the fragility of v-GNE fairness and compare it to the f-GNE under various fairness metrics.
Carbon-Aware Computing for Data Centers with Probabilistic Performance Guarantees
Data centers are significant contributors to carbon emissions and can strain power systems due to their high electricity consumption. To mitigate this impact and to participate in demand response programs, cloud computing companies strive to balance and optimize operations across their global fleets by making strategic decisions about when and where to place compute jobs for execution. In this paper, we introduce a load shaping scheme which reacts to time-varying grid signals by leveraging both temporal and spatial flexibility of compute jobs to provide risk-aware management guidelines and job placement with provable performance guarantees based on distributionally robust optimization. Our approach divides the problem into two key components: (i) day-ahead planning, which generates an optimal scheduling strategy based on historical load data, and (ii) real-time job placement and (time) scheduling, which dynamically tracks the optimal strategy generated in (i). We validate our method in simulation using normalized load profiles from randomly selected Google clusters, incorporating time-varying grid signals. We demonstrate significant reductions in carbon cost and peak power with our approach compared to myopic greedy policies, while maintaining computational efficiency and abiding by system and grid constraints.
Multiagent Systems
Policies over Poses: Reinforcement Learning based Distributed Pose-Graph Optimization for Multi-Robot SLAM
We consider the distributed pose-graph optimization (PGO) problem, which is fundamental in accurate trajectory estimation in multi-robot simultaneous localization and mapping (SLAM). Conventional iterative approaches linearize a highly non-convex optimization objective, requiring repeated solving of normal equations, which often converge to local minima and thus produce suboptimal estimates. We propose a scalable, outlier-robust distributed planar PGO framework using Multi-Agent Reinforcement Learning (MARL). We cast distributed PGO as a partially observable Markov game defined on local pose-graphs, where each action refines a single edge's pose estimate. A graph partitioner decomposes the global pose graph, and each robot runs a recurrent edge-conditioned Graph Neural Network (GNN) encoder with adaptive edge-gating to denoise noisy edges. Robots sequentially refine poses through a hybrid policy that utilizes prior action memory and graph embeddings. After local graph correction, a consensus scheme reconciles inter-robot disagreements to produce a globally consistent estimate. Our extensive evaluations on a comprehensive suite of synthetic and real-world datasets demonstrate that our learned MARL-based actors reduce the global objective by an average of 37.5% more than the state-of-the-art distributed PGO framework, while enhancing inference efficiency by at least 6X. We also demonstrate that actor replication allows a single learned policy to scale effortlessly to substantially larger robot teams without any retraining. Code is publicly available at https://github.com/herolab-uga/policies-over-poses.
comment: IEEE International Symposium on Multi-Robot & Multi-Agent Systems (MRS) 2025
ATLAS: Actor-Critic Task-Completion with Look-ahead Action Simulation NeurIPS 2025
We observe that current state-of-the-art web agents are unable to adapt effectively to new environments without neural network fine-tuning; without it, they produce inefficient execution plans because they lack awareness of the structure and dynamics of the new environment. To address this limitation, we introduce ATLAS (Actor-Critic Task-completion with Look-ahead Action Simulation), a memory-augmented agent that makes plans grounded in a model of the environment by simulating the consequences of candidate actions in cognitive space. Our agent starts by building a "cognitive map" through lightweight, curiosity-driven exploration of the environment. The planner proposes candidate actions; the simulator predicts their consequences in cognitive space; a critic analyzes the options to select the best roll-out and update the original plan; and a browser executor performs the chosen action. On the WebArena-Lite benchmark, we achieve a 63% success rate compared to a 53.9% success rate for the previously published state-of-the-art. Unlike previous systems, our modular architecture requires no website-specific LLM fine-tuning. Ablations show sizable drops without the world model, hierarchical planner, and look-ahead-based replanner, confirming their complementary roles within the design of our system.
comment: 9 pages, NeurIPS 2025 Workshop on Language Agents and World Models
TABL-ABM: A Hybrid Framework for Synthetic LOB Generation ECAI2025
The recent application of deep learning models to financial trading has heightened the need for high-fidelity financial time series data. Such synthetic data can be used to supplement historical data to train large trading models. The state-of-the-art models for this generative application often rely on huge amounts of historical data and large, complicated models. These models range from autoregressive and diffusion-based models through to architecturally simpler models such as the temporal-attention bilinear layer. Agent-based approaches to modelling limit order book dynamics can also recreate trading activity through mechanistic models of trader behaviours. In this work, we demonstrate how a popular agent-based framework for simulating intraday trading activity, the Chiarella model, can be combined with one of the most performant deep learning models for forecasting multivariate time series, the TABL model. This forecasting model is coupled to a simulation of a matching engine with a novel method for simulating deleted order flow. Our simulator gives us the ability to test the generative abilities of the forecasting model using stylised facts. Our results show that this methodology generates realistic price dynamics. On closer analysis, however, parts of the market's microstructure are not accurately recreated, highlighting the need to include more sophisticated agent behaviours in the modelling framework to help account for tail events.
comment: 8 pages, 5 figures, accepted to the Workshop on AI in Finance at ECAI2025
UCB-type Algorithm for Budget-Constrained Expert Learning
In many modern applications, a system must dynamically choose between several adaptive learning algorithms that are trained online. Examples include model selection in streaming environments, switching between trading strategies in finance, and orchestrating multiple contextual bandit or reinforcement learning agents. At each round, a learner must select one predictor among $K$ adaptive experts to make a prediction, while being able to update at most $M \le K$ of them under a fixed training budget. We address this problem in the stochastic setting and introduce M-LCB, a computationally efficient UCB-style meta-algorithm that provides anytime regret guarantees. Its confidence intervals are built directly from realized losses, require no additional optimization, and seamlessly reflect the convergence properties of the underlying experts. If each expert achieves internal regret $\tilde O(T^\alpha)$, then M-LCB ensures overall regret bounded by $\tilde O\bigl(\sqrt{KT/M} + (K/M)^{1-\alpha}\,T^\alpha\bigr)$. To our knowledge, this is the first result establishing regret guarantees when multiple adaptive experts are trained simultaneously under per-round budget constraints. We illustrate the framework with two representative cases: (i) parametric models trained online with stochastic losses, and (ii) experts that are themselves multi-armed bandit algorithms. These examples highlight how M-LCB extends the classical bandit paradigm to the more realistic scenario of coordinating stateful, self-learning experts under limited resources.
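To get a feel for how the regret bound quoted above scales with the training budget $M$, the following Python sketch evaluates its two terms numerically. It drops the constants and logarithmic factors hidden by the $\tilde O$ notation, so the values indicate scaling only. The function name `m_lcb_regret_bound` is a hypothetical label and not part of the paper.

```python
import math


def m_lcb_regret_bound(K: int, T: int, M: int, alpha: float) -> float:
    """Evaluate sqrt(K*T/M) + (K/M)**(1 - alpha) * T**alpha.

    Assumption: constants and log factors hidden by the tilde-O notation
    are ignored, so the result shows scaling rather than a true bound.
    """
    if not 1 <= M <= K:
        raise ValueError("budget M must satisfy 1 <= M <= K")
    first_term = math.sqrt(K * T / M)
    second_term = (K / M) ** (1 - alpha) * T ** alpha
    return first_term + second_term


if __name__ == "__main__":
    # Larger budgets M shrink both terms for fixed K, T, and alpha.
    for M in (1, 2, 4, 8):
        print(M, round(m_lcb_regret_bound(K=8, T=10_000, M=M, alpha=0.5), 1))
```

With $M = K$, both factors involving $K/M$ equal one and the expression reduces to $\sqrt{T} + T^\alpha$.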
Curriculum-Based Iterative Self-Play for Scalable Multi-Drone Racing
The coordination of multiple autonomous agents in high-speed, competitive environments represents a significant engineering challenge. This paper presents CRUISE (Curriculum-Based Iterative Self-Play for Scalable Multi-Drone Racing), a reinforcement learning framework designed to solve this challenge in the demanding domain of multi-drone racing. CRUISE overcomes key scalability limitations by synergistically combining a progressive difficulty curriculum with an efficient self-play mechanism to foster robust competitive behaviors. Validated in high-fidelity simulation with realistic quadrotor dynamics, the resulting policies significantly outperform both a standard reinforcement learning baseline and a state-of-the-art game-theoretic planner. CRUISE achieves nearly double the planner's mean racing speed, maintains high success rates, and demonstrates robust scalability as agent density increases. Ablation studies confirm that the curriculum structure is the critical component for this performance leap. By providing a scalable and effective training methodology, CRUISE advances the development of autonomous systems for dynamic, competitive tasks and serves as a blueprint for future real-world deployment.
comment: 13 pages, 5 figures. This paper is currently under review at the journal Engineering Applications of Artificial Intelligence. Supplementary video: https://drive.google.com/file/d/1k7necen2DgIxaYT2alKK8-b20sE_AyDA/view Source code and models: https://doi.org/10.5281/zenodo.17256943
SPIRAL: Self-Play Incremental Racing Algorithm for Learning in Multi-Drone Competitions
This paper introduces SPIRAL (Self-Play Incremental Racing Algorithm for Learning), a novel approach for training autonomous drones in multi-agent racing competitions. SPIRAL distinctively employs a self-play mechanism to incrementally cultivate complex racing behaviors within a challenging, dynamic environment. Through this self-play core, drones continuously compete against increasingly proficient versions of themselves, naturally escalating the difficulty of competitive interactions. This progressive learning journey guides agents from mastering fundamental flight control to executing sophisticated cooperative multi-drone racing strategies. Our method is designed for versatility, allowing integration with any state-of-the-art Deep Reinforcement Learning (DRL) algorithms within its self-play framework. Simulations demonstrate the significant advantages of SPIRAL and benchmark the performance of various DRL algorithms operating within it. Consequently, we contribute a versatile, scalable, and self-improving learning framework to the field of autonomous drone racing. SPIRAL's capacity to autonomously generate appropriate and escalating challenges through its self-play dynamic offers a promising direction for developing robust and adaptive racing strategies in multi-agent environments. This research opens new avenues for enhancing the performance and reliability of autonomous racing drones in increasingly complex and competitive scenarios.
comment: © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Agent-GSPO: Communication-Efficient Multi-Agent Systems via Group Sequence Policy Optimization
To combat the prohibitive communication costs of "free-for-all" multi-agent systems (MAS), we introduce Agent-GSPO, a framework that directly optimizes for token economy using sequence-level reinforcement learning. Agent-GSPO leverages the stable and memory-efficient Group Sequence Policy Optimization (GSPO) algorithm to train agents on a communication-aware reward that explicitly penalizes verbosity. Across seven reasoning benchmarks, Agent-GSPO not only achieves new state-of-the-art performance but does so with a fraction of the token consumption of existing methods. By fostering emergent strategies like "strategic silence," our approach provides a practical blueprint for developing scalable and economically viable multi-agent systems.
MI9: An Integrated Runtime Governance Framework for Agentic AI
Agentic AI systems capable of reasoning, planning, and executing actions present fundamentally distinct governance challenges compared to traditional AI models. Unlike conventional AI, these systems exhibit emergent and unexpected behaviors during runtime, introducing novel agent-related risks that cannot be fully anticipated through pre-deployment governance alone. To address this critical gap, we introduce MI9, the first fully integrated runtime governance framework designed specifically for safety and alignment of agentic AI systems. MI9 introduces real-time controls through six integrated components: agency-risk index, agent-semantic telemetry capture, continuous authorization monitoring, Finite-State-Machine (FSM)-based conformance engines, goal-conditioned drift detection, and graduated containment strategies. Operating transparently across heterogeneous agent architectures, MI9 enables the systematic, safe, and responsible deployment of agentic systems in production environments where conventional governance approaches fall short, providing the foundational infrastructure for safe agentic AI deployment at scale. Detailed analysis through a diverse set of scenarios demonstrates MI9's systematic coverage of governance challenges that existing approaches fail to address, establishing the technical foundation for comprehensive agentic AI oversight.
Systems and Control (CS)
Transmission Neural Networks: Approximate Receding Horizon Control for Virus Spread on Networks
Transmission Neural Networks (TransNNs) proposed by Gao and Caines (2022) serve as both virus spread models over networks and neural network models with tuneable activation functions. This paper establishes that TransNNs provide upper bounds on the infection probability generated from the associated Markovian stochastic Susceptible-Infected-Susceptible (SIS) model with $2^n$ state configurations, where $n$ is the number of nodes in the network, and can be employed as an approximate model for the latter. Based on this approximation, a TransNN-based receding horizon control approach for mitigating virus spread is proposed. We demonstrate that it allows significant computational savings compared to the dynamic programming solution to the Markovian SIS model with $2^n$ state configurations, and that it provides less conservative control actions than the TransNN-based optimal control. Finally, numerical comparisons among (a) dynamic programming solutions for the Markovian SIS model, (b) TransNN-based optimal control, and (c) the proposed TransNN-based receding horizon control are presented.
Analytical Swarm Chemistry: Characterization and Analysis of Emergent Swarm Behaviors
Swarm robotics has potential for a wide variety of applications, but real-world deployments remain rare due to the difficulty of predicting emergent behaviors arising from simple local interactions. Traditional engineering approaches design controllers to achieve desired macroscopic outcomes under idealized conditions, while agent-based and artificial life studies explore emergent phenomena in a bottom-up, exploratory manner. In this work, we introduce Analytical Swarm Chemistry, a framework that integrates concepts from engineering, agent-based and artificial life research, and chemistry. This framework combines macrostate definitions with phase diagram analysis to systematically explore how swarm parameters influence emergent behavior. Inspired by concepts from chemistry, the framework treats parameters like thermodynamic variables, enabling visualization of regions in parameter space that give rise to specific behaviors. Applying this framework to agents with minimally viable capabilities, we identify sufficient conditions for behaviors such as milling and diffusion and uncover regions of the parameter space that reliably produce these behaviors. Preliminary validation on real robots demonstrates that these regions correspond to observable behaviors in practice. By providing a principled, interpretable approach, this framework lays the groundwork for predictable and reliable emergent behavior in real-world swarm systems.
comment: 9 pages, 8 figures, 1 table
Residual Bias Compensation Filter for Physics-Based SOC Estimation in Lithium Iron Phosphate Batteries
This paper addresses state of charge (SOC) estimation for lithium iron phosphate (LFP) batteries, where the relatively flat open-circuit voltage (OCV-SOC) characteristic reduces observability. A residual bias compensation dual extended Kalman filter (RBC-DEKF) is developed. Unlike conventional bias compensation methods that treat the bias as an augmented state within a single filter, the proposed dual-filter structure decouples residual bias estimation from electrochemical state estimation. One EKF estimates the system states of a control-oriented parameter-grouped single particle model with thermal effects, while the other EKF estimates a residual bias that continuously corrects the voltage observation equation, thereby refining the model-predicted voltage in real time. Unlike bias-augmented single-filter schemes that enlarge the covariance coupling, the decoupled bias estimator refines the voltage observation without perturbing electrochemical state dynamics. Validation is conducted on an LFP cell from a public dataset under three representative operating conditions: US06 at 0 degC, DST at 25 degC, and FUDS at 50 degC. Compared with a conventional EKF using the same model and identical state filter settings, the proposed method reduces the average SOC RMSE from 3.75% to 0.20% and the voltage RMSE between the filtered model voltage and the measured voltage from 32.8 mV to 0.8 mV. The improvement is most evident in the mid-SOC range where the OCV-SOC curve is flat, confirming that residual bias compensation significantly enhances accuracy for model-based SOC estimation of LFP batteries across a wide temperature range.
comment: This paper has been submitted to the European Control Conference (ECC) 2026 for consideration. This is the authors' version of the work, made available for early dissemination. The copyright remains with the authors. The final version, if accepted, will appear in the ECC 2026 proceedings
Ellipsoidal Set-Theoretic Design of Robust Safety Filters for Constrained Linear Systems
This paper presents an ellipsoidal set-theoretic framework for robust safety filter synthesis in constrained linear systems subject to additive bounded disturbances and input constraints. We formulate the safety filter design as a convex linear matrix inequality (LMI) optimization problem that simultaneously computes a robust controlled invariant (RCI) ellipsoidal set and its associated state-feedback control law. The RCI set is characterized as an ellipsoidal set, enabling computational tractability for high-dimensional systems while providing formal safety guarantees. The safety filter employs a smooth mixing strategy between nominal and backup controllers based on distance to the invariant set boundary, facilitating minimal intervention when the system operates safely. The proposed method extends to nonlinear systems by treating nonlinear terms as bounded disturbances with rigorous approximation bounds. Numerical validation on a six-degree-of-freedom quadrotor system demonstrates the filter's effectiveness in maintaining stability under external disturbances and aggressive maneuvers while preserving nominal performance during safe operation. The approach provides a constructive and computationally efficient solution for safety-critical control applications requiring real-time implementation.
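The smooth mixing between nominal and backup controllers described above can be pictured with a minimal sketch. It assumes the RCI set is the ellipsoid {x : x^T P x <= 1}, uses the level value x^T P x as a stand-in for distance to the boundary, and blends with a smoothstep weight. The names `quad_form`, `mixing_weight`, `safety_filter` and the threshold `inner` are hypothetical and are not taken from the paper, and this blending alone does not reproduce the paper's formal guarantees.

```python
from typing import List, Sequence


def quad_form(P: Sequence[Sequence[float]], x: Sequence[float]) -> float:
    """Return x^T P x for a square matrix P given as nested sequences."""
    n = len(x)
    return sum(x[i] * P[i][j] * x[j] for i in range(n) for j in range(n))


def mixing_weight(level: float, inner: float = 0.8) -> float:
    """Smoothstep weight in [0, 1].

    level is x^T P x; the ellipsoid is {x : level <= 1}.
    The weight is 0 for level <= inner (assumed "safe interior") and
    1 for level >= 1 (at or beyond the boundary).
    """
    if level <= inner:
        return 0.0
    if level >= 1.0:
        return 1.0
    s = (level - inner) / (1.0 - inner)
    return s * s * (3.0 - 2.0 * s)


def safety_filter(
    P: Sequence[Sequence[float]],
    K: Sequence[Sequence[float]],
    x: Sequence[float],
    u_nominal: Sequence[float],
    inner: float = 0.8,
) -> List[float]:
    """Blend the nominal input with the backup state feedback u = K x."""
    u_backup = [sum(row[j] * x[j] for j in range(len(x))) for row in K]
    lam = mixing_weight(quad_form(P, x), inner)
    return [(1.0 - lam) * un + lam * ub for un, ub in zip(u_nominal, u_backup)]


if __name__ == "__main__":
    P = [[1.0, 0.0], [0.0, 1.0]]   # unit disc as a toy ellipsoid
    K = [[-1.0, -1.0]]             # toy backup gain, not a certified design
    print(safety_filter(P, K, [0.1, 0.1], [0.5]))  # deep inside: nominal input passes through
    print(safety_filter(P, K, [1.0, 0.0], [0.5]))  # on the boundary: backup input takes over
```

In this sketch the filter leaves the nominal input untouched in the interior and hands over fully to the backup law at the boundary, which is one simple way to realize minimal intervention.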
Curriculum-Based Iterative Self-Play for Scalable Multi-Drone Racing
The coordination of multiple autonomous agents in high-speed, competitive environments represents a significant engineering challenge. This paper presents CRUISE (Curriculum-Based Iterative Self-Play for Scalable Multi-Drone Racing), a reinforcement learning framework designed to solve this challenge in the demanding domain of multi-drone racing. CRUISE overcomes key scalability limitations by synergistically combining a progressive difficulty curriculum with an efficient self-play mechanism to foster robust competitive behaviors. Validated in high-fidelity simulation with realistic quadrotor dynamics, the resulting policies significantly outperform both a standard reinforcement learning baseline and a state-of-the-art game-theoretic planner. CRUISE achieves nearly double the planner's mean racing speed, maintains high success rates, and demonstrates robust scalability as agent density increases. Ablation studies confirm that the curriculum structure is the critical component for this performance leap. By providing a scalable and effective training methodology, CRUISE advances the development of autonomous systems for dynamic, competitive tasks and serves as a blueprint for future real-world deployment.
comment: 13 pages, 5 figures. This paper is currently under review at the journal Engineering Applications of Artificial Intelligence. Supplementary video: https://drive.google.com/file/d/1k7necen2DgIxaYT2alKK8-b20sE_AyDA/view Source code and models: https://doi.org/10.5281/zenodo.17256943
SPIRAL: Self-Play Incremental Racing Algorithm for Learning in Multi-Drone Competitions
This paper introduces SPIRAL (Self-Play Incremental Racing Algorithm for Learning), a novel approach for training autonomous drones in multi-agent racing competitions. SPIRAL distinctively employs a self-play mechanism to incrementally cultivate complex racing behaviors within a challenging, dynamic environment. Through this self-play core, drones continuously compete against increasingly proficient versions of themselves, naturally escalating the difficulty of competitive interactions. This progressive learning journey guides agents from mastering fundamental flight control to executing sophisticated cooperative multi-drone racing strategies. Our method is designed for versatility, allowing integration with any state-of-the-art Deep Reinforcement Learning (DRL) algorithms within its self-play framework. Simulations demonstrate the significant advantages of SPIRAL and benchmark the performance of various DRL algorithms operating within it. Consequently, we contribute a versatile, scalable, and self-improving learning framework to the field of autonomous drone racing. SPIRAL's capacity to autonomously generate appropriate and escalating challenges through its self-play dynamic offers a promising direction for developing robust and adaptive racing strategies in multi-agent environments. This research opens new avenues for enhancing the performance and reliability of autonomous racing drones in increasingly complex and competitive scenarios.
comment: © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Approximate Gradient Coding for Distributed Learning with Heterogeneous Stragglers
In this paper, we propose an optimally structured gradient coding scheme to mitigate the straggler problem in distributed learning. Conventional gradient coding methods often assume homogeneous straggler models or rely on excessive data replication, limiting performance in real-world heterogeneous systems. To address these limitations, we formulate an optimization problem minimizing residual error while ensuring unbiased gradient estimation by explicitly considering individual straggler probabilities. We derive closed-form solutions for optimal encoding and decoding coefficients via Lagrangian duality and convex optimization, and propose data allocation strategies that reduce both redundancy and computation load. We also analyze convergence behavior for $\lambda$-strongly convex and $\mu$-smooth loss functions. Numerical results show that our approach significantly reduces the impact of stragglers and accelerates convergence compared to existing methods.
Smart Sensor Placement: A Correlation-Aware Attribution Framework (CAAF) for Real-world Data Modeling
Optimal sensor placement (OSP) is critical for efficient, accurate monitoring, control, and inference in complex real-world systems. We propose a machine-learning-based feature attribution framework to identify OSP for the prediction of quantities of interest. Feature attribution quantifies input contributions to a model's output; however, it struggles with highly correlated input data often encountered in real-world applications. To address this, we propose a Correlation-Aware Attribution Framework (CAAF), which introduces a clustering step before performing feature attribution to reduce redundancy and enhance generalizability. We first illustrate the core principles of the proposed framework through a series of validation cases, then demonstrate its effectiveness in real-world dynamical systems, such as structural health monitoring, airfoil lift prediction, and wall-normal velocity estimation for turbulent channel flow. The results show that the CAAF outperforms alternative approaches that typically struggle due to the presence of nonlinear dynamics, chaotic behavior, and multi-scale interactions, and enables the effective application of feature attribution for identifying OSP in real-world environments.
Robust Multi-Agent Safety via Tube-Based Tightened Exponential Barrier Functions
This paper presents a constructive framework for synthesizing provably safe controllers for nonlinear multi-agent systems subject to bounded disturbances. The methodology applies to systems representable in Brunovsky canonical form, accommodating arbitrary-order dynamics in multi-dimensional spaces. The central contribution is a method of constraint tightening that formally couples robust error feedback with nominal trajectory planning. The key insight is that the design of an ancillary feedback law, which confines state errors to a robust positively invariant (RPI) tube, simultaneously provides the exact information needed to ensure the safety of the nominal plan. Specifically, the geometry of the resulting RPI tube is leveraged via its support function to derive state-dependent safety margins. These margins are then used to systematically tighten the high relative-degree exponential control barrier function (eCBF) constraints imposed on the nominal planner. This integrated synthesis guarantees that any nominal trajectory satisfying the tightened constraints corresponds to a provably safe trajectory for the true, disturbed system. We demonstrate the practical utility of this formal synthesis method by implementing the planner within a distributed Model Predictive Control (MPC) scheme, which optimizes performance while inheriting the robust safety guarantees.
comment: This work has been submitted to IFAC for possible publication
Functional Uncertainty Classes for Nonparametric Adaptive Control: the Curse of Dimensionality
This paper derives a new class of vector-valued reproducing kernel Hilbert spaces (vRKHS) defined in terms of operator-valued kernels for the representation of functional uncertainty arising in nonparametric adaptive control methods. These are referred to as maneuver or trajectory vRKHS $K_M$ in the paper, and they are introduced to address the curse of dimensionality that can arise for some types of nonparametric adaptive control strategies. The maneuver vRKHSs are derived based on the structure of a compact, $l$-dimensional, smooth Riemannian manifold $M$ that is regularly embedded in the state space $X = \mathbb{R}^n$, where $M$ is assumed to approximately support the ultimate dynamics of the reference system to be tracked.
A Scenario-based Stochastic Model of using BESS-based Virtual Transmission Lines in Day-Ahead Unit Commitment
The rapid increase in renewable energy sources (RES) implementation in the power system creates more severe network congestion, which may reduce grid operation efficiency and cause renewable curtailment. Deterministic optimization for the unit commitment shows that battery energy storage system (BESS)-based Virtual Transmission Line (VTL), as an alternative to physical transmission lines, can offer a quick solution for congestion relief, reduced operational costs, and lowered renewable curtailment. This paper aims to evaluate the benefits of VTL when considering RES uncertainty. In particular, this work proposes a scenario-based stochastic security-constrained unit commitment model considering VTL, referred to as SSCUC-VTL. It incorporates the forecast error of RES into the commitment decision for systems with VTL. The performance of applying the VTL strategy is compared to that of adding a new physical transmission line and a standalone BESS. A case study has been conducted on an enhanced IEEE 24-bus test system. The simulation results demonstrate that VTL provides 23% more operational cost reduction than the physical transmission line, and up to 67% more congestion relief than the standalone BESS in a power system with solar and wind generation.
comment: This paper is to be published in the IEEE PES Asia-Pacific Power and Energy Engineering Conference (APPEEC) 2025
Data-driven Exponential Framing for Pulsive Temporal Patterns without Repetition or Singularity
Extracting pulsive temporal patterns from a small dataset, without relying on their repetition or singularity, is important in manufacturing applications but has not attracted sufficient scientific attention. We propose to quantify how long temporal patterns appear without relying on their repetition or singularity, enabling the extraction of such temporal patterns from a small dataset. Inspired by the celebrated time delay embedding and data-driven Hankel matrix analysis, we introduce a linear dynamical system model on the time-delay coordinates behind the data to derive discrete-time bases, each of which has a distinct exponential decay constant. The derived bases are fitted onto subsequences that are extracted with a sliding window in order to quantify how long patterns are dominant in the set of subsequences. We call the quantification method Data-driven Exponential Framing (DEF). A toy model-based experiment shows that DEF can identify multiple patterns with distinct lengths. DEF is also applied to electric current measurement on a punching machine, showing its potential to extract multiple patterns from real-world oscillatory data.
comment: 16 pages
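The time-delay embedding and Hankel matrix that the abstract builds on can be illustrated with a short sketch. It only shows the standard construction, in which each column is one sliding-window subsequence, and is not the DEF method itself. The function name `hankel_matrix` and the toy series are hypothetical.

```python
from typing import List, Sequence


def hankel_matrix(series: Sequence[float], window: int) -> List[List[float]]:
    """Build the Hankel (time-delay) matrix of a scalar series.

    Row i holds the series delayed by i samples, so column j equals the
    sliding-window subsequence series[j : j + window].
    """
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    cols = len(series) - window + 1
    return [[series[i + j] for j in range(cols)] for i in range(window)]


if __name__ == "__main__":
    # A purely exponential toy signal x_k = 2**k: every row is a constant
    # multiple of the row above, so the matrix has rank one.
    H = hankel_matrix([1, 2, 4, 8, 16], window=2)
    for row in H:
        print(row)
```

For a signal made of a single exponential mode, successive rows differ by a constant factor, which is why a linear model on the delay coordinates can recover exponential decay constants from such a matrix.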
Transmission Neural Networks: Approximate Receding Horizon Control for Virus Spread on Networks
Transmission Neural Networks (TransNNs) proposed by Gao and Caines (2022) serve as both virus spread models over networks and neural network models with tuneable activation functions. This paper establishes that TransNNs provide upper bounds on the infection probability generated from the associated Markovian stochastic Susceptible-Infected-Susceptible (SIS) model with 2^n state configurations where n is the number of nodes in the network, and can be employed as an approximate model for the latter. Based on such an approximation, a TransNN-based receding horizon control approach for mitigating virus spread is proposed and we demonstrate that it allows significant computational savings compared to the dynamic programming solution to Markovian SIS model with 2^n state configurations, as well as providing less conservative control actions compared to the TransNN-based optimal control. Finally, numerical comparisons among (a) dynamic programming solutions for the Markovian SIS model, (b) TransNN-based optimal control and (c) the proposed TransNN-based receding horizon control are presented.
Predictive Reliability Assessment of Distribution Grids with Residential Distributed Energy Resources
Distribution system end users are transforming from passive to active participants, marked by the push towards widespread adoption of edge-level Distributed Energy Resources (DERs). This paper addresses the challenges in distribution system planning arising from these dynamic changes. We introduce a bottom-up probabilistic approach that integrates these edge-level DERs into the reliability evaluation process. Our methodology leverages joint probability distributions to characterize and model the penetration of rooftop photovoltaic (PV) systems and energy storage across a distribution network at the individual residential level. Employing a scenario-based approach, we showcase the application of our probabilistic method using a Monte Carlo Simulation process to assess average system reliability indices and their variations at the user level. To validate our approach, we applied this methodology to the RBTS test system across various adoption scenarios, effectively showcasing the capability of our proposed method in quantifying the variation in end-user reliability indices for each scenario within the distribution system.
comment: Accepted by CSEE Journal of Power and Energy Systems in Oct. 2025
Stability Criteria and Motor Performance in Delayed Haptic Dyadic Interactions Mediated by Robots
This paper establishes analytical stability criteria for robot-mediated human-human (dyadic) interaction systems, focusing on haptic communication under network-induced time delays. Through frequency-domain analysis supported by numerical simulations, we identify both delay-independent and delay-dependent stability criteria. The delay-independent criterion guarantees stability irrespective of the delay, whereas the delay-dependent criterion is characterised by a maximum tolerable delay before instability occurs. The criteria demonstrate dependence on controller and robot dynamic parameters, where increasing stiffness reduces the maximum tolerable delay in a non-linear manner, thereby heightening system vulnerability. The proposed criteria can be generalised to a wide range of robot-mediated interactions and serve as design guidelines for stable remote dyadic systems. Experiments with robots performing human-like movements further illustrate the correlation between stability and motor performance. The findings of this paper suggest the prerequisites for effective delay-compensation strategies.
Koopman Eigenfunction-Based Identification and Optimal Nonlinear Control of Turbojet Engine
Gas turbine engines are complex and highly nonlinear dynamical systems. Deriving their physics-based models can be challenging because it requires performance characteristics that are not always available, often leading to many simplifying assumptions. This paper discusses the limitations of conventional experimental methods used to derive component-level and locally linear parameter-varying models, and addresses these issues by employing identification techniques based on data collected from standard engine operation under closed-loop control. The rotor dynamics are estimated using the sparse identification of nonlinear dynamics. Subsequently, the autonomous part of the dynamics is mapped into an optimally constructed Koopman eigenfunction space. This process involves eigenvalue optimization using metaheuristic algorithms and temporal projection, followed by gradient-based eigenfunction identification. The resulting Koopman model is validated against an in-house reference component-level model. A globally optimal nonlinear feedback controller and a Kalman estimator are then designed within the eigenfunction space and compared to traditional and gain-scheduled proportional-integral controllers, as well as a proposed internal model control approach. The eigenmode structure enables targeting individual modes during optimization, leading to improved performance tuning. Results demonstrate that the Koopman-based controller surpasses other benchmark controllers in both reference tracking and disturbance rejection under sea-level and varying flight conditions, due to its global nature.
comment: 34 pages, 29 figures. Under review at Springer Nonlinear Dynamics
Dynamic financial processes identification using sparse regressive reservoir computers
In this document, we present key findings in structured matrix approximation theory, with applications to the regressive representation of dynamic financial processes. Initially, we explore a comprehensive approach involving generic nonlinear time delay embedding for time series data extracted from a financial or economic system under examination. Subsequently, we employ sparse least-squares and structured matrix approximation methods to discern approximate representations of the output coupling matrices. These representations play a pivotal role in establishing the regressive models corresponding to the recursive structures inherent in a given financial system. The document further introduces prototypical algorithms that leverage the aforementioned techniques. These algorithms are demonstrated through applications in approximate identification and predictive simulation of dynamic financial and economic processes, encompassing scenarios that may or may not exhibit chaotic behavior.
comment: The content of this publication represents the opinion of the researchers affiliated with the Department of Statistics and Research, but not the official opinion of the CNBS
Zero-Shot Trajectory Planning for Signal Temporal Logic Tasks
Signal Temporal Logic (STL) is a powerful specification language for describing complex temporal behaviors of continuous signals, making it well-suited for high-level robotic task descriptions. However, generating executable plans for STL tasks is challenging, as it requires consideration of the coupling between the task specification and the system dynamics. Existing approaches either follow a model-based setting that explicitly requires knowledge of the system dynamics or adopt a task-oriented data-driven approach to learn plans for specific tasks. In this work, we address the problem of generating executable STL plans for systems with unknown dynamics. We propose a hierarchical planning framework that enables zero-shot generalization to new STL tasks by leveraging only task-agnostic trajectory data during offline training. The framework consists of three key components: (i) decomposing the STL specification into several progresses and time constraints, (ii) searching for timed waypoints that satisfy all progresses under time constraints, and (iii) generating trajectory segments using a pre-trained diffusion model and stitching them into complete trajectories. We formally prove that our method guarantees STL satisfaction, and simulation results demonstrate its effectiveness in generating dynamically feasible trajectories across diverse long-horizon STL tasks.
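As background for what "STL satisfaction" means for a sampled trajectory, here is a minimal sketch of the standard quantitative (robustness) semantics for two temporal operators over a discrete-time signal. It is not the paper's planner. The function names and the toy signal are hypothetical, and time bounds are given in sample indices.

```python
from typing import List, Sequence


def predicate_robustness(signal: Sequence[float], threshold: float) -> List[float]:
    """Robustness of the predicate x(t) > threshold at every sample."""
    return [x - threshold for x in signal]


def eventually(rho: Sequence[float], a: int, b: int, t: int = 0) -> float:
    """Robustness of F_[a,b] phi at time t: best value inside the window."""
    window = rho[t + a : t + b + 1]
    if not window:
        raise ValueError("time window lies outside the signal")
    return max(window)


def always(rho: Sequence[float], a: int, b: int, t: int = 0) -> float:
    """Robustness of G_[a,b] phi at time t: worst value inside the window."""
    window = rho[t + a : t + b + 1]
    if not window:
        raise ValueError("time window lies outside the signal")
    return min(window)


if __name__ == "__main__":
    x = [0.0, 1.0, 3.0, 2.0, 0.5]
    rho = predicate_robustness(x, 1.0)   # phi: x > 1
    print(eventually(rho, 0, 4))         # positive: x exceeds 1 at some sample
    print(always(rho, 2, 3))             # positive: x stays above 1 on samples 2..3
    print(always(rho, 0, 4))             # negative: x > 1 does not hold throughout
```

A positive robustness value means the trajectory satisfies the formula with that margin, and a negative value means it violates it.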
A Note on Comparator-Overdrive-Delay Conditioning for Current-Mode Control
Comparator-overdrive-delay conditioning is a new control conditioning approach for high-frequency current-mode control. No existing literature rigorously studies the effect of the comparator overdrive delay on the current-mode control. The results in this paper provide insights into the mechanism of comparator-overdrive-delay conditioning.
comment: Add extra case studies
Systems and Control (EESS)
Transmission Neural Networks: Approximate Receding Horizon Control for Virus Spread on Networks
Transmission Neural Networks (TransNNs) proposed by Gao and Caines (2022) serve as both virus spread models over networks and neural network models with tuneable activation functions. This paper establishes that TransNNs provide upper bounds on the infection probability generated from the associated Markovian stochastic Susceptible-Infected-Susceptible (SIS) model with 2^n state configurations where n is the number of nodes in the network, and can be employed as an approximate model for the latter. Based on such an approximation, a TransNN-based receding horizon control approach for mitigating virus spread is proposed and we demonstrate that it allows significant computational savings compared to the dynamic programming solution to Markovian SIS model with 2^n state configurations, as well as providing less conservative control actions compared to the TransNN-based optimal control. Finally, numerical comparisons among (a) dynamic programming solutions for the Markovian SIS model, (b) TransNN-based optimal control and (c) the proposed TransNN-based receding horizon control are presented.
Analytical Swarm Chemistry: Characterization and Analysis of Emergent Swarm Behaviors
Swarm robotics has potential for a wide variety of applications, but real-world deployments remain rare due to the difficulty of predicting emergent behaviors arising from simple local interactions. Traditional engineering approaches design controllers to achieve desired macroscopic outcomes under idealized conditions, while agent-based and artificial life studies explore emergent phenomena in a bottom-up, exploratory manner. In this work, we introduce Analytical Swarm Chemistry, a framework that integrates concepts from engineering, agent-based and artificial life research, and chemistry. This framework combines macrostate definitions with phase diagram analysis to systematically explore how swarm parameters influence emergent behavior. Inspired by concepts from chemistry, the framework treats parameters like thermodynamic variables, enabling visualization of regions in parameter space that give rise to specific behaviors. Applying this framework to agents with minimally viable capabilities, we identify sufficient conditions for behaviors such as milling and diffusion and uncover regions of the parameter space that reliably produce these behaviors. Preliminary validation on real robots demonstrates that these regions correspond to observable behaviors in practice. By providing a principled, interpretable approach, this framework lays the groundwork for predictable and reliable emergent behavior in real-world swarm systems.
comment: 9 pages, 8 figures, 1 table
Residual Bias Compensation Filter for Physics-Based SOC Estimation in Lithium Iron Phosphate Batteries
This paper addresses state of charge (SOC) estimation for lithium iron phosphate (LFP) batteries, where the relatively flat open-circuit voltage (OCV-SOC) characteristic reduces observability. A residual bias compensation dual extended Kalman filter (RBC-DEKF) is developed. Unlike conventional bias compensation methods that treat the bias as an augmented state within a single filter, the proposed dual-filter structure decouples residual bias estimation from electrochemical state estimation. One EKF estimates the system states of a control-oriented parameter-grouped single particle model with thermal effects, while the other EKF estimates a residual bias that continuously corrects the voltage observation equation, thereby refining the model-predicted voltage in real time. Unlike bias-augmented single-filter schemes that enlarge the covariance coupling, the decoupled bias estimator refines the voltage observation without perturbing electrochemical state dynamics. Validation is conducted on an LFP cell from a public dataset under three representative operating conditions: US06 at 0 degC, DST at 25 degC, and FUDS at 50 degC. Compared with a conventional EKF using the same model and identical state filter settings, the proposed method reduces the average SOC RMSE from 3.75% to 0.20% and the voltage RMSE between the filtered model voltage and the measured voltage from 32.8 mV to 0.8 mV. The improvement is most evident in the mid-SOC range where the OCV-SOC curve is flat, confirming that residual bias compensation significantly enhances accuracy for model-based SOC estimation of LFP batteries across a wide temperature range.
comment: This paper has been submitted to the European Control Conference (ECC) 2026 for consideration. This is the authors' version of the work, made available for early dissemination. The copyright remains with the authors. The final version, if accepted, will appear in the ECC 2026 proceedings
Ellipsoidal Set-Theoretic Design of Robust Safety Filters for Constrained Linear Systems
This paper presents an ellipsoidal set-theoretic framework for robust safety filter synthesis in constrained linear systems subject to additive bounded disturbances and input constraints. We formulate the safety filter design as a convex linear matrix inequality (LMI) optimization problem that simultaneously computes a robust controlled invariant (RCI) ellipsoidal set and its associated state-feedback control law. The RCI set is characterized as an ellipsoidal set, enabling computational tractability for high-dimensional systems while providing formal safety guarantees. The safety filter employs a smooth mixing strategy between nominal and backup controllers based on distance to the invariant set boundary, facilitating minimal intervention when the system operates safely. The proposed method extends to nonlinear systems by treating nonlinear terms as bounded disturbances with rigorous approximation bounds. Numerical validation on a six-degree-of-freedom quadrotor system demonstrates the filter's effectiveness in maintaining stability under external disturbances and aggressive maneuvers while preserving nominal performance during safe operation. The approach provides a constructive and computationally efficient solution for safety-critical control applications requiring real-time implementation.
Curriculum-Based Iterative Self-Play for Scalable Multi-Drone Racing
The coordination of multiple autonomous agents in high-speed, competitive environments represents a significant engineering challenge. This paper presents CRUISE (Curriculum-Based Iterative Self-Play for Scalable Multi-Drone Racing), a reinforcement learning framework designed to solve this challenge in the demanding domain of multi-drone racing. CRUISE overcomes key scalability limitations by synergistically combining a progressive difficulty curriculum with an efficient self-play mechanism to foster robust competitive behaviors. Validated in high-fidelity simulation with realistic quadrotor dynamics, the resulting policies significantly outperform both a standard reinforcement learning baseline and a state-of-the-art game-theoretic planner. CRUISE achieves nearly double the planner's mean racing speed, maintains high success rates, and demonstrates robust scalability as agent density increases. Ablation studies confirm that the curriculum structure is the critical component for this performance leap. By providing a scalable and effective training methodology, CRUISE advances the development of autonomous systems for dynamic, competitive tasks and serves as a blueprint for future real-world deployment.
comment: 13 pages, 5 figures. This paper is currently under review at the journal Engineering Applications of Artificial Intelligence. Supplementary video: https://drive.google.com/file/d/1k7necen2DgIxaYT2alKK8-b20sE_AyDA/view Source code and models: https://doi.org/10.5281/zenodo.17256943
SPIRAL: Self-Play Incremental Racing Algorithm for Learning in Multi-Drone Competitions
This paper introduces SPIRAL (Self-Play Incremental Racing Algorithm for Learning), a novel approach for training autonomous drones in multi-agent racing competitions. SPIRAL distinctively employs a self-play mechanism to incrementally cultivate complex racing behaviors within a challenging, dynamic environment. Through this self-play core, drones continuously compete against increasingly proficient versions of themselves, naturally escalating the difficulty of competitive interactions. This progressive learning journey guides agents from mastering fundamental flight control to executing sophisticated cooperative multi-drone racing strategies. Our method is designed for versatility, allowing integration with any state-of-the-art Deep Reinforcement Learning (DRL) algorithm within its self-play framework. Simulations demonstrate the significant advantages of SPIRAL and benchmark the performance of various DRL algorithms operating within it. Consequently, we contribute a versatile, scalable, and self-improving learning framework to the field of autonomous drone racing. SPIRAL's capacity to autonomously generate appropriate and escalating challenges through its self-play dynamic offers a promising direction for developing robust and adaptive racing strategies in multi-agent environments. This research opens new avenues for enhancing the performance and reliability of autonomous racing drones in increasingly complex and competitive scenarios.
comment: © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Approximate Gradient Coding for Distributed Learning with Heterogeneous Stragglers
In this paper, we propose an optimally structured gradient coding scheme to mitigate the straggler problem in distributed learning. Conventional gradient coding methods often assume homogeneous straggler models or rely on excessive data replication, limiting performance in real-world heterogeneous systems. To address these limitations, we formulate an optimization problem minimizing residual error while ensuring unbiased gradient estimation by explicitly considering individual straggler probabilities. We derive closed-form solutions for optimal encoding and decoding coefficients via Lagrangian duality and convex optimization, and propose data allocation strategies that reduce both redundancy and computation load. We also analyze convergence behavior for $\lambda$-strongly convex and $\mu$-smooth loss functions. Numerical results show that our approach significantly reduces the impact of stragglers and accelerates convergence compared to existing methods.
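To make the unbiasedness requirement in the abstract above concrete, here is a minimal sketch. It does not reproduce the paper's closed-form optimal coefficients. It assumes a simpler inverse-probability weighting: each worker holds a disjoint data partition, straggles independently with a known probability `p_i < 1`, and has its partial gradient scaled by `1 / (1 - p_i)` at the decoder. The function names are hypothetical.

```python
from itertools import product


def decode_gradient(partial_grads, straggler_probs, arrived):
    """Sum the partial gradients that arrived, each scaled by 1 / (1 - p_i).

    Assumption for illustration: inverse-probability weights, not the
    optimized decoding coefficients derived in the paper.
    """
    dim = len(partial_grads[0])
    estimate = [0.0] * dim
    for grad, p, ok in zip(partial_grads, straggler_probs, arrived):
        if not ok:
            continue  # this worker straggled, so its gradient is missing
        weight = 1.0 / (1.0 - p)
        for k in range(dim):
            estimate[k] += weight * grad[k]
    return estimate


def expected_decode(partial_grads, straggler_probs):
    """Exact expectation of the estimate over all arrival patterns."""
    n = len(partial_grads)
    dim = len(partial_grads[0])
    mean = [0.0] * dim
    for arrived in product([True, False], repeat=n):
        prob = 1.0
        for ok, p in zip(arrived, straggler_probs):
            prob *= (1.0 - p) if ok else p
        estimate = decode_gradient(partial_grads, straggler_probs, arrived)
        for k in range(dim):
            mean[k] += prob * estimate[k]
    return mean


grads = [[1.0, 2.0], [3.0, 4.0]]   # hypothetical per-worker partial gradients
probs = [0.5, 0.25]                # hypothetical straggler probabilities
mean = expected_decode(grads, probs)
```

Under these assumptions the expected estimate equals the full gradient, the sum of all partial gradients, even though the two workers straggle at different rates. The weighting only guarantees unbiasedness; it does not minimize the residual error that the abstract also targets.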
Smart Sensor Placement: A Correlation-Aware Attribution Framework (CAAF) for Real-world Data Modeling
Optimal sensor placement (OSP) is critical for efficient, accurate monitoring, control, and inference in complex real-world systems. We propose a machine-learning-based feature attribution framework to identify OSP for the prediction of quantities of interest. Feature attribution quantifies input contributions to a model's output; however, it struggles with highly correlated input data often encountered in real-world applications. To address this, we propose a Correlation-Aware Attribution Framework (CAAF), which introduces a clustering step before performing feature attribution to reduce redundancy and enhance generalizability. We first illustrate the core principles of the proposed framework through a series of validation cases, then demonstrate its effectiveness in real-world dynamical systems, such as structural health monitoring, airfoil lift prediction, and wall-normal velocity estimation for turbulent channel flow. The results show that the CAAF outperforms alternative approaches that typically struggle due to the presence of nonlinear dynamics, chaotic behavior, and multi-scale interactions, and enables the effective application of feature attribution for identifying OSP in real-world environments.
Robust Multi-Agent Safety via Tube-Based Tightened Exponential Barrier Functions
This paper presents a constructive framework for synthesizing provably safe controllers for nonlinear multi-agent systems subject to bounded disturbances. The methodology applies to systems representable in Brunovsky canonical form, accommodating arbitrary-order dynamics in multi-dimensional spaces. The central contribution is a method of constraint tightening that formally couples robust error feedback with nominal trajectory planning. The key insight is that the design of an ancillary feedback law, which confines state errors to a robust positively invariant (RPI) tube, simultaneously provides the exact information needed to ensure the safety of the nominal plan. Specifically, the geometry of the resulting RPI tube is leveraged via its support function to derive state-dependent safety margins. These margins are then used to systematically tighten the high relative-degree exponential control barrier function (eCBF) constraints imposed on the nominal planner. This integrated synthesis guarantees that any nominal trajectory satisfying the tightened constraints corresponds to a provably safe trajectory for the true, disturbed system. We demonstrate the practical utility of this formal synthesis method by implementing the planner within a distributed Model Predictive Control (MPC) scheme, which optimizes performance while inheriting the robust safety guarantees.
comment: This work has been submitted to IFAC for possible publication
Functional Uncertainty Classes for Nonparametric Adaptive Control: the Curse of Dimensionality
This paper derives a new class of vector-valued reproducing kernel Hilbert spaces (vRKHS) defined in terms of operator-valued kernels for the representation of functional uncertainty arising in nonparametric adaptive control methods. These are referred to as maneuver or trajectory vRKHS $K_M$ in the paper, and they are introduced to address the curse of dimensionality that can arise for some types of nonparametric adaptive control strategies. The maneuver vRKHSs are derived based on the structure of a compact, $l$-dimensional, smooth Riemannian manifold $M$ that is regularly embedded in the state space $X = \mathbb{R}^n$, where $M$ is assumed to approximately support the ultimate dynamics of the reference system to be tracked.
A Scenario-based Stochastic Model of using BESS-based Virtual Transmission Lines in Day-Ahead Unit Commitment
The rapid growth of renewable energy sources (RES) in power systems creates more severe network congestion, which may reduce grid operation efficiency and cause renewable curtailment. Deterministic optimization for the unit commitment shows that a battery energy storage system (BESS)-based Virtual Transmission Line (VTL), as an alternative to physical transmission lines, can offer a quick solution for congestion relief, reduced operational costs, and lowered renewable curtailment. This paper aims to evaluate the benefits of VTL when RES uncertainty is considered. In particular, this work proposes a scenario-based stochastic security-constrained unit commitment model considering VTL, referred to as SSCUC-VTL. It incorporates the forecast error of RES into the commitment decision for systems with VTL. The performance of applying the VTL strategy is compared to that of adding a new physical transmission line and a standalone BESS. A case study has been conducted on an enhanced IEEE 24-bus test system. The simulation results demonstrate that VTL provides 23% more operational cost reduction than the physical transmission line, and up to 67% more congestion relief than the standalone BESS in a power system with solar and wind generation.
comment: This paper is to be published in the IEEE PES Asia-Pacific Power and Energy Engineering Conference (APPEEC) 2025
Data-driven Exponential Framing for Pulsive Temporal Patterns without Repetition or Singularity
Extracting pulsive temporal patterns from a small dataset without relying on their repetition or singularity is of significant importance in manufacturing applications but has not attracted sufficient scientific attention. We propose to quantify how long temporal patterns appear without relying on their repetition or singularity, which enables the extraction of such temporal patterns from a small dataset. Inspired by the celebrated time delay embedding and data-driven Hankel matrix analysis, we introduce a linear dynamical system model on the time-delay coordinates behind the data to derive discrete-time bases, each of which has a distinct exponential decay constant. The derived bases are fitted onto subsequences that are extracted with a sliding window in order to quantify how long patterns are dominant in the set of subsequences. We call the quantification method Data-driven Exponential Framing (DEF). A toy model-based experiment shows that DEF can identify multiple patterns with distinct lengths. DEF is also applied to electric current measurement on a punching machine, showing its potential to extract multiple patterns from real-world oscillatory data.
comment: 16 pages
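The time-delay embedding that the DEF abstract above builds on can be sketched as follows. The sketch covers only the sliding-window (Hankel) construction and a one-dimensional least-squares fit of a single decay ratio. It is not the DEF basis derivation or fitting procedure, and the function names and window length are hypothetical.

```python
def hankel_rows(series, window):
    """Return the sliding-window subsequences of a scalar series.

    Row i is series[i : i + window], so the rows form a Hankel matrix.
    """
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [list(series[i:i + window])
            for i in range(len(series) - window + 1)]


def decay_ratio(series):
    """Least-squares fit of r in x[t+1] ~ r * x[t] (illustrative assumption).

    For a purely exponential signal x[t] = a * r**t this recovers r.
    """
    numerator = sum(a * b for a, b in zip(series[:-1], series[1:]))
    denominator = sum(a * a for a in series[:-1])
    if denominator == 0.0:
        raise ValueError("series must contain a nonzero sample before the last")
    return numerator / denominator


signal = [8.0, 4.0, 2.0, 1.0]      # hypothetical decaying pulse
rows = hankel_rows(signal, window=2)
ratio = decay_ratio(signal)
```

For this pulse, each row of the Hankel matrix is the previous row scaled by the fitted ratio. That is the kind of exponential structure on delay coordinates that the abstract refers to.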
Transmission Neural Networks: Approximate Receding Horizon Control for Virus Spread on Networks
Transmission Neural Networks (TransNNs) proposed by Gao and Caines (2022) serve as both virus spread models over networks and neural network models with tuneable activation functions. This paper establishes that TransNNs provide upper bounds on the infection probability generated from the associated Markovian stochastic Susceptible-Infected-Susceptible (SIS) model with 2^n state configurations, where n is the number of nodes in the network, and can be employed as an approximate model for the latter. Based on such an approximation, a TransNN-based receding horizon control approach for mitigating virus spread is proposed, and we demonstrate that it allows significant computational savings compared to the dynamic programming solution to the Markovian SIS model with 2^n state configurations, while providing less conservative control actions than the TransNN-based optimal control. Finally, numerical comparisons among (a) dynamic programming solutions for the Markovian SIS model, (b) TransNN-based optimal control and (c) the proposed TransNN-based receding horizon control are presented.
Predictive Reliability Assessment of Distribution Grids with Residential Distributed Energy Resources
Distribution system end users are transforming from passive to active participants, marked by the push towards widespread adoption of edge-level Distributed Energy Resources (DERs). This paper addresses the challenges in distribution system planning arising from these dynamic changes. We introduce a bottom-up probabilistic approach that integrates these edge-level DERs into the reliability evaluation process. Our methodology leverages joint probability distributions to characterize and model the penetration of rooftop photovoltaic (PV) systems and energy storage across a distribution network at the individual residential level. Employing a scenario-based approach, we showcase the application of our probabilistic method using a Monte Carlo Simulation process to assess average system reliability indices and their variations at the user level. To validate our approach, we applied this methodology to the RBTS test system across various adoption scenarios, effectively showcasing the capability of our proposed method in quantifying the variation in end-user reliability indices for each scenario within the distribution system.
comment: Accepted by CSEE Journal of Power and Energy Systems in Oct. 2025
Stability Criteria and Motor Performance in Delayed Haptic Dyadic Interactions Mediated by Robots
This paper establishes analytical stability criteria for robot-mediated human-human (dyadic) interaction systems, focusing on haptic communication under network-induced time delays. Through frequency-domain analysis supported by numerical simulations, we identify both delay-independent and delay-dependent stability criteria. The delay-independent criterion guarantees stability irrespective of the delay, whereas the delay-dependent criterion is characterised by a maximum tolerable delay before instability occurs. The criteria demonstrate dependence on controller and robot dynamic parameters, where increasing stiffness reduces the maximum tolerable delay in a non-linear manner, thereby heightening system vulnerability. The proposed criteria can be generalised to a wide range of robot-mediated interactions and serve as design guidelines for stable remote dyadic systems. Experiments with robots performing human-like movements further illustrate the correlation between stability and motor performance. The findings of this paper suggest the prerequisites for effective delay-compensation strategies.
Koopman Eigenfunction-Based Identification and Optimal Nonlinear Control of Turbojet Engine
Gas turbine engines are complex and highly nonlinear dynamical systems. Deriving their physics-based models can be challenging because it requires performance characteristics that are not always available, often leading to many simplifying assumptions. This paper discusses the limitations of conventional experimental methods used to derive component-level and locally linear parameter-varying models, and addresses these issues by employing identification techniques based on data collected from standard engine operation under closed-loop control. The rotor dynamics are estimated using the sparse identification of nonlinear dynamics. Subsequently, the autonomous part of the dynamics is mapped into an optimally constructed Koopman eigenfunction space. This process involves eigenvalue optimization using metaheuristic algorithms and temporal projection, followed by gradient-based eigenfunction identification. The resulting Koopman model is validated against an in-house reference component-level model. A globally optimal nonlinear feedback controller and a Kalman estimator are then designed within the eigenfunction space and compared to traditional and gain-scheduled proportional-integral controllers, as well as a proposed internal model control approach. The eigenmode structure enables targeting individual modes during optimization, leading to improved performance tuning. Results demonstrate that the Koopman-based controller surpasses other benchmark controllers in both reference tracking and disturbance rejection under sea-level and varying flight conditions, due to its global nature.
comment: 34 pages, 29 figures. Under review at Springer Nonlinear Dynamics
Dynamic financial processes identification using sparse regressive reservoir computers
In this document, we present key findings in structured matrix approximation theory, with applications to the regressive representation of dynamic financial processes. Initially, we explore a comprehensive approach involving generic nonlinear time delay embedding for time series data extracted from a financial or economic system under examination. Subsequently, we employ sparse least-squares and structured matrix approximation methods to discern approximate representations of the output coupling matrices. These representations play a pivotal role in establishing the regressive models corresponding to the recursive structures inherent in a given financial system. The document further introduces prototypical algorithms that leverage the aforementioned techniques. These algorithms are demonstrated through applications in approximate identification and predictive simulation of dynamic financial and economic processes, encompassing scenarios that may or may not exhibit chaotic behavior.
comment: The content of this publication represents the opinion of the researchers affiliated with the Department of Statistics and Research, but not the official opinion of the CNBS
Zero-Shot Trajectory Planning for Signal Temporal Logic Tasks
Signal Temporal Logic (STL) is a powerful specification language for describing complex temporal behaviors of continuous signals, making it well-suited for high-level robotic task descriptions. However, generating executable plans for STL tasks is challenging, as it requires consideration of the coupling between the task specification and the system dynamics. Existing approaches either follow a model-based setting that explicitly requires knowledge of the system dynamics or adopt a task-oriented data-driven approach to learn plans for specific tasks. In this work, we address the problem of generating executable STL plans for systems with unknown dynamics. We propose a hierarchical planning framework that enables zero-shot generalization to new STL tasks by leveraging only task-agnostic trajectory data during offline training. The framework consists of three key components: (i) decomposing the STL specification into several progresses and time constraints, (ii) searching for timed waypoints that satisfy all progresses under time constraints, and (iii) generating trajectory segments using a pre-trained diffusion model and stitching them into complete trajectories. We formally prove that our method guarantees STL satisfaction, and simulation results demonstrate its effectiveness in generating dynamically feasible trajectories across diverse long-horizon STL tasks.
A Note on Comparator-Overdrive-Delay Conditioning for Current-Mode Control
Comparator-overdrive-delay conditioning is a new control conditioning approach for high-frequency current-mode control. No existing literature rigorously studies the effect of the comparator overdrive delay on the current-mode control. The results in this paper provide insights into the mechanism of comparator-overdrive-delay conditioning.
comment: Add extra case studies
Robotics
Drone Carry-on Weight and Wind Flow Assessment via Micro-Doppler Analysis
Remote monitoring of drones has become a global objective due to emerging applications in national security and managing aerial delivery traffic. Despite their relatively small size, drones can carry significant payloads, which require monitoring, especially in cases of unauthorized transportation of dangerous goods. A drone's flight dynamics heavily depend on outdoor wind conditions and the carry-on weight, which affect the tilt angle of a drone's body and the rotation velocity of the blades. A surveillance radar can capture both effects, provided a sufficient signal-to-noise ratio for the received echoes and an adjusted postprocessing detection algorithm. Here, we conduct a systematic study to demonstrate that micro-Doppler analysis enables the disentanglement of the impacts of wind and weight on a hovering drone. The physics behind the effect is related to the flight controller, as the way the drone counteracts weight and wind differs. When the payload is balanced, it imposes an additional load symmetrically on all four rotors, causing them to rotate faster, thereby generating a blade-related micro-Doppler shift at a higher frequency. However, the impact of the wind is different. The wind attempts to displace the drone, and to counteract this, the drone tilts to the side. As a result, the forward and rear rotors rotate at different velocities to maintain the tilt angle of the drone body relative to the airflow direction. This causes the splitting in the micro-Doppler spectra. By performing a set of experiments in a controlled environment, specifically, an anechoic chamber for electromagnetic isolation and a wind tunnel for imposing deterministic wind conditions, we demonstrate that both wind and payload details can be extracted using a simple deterministic algorithm based on branching in the micro-Doppler spectra.
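The link between rotor speed and the blade-related micro-Doppler shift described in the abstract above can be illustrated with the standard monostatic Doppler relation for a blade tip moving along the radar line of sight. The numerical values below are illustrative assumptions, not figures from the paper: wavelength $\lambda = 0.03$ m, blade length $L = 0.1$ m, and rotation rate $f_{\text{rot}} = 100$ Hz.

```latex
\begin{align}
  f_{D,\max} &= \frac{2\, v_{\text{tip}}}{\lambda}
              = \frac{4 \pi f_{\text{rot}} L}{\lambda} \\
  % Assumed values: \lambda = 0.03 m, L = 0.1 m, f_rot = 100 Hz
  f_{D,\max} &= \frac{4 \pi \cdot 100 \cdot 0.1}{0.03}
              \approx 4.19\ \text{kHz}
\end{align}
```

Under this relation, a balanced payload that raises $f_{\text{rot}}$ on all rotors moves the blade signature to a higher frequency. Front and rear rotors spinning at different rates under wind give two distinct values of $f_{D,\max}$, which is consistent with the spectral splitting the abstract reports.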
Kinematically Controllable Cable Robots with Reconfigurable End-effectors
To enlarge the translational workspace of cable-driven robots, one common approach is to increase the number of cables. However, this introduces two challenges: (1) cable interference significantly reduces the rotational workspace, and (2) the solution of tensions in cables becomes non-unique, resulting in difficulties for kinematic control of the robot. In this work, we design structurally simple reconfigurable end-effectors for cable robots. By incorporating a spring, a helical-grooved shaft, and a matching nut, relative linear motions between end-effector components are converted into relative rotations, thereby expanding the rotational workspace of the mechanism. Meanwhile, a bearing is introduced to provide an additional rotational degree of freedom, making the mechanism non-redundant. As a result, the robot's motion can be controlled purely through kinematics without additional tension sensing and control.
comment: 7 pages, 12 figures, Technical Report
Analytical Swarm Chemistry: Characterization and Analysis of Emergent Swarm Behaviors
Swarm robotics has potential for a wide variety of applications, but real-world deployments remain rare due to the difficulty of predicting emergent behaviors arising from simple local interactions. Traditional engineering approaches design controllers to achieve desired macroscopic outcomes under idealized conditions, while agent-based and artificial life studies explore emergent phenomena in a bottom-up, exploratory manner. In this work, we introduce Analytical Swarm Chemistry, a framework that integrates concepts from engineering, agent-based and artificial life research, and chemistry. This framework combines macrostate definitions with phase diagram analysis to systematically explore how swarm parameters influence emergent behavior. Inspired by concepts from chemistry, the framework treats parameters like thermodynamic variables, enabling visualization of regions in parameter space that give rise to specific behaviors. Applying this framework to agents with minimally viable capabilities, we identify sufficient conditions for behaviors such as milling and diffusion and uncover regions of the parameter space that reliably produce these behaviors. Preliminary validation on real robots demonstrates that these regions correspond to observable behaviors in practice. By providing a principled, interpretable approach, this framework lays the groundwork for predictable and reliable emergent behavior in real-world swarm systems.
comment: 9 pages, 8 figures, 1 table
Learning Neural Observer-Predictor Models for Limb-level Sampling-based Locomotion Planning
Accurate full-body motion prediction is essential for the safe, autonomous navigation of legged robots, enabling critical capabilities like limb-level collision checking in cluttered environments. Simplified kinematic models often fail to capture the complex, closed-loop dynamics of the robot and its low-level controller, limiting their predictions to simple planar motion. To address this, we present a learning-based observer-predictor framework that accurately predicts this motion. Our method features a neural observer with provable uniform ultimate boundedness (UUB) guarantees that provides a reliable latent state estimate from a history of proprioceptive measurements. This stable estimate initializes a computationally efficient predictor, designed for the rapid, parallel evaluation of thousands of potential trajectories required by modern sampling-based planners. We validated the system by integrating our neural predictor into an MPPI-based planner on a Vision 60 quadruped. Hardware experiments successfully demonstrated effective, limb-aware motion planning in a challenging, narrow passage and over small objects, highlighting our system's ability to provide a robust foundation for high-performance, collision-aware planning on dynamic robotic platforms.
PIP-LLM: Integrating PDDL-Integer Programming with LLMs for Coordinating Multi-Robot Teams Using Natural Language
Enabling robot teams to execute natural language commands requires translating high-level instructions into feasible, efficient multi-robot plans. While Large Language Models (LLMs) combined with Planning Domain Definition Language (PDDL) offer promise for single-robot scenarios, existing approaches struggle with multi-robot coordination due to brittle task decomposition, poor scalability, and low coordination efficiency. We introduce PIP-LLM, a language-based coordination framework that consists of PDDL-based team-level planning and Integer Programming (IP)-based robot-level planning. PIP-LLM first decomposes the command by translating it into a team-level PDDL problem and solving that problem to obtain a team-level plan, abstracting away robot assignment. Each team-level action represents a subtask to be finished by the team. Next, this plan is translated into a dependency graph representing the subtasks' dependency structure. This dependency graph is then used to guide the robot-level planning, in which each subtask node is formulated as an IP-based task allocation problem, explicitly optimizing travel costs and workload while respecting robot capabilities and user-defined constraints. This separation of planning from assignment allows PIP-LLM to avoid the pitfalls of syntax-based decomposition and scale to larger teams. Experiments across diverse tasks show that PIP-LLM improves plan success rate, reduces maximum and average travel costs, and achieves better load balancing compared to state-of-the-art baselines.
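The robot-level allocation step described in the abstract above can be sketched on a toy scale. This is not the paper's integer program: it substitutes an exhaustive search over assignments, which returns the same optimum on small instances, and it assumes a simple objective of total travel cost with ties broken by the smallest maximum per-robot load. All names, costs, and the capability rule are hypothetical.

```python
from itertools import product


def allocate(subtasks, robots, cost, capable):
    """Search all assignments of subtasks to capable robots.

    Stand-in for an IP solver (assumption for illustration): minimizes
    total travel cost, then the maximum number of subtasks per robot.
    Returns a dict subtask -> robot, or None if no feasible assignment exists.
    """
    best, best_key = None, None
    for choice in product(robots, repeat=len(subtasks)):
        # Respect robot capabilities: skip infeasible assignments.
        if any(not capable(r, s) for r, s in zip(choice, subtasks)):
            continue
        total = sum(cost[(r, s)] for r, s in zip(choice, subtasks))
        max_load = max((choice.count(r) for r in robots), default=0)
        key = (total, max_load)
        if best_key is None or key < best_key:
            best, best_key = dict(zip(subtasks, choice)), key
    return best


robots = ["r1", "r2"]
subtasks = ["pick", "scan"]
cost = {("r1", "pick"): 1, ("r2", "pick"): 5,
        ("r1", "scan"): 2, ("r2", "scan"): 3}


def capable(robot, subtask):
    # Hypothetical capability constraint: only r2 carries a scanner.
    return not (robot == "r1" and subtask == "scan")


plan = allocate(subtasks, robots, cost, capable)
```

The search grows exponentially with the number of subtasks, which is one reason a real system would hand the same objective and constraints to an IP solver instead.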
TWC-SLAM: Multi-Agent Cooperative SLAM with Text Semantics and WiFi Features Integration for Similar Indoor Environments IROS
Multi-agent cooperative SLAM often encounters challenges in similar indoor environments characterized by repetitive structures, such as corridors and rooms. These challenges can lead to significant inaccuracies in shared location identification when employing point cloud-based techniques. To mitigate these issues, we introduce TWC-SLAM, a multi-agent cooperative SLAM framework that integrates text semantics and WiFi signal features to enhance location identification and loop closure detection. TWC-SLAM comprises a single-agent front-end odometry module based on FAST-LIO2, a location identification and loop closure detection module that leverages text semantics and WiFi features, and a global mapping module. The agents are equipped with sensors capable of capturing textual information and detecting WiFi signals. By correlating these data sources, TWC-SLAM establishes a common location, facilitating point cloud alignment across different agents' maps. Furthermore, the system employs loop closure detection and optimization modules to achieve global optimization and cohesive mapping. We evaluated our approach using an indoor dataset featuring similar corridors, rooms, and text signs. The results demonstrate that TWC-SLAM significantly improves the performance of cooperative SLAM systems in complex environments with repetitive architectural features.
comment: Accepted by the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
Policies over Poses: Reinforcement Learning based Distributed Pose-Graph Optimization for Multi-Robot SLAM
We consider the distributed pose-graph optimization (PGO) problem, which is fundamental in accurate trajectory estimation in multi-robot simultaneous localization and mapping (SLAM). Conventional iterative approaches linearize a highly non-convex optimization objective, requiring repeated solving of normal equations, which often converge to local minima and thus produce suboptimal estimates. We propose a scalable, outlier-robust distributed planar PGO framework using Multi-Agent Reinforcement Learning (MARL). We cast distributed PGO as a partially observable Markov game defined on local pose-graphs, where each action refines a single edge's pose estimate. A graph partitioner decomposes the global pose graph, and each robot runs a recurrent edge-conditioned Graph Neural Network (GNN) encoder with adaptive edge-gating to denoise noisy edges. Robots sequentially refine poses through a hybrid policy that utilizes prior action memory and graph embeddings. After local graph correction, a consensus scheme reconciles inter-robot disagreements to produce a globally consistent estimate. Our extensive evaluations on a comprehensive suite of synthetic and real-world datasets demonstrate that our learned MARL-based actors reduce the global objective by an average of 37.5% more than the state-of-the-art distributed PGO framework, while enhancing inference efficiency by at least 6X. We also demonstrate that actor replication allows a single learned policy to scale effortlessly to substantially larger robot teams without any retraining. Code is publicly available at https://github.com/herolab-uga/policies-over-poses.
comment: IEEE International Symposium on Multi-Robot & Multi-Agent Systems (MRS) 2025
SCAL for Pinch-Lifting: Complementary Rotational and Linear Prototypes for Environment-Adaptive Grasping IROS 2025
This paper presents environment-adaptive pinch-lifting built on a slot-constrained adaptive linkage (SCAL) and instantiated in two complementary fingers: SCAL-R, a rotational-drive design with an active fingertip that folds inward after contact to form an envelope, and SCAL-L, a linear-drive design that passively opens on contact to span wide or weak-feature objects. Both fingers convert surface following into an upward lifting branch while maintaining fingertip orientation, enabling thin or low-profile targets to be raised from supports with minimal sensing and control. Two-finger grippers are fabricated via PLA-based 3D printing. Experiments evaluate (i) contact-preserving sliding and pinch-lifting on tabletops, (ii) ramp negotiation followed by lift, and (iii) handling of bulky objects via active enveloping (SCAL-R) or contact-triggered passive opening (SCAL-L). Across dozens of trials on small parts, boxes, jars, and tape rolls, both designs achieve consistent grasps with limited tuning. A quasi-static analysis provides closed-form fingertip-force models for linear parallel pinching and two-point enveloping, offering geometry-aware guidance for design and operation. Overall, the results indicate complementary operating regimes and a practical path to robust, environment-adaptive grasping with simple actuation.
comment: Preliminary version presented at the IROS 2025 CIM Workshop, where it was selected as a Best Demo Award finalist and subsequently received the Best Demo Award after oral presentation
ATLAS: Actor-Critic Task-Completion with Look-ahead Action Simulation NeurIPS 2025
We observe that current state-of-the-art web-agents are unable to effectively adapt to new environments without neural network fine-tuning, without which they produce inefficient execution plans due to a lack of awareness of the structure and dynamics of the new environment. To address this limitation, we introduce ATLAS (Actor-Critic Task-completion with Look-ahead Action Simulation), a memory-augmented agent that is able to make plans grounded in a model of the environment by simulating the consequences of those actions in cognitive space. Our agent starts by building a "cognitive map" by performing a lightweight curiosity-driven exploration of the environment. The planner proposes candidate actions; the simulator predicts their consequences in cognitive space; a critic analyzes the options to select the best roll-out and update the original plan; and a browser executor performs the chosen action. On the WebArena-Lite Benchmark, we achieve a 63% success rate compared to 53.9% success rate for the previously published state-of-the-art. Unlike previous systems, our modular architecture requires no website-specific LLM fine-tuning. Ablations show sizable drops without the world-model, hierarchical planner, and look-ahead-based replanner, confirming their complementary roles within the design of our system.
comment: 9 pages, NeurIPS 2025 Workshop on Language Agents and World Models
RL-AVIST: Reinforcement Learning for Autonomous Visual Inspection of Space Targets
The growing need for autonomous on-orbit services such as inspection, maintenance, and situational awareness calls for intelligent spacecraft capable of complex maneuvers around large orbital targets. Traditional control systems often fall short in adaptability, especially under model uncertainties, multi-spacecraft configurations, or dynamically evolving mission contexts. This paper introduces RL-AVIST, a Reinforcement Learning framework for Autonomous Visual Inspection of Space Targets. Leveraging the Space Robotics Bench (SRB), we simulate high-fidelity 6-DOF spacecraft dynamics and train agents using DreamerV3, a state-of-the-art model-based RL algorithm, with PPO and TD3 as model-free baselines. Our investigation focuses on 3D proximity maneuvering tasks around targets such as the Lunar Gateway and other space assets. We evaluate task performance under two complementary regimes: generalized agents trained on randomized velocity vectors, and specialized agents trained to follow fixed trajectories emulating known inspection orbits. Furthermore, we assess the robustness and generalization of policies across multiple spacecraft morphologies and mission domains. Results demonstrate that model-based RL offers promising capabilities in trajectory fidelity and sample efficiency, paving the way for scalable, retrainable control solutions for future space operations.
Uncertainty-Aware Autonomous Vehicles: Predicting the Road Ahead
Autonomous Vehicle (AV) perception systems have advanced rapidly in recent years, providing vehicles with the ability to accurately interpret their environment. Perception systems remain susceptible to errors caused by overly confident predictions in the case of rare events or out-of-sample data. This study equips an autonomous vehicle with the ability to 'know when it is uncertain', using an uncertainty-aware image classifier as part of the AV software stack. Specifically, the study exploits the ability of Random-Set Neural Networks (RS-NNs) to explicitly quantify prediction uncertainty. Unlike traditional CNNs or Bayesian methods, RS-NNs predict belief functions over sets of classes, allowing the system to identify and signal uncertainty clearly in novel or ambiguous scenarios. The system is tested in a real-world autonomous racing vehicle software stack, with the RS-NN classifying the layout of the road ahead and providing the associated uncertainty of the prediction. Performance of the RS-NN under a range of road conditions is compared against traditional CNN and Bayesian neural networks, with the RS-NN achieving significantly higher accuracy and superior uncertainty calibration. This integration of RS-NNs into a Robot Operating System (ROS)-based vehicle control pipeline demonstrates that predictive uncertainty can dynamically modulate vehicle speed, maintaining high-speed performance under confident predictions while proactively improving safety through speed reductions in uncertain scenarios. These results demonstrate the potential of uncertainty-aware neural networks - in particular RS-NNs - as a practical solution for safer and more robust autonomous driving.
RoGER-SLAM: A Robust Gaussian Splatting SLAM System for Noisy and Low-light Environment Resilience
The reliability of Simultaneous Localization and Mapping (SLAM) is severely constrained in environments where visual inputs suffer from noise and low illumination. Although recent 3D Gaussian Splatting (3DGS) based SLAM frameworks achieve high-fidelity mapping under clean conditions, they remain vulnerable to compounded degradations that degrade mapping and tracking performance. A key observation underlying our work is that the original 3DGS rendering pipeline inherently behaves as an implicit low-pass filter, attenuating high-frequency noise but also risking over-smoothing. Building on this insight, we propose RoGER-SLAM, a robust 3DGS SLAM system tailored for noise and low-light resilience. The framework integrates three innovations: a Structure-Preserving Robust Fusion (SP-RoFusion) mechanism that couples rendered appearance, depth, and edge cues; an adaptive tracking objective with residual balancing regularization; and a Contrastive Language-Image Pretraining (CLIP)-based enhancement module, selectively activated under compounded degradations to restore semantic and structural fidelity. Comprehensive experiments on Replica, TUM, and real-world sequences show that RoGER-SLAM consistently improves trajectory accuracy and reconstruction quality compared with other 3DGS-SLAM systems, especially under adverse imaging conditions.
comment: 13 pages, 11 figures, under review
Curriculum-Based Iterative Self-Play for Scalable Multi-Drone Racing
The coordination of multiple autonomous agents in high-speed, competitive environments represents a significant engineering challenge. This paper presents CRUISE (Curriculum-Based Iterative Self-Play for Scalable Multi-Drone Racing), a reinforcement learning framework designed to solve this challenge in the demanding domain of multi-drone racing. CRUISE overcomes key scalability limitations by synergistically combining a progressive difficulty curriculum with an efficient self-play mechanism to foster robust competitive behaviors. Validated in high-fidelity simulation with realistic quadrotor dynamics, the resulting policies significantly outperform both a standard reinforcement learning baseline and a state-of-the-art game-theoretic planner. CRUISE achieves nearly double the planner's mean racing speed, maintains high success rates, and demonstrates robust scalability as agent density increases. Ablation studies confirm that the curriculum structure is the critical component for this performance leap. By providing a scalable and effective training methodology, CRUISE advances the development of autonomous systems for dynamic, competitive tasks and serves as a blueprint for future real-world deployment.
comment: 13 pages, 5 figures. This paper is currently under review at the journal Engineering Applications of Artificial Intelligence. Supplementary video: https://drive.google.com/file/d/1k7necen2DgIxaYT2alKK8-b20sE_AyDA/view Source code and models: https://doi.org/10.5281/zenodo.17256943
SPIRAL: Self-Play Incremental Racing Algorithm for Learning in Multi-Drone Competitions
This paper introduces SPIRAL (Self-Play Incremental Racing Algorithm for Learning), a novel approach for training autonomous drones in multi-agent racing competitions. SPIRAL distinctively employs a self-play mechanism to incrementally cultivate complex racing behaviors within a challenging, dynamic environment. Through this self-play core, drones continuously compete against increasingly proficient versions of themselves, naturally escalating the difficulty of competitive interactions. This progressive learning journey guides agents from mastering fundamental flight control to executing sophisticated cooperative multi-drone racing strategies. Our method is designed for versatility, allowing integration with any state-of-the-art Deep Reinforcement Learning (DRL) algorithms within its self-play framework. Simulations demonstrate the significant advantages of SPIRAL and benchmark the performance of various DRL algorithms operating within it. Consequently, we contribute a versatile, scalable, and self-improving learning framework to the field of autonomous drone racing. SPIRAL's capacity to autonomously generate appropriate and escalating challenges through its self-play dynamic offers a promising direction for developing robust and adaptive racing strategies in multi-agent environments. This research opens new avenues for enhancing the performance and reliability of autonomous racing drones in increasingly complex and competitive scenarios.
comment: © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Bag-of-Word-Groups (BoWG): A Robust and Efficient Loop Closure Detection Method Under Perceptual Aliasing IROS
Loop closure is critical in Simultaneous Localization and Mapping (SLAM) systems to reduce accumulative drift and ensure global mapping consistency. However, conventional methods struggle in perceptually aliased environments, such as narrow pipes, due to vector quantization, feature sparsity, and repetitive textures, while existing solutions often incur high computational costs. This paper presents Bag-of-Word-Groups (BoWG), a novel loop closure detection method that achieves superior precision-recall, robustness, and computational efficiency. The core innovation lies in the introduction of word groups, which capture the spatial co-occurrence and proximity of visual words to construct an online dictionary. Additionally, drawing inspiration from probabilistic transition models, we incorporate temporal consistency directly into similarity computation with an adaptive scheme, substantially improving precision-recall performance. The method is further strengthened by a feature distribution analysis module and dedicated post-verification mechanisms. To evaluate the effectiveness of our method, we conduct experiments on both public datasets and a confined-pipe dataset we constructed. Results demonstrate that BoWG surpasses state-of-the-art methods, including both traditional and learning-based approaches, in terms of precision-recall and computational efficiency. Our approach also exhibits excellent scalability, achieving an average processing time of 16 ms per image across 17,565 images in the Bicocca25b dataset.
comment: This paper has been accepted by IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
Ant-inspired Walling Strategies for Scalable Swarm Separation: Reinforcement Learning Approaches Based on Finite State Machines
In natural systems, emergent structures often arise to balance competing demands. Army ants, for example, form temporary "walls" that prevent interference between foraging trails. Inspired by this behavior, we developed two decentralized controllers for heterogeneous robotic swarms to maintain spatial separation while executing concurrent tasks. The first is a finite-state machine (FSM)-based controller that uses encounter-triggered transitions to create rigid, stable walls. The second integrates FSM states with a Deep Q-Network (DQN), dynamically optimizing separation through emergent "demilitarized zones." In simulation, both controllers reduce mixing between subgroups, with the DQN-enhanced controller improving adaptability and reducing mixing by 40-50% while achieving faster convergence.
On Steerability Factors for Growing Vine Robots
Vine robots extend their tubular bodies by everting material from the tip, enabling navigation in complex environments with a minimalist soft body. Despite their promise for field applications, especially in the urban search and rescue domain, performance is constrained by the weight of attached sensors or tools, as well as other design and control choices. This work investigates how tip load, pressure, length, diameter, and fabrication method shape vine robot steerability--the ability to maneuver with controlled curvature--for robots that steer with series pouch motor-style pneumatic actuators. We conduct two groups of experiments: (1) studying tip load, chamber pressure, length, and diameter in a robot supporting itself against gravity, and (2) studying fabrication method and ratio of actuator to chamber pressure in a robot supported on the ground. Results show that steerability decreases with increasing tip load, is best at moderate chamber pressure, increases with length, and is largely unaffected by diameter. Robots with actuators attached on their exterior begin curving at low pressure ratios, but curvature saturates at high pressure ratios; those with actuators integrated into the robot body require higher pressure ratios to begin curving but achieve higher curvature overall. We demonstrate that robots optimized with these principles outperform those with ad hoc parameters in a mobility task that involves maximizing upward and horizontal curvatures.
Forward Kinematics Solution For A General Stewart Platform Through Iteration Based Simulation
This paper presents a method to generate feasible, unique forward-kinematic solutions for a general Stewart platform. This is done by using inverse kinematics to obtain valid workspace data and corresponding actuator lengths for the moving platform. For parallel kinematic machines, such as the Stewart platform, the inverse kinematics are straightforward, but the forward kinematics are complex and generate multiple solutions due to the closed-loop structure of the kinematic links. In this research, a simple iterative algorithm employing the modified Denavit-Hartenberg (DH) convention is used. The outcome is encouraging: the method generates a single feasible forward-kinematic solution for each valid pose with the solved DH parameters and, unlike earlier forward-kinematics solutions, this unique solution does not need to be manually verified. Therefore, the forward-kinematic solutions can be used directly for further calculations without the need for manual pose verification. This capability is essential for the six-degree-of-freedom materials testing system developed by the authors in their laboratory. The developed system is aimed at characterizing additively manufactured materials under complex combined multiple loading conditions. The material characterization is done by enabling high-precision force control on the moving platform via in situ calibration of the as-built kinematics of the Stewart-Gough platform.
Diffusion Beats Autoregressive in Data-Constrained Settings
Autoregressive (AR) models have long dominated the landscape of large language models, driving progress across a wide range of tasks. Recently, diffusion-based language models have emerged as a promising alternative, though their advantages over AR models remain underexplored. In this paper, we systematically study masked diffusion models in data-constrained settings where training involves repeated passes over limited data and find that they significantly outperform AR models when compute is abundant but data is scarce. Diffusion models make better use of repeated data, achieving lower validation loss and superior downstream performance. We find new scaling laws for diffusion models and derive a closed-form expression for the critical compute threshold at which diffusion begins to outperform AR. Finally, we explain why diffusion models excel in this regime: their randomized masking objective implicitly trains over a rich distribution of token orderings, acting as an implicit data augmentation that AR's fixed left-to-right factorization lacks. Our results suggest that when data, not compute, is the bottleneck, diffusion models offer a compelling alternative to the standard AR paradigm. Our code is available at: https://diffusion-scaling.github.io.
comment: Project Webpage: https://diffusion-scaling.github.io
Stability Criteria and Motor Performance in Delayed Haptic Dyadic Interactions Mediated by Robots
This paper establishes analytical stability criteria for robot-mediated human-human (dyadic) interaction systems, focusing on haptic communication under network-induced time delays. Through frequency-domain analysis supported by numerical simulations, we identify both delay-independent and delay-dependent stability criteria. The delay-independent criterion guarantees stability irrespective of the delay, whereas the delay-dependent criterion is characterised by a maximum tolerable delay before instability occurs. The criteria demonstrate dependence on controller and robot dynamic parameters, where increasing stiffness reduces the maximum tolerable delay in a non-linear manner, thereby heightening system vulnerability. The proposed criteria can be generalised to a wide range of robot-mediated interactions and serve as design guidelines for stable remote dyadic systems. Experiments with robots performing human-like movements further illustrate the correlation between stability and motor performance. The findings of this paper suggest the prerequisites for effective delay-compensation strategies.
Correspondence-Free, Function-Based Sim-to-Real Learning for Deformable Surface Control
This paper presents a correspondence-free, function-based sim-to-real learning method for controlling deformable freeform surfaces. Unlike traditional sim-to-real transfer methods that strongly rely on marker points with full correspondences, our approach simultaneously learns a deformation function space and a confidence map -- both parameterized by a neural network -- to map simulated shapes to their real-world counterparts. As a result, the sim-to-real learning can be conducted by input from either a 3D scanner as point clouds (without correspondences) or a motion capture system as marker points (tolerating missed markers). The resultant sim-to-real transfer can be seamlessly integrated into a neural network-based computational pipeline for inverse kinematics and shape control. We demonstrate the versatility and adaptability of our method on both vision devices and across four pneumatically actuated soft robots: a deformable membrane, a robotic mannequin, and two soft manipulators.
comment: arXiv admin note: text overlap with arXiv:2405.08935
Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning NeurIPS 2025
Humans exhibit diverse and expressive whole-body movements. However, attaining human-like whole-body coordination in humanoid robots remains challenging, as conventional approaches that mimic whole-body motions often neglect the distinct roles of upper and lower body. This oversight leads to computationally intensive policy learning and frequently causes robot instability and falls during real-world execution. To address these issues, we propose Adversarial Locomotion and Motion Imitation (ALMI), a novel framework that enables adversarial policy learning between upper and lower body. Specifically, the lower body aims to provide robust locomotion capabilities to follow velocity commands while the upper body tracks various motions. Conversely, the upper-body policy ensures effective motion tracking when the robot executes velocity-based movements. Through iterative updates, these policies achieve coordinated whole-body control, which can be extended to loco-manipulation tasks with teleoperation systems. Extensive experiments demonstrate that our method achieves robust locomotion and precise motion tracking in both simulation and on the full-size Unitree H1 robot. Additionally, we release a large-scale whole-body motion control dataset featuring high-quality episodic trajectories from MuJoCo simulations deployable on real robots. The project page is https://almi-humanoid.github.io.
comment: NeurIPS 2025. Code: https://github.com/TeleHuman/ALMI-Open, Dataset: https://huggingface.co/datasets/TeleEmbodied/ALMI-X
Pretraining a Unified PDDL Domain from Real-World Demonstrations for Generalizable Robot Task Planning NeurIPS 2025
Robotic task planning in real-world environments requires reasoning over implicit constraints from language and vision. While LLMs and VLMs offer strong priors, they struggle with long-horizon structure and symbolic grounding. Existing methods that combine LLMs with symbolic planning often rely on handcrafted or narrow domains, limiting generalization. We propose UniDomain, a framework that pre-trains a PDDL domain from robot manipulation demonstrations and applies it for online robotic task planning. It extracts atomic domains from 12,393 manipulation videos to form a unified domain with 3,137 operators, 2,875 predicates, and 16,481 causal edges. Given a target class of tasks, it retrieves relevant atomics from the unified domain and systematically fuses them into high-quality meta-domains to support compositional generalization in planning. Experiments on diverse real-world tasks show that UniDomain solves complex, unseen tasks in a zero-shot manner, achieving up to 58% higher task success and 160% improvement in plan optimality over state-of-the-art LLM and LLM-PDDL baselines.
comment: Accepted at NeurIPS 2025
ASC-SW: Atrous strip convolution network with sliding windows
With the rapid development of lightweight visual neural network architectures, traditional high-performance vision models have undergone significant compression, enhancing their computational and energy efficiency and enabling deployment on resource-constrained edge devices. To enable a mobile robot to avoid ground wires, we propose a visual-assisted navigation framework called Atrous Strip Convolution Sliding Window (ASC-SW). This framework compensates for the limitations of traditional light detection and ranging (LiDAR) sensors in detecting ground-level obstacles such as wires. A lightweight and efficient segmentation model, Atrous Strip Convolution Network (ASCnet), is proposed for detecting deformable linear objects (DLOs). Atrous Strip Convolution Spatial Pyramid Pooling (ASCSPP) is designed to extract DLO features effectively. Atrous Strip Convolution is integrated into ASCSPP to accurately identify the linear structure of DLOs with low computational cost. Additionally, a Sliding Window (SW) post-processing module is proposed to denoise the output in complex environments, improving recognition accuracy. ASC-SW achieves 75.3% MIoU at 217 FPS on a self-built real-world dataset, and a real-robot experiment demonstrated the proposed framework. It was successfully verified on a real robot running on an edge device (Jetson platform) in scenarios that were originally inoperable.
comment: The data of the model comparison in chapter 4 needs to be modified
Optimal Kinematic Synthesis and Prototype Development of Knee Exoskeleton
This study focuses on enhancing the design of an existing knee exoskeleton by addressing limitations in the range of motion (ROM) during Sit-to-Stand (STS) motions. While current knee exoskeletons emphasize toughness and rehabilitation, their closed-loop mechanisms hinder optimal ROM, which is crucial for effective rehabilitation. This research aims to optimize the exoskeleton design to achieve the necessary ROM, improving its functionality in rehabilitation. Using kinematic modeling and formulation, the existing design was represented as non-linear, non-convex mathematical functions. Optimization techniques, considering constraints based on human leg measurements, were applied to determine the best dimensions for the exoskeleton. This resulted in a significant increase in ROM compared to existing models. A MATLAB program was developed to compare the ROM of the optimized exoskeleton with the original design. To validate the practicality of the optimized design, analysis was conducted using a mannequin with average human dimensions, followed by constructing a cardboard dummy model to confirm simulation results. The STS motion of an average human was captured using a camera and TRACKER software, and the motion was compared with that of the dummy model to identify any misalignments between the human and exoskeleton knee joints. Furthermore, a prototype of the knee joint exoskeleton is being developed to further investigate misalignments and improve the design. Future work includes the use of EMG sensors for more detailed analysis and better results.
Soft and Compliant Contact-Rich Hair Manipulation and Care
Hair care robots can help address labor shortages in elderly care while enabling those with limited mobility to maintain their hair-related identity. We present MOE-Hair, a soft robot system that performs three hair-care tasks: head patting, finger combing, and hair grasping. The system features a tendon-driven soft robot end-effector (MOE) with a wrist-mounted RGBD camera, leveraging both mechanical compliance for safety and visual force sensing through deformation. In testing with a force-sensorized mannequin head, MOE achieved comparable hair-grasping effectiveness while applying significantly less force than rigid grippers. Our novel force estimation method combines visual deformation data and tendon tensions from actuators to infer applied forces, reducing sensing errors by up to 60.1% and 20.3% compared to actuator current load-only and depth image-only baselines, respectively. A user study with 12 participants demonstrated statistically significant preferences for MOE-Hair over a baseline system in terms of comfort, effectiveness, and appropriate force application. These results demonstrate the unique advantages of soft robots in contact-rich hair-care tasks, while highlighting the importance of precise force control despite the inherent compliance of the system.
SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling
Training expressive flow-based policies with off-policy reinforcement learning is notoriously unstable due to gradient pathologies in the multi-step action sampling process. We trace this instability to a fundamental connection: the flow rollout is algebraically equivalent to a residual recurrent computation, making it susceptible to the same vanishing and exploding gradients as RNNs. To address this, we reparameterize the velocity network using principles from modern sequential models, introducing two stable architectures: Flow-G, which incorporates a gated velocity, and Flow-T, which utilizes a decoded velocity. We then develop a practical SAC-based algorithm, enabled by a noise-augmented rollout, that facilitates direct end-to-end training of these policies. Our approach supports both from-scratch and offline-to-online learning and achieves state-of-the-art performance on continuous control and robotic manipulation benchmarks, eliminating the need for common workarounds like policy distillation or surrogate objectives.
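The equivalence between a flow rollout and a residual recurrence can be made concrete with a small sketch. It assumes a fixed-step Euler discretization of the flow (the abstract does not state the integrator), and the names `flow_rollout` and `velocity` are hypothetical, not the authors' code. Each step adds a learned update to the previous state, which is the same computation pattern as a residual recurrent cell, so gradients pass through every step in sequence.

```python
from typing import Callable, List, Sequence

Vector = List[float]

def flow_rollout(
    velocity: Callable[[Sequence[float], float], Sequence[float]],
    x0: Sequence[float],
    num_steps: int,
) -> Vector:
    """Euler rollout x_{k+1} = x_k + dt * v(x_k, t_k) over t in [0, 1].

    Assumption: fixed-step Euler integration. The update has the form
    "state + function(state)", i.e. a residual recurrent step.
    """
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t)
        # Residual update: the new state is the old state plus a correction.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

With a constant velocity field the rollout simply translates the start point, for example `flow_rollout(lambda x, t: [1.0, -2.0], [0.0, 0.0], 4)`.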
SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent NeurIPS 2025
Indoor scene synthesis has become increasingly important with the rise of Embodied AI, which requires 3D environments that are not only visually realistic but also physically plausible and functionally diverse. While recent approaches have advanced visual fidelity, they often remain constrained to fixed scene categories, lack sufficient object-level detail and physical consistency, and struggle to align with complex user instructions. In this work, we present SceneWeaver, a reflective agentic framework that unifies diverse scene synthesis paradigms through tool-based iterative refinement. At its core, SceneWeaver employs a language model-based planner to select from a suite of extensible scene generation tools, ranging from data-driven generative models to visual- and LLM-based methods, guided by self-evaluation of physical plausibility, visual realism, and semantic alignment with user input. This closed-loop reason-act-reflect design enables the agent to identify semantic inconsistencies, invoke targeted tools, and update the environment over successive iterations. Extensive experiments on both common and open-vocabulary room types demonstrate that SceneWeaver not only outperforms prior methods on physical, visual, and semantic metrics, but also generalizes effectively to complex scenes with diverse instructions, marking a step toward general-purpose 3D environment generation. Project website: https://scene-weaver.github.io/.
comment: Accepted by NeurIPS 2025, 26 pages
A Cycle Ride to HDR: Semantics Aware Self-Supervised Framework for Unpaired LDR-to-HDR Image Reconstruction
Reconstruction of High Dynamic Range (HDR) from Low Dynamic Range (LDR) images is an important computer vision task. There is a significant amount of research utilizing both conventional non-learning methods and modern data-driven approaches, focusing on using both single-exposed and multi-exposed LDR for HDR image reconstruction. However, most current state-of-the-art methods require high-quality paired {LDR;HDR} datasets, and the literature makes only limited use of unpaired datasets, that is, of methods that learn the LDR-HDR mapping between domains. This paper proposes CycleHDR, a method that integrates self-supervision into a modified semantic- and cycle-consistent adversarial architecture that utilizes unpaired LDR and HDR datasets for training. Our method introduces novel artifact- and exposure-aware generators to address visual artifact removal. It also puts forward an encoder and loss to address semantic consistency, another under-explored topic. CycleHDR is the first to use semantic and contextual awareness for the LDR-HDR reconstruction task in a self-supervised setup. The method achieves state-of-the-art performance across several benchmark datasets and reconstructs high-quality HDR images. The official website of this work is available at: https://github.com/HrishavBakulBarua/Cycle-HDR
Zero-Shot Trajectory Planning for Signal Temporal Logic Tasks
Signal Temporal Logic (STL) is a powerful specification language for describing complex temporal behaviors of continuous signals, making it well-suited for high-level robotic task descriptions. However, generating executable plans for STL tasks is challenging, as it requires consideration of the coupling between the task specification and the system dynamics. Existing approaches either follow a model-based setting that explicitly requires knowledge of the system dynamics or adopt a task-oriented data-driven approach to learn plans for specific tasks. In this work, we address the problem of generating executable STL plans for systems with unknown dynamics. We propose a hierarchical planning framework that enables zero-shot generalization to new STL tasks by leveraging only task-agnostic trajectory data during offline training. The framework consists of three key components: (i) decomposing the STL specification into several progresses and time constraints, (ii) searching for timed waypoints that satisfy all progresses under time constraints, and (iii) generating trajectory segments using a pre-trained diffusion model and stitching them into complete trajectories. We formally prove that our method guarantees STL satisfaction, and simulation results demonstrate its effectiveness in generating dynamically feasible trajectories across diverse long-horizon STL tasks.
Robotics
A short methodological review on social robot navigation benchmarking
Social Robot Navigation is the skill that allows robots to move efficiently in human-populated environments while ensuring safety, comfort, and trust. Unlike other areas of research, the scientific community has not yet reached an agreement on how Social Robot Navigation should be benchmarked. This is notably important, as the lack of a de facto standard to benchmark Social Robot Navigation can hinder the progress of the field and may lead to contradictory conclusions. Motivated by this gap, we contribute a short review focused exclusively on benchmarking trends in the period from January 2020 to July 2025. Of the 130 papers identified by our search using IEEE Xplore, we analysed the 85 papers that met the criteria of the review. This review addresses the metrics used in the literature for benchmarking purposes, the algorithms employed in such benchmarks, the use of human surveys for benchmarking, and how conclusions are drawn from the benchmarking results, when applicable.
comment: 18 pages, 14 of which references. 3 figures, 2 tables
Separation of Unconscious Robots with Obstructed Visibility
We study a recently introduced \textit{unconscious} mobile robot model, where each robot is associated with a \textit{color}, which is visible to other robots but not to itself. The robots are autonomous, anonymous, oblivious and silent, operating in the Euclidean plane under the conventional \textit{Look-Compute-Move} cycle. A primary task in this model is the \textit{separation problem}, where unconscious robots sharing the same color must separate from others, forming recognizable geometric shapes such as circles, points, or lines. All prior works model the robots as \textit{transparent}, enabling each to know the positions and colors of all other robots. In contrast, we model the robots as \textit{opaque}, where a robot can obstruct the visibility of two other robots, if it lies on the line segment between them. Under this obstructed visibility, we consider a variant of the separation problem in which robots, starting from any arbitrary initial configuration, are required to separate into concentric semicircles. We present a collision-free algorithm that solves the separation problem under a semi-synchronous scheduler in $O(n)$ epochs, where $n$ is the number of robots. The robots agree on one coordinate axis but have no knowledge of $n$.
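The obstructed-visibility rule described above (a robot hides two others from each other when it lies on the segment between them) can be illustrated with a small geometric predicate. This is a minimal sketch rather than the paper's algorithm: robots are treated as points, and the names `blocks`, `visible`, and the tolerance `eps` are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]

def blocks(a: Point, b: Point, c: Point, eps: float = 1e-9) -> bool:
    """Return True if c lies strictly inside the segment from a to b."""
    # The cross product vanishes exactly when a, b, and c are collinear.
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    if abs(cross) > eps:
        return False
    # The dot product places c along the direction a -> b.
    dot = (c[0] - a[0]) * (b[0] - a[0]) + (c[1] - a[1]) * (b[1] - a[1])
    length_sq = (b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2
    return eps < dot < length_sq - eps

def visible(a: Point, b: Point, robots: Iterable[Point]) -> bool:
    """Robots a and b see each other if no third robot sits between them."""
    return not any(blocks(a, b, c) for c in robots if c != a and c != b)
```

For robots at (0, 0), (1, 0), (2, 0), and (1, 1), the robot at (1, 0) hides (0, 0) and (2, 0) from each other, while (0, 0) and (1, 1) remain mutually visible.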
A Novel Multi-Timescale Stability-Preserving Hierarchical Reinforcement Learning Controller Framework for Adaptive Control in High-Dimensional Dynamical Systems
Controlling high-dimensional stochastic systems, critical in robotics, autonomous vehicles, and hyperchaotic systems, faces the curse of dimensionality, lacks temporal abstraction, and often fails to ensure stochastic stability. To overcome these limitations, this study introduces the Multi-Timescale Lyapunov-Constrained Hierarchical Reinforcement Learning (MTLHRL) framework. MTLHRL integrates a hierarchical policy within a semi-Markov Decision Process (SMDP), featuring a high-level policy for strategic planning and a low-level policy for reactive control, which effectively manages complex, multi-timescale decision-making and reduces dimensionality overhead. Stability is rigorously enforced using a neural Lyapunov function optimized via Lagrangian relaxation and multi-timescale actor-critic updates, ensuring mean-square boundedness or asymptotic stability in the face of stochastic dynamics. The framework promotes efficient and reliable learning through trust-region constraints and decoupled optimization. Extensive simulations on an 8D hyperchaotic system and a 5-DOF robotic manipulator demonstrate MTLHRL's empirical superiority. It significantly outperforms baseline methods in both stability and performance, recording the lowest error indices (e.g., Integral Absolute Error (IAE): 3.912 in hyperchaotic control and IAE: 1.623 in robotics), achieving faster convergence, and exhibiting superior disturbance rejection. MTLHRL offers a theoretically grounded and practically viable solution for robust control of complex stochastic systems.
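For readers unfamiliar with the Integral Absolute Error (IAE) index quoted above, the following sketch computes it from a sampled tracking-error signal using the standard definition, the integral of |e(t)| over time. The uniform sampling interval `dt` and the trapezoidal rule are assumptions made for illustration; the paper's evaluation procedure is not described in the abstract, and the function name is hypothetical.

```python
from typing import Sequence

def integral_absolute_error(errors: Sequence[float], dt: float) -> float:
    """Approximate IAE = integral of |e(t)| dt with the trapezoidal rule.

    Assumption: `errors` holds samples of e(t) taken every `dt` seconds.
    """
    if len(errors) < 2:
        return 0.0
    abs_err = [abs(e) for e in errors]
    # Trapezoidal rule: interior samples get full weight, endpoints half.
    return dt * (sum(abs_err) - 0.5 * (abs_err[0] + abs_err[-1]))
```

A lower IAE means the controlled output stayed closer to its reference over the whole run, which is why it is commonly reported alongside convergence time.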
BLIP-FusePPO: A Vision-Language Deep Reinforcement Learning Framework for Lane Keeping in Autonomous Vehicles
In this paper, we propose Bootstrapped Language-Image Pretraining-driven Fused State Representation in Proximal Policy Optimization (BLIP-FusePPO), a novel multimodal reinforcement learning (RL) framework for autonomous lane-keeping (LK), in which semantic embeddings generated by a vision-language model (VLM) are directly fused with geometric states, LiDAR observations, and Proportional-Integral-Derivative-based (PID) control feedback within the agent observation space. The proposed method lets the agent learn driving rules that are aware of their surroundings and easy to understand by combining high-level scene understanding from the VLM with low-level control and spatial signals. Our architecture brings together semantic, geometric, and control-aware representations to make policy learning more robust. A hybrid reward function that includes semantic alignment, LK accuracy, obstacle avoidance, and speed regulation helps learning to be more efficient and generalizable. Our method is different from the approaches that only use semantic models to shape rewards. Instead, it directly embeds semantic features into the state representation. This cuts down on expensive runtime inference and makes sure that semantic guidance is always available. The simulation results show that the proposed model is better at LK stability and adaptability than the best vision-based and multimodal RL baselines in a wide range of difficult driving situations. We make our code publicly available.
comment: https://github.com/Amin-A96/BLIP-FusePPO-A-Vision-Language-Deep-Reinforcement-Learning-Framework-for-Lane-Keeping-in-Autonomous.git
Estimating Continuum Robot Shape under External Loading using Spatiotemporal Neural Networks IROS
This paper presents a learning-based approach for accurately estimating the 3D shape of flexible continuum robots subjected to external loads. The proposed method introduces a spatiotemporal neural network architecture that fuses multi-modal inputs, including current and historical tendon displacement data and RGB images, to generate point clouds representing the robot's deformed configuration. The network integrates a recurrent neural module for temporal feature extraction, an encoding module for spatial feature extraction, and a multi-modal fusion module to combine spatial features extracted from visual data with temporal dependencies from historical actuator inputs. Continuous 3D shape reconstruction is achieved by fitting Bézier curves to the predicted point clouds. Experimental validation demonstrates that our approach achieves high precision, with mean shape estimation errors of 0.08 mm (unloaded) and 0.22 mm (loaded), outperforming state-of-the-art methods in shape sensing for tendon-driven continuum robots (TDCRs). The results validate the efficacy of deep learning-based spatiotemporal data fusion for precise shape estimation under loading conditions.
comment: 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Toward Humanoid Brain-Body Co-design: Joint Optimization of Control and Morphology for Fall Recovery
Humanoid robots represent a central frontier in embodied intelligence, as their anthropomorphic form enables natural deployment in human workspaces. Brain-body co-design for humanoids presents a promising approach to realizing this potential by jointly optimizing control policies and physical morphology. Within this context, fall recovery emerges as a critical capability. It not only enhances safety and resilience but also integrates naturally with locomotion systems, thereby advancing the autonomy of humanoids. In this paper, we propose RoboCraft, a scalable humanoid co-design framework for fall recovery that iteratively improves performance through the coupled updates of control policy and morphology. A shared policy pretrained across multiple designs is progressively finetuned on high-performing morphologies, enabling efficient adaptation without retraining from scratch. Concurrently, morphology search is guided by human-inspired priors and optimization algorithms, supported by a priority buffer that balances reevaluation of promising candidates with the exploration of novel designs. Experiments show that RoboCraft achieves an average performance gain of 44.55% on seven public humanoid robots, with morphology optimization driving at least 40% of the improvements in co-designing four humanoid robots, underscoring the critical role of humanoid co-design.
Breaking the Static Assumption: A Dynamic-Aware LIO Framework Via Spatio-Temporal Normal Analysis
This paper addresses the challenge of Lidar-Inertial Odometry (LIO) in dynamic environments, where conventional methods often fail due to their static-world assumptions. Traditional LIO algorithms perform poorly when dynamic objects dominate the scenes, particularly in geometrically sparse environments. Current approaches to dynamic LIO face a fundamental challenge: accurate localization requires a reliable identification of static features, yet distinguishing dynamic objects necessitates precise pose estimation. Our solution breaks this circular dependency by integrating dynamic awareness directly into the point cloud registration process. We introduce a novel dynamic-aware iterative closest point algorithm that leverages spatio-temporal normal analysis, complemented by an efficient spatial consistency verification method to enhance static map construction. Experimental evaluations demonstrate significant performance improvements over state-of-the-art LIO systems in challenging dynamic environments with limited geometric structure. The code and dataset are available at https://github.com/thisparticle/btsa.
comment: 8 pages, 7 figures, Accepted to IEEE Robotics and Automation Letters (RA-L)
CGoT: A Novel Inference Mechanism for Embodied Multi-Agent Systems Using Composable Graphs of Thoughts
The integration of self-driving cars and service robots is becoming increasingly prevalent across a wide array of fields, playing a crucial and expanding role in both industrial applications and everyday life. In parallel, the rapid advancements in Large Language Models (LLMs) have garnered substantial attention and interest within the research community. This paper introduces a novel vehicle-robot system that leverages the strengths of both autonomous vehicles and service robots. In our proposed system, two autonomous ego-vehicles transport service robots to locations within an office park, where they perform a series of tasks. The study explores the feasibility and potential benefits of incorporating LLMs into this system, with the aim of enhancing operational efficiency and maximizing the potential of the cooperative mechanisms between the vehicles and the robots. This paper proposes a novel inference mechanism, called CGoT, for this type of system, in which one agent can carry another agent. Experimental results are presented to validate the performance of the proposed method.
Bridging Perception and Reasoning: Dual-Pipeline Neuro-Symbolic Landing for UAVs in Cluttered Environments
Autonomous landing in unstructured (cluttered, uneven, and map-poor) environments is a core requirement for Unmanned Aerial Vehicles (UAVs), yet purely vision-based or deep learning models often falter under covariate shift and provide limited interpretability. We propose NeuroSymLand, a neuro-symbolic framework that tightly couples two complementary pipelines: (i) an offline pipeline, where Large Language Models (LLMs) and human-in-the-loop refinement synthesize Scallop code from diverse landing scenarios, distilling generalizable and verifiable symbolic knowledge; and (ii) an online pipeline, where a compact foundation-based semantic segmentation model generates probabilistic Scallop facts that are composed into semantic scene graphs for real-time deductive reasoning. This design combines the perceptual strengths of lightweight foundation models with the interpretability and verifiability of symbolic reasoning. Node attributes (e.g., flatness, area) and edge relations (adjacency, containment, proximity) are computed with geometric routines rather than learned, avoiding the data dependence and latency of train-time graph builders. The resulting Scallop program encodes landing principles (avoid water and obstacles; prefer large, flat, accessible regions) and yields calibrated safety scores with ranked Regions of Interest (ROIs) and human-readable justifications. Extensive evaluations across datasets, diverse simulation maps, and real UAV hardware show that NeuroSymLand achieves higher accuracy, stronger robustness to covariate shift, and superior efficiency compared with state-of-the-art baselines, while advancing UAV safety and reliability in emergency response, surveillance, and delivery missions.
ACG: Action Coherence Guidance for Flow-based VLA models
Diffusion and flow matching models have emerged as powerful robot policies, enabling Vision-Language-Action (VLA) models to generalize across diverse scenes and instructions. Yet, when trained via imitation learning, their high generative capacity makes them sensitive to noise in human demonstrations: jerks, pauses, and jitter which reduce action coherence. Reduced action coherence causes instability and trajectory drift during deployment, failures that are catastrophic in fine-grained manipulation where precision is crucial. In this paper, we present Action Coherence Guidance (ACG) for VLA models, a training-free test-time guidance algorithm that improves action coherence and thereby yields performance gains. Evaluated on RoboCasa, DexMimicGen, and real-world SO-101 tasks, ACG consistently improves action coherence and boosts success rates across diverse manipulation tasks. Code and project page are available at https://github.com/DAVIAN-Robotics/ACG and https://DAVIAN-Robotics.github.io/ACG , respectively.
MOGRAS: Human Motion with Grasping in 3D Scenes
Generating realistic full-body motion interacting with objects is critical for applications in robotics, virtual reality, and human-computer interaction. While existing methods can generate full-body motion within 3D scenes, they often lack the fidelity for fine-grained tasks like object grasping. Conversely, methods that generate precise grasping motions typically ignore the surrounding 3D scene. This gap, generating full-body grasping motions that are physically plausible within a 3D scene, remains a significant challenge. To address this, we introduce MOGRAS (Human MOtion with GRAsping in 3D Scenes), a large-scale dataset that bridges this gap. MOGRAS provides pre-grasping full-body walking motions and final grasping poses within richly annotated 3D indoor scenes. We leverage MOGRAS to benchmark existing full-body grasping methods and demonstrate their limitations in scene-aware generation. Furthermore, we propose a simple yet effective method to adapt existing approaches to work seamlessly within 3D scenes. Through extensive quantitative and qualitative experiments, we validate the effectiveness of our dataset and highlight the significant improvements our proposed method achieves, paving the way for more realistic human-scene interactions.
comment: British Machine Vision Conference Workshop - From Scene Understanding to Human Modeling
LT-Exosense: A Vision-centric Multi-session Mapping System for Lifelong Safe Navigation of Exoskeletons
Self-balancing exoskeletons offer a promising mobility solution for individuals with lower-limb disabilities. For reliable long-term operation, these exoskeletons require a perception system that is effective in changing environments. In this work, we introduce LT-Exosense, a vision-centric, multi-session mapping system designed to support long-term (semi)-autonomous navigation for exoskeleton users. LT-Exosense extends single-session mapping capabilities by incrementally fusing spatial knowledge across multiple sessions, detecting environmental changes, and updating a persistent global map. This representation enables intelligent path planning, which can adapt to newly observed obstacles and can recover previous routes when obstructions are removed. We validate LT-Exosense through several real-world experiments, demonstrating a scalable multi-session map that achieves an average point-to-point error below 5 cm when compared to ground-truth laser scans. We also illustrate the potential application of adaptive path planning in dynamically changing indoor environments.
comment: 8 pages, 4 figures
LOC: A General Language-Guided Framework for Open-Set 3D Occupancy Prediction
Vision-Language Models (VLMs) have shown significant progress in open-set challenges. However, the limited availability of 3D datasets hinders their effective application in 3D scene understanding. We propose LOC, a general language-guided framework adaptable to various occupancy networks, supporting both supervised and self-supervised learning paradigms. For self-supervised tasks, we employ a strategy that fuses multi-frame LiDAR points for dynamic/static scenes, using Poisson reconstruction to fill voids, and assigning semantics to voxels via K-Nearest Neighbor (KNN) to obtain comprehensive voxel representations. To mitigate feature over-homogenization caused by direct high-dimensional feature distillation, we introduce Densely Contrastive Learning (DCL). DCL leverages dense voxel semantic information and predefined textual prompts. This efficiently enhances open-set recognition without dense pixel-level supervision, and our framework can also leverage existing ground truth to further improve performance. Our model predicts dense voxel features embedded in the CLIP feature space, integrating textual and image pixel information, and classifies based on text and semantic similarity. Experiments on the nuScenes dataset demonstrate the method's superior performance, achieving high-precision predictions for known classes and distinguishing unknown classes without additional training data.
EasyUUV: An LLM-Enhanced Universal and Lightweight Sim-to-Real Reinforcement Learning Framework for UUV Attitude Control
Despite recent advances in Unmanned Underwater Vehicle (UUV) attitude control, existing methods still struggle with generalizability, robustness to real-world disturbances, and efficient deployment. To address the above challenges, this paper presents EasyUUV, a Large Language Model (LLM)-enhanced, universal, and lightweight simulation-to-reality reinforcement learning (RL) framework for robust attitude control of UUVs. EasyUUV combines parallelized RL training with a hybrid control architecture, where a learned policy outputs high-level attitude corrections executed by an adaptive S-Surface controller. A multimodal LLM is further integrated to adaptively tune controller parameters at runtime using visual and textual feedback, enabling training-free adaptation to unmodeled dynamics. Also, we have developed a low-cost 6-DoF UUV platform and applied an RL policy trained through efficient parallelized simulation. Extensive simulation and real-world experiments validate the effectiveness and outstanding performance of EasyUUV in achieving robust and adaptive UUV attitude control across diverse underwater conditions. The source code is available from the following website: https://360zmem.github.io/easyuuv/
comment: 8 pages, 15 figures
RaycastGrasp: Eye-Gaze Interaction with Wearable Devices for Robotic Manipulation
Robotic manipulators are increasingly used to assist individuals with mobility impairments in object retrieval. However, the predominant joystick-based control interfaces can be challenging due to high precision requirements and unintuitive reference frames. Recent advances in human-robot interaction have explored alternative modalities, yet many solutions still rely on external screens or restrictive control schemes, limiting their intuitiveness and accessibility. To address these challenges, we present an egocentric, gaze-guided robotic manipulation interface that leverages a wearable Mixed Reality (MR) headset. Our system enables users to interact seamlessly with real-world objects using natural gaze fixation from a first-person perspective, while providing augmented visual cues to confirm intent and leveraging a pretrained vision model and robotic arm for intent recognition and object manipulation. Experimental results demonstrate that our approach significantly improves manipulation accuracy, reduces system latency, and achieves single-pass intention and object recognition accuracy greater than 88% across multiple real-world scenarios. These results demonstrate the system's effectiveness in enhancing intuitiveness and accessibility, underscoring its practical significance for assistive robotics applications.
comment: 5 pages, 5 figures; Accepted to: 2025 IEEE 4th International Conference on Intelligent Reality (ICIR 2025); Zitiantao Lin and Yongpeng Sang contributed equally to this work (co-first authors). Corresponding author: Yang Ye (y.ye@northeastern.edu)
Prognostic Framework for Robotic Manipulators Operating Under Dynamic Task Severities
Robotic manipulators are critical in many applications but are known to degrade over time. This degradation is influenced by the nature of the tasks performed by the robot. Tasks with higher severity, such as handling heavy payloads, can accelerate the degradation process. One way this degradation is reflected is in the position accuracy of the robot's end-effector. In this paper, we present a prognostic modeling framework that predicts a robotic manipulator's Remaining Useful Life (RUL) while accounting for the effects of task severity. Our framework represents the robot's position accuracy as a Brownian motion process with a random drift parameter that is influenced by task severity. The dynamic nature of task severity is modeled using a continuous-time Markov chain (CTMC). To evaluate RUL, we discuss two approaches -- (1) a novel closed-form expression for Remaining Lifetime Distribution (RLD), and (2) Monte Carlo simulations, commonly used in prognostics literature. Theoretical results establish the equivalence between these RUL computation approaches. We validate our framework through experiments using two distinct physics-based simulators for planar and spatial robot fleets. Our findings show that robots in both fleets experience shorter RUL when handling a higher proportion of high-severity tasks.
comment: Accepted for Publication in IEEE Transactions on Systems, Man, and Cybernetics: Systems
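The Monte Carlo route to RUL mentioned in the abstract above can be sketched as follows. This is an illustrative simplification, not the authors' model: the drift is taken as a fixed value per severity state rather than a random parameter, the CTMC has only two states (low and high severity), time is discretized with an Euler step, and all names and numbers (`simulate_rul`, `mean_rul`, `drift`, `switch_rate`, `threshold`) are hypothetical.

```python
import random


def simulate_rul(drift, switch_rate, sigma, threshold,
                 dt=0.01, t_max=1000.0, rng=random):
    """Simulate one degradation path and return its first threshold-crossing time.

    drift[s]       : degradation drift in severity state s (0 = low, 1 = high)
    switch_rate[s] : CTMC rate of leaving state s
    sigma          : diffusion coefficient of the Brownian motion
    threshold      : accuracy-loss level at which the robot is considered failed
    """
    x, t, state = 0.0, 0.0, 0
    while t < t_max:
        # For small dt, the CTMC leaves state s with probability ~ rate * dt.
        if rng.random() < switch_rate[state] * dt:
            state = 1 - state
        # Euler step of dX = drift dt + sigma dW.
        x += drift[state] * dt + sigma * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
        if x >= threshold:
            return t
    return t_max  # path censored at the simulation horizon


def mean_rul(n_runs, **kwargs):
    """Average the first-passage time over n_runs independent paths."""
    return sum(simulate_rul(**kwargs) for _ in range(n_runs)) / n_runs
```

Calling `mean_rul` once with `switch_rate=(0.1, 1.0)` (mostly low severity) and once with `switch_rate=(1.0, 0.1)` (mostly high severity), keeping the other arguments equal, gives a quick way to compare the two regimes under these assumptions.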
Using Non-Expert Data to Robustify Imitation Learning via Offline Reinforcement Learning
Imitation learning has proven effective for training robots to perform complex tasks from expert human demonstrations. However, it remains limited by its reliance on high-quality, task-specific data, restricting adaptability to the diverse range of real-world object configurations and scenarios. In contrast, non-expert data -- such as play data, suboptimal demonstrations, partial task completions, or rollouts from suboptimal policies -- can offer broader coverage and lower collection costs. However, conventional imitation learning approaches fail to utilize this data effectively. To address these challenges, we posit that with the right design decisions, offline reinforcement learning can be used as a tool to harness non-expert data to enhance the performance of imitation learning policies. We show that while standard offline RL approaches can be ineffective at actually leveraging non-expert data under the sparse data coverage settings typically encountered in the real world, simple algorithmic modifications can allow for the utilization of this data, without significant additional assumptions. Our approach shows that broadening the support of the policy distribution can allow imitation algorithms augmented by offline RL to solve tasks robustly, showing considerably enhanced recovery and generalization behavior. In manipulation tasks, these innovations significantly increase the range of initial conditions where learned policies are successful when non-expert data is incorporated. Moreover, we show that these methods are able to leverage all collected data, including partial or suboptimal demonstrations, to bolster task-directed policy performance. This underscores the importance of algorithmic techniques for using non-expert data for robust policy learning in robotics. Website: https://uwrobotlearning.github.io/RISE-offline/
MOSAIC: Modular Foundation Models for Assistive and Interactive Cooking
We present MOSAIC, a modular architecture for coordinating multiple robots to (a) interact with users using natural language and (b) manipulate an open vocabulary of everyday objects. MOSAIC employs modularity at several levels: it leverages multiple large-scale pre-trained models for high-level tasks like language and image recognition, while using streamlined modules designed for low-level task-specific control. This decomposition allows us to reap the complementary benefits of foundation models as well as precise, more specialized models. Pieced together, our system is able to scale to complex tasks that involve coordinating multiple robots and humans. First, we unit-test individual modules with 180 episodes of visuomotor picking, 60 episodes of human motion forecasting, and 46 online user evaluations of the task planner. We then extensively evaluate MOSAIC with 60 end-to-end trials. We discuss crucial design decisions, limitations of the current system, and open challenges in this domain. The project's website is at https://portal-cornell.github.io/MOSAIC/
comment: 22 pages, 13 figures; CoRL 2024
GOPLA: Generalizable Object Placement Learning via Synthetic Augmentation of Human Arrangement
Robots are expected to serve as intelligent assistants, helping humans with everyday household organization. A central challenge in this setting is the task of object placement, which requires reasoning about both semantic preferences (e.g., common-sense object relations) and geometric feasibility (e.g., collision avoidance). We present GOPLA, a hierarchical framework that learns generalizable object placement from augmented human demonstrations. A multi-modal large language model translates human instructions and visual inputs into structured plans that specify pairwise object relationships. These plans are then converted into 3D affordance maps with geometric common sense by a spatial mapper, while a diffusion-based planner generates placement poses guided by test-time costs, considering multi-plan distributions and collision avoidance. To overcome data scarcity, we introduce a scalable pipeline that expands human placement demonstrations into diverse synthetic training data. Extensive experiments show that our approach improves placement success rates by 30.04 percentage points over the runner-up, evaluated on positioning accuracy and physical plausibility, demonstrating strong generalization across a wide range of real-world robotic placement scenarios.
Depth-Constrained ASV Navigation with Deep RL and Limited Sensing
Autonomous Surface Vehicles (ASVs) play a crucial role in maritime operations, yet their navigation in shallow-water environments remains challenging due to dynamic disturbances and depth constraints. Traditional navigation strategies struggle with limited sensor information, making safe and efficient operation difficult. In this paper, we propose a reinforcement learning (RL) framework for ASV navigation under depth constraints, where the vehicle must reach a target while avoiding unsafe areas with only a single depth measurement per timestep from a downward-facing Single Beam Echosounder (SBES). To enhance environmental awareness, we integrate Gaussian Process (GP) regression into the RL framework, enabling the agent to progressively estimate a bathymetric depth map from sparse sonar readings. This approach improves decision-making by providing a richer representation of the environment. Furthermore, we demonstrate effective sim-to-real transfer, ensuring that trained policies generalize well to real-world aquatic conditions. Experimental results validate our method's capability to improve ASV navigation performance while maintaining safety in challenging shallow-water environments.
comment: 8 pages, 8 figures, Accepted to IEEE Robotics and Automation Letters (this is not the final version)
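The idea of estimating a depth map from sparse single-beam readings with Gaussian Process regression can be sketched in a few lines. This is a minimal standard-library illustration, not the paper's implementation: it assumes a squared-exponential kernel with hand-picked hyperparameters and a constant mean, and the function names (`rbf`, `cholesky`, `chol_solve`, `gp_predict`) and default values are hypothetical.

```python
import math


def rbf(p, q, length=2.0, var=1.0):
    """Squared-exponential kernel between two 2D positions."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return var * math.exp(-0.5 * d2 / length ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def chol_solve(low, b):
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict(positions, depths, query, noise=1e-2):
    """Return the posterior mean and variance of depth at `query`.

    `noise` is the observation-noise variance added to the kernel diagonal.
    """
    mean_depth = sum(depths) / len(depths)  # constant prior mean
    k = [[rbf(p, q) + (noise if i == j else 0.0)
          for j, q in enumerate(positions)]
         for i, p in enumerate(positions)]
    low = cholesky(k)
    alpha = chol_solve(low, [d - mean_depth for d in depths])
    k_star = [rbf(query, p) for p in positions]
    mean = mean_depth + sum(ks * a for ks, a in zip(k_star, alpha))
    v = chol_solve(low, k_star)
    variance = rbf(query, query) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, max(variance, 0.0)
```

The predictive variance is what makes this useful for navigation: it stays small near positions the vehicle has already sounded and grows toward the prior variance in unexplored water, so a planner could treat high-variance cells as potentially unsafe.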
Open-Set 3D Semantic Instance Maps for Vision Language Navigation -- O3D-SIM
Humans excel at forming mental maps of their surroundings, equipping them to understand object relationships and navigate based on language queries. Our previous work, SI Maps (Nanwani L, Agarwal A, Jain K, et al. Instance-level semantic maps for vision language navigation. In: 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). IEEE; 2023 Aug.), showed that having instance-level information and the semantic understanding of an environment helps significantly improve performance for language-guided tasks. We extend this instance-level approach to 3D while increasing the pipeline's robustness and improving quantitative and qualitative results. Our method leverages foundational models for object recognition, image segmentation, and feature extraction. We propose a representation that results in a 3D point cloud map with instance-level embeddings, which bring in the semantic understanding that natural language commands can query. Quantitatively, the work improves upon the success rate of language-guided tasks. At the same time, we qualitatively observe the ability to identify instances more clearly and leverage the foundational models and language and image-aligned embeddings to identify objects that, otherwise, a closed-set approach wouldn't be able to identify. Project Page - https://smart-wheelchair-rrc.github.io/o3d-sim-webpage
TreeIRL: Safe Urban Driving with Tree Search and Inverse Reinforcement Learning
We present TreeIRL, a novel planner for autonomous driving that combines Monte Carlo tree search (MCTS) and inverse reinforcement learning (IRL) to achieve state-of-the-art performance in simulation and in real-world driving. The core idea is to use MCTS to find a promising set of safe candidate trajectories and a deep IRL scoring function to select the most human-like among them. We evaluate TreeIRL against both classical and state-of-the-art planners in large-scale simulations and on 500+ miles of real-world autonomous driving in the Las Vegas metropolitan area. Test scenarios include dense urban traffic, adaptive cruise control, cut-ins, and traffic lights. TreeIRL achieves the best overall performance, striking a balance between safety, progress, comfort, and human-likeness. To our knowledge, our work is the first demonstration of MCTS-based planning on public roads and underscores the importance of evaluating planners across a diverse set of metrics and in real-world environments. TreeIRL is highly extensible and could be further improved with reinforcement learning and imitation learning, providing a framework for exploring different combinations of classical and learning-based approaches to solve the planning bottleneck in autonomous driving.
DexSinGrasp: Learning a Unified Policy for Dexterous Object Singulation and Grasping in Densely Cluttered Environments
Grasping objects in cluttered environments remains a fundamental yet challenging problem in robotic manipulation. While prior works have explored learning-based synergies between pushing and grasping for two-fingered grippers, few have leveraged the high degrees of freedom (DoF) in dexterous hands to perform efficient singulation for grasping in cluttered settings. In this work, we introduce DexSinGrasp, a unified policy for dexterous object singulation and grasping. DexSinGrasp enables high-dexterity object singulation to facilitate grasping, significantly improving efficiency and effectiveness in cluttered environments. We incorporate clutter arrangement curriculum learning to enhance success rates and generalization across diverse clutter conditions, while policy distillation enables a deployable vision-based grasping strategy. To evaluate our approach, we introduce a set of cluttered grasping tasks with varying object arrangements and occlusion levels. Experimental results show that our method outperforms baselines in both efficiency and grasping success rate, particularly in dense clutter. Code, appendix, and videos are available on our website https://nus-lins-lab.github.io/dexsingweb/.
Real-Time Knee Angle Prediction Using EMG and Kinematic Data with an Attention-Based CNN-LSTM Network and Transfer Learning Across Multiple Datasets
Electromyography (EMG) signals are widely used for predicting body joint angles through machine learning (ML) and deep learning (DL) methods. However, these approaches often face challenges such as limited real-time applicability, non-representative test conditions, and the need for large datasets to achieve optimal performance. This paper presents a transfer-learning framework for knee joint angle prediction that requires only a few gait cycles from new subjects. Three datasets - Georgia Tech, the University of California Irvine (UCI), and the Sharif Mechatronic Lab Exoskeleton (SMLE) - containing four EMG channels relevant to knee motion were utilized. A lightweight attention-based CNN-LSTM model was developed and pre-trained on the Georgia Tech dataset, then transferred to the UCI and SMLE datasets. The proposed model achieved Normalized Mean Absolute Errors (NMAE) of 6.8 percent and 13.7 percent for one-step and 50-step predictions on abnormal subjects using EMG inputs alone. Incorporating historical knee angles reduced the NMAE to 3.1 percent and 3.5 percent for normal subjects, and to 2.8 percent and 7.5 percent for abnormal subjects. When further adapted to the SMLE exoskeleton with EMG, kinematic, and interaction force inputs, the model achieved 1.09 percent and 3.1 percent NMAE for one- and 50-step predictions, respectively. These results demonstrate robust performance and strong generalization for both short- and long-term rehabilitation scenarios.
Robust Understanding of Human-Robot Social Interactions through Multimodal Distillation
There is a growing need for social robots and intelligent agents that can effectively interact with and support users. For the interactions to be seamless, the agents need to analyse social scenes and behavioural cues from their own (the robot's) perspective. Few works model human-agent interactions in social situations, and those that exist are either too computationally intensive to be deployed in real time or perform poorly in real-world scenarios when only limited information is available. We propose a knowledge distillation framework that models social interactions through various multimodal cues, and yet is robust against incomplete and noisy information during inference. We train a teacher model with multimodal input (body, face and hand gestures, gaze, raw images) that transfers knowledge to a student model which relies solely on body pose. Extensive experiments on two publicly available human-robot interaction datasets demonstrate that our student model achieves an average accuracy gain of 14.75% over competitive baselines on multiple downstream social understanding tasks, even with up to 51% of its input being corrupted. The student model is also highly efficient: it has less than 1% of the teacher model's parameters, and its latency is 11.9% of the teacher model's. Our code and related data are available at github.com/biantongfei/SocialEgoMobile.
comment: Accepted by ACM Multimedia 2025, camera-ready version
SceneComplete: Open-World 3D Scene Completion in Cluttered Real World Environments for Robot Manipulation
Careful robot manipulation in every-day cluttered environments requires an accurate understanding of the 3D scene, in order to grasp and place objects stably and reliably and to avoid colliding with other objects. In general, we must construct such a 3D interpretation of a complex scene based on limited input, such as a single RGB-D image. We describe SceneComplete, a system for constructing a complete, segmented, 3D model of a scene from a single view. SceneComplete is a novel pipeline for composing general-purpose pretrained perception modules (vision-language, segmentation, image-inpainting, image-to-3D, visual-descriptors and pose-estimation) to obtain highly accurate results. We demonstrate its accuracy and effectiveness with respect to ground-truth models in a large benchmark dataset and show that its accurate whole-object reconstruction enables robust grasp proposal generation, including for a dexterous hand. We release the code and additional results on our website.
CCDP: Composition of Conditional Diffusion Policies with Guided Sampling IROS 2025
Imitation Learning offers a promising approach to learn directly from data without requiring explicit models, simulations, or detailed task definitions. During inference, actions are sampled from the learned distribution and executed on the robot. However, sampled actions may fail for various reasons, and simply repeating the sampling step until a successful action is obtained can be inefficient. In this work, we propose an enhanced sampling strategy that refines the sampling distribution to avoid previously unsuccessful actions. We demonstrate that by solely utilizing data from successful demonstrations, our method can infer recovery actions without the need for additional exploratory behavior or a high-level controller. Furthermore, we leverage the concept of diffusion model decomposition to break down the primary problem, which may require long-horizon history to manage failures, into multiple smaller, more manageable sub-problems in learning, data collection, and inference, thereby enabling the system to adapt to variable failure counts. Our approach yields a low-level controller that dynamically adjusts its sampling space to improve efficiency when prior samples fall short. We validate our method across several tasks, including door opening with unknown directions, object manipulation, and button-searching scenarios, demonstrating that our approach outperforms traditional baselines.
comment: Accepted to IROS 2025
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics NeurIPS 2025
Spatial referring is a fundamental capability of embodied robots to interact with the 3D physical world. However, even with powerful pretrained vision-language models (VLMs), recent approaches still cannot accurately understand complex 3D scenes or dynamically reason about the instruction-indicated locations for interaction. To this end, we propose RoboRefer, a 3D-aware VLM that can first achieve precise spatial understanding by integrating a disentangled but dedicated depth encoder via supervised fine-tuning (SFT). Moreover, RoboRefer advances generalized multi-step spatial reasoning via reinforcement fine-tuning (RFT), with metric-sensitive process reward functions tailored for spatial referring tasks. To support SFT and RFT training, we introduce RefSpatial, a large-scale dataset of 20M QA pairs (2x prior), covering 31 spatial relations (vs. 15 prior) and supporting complex reasoning processes (up to 5 steps). In addition, we introduce RefSpatial-Bench, a challenging benchmark filling the gap in evaluating spatial referring with multi-step reasoning. Experiments show that SFT-trained RoboRefer achieves state-of-the-art spatial understanding, with an average success rate of 89.6%. RFT-trained RoboRefer further outperforms all other baselines by a large margin, even surpassing Gemini-2.5-Pro by 17.4% in average accuracy on RefSpatial-Bench. Notably, RoboRefer can be integrated with various control policies to execute long-horizon, dynamic tasks across diverse robots (e.g., UR5, G1 humanoid) in cluttered real-world scenes. Please see the project page at https://zhoues.github.io/RoboRefer.
comment: Accepted by NeurIPS 2025. Project page: https://zhoues.github.io/RoboRefer/
Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2) NeurIPS 2025
Reinforcement Learning (RL) can mitigate the causal confusion and distribution shift inherent to imitation learning (IL). However, applying RL to end-to-end autonomous driving (E2E-AD) remains an open problem because of its training difficulty, and IL is still the mainstream paradigm in both academia and industry. Recently, Model-based Reinforcement Learning (MBRL) has demonstrated promising results in neural planning; however, these methods typically require privileged information as input rather than raw sensor data. We fill this gap by designing Raw2Drive, a dual-stream MBRL approach. Initially, we efficiently train an auxiliary privileged world model paired with a neural planner that uses privileged information as input. Subsequently, we introduce a raw sensor world model trained via our proposed Guidance Mechanism, which ensures consistency between the raw sensor world model and the privileged world model during rollouts. Finally, the raw sensor world model combines the prior knowledge embedded in the heads of the privileged world model to effectively guide the training of the raw sensor policy. Raw2Drive is so far the only RL-based end-to-end method on CARLA Leaderboard 2.0 and Bench2Drive, and it achieves state-of-the-art performance.
comment: Accepted by NeurIPS 2025
Systems and Control (CS)
Resilient Composite Control for Stability Enhancement in EV Integrated DC Microgrids
When electric vehicles (EVs) are integrated into standalone DC microgrids (DCMGs), stability issues arise due to their constant power load (CPL) behavior, which provides negative incremental impedance (NII). In addition, the microgrids suffer from an inherent low-inertia problem. Therefore, this study presents a composite controller incorporating a global integral terminal sliding mode controller with a backstepping controller. A virtual capacitor is employed to mitigate the low-inertia issue and strengthen the DC-bus response. An improved fractional power-based reaching law decreases chattering and accelerates convergence. Exact feedback linearization converts the nonlinear boost converter model into Brunovsky's canonical form, mitigating NII effects and non-minimum phase issues. The entire system stability is verified using Lyapunov control theory. Simulation outcomes confirm superior performance, with 34.4-53.3% reduction in overshoot, 52.9-74.9% in undershoot, and 12-47.4% in settling time compared to the existing controller.
A Novel Multi-Timescale Stability-Preserving Hierarchical Reinforcement Learning Controller Framework for Adaptive Control in High-Dimensional Dynamical Systems
Controlling high-dimensional stochastic systems, critical in robotics, autonomous vehicles, and hyperchaotic systems, faces the curse of dimensionality, lacks temporal abstraction, and often fails to ensure stochastic stability. To overcome these limitations, this study introduces the Multi-Timescale Lyapunov-Constrained Hierarchical Reinforcement Learning (MTLHRL) framework. MTLHRL integrates a hierarchical policy within a semi-Markov Decision Process (SMDP), featuring a high-level policy for strategic planning and a low-level policy for reactive control, which effectively manages complex, multi-timescale decision-making and reduces dimensionality overhead. Stability is rigorously enforced using a neural Lyapunov function optimized via Lagrangian relaxation and multi-timescale actor-critic updates, ensuring mean-square boundedness or asymptotic stability in the face of stochastic dynamics. The framework promotes efficient and reliable learning through trust-region constraints and decoupled optimization. Extensive simulations on an 8D hyperchaotic system and a 5-DOF robotic manipulator demonstrate MTLHRL's empirical superiority. It significantly outperforms baseline methods in both stability and performance, recording the lowest error indices (e.g., Integral Absolute Error (IAE): 3.912 in hyperchaotic control and IAE: 1.623 in robotics), achieving faster convergence, and exhibiting superior disturbance rejection. MTLHRL offers a theoretically grounded and practically viable solution for robust control of complex stochastic systems.
Politics, Inequality, and the Robustness of Shared Infrastructure Systems
Our infrastructure systems enable our well-being by allowing us to move, store, and transform materials and information given considerable social and environmental variation. Critically, this ability is shaped by the degree to which society invests in infrastructure, a fundamentally political question in large public systems. There, infrastructure providers are distinguished from users through political processes, such as elections, and there is considerable heterogeneity among users. Previous political economic models have not taken into account (i) dynamic infrastructures, (ii) dynamic user preferences, and (iii) alternatives to rational actor theory. Meanwhile, engineering often neglects politics. We address these gaps with a general dynamic model of shared infrastructure systems that incorporates theories from political economy, social-ecological systems, and political psychology. We use the model to develop propositions on how multiple characteristics of the political process impact the robustness of shared infrastructure systems to capacity shocks and unequal opportunity for private infrastructure investment. Under user fees, inequality decreases robustness, but taxing private infrastructure use can increase robustness if non-elites have equal political influence. Election cycle periods have a nonlinear effect where increasing them increases robustness up to a point but decreases robustness beyond that point. Further, there is a negative relationship between the ideological sensitivity of candidates and robustness. Overall, the biases of voters and candidates (whether they favor tax increases or decreases) mediate these political-economic effects on robustness because biases may or may not match the reality of system needs (whether system recovery requires tax increases).
Vector-Valued Native Space Embedding for Adaptive State Observation
This paper combines vector-valued reproducing kernel Hilbert space (vRKHS) embedding with robust adaptive observation, yielding an algorithm that is both non-parametric and robust. The main contribution of this paper lies in the ability of the proposed system to estimate the state of a plant model whose matched uncertainties are elements of an infinite-dimensional native space. The plant model considered in this paper also suffers from unmatched uncertainties. Finally, the measured output is affected by disturbances as well. Upper bounds on the state observation error are provided in an analytical form. The proposed theoretical results are applied to the problem of estimating the state of a rigid body.
Model-Free Power System Stability Enhancement with Dissipativity-Based Neural Control SC
The integration of converter-interfaced generation introduces new transient stability challenges to modern power systems. Classical Lyapunov- and scalable passivity-based approaches typically rely on restrictive assumptions, and finding storage functions for large grids is generally considered intractable. Furthermore, most methods require an accurate grid dynamics model. To address these challenges, we propose a model-free, nonlinear, and dissipativity-based controller which, when applied to grid-connected virtual synchronous generators (VSGs), enhances power system transient stability. Using input-state data, we train neural networks to learn dissipativity-characterizing matrices that yield stabilizing controllers. Furthermore, we incorporate cost function shaping to improve the performance with respect to the user-specified objectives. Numerical results on a modified, all-VSG Kundur two-area power system validate the effectiveness of the proposed approach.
comment: 7 pages, 6 figures, submitted to the 24th Power Systems Computation Conference (PSCC 2026)
Fair Cost Allocation in Energy Communities: A DLMP-based Bilevel Optimization with a Shapley Value Approach
Energy communities (ECs) are emerging as a promising decentralized model for managing cooperative distributed energy resources (DERs). As these communities expand and their operations become increasingly integrated into the grid, ensuring fairness in allocating operating costs among participants becomes a challenge. In distribution networks, DER operations at the community level can influence Distribution Locational Marginal Prices (DLMPs), which in turn affect the system's operating cost. This interdependence between local decisions and system-level pricing introduces new challenges for fair and transparent cost allocation. Despite growing interest in fairness-aware methods, most existing methods do not account for the impact of DLMPs. To fill this gap, we propose a bilevel optimization model in which a Community Energy Aggregator (CEA) schedules DERs across multiple ECs while a Distribution System Operator (DSO) determines DLMPs through network-constrained dispatch. Leveraging the Karush-Kuhn-Tucker (KKT) conditions and strong duality, the bilevel model is reformulated into a tractable single-level problem. We achieve fairness in the cost allocation by applying the Shapley value to quantify each community's marginal contribution to system-wide cost savings. The effectiveness of the proposed method is validated through simulations on several benchmark distribution systems.
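To make the Shapley-value step concrete, the following sketch computes exact Shapley values for a small cooperative game by averaging each player's marginal contribution over all join orders. The three community labels and the savings table are hypothetical numbers chosen for illustration. In the paper's setting each coalition value would come from solving the bilevel dispatch problem, which is not reproduced here.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every join order.

    `value` maps a frozenset of players to the cost saving that coalition
    achieves. Enumeration is factorial in the number of players, so this
    is only practical for a handful of communities.
    """
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value[with_p] - value[coalition]
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}


# Hypothetical system-wide cost savings for each coalition of communities.
SAVINGS = {
    frozenset(): 0.0,
    frozenset("A"): 10.0,
    frozenset("B"): 20.0,
    frozenset("C"): 30.0,
    frozenset("AB"): 40.0,
    frozenset("AC"): 50.0,
    frozenset("BC"): 60.0,
    frozenset("ABC"): 90.0,
}

if __name__ == "__main__":
    print(shapley_values(["A", "B", "C"], SAVINGS))
```

The allocations always sum to the grand-coalition saving (the efficiency property), which is what makes the Shapley value usable as a cost-sharing rule.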
Adapting Noise-Driven PUF and AI for Secure WBG ICS: A Proof-of-Concept Study
Wide-bandgap (WBG) technologies offer unprecedented improvements in power system efficiency, size, and performance, but also introduce unique sensor corruption and cybersecurity risks in industrial control systems (ICS), particularly due to high-frequency noise and sophisticated cyber-physical threats. This proof-of-concept (PoC) study demonstrates the adaptation of a noise-driven physically unclonable function (PUF) and machine learning (ML)-assisted anomaly detection framework to the demanding environment of WBG-based ICS sensor pathways. By extracting entropy from unavoidable WBG switching noise (up to 100 kHz) as a PUF source, and simultaneously using this noise as a real-time threat indicator, the proposed system unites hardware-level authentication and anomaly detection. Our approach integrates hybrid machine learning (ML) models with adaptive Bayesian filtering, providing robust and low-latency detection capabilities resilient to both natural electromagnetic interference (EMI) and active adversarial manipulation. Through detailed simulations of WBG modules under benign and attack scenarios--including EMI injection, signal tampering, and node impersonation--we achieve 95% detection accuracy and sub-millisecond processing latency. These results demonstrate the feasibility of physics-driven, dual-use noise exploitation as a scalable ICS defense primitive. Our findings lay the groundwork for next-generation security strategies that leverage inherent device characteristics, bridging hardware and artificial intelligence (AI) for enhanced protection of critical ICS infrastructure.
Distributed Stochastic Proximal Algorithm on Riemannian Submanifolds for Weakly-convex Functions
This paper aims to investigate the distributed stochastic optimization problems on compact embedded submanifolds (in the Euclidean space) for multi-agent network systems. To address the manifold structure, we propose a distributed Riemannian stochastic proximal algorithm framework by utilizing the retraction and Riemannian consensus protocol, and analyze three specific algorithms: the distributed Riemannian stochastic subgradient, proximal point, and prox-linear algorithms. When the local costs are weakly-convex and the initial points satisfy certain conditions, we show that the iterates generated by this framework converge to a nearly stationary point in expectation while achieving consensus. We further establish the convergence rate of the algorithm framework as $\mathcal{O}(\frac{1+\kappa_g}{\sqrt{k}})$ where $k$ denotes the number of iterations and $\kappa_g$ shows the impact of manifold geometry on the algorithm performance. Finally, numerical experiments are implemented to demonstrate the theoretical results and show the empirical performance.
Taming Silent Failures: A Framework for Verifiable AI Reliability
The integration of Artificial Intelligence (AI) into safety-critical systems introduces a new reliability paradigm: silent failures, where AI produces confident but incorrect outputs that can be dangerous. This paper introduces the Formal Assurance and Monitoring Environment (FAME), a novel framework that confronts this challenge. FAME synergizes the mathematical rigor of offline formal synthesis with the vigilance of online runtime monitoring to create a verifiable safety net around opaque AI components. We demonstrate its efficacy in an autonomous vehicle perception system, where FAME successfully detected 93.5% of critical safety violations that were otherwise silent. By contextualizing our framework within the ISO 26262 and ISO/PAS 8800 standards, we provide reliability engineers with a practical, certifiable pathway for deploying trustworthy AI. FAME represents a crucial shift from accepting probabilistic performance to enforcing provable safety in next-generation systems.
comment: This preprint has been accepted by IEEE Reliability Magazine. 10 pages, 3 figures
TRASE-NODEs: Trajectory Sensitivity-aware Neural Ordinary Differential Equations for Efficient Dynamic Modeling
Modeling dynamical systems is crucial across science and engineering for accurate prediction, control, and decision-making. Recently, machine learning (ML) approaches, particularly neural ordinary differential equations (NODEs), have emerged as a powerful tool for data-driven modeling of continuous-time dynamics. Nevertheless, standard NODEs require a large number of data samples to remain consistent under varying control inputs, which makes it challenging to generate sufficient simulated data and to ensure the safety of control design. To address this gap, we propose trajectory-sensitivity-aware (TRASE-)NODEs, which construct an augmented system for both state and sensitivity, enabling simultaneous learning of their dynamics. This formulation allows the adjoint method to update gradients in a memory-efficient manner and ensures that control-input effects are captured in the learned dynamics. We evaluate TRASE-NODEs on a damped oscillator and on inverter-based resources (IBRs). The results show that TRASE-NODEs generalize better from limited training data, yielding lower prediction errors than standard NODEs in both examples. The proposed framework offers a data-efficient, control-oriented modeling approach suitable for dynamic systems that require accurate trajectory sensitivity prediction.
From Time Series to Affine Systems
The paper extends core results of behavioral systems theory from linear to affine time-invariant systems. We characterize the behavior of affine time-invariant systems via kernel, input-output, state-space, and finite-horizon data-driven representations, demonstrating a range of structural parallels with linear time-invariant systems. Building on these representations, we introduce a new persistence of excitation condition tailored to the model class of affine time-invariant systems. The condition yields a new fundamental lemma that parallels the classical result for linear systems while provably reducing data requirements. Our analysis highlights that excitation conditions must be adapted to the model class: overlooking structural differences may lead to unnecessarily conservative data requirements.
comment: Submitted to the IEEE Transactions on Automatic Control
Prognostic Framework for Robotic Manipulators Operating Under Dynamic Task Severities
Robotic manipulators are critical in many applications but are known to degrade over time. This degradation is influenced by the nature of the tasks performed by the robot. Tasks with higher severity, such as handling heavy payloads, can accelerate the degradation process. One way this degradation is reflected is in the position accuracy of the robot's end-effector. In this paper, we present a prognostic modeling framework that predicts a robotic manipulator's Remaining Useful Life (RUL) while accounting for the effects of task severity. Our framework represents the robot's position accuracy as a Brownian motion process with a random drift parameter that is influenced by task severity. The dynamic nature of task severity is modeled using a continuous-time Markov chain (CTMC). To evaluate RUL, we discuss two approaches -- (1) a novel closed-form expression for Remaining Lifetime Distribution (RLD), and (2) Monte Carlo simulations, commonly used in prognostics literature. Theoretical results establish the equivalence between these RUL computation approaches. We validate our framework through experiments using two distinct physics-based simulators for planar and spatial robot fleets. Our findings show that robots in both fleets experience shorter RUL when handling a higher proportion of high-severity tasks.
comment: Accepted for Publication in IEEE Transactions on Systems, Man, and Cybernetics: Systems
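As a simplified sketch of the Monte Carlo route to RUL, the code below simulates a degradation signal as Brownian motion with a constant positive drift and records the first time it crosses a failure threshold. It deliberately omits the paper's CTMC-modulated task severity and random drift: each severity level is represented by one fixed drift value. All parameter values and function names are assumptions for illustration. For constant drift, the first-passage time is known to follow an inverse Gaussian distribution with mean threshold/drift, which gives a check on the simulation.

```python
import random


def simulate_failure_time(drift, sigma, threshold, dt, rng):
    """First time a drifted Brownian path reaches `threshold` (Euler steps)."""
    level, t = 0.0, 0.0
    step_sd = sigma * dt ** 0.5
    while level < threshold:
        level += drift * dt + step_sd * rng.gauss(0.0, 1.0)
        t += dt
    return t


def mean_rul(drift, sigma=0.5, threshold=10.0, dt=0.01, n_paths=500, seed=0):
    """Monte Carlo estimate of mean lifetime for one fixed drift (severity)."""
    rng = random.Random(seed)
    times = [
        simulate_failure_time(drift, sigma, threshold, dt, rng)
        for _ in range(n_paths)
    ]
    return sum(times) / n_paths


if __name__ == "__main__":
    # Hypothetical drifts: low-severity vs high-severity tasks.
    for label, drift in (("low severity", 1.0), ("high severity", 2.0)):
        print(label, mean_rul(drift), "analytic mean:", 10.0 / drift)
```

A larger drift, standing in for a higher share of severe tasks, shortens the estimated lifetime, consistent with the trend reported in the abstract.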
A general framework for supporting economic feasibility of generator and storage energy systems through capacity and dispatch optimization
Integration of various electricity-generating technologies (such as natural gas, wind, nuclear, etc.) with storage systems (such as thermal, battery electric, hydrogen, etc.) has the potential to improve the economic competitiveness of modern energy systems. Driven by the need to efficiently assess the economic feasibility of various energy system configurations in early system concept development, this work outlines a versatile computational framework for assessing the net present value of various integrated storage technologies. The subsystems' fundamental dynamics are defined, with a particular emphasis on balancing critical physical and economic domains to enable optimal decision-making in the context of capacity and dispatch optimization. In its presented form, the framework formulates a linear, convex optimization problem that can be efficiently solved using a direct transcription approach in the open-source software DTQP. Three case studies demonstrate and validate the framework's capabilities, highlighting its value and computational efficiency in facilitating the economic assessment of various energy system configurations. In particular, natural gas with thermal storage and carbon capture, wind energy with battery storage, and nuclear with hydrogen are demonstrated.
comment: 16 pages, 10 figures
Accelerated Gradient Methods for Nonconvex Optimization: Escape Trajectories From Strict Saddle Points and Convergence to Local Minima
This paper considers the problem of understanding the behavior of a general class of accelerated gradient methods on smooth nonconvex functions. Motivated by some recent works that have proposed effective algorithms, based on Polyak's heavy ball method and the Nesterov accelerated gradient method, to achieve convergence to a local minimum of nonconvex functions, this work proposes a broad class of Nesterov-type accelerated methods and puts forth a rigorous study of these methods encompassing the escape from saddle points and convergence to local minima through both an asymptotic and a non-asymptotic analysis. In the asymptotic regime, this paper answers an open question of whether Nesterov's accelerated gradient method (NAG) with variable momentum parameter avoids strict saddle points almost surely. This work also develops two metrics of asymptotic rates of convergence and divergence, and evaluates these two metrics for several popular standard accelerated methods such as the NAG and Nesterov's accelerated gradient with constant momentum (NCM) near strict saddle points. In the non-asymptotic regime, this work provides an analysis that leads to the "linear" exit time estimates from strict saddle neighborhoods for trajectories of these accelerated methods as well as the necessary conditions for the existence of such trajectories. Finally, this work studies a sub-class of accelerated methods that can converge in convex neighborhoods of nonconvex functions with a near optimal rate to a local minimum, and at the same time this sub-class offers superior saddle-escape behavior compared to that of NAG.
comment: 122 pages, 20 figures; accepted for publication in Foundations of Computational Mathematics (FoCM)
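The saddle-escape behavior discussed above can be illustrated on a toy problem. The sketch below runs Nesterov's method with constant momentum (NCM) and plain gradient descent on f(x, y) = 0.5 x^2 + 0.25 y^4 - 0.5 y^2, which has a strict saddle at the origin and minima at (0, ±1), and counts iterations until the iterate leaves a band around the saddle. The test function, step size, momentum value, starting point, and exit radius are all assumptions chosen for illustration and do not come from the paper.

```python
def grad(x, y):
    """Gradient of f(x, y) = 0.5*x**2 + 0.25*y**4 - 0.5*y**2."""
    return x, y ** 3 - y


def run(beta, step=0.1, iters=1000, start=(0.1, 1e-3), exit_radius=0.5):
    """Nesterov iteration with constant momentum `beta` (beta=0 is plain GD).

    Returns the final iterate and the first iteration at which |y| exceeds
    `exit_radius`, used here as a simple proxy for the saddle exit time.
    """
    x_prev, y_prev = start
    x, y = start
    exit_iter = None
    for k in range(iters):
        # Look-ahead point, then a gradient step taken from it.
        vx = x + beta * (x - x_prev)
        vy = y + beta * (y - y_prev)
        gx, gy = grad(vx, vy)
        x_prev, y_prev = x, y
        x, y = vx - step * gx, vy - step * gy
        if exit_iter is None and abs(y) > exit_radius:
            exit_iter = k + 1
    return (x, y), exit_iter


if __name__ == "__main__":
    for name, beta in (("GD", 0.0), ("NCM", 0.5)):
        point, exit_iter = run(beta)
        print(name, "exit iteration:", exit_iter, "final point:", point)
```

Near the saddle, the linearized NCM recursion along the unstable direction has a larger growth factor than gradient descent with the same step size, so it leaves the neighborhood in fewer iterations on this example.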
Security of Gradient Tracking Algorithms Against Malicious Agents
Consensus algorithms are fundamental to multi-agent distributed optimization, and their security under adversarial conditions is an active area of research. While prior works primarily establish conditions for successful global consensus under attack, little is known about system behavior when these conditions are violated. This paper addresses this gap by investigating the robustness of the Wang--Elia algorithm, a noise-robust version of the gradient tracking algorithm, in the presence of malicious agents. We consider a network of agents collaboratively minimizing a global cost function, where a subset of agents may transmit faulty information to disrupt consensus. To quantify resilience, we formulate a security metric as an optimization problem, which is rooted in the centralized attack detection literature. We provide a tractable reformulation of the optimization problem and derive conditions under which the metric becomes unbounded, identifying undetectable attack signals that reveal inherent vulnerabilities. To facilitate design and analysis, we propose a well-posed variant of the metric, together with design methods that enhance network robustness against stealthy adversarial attacks. Numerical examples demonstrate the effectiveness of the proposed framework in enhancing the resilience of multi-agent distributed optimization.
comment: under review
$\Delta_T$ Noise in Mesoscopic Hybrid Junctions: Influence of Barrier Strength and Thermal Bias
Quantum noise is a fundamental probe of quantum transport phenomena, offering insights into current correlations and wave-particle duality. A particularly intriguing form of such noise, $\Delta_T$ noise, emerges under a finite temperature difference in the absence of charge current at zero voltage bias. In this work, we investigate $\Delta_T$ noise in mesoscopic hybrid junctions incorporating insulating barriers, where the average charge current remains zero at zero bias. Using quantum shot noise measurements, we demonstrate that $\Delta_T$ noise in metal-insulator-superconductor (NIS) junctions is approximately $16$ times greater than in metal-insulator-metal (NIN) counterparts. Our analysis further reveals that $\Delta_T$ noise exhibits a non-monotonic dependence on barrier strength, rising to a peak before declining, while increasing monotonically with the applied temperature bias. These findings underscore the rich interplay between thermal gradients and barrier properties in determining quantum noise characteristics in hybrid mesoscopic systems.
comment: 19 pages, 6 figures, accepted for publication in J. Phys. Condensed Matter (2025)
Systems and Control (EESS)
Resilient Composite Control for Stability Enhancement in EV Integrated DC Microgrids
When electric vehicles (EVs) are integrated into standalone DC microgrids (DCMGs), stability issues arise due to their constant power load (CPL) behavior, which provides negative incremental impedance (NII). In addition, the microgrids suffer from an inherent low-inertia problem. Therefore, this study presents a composite controller incorporating a global integral terminal sliding mode controller with a backstepping controller. A virtual capacitor is employed to mitigate the low-inertia issue and strengthen the DC-bus response. An improved fractional power-based reaching law decreases chattering and accelerates convergence. Exact feedback linearization converts the nonlinear boost converter model into Brunovsky's canonical form, mitigating NII effects and non-minimum phase issues. The entire system stability is verified using Lyapunov control theory. Simulation outcomes confirm superior performance, with 34.4-53.3% reduction in overshoot, 52.9-74.9% in undershoot, and 12-47.4% in settling time compared to the existing controller.
A Novel Multi-Timescale Stability-Preserving Hierarchical Reinforcement Learning Controller Framework for Adaptive Control in High-Dimensional Dynamical Systems
Controlling high-dimensional stochastic systems, critical in robotics, autonomous vehicles, and hyperchaotic systems, faces the curse of dimensionality, lacks temporal abstraction, and often fails to ensure stochastic stability. To overcome these limitations, this study introduces the Multi-Timescale Lyapunov-Constrained Hierarchical Reinforcement Learning (MTLHRL) framework. MTLHRL integrates a hierarchical policy within a semi-Markov Decision Process (SMDP), featuring a high-level policy for strategic planning and a low-level policy for reactive control, which effectively manages complex, multi-timescale decision-making and reduces dimensionality overhead. Stability is rigorously enforced using a neural Lyapunov function optimized via Lagrangian relaxation and multi-timescale actor-critic updates, ensuring mean-square boundedness or asymptotic stability in the face of stochastic dynamics. The framework promotes efficient and reliable learning through trust-region constraints and decoupled optimization. Extensive simulations on an 8D hyperchaotic system and a 5-DOF robotic manipulator demonstrate MTLHRL's empirical superiority. It significantly outperforms baseline methods in both stability and performance, recording the lowest error indices (e.g., Integral Absolute Error (IAE): 3.912 in hyperchaotic control and IAE: 1.623 in robotics), achieving faster convergence, and exhibiting superior disturbance rejection. MTLHRL offers a theoretically grounded and practically viable solution for robust control of complex stochastic systems.
Politics, Inequality, and the Robustness of Shared Infrastructure Systems
Our infrastructure systems enable our well-being by allowing us to move, store, and transform materials and information given considerable social and environmental variation. Critically, this ability is shaped by the degree to which society invests in infrastructure, a fundamentally political question in large public systems. There, infrastructure providers are distinguished from users through political processes, such as elections, and there is considerable heterogeneity among users. Previous political economic models have not taken into account (i) dynamic infrastructures, (ii) dynamic user preferences, and (iii) alternatives to rational actor theory. Meanwhile, engineering often neglects politics. We address these gaps with a general dynamic model of shared infrastructure systems that incorporates theories from political economy, social-ecological systems, and political psychology. We use the model to develop propositions on how multiple characteristics of the political process impact the robustness of shared infrastructure systems to capacity shocks and unequal opportunity for private infrastructure investment. Under user fees, inequality decreases robustness, but taxing private infrastructure use can increase robustness if non-elites have equal political influence. Election cycle periods have a nonlinear effect where increasing them increases robustness up to a point but decreases robustness beyond that point. Further, there is a negative relationship between the ideological sensitivity of candidates and robustness. Overall, the biases of voters and candidates (whether they favor tax increases or decreases) mediate these political-economic effects on robustness because biases may or may not match the reality of system needs (whether system recovery requires tax increases).
Vector-Valued Native Space Embedding for Adaptive State Observation
This paper combines vector-valued reproducing kernel Hilbert space (vRKHS) embedding with robust adaptive observation, yielding an algorithm that is both non-parametric and robust. The main contribution of this paper lies in the ability of the proposed system to estimate the state of a plant model whose matched uncertainties are elements of an infinite-dimensional native space. The plant model considered in this paper also suffers from unmatched uncertainties. Finally, the measured output is affected by disturbances as well. Upper bounds on the state observation error are provided in an analytical form. The proposed theoretical results are applied to the problem of estimating the state of a rigid body.
Model-Free Power System Stability Enhancement with Dissipativity-Based Neural Control
The integration of converter-interfaced generation introduces new transient stability challenges to modern power systems. Classical Lyapunov- and scalable passivity-based approaches typically rely on restrictive assumptions, and finding storage functions for large grids is generally considered intractable. Furthermore, most methods require an accurate grid dynamics model. To address these challenges, we propose a model-free, nonlinear, and dissipativity-based controller which, when applied to grid-connected virtual synchronous generators (VSGs), enhances power system transient stability. Using input-state data, we train neural networks to learn dissipativity-characterizing matrices that yield stabilizing controllers. Furthermore, we incorporate cost function shaping to improve the performance with respect to the user-specified objectives. Numerical results on a modified, all-VSG Kundur two-area power system validate the effectiveness of the proposed approach.
comment: 7 pages, 6 figures, submitted to the 24th Power Systems Computation Conference (PSCC 2026)
Fair Cost Allocation in Energy Communities: A DLMP-based Bilevel Optimization with a Shapley Value Approach
Energy communities (ECs) are emerging as a promising decentralized model for managing cooperative distributed energy resources (DERs). As these communities expand and their operations become increasingly integrated into the grid, ensuring fairness in allocating operating costs among participants becomes a challenge. In distribution networks, DER operations at the community level can influence Distribution Locational Marginal Prices (DLMPs), which in turn affect the system's operating cost. This interdependence between local decisions and system-level pricing introduces new challenges for fair and transparent cost allocation. Despite growing interest in fairness-aware methods, most methods do not account for the impact of DLMPs. To fill this gap, we propose a bilevel optimization model in which a Community Energy Aggregator (CEA) schedules DERs across multiple ECs while a Distribution System Operator (DSO) determines DLMPs through network-constrained dispatch. Leveraging the Karush-Kuhn-Tucker (KKT) conditions and strong duality, the bilevel model is reformulated into a tractable single-level problem. We achieve fairness in the cost allocation by applying the Shapley value to quantify each community's marginal contribution to system-wide cost savings. The effectiveness of the proposed method is validated through simulations on several benchmark distribution systems.
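The Shapley-value step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the community names (EC1, EC2, EC3) and the savings table are hypothetical, and in the paper the value of each coalition would presumably come from solving the reformulated optimization problem rather than from a lookup table.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over every possible joining order (feasible only for few players)."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


# Hypothetical system-wide cost savings for each coalition of communities.
SAVINGS = {
    frozenset(): 0.0,
    frozenset({"EC1"}): 10.0,
    frozenset({"EC2"}): 20.0,
    frozenset({"EC3"}): 30.0,
    frozenset({"EC1", "EC2"}): 40.0,
    frozenset({"EC1", "EC3"}): 50.0,
    frozenset({"EC2", "EC3"}): 60.0,
    frozenset({"EC1", "EC2", "EC3"}): 90.0,
}

phi = shapley_values(["EC1", "EC2", "EC3"], SAVINGS.__getitem__)
```

The allocations in `phi` sum to the savings of the grand coalition, which is the efficiency property that makes the Shapley value attractive for cost sharing. Exact enumeration grows factorially with the number of communities, so larger instances would need sampling or another approximation.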
Adapting Noise-Driven PUF and AI for Secure WBG ICS: A Proof-of-Concept Study
Wide-bandgap (WBG) technologies offer unprecedented improvements in power system efficiency, size, and performance, but also introduce unique sensor corruption and cybersecurity risks in industrial control systems (ICS), particularly due to high-frequency noise and sophisticated cyber-physical threats. This proof-of-concept (PoC) study demonstrates the adaptation of a noise-driven physically unclonable function (PUF) and machine learning (ML)-assisted anomaly detection framework to the demanding environment of WBG-based ICS sensor pathways. By extracting entropy from unavoidable WBG switching noise (up to 100 kHz) as a PUF source, and simultaneously using this noise as a real-time threat indicator, the proposed system unites hardware-level authentication and anomaly detection. Our approach integrates hybrid machine learning (ML) models with adaptive Bayesian filtering, providing robust and low-latency detection capabilities resilient to both natural electromagnetic interference (EMI) and active adversarial manipulation. Through detailed simulations of WBG modules under benign and attack scenarios--including EMI injection, signal tampering, and node impersonation--we achieve 95% detection accuracy and sub-millisecond processing latency. These results demonstrate the feasibility of physics-driven, dual-use noise exploitation as a scalable ICS defense primitive. Our findings lay the groundwork for next-generation security strategies that leverage inherent device characteristics, bridging hardware and artificial intelligence (AI) for enhanced protection of critical ICS infrastructure.
Distributed Stochastic Proximal Algorithm on Riemannian Submanifolds for Weakly-convex Functions
This paper aims to investigate the distributed stochastic optimization problems on compact embedded submanifolds (in the Euclidean space) for multi-agent network systems. To address the manifold structure, we propose a distributed Riemannian stochastic proximal algorithm framework by utilizing the retraction and Riemannian consensus protocol, and analyze three specific algorithms: the distributed Riemannian stochastic subgradient, proximal point, and prox-linear algorithms. When the local costs are weakly-convex and the initial points satisfy certain conditions, we show that the iterates generated by this framework converge to a nearly stationary point in expectation while achieving consensus. We further establish the convergence rate of the algorithm framework as $\mathcal{O}(\frac{1+\kappa_g}{\sqrt{k}})$ where $k$ denotes the number of iterations and $\kappa_g$ shows the impact of manifold geometry on the algorithm performance. Finally, numerical experiments are implemented to demonstrate the theoretical results and show the empirical performance.
Taming Silent Failures: A Framework for Verifiable AI Reliability
The integration of Artificial Intelligence (AI) into safety-critical systems introduces a new reliability paradigm: silent failures, where AI produces confident but incorrect outputs that can be dangerous. This paper introduces the Formal Assurance and Monitoring Environment (FAME), a novel framework that confronts this challenge. FAME synergizes the mathematical rigor of offline formal synthesis with the vigilance of online runtime monitoring to create a verifiable safety net around opaque AI components. We demonstrate its efficacy in an autonomous vehicle perception system, where FAME successfully detected 93.5% of critical safety violations that were otherwise silent. By contextualizing our framework within the ISO 26262 and ISO/PAS 8800 standards, we provide reliability engineers with a practical, certifiable pathway for deploying trustworthy AI. FAME represents a crucial shift from accepting probabilistic performance to enforcing provable safety in next-generation systems.
comment: This preprint has been accepted by IEEE Reliability Magazine. 10 pages, 3 figures
TRASE-NODEs: Trajectory Sensitivity-aware Neural Ordinary Differential Equations for Efficient Dynamic Modeling
Modeling dynamical systems is crucial across the science and engineering fields for accurate prediction, control, and decision-making. Recently, machine learning (ML) approaches, particularly neural ordinary differential equations (NODEs), have emerged as a powerful tool for data-driven modeling of continuous-time dynamics. Nevertheless, standard NODEs require a large number of data samples to remain consistent under varying control inputs, posing challenges to generate sufficient simulated data and ensure the safety of control design. To address this gap, we propose trajectory-sensitivity-aware (TRASE-)NODEs, which construct an augmented system for both state and sensitivity, enabling simultaneous learning of their dynamics. This formulation allows the adjoint method to update gradients in a memory-efficient manner and ensures that control-input effects are captured in the learned dynamics. We evaluate TRASE-NODEs using a damped oscillator and inverter-based resources (IBRs). The results show that TRASE-NODEs generalize better from limited training data, yielding lower prediction errors than standard NODEs for both examples. The proposed framework offers a data-efficient, control-oriented modeling approach suitable for dynamic systems that require accurate trajectory sensitivity prediction.
From Time Series to Affine Systems
The paper extends core results of behavioral systems theory from linear to affine time-invariant systems. We characterize the behavior of affine time-invariant systems via kernel, input-output, state-space, and finite-horizon data-driven representations, demonstrating a range of structural parallels with linear time-invariant systems. Building on these representations, we introduce a new persistence of excitation condition tailored to the model class of affine time-invariant systems. The condition yields a new fundamental lemma that parallels the classical result for linear systems while provably reducing data requirements. Our analysis highlights that excitation conditions must be adapted to the model class: overlooking structural differences may lead to unnecessarily conservative data requirements.
comment: Submitted to the IEEE Transactions on Automatic Control
Prognostic Framework for Robotic Manipulators Operating Under Dynamic Task Severities
Robotic manipulators are critical in many applications but are known to degrade over time. This degradation is influenced by the nature of the tasks performed by the robot. Tasks with higher severity, such as handling heavy payloads, can accelerate the degradation process. One way this degradation is reflected is in the position accuracy of the robot's end-effector. In this paper, we present a prognostic modeling framework that predicts a robotic manipulator's Remaining Useful Life (RUL) while accounting for the effects of task severity. Our framework represents the robot's position accuracy as a Brownian motion process with a random drift parameter that is influenced by task severity. The dynamic nature of task severity is modeled using a continuous-time Markov chain (CTMC). To evaluate RUL, we discuss two approaches -- (1) a novel closed-form expression for Remaining Lifetime Distribution (RLD), and (2) Monte Carlo simulations, commonly used in prognostics literature. Theoretical results establish the equivalence between these RUL computation approaches. We validate our framework through experiments using two distinct physics-based simulators for planar and spatial robot fleets. Our findings show that robots in both fleets experience shorter RUL when handling a higher proportion of high-severity tasks.
comment: Accepted for Publication in IEEE Transactions on Systems, Man, and Cybernetics: Systems
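The comparison between a closed-form lifetime result and Monte Carlo simulation can be illustrated with a deliberately simplified sketch. It assumes a constant, known drift instead of the random, severity-dependent drift and CTMC task model described in the abstract, so it is not the paper's model; all parameter values are hypothetical. For Brownian motion with positive drift, the first-passage time to a fixed threshold follows an inverse Gaussian distribution with mean threshold / drift, which serves as the closed-form reference here.

```python
import random


def simulate_failure_time(drift, sigma, threshold, dt, rng):
    """Euler simulation of one degradation path until it first crosses
    the failure threshold; returns the crossing time."""
    x, t = 0.0, 0.0
    sqrt_dt = dt ** 0.5
    while x < threshold:
        x += drift * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        t += dt
    return t


def mean_life_monte_carlo(drift, sigma, threshold,
                          n_paths=1000, dt=0.02, seed=0):
    """Monte Carlo estimate of the mean time to failure."""
    rng = random.Random(seed)
    times = [simulate_failure_time(drift, sigma, threshold, dt, rng)
             for _ in range(n_paths)]
    return sum(times) / n_paths


def mean_life_closed_form(drift, threshold):
    """Mean of the inverse Gaussian first-passage time (drift > 0)."""
    return threshold / drift


# Hypothetical parameters: accuracy error drifts at 1 unit per time unit
# and the robot is considered failed at an error of 10 units.
mc_estimate = mean_life_monte_carlo(drift=1.0, sigma=0.5, threshold=10.0)
exact_mean = mean_life_closed_form(drift=1.0, threshold=10.0)
```

The time discretization slightly delays detected crossings, so the simulated mean sits a little above the analytical value; shrinking `dt` reduces that gap at the cost of more computation.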
A general framework for supporting economic feasibility of generator and storage energy systems through capacity and dispatch optimization
Integration of various electricity-generating technologies (such as natural gas, wind, nuclear, etc.) with storage systems (such as thermal, battery electric, hydrogen, etc.) has the potential to improve the economic competitiveness of modern energy systems. Driven by the need to efficiently assess the economic feasibility of various energy system configurations in early system concept development, this work outlines a versatile computational framework for assessing the net present value of various integrated storage technologies. The subsystems' fundamental dynamics are defined, with a particular emphasis on balancing critical physical and economic domains to enable optimal decision-making in the context of capacity and dispatch optimization. In its presented form, the framework formulates a linear, convex optimization problem that can be efficiently solved using a direct transcription approach in the open-source software DTQP. Three case studies demonstrate and validate the framework's capabilities, highlighting its value and computational efficiency in facilitating the economic assessment of various energy system configurations. In particular, natural gas with thermal storage and carbon capture, wind energy with battery storage, and nuclear with hydrogen are demonstrated.
comment: 16 pages, 10 figures
Accelerated Gradient Methods for Nonconvex Optimization: Escape Trajectories From Strict Saddle Points and Convergence to Local Minima
This paper considers the problem of understanding the behavior of a general class of accelerated gradient methods on smooth nonconvex functions. Motivated by some recent works that have proposed effective algorithms, based on Polyak's heavy ball method and the Nesterov accelerated gradient method, to achieve convergence to a local minimum of nonconvex functions, this work proposes a broad class of Nesterov-type accelerated methods and puts forth a rigorous study of these methods encompassing the escape from saddle points and convergence to local minima through both an asymptotic and a non-asymptotic analysis. In the asymptotic regime, this paper answers an open question of whether Nesterov's accelerated gradient method (NAG) with variable momentum parameter avoids strict saddle points almost surely. This work also develops two metrics of asymptotic rates of convergence and divergence, and evaluates these two metrics for several popular standard accelerated methods such as the NAG and Nesterov's accelerated gradient with constant momentum (NCM) near strict saddle points. In the non-asymptotic regime, this work provides an analysis that leads to the "linear" exit time estimates from strict saddle neighborhoods for trajectories of these accelerated methods as well as the necessary conditions for the existence of such trajectories. Finally, this work studies a sub-class of accelerated methods that can converge in convex neighborhoods of nonconvex functions with a near optimal rate to a local minimum and at the same time this sub-class offers superior saddle-escape behavior compared to that of NAG.
comment: 122 pages, 20 figures; accepted for publication in Foundations of Computational Mathematics (FoCM)
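The saddle-escape behavior discussed in the abstract can be seen on a toy problem. The sketch below is an illustration only, not the class of methods analyzed in the paper: it runs Nesterov's method with constant momentum on the hypothetical test function f(x, y) = x^2/2 + y^4/4 - y^2/2, which has a strict saddle at the origin and local minima at (0, 1) and (0, -1). The step size and momentum are arbitrary illustrative choices.

```python
def grad(x, y):
    """Gradient of f(x, y) = 0.5*x**2 + 0.25*y**4 - 0.5*y**2."""
    return x, y ** 3 - y


def nesterov_constant_momentum(x0, y0, step=0.1, beta=0.5, iters=500):
    """Nesterov iteration with constant momentum: extrapolate along the
    last displacement, then take a gradient step from that point."""
    prev = (x0, y0)
    cur = (x0, y0)
    for _ in range(iters):
        look = (cur[0] + beta * (cur[0] - prev[0]),
                cur[1] + beta * (cur[1] - prev[1]))
        gx, gy = grad(*look)
        prev, cur = cur, (look[0] - step * gx, look[1] - step * gy)
    return cur


# A tiny perturbation along the unstable direction is enough to escape.
escaped = nesterov_constant_momentum(1.0, 1e-6)

# Starting exactly on the saddle's stable manifold (y = 0) never leaves it.
stuck = nesterov_constant_momentum(1.0, 0.0)
```

Initial points with y = 0 form a set of measure zero in the plane, which is consistent with the "almost surely" qualifier in avoidance results of this kind.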
Security of Gradient Tracking Algorithms Against Malicious Agents
Consensus algorithms are fundamental to multi-agent distributed optimization, and their security under adversarial conditions is an active area of research. While prior works primarily establish conditions for successful global consensus under attack, little is known about system behavior when these conditions are violated. This paper addresses this gap by investigating the robustness of the Wang--Elia algorithm, a noise-robust version of the gradient tracking algorithm, in the presence of malicious agents. We consider a network of agents collaboratively minimizing a global cost function, where a subset of agents may transmit faulty information to disrupt consensus. To quantify resilience, we formulate a security metric as an optimization problem, which is rooted in centralized attack detection literature. We provide a tractable reformulation of the optimization problem, and derive conditions under which the metric becomes unbounded, identifying undetectable attack signals that reveal inherent vulnerabilities. To facilitate design and analysis, we propose a well-posed variant of the metric and design methods to enhance network robustness against stealthy adversarial attacks. Numerical examples demonstrate the effectiveness of the proposed framework in enhancing the resilience of multi-agent distributed optimization.
comment: under review
$\Delta_T$ Noise in Mesoscopic Hybrid Junctions: Influence of Barrier Strength and Thermal Bias
Quantum noise is a fundamental probe of quantum transport phenomena, offering insights into current correlations and wave-particle duality. A particularly intriguing form of such noise, $\Delta_T$ noise, emerges under a finite temperature difference in the absence of charge current at zero voltage bias. In this work, we investigate $\Delta_T$ noise in mesoscopic hybrid junctions incorporating insulating barriers, where the average charge current remains zero at zero bias. Using quantum shot noise measurements, we demonstrate that $\Delta_T$ noise in metal-insulator-superconductor (NIS) junctions is approximately $16$ times greater than in metal-insulator-metal (NIN) counterparts. Our analysis further reveals that $\Delta_T$ noise exhibits a non-monotonic dependence on barrier strength, rising to a peak before declining, while increasing monotonically with the applied temperature bias. These findings underscore the rich interplay between thermal gradients and barrier properties in determining quantum noise characteristics in hybrid mesoscopic systems.
comment: 19 pages, 6 figures, accepted for publication in J. Phys. Condensed Matter (2025)
Multiagent Systems
Hollywood Town: Long-Video Generation via Cross-Modal Multi-Agent Orchestration
Recent advancements in multi-agent systems have demonstrated significant potential for enhancing creative task performance, such as long video generation. This study introduces three innovations to improve multi-agent collaboration. First, we propose OmniAgent, a hierarchical, graph-based multi-agent framework for long video generation that leverages a film-production-inspired architecture to enable modular specialization and scalable inter-agent collaboration. Second, inspired by context engineering, we propose hypergraph nodes that enable temporary group discussions among agents lacking sufficient context, reducing individual memory requirements while ensuring adequate contextual information. Third, we transition from directed acyclic graphs (DAGs) to directed cyclic graphs with limited retries, allowing agents to reflect and refine outputs iteratively, thereby improving earlier stages through feedback from subsequent nodes. These contributions lay the groundwork for developing more robust multi-agent systems in creative tasks.
Group size effects and collective misalignment in LLM multi-agent systems
Multi-agent systems of large language models (LLMs) are rapidly expanding across domains, introducing dynamics not captured by single-agent evaluations. Yet, existing work has mostly contrasted the behavior of a single agent with that of a collective of fixed size, leaving open a central question: how does group size shape dynamics? Here, we move beyond this dichotomy and systematically explore outcomes across the full range of group sizes. We focus on multi-agent misalignment, building on recent evidence that interacting LLMs playing a simple coordination game can generate collective biases absent in individual models. First, we show that collective bias is a deeper phenomenon than previously assessed: interaction can amplify individual biases, introduce new ones, or override model-level preferences. Second, we demonstrate that group size affects the dynamics in a non-linear way, revealing model-dependent dynamical regimes. Finally, we develop a mean-field analytical approach and show that, above a critical population size, simulations converge to deterministic predictions that expose the basins of attraction of competing equilibria. These findings establish group size as a key driver of multi-agent dynamics and highlight the need to consider population-level effects when deploying LLM-based systems at scale.
IFS: Information Flow Structure for Multi-agent Ad Hoc System
Multi-agent ad hoc systems are dynamic collaborative systems in which multiple autonomous agents must cooperate with both known and unknown teammates in open environments, without relying on pre-coordinated strategies. These systems operate under conditions of uncertainty and partial observability, where team composition, agent behaviors, and environmental factors may change during execution. Through an analysis of information flow in such systems, we identify two key limitations in existing research: insufficient information flow and limited information processing capacity. To address these issues, we propose an information flow structure for multi-agent ad hoc systems (IFS), which tackles these challenges from the perspectives of communication and information fusion. Experimental results in StarCraft II demonstrate that IFS significantly improves both information flow and processing capacity, while exhibiting strong generalization capabilities and outperforming baseline methods in complex ad hoc teamwork scenarios.
CGoT: A Novel Inference Mechanism for Embodied Multi-Agent Systems Using Composable Graphs of Thoughts
The integration of self-driving cars and service robots is becoming increasingly prevalent across a wide array of fields, playing a crucial and expanding role in both industrial applications and everyday life. In parallel, the rapid advancements in Large Language Models (LLMs) have garnered substantial attention and interest within the research community. This paper introduces a novel vehicle-robot system that leverages the strengths of both autonomous vehicles and service robots. In our proposed system, two autonomous ego-vehicles transport service robots to locations within an office park, where they perform a series of tasks. The study explores the feasibility and potential benefits of incorporating LLMs into this system, with the aim of enhancing operational efficiency and maximizing the potential of the cooperative mechanisms between the vehicles and the robots. This paper proposes a novel inference mechanism, called CGoT, for this type of system, in which an agent can carry another agent. Experimental results are presented to validate the performance of the proposed method.
CreditXAI: A Multi-Agent System for Explainable Corporate Credit Rating
In the domain of corporate credit rating, traditional deep learning methods have improved predictive accuracy but still suffer from the inherent 'black-box' problem and limited interpretability. While incorporating non-financial information enriches the data and provides partial interpretability, the models still lack hierarchical reasoning mechanisms, limiting their comprehensive analytical capabilities. To address these challenges, we propose CreditXAI, a Multi-Agent System (MAS) framework that simulates the collaborative decision-making process of professional credit analysts. The framework focuses on business, financial, and governance risk dimensions to generate consistent and interpretable credit assessments. Experimental results demonstrate that multi-agent collaboration improves predictive accuracy by more than 7% over the best single-agent baseline, confirming its significant synergistic advantage in corporate credit risk evaluation. This study provides a new technical pathway to build intelligent and interpretable credit rating models.
comment: 8 pages, 2 figures
Solving Continuous Mean Field Games: Deep Reinforcement Learning for Non-Stationary Dynamics
Mean field games (MFGs) have emerged as a powerful framework for modeling interactions in large-scale multi-agent systems. Despite recent advancements in reinforcement learning (RL) for MFGs, existing methods are typically limited to finite spaces or stationary models, hindering their applicability to real-world problems. This paper introduces a novel deep reinforcement learning (DRL) algorithm specifically designed for non-stationary continuous MFGs. The proposed approach builds upon a Fictitious Play (FP) methodology, leveraging DRL for best-response computation and supervised learning for average policy representation. Furthermore, it learns a representation of the time-dependent population distribution using a Conditional Normalizing Flow. To validate the effectiveness of our method, we evaluate it on three different examples of increasing complexity. By addressing critical limitations in scalability and density approximation, this work represents a significant advancement in applying DRL techniques to complex MFG problems, bringing the field closer to real-world multi-agent systems.
comment: NeurIPS 2025
Solving the Unsolvable: Translating Case Law in Hong Kong
This paper addresses the challenges of translating case law under Hong Kong's bilingual legal system. It highlights the initial success of translating all written statutes into Chinese before the 1997 handover, a task mandated by the Basic Law. The effort involved significant collaboration among legal, linguistic, and translation experts, resulting in a comprehensive and culturally appropriate bilingual legal system. However, translating case law remains a significant challenge due to the sheer volume and continuous growth of judicial decisions. The paper critiques the government's and judiciary's sporadic and uncoordinated efforts to translate case law, contrasting them with the thorough approach previously taken for statute translation. Although the government acknowledges the importance of legal bilingualism, it lacks a sustainable strategy for translating case law. The Judiciary's position, that translating all judgments is unnecessary, unrealistic, and not cost-effective, is analyzed and critiqued for its impact on legal transparency and public trust. A proposed solution involves leveraging machine translation technology through a human-machine interactive translation platform, which undergoes two major transitions. Initially based on a neural model, the platform transitions to using a large language model for improved translation accuracy. Furthermore, it evolves from a single-agent system to a multi-agent system, incorporating Translator, Annotator, and Proofreader agents. This multi-agent approach, supported by a grant, aims to facilitate efficient, high-quality translation of judicial judgments by integrating advanced artificial intelligence and continuous feedback mechanisms, thus better meeting the needs of a bilingual legal system.
Assessing the Potential of Generative Agents in Crowdsourced Fact-Checking
The growing spread of online misinformation has created an urgent need for scalable, reliable fact-checking solutions. Crowdsourced fact-checking - where non-experts evaluate claim veracity - offers a cost-effective alternative to expert verification, despite concerns about variability in quality and bias. Encouraged by promising results in certain contexts, major platforms such as X (formerly Twitter), Facebook, and Instagram have begun shifting from centralized moderation to decentralized, crowd-based approaches. In parallel, advances in Large Language Models (LLMs) have shown strong performance across core fact-checking tasks, including claim detection and evidence evaluation. However, their potential role in crowdsourced workflows remains unexplored. This paper investigates whether LLM-powered generative agents - autonomous entities that emulate human behavior and decision-making - can meaningfully contribute to fact-checking tasks traditionally reserved for human crowds. Using the protocol of La Barbera et al. (2024), we simulate crowds of generative agents with diverse demographic and ideological profiles. Agents retrieve evidence, assess claims along multiple quality dimensions, and issue final veracity judgments. Our results show that agent crowds outperform human crowds in truthfulness classification, exhibit higher internal consistency, and show reduced susceptibility to social and cognitive biases. Compared to humans, agents rely more systematically on informative criteria such as Accuracy, Precision, and Informativeness, suggesting a more structured decision-making process. Overall, our findings highlight the potential of generative agents as scalable, consistent, and less biased contributors to crowd-based fact-checking systems.
comment: This paper has been published in Online Social Networks and Media (https://doi.org/10.1016/j.osnem.2025.100326). Please cite the published version accordingly
Robotics
Design and Structural Validation of a Micro-UAV with On-Board Dynamic Route Planning
Micro aerial vehicles are becoming increasingly important in search and rescue operations due to their agility, speed, and ability to access confined spaces or hazardous areas. However, designing lightweight aerial systems presents significant structural, aerodynamic, and computational challenges. This work addresses two key limitations in many low-cost aerial systems under two kilograms: their lack of structural durability during flight through rough terrains and inability to replan paths dynamically when new victims or obstacles are detected. We present a fully customised drone built from scratch using only commonly available components and materials, emphasising modularity, low cost, and ease of assembly. The structural frame is reinforced with lightweight yet durable materials to withstand impact, while the onboard control system is powered entirely by free, open-source software solutions. The proposed system demonstrates real-time perception and adaptive navigation capabilities without relying on expensive hardware accelerators, offering an affordable and practical solution for real-world search and rescue missions.
comment: 8 pages, 4 figures, 4 tables
Enhancing Tactile-based Reinforcement Learning for Robotic Control
Achieving safe, reliable real-world robotic manipulation requires agents to evolve beyond vision and incorporate tactile sensing to overcome sensory deficits and reliance on idealised state information. Despite its potential, the efficacy of tactile sensing in reinforcement learning (RL) remains inconsistent. We address this by developing self-supervised learning (SSL) methodologies to more effectively harness tactile observations, focusing on a scalable setup of proprioception and sparse binary contacts. We empirically demonstrate that sparse binary tactile signals are critical for dexterity, particularly for interactions that proprioceptive control errors do not register, such as decoupled robot-object motions. Our agents achieve superhuman dexterity in complex contact tasks (ball bouncing and Baoding ball rotation). Furthermore, we find that decoupling the SSL memory from the on-policy memory can improve performance. We release the Robot Tactile Olympiad (RoTO) benchmark to standardise and promote future research in tactile-based manipulation. Project page: https://elle-miller.github.io/tactile_rl
MATrack: Efficient Multiscale Adaptive Tracker for Real-Time Nighttime UAV Operations
Nighttime UAV tracking faces significant challenges in real-world robotics operations. Low-light conditions not only limit visual perception capabilities, but cluttered backgrounds and frequent viewpoint changes also cause existing trackers to drift or fail during deployment. To address these difficulties, researchers have proposed solutions based on low-light enhancement and domain adaptation. However, these methods still have notable shortcomings in actual UAV systems: low-light enhancement often introduces visual artifacts, domain adaptation methods are computationally expensive, and existing lightweight designs struggle to fully leverage dynamic object information. Based on an in-depth analysis of these key issues, we propose MATrack, a multiscale adaptive system designed specifically for nighttime UAV tracking. MATrack tackles the main technical challenges of nighttime tracking through the collaborative work of three core modules: the Multiscale Hierarchy Blender (MHB) enhances feature consistency between static and dynamic templates; the Adaptive Key Token Gate accurately identifies object information within complex backgrounds; and the Nighttime Template Calibrator (NTC) ensures stable tracking performance over long sequences. Extensive experiments show that MATrack achieves a significant performance improvement. On the UAVDark135 benchmark, its precision, normalized precision and AUC surpass state-of-the-art (SOTA) methods by 5.9%, 5.4% and 4.2% respectively, while maintaining a real-time processing speed of 81 FPS. Further tests on a real-world UAV platform validate the system's reliability, demonstrating that MATrack can provide stable and effective nighttime UAV tracking support for critical robotics applications such as nighttime search and rescue and border patrol.
comment: Preprint, Under Review
Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos
This paper presents a novel approach for pretraining robotic manipulation Vision-Language-Action (VLA) models using a large corpus of unscripted real-life video recordings of human hand activities. Treating the human hand as a dexterous robot end-effector, we show that "in-the-wild" egocentric human videos without any annotations can be transformed into data formats fully aligned with existing robotic V-L-A training data in terms of task granularity and labels. This is achieved by developing a fully automated, holistic human activity analysis approach for arbitrary human hand videos. This approach can generate atomic-level hand activity segments and their language descriptions, each accompanied by framewise 3D hand motion and camera motion. We process a large volume of egocentric videos and create a hand-VLA training dataset containing 1M episodes and 26M frames. This training data covers a wide range of objects and concepts, dexterous manipulation tasks, and environment variations in real life, vastly exceeding the coverage of existing robot data. We design a dexterous hand VLA model architecture and pretrain the model on this dataset. The model exhibits strong zero-shot capabilities on completely unseen real-world observations. Additionally, fine-tuning it on a small amount of real robot action data significantly improves task success rates and generalization to novel objects in real robotic experiments. We also demonstrate the appealing scaling behavior of the model's task performance with respect to pretraining data scale. We believe this work lays a solid foundation for scalable VLA pretraining, advancing robots toward truly generalizable embodied intelligence.
comment: Project page: https://microsoft.github.io/VITRA/
Learning Neural Control Barrier Functions from Expert Demonstrations using Inverse Constraint Learning
Safety is a fundamental requirement for autonomous systems operating in critical domains. Control barrier functions (CBFs) have been used to design safety filters that minimally alter nominal controls for such systems to maintain their safety. Learning neural CBFs has been proposed as a data-driven alternative to their computationally expensive optimization-based synthesis. However, the failure set of states that should be avoided is often non-obvious or hard to specify formally, e.g., tailgating in autonomous driving, while a set of expert demonstrations that achieve the task and avoid the failure set is easier to generate. We use inverse constraint learning (ICL) to train a constraint function that classifies the states of the system under consideration into safe ones, i.e., those that belong to a controlled forward invariant set that is disjoint from the unspecified failure set, and unsafe ones, i.e., those that belong to the complement of that set. We then use that function to label a new set of simulated trajectories to train our neural CBF. We empirically evaluate our approach in four different environments, demonstrating that it outperforms existing baselines and achieves comparable performance to a neural CBF trained with the same data but annotated with ground-truth safety labels.
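To make the safety-filter idea in this abstract concrete, here is a minimal sketch of a CBF filter for a scalar single integrator. The dynamics (x' = u), the hand-written barrier h(x) = x_max - x, the gain alpha, and all function names are assumptions chosen for illustration. The paper instead learns a neural CBF from ICL-generated labels, which this sketch does not attempt.

```python
def cbf_filter(u_nom, x, x_max=1.0, alpha=2.0):
    """Closed-form safety filter for x' = u with barrier h(x) = x_max - x.

    The CBF condition dh/dt >= -alpha * h(x) reduces to u <= alpha * (x_max - x).
    The control closest to u_nom that satisfies it is a simple min().
    """
    u_bound = alpha * (x_max - x)
    return min(u_nom, u_bound)


def simulate(x0=0.0, u_nom=1.0, dt=0.01, steps=500):
    """Euler-integrate the filtered system and return the final state."""
    x = x0
    for _ in range(steps):
        x += dt * cbf_filter(u_nom, x)
    return x


if __name__ == "__main__":
    # The nominal control pushes toward the boundary; the filter slows it down.
    print(simulate())
```

Far from the boundary the nominal control passes through unchanged, which is the "minimally alter" property the abstract refers to. The filter only intervenes once the bound alpha * (x_max - x) drops below the nominal command.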
AURASeg: Attention Guided Upsampling with Residual Boundary-Assistive Refinement for Drivable-Area Segmentation
Free space ground segmentation is essential for robots and autonomous vehicles to navigate, recognize drivable zones, and traverse efficiently. Fine-grained features remain challenging for existing segmentation models, particularly for robots in indoor and structured environments. These difficulties arise from ineffective multi-scale processing, suboptimal boundary refinement, and limited feature representation. To overcome these limitations, we propose Attention-Guided Upsampling with Residual Boundary-Assistive Refinement (AURASeg), a ground-plane semantic segmentation model that maintains high segmentation accuracy while improving border precision. Our method uses a CSP-Darknet backbone and adds a Residual Border Refinement Module (RBRM) for accurate edge delineation and an Attention Progressive Upsampling Decoder (APUD) for strong feature integration. We also incorporate a lightweight Atrous Spatial Pyramid Pooling (ASPP-Lite) module to ensure multi-scale context extraction without compromising real-time performance. The proposed model outperforms benchmark segmentation architectures in mIoU and F1 metrics when tested on the Ground Mobile Robot Perception (GMRP) Dataset and a custom Gazebo indoor dataset. Our approach achieves an improvement in mean Intersection-over-Union (mIoU) of +1.26% and in segmentation precision of +1.65% compared to state-of-the-art models. These results show that our technique is feasible for autonomous perception in both indoor and outdoor environments, enabling precise border refinement with minimal effect on inference speed.
comment: 10 pages, 5 figures, 4 tables
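For readers unfamiliar with the mIoU metric reported above, the following sketch shows one common way to compute it from flattened per-pixel labels. The function name and the choice to skip classes absent from both prediction and ground truth are assumptions; the paper's exact evaluation protocol is not stated in the abstract.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Class 0: IoU = 1/2, class 1: IoU = 2/3.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```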
Enhancing Social Robots through Resilient AI
As artificial intelligence continues to advance and becomes more integrated into sensitive areas like healthcare, education, and everyday life, it is crucial for these systems to be both resilient and robust. This paper shows that resilience is a fundamental characteristic of social robots because it underpins trust in the robot itself, an essential element especially when operating with elderly people, who often have low trust in these systems. Resilience is defined here as the ability to operate under adverse or stressful conditions, even when degraded or weakened, while maintaining essential operational capabilities.
comment: 8 pages, Workshop on Adaptive Social Interaction based on user's Mental mOdels and behaVior in HRI, The 17th International Conference on Social Robotics, 10-12 September 2025, Naples (IT)
PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis
Interactive world models that simulate object dynamics are crucial for robotics, VR, and AR. However, it remains a significant challenge to learn physics-consistent dynamics models from limited real-world video data, especially for deformable objects with spatially-varying physical properties. To overcome the challenge of data scarcity, we propose PhysWorld, a novel framework that utilizes a simulator to synthesize physically plausible and diverse demonstrations to learn efficient world models. Specifically, we first construct a physics-consistent digital twin within an MPM simulator via constitutive model selection and global-to-local optimization of physical properties. Subsequently, we apply part-aware perturbations to the physical properties and generate various motion patterns for the digital twin, synthesizing extensive and diverse demonstrations. Finally, using these demonstrations, we train a lightweight GNN-based world model that is embedded with physical properties. The real video can be used to further refine the physical properties. PhysWorld achieves accurate and fast future predictions for various deformable objects, and also generalizes well to novel interactions. Experiments show that PhysWorld has competitive performance while enabling inference speeds 47 times faster than the recent state-of-the-art method, i.e., PhysTwin.
comment: 17 pages, 5 figures
PREVENT: Proactive Risk Evaluation and Vigilant Execution of Tasks for Mobile Robotic Chemists using Multi-Modal Behavior Trees
Mobile robotic chemists are a fast-growing trend in the field of chemistry and materials research. However, so far these mobile robots lack workflow awareness skills. This poses the risk that even a small anomaly, such as an improperly capped sample vial, could disrupt the entire workflow. This wastes time and resources, and could pose risks to human researchers, such as exposure to toxic materials. Existing perception mechanisms can be used to predict anomalies, but they often generate excessive false positives. This may halt workflow execution unnecessarily, requiring researchers to intervene and resume the workflow when no problem actually exists, negating the benefits of autonomous operation. To address this problem, we propose PREVENT, a system comprising navigation and manipulation skills based on a multimodal Behavior Tree (BT) approach that can be integrated into existing software architectures with minimal modifications. Our approach involves a hierarchical perception mechanism that exploits AI techniques and sensory feedback through Dexterous Vision and Navigational Vision cameras and an IoT gas sensor module for execution-related decision-making. Experimental evaluations show that the proposed approach is comparatively efficient and completely avoids both false negatives and false positives when tested in simulated risk scenarios within our robotic chemistry workflow. The results also show that the proposed multi-modal perception skills achieved deployment accuracies that were higher than the average of the corresponding uni-modal skills, both for navigation and for manipulation.
comment: 25 pages, 8 figures, paper submitted to Robotics and Autonomous Systems Journal
Load-bearing Assessment for Safe Locomotion of Quadruped Robots on Collapsing Terrain
Collapsing terrains, often present in search and rescue missions or planetary exploration, pose significant challenges for quadruped robots. This paper introduces a robust locomotion framework for safe navigation over unstable surfaces by integrating terrain probing, load-bearing analysis, motion planning, and control strategies. Unlike traditional methods that rely on specialized sensors or external terrain mapping alone, our approach leverages joint measurements to assess terrain stability without hardware modifications. A Model Predictive Control (MPC) system optimizes robot motion, balancing stability and probing constraints, while a state machine coordinates terrain probing actions, enabling the robot to detect collapsible regions and dynamically adjust its footholds. Experimental results on custom-made collapsing platforms and rocky terrains demonstrate the framework's ability to traverse collapsing terrain while maintaining stability and prioritizing safety.
Remote Autonomy for Multiple Small Lowcost UAVs in GNSS-denied Search and Rescue Operations
In recent years, consumer-grade UAVs have been widely adopted by first responders. In general, they are operated manually, which requires trained pilots, especially in unknown GNSS-denied environments and in the vicinity of structures. Autonomous flight can facilitate the application of UAVs and reduce operator strain. However, autonomous systems usually require special programming interfaces, custom sensor setups, and strong onboard computers, which limits a broader deployment. We present a system for autonomous flight using lightweight consumer-grade DJI drones. They are controlled by an Android app for state estimation and obstacle avoidance directly running on the UAV's remote control. Our ground control station enables a single operator to configure and supervise multiple heterogeneous UAVs at once. Furthermore, it combines the observations of all UAVs into a joint 3D environment model for improved situational awareness.
comment: Accepted final version. IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Galway, Ireland, 2025
Towards Reliable Code-as-Policies: A Neuro-Symbolic Framework for Embodied Task Planning NeurIPS 2025
Recent advances in large language models (LLMs) have enabled the automatic generation of executable code for task planning and control in embodied agents such as robots, demonstrating the potential of LLM-based embodied intelligence. However, these LLM-based code-as-policies approaches often suffer from limited environmental grounding, particularly in dynamic or partially observable settings, leading to suboptimal task success rates due to incorrect or incomplete code generation. In this work, we propose a neuro-symbolic embodied task planning framework that incorporates explicit symbolic verification and interactive validation processes during code generation. In the validation phase, the framework generates exploratory code that actively interacts with the environment to acquire missing observations while preserving task-relevant states. This integrated process enhances the grounding of generated code, resulting in improved task reliability and success rates in complex environments. We evaluate our framework on RLBench and in real-world settings across dynamic, partially observable scenarios. Experimental results demonstrate that our framework improves task success rates by 46.2% over Code-as-Policies baselines and attains over 86.8% executability of task-relevant actions, thereby enhancing the reliability of task planning in dynamic environments.
comment: Accepted at NeurIPS 2025 Spotlight
Track-to-Track Association for Collective Perception based on Stochastic Optimization
Collective perception is a key aspect of autonomous driving in smart cities, as it aims to combine the local environment models of multiple intelligent vehicles in order to overcome sensor limitations. A crucial part of multi-sensor fusion is track-to-track association. Previous works often suffer from high computational complexity or are based on heuristics. We propose an association algorithm based on stochastic optimization, which leverages a multidimensional likelihood incorporating the number of tracks and their spatial distribution, and furthermore computes several association hypotheses. We demonstrate the effectiveness of our approach in Monte Carlo simulations and a realistic collective perception scenario, computing high-likelihood associations in ambiguous settings.
Underwater Visual-Inertial-Acoustic-Depth SLAM with DVL Preintegration for Degraded Environments
Visual degradation caused by limited visibility, insufficient lighting, and feature scarcity in underwater environments presents significant challenges to visual-inertial simultaneous localization and mapping (SLAM) systems. To address these challenges, this paper proposes a graph-based visual-inertial-acoustic-depth SLAM system that integrates a stereo camera, an inertial measurement unit (IMU), the Doppler velocity log (DVL), and a pressure sensor. The key innovation lies in the tight integration of four distinct sensor modalities to ensure reliable operation, even under degraded visual conditions. To mitigate DVL drift and improve measurement efficiency, we propose a novel velocity-bias-based DVL preintegration strategy. At the frontend, hybrid tracking strategies and acoustic-inertial-depth joint optimization enhance system stability. Additionally, multi-source hybrid residuals are incorporated into a graph optimization framework. Extensive quantitative and qualitative analyses of the proposed system are conducted in both simulated and real-world underwater scenarios. The results demonstrate that our approach outperforms current state-of-the-art stereo visual-inertial SLAM systems in both stability and localization accuracy, exhibiting exceptional robustness, particularly in visually challenging environments.
comment: 10 pages, 10 figures
An Agnostic End-Effector Alignment Controller for Robust Assembly of Modular Space Robots
Modular robots offer reconfigurability and fault tolerance essential for lunar missions, but require controllers that adapt safely to real-world disturbances. We build on our previous hardware-agnostic actuator synchronization in Motion Stack to develop a new controller enforcing adaptive velocity bounds via a dynamic hypersphere clamp. Using only real-time end-effector and target pose measurements, the controller adjusts its translational and rotational speed limits to ensure smooth, stable alignment without abrupt motions. We implemented two variants, a discrete, step-based version and a continuous, velocity-based version, and tested them on two MoonBot limbs in JAXA's lunar environment simulator. Field trials demonstrate that the step-based variant produces highly predictable, low-wobble motions, while the continuous variant converges more quickly and maintains millimeter-level positional accuracy, and both remain robust across limbs with differing mechanical imperfections and sensing noise (e.g., backlash and flex). These results highlight the flexibility and robustness of our robot-agnostic framework for autonomous self-assembly and reconfiguration under harsh conditions.
comment: 6 pages, 12 figures. Accepted at iSparo 2025 | Video: https://youtu.be/BW0YgSrvuDo
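The abstract above does not give the formula for the "dynamic hypersphere clamp", so the following is only a hedged sketch of one plausible reading: a commanded velocity vector is rescaled onto a sphere whose radius (the speed limit) shrinks as the end-effector approaches the target. The linear limit law, the parameter values, and both function names are assumptions, not the authors' implementation.

```python
import math


def speed_limit(distance, v_max=0.05, gain=0.5):
    """Hypothetical limit law: the allowed speed shrinks linearly near the target."""
    return min(v_max, gain * distance)


def hypersphere_clamp(v, radius):
    """Rescale vector v so its Euclidean norm does not exceed radius.

    The direction is preserved; vectors already inside the sphere are unchanged.
    """
    norm = math.sqrt(sum(c * c for c in v))
    if norm <= radius or norm == 0.0:
        return list(v)
    scale = radius / norm
    return [c * scale for c in v]


if __name__ == "__main__":
    # 2 cm from the target, the limit drops to 1 cm/s in this toy setup.
    limit = speed_limit(distance=0.02)
    print(hypersphere_clamp([0.03, 0.04, 0.0], limit))
```

Clamping the norm rather than each axis independently keeps the motion direction intact, which is one reason a spherical bound is a natural choice for alignment tasks.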
Generalizable Hierarchical Skill Learning via Object-Centric Representation
We present Generalizable Hierarchical Skill Learning (GSL), a novel framework for hierarchical policy learning that significantly improves policy generalization and sample efficiency in robot manipulation. One core idea of GSL is to use object-centric skills as an interface that bridges the high-level vision-language model and the low-level visual-motor policy. Specifically, GSL decomposes demonstrations into transferable and object-canonicalized skill primitives using foundation models, ensuring efficient low-level skill learning in the object frame. At test time, the skill-object pairs predicted by the high-level agent are fed to the low-level module, where the inferred canonical actions are mapped back to the world frame for execution. This structured yet flexible design leads to substantial improvements in sample efficiency and generalization of our method across unseen spatial arrangements, object appearances, and task compositions. In simulation, GSL trained with only 3 demonstrations per task outperforms baselines trained with 30 times more data by 15.5 percent on unseen tasks. In real-world experiments, GSL also surpasses the baseline trained with 10 times more data.
ESCORT: Efficient Stein-variational and Sliced Consistency-Optimized Temporal Belief Representation for POMDPs NeurIPS'25
In Partially Observable Markov Decision Processes (POMDPs), maintaining and updating belief distributions over possible underlying states provides a principled way to summarize action-observation history for effective decision-making under uncertainty. As environments grow more realistic, belief distributions develop complexity that standard mathematical models cannot accurately capture, creating a fundamental challenge in maintaining representational accuracy. Despite advances in deep learning and probabilistic modeling, existing POMDP belief approximation methods fail to accurately represent complex uncertainty structures such as high-dimensional, multi-modal belief distributions, resulting in estimation errors that lead to suboptimal agent behaviors. To address this challenge, we present ESCORT (Efficient Stein-variational and sliced Consistency-Optimized Representation for Temporal beliefs), a particle-based framework for capturing complex, multi-modal distributions in high-dimensional belief spaces. ESCORT extends SVGD with two key innovations: correlation-aware projections that model dependencies between state dimensions, and temporal consistency constraints that stabilize updates while preserving correlation structures. This approach retains SVGD's attractive-repulsive particle dynamics while enabling accurate modeling of intricate correlation patterns. Unlike particle filters prone to degeneracy or parametric methods with fixed representational capacity, ESCORT dynamically adapts to belief landscape complexity without resampling or restrictive distributional assumptions. We demonstrate ESCORT's effectiveness through extensive evaluations on both POMDP domains and synthetic multi-modal distributions of varying dimensionality, where it consistently outperforms state-of-the-art methods in terms of belief approximation accuracy and downstream decision quality.
comment: Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS'25). Code will be available at https://github.com/scope-lab-vu/ESCORT
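As background for the attractive-repulsive particle dynamics that ESCORT is said to retain, the sketch below implements plain one-dimensional Stein variational gradient descent (SVGD) with an RBF kernel against a standard normal target. It does not include ESCORT's correlation-aware projections or temporal consistency constraints. The kernel bandwidth, step size, target, and function names are all assumptions for illustration.

```python
import math


def rbf(a, b, h):
    """RBF kernel k(a, b) = exp(-(a - b)^2 / (2 h^2))."""
    return math.exp(-((a - b) ** 2) / (2.0 * h * h))


def svgd_step(particles, grad_log_p, step=0.1, h=1.0):
    """One SVGD update for scalar particles."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = rbf(xj, xi, h)
            # Attractive term: kernel-weighted score pulls toward high density.
            phi += k * grad_log_p(xj)
            # Repulsive term: derivative of k(xj, xi) w.r.t. xj pushes xi away from xj.
            phi += -(xj - xi) / (h * h) * k
        updated.append(xi + step * phi / n)
    return updated


def run_svgd(particles, iterations=500):
    """Move particles toward a standard normal target, whose score is -x."""
    for _ in range(iterations):
        particles = svgd_step(particles, lambda x: -x)
    return particles


if __name__ == "__main__":
    print(run_svgd([2.0, 2.5, 3.0, 3.5]))
```

Because every particle is kept and moved deterministically, there is no resampling step, which is the property the abstract contrasts with particle filters.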
Revisiting Replanning from Scratch: Real-Time Incremental Planning with Fast Almost-Surely Asymptotically Optimal Planners ICRA
Robots operating in changing environments must predict obstacle changes and/or plan quickly enough to react to them. Predictive approaches require a strong prior about the position and motion of obstacles. Reactive approaches require no assumptions about their environment but must replan quickly and find high-quality paths to navigate effectively. Reactive approaches often reuse information between queries to reduce planning cost. These techniques are conceptually sound, but updating dense planning graphs when information changes can be computationally prohibitive. It can also require significant effort to detect the changes in some applications. This paper revisits the long-held assumption that reactive replanning requires updating existing plans. It shows that the incremental planning problem can alternatively be solved more efficiently as a series of independent problems using fast almost-surely asymptotically optimal (ASAO) planning algorithms. These ASAO algorithms quickly find an initial solution and converge towards an optimal solution, which allows them to find consistent global plans in the presence of changing obstacles without requiring explicit plan reuse. This is demonstrated with simulated experiments where Effort Informed Trees (EIT*) finds shorter median solution paths than the tested reactive planning algorithms and is further validated using Asymptotically Optimal RRT-Connect (AORRTC) on a real-world planning problem on a robot arm.
comment: Submitted to IEEE International Conference on Robotics and Automation (ICRA) 2026, 8 pages, 5 figures, 1 table. A video of this work can be found at https://www.youtube.com/watch?v=XaZrFy8wGZs
ZING-3D: Zero-shot Incremental 3D Scene Graphs via Vision-Language Models
Understanding and reasoning about complex 3D environments requires structured scene representations that capture not only objects but also their semantic and spatial relationships. While recent works on 3D scene graph generation have leveraged pretrained VLMs without task-specific fine-tuning, they are largely confined to single-view settings, fail to support incremental updates as new observations arrive, and lack explicit geometric grounding in 3D space, all of which are essential for embodied scenarios. In this paper, we propose ZING-3D, a framework that leverages the vast knowledge of pretrained foundation models to enable open-vocabulary recognition and generate a rich semantic representation of the scene in a zero-shot manner while also enabling incremental updates and geometric grounding in 3D space, making it suitable for downstream robotics applications. Our approach leverages VLM reasoning to generate a rich 2D scene graph, which is grounded in 3D using depth information. Nodes represent open-vocabulary objects with features, 3D locations, and semantic context, while edges capture spatial and semantic relations with inter-object distances. Our experiments on scenes from the Replica and HM3D datasets show that ZING-3D is effective at capturing spatial and relational knowledge without the need for task-specific training.
Estimation of Minimum Stride Frequency for the Frontal Plane Stability of Bipedal Systems
Stability of bipedal systems in the frontal plane is affected by the hip offset, to the extent that adjusting stride time using feedforward retraction and extension of the legs can lead to stable oscillations without feedback control. This feedforward stabilization can be leveraged to reduce control effort and energy expenditure and to increase locomotion robustness. However, there is limited understanding of how key parameters, such as mass, stiffness, leg length, and hip width, affect stability and the minimum stride frequency needed to maintain it. This study aims to address these gaps by analyzing how individual model parameters and the system's natural frequency influence the minimum stride frequency required to maintain a stable cycle. We propose a method to predict the minimum stride frequency, and compare the predicted stride frequencies with actual values for randomly generated models. The findings of this work provide a better understanding of the frontal plane stability mechanisms and how feedforward stabilization can be leveraged to reduce the control effort.
Two-Steps Diffusion Policy for Robotic Manipulation via Genetic Denoising
Diffusion models, such as diffusion policy, have achieved state-of-the-art results in robotic manipulation by imitating expert demonstrations. While diffusion models were originally developed for vision tasks like image and video generation, many of their inference strategies have been directly transferred to control domains without adaptation. In this work, we show that by tailoring the denoising process to the specific characteristics of embodied AI tasks -- particularly the structured, low-dimensional nature of action distributions -- diffusion policies can operate effectively with as few as 5 neural function evaluations (NFE). Building on this insight, we propose a population-based sampling strategy, genetic denoising, which enhances both performance and stability by selecting denoising trajectories with low out-of-distribution risk. Our method solves challenging tasks with only 2 NFE while improving or matching performance. We evaluate our approach across 14 robotic manipulation tasks from D4RL and Robomimic, spanning multiple action horizons and inference budgets. In over 2 million evaluations, our method consistently outperforms standard diffusion-based policies, achieving up to 20% performance gains with significantly fewer inference steps.
comment: 16 pages, 11 figures, 2 tables, accepted at NeurIPS 2025
Real-Time Gait Adaptation for Quadrupeds using Model Predictive Control and Reinforcement Learning
Model-free reinforcement learning (RL) has enabled adaptable and agile quadruped locomotion; however, policies often converge to a single gait, leading to suboptimal performance. Traditionally, Model Predictive Control (MPC) has been extensively used to obtain task-specific optimal policies but lacks the ability to adapt to varying environments. To address these limitations, we propose an optimization framework for real-time gait adaptation in a continuous gait space, combining the Model Predictive Path Integral (MPPI) algorithm with a Dreamer module to produce adaptive and optimal policies for quadruped locomotion. At each time step, MPPI jointly optimizes the actions and gait variables using a learned Dreamer reward that promotes velocity tracking, energy efficiency, stability, and smooth transitions, while penalizing abrupt gait changes. A learned value function is incorporated as a terminal reward, extending the formulation to an infinite-horizon planner. We evaluate our framework in simulation on the Unitree Go1, demonstrating an average reduction of up to 36.48% in energy consumption across varying target speeds, while maintaining accurate tracking and adaptive, task-appropriate gaits.
comment: 7 pages
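The core of an MPPI update, as used in the entry above, is an exponentially weighted average of sampled perturbations. The sketch below shows that step in isolation under generic assumptions: costs are supplied by the caller, whereas the paper scores rollouts with a learned Dreamer reward and terminal value. The function names, the temperature parameter, and the toy numbers are hypothetical.

```python
import math


def mppi_weights(costs, temperature=1.0):
    """Softmin weights over rollout costs; subtracting the minimum avoids underflow."""
    c_min = min(costs)
    unnorm = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def mppi_update(nominal, perturbations, costs, temperature=1.0):
    """Shift the nominal control sequence by the cost-weighted mean perturbation."""
    weights = mppi_weights(costs, temperature)
    new = list(nominal)
    for w, eps in zip(weights, perturbations):
        for i, e in enumerate(eps):
            new[i] += w * e
    return new


if __name__ == "__main__":
    # Three sampled perturbations of a one-step control; the first is cheapest.
    print(mppi_update([0.0], [[1.0], [0.0], [-1.0]], [1.0, 2.0, 3.0]))
```

In the setting described in the abstract, each entry of the nominal sequence would hold both action and gait variables, so a single weighted update adapts them jointly.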
Intrinsic Goals for Autonomous Agents: Model-Based Exploration in Virtual Zebrafish Predicts Ethological Behavior and Whole-Brain Dynamics
Autonomy is a hallmark of animal intelligence, enabling adaptive and intelligent behavior in complex environments without relying on external reward or task structure. Existing reinforcement learning approaches to exploration in reward-free environments, including a class of methods known as model-based intrinsic motivation, exhibit inconsistent exploration patterns and do not converge to an exploratory policy, thus failing to capture robust autonomous behaviors observed in animals. Moreover, systems neuroscience has largely overlooked the neural basis of autonomy, focusing instead on experimental paradigms where animals are motivated by external reward rather than engaging in ethological, naturalistic and task-independent behavior. To bridge these gaps, we introduce a novel model-based intrinsic drive explicitly designed after the principles of autonomous exploration in animals. Our method (3M-Progress) achieves animal-like exploration by tracking divergence between an online world model and a fixed prior learned from an ecological niche. To the best of our knowledge, we introduce the first autonomous embodied agent that predicts brain data entirely from self-supervised optimization of an intrinsic goal -- without any behavioral or neural training data -- demonstrating that 3M-Progress agents capture the explainable variance in behavioral patterns and whole-brain neural-glial dynamics recorded from autonomously behaving larval zebrafish, thereby providing the first goal-driven, population-level model of neural-glial computation. Our findings establish a computational framework connecting model-based intrinsic motivation to naturalistic behavior, providing a foundation for building artificial agents with animal-like autonomy.
comment: 17 pages, 7 figures
SimuRA: A World-Model-Driven Simulative Reasoning Architecture for General Goal-Oriented Agents
AI agents built on foundation models hold enormous promise. Current practice, however, focuses on a one-task-one-agent approach, which not only falls short of scalability and generality, but also faces practical limitations from black-box autoregressive reasoning, where decisions unfold token by token without explicit simulation or counterfactual evaluation of outcomes. Humans, on the other hand, reason and plan by mentally simulating the consequences of actions within an internal model of the world -- a capability that supports flexible, goal-directed behavior across diverse contexts. Moving towards a more general and powerful AI agent, we introduce SimuRA, a goal-oriented architecture for generalized agentic reasoning. Based on a principled formulation of an optimal agent in any general environment, SimuRA addresses the limitations of black-box autoregressive reasoning by incorporating the world model for planning via simulation. Our prototype world model is implemented using LLMs as a substrate, leveraging the natural language as a discrete, hierarchical representation grounded in concepts for planning, while remaining model-agnostic. On complex web-browsing tasks such as flight search, SimuRA improves the success rate from 0% to 32.2% compared to a representative open-web agent baseline. Across tasks, world-model-based planning achieves up to 124% higher task completion rates than a matched black-box autoregressive baseline, demonstrating the advantages of simulative reasoning. We release ReasonerAgent-Web, a web-browsing agent built on SimuRA, as an open-source research demo.
comment: This submission has been updated to adjust the scope and presentation of the work
High-Precision Climbing Robot Localization Using Planar Array UWB/GPS/IMU/Barometer Integration
To address the need for high-precision localization of climbing robots in complex high-altitude environments, this paper proposes a multi-sensor fusion system that overcomes the limitations of single-sensor approaches. First, the localization scenarios and the problem model are analyzed. An integrated architecture based on an Attention Mechanism-based Fusion Algorithm (AMFA), incorporating planar array Ultra-Wideband (UWB), GPS, an Inertial Measurement Unit (IMU), and a barometer, is designed to handle challenges such as GPS occlusion and the UWB Non-Line-of-Sight (NLOS) problem. Then, end-to-end neural network inference models for UWB and the barometer are developed, along with a multimodal attention mechanism for adaptive data fusion. An Unscented Kalman Filter (UKF) is applied to refine the trajectory, improving accuracy and robustness. Finally, real-world experiments show that the method achieves a localization accuracy of 0.48 m and a lower maximum error of 1.50 m, outperforming baseline algorithms such as GPS/INS-EKF and demonstrating stronger robustness.
Reinforcement Learning with Action Chunking NeurIPS 2025
We present Q-chunking, a simple yet effective recipe for improving reinforcement learning (RL) algorithms for long-horizon, sparse-reward tasks. Our recipe is designed for the offline-to-online RL setting, where the goal is to leverage an offline prior dataset to maximize the sample-efficiency of online learning. Effective exploration and sample-efficient learning remain central challenges in this setting, as it is not obvious how the offline data should be utilized to acquire a good exploratory policy. Our key insight is that action chunking, a technique popularized in imitation learning where sequences of future actions are predicted rather than a single action at each timestep, can be applied to temporal difference (TD)-based RL methods to mitigate the exploration challenge. Q-chunking adopts action chunking by directly running RL in a 'chunked' action space, enabling the agent to (1) leverage temporally consistent behaviors from offline data for more effective online exploration and (2) use unbiased $n$-step backups for more stable and efficient TD learning. Our experimental results demonstrate that Q-chunking exhibits strong offline performance and online sample efficiency, outperforming prior best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.
comment: The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS 2025); 36 pages, 17 figures
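The $n$-step backup over an action chunk mentioned in the Q-chunking abstract can be illustrated with a minimal sketch. It assumes the usual discounted form of an $n$-step target, where the rewards collected while executing one chunk are summed and the value at the state reached after the chunk is used for bootstrapping. The function name `chunked_td_target` and its arguments are hypothetical and are not taken from the paper.

```python
def chunked_td_target(rewards, gamma, next_q):
    """Illustrative n-step TD target for one action chunk (assumed form).

    rewards: rewards observed at each step while the chunk was executed
    gamma:   per-step discount factor
    next_q:  bootstrap value estimate at the state reached after the chunk
    """
    h = len(rewards)  # chunk length, which is also the backup horizon
    # Discounted sum of the rewards collected inside the chunk.
    discounted = sum((gamma ** i) * r for i, r in enumerate(rewards))
    # Bootstrap from the value at the end of the chunk, discounted by gamma^h.
    return discounted + (gamma ** h) * next_q


# Example: a chunk of three steps with discount 0.5.
target = chunked_td_target([1.0, 0.0, 2.0], gamma=0.5, next_q=4.0)
print(target)
```

Because the critic in this sketch scores the whole chunk that was actually executed, the multi-step return needs no off-policy correction, which is one plausible reading of why the abstract calls the backup unbiased.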
Trust-Aware Assistance Seeking in Human-Supervised Autonomy
Our goal is to model and experimentally assess trust evolution to predict future beliefs and behaviors of human-robot teams in dynamic environments. Research suggests that maintaining trust among team members in a human-robot team is vital for successful team performance. Research also suggests that trust is a multi-dimensional and latent entity that relates to past experiences and future actions in a complex manner. Employing a human-robot collaborative task, we design an optimal assistance-seeking strategy for the robot using a POMDP framework. In the task, the human supervises an autonomous mobile manipulator collecting objects in an environment. The supervisor's task is to ensure that the robot safely executes its task. The robot can either attempt to collect the object or seek human assistance. The human supervisor actively monitors the robot's activities, offering assistance upon request, and intervening if they perceive the robot may fail. In this setting, human trust is the hidden state, and the primary objective is to optimize team performance. We execute two sets of human-robot interaction experiments. The data from the first experiment are used to estimate POMDP parameters, which are used to compute an optimal assistance-seeking policy evaluated in the second experiment. The estimated POMDP parameters reveal that, for most participants, human intervention is more probable when trust is low, particularly in high-complexity tasks. Our estimates suggest that the robot's action of asking for assistance in high-complexity tasks can positively impact human trust. Our experimental results show that the proposed trust-aware policy is better than an optimal trust-agnostic policy. By comparing model estimates of human trust, obtained using only behavioral data, with the collected self-reported trust values, we show that model estimates are isomorphic to self-reported responses.
LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
Vision-Language-Action (VLA) models report impressive success rates on robotic manipulation benchmarks, yet these results may mask fundamental weaknesses in robustness. We perform a systematic vulnerability analysis by introducing controlled perturbations across seven dimensions: object layout, camera viewpoints, robot initial states, language instructions, light conditions, background textures and sensor noise. We comprehensively analyzed multiple state-of-the-art models and revealed consistent brittleness beneath apparent competence. Our analysis exposes critical weaknesses: models exhibit extreme sensitivity to perturbation factors, including camera viewpoints and robot initial states, with performance dropping from 95% to below 30% under modest perturbations. Surprisingly, models are largely insensitive to language variations, with further experiments revealing that models tend to ignore language instructions completely. Our findings challenge the assumption that high benchmark scores equate to true competency and highlight the need for evaluation practices that assess reliability under realistic variation.
Visual Cues Enhance Predictive Turn-Taking for Two-Party Human Interaction ACL 2025
Turn-taking is richly multimodal. Predictive turn-taking models (PTTMs) facilitate naturalistic human-robot interaction, yet most rely solely on speech. We introduce MM-VAP, a multimodal PTTM which combines speech with visual cues including facial expression, head pose and gaze. We find that it outperforms the state-of-the-art audio-only model in videoconferencing interactions (84% vs. 79% hold/shift prediction accuracy). Unlike prior work which aggregates all holds and shifts, we group by duration of silence between turns. This reveals that through the inclusion of visual features, MM-VAP outperforms a state-of-the-art audio-only turn-taking model across all durations of speaker transitions. We conduct a detailed ablation study, which reveals that facial expression features contribute the most to model performance. Thus, our working hypothesis is that when interlocutors can see one another, visual cues are vital for turn-taking and must therefore be included for accurate turn-taking prediction. We additionally validate the suitability of automatic speech alignment for PTTM training using telephone speech. This work represents the first comprehensive analysis of multimodal PTTMs. We discuss implications for future work and make all code publicly available.
comment: Accepted to ACL 2025, Findings of the Association for Computational Linguistics
RESample: A Robust Data Augmentation Framework via Exploratory Sampling for Robotic Manipulation ICRA2026
Vision-Language-Action models (VLAs) have demonstrated remarkable performance on complex robotic manipulation tasks through imitation learning. However, existing imitation learning datasets contain only successful trajectories and lack failure or recovery data, especially for out-of-distribution (OOD) states where the robot deviates from the main policy due to minor perturbations or errors, leading VLA models to struggle with states deviating from the training distribution. To this end, we propose an automated OOD data augmentation framework named RESample through exploratory sampling. Specifically, we first leverage offline reinforcement learning to obtain an action-value network that accurately identifies sub-optimal actions under the current manipulation policy. We further sample potential OOD states from trajectories via rollout, and design an exploratory sampling mechanism that adaptively incorporates these action proxies into the training dataset to ensure efficiency. Subsequently, our framework explicitly encourages the VLAs to recover from OOD states and enhances their robustness against distributional shifts. We conduct extensive experiments on the LIBERO benchmark as well as real-world robotic manipulation tasks, demonstrating that RESample consistently improves the stability and generalization ability of VLA models.
comment: 9 pages, 7 figures, submitted to ICRA2026
BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning NeurIPS 2025
We present the B-spline Encoded Action Sequence Tokenizer (BEAST), a novel action tokenizer that encodes action sequences into compact discrete or continuous tokens using B-splines. In contrast to existing action tokenizers based on vector quantization or byte pair encoding, BEAST requires no separate tokenizer training and consistently produces tokens of uniform length, enabling fast action sequence generation via parallel decoding. Leveraging our B-spline formulation, BEAST inherently generates smooth trajectories without discontinuities between adjacent segments. We extensively evaluate BEAST by integrating it with three distinct model architectures: a Variational Autoencoder (VAE) with continuous tokens, a decoder-only Transformer with discrete tokens, and Florence-2, a pretrained Vision-Language Model with an encoder-decoder architecture, demonstrating BEAST's compatibility and scalability with large pretrained models. We evaluate BEAST across three established benchmarks consisting of 166 simulated tasks and on three distinct robot settings with a total of 8 real-world tasks. Experimental results demonstrate that BEAST (i) significantly reduces both training and inference computational costs, (ii) consistently generates smooth, high-frequency control signals suitable for continuous control tasks, and (iii) reliably achieves competitive task success rates compared to state-of-the-art methods.
comment: Accepted by NeurIPS 2025
Mix Q-learning for Lane Changing: A Collaborative Decision-Making Method in Multi-Agent Deep Reinforcement Learning
Lane-changing decisions, which are crucial for autonomous vehicle path planning, face practical challenges due to rule-based constraints and limited data. Deep reinforcement learning has become a major research focus due to its advantages in data acquisition and interpretability. However, current models often overlook collaboration, which not only impacts overall traffic efficiency but also hinders the vehicle's own normal driving in the long run. To address this issue, this paper proposes a method named Mix Q-learning for Lane Changing (MQLC) that integrates a hybrid value Q network, taking into account both collective and individual benefits for the greater good. At the collective level, our method coordinates the individual Q and global Q networks by utilizing global information. This enables agents to effectively balance their individual interests with the collective benefit. At the individual level, we integrate a deep learning-based intent recognition module into our observation and enhance the decision network. These changes provide agents with richer decision information and more accurate feature extraction for improved lane-changing decisions. This strategy enables the multi-agent system to learn and formulate optimal decision-making strategies effectively. In extensive experiments, our MQLC model outperforms other state-of-the-art multi-agent decision-making methods, achieving significantly safer and faster lane-changing decisions. The code is available at https://github.com/pku-smart-city/source_code/tree/main/MQLC.
DeltaFlow: An Efficient Multi-frame Scene Flow Estimation Method NeurIPS 2025
Previous dominant methods for scene flow estimation focus mainly on input from two consecutive frames, neglecting valuable information in the temporal domain. While recent trends shift towards multi-frame reasoning, they suffer from rapidly escalating computational costs as the number of frames grows. To leverage temporal information more efficiently, we propose DeltaFlow ($\Delta$Flow), a lightweight 3D framework that captures motion cues via a $\Delta$ scheme, extracting temporal features with minimal computational cost, regardless of the number of frames. Additionally, scene flow estimation faces challenges such as imbalanced object class distributions and motion inconsistency. To tackle these issues, we introduce a Category-Balanced Loss to enhance learning across underrepresented classes and an Instance Consistency Loss to enforce coherent object motion, improving flow accuracy. Extensive evaluations on the Argoverse 2, Waymo and nuScenes datasets show that $\Delta$Flow achieves state-of-the-art performance with up to 22% lower error and $2\times$ faster inference compared to the next-best multi-frame supervised method, while also demonstrating a strong cross-domain generalization ability. The code is open-sourced at https://github.com/Kin-Zhang/DeltaFlow along with trained model weights.
comment: NeurIPS 2025 Spotlight, 18 pages (10 main pages + 8 supp material), 11 figures, code at https://github.com/Kin-Zhang/DeltaFlow
Augmenting Neural Networks-Based Model Approximators in Robotic Force-Tracking Tasks
As robotics gains popularity, interaction control becomes crucial for ensuring force tracking in manipulator-based tasks. Typically, traditional interaction controllers either require extensive tuning, or demand expert knowledge of the environment, which is often impractical in real-world applications. This work proposes a novel control strategy leveraging Neural Networks (NNs) to enhance the force-tracking behavior of a Direct Force Controller (DFC). Unlike similar previous approaches, it accounts for the manipulator's tangential velocity, a critical factor in force exertion, especially during fast motions. The method employs an ensemble of feedforward NNs to predict contact forces, then exploits the prediction to solve an optimization problem and generate an optimal residual action, which is added to the DFC output and applied to an impedance controller. The proposed Velocity-augmented Artificial intelligence Interaction Controller for Ambiguous Models (VAICAM) is validated in the Gazebo simulator on a Franka Emika Panda robot. Against a vast set of trajectories, VAICAM achieves superior performance compared to two baseline controllers.
comment: In Proceedings of the 22nd International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO, 394-401, 2025, Marbella, Spain
MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention
Aligning robot behavior with human preferences is crucial for deploying embodied AI agents in human-centered environments. A promising solution is interactive imitation learning from human intervention, where a human expert observes the policy's execution and provides interventions as feedback. However, existing methods often fail to utilize the prior policy efficiently to facilitate learning, thus hindering sample efficiency. In this work, we introduce MEReQ (Maximum-Entropy Residual-Q Inverse Reinforcement Learning), designed for sample-efficient alignment from human intervention. Instead of inferring the complete human behavior characteristics, MEReQ infers a residual reward function that captures the discrepancy between the human expert's and the prior policy's underlying reward functions. It then employs Residual Q-Learning (RQL) to align the policy with human preferences using this residual reward function. Extensive evaluations on simulated and real-world tasks demonstrate that MEReQ achieves sample-efficient policy alignment from human intervention.
Grasp2Grasp: Vision-Based Dexterous Grasp Translation via Schrödinger Bridges NeurIPS 2025
We propose a new approach to vision-based dexterous grasp translation, which aims to transfer grasp intent across robotic hands with differing morphologies. Given a visual observation of a source hand grasping an object, our goal is to synthesize a functionally equivalent grasp for a target hand without requiring paired demonstrations or hand-specific simulations. We frame this problem as a stochastic transport between grasp distributions using the Schrödinger Bridge formalism. Our method learns to map between source and target latent grasp spaces via score and flow matching, conditioned on visual observations. To guide this translation, we introduce physics-informed cost functions that encode alignment in base pose, contact maps, wrench space, and manipulability. Experiments across diverse hand-object pairs demonstrate our approach generates stable, physically grounded grasps with strong generalization. This work enables semantic grasp transfer for heterogeneous manipulators and bridges vision-based grasping with probabilistic generative modeling. Additional details at https://grasp2grasp.github.io/
comment: Accepted at NeurIPS 2025
LightPlanner: Unleashing the Reasoning Capabilities of Lightweight Large Language Models in Task Planning IROS 2025
In recent years, lightweight large language models (LLMs) have garnered significant attention in the robotics field due to their low computational resource requirements and suitability for edge deployment. However, in task planning -- particularly for complex tasks that involve dynamic semantic logic reasoning -- lightweight LLMs have underperformed. To address this limitation, we propose a novel task planner, LightPlanner, which enhances the performance of lightweight LLMs in complex task planning by fully leveraging their reasoning capabilities. Unlike conventional planners that use fixed skill templates, LightPlanner controls robot actions via parameterized function calls, dynamically generating parameter values. This approach allows for fine-grained skill control and improves task planning success rates in complex scenarios. Furthermore, we introduce hierarchical deep reasoning. Before generating each action decision step, LightPlanner thoroughly considers three levels: action execution (feedback verification), semantic parsing (goal consistency verification), and parameter generation (parameter validity verification). This ensures the correctness of subsequent action controls. Additionally, we incorporate a memory module to store historical actions, thereby reducing context length and enhancing planning efficiency for long-term tasks. We train the LightPlanner-1.5B model on our LightPlan-40k dataset, which comprises 40,000 action controls across tasks with 2 to 13 action steps. Experiments demonstrate that our model achieves the highest task success rate despite having the smallest number of parameters. In tasks involving spatial semantic reasoning, the success rate exceeds that of ReAct by 14.9 percent. Moreover, we demonstrate LightPlanner's potential to operate on edge devices.
comment: The 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
Rectified Point Flow: Generic Point Cloud Pose Estimation NeurIPS 2025
We introduce Rectified Point Flow, a unified parameterization that formulates pairwise point cloud registration and multi-part shape assembly as a single conditional generative problem. Given unposed point clouds, our method learns a continuous point-wise velocity field that transports noisy points toward their target positions, from which part poses are recovered. In contrast to prior work that regresses part-wise poses with ad-hoc symmetry handling, our method intrinsically learns assembly symmetries without symmetry labels. Together with a self-supervised encoder focused on overlapping points, our method achieves a new state-of-the-art performance on six benchmarks spanning pairwise registration and shape assembly. Notably, our unified formulation enables effective joint training on diverse datasets, facilitating the learning of shared geometric priors and consequently boosting accuracy. Project page: https://rectified-pointflow.github.io/.
comment: NeurIPS 2025 Camera-ready. Project page: https://rectified-pointflow.github.io/
DERD-Net: Learning Depth from Event-based Ray Densities NeurIPS 2025
Event cameras offer a promising avenue for multi-view stereo depth estimation and Simultaneous Localization And Mapping (SLAM) due to their ability to detect blur-free 3D edges at high speed and over broad illumination conditions. However, traditional deep learning frameworks designed for conventional cameras struggle with the asynchronous, stream-like nature of event data, as their architectures are optimized for discrete, image-like inputs. We propose a scalable, flexible and adaptable framework for pixel-wise depth estimation with event cameras in both monocular and stereo setups. The 3D scene structure is encoded into disparity space images (DSIs), representing spatial densities of rays obtained by back-projecting events into space via known camera poses. Our neural network processes local subregions of the DSIs, combining 3D convolutions and a recurrent structure to recognize valuable patterns for depth prediction. Local processing enables fast inference with full parallelization and ensures constant ultra-low model complexity and memory costs, regardless of camera resolution. Experiments on standard benchmarks (MVSEC and DSEC datasets) demonstrate unprecedented effectiveness: (i) using purely monocular data, our method achieves comparable results to existing stereo methods; (ii) when applied to stereo data, it strongly outperforms all state-of-the-art (SOTA) approaches, reducing the mean absolute error by at least 42%; (iii) our method also allows for increases in depth completeness by more than 3-fold while still yielding a reduction in median absolute error of at least 30%. Given its remarkable performance and effective processing of event data, our framework holds strong potential to become a standard approach for using deep learning for event-based depth estimation and SLAM. Project page: https://github.com/tub-rip/DERD-Net
comment: 17 pages, 3 figures, 15 tables. Project page: https://github.com/tub-rip/DERD-Net. 39th Conference on Neural Information Processing Systems (NeurIPS), San Diego, 2025
Multiagent Systems
ColorEcosystem: Powering Personalized, Standardized, and Trustworthy Agentic Service in massive-agent Ecosystem
With the rapid development of (multimodal) large language model-based agents, the landscape of agentic service management has evolved from single-agent systems to multi-agent systems, and now to massive-agent ecosystems. Current massive-agent ecosystems face growing challenges, including impersonal service experiences, a lack of standardization, and untrustworthy behavior. To address these issues, we propose ColorEcosystem, a novel blueprint designed to enable personalized, standardized, and trustworthy agentic service at scale. Concretely, ColorEcosystem consists of three key components: agent carrier, agent store, and agent audit. The agent carrier provides personalized service experiences by utilizing user-specific data and creating a digital twin, while the agent store serves as a centralized, standardized platform for managing diverse agentic services. The agent audit, based on the supervision of developer and user activities, ensures the integrity and credibility of both service providers and users. Through the analysis of challenges, transitional forms, and practical considerations, the ColorEcosystem is poised to power personalized, standardized, and trustworthy agentic service across massive-agent ecosystems. Meanwhile, we have also implemented part of ColorEcosystem's functionality, and the relevant code is open-sourced at https://github.com/opas-lab/color-ecosystem.
Scalable Neural Incentive Design with Parameterized Mean-Field Approximation NeurIPS 2025
Designing incentives for a multi-agent system to induce a desirable Nash equilibrium is both a crucial and challenging problem appearing in many decision-making domains, especially for a large number of agents $N$. Under the exchangeability assumption, we formalize this incentive design (ID) problem as a parameterized mean-field game (PMFG), aiming to reduce complexity via an infinite-population limit. We first show that when dynamics and rewards are Lipschitz, the finite-$N$ ID objective is approximated by the PMFG at rate $\mathscr{O}(\frac{1}{\sqrt{N}})$. Moreover, beyond the Lipschitz-continuous setting, we prove the same $\mathscr{O}(\frac{1}{\sqrt{N}})$ decay for the important special case of sequential auctions, despite discontinuities in dynamics, through a tailored auction-specific analysis. Built on our novel approximation results, we further introduce our Adjoint Mean-Field Incentive Design (AMID) algorithm, which uses explicit differentiation of iterated equilibrium operators to compute gradients efficiently. By uniting approximation bounds with optimization guarantees, AMID delivers a powerful, scalable algorithmic tool for many-agent (large $N$) ID. Across diverse auction settings, the proposed AMID method substantially increases revenue over first-price formats and outperforms existing benchmark methods.
comment: 52 pages, to appear at NeurIPS 2025
HIKMA: Human-Inspired Knowledge by Machine Agents through a Multi-Agent Framework for Semi-Autonomous Scientific Conferences
The HIKMA Semi-Autonomous Conference is the first experiment in reimagining scholarly communication through an end-to-end integration of artificial intelligence into the academic publishing and presentation pipeline. This paper presents the design, implementation, and evaluation of the HIKMA framework, which includes AI dataset curation, AI-based manuscript generation, AI-assisted peer review, AI-driven revision, AI conference presentation, and AI archival dissemination. By combining language models, structured research workflows, and domain safeguards, HIKMA shows how AI can support, not replace, traditional scholarly practices while maintaining intellectual property protection, transparency, and integrity. The conference functions as a testbed and proof of concept, providing insights into the opportunities and challenges of AI-enabled scholarship. It also examines questions about AI authorship, accountability, and the role of human-AI collaboration in research.
CXRAgent: Director-Orchestrated Multi-Stage Reasoning for Chest X-Ray Interpretation
Chest X-ray (CXR) plays a pivotal role in clinical diagnosis, and a variety of task-specific and foundation models have been developed for automatic CXR interpretation. However, these models often struggle to adapt to new diagnostic tasks and complex reasoning scenarios. Recently, LLM-based agent models have emerged as a promising paradigm for CXR analysis, enhancing the model's capability through tool coordination, multi-step reasoning, and team collaboration. However, existing agents often rely on a single diagnostic pipeline and lack mechanisms for assessing tools' reliability, limiting their adaptability and credibility. To this end, we propose CXRAgent, a director-orchestrated, multi-stage agent for CXR interpretation, where a central director coordinates the following stages: (1) Tool Invocation: The agent strategically orchestrates a set of CXR-analysis tools, with outputs normalized and verified by the Evidence-driven Validator (EDV), which grounds diagnostic outputs with visual evidence to support reliable downstream diagnosis; (2) Diagnostic Planning: Guided by task requirements and intermediate findings, the agent formulates a targeted diagnostic plan. It then assembles an expert team accordingly, defining member roles and coordinating their interactions to enable adaptive and collaborative reasoning; (3) Collaborative Decision-making: The agent integrates insights from the expert team with accumulated contextual memories, synthesizing them into an evidence-backed diagnostic conclusion. Experiments on various CXR interpretation tasks show that CXRAgent delivers strong performance, provides visual evidence, and generalizes well to clinical tasks of different complexity. Code and data are available at https://github.com/laojiahuo2003/CXRAgent/.
comment: 10 pages, 4 figures, 7 Tables
Shift Bribery over Social Networks
In shift bribery, a briber seeks to promote his preferred candidate by paying voters to raise their ranking. Classical models of shift bribery assume voters act independently, overlooking the role of social influence. However, in reality, individuals are social beings and are often represented as part of a social network, where bribed voters may influence their neighbors, thereby amplifying the effect of persuasion. We study Shift Bribery over Networks, where voters are modeled as nodes in a directed weighted graph, and arcs represent social influence between them. In this setting, bribery is not confined to directly targeted voters; its effects can propagate through the network, influencing neighbors and amplifying persuasion. Given a budget and individual cost functions for shifting each voter's preference toward a designated candidate, the goal is to determine whether a shift strategy exists within budget that ensures the preferred candidate wins after both direct and network-propagated influence take effect. We show that the problem is NP-complete even with two candidates and unit costs, and W[2]-hard when parameterized by budget or maximum degree. On the positive side, we design polynomial-time algorithms for complete graphs under plurality and majority rules and for path graphs with uniform edge weights, linear-time algorithms for transitive tournaments with two candidates, linear cost functions and uniform arc weights, and pseudo-polynomial algorithms for cluster graphs. We further prove the existence of fixed-parameter tractable algorithms with treewidth as parameter for two candidates, linear cost functions and uniform arc weights, and pseudo-FPT algorithms with cluster vertex deletion number as parameter for two candidates and uniform arc weights. Together, these results give a detailed complexity landscape for shift bribery in social networks.
Central Bank Digital Currency, Flight-to-Quality, and Bank-Runs in an Agent-Based Model
We analyse financial stability and welfare impacts associated with the introduction of a Central Bank Digital Currency (CBDC) in a macroeconomic agent-based model. The model considers firms, banks, and households interacting on labour, goods, credit, and interbank markets. Households move their liquidity from deposits to CBDC based on the perceived riskiness of their banks. We find that the introduction of CBDC exacerbates bank-runs and may lead to financial instability phenomena. The effect can be changed by introducing a limit on CBDC holdings. The adoption of CBDC has little effect on macroeconomic variables but the interest rate on loans to firms goes up and credit goes down in a limited way. CBDC leads to a redistribution of wealth from firms and banks to households with a higher bank default rate. CBDC may have negative welfare effects, but a bound on holding enables a welfare improvement.
Evaluation of A Spatial Microsimulation Framework for Small-Area Estimation of Population Health Outcomes Using the Behavioral Risk Factor Surveillance System
This study introduces the Spatial Health and Population Estimator (SHAPE), a spatial microsimulation framework that applies hierarchical iterative proportional fitting (IPF) to estimate two health risk behaviors and eleven health outcomes across multiple spatial scales. SHAPE was evaluated using county-level direct estimates from the Behavioral Risk Factor Surveillance System (BRFSS) and both county and census tract level data from CDC PLACES for New York (2021) and Florida (2019). Results show that SHAPE's small-area estimates (SAEs) are moderately consistent with BRFSS (average Pearson's correlation coefficient r of about 0.5), similar to CDC PLACES (average r of about 0.6), and are strongly aligned with CDC PLACES model-based estimates at both county (average r of about 0.8) and census tract (average r of about 0.7) levels. SHAPE is an open, reproducible, and transparent framework programmed in R that meets a need for accessible SAE methods in public health.
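For readers unfamiliar with iterative proportional fitting, the basic idea can be sketched as below. This is a plain two-way IPF on a small seed table, written in Python for illustration only; it is not SHAPE's hierarchical variant and not its R implementation, and the function name `ipf` and the toy numbers are assumptions.

```python
def ipf(seed, row_targets, col_targets, iters=100, tol=1e-9):
    """Plain two-way iterative proportional fitting (illustrative sketch).

    seed:        2D list of positive starting weights (e.g. survey cross-tab)
    row_targets: desired row totals (e.g. area population by age group)
    col_targets: desired column totals (e.g. area population by sex)
    """
    table = [row[:] for row in seed]  # work on a copy of the seed
    for _ in range(iters):
        # Scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        # Scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Stop once the row totals are also matched after the column step.
        err = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if err < tol:
            break
    return table


# Toy example: both sets of margins sum to the same total (100).
fitted = ipf([[1.0, 2.0], [3.0, 4.0]], row_targets=[40.0, 60.0], col_targets=[30.0, 70.0])
for row in fitted:
    print([round(v, 3) for v in row])
```

The fitted table keeps the interaction structure of the seed while matching both sets of margins, which is the property that makes IPF useful for reweighting survey records to small-area totals.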
LLM-augmented empirical game theoretic simulation for social-ecological systems
Designing institutions for social-ecological systems requires models that capture heterogeneity, uncertainty, and strategic interaction. Multiple modeling approaches have emerged to meet this challenge, including empirical game-theoretic analysis (EGTA), which merges the scale and diversity of agent-based models (ABMs) with the formal equilibrium analysis of game-theoretic models. The newly popular class of LLM-driven simulations provides yet another approach, and it is not clear how these approaches can be integrated with one another, nor whether the resulting simulations produce a plausible range of behaviours for real-world social-ecological governance. To address this gap, we compare four LLM-augmented frameworks: procedural ABMs, generative ABMs, LLM-EGTA, and expert-guided LLM-EGTA, and evaluate them on a real-world case study of irrigation and fishing in the Amu Darya basin under centralized and decentralized governance. Our results show: first, procedural ABMs, generative ABMs, and LLM-augmented EGTA models produce strikingly different patterns of collective behaviour, highlighting the value of methodological diversity. Second, inducing behaviour through system prompts in LLMs is less effective than shaping behaviour through parameterized payoffs in an expert-guided EGTA-based model.
ColorAgent: Building A Robust, Personalized, and Interactive OS Agent
With the advancements in hardware, software, and large language model technologies, the interaction between humans and operating systems has evolved from the command-line interface to the rapidly emerging AI agent interactions. Building an operating system (OS) agent capable of executing user instructions and faithfully following user desires is becoming a reality. In this technical report, we present ColorAgent, an OS agent designed to engage in long-horizon, robust interactions with the environment while also enabling personalized and proactive user interaction. To enable long-horizon interactions with the environment, we enhance the model's capabilities through step-wise reinforcement learning and self-evolving training, while also developing a tailored multi-agent framework that ensures generality, consistency, and robustness. In terms of user interaction, we explore personalized user intent recognition and proactive engagement, positioning the OS agent not merely as an automation tool but as a warm, collaborative partner. We evaluate ColorAgent on the AndroidWorld and AndroidLab benchmarks, achieving success rates of 77.2% and 50.7%, respectively, establishing a new state of the art. Nonetheless, we note that current benchmarks are insufficient for a comprehensive evaluation of OS agents and propose further exploring directions in future work, particularly in the areas of evaluation paradigms, agent collaboration, and security.
Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning AAAI 2025
Designing efficient algorithms for multi-agent reinforcement learning (MARL) is fundamentally challenging because the size of the joint state and action spaces grows exponentially in the number of agents. These difficulties are exacerbated when balancing sequential global decision-making with local agent interactions. In this work, we propose a new algorithm $\texttt{SUBSAMPLE-MFQ}$ ($\textbf{Subsample}$-$\textbf{M}$ean-$\textbf{F}$ield-$\textbf{Q}$-learning) and a decentralized randomized policy for a system with $n$ agents. For any $k\leq n$, our algorithm learns a policy for the system in time polynomial in $k$. We prove that this learned policy converges to the optimal policy on the order of $\tilde{O}(1/\sqrt{k})$ as the number of subsampled agents $k$ increases. In particular, this bound is independent of the number of agents $n$.
comment: 53 pages. AAAI 2025 MARW Best Paper Award. Accepted at NeurIPS 2025 (spotlight)
Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective NIPS'25
World models have recently attracted growing interest in Multi-Agent Reinforcement Learning (MARL) due to their ability to improve sample efficiency for policy learning. However, accurately modeling environments in MARL is challenging due to the exponentially large joint action space and highly uncertain dynamics inherent in multi-agent systems. To address this, we reduce modeling complexity by shifting from jointly modeling the entire state-action transition dynamics to focusing on the state space alone at each timestep through sequential agent modeling. Specifically, our approach enables the model to progressively resolve uncertainty while capturing the structured dependencies among agents, providing a more accurate representation of how agents influence the state. Interestingly, this sequential revelation of agents' actions in a multi-agent system aligns with the reverse process in diffusion models--a class of powerful generative models known for their expressiveness and training stability compared to autoregressive or latent variable models. Leveraging this insight, we develop a flexible and robust world model for MARL using diffusion models. Our method, Diffusion-Inspired Multi-Agent world model (DIMA), achieves state-of-the-art performance across multiple multi-agent control benchmarks, significantly outperforming prior world models in terms of final return and sample efficiency, including MAMuJoCo and Bi-DexHands. DIMA establishes a new paradigm for constructing multi-agent world models, advancing the frontier of MARL research. Codes are open-sourced at https://github.com/breez3young/DIMA.
comment: Accepted at NIPS'25
Semantic knowledge guides innovation and drives cultural evolution
Cultural evolution allows ideas and technology to build over generations, a process reaching its most complex and open-ended form in humans. While social learning enables the transmission of such innovations, the cognitive processes that generate innovations remain unclear. We propose that semantic knowledge, the associations linking concepts to their properties and functions, guides human innovation and drives cumulative culture. To test this, we combined an agent-based model, which examines how semantic knowledge shapes cultural evolutionary dynamics, with a large-scale behavioural experiment (N = 1,243) testing its role in human innovation. Semantic knowledge directed exploration toward meaningful solutions and interacted synergistically with social learning to amplify innovation and cultural evolution. Participants lacking access to semantic knowledge performed no better than chance, even when social information was available, and relied on shallow exploration strategies for innovation. Together, these findings indicate that semantic knowledge is a key cognitive process enabling human cumulative culture.
Simulating Society Requires Simulating Thought NeurIPS 2025
Simulating society with large language models (LLMs), we argue, requires more than generating plausible behavior; it demands cognitively grounded reasoning that is structured, revisable, and traceable. LLM-based agents are increasingly used to emulate individual and group behavior, primarily through prompting and supervised fine-tuning. Yet current simulations remain grounded in a behaviorist "demographics in, behavior out" paradigm, focusing on surface-level plausibility. As a result, they often lack internal coherence, causal reasoning, and belief traceability, making them unreliable for modeling how people reason, deliberate, and respond to interventions. To address this, we present a conceptual modeling paradigm, Generative Minds (GenMinds), which draws from cognitive science to support structured belief representations in generative agents. To evaluate such agents, we introduce the RECAP (REconstructing CAusal Paths) framework, a benchmark designed to assess reasoning fidelity via causal traceability, demographic grounding, and intervention consistency. These contributions advance a broader shift: from surface-level mimicry to generative agents that simulate thought, not just language, for social simulations.
comment: NeurIPS 2025 (Position Paper Track)
Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning tasks EMNLP 2025
Large Language Models (LLMs), particularly smaller variants, still struggle with complex reasoning tasks. While inference-time prompting can guide reasoning, existing methods often rely on sequential queries. Ensemble approaches offer a promising path to performance gains, especially given recent batch inference speed-ups. This work introduces DIPPER, a novel, training-free framework that transforms a single LLM into an effective inference-time ensemble. By feeding the model an optimized and diverse set of prompts in parallel, DIPPER elicits varied reasoning paths, leading to performance gains. We empirically demonstrate significant improvements on reasoning benchmarks, such as MATH, where a DIPPER ensemble of three Qwen2-MATH-1.5B instances (via parallel prompting of a single model) outperforms a larger 7B model.
comment: Accepted to EMNLP 2025 Main Conference
Systems and Control (CS)
Rate-cost tradeoffs in continuous-time control with a biomolecular application
This paper focuses on rate-limited control of the generalized Ornstein-Uhlenbeck process where the control action can be either multiplicative or additive, and the noise variance can depend on the control action. We derive a lower bound on the data rate necessary to achieve the desired control cost. The lower bound is attained with equality if the control is performed via an additive white Gaussian channel. The system model approximates the dynamics of a discrete-state molecular birth-death process, and the result has direct implications on the control of a biomolecular system via chemical reactions, where the multiplicative control corresponds to the degradation rate, the additive control corresponds to the production rate, and the control objective is to decrease the fluctuations of the controlled molecular species around their desired concentration levels.
System-Theoretic Analysis of Dynamic Generalized Nash Equilibrium Problems -- Turnpikes and Dissipativity
Generalized Nash equilibria are used in multi-agent control applications to model strategic interactions between agents that are coupled in the cost, dynamics, and constraints. We study the properties of open-loop GNE trajectories from a system-theoretic perspective. We show how strict dissipativity generates the turnpike phenomenon in GNE solutions. Moreover, we establish a converse turnpike result, i.e., the implication from turnpike to strict dissipativity. We derive conditions under which the steady-state GNE is the optimal operating point and, using a game value function, we give a local characterization of the geometry of storage functions. Finally, we design linear terminal penalties that ensure GNE open-loop trajectories converge to and remain at the steady-state GNE. These connections provide the foundation for future system-theoretic analysis of GNEs similar to those existing in optimal control.
Auction-Based Responsibility Allocation for Scalable Decentralized Safety Filters in Cooperative Multi-Agent Collision Avoidance
This paper proposes a scalable decentralized safety filter for multi-agent systems based on high-order control barrier functions (HOCBFs) and auction-based responsibility allocation. While decentralized HOCBF formulations ensure pairwise safety under input bounds, they face feasibility and scalability challenges as the number of agents grows. Each agent must evaluate an increasing number of pairwise constraints, raising the risk of infeasibility and making it difficult to meet real-time requirements. To address this, we introduce an auction-based allocation scheme that distributes constraint enforcement asymmetrically among neighbors based on local control effort estimates. The resulting directed responsibility graph guarantees full safety coverage while reducing redundant constraints and per-agent computational load. Simulation results confirm safe and efficient coordination across a range of network sizes and interaction densities.
comment: 6 pages, 3 figures, Submitted to Control Engineering Practice and IFAC World Congress 2026
Analysis and Synthesis of Switched Optimization Algorithms
Deployment of optimization algorithms on networked systems faces challenges associated with time delays and corruptions. One particular instance is the presence of time-varying delays arising from factors such as packet drops and irregular sampling. Fixed time delays can destabilize gradient descent algorithms, and this degradation is exacerbated by time-varying delays. This work concentrates on the analysis and creation of discrete-time optimization algorithms with certified exponential convergence rates that are robust against switched uncertainties between the optimizer and the gradient oracle. These optimization algorithms are implemented by switch-scheduled output feedback controllers. Rate variation and sawtooth behavior (packet drops) in time-varying delays can be imposed through constraining switching sequences. Analysis is accomplished by bisection in the convergence rate to find Zames-Falb filter coefficients. Synthesis is performed by alternating between a filter coefficient search for a fixed controller, and a controller search for fixed multipliers.
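The claim that a fixed delay can destabilize gradient descent can be seen on a toy problem. The sketch below is only an illustration under stated assumptions: a scalar quadratic f(x) = 0.5 * curvature * x^2, a hypothetical helper `delayed_gradient_descent`, and hand-picked step size and delay. It does not reproduce the switched algorithms or the Zames-Falb analysis of the abstract above.

```python
def delayed_gradient_descent(curvature, step, delay, iters=60, x0=1.0):
    """Iterate x[k+1] = x[k] - step * f'(x[k - delay]).

    Assumption: f(x) = 0.5 * curvature * x**2, so f'(x) = curvature * x.
    The first delay + 1 iterates are all set to x0.
    """
    history = [x0] * (delay + 1)
    for _ in range(iters):
        stale = history[-1 - delay]  # iterate the gradient oracle actually sees
        history.append(history[-1] - step * curvature * stale)
    return history


# Same step size, with and without a one-step delay in the gradient.
no_delay = delayed_gradient_descent(curvature=1.0, step=1.5, delay=0)
one_step = delayed_gradient_descent(curvature=1.0, step=1.5, delay=1)
```

Without delay the update is x[k+1] = -0.5 x[k], which contracts. With a one-step delay the recursion x[k+1] = x[k] - 1.5 x[k-1] has characteristic roots of modulus sqrt(1.5) > 1, so the iterates oscillate with growing amplitude.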
Robust Regret Control with Uncertainty-Dependent Baseline
This paper proposes a robust regret control framework in which the performance baseline adapts to the realization of system uncertainty. The plant is modeled as a discrete-time, uncertain linear time-invariant system with real-parametric uncertainty. The performance baseline is the optimal non-causal controller constructed with full knowledge of the disturbance and the specific realization of the uncertain plant. We show that a controller achieves robust additive regret relative to this baseline if and only if it satisfies a related, robust $H_\infty$ performance condition on a modified plant. One technical issue is that the modified plant can, in general, have a complicated nonlinear dependence on the uncertainty. We use a linear approximation step so that the robust additive regret condition can be recast as a standard $\mu$-synthesis problem. A numerical example is used to demonstrate the proposed approach.
Predictive control barrier functions for piecewise affine systems with non-smooth constraints
Obtaining control barrier functions (CBFs) with large safe sets for complex nonlinear systems and constraints is a challenging task. Predictive CBFs address this issue by using an online finite-horizon optimal control problem that implicitly defines a large safe set. The optimal control problem, also known as the predictive safety filter (PSF), involves predicting the system's flow under a given backup control policy. However, for non-smooth systems and constraints, some key elements, such as CBF gradients and the sensitivity of the flow, are not well-defined, making the current methods inadequate for ensuring safety. Additionally, for control-non-affine systems, the PSF is generally nonlinear and non-convex, posing challenges for real-time computation. This paper considers piecewise affine systems, which are usually control-non-affine, under nonlinear state and polyhedral input constraints. We solve the safety issue by incorporating set-valued generalized Clarke derivatives in the PSF design. We show that enforcing CBF constraints across all elements of the generalized Clarke derivatives suffices to guarantee safety. Moreover, to lighten the computational overhead, we propose an explicit approximation of the PSF. The resulting control methods are demonstrated through numerical examples.
Data-driven Koopman MPC using Mixed Stochastic-Deterministic Tubes
This paper presents a novel data-driven stochastic MPC design for discrete-time nonlinear systems with additive disturbances by leveraging the Koopman operator and a distributionally robust optimization (DRO) framework. By lifting the dynamical system into a linear space, we achieve a finite-dimensional approximation of the Koopman operator. We explicitly account for the modeling approximation and additive disturbance error by a mixed stochastic-deterministic tube for the lifted linear model. This ensures the regulation of the original nonlinear system while complying with the prespecified constraints. Stochastic and deterministic tubes are constructed using a DRO and a hyper-cube hull, respectively. We provide finite sample error bounds for both types of tubes. The effectiveness of the proposed approach is demonstrated through numerical simulations.
comment: This is the accepted version. It will appear in Journal of Process Control, 2025
The PhasorArray Toolbox for Harmonic Analysis and Control Design
We present a MATLAB package called the PhasorArray Toolbox that has been developed to make harmonic analysis and control methods both practical and user-friendly. The toolbox adopts an object-oriented architecture that enables intuitive manipulation of periodic matrices through overloaded operators for addition, multiplication, convolution, and automatic Toeplitz construction. Its advanced features include harmonic Sylvester, Lyapunov, and Riccati equation solvers, and seamless integration with YALMIP, thereby facilitating advanced control and analysis techniques based on Linear Matrix Inequalities (LMIs) in the harmonic framework.
Physics-Informed Neural Networks for MIMO Beam Map and Environment Reconstruction
As communication networks evolve towards greater complexity (e.g., 6G and beyond), a deep understanding of the wireless environment becomes increasingly crucial. When explicit knowledge of the environment is unavailable, geometry-aware feature extraction from channel state information (CSI) emerges as a pivotal methodology to bridge physical-layer measurements with network intelligence. This paper proposes to explore the received signal strength (RSS) data, without explicit 3D environment knowledge, to jointly construct the radio beam map and environmental geometry for a multiple-input multiple-output (MIMO) system. Unlike existing methods that only learn blockage structures, we propose an oriented virtual obstacle model that captures the geometric features of both blockage and reflection. Reflective zones are formulated to identify relevant reflected paths according to the geometry relation of the environment. We derive an analytical expression for the reflective zone and further analyze its geometric characteristics to develop a reformulation that is more compatible with deep learning representations. A physics-informed deep learning framework that incorporates the reflective-zone-based geometry model is proposed to learn the blockage, reflection, and scattering components, along with the beam pattern, which leverages physics prior knowledge to enhance network transferability. Numerical experiments demonstrate that, in addition to reconstructing the blockage and reflection geometry, the proposed model can construct a more accurate MIMO beam map with a 32%-48% accuracy improvement.
The Role of Information Incompleteness in Defending Against Stealth Attacks
The effectiveness of Data Injection Attacks (DIAs) critically depends on the completeness of the system information accessible to adversaries. This relationship positions information incompleteness enhancement as a vital defense strategy for degrading DIA performance. In this paper, we focus on information-theoretic stealth attacks, where the attacker encounters a fundamental tradeoff between the attack stealthiness and destructiveness. Specifically, we systematically characterize how incomplete admittance information impacts the dual objectives. In particular, we establish sufficient conditions for two distinct operational regimes: (i) stealthiness intensifies while destructive potential diminishes and (ii) destructiveness increases while stealth capability weakens. For scenarios beyond these regimes, we propose a maximal incompleteness strategy to optimally degrade stealth capability. To solve the associated optimization problem, the feasible region is reduced without excluding the optimal solution, and a heuristic algorithm is then introduced to effectively identify the near-optimal solutions within the reduced region. Numerical simulations are conducted on IEEE test systems to validate the findings.
Green Hydrogen under Uncertainty: Evaluating Power-to-X Strategies Using Agent-Based Simulation and Multi-Criteria Decision Framework
The transition toward net-zero energy systems requires scalable and cost-effective deployment of Power-to-X technologies, particularly green hydrogen production. Despite increasing investments, a critical research gap remains in dynamically assessing how different operational strategies affect the feasibility of hydrogen production under real-world energy market conditions. Most existing studies rely on static, techno-economic models and overlook actor interactions, infrastructure limitations, and regulatory complexity. This paper presents a novel modeling framework that integrates agent-based simulation with multi-criteria decision-making to evaluate green hydrogen production strategies using co-located wind and solar generation. Three operational strategies - grid-only, on-site-only, and hybrid - are applied across three electrolyzer capacity levels (10 MW, 50 MW, and 100 MW) within a Danish case study. Real electricity tariffs, emissions factors, and market data are used to simulate technical, economic, and environmental performance indicators. The results show that hybrid strategies consistently outperform grid-only configurations in terms of cost and emissions while maintaining stable hydrogen output. Although on-site-only strategies minimize emissions and costs, they fail to meet fixed production demands. This framework offers novel scientific contributions by modeling dynamic actor interactions and integrating system performance evaluation into strategic planning. Practically, it provides actionable insights for energy planners and policymakers designing resilient and efficient Power-to-X systems in renewable-rich contexts.
The local Gaussian correlation networks among return tails in the Chinese stock market
Financial networks based on Pearson correlations have been intensively studied. However, previous studies may have led to misleading and catastrophic results because of several critical shortcomings of the Pearson correlation. The local Gaussian correlation coefficient, a new measurement of statistical dependence between variables, has unique advantages including capturing local nonlinear dependence and handling heavy-tailed distributions. This study constructs financial networks using the local Gaussian correlation coefficients between tail regions of stock returns in the Shanghai Stock Exchange. The work systematically analyzes fundamental network metrics including node centrality, average shortest path length, and entropy. Compared with the local Gaussian correlation network among positive tails and the conventional Pearson correlation network, the properties of the local Gaussian correlation network among negative tails are more sensitive to the stock market risks. This finding suggests researchers should prioritize the local Gaussian correlation network among negative tails. Future work should reevaluate existing findings using the local Gaussian correlation method.
Environment-Dependent Components Identification of Behind-the-Meter Resources via Inverse Optimization
With the increasing penetration of behind-the-meter (BTM) resources, it is vital to monitor the components of these resources and deduce their response behavior to the external environment. Owing to data privacy, however, appliance-wise measurements are invisible to the power system operator, which hinders accurate load identification. To this end, this paper proposes a hybrid physics-inspired and data-driven framework for decomposing BTM components based on external measurement of total load and environmental factors. The total load is decomposed into different environment-dependent components, namely a storage-like component, a PV generation component, a thermostatically controlled load component, and a periodic component. The overall load identification adopts a double-layer iterative solution framework. A data-driven inverse optimization algorithm is developed to identify parameters of the energy storage-like component. The physics-inspired model is proposed to identify the capacity and response of the remaining components. The modeling accuracy and robustness of the proposed method are validated by numerical tests. The application significance of the proposed BTM identification method is also validated in electricity market clearing for reducing system operation costs.
High-Performance Rotor Cooling with Ducted Liquid in Completely Cold-Formed Modular Motor Shaft
This paper suggests a novel rotor-cooling shaft concept for high-performance electric motors that increases the effectiveness of cooling and is yet simple and cost-effective to manufacture. We investigate the thermal performance of four shaft geometries for rotor cooling in automotive applications. The proposed tooth-guided liquid-cooling shaft design aims to solve the high churning loss of conventional cooled rotor shafts due to internal vortex formation and their still limited heat transfer. Therefore, we optimize heat transfer efficiency and pressure management by incorporating cold-formed internal channels that restrict vortex formation beyond a degree that improves heat transfer. We evaluated key performance metrics, including heat transfer rate, outlet temperature, pressure drop, and velocity profiles, under varying rotational speeds, inlet flow rates, and coolant temperatures. Computational fluid analysis demonstrates that the tooth-guided design outperforms conventional hollow shafts and achieves up to 110% higher cooling efficiency at low rotational speeds, while it maintains comparable pressure levels. These findings provide practical insight into geometry-driven thermal optimization and offer a path toward improving the performance and durability of electric motors.
comment: 11 pages, 21 figures
Control of neural field equations with step-function inputs
Wilson-Cowan and Amari-type models capture nonlinear neural population dynamics, providing a fundamental framework for modeling how sensory and other exogenous inputs shape activity in neural tissue. We study the controllability properties of Amari-type neural fields subject to piecewise/constant-in-time inputs. The model describes the time evolution of the polarization of neural tissue within a spatial continuum, with synaptic interactions represented by a convolution kernel. We study the synthesis of piecewise/constant-in-time inputs to achieve two-point boundary-type control objectives, namely, steering neural activity from an initial state to a prescribed target state. This approach is particularly relevant for predicting the emergence of paradoxical neural representations, such as discordant visual illusions that occur in response to overt sensory stimuli. We first present a control synthesis based on the Banach fixed-point theorem, which yields an iterative construction of a constant-in-time input under minimal regularity assumptions on the kernel and transfer function; however, it exhibits practical limitations, even in the linear case. To overcome these challenges, we then develop a generic synthesis framework based on the flow of neural dynamics drift, enabling explicit piecewise constant and constant-in-time inputs. Extensive numerical results in one and two spatial dimensions confirm the effectiveness of the proposed syntheses and demonstrate their superior performance compared to inputs derived from naive linearization at the initial or target states when these states are not equilibria of the drift dynamics. By providing a mathematically rigorous framework for controlling Amari-type neural fields, this work advances our understanding of nonlinear neural population control with potential applications in computational neuroscience, psychophysics, and neurostimulation.
A Hybrid GNN-LSE Method for Fast, Robust, and Physically-Consistent AC Power Flow
Conventional AC Power Flow (ACPF) solvers like Newton-Raphson (NR) face significant computational and convergence challenges in modern, large-scale power systems. This paper proposes a novel, two-stage hybrid method that integrates a Physics-Informed Graph Neural Network (GNN) with a robust, iterative Linear State Estimation (LSE) refinement step to produce fast and physically-consistent solutions. The GNN, trained with a physics-informed loss function featuring an efficient dynamic weighting scheme, rapidly predicts a high-quality initial system state. This prediction is then refined using an iterative, direct linear solver inspired by state estimation techniques. This LSE refinement step solves a series of linear equations to enforce physical laws, effectively bypassing the non-linearities and convergence issues of traditional solvers. The proposed GNN-LSE framework is comprehensively validated on systems ranging from small radial distribution networks (IEEE 33-bus, 69-bus) to a large, meshed transmission system (IEEE 118-bus). Results show that our GNN variants are up to $8.4 \times 10^3$ times faster than NR. The LSE refinement provides a fast route to a physically-consistent solution, while heavy-loading stress tests (120%-150% of nominal) and N-1 contingencies demonstrate the method's reliability and generalization. This work presents a powerful and flexible framework for bridging fast, data-driven models with the rigorous constraints of power system physics, offering a practical tool for real-time operations and analysis.
Motion Planning with Precedence Specifications via Augmented Graphs of Convex Sets
We present an algorithm for planning trajectories that avoid obstacles and satisfy key-door precedence specifications expressed with a fragment of signal temporal logic. Our method includes a novel exact convex partitioning of the obstacle-free space that encodes connectivity among convex free space sets, key sets, and door sets. We then construct an augmented graph of convex sets that exactly encodes the key-door precedence specifications. By solving a shortest path problem in this augmented graph of convex sets, our pipeline provides an exact solution up to a finite parameterization of the trajectory. To illustrate the effectiveness of our approach, we present a method to generate key-door mazes that provide challenging problem instances, and we perform numerical experiments to evaluate the proposed pipeline. Our pipeline is faster by several orders of magnitude than recent state-of-the-art methods that use general-purpose temporal logic tools.
Pricing Problems in Adoption of New Technologies
We propose a generalization of the Bass diffusion model in discrete-time that explicitly models the effect of price in adoption. Our model is different from earlier price-incorporated models and fits well to adoption data for various products. We then utilize this model to study two decision-making problems. First, we provide a series of structural results on optimal pricing strategies to maximize profits from product sales by a monopolist over a finite horizon. We fully characterize the optimal pricing strategy in the single-period problem, and establish several structural properties of the same for the multi-period counterpart. Second, we study a Stackelberg game between a policy-maker and a monopolist, where the former seeks to maximize adoption through rebates, while the latter focuses on profits. For this problem, we analytically characterize crucial properties of the equilibrium path of the single-period game, and demonstrate how they carry over to the multi-period variant.
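For orientation, the classical discrete-time Bass recursion that the abstract above generalizes can be sketched as follows. This is the standard price-free model only; the abstract does not give the paper's price-dependent form, so none is attempted here. The function name `bass_adoption` and the parameter values (market size, innovation coefficient `p`, imitation coefficient `q`) are hypothetical.

```python
def bass_adoption(market_size, p, q, periods):
    """Return new adopters per period under the classical Bass model.

    Each period, the remaining non-adopters adopt at the rate
    p + q * (cumulative adopters / market size).
    Assumption: no price term; p + q < 1 keeps adoption within the market.
    """
    cumulative = 0.0
    new_adopters = []
    for _ in range(periods):
        hazard = p + q * cumulative / market_size
        n = hazard * (market_size - cumulative)
        new_adopters.append(n)
        cumulative += n
    return new_adopters


# Hypothetical parameters chosen only to show the characteristic hump.
sales = bass_adoption(market_size=1000.0, p=0.03, q=0.38, periods=30)
```

With `q` larger than `p`, per-period adoption first rises through imitation and then falls as the pool of non-adopters shrinks, which is the shape a pricing policy would act on.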
Fixed Horizon Linear Quadratic Covariance Steering in Continuous Time with Hilbert-Schmidt Terminal Cost
We formulate and solve the fixed horizon linear quadratic covariance steering problem in continuous time with a terminal cost measured in Hilbert-Schmidt (i.e., Frobenius) norm error between the desired and the controlled terminal covariances. For this problem, the necessary conditions of optimality become a coupled matrix ODE two-point boundary value problem. To solve this system of equations, we design a matricial recursive algorithm and prove its convergence. The proposed algorithm and its analysis make use of the linear fractional transforms parameterized by the state transition matrix of the associated Hamiltonian matrix. To illustrate the results, we provide two numerical examples: one with a two dimensional and another with a six dimensional state space.
A Perspective on the Algebra, Topology, and Logic of Electrical Networks
This paper presents a unified algebraic, topological, and logical framework for electrical one-port networks based on \v{S}are's $m$-theory. Within this formalism, networks are represented by $m$-words (jorbs) over an ordered alphabet, where series and parallel composition induce an $m$-topology on $m$-graphs with a theta mapping $\vartheta$ that preserves one-port equivalence. The study formalizes quasi-orders, shells, and cores, showing their structural correspondence to network boundary conditions and impedance behavior. The $\lambda--\Delta$ metric, together with the valuation morphism $\Phi$, provides a concise descriptor of the impedance-degree structure. In the computational domain, the framework is extended with algorithmic procedures for generating and classifying non-isomorphic series-parallel topologies, accompanied by programmatic Cauer/Foster synthesis workflows and validation against canonical examples from Ladenheim's catalogue. The resulting approach enables symbolic-to-topological translation of impedance functions, offering a constructive bridge between algebraic representation and electrical realization. Overall, the paper outlines a self-consistent theoretical and computational foundation for automated network synthesis, classification, and formal verification within the emerging field of Jorbology.
Time-causal and time-recursive wavelets
When applying wavelet analysis to real-time temporal signals, where the future cannot be accessed, it is essential to base all the steps in the signal processing pipeline on computational mechanisms that are truly time-causal. This paper describes how a time-causal wavelet analysis can be performed based on concepts developed in the area of temporal scale-space theory, originating from a complete classification of temporal smoothing kernels that guarantee non-creation of new structures from finer to coarser temporal scale levels. By necessity, convolution with truncated exponential kernels in cascade constitutes the only permissible class of kernels, as well as their temporal derivatives as a natural complement to fulfil the admissibility conditions of wavelet representations. For a particular way of choosing the time constants in the resulting infinite convolution of truncated exponential kernels, to ensure temporal scale covariance and thus self-similarity over temporal scales, we describe how mother wavelets can be chosen as temporal derivatives of the resulting time-causal limit kernel. By developing connections between wavelet theory and scale-space theory, we characterize and quantify how the continuous scaling properties transfer to the discrete implementation, demonstrating how the proposed time-causal wavelet representation can reflect the duration of locally dominant temporal structures in the input signals. We propose that this notion of time-causal wavelet analysis could be a valuable tool for signal processing tasks, where streams of signals are to be processed in real time, specifically for signals that may contain local variations over a rich span of temporal scales, or more generally for analysing physical or biophysical temporal phenomena, where a fully time-causal analysis is called for to be physically realistic.
comment: 25 pages, 10 figures
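The cascade of truncated exponential kernels described above can be illustrated with a minimal sketch. It assumes a discrete analogue in which each truncated exponential is replaced by a first-order recursive filter; the function names and the time constants in `mus` are hypothetical choices for illustration, not the paper's implementation or its time-constant selection rule.

```python
import numpy as np

def first_order_recursive(signal, mu):
    """Causal first-order recursive filter with unit DC gain.

    out[t] = out[t-1] + (signal[t] - out[t-1]) / (1 + mu)
    Only the present and past samples are used; mu is an assumed
    time constant in units of samples.
    """
    out = np.empty(len(signal), dtype=float)
    prev = 0.0
    for t, x in enumerate(signal):
        prev = prev + (x - prev) / (1.0 + mu)
        out[t] = prev
    return out

def time_causal_smooth(signal, mus):
    """Apply one first-order filter per time constant, in cascade."""
    out = np.asarray(signal, dtype=float)
    for mu in mus:
        out = first_order_recursive(out, mu)
    return out

def time_causal_derivative(signal, mus):
    """Backward difference of the smoothed signal (a causal derivative proxy)."""
    smoothed = time_causal_smooth(signal, mus)
    return np.diff(smoothed, prepend=0.0)

if __name__ == "__main__":
    mus = [1.0, 2.0, 4.0]  # hypothetical, geometrically spaced time constants
    t = np.arange(300)
    x = np.sin(2 * np.pi * t / 50.0)
    print(time_causal_derivative(x, mus)[:5])
```

Because every stage reads only current and earlier samples, changing a future input sample cannot alter any earlier output, which is the time-causality property the abstract emphasizes.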
Trajectory Optimization for Minimum Threat Exposure using Physics-Informed Neural Networks
We apply a physics-informed neural network (PINN) to solve the two-point boundary value problem (BVP) arising from the necessary conditions postulated by Pontryagin's Minimum Principle for optimal control. Such BVPs are known to be numerically difficult to solve by traditional shooting methods due to extremely high sensitivity to initial guesses. In the light of recent successes in applying PINNs for solving high-dimensional differential equations, we develop a PINN to solve the problem of finding trajectories with minimum exposure to a spatiotemporal threat for a vehicle kinematic model. First, we implement PINNs that are trained to solve the BVP for a given pair of initial and final states for a given threat field. Next, we implement a PINN conditioned on the initial state for a given threat field, which eliminates the need for retraining for each initial state. We demonstrate that the PINN outputs satisfy the necessary conditions with low numerical error.
comment: 2025 Indian Control Conference
Operational Risks in Grid Integration of Large Data Center Loads: Characteristics, Stability Assessments, and Sensitivity Studies
This paper investigates the dynamic interactions between large-scale data centers and the power grid, focusing on reliability challenges arising from sudden fluctuations in demand. With the rapid growth of AI-driven workloads, such fluctuations, along with fast ramp patterns, are expected to exacerbate stressed grid conditions and system instabilities. We consider a few open-source AI data center consumption profiles from the MIT Supercloud datasets and generate a few experimental HPC job-distribution-based inference profiles. Subsequently, we develop analytical methodologies for real-time assessment of grid stability, focusing on both transient and small-signal stability assessments. Energy-flow-like metrics for nonlinear transient stability, formulated by computing localized data center bus kinetic-like flows and coupling interactions with neighboring buses over varying time windows, help provide operators with real-time assessments of the regional grid stress in the data center hubs. On the other hand, small-signal stability metrics, constructed from analytical state matrices under variable operating conditions during a fast ramping period, enable snapshot-based assessments of data center load fluctuations and provide enhanced observability into evolving grid conditions. By quantifying the stability impacts of large data center clusters, studies conducted in the modified IEEE benchmark 68-bus model support improved operator situational awareness to capture risks in reliable integration of large data center loads.
comment: 13 pages, 8 figures, 3 tables
A Novel State-Centric Necessary Condition for Time-Optimal Control of Controllable Linear Systems Based on Augmented Switching Laws (Extended Version)
Most existing necessary conditions for optimal control based on adjoining methods require both state and costate information, yet the unobservability of costates for a given feasible trajectory impedes the determination of optimality in practice. This paper establishes a novel theoretical framework for time-optimal control of controllable linear systems with a single input, proposing the augmented switching law (ASL) that represents the input control and the feasibility in a compact form. Given a feasible trajectory, the perturbed trajectory under the constraints of ASL is guaranteed to be feasible, resulting in a novel state-centric necessary condition without dependence on costate information. A first-order necessary condition is proposed that the Jacobian matrix of the ASL is not of full row rank, which also results in a potential approach to optimizing a given feasible trajectory with the preservation of arc structures. The proposed necessary condition is applied to high-order chain-of-integrator systems with full box constraints, yielding theoretical results that are difficult to derive from costate-based conditions.
comment: This paper has been published in IEEE TAC
Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning AAAI 2025
Designing efficient algorithms for multi-agent reinforcement learning (MARL) is fundamentally challenging because the size of the joint state and action spaces grows exponentially in the number of agents. These difficulties are exacerbated when balancing sequential global decision-making with local agent interactions. In this work, we propose a new algorithm $\texttt{SUBSAMPLE-MFQ}$ ($\textbf{Subsample}$-$\textbf{M}$ean-$\textbf{F}$ield-$\textbf{Q}$-learning) and a decentralized randomized policy for a system with $n$ agents. For any $k\leq n$, our algorithm learns a policy for the system in time polynomial in $k$. We prove that this learned policy converges to the optimal policy on the order of $\tilde{O}(1/\sqrt{k})$ as the number of subsampled agents $k$ increases. In particular, this bound is independent of the number of agents $n$.
comment: 53 pages. AAAI 2025 MARW Best Paper Award. Accepted at NeurIPS 2025 (spotlight)
Hybrid MAC Protocol with Integrated Multi-Layered Security for Resource-Constrained UAV Swarm Communications
Flying Ad Hoc Networks (FANETs) present unique challenges due to high node mobility, dynamic topologies, and strict resource constraints. Existing routing protocols often optimize for a single metric, such as path length or energy, while neglecting the complex dependencies between network performance, security, and MAC layer efficiency. This paper introduces a novel hardware-software co-design framework for secure and adaptive UAV swarm communications, featuring an energy-aware protocol stack. The architecture employs a multicast, clustered organization where routing decisions integrate dynamic trust scores, historical link quality, and internodal distance. A hybrid MAC protocol combines contention-based and scheduled channel access for optimized throughput. Security is ensured through a zero-trust model that fuses cryptographic authentication with a behavioral reputation system, alongside hardware-accelerated AES-GCM encryption. Comparative analysis in an NS-3 simulation environment demonstrates the framework's superiority in packet delivery ratio, latency, resilience, and overhead, providing a scalable foundation for high-performance swarm operations.
comment: Accepted at ISED 2025
Predictability Enables Parallelization of Nonlinear State Space Models NeurIPS '25
The rise of parallel computing hardware has made it increasingly important to understand which nonlinear state space models can be efficiently parallelized. Recent advances like DEER (arXiv:2309.12252) or DeepPCR (arXiv:2309.16318) have shown that evaluating a state space model can be recast as solving a parallelizable optimization problem, and sometimes this approach can yield dramatic speed-ups in evaluation time. However, the factors that govern the difficulty of these optimization problems remain unclear, limiting the larger adoption of the technique. In this work, we establish a precise relationship between the dynamics of a nonlinear system and the conditioning of its corresponding optimization formulation. We show that the predictability of a system, defined as the degree to which small perturbations in state influence future behavior, impacts the number of optimization steps required for evaluation. In predictable systems, the state trajectory can be computed in $O((\log T)^2)$ time, where $T$ is the sequence length, a major improvement over the conventional sequential approach. In contrast, chaotic or unpredictable systems exhibit poor conditioning, with the consequence that parallel evaluation converges too slowly to be useful. Importantly, our theoretical analysis demonstrates that for predictable systems, the optimization problem is always well-conditioned, whereas for unpredictable systems, the conditioning degrades exponentially as a function of the sequence length. We validate our claims through extensive experiments, providing practical guidance on when nonlinear dynamical systems can be efficiently parallelized, and highlighting predictability as a key design principle for parallelizable models.
comment: NeurIPS '25. Code: https://github.com/lindermanlab/predictability_enables_parallelization
Robust time series generation via Schrödinger Bridge: a comprehensive evaluation
We investigate the generative capabilities of the Schrödinger Bridge (SB) approach for time series. The SB framework formulates time series synthesis as an entropic optimal interpolation transport problem between a reference probability measure on path space and a target joint distribution. This results in a stochastic differential equation over a finite horizon that accurately captures the temporal dynamics of the target time series. While the SB approach has been largely explored in fields like image generation, there is a scarcity of studies for its application to time series. In this work, we bridge this gap by conducting a comprehensive evaluation of the SB method's robustness and generative performance. We benchmark it against state-of-the-art (SOTA) time series generation methods across diverse datasets, assessing its strengths, limitations, and capacity to model complex temporal dependencies. Our results offer valuable insights into the SB framework's potential as a versatile and robust tool for time series generation.
comment: 9 pages
Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes
We study the error introduced by entropy regularization in infinite-horizon, discrete, discounted Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength both in a weighted KL-divergence and in value with a problem-specific exponent. This is in contrast to previously known estimates, of the order $O(\tau)$, where $\tau$ is the regularization strength. We provide a lower bound matching our upper bound up to a polynomial term, thereby characterizing the exponential convergence rate for entropy regularization. Our proof relies on the observation that the solutions of entropy-regularized Markov decision processes solve a gradient flow of the unregularized reward with respect to a Riemannian metric common in natural policy gradient methods. This correspondence allows us to identify the limit of this gradient flow as the generalized maximum entropy optimal policy, thereby characterizing the implicit bias of this gradient flow, which corresponds to a time-continuous version of the natural policy gradient method. We use our improved error estimates to show that for entropy-regularized natural policy gradient methods, the overall error decays exponentially in the square root of the number of iterations, improving over existing sublinear guarantees. Finally, we extend our analysis to settings beyond the entropy. In particular, we characterize the implicit bias regarding general convex potentials and their resulting generalized natural policy gradients.
comment: 32 pages, 1 figure
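The effect of the regularization strength $\tau$ can be illustrated numerically with a minimal sketch. It assumes a hypothetical 2-state, 2-action MDP with made-up transitions and rewards, solves the entropy-regularized problem by soft value iteration, and measures how far the resulting softmax policy falls short of the unregularized optimum. It only illustrates that the gap shrinks as $\tau$ decreases; it does not reproduce the paper's rates or proofs.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (all numbers are illustrative assumptions).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # P[a, s, s'] for action 0
              [[0.1, 0.9], [0.7, 0.3]]])   # P[a, s, s'] for action 1
r = np.array([[1.0, 0.0], [0.0, 2.0]])     # r[s, a]
gamma = 0.9

def q_values(V):
    """One-step lookahead: Q[s, a] = r[s, a] + gamma * sum_s' P[a, s, s'] V[s']."""
    return r + gamma * np.einsum("ast,t->sa", P, V)

def soft_policy(tau, iters=2000):
    """Soft value iteration; returns the softmax policy pi[s, a] at temperature tau."""
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        Q = q_values(V)
        m = Q.max(axis=1)
        # Numerically stable tau * log-sum-exp(Q / tau).
        V = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))
    Q = q_values(V)
    w = np.exp((Q - Q.max(axis=1, keepdims=True)) / tau)
    return w / w.sum(axis=1, keepdims=True)

def policy_value(pi):
    """Unregularized value of a stochastic policy via a linear solve."""
    P_pi = np.einsum("sa,ast->st", pi, P)
    r_pi = (pi * r).sum(axis=1)
    return np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)

def optimal_value(iters=2000):
    """Standard (unregularized) value iteration."""
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        V = q_values(V).max(axis=1)
    return V

def regularization_gap(tau):
    """Largest per-state loss in unregularized value caused by regularizing."""
    return float(np.max(optimal_value() - policy_value(soft_policy(tau))))

if __name__ == "__main__":
    for tau in (1.0, 0.3, 0.1, 0.03):
        print(tau, regularization_gap(tau))
```

Plotting `regularization_gap(tau)` against `1 / tau` on a logarithmic axis is one way to inspect the decay informally on such a toy problem.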
A Traffic Prediction-Based Individualized Driver Warning System to Reduce Red Light Violations SC
Red light violation is a major cause of traffic collisions and resulting injuries and fatalities. Despite extensive prior work to reduce red light violations, they continue to be a major problem in practice, partly because existing systems suffer from the flaw of providing the same guidance to all drivers. As a result, some violations are avoided, but other drivers ignore or respond inappropriately to red light running systems, resulting in safety issues overall. We show a method of providing accurate warnings to individual drivers to avoid the broad guidance approach of most existing systems. Recognizing that whether a driver will run a red light depends heavily on signal phase and timing, traffic conditions along the road, and individual driver behaviour, the proposed warning system contains three parts: a traffic prediction algorithm, an individual warning signal optimizer, and a driver warning display. The traffic prediction algorithm predicts future traffic states along the road towards the signalized intersections using the latest traffic conditions obtained through vehicle-to-vehicle and vehicle-to-infrastructure communications. Then, an optimization problem is formulated to compute the optimal warning signal based on the predicted traffic states and a driver reaction model. Finally, the optimal warning signal is shown on the display screen to advise the driver on how much braking is needed to avoid running the red light. The system continuously updates the latest warning signal as the vehicle is approaching the intersection. Both numerically simulated driving scenarios and real-world road tests are used to demonstrate the proposed algorithm's performance under different conditions by comparison with previous work on red light running warning systems. The results show that the system provides more effective and accurate warning signals to drivers, helping them avoid running red lights.
comment: accepted by ASCE's Journal of Transportation Engineering, Part A: Systems
Ensemble-Based Peak Demand Probability Density Forecasting with Application to Risk-Aware Power System Scheduling
Power systems face increasing challenges in maintaining resource adequacy due to lower operating margins, rising renewable energy uncertainty, and demand variability. Forecasting the probability distribution of peak demand on shorter timescales is a critical forward-facing issue under increasing volatility. This study introduces a novel ensemble-based machine learning method for peak demand probability density forecasting that extends classical extreme value theory to model time series peaks as nonstationary statistical distributions. The approach employs an ensemble of tree-based learners that recursively partition the covariate space and estimate local generalized extreme value distributions, allowing it to automatically capture complex covariate-dependent parameter variations. Unlike existing approaches, which often suffer from convergence issues or restrictive functional forms, this framework is both flexible and robust. Validation on a case study based on the PJM interconnection demonstrates that the method achieves a 38 percent reduction in committed capacity when generation is scheduled based on a reliability criterion. These improvements provide practical value for power system operation, enabling risk-aware capacity scheduling under peak demand uncertainty and supporting reliability-driven decision making in future energy systems.
comment: Completed major revision
Faster Reinforcement Learning by Freezing Slow States
We study infinite horizon Markov decision processes (MDPs) with "fast-slow" structure, where some state variables evolve rapidly ("fast states") while others change more gradually ("slow states"). This structure commonly arises in practice when decisions must be made at high frequencies over long horizons, and where slowly changing information still plays a critical role in determining optimal actions. Examples include inventory control under slowly changing demand indicators or dynamic pricing with gradually shifting consumer behavior. Modeling the problem at the natural decision frequency leads to MDPs with discount factors close to one, making them computationally challenging. We propose a novel approximation strategy that "freezes" slow states during phases of lower-level planning and subsequently applies value iteration to an auxiliary upper-level MDP that evolves on a slower timescale. Freezing states for short periods of time leads to easier-to-solve lower-level problems, while a slower upper-level timescale allows for a more favorable discount factor. On the theoretical side, we analyze the regret incurred by our frozen-state approach, which leads to simple insights on how to trade off regret versus computational cost. Empirically, we benchmark our new frozen-state methods on three domains, (i) inventory control with fixed order costs, (ii) a gridworld problem with spatial tasks, and (iii) dynamic pricing with reference-price effects. We demonstrate that the new methods produce high-quality policies with significantly less computation, and we show that simply omitting slow states is often a poor heuristic.
comment: 70 pages, 10 figures
Dimensionality Reduction with Koopman Generalized Eigenfunctions
This paper presents a methodology to achieve lower-dimensional Koopman quasi-linear representations of nonlinear system dynamics using Koopman generalized eigenfunctions. The proposed approach considers the analytically derived Koopman formulation of rigid body dynamics, but it can be extended to any data-driven or analytically derived generalized eigenfunction set. It achieves a representation for which the number of Koopman observables matches the number of inputs, allowing for Koopman linearization control solutions rather than resorting to the least squares approximation method adopted in high-dimensional Koopman formulations. Through a linear combination of Koopman generalized eigenfunctions, a new set of Koopman generalized eigenfunctions is constructed so that the zero-order truncation approximates a Koopman eigenfunction, which can be used to design linear control strategies to steer the dynamics of the original nonlinear system. The proposed methodology is tested by designing a linear quadratic (LQ) flight controller for a quadrotor UAV. Numerical and hardware-in-the-loop (HIL) simulations validate the applicability and real-time implementability of the proposed approach in the presence of noise and sensor delays. The main advantage of the proposed method is the realization of a fully actuated Koopman-based model which, in the case of the underactuated quadrotor system, allows trajectory tracking to be achieved through a single linear control loop.
Bandwidth Efficient Livestreaming in Mobile Wireless Networks: A Peer-to-Peer ACIDE Solution
In mobile wireless networks, livestreaming in high user density areas presents two typical challenges: the wireless bandwidth is depleted and the number of users is limited. In this study, a media distribution model utilizing peer-to-peer communications, Active Control in an Intelligent and Distributed Environment (ACIDE), is proposed for bandwidth-efficient livestreaming. The basic idea is to group users with identical livestream interest in a cluster of n peers. Instead of sending n copies of a livestream package, only one copy is sent to the cluster. A package is divided into n blocks. Each user receives one block from the base station and the remaining n-1 blocks from the other peers. Two optimization problems are addressed. The first problem is minimizing the bandwidth needed to guarantee a continuous live media play on all peers. A solution is proposed to find the optimal block sizes such that the wireless bandwidth is minimized. The second problem is maximizing the number of peers admitted to a cluster, given a fixed wireless bandwidth. This problem is NP-complete and a greedy strategy is proposed to calculate a feasible solution for peer selection. The proposed model improves the bandwidth efficiency and allows more users to be served.
comment: 16 pages, 12 figures, 4 tables, Journal submission
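The block-sharing idea in the abstract above can be sketched in a few lines. This is a minimal illustration that assumes near-equal block sizes and lossless peer links; the paper instead optimizes the block sizes, and the function names here are hypothetical.

```python
def split_package(package: bytes, n: int) -> list[bytes]:
    """Split a package into n contiguous blocks of near-equal size (an assumption)."""
    base, extra = divmod(len(package), n)
    blocks, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        blocks.append(package[start:start + size])
        start += size
    return blocks

def distribute(package: bytes, n: int):
    """Simulate one package delivery to a cluster of n peers.

    Peer i receives block i from the base station and the other
    n-1 blocks from its peers, then reassembles the package.
    Returns the reassembled packages and the base-station downlink bytes.
    """
    blocks = split_package(package, n)
    reassembled = []
    for i in range(n):
        held = {i: blocks[i]}            # from the base station
        for j in range(n):
            if j != i:
                held[j] = blocks[j]      # from peer j
        reassembled.append(b"".join(held[k] for k in range(n)))
    downlink_bytes = sum(len(b) for b in blocks)
    return reassembled, downlink_bytes

if __name__ == "__main__":
    package = bytes(range(256)) * 4      # hypothetical 1024-byte media package
    n = 5
    copies, downlink = distribute(package, n)
    print("cluster downlink bytes:", downlink)
    print("n unicast copies bytes:", n * len(package))
```

Under these assumptions the base station sends each byte once per cluster, so its downlink load is one package instead of n packages; the remaining traffic moves to the peer-to-peer links.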
Systems and Control (EESS)
Rate-cost tradeoffs in continuous-time control with a biomolecular application
This paper focuses on rate-limited control of the generalized Ornstein-Uhlenbeck process where the control action can be either multiplicative or additive, and the noise variance can depend on the control action. We derive a lower bound on the data rate necessary to achieve the desired control cost. The lower bound is attained with equality if the control is performed via an additive white Gaussian channel. The system model approximates the dynamics of a discrete-state molecular birth-death process, and the result has direct implications on the control of a biomolecular system via chemical reactions, where the multiplicative control corresponds to the degradation rate, the additive control corresponds to the production rate, and the control objective is to decrease the fluctuations of the controlled molecular species around their desired concentration levels.
System-Theoretic Analysis of Dynamic Generalized Nash Equilibrium Problems -- Turnpikes and Dissipativity
Generalized Nash equilibria are used in multi-agent control applications to model strategic interactions between agents that are coupled in the cost, dynamics, and constraints. We study the properties of open-loop GNE trajectories from a system-theoretic perspective. We show how strict dissipativity generates the turnpike phenomenon in GNE solutions. Moreover, we establish a converse turnpike result, i.e., the implication from turnpike to strict dissipativity. We derive conditions under which the steady-state GNE is the optimal operating point and, using a game value function, we give a local characterization of the geometry of storage functions. Finally, we design linear terminal penalties that ensure GNE open-loop trajectories converge to and remain at the steady-state GNE. These connections provide the foundation for future system-theoretic analysis of GNEs similar to those existing in optimal control.
Auction-Based Responsibility Allocation for Scalable Decentralized Safety Filters in Cooperative Multi-Agent Collision Avoidance
This paper proposes a scalable decentralized safety filter for multi-agent systems based on high-order control barrier functions (HOCBFs) and auction-based responsibility allocation. While decentralized HOCBF formulations ensure pairwise safety under input bounds, they face feasibility and scalability challenges as the number of agents grows. Each agent must evaluate an increasing number of pairwise constraints, raising the risk of infeasibility and making it difficult to meet real-time requirements. To address this, we introduce an auction-based allocation scheme that distributes constraint enforcement asymmetrically among neighbors based on local control effort estimates. The resulting directed responsibility graph guarantees full safety coverage while reducing redundant constraints and per-agent computational load. Simulation results confirm safe and efficient coordination across a range of network sizes and interaction densities.
comment: 6 pages, 3 figures, Submitted to Control Engineering Practice and IFAC World Congress 2026
Analysis and Synthesis of Switched Optimization Algorithms
Deployment of optimization algorithms on networked systems faces challenges associated with time delays and corruptions. One particular instance is the presence of time-varying delays arising from factors such as packet drops and irregular sampling. Fixed time delays can destabilize gradient descent algorithms, and this degradation is exacerbated by time-varying delays. This work concentrates on the analysis and creation of discrete-time optimization algorithms with certified exponential convergence rates that are robust against switched uncertainties between the optimizer and the gradient oracle. These optimization algorithms are implemented by switch-scheduled output feedback controllers. Rate variation and sawtooth behavior (packet drops) in time-varying delays can be imposed through constraining switching sequences. Analysis is accomplished by bisection in the convergence rate to find Zames-Falb filter coefficients. Synthesis is performed by alternating between a filter coefficient search for a fixed controller, and a controller search for fixed multipliers.
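The claim that a fixed delay can destabilize gradient descent can be checked on a scalar quadratic. The sketch below assumes $f(x) = x^2/2$, a step size of 1.5, and a hypothetical helper `delayed_gradient_descent` that evaluates the gradient at a stale iterate; it is a toy illustration and not the switched-controller synthesis described in the abstract.

```python
def delayed_gradient_descent(grad, x0, step, delay, iters):
    """Iterate x[k+1] = x[k] - step * grad(x[k - delay]).

    Iterates before time 0 are assumed equal to x0.
    Returns the trajectory x[0], x[1], ..., x[iters].
    """
    history = [x0] * (delay + 1)
    for _ in range(iters):
        stale = history[-1 - delay]          # iterate seen by the gradient oracle
        history.append(history[-1] - step * grad(stale))
    return history[delay:]

if __name__ == "__main__":
    grad = lambda x: x                       # gradient of f(x) = x**2 / 2
    no_delay = delayed_gradient_descent(grad, 1.0, 1.5, delay=0, iters=200)
    one_step = delayed_gradient_descent(grad, 1.0, 1.5, delay=1, iters=200)
    print("final |x| without delay:", abs(no_delay[-1]))
    print("max |x| over last 5 steps with delay 1:",
          max(abs(v) for v in one_step[-5:]))
```

Without delay the update is $x_{k+1} = -0.5\,x_k$, which contracts. With a one-step delay the recursion $x_{k+1} = x_k - 1.5\,x_{k-1}$ has characteristic roots of modulus $\sqrt{1.5} > 1$, so the same step size diverges.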
Robust Regret Control with Uncertainty-Dependent Baseline
This paper proposes a robust regret control framework in which the performance baseline adapts to the realization of system uncertainty. The plant is modeled as a discrete-time, uncertain linear time-invariant system with real-parametric uncertainty. The performance baseline is the optimal non-causal controller constructed with full knowledge of the disturbance and the specific realization of the uncertain plant. We show that a controller achieves robust additive regret relative to this baseline if and only if it satisfies a related, robust $H_\infty$ performance condition on a modified plant. One technical issue is that the modified plant can, in general, have a complicated nonlinear dependence on the uncertainty. We use a linear approximation step so that the robust additive regret condition can be recast as a standard $\mu$-synthesis problem. A numerical example is used to demonstrate the proposed approach.
Predictive control barrier functions for piecewise affine systems with non-smooth constraints
Obtaining control barrier functions (CBFs) with large safe sets for complex nonlinear systems and constraints is a challenging task. Predictive CBFs address this issue by using an online finite-horizon optimal control problem that implicitly defines a large safe set. The optimal control problem, also known as the predictive safety filter (PSF), involves predicting the system's flow under a given backup control policy. However, for non-smooth systems and constraints, some key elements, such as CBF gradients and the sensitivity of the flow, are not well-defined, making the current methods inadequate for ensuring safety. Additionally, for control-non-affine systems, the PSF is generally nonlinear and non-convex, posing challenges for real-time computation. This paper considers piecewise affine systems, which are usually control-non-affine, under nonlinear state and polyhedral input constraints. We solve the safety issue by incorporating set-valued generalized Clarke derivatives in the PSF design. We show that enforcing CBF constraints across all elements of the generalized Clarke derivatives suffices to guarantee safety. Moreover, to lighten the computational overhead, we propose an explicit approximation of the PSF. The resulting control methods are demonstrated through numerical examples.
Data-driven Koopman MPC using Mixed Stochastic-Deterministic Tubes
This paper presents a novel data-driven stochastic MPC design for discrete-time nonlinear systems with additive disturbances by leveraging the Koopman operator and a distributionally robust optimization (DRO) framework. By lifting the dynamical system into a linear space, we achieve a finite-dimensional approximation of the Koopman operator. We explicitly account for the modeling approximation and additive disturbance error by a mixed stochastic-deterministic tube for the lifted linear model. This ensures the regulation of the original nonlinear system while complying with the prespecified constraints. Stochastic and deterministic tubes are constructed using a DRO and a hyper-cube hull, respectively. We provide finite sample error bounds for both types of tubes. The effectiveness of the proposed approach is demonstrated through numerical simulations.
comment: This is the accepted version. It will appear in Journal of Process Control, 2025
The PhasorArray Toolbox for Harmonic Analysis and Control Design
We present a MATLAB package called the PhasorArray Toolbox that has been developed to make harmonic analysis and control methods both practical and user-friendly. The toolbox adopts an object-oriented architecture that enables intuitive manipulation of periodic matrices through overloaded operators for addition, multiplication, convolution, and automatic Toeplitz construction. Its advanced features include harmonic Sylvester, Lyapunov, and Riccati equation solvers, and seamless integration with YALMIP, thereby facilitating advanced control and analysis techniques based on Linear Matrix Inequalities (LMIs) in the harmonic framework.
Physics-Informed Neural Networks for MIMO Beam Map and Environment Reconstruction
As communication networks evolve towards greater complexity (e.g., 6G and beyond), a deep understanding of the wireless environment becomes increasingly crucial. When explicit knowledge of the environment is unavailable, geometry-aware feature extraction from channel state information (CSI) emerges as a pivotal methodology to bridge physical-layer measurements with network intelligence. This paper proposes to explore the received signal strength (RSS) data, without explicit 3D environment knowledge, to jointly construct the radio beam map and environmental geometry for a multiple-input multiple-output (MIMO) system. Unlike existing methods that only learn blockage structures, we propose an oriented virtual obstacle model that captures the geometric features of both blockage and reflection. Reflective zones are formulated to identify relevant reflected paths according to the geometry relation of the environment. We derive an analytical expression for the reflective zone and further analyze its geometric characteristics to develop a reformulation that is more compatible with deep learning representations. A physics-informed deep learning framework that incorporates the reflective-zone-based geometry model is proposed to learn the blockage, reflection, and scattering components, along with the beam pattern, which leverages physics prior knowledge to enhance network transferability. Numerical experiments demonstrate that, in addition to reconstructing the blockage and reflection geometry, the proposed model can construct a more accurate MIMO beam map with a 32%-48% accuracy improvement.
The Role of Information Incompleteness in Defending Against Stealth Attacks
The effectiveness of Data Injection Attacks (DIAs) critically depends on the completeness of the system information accessible to adversaries. This relationship positions information incompleteness enhancement as a vital defense strategy for degrading DIA performance. In this paper, we focus on information-theoretic stealth attacks, where the attacker encounters a fundamental tradeoff between attack stealthiness and destructiveness. Specifically, we systematically characterize how incomplete admittance information impacts these dual objectives. In particular, we establish sufficient conditions for two distinct operational regimes: (i) stealthiness intensifies while destructive potential diminishes and (ii) destructiveness increases while stealth capability weakens. For scenarios beyond these regimes, we propose a maximal incompleteness strategy to optimally degrade stealth capability. To solve the associated optimization problem, the feasible region is reduced without excluding the optimal solution, and a heuristic algorithm is then introduced to effectively identify near-optimal solutions within the reduced region. Numerical simulations are conducted on IEEE test systems to validate the findings.
Green Hydrogen under Uncertainty: Evaluating Power-to-X Strategies Using Agent-Based Simulation and Multi-Criteria Decision Framework
The transition toward net-zero energy systems requires scalable and cost-effective deployment of Power-to-X technologies, particularly green hydrogen production. Despite increasing investments, a critical research gap remains in dynamically assessing how different operational strategies affect the feasibility of hydrogen production under real-world energy market conditions. Most existing studies rely on static, techno-economic models and overlook actor interactions, infrastructure limitations, and regulatory complexity. This paper presents a novel modeling framework that integrates agent-based simulation with multi-criteria decision-making to evaluate green hydrogen production strategies using co-located wind and solar generation. Three operational strategies - grid-only, on-site-only, and hybrid - are applied across three electrolyzer capacity levels (10 MW, 50 MW, and 100 MW) within a Danish case study. Real electricity tariffs, emissions factors, and market data are used to simulate technical, economic, and environmental performance indicators. The results show that hybrid strategies consistently outperform grid-only configurations in terms of cost and emissions while maintaining stable hydrogen output. Although on-site-only strategies minimize emissions and costs, they fail to meet fixed production demands. This framework offers novel scientific contributions by modeling dynamic actor interactions and integrating system performance evaluation into strategic planning. Practically, it provides actionable insights for energy planners and policymakers designing resilient and efficient Power-to-X systems in renewable-rich contexts.
The local Gaussian correlation networks among return tails in the Chinese stock market
Financial networks based on Pearson correlations have been intensively studied. However, previous studies may have led to misleading and catastrophic results because of several critical shortcomings of the Pearson correlation. The local Gaussian correlation coefficient, a new measurement of statistical dependence between variables, has unique advantages including capturing local nonlinear dependence and handling heavy-tailed distributions. This study constructs financial networks using the local Gaussian correlation coefficients between tail regions of stock returns in the Shanghai Stock Exchange. The work systematically analyzes fundamental network metrics including node centrality, average shortest path length, and entropy. Compared with the local Gaussian correlation network among positive tails and the conventional Pearson correlation network, the properties of the local Gaussian correlation network among negative tails are more sensitive to the stock market risks. This finding suggests researchers should prioritize the local Gaussian correlation network among negative tails. Future work should reevaluate existing findings using the local Gaussian correlation method.
Environment-Dependent Components Identification of Behind-the-Meter Resources via Inverse Optimization
With the increasing penetration of behind-the-meter (BTM) resources, it is vital to monitor the components of these resources and deduce their response behavior to the external environment. Owing to data privacy, however, appliance-wise measurements are invisible to the power system operator, which hinders accurate load modeling and identification. To this end, this paper proposes a hybrid physics-inspired and data-driven framework for decomposing BTM components based on external measurement of total load and environmental factors. The total load is decomposed into different environment-dependent components, namely a storage-like component, a PV generation component, a thermostatically controlled load component, and a periodic component. The overall load identification adopts a double-layer iterative solution framework. A data-driven inverse optimization algorithm is developed to identify parameters of the energy storage-like component. The physics-inspired model is proposed to identify the capacity and response of the remaining components. The modeling accuracy and robustness of the proposed method are validated by numerical tests. The application significance of the proposed BTM identification method is also validated in electricity market clearing for reducing system operation costs.
High-Performance Rotor Cooling with Ducted Liquid in Completely Cold-Formed Modular Motor Shaft
This paper suggests a novel rotor-cooling shaft concept for high-performance electric motors that increases the effectiveness of cooling yet remains simple and cost-effective to manufacture. We investigate the thermal performance of four shaft geometries for rotor cooling in automotive applications. The proposed tooth-guided liquid-cooling shaft design aims to solve the high churning loss of conventional cooled rotor shafts due to internal vortex formation and their still limited heat transfer. Therefore, we optimize heat transfer efficiency and pressure management by incorporating cold-formed internal channels that restrict vortex formation beyond the degree that improves heat transfer. We evaluated key performance metrics, including heat transfer rate, outlet temperature, pressure drop, and velocity profiles, under varying rotational speeds, inlet flow rates, and coolant temperatures. Computational fluid analysis demonstrates that the tooth-guided design outperforms conventional hollow shafts and achieves up to 110% higher cooling efficiency at low rotational speeds, while it maintains comparable pressure levels. These findings provide practical insight into geometry-driven thermal optimization and offer a path toward improving the performance and durability of electric motors.
comment: 11 pages, 21 figures
Control of neural field equations with step-function inputs
Wilson-Cowan and Amari-type models capture nonlinear neural population dynamics, providing a fundamental framework for modeling how sensory and other exogenous inputs shape activity in neural tissue. We study the controllability properties of Amari-type neural fields subject to piecewise/constant-in-time inputs. The model describes the time evolution of the polarization of neural tissue within a spatial continuum, with synaptic interactions represented by a convolution kernel. We study the synthesis of piecewise/constant-in-time inputs to achieve two-point boundary-type control objectives, namely, steering neural activity from an initial state to a prescribed target state. This approach is particularly relevant for predicting the emergence of paradoxical neural representations, such as discordant visual illusions that occur in response to overt sensory stimuli. We first present a control synthesis based on the Banach fixed-point theorem, which yields an iterative construction of a constant-in-time input under minimal regularity assumptions on the kernel and transfer function; however, it exhibits practical limitations, even in the linear case. To overcome these challenges, we then develop a generic synthesis framework based on the flow of neural dynamics drift, enabling explicit piecewise constant and constant-in-time inputs. Extensive numerical results in one and two spatial dimensions confirm the effectiveness of the proposed syntheses and demonstrate their superior performance compared to inputs derived from naive linearization at the initial or target states when these states are not equilibria of the drift dynamics. By providing a mathematically rigorous framework for controlling Amari-type neural fields, this work advances our understanding of nonlinear neural population control with potential applications in computational neuroscience, psychophysics, and neurostimulation.
A Hybrid GNN-LSE Method for Fast, Robust, and Physically-Consistent AC Power Flow
Conventional AC Power Flow (ACPF) solvers like Newton-Raphson (NR) face significant computational and convergence challenges in modern, large-scale power systems. This paper proposes a novel, two-stage hybrid method that integrates a Physics-Informed Graph Neural Network (GNN) with a robust, iterative Linear State Estimation (LSE) refinement step to produce fast and physically-consistent solutions. The GNN, trained with a physics-informed loss function featuring an efficient dynamic weighting scheme, rapidly predicts a high-quality initial system state. This prediction is then refined using an iterative, direct linear solver inspired by state estimation techniques. This LSE refinement step solves a series of linear equations to enforce physical laws, effectively bypassing the non-linearities and convergence issues of traditional solvers. The proposed GNN-LSE framework is comprehensively validated on systems ranging from small radial distribution networks (IEEE 33-bus, 69-bus) to a large, meshed transmission system (IEEE 118-bus). Results show that our GNN variants are up to $8.4 \times 10^3$ times faster than NR. The LSE refinement provides a fast route to a physically-consistent solution, while heavy-loading stress tests (120%-150% of nominal) and N-1 contingencies demonstrate the method's reliability and generalization. This work presents a powerful and flexible framework for bridging fast, data-driven models with the rigorous constraints of power system physics, offering a practical tool for real-time operations and analysis.
Motion Planning with Precedence Specifications via Augmented Graphs of Convex Sets
We present an algorithm for planning trajectories that avoid obstacles and satisfy key-door precedence specifications expressed with a fragment of signal temporal logic. Our method includes a novel exact convex partitioning of the obstacle-free space that encodes connectivity among convex free-space sets, key sets, and door sets. We then construct an augmented graph of convex sets that exactly encodes the key-door precedence specifications. By solving a shortest path problem in this augmented graph of convex sets, our pipeline provides an exact solution up to a finite parameterization of the trajectory. To illustrate the effectiveness of our approach, we present a method to generate key-door mazes that provide challenging problem instances, and we perform numerical experiments to evaluate the proposed pipeline. Our pipeline is faster by several orders of magnitude than recent state-of-the-art methods that use general-purpose temporal logic tools.
Pricing Problems in Adoption of New Technologies
We propose a generalization of the Bass diffusion model in discrete-time that explicitly models the effect of price in adoption. Our model is different from earlier price-incorporated models and fits well to adoption data for various products. We then utilize this model to study two decision-making problems. First, we provide a series of structural results on optimal pricing strategies to maximize profits from product sales by a monopolist over a finite horizon. We fully characterize the optimal pricing strategy in the single-period problem, and establish several structural properties of the same for the multi-period counterpart. Second, we study a Stackelberg game between a policy-maker and a monopolist, where the former seeks to maximize adoption through rebates, while the latter focuses on profits. For this problem, we analytically characterize crucial properties of the equilibrium path of the single-period game, and demonstrate how they carry over to the multi-period variant.
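As background for the abstract above, the classical discrete-time Bass recursion (the baseline that the paper generalizes) can be sketched as follows. This is the standard price-free model, not the authors' price-incorporated variant; the market size and the coefficients `p` and `q` are illustrative assumptions.

```python
def bass_adoption(market_size, p, q, periods):
    """Cumulative adopters under the classical discrete-time Bass model.

    p is the innovation coefficient, q the imitation coefficient.
    Price effects are NOT modeled here (assumption: price-free baseline).
    """
    cumulative = [0.0]
    for _ in range(periods):
        n = cumulative[-1]
        # Adoption hazard: innovation plus imitation scaled by adopted share.
        hazard = p + q * n / market_size
        # New adopters are drawn from the remaining, not-yet-adopted market.
        cumulative.append(n + hazard * (market_size - n))
    return cumulative


# Hypothetical parameters chosen only for illustration.
path = bass_adoption(market_size=1000.0, p=0.03, q=0.38, periods=30)
new_adopters = [later - earlier for earlier, later in zip(path, path[1:])]
```

The per-period series `new_adopters` rises and then falls, which is the familiar bell-shaped adoption curve; a price-dependent model would presumably make the hazard a function of the posted price as well.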
Fixed Horizon Linear Quadratic Covariance Steering in Continuous Time with Hilbert-Schmidt Terminal Cost
We formulate and solve the fixed horizon linear quadratic covariance steering problem in continuous time with a terminal cost measured in Hilbert-Schmidt (i.e., Frobenius) norm error between the desired and the controlled terminal covariances. For this problem, the necessary conditions of optimality become a coupled matrix ODE two-point boundary value problem. To solve this system of equations, we design a matricial recursive algorithm and prove its convergence. The proposed algorithm and its analysis make use of the linear fractional transforms parameterized by the state transition matrix of the associated Hamiltonian matrix. To illustrate the results, we provide two numerical examples: one with a two dimensional and another with a six dimensional state space.
A Perspective on the Algebra, Topology, and Logic of Electrical Networks
This paper presents a unified algebraic, topological, and logical framework for electrical one-port networks based on Šare's $m$-theory. Within this formalism, networks are represented by $m$-words (jorbs) over an ordered alphabet, where series and parallel composition induce an $m$-topology on $m$-graphs with a theta mapping $\vartheta$ that preserves one-port equivalence. The study formalizes quasi-orders, shells, and cores, showing their structural correspondence to network boundary conditions and impedance behavior. The $\lambda$--$\Delta$ metric, together with the valuation morphism $\Phi$, provides a concise descriptor of the impedance-degree structure. In the computational domain, the framework is extended with algorithmic procedures for generating and classifying non-isomorphic series-parallel topologies, accompanied by programmatic Cauer/Foster synthesis workflows and validation against canonical examples from Ladenheim's catalogue. The resulting approach enables symbolic-to-topological translation of impedance functions, offering a constructive bridge between algebraic representation and electrical realization. Overall, the paper outlines a self-consistent theoretical and computational foundation for automated network synthesis, classification, and formal verification within the emerging field of Jorbology.
Trajectory Optimization for Minimum Threat Exposure using Physics-Informed Neural Networks
We apply a physics-informed neural network (PINN) to solve the two-point boundary value problem (BVP) arising from the necessary conditions postulated by Pontryagin's Minimum Principle for optimal control. Such BVPs are known to be numerically difficult to solve by traditional shooting methods due to extremely high sensitivity to initial guesses. In the light of recent successes in applying PINNs for solving high-dimensional differential equations, we develop a PINN to solve the problem of finding trajectories with minimum exposure to a spatiotemporal threat for a vehicle kinematic model. First, we implement PINNs that are trained to solve the BVP for a given pair of initial and final states for a given threat field. Next, we implement a PINN conditioned on the initial state for a given threat field, which eliminates the need for retraining for each initial state. We demonstrate that the PINN outputs satisfy the necessary conditions with low numerical error.
comment: 2025 Indian Control Conference
Operational Risks in Grid Integration of Large Data Center Loads: Characteristics, Stability Assessments, and Sensitivity Studies
This paper investigates the dynamic interactions between large-scale data centers and the power grid, focusing on reliability challenges arising from sudden fluctuations in demand. With the rapid growth of AI-driven workloads, such fluctuations, along with fast ramp patterns, are expected to exacerbate stressed grid conditions and system instabilities. We consider a few open-source AI data center consumption profiles from the MIT supercloud datasets and generate a few experimental HPC job-distribution-based inference profiles. Subsequently, we develop analytical methodologies for real-time assessment of grid stability, focusing on both transient and small-signal stability assessments. Energy-flow-like metrics for nonlinear transient stability, formulated by computing localized data center bus kinetic-like flows and coupling interactions with neighboring buses over varying time windows, help provide operators with real-time assessments of the regional grid stress in the data center hubs. On the other hand, small-signal stability metrics, constructed from analytical state matrices under variable operating conditions during a fast ramping period, enable snapshot-based assessments of data center load fluctuations and provide enhanced observability into evolving grid conditions. By quantifying the stability impacts of large data center clusters, studies conducted in the modified IEEE benchmark 68-bus model support improved operator situational awareness to capture risks in reliable integration of large data center loads.
comment: 13 pages, 8 figures, 3 tables
A Novel State-Centric Necessary Condition for Time-Optimal Control of Controllable Linear Systems Based on Augmented Switching Laws (Extended Version)
Most existing necessary conditions for optimal control based on adjoining methods require both state and costate information, yet the unobservability of costates for a given feasible trajectory impedes the determination of optimality in practice. This paper establishes a novel theoretical framework for time-optimal control of controllable linear systems with a single input, proposing the augmented switching law (ASL) that represents the input control and the feasibility in a compact form. Given a feasible trajectory, the perturbed trajectory under the constraints of the ASL is guaranteed to be feasible, resulting in a novel state-centric necessary condition without dependence on costate information. A first-order necessary condition is proposed, namely that the Jacobian matrix of the ASL is not of full row rank, which also suggests an approach to optimizing a given feasible trajectory while preserving its arc structure. The proposed necessary condition is applied to high-order chain-of-integrator systems with full box constraints, yielding theoretical results that are difficult to derive from costate-based conditions.
comment: This paper has been published in IEEE TAC
Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning AAAI 2025
Designing efficient algorithms for multi-agent reinforcement learning (MARL) is fundamentally challenging because the size of the joint state and action spaces grows exponentially in the number of agents. These difficulties are exacerbated when balancing sequential global decision-making with local agent interactions. In this work, we propose a new algorithm $\texttt{SUBSAMPLE-MFQ}$ ($\textbf{Subsample}$-$\textbf{M}$ean-$\textbf{F}$ield-$\textbf{Q}$-learning) and a decentralized randomized policy for a system with $n$ agents. For any $k\leq n$, our algorithm learns a policy for the system in time polynomial in $k$. We prove that this learned policy converges to the optimal policy on the order of $\tilde{O}(1/\sqrt{k})$ as the number of subsampled agents $k$ increases. In particular, this bound is independent of the number of agents $n$.
comment: 53 pages. AAAI 2025 MARW Best Paper Award. Accepted at NeurIPS 2025 (spotlight)
Hybrid MAC Protocol with Integrated Multi-Layered Security for Resource-Constrained UAV Swarm Communications
Flying Ad Hoc Networks (FANETs) present unique challenges due to high node mobility, dynamic topologies, and strict resource constraints. Existing routing protocols often optimize for a single metric, such as path length or energy, while neglecting the complex dependencies between network performance, security, and MAC layer efficiency. This paper introduces a novel hardware-software co-design framework for secure and adaptive UAV swarm communications, featuring an energy-aware protocol stack. The architecture employs a multicast, clustered organization where routing decisions integrate dynamic trust scores, historical link quality, and internodal distance. A hybrid MAC protocol combines contention-based and scheduled channel access for optimized throughput. Security is ensured through a zero-trust model that fuses cryptographic authentication with a behavioral reputation system, alongside hardware-accelerated AES-GCM encryption. Comparative analysis in an NS-3 simulation environment demonstrates the framework's superiority in packet delivery ratio, latency, resilience, and overhead, providing a scalable foundation for high-performance swarm operations.
comment: Accepted at ISED 2025
Predictability Enables Parallelization of Nonlinear State Space Models NeurIPS '25
The rise of parallel computing hardware has made it increasingly important to understand which nonlinear state space models can be efficiently parallelized. Recent advances like DEER (arXiv:2309.12252) or DeepPCR (arXiv:2309.16318) have shown that evaluating a state space model can be recast as solving a parallelizable optimization problem, and sometimes this approach can yield dramatic speed-ups in evaluation time. However, the factors that govern the difficulty of these optimization problems remain unclear, limiting the larger adoption of the technique. In this work, we establish a precise relationship between the dynamics of a nonlinear system and the conditioning of its corresponding optimization formulation. We show that the predictability of a system, defined as the degree to which small perturbations in state influence future behavior, impacts the number of optimization steps required for evaluation. In predictable systems, the state trajectory can be computed in $O((\log T)^2)$ time, where $T$ is the sequence length, a major improvement over the conventional sequential approach. In contrast, chaotic or unpredictable systems exhibit poor conditioning, with the consequence that parallel evaluation converges too slowly to be useful. Importantly, our theoretical analysis demonstrates that for predictable systems, the optimization problem is always well-conditioned, whereas for unpredictable systems, the conditioning degrades exponentially as a function of the sequence length. We validate our claims through extensive experiments, providing practical guidance on when nonlinear dynamical systems can be efficiently parallelized, and highlighting predictability as a key design principle for parallelizable models.
comment: NeurIPS '25. Code: https://github.com/lindermanlab/predictability_enables_parallelization
Robust time series generation via Schrödinger Bridge: a comprehensive evaluation
We investigate the generative capabilities of the Schrödinger Bridge (SB) approach for time series. The SB framework formulates time series synthesis as an entropic optimal interpolation transport problem between a reference probability measure on path space and a target joint distribution. This results in a stochastic differential equation over a finite horizon that accurately captures the temporal dynamics of the target time series. While the SB approach has been largely explored in fields like image generation, there is a scarcity of studies for its application to time series. In this work, we bridge this gap by conducting a comprehensive evaluation of the SB method's robustness and generative performance. We benchmark it against state-of-the-art (SOTA) time series generation methods across diverse datasets, assessing its strengths, limitations, and capacity to model complex temporal dependencies. Our results offer valuable insights into the SB framework's potential as a versatile and robust tool for time series generation.
comment: 9 pages
Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes
We study the error introduced by entropy regularization in infinite-horizon, discrete, discounted Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength both in a weighted KL-divergence and in value with a problem-specific exponent. This is in contrast to previously known estimates, of the order $O(\tau)$, where $\tau$ is the regularization strength. We provide a lower bound matching our upper bound up to a polynomial term, thereby characterizing the exponential convergence rate for entropy regularization. Our proof relies on the observation that the solutions of entropy-regularized Markov decision processes solve a gradient flow of the unregularized reward with respect to a Riemannian metric common in natural policy gradient methods. This correspondence allows us to identify the limit of this gradient flow as the generalized maximum entropy optimal policy, thereby characterizing the implicit bias of this gradient flow, which corresponds to a time-continuous version of the natural policy gradient method. We use our improved error estimates to show that for entropy-regularized natural policy gradient methods, the overall error decays exponentially in the square root of the number of iterations, improving over existing sublinear guarantees. Finally, we extend our analysis to settings beyond the entropy. In particular, we characterize the implicit bias regarding general convex potentials and their resulting generalized natural policy gradients.
comment: 32 pages, 1 figure
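To make the quantity studied in the preceding abstract concrete, the sketch below runs entropy-regularized (soft) value iteration on a tiny MDP and measures how far the unregularized value of the resulting softmax policy falls short of the optimum. The two-state MDP, its rewards, and the iteration counts are hypothetical choices for illustration only; the code does not reproduce the paper's bounds or exponent.

```python
import math

GAMMA = 0.9
# Hypothetical deterministic MDP: NEXT[s][a] is the successor, REWARD[s][a] the reward.
NEXT = [[0, 1], [0, 1]]
REWARD = [[0.0, 1.0], [0.5, 2.0]]
STATES, ACTIONS = range(2), range(2)


def soft_q(tau, iters=2000):
    """Fixed point of the entropy-regularized Bellman operator with strength tau."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iters):
        v = []
        for s in STATES:
            m = max(q[s])  # subtract the max for a stable log-sum-exp
            v.append(m + tau * math.log(sum(math.exp((x - m) / tau) for x in q[s])))
        q = [[REWARD[s][a] + GAMMA * v[NEXT[s][a]] for a in ACTIONS] for s in STATES]
    return q


def softmax_policy(q, tau):
    policy = []
    for row in q:
        m = max(row)
        weights = [math.exp((x - m) / tau) for x in row]
        total = sum(weights)
        policy.append([w / total for w in weights])
    return policy


def policy_value(policy, iters=2000):
    """Unregularized value of a stochastic policy via iterative evaluation."""
    v = [0.0, 0.0]
    for _ in range(iters):
        v = [sum(policy[s][a] * (REWARD[s][a] + GAMMA * v[NEXT[s][a]]) for a in ACTIONS)
             for s in STATES]
    return v


def optimal_value(iters=2000):
    v = [0.0, 0.0]
    for _ in range(iters):
        v = [max(REWARD[s][a] + GAMMA * v[NEXT[s][a]] for a in ACTIONS) for s in STATES]
    return v


def value_gap(tau):
    """Largest shortfall of the regularized policy's true value versus the optimum."""
    pi = softmax_policy(soft_q(tau), tau)
    return max(best - got for best, got in zip(optimal_value(), policy_value(pi)))


gaps = {tau: value_gap(tau) for tau in (1.0, 0.5, 0.25)}
```

On this toy problem the gap shrinks as `tau` decreases, which is the direction of the effect the abstract quantifies; how fast it shrinks in general is exactly what the paper's rates describe.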
A Traffic Prediction-Based Individualized Driver Warning System to Reduce Red Light Violations
Red light violation is a major cause of traffic collisions and resulting injuries and fatalities. Despite extensive prior work to reduce red light violations, they continue to be a major problem in practice, partly because existing systems suffer from the flaw of providing the same guidance to all drivers. As a result, some violations are avoided, but other drivers ignore or respond inappropriately to red light running systems, resulting in safety issues overall. We show a method of providing accurate warnings to individual drivers to avoid the broad guidance approach of most existing systems. Recognizing that whether a driver will run a red light depends heavily on signal phase and timing, traffic conditions along the road, and individual driver behaviour, the proposed warning system contains three parts: a traffic prediction algorithm, an individual warning signal optimizer, and a driver warning display. The traffic prediction algorithm predicts future traffic states along the road towards the signalized intersections using the latest traffic conditions obtained through vehicle-to-vehicle and vehicle-to-infrastructure communications. Then, an optimization problem is formulated to compute the optimal warning signal based on the predicted traffic states and a driver reaction model. Finally, the optimal warning signal is shown on the display screen to advise the driver on how much braking is needed to avoid running the red light. The system continuously updates the warning signal as the vehicle approaches the intersection. Both numerically simulated driving scenarios and real-world road tests are used to demonstrate the proposed algorithm's performance under different conditions by comparing with previous work on red light running warning systems. The results show that the system provides more effective and accurate warning signals to drivers, helping them avoid running red lights.
comment: accepted by ASCE's Journal of Transportation Engineering, Part A: Systems
Time-causal and time-recursive wavelets
When applying wavelet analysis to real-time temporal signals, where the future cannot be accessed, it is essential to base all the steps in the signal processing pipeline on computational mechanisms that are truly time-causal. This paper describes how a time-causal wavelet analysis can be performed based on concepts developed in the area of temporal scale-space theory, originating from a complete classification of temporal smoothing kernels that guarantee non-creation of new structures from finer to coarser temporal scale levels. By necessity, convolution with truncated exponential kernels in cascade constitutes the only permissible class of kernels, as well as their temporal derivatives as a natural complement to fulfil the admissibility conditions of wavelet representations. For a particular way of choosing the time constants in the resulting infinite convolution of truncated exponential kernels, to ensure temporal scale covariance and thus self-similarity over temporal scales, we describe how mother wavelets can be chosen as temporal derivatives of the resulting time-causal limit kernel. By developing connections between wavelet theory and scale-space theory, we characterize and quantify how the continuous scaling properties transfer to the discrete implementation, demonstrating how the proposed time-causal wavelet representation can reflect the duration of locally dominant temporal structures in the input signals. We propose that this notion of time-causal wavelet analysis could be a valuable tool for signal processing tasks, where streams of signals are to be processed in real time, specifically for signals that may contain local variations over a rich span of temporal scales, or more generally for analysing physical or biophysical temporal phenomena, where a fully time-causal analysis is called for to be physically realistic.
comment: 25 pages, 10 figures
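The time-causal, time-recursive idea in the abstract above can be illustrated with a cascade of first-order recursive filters, a common discrete stand-in for truncated exponential kernels coupled in cascade. This is a minimal sketch under assumptions: the time constants `mus` are arbitrary placeholders rather than the scale-covariant choice described in the abstract, and no wavelet (temporal derivative) stage is included.

```python
def recursive_smooth(signal, mu):
    """First-order recursive smoothing with time constant mu >= 0.

    Each output depends only on the current input and the previous output,
    so the filter is time-causal and needs no buffer of past samples.
    """
    out, prev = [], 0.0
    for x in signal:
        prev = prev + (x - prev) / (1.0 + mu)
        out.append(prev)
    return out


def cascade(signal, mus):
    """Apply several first-order filters in series (one per time constant)."""
    for mu in mus:
        signal = recursive_smooth(signal, mu)
    return signal


# Unit impulse at index 5, followed by enough samples for the response to decay.
impulse = [0.0] * 5 + [1.0] + [0.0] * 200
response = cascade(impulse, mus=[1.0, 2.0, 4.0])  # hypothetical time constants
```

The response is exactly zero before the impulse arrives and sums to approximately one afterwards, so the cascade never looks into the future and preserves the mean level of the input.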
Ensemble-Based Peak Demand Probability Density Forecasting with Application to Risk-Aware Power System Scheduling
Power systems face increasing challenges in maintaining resource adequacy due to lower operating margins, rising renewable energy uncertainty, and demand variability. Forecasting the probability distribution of peak demand on shorter timescales is a critical forward-facing issue under increasing volatility. This study introduces a novel ensemble-based machine learning method for peak demand probability density forecasting that extends classical extreme value theory to model time series peaks as nonstationary statistical distributions. The approach employs an ensemble of tree-based learners that recursively partition the covariate space and estimate local generalized extreme value distributions, allowing it to automatically capture complex covariate-dependent parameter variations. Unlike existing approaches, which often suffer from convergence issues or restrictive functional forms, this framework is both flexible and robust. Validation on a case study based on the PJM interconnection demonstrates that the method achieves a 38 percent reduction in committed capacity when generation is scheduled based on a reliability criterion. These improvements provide practical value for power system operation, enabling risk-aware capacity scheduling under peak demand uncertainty and supporting reliability-driven decision making in future energy systems.
comment: Completed major revision
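For the peak-demand abstract above, the link between a fitted generalized extreme value (GEV) distribution and a reliability-based capacity level can be sketched with the standard GEV distribution function and its inverse. The location, scale, and shape values below are hypothetical and would, in the paper's setting, come from the tree-based local estimates; the ensemble learner itself is not implemented here.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function with location mu, scale sigma > 0, shape xi."""
    if xi == 0.0:
        return math.exp(-math.exp(-(x - mu) / sigma))  # Gumbel limit
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        # Outside the support: below the lower bound (xi > 0) or above the upper bound (xi < 0).
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_quantile(p, mu, sigma, xi):
    """Peak level that is not exceeded with probability p, for 0 < p < 1."""
    if xi == 0.0:
        return mu - sigma * math.log(-math.log(p))
    return mu + sigma / xi * ((-math.log(p)) ** (-xi) - 1.0)


# Hypothetical local GEV parameters for one covariate region (demand in GW).
params = dict(mu=140.0, sigma=6.0, xi=0.1)
capacity_99 = gev_quantile(0.99, **params)
capacity_90 = gev_quantile(0.90, **params)
```

Scheduling to `capacity_99` rather than to a fixed margin is one way a reliability criterion could be expressed; a sharper predictive distribution narrows the distance between such quantiles and the typical peak.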
Faster Reinforcement Learning by Freezing Slow States
We study infinite horizon Markov decision processes (MDPs) with "fast-slow" structure, where some state variables evolve rapidly ("fast states") while others change more gradually ("slow states"). This structure commonly arises in practice when decisions must be made at high frequencies over long horizons, and where slowly changing information still plays a critical role in determining optimal actions. Examples include inventory control under slowly changing demand indicators or dynamic pricing with gradually shifting consumer behavior. Modeling the problem at the natural decision frequency leads to MDPs with discount factors close to one, making them computationally challenging. We propose a novel approximation strategy that "freezes" slow states during phases of lower-level planning and subsequently applies value iteration to an auxiliary upper-level MDP that evolves on a slower timescale. Freezing states for short periods of time leads to easier-to-solve lower-level problems, while a slower upper-level timescale allows for a more favorable discount factor. On the theoretical side, we analyze the regret incurred by our frozen-state approach, which leads to simple insights on how to trade off regret versus computational cost. Empirically, we benchmark our new frozen-state methods on three domains, (i) inventory control with fixed order costs, (ii) a gridworld problem with spatial tasks, and (iii) dynamic pricing with reference-price effects. We demonstrate that the new methods produce high-quality policies with significantly less computation, and we show that simply omitting slow states is often a poor heuristic.
comment: 70 pages, 10 figures
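As a rough illustration of why a slower upper-level timescale gives "a more favorable discount factor" in the frozen-state abstract above, the sketch below assumes that one upper-level step spans `freeze_steps` base steps, so the upper-level discount is `gamma ** freeze_steps`. This is the standard temporal-abstraction relation, not necessarily the paper's exact construction; the function names are hypothetical.

```python
def effective_horizon(gamma: float) -> float:
    """Approximate planning horizon 1 / (1 - gamma) of a discounted MDP."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


def upper_level_discount(gamma: float, freeze_steps: int) -> float:
    """Discount of an auxiliary MDP whose single step spans `freeze_steps`
    base-level steps (assumption: plain compounding of the base discount)."""
    if freeze_steps < 1:
        raise ValueError("freeze_steps must be a positive integer")
    return gamma ** freeze_steps


if __name__ == "__main__":
    gamma, freeze_steps = 0.999, 100
    upper = upper_level_discount(gamma, freeze_steps)
    print(f"base discount {gamma}: horizon ~{effective_horizon(gamma):.0f} steps")
    print(f"upper discount {upper:.4f}: horizon ~{effective_horizon(upper):.1f} steps")
```

Under these assumed numbers, a base discount of 0.999 becomes roughly 0.905 at the upper level, so value iteration there needs far fewer sweeps to converge.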
Dimensionality Reduction with Koopman Generalized Eigenfunctions
This paper presents a methodology to achieve lower-dimensional Koopman quasi-linear representations of nonlinear system dynamics using Koopman generalized eigenfunctions. The proposed approach considers the analytically derived Koopman formulation of rigid body dynamics, but it can be extended to any data-driven or analytically derived generalized eigenfunction set. It achieves a representation for which the number of Koopman observables matches the number of inputs, allowing for Koopman linearization control solutions rather than resorting to the least squares approximation method adopted in high-dimensional Koopman formulations. Through a linear combination of Koopman generalized eigenfunctions, a new set of Koopman generalized eigenfunctions is constructed so that the zero-order truncation approximates a Koopman eigenfunction, which can be used to design linear control strategies to steer the dynamics of the original nonlinear system. The proposed methodology is tested by designing a linear quadratic (LQ) flight controller for a quadrotor UAV. Numerical and hardware-in-the-loop (HIL) simulations validate the applicability and real-time implementability of the proposed approach in the presence of noise and sensor delays. The main advantage of the proposed method is the realization of a fully actuated Koopman-based model which, in the case of the underactuated quadrotor system, allows trajectory tracking to be achieved through a single linear control loop.
Bandwidth Efficient Livestreaming in Mobile Wireless Networks: A Peer-to-Peer ACIDE Solution
In mobile wireless networks, livestreaming in high user density areas presents two typical challenges: the wireless bandwidth is depleted and the number of users is limited. In this study, a media distribution model utilizing peer to peer communications, Active Control in an Intelligent and Distributed Environment, is proposed for bandwidth efficient livestreaming. The basic idea is to group users with identical livestream interest in a cluster of n peers. Instead of sending n copies of a livestream package, only one copy is sent to the cluster. A package is divided into n blocks. Each user receives one block from the base station and the remaining n-1 blocks from the other peers. Two optimization problems are addressed. The first problem is minimizing the bandwidth needed to guarantee a continuous live media play on all peers. A solution is proposed to find the optimal block sizes such that the wireless bandwidth is minimized. The second problem is maximizing the number of peers admitted to a cluster, given a fixed wireless bandwidth. This problem is NP-complete and a greedy strategy is proposed to calculate a feasible solution for peer selection. The proposed model improves the bandwidth efficiency and allows more users to be served.
comment: 16 pages, 12 figures, 4 tables, Journal submission
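The block-sharing idea in the ACIDE abstract above can be sketched with simple arithmetic. The example assumes equal block sizes, whereas the paper optimizes block sizes, and all function names are hypothetical.

```python
def base_station_traffic(package_size: float, n_peers: int, clustered: bool) -> float:
    """Data the base station sends per livestream package.

    Without clustering it sends one full copy to each of the n peers;
    with clustering it sends a single copy, split into n blocks.
    """
    if n_peers < 1:
        raise ValueError("n_peers must be at least 1")
    return package_size if clustered else package_size * n_peers


def per_peer_traffic(package_size: float, n_peers: int) -> tuple[float, float, float]:
    """Return (from_base_station, from_peers, uploaded_to_peers) for one peer,
    assuming equal-sized blocks (an illustrative simplification)."""
    if n_peers < 1:
        raise ValueError("n_peers must be at least 1")
    block = package_size / n_peers
    from_base = block                    # one block from the base station
    from_peers = block * (n_peers - 1)   # remaining n-1 blocks from other peers
    uploaded = block * (n_peers - 1)     # own block forwarded to the n-1 other peers
    return from_base, from_peers, uploaded


if __name__ == "__main__":
    size, n = 8.0, 4
    print("unicast :", base_station_traffic(size, n, clustered=False))
    print("cluster :", base_station_traffic(size, n, clustered=True))
    print("per peer:", per_peer_traffic(size, n))
```

In this simplified setting the base-station load falls from n copies to one copy, while each peer takes on peer-to-peer upload of its own block.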
Robotics
VAMOS: A Hierarchical Vision-Language-Action Model for Capability-Modulated and Steerable Navigation
A fundamental challenge in robot navigation lies in learning policies that generalize across diverse environments while conforming to the unique physical constraints and capabilities of a specific embodiment (e.g., quadrupeds can walk up stairs, but rovers cannot). We propose VAMOS, a hierarchical VLA that decouples semantic planning from embodiment grounding: a generalist planner learns from diverse, open-world data, while a specialist affordance model learns the robot's physical constraints and capabilities in safe, low-cost simulation. We enabled this separation by carefully designing an interface that lets a high-level planner propose candidate paths directly in image space that the affordance model then evaluates and re-ranks. Our real-world experiments show that VAMOS achieves higher success rates in both indoor and complex outdoor navigation than state-of-the-art model-based and end-to-end learning methods. We also show that our hierarchical design enables cross-embodied navigation across legged and wheeled robots and is easily steerable using natural language. Real-world ablations confirm that the specialist model is key to embodiment grounding, enabling a single high-level planner to be deployed across physically distinct wheeled and legged robots. Finally, this model significantly enhances single-robot reliability, achieving 3X higher success rates by rejecting physically infeasible plans. Website: https://vamos-vla.github.io/
GSWorld: Closed-Loop Photo-Realistic Simulation Suite for Robotic Manipulation
This paper presents GSWorld, a robust, photo-realistic simulator for robotics manipulation that combines 3D Gaussian Splatting with physics engines. Our framework advocates "closing the loop" of developing manipulation policies with reproducible evaluation of policies learned from real-robot data and sim2real policy training without using real robots. To enable photo-realistic rendering of diverse scenes, we propose a new asset format, which we term GSDF (Gaussian Scene Description File), that infuses Gaussian-on-Mesh representation with robot URDF and other objects. With a streamlined reconstruction pipeline, we curate a database of GSDF that contains 3 robot embodiments for single-arm and bimanual manipulation, as well as more than 40 objects. Combining GSDF with physics engines, we demonstrate several immediate interesting applications: (1) learning zero-shot sim2real pixel-to-action manipulation policy with photo-realistic rendering, (2) automated high-quality DAgger data collection for adapting policies to deployment environments, (3) reproducible benchmarking of real-robot manipulation policies in simulation, (4) simulation data collection by virtual teleoperation, and (5) zero-shot sim2real visual reinforcement learning. Website: https://3dgsworld.github.io/.
The Reality Gap in Robotics: Challenges, Solutions, and Best Practices
Machine learning has facilitated significant advancements across various robotics domains, including navigation, locomotion, and manipulation. Many such achievements have been driven by the extensive use of simulation as a critical tool for training and testing robotic systems prior to their deployment in real-world environments. However, simulations consist of abstractions and approximations that inevitably introduce discrepancies between simulated and real environments, known as the reality gap. These discrepancies significantly hinder the successful transfer of systems from simulation to the real world. Closing this gap remains one of the most pressing challenges in robotics. Recent advances in sim-to-real transfer have demonstrated promising results across various platforms, including locomotion, navigation, and manipulation. By leveraging techniques such as domain randomization, real-to-sim transfer, state and action abstractions, and sim-real co-training, many works have overcome the reality gap. However, challenges persist, and a deeper understanding of the reality gap's root causes and solutions is necessary. In this survey, we present a comprehensive overview of the sim-to-real landscape, highlighting the causes, solutions, and evaluation metrics for the reality gap and sim-to-real transfer.
comment: Accepted for Publication as part of the Annual Review of Control, Robotics, and Autonomous Systems 2026
FieldGen: From Teleoperated Pre-Manipulation Trajectories to Field-Guided Data Generation
Large-scale and diverse datasets are vital for training robust robotic manipulation policies, yet existing data collection methods struggle to balance scale, diversity, and quality. Simulation offers scalability but suffers from sim-to-real gaps, while teleoperation yields high-quality demonstrations with limited diversity and high labor cost. We introduce FieldGen, a field-guided data generation framework that enables scalable, diverse, and high-quality real-world data collection with minimal human supervision. FieldGen decomposes manipulation into two stages: a pre-manipulation phase, allowing trajectory diversity, and a fine manipulation phase requiring expert precision. Human demonstrations capture key contact and pose information, after which an attraction field automatically generates diverse trajectories converging to successful configurations. This decoupled design combines scalable trajectory diversity with precise supervision. Moreover, FieldGen-Reward augments generated data with reward annotations to further enhance policy learning. Experiments demonstrate that policies trained with FieldGen achieve higher success rates and improved stability compared to teleoperation-based baselines, while significantly reducing human effort in long-term real-world data collection. Webpage is available at https://fieldgen.github.io/.
comment: Webpage: https://fieldgen.github.io/
ALICE-LRI: A General Method for Lossless Range Image Generation for Spinning LiDAR Sensors without Calibration Metadata
3D LiDAR sensors are essential for autonomous navigation, environmental monitoring, and precision mapping in remote sensing applications. To efficiently process the massive point clouds generated by these sensors, LiDAR data is often projected into 2D range images that organize points by their angular positions and distances. While these range image representations enable efficient processing, conventional projection methods suffer from fundamental geometric inconsistencies that cause irreversible information loss, compromising high-fidelity applications. We present ALICE-LRI (Automatic LiDAR Intrinsic Calibration Estimation for Lossless Range Images), the first general, sensor-agnostic method that achieves lossless range image generation from spinning LiDAR point clouds without requiring manufacturer metadata or calibration files. Our algorithm automatically reverse-engineers the intrinsic geometry of any spinning LiDAR sensor by inferring critical parameters including laser beam configuration, angular distributions, and per-beam calibration corrections, enabling lossless projection and complete point cloud reconstruction with zero point loss. Comprehensive evaluation across the complete KITTI and DurLAR datasets demonstrates that ALICE-LRI achieves perfect point preservation, with zero points lost across all point clouds. Geometric accuracy is maintained well within sensor precision limits, establishing geometric losslessness with real-time performance. We also present a compression case study that validates substantial downstream benefits, demonstrating significant quality improvements in practical applications. This paradigm shift from approximate to lossless LiDAR projections opens new possibilities for high-precision remote sensing applications requiring complete geometric preservation.
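To make the information loss mentioned in the ALICE-LRI abstract above concrete, the sketch below implements a conventional spherical projection with uniform angular bins and counts points that collide in the same pixel. It illustrates the lossy baseline only, not the ALICE-LRI algorithm; the field-of-view values and function name are assumptions chosen for the example.

```python
import math


def project_to_range_image(points, height, width, fov_up_deg, fov_down_deg):
    """Project (x, y, z) points onto a height x width range image.

    Returns (image, dropped): `image` maps (row, col) to the nearest range,
    and `dropped` counts points lost to pixel collisions or zero range.
    """
    fov_up = math.radians(fov_up_deg)
    fov = fov_up - math.radians(fov_down_deg)
    image = {}
    dropped = 0
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            dropped += 1
            continue
        azimuth = math.atan2(y, x)
        elevation = math.asin(z / r)
        col = int(0.5 * (1.0 - azimuth / math.pi) * width) % width
        row = int((fov_up - elevation) / fov * height)
        row = min(max(row, 0), height - 1)  # clamp points outside the assumed FOV
        if (row, col) in image:
            dropped += 1  # two points share a pixel; only the nearer one survives
            image[(row, col)] = min(image[(row, col)], r)
        else:
            image[(row, col)] = r
    return image, dropped


if __name__ == "__main__":
    # Two points along the same ray direction land in the same pixel.
    pts = [(10.0, 1.0, 0.0), (20.0, 2.0, 0.0), (0.0, 5.0, 1.0)]
    img, lost = project_to_range_image(pts, 64, 1024, 3.0, -25.0)
    print(len(img), "pixels filled,", lost, "point(s) lost")
```

Every collision counted here is a point that cannot be recovered from the range image, which is the kind of loss a lossless projection is meant to avoid.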
Real-Time Gait Adaptation for Quadrupeds using Model Predictive Control and Reinforcement Learning
Model-free reinforcement learning (RL) has enabled adaptable and agile quadruped locomotion; however, policies often converge to a single gait, leading to suboptimal performance. Traditionally, Model Predictive Control (MPC) has been extensively used to obtain task-specific optimal policies but lacks the ability to adapt to varying environments. To address these limitations, we propose an optimization framework for real-time gait adaptation in a continuous gait space, combining the Model Predictive Path Integral (MPPI) algorithm with a Dreamer module to produce adaptive and optimal policies for quadruped locomotion. At each time step, MPPI jointly optimizes the actions and gait variables using a learned Dreamer reward that promotes velocity tracking, energy efficiency, stability, and smooth transitions, while penalizing abrupt gait changes. A learned value function is incorporated as terminal reward, extending the formulation to an infinite-horizon planner. We evaluate our framework in simulation on the Unitree Go1, demonstrating an average reduction of up to 36.48% in energy consumption across varying target speeds, while maintaining accurate tracking and adaptive, task-appropriate gaits.
C-NAV: Towards Self-Evolving Continual Object Navigation in Open World
Embodied agents are expected to perform object navigation in dynamic, open-world environments. However, existing approaches typically rely on static trajectories and a fixed set of object categories during training, overlooking the real-world requirement for continual adaptation to evolving scenarios. To facilitate related studies, we introduce the continual object navigation benchmark, which requires agents to acquire navigation skills for new object categories while avoiding catastrophic forgetting of previously learned knowledge. To tackle this challenge, we propose C-Nav, a continual visual navigation framework that integrates two key innovations: (1) A dual-path anti-forgetting mechanism, which comprises feature distillation that aligns multi-modal inputs into a consistent representation space to ensure representation consistency, and feature replay that retains temporal features within the action decoder to ensure policy consistency. (2) An adaptive sampling strategy that selects diverse and informative experiences, thereby reducing redundancy and minimizing memory overhead. Extensive experiments across multiple model architectures demonstrate that C-Nav consistently outperforms existing approaches, achieving superior performance even compared to baselines with full trajectory retention, while significantly lowering memory requirements. The code will be publicly available at https://bigtree765.github.io/C-Nav-project.
EmbodiedBrain: Expanding Performance Boundaries of Task Planning for Embodied Intelligence
The realization of Artificial General Intelligence (AGI) necessitates Embodied AI agents capable of robust spatial perception, effective task planning, and adaptive execution in physical environments. However, current large language models (LLMs) and multimodal LLMs (MLLMs) for embodied tasks suffer from key limitations, including a significant gap between model design and agent requirements, an unavoidable trade-off between real-time latency and performance, and the use of unauthentic, offline evaluation metrics. To address these challenges, we propose EmbodiedBrain, a novel vision-language foundation model available in both 7B and 32B parameter sizes. Our framework features an agent-aligned data structure and employs a powerful training methodology that integrates large-scale Supervised Fine-Tuning (SFT) with Step-Augmented Group Relative Policy Optimization (Step-GRPO), which boosts long-horizon task success by integrating preceding steps as Guided Precursors. Furthermore, we incorporate a comprehensive reward system, including a Generative Reward Model (GRM) accelerated at the infrastructure level, to improve training efficiency. To enable thorough validation, we establish a three-part evaluation system encompassing General, Planning, and End-to-End Simulation Benchmarks, highlighted by the proposal and open-sourcing of a novel, challenging simulation environment. Experimental results demonstrate that EmbodiedBrain achieves superior performance across all metrics, establishing a new state-of-the-art for embodied foundation models. To pave the way for the next generation of generalist embodied agents, we open-source all of our data, model weights, and evaluation methods, which are available at https://zterobot.github.io/EmbodiedBrain.github.io.
Deep Learning-Powered Visual SLAM Aimed at Assisting Visually Impaired Navigation
Despite advancements in SLAM technologies, robust operation under challenging conditions such as low-texture, motion-blur, or challenging lighting remains an open challenge. Such conditions are common in applications such as assistive navigation for the visually impaired. These challenges undermine localization accuracy and tracking stability, reducing navigation reliability and safety. To overcome these limitations, we present SELM-SLAM3, a deep learning-enhanced visual SLAM framework that integrates SuperPoint and LightGlue for robust feature extraction and matching. We evaluated our framework using TUM RGB-D, ICL-NUIM, and TartanAir datasets, which feature diverse and challenging scenarios. SELM-SLAM3 outperforms conventional ORB-SLAM3 by an average of 87.84% and exceeds state-of-the-art RGB-D SLAM systems by 36.77%. Our framework demonstrates enhanced performance under challenging conditions, such as low-texture scenes and fast motion, providing a reliable platform for developing navigation aids for the visually impaired.
comment: 8 pages, 7 figures, 4 tables
RubbleSim: A Photorealistic Structural Collapse Simulator for Confined Space Mapping
Despite well-reported instances of robots being used in disaster response, there is scant published data on the internal composition of the void spaces within structural collapse incidents. Data collected during these incidents is mired in legal constraints, as ownership is often tied to the responding agencies, with little hope of public release for research. While engineered rubble piles are used for training, these sites are also reluctant to release information about their proprietary training grounds. To overcome this access challenge, we present RubbleSim -- an open-source, reconfigurable simulator for photorealistic void space exploration. The design of the simulation assets is directly informed by visits to numerous training rubble sites at differing levels of complexity. The simulator is implemented in Unity with multi-operating system support. The simulation uses a physics-based approach to build stochastic rubble piles, allowing for rapid iteration between simulation worlds while retaining absolute knowledge of the ground truth. Using RubbleSim, we apply a state-of-the-art structure-from-motion algorithm to illustrate how perception performance degrades under challenging visual conditions inside the emulated void spaces. Pre-built binaries and source code to implement are available online: https://github.com/mit-ll/rubble_pile_simulator.
comment: Accepted to 2025 IEEE International Symposium on Safety, Security, and Rescue Robotics
A Parameter-Linear Formulation of the Optimal Path Following Problem for Robotic Manipulator
In this paper, the computational challenges of time-optimal path following are addressed. The standard approach is to minimize the travel time, which inevitably leads to singularities at zero path speed when the optimization problem is reformulated in terms of a path parameter. Thus, generating smooth trajectories while maintaining a low computational effort is quite challenging, since the singularities have to be taken into account. To this end, a different approach is presented in this paper. This approach is based on maximizing the path speed along a prescribed path. Furthermore, the approach is capable of planning smooth trajectories in a numerically efficient manner. Moreover, the discrete reformulation of the underlying problem is linear in the optimization variables.
Simultaneous Stiffness and Trajectory Optimization for Energy Minimization of Pick-and-Place Tasks of SEA-Actuated Parallel Kinematic Manipulators
A major field of industrial robot applications deals with repetitive tasks that alternate between operating points. For these so-called pick-and-place operations, parallel kinematic manipulators (PKM) are frequently employed. These tasks tend to automatically run for a long period of time and therefore minimizing energy consumption is always of interest. Recent research addresses this topic by the use of elastic elements and particularly series elastic actuators (SEA). This paper explores the possibilities of minimizing energy consumption of SEA actuated PKM performing pick-and-place tasks. The basic idea is to excite eigenmotions that result from the actuator springs and exploit their oscillating characteristics. To this end, a prescribed cyclic pick-and-place operation is analyzed and a dynamic model of SEA driven PKM is derived. Subsequently, an energy minimizing optimal control problem is formulated where operating trajectories as well as SEA stiffnesses are optimized simultaneously. Here, optimizing the actuator stiffness does not account for variable stiffness actuators. It serves as a tool for the design and dimensioning process. The hypothesis on energy reduction is tested on two (parallel) robot applications where redundant actuation is also addressed. The results confirm the validity of this approach.
Dual Control Reference Generation for Optimal Pick-and-Place Execution under Payload Uncertainty
This work addresses the problem of robot manipulation tasks under unknown dynamics, such as pick-and-place tasks under payload uncertainty, where active exploration and(/for) online parameter adaptation during task execution are essential to enable accurate model-based control. The problem is framed as dual control seeking a closed-loop optimal control problem that accounts for parameter uncertainty. We simplify the dual control problem by pre-defining the structure of the feedback policy to include an explicit adaptation mechanism. Then we propose two methods for reference trajectory generation. The first directly embeds parameter uncertainty in robust optimal control methods that minimize the expected task cost. The second method considers minimizing the so-called optimality loss, which measures the sensitivity of parameter-relevant information with respect to task performance. We observe that both approaches reason over the Fisher information as a natural side effect of their formulations, simultaneously pursuing optimal task execution. We demonstrate the effectiveness of our approaches for a pick-and-place manipulation task. We show that designing the reference trajectories whilst taking into account the control enables faster and more accurate task performance and system identification while ensuring stable and efficient control.
Degradation-Aware Cooperative Multi-Modal GNSS-Denied Localization Leveraging LiDAR-Based Robot Detections
Accurate long-term localization using onboard sensors is crucial for robots operating in Global Navigation Satellite System (GNSS)-denied environments. While complementary sensors mitigate individual degradations, carrying all the available sensor types on a single robot significantly increases the size, weight, and power demands. Distributing sensors across multiple robots enhances the deployability but introduces challenges in fusing asynchronous, multi-modal data from independently moving platforms. We propose a novel adaptive multi-modal multi-robot cooperative localization approach using a factor-graph formulation to fuse asynchronous Visual-Inertial Odometry (VIO), LiDAR-Inertial Odometry (LIO), and 3D inter-robot detections from distinct robots in a loosely-coupled fashion. The approach adapts to changing conditions, leveraging reliable data to assist robots affected by sensory degradations. A novel interpolation-based factor enables fusion of the unsynchronized measurements. LIO degradations are evaluated based on the approximate scan-matching Hessian. A novel approach of weighting odometry data proportionally to the Wasserstein distance between the consecutive VIO outputs is proposed. A theoretical analysis is provided, investigating the cooperative localization problem under various conditions, mainly in the presence of sensory degradations. The proposed method has been extensively evaluated on real-world data gathered with heterogeneous teams of an Unmanned Ground Vehicle (UGV) and Unmanned Aerial Vehicles (UAVs), showing that the approach provides significant improvements in localization accuracy in the presence of various sensory degradations.
comment: This work has been submitted to the IEEE for possible publication
Robot Path and Trajectory Planning Considering a Spatially Fixed TCP
This paper presents a method for planning a trajectory in workspace coordinates using a spatially fixed tool center point (TCP), while taking into account the processing path on a part. This approach is beneficial if it is easier to move the part rather than moving the tool. Whether a mathematical description that defines the shape to be processed or single points from a design program are used, the robot path is finally represented using B-splines. The use of splines enables the path to be continuous with a desired degree, which finally leads to a smooth robot trajectory. While calculating the robot trajectory through prescribed orientation, additionally a given velocity at the TCP has to be considered. The procedure was validated on a real system using an industrial robot moving an arbitrary defined part.
Behavior-Aware Online Prediction of Obstacle Occupancy using Zonotopes
Predicting the motion of surrounding vehicles is key to safe autonomous driving, especially in unstructured environments without prior information. This paper proposes a novel online method to accurately predict the occupancy sets of surrounding vehicles based solely on motion observations. The approach is divided into two stages: first, an Extended Kalman Filter and a Linear Programming (LP) problem are used to estimate a compact zonotopic set of control actions; then, a reachability analysis propagates this set to predict future occupancy. The effectiveness of the method has been validated through simulations in an urban environment, showing accurate and compact predictions without relying on prior assumptions or prior training data.
comment: 64th IEEE Conference on Decision and Control
MR-UBi: Mixed Reality-Based Underwater Robot Arm Teleoperation System with Reaction Torque Indicator via Bilateral Control
We present a mixed reality-based underwater robot arm teleoperation system with a reaction torque indicator via bilateral control (MR-UBi). The reaction torque indicator (RTI) overlays a color and length-coded torque bar in the MR-HMD, enabling seamless integration of visual and haptic feedback during underwater robot arm teleoperation. User studies with sixteen participants compared MR-UBi against a bilateral-control baseline. MR-UBi significantly improved grasping-torque control accuracy, increasing the time within the optimal torque range and reducing both low and high grasping torque range during lift and pick-and-place tasks with objects of different stiffness. Subjective evaluations further showed higher usability (SUS) and lower workload (NASA-TLX). Overall, the results confirm that MR-UBi enables more stable, accurate, and user-friendly underwater robot-arm teleoperation through the integration of visual and haptic feedback. For additional material, please check: https://mertcookimg.github.io/mr-ubi
PointMapPolicy: Structured Point Cloud Processing for Multi-Modal Imitation Learning
Robotic manipulation systems benefit from complementary sensing modalities, where each provides unique environmental information. Point clouds capture detailed geometric structure, while RGB images provide rich semantic context. Current point cloud methods struggle to capture fine-grained detail, especially for complex tasks, while RGB methods lack geometric awareness, which hinders their precision and generalization. We introduce PointMapPolicy, a novel approach that conditions diffusion policies on structured grids of points without downsampling. The resulting data type makes it easier to extract shape and spatial relationships from observations, and can be transformed between reference frames. Moreover, because the points are structured in a regular grid, established computer vision techniques can be applied directly to 3D data. Using xLSTM as a backbone, our model efficiently fuses the point maps with RGB data for enhanced multi-modal perception. Through extensive experiments on the RoboCasa and CALVIN benchmarks and real robot evaluations, we demonstrate that our method achieves state-of-the-art performance across diverse manipulation tasks. The overview and demos are available on our project page: https://point-map.github.io/Point-Map/
NeuralTouch: Neural Descriptors for Precise Sim-to-Real Tactile Robot Control
Grasping accuracy is a critical prerequisite for precise object manipulation, often requiring careful alignment between the robot hand and object. Neural Descriptor Fields (NDF) offer a promising vision-based method to generate grasping poses that generalize across object categories. However, NDF alone can produce inaccurate poses due to imperfect camera calibration, incomplete point clouds, and object variability. Meanwhile, tactile sensing enables more precise contact, but existing approaches typically learn policies limited to simple, predefined contact geometries. In this work, we introduce NeuralTouch, a multimodal framework that integrates NDF and tactile sensing to enable accurate, generalizable grasping through gentle physical interaction. Our approach leverages NDF to implicitly represent the target contact geometry, from which a deep reinforcement learning (RL) policy is trained to refine the grasp using tactile feedback. This policy is conditioned on the neural descriptors and does not require explicit specification of contact types. We validate NeuralTouch through ablation studies in simulation and zero-shot transfer to real-world manipulation tasks--such as peg-out-in-hole and bottle lid opening--without additional fine-tuning. Results show that NeuralTouch significantly improves grasping accuracy and robustness over baseline methods, offering a general framework for precise, contact-rich robotic manipulation.
Multi-Modal Decentralized Reinforcement Learning for Modular Reconfigurable Lunar Robots
Modular reconfigurable robots suit task-specific space operations, but the combinatorial growth of morphologies hinders unified control. We propose a decentralized reinforcement learning (Dec-RL) scheme where each module learns its own policy: wheel modules use Soft Actor-Critic (SAC) for locomotion and 7-DoF limbs use Proximal Policy Optimization (PPO) for steering and manipulation, enabling zero-shot generalization to unseen configurations. In simulation, the steering policy achieved a mean absolute error of 3.63° between desired and induced angles; the manipulation policy plateaued at 84.6% success on a target-offset criterion; and the wheel policy cut average motor torque by 95.4% relative to baseline while maintaining 99.6% success. Lunar-analogue field tests validated zero-shot integration for autonomous locomotion, steering, and preliminary alignment for reconfiguration. The system transitioned smoothly among synchronous, parallel, and sequential modes for policy execution, without idle states or control conflicts, indicating a scalable, reusable, and robust approach for modular lunar robots.
comment: Accepted in IEEE iSpaRo 2025. Awaiting Publication
Dino-Diffusion Modular Designs Bridge the Cross-Domain Gap in Autonomous Parking
Parking is a critical pillar of driving safety. While recent end-to-end (E2E) approaches have achieved promising in-domain results, robustness under domain shifts (e.g., weather and lighting changes) remains a key challenge. Rather than relying on additional data, in this paper, we propose Dino-Diffusion Parking (DDP), a domain-agnostic autonomous parking pipeline that integrates visual foundation models with diffusion-based planning to enable generalized perception and robust motion planning under distribution shifts. We train our pipeline in CARLA under a regular setting and transfer it to more adversarial settings in a zero-shot fashion. Our model consistently achieves a parking success rate above 90% across all tested out-of-distribution (OOD) scenarios, with ablation studies confirming that both the network architecture and algorithmic design significantly enhance cross-domain performance over existing baselines. Furthermore, testing in a 3D Gaussian splatting (3DGS) environment reconstructed from a real-world parking lot demonstrates promising sim-to-real transfer.
comment: Code is at https://github.com/ChampagneAndfragrance/Dino_Diffusion_Parking_Official
MemER: Scaling Up Memory for Robot Control via Experience Retrieval
Humans routinely rely on memory to perform tasks, yet most robot policies lack this capability; our goal is to endow robot policies with the same ability. Naively conditioning on long observation histories is computationally expensive and brittle under covariate shift, while indiscriminate subsampling of history leads to irrelevant or redundant information. We propose a hierarchical policy framework, where the high-level policy is trained to select and track previous relevant keyframes from its experience. The high-level policy uses selected keyframes and the most recent frames when generating text instructions for a low-level policy to execute. This design is compatible with existing vision-language-action (VLA) models and enables the system to efficiently reason over long-horizon dependencies. In our experiments, we finetune Qwen2.5-VL-7B-Instruct and $\pi_{0.5}$ as the high-level and low-level policies respectively, using demonstrations supplemented with minimal language annotations. Our approach, MemER, outperforms prior methods on three real-world long-horizon robotic manipulation tasks that require minutes of memory. Videos and code can be found at https://jen-pan.github.io/memer/.
comment: Project page: https://jen-pan.github.io/memer/
Kinaema: a recurrent sequence model for memory and pose in motion
One key aspect of spatially aware robots is the ability to "find their bearings", i.e., to correctly situate themselves in previously seen spaces. In this work, we focus on this particular scenario of continuous robotics operations, where information observed before an actual episode start is exploited to optimize efficiency. We introduce a new model, Kinaema, and agent, capable of integrating a stream of visual observations while moving in a potentially large scene, and upon request, processing a query image and predicting the relative position of the shown space with respect to its current position. Our model does not explicitly store an observation history and therefore has no hard constraints on context length. It maintains an implicit latent memory, which is updated by a transformer in a recurrent way, compressing the history of sensor readings into a compact representation. We evaluate the impact of this model in a new downstream task we call "Mem-Nav". We show that our large-capacity recurrent model maintains a useful representation of the scene, navigates to goals observed before the actual episode start, and is computationally efficient, in particular compared to classical transformers with attention over an observation history.
comment: 10 pages + references + checklist + appendix, 29 pages total
NODA-MMH: Certified Learning-Aided Nonlinear Control for Magnetically-Actuated Swarm Experiment Toward On-Orbit Proof
This study experimentally validates the principle of large-scale satellite swarm control through learning-aided magnetic field interactions generated by satellite-mounted magnetorquers. This actuation presents a promising solution for the long-term formation maintenance of multiple satellites and has primarily been demonstrated in ground-based testbeds for two-satellite position control. However, as the number of satellites increases beyond three, fundamental challenges coupled with the high nonlinearity arise: 1) nonholonomic constraints, 2) underactuation, 3) scalability, and 4) computational cost. Previous studies have shown that time-integrated current control theoretically solves these problems, where the average actuator outputs align with the desired command, and a learning-based technique further enhances their performance. Through multiple experiments, we validate critical aspects of learning-aided time-integrated current control: (1) enhanced controllability of the averaged system dynamics, with a theoretically guaranteed error bound, and (2) decentralized current management. We design two-axis coils and a ground-based experimental setup utilizing an air-bearing platform, enabling a mathematical replication of orbital dynamics. Based on the effectiveness of the learned interaction model, we introduce NODA-MMH (Neural power-Optimal Dipole Allocation for certified learned Model-based Magnetically swarm control Harness) for model-based power-optimal swarm control. This study complements our tutorial paper on magnetically actuated swarms for the long-term formation maintenance problem.
comment: Accepted for presentation at the 2025 International Conference on Space Robotics (iSpaRo 2025)
A Contact-Driven Framework for Manipulating in the Blind
Robots often face manipulation tasks in environments where vision is inadequate due to clutter, occlusions, or poor lighting--for example, reaching a shutoff valve at the back of a sink cabinet or locating a light switch above a crowded shelf. In such settings, robots, much like humans, must rely on contact feedback to distinguish free from occupied space and navigate around obstacles. Many of these environments often exhibit strong structural priors--for instance, pipes often span across sink cabinets--that can be exploited to anticipate unseen structure and avoid unnecessary collisions. We present a theoretically complete and empirically efficient framework for manipulation in the blind that integrates contact feedback with structural priors to enable robust operation in unknown environments. The framework comprises three tightly coupled components: (i) a contact detection and localization module that utilizes joint torque sensing with a contact particle filter to detect and localize contacts, (ii) an occupancy estimation module that uses the history of contact observations to build a partial occupancy map of the workspace and extrapolate it into unexplored regions with learned predictors, and (iii) a planning module that accounts for the fact that contact localization estimates and occupancy predictions can be noisy, computing paths that avoid collisions and complete tasks efficiently without eliminating feasible solutions. We evaluate the system in simulation and in the real world on a UR10e manipulator across two domestic tasks--(i) manipulating a valve under a kitchen sink surrounded by pipes and (ii) retrieving a target object from a cluttered shelf. Results show that the framework reliably solves these tasks, achieving up to a 2x reduction in task completion time compared to baselines, with ablations confirming the contribution of each module.
Reinforcement Learning-based Robust Wall Climbing Locomotion Controller in Ferromagnetic Environment
We present a reinforcement learning framework for quadrupedal wall-climbing locomotion that explicitly addresses uncertainty in magnetic foot adhesion. A physics-based adhesion model of a quadrupedal magnetic climbing robot is incorporated into simulation to capture partial contact, air-gap sensitivity, and probabilistic attachment failures. To stabilize learning and enable reliable transfer, we design a three-phase curriculum: (1) acquire a crawl gait on flat ground without adhesion, (2) gradually rotate the gravity vector to vertical while activating the adhesion model, and (3) inject stochastic adhesion failures to encourage slip recovery. The learned policy achieves a high success rate, strong adhesion retention, and rapid recovery from detachment in simulation under degraded adhesion. Compared with a model predictive control (MPC) baseline that assumes perfect adhesion, our controller maintains locomotion when attachment is intermittently lost. Hardware experiments with the untethered robot further confirm robust vertical crawling on steel surfaces, maintaining stability despite transient misalignment and incomplete attachment. These results show that combining curriculum learning with realistic adhesion modeling provides a resilient sim-to-real framework for magnetic climbing robots in complex environments.
comment: 8 pages, 6 figures
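The second curriculum phase above (gradually rotating the gravity vector to vertical) can be illustrated with a small worked formula. This is a sketch only: the frame convention (z normal to the climbing surface, x pointing up the wall), the linear schedule, and the function name `gravity_in_surface_frame` are assumptions for illustration, not details taken from the paper.

```python
import math

def gravity_in_surface_frame(progress, g=9.81):
    """Gravity vector expressed in a frame fixed to the climbing surface.

    progress = 0.0 -> surface is flat ground (gravity presses the feet onto it)
    progress = 1.0 -> surface is a vertical wall (gravity pulls along the wall)
    The linear tilt schedule is a hypothetical choice.
    """
    progress = min(max(progress, 0.0), 1.0)      # clamp to [0, 1]
    tilt = progress * math.pi / 2.0              # 0 .. 90 degrees
    return (-g * math.sin(tilt), 0.0, -g * math.cos(tilt))

if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        print(p, gravity_in_surface_frame(p))
```

The magnitude of the vector stays constant throughout; only its direction relative to the surface changes, so the policy sees a smooth transition from walking to climbing.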
PathFormer: A Transformer with 3D Grid Constraints for Digital Twin Robot-Arm Trajectory Generation
Robotic arms require precise, task-aware trajectory planning, yet sequence models that ignore motion structure often yield invalid or inefficient executions. We present a Path-based Transformer that encodes robot motion with a 3-grid (where/what/when) representation and constraint-masked decoding, enforcing lattice-adjacent moves and workspace bounds while reasoning over task graphs and action order. Trained on 53,755 trajectories (80% train / 20% validation), the model aligns closely with ground truth -- 89.44% stepwise accuracy, 93.32% precision, 89.44% recall, and 90.40% F1 -- with 99.99% of paths legal by construction. Compiled to motor primitives on an xArm Lite 6 with a depth-camera digital twin, it attains up to 97.5% reach and 92.5% pick success in controlled tests, and 86.7% end-to-end success across 60 language-specified tasks in cluttered scenes, absorbing slips and occlusions via local re-grounding without global re-planning. These results show that path-structured representations enable Transformers to generate accurate, reliable, and interpretable robot trajectories, bridging graph-based planning and sequence-based learning and providing a practical foundation for general-purpose manipulation and sim-to-real transfer.
comment: 8 pages, 7 figures, 7 tables
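To make the idea of constraint-masked decoding concrete, the following minimal sketch restricts a greedy decoding step to grid cells that are lattice-adjacent to the current cell and inside the workspace bounds. It is not the PathFormer implementation: the 6-neighbour adjacency, the dictionary of scores standing in for model logits, and the names `legal_moves` and `masked_greedy_step` are all assumptions made for illustration.

```python
import math

# 6-connected lattice neighbourhood (an assumption; other adjacencies are possible).
NEIGHBOUR_OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def legal_moves(cell, shape):
    """Return the cells adjacent to `cell` that lie inside a grid of size `shape`."""
    x, y, z = cell
    moves = []
    for dx, dy, dz in NEIGHBOUR_OFFSETS:
        nx, ny, nz = x + dx, y + dy, z + dz
        if 0 <= nx < shape[0] and 0 <= ny < shape[1] and 0 <= nz < shape[2]:
            moves.append((nx, ny, nz))
    return moves

def masked_greedy_step(cell, scores, shape):
    """Pick the highest-scoring legal next cell.

    `scores` maps cell -> score and stands in for hypothetical model logits;
    illegal cells are effectively masked to -inf because they are never candidates.
    """
    candidates = legal_moves(cell, shape)
    return max(candidates, key=lambda c: scores.get(c, -math.inf))

if __name__ == "__main__":
    scores = {(1, 1, 1): 9.0, (0, 1, 0): 2.0, (1, 0, 0): 1.0}
    print(masked_greedy_step((0, 0, 0), scores, shape=(2, 2, 2)))
```

Because illegal cells never enter the candidate set, every decoded path is legal by construction regardless of what the model scores highest.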
Sequentially Teaching Sequential Tasks $(ST)^2$: Teaching Robots Long-horizon Manipulation Skills
Learning from demonstration is effective for teaching robots complex skills with high sample efficiency. However, teaching long-horizon tasks with multiple skills is difficult, as deviations accumulate, distributional shift increases, and human teachers become fatigued, raising the chance of failure. In this work, we study user responses to two teaching frameworks: (i) a traditional monolithic approach, where users demonstrate the entire trajectory of a long-horizon task; and (ii) a sequential approach, where the task is segmented by the user and demonstrations are provided step by step. To support this study, we introduce $(ST)^2$, a sequential method for learning long-horizon manipulation tasks that allows users to control the teaching flow by defining key points, enabling incremental and structured demonstrations. We conducted a user study on a restocking task with 16 participants in a realistic retail environment to evaluate both user preference and method effectiveness. Our objective and subjective results show that both methods achieve similar trajectory quality and success rates. Some participants preferred the sequential approach for its iterative control, while others favored the monolithic approach for its simplicity.
HRT1: One-Shot Human-to-Robot Trajectory Transfer for Mobile Manipulation
We introduce a novel system for human-to-robot trajectory transfer that enables robots to manipulate objects by learning from human demonstration videos. The system consists of four modules. The first module is a data collection module that is designed to collect human demonstration videos from the point of view of a robot using an AR headset. The second module is a video understanding module that detects objects and extracts 3D human-hand trajectories from demonstration videos. The third module transfers a human-hand trajectory into a reference trajectory of a robot end-effector in 3D space. The last module utilizes a trajectory optimization algorithm to solve for a trajectory in the robot configuration space that can follow the end-effector trajectory transferred from the human demonstration. Consequently, these modules enable a robot to watch a human demonstration video once and then repeat the same mobile manipulation task in different environments, even when objects are placed differently from the demonstrations. Experiments on different manipulation tasks are conducted on a mobile manipulator to verify the effectiveness of our system.
comment: 14 pages, 11 figures and 3 tables. Project page is available at https://irvlutd.github.io/HRT1/
Robust Point Cloud Reinforcement Learning via PCA-Based Canonicalization
Reinforcement Learning (RL) from raw visual input has achieved impressive successes in recent years, yet it remains fragile to out-of-distribution variations such as changes in lighting, color, and viewpoint. Point Cloud Reinforcement Learning (PC-RL) offers a promising alternative by mitigating appearance-based brittleness, but its sensitivity to camera pose mismatches continues to undermine reliability in realistic settings. To address this challenge, we propose PCA Point Cloud (PPC), a canonicalization framework specifically tailored for downstream robotic control. PPC maps point clouds under arbitrary rigid-body transformations to a unique canonical pose, aligning observations to a consistent frame, thereby substantially decreasing viewpoint-induced inconsistencies. In our experiments, we show that PPC improves robustness to unseen camera poses across challenging robotic tasks, providing a principled alternative to domain randomization.
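As a rough illustration of PCA-based canonicalization, the sketch below centres a point cloud, rotates it into the frame of its principal axes, and resolves the sign ambiguity of each axis. The abstract does not describe how PPC handles these steps, so the sign rule (non-negative third moment along each axis) and the function name `pca_canonicalize` are assumptions, and the sketch presumes distinct eigenvalues and a non-symmetric cloud.

```python
import numpy as np

def pca_canonicalize(points):
    """Map an (N, 3) point cloud to a PCA-aligned canonical frame (illustrative)."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)              # remove translation
    cov = centered.T @ centered / len(pts)         # 3x3 covariance matrix
    _, eigvecs = np.linalg.eigh(cov)               # eigenvalues in ascending order
    axes = eigvecs[:, ::-1]                        # columns by decreasing variance
    projected = centered @ axes                    # coordinates along principal axes
    # Eigenvectors are only defined up to sign; fix each sign so that the
    # third moment along that axis is non-negative (assumed disambiguation rule).
    signs = np.sign((projected ** 3).sum(axis=0))
    signs[signs == 0] = 1.0
    return projected * signs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.exponential(size=(500, 3)) * np.array([3.0, 2.0, 1.0])
    rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
    moved = cloud @ rotation.T + np.array([1.0, -2.0, 0.5])
    print(np.allclose(pca_canonicalize(cloud), pca_canonicalize(moved), atol=1e-6))
```

The usage example applies a random rigid transformation to the same cloud and checks that both versions land in the same canonical pose, which is the property a downstream policy would rely on under camera pose changes.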
SutureBot: A Precision Framework & Benchmark For Autonomous End-to-End Suturing NeurIPS 2025
Robotic suturing is a prototypical long-horizon dexterous manipulation task, requiring coordinated needle grasping, precise tissue penetration, and secure knot tying. Despite numerous efforts toward end-to-end autonomy, a fully autonomous suturing pipeline has yet to be demonstrated on physical hardware. We introduce SutureBot: an autonomous suturing benchmark on the da Vinci Research Kit (dVRK), spanning needle pickup, tissue insertion, and knot tying. To ensure repeatability, we release a high-fidelity dataset comprising 1,890 suturing demonstrations. Furthermore, we propose a goal-conditioned framework that explicitly optimizes insertion-point precision, improving targeting accuracy by 59%-74% over a task-only baseline. To establish this task as a benchmark for dexterous imitation learning, we evaluate state-of-the-art vision-language-action (VLA) models, including $\pi_0$, GR00T N1, OpenVLA-OFT, and multitask ACT, each augmented with a high-level task-prediction policy. Autonomous suturing is a key milestone toward achieving robotic autonomy in surgery. These contributions support reproducible evaluation and development of precision-focused, long-horizon dexterous manipulation policies necessary for end-to-end suturing. Dataset is available at: https://huggingface.co/datasets/jchen396/suturebot
comment: 10 pages, 5 figures, 4 tables, NeurIPS 2025
Safety Assessment in Reinforcement Learning via Model Predictive Control
Model-free reinforcement learning approaches are promising for control but typically lack formal safety guarantees. Existing methods to shield or otherwise provide these guarantees often rely on detailed knowledge of the safety specifications. Instead, this work's insight is that many difficult-to-specify safety issues are best characterized by invariance. Accordingly, we propose to leverage reversibility as a method for preventing these safety issues throughout the training process. Our method uses model-predictive path integral control to check the safety of an action proposed by a learned policy throughout training. A key advantage of this approach is that it only requires the ability to query the black-box dynamics, not explicit knowledge of the dynamics or safety constraints. Experimental results demonstrate that the proposed algorithm successfully aborts before all unsafe actions, while still achieving comparable training progress to a baseline PPO approach that is allowed to violate safety.
comment: 7 pages, 4 figures
An Experimental Study of Trojan Vulnerabilities in UAV Autonomous Landing
This study investigates the vulnerabilities of autonomous navigation and landing systems in Urban Air Mobility (UAM) vehicles. Specifically, it focuses on Trojan attacks that target deep learning models, such as Convolutional Neural Networks (CNNs). Trojan attacks work by embedding covert triggers within a model's training data. These triggers cause specific failures under certain conditions, while the model continues to perform normally in other situations. We assessed the vulnerability of Urban Autonomous Aerial Vehicles (UAAVs) using the DroNet framework. Our experiments showed a significant drop in accuracy, from 96.4% on clean data to 73.3% on data triggered by Trojan attacks. To conduct this study, we collected a custom dataset and trained models to simulate real-world conditions. We also developed an evaluation framework designed to identify Trojan-infected models. This work demonstrates the potential security risks posed by Trojan attacks and lays the groundwork for future research on enhancing the resilience of UAM systems.
comment: 6 pages
Aircraft Collision Avoidance Systems: Technological Challenges and Solutions on the Path to Regulatory Acceptance
Aircraft collision avoidance systems are critical to modern aviation. These systems are designed to predict potential collisions between aircraft and recommend appropriate avoidance actions. Creating effective collision avoidance systems requires solutions to a variety of technical challenges related to surveillance, decision making, and validation. These challenges have sparked significant research and development efforts over the past several decades that have resulted in a variety of proposed solutions. This article provides an overview of these challenges and solutions with an emphasis on those that have been put through a rigorous validation process and accepted by regulatory bodies. The challenges posed by the collision avoidance problem are often present in other domains, and aircraft collision avoidance systems can serve as case studies that provide valuable insights for a wide range of safety-critical systems.
comment: 32 pages, 9 figures
ROPES: Robotic Pose Estimation via Score-Based Causal Representation Learning NeurIPS 2025
Causal representation learning (CRL) has emerged as a powerful unsupervised framework that (i) disentangles the latent generative factors underlying high-dimensional data, and (ii) learns the cause-and-effect interactions among the disentangled variables. Despite extensive recent advances in identifiability and some practical progress, a substantial gap remains between theory and real-world practice. This paper takes a step toward closing that gap by bringing CRL to robotics, a domain that has motivated CRL. Specifically, this paper addresses the well-defined robot pose estimation -- the recovery of position and orientation from raw images -- by introducing Robotic Pose Estimation via Score-Based CRL (ROPES). Being an unsupervised framework, ROPES embodies the essence of interventional CRL by identifying those generative factors that are actuated: images are generated by intrinsic and extrinsic latent factors (e.g., joint angles, arm/limb geometry, lighting, background, and camera configuration) and the objective is to disentangle and recover the controllable latent variables, i.e., those that can be directly manipulated (intervened upon) through actuation. Interventional CRL theory shows that variables that undergo variations via interventions can be identified. In robotics, such interventions arise naturally by commanding actuators of various joints and recording images under varied controls. Empirical evaluations in semi-synthetic manipulator experiments demonstrate that ROPES successfully disentangles latent generative factors with high fidelity with respect to the ground truth. Crucially, this is achieved by leveraging only distributional changes, without using any labeled data. The paper also includes a comparison with a baseline based on a recently proposed semi-supervised framework. This paper concludes by positioning robot pose estimation as a near-practical testbed for CRL.
comment: A preliminary version of this paper appeared at NeurIPS 2025 Workshop on Embodied World Models for Decision Making
FIMD: Fast Isolated Marker Detection for UV-Based Visual Relative Localisation in Agile UAV Swarms
A novel approach for the fast onboard detection of isolated markers for visual relative localisation of multiple teammates in agile UAV swarms is introduced in this paper. As the detection forms a key component of real-time localisation systems, a three-fold innovation is presented, consisting of an optimised procedure for CPUs, a GPU shader program, and a functionally equivalent FPGA streaming architecture. For the proposed CPU and GPU solutions, the mean processing time per pixel of input camera frames was accelerated by two to three orders of magnitude compared to the unoptimised state-of-the-art approach. For the localisation task, the proposed FPGA architecture offered the most significant overall acceleration by minimising the total delay from camera exposure to detection results. Additionally, the proposed solutions were evaluated on various 32-bit and 64-bit embedded platforms to demonstrate their efficiency, as well as their feasibility for applications using low-end UAVs and MAVs. Thus, it has become a crucial enabling technology for agile UAV swarming.
MoTVLA: A Vision-Language-Action Model with Unified Fast-Slow Reasoning
Integrating visual-language instructions into visuomotor policies is gaining momentum in robot learning for enhancing open-world generalization. Despite promising advances, existing approaches face two challenges: limited language steerability when no generated reasoning is used as a condition, or significant inference latency when reasoning is incorporated. In this work, we introduce MoTVLA, a mixture-of-transformers (MoT)-based vision-language-action (VLA) model that integrates fast-slow unified reasoning with behavior policy learning. MoTVLA preserves the general intelligence of pre-trained VLMs (serving as the generalist) for tasks such as perception, scene understanding, and semantic planning, while incorporating a domain expert, a second transformer that shares knowledge with the pretrained VLM, to generate domain-specific fast reasoning (e.g., robot motion decomposition), thereby improving policy execution efficiency. By conditioning the action expert on decomposed motion instructions, MoTVLA can learn diverse behaviors and substantially improve language steerability. Extensive evaluations across natural language processing benchmarks, robotic simulation environments, and real-world experiments confirm the superiority of MoTVLA in both fast-slow reasoning and manipulation task performance.
VO-DP: Semantic-Geometric Adaptive Diffusion Policy for Vision-Only Robotic Manipulation
In the context of imitation learning, visuomotor-based diffusion policy learning is one of the main directions in robotic manipulation. Most of these approaches rely on point clouds as observation inputs and construct scene representations through point cloud feature learning, which enables them to achieve remarkable accuracy. However, the existing literature lacks an in-depth exploration of vision-only solutions, which have significant potential. In this paper, we propose a Vision-Only and single-view Diffusion Policy learning method (VO-DP) that leverages pretrained visual foundation models to achieve effective fusion of semantic and geometric features. We utilize intermediate features from VGGT, incorporating semantic features from DINOv2 and geometric features from Alternating Attention blocks. Features are fused via cross-attention and spatially compressed with a CNN to form the input to the policy head. Extensive experiments demonstrate that VO-DP not only outperforms the vision-only baseline DP significantly but also exhibits distinct performance trends against the point cloud-based method DP3: in simulation tasks, VO-DP achieves an average success rate of 64.6%, on par with DP3 (64.0%) and far higher than DP (34.8%), while in real-world tasks it reaches 87.9%, outperforming both DP3 (67.5%) and DP (11.2%) by a notable margin. Further robustness evaluations confirm that VO-DP remains highly stable under varying conditions including color, size, background, and lighting. Lastly, we open-source a training library for robotic manipulation. Built on Accelerate, this library supports multi-machine and multi-GPU parallel training, as well as mixed precision training. It is compatible with visuomotor policies such as DP, DP3 and VO-DP, and also supports the RoboTwin simulator.
Local Guidance for Configuration-Based Multi-Agent Pathfinding
Guidance is an emerging concept that improves the empirical performance of real-time, sub-optimal multi-agent pathfinding (MAPF) methods. It offers additional information to MAPF algorithms to mitigate congestion on a global scale by considering the collective behavior of all agents across the entire workspace. This global perspective helps reduce agents' waiting times, thereby improving overall coordination efficiency. In contrast, this study explores an alternative approach: providing local guidance in the vicinity of each agent. While such localized methods involve recomputation as agents move and may appear computationally demanding, we empirically demonstrate that supplying informative spatiotemporal cues to the planner can significantly improve solution quality without exceeding a moderate time budget. When applied to LaCAM, a leading configuration-based solver, this form of guidance establishes a new performance frontier for MAPF.
comment: 10 pages
ViTacGen: Robotic Pushing with Vision-to-Touch Generation
Robotic pushing is a fundamental manipulation task that requires tactile feedback to capture subtle contact forces and dynamics between the end-effector and the object. However, real tactile sensors often face hardware limitations such as high costs and fragility, and deployment challenges involving calibration and variations between different sensors, while vision-only policies struggle with satisfactory performance. Inspired by humans' ability to infer tactile states from vision, we propose ViTacGen, a novel robot manipulation framework designed for visual robotic pushing with vision-to-touch generation in reinforcement learning to eliminate the reliance on high-resolution real tactile sensors, enabling effective zero-shot deployment on visual-only robotic systems. Specifically, ViTacGen consists of an encoder-decoder vision-to-touch generation network that generates contact depth images, a standardized tactile representation, directly from visual image sequence, followed by a reinforcement learning policy that fuses visual-tactile data with contrastive learning based on visual and generated tactile observations. We validate the effectiveness of our approach in both simulation and real world experiments, demonstrating its superior performance and achieving a success rate of up to 86%.
Efficient Vision-Language-Action Models for Embodied Manipulation: A Systematic Survey
Vision-Language-Action (VLA) models extend vision-language models to embodied control by mapping natural-language instructions and visual observations to robot actions. Despite their capabilities, VLA systems face significant challenges due to their massive computational and memory demands, which conflict with the constraints of edge platforms such as on-board mobile manipulators that require real-time performance. Addressing this tension has become a central focus of recent research. In light of the growing efforts toward more efficient and scalable VLA systems, this survey provides a systematic review of approaches for improving VLA efficiency, with an emphasis on reducing latency, memory footprint, and training and inference costs. We categorize existing solutions into four dimensions: model architecture, perception feature, action generation, and training/inference strategies, summarizing representative techniques within each category. Finally, we discuss future trends and open challenges, highlighting directions for advancing efficient embodied intelligence.
Prognostic Framework for Robotic Manipulators Operating Under Dynamic Task Severities
Robotic manipulators are critical in many applications but are known to degrade over time. This degradation is influenced by the nature of the tasks performed by the robot. Tasks with higher severity, such as handling heavy payloads, can accelerate the degradation process. One way this degradation is reflected is in the position accuracy of the robot's end-effector. In this paper, we present a prognostic modeling framework that predicts a robotic manipulator's Remaining Useful Life (RUL) while accounting for the effects of task severity. Our framework represents the robot's position accuracy as a Brownian motion process with a random drift parameter that is influenced by task severity. The dynamic nature of task severity is modeled using a continuous-time Markov chain (CTMC). To evaluate RUL, we discuss two approaches -- (1) a novel closed-form expression for Remaining Lifetime Distribution (RLD), and (2) Monte Carlo simulations, commonly used in prognostics literature. Theoretical results establish the equivalence between these RUL computation approaches. We validate our framework through experiments using two distinct physics-based simulators for planar and spatial robot fleets. Our findings show that robots in both fleets experience shorter RUL when handling a higher proportion of high-severity tasks.
comment: Accepted for Publication in IEEE Transactions on Systems, Man, and Cybernetics: Systems
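The Monte Carlo route to RUL mentioned above can be sketched as a first-passage simulation: a degradation signal follows Brownian motion whose drift depends on the current task severity, severity switches according to a two-state continuous-time Markov chain, and failure occurs when the signal crosses a threshold. This is a simplified illustration, not the paper's model: the drift here is a fixed value per severity state rather than a random parameter, and the threshold, drifts, switching rates, and function names are hypothetical.

```python
import math
import random

def sample_failure_time(threshold, drifts, switch_rates, sigma, rng, dt=0.01, t_max=200.0):
    """Simulate one first-passage time of a CTMC-modulated Brownian motion.

    drifts[i]       : degradation drift while in severity state i (hypothetical values)
    switch_rates[i] : rate of leaving state i (two states: 0 = low, 1 = high severity)
    """
    x, t, state = 0.0, 0.0, 0
    next_switch = rng.expovariate(switch_rates[state])
    while t < t_max:
        if t >= next_switch:                       # severity changes
            state = 1 - state
            next_switch = t + rng.expovariate(switch_rates[state])
        # Euler-Maruyama step of dX = drift * dt + sigma * dW
        x += drifts[state] * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        if x >= threshold:                         # accuracy error exceeds the limit
            return t
    return t_max                                   # censored: no failure within t_max

def mean_rul(n_samples, seed=0, **model):
    """Average simulated failure time over n_samples independent runs."""
    rng = random.Random(seed)
    return sum(sample_failure_time(rng=rng, **model) for _ in range(n_samples)) / n_samples

if __name__ == "__main__":
    common = dict(threshold=5.0, drifts=(0.5, 2.0), sigma=0.2)
    print("mostly low severity :", mean_rul(50, switch_rates=(0.1, 1.0), **common))
    print("mostly high severity:", mean_rul(50, switch_rates=(1.0, 0.1), **common))
```

With a constant drift the mean first-passage time of Brownian motion to a level is the level divided by the drift, which gives a quick sanity check on the simulator before severity switching is turned on.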
SafeDiver: Cooperative AUV-USV Assisted Diver Communication via Multi-agent Reinforcement Learning Approach
As underwater human activities are increasing, the demand for underwater communication service presents a significant challenge. Existing underwater diver communication methods face hurdles due to inherent disadvantages and complex underwater environments. To address this issue, we propose a scheme that utilizes maritime unmanned systems to assist divers with reliable and high-speed communication. Multiple AUVs are equipped with optical and acoustic multimodal communication devices as relay nodes, providing adaptive communication services based on changes in the diver's activity area. By using a multi-agent reinforcement learning (MARL) approach to control the cooperative movement of AUVs, high-speed and reliable data transmission between divers can be achieved. At the same time, utilizing the advantages of on-demand deployment and wide coverage of unmanned surface vehicles (USVs) as surface relay nodes to coordinate and forward information from AUVs, and controlling AUVs to adaptively select relay USV nodes for data transmission, high-quality communication between divers and surface platform can be achieved. Through simulation verification, the proposed scheme can effectively achieve reliable and high-speed communication for divers.
comment: Withdrawn to reorganize and extend the current findings in a future version
LiDAR, GNSS and IMU Sensor Alignment through Dynamic Time Warping to Construct 3D City Maps
LiDAR-based 3D mapping suffers from cumulative drift causing global misalignment, particularly in GNSS-constrained environments. To address this, we propose a unified framework that fuses LiDAR, GNSS, and IMU data for high-resolution city-scale mapping. The method performs velocity-based temporal alignment using Dynamic Time Warping and refines GNSS and IMU signals via extended Kalman filtering. Local maps are built using Normal Distributions Transform-based registration and pose graph optimization with loop closure detection, while global consistency is enforced using GNSS-constrained anchors followed by fine registration of overlapping segments. We also introduce a large-scale multimodal dataset captured in Perth, Western Australia to facilitate future research in this direction. Our dataset comprises 144,000 frames acquired with a 128-channel Ouster LiDAR, synchronized RTK-GNSS trajectories, and MEMS-IMU measurements across 21 urban loops. To assess geometric consistency, we evaluated our method using alignment metrics based on road centerlines and intersections to capture both global and local accuracy. The proposed framework reduces the average global alignment error from 3.32m to 1.24m, achieving a 61.4% improvement, and significantly decreases the intersection centroid offset from 13.22m to 2.01m, corresponding to an 84.8% enhancement. The constructed high-fidelity map is publicly available through https://ieee-dataport.org/documents/perth-cbd-high-resolution-lidar-map-gnss-and-imu-calibration and its visualization can be viewed at https://www.youtube.com/watch?v=-ZUgs1KyMks. This dataset and method together establish a new benchmark for evaluating 3D city mapping in GNSS-constrained environments.
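For readers unfamiliar with Dynamic Time Warping, the sketch below shows the textbook dynamic-programming formulation applied to two 1-D velocity profiles, such as speeds derived from two sensors whose clocks are offset. It is the classic algorithm rather than the authors' pipeline; the absolute-difference cost, the toy sequences, and the function name `dtw_align` are assumptions for illustration.

```python
def dtw_align(a, b):
    """Return (total_cost, path) aligning sequences a and b with classic DTW.

    path is a list of index pairs (i, j) meaning a[i] is matched with b[j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])           # local distance between samples
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b holds
                                 cost[i][j - 1],      # b advances, a holds
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path

if __name__ == "__main__":
    speed_a = [0, 1, 2, 3, 2, 1, 0]          # hypothetical velocity profile
    speed_b = [0, 0, 1, 2, 3, 2, 1, 0]       # same profile, delayed by one sample
    total, path = dtw_align(speed_a, speed_b)
    print(total, path)
```

The recovered path gives a per-sample correspondence between the two streams, from which a time offset can be estimated. The quadratic cost in sequence length is the main practical limitation of this plain version.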
S$^2$-Diffusion: Generalizing from Instance-level to Category-level Skills in Robot Manipulation
Recent advances in skill learning have propelled robot manipulation to new heights by enabling it to learn complex manipulation tasks from a practical number of demonstrations. However, these skills are often limited to the particular action, object, and environment instances that are shown in the training data, and have trouble transferring to other instances of the same category. In this work we present an open-vocabulary Spatial-Semantic Diffusion policy (S$^2$-Diffusion) which enables generalization from instance-level training data to the category level, making skills transferable between instances of the same category. We show that functional aspects of skills can be captured via a promptable semantic module combined with a spatial representation. We further propose leveraging depth estimation networks to allow the use of only a single RGB camera. Our approach is evaluated and compared on a diverse set of robot manipulation tasks, both in simulation and in the real world. Our results show that S$^2$-Diffusion is invariant to changes in category-irrelevant factors and achieves satisfactory performance on other instances within the same category, even if it was not trained on that specific instance. Project website: https://s2-diffusion.github.io.
IRIS: An Immersive Robot Interaction System
This paper introduces IRIS, an Immersive Robot Interaction System leveraging Extended Reality (XR). Existing XR-based systems enable efficient data collection but are often challenging to reproduce and reuse due to their specificity to particular robots, objects, simulators, and environments. IRIS addresses these issues by supporting immersive interaction and data collection across diverse simulators and real-world scenarios. It visualizes arbitrary rigid and deformable objects, robots from simulation, and integrates real-time sensor-generated point clouds for real-world applications. Additionally, IRIS enhances collaborative capabilities by enabling multiple users to simultaneously interact within the same virtual scene. Extensive experiments demonstrate that IRIS offers efficient and intuitive data collection in both simulated and real-world settings.
Leveraging Analytic Gradients in Provably Safe Reinforcement Learning
The deployment of autonomous robots in safety-critical applications requires safety guarantees. Provably safe reinforcement learning is an active field of research that aims to provide such guarantees using safeguards. These safeguards should be integrated during training to reduce the sim-to-real gap. While there are several approaches for safeguarding sampling-based reinforcement learning, analytic gradient-based reinforcement learning often achieves superior performance from fewer environment interactions. However, there is no safeguarding approach for this learning paradigm yet. Our work addresses this gap by developing the first effective safeguard for analytic gradient-based reinforcement learning. We analyse existing, differentiable safeguards, adapt them through modified mappings and gradient formulations, and integrate them into a state-of-the-art learning algorithm and a differentiable simulation. Using numerical experiments on three control tasks, we evaluate how different safeguards affect learning. The results demonstrate safeguarded training without compromising performance. Additional visuals are provided at https://timwalter.github.io/safe-agb-rl.github.io.
comment: 21 pages, 10 figures
Leveraging Sidewalk Robots for Walkability-Related Analyses
Walkability is a key component of sustainable urban development, but collecting detailed data on sidewalks (or pedestrian infrastructure) remains challenging due to the high costs and limited scalability of traditional methods. Sidewalk delivery robots, increasingly deployed in urban environments, offer a promising solution to these limitations. This paper explores how these robots can serve as mobile data collection platforms, capturing sidewalk-level features related to walkability in a scalable, automated, and real-time manner. A sensor-equipped robot was deployed on a sidewalk network at KTH in Stockholm, completing 101 trips covering 900 segment records. From the collected data, different typologies of features are derived, including robot trip characteristics (e.g., speed, duration), sidewalk conditions (e.g., width, surface unevenness), and sidewalk utilization (e.g., pedestrian density). Their walkability-related implications were investigated with a series of analyses. The results demonstrate that pedestrian movement patterns are strongly influenced by sidewalk characteristics, with higher density, reduced width, and surface irregularity associated with slower and more variable trajectories. Notably, robot speed closely mirrors pedestrian behavior, highlighting its potential as a proxy for assessing pedestrian dynamics. The proposed framework enables continuous monitoring of sidewalk conditions and pedestrian behavior, contributing to the development of more walkable, inclusive, and responsive urban environments.
Towards Robust Zero-Shot Reinforcement Learning
The recent development of zero-shot reinforcement learning (RL) has opened a new avenue for learning pre-trained generalist policies that can adapt to arbitrary new tasks in a zero-shot manner. While the popular Forward-Backward representations (FB) and related methods have shown promise in zero-shot RL, we empirically found that their modeling lacks expressivity and that extrapolation errors caused by out-of-distribution (OOD) actions during offline learning sometimes lead to biased representations, ultimately resulting in suboptimal performance. To address these issues, we propose Behavior-REgularizEd Zero-shot RL with Expressivity enhancement (BREEZE), an upgraded FB-based framework that simultaneously enhances learning stability, policy extraction capability, and representation learning quality. BREEZE introduces behavioral regularization in zero-shot RL policy learning, transforming policy optimization into a stable in-sample learning paradigm. Additionally, BREEZE extracts the policy using a task-conditioned diffusion model, enabling the generation of high-quality and multimodal action distributions in zero-shot RL settings. Moreover, BREEZE employs expressive attention-based architectures for representation modeling to capture the complex relationships between environmental dynamics. Extensive experiments on ExORL and D4RL Kitchen demonstrate that BREEZE achieves the best or near-the-best performance while exhibiting superior robustness compared to prior offline zero-shot RL methods. The official implementation is available at: https://github.com/Whiterrrrr/BREEZE.
comment: NeurIPS 2025, 29 pages, 19 figures
CE-Nav: Flow-Guided Reinforcement Refinement for Cross-Embodiment Local Navigation
Generalizing local navigation policies across diverse robot morphologies is a critical challenge. Progress is often hindered by the need for costly and embodiment-specific data, the tight coupling of planning and control, and the "disastrous averaging" problem where deterministic models fail to capture multi-modal decisions (e.g., turning left or right). We introduce CE-Nav, a novel two-stage (IL-then-RL) framework that systematically decouples universal geometric reasoning from embodiment-specific dynamic adaptation. First, we train an embodiment-agnostic General Expert offline using imitation learning. This expert, a conditional normalizing flow model named VelFlow, learns the full distribution of kinematically-sound actions from a large-scale dataset generated by a classical planner, completely avoiding real robot data and resolving the multi-modality issue. Second, for a new robot, we freeze the expert and use it as a guiding prior to train a lightweight, Dynamics-Aware Refiner via online reinforcement learning. This refiner rapidly learns to compensate for the target robot's specific dynamics and controller imperfections with minimal environmental interaction. Extensive experiments on quadrupeds, bipeds, and quadrotors show that CE-Nav achieves state-of-the-art performance while drastically reducing adaptation cost. Successful real-world deployments further validate our approach as an efficient and scalable solution for building generalizable navigation systems. Code is available at https://github.com/amap-cvlab/CE-Nav.
comment: Project Page: https://ce-nav.github.io/. Code is available at https://github.com/amap-cvlab/CE-Nav
DexCanvas: Bridging Human Demonstrations and Robot Learning for Dexterous Manipulation
We present DexCanvas, a large-scale hybrid real-synthetic human manipulation dataset containing 7,000 hours of dexterous hand-object interactions seeded from 70 hours of real human demonstrations, organized across 21 fundamental manipulation types based on the Cutkosky taxonomy. Each entry combines synchronized multi-view RGB-D, high-precision mocap with MANO hand parameters, and per-frame contact points with physically consistent force profiles. Our real-to-sim pipeline uses reinforcement learning to train policies that control an actuated MANO hand in physics simulation, reproducing human demonstrations while discovering the underlying contact forces that generate the observed object motion. DexCanvas is the first manipulation dataset to combine large-scale real demonstrations, systematic skill coverage based on established taxonomies, and physics-validated contact annotations. The dataset can facilitate research in robotic manipulation learning, contact-rich control, and skill transfer across different hand morphologies.
Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback
Automatically synthesizing dense rewards from natural language descriptions is a promising paradigm in reinforcement learning (RL), with applications to sparse reward problems, open-ended exploration, and hierarchical skill design. Recent works have made promising steps by exploiting the prior knowledge of large language models (LLMs). However, these approaches suffer from important limitations: they are either not scalable to problems requiring billions of environment samples, due to requiring LLM annotations for each observation, or they require a diverse offline dataset, which may not exist or be impossible to collect. In this work, we address these limitations through a combination of algorithmic and systems-level contributions. We propose ONI, a distributed architecture that simultaneously learns an RL policy and an intrinsic reward function using LLM feedback. Our approach annotates the agent's collected experience via an asynchronous LLM server, which is then distilled into an intrinsic reward model. We explore a range of algorithmic choices for reward modeling with varying complexity, including hashing, classification, and ranking models. Our approach achieves state-of-the-art performance across a range of challenging tasks from the NetHack Learning Environment, while removing the need for large offline datasets required by prior work. We make our code available at https://github.com/facebookresearch/oni.
comment: RLC 2025
Knot So Simple: A Minimalistic Environment for Spatial Reasoning
We propose KnotGym, an interactive environment for complex, spatial reasoning and manipulation. KnotGym includes goal-oriented rope manipulation tasks with varying levels of complexity, all requiring acting from pure image observations. Tasks are defined along a clear and quantifiable axis of complexity based on the number of knot crossings, creating a natural generalization test. KnotGym has a simple observation space, allowing for scalable development, yet it highlights core challenges in integrating acute perception, spatial reasoning, and grounded manipulation. We evaluate methods of different classes, including model-based RL, model-predictive control, and chain-of-thought reasoning, and illustrate the challenges KnotGym presents. KnotGym is available at https://github.com/lil-lab/knotgym.
HYPE: Hybrid Planning with Ego Proposal-Conditioned Predictions SC 2025
Safe and interpretable motion planning in complex urban environments needs to reason about bidirectional multi-agent interactions. This reasoning requires estimating the costs of potential ego driving maneuvers. Many existing planners generate initial trajectories with sampling-based methods and refine them by optimizing on learned predictions of future environment states, which requires a cost function that encodes the desired vehicle behavior. Designing such a cost function can be very challenging, especially if a wide range of complex urban scenarios has to be considered. We propose HYPE: HYbrid Planning with Ego proposal-conditioned predictions, a planner that integrates multimodal trajectory proposals from a learned proposal model as heuristic priors into a Monte Carlo Tree Search (MCTS) refinement. To model bidirectional interactions, we introduce an ego-conditioned occupancy prediction model, enabling consistent, scene-aware reasoning. Our design significantly simplifies cost function design in refinement by considering proposal-driven guidance, requiring only minimalistic grid-based cost terms. Evaluations on the large-scale real-world benchmarks nuPlan and DeepUrban show that HYPE effectively achieves state-of-the-art performance, especially in safety and adaptability.
comment: Accepted to IEEE ITSC 2025
HAVT-IVD: Heterogeneity-Aware Cross-Modal Network for Audio-Visual Surveillance: Idling Vehicles Detection With Multichannel Audio and Multiscale Visual Cues
Idling vehicle detection (IVD) uses surveillance video and multichannel audio to localize and classify vehicles in the last frame as moving, idling, or engine-off in pick-up zones. IVD faces three challenges: (i) modality heterogeneity between visual cues and audio patterns; (ii) large box scale variation requiring multi-resolution detection; and (iii) training instability due to coupled detection heads. The previous end-to-end (E2E) model with simple CBAM-based bi-modal attention fails to handle these issues and often misses vehicles. We propose HAVT-IVD, a heterogeneity-aware network with a visual feature pyramid and decoupled heads. Experiments show HAVT-IVD improves mAP by 7.66 over the disjoint baseline and 9.42 over the E2E baseline.
Multiagent Systems
Thought Communication in Multiagent Collaboration NeurIPS 2025
Natural language has long enabled human cooperation, but its lossy, ambiguous, and indirect nature limits the potential of collective intelligence. While machines are not subject to these constraints, most LLM-based multi-agent systems still rely solely on natural language, exchanging tokens or their embeddings. To go beyond language, we introduce a new paradigm, thought communication, which enables agents to interact directly mind-to-mind, akin to telepathy. To uncover these latent thoughts in a principled way, we formalize the process as a general latent variable model, where agent states are generated by an unknown function of underlying thoughts. We prove that, in a nonparametric setting without auxiliary information, both shared and private latent thoughts between any pair of agents can be identified. Moreover, the global structure of thought sharing, including which agents share which thoughts and how these relationships are structured, can also be recovered with theoretical guarantees. Guided by the established theory, we develop a framework that extracts latent thoughts from all agents prior to communication and assigns each agent the relevant thoughts, along with their sharing patterns. This paradigm naturally extends beyond LLMs to all modalities, as most observational data arise from hidden generative processes. Experiments on both synthetic and real-world benchmarks validate the theory and demonstrate the collaborative advantages of thought communication. We hope this work illuminates the potential of leveraging the hidden world, as many challenges remain unsolvable through surface-level observation alone, regardless of compute or data scale.
comment: NeurIPS 2025 Spotlight
Structures generated in a multiagent system performing information fusion in peer-to-peer resource-constrained networks
There has recently been a major advance in how information fusion is performed. Information fusion has gone from being conceived as a purely hierarchical procedure, as is the case in traditional military applications, to being regarded as a collaborative one, holonic fusion, which is better suited to civil applications and edge organizations. This paradigm shift is being boosted as information fusion gains ground in different non-military areas and in human-computer and machine-machine communications, where holarchies, which are more flexible structures than ordinary, static hierarchies, are becoming more widespread. This paper focuses on showing how holonic structures tend to be generated when there are constraints on resources (energy, available messages, time, etc.) for interactions based on a set of fully intercommunicating elements (peers) whose components fuse information as a means of optimizing the impact of vagueness and uncertainty present in message exchanges. Holon formation is studied generically based on a multiagent system model, and an example of its possible operation is shown. Holonic structures have a series of advantages: they adapt to sudden changes in the environment or its composition, are somewhat autonomous, and are capable of cooperating in order to achieve a common goal. This can be useful when a shortage of resources prevents communications or when the system components start to fail.
Balancing Specialization and Centralization: A Multi-Agent Reinforcement Learning Benchmark for Sequential Industrial Control
Autonomous control of multi-stage industrial processes requires both local specialization and global coordination. Reinforcement learning (RL) offers a promising approach, but its industrial adoption remains limited due to challenges such as reward design, modularity, and action space management. Many academic benchmarks differ markedly from industrial control problems, limiting their transferability to real-world applications. This study introduces an enhanced industry-inspired benchmark environment that combines tasks from two existing benchmarks, SortingEnv and ContainerGym, into a sequential recycling scenario with sorting and pressing operations. We evaluate two control strategies: a modular architecture with specialized agents and a monolithic agent governing the full system, while also analyzing the impact of action masking. Our experiments show that without action masking, agents struggle to learn effective policies, with the modular architecture performing better. When action masking is applied, both architectures improve substantially, and the performance gap narrows considerably. These results highlight the decisive role of action space constraints and suggest that the advantages of specialization diminish as action complexity is reduced. The proposed benchmark thus provides a valuable testbed for exploring practical and robust multi-agent RL solutions in industrial automation, while contributing to the ongoing debate on centralization versus specialization.
comment: Preprint (submitted version) to be presented at the 13th International Conference on Industrial Engineering and Applications (ICIEA-EU), Milan, 2026. The final Version of Record will appear in the official conference proceedings
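Action masking, whose impact the abstract above analyzes, is commonly implemented by excluding invalid actions before the policy's softmax. The sketch below shows that common pattern only; the abstract does not say how the benchmark implements masking, and the function name `masked_softmax` and the example logits are assumptions.

```python
import math


def masked_softmax(logits, valid):
    """Softmax restricted to valid actions; invalid actions get probability 0.

    logits: list of floats, one per action.
    valid:  list of bools of the same length (True = action allowed).
    Hypothetical helper illustrating logit masking in general.
    """
    if len(logits) != len(valid):
        raise ValueError("logits and valid must have the same length")
    if not any(valid):
        raise ValueError("at least one action must be valid")

    # Invalid actions are pushed to -inf so that exp() maps them to exactly 0.
    masked = [l if ok else -math.inf for l, ok in zip(logits, valid)]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(l - peak) for l in masked]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Three actions; the second is currently not allowed.
    print(masked_softmax([1.0, 2.0, 3.0], [True, False, True]))
```

Because masked actions can never be sampled, the agent only explores the feasible part of the action space, which is one plausible reading of why both architectures improve once masking is applied.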
Multi-Modal Decentralized Reinforcement Learning for Modular Reconfigurable Lunar Robots
Modular reconfigurable robots suit task-specific space operations, but the combinatorial growth of morphologies hinders unified control. We propose a decentralized reinforcement learning (Dec-RL) scheme where each module learns its own policy: wheel modules use Soft Actor-Critic (SAC) for locomotion and 7-DoF limbs use Proximal Policy Optimization (PPO) for steering and manipulation, enabling zero-shot generalization to unseen configurations. In simulation, the steering policy achieved a mean absolute error of 3.63° between desired and induced angles; the manipulation policy plateaued at 84.6% success on a target-offset criterion; and the wheel policy cut average motor torque by 95.4% relative to baseline while maintaining 99.6% success. Lunar-analogue field tests validated zero-shot integration for autonomous locomotion, steering, and preliminary alignment for reconfiguration. The system transitioned smoothly among synchronous, parallel, and sequential modes for policy execution, without idle states or control conflicts, indicating a scalable, reusable, and robust approach for modular lunar robots.
comment: Accepted in IEEE iSpaRo 2025. Awaiting Publication
From Generation to Attribution: Music AI Agent Architectures for the Post-Streaming Era NeurIPS 2025
Generative AI is reshaping music creation, but its rapid growth exposes structural gaps in attribution, rights management, and economic models. Unlike past media shifts, from live performance to recordings, downloads, and streaming, AI transforms the entire lifecycle of music, collapsing boundaries between creation, distribution, and monetization. However, existing streaming systems, with opaque and concentrated royalty flows, are ill-equipped to handle the scale and complexity of AI-driven production. We propose a content-based Music AI Agent architecture that embeds attribution directly into the creative workflow through block-level retrieval and agentic orchestration. Designed for iterative, session-based interaction, the system organizes music into granular components (Blocks) stored in BlockDB; each use triggers an Attribution Layer event for transparent provenance and real-time settlement. This framework reframes AI from a generative tool into infrastructure for a Fair AI Media Platform. By enabling fine-grained attribution, equitable compensation, and participatory engagement, it points toward a post-streaming paradigm where music functions not as a static catalog but as a collaborative and adaptive ecosystem.
comment: Accepted to the NeurIPS 2025 AI4Music Workshop
High-order Interactions Modeling for Interpretable Multi-Agent Q-Learning
The ability to model interactions among agents is crucial for effective coordination and understanding their cooperation mechanisms in multi-agent reinforcement learning (MARL). However, previous efforts to model high-order interactions have been primarily hindered by the combinatorial explosion or the opaque nature of their black-box network structures. In this paper, we propose a novel value decomposition framework, called Continued Fraction Q-Learning (QCoFr), which can flexibly capture arbitrary-order agent interactions with only linear complexity $\mathcal{O}\left({n}\right)$ in the number of agents, thus avoiding the combinatorial explosion when modeling rich cooperation. Furthermore, we introduce the variational information bottleneck to extract latent information for estimating credits. This latent information helps agents filter out noisy interactions, thereby significantly enhancing both cooperation and interpretability. Extensive experiments demonstrate that QCoFr not only consistently achieves better performance but also provides interpretability that aligns with our theoretical analysis.
comment: 39th Conference on Neural Information Processing Systems
RADAR: A Risk-Aware Dynamic Multi-Agent Framework for LLM Safety Evaluation via Role-Specialized Collaboration
Existing safety evaluation methods for large language models (LLMs) suffer from inherent limitations, including evaluator bias and detection failures arising from model homogeneity, which collectively undermine the robustness of risk evaluation processes. This paper seeks to re-examine the risk evaluation paradigm by introducing a theoretical framework that reconstructs the underlying risk concept space. Specifically, we decompose the latent risk concept space into three mutually exclusive subspaces: the explicit risk subspace (encompassing direct violations of safety guidelines), the implicit risk subspace (capturing potential malicious content that requires contextual reasoning for identification), and the non-risk subspace. Furthermore, we propose RADAR, a multi-agent collaborative evaluation framework that leverages multi-round debate mechanisms through four specialized complementary roles and employs dynamic update mechanisms to achieve self-evolution of risk concept distributions. This approach enables comprehensive coverage of both explicit and implicit risks while mitigating evaluator bias. To validate the effectiveness of our framework, we construct an evaluation dataset comprising 800 challenging cases. Extensive experiments on our challenging test set and public benchmarks demonstrate that RADAR significantly outperforms baseline evaluation methods across multiple dimensions, including accuracy, stability, and self-evaluation risk sensitivity. Notably, RADAR achieves a 28.87% improvement in risk identification accuracy compared to the strongest baseline evaluation method.
Local Guidance for Configuration-Based Multi-Agent Pathfinding
Guidance is an emerging concept that improves the empirical performance of real-time, sub-optimal multi-agent pathfinding (MAPF) methods. It offers additional information to MAPF algorithms to mitigate congestion on a global scale by considering the collective behavior of all agents across the entire workspace. This global perspective helps reduce agents' waiting times, thereby improving overall coordination efficiency. In contrast, this study explores an alternative approach: providing local guidance in the vicinity of each agent. While such localized methods involve recomputation as agents move and may appear computationally demanding, we empirically demonstrate that supplying informative spatiotemporal cues to the planner can significantly improve solution quality without exceeding a moderate time budget. When applied to LaCAM, a leading configuration-based solver, this form of guidance establishes a new performance frontier for MAPF.
comment: 10 pages
Lessons Learned: A Multi-Agent Framework for Code LLMs to Learn and Improve NeurIPS 2025
Recent studies show that LLMs possess different skills and specialize in different tasks. In fact, we observe that their performance varies at several levels of granularity. For example, in the code optimization task, code LLMs excel at different optimization categories and no single model dominates the others. This observation prompts the question of how one can leverage multiple LLM agents to solve a coding problem without knowing their complementary strengths a priori. We argue that a team of agents can learn from each other's successes and failures so as to improve their own performance. Thus, a lesson is the knowledge produced by an agent and passed on to other agents in the collective solution process. We propose a lesson-based collaboration framework, design the lesson solicitation--banking--selection mechanism, and demonstrate that a team of small LLMs with lessons learned can outperform a much larger LLM and other multi-LLM collaboration methods.
comment: NeurIPS 2025. Code is available at https://github.com/MITIBM-FastCoder/LessonL
SafeDiver: Cooperative AUV-USV Assisted Diver Communication via Multi-agent Reinforcement Learning Approach
As underwater human activities increase, meeting the demand for underwater communication services presents a significant challenge. Existing underwater diver communication methods face hurdles due to inherent disadvantages and complex underwater environments. To address this issue, we propose a scheme that utilizes maritime unmanned systems to assist divers with reliable and high-speed communication. Multiple AUVs are equipped with optical and acoustic multimodal communication devices as relay nodes, providing adaptive communication services based on changes in the diver's activity area. By using a multi-agent reinforcement learning (MARL) approach to control the cooperative movement of AUVs, high-speed and reliable data transmission between divers can be achieved. At the same time, unmanned surface vehicles (USVs), which offer on-demand deployment and wide coverage, serve as surface relay nodes that coordinate and forward information from the AUVs, and the AUVs are controlled to adaptively select relay USV nodes for data transmission, so that high-quality communication between divers and the surface platform can be achieved. Simulations verify that the proposed scheme can effectively achieve reliable and high-speed communication for divers.
comment: Withdrawn to reorganize and extend the current findings in a future version
Beyond Static Responses: Multi-Agent LLM Systems as a New Paradigm for Social Science Research
As large language models (LLMs) transition from static tools to fully agentic systems, their potential for transforming social science research has become increasingly evident. This paper introduces a structured framework for understanding the diverse applications of LLM-based agents, ranging from simple data processors to complex, multi-agent systems capable of simulating emergent social dynamics. By mapping this developmental continuum across six levels, the paper clarifies the technical and methodological boundaries between different agentic architectures, providing a comprehensive overview of current capabilities and future potential. It highlights how lower-tier systems streamline conventional tasks like text classification and data annotation, while higher-tier systems enable novel forms of inquiry, including the study of group dynamics, norm formation, and large-scale social processes. However, these advancements also introduce significant challenges, including issues of reproducibility, ethical oversight, and the risk of emergent biases. The paper critically examines these concerns, emphasizing the need for robust validation protocols, interdisciplinary collaboration, and standardized evaluation metrics. It argues that while LLM-based agents hold transformative potential for the social sciences, realizing this promise will require careful, context-sensitive deployment and ongoing methodological refinement. The paper concludes with a call for future research that balances technical innovation with ethical responsibility, encouraging the development of agentic systems that not only replicate but also extend the frontiers of social science, offering new insights into the complexities of human behavior.
Empirical Study on Robustness and Resilience in Cooperative Multi-Agent Reinforcement Learning NeurIPS 2025
In cooperative Multi-Agent Reinforcement Learning (MARL), it is a common practice to tune hyperparameters in ideal simulated environments to maximize cooperative performance. However, policies tuned for cooperation often fail to maintain robustness and resilience under real-world uncertainties. Building trustworthy MARL systems requires a deep understanding of robustness, which ensures stability under uncertainties, and resilience, the ability to recover from disruptions--a concept extensively studied in control systems but largely overlooked in MARL. In this paper, we present a large-scale empirical study comprising over 82,620 experiments to evaluate cooperation, robustness, and resilience in MARL across 4 real-world environments, 13 uncertainty types, and 15 hyperparameters. Our key findings are: (1) Under mild uncertainty, optimizing cooperation improves robustness and resilience, but this link weakens as perturbations intensify. Robustness and resilience also vary by algorithm and uncertainty type. (2) Robustness and resilience do not generalize across uncertainty modalities or agent scopes: policies robust to action noise for all agents may fail under observation noise on a single agent. (3) Hyperparameter tuning is critical for trustworthy MARL: surprisingly, standard practices like parameter sharing, GAE, and PopArt can hurt robustness, while early stopping, high critic learning rates, and Leaky ReLU consistently help. By optimizing hyperparameters only, we observe substantial improvement in cooperation, robustness, and resilience across all MARL backbones, with the phenomenon also generalizing to robust MARL methods across these backbones. Code and results are available at https://github.com/BUAA-TrustworthyMARL/adv_marl_benchmark.
comment: 44 pages, 16 figures, NeurIPS 2025
Evolution of Cooperation in LLM-Agent Societies: A Preliminary Study Using Different Punishment Strategies AAMAS 2025
The evolution of cooperation has been extensively studied using abstract mathematical models and simulations. Recent advances in Large Language Models (LLMs) and the rise of LLM agents have demonstrated their ability to perform social reasoning, thus providing an opportunity to test the emergence of norms in more realistic agent-based simulations with human-like reasoning using natural language. In this research, we investigate whether the cooperation dynamics presented in Boyd and Richerson's model persist in a more realistic simulation of the Diner's Dilemma using LLM agents compared to the abstract mathematical nature in the work of Boyd and Richerson. Our findings indicate that agents follow the strategies defined in the Boyd and Richerson model, and explicit punishment mechanisms drive norm emergence, reinforcing cooperative behaviour even when the agent strategy configuration varies. Our results suggest that LLM-based Multi-Agent System simulations, in fact, can replicate the evolution of cooperation predicted by the traditional mathematical models. Moreover, our simulations extend beyond the mathematical models by integrating natural language-driven reasoning and a pairwise imitation method for strategy adoption, making them a more realistic testbed for cooperative behaviour in MASs.
comment: 20 pages, 10 figures, Accepted for presentation as a full paper at the COINE 2025 workshop at AAMAS 2025 (https://coin-workshop.github.io/coine-2025-detroit/accepted_for_presentation.html)
Debate or Vote: Which Yields Better Decisions in Multi-Agent Large Language Models? NeurIPS 2025
Multi-Agent Debate (MAD) has emerged as a promising paradigm for improving the performance of large language models through collaborative reasoning. Despite recent advances, the key factors driving MAD's effectiveness remain unclear. In this work, we disentangle MAD into two key components--Majority Voting and inter-agent Debate--and assess their respective contributions. Through extensive experiments across seven NLP benchmarks, we find that Majority Voting alone accounts for most of the performance gains typically attributed to MAD. To explain this, we propose a theoretical framework that models debate as a stochastic process. We prove that it induces a martingale over agents' belief trajectories, implying that debate alone does not improve expected correctness. Guided by these insights, we demonstrate that targeted interventions, by biasing the belief update toward correction, can meaningfully enhance debate effectiveness. Overall, our findings suggest that while MAD has potential, simple ensembling methods remain strong and more reliable alternatives in many practical settings. Code is released at https://github.com/deeplearning-wisc/debate-or-vote.
comment: NeurIPS 2025 Spotlight
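The Majority Voting component discussed in the abstract above can be illustrated with a minimal sketch. This is not the authors' released code: the function names are hypothetical, and the accuracy formula assumes binary answers, an odd number of agents, and agents that are independently correct with the same probability p, which are simplifying assumptions not stated in the abstract.

```python
from collections import Counter
from math import comb


def majority_vote(answers):
    """Return the most frequent answer in a non-empty list.

    Ties are broken in favor of the answer that appears first.
    """
    counts = Counter(answers)
    best = max(counts.values())
    for answer in answers:
        if counts[answer] == best:
            return answer


def majority_accuracy(n, p):
    """Probability that a strict majority of n agents is correct.

    Assumes (hypothetically) independent agents, each correct with
    probability p, and a binary question with odd n.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))


if __name__ == "__main__":
    print(majority_vote(["A", "B", "A"]))
    print(majority_accuracy(3, 0.6), majority_accuracy(5, 0.6))
```

Under these assumptions, adding voters raises accuracy whenever p exceeds 0.5, which gives some intuition for why plain ensembling can be a strong baseline.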
Online Learning for Dynamic Vickrey-Clarke-Groves Mechanism in Unknown Environments
We consider the problem of online dynamic mechanism design for sequential auctions in unknown environments, where the underlying market and, thus, the bidders' values vary over time as interactions between the seller and the bidders progress. We model the sequential auctions as an infinite-horizon average-reward Markov decision process (MDP). In each round, the seller determines an allocation and sets a payment for each bidder, while each bidder receives a private reward and submits a sealed bid to the seller. The state, which represents the underlying market, evolves according to an unknown transition kernel and the seller's allocation policy without episodic resets. We first extend the Vickrey-Clarke-Groves (VCG) mechanism to sequential auctions, thereby obtaining a dynamic counterpart that preserves the desired properties: efficiency, truthfulness, and individual rationality. We then focus on the online setting and develop a reinforcement learning algorithm for the seller to learn the underlying MDP and implement a mechanism that closely resembles the dynamic VCG mechanism. We show that the learned mechanism approximately satisfies efficiency, truthfulness, and individual rationality and achieves guaranteed performance in terms of various notions of regret.
comment: 20 pages
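For readers unfamiliar with the mechanism being extended, the following is a minimal sketch of the classical one-shot (static) VCG rule over a finite set of allocations. It does not implement the dynamic or learned mechanism of the paper; the function name `vcg` and the data layout are assumptions made for illustration.

```python
def vcg(valuations, allocations):
    """One-shot VCG over a finite set of allocations.

    valuations[i][a] is the value bidder i reports for allocation a.
    Returns the welfare-maximizing allocation and each bidder's payment,
    i.e. the externality that bidder imposes on the others.
    """
    def welfare(a, skip=None):
        # Total reported value of allocation a, optionally ignoring one bidder.
        return sum(v[a] for i, v in enumerate(valuations) if i != skip)

    best = max(allocations, key=welfare)
    payments = []
    for i in range(len(valuations)):
        best_without_i = max(welfare(a, skip=i) for a in allocations)
        others_at_best = welfare(best, skip=i)
        payments.append(best_without_i - others_at_best)
    return best, payments


if __name__ == "__main__":
    # Single-item auction: allocation a means "bidder a wins the item".
    bids = [[10, 0, 0], [0, 7, 0], [0, 0, 3]]
    print(vcg(bids, range(3)))
```

In the single-item case this reduces to a second-price auction: the highest bidder wins and pays the second-highest bid.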
Systems and Control (CS)
Bilevel Analysis of Cost and Emissions Externalities from Data Center Load Shifting
Data centers are emerging as large, flexible electricity consumers capable of shifting computational workloads across locations in response to economic and environmental signals. While this flexibility has potential for emissions reduction, its impact on power system operations depends critically on how such behavior interacts with network constraints and market signals. We develop a bilevel optimization framework in which a data center minimizes a weighted combination of electricity cost and marginal emissions intensity (LME), while the system operator clears economic dispatch under transmission and generation constraints. Focusing on a stylized three-bus power system, we derive closed-form, piecewise-linear expressions for both the data center and system-wide objectives as functions of the data centers' load shift. These expressions capture threshold-driven regime changes due to congestion and renewable saturation. We identify sufficient conditions under which the data center's decentralized decisions align with or diverge from socially optimal behavior and characterize the resulting externalities. Our results reveal how system topology and generator asymmetry affect incentive alignment and provide insight into when marginal price or emissions signals may fail to guide flexible loads toward socially beneficial outcomes. Our results offer a tractable starting point for analyzing decentralized flexibility under carbon-aware incentives and suggest directions for improving coordination between flexible loads and system operations.
Learning Optimal Power Flow with Pointwise Constraints
Training learning parameterizations to solve optimal power flow (OPF) with pointwise constraints is proposed. In this novel training approach, a learning parameterization is substituted directly into an OPF problem with constraints required to hold over all problem instances. This is different from existing supervised learning methods in which constraints are required to hold across the average of problem instances. Training with pointwise constraints is undertaken in the dual domain with the use of an augmented Lagrangian and a dual gradient ascent algorithm. Numerical experiments demonstrate that training with pointwise constraints produces solutions with smaller constraint violations. Experiments further demonstrate that pointwise constraints are most effective at reducing constraint violations in corner cases, defined as those realizations in which constraints are most difficult to satisfy. Gains are most pronounced in power systems with large numbers of buses.
comment: 19 pages, 13 figures
Sugar Shack 4.0: Implementation of a Cyber-Physical System for Logistic and Sanitary Automation in a Maple Syrup Boiling Center
This paper presents the design and deployment of a process-aware cyber-physical system that automates plant-level logistics, traceability, and sanitation in a centralized maple-syrup boiling center. The system replaces ad-hoc, manual operations with event-driven orchestration on a local server, employing reusable device abstractions and a centralized interlock with priority-based arbitration for shared piping. It implements deterministic routines for delivery, reverse osmosis integration, evaporator feed, and permeate management. The system is sensor rich: inline measurements of flow, temperature, and sugar concentration (degrees Brix) drive routing decisions and trigger systematic post-transfer rinses (cleaning-in-place), ensuring consistent hygiene and complete, immediate traceability up to the evaporator inlet. During the 2025 production season, the system queued 431 operations without incident; executed 908 "topstock" and "downstock" balancing cycles; increased usable permeate reserves from 22,712 to approximately 41,640 L through dynamic storage assignment; eliminated mid-season contamination incidents previously observed under manual practice; and reduced administrative effort for billing and reporting from more than 30 hours to roughly 1 hour through automatic documentation. These results demonstrate a practical path to modular, plant-scale automation beyond traditional architectures, and lay the groundwork for packaging reusable elements for similar plants or adjacent industries. This work is part of a larger project involving the first scientifically-documented integration of Industry 4.0 technologies in a maple syrup boiling center.
comment: 8 pages, 6 figures
Safe Decentralized Density Control of Multi-Robot Systems using PDE-Constrained Optimization with State Constraints
In this paper, we introduce a decentralized optimization-based density controller designed to enforce set invariance constraints in multi-robot systems. By designing a decentralized control barrier function, we derived sufficient conditions under which local safety constraints guarantee global safety. We account for localization and motion noise explicitly by modeling robots as spatial probability density functions governed by the Fokker-Planck equation. Compared to traditional centralized approaches, our controller requires less computational and communication power, making it more suitable for deployment in situations where perfect communication and localization are impractical. The controller is validated through simulations and experiments with four quadcopters.
comment: Accepted to MRS 2025
Decentralized Small Gain and Phase Stability Conditions for Grid-Forming Converters: Limitations and Extensions
The increasing share of converter based resources in power systems calls for scalable methods to analyse stability without relying on exhaustive system wide simulations. Decentralized small gain and small-phase criteria have recently been proposed for this purpose, but their applicability to grid forming converters is severely limited by the sectoriality assumption, which is not typically satisfied at low frequencies. This work revisits and extends mixed gain phase conditions by introducing loop shaping transformations that reformulate converter and network models in alternative coordinate frames. The proposed approach resolves intrinsic non sectoriality at low frequencies and reduces conservativeness, thereby improving the applicability of decentralized stability certification. Analytical results are illustrated using an infinite bus system first and then extended to the IEEE 14 bus network, demonstrating the practicality and scalability of the method. These findings provide a pathway toward less conservative and more widely applicable decentralized stability certificates in power grids.
Path-Based Conditions for the Identifiability of Non-additive Nonlinear Networks with Full Measurements
We analyze the identifiability of nonlinear networks with node dynamics characterized by functions that are non-additive. We consider the full measurement case (all the nodes are measured) in the path-independent delay scenario where all the excitation signals of a specific node have the same delay in the output of a measured node. Based on the notion of a generic nonlinear matrix associated with the network, we introduce the concept of generic identifiability and characterize the space of functions that satisfies this property. For directed acyclic graphs (DAGs) characterized by analytic functions, we derive a sufficient condition for identifiability based on vertex-disjoint paths from excited nodes to the in-neighbors of each node in the network. Furthermore, when we consider the class of polynomial functions, by using well-known results on algebraic varieties, we prove that the vertex-disjoint path condition is also necessary. Finally, we show that this identifiability condition is not necessary for the additive nonlinear model. Some examples are added to illustrate the results.
comment: 11 pages, 7 figures, submitted to IEEE Transactions on Automatic Control
Joint Computation Offloading and Resource Management for Cooperative Satellite-Aerial-Marine Internet of Things Networks
Devices within the marine Internet of Things (MIoT) can connect to low Earth orbit (LEO) satellites and unmanned aerial vehicles (UAVs) to facilitate low-latency data transmission and execution, as well as enhanced-capacity data storage. However, without a proper traffic-handling strategy, it is still difficult to effectively meet the low-latency requirements. In this paper, we consider a cooperative satellite-aerial-MIoT network (CSAMN) for maritime edge computing and maritime data storage to prioritize delay-sensitive (DS) tasks by employing mobile edge computing, while handling delay-tolerant (DT) tasks via the store-carry-forward method. Considering the delay constraints of DS tasks, we formulate a constrained joint optimization problem of maximizing satellite-collected data volume while minimizing system energy consumption by controlling four interdependent variables: the transmit power of UAVs for DS tasks, the start time of DT tasks, computing resource allocation, and offloading ratio. To solve this non-convex and non-linear problem, we propose a joint computation offloading and resource management (JCORM) algorithm using the Dinkelbach method and linear programming. Our results show that the volume of data collected by the proposed JCORM algorithm can be increased by up to 41.5% compared to baselines. Moreover, the JCORM algorithm achieves a dramatic reduction in computational time, from a maximum of 318.21 seconds down to just 0.16 seconds per experiment, making it highly suitable for real-time maritime applications.
Behavior-Aware Online Prediction of Obstacle Occupancy using Zonotopes
Predicting the motion of surrounding vehicles is key to safe autonomous driving, especially in unstructured environments without prior information. This paper proposes a novel online method to accurately predict the occupancy sets of surrounding vehicles based solely on motion observations. The approach is divided into two stages: first, an Extended Kalman Filter and a Linear Programming (LP) problem are used to estimate a compact zonotopic set of control actions; then, a reachability analysis propagates this set to predict future occupancy. The effectiveness of the method has been validated through simulations in an urban environment, showing accurate and compact predictions without relying on prior assumptions or prior training data.
comment: 64th IEEE Conference on Decision and Control
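The reachability stage described in the abstract above relies on standard zonotope operations, which can be sketched as follows. This is only an illustration for a linear model x+ = A x + B u with hypothetical numbers; it does not reproduce the paper's Extended Kalman Filter or LP stage, and all function names are assumptions.

```python
def linear_map(M, center, generators):
    """Image of the zonotope {c + sum_i xi_i * g_i, |xi_i| <= 1} under x -> M x."""
    def matvec(v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]
    return matvec(center), [matvec(g) for g in generators]


def minkowski_sum(z1, z2):
    """Minkowski sum of two zonotopes: add centers, concatenate generators."""
    (c1, g1), (c2, g2) = z1, z2
    return [a + b for a, b in zip(c1, c2)], g1 + g2


def step(A, B, state_set, input_set):
    """One-step reachable set of x+ = A x + B u for x in state_set, u in input_set."""
    return minkowski_sum(linear_map(A, *state_set), linear_map(B, *input_set))


def interval_hull(center, generators):
    """Tightest axis-aligned box enclosing the zonotope."""
    radius = [sum(abs(g[k]) for g in generators) for k in range(len(center))]
    return [(c - r, c + r) for c, r in zip(center, radius)]


if __name__ == "__main__":
    # Hypothetical 1-D vehicle: state = (position, velocity), input = acceleration.
    A = [[1.0, 0.1], [0.0, 1.0]]
    B = [[0.0], [0.1]]
    state = ([0.0, 1.0], [[0.5, 0.0], [0.0, 0.2]])
    controls = ([0.0], [[1.0]])  # acceleration in [-1, 1]
    print(interval_hull(*step(A, B, state, controls)))
```

Because linear maps and Minkowski sums are exact and cheap on the center-generator representation, repeated calls to `step` propagate the estimated control set over a prediction horizon without enumerating vertices.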
A Multifunctional Capacitive Sensing Platform for Wireless Vascular and Heart Monitoring
We present a multifunctional, antenna-integrated capacitive sensing (MAiCaS) platform for passive, wireless, and real-time cardiovascular monitoring. Unlike conventional systems that require separate sensors and wireless modules, our device unifies sensing, telemetry, and mechanical functionality into a compact and scalable design by exploiting the parasitic capacitance of an inductive antenna as a strain-sensitive element. The sensor is fabricated using a cleanroom-free, single-step UV laser patterning process on a flexible PDMS substrate, reducing manufacturing complexity and enabling high reproducibility. The MAiCaS is suitable for three different applications: as a sensor for epicardial strain measurement, a stent as a sensor, and a vascular graft sensor. We demonstrate MAiCaS's versatility by validating its wireless resonance-based response to strain, pressure, and deformation across unrolled and rolled forms. In vitro experiments demonstrated consistent resonance frequency shifts under physiological conditions, with stable performance on skin, in PBS, human serum, and simulated vascular environments. Repeatability and aging tests confirmed its long-term reliability and elasticity under cyclic loading. Calibration curves revealed high sensitivity across all configurations, with wireless interrogation achieved through S11 parameter measurements and resonance frequency shift as the output metric. The sensitivity of the device was measured to be 2.9 MHz per 1% strain, 0.43 MHz/mmHg, and 309.6 kHz/µm for epicardial patch, graft, and stent integrated sensor, respectively. The operation of MAiCaS was evaluated in a human experiment. This monolithic sensor architecture provides a scalable and cost-effective solution for battery-free monitoring of vascular dynamics, with potential for remote diagnostics, post-surgical follow-up, and continuous cardiovascular health management.
comment: 28 pages, 7 figures
Balancing Specialization and Centralization: A Multi-Agent Reinforcement Learning Benchmark for Sequential Industrial Control
Autonomous control of multi-stage industrial processes requires both local specialization and global coordination. Reinforcement learning (RL) offers a promising approach, but its industrial adoption remains limited due to challenges such as reward design, modularity, and action space management. Many academic benchmarks differ markedly from industrial control problems, limiting their transferability to real-world applications. This study introduces an enhanced industry-inspired benchmark environment that combines tasks from two existing benchmarks, SortingEnv and ContainerGym, into a sequential recycling scenario with sorting and pressing operations. We evaluate two control strategies: a modular architecture with specialized agents and a monolithic agent governing the full system, while also analyzing the impact of action masking. Our experiments show that without action masking, agents struggle to learn effective policies, with the modular architecture performing better. When action masking is applied, both architectures improve substantially, and the performance gap narrows considerably. These results highlight the decisive role of action space constraints and suggest that the advantages of specialization diminish as action complexity is reduced. The proposed benchmark thus provides a valuable testbed for exploring practical and robust multi-agent RL solutions in industrial automation, while contributing to the ongoing debate on centralization versus specialization.
comment: Preprint (submitted version) to be presented at the 13th International Conference on Industrial Engineering and Applications (ICIEA-EU), Milan, 2026. The final Version of Record will appear in the official conference proceedings
Interlacing in Controllers Implementation: Frequency Analysis
The main goal of this contribution is to explain how to use interlacing techniques for LTI controller implementation and to analyze different structures in this environment. These considerations lead to important computation savings in resource-constrained environments. New procedures are also introduced for obtaining the blocks related to different real and complex controller poles. The resultant time-varying system is modeled using proper discrete lifting techniques, and a new and efficient dual-rate frequency response computation makes it possible to determine the characteristics of the control loop with an interlaced controller. Examples illustrate the theoretical proposals.
comment: 18 pages, 6 figures, next version will be submitted to a journal
On MIMO Stability Analysis Methods Applied to Inverter-Based Resources Connected to Power Systems
This paper presents a critical review of methods commonly employed in the literature for small signal stability analysis of inverter based resources (IBRs). It discusses the intended purposes of these methods and outlines both their proper and improper implementations. The paper provides insights into the applicability of these techniques, clarifies their inherent limitations, and discusses and illustrates common sources of misinterpretation.
Multi-layer Optimized Coordination of Smart Building Resources in Active Power Distribution Systems
This paper proposes a multi-actor coordination platform for the optimal utilization of smart building resources, including rooftop PV generation and battery energy storage systems (BESS), in active power distribution systems. The proposed multi-actor coordination includes the Smart Building Coordinator (SBC), Micro-Grid Coordinator (MGC) and Distribution System Coordinator (DSC). The coordinators operate independently and only exchange limited information with each other to reach an optimal solution. In the proposed platform, a hierarchical optimization problem is solved to optimally determine the operating point of all distribution system resources. The proposed platform fully preserves the confidentiality of the behind-the-meter (BTM) data of the buildings since no information about the status of the PV system, BESS, and load of the building is shared with the owner of the power system. The proposed platform has a flexible and scalable architecture where the computational task of coordinating microgrids and smart buildings with the distribution grid is performed locally at the MGC and SBC layers, respectively. Numerical simulations show the efficacy of the proposed platform in coordinating the BTM resources with the rest of the distribution system.
Observer-based Differentiators for Noisy Signals
We present a collection of different types of observation systems that work as differentiators. These observer-based differentiators can produce estimates for derivatives of a given signal, even though the given signal is prone to noise.
comment: 9 pages, 6 figures, technical report
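As one concrete example of an observer that acts as a differentiator, the sketch below implements a textbook second-order high-gain observer discretized with forward Euler. It is an assumption that this form is among those covered by the report; the function name, the gains `a1`, `a2`, and the parameter `eps` are illustrative choices, not values taken from the source.

```python
import math


def high_gain_differentiator(samples, dt, eps=0.01, a1=2.0, a2=1.0):
    """Estimate the derivative of a sampled signal with a high-gain observer.

    Observer dynamics (forward-Euler discretized):
        x1' = x2 + (a1 / eps)    * (y - x1)
        x2' =      (a2 / eps**2) * (y - x1)
    x1 tracks the signal y and x2 tracks its derivative.
    """
    x1, x2 = samples[0], 0.0
    estimates = []
    for y in samples:
        err = y - x1
        x1 += dt * (x2 + (a1 / eps) * err)
        x2 += dt * (a2 / eps**2) * err
        estimates.append(x2)
    return estimates


if __name__ == "__main__":
    dt = 0.001
    ts = [k * dt for k in range(5001)]
    est = high_gain_differentiator([math.sin(t) for t in ts], dt)
    print(est[-1], math.cos(ts[-1]))  # estimate vs. true derivative at t = 5
```

The parameter `eps` sets the usual trade-off: a smaller value reduces the lag of the estimate but amplifies measurement noise, and with forward Euler it also requires a smaller step `dt` to remain numerically stable.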
From Bundles to Backstepping: Geometric Control Barrier Functions for Safety-Critical Control on Manifolds
Control barrier functions (CBFs) have a well-established theory in Euclidean spaces, yet still lack general formulations and constructive synthesis tools for systems evolving on manifolds common in robotics and aerospace applications. In this paper, we develop a general theory of geometric CBFs on bundles and, for control-affine systems, recover the standard optimization-based CBF controllers and their smooth analogues. Then, by generalizing kinetic energy-based CBF backstepping to Riemannian manifolds, we provide a constructive CBF synthesis technique for geometric mechanical systems, as well as easily verifiable conditions under which it succeeds. Further, this technique utilizes mechanical structure to avoid computations on higher-order tangent bundles. We demonstrate its application to an underactuated satellite on SO(3).
comment: 8 pages, 3 figures, Submitted to American Control Conference (ACC) 2026
Soft Switching Expert Policies for Controlling Systems with Uncertain Parameters
This paper proposes a simulation-based reinforcement learning algorithm for controlling systems with uncertain and varying system parameters. While simulators are useful for safely learning control policies for physical systems, mitigating the reality gap remains a major challenge. To address the challenge, we propose a two-stage algorithm. In the first stage, multiple control policies are learned for systems with different parameters in a simulator. In the second stage, for a real system, the control policies learned in the first stage are smoothly switched using an online convex optimization algorithm based on observations. Our proposed algorithm is demonstrated through numerical experiments.
comment: 6 pages, 8 figures. Submitted to an International Conference
Design Optimization and Global Impact Assessment of Solar-Thermal Direct Air Carbon Capture
The dual challenge of decarbonizing the economy and meeting rising global energy demand underscores the need for scalable and cost-effective carbon dioxide removal technologies. Direct air capture (DAC) is among the most promising approaches, but its high energy intensity, particularly the thermal energy required for sorbent regeneration, remains a critical barrier to cost reduction and sustainable deployment. This study explores solar-thermal DAC systems that combine concentrated solar thermal technology with low-cost sand-based thermal energy storage to meet this demand. We analyze the techno-economic performance of such systems in both grid-connected and stand-alone configurations. Results show that solar-thermal DAC can achieve annual capacity factors exceeding 80% and CO2 removal costs as low as 160-200 USD per ton, making it competitive with leading DAC technologies. The proposed system operates most efficiently with short-cycle sorbents that align with solar availability. The stand-alone Solar-DAC systems, which rely solely on solar energy for both electricity and thermal energy, are particularly promising in regions with high solar capacity and sandy terrain, exhibiting minimal sensitivity to ambient temperature and humidity. An optimal 6000 ton/yr modular system design requires less than 1 km² of land, and a potential DAC capacity of more than 26 Gt/year is identified for sandy terrain alone globally. In areas with sedimentary basins suitable for CO2 storage, solar-powered DAC offers a lower-cost alternative to geothermal heating, which often faces geological and economic constraints.
comment: 4 figures
Interpolatory Approximations of PMU Data: Dimension Reduction and Pilot Selection
This work investigates the reduction of phasor measurement unit (PMU) data through low-rank matrix approximations. To reconstruct a PMU data matrix from fewer measurements, we propose the framework of interpolatory matrix decompositions (IDs). In contrast to methods relying on principal component analysis or singular value decomposition, IDs recover the complete data matrix using only a few of its rows (PMU datastreams) and/or a few of its columns (snapshots in time). This compression enables the real-time monitoring of power transmission systems using a limited number of measurements, thereby minimizing communication bandwidth. The ID perspective gives a rigorous error bound on the quality of the data compression. We propose selecting rows and columns used in an ID via the discrete empirical interpolation method (DEIM), a greedy algorithm that aims to control the error bound. This bound leads to a computable estimate for the reconstruction error during online operations. A violation of this estimate suggests a change in the system's operating conditions, and thus serves as a tool for fault detection. Numerical tests using synthetic PMU data illustrate DEIM's excellent performance for data compression, and validate the proposed DEIM-based fault-detection method.
SpeechAgent: An End-to-End Mobile Infrastructure for Speech Impairment Assistance
Speech is essential for human communication, yet millions of people face impairments such as dysarthria, stuttering, and aphasia, conditions that often lead to social isolation and reduced participation. Despite recent progress in automatic speech recognition (ASR) and text-to-speech (TTS) technologies, accessible web and mobile infrastructures for users with impaired speech remain limited, hindering the practical adoption of these advances in daily communication. To bridge this gap, we present SpeechAgent, a mobile assistant designed to support people with speech impairments in everyday communication. The system integrates large language model (LLM)-driven reasoning with advanced speech processing modules, providing adaptive support tailored to diverse impairment types. To ensure real-world practicality, we develop a structured deployment pipeline that enables real-time speech processing on mobile and edge devices, achieving imperceptible latency while maintaining high accuracy and speech quality. Evaluation on real-world impaired speech datasets and edge-device latency profiling confirms that SpeechAgent delivers both effective and user-friendly performance, demonstrating its feasibility for personalized, day-to-day assistive communication.
Modeling to Generate Alternatives for Robustness of Mixed Integer DC Optimal Power Flow
Transmission system operators face a variety of discrete operational decisions, such as switching of branches and/or devices. Incorporating these decisions into optimal power flow (OPF) results in mixed-integer non-linear programming problems (MINLPs), which can't presently be solved at scale in the required time. Various linearizations of the OPF exist, most famously the DC-OPF, which can be leveraged to find integer decisions. However, these linearizations can yield very poor integer solutions in some edge cases, making them challenging to incorporate into control rooms. This paper introduces the use of modeling to generate alternatives (MGA) to find alternative solutions to the linearized problems, reducing the chance of finding no AC feasible solutions. We test this approach using 13 networks where the DC linearization results in infeasible integer decisions, and MGA finds a solution in all cases. The MGA search criteria selected drastically affect the number and quality of solutions found, so network-specific search functions may be necessary.
Lyapunov-Based Physics-Informed Deep Neural Networks with Skew Symmetry Considerations
Deep neural networks (DNNs) are powerful black-box function approximators which have been shown to yield improved performance compared to traditional neural network (NN) architectures. However, black-box algorithms do not incorporate known physics of the system and can yield results which are physically implausible. Physics-informed neural networks (PINNs) have grown in popularity due to their ability to leverage known physical principles in the learning process which has been empirically shown to improve performance compared to traditional black-box methods. This paper introduces the first physics-informed DNN controller for an Euler-Lagrange dynamic system where the adaptation laws are designed using a Lyapunov-based stability analysis to account for the skew-symmetry property of the inertia matrix and centripetal-Coriolis matrix. A Lyapunov-based stability analysis is provided to guarantee asymptotic convergence of the tracking error and the skew-symmetric prediction error. Simulations indicate that the developed update law demonstrates improvement in individual and overall function approximation capabilities when compared to a physics-informed adaptation law which does not incorporate knowledge of system symmetries.
House Thermal Model Estimation: Robustness Across Seasons and Setpoints
Harnessing the flexibility of house heating, cooling, and ventilation (HVAC) systems has the potential to enable large-scale demand response by aggregating HVAC load adjustments across many homes. This demand response strategy helps the distribution grid flexibly ramp up or ramp down local load demand so that it can optimally match the bulk power system generation profile. However, achieving this capability requires house thermal models that are both computationally efficient and robust to operating conditions. In this work, parameters of the Resistance-Capacitance (RC) network thermal model for houses are estimated using three optimization algorithms: Nonlinear Least Squares (NLS), Batch Estimation (BE), and Maximum Likelihood Estimation (MLE). The resulting models are evaluated through forward simulation across four different seasons and three setpoints. The results illustrate a principled way of selecting reduced-order models and estimation methods with respect to the robustness they offer to seasonal and setpoint variations in training and testing datasets.
comment: This manuscript is a version of our paper accepted at the 57th North American Power Symposium (NAPS) 2025
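To make the notion of forward simulation of an RC thermal model concrete, here is a minimal sketch using the simplest possible network, a single resistance and a single capacitance (1R1C). The network order, the function name, and the parameter values are assumptions for illustration; the abstract does not specify the RC topology used in the paper.

```python
def simulate_rc(t_init, t_out, heat, R, C, dt):
    """Forward-simulate a 1R1C house model with forward Euler.

    Model: C * dT/dt = (T_out - T) / R + Q
    t_out and heat are equal-length sequences of outdoor temperature and
    HVAC heat input per time step; R, C, and dt must use consistent units
    (for example K/kW, kWh/K, and hours).
    """
    temps = [t_init]
    for T_o, q in zip(t_out, heat):
        T = temps[-1]
        temps.append(T + dt / C * ((T_o - T) / R + q))
    return temps


if __name__ == "__main__":
    # Constant outdoor temperature and heat input: the indoor temperature
    # settles at T_out + R * Q.
    trace = simulate_rc(20.0, [0.0] * 2000, [3.0] * 2000, R=2.0, C=5.0, dt=0.1)
    print(trace[-1])
```

A parameter estimator such as nonlinear least squares would wrap a simulator like this one, adjusting R and C to minimize the mismatch between simulated and measured indoor temperatures.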
A Connectively Stable and Robust DAPI Control Scheme for Islanded Networks of Microgrids
The transition towards clean energy and the introduction of Distributed Energy Resources (DERs) are giving rise to the emergence of Microgrids (MGs) and Networks of MGs (NMGs). MGs and NMGs can operate autonomously in islanded mode. However, they face challenges in terms of secondary level frequency and voltage regulation, due to the variable nature of Renewable Energy Sources (RES) and loads. Distributed-Averaging Proportional-Integral (DAPI) control has been proposed in the literature for distributed frequency and voltage control of droop-controlled DERs, but it is not robust to operational or structural perturbations. To address this, we propose a robust DAPI frequency and voltage control scheme that ensures robustness using the concept of connective stability, along with the invariant ellipsoid technique for disturbance rejection. Simulation of an NMG model in MATLAB®/Simulink® consisting of 3 MGs and 5 DERs validates the effectiveness of the proposed method, and demonstrates that it can successfully mitigate the effects of major disturbances such as cyberattacks.
Safety Monitor for Off-Road Planning with Uncertainty Bounded Bekker Costs
Reliable off-road autonomy requires operational constraints so that behavior stays predictable and safe when soil strength is uncertain. This paper presents a runtime assurance safety monitor that collaborates with any planner and uses a Bekker-based cost model with bounded uncertainty. The monitor builds an upper confidence traversal cost from a lightweight pressure sinkage model identified in field tests and checks each planned motion against two limits: maximum sinkage and rollover margin. If the risk of crossing either limit is too high, the monitor switches to a certified fallback that reduces vehicle speed, increases standoff from soft ground, or stops on firmer soil. This separation lets the planner focus on efficiency while the monitor keeps the vehicle within clear safety limits on board. Wheel geometry, wheel load estimate, and a soil raster serve as inputs, which tie safety directly to vehicle design and let the monitor set clear limits on speed, curvature, and stopping at run time. The method carries uncertainty analytically into the upper confidence cost and applies simple intervention rules. Tuning of the sinkage limit, rollover margin, and risk window trades efficiency for caution while keeping the monitor light enough for embedded processors. Results from a simulation environment spanning loam to sand include intervention rates, violation probability, and path efficiency relative to the nominal plan, and a benchtop static loading check provides initial empirical validation.
Aircraft Collision Avoidance Systems: Technological Challenges and Solutions on the Path to Regulatory Acceptance
Aircraft collision avoidance systems are critical to modern aviation. These systems are designed to predict potential collisions between aircraft and recommend appropriate avoidance actions. Creating effective collision avoidance systems requires solutions to a variety of technical challenges related to surveillance, decision making, and validation. These challenges have sparked significant research and development efforts over the past several decades that have resulted in a variety of proposed solutions. This article provides an overview of these challenges and solutions with an emphasis on those that have been put through a rigorous validation process and accepted by regulatory bodies. The challenges posed by the collision avoidance problem are often present in other domains, and aircraft collision avoidance systems can serve as case studies that provide valuable insights for a wide range of safety-critical systems.
comment: 32 pages, 9 figures
Prognostic Framework for Robotic Manipulators Operating Under Dynamic Task Severities
Robotic manipulators are critical in many applications but are known to degrade over time. This degradation is influenced by the nature of the tasks performed by the robot. Tasks with higher severity, such as handling heavy payloads, can accelerate the degradation process. One way this degradation is reflected is in the position accuracy of the robot's end-effector. In this paper, we present a prognostic modeling framework that predicts a robotic manipulator's Remaining Useful Life (RUL) while accounting for the effects of task severity. Our framework represents the robot's position accuracy as a Brownian motion process with a random drift parameter that is influenced by task severity. The dynamic nature of task severity is modeled using a continuous-time Markov chain (CTMC). To evaluate RUL, we discuss two approaches -- (1) a novel closed-form expression for Remaining Lifetime Distribution (RLD), and (2) Monte Carlo simulations, commonly used in prognostics literature. Theoretical results establish the equivalence between these RUL computation approaches. We validate our framework through experiments using two distinct physics-based simulators for planar and spatial robot fleets. Our findings show that robots in both fleets experience shorter RUL when handling a higher proportion of high-severity tasks.
comment: Accepted for Publication in IEEE Transactions on Systems, Man, and Cybernetics: Systems
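The Monte Carlo route to RUL mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' simulator: it assumes a two-state severity chain (0 = low, 1 = high), a fixed drift per state rather than a random drift, a first-order time discretization of the CTMC, and hypothetical names and parameter values (`failure_time`, `mean_rul`, `leave_rates`, the drifts and threshold).

```python
import random

def failure_time(leave_rates, drifts, sigma, threshold, rng, dt=0.05, t_max=500.0):
    """First time a degradation path crosses the failure threshold.

    Assumption: severity is a two-state chain; leave_rates[s] is the rate of
    leaving state s, approximated per step by the probability rate * dt.
    """
    state, x, t = 0, 0.0, 0.0
    while t < t_max:
        # Switch task severity with probability ~ rate * dt.
        if rng.random() < leave_rates[state] * dt:
            state = 1 - state
        # Euler-Maruyama step of Brownian motion with state-dependent drift.
        x += drifts[state] * dt + sigma * rng.gauss(0.0, dt ** 0.5)
        t += dt
        if x >= threshold:
            return t
    return t_max  # censored: no failure observed within the horizon

def mean_rul(leave_rates, drifts=(0.05, 0.5), sigma=0.1, threshold=5.0,
             n_paths=200, seed=0):
    """Monte Carlo estimate of the mean lifetime of a new robot."""
    rng = random.Random(seed)
    times = [failure_time(leave_rates, drifts, sigma, threshold, rng)
             for _ in range(n_paths)]
    return sum(times) / len(times)

if __name__ == "__main__":
    print("mostly low severity :", mean_rul((0.1, 1.0)))
    print("mostly high severity:", mean_rul((1.0, 0.1)))
```

Under these assumed parameters, the chain that spends more time in the high-severity state gives the shorter estimated lifetime, in line with the qualitative finding reported in the abstract.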
Deep Learning for Continuous-time Stochastic Control with Jumps
In this paper, we introduce a model-based deep-learning approach to solve finite-horizon continuous-time stochastic control problems with jumps. We iteratively train two neural networks: one to represent the optimal policy and the other to approximate the value function. Leveraging a continuous-time version of the dynamic programming principle, we derive two different training objectives based on the Hamilton-Jacobi-Bellman equation, ensuring that the networks capture the underlying stochastic dynamics. Empirical evaluations on different problems illustrate the accuracy and scalability of our approach, demonstrating its effectiveness in solving complex, high-dimensional stochastic control tasks.
Revisiting Functional Derivatives in Multi-object Tracking
Probability generating functionals (PGFLs) are efficient and powerful tools for tracking independent objects in clutter. It was shown that PGFLs could be used for the elegant derivation of practical multi-object tracking algorithms, e.g., the probability hypothesis density (PHD) filter. However, derivations using PGFLs use the so-called functional derivatives whose definitions usually appear too complicated or heuristic, involving Dirac delta "functions". This paper begins by comparing different definitions of functional derivatives and exploring their relationships and implications for practical applications. It then proposes a rigorous definition of the functional derivative, utilizing straightforward yet precise mathematics for clarity. Key properties of the functional derivative are revealed and discussed.
comment: submitted to SIAM Journal on Control and Optimization
Convergence in On-line Learning of Static and Dynamic Systems
The paper derives analytical expressions for the asymptotic average updating direction of the adaptive moment generation (ADAM) algorithm when applied to recursive identification of nonlinear systems. It is proved that the standard hyper-parameter setting results in the same asymptotic average updating direction as a diagonally power normalized stochastic gradient algorithm. With the internal filtering turned off, the asymptotic average updating direction is instead equivalent to that of a sign-sign stochastic gradient algorithm. Global convergence to an invariant set follows, where a subset of parameters contain those that give a correct input-output description of the system. The paper also exploits a nonlinear dynamic model to embed structure in recurrent neural networks. A Monte-Carlo simulation study validates the results.
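The sign-sign connection in the abstract above can be checked on a scalar example. The sketch below uses the standard ADAM update and reads "internal filtering turned off" as setting both averaging coefficients to zero; that reading, the scalar linear-regression setting, and the names `adam_step` and `sign` are assumptions for illustration, not the paper's derivation.

```python
import math

def adam_step(grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar step of the standard ADAM update; returns (step, m, v)."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment filter
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment filter
    m_hat = m / (1.0 - beta1 ** t)                # bias corrections
    v_hat = v / (1.0 - beta2 ** t)
    return -lr * m_hat / (math.sqrt(v_hat) + eps), m, v

def sign(x):
    return (x > 0) - (x < 0)

# Squared-error loss 0.5 * e**2 with e = y - phi * theta gives the gradient
# -e * phi with respect to theta (hypothetical numbers below).
error, regressor, lr = 0.3, -2.0, 0.01
grad = -error * regressor

# Assumption: both filters off, i.e. beta1 = beta2 = 0.
step, _, _ = adam_step(grad, m=0.0, v=0.0, t=1, lr=lr, beta1=0.0, beta2=0.0)

# Sign-sign update: step size times sign(error) * sign(regressor).
sign_sign_step = lr * sign(error) * sign(regressor)
```

With both coefficients at zero the step reduces to `-lr * grad / (|grad| + eps)`, which depends only on the sign of the gradient and therefore matches the sign-sign step up to the small `eps` term.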
Constrained Trajectory Optimization for Hybrid Dynamical Systems
Hybrid dynamical systems pose significant challenges for effective planning and control, especially when additional constraints such as obstacle avoidance, state boundaries, and actuation limits are present. In this letter, we extend the recently proposed Hybrid iLQR method [1] to handle state and input constraints within an indirect optimization framework, aiming to preserve computational efficiency and ensure dynamic feasibility. Specifically, we incorporate two constraint handling mechanisms into the Hybrid iLQR: Discrete Barrier State and Augmented Lagrangian methods. Comprehensive simulations across various operational situations are conducted to evaluate and compare the performance of these extended methods in terms of convergence and their ability to handle infeasible starting trajectories. Results indicate that while the Discrete Barrier State approach is more computationally efficient, the Augmented Lagrangian method outperforms it in complex and real-world scenarios with infeasible initial trajectories.
comment: 6 pages, 4 figures
Online Regularized Statistical Learning in Reproducing Kernel Hilbert Space With Non-Stationary Data
We study recursive regularized learning algorithms in the reproducing kernel Hilbert space (RKHS) with non-stationary online data streams. We introduce the concept of random Tikhonov regularization path and decompose the tracking error of the algorithm's output for the regularization path into random difference equations in RKHS. We show that the tracking error vanishes in mean square if the regularization path is slowly time-varying. Then, leveraging the monotonicity of inverse operators and the spectral decomposition of compact operators, and introducing the RKHS persistence of excitation condition, we develop a dominated convergence method to prove the mean square consistency between the regularization path and the unknown function to be learned. Especially, for independent and non-identically distributed data streams, the mean square consistency between the algorithm's output and the unknown function is achieved if the input data's marginal probability measures are slowly time-varying and the average measure over each fixed-length time period has a uniformly strictly positive lower bound.
Bayesian Optimization of Process Parameters of a Sensor-Based Sorting System using Gaussian Processes as Surrogate Models
Sensor-based sorting systems enable the physical separation of a material stream into two fractions. The sorting decision is based on the image data evaluation of the sensors used and is carried out using actuators. Various process parameters must be set depending on the properties of the material stream, the dimensioning of the system, and the required sorting accuracy. However, continuous verification and re-adjustment are necessary due to changing requirements and material stream compositions. In this paper, we introduce an approach for optimizing, recurrently monitoring and adjusting the process parameters of a sensor-based sorting system. Based on Bayesian Optimization, Gaussian process regression models are used as surrogate models to achieve specific requirements for system behavior with the uncertainties contained therein. This method minimizes the number of necessary experiments while simultaneously considering two possible optimization targets based on the requirements for both material output streams. In addition, uncertainties are considered during determining sorting accuracies in the model calculation. We evaluated the method with three example process parameters.
comment: Accepted at the IEEE 30th International Conference on Emerging Technologies and Factory Automation (ETFA)
Scalable Distributed Least Squares Algorithm for Linear Algebraic Equations via Periodic Scheduling
In this work, we propose a novel discrete-time distributed algorithm for finding least-squares solutions of linear algebraic equations with a scheduling protocol to further enhance its scalability. Each agent in the network is assumed to know some rows of the coefficient matrix and the corresponding entries in the observation vector. Unlike typical distributed algorithms, our approach considers communication bandwidth limits, allowing agents to transmit only a portion of their "guessed" solution, independent of its dimension. A cyclic scheduling protocol determines which portion is transmitted at each iteration. Assuming a small fixed step size and a diagonalizable algorithm matrix, we prove that agents' "guessed" solutions converge exponentially to a least-squares solution. For cases where the observation vectors are time-varying, a modified algorithm guarantees practical convergence, with tracking error bounded by the single-step variation in the observation vector. Simulations and comparisons with state-of-the-art algorithms validate our algorithm's feasibility and scalability.
comment: Submitted to IEEE TAC
Towards Machine Learning-based Model Predictive Control for HVAC Control in Multi-Context Buildings at Scale via Ensemble Learning
The building thermodynamics model, which predicts real-time indoor temperature changes under potential HVAC (Heating, Ventilation, and Air Conditioning) control operations, is crucial for optimizing HVAC control in buildings. While pioneering studies have attempted to develop such models for various building environments, these models often require extensive data collection periods and rely heavily on expert knowledge, making the modeling process inefficient and limiting the reusability of the models. This paper explores a model ensemble perspective that utilizes existing developed models as base models to serve a target building environment, thereby providing accurate predictions while reducing the associated efforts. Given that building data streams are non-stationary and the number of base models may increase, we propose a Hierarchical Reinforcement Learning (HRL) approach to dynamically select and weight the base models. Our approach employs a two-tiered decision-making process: the high-level focuses on model selection, while the low-level determines the weights of the selected models. We thoroughly evaluate the proposed approach through offline experiments and an on-site case study, and the experimental results demonstrate the effectiveness of our method.
Online Learning for Dynamic Vickrey-Clarke-Groves Mechanism in Unknown Environments
We consider the problem of online dynamic mechanism design for sequential auctions in unknown environments, where the underlying market and, thus, the bidders' values vary over time as interactions between the seller and the bidders progress. We model the sequential auctions as an infinite-horizon average-reward Markov decision process (MDP). In each round, the seller determines an allocation and sets a payment for each bidder, while each bidder receives a private reward and submits a sealed bid to the seller. The state, which represents the underlying market, evolves according to an unknown transition kernel and the seller's allocation policy without episodic resets. We first extend the Vickrey-Clarke-Groves (VCG) mechanism to sequential auctions, thereby obtaining a dynamic counterpart that preserves the desired properties: efficiency, truthfulness, and individual rationality. We then focus on the online setting and develop a reinforcement learning algorithm for the seller to learn the underlying MDP and implement a mechanism that closely resembles the dynamic VCG mechanism. We show that the learned mechanism approximately satisfies efficiency, truthfulness, and individual rationality and achieves guaranteed performance in terms of various notions of regret.
comment: 20 pages
MESS+: Dynamically Learned Inference-Time LLM Routing in Model Zoos with Service Level Guarantees NeurIPS 2025
Open-weight large language model (LLM) zoos provide access to numerous high-quality models, but selecting the appropriate model for specific tasks remains challenging and requires technical expertise. Most users simply want factually correct, safe, and satisfying responses without concerning themselves with model technicalities, while inference service providers prioritize minimizing operating costs. These competing interests are typically mediated through service level agreements (SLAs) that guarantee minimum service quality. We introduce MESS+, a stochastic optimization algorithm for cost-optimal LLM request routing while providing rigorous SLA compliance guarantees. MESS+ learns request satisfaction probabilities of LLMs in real-time as users interact with the system, based on which model selection decisions are made by solving a per-request optimization problem. Our algorithm includes a novel combination of virtual queues and request satisfaction prediction, along with a theoretical analysis of cost optimality and constraint satisfaction. Across a wide range of state-of-the-art LLM benchmarks, MESS+ achieves an average of $2\times$ cost savings compared to existing LLM routing techniques.
comment: NeurIPS 2025. Code: https://github.com/laminair/mess-plus
Systems and Control (EESS)
Bilevel Analysis of Cost and Emissions Externalities from Data Center Load Shifting
Data centers are emerging as large, flexible electricity consumers capable of shifting computational workloads across locations in response to economic and environmental signals. While this flexibility has potential for emissions reduction, its impact on power system operations depends critically on how such behavior interacts with network constraints and market signals. We develop a bilevel optimization framework in which a data center minimizes a weighted combination of electricity cost and marginal emissions intensity (LME), while the system operator clears economic dispatch under transmission and generation constraints. Focusing on a stylized three-bus power system, we derive closed-form, piecewise-linear expressions for both the data center and system-wide objectives as functions of the data centers' load shift. These expressions capture threshold-driven regime changes due to congestion and renewable saturation. We identify sufficient conditions under which the data center's decentralized decisions align with or diverge from socially optimal behavior and characterize the resulting externalities. Our results reveal how system topology and generator asymmetry affect incentive alignment and provide insight into when marginal price or emissions signals may fail to guide flexible loads toward socially beneficial outcomes. Our results offer a tractable starting point for analyzing decentralized flexibility under carbon-aware incentives and suggest directions for improving coordination between flexible loads and system operations.
Learning Optimal Power Flow with Pointwise Constraints
Training learning parameterizations to solve optimal power flow (OPF) with pointwise constraints is proposed. In this novel training approach, a learning parameterization is substituted directly into an OPF problem with constraints required to hold over all problem instances. This is different from existing supervised learning methods in which constraints are required to hold across the average of problem instances. Training with pointwise constraints is undertaken in the dual domain with the use of an augmented Lagrangian and a dual gradient ascent algorithm. Numerical experiments demonstrate that training with pointwise constraints produces solutions with smaller constraint violations. Experiments further demonstrate that pointwise constraints are most effective at reducing constraint violations in corner cases, defined as those realizations in which constraints are most difficult to satisfy. Gains are most pronounced in power systems with large numbers of buses.
comment: 19 pages, 13 figures
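The idea of training in the dual domain with one constraint per problem instance can be sketched on a toy problem. This is not the paper's OPF formulation: it assumes a single shared parameter `w` that maps each demand `d` to an output `w * d`, a tracking objective `(w * d - d)**2`, a hypothetical cap `w * d <= cap` that must hold for every instance, and one multiplier per instance. It uses the standard augmented Lagrangian for inequality constraints with projected dual ascent.

```python
def train_pointwise(demands, cap, rho=1.0, outer=100, inner=100, lr=0.05):
    """Toy augmented-Lagrangian training with one multiplier per instance."""
    w = 0.0
    lam = [0.0] * len(demands)  # pointwise: a separate multiplier for each instance
    for _ in range(outer):
        # Primal phase: gradient descent on the augmented Lagrangian in w.
        for _ in range(inner):
            grad = sum(2.0 * (w - 1.0) * d * d for d in demands)
            grad += sum(max(0.0, l + rho * (w * d - cap)) * d
                        for l, d in zip(lam, demands))
            w -= lr * grad
        # Dual phase: projected gradient ascent on each instance's multiplier.
        lam = [max(0.0, l + rho * (w * d - cap)) for l, d in zip(lam, demands)]
    return w, lam

if __name__ == "__main__":
    w, lam = train_pointwise([0.5, 1.0, 2.0], cap=1.5)
    print("parameter:", w, "multipliers:", lam)
```

In this toy setting only the largest demand makes the cap binding, so only its multiplier becomes positive; an average-constraint formulation would instead share one multiplier across all instances and could leave that corner case violated.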
Sugar Shack 4.0: Implementation of a Cyber-Physical System for Logistic and Sanitary Automation in a Maple Syrup Boiling Center
This paper presents the design and deployment of a process-aware cyber-physical system that automates plant-level logistics, traceability, and sanitation in a centralized maple-syrup boiling center. The system replaces ad-hoc, manual operations with event-driven orchestration on a local server, employing reusable device abstractions and a centralized interlock with priority-based arbitration for shared piping. It implements deterministic routines for delivery, reverse osmosis integration, evaporator feed, and permeate management. The system is sensor rich: inline measurements of flow, temperature, and sugar concentration (degrees Brix) drive routing decisions and trigger systematic post-transfer rinses (cleaning-in-place), ensuring consistent hygiene and complete, immediate traceability up to the evaporator inlet. During the 2025 production season, the system queued 431 operations without incident; executed 908 "topstock" and "downstock" balancing cycles; increased usable permeate reserves from 22,712 to approximately 41,640 L through dynamic storage assignment; eliminated mid-season contamination incidents previously observed under manual practice; and reduced administrative effort for billing and reporting from more than 30 hours to roughly 1 hour through automatic documentation. These results demonstrate a practical path to modular, plant-scale automation beyond traditional architectures, and lay the groundwork for packaging reusable elements for similar plants or adjacent industries. This work is part of a larger project involving the first scientifically-documented integration of Industry 4.0 technologies in a maple syrup boiling center.
comment: 8 pages, 6 figures
Safe Decentralized Density Control of Multi-Robot Systems using PDE-Constrained Optimization with State Constraints
In this paper, we introduce a decentralized optimization-based density controller designed to enforce set invariance constraints in multi-robot systems. By designing a decentralized control barrier function, we derived sufficient conditions under which local safety constraints guarantee global safety. We account for localization and motion noise explicitly by modeling robots as spatial probability density functions governed by the Fokker-Planck equation. Compared to traditional centralized approaches, our controller requires less computational and communication power, making it more suitable for deployment in situations where perfect communication and localization are impractical. The controller is validated through simulations and experiments with four quadcopters.
comment: Accepted to MRS 2025
Decentralized Small Gain and Phase Stability Conditions for Grid-Forming Converters: Limitations and Extensions
The increasing share of converter-based resources in power systems calls for scalable methods to analyse stability without relying on exhaustive system-wide simulations. Decentralized small-gain and small-phase criteria have recently been proposed for this purpose, but their applicability to grid-forming converters is severely limited by the sectoriality assumption, which is not typically satisfied at low frequencies. This work revisits and extends mixed gain-phase conditions by introducing loop-shaping transformations that reformulate converter and network models in alternative coordinate frames. The proposed approach resolves intrinsic non-sectoriality at low frequencies and reduces conservativeness, thereby improving the applicability of decentralized stability certification. Analytical results are illustrated first on an infinite-bus system and then extended to the IEEE 14-bus network, demonstrating the practicality and scalability of the method. These findings provide a pathway toward less conservative and more widely applicable decentralized stability certificates in power grids.
Path-Based Conditions for the Identifiability of Non-additive Nonlinear Networks with Full Measurements
We analyze the identifiability of nonlinear networks with node dynamics characterized by functions that are non-additive. We consider the full measurement case (all the nodes are measured) in the path-independent delay scenario where all the excitation signals of a specific node have the same delay in the output of a measured node. Based on the notion of a generic nonlinear matrix associated with the network, we introduce the concept of generic identifiability and characterize the space of functions that satisfies this property. For directed acyclic graphs (DAGs) characterized by analytic functions, we derive a sufficient condition for identifiability based on vertex-disjoint paths from excited nodes to the in-neighbors of each node in the network. Furthermore, when we consider the class of polynomial functions, by using well-known results on algebraic varieties, we prove that the vertex-disjoint path condition is also necessary. Finally, we show that this identifiability condition is not necessary for the additive nonlinear model. Some examples are added to illustrate the results.
comment: 11 pages, 7 figures, submitted to IEEE Transactions on Automatic Control
Joint Computation Offloading and Resource Management for Cooperative Satellite-Aerial-Marine Internet of Things Networks
Devices within the marine Internet of Things (MIoT) can connect to low Earth orbit (LEO) satellites and unmanned aerial vehicles (UAVs) to facilitate low-latency data transmission and execution, as well as enhanced-capacity data storage. However, without a proper traffic handling strategy, it is still difficult to effectively meet the low-latency requirements. In this paper, we consider a cooperative satellite-aerial-MIoT network (CSAMN) for maritime edge computing and maritime data storage to prioritize delay-sensitive (DS) tasks by employing mobile edge computing, while handling delay-tolerant (DT) tasks via the store-carry-forward method. Considering the delay constraints of DS tasks, we formulate a constrained joint optimization problem of maximizing satellite-collected data volume while minimizing system energy consumption by controlling four interdependent variables: the transmit power of UAVs for DS tasks, the start time of DT tasks, the computing resource allocation, and the offloading ratio. To solve this non-convex and non-linear problem, we propose a joint computation offloading and resource management (JCORM) algorithm using the Dinkelbach method and linear programming. Our results show that the volume of data collected by the proposed JCORM algorithm can be increased by up to 41.5% compared to baselines. Moreover, the JCORM algorithm achieves a dramatic reduction in computational time, from a maximum of 318.21 seconds down to just 0.16 seconds per experiment, making it highly suitable for real-time maritime applications.
Behavior-Aware Online Prediction of Obstacle Occupancy using Zonotopes
Predicting the motion of surrounding vehicles is key to safe autonomous driving, especially in unstructured environments without prior information. This paper proposes a novel online method to accurately predict the occupancy sets of surrounding vehicles based solely on motion observations. The approach is divided into two stages: first, an Extended Kalman Filter and a Linear Programming (LP) problem are used to estimate a compact zonotopic set of control actions; then, a reachability analysis propagates this set to predict future occupancy. The effectiveness of the method has been validated through simulations in an urban environment, showing accurate and compact predictions without relying on prior assumptions or prior training data.
comment: 64th IEEE Conference on Decision and Control
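The second stage described in the abstract above, propagating a zonotopic set of control actions to predict occupancy, can be sketched as follows. The sketch covers only the set propagation, not the EKF or LP estimation stage, and it assumes a linear discrete-time model `x+ = A x + B u`; the double-integrator matrices, the acceleration bound, and the function names are hypothetical. A zonotope is stored as a center `c` and a generator matrix `G`, representing the set `{c + G z : every entry of z in [-1, 1]}`.

```python
def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def mat_mat(A, G):
    cols = list(zip(*G))  # columns of G (empty if G has no generators)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def propagate(c, G, A, B, c_u, G_u):
    """One step of x+ = A x + B u with x in <c, G> and u in <c_u, G_u>.

    Linear map: A<c, G> = <A c, A G>; Minkowski sum: add centers and
    concatenate generator columns.
    """
    Ac, Bc = mat_vec(A, c), mat_vec(B, c_u)
    AG, BG = mat_mat(A, G), mat_mat(B, G_u)
    new_c = [a + b for a, b in zip(Ac, Bc)]
    new_G = [ag + bg for ag, bg in zip(AG, BG)]
    return new_c, new_G

def interval_hull(c, G):
    """Axis-aligned bounding box of the zonotope as (lower, upper)."""
    radius = [sum(abs(g) for g in row) for row in G]
    return ([ci - r for ci, r in zip(c, radius)],
            [ci + r for ci, r in zip(c, radius)])

# Hypothetical double integrator (position, velocity) with a 0.5 s step.
A = [[1.0, 0.5], [0.0, 1.0]]
B = [[0.125], [0.5]]
c, G = [0.0, 2.0], [[], []]     # exactly known initial state, no generators
c_u, G_u = [0.0], [[1.0]]       # assumed acceleration set [-1, 1]

for _ in range(2):
    c, G = propagate(c, G, A, B, c_u, G_u)
lower, upper = interval_hull(c, G)
```

Each step adds one generator column per control generator, so the representation grows linearly with the horizon; practical implementations usually apply an order-reduction step, which is omitted here.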
A Multifunctional Capacitive Sensing Platform for Wireless Vascular and Heart Monitoring
We present a multifunctional, antenna-integrated capacitive sensing (MAiCaS) platform for passive, wireless, and real-time cardiovascular monitoring. Unlike conventional systems that require separate sensors and wireless modules, our device unifies sensing, telemetry, and mechanical functionality into a compact and scalable design by exploiting the parasitic capacitance of an inductive antenna as a strain-sensitive element. The sensor is fabricated using a cleanroom-free, single-step UV laser patterning process on a flexible PDMS substrate, reducing manufacturing complexity and enabling high reproducibility. The MAiCaS is suitable for three different applications: as a sensor for epicardial strain measurement, a stent as a sensor, and a vascular graft sensor. We demonstrate MAiCaS's versatility by validating its wireless resonance-based response to strain, pressure, and deformation across unrolled and rolled forms. In vitro experiments demonstrated consistent resonance frequency shifts under physiological conditions, with stable performance on skin, in PBS, human serum, and simulated vascular environments. Repeatability and aging tests confirmed its long-term reliability and elasticity under cyclic loading. Calibration curves revealed high sensitivity across all configurations, with wireless interrogation achieved through S11 parameter measurements and resonance frequency shift as the output metric. The sensitivity of the device was measured to be 2.9 MHz per 1% strain, 0.43 MHz/mmHg, and 309.6 kHz/µm for the epicardial patch, graft, and stent integrated sensor, respectively. The operation of MAiCaS was evaluated in a human experiment. This monolithic sensor architecture provides a scalable and cost-effective solution for battery-free monitoring of vascular dynamics, with potential for remote diagnostics, post-surgical follow-up, and continuous cardiovascular health management.
comment: 28 pages, 7 figures
Balancing Specialization and Centralization: A Multi-Agent Reinforcement Learning Benchmark for Sequential Industrial Control
Autonomous control of multi-stage industrial processes requires both local specialization and global coordination. Reinforcement learning (RL) offers a promising approach, but its industrial adoption remains limited due to challenges such as reward design, modularity, and action space management. Many academic benchmarks differ markedly from industrial control problems, limiting their transferability to real-world applications. This study introduces an enhanced industry-inspired benchmark environment that combines tasks from two existing benchmarks, SortingEnv and ContainerGym, into a sequential recycling scenario with sorting and pressing operations. We evaluate two control strategies: a modular architecture with specialized agents and a monolithic agent governing the full system, while also analyzing the impact of action masking. Our experiments show that without action masking, agents struggle to learn effective policies, with the modular architecture performing better. When action masking is applied, both architectures improve substantially, and the performance gap narrows considerably. These results highlight the decisive role of action space constraints and suggest that the advantages of specialization diminish as action complexity is reduced. The proposed benchmark thus provides a valuable testbed for exploring practical and robust multi-agent RL solutions in industrial automation, while contributing to the ongoing debate on centralization versus specialization.
comment: Preprint (submitted version) to be presented at the 13th International Conference on Industrial Engineering and Applications (ICIEA-EU), Milan, 2026. The final Version of Record will appear in the official conference proceedings
Interlacing in Controllers Implementation: Frequency Analysis
The main goal of this contribution is to explain how to use interlacing techniques for LTI controller implementation and to analyze different structures in this setting. These considerations lead to important computational savings in resource-constrained environments. New procedures are also introduced for obtaining the blocks related to different real and complex controller poles. The resulting time-varying system is modeled using proper discrete lifting techniques, and a new and efficient dual-rate frequency response computation allows the characteristics of the control loop with an interlaced controller to be determined. Examples illustrate the theoretical proposals.
comment: 18 pages, 6 figures, next version will be submitted to a journal
On MIMO Stability Analysis Methods Applied to Inverter-Based Resources Connected to Power Systems
This paper presents a critical review of methods commonly employed in the literature for small signal stability analysis of inverter based resources (IBRs). It discusses the intended purposes of these methods and outlines both their proper and improper implementations. The paper provides insights into the applicability of these techniques, clarifies their inherent limitations, and discusses and illustrates common sources of misinterpretation.
Multi-layer Optimized Coordination of Smart Building Resources in Active Power Distribution Systems
This paper proposes a multi-actor coordination platform for the optimal utilization of smart buildings resources, including roof top PV generation and battery energy storage system (BESS), in active power distribution systems. The proposed multi-actor coordination includes the Smart Building Coordinator (SBC), Micro-Grid Coordinator (MGC) and Distribution System Coordinator (DSC). The coordinators operate independently and only exchange limited information with each other to reach an optimal solution. In the proposed platform, a hierarchical optimization problem is solved to optimally determine the operating point of all distribution system resources. The proposed platform fully preserves the confidentiality of the behind the meter (BTM) data of the buildings since no information about the status of the PV system, BESS, and load of the building is shared with the owner of the power system. The proposed platform has a flexible and scalable architecture where the computational task of coordinating microgrids and smart buildings with distribution grid is performed locally at the MGC and SBC layers, respectively. Numerical simulations show the efficacy of the proposed platform in coordinating the BTM resources with the rest of the distribution system.
Observer-based Differentiators for Noisy Signals
We present a collection of different types of observation systems that work as differentiators. These observer-based differentiators can produce estimates for derivatives of a given signal, even though the given signal is prone to noise.
comment: 9 pages, 6 figures, technical report
From Bundles to Backstepping: Geometric Control Barrier Functions for Safety-Critical Control on Manifolds
Control barrier functions (CBFs) have a well-established theory in Euclidean spaces, yet still lack general formulations and constructive synthesis tools for systems evolving on manifolds common in robotics and aerospace applications. In this paper, we develop a general theory of geometric CBFs on bundles and, for control-affine systems, recover the standard optimization-based CBF controllers and their smooth analogues. Then, by generalizing kinetic energy-based CBF backstepping to Riemannian manifolds, we provide a constructive CBF synthesis technique for geometric mechanical systems, as well as easily verifiable conditions under which it succeeds. Further, this technique utilizes mechanical structure to avoid computations on higher-order tangent bundles. We demonstrate its application to an underactuated satellite on SO(3).
comment: 8 pages, 3 figures, Submitted to American Control Conference (ACC) 2026
Soft Switching Expert Policies for Controlling Systems with Uncertain Parameters
This paper proposes a simulation-based reinforcement learning algorithm for controlling systems with uncertain and varying system parameters. While simulators are useful for safely learning control policies for physical systems, mitigating the reality gap remains a major challenge. To address the challenge, we propose a two-stage algorithm. In the first stage, multiple control policies are learned for systems with different parameters in a simulator. In the second stage, for a real system, the control policies learned in the first stage are smoothly switched using an online convex optimization algorithm based on observations. Our proposed algorithm is demonstrated through numerical experiments.
comment: 6 pages, 8 figures. Submitted to an International Conference
Design Optimization and Global Impact Assessment of Solar-Thermal Direct Air Carbon Capture
The dual challenge of decarbonizing the economy and meeting rising global energy demand underscores the need for scalable and cost-effective carbon dioxide removal technologies. Direct air capture (DAC) is among the most promising approaches, but its high energy intensity, particularly the thermal energy required for sorbent regeneration, remains a critical barrier to cost reduction and sustainable deployment. This study explores solar-thermal DAC systems that combine concentrated solar thermal technology with low-cost sand-based thermal energy storage to meet this demand. We analyze the techno-economic performance of such systems in both grid-connected and stand-alone configurations. Results show that solar-thermal DAC can achieve annual capacity factors exceeding 80% and CO2 removal costs as low as 160-200 USD per ton, making it competitive with leading DAC technologies. The proposed system operates most efficiently with short-cycle sorbents that align with solar availability. The stand-alone Solar-DAC systems, which rely solely on solar energy for both electricity and thermal energy, are particularly promising in regions with high solar capacity and sandy terrain, exhibiting minimal sensitivity to ambient temperature and humidity. An optimal 6000 ton/yr modular system design requires <1 km2 of land, and a potential DAC capacity of >26 Gt/year is identified globally for sandy terrain alone. In areas with sedimentary basins suitable for CO2 storage, solar-powered DAC offers a lower-cost alternative to geothermal heating, which often faces geological and economic constraints.
comment: 4 figures
Interpolatory Approximations of PMU Data: Dimension Reduction and Pilot Selection
This work investigates the reduction of phasor measurement unit (PMU) data through low-rank matrix approximations. To reconstruct a PMU data matrix from fewer measurements, we propose the framework of interpolatory matrix decompositions (IDs). In contrast to methods relying on principal component analysis or singular value decomposition, IDs recover the complete data matrix using only a few of its rows (PMU datastreams) and/or a few of its columns (snapshots in time). This compression enables the real-time monitoring of power transmission systems using a limited number of measurements, thereby minimizing communication bandwidth. The ID perspective gives a rigorous error bound on the quality of the data compression. We propose selecting rows and columns used in an ID via the discrete empirical interpolation method (DEIM), a greedy algorithm that aims to control the error bound. This bound leads to a computable estimate for the reconstruction error during online operations. A violation of this estimate suggests a change in the system's operating conditions, and thus serves as a tool for fault detection. Numerical tests using synthetic PMU data illustrate DEIM's excellent performance for data compression, and validate the proposed DEIM-based fault-detection method.
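The row-selection idea in the abstract above can be sketched with the standard DEIM index selection and an interpolatory reconstruction from the selected rows. This is an illustration under stated assumptions, not the authors' code: the data matrix is synthetic and exactly low-rank, the basis `U` is taken from an SVD of that same matrix, and the names `deim_indices` and `reconstruct_from_rows` are hypothetical.

```python
import numpy as np

def deim_indices(U):
    """Greedy DEIM selection: one row index per column of the basis U."""
    idx = [int(np.argmax(np.abs(U[:, 0])))]
    for j in range(1, U.shape[1]):
        # Interpolate column j at the rows chosen so far, then pick the row
        # where the interpolation residual is largest.
        coeffs = np.linalg.solve(U[idx, :j], U[idx, j])
        residual = U[:, j] - U[:, :j] @ coeffs
        idx.append(int(np.argmax(np.abs(residual))))
    return idx

def reconstruct_from_rows(U, idx, measured_rows):
    """Interpolatory reconstruction A ~= U (U[idx, :])^{-1} A[idx, :]."""
    return U @ np.linalg.solve(U[idx, :], measured_rows)

# Synthetic stand-in for a PMU data matrix: 20 datastreams x 50 snapshots, rank 3.
rng = np.random.default_rng(0)
rank = 3
A = rng.standard_normal((20, rank)) @ rng.standard_normal((rank, 50))
U = np.linalg.svd(A, full_matrices=False)[0][:, :rank]

rows = deim_indices(U)                      # datastreams to keep measuring
A_hat = reconstruct_from_rows(U, rows, A[rows, :])
rel_err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
```

For an exactly rank-3 matrix the three selected rows reproduce the full matrix up to rounding error. With real measurements the basis would presumably be built from historical data, and the reconstruction error would be nonzero and governed by the error bound mentioned in the abstract.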
SpeechAgent: An End-to-End Mobile Infrastructure for Speech Impairment Assistance
Speech is essential for human communication, yet millions of people face impairments such as dysarthria, stuttering, and aphasia, conditions that often lead to social isolation and reduced participation. Despite recent progress in automatic speech recognition (ASR) and text-to-speech (TTS) technologies, accessible web and mobile infrastructures for users with impaired speech remain limited, hindering the practical adoption of these advances in daily communication. To bridge this gap, we present SpeechAgent, a mobile SpeechAgent designed to assist people with speech impairments in everyday communication. The system integrates large language model (LLM)-driven reasoning with advanced speech processing modules, providing adaptive support tailored to diverse impairment types. To ensure real-world practicality, we develop a structured deployment pipeline that enables real-time speech processing on mobile and edge devices, achieving imperceptible latency while maintaining high accuracy and speech quality. Evaluation on real-world impaired speech datasets and edge-device latency profiling confirms that SpeechAgent delivers both effective and user-friendly performance, demonstrating its feasibility for personalized, day-to-day assistive communication.
Modeling to Generate Alternatives for Robustness of Mixed Integer DC Optimal Power Flow
Transmission system operators face a variety of discrete operational decisions, such as switching of branches and/or devices. Incorporating these decisions into optimal power flow (OPF) results in mixed-integer non-linear programming problems (MINLPs), which can't presently be solved at scale in the required time. Various linearizations of the OPF exist, most famously the DC-OPF, which can be leveraged to find integer decisions. However, these linearizations can yield very poor integer solutions in some edge cases, making them challenging to incorporate into control rooms. This paper introduces the use of modeling to generate alternatives (MGA) to find alternative solutions to the linearized problems, reducing the chance of finding no AC-feasible solutions. We test this approach using 13 networks where the DC linearization results in infeasible integer decisions, and MGA finds a solution in all cases. The MGA search criteria selected drastically affect the number and quality of solutions found, so network-specific search functions may be necessary.
Lyapunov-Based Physics-Informed Deep Neural Networks with Skew Symmetry Considerations
Deep neural networks (DNNs) are powerful black-box function approximators which have been shown to yield improved performance compared to traditional neural network (NN) architectures. However, black-box algorithms do not incorporate known physics of the system and can yield results which are physically implausible. Physics-informed neural networks (PINNs) have grown in popularity due to their ability to leverage known physical principles in the learning process which has been empirically shown to improve performance compared to traditional black-box methods. This paper introduces the first physics-informed DNN controller for an Euler-Lagrange dynamic system where the adaptation laws are designed using a Lyapunov-based stability analysis to account for the skew-symmetry property of the inertia matrix and centripetal-Coriolis matrix. A Lyapunov-based stability analysis is provided to guarantee asymptotic convergence of the tracking error and the skew-symmetric prediction error. Simulations indicate that the developed update law demonstrates improvement in individual and overall function approximation capabilities when compared to a physics-informed adaptation law which does not incorporate knowledge of system symmetries.
House Thermal Model Estimation: Robustness Across Seasons and Setpoints
Harnessing the flexibility of house heating, ventilation, and air conditioning (HVAC) systems has the potential to enable large-scale demand response by aggregating HVAC load adjustments across many homes. This demand response strategy helps the distribution grid flexibly ramp local load demand up or down so that it can optimally match the bulk power system generation profile. However, achieving this capability requires house thermal models that are both computationally efficient and robust to operating conditions. In this work, parameters of the Resistance-Capacitance (RC) network thermal model for houses are estimated using three optimization algorithms: Nonlinear Least Squares (NLS), Batch Estimation (BE), and Maximum Likelihood Estimation (MLE). The resulting models are evaluated through forward simulation across four different seasons and three setpoints. The results illustrate a principled way of selecting reduced-order models and estimation methods with respect to their robustness to seasonal and setpoint variations in the training and testing datasets.
comment: This manuscript is a version of our paper accepted at the 57th North American Power Symposium (NAPS) 2025
A Connectively Stable and Robust DAPI Control Scheme for Islanded Networks of Microgrids
The transition towards clean energy and the introduction of Distributed Energy Resources (DERs) are giving rise to the emergence of Microgrids (MGs) and Networks of MGs (NMGs). MGs and NMGs can operate autonomously in islanded mode. However, they face challenges in terms of secondary level frequency and voltage regulation, due to the variable nature of Renewable Energy Sources (RES) and loads. Distributed-Averaging Proportional-Integral (DAPI) control has been proposed in the literature for distributed frequency and voltage control of droop-controlled DERs, but it is not robust to operational or structural perturbations. To address this, we propose a robust DAPI frequency and voltage control scheme that ensures robustness using the concept of connective stability, along with the invariant ellipsoid technique for disturbance rejection. Simulation of an NMG model in MATLAB®/Simulink® consisting of 3 MGs and 5 DERs validates the effectiveness of the proposed method, and demonstrates that it can successfully mitigate the effects of major disturbances such as cyberattacks.
Safety Monitor for Off-Road Planning with Uncertainty Bounded Bekker Costs
Reliable off-road autonomy requires operational constraints so that behavior stays predictable and safe when soil strength is uncertain. This paper presents a runtime assurance safety monitor that collaborates with any planner and uses a Bekker-based cost model with bounded uncertainty. The monitor builds an upper confidence traversal cost from a lightweight pressure sinkage model identified in field tests and checks each planned motion against two limits: maximum sinkage and rollover margin. If the risk of crossing either limit is too high, the monitor switches to a certified fallback that reduces vehicle speed, increases standoff from soft ground, or stops on firmer soil. This separation lets the planner focus on efficiency while the monitor keeps the vehicle within clear safety limits on board. Wheel geometry, wheel load estimate, and a soil raster serve as inputs, which tie safety directly to vehicle design and let the monitor set clear limits on speed, curvature, and stopping at run time. The method carries uncertainty analytically into the upper confidence cost and applies simple intervention rules. Tuning of the sinkage limit, rollover margin, and risk window trades efficiency for caution while keeping the monitor light enough for embedded processors. Results from a simulation environment spanning loam to sand include intervention rates, violation probability, and path efficiency relative to the nominal plan, and a benchtop static loading check provides initial empirical validation.
Aircraft Collision Avoidance Systems: Technological Challenges and Solutions on the Path to Regulatory Acceptance
Aircraft collision avoidance systems are critical to modern aviation. These systems are designed to predict potential collisions between aircraft and recommend appropriate avoidance actions. Creating effective collision avoidance systems requires solutions to a variety of technical challenges related to surveillance, decision making, and validation. These challenges have sparked significant research and development efforts over the past several decades that have resulted in a variety of proposed solutions. This article provides an overview of these challenges and solutions with an emphasis on those that have been put through a rigorous validation process and accepted by regulatory bodies. The challenges posed by the collision avoidance problem are often present in other domains, and aircraft collision avoidance systems can serve as case studies that provide valuable insights for a wide range of safety-critical systems.
comment: 32 pages, 9 figures
Prognostic Framework for Robotic Manipulators Operating Under Dynamic Task Severities
Robotic manipulators are critical in many applications but are known to degrade over time. This degradation is influenced by the nature of the tasks performed by the robot. Tasks with higher severity, such as handling heavy payloads, can accelerate the degradation process. One way this degradation is reflected is in the position accuracy of the robot's end-effector. In this paper, we present a prognostic modeling framework that predicts a robotic manipulator's Remaining Useful Life (RUL) while accounting for the effects of task severity. Our framework represents the robot's position accuracy as a Brownian motion process with a random drift parameter that is influenced by task severity. The dynamic nature of task severity is modeled using a continuous-time Markov chain (CTMC). To evaluate RUL, we discuss two approaches -- (1) a novel closed-form expression for Remaining Lifetime Distribution (RLD), and (2) Monte Carlo simulations, commonly used in prognostics literature. Theoretical results establish the equivalence between these RUL computation approaches. We validate our framework through experiments using two distinct physics-based simulators for planar and spatial robot fleets. Our findings show that robots in both fleets experience shorter RUL when handling a higher proportion of high-severity tasks.
comment: Accepted for Publication in IEEE Transactions on Systems, Man, and Cybernetics: Systems
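As a rough illustration of the degradation model in the abstract above, the sketch below covers only the textbook special case of a constant, known drift, for which the first time a Brownian motion with drift crosses a failure threshold follows an inverse Gaussian law. It is not the paper's closed-form Remaining Lifetime Distribution, which accounts for a random drift modulated by a CTMC. The threshold, drift, and diffusion values are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def failure_cdf(t, threshold, drift, sigma):
    """P(X(s) = drift*s + sigma*W(s) reaches `threshold` for some s <= t)."""
    if t <= 0.0:
        return 0.0
    s = sigma * math.sqrt(t)
    term1 = normal_cdf((drift * t - threshold) / s)
    term2 = math.exp(2.0 * drift * threshold / sigma**2) * normal_cdf(
        (-drift * t - threshold) / s
    )
    return term1 + term2

def mean_rul(threshold, drift, sigma, horizon, dt=1e-3):
    """Mean lifetime as the trapezoidal integral of the survival function 1 - F(t)."""
    n = int(round(horizon / dt))
    total, prev = 0.0, 1.0  # survival probability at t = 0 is 1
    for k in range(1, n + 1):
        cur = 1.0 - failure_cdf(k * dt, threshold, drift, sigma)
        total += 0.5 * (prev + cur) * dt
        prev = cur
    return total

# Hypothetical parameters: accuracy loss threshold 1.0, drift 0.5 per unit time.
threshold, drift, sigma = 1.0, 0.5, 0.2
numeric_mean = mean_rul(threshold, drift, sigma, horizon=20.0)
closed_form_mean = threshold / drift  # mean of the inverse Gaussian first-passage time
```

In this simplified setting a larger drift, which one might associate with a higher share of severe tasks, shortens the mean lifetime in proportion to `threshold / drift`.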
Deep Learning for Continuous-time Stochastic Control with Jumps
In this paper, we introduce a model-based deep-learning approach to solve finite-horizon continuous-time stochastic control problems with jumps. We iteratively train two neural networks: one to represent the optimal policy and the other to approximate the value function. Leveraging a continuous-time version of the dynamic programming principle, we derive two different training objectives based on the Hamilton-Jacobi-Bellman equation, ensuring that the networks capture the underlying stochastic dynamics. Empirical evaluations on different problems illustrate the accuracy and scalability of our approach, demonstrating its effectiveness in solving complex, high-dimensional stochastic control tasks.
Revisiting Functional Derivatives in Multi-object Tracking
Probability generating functionals (PGFLs) are efficient and powerful tools for tracking independent objects in clutter. It was shown that PGFLs could be used for the elegant derivation of practical multi-object tracking algorithms, e.g., the probability hypothesis density (PHD) filter. However, derivations using PGFLs use the so-called functional derivatives whose definitions usually appear too complicated or heuristic, involving Dirac delta "functions". This paper begins by comparing different definitions of functional derivatives and exploring their relationships and implications for practical applications. It then proposes a rigorous definition of the functional derivative, utilizing straightforward yet precise mathematics for clarity. Key properties of the functional derivative are revealed and discussed.
comment: submitted to SIAM Journal on Control and Optimization
Convergence in On-line Learning of Static and Dynamic Systems
The paper derives analytical expressions for the asymptotic average updating direction of the adaptive moment estimation (ADAM) algorithm when applied to recursive identification of nonlinear systems. It is proved that the standard hyper-parameter setting results in the same asymptotic average updating direction as a diagonally power normalized stochastic gradient algorithm. With the internal filtering turned off, the asymptotic average updating direction is instead equivalent to that of a sign-sign stochastic gradient algorithm. Global convergence to an invariant set follows, where a subset of parameters contains those that give a correct input-output description of the system. The paper also exploits a nonlinear dynamic model to embed structure in recurrent neural networks. A Monte-Carlo simulation study validates the results.
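The sign-sign statement above can be checked on a single update. The sketch below assumes the usual ADAM recursion with bias correction, a linear regression model with prediction error `err` and regressor `psi`, and "internal filtering turned off" interpreted as both decay rates set to zero. All numerical values are hypothetical, and the paper's asymptotic analysis is not reproduced.

```python
import numpy as np

def adam_step(grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update; returns (parameter increment, new m, new v)."""
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad**2
    m_hat = m / (1.0 - beta1**t)  # bias-corrected first moment
    v_hat = v / (1.0 - beta2**t)  # bias-corrected second moment
    return -lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Gradient of 0.5 * err**2 for err = y - psi @ theta is -err * psi.
err = 0.3
psi = np.array([-2.0, 0.5, 4.0])
grad = -err * psi

# With beta1 = beta2 = 0 the step becomes -lr * grad / (|grad| + eps).
step, _, _ = adam_step(grad, np.zeros(3), np.zeros(3), t=1,
                       lr=0.01, beta1=0.0, beta2=0.0)
sign_sign = 0.01 * np.sign(err) * np.sign(psi)  # sign-sign stochastic gradient step
```

Up to the small constant `eps`, the two increments coincide element by element, so the step size no longer depends on the magnitude of the error or the regressor.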
Constrained Trajectory Optimization for Hybrid Dynamical Systems
Hybrid dynamical systems pose significant challenges for effective planning and control, especially when additional constraints such as obstacle avoidance, state boundaries, and actuation limits are present. In this letter, we extend the recently proposed Hybrid iLQR method [1] to handle state and input constraints within an indirect optimization framework, aiming to preserve computational efficiency and ensure dynamic feasibility. Specifically, we incorporate two constraint handling mechanisms into the Hybrid iLQR: Discrete Barrier State and Augmented Lagrangian methods. Comprehensive simulations across various operational situations are conducted to evaluate and compare the performance of these extended methods in terms of convergence and their ability to handle infeasible starting trajectories. Results indicate that while the Discrete Barrier State approach is more computationally efficient, the Augmented Lagrangian method outperforms it in complex and real-world scenarios with infeasible initial trajectories.
comment: 6 pages, 4 figures
Online Regularized Statistical Learning in Reproducing Kernel Hilbert Space With Non-Stationary Data
We study recursive regularized learning algorithms in the reproducing kernel Hilbert space (RKHS) with non-stationary online data streams. We introduce the concept of random Tikhonov regularization path and decompose the tracking error of the algorithm's output for the regularization path into random difference equations in RKHS. We show that the tracking error vanishes in mean square if the regularization path is slowly time-varying. Then, leveraging the monotonicity of inverse operators and the spectral decomposition of compact operators, and introducing the RKHS persistence of excitation condition, we develop a dominated convergence method to prove the mean square consistency between the regularization path and the unknown function to be learned. Especially, for independent and non-identically distributed data streams, the mean square consistency between the algorithm's output and the unknown function is achieved if the input data's marginal probability measures are slowly time-varying and the average measure over each fixed-length time period has a uniformly strictly positive lower bound.
Bayesian Optimization of Process Parameters of a Sensor-Based Sorting System using Gaussian Processes as Surrogate Models
Sensor-based sorting systems enable the physical separation of a material stream into two fractions. The sorting decision is based on the image data evaluation of the sensors used and is carried out using actuators. Various process parameters must be set depending on the properties of the material stream, the dimensioning of the system, and the required sorting accuracy. However, continuous verification and re-adjustment are necessary due to changing requirements and material stream compositions. In this paper, we introduce an approach for optimizing, recurrently monitoring, and adjusting the process parameters of a sensor-based sorting system. Based on Bayesian Optimization, Gaussian process regression models are used as surrogate models to achieve specific requirements for system behavior with the uncertainties contained therein. This method minimizes the number of necessary experiments while simultaneously considering two possible optimization targets based on the requirements for both material output streams. In addition, uncertainties in the determined sorting accuracies are taken into account in the model calculation. We evaluated the method with three example process parameters.
comment: Accepted at the IEEE 30th International Conference on Emerging Technologies and Factory Automation (ETFA)
Scalable Distributed Least Squares Algorithm for Linear Algebraic Equations via Periodic Scheduling
In this work, we propose a novel discrete-time distributed algorithm for finding least-squares solutions of linear algebraic equations with a scheduling protocol to further enhance its scalability. Each agent in the network is assumed to know some rows of the coefficient matrix and the corresponding entries in the observation vector. Unlike typical distributed algorithms, our approach considers communication bandwidth limits, allowing agents to transmit only a portion of their "guessed" solution, independent of its dimension. A cyclic scheduling protocol determines which portion is transmitted at each iteration. Assuming a small fixed step size and a diagonalizable algorithm matrix, we prove that agents' "guessed" solutions converge exponentially to a least squares solution. For cases where the observation vectors are time-varying, a modified algorithm guarantees practical convergence, with tracking error bounded by the single-step variation in the observation vector. Simulations and comparisons with state-of-the-art algorithms validate our algorithm's feasibility and scalability.
comment: Submitted to IEEE TAC
Towards Machine Learning-based Model Predictive Control for HVAC Control in Multi-Context Buildings at Scale via Ensemble Learning
The building thermodynamics model, which predicts real-time indoor temperature changes under potential HVAC (Heating, Ventilation, and Air Conditioning) control operations, is crucial for optimizing HVAC control in buildings. While pioneering studies have attempted to develop such models for various building environments, these models often require extensive data collection periods and rely heavily on expert knowledge, making the modeling process inefficient and limiting the reusability of the models. This paper explores a model ensemble perspective that utilizes existing developed models as base models to serve a target building environment, thereby providing accurate predictions while reducing the associated efforts. Given that building data streams are non-stationary and the number of base models may increase, we propose a Hierarchical Reinforcement Learning (HRL) approach to dynamically select and weight the base models. Our approach employs a two-tiered decision-making process: the high-level focuses on model selection, while the low-level determines the weights of the selected models. We thoroughly evaluate the proposed approach through offline experiments and an on-site case study, and the experimental results demonstrate the effectiveness of our method.
Online Learning for Dynamic Vickrey-Clarke-Groves Mechanism in Unknown Environments
We consider the problem of online dynamic mechanism design for sequential auctions in unknown environments, where the underlying market and, thus, the bidders' values vary over time as interactions between the seller and the bidders progress. We model the sequential auctions as an infinite-horizon average-reward Markov decision process (MDP). In each round, the seller determines an allocation and sets a payment for each bidder, while each bidder receives a private reward and submits a sealed bid to the seller. The state, which represents the underlying market, evolves according to an unknown transition kernel and the seller's allocation policy without episodic resets. We first extend the Vickrey-Clarke-Groves (VCG) mechanism to sequential auctions, thereby obtaining a dynamic counterpart that preserves the desired properties: efficiency, truthfulness, and individual rationality. We then focus on the online setting and develop a reinforcement learning algorithm for the seller to learn the underlying MDP and implement a mechanism that closely resembles the dynamic VCG mechanism. We show that the learned mechanism approximately satisfies efficiency, truthfulness, and individual rationality and achieves guaranteed performance in terms of various notions of regret.
comment: 20 pages
MESS+: Dynamically Learned Inference-Time LLM Routing in Model Zoos with Service Level Guarantees NeurIPS 2025
Open-weight large language model (LLM) zoos provide access to numerous high-quality models, but selecting the appropriate model for specific tasks remains challenging and requires technical expertise. Most users simply want factually correct, safe, and satisfying responses without concerning themselves with model technicalities, while inference service providers prioritize minimizing operating costs. These competing interests are typically mediated through service level agreements (SLAs) that guarantee minimum service quality. We introduce MESS+, a stochastic optimization algorithm for cost-optimal LLM request routing while providing rigorous SLA compliance guarantees. MESS+ learns request satisfaction probabilities of LLMs in real-time as users interact with the system, based on which model selection decisions are made by solving a per-request optimization problem. Our algorithm includes a novel combination of virtual queues and request satisfaction prediction, along with a theoretical analysis of cost optimality and constraint satisfaction. Across a wide range of state-of-the-art LLM benchmarks, MESS+ achieves an average of $2\times$ cost savings compared to existing LLM routing techniques.
comment: NeurIPS 2025. Code: https://github.com/laminair/mess-plus
Robotics
Semantic World Models
Planning with world models offers a powerful paradigm for robotic control. Conventional approaches train a model to predict future frames conditioned on current frames and actions, which can then be used for planning. However, the objective of predicting future pixels is often at odds with the actual planning objective; strong pixel reconstruction does not always correlate with good planning decisions. This paper posits that instead of reconstructing future frames as pixels, world models only need to predict task-relevant semantic information about the future. For such prediction the paper poses world modeling as a visual question answering problem about semantic information in future frames. This perspective allows world modeling to be approached with the same tools underlying vision language models. Thus vision language models can be trained as "semantic" world models through a supervised finetuning process on image-action-text data, enabling planning for decision-making while inheriting many of the generalization and robustness properties from the pretrained vision-language models. The paper demonstrates how such a semantic world model can be used for policy improvement on open-ended robotics tasks, leading to significant generalization improvements over typical paradigms of reconstruction-based action-conditional world modeling. Website available at https://weirdlabuw.github.io/swm.
SEA: Semantic Map Prediction for Active Exploration of Uncertain Areas
In this paper, we propose SEA, a novel approach for active robot exploration through semantic map prediction and a reinforcement learning-based hierarchical exploration policy. Unlike existing learning-based methods that rely on one-step waypoint prediction, our approach enhances the agent's long-term environmental understanding to facilitate more efficient exploration. We propose an iterative prediction-exploration framework that explicitly predicts the missing areas of the map based on current observations. The difference between the actual accumulated map and the predicted global map is then used to guide exploration. Additionally, we design a novel reward mechanism that leverages reinforcement learning to update the long-term exploration strategies, enabling us to construct an accurate semantic map within limited steps. Experimental results demonstrate that our method significantly outperforms state-of-the-art exploration strategies, achieving superior coverage areas of the global map within the same time constraints.
Learning Affordances at Inference-Time for Vision-Language-Action Models
Solving complex real-world control tasks often takes multiple tries: if we fail at first, we reflect on what went wrong, and change our strategy accordingly to avoid making the same mistake. In robotics, Vision-Language-Action models (VLAs) offer a promising path towards solving complex control tasks, but lack the ability to contextually and dynamically readjust behavior when they fail to accomplish a task. In this work, we introduce Learning from Inference-Time Execution (LITEN), which connects a VLA low-level policy to a high-level VLM that conditions on past experiences by including them in-context, allowing it to learn the affordances and capabilities of the low-level VLA. Our approach iterates between a reasoning phase that generates and executes plans for the low-level VLA, and an assessment phase that reflects on the resulting execution and draws useful conclusions to be included in future reasoning contexts. Unlike similar approaches to self-refinement in non-robotics domains, LITEN must reflect on unstructured real-world robot trajectories (e.g., raw videos), which requires structured guiderails during assessment. Our experimental results demonstrate LITEN is able to effectively learn from past experience to generate plans that use high-affordance instructions to accomplish long-horizon tasks.
comment: 7 pages and appendix
Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning NeurIPS 2025
To enable embodied agents to operate effectively over extended timeframes, it is crucial to develop models that form and access memories to stay contextualized in their environment. In the current paradigm of training transformer-based policies for embodied sequential decision-making tasks, visual inputs often overwhelm the context limits of transformers, while humans can maintain and utilize a lifetime of experience compressed as memories. Significant compression is possible in principle, as much of the input is irrelevant and can be abstracted. However, existing approaches predominantly focus on either recurrent models with fixed-size memory or transformers with full-context reliance. In this work, we propose Memo, a transformer-based architecture and training recipe for reinforcement learning (RL) on memory-intensive, long-horizon tasks. Memo incorporates the creation and retrieval of memory by interleaving periodic summarization tokens with the inputs of a model during training. We demonstrate Memo's effectiveness on a gridworld meta-RL benchmark and a multi-object navigation task in photo-realistic indoor settings. Memo outperforms naive long-context transformer baselines while being more compute and storage efficient. Additionally, Memo generalizes better to longer contexts at inference time and remains robust in streaming settings, where historical context must be truncated to fit inference constraints.
comment: Accepted for Spotlight Presentation at NeurIPS 2025
Fast Marker Detection for UV-Based Visual Relative Localisation in Agile UAV Swarms
A novel approach for the fast onboard detection of isolated markers for visual relative localisation of multiple teammates in agile UAV swarms is introduced in this paper. As the detection forms a key component of real-time localisation systems, a three-fold innovation is presented, consisting of an optimised procedure for CPUs, a GPU shader program, and a functionally equivalent FPGA streaming architecture. For the proposed CPU and GPU solutions, the mean processing time per pixel of input camera frames was accelerated by two to three orders of magnitude compared to the state of the art. For the localisation task, the proposed FPGA architecture offered the most significant overall acceleration by minimising the total delay from camera exposure to detection results. Additionally, the proposed solutions were evaluated on various 32-bit and 64-bit embedded platforms to demonstrate their efficiency, as well as their feasibility for applications using low-end UAVs and MAVs. Thus, it has become a crucial enabling technology for agile UAV swarming.
LaViRA: Language-Vision-Robot Actions Translation for Zero-Shot Vision Language Navigation in Continuous Environments
Zero-shot Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires an agent to navigate unseen environments based on natural language instructions without any prior training. Current methods face a critical trade-off: either rely on environment-specific waypoint predictors that limit scene generalization, or underutilize the reasoning capabilities of large models during navigation. We introduce LaViRA, a simple yet effective zero-shot framework that addresses this dilemma by decomposing action into a coarse-to-fine hierarchy: Language Action for high-level planning, Vision Action for perceptual grounding, and Robot Action for robust navigation. This modular decomposition allows us to leverage the distinct strengths of different scales of Multimodal Large Language Models (MLLMs) at each stage, creating a system that is powerful in its reasoning, grounding and practical control. LaViRA significantly outperforms existing state-of-the-art methods on the VLN-CE benchmark, demonstrating superior generalization capabilities in unseen environments, while maintaining transparency and efficiency for real-world deployment.
From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction
Despite remarkable progress in driving world models, their potential for autonomous systems remains largely untapped: the world models are mostly learned for world simulation and decoupled from trajectory planning. While recent efforts aim to unify world modeling and planning in a single framework, the synergistic facilitation mechanism of world modeling for planning still requires further exploration. In this work, we introduce a new driving paradigm named Policy World Model (PWM), which not only integrates world modeling and trajectory planning within a unified architecture, but is also able to benefit planning using the learned world knowledge through the proposed action-free future state forecasting scheme. Through collaborative state-action prediction, PWM can mimic the human-like anticipatory perception, yielding more reliable planning performance. To facilitate the efficiency of video forecasting, we further introduce a dynamically enhanced parallel token generation mechanism, equipped with a context-guided tokenizer and an adaptive dynamic focal loss. Despite utilizing only front camera input, our method matches or exceeds state-of-the-art approaches that rely on multi-view and multi-modal inputs. Code and model weights will be released at https://github.com/6550Zhao/Policy-World-Model.
comment: Accepted by NeurIPS 2025 (Poster)
Optimizing Prosthetic Wrist Movement: A Model Predictive Control Approach
The integration of advanced control strategies into prosthetic hands is essential to improve their adaptability and performance. In this study, we present an implementation of a Model Predictive Control (MPC) strategy to regulate the motions of a soft continuum wrist section attached to a tendon-driven prosthetic hand with less computational effort. MPC plays a crucial role in enhancing the functionality and responsiveness of prosthetic hands. By leveraging predictive modeling, this approach enables precise movement adjustments while accounting for dynamic user interactions. This advanced control strategy allows for the anticipation of future movements and adjustments based on the current state of the prosthetic device and the intentions of the user. Kinematic and dynamic modeling are performed using Euler-Bernoulli beam and Lagrange methods, respectively. Through simulation and experimental validations, we demonstrate the effectiveness of MPC in optimizing wrist articulation and user control. Our findings suggest that this technique significantly improves prosthetic hand dexterity, making movements more natural and intuitive. This research contributes to the field of robotics and biomedical engineering by offering a promising direction for intelligent prosthetic systems.
comment: International Conference on Social Robotics + AI 2025
Using Non-Expert Data to Robustify Imitation Learning via Offline Reinforcement Learning
Imitation learning has proven effective for training robots to perform complex tasks from expert human demonstrations. However, it remains limited by its reliance on high-quality, task-specific data, restricting adaptability to the diverse range of real-world object configurations and scenarios. In contrast, non-expert data -- such as play data, suboptimal demonstrations, partial task completions, or rollouts from suboptimal policies -- can offer broader coverage and lower collection costs. However, conventional imitation learning approaches fail to utilize this data effectively. To address these challenges, we posit that with the right design decisions, offline reinforcement learning can be used as a tool to harness non-expert data to enhance the performance of imitation learning policies. We show that while standard offline RL approaches can be ineffective at actually leveraging non-expert data under the sparse data coverage settings typically encountered in the real world, simple algorithmic modifications can allow for the utilization of this data, without significant additional assumptions. Our approach shows that broadening the support of the policy distribution can allow imitation algorithms augmented by offline RL to solve tasks robustly, showing considerably enhanced recovery and generalization behavior. In manipulation tasks, these innovations significantly increase the range of initial conditions where learned policies are successful when non-expert data is incorporated. Moreover, we show that these methods are able to leverage all collected data, including partial or suboptimal demonstrations, to bolster task-directed policy performance. This underscores the importance of algorithmic techniques for using non-expert data for robust policy learning in robotics.
GigaBrain-0: A World Model-Powered Vision-Language-Action Model
Training Vision-Language-Action (VLA) models for generalist robots typically requires large-scale real-world robot data, which is expensive and time-consuming to collect. The inefficiency of physical data collection severely limits the scalability and generalization capacity of current VLA systems. To address this challenge, we introduce GigaBrain-0, a novel VLA foundation model empowered by world model-generated data (e.g., video generation, real2real transfer, human transfer, view transfer, sim2real transfer data). By leveraging world models to generate diverse data at scale, GigaBrain-0 significantly reduces reliance on real robot data while improving cross-task generalization. Our approach further improves policy robustness through RGBD input modeling and embodied Chain-of-Thought (CoT) supervision, enabling the model to reason about spatial geometry, object states, and long-horizon dependencies during task execution. This leads to substantial gains in real-world performance on dexterous, long-horizon, and mobile manipulation tasks. Extensive experiments demonstrate that GigaBrain-0 achieves superior generalization across variations in appearances (e.g., textures, colors), object placements, and camera viewpoints. Additionally, we present GigaBrain-0-Small, an optimized lightweight variant designed to run efficiently on devices such as the NVIDIA Jetson AGX Orin.
comment: https://gigabrain0.github.io/
Risk Assessment of an Autonomous Underwater Snake Robot in Confined Operations
The growing interest in ocean discovery imposes a need for inspection and intervention in confined and demanding environments. Eely's slender shape, in addition to its ability to change its body configurations, makes articulated underwater robots an adequate option for such environments. However, operation of Eely in such environments imposes demanding requirements on the system, as it must deal with uncertain and unstructured environments, extreme environmental conditions, and reduced navigational capabilities. This paper proposes a Bayesian approach to assess the risks of losing Eely during two mission scenarios. The goal of this work is to improve Eely's performance and the likelihood of mission success. Sensitivity analysis results are presented in order to demonstrate the causes having the highest impact on losing Eely.
comment: 9 pages, 6 figures, Accepted for publication in OCEANS 2023 - Limerick
A Radius of Robust Feasibility Approach to Directional Sensors in Uncertain Terrain
A sensor has the ability to probe its surroundings. However, uncertainties in its exact location can significantly compromise its sensing performance. The radius of robust feasibility defines the maximum range within which robust feasibility is ensured. This work introduces a novel approach integrating it with the directional sensor networks to enhance coverage using a distributed greedy algorithm. In particular, we provide an exact formula for the radius of robust feasibility of sensors in a directional sensor network. The proposed model strategically orients the sensors in regions with high coverage potential, accounting for robustness in the face of uncertainty. We analyze the algorithm's adaptability in dynamic environments, demonstrating its ability to enhance efficiency and robustness. Experimental results validate its efficacy in maximizing coverage and optimizing sensor orientations, highlighting its practical advantages for real-world scenarios.
Using Temperature Sampling to Effectively Train Robot Learning Policies on Imbalanced Datasets
Increasingly large datasets of robot actions and sensory observations are being collected to train ever-larger neural networks. These datasets are collected based on tasks and while these tasks may be distinct in their descriptions, many involve very similar physical action sequences (e.g., 'pick up an apple' versus 'pick up an orange'). As a result, many datasets of robotic tasks are substantially imbalanced in terms of the physical robotic actions they represent. In this work, we propose a simple sampling strategy for policy training that mitigates this imbalance. Our method requires only a few lines of code to integrate into existing codebases and improves generalization. We evaluate our method in both pre-training small models and fine-tuning large foundational models. Our results show substantial improvements on low-resource tasks compared to prior state-of-the-art methods, without degrading performance on high-resource tasks. This enables more effective use of model capacity for multi-task policies. We also further validate our approach in a real-world setup on a Franka Panda robot arm across a diverse set of tasks.
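The abstract does not state the sampling formula, so the following is only a minimal sketch of temperature sampling in its commonly used form, where a task with n_i episodes is drawn with probability proportional to n_i^(1/T). The function name, the task names, and the episode counts are hypothetical and not taken from the paper.

```python
import random


def temperature_weights(counts, temperature):
    """Return probabilities p_i proportional to counts[i] ** (1 / temperature).

    temperature = 1 reproduces sampling proportional to dataset size;
    larger temperatures flatten the distribution toward uniform.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [c ** (1.0 / temperature) for c in counts]
    total = sum(scaled)
    return [s / total for s in scaled]


# Hypothetical per-task episode counts for an imbalanced dataset.
task_counts = {"pick_apple": 9000, "pick_orange": 900, "open_drawer": 100}

probs = temperature_weights(list(task_counts.values()), temperature=2.0)

# Draw the task for each element of a training batch.
rng = random.Random(0)
batch_tasks = rng.choices(list(task_counts), weights=probs, k=8)
```

Under this assumed form, raising the temperature increases how often low-resource tasks appear in a batch without removing high-resource tasks from it.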
ProTerrain: Probabilistic Physics-Informed Rough Terrain World Modeling ICRA
Uncertainty-aware robot motion prediction is crucial for downstream traversability estimation and safe autonomous navigation in unstructured, off-road environments, where terrain is heterogeneous and perceptual uncertainty is high. Most existing methods assume deterministic or spatially independent terrain uncertainties, ignoring the inherent local correlations of 3D spatial data and often producing unreliable predictions. In this work, we introduce an efficient probabilistic framework that explicitly models spatially correlated aleatoric uncertainty over terrain parameters as a probabilistic world model and propagates this uncertainty through a differentiable physics engine for probabilistic trajectory forecasting. By leveraging structured convolutional operators, our approach provides high-resolution multivariate predictions at manageable computational cost. Experimental evaluation on a publicly available dataset shows significantly improved uncertainty estimation and trajectory prediction accuracy over aleatoric uncertainty estimation baselines.
comment: This paper is submitted to IEEE International Conference on Robotics and Automation (ICRA) 2026
Imitation Learning Policy based on Multi-Step Consistent Integration Shortcut Model
The wide application of flow-matching methods has greatly promoted the development of robot imitation learning. However, these methods all face the problem of high inference time. To address this issue, researchers have proposed distillation methods and consistency methods, but the performance of these methods still struggles to compete with that of the original diffusion models and flow-matching models. In this article, we propose a one-step shortcut method with multi-step integration for robot imitation learning. To balance the inference speed and performance, we extend the multi-step consistency loss on the basis of the shortcut model, split the one-step loss into multi-step losses, and improve the performance of one-step inference. Secondly, to solve the problem of unstable optimization of the multi-step loss and the original flow-matching loss, we propose an adaptive gradient allocation method to enhance the stability of the learning process. Finally, we evaluate the proposed method in two simulation benchmarks and five real-world environment tasks. The experimental results verify the effectiveness of the proposed algorithm.
ConvXformer: Differentially Private Hybrid ConvNeXt-Transformer for Inertial Navigation
Data-driven inertial sequence learning has revolutionized navigation in GPS-denied environments, offering superior odometric resolution compared to traditional Bayesian methods. However, deep learning-based inertial tracking systems remain vulnerable to privacy breaches that can expose sensitive training data. Existing differential privacy solutions often compromise model performance by introducing excessive noise, particularly in high-frequency inertial measurements. In this article, we propose ConvXformer, a hybrid architecture that fuses ConvNeXt blocks with Transformer encoders in a hierarchical structure for robust inertial navigation. We propose an efficient differential privacy mechanism incorporating adaptive gradient clipping and gradient-aligned noise injection (GANI) to protect sensitive information while ensuring model performance. Our framework leverages truncated singular value decomposition for gradient processing, enabling precise control over the privacy-utility trade-off. Comprehensive performance evaluations on benchmark datasets (OxIOD, RIDI, RoNIN) demonstrate that ConvXformer surpasses state-of-the-art methods, achieving more than 40% improvement in positioning accuracy while ensuring $(\epsilon,\delta)$-differential privacy guarantees. To validate real-world performance, we introduce the Mech-IO dataset, collected from the mechanical engineering building at KAIST, where intense magnetic fields from industrial equipment induce significant sensor perturbations. This demonstrated robustness under severe environmental distortions makes our framework well-suited for secure and intelligent navigation in cyber-physical systems.
comment: 14 pages, 8 figures, 3 tables
TARMAC: A Taxonomy for Robot Manipulation in Chemistry
Chemistry laboratory automation aims to increase throughput, reproducibility, and safety, yet many existing systems still depend on frequent human intervention. Advances in robotics have reduced this dependency, but without a structured representation of the required skills, autonomy remains limited to bespoke, task-specific solutions with little capacity to transfer beyond their initial design. Current experiment abstractions typically describe protocol-level steps without specifying the robotic actions needed to execute them. This highlights the lack of a systematic account of the manipulation skills required for robots in chemistry laboratories. To address this gap, we introduce TARMAC - a Taxonomy for Robot Manipulation in Chemistry - a domain-specific framework that defines and organizes the core manipulations needed in laboratory practice. Based on annotated teaching-lab demonstrations and supported by experimental validation, TARMAC categorizes actions according to their functional role and physical execution requirements. Beyond serving as a descriptive vocabulary, TARMAC can be instantiated as robot-executable primitives and composed into higher-level macros, enabling skill reuse and supporting scalable integration into long-horizon workflows. These contributions provide a structured foundation for more flexible and autonomous laboratory automation. More information is available at https://tarmac-paper.github.io/
Hierarchical DLO Routing with Reinforcement Learning and In-Context Vision-language Models
Long-horizon routing tasks of deformable linear objects (DLOs), such as cables and ropes, are common in industrial assembly lines and everyday life. These tasks are particularly challenging because they require robots to manipulate DLOs with long-horizon planning and reliable skill execution. Successfully completing such tasks demands adapting to their nonlinear dynamics, decomposing abstract routing goals, and generating multi-step plans composed of multiple skills, all of which require accurate high-level reasoning during execution. In this paper, we propose a fully autonomous hierarchical framework for solving challenging DLO routing tasks. Given an implicit or explicit routing goal expressed in language, our framework leverages vision-language models (VLMs) for in-context high-level reasoning to synthesize feasible plans, which are then executed by low-level skills trained via reinforcement learning. To improve robustness in long horizons, we further introduce a failure recovery mechanism that reorients the DLO into insertion-feasible states. Our approach generalizes to diverse scenes involving object attributes, spatial descriptions, as well as implicit language commands. It outperforms the next best baseline method by nearly 50% and achieves an overall success rate of 92.5% across long-horizon routing scenarios.
comment: 8 pages, 6 figures, 3 tables
Background Fades, Foreground Leads: Curriculum-Guided Background Pruning for Efficient Foreground-Centric Collaborative Perception
Collaborative perception enhances the reliability and spatial coverage of autonomous vehicles by sharing complementary information across vehicles, offering a promising solution to long-tail scenarios that challenge single-vehicle perception. However, the bandwidth constraints of vehicular networks make transmitting the entire feature map impractical. Recent methods, therefore, adopt a foreground-centric paradigm, transmitting only predicted foreground-region features while discarding the background, which encodes essential context. We propose FadeLead, a foreground-centric framework that overcomes this limitation by learning to encapsulate background context into compact foreground features during training. At the core of our design is a curricular learning strategy that leverages background cues early on but progressively prunes them away, forcing the model to internalize context into foreground representations without transmitting background itself. Extensive experiments on both simulated and real-world benchmarks show that FadeLead outperforms prior methods under different bandwidth settings, underscoring the effectiveness of context-enriched foreground sharing.
GRASPLAT: Enabling dexterous grasping through novel view synthesis IROS 2025
Achieving dexterous robotic grasping with multi-fingered hands remains a significant challenge. While existing methods rely on complete 3D scans to predict grasp poses, these approaches face limitations due to the difficulty of acquiring high-quality 3D data in real-world scenarios. In this paper, we introduce GRASPLAT, a novel grasping framework that leverages consistent 3D information while being trained solely on RGB images. Our key insight is that by synthesizing physically plausible images of a hand grasping an object, we can regress the corresponding hand joints for a successful grasp. To achieve this, we utilize 3D Gaussian Splatting to generate high-fidelity novel views of real hand-object interactions, enabling end-to-end training with RGB data. Unlike prior methods, our approach incorporates a photometric loss that refines grasp predictions by minimizing discrepancies between rendered and real images. We conduct extensive experiments on both synthetic and real-world grasping datasets, demonstrating that GRASPLAT improves grasp success rates up to 36.9% over existing image-based methods. Project page: https://mbortolon97.github.io/grasplat/
comment: Accepted IROS 2025
Design of a Bed Rotation Mechanism to Facilitate In-Situ Photogrammetric Reconstruction of Printed Parts
Additive manufacturing, or 3D printing, is a complex process that creates free-form geometric objects by sequentially placing material to construct an object, usually in a layer-by-layer process. One of the most widely used methods is Fused Deposition Modeling (FDM). FDM is used in many of the consumer-grade polymer 3D printers available today. While consumer-grade machines are cheap and plentiful, they lack many of the features desired in a machine used for research purposes and are often closed-source platforms. Commercial-grade models are more expensive and are also usually closed-source platforms that do not offer flexibility for modifications often needed for research. The authors designed and fabricated a machine to be used as a test bed for research in the field of polymer FDM processes. The goal was to create a platform that tightly controls and/or monitors the FDM build parameters so that experiments can be repeated with a known accuracy. The platform offers closed-loop position feedback, control of the hot end and bed temperature, and monitoring of environment temperature and humidity. Additionally, the platform is equipped with cameras and a mechanism for in-situ photogrammetry, creating a geometric record of the printing throughout the printing process. Through photogrammetry, backtracking and linking process parameters to observable geometric defects can be achieved. This paper focuses on the design of a novel mechanism for spinning the heated bed to allow for photogrammetric reconstruction of the printed part using a minimal number of cameras, as implemented on this platform.
Calibration of Parallel Kinematic Machine Based on Stewart Platform-A Literature Review
Stewart platform-based Parallel Kinematic Machines (PKMs) have been extensively studied by researchers due to their inherent finer control characteristics. This has opened potential deployment opportunities in versatile critical applications like the medical field, engineering machines, space research, electronic chip manufacturing, automobile manufacturing, etc. All these precise, complicated, and repeatable motion applications require micro and nano-scale movement control in 3D space; a 6-DOF PKM can take this challenge smartly. For this, the PKM must be more accurate than the desired application accuracy level, and thus proper calibration for a PKM robot is essential. Forward kinematics-based calibration for such hexapod machines becomes unnecessarily complex, whereas inverse kinematics completes this task with much ease. External instrument-based, constraint-based, and auto or self-calibration-based approaches have been used for calibration. This survey reviews these key methodologies, their outcomes, and important points related to inverse kinematic-based PKM calibrations in general. It is observed in this study that the researchers focused on improving the accuracy of the platform position and orientation considering the errors contributed by a single source or multiple sources. The error sources considered are mainly structural; in some cases, environmental factors are also considered. However, these calibrations are done under no-load conditions. This study aims to understand the current state of the art in this field and to expand the scope for other researchers in further exploration in a specific area.
Simultaneous learning of state-to-state minimum-time planning and control
This paper tackles the challenge of learning a generalizable minimum-time flight policy for UAVs, capable of navigating between arbitrary start and goal states while balancing agile flight and stable hovering. Traditional approaches, particularly in autonomous drone racing, achieve impressive speeds and agility but are constrained to predefined track layouts, limiting real-world applicability. To address this, we propose a reinforcement learning-based framework that simultaneously learns state-to-state minimum-time planning and control and generalizes to arbitrary state-to-state flights. Our approach leverages Point Mass Model (PMM) trajectories as proxy rewards to approximate the true optimal flight objective and employs curriculum learning to scale the training process efficiently and to achieve generalization. We validate our method through simulation experiments, comparing it against Nonlinear Model Predictive Control (NMPC) tracking PMM-generated trajectories and conducting ablation studies to assess the impact of curriculum learning. Finally, real-world experiments confirm the robustness of our learned policy in outdoor environments, demonstrating its ability to generalize and operate on a small ARM-based single-board computer.
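The abstract does not spell out the Point Mass Model computation. As a rough illustration only, the sketch below covers the simplest special case: a single axis, starting and ending at rest, with a symmetric acceleration bound and an optional speed bound. The function name and parameters are hypothetical, and the paper's state-to-state formulation with nonzero boundary velocities is more general than this.

```python
import math


def pmm_min_time_rest_to_rest(distance, a_max, v_max=None):
    """Minimum time for a 1D point mass to travel `distance` from rest to rest.

    The acceleration magnitude is bounded by a_max. If v_max is given, the
    speed is bounded as well. The optimal profile is bang-bang: accelerate at
    a_max, optionally cruise at v_max, then decelerate at a_max.
    """
    if a_max <= 0:
        raise ValueError("a_max must be positive")
    d = abs(distance)
    # Without a speed limit, or if the limit is never reached, the profile is
    # triangular: accelerate over half the distance, decelerate over the rest.
    if v_max is None or d <= v_max ** 2 / a_max:
        return 2.0 * math.sqrt(d / a_max)
    # Otherwise the profile is trapezoidal: ramp up, cruise at v_max, ramp down.
    return d / v_max + v_max / a_max


# Hypothetical numbers: 4 m segment, 1 m/s^2 acceleration limit.
t_unlimited = pmm_min_time_rest_to_rest(4.0, a_max=1.0)
t_speed_limited = pmm_min_time_rest_to_rest(4.0, a_max=1.0, v_max=1.0)
```

A time of this kind could serve as a reference against which a learned policy's flight time is compared, which is one plausible reading of "proxy reward" in the abstract.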
Push Anything: Single- and Multi-Object Pushing From First Sight with Contact-Implicit MPC
Non-prehensile manipulation of diverse objects remains a core challenge in robotics, driven by unknown physical properties and the complexity of contact-rich interactions. Recent advances in contact-implicit model predictive control (CI-MPC), with contact reasoning embedded directly in the trajectory optimization, have shown promise in tackling the task efficiently and robustly, yet demonstrations have been limited to narrowly curated examples. In this work, we showcase the broader capabilities of CI-MPC through precise planar pushing tasks over a wide range of object geometries, including multi-object domains. These scenarios demand reasoning over numerous inter-object and object-environment contacts to strategically manipulate and de-clutter the environment, challenges that were intractable for prior CI-MPC methods. To achieve this, we introduce Consensus Complementarity Control Plus (C3+), an enhanced CI-MPC algorithm integrated into a complete pipeline spanning object scanning, mesh reconstruction, and hardware execution. Compared to its predecessor C3, C3+ achieves substantially faster solve times, enabling real-time performance even in multi-object pushing tasks. On hardware, our system achieves overall 98% success rate across 33 objects, reaching pose goals within tight tolerances. The average time-to-goal is approximately 0.5, 1.6, 3.2, and 5.3 minutes for 1-, 2-, 3-, and 4-object tasks, respectively. Project page: https://dairlab.github.io/push-anything.
comment: Hien Bui, Yufeiyang Gao, and Haoran Yang contributed equally to this work
Configuration-Dependent Robot Kinematics Model and Calibration
Accurate robot kinematics is essential for precise tool placement in articulated robots, but non-geometric factors can introduce configuration-dependent model discrepancies. This paper presents a configuration-dependent kinematic calibration framework for improving accuracy across the entire workspace. Local Product-of-Exponential (POE) models, selected for their parameterization continuity, are identified at multiple configurations and interpolated into a global model. Inspired by joint gravity load expressions, we employ Fourier basis function interpolation parameterized by the shoulder and elbow joint angles, achieving accuracy comparable to neural network and autoencoder methods but with substantially higher training efficiency. Validation on two 6-DoF industrial robots shows that the proposed approach reduces the maximum positioning error by over 50%, meeting the sub-millimeter accuracy required for cold spray manufacturing. Robots with larger configuration-dependent discrepancies benefit even more. A dual-robot collaborative task demonstrates the framework's practical applicability and repeatability.
MoTVLA: A Vision-Language-Action Model with Unified Fast-Slow Reasoning
Integrating visual-language instructions into visuomotor policies is gaining momentum in robot learning for enhancing open-world generalization. Despite promising advances, existing approaches face two challenges: limited language steerability when no generated reasoning is used as a condition, or significant inference latency when reasoning is incorporated. In this work, we introduce MoTVLA, a mixture-of-transformers (MoT)-based vision-language-action (VLA) model that integrates fast-slow unified reasoning with behavior policy learning. MoTVLA preserves the general intelligence of pre-trained VLMs (serving as the generalist) for tasks such as perception, scene understanding, and semantic planning, while incorporating a domain expert, a second transformer that shares knowledge with the pretrained VLM, to generate domain-specific fast reasoning (e.g., robot motion decomposition), thereby improving policy execution efficiency. By conditioning the action expert on decomposed motion instructions, MoTVLA can learn diverse behaviors and substantially improve language steerability. Extensive evaluations across natural language processing benchmarks, robotic simulation environments, and real-world experiments confirm the superiority of MoTVLA in both fast-slow reasoning and manipulation task performance.
OmniVIC: A Self-Improving Variable Impedance Controller with Vision-Language In-Context Learning for Safe Robotic Manipulation
We present OmniVIC, a universal variable impedance controller (VIC) enhanced by a vision language model (VLM), which improves safety and adaptation in contact-rich robotic manipulation tasks. Traditional VICs have shown advantages when the robot physically interacts with the environment, but they generalize poorly to unseen, complex, and unstructured interactions in universal task scenarios involving contact or uncertainty. To this end, the proposed OmniVIC interprets task context derived from images and natural language and generates adaptive impedance parameters for a VIC controller. Specifically, the core of OmniVIC is a self-improving combination of Retrieval-Augmented Generation (RAG) and in-context learning (ICL), where RAG retrieves relevant prior experiences from a structured memory bank to inform the controller about similar past tasks, and ICL leverages these retrieved examples and the prompt of the current task to query the VLM for context-aware and adaptive impedance parameters for the current manipulation scenario. Therefore, the self-improving RAG and ICL guarantee that OmniVIC works in universal task scenarios. The impedance parameter regulation is further informed by real-time force/torque feedback to ensure interaction forces remain within safe thresholds. We demonstrate that our method outperforms baselines on a suite of complex contact-rich tasks, both in simulation and on real-world robotic tasks, with improved success rates and reduced force violations. OmniVIC takes a step towards bridging high-level semantic reasoning and low-level compliant control, enabling safer and more generalizable manipulation. Overall, the average success rate increases from 27% (baseline) to 61.4% (OmniVIC).
comment: Code, video and RAG dataset are available at https://sites.google.com/view/omni-vic
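The retrieve-then-prompt loop described in the OmniVIC abstract can be sketched as follows. This is a minimal illustration, assuming that past experiences are stored with a fixed-length embedding and that similarity is measured by cosine similarity. The record fields (embedding, task, stiffness, damping) and the function names are hypothetical and not taken from the OmniVIC code. The VLM query itself is omitted.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(memory_bank, query_embedding, k=2):
    """Return the k stored experiences most similar to the current task embedding."""
    ranked = sorted(
        memory_bank,
        key=lambda entry: cosine_similarity(entry["embedding"], query_embedding),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(task_description, examples):
    """Assemble an in-context prompt from retrieved examples (hypothetical format)."""
    lines = ["Past experiences:"]
    for ex in examples:
        lines.append(
            f"- task: {ex['task']}; stiffness: {ex['stiffness']}; damping: {ex['damping']}"
        )
    lines.append(f"Current task: {task_description}")
    lines.append("Propose stiffness and damping for the current task.")
    return "\n".join(lines)
```

In a full system the returned prompt would be sent to the VLM together with the current image, and the resulting parameters would be checked against force/torque limits before being applied.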
Efficient Vision-Language-Action Models for Embodied Manipulation: A Systematic Survey
Vision-Language-Action (VLA) models extend vision-language models to embodied control by mapping natural-language instructions and visual observations to robot actions. Despite their capabilities, VLA systems face significant challenges due to their massive computational and memory demands, which conflict with the constraints of edge platforms such as on-board mobile manipulators that require real-time performance. Addressing this tension has become a central focus of recent research. In light of the growing efforts toward more efficient and scalable VLA systems, this survey provides a systematic review of approaches for improving VLA efficiency, with an emphasis on reducing latency, memory footprint, and training and inference costs. We categorize existing solutions into four dimensions: model architecture, perception feature, action generation, and training/inference strategies, summarizing representative techniques within each category. Finally, we discuss future trends and open challenges, highlighting directions for advancing efficient embodied intelligence.
What Foundation Models can Bring for Robot Learning in Manipulation: A Survey
The realization of universal robots is an ultimate goal of researchers. However, a key hurdle in achieving this goal lies in the robots' ability to manipulate objects in their unstructured surrounding environments according to different tasks. The learning-based approach is considered an effective way to address generalization. The impressive performance of foundation models in the fields of computer vision and natural language suggests the potential of embedding foundation models into manipulation tasks as a viable path toward achieving general manipulation capability. However, we believe achieving general manipulation capability requires an overarching framework akin to autonomous driving. This framework should encompass multiple functional modules, with different foundation models assuming distinct roles in facilitating general manipulation capability. This survey focuses on the contributions of foundation models to robot learning for manipulation. We propose a comprehensive framework and detail how foundation models can address challenges in each module of the framework. In addition, we examine current approaches, outline challenges, suggest future research directions, and identify potential risks associated with integrating foundation models into this domain.
AttentionSwarm: Reinforcement Learning with Attention Control Barrier Function for Crazyflie Drones in Dynamic Environments
We introduce AttentionSwarm, a novel benchmark designed to evaluate safe and efficient swarm control in a dynamic drone racing scenario. Central to our approach is the Attention Model-Based Control Barrier Function (CBF) framework, which integrates attention mechanisms with safety-critical control theory to enable real-time collision avoidance and trajectory optimization. This framework dynamically prioritizes critical obstacles and agents in the swarm's vicinity using attention weights, while CBFs formally guarantee safety by enforcing collision-free constraints. The AttentionSwarm algorithm was developed and evaluated using a swarm of Crazyflie 2.1 micro quadrotors, which were tested indoors with the Vicon motion capture system to ensure precise localization and control. Experimental results show that our system achieves a 95-100% collision-free navigation rate in a dynamic multi-agent drone racing environment, underscoring its effectiveness and robustness in real-world scenarios. This work offers a promising foundation for safe, high-speed multi-robot applications in logistics, inspection, and racing.
comment: 6 pages, 6 figures
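The way a control barrier function enforces a collision-free constraint on a nominal command can be shown with a small sketch. It assumes single-integrator dynamics (velocity commands), circular obstacles, and the barrier h(x) = ||x - c||^2 - r^2 with the condition grad_h . u + alpha * h >= 0. The distance-based softmax in attention_weights is only a stand-in for the learned attention model in the abstract, and all names are hypothetical. Filtering against a single obstacle is a simplification that does not by itself guarantee safety with respect to the others.

```python
import math


def attention_weights(position, obstacles, temperature=1.0):
    """Stand-in for learned attention: softmax over negative distances to obstacle centers."""
    scores = [-math.dist(position, center) / temperature for center, _ in obstacles]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cbf_filter(position, u_nominal, obstacle, alpha=1.0):
    """Minimally modify u_nominal so that grad_h . u + alpha * h >= 0 for one obstacle."""
    center, radius = obstacle
    diff = [p - c for p, c in zip(position, center)]
    h = sum(d * d for d in diff) - radius ** 2
    grad = [2.0 * d for d in diff]
    slack = sum(g * u for g, u in zip(grad, u_nominal)) + alpha * h
    if slack >= 0.0:
        return list(u_nominal)  # nominal command already satisfies the constraint
    grad_sq = sum(g * g for g in grad)
    if grad_sq == 0.0:
        raise ValueError("agent is at the obstacle center; barrier gradient is undefined")
    scale = -slack / grad_sq  # closed-form projection onto the constraint boundary
    return [u + scale * g for u, g in zip(u_nominal, grad)]


def safe_command(position, u_nominal, obstacles, alpha=1.0):
    """Filter the nominal command against the obstacle with the highest attention weight."""
    weights = attention_weights(position, obstacles)
    critical = obstacles[weights.index(max(weights))]
    return cbf_filter(position, u_nominal, critical, alpha)
```

With one affine constraint the minimum-norm correction has the closed form used in cbf_filter. Handling several obstacles at once would require solving a quadratic program over all constraints.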
Action Tokenizer Matters in In-Context Imitation Learning IROS 2025
In-context imitation learning (ICIL) is a new paradigm that enables robots to generalize from demonstrations to unseen tasks without retraining. A well-structured action representation is the key to capturing demonstration information effectively, yet action tokenization (the process of discretizing and encoding actions) remains largely unexplored in ICIL. In this work, we first systematically evaluate existing action tokenizer methods in ICIL and reveal a critical limitation: while they effectively encode action trajectories, they fail to preserve temporal smoothness, which is crucial for stable robotic execution. To address this, we propose LipVQ-VAE, a variational autoencoder that enforces the Lipschitz condition in the latent action space via weight normalization. By propagating smoothness constraints from raw action inputs to a quantized latent codebook, LipVQ-VAE generates more stable and smoother actions. When integrated into ICIL, LipVQ-VAE improves performance by more than 5.3% in high-fidelity simulators, with real-world experiments confirming its ability to produce smoother, more reliable trajectories. Code and checkpoints are available at https://action-tokenizer-matters.github.io/
comment: IROS 2025
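The idea of bounding an encoder's Lipschitz constant through weight normalization can be illustrated with a single linear layer. The abstract does not specify the normalization used in LipVQ-VAE, so this sketch swaps in spectral-norm rescaling as an assumed stand-in. The function names are hypothetical.

```python
import numpy as np


def lipschitz_normalize(weight, bound=1.0):
    """Rescale a weight matrix so that its spectral norm is at most `bound`."""
    spectral_norm = np.linalg.norm(weight, ord=2)  # largest singular value
    if spectral_norm <= bound:
        return weight
    return weight * (bound / spectral_norm)


def encode(actions, weight, bound=1.0):
    """Linear encoder whose Lipschitz constant (in the L2 norm) is at most `bound`."""
    return actions @ lipschitz_normalize(weight, bound).T
```

For such a layer, two consecutive actions that differ by a small amount map to latent vectors that differ by at most `bound` times that amount, which is the smoothness property the abstract refers to. Stacking several such layers with 1-Lipschitz activations keeps the overall bound at the product of the per-layer bounds.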
RoboGPT-R1: Enhancing Robot Planning with Reinforcement Learning
Improving the reasoning capabilities of embodied agents is crucial for robots to complete complex human instructions in long-view manipulation tasks successfully. Despite the success of large language models and vision language models based on Supervised Fine-Tuning (SFT) in planning tasks, they continue facing challenges in performing long-horizon manipulation tasks in complex real-world environments, owing to their restricted common sense and reasoning capabilities. Considering that aligning general-purpose vision language models to robotic planning tasks via supervised fine-tuning suffers from poor generalization and insufficient physical understanding, we propose RoboGPT-R1, a two-stage fine-tuning framework for embodied planning. In this framework, supervised training acquires foundational knowledge through expert sequences, followed by RL to address the model's shortcomings in visual-spatial understanding and reasoning. To achieve physical understanding and action sequence consistency in multi-step reasoning tasks, we design a rule-based reward function that simultaneously considers long-horizon performance and action constraint in the environment. The reasoning model, trained on Qwen2.5-VL-3B, significantly outperforms the larger-scale model, GPT-4o-mini, by 21.33% and surpasses other work trained on Qwen2.5-VL-7B by 20.33% on the EmbodiedBench benchmark.
RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Interactive Environmental Learning in Physical Embodied Systems
Embodied agents face persistent challenges in real-world environments, including partial observability, limited spatial reasoning, and high-latency multi-memory integration. We present RoboMemory, a brain-inspired framework that unifies Spatial, Temporal, Episodic, and Semantic memory under a parallelized architecture for efficient long-horizon planning and interactive environmental learning. A dynamic spatial knowledge graph (KG) ensures scalable and consistent memory updates, while a closed-loop planner with a critic module supports adaptive decision-making in dynamic settings. Experiments on EmbodiedBench show that RoboMemory, built on Qwen2.5-VL-72B-Ins, improves average success rates by 25% over its baseline and exceeds the closed-source state-of-the-art (SOTA) Gemini-1.5-Pro by 3%. Real-world trials further confirm its capacity for cumulative learning, with performance improving across repeated tasks. These results highlight RoboMemory as a scalable foundation for memory-augmented embodied intelligence, bridging the gap between cognitive neuroscience and robotic autonomy.
Coordinated Strategies in Realistic Air Combat by Hierarchical Multi-Agent Reinforcement Learning
Achieving mission objectives in a realistic simulation of aerial combat is highly challenging due to imperfect situational awareness and nonlinear flight dynamics. In this work, we introduce a novel 3D multi-agent air combat environment and a Hierarchical Multi-Agent Reinforcement Learning framework to tackle these challenges. Our approach combines heterogeneous agent dynamics, curriculum learning, league-play, and a newly adapted training algorithm. To this end, the decision-making process is organized into two abstraction levels: low-level policies learn precise control maneuvers, while high-level policies issue tactical commands based on mission objectives. Empirical results show that our hierarchical approach improves both learning efficiency and combat performance in complex dogfight scenarios.
comment: 2025 IEEE International Conference on Agentic AI (ICA)
Open-World Drone Active Tracking with Goal-Centered Rewards NeurIPS 2025
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations, providing a more practical solution for effective tracking in dynamic environments. However, accurate Drone Visual Active Tracking using reinforcement learning remains challenging due to the absence of a unified benchmark and the complexity of open-world environments with frequent interference. To address these issues, we pioneer a systematic solution. First, we propose DAT, the first open-world drone active air-to-ground tracking benchmark. It encompasses 24 city-scale scenes, featuring targets with human-like behaviors and high-fidelity dynamics simulation. DAT also provides a digital twin tool for unlimited scene generation. Additionally, we propose a novel reinforcement learning method called GC-VAT, which aims to improve the performance of drone tracking targets in complex scenarios. Specifically, we design a Goal-Centered Reward to provide precise feedback across viewpoints to the agent, enabling it to expand perception and movement range through unrestricted perspectives. Inspired by curriculum learning, we introduce a Curriculum-Based Training strategy that progressively enhances the tracking performance in complex environments. Besides, experiments on simulator and real-world images demonstrate the superior performance of GC-VAT, achieving a Tracking Success Rate of approximately 72% on the simulator. The benchmark and code are available at https://github.com/SHWplus/DAT_Benchmark.
comment: NeurIPS 2025
On the Importance of Tactile Sensing for Imitation Learning: A Case Study on Robotic Match Lighting
The field of robotic manipulation has advanced significantly in recent years. At the sensing level, several novel tactile sensors have been developed, capable of providing accurate contact information. On a methodological level, learning from demonstrations has proven an efficient paradigm to obtain performant robotic manipulation policies. The combination of both holds the promise to extract crucial contact-related information from the demonstration data and actively exploit it during policy rollouts. However, this integration has so far been underexplored, most notably in dynamic, contact-rich manipulation tasks where precision and reactivity are essential. This work therefore proposes a multimodal, visuotactile imitation learning framework that integrates a modular transformer architecture with a flow-based generative model, enabling efficient learning of fast and dexterous manipulation policies. We evaluate our framework on the dynamic, contact-rich task of robotic match lighting - a task in which tactile feedback influences human manipulation performance. The experimental results highlight the effectiveness of our approach and show that adding tactile information improves policy performance, thereby underlining their combined potential for learning dynamic manipulation from few demonstrations. Project website: https://sites.google.com/view/tactile-il .
ComDrive: Comfort-Oriented End-to-End Autonomous Driving IROS 2025
We propose ComDrive, the first comfort-oriented end-to-end autonomous driving system to generate temporally consistent and comfortable trajectories. Recent studies have demonstrated that imitation learning-based planners and learning-based trajectory scorers can effectively generate and select safe trajectories that closely mimic expert demonstrations. However, such trajectory planners and scorers face the challenge of generating temporally inconsistent and uncomfortable trajectories. To address these issues, ComDrive first extracts 3D spatial representations through sparse perception, which then serve as conditional inputs. These inputs are used by a Conditional Denoising Diffusion Probabilistic Model (DDPM)-based motion planner to generate temporally consistent multi-modal trajectories. A dual-stream adaptive trajectory scorer subsequently selects the most comfortable trajectory from these candidates to control the vehicle. Experiments demonstrate that ComDrive achieves state-of-the-art performance in both comfort and safety, outperforming UniAD by 17% in driving comfort and reducing collision rates by 25% compared to SparseDrive. More results are available on our project page: https://jmwang0117.github.io/ComDrive/.
comment: IROS 2025
Flow with the Force Field: Learning 3D Compliant Flow Matching Policies from Force and Demonstration-Guided Simulation Data
While visuomotor policies have made advancements in recent years, contact-rich tasks remain a challenge. Robotic manipulation tasks that require continuous contact demand explicit handling of compliance and force. However, most visuomotor policies ignore compliance, overlooking the importance of physical interaction with the real world, often leading to excessive contact forces or fragile behavior under uncertainty. Introducing force information into vision-based imitation learning could help improve awareness of contacts, but could also require a lot of data to perform well. One remedy for data scarcity is to generate data in simulation, yet computationally taxing processes are required to generate data good enough not to suffer from the Sim2Real gap. In this work, we introduce a framework for generating force-informed data in simulation, instantiated by a single human demonstration, and show how coupling with a compliant policy improves the performance of a visuomotor policy learned from synthetic data. We validate our approach on real-robot tasks, including non-prehensile block flipping and bi-manual object moving, where the learned policy exhibits reliable contact maintenance and adaptation to novel conditions. Project Website: https://flow-with-the-force-field.github.io/webpage/
VO-DP: Semantic-Geometric Adaptive Diffusion Policy for Vision-Only Robotic Manipulation
In the context of imitation learning, visuomotor-based diffusion policy learning is one of the main directions in robotic manipulation. Most of these approaches rely on point clouds as observation inputs and construct scene representations through point-cloud feature learning, which enables them to achieve remarkable accuracy. However, the existing literature lacks an in-depth exploration of vision-only solutions that have significant potential. In this paper, we propose a Vision-Only and single-view Diffusion Policy learning method (VO-DP) that leverages pretrained visual foundation models to achieve effective fusion of semantic and geometric features. We utilize intermediate features from VGGT incorporating semantic features from DINOv2 and geometric features from Alternating Attention blocks. Features are fused via cross-attention and spatially compressed with a CNN to form the input to the policy head. Extensive experiments demonstrate that VO-DP not only outperforms the vision-only baseline DP significantly but also exhibits distinct performance trends against the point cloud-based method DP3: in simulation tasks, VO-DP achieves an average success rate of 64.6%, on par with DP3 (64.0%) and far higher than DP (34.8%), while in real-world tasks it reaches 87.9%, outperforming both DP3 (67.5%) and DP (11.2%) by a notable margin. Further robustness evaluations confirm that VO-DP remains highly stable under varying conditions including color, size, background, and lighting. Lastly, we open-source a training library for robotic manipulation. Built on Accelerate, this library supports multi-machine and multi-GPU parallel training, as well as mixed precision training. It is compatible with visuomotor policies such as DP, DP3 and VO-DP, and also supports the RoboTwin simulator.
Compositional Coordination for Multi-Robot Teams with Large Language Models
Multi-robot coordination has traditionally relied on a mission-specific and expert-driven pipeline, where natural language mission descriptions are manually translated by domain experts into mathematical formulation, algorithm design, and executable code. This conventional process is labor-intensive, inaccessible to non-experts, and inflexible to changes in mission requirements. Here, we propose LAN2CB (Language to Collective Behavior), a novel framework that leverages large language models (LLMs) to streamline and generalize the multi-robot coordination pipeline. LAN2CB transforms natural language (NL) mission descriptions into executable Python code for multi-robot systems through two core modules: (1) Mission Analysis, which parses mission descriptions into behavior trees, and (2) Code Generation, which leverages the behavior tree and a structured knowledge base to generate robot control code. We further introduce a dataset of natural language mission descriptions to support development and benchmarking. Experiments in both simulation and real-world environments demonstrate that LAN2CB enables robust and flexible multi-robot coordination from natural language, significantly reducing manual engineering effort and supporting broad generalization across diverse mission types. Website: https://sites.google.com/view/lan-cb
comment: IEEE International Symposium on Multi-Robot & Multi-Agent Systems (MRS 2025) Oral
Multiagent Systems
Toward Agentic Software Engineering Beyond Code: Framing Vision, Values, and Vocabulary
Agentic AI is poised to usher in a seismic paradigm shift in Software Engineering (SE). As technologists rush headlong to make agentic AI a reality, SE researchers are driven to establish agentic SE as a research area. While early visions of agentic SE are primarily focused on code-related activities, early empirical evidence calls for a consideration of a range of socio-technical concerns to make it work in practice. This paper contributes to the emerging community vision by: (a) recommending an expansion of its scope beyond code, toward a 'whole of process' vision, grounding it in SE foundations and evolution and emerging agentic SE frameworks, (b) proposing a preliminary set of values and principles to guide efforts, and (c) sharing guidance on designing/using well-defined vocabulary for agentic SE. It is hoped that these ideas will encourage community collaborations and steer the SE community towards laying strong foundations of agentic SE so it's not only inevitable but also deliberate and desirable in the long run.
comment: 5 pages
HSCodeComp: A Realistic and Expert-level Benchmark for Deep Search Agents in Hierarchical Rule Application
Effective deep search agents must not only access open-domain and domain-specific knowledge but also apply complex rules, such as legal clauses, medical manuals and tariff rules. These rules often feature vague boundaries and implicit logic relationships, making precise application challenging for agents. However, this critical capability is largely overlooked by current agent benchmarks. To fill this gap, we introduce HSCodeComp, the first realistic, expert-level e-commerce benchmark designed to evaluate deep search agents in hierarchical rule application. In this task, the deep reasoning process of agents is guided by these rules to predict the 10-digit Harmonized System Code (HSCode) of products with noisy but realistic descriptions. These codes, established by the World Customs Organization, are vital for global supply chain efficiency. Built from real-world data collected from large-scale e-commerce platforms, our proposed HSCodeComp comprises 632 product entries spanning diverse product categories, with these HSCodes annotated by several human experts. Extensive experimental results on several state-of-the-art LLMs, open-source, and closed-source agents reveal a huge performance gap: the best agent achieves only 46.8% 10-digit accuracy, far below human experts at 95.0%. Besides, detailed analysis demonstrates the challenges of hierarchical rule application, and test-time scaling fails to improve performance further.
Polynomial-time Configuration Generator for Connected Unlabeled Multi-Agent Pathfinding
We consider Connected Unlabeled Multi-Agent Pathfinding (CUMAPF), a variant of MAPF where the agents must maintain connectivity at all times. This problem is fundamental to swarm robotics applications like self-reconfiguration and marching, where standard MAPF is insufficient as it does not guarantee the required connectivity between agents. While unlabeled MAPF is tractable in optimization, CUMAPF is NP-hard even on highly restricted graph classes. To tackle this challenge, we propose PULL, a complete and polynomial-time algorithm with a simple design. It is based on a rule-based one-step function that computes a subsequent configuration that preserves connectivity and advances towards the target configuration. PULL is lightweight, and runs in $O(n^2)$ time per step in 2D grid, where $n$ is the number of agents. Our experiments further demonstrate its practical performance: PULL finds competitive solution qualities against trivial solutions for hundreds of agents, in randomly generated instances. Furthermore, we develop an eventually optimal solver that integrates PULL into an existing search-based MAPF algorithm, providing a valuable tool for small-scale instances.
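The connectivity requirement that each configuration must satisfy can be made concrete with a short check on a 2D grid. This is not the PULL algorithm, whose one-step rule is not given in the abstract. It is a minimal sketch, assuming 4-neighbour adjacency between occupied cells and a single agent moving per call. The function names are hypothetical.

```python
from collections import deque


def is_connected(cells):
    """Breadth-first search: True if all occupied grid cells form one 4-connected component."""
    cells = set(cells)
    if not cells:
        return True
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for neighbour in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if neighbour in cells and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(cells)


def preserves_connectivity(config, source, target):
    """Check whether moving one agent from source to an adjacent free cell keeps the swarm connected."""
    config = set(config)
    if source not in config or target in config:
        return False
    if abs(source[0] - target[0]) + abs(source[1] - target[1]) != 1:
        return False  # only single-step moves to a 4-adjacent cell are allowed
    moved = (config - {source}) | {target}
    return is_connected(moved)
```

Because the agents are unlabeled, a configuration is just a set of occupied cells, which is why plain sets suffice here. A one-step function in the spirit of the abstract would pick, among moves that pass such a check, one that advances towards the target configuration.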
Modeling realistic human behavior using generative agents in a multimodal transport system: Software architecture and Application to Toulouse
Modeling realistic human behaviour to understand people's mode choices in order to propose personalised mobility solutions remains challenging. This paper presents an architecture for modeling realistic human mobility behavior in complex multimodal transport systems, demonstrated through a case study in Toulouse, France. We apply Large Language Models (LLMs) within an agent-based simulation to capture decision-making in a real urban setting. The framework integrates the GAMA simulation platform with an LLM-based generative agent, along with General Transit Feed Specification (GTFS) data for public transport, and OpenTripPlanner for multimodal routing. The GAMA platform models the interactive transport environment, providing visualization and dynamic agent interactions while eliminating the need to construct the simulation environment from scratch. This design enables a stronger focus on developing generative agents and evaluating their performance in transport decision-making processes. Over a simulated month, results show that agents not only make context-aware transport decisions but also form habits over time. We conclude that combining LLMs with agent-based simulation offers a promising direction for advancing intelligent transportation systems and personalised multimodal mobility solutions. We also discuss some limitations of this approach and outline future work on scaling to larger regions, integrating real-time data, and refining memory models.
Monitoring LLM-based Multi-Agent Systems Against Corruptions via Node Evaluation
Large Language Model (LLM)-based Multi-Agent Systems (MAS) have become a popular paradigm of AI applications. However, trustworthiness issues in MAS remain a critical concern. Unlike challenges in single-agent systems, MAS involve more complex communication processes, making them susceptible to corruption attacks. To mitigate this issue, several defense mechanisms have been developed based on the graph representation of MAS, where agents represent nodes and communications form edges. Nevertheless, these methods predominantly focus on static graph defense, attempting to either detect attacks in a fixed graph structure or optimize a static topology with certain defensive capabilities. To address this limitation, we propose a dynamic defense paradigm for MAS graph structures, which continuously monitors communication within the MAS graph, then dynamically adjusts the graph topology, accurately disrupts malicious communications, and effectively defends against evolving and diverse dynamic attacks. Experimental results in increasingly complex and dynamic MAS environments demonstrate that our method significantly outperforms existing MAS defense mechanisms, contributing an effective guardrail for their trustworthy applications. Our code is available at https://github.com/ChengcanWu/Monitoring-LLM-Based-Multi-Agent-Systems.
ColorAgent: Building A Robust, Personalized, and Interactive OS Agent
With the advancements in hardware, software, and large language model technologies, the interaction between humans and operating systems has evolved from the command-line interface to the rapidly emerging AI agent interactions. Building an operating system (OS) agent capable of executing user instructions and faithfully following user desires is becoming a reality. In this technical report, we present ColorAgent, an OS agent designed to engage in long-horizon, robust interactions with the environment while also enabling personalized and proactive user interaction. To enable long-horizon interactions with the environment, we enhance the model's capabilities through step-wise reinforcement learning and self-evolving training, while also developing a tailored multi-agent framework that ensures generality, consistency, and robustness. In terms of user interaction, we explore personalized user intent recognition and proactive engagement, positioning the OS agent not merely as an automation tool but as a warm, collaborative partner. We evaluate ColorAgent on the AndroidWorld and AndroidLab benchmarks, achieving success rates of 77.2% and 50.7%, respectively, establishing a new state of the art. Nonetheless, we note that current benchmarks are insufficient for a comprehensive evaluation of OS agents and propose further exploring directions in future work, particularly in the areas of evaluation paradigms, agent collaboration, and security. Our code is available at https://github.com/MadeAgents/mobile-use.
SORA-ATMAS: Adaptive Trust Management and Multi-LLM Aligned Governance for Future Smart Cities
The rapid evolution of smart cities has increased the reliance on intelligent interconnected services to optimize infrastructure, resources, and citizen well-being. Agentic AI has emerged as a key enabler by supporting autonomous decision-making and adaptive coordination, allowing urban systems to respond in real time to dynamic conditions. Its benefits are evident in areas such as transportation, where the integration of traffic data, weather forecasts, and safety sensors enables dynamic rerouting and a faster response to hazards. However, its deployment across heterogeneous smart city ecosystems raises critical governance, risk, and compliance (GRC) challenges, including accountability, data privacy, and regulatory alignment within decentralized infrastructures. Evaluation of SORA-ATMAS with three domain agents (Weather, Traffic, and Safety) demonstrated that its governance policies, including a fallback mechanism for high-risk scenarios, effectively steer multiple LLMs (GPT, Grok, DeepSeek) towards domain-optimized, policy-aligned outputs, producing an average MAE reduction of 35% across agents. Results showed stable weather monitoring, effective handling of high-risk traffic plateaus (0.85), and adaptive trust regulation in Safety/Fire scenarios (0.65). Runtime profiling of a 3-agent deployment confirmed scalability, with throughput between 13.8 and 17.2 requests per second, execution times below 72 ms, and governance delays under 100 ms; analytical projections suggest maintained performance at larger scales. Cross-domain rules ensured safe interoperability, with traffic rerouting permitted only under validated weather conditions. These findings validate SORA-ATMAS as a regulation-aligned, context-aware, and verifiable governance framework that consolidates distributed agent outputs into accountable, real-time decisions, offering a resilient foundation for smart-city management.
Collaborative penetration testing suite for emerging generative AI algorithms
Problem Space: AI Vulnerabilities and Quantum Threats. Generative AI vulnerabilities: model inversion, data poisoning, adversarial inputs. Quantum threats: Shor's algorithm breaking RSA and ECC encryption. Challenge: secure generative AI models against classical and quantum cyberattacks. Proposed Solution: Collaborative Penetration Testing Suite. Five integrated components: DAST/SAST (OWASP ZAP, Burp Suite, SonarQube, Fortify); IAST (Contrast Assess integrated with the CI/CD pipeline); Blockchain Logging (Hyperledger Fabric for tamper-proof logs); Quantum Cryptography (lattice-based RLWE protocols); AI Red Team Simulations (adversarial ML and quantum-assisted attacks). Integration Layer: unified workflow for AI, cybersecurity, and quantum experts. Key Results: 300+ vulnerabilities identified across test environments; 70% reduction in high-severity issues within 2 weeks; 90% resolution efficiency for blockchain-logged vulnerabilities; quantum-resistant cryptography maintained 100% integrity in tests. Outcome: Quantum AI Security Protocol integrating blockchain, quantum cryptography, and AI red teaming.
Learning to Make Friends: Coaching LLM Agents toward Emergent Social Ties
Can large language model (LLM) agents reproduce the complex social dynamics that characterize human online behavior -- shaped by homophily, reciprocity, and social validation -- and what memory and learning mechanisms enable such dynamics to emerge? We present a multi-agent LLM simulation framework in which agents repeatedly interact, evaluate one another, and adapt their behavior through in-context learning accelerated by a coaching signal. To model human social behavior, we design behavioral reward functions that capture core drivers of online engagement, including social interaction, information seeking, self-presentation, coordination, and emotional support. These rewards align agent objectives with empirically observed user motivations, enabling the study of how network structures and group formations emerge from individual decision-making. Our experiments show that coached LLM agents develop stable interaction patterns and form emergent social ties, yielding network structures that mirror properties of real online communities. By combining behavioral rewards with in-context adaptation, our framework establishes a principled testbed for investigating collective dynamics in LLM populations and reveals how artificial agents may approximate or diverge from human-like social behavior.
Communication to Completion: Modeling Collaborative Workflows with Intelligent Multi-Agent Communication
Workplace teamwork on complex tasks requires diverse communication strategies, but current multi-agent LLM systems lack systematic frameworks for task-oriented communication. We introduce Communication to Completion (C2C), a scalable framework that addresses this gap through two key innovations: (1) the Alignment Factor (AF), a novel metric quantifying agent task alignment that directly impacts work efficiency, and (2) a Sequential Action Framework that integrates stepwise execution with intelligent communication decisions. C2C enables agents to make cost-aware communication choices, dynamically improving task understanding through targeted interactions. We evaluated C2C on realistic coding workflows across three complexity tiers and team sizes from 5 to 17 agents, comparing against no-communication and fixed-step baselines. The results show that C2C reduces the task completion time by about 40% with acceptable communication costs. The framework completes all tasks successfully in standard configurations and maintains effectiveness at scale. C2C establishes both a theoretical foundation for measuring communication effectiveness in multi-agent systems and a practical framework for complex collaborative tasks.
comment: 13 pages
SIGN: Schema-Induced Games for Naming AAAI 2026
Real-world AI systems are tackling increasingly complex problems, often through interactions among large language model (LLM) agents. When these agents develop inconsistent conventions, coordination can break down. Applications such as collaborative coding and distributed planning therefore require reliable, consistent communication, and scalability is a central concern as systems grow. We introduce Schema-Induced Games for Naming (SIGN), a naming game that examines how lightweight structure can steer convention formation. We compare schema-induced communication to unconstrained natural language and find faster convergence with up to 5.8x higher agreement. These results suggest that minimal structure can act as a simple control knob for efficient multi-agent coordination, pointing toward broader applications beyond the naming game.
comment: AAAI 2026 Student Abstract (Oral). Code available at https://github.com/ryanzhangofficial/schema-induced-games-for-naming
The Emergence of Complex Behavior in Large-Scale Ecological Environments
We explore how physical scale and population size shape the emergence of complex behaviors in open-ended ecological environments. In our setting, agents are unsupervised and have no explicit rewards or learning objectives but instead evolve over time according to reproduction, mutation, and natural selection. As they act, agents also shape their environment and the population around them in an ongoing dynamic ecology. Our goal is not to optimize a single high-performance policy, but instead to examine how behaviors emerge and evolve across large populations due to natural competition and environmental pressures. In an effort to discover how complex behaviors naturally emerge, we conduct experiments in large-scale worlds that reach populations of more than 60,000 individual agents, each with their own evolved neural network policy. We identify various emergent behaviors such as long-range resource extraction, vision-based foraging, and predation that arise under competitive and survival pressures. We examine how sensing modalities and environmental scale affect the emergence of these behaviors, finding that some appear only in sufficiently large environments and populations, with larger scales increasing behavioral stability and consistency. While there is a rich history of research in evolutionary settings, our scaling results provide promising new directions to explore ecology as an instrument of machine learning in an era of abundant computational resources. Experimental code is available at https://github.com/jbejjani2022/ecological-emergent-behavior.
comment: 18 pages, 11 figures, 6 tables, experiment code available at https://github.com/jbejjani2022/ecological-emergent-behavior
Adaptive Coopetition: Leveraging Coarse Verifier Signals for Resilient Multi-Agent LLM Reasoning NeurIPS 2025
Inference-time computation is a critical yet challenging paradigm for enhancing the reasoning performance of large language models (LLMs). While existing strategies improve reasoning stability and consistency, they suffer from notable limitations: self-correction often reinforces the model's initial biases, and Multi-Agent Collaboration (MAC) often fails due to the lack of efficient coordination mechanisms, leading to collective errors. Although high-performing verifiers can detect reasoning errors, making them reliable requires substantial training. To address these challenges, we introduce a novel inference-time framework, Adaptive Coopetition (AdCo), in which LLM agents utilize an adaptive, UCB-based "coopetition" mechanism. At each round, agents leverage coarse verifier signals to determine whether to collaborate or compete, and iteratively refine their reasoning based on peer feedback. Without relying on high-performance verifiers, our adaptive strategy achieves significant performance gains on mathematical reasoning benchmarks, yielding a 20% relative improvement over baselines on the more challenging dataset. Our approach remains robust and consistent in terms of accuracy under different sample sizes and configurations. This adaptive, signal-guided "coopetition" framework enhances reasoning robustness by leveraging both model knowledge diversity and reasoning trace measures, while also promoting uncertainty-driven exploration, especially when participants have comparable capabilities. From this perspective, our work offers a fresh lens on inference-time computation and paves the way for more resilient multi-agent LLM systems. Our code is available at: https://github.com/AdCo-Research/adaptive-coopetition.
comment: 13 pages, 8 figures. Accepted for presentation at the 5th Workshop on Mathematical Reasoning and AI at NeurIPS 2025
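As a rough illustration of the UCB-based choice between collaborating and competing mentioned in the abstract above, the sketch below applies the standard UCB1 rule to two arms. The arm names, the fixed feedback values, and the helper `ucb1_select` are assumptions made for illustration; they are not taken from the authors' code.

```python
import math

def ucb1_select(counts, totals, c=math.sqrt(2)):
    """Return the arm with the highest UCB1 score.

    counts[arm] is how often the arm was played,
    totals[arm] is the sum of rewards observed for it.
    """
    # Play every arm once before relying on the confidence bound.
    for arm, n in counts.items():
        if n == 0:
            return arm
    t = sum(counts.values())

    def score(arm):
        mean = totals[arm] / counts[arm]
        bonus = c * math.sqrt(math.log(t) / counts[arm])  # exploration term
        return mean + bonus

    return max(counts, key=score)

counts = {"collaborate": 0, "compete": 0}
totals = {"collaborate": 0.0, "compete": 0.0}
# Hypothetical coarse verifier signals in [0, 1], fixed here for simplicity.
feedback = {"collaborate": 0.8, "compete": 0.3}

history = []
for _ in range(20):
    arm = ucb1_select(counts, totals)
    counts[arm] += 1
    totals[arm] += feedback[arm]
    history.append(arm)

print(counts)
```

With these made-up signals the rule keeps sampling the weaker arm occasionally while favouring the stronger one, which is the exploration behaviour a UCB-style mechanism is meant to provide.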
PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits
Large Language Models (LLMs) are improving at an exceptional rate. With the advent of agentic workflows, multi-turn dialogue has become the de facto mode of interaction with LLMs for completing long and complex tasks. While LLM capabilities continue to improve, they remain susceptible to jailbreaking, especially in multi-turn scenarios where harmful intent can be subtly injected across the conversation to produce nefarious outcomes. While single-turn attacks have been extensively explored, adaptability, efficiency and effectiveness remain key challenges for their multi-turn counterparts. To address these gaps, we present PLAGUE, a novel plug-and-play framework for designing multi-turn attacks inspired by lifelong-learning agents. PLAGUE dissects the lifetime of a multi-turn attack into three carefully designed phases (Primer, Planner and Finisher) that enable a systematic and information-rich exploration of the multi-turn attack family. Evaluations show that red-teaming agents designed using PLAGUE achieve state-of-the-art jailbreaking results, improving attack success rates (ASR) by more than 30% across leading models within a smaller or comparable query budget. In particular, PLAGUE enables an ASR (based on StrongReject) of 81.4% on OpenAI's o3 and 67.3% on Claude's Opus 4.1, two models that are considered highly resistant to jailbreaks in safety literature. Our work offers tools and insights to understand the importance of plan initialization, context optimization and lifelong learning in crafting multi-turn attacks for a comprehensive model vulnerability evaluation.
comment: First two authors have equal author contributions
RADAR: A Risk-Aware Dynamic Multi-Agent Framework for LLM Safety Evaluation via Role-Specialized Collaboration
Existing safety evaluation methods for large language models (LLMs) suffer from inherent limitations, including evaluator bias and detection failures arising from model homogeneity, which collectively undermine the robustness of risk evaluation processes. This paper seeks to re-examine the risk evaluation paradigm by introducing a theoretical framework that reconstructs the underlying risk concept space. Specifically, we decompose the latent risk concept space into three mutually exclusive subspaces: the explicit risk subspace (encompassing direct violations of safety guidelines), the implicit risk subspace (capturing potential malicious content that requires contextual reasoning for identification), and the non-risk subspace. Furthermore, we propose RADAR, a multi-agent collaborative evaluation framework that leverages multi-round debate mechanisms through four specialized complementary roles and employs dynamic update mechanisms to achieve self-evolution of risk concept distributions. This approach enables comprehensive coverage of both explicit and implicit risks while mitigating evaluator bias. To validate the effectiveness of our framework, we construct an evaluation dataset comprising 800 challenging cases. Extensive experiments on our challenging test set and public benchmarks demonstrate that RADAR significantly outperforms baseline evaluation methods across multiple dimensions, including accuracy, stability, and self-evaluation risk sensitivity. Notably, RADAR achieves a 28.87% improvement in risk identification accuracy compared to the strongest baseline evaluation method.
Coordinated Strategies in Realistic Air Combat by Hierarchical Multi-Agent Reinforcement Learning
Achieving mission objectives in a realistic simulation of aerial combat is highly challenging due to imperfect situational awareness and nonlinear flight dynamics. In this work, we introduce a novel 3D multi-agent air combat environment and a Hierarchical Multi-Agent Reinforcement Learning framework to tackle these challenges. Our approach combines heterogeneous agent dynamics, curriculum learning, league-play, and a newly adapted training algorithm. To this end, the decision-making process is organized into two abstraction levels: low-level policies learn precise control maneuvers, while high-level policies issue tactical commands based on mission objectives. Empirical results show that our hierarchical approach improves both learning efficiency and combat performance in complex dogfight scenarios.
comment: 2025 IEEE International Conference on Agentic AI (ICA)
IM-Chat: A Multi-agent LLM Framework Integrating Tool-Calling and Diffusion Modeling for Knowledge Transfer in Injection Molding Industry
The injection molding industry faces critical challenges in preserving and transferring field knowledge, particularly as experienced workers retire and multilingual barriers hinder effective communication. This study introduces IM-Chat, a multi-agent framework based on large language models (LLMs), designed to facilitate knowledge transfer in injection molding. IM-Chat integrates both limited documented knowledge (e.g., troubleshooting tables, manuals) and extensive field data modeled through a data-driven process condition generator that infers optimal manufacturing settings from environmental inputs such as temperature and humidity, enabling robust and context-aware task resolution. By adopting a retrieval-augmented generation (RAG) strategy and tool-calling agents within a modular architecture, IM-Chat ensures adaptability without the need for fine-tuning. Performance was assessed across 100 single-tool and 60 hybrid tasks for GPT-4o, GPT-4o-mini, and GPT-3.5-turbo by domain experts using a 10-point rubric focused on relevance and correctness, and was further supplemented by automated evaluation using GPT-4o guided by a domain-adapted instruction prompt. The evaluation results indicate that more capable models tend to achieve higher accuracy, particularly in complex, tool-integrated scenarios. In addition, compared with the fine-tuned single-agent LLM, IM-Chat demonstrated superior accuracy, particularly in quantitative reasoning, and greater scalability in handling multiple information sources. Overall, these findings demonstrate the viability of multi-agent LLM systems for industrial knowledge workflows and establish IM-Chat as a scalable and generalizable approach to AI-assisted decision support in manufacturing.
Vahana.jl -- A framework (not only) for large-scale agent-based models
Agent-based models (ABMs) offer a powerful framework for understanding complex systems. However, their computational demands often become a significant barrier as the number of agents and complexity of the simulation increase. Traditional ABM platforms often struggle to fully exploit modern computing resources, hindering the development of large-scale simulations. This paper presents Vahana.jl, a high performance computing open source framework that aims to address these limitations. Building on the formalism of synchronous graph dynamical systems, Vahana.jl is especially well suited for models with a focus on (social) networks. The framework seamlessly supports distribution across multiple compute nodes, enabling simulations that would otherwise be beyond the capabilities of a single machine. Implemented in Julia, Vahana.jl leverages the interactive Read-Eval-Print Loop (REPL) environment, facilitating rapid model development and experimentation.
Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
Multi-Agent Systems (MAS) powered by Visual Language Models (VLMs) enable challenging tasks but suffer from a novel failure term, multi-agent visual hallucination snowballing, where hallucinations are seeded in a single agent and amplified by following ones due to the over-reliance on textual flow to relay visual information. Through turn-, layer-, and token-wise attention analyses, we provide detailed insights into the essence of hallucination snowballing regarding the reduction of visual attention allocation. This leads us to identify a subset of vision tokens with a unimodal attention peak in middle layers that best preserve visual evidence but gradually diminish in deeper agent turns, resulting in the visual hallucination snowballing in MAS. Thus, we propose ViF, a lightweight, plug-and-play mitigation paradigm that relays inter-agent messages with Visual Flow powered by the selected visual relay tokens and applies attention reallocation to amplify this pattern. Experimental results demonstrate that our method markedly reduces hallucination snowballing, consistently improving the performance across eight benchmarks based on four common MAS structures and ten base models. The source code is publicly available at: https://github.com/YU-deep/ViF.git.
PARCO: Parallel AutoRegressive Models for Multi-Agent Combinatorial Optimization NeurIPS 2025
Combinatorial optimization problems involving multiple agents are notoriously challenging due to their NP-hard nature and the necessity for effective agent coordination. Despite advancements in learning-based methods, existing approaches often face critical limitations, including suboptimal agent coordination, poor generalization, and high computational latency. To address these issues, we propose PARCO (Parallel AutoRegressive Combinatorial Optimization), a general reinforcement learning framework designed to construct high-quality solutions for multi-agent combinatorial tasks efficiently. To this end, PARCO integrates three key novel components: (1) transformer-based communication layers to enable effective agent collaboration during parallel solution construction, (2) a multiple pointer mechanism for low-latency, parallel agent decision-making, and (3) priority-based conflict handlers to resolve decision conflicts via learned priorities. We evaluate PARCO in multi-agent vehicle routing and scheduling problems, where our approach outperforms state-of-the-art learning methods, demonstrating strong generalization ability and remarkable computational efficiency. We make our source code publicly available to foster future research: https://github.com/ai4co/parco.
comment: Accepted at NeurIPS 2025
ROTATE: Regret-driven Open-ended Training for Ad Hoc Teamwork
Learning to collaborate with previously unseen partners is a fundamental generalization challenge in multi-agent learning, known as Ad Hoc Teamwork (AHT). Existing AHT approaches often adopt a two-stage pipeline, where first, a fixed population of teammates is generated with the idea that they should be representative of the teammates that will be seen at deployment time, and second, an AHT agent is trained to collaborate well with agents in the population. To date, the research community has focused on designing separate algorithms for each stage. This separation has led to algorithms that generate teammates with limited coverage of possible behaviors, and that ignore whether the generated teammates are easy to learn from for the AHT agent. Furthermore, algorithms for training AHT agents typically treat the set of training teammates as static, thus attempting to generalize to previously unseen partner agents without assuming any control over the set of training teammates. This paper presents a unified framework for AHT by reformulating the problem as an open-ended learning process between an AHT agent and an adversarial teammate generator. We introduce ROTATE, a regret-driven, open-ended training algorithm that alternates between improving the AHT agent and generating teammates that probe its deficiencies. Experiments across diverse two-player environments demonstrate that ROTATE significantly outperforms baselines at generalizing to an unseen set of evaluation teammates, thus establishing a new standard for robust and generalizable teamwork.
A Principle of Targeted Intervention for Multi-Agent Reinforcement Learning NeurIPS 2025
Steering cooperative multi-agent reinforcement learning (MARL) towards desired outcomes is challenging, particularly when global human guidance over the whole multi-agent system is impractical in large-scale MARL. On the other hand, designing external mechanisms (e.g., intrinsic rewards and human feedback) to coordinate agents mostly relies on empirical studies, lacking an easy-to-use research tool. In this work, we employ multi-agent influence diagrams (MAIDs) as a graphical framework to address the above issues. First, we introduce the concept of MARL interaction paradigms, using MAIDs to analyze and visualize both unguided self-organization and global guidance mechanisms in MARL. Then, we design a new MARL interaction paradigm, referred to as the targeted intervention paradigm, which is applied to only a single targeted agent, so the problem of global guidance can be mitigated. In our implementation, we introduce a causal inference technique, referred to as Pre-Strategy Intervention (PSI), to realize the targeted intervention paradigm. Since MAIDs can be regarded as a special class of causal diagrams, a composite desired outcome that integrates the primary task goal and an additional desired outcome can be achieved by maximizing the corresponding causal effect through the PSI. Moreover, the bundled relevance graph analysis of MAIDs provides a tool to identify whether an MARL learning paradigm is workable under the design of an MARL interaction paradigm. In experiments, we demonstrate the effectiveness of our proposed targeted intervention, and verify the result of relevance graph analysis.
comment: Published in NeurIPS 2025
Compositional Coordination for Multi-Robot Teams with Large Language Models
Multi-robot coordination has traditionally relied on a mission-specific and expert-driven pipeline, where natural language mission descriptions are manually translated by domain experts into mathematical formulation, algorithm design, and executable code. This conventional process is labor-intensive, inaccessible to non-experts, and inflexible to changes in mission requirements. Here, we propose LAN2CB (Language to Collective Behavior), a novel framework that leverages large language models (LLMs) to streamline and generalize the multi-robot coordination pipeline. LAN2CB transforms natural language (NL) mission descriptions into executable Python code for multi-robot systems through two core modules: (1) Mission Analysis, which parses mission descriptions into behavior trees, and (2) Code Generation, which leverages the behavior tree and a structured knowledge base to generate robot control code. We further introduce a dataset of natural language mission descriptions to support development and benchmarking. Experiments in both simulation and real-world environments demonstrate that LAN2CB enables robust and flexible multi-robot coordination from natural language, significantly reducing manual engineering effort and supporting broad generalization across diverse mission types. Website: https://sites.google.com/view/lan-cb
comment: IEEE International Symposium on Multi-Robot & Multi-Agent Systems (MRS 2025) Oral
GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare
Large Language Models (LLMs) have achieved remarkable progress in reasoning, yet sometimes produce responses that are suboptimal for users in tasks such as writing, information seeking, or providing practical guidance. Conventional alignment practices typically assume that maximizing model reward also maximizes user welfare, but this assumption frequently fails in practice: models may over-clarify or generate overly verbose reasoning when users prefer concise answers. Such behaviors resemble the prisoner's dilemma, where individually rational choices lead to socially suboptimal outcomes. The fundamental challenge is the lack of a principled decision-making mechanism that mutually benefits both the LLM and the user. We propose Game-Theoretic Alignment (GTAlign), an alignment framework that integrates game-theoretic decision making into both reasoning and training. During reasoning, the model explicitly treats user-LLM interaction as a strategic game: it constructs payoff matrices within its reasoning chain to estimate welfare for both itself and the user, and then selects actions that are mutually beneficial. During training, we introduce a mutual welfare reward that reinforces cooperative responses, aligning model behavior with socially efficient outcomes. In addition, we introduce an inference technique that leverages game-theoretic reasoning to dynamically adapt the LLM's response when the pricing policies of the LLM service change. Extensive experiments demonstrate that GTAlign substantially improves reasoning efficiency, answer quality, and mutual welfare compared to baselines across diverse tasks. The code is available at https://github.com/ulab-uiuc/GTAlign.
comment: 31 pages, 6 figures
Systems and Control (CS)
Nodal Capacity Expansion Planning with Flexible Large-Scale Load Siting
We propose explicitly incorporating large-scale load siting into a stochastic nodal power system capacity expansion planning model that concurrently co-optimizes generation, transmission and storage expansion. The potential operational flexibility of some of these large loads is also taken into account by considering them as consisting of a set of tranches with different reliability requirements, which are modeled as a constraint on expected served energy across operational scenarios. We implement our model as a two-stage stochastic mixed-integer optimization problem with cross-scenario expectation constraints. To overcome the challenge of scalability, we build upon existing work to implement this model on a high-performance computing platform and exploit scenario parallelization using an augmented Progressive Hedging Algorithm. The algorithm is implemented using the bounding features of mpisppy, which have been shown to provide satisfactory provable optimality gaps despite the absence of theoretical guarantees of convergence. We test our approach to assess the value of this proactive planning framework on total system cost and reliability metrics using realistic test cases geographically assigned to San Diego and South Carolina, with datacenter and direct air capture facilities as large loads.
Bridging Earth and Space: A Survey on HAPS for Non-Terrestrial Networks
High-altitude platform stations (HAPS) are emerging as key enablers in the evolution of 6G wireless networks, bridging terrestrial and non-terrestrial infrastructures. Operating in the stratosphere, HAPS can provide wide-area coverage and low-latency, energy-efficient broadband communications with flexible deployment options for diverse applications. This survey delivers a comprehensive overview of HAPS use cases, technologies, and integration strategies within the 6G ecosystem. The roles of HAPS in extending connectivity to underserved regions, supporting dynamic backhauling, enabling massive IoT, and delivering reliable low-latency communications for autonomous and immersive services are discussed. The paper reviews state-of-the-art architectures for terrestrial and non-terrestrial network integration and highlights recent field trials. Furthermore, key enabling technologies such as channel modeling, AI-driven resource allocation, interference control, mobility management, and energy-efficient communications are examined. The paper also outlines open research challenges. By addressing existing gaps in the literature, this survey positions HAPS as a foundational component of globally integrated, resilient, and sustainable 6G networks.
comment: 30 pages. This work has been submitted to IEEE Communications Surveys & Tutorials (under review)
Optimal Kron-based Reduction of Networks (Opti-KRON) for Three-phase Distribution Feeders
This paper presents a novel structure-preserving, Kron-based reduction framework for unbalanced distribution feeders. The method aggregates electrically similar nodes within a mixed-integer optimization (MIP) problem to produce reduced networks that optimally reproduce the voltage profiles of the original full network. To overcome computational bottlenecks of MIP formulations, we propose an exhaustive-search formulation to identify optimal aggregation decisions while enforcing voltage margin limits. The proposed exhaustive network reduction algorithm is parallelizable on GPUs, which enables scalable network reduction. The resulting reduced networks approximate the full system's voltage profiles with low errors and are suitable for steady-state analysis and optimal power flow studies. The framework is validated on two real utility distribution feeders with 5,991 and 8,381 nodes. The reduced models achieve up to 90% and 80% network reduction, respectively, while the maximum voltage-magnitude error remains below 0.003 p.u. Furthermore, on a 1000-node version of the network, the GPU-accelerated reduction algorithm runs up to 15x faster than its CPU-based counterpart.
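For readers unfamiliar with Kron reduction, the sketch below shows the basic node-elimination step on a small single-phase admittance matrix. It is a minimal illustration only: the function name `kron_eliminate` and the 3-node example are assumptions, the eliminated node is assumed to have zero current injection, and the optimal node aggregation and three-phase handling described in the abstract are not reproduced.

```python
def kron_eliminate(Y, k):
    """Eliminate node k from admittance matrix Y (list of lists).

    Uses the Schur-complement update
        Y'[i][j] = Y[i][j] - Y[i][k] * Y[k][j] / Y[k][k]
    which is valid when node k has zero current injection.
    """
    keep = [i for i in range(len(Y)) if i != k]
    return [
        [Y[i][j] - Y[i][k] * Y[k][j] / Y[k][k] for j in keep]
        for i in keep
    ]

# Three nodes in a line (0 - 1 - 2), each branch with admittance 2.
Y = [
    [2.0, -2.0, 0.0],
    [-2.0, 4.0, -2.0],
    [0.0, -2.0, 2.0],
]

Y_reduced = kron_eliminate(Y, 1)
print(Y_reduced)  # [[1.0, -1.0], [-1.0, 1.0]]
```

Removing the middle node leaves a single equivalent branch of admittance 1 between nodes 0 and 2, which matches the series combination of the two original branches.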
Control Barrier Functions for the Full Class of Signal Temporal Logic Tasks using Spatiotemporal Tubes
This paper introduces a new framework for synthesizing time-varying control barrier functions (TV-CBFs) for general Signal Temporal Logic (STL) specifications using spatiotemporal tubes (STT). We first formulate the STT synthesis as a robust optimization problem (ROP) and solve it through a scenario optimization problem (SOP), providing formal guarantees that the resulting tubes capture the given STL specifications. These STTs are then used to construct TV-CBFs, ensuring that under any control law rendering them invariant, the system satisfies the STL tasks. We demonstrate the framework through case studies on a differential-drive mobile robot and a quadrotor, and provide a comparative analysis showing improved efficiency over existing approaches.
Multi-UAV Flood Monitoring via CVT with Gaussian Mixture of Density Functions for Coverage Control
This study presents a control strategy for coordinating multiple unmanned aerial vehicles (UAVs) to monitor unknown flood regions and estimate the extent of inundation. The proposed method adopts a density-driven coverage framework based on Centroidal Voronoi Tessellation (CVT), in which the density function is modeled using a Gaussian Mixture of Density Functions (GMDF). This formulation provides a more accurate characterization of inundated areas compared to conventional axis-aligned Gaussian models. The performance of the two density modeling approaches is systematically evaluated under different UAV fleet sizes (16, 20, and 24), with multiple simulation trials conducted in the ROS/Gazebo environment. The results show that the GMDF-based formulation consistently achieves higher coverage rates, demonstrating its effectiveness in enhancing flood monitoring and improving UAV spatial distribution.
comment: 9 pages, 6 figures
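The density-driven CVT coverage idea in the abstract above can be sketched as a Lloyd-style iteration in which each agent moves to the density-weighted centroid of its Voronoi cell. The code below is a simplified illustration on a discretised unit square; the function names, the grid resolution, the mixture parameters, and the initial agent positions are all assumptions, and it does not model UAV dynamics or the paper's ROS/Gazebo setup.

```python
import math

def mixture_density(x, y, components):
    """Weighted sum of 2-D Gaussians with full 2x2 covariances.

    Each component is (weight, (mx, my), (sxx, sxy, syy)).
    """
    total = 0.0
    for w, (mx, my), (sxx, sxy, syy) in components:
        det = sxx * syy - sxy * sxy
        dx, dy = x - mx, y - my
        # Quadratic form d^T Sigma^{-1} d for a 2x2 covariance matrix.
        q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        total += w * math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))
    return total

def lloyd_step(agents, components, n=40):
    """One Lloyd iteration on the unit square split into n*n cells."""
    mass = [0.0] * len(agents)
    cx = [0.0] * len(agents)
    cy = [0.0] * len(agents)
    for i in range(n):
        for j in range(n):
            x, y = (i + 0.5) / n, (j + 0.5) / n
            # Voronoi assignment: the nearest agent owns this grid cell.
            k = min(
                range(len(agents)),
                key=lambda a: (agents[a][0] - x) ** 2 + (agents[a][1] - y) ** 2,
            )
            rho = mixture_density(x, y, components)
            mass[k] += rho
            cx[k] += rho * x
            cy[k] += rho * y
    # Move each agent to the density-weighted centroid of its cell.
    return [
        (cx[k] / mass[k], cy[k] / mass[k]) if mass[k] > 0 else agents[k]
        for k in range(len(agents))
    ]

# Hypothetical flood density: two Gaussians, the second one rotated.
flood = [
    (0.6, (0.3, 0.3), (0.010, 0.000, 0.010)),
    (0.4, (0.7, 0.6), (0.020, 0.012, 0.015)),
]
agents = [(0.1, 0.1), (0.9, 0.1), (0.1, 0.9), (0.9, 0.9)]
for _ in range(20):
    agents = lloyd_step(agents, flood)
print(agents)
```

Allowing a nonzero off-diagonal covariance term is what distinguishes this mixture from an axis-aligned Gaussian model: the density can follow elongated, rotated regions.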
Risk Assessment of an Autonomous Underwater Snake Robot in Confined Operations
The growing interest in ocean discovery imposes a need for inspection and intervention in confined and demanding environments. Eely's slender shape, in addition to its ability to change its body configurations, makes articulated underwater robots an adequate option for such environments. However, operation of Eely in such environments imposes demanding requirements on the system, as it must deal with uncertain and unstructured environments, extreme environmental conditions, and reduced navigational capabilities. This paper proposes a Bayesian approach to assess the risks of losing Eely during two mission scenarios. The goal of this work is to improve Eely's performance and the likelihood of mission success. Sensitivity analysis results are presented in order to demonstrate the causes having the highest impact on losing Eely.
comment: 9 pages, 6 figures, Accepted for publication in OCEANS 2023 - Limerick
Managing Charging Induced Grid Stress and Battery Degradation in Electric Taxi Fleets
Operating fleets of electric vehicles (EVs) introduces several challenges, some of which are borne by the fleet operator and some by the power grid. To maximize short-term profit, a fleet operator could always charge EVs at the maximum rate to ensure vehicles are ready to service ride demand. However, due to the stochastic nature of electricity demand, charging EVs at their maximum rate may increase grid stress and lead to overall instability. Furthermore, high-rate charging can accelerate battery degradation, thereby reducing the service lifespan of the fleet. This study aims to reconcile the conflicting incentives of fleet longevity, short-term profitability, and grid stability by simulating a taxi fleet throughout its lifespan in relation to its charging policies and service conditions. We develop an EV fleet simulator to evaluate the battery degradation caused by unpredictable charging and ride demand. We then assess the resulting impact of these activities on the power grid through the charging infrastructure. The simulation uses publicly accessible real-world travel data from the NYC taxi dataset. We compare a baseline 80-20 fleet charging policy with a reinforcement learning-based policy designed to prolong the fleet's service life and alleviate grid stress. We monitor grid stress, battery degradation, and profitability over five years and find that our learned policy outperforms the baseline. The simulator enables fleet operators to assess the impact of different charging policies on these indicators and to make informed decisions in the future.
comment: 7 pages, 8 figures, to be published in the proceedings of 2025 IEEE Innovative Smart Grid Technologies - Asia (ISGT-Asia)
Magnetic field estimation using Gaussian process regression for interactive wireless power system design
Wireless power transfer (WPT) with coupled resonators offers a promising solution for the seamless powering of electronic devices. Interactive design approaches that visualize the magnetic field and power transfer efficiency based on system geometry adjustments can facilitate the understanding and exploration of the behavior of these systems for dynamic applications. However, typical electromagnetic field simulation methods, such as the Method of Moments (MoM), require significant computational resources, limiting the rate at which computation can be performed for acceptable interactivity. Furthermore, the system's sensitivity to positional and geometrical changes necessitates a large number of simulations, and structures such as ferromagnetic shields further complicate these simulations. Here, we introduce a machine learning approach using Gaussian Process Regression (GPR), demonstrating for the first time the rapid estimation of the entire magnetic field and power transfer efficiency for near-field coupled systems. To achieve quick and accurate estimation, we develop 3D adaptive grid systems and an active learning strategy to effectively capture the nonlinear interactions between complex system geometries and magnetic fields. By training a regression model, our approach achieves magnetic field computation with sub-second latency and with an average error of less than 6% when validated against independent electromagnetic simulation results.
comment: 29 pages, 8 figures, 1 table
Spatiotemporal Tubes based Control of Unknown Multi-Agent Systems for Temporal Reach-Avoid-Stay Tasks
The paper focuses on designing a controller for unknown dynamical multi-agent systems to achieve temporal reach-avoid-stay tasks for each agent while preventing inter-agent collisions. The main objective is to generate a spatiotemporal tube (STT) for each agent and thereby devise a closed-form, approximation-free, and decentralized control strategy that ensures the system trajectory reaches the target within a specific time while avoiding time-varying unsafe sets and collisions with other agents. In order to achieve this, the requirements of STTs are formulated as a robust optimization problem (ROP) and solved using a sampling-based scenario optimization problem (SOP) to address the issue of infeasibility caused by the infinite number of constraints in ROP. The STTs are generated by solving the SOP, and the corresponding closed-form control is designed to fulfill the specified task. Finally, the effectiveness of our approach is demonstrated through two case studies, one involving omnidirectional robots and the other involving multiple drones modelled as Euler-Lagrange systems.
Query-Efficient Zeroth-Order Algorithms for Nonconvex Optimization
Zeroth-order optimization (ZO) has been a powerful framework for solving black-box problems, which estimates gradients using zeroth-order data to update variables iteratively. The practical applicability of ZO critically depends on the efficiency of single-step gradient estimation and the overall query complexity. However, existing ZO algorithms cannot achieve efficiency on both simultaneously. In this work, we consider a general constrained optimization model with black-box objective and constraint functions. To solve it, we propose novel algorithms that can achieve the state-of-the-art overall query complexity bound of $\mathcal{O}(d/\epsilon^4)$ to find an $\epsilon$-stationary solution ($d$ is the dimension of variable space), while reducing the queries for estimating a single-step gradient from $\mathcal{O}(d)$ to $\mathcal{O}(1)$. Specifically, we integrate block updates with gradient descent ascent and a block gradient estimator, which leads to two algorithms, ZOB-GDA and ZOB-SGDA, respectively. Instead of constructing full gradients, they estimate only partial gradients along random blocks of dimensions, where the adjustable block sizes enable high single-step efficiency without sacrificing convergence guarantees. Our theoretical results establish the finite-sample convergence of the proposed algorithms for nonconvex optimization. Finally, numerical experiments on a practical problem demonstrate that our algorithms require over ten times fewer queries than existing methods.
comment: 34 pages, 4 figures
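To make the idea of estimating only a partial gradient along a random block concrete, here is a minimal sketch using forward differences on an unconstrained test function. It is not ZOB-GDA or ZOB-SGDA: the constraint handling and the ascent step are omitted, and the smoothing radius `mu`, step size, block size, and function names are assumptions chosen for the example.

```python
import random


def block_grad(f, x, block, mu=1e-4):
    """Forward-difference estimate of the partial gradient on `block`.
    Uses len(block) + 1 function queries instead of len(x) + 1."""
    fx = f(x)
    g = [0.0] * len(x)
    for i in block:
        xp = list(x)
        xp[i] += mu
        g[i] = (f(xp) - fx) / mu
    return g


def zo_block_descent(f, x0, block_size=2, step=0.1, iters=500, seed=0):
    """Descent that updates only a random block of coordinates per iteration."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        block = rng.sample(range(len(x)), block_size)
        g = block_grad(f, x, block)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    # Toy black-box objective with minimiser x_i = i.
    f = lambda x: sum((xi - i) ** 2 for i, xi in enumerate(x))
    print(zo_block_descent(f, [0.0] * 6))
```

The per-iteration query count depends on `block_size` rather than on the dimension, which is the trade-off the abstract describes between single-step cost and overall query complexity.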
Policy Gradient Method for LQG Control via Input-Output-History Representation: Convergence to $O(ε)$-Stationary Points
We study the policy gradient method (PGM) for the linear quadratic Gaussian (LQG) dynamic output-feedback control problem using an input-output-history (IOH) representation of the closed-loop system. First, we show that any dynamic output-feedback controller is equivalent to a static partial-state feedback gain for a new system representation characterized by a finite-length IOH. Leveraging this equivalence, we reformulate the search for an optimal dynamic output feedback controller as an optimization problem over the corresponding partial-state feedback gain. Next, we introduce a relaxed version of the IOH-based LQG problem by incorporating a small process noise with covariance $\epsilon I$ into the new system to ensure coerciveness, a key condition for establishing gradient-based convergence guarantees. Consequently, we show that a vanilla PGM for the relaxed problem converges to an $\mathcal{O}(\epsilon)$-stationary point, i.e., $\overline{K}$ satisfying $\|\nabla J(\overline{K})\|_F \leq \mathcal{O}(\epsilon)$, where $J$ denotes the original LQG cost. Numerical experiments empirically indicate convergence to the vicinity of the globally optimal LQG controller.
Safe Output-Feedback Adaptive Optimal Control of Affine Nonlinear Systems
In this paper, we develop a safe control synthesis method that integrates state estimation and parameter estimation within an adaptive optimal control (AOC) and control barrier function (CBF)-based control architecture. The developed approach decouples safety objectives from the learning objectives using a CBF-based guarding controller where the CBFs are robustified to account for the lack of full-state measurements. The coupling of this guarding controller with the AOC-based stabilizing control guarantees safety and regulation despite the lack of full state measurement. The paper leverages recent advancements in deep neural network-based adaptive observers to ensure safety in the presence of state estimation errors. Safety and convergence guarantees are provided using a Lyapunov-based analysis, and the effectiveness of the developed controller is demonstrated through simulation under mild excitation conditions.
Ultra-Fast Wireless Power Hacking
The rapid growth of electric vehicles (EVs) has driven the development of roadway wireless charging technology, effectively extending EV driving range. However, wireless charging introduces significant cybersecurity challenges. Any receiver within the magnetic field can potentially extract energy, and previous research demonstrated that a hacker could detect the operating frequency and steal substantial power. However, our earlier approach required time to track new frequencies or precise adjustments of inductance and capacitance, which would be less effective against rapid transmitter frequency changes or capacitance drift. As a solution, we enhanced the interceptor so that it can intrude and steal energy within just three cycles of the high-frequency signal. Moreover, it works without any circuit parameters or look-up tables. The key innovation is synchronizing the receiver current with the phase of the magnetic sensor voltage. Through MATLAB/Simulink simulations, finite-element analysis, and experimental validation, we demonstrated that our improved method can steal over 76% of the power received by a fully resonant receiver under identical conditions. This attack demonstrates that simple frequency-changing power encryption offers limited protection against such threats.
comment: 11 pages, 15 figures
Approximate Model Predictive Control for Microgrid Energy Management via Imitation Learning
Efficient energy management is essential for reliable and sustainable microgrid operation amid increasing renewable integration. This paper proposes an imitation learning-based framework to approximate mixed-integer Economic Model Predictive Control (EMPC) for microgrid energy management. The proposed method trains a neural network to imitate expert EMPC control actions from offline trajectories, enabling fast, real-time decision making without solving optimization problems online. To enhance robustness and generalization, the learning process includes noise injection during training to mitigate distribution shift and explicitly incorporates forecast uncertainty in renewable generation and demand. Simulation results demonstrate that the learned policy achieves economic performance comparable to EMPC while only requiring $10\%$ of the computation time of optimization-based EMPC in practice.
comment: Submitted to Engineering Applications of Artificial Intelligence (EAAI) and IFAC WC 2026
IMAS$^2$: Joint Agent Selection and Information-Theoretic Coordinated Perception In Dec-POMDPs
We study the problem of jointly selecting sensing agents and synthesizing decentralized active perception policies for the chosen subset of agents within a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) framework. Our approach employs a two-layer optimization structure. In the inner layer, we introduce information-theoretic metrics, defined by the mutual information between the unknown trajectories or some hidden property in the environment and the collective partial observations in the multi-agent system, as a unified objective for active perception problems. We employ various optimization methods to obtain optimal sensor policies that maximize mutual information for distinct active perception tasks. In the outer layer, we prove that under certain conditions, the information-theoretic objectives are monotone and submodular with respect to the subset of observations collected from multiple agents. We then exploit this property to design an IMAS$^2$ (Information-theoretic Multi-Agent Selection and Sensing) algorithm for joint sensing agent selection and sensing policy synthesis. Since the policy search space is infinite, we adapt the classical Nemhauser-Wolsey argument to prove that the proposed IMAS$^2$ algorithm can provide a tight $(1 - 1/e)$-guarantee on the performance. Finally, we demonstrate the effectiveness of our approach on a multi-agent cooperative perception task in a grid-world environment.
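The $(1 - 1/e)$ bound mentioned above is the classical guarantee for greedy maximization of a monotone submodular set function under a cardinality constraint. The sketch below shows that greedy step on a toy set-coverage objective, which is monotone and submodular. The coverage function stands in for the mutual-information objective, the agent footprints are made up, and the code is not the IMAS$^2$ algorithm itself.

```python
from itertools import combinations

# Hypothetical sensing footprints: agent -> set of grid cells it can observe.
FOOTPRINTS = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 5},
    "C": {3, 4, 6},
    "D": {7},
}


def coverage(selected, footprints):
    """Number of distinct cells observed by the selected agents."""
    covered = set()
    for a in selected:
        covered |= footprints[a]
    return len(covered)


def greedy_select(footprints, k, value=coverage):
    """Pick k agents, each time adding the one with the largest marginal gain."""
    chosen = []
    for _ in range(k):
        remaining = [a for a in footprints if a not in chosen]
        base = value(chosen, footprints)
        best = max(remaining, key=lambda a: value(chosen + [a], footprints) - base)
        chosen.append(best)
    return chosen


def brute_force_best(footprints, k, value=coverage):
    """Exhaustive optimum, feasible only for tiny instances."""
    return max(value(list(s), footprints) for s in combinations(footprints, k))


if __name__ == "__main__":
    picked = greedy_select(FOOTPRINTS, 2)
    print(picked, coverage(picked, FOOTPRINTS), brute_force_best(FOOTPRINTS, 2))
```

On this instance greedy is not optimal, since its first pick overlaps with the best pair, but its value still satisfies the $(1 - 1/e)$ bound relative to the exhaustive optimum.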
Modeling and Simulation of an Active Car Suspension with a Robust LQR Controller under Road Disturbance, Parameter Uncertainty and White Noise
Vehicle suspension is important for passenger comfort and for reducing exposure to vibration and shock. A good suspension system improves road holding, allows vehicles to corner safely, and reduces the risk of traffic accidents. Passive suspension is the most widely used type in vehicles due to its simple structure and low cost. Passive suspension systems have no actuator and therefore no controller, whereas active suspension systems have both. Although active systems are more complex and costly, they are safer. PID controllers are widely used in active suspension systems due to their simple structure, reasonable cost, and easy tuning of coefficients. In this study, an LQR-controlled active suspension was designed to be more robust than both a passive suspension and a PID-controlled active suspension. Robustness analyses were performed for all three. Suspension travel, sprung mass acceleration, and sprung mass motion were simulated for all three suspensions under road disturbance, under simultaneous road disturbance and parameter uncertainty, and under road disturbance with white noise. A comparative analysis was performed using the rise time, overshoot, and settling time of the suspensions under the different conditions. The LQR-controlled active suspension showed the fastest rise time, the least overshoot, and the shortest settling time. These results indicate that the LQR-controlled active suspension provides a more comfortable and safer ride than the other two suspension systems.
comment: 20 pages, 19 figures
AttentionSwarm: Reinforcement Learning with Attention Control Barrier Function for Crazyflie Drones in Dynamic Environments
We introduce AttentionSwarm, a novel benchmark designed to evaluate safe and efficient swarm control in a dynamic drone racing scenario. Central to our approach is the Attention Model-Based Control Barrier Function (CBF) framework, which integrates attention mechanisms with safety-critical control theory to enable real-time collision avoidance and trajectory optimization. This framework dynamically prioritizes critical obstacles and agents in the swarm's vicinity using attention weights, while CBFs formally guarantee safety by enforcing collision-free constraints. The AttentionSwarm algorithm was developed and evaluated using a swarm of Crazyflie 2.1 micro quadrotors, which were tested indoors with the Vicon motion capture system to ensure precise localization and control. Experimental results show that our system achieves a 95-100% collision-free navigation rate in a dynamic multi-agent drone racing environment, underscoring its effectiveness and robustness in real-world scenarios. This work offers a promising foundation for safe, high-speed multi-robot applications in logistics, inspection, and racing.
comment: 6 pages, 6 figures
A Data-Driven Method to Identify Major Contributors to Low-Frequency Oscillations
We present a purely data-driven method to pinpoint generation plants that significantly contribute to poorly damped oscillations as part of post-event analysis. First, Extended Dynamic Mode Decomposition (EDMD) is applied on PMU data from the point of interconnection (POI) of the plants to obtain the finite-dimensional Koopman operator. Then, modal analysis is performed on a reduced-order Koopman operator to extract spatio-temporal patterns. The data-driven eigenvalues and eigenvectors quantify each plant's contribution to critical oscillatory modes without requiring any system model information. We demonstrate the effectiveness of this method through simulated case studies on modified IEEE 39-bus and WECC 179-bus test systems by benchmarking the data-driven results against ground-truth models. Its performance is further validated using PMU data from real oscillation events in the ISO-New England system. This data-driven method offers a practical tool for both planning-stage simulations and post-event analysis of real oscillation events, enabling effective mitigation.
comment: 10 pages, 11 figures, Journal paper. Submitted to IEEE Transactions on Power Systems
Planning of Off-Grid Renewable Power to Ammonia Systems with Heterogeneous Flexibility: A Multistakeholder Equilibrium Perspective
Off-grid renewable power to ammonia (ReP2A) systems present a promising pathway toward carbon neutrality in both the energy and chemical industries. However, due to chemical safety requirements, the limited flexibility of ammonia synthesis poses a challenge when attempting to align with the variable hydrogen flow produced from renewable power. This necessitates the optimal sizing of equipment capacity for effective and coordinated production across the system. Additionally, an ReP2A system may involve multiple stakeholders with varying degrees of operational flexibility, complicating the planning problem. This paper is the first to examine the multistakeholder sizing equilibrium (MSSE) of the ReP2A system. First, we propose an MSSE model that accounts for individual planning decisions and the competing economic interests of the stakeholders of power generation, hydrogen production, and ammonia synthesis. We then construct an equivalent optimization problem based on Karush-Kuhn-Tucker (KKT) conditions to determine the equilibrium. Following this, we decompose the problem in the temporal dimension and solve it via multicut generalized Benders decomposition (GBD) to address long-term balancing issues. Case studies based on a realistic project reveal that the equilibrium does not naturally balance the interests of all stakeholders due to their heterogeneous characteristics. Our findings suggest that benefit transfer or re-arrangement can ensure mutual benefits and the successful implementation of ReP2A projects.
comment: Accepted in IEEE Transactions on Power Systems © 2025 IEEE
Optimal Investment Portfolio of Thyristor- and IGBT-based Electrolysis Rectifiers in Utility-scale Renewable P2H Systems
Renewable power-to-hydrogen (ReP2H) systems require rectifiers to supply power to electrolyzers (ELZs). Two main types of rectifiers, insulated-gate bipolar transistor rectifiers (IGBT-Rs) and thyristor rectifiers (TRs), offer distinct tradeoffs. IGBT-Rs provide flexible reactive power control but are costly, whereas TRs are more affordable with lower power loss but consume a large amount of uncontrollable reactive power. A mixed configuration of rectifiers in utility-scale ReP2H systems could achieve a good tradeoff and increase overall profitability. To explore this potential, this paper proposes an optimal investment portfolio model. First, we model and compare the active and reactive power characteristics of ELZs powered by TRs and IGBT-Rs. Second, we consider the investment in ELZs, rectifiers, and var resources and coordinate the operation of renewables, energy storage, var resources, and the on-off switching and load allocation of multiple ELZs. Subsequently, a two-stage stochastic programming (SP) model based on weighted information gap decision theory (W-IGDT) is developed to address the uncertainties of the renewable power and hydrogen price, and we apply the progressive hedging (PH) algorithm to accelerate its solution. Case studies demonstrate that optimal rectifier configurations increase revenue by up to 13.78% compared with configurations using only TRs or IGBT-Rs, existing project setups, or intuitive designs. Under the optimal portfolio, reactive power compensation investment is nearly eliminated, with a preferred TR-to-IGBT-R ratio of 3:1.
comment: Accepted in IEEE Transactions on Sustainable Energy © 2025 IEEE
QPPG: Quantum-Preconditioned Policy Gradient for Link Adaptation in Rayleigh Fading Channels
Reliable link adaptation is critical for efficient wireless communications in dynamic fading environments. However, reinforcement learning (RL) solutions often suffer from unstable convergence due to poorly conditioned policy gradients, hindering their practical application. We propose the quantum-preconditioned policy gradient (QPPG) algorithm, which leverages Fisher-information-based preconditioning to stabilise and accelerate policy updates. Evaluations in Rayleigh fading scenarios show that QPPG achieves faster convergence, a 28.6% increase in average throughput, and a 43.8% decrease in average transmit power compared to classical methods. This work introduces quantum-geometric conditioning to link adaptation, marking a significant advance in developing robust, quantum-inspired reinforcement learning for future 6G networks, thereby enhancing communication reliability and energy efficiency.
comment: Submitted to IEEE Wireless Communications Letters
Time-causal and time-recursive wavelets
When applying wavelet analysis to real-time temporal signals, where the future cannot be accessed, it is essential to base all the steps in the signal processing pipeline on computational mechanisms that are truly time-causal. This paper describes how a time-causal wavelet analysis can be performed based on concepts developed in the area of temporal scale-space theory, originating from a complete classification of temporal smoothing kernels that guarantee non-creation of new structures from finer to coarser temporal scale levels. By necessity, convolution with truncated exponential kernels in cascade constitutes the only permissible class of kernels, with their temporal derivatives as a natural complement to fulfil the admissibility conditions of wavelet representations. For a particular way of choosing the time constants in the resulting infinite convolution of truncated exponential kernels, to ensure temporal scale covariance and thus self-similarity over temporal scales, we describe how mother wavelets can be chosen as temporal derivatives of the resulting time-causal limit kernel. By developing connections between wavelet theory and scale-space theory, we characterize and quantify how the continuous scaling properties transfer to the discrete implementation, demonstrating how the proposed time-causal wavelet representation can reflect the duration of locally dominant temporal structures in the input signals. We propose that this notion of time-causal wavelet analysis could be a valuable tool for signal processing tasks where streams of signals are to be processed in real time, specifically for signals that may contain local variations over a rich span of temporal scales, or more generally for analysing physical or biophysical temporal phenomena, where a fully time-causal analysis is called for to be physically realistic.
comment: 25 pages, 10 figures
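A cascade of truncated exponential kernels has a simple discrete counterpart: a chain of first-order recursive filters, each of which uses only past and present samples. The sketch below is an approximation under stated assumptions, not the paper's implementation: the infinite cascade is cut to `K` stages, the scale levels are spaced geometrically with ratio `c`, the temporal derivative is replaced by a backward difference, and all function names are hypothetical.

```python
import math


def time_constants(tau_max, c=2.0, K=6):
    """Per-stage parameters mu_k for K cascaded first-order filters whose
    variances add up to tau_max, with geometrically spaced scale levels."""
    taus = [tau_max * c ** (2 * (k - K)) for k in range(1, K + 1)]
    prev, mus = 0.0, []
    for tau in taus:
        dtau = tau - prev                      # variance added by this stage
        mus.append((math.sqrt(1 + 4 * dtau) - 1) / 2)  # solves mu^2 + mu = dtau
        prev = tau
    return mus


def recursive_smooth(signal, mu):
    """First-order recursive filter; its kernel is a geometric sequence
    with mean mu and variance mu^2 + mu, supported on n >= 0 only."""
    out, y = [], 0.0
    for x in signal:
        y += (x - y) / (1 + mu)
        out.append(y)
    return out


def causal_wavelet_response(signal, tau_max, c=2.0, K=6):
    """Smooth with the cascade, then take a backward difference as a
    stand-in for a first-order temporal derivative."""
    y = list(signal)
    for mu in time_constants(tau_max, c, K):
        y = recursive_smooth(y, mu)
    return [y[0]] + [y[n] - y[n - 1] for n in range(1, len(y))]


if __name__ == "__main__":
    step = [0.0] * 50 + [1.0] * 50
    resp = causal_wavelet_response(step, tau_max=16.0)
    print(max(range(len(resp)), key=lambda n: resp[n]))  # index of peak response
```

Because every stage is recursive in time, the output at sample n depends only on inputs up to n, so the response to a step cannot begin before the step itself.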
Physics-Informed Reinforcement Learning for Large-Scale EV Smart Charging Considering Distribution Network Voltage Constraints
Electric Vehicles (EVs) offer substantial flexibility for grid services, yet large-scale, uncoordinated charging can threaten voltage stability in distribution networks. Existing Reinforcement Learning (RL) approaches for smart charging often disregard physical grid constraints or perform poorly on complex large-scale tasks, limiting their scalability and real-world applicability. This paper introduces a physics-informed (PI) RL algorithm that integrates a differentiable power flow model and voltage-based reward design into the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, enabling EVs to deliver real-time voltage support while meeting user demands. The resulting PI-TD3 algorithm achieves faster convergence, improved sample efficiency, and reliable voltage magnitude regulation under uncertain and overloaded conditions. Benchmarks on the IEEE 34-bus and 123-bus networks show that the proposed PI-TD3 outperforms both model-free RL and optimization-based baselines in grid constraint management, user satisfaction, and economic metrics, even as the system scales to hundreds of EVs. These advances enable robust, scalable, and practical EV charging strategies that enhance grid resilience and support distribution network operation.
Optimization and Control Technologies for Renewable-Dominated Hydrogen-Blended Integrated Gas-Electricity System: A Review
The growing coupling among electricity, gas, and hydrogen systems is driven by green hydrogen blending into existing natural gas pipelines, paving the way toward a renewable-dominated energy future. However, the integration poses significant challenges, particularly ensuring efficient and safe operation under varying hydrogen penetration and infrastructure adaptability. This paper reviews progress in optimization and control technologies for hydrogen-blended integrated gas-electricity systems. First, key technologies and international demonstration projects are introduced to provide an overview of current developments. Next, advances in gas-electricity system integration, including modeling, scheduling, planning, and market design, are reviewed in turn. Then, the potential for cross-system fault propagation is highlighted, and practical methods for safety analysis and control are proposed. Finally, several possible research directions are introduced, aiming to ensure efficient renewable integration and reliable operation.
comment: Accepted by CSEE Journal of Power and Energy Systems in Oct. 2025
Equity-aware Design and Timing of Fare-free Transit Zoning under Demand Uncertainty
We propose the first analytical stochastic model for optimizing the configuration and implementation policies of fare-free transit. The model focuses on a transportation corridor with two transportation modes: automobiles and buses. The corridor is divided into two sections, an inner one with fare-free transit service and an outer one with fare-based transit service. Under the static version of the model, the optimized length and frequency of the fare-free transit zone can be determined by maximizing total social welfare. The findings indicate that implementing fare-free transit can increase transit ridership and reduce automobile use within the fare-free zone while social equity among the demand groups can be enhanced by lengthening the fare-free zone. Notably, the optimal zone length increases when both social welfare and equity are considered jointly, compared to only prioritizing social welfare. The dynamic model, framed within a market entry and exit real options approach, solves the fare policy switching problem, establishing optimal timing policies for activating or terminating fare-free service. The results from dynamic models reveal earlier implementation and extended durations of fare-free transit in the social welfare-aware regime, driven by lower thresholds compared to the social equity-aware regime.
Systems and Control (EESS)
Nodal Capacity Expansion Planning with Flexible Large-Scale Load Siting
We propose explicitly incorporating large-scale load siting into a stochastic nodal power system capacity expansion planning model that concurrently co-optimizes generation, transmission, and storage expansion. The potential operational flexibility of some of these large loads is also taken into account by considering them as consisting of a set of tranches with different reliability requirements, which are modeled as a constraint on expected served energy across operational scenarios. We implement our model as a two-stage stochastic mixed-integer optimization problem with cross-scenario expectation constraints. To overcome the challenge of scalability, we build upon existing work to implement this model on a high performance computing platform and exploit scenario parallelization using an augmented Progressive Hedging Algorithm. The algorithm is implemented using the bounding features of mpisppy, which have been shown to provide satisfactory provable optimality gaps despite the absence of theoretical guarantees of convergence. We test our approach to assess the value of this proactive planning framework on total system cost and reliability metrics using realistic test cases geographically assigned to San Diego and South Carolina, with datacenter and direct air capture facilities as large loads.
Bridging Earth and Space: A Survey on HAPS for Non-Terrestrial Networks
High-altitude platform stations (HAPS) are emerging as key enablers in the evolution of 6G wireless networks, bridging terrestrial and non-terrestrial infrastructures. Operating in the stratosphere, HAPS can provide wide-area, low-latency, energy-efficient broadband communications with flexible deployment options for diverse applications. This survey delivers a comprehensive overview of HAPS use cases, technologies, and integration strategies within the 6G ecosystem. The roles of HAPS in extending connectivity to underserved regions, supporting dynamic backhauling, enabling massive IoT, and delivering reliable low-latency communications for autonomous and immersive services are discussed. The paper reviews state-of-the-art architectures for terrestrial and non-terrestrial network integration and highlights recent field trials. Furthermore, key enabling technologies such as channel modeling, AI-driven resource allocation, interference control, mobility management, and energy-efficient communications are examined. The paper also outlines open research challenges. By addressing existing gaps in the literature, this survey positions HAPS as a foundational component of globally integrated, resilient, and sustainable 6G networks.
comment: 30 pages. This work has been submitted to IEEE Communications Surveys & Tutorials (under review)
Optimal Kron-based Reduction of Networks (Opti-KRON) for Three-phase Distribution Feeders
This paper presents a novel structure-preserving, Kron-based reduction framework for unbalanced distribution feeders. The method aggregates electrically similar nodes within a mixed-integer optimization (MIP) problem to produce reduced networks that optimally reproduce the voltage profiles of the original full network. To overcome computational bottlenecks of MIP formulations, we propose an exhaustive-search formulation to identify optimal aggregation decisions while enforcing voltage margin limits. The proposed exhaustive network reduction algorithm is parallelizable on GPUs, which enables scalable network reduction. The resulting reduced networks approximate the full system's voltage profiles with low errors and are suitable for steady-state analysis and optimal power flow studies. The framework is validated on two real utility distribution feeders with 5,991 and 8,381 nodes. The reduced models achieve up to 90% and 80% network reduction, respectively, while the maximum voltage-magnitude error remains below 0.003 p.u. Furthermore, on a 1000-node version of the network, the GPU-accelerated reduction algorithm runs up to 15x faster than its CPU-based counterpart.
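For readers unfamiliar with the underlying operation, the sketch below performs plain Kron reduction of a nodal admittance matrix by eliminating unwanted nodes one at a time via the Schur complement. It is a single-phase toy with made-up line admittances and a hypothetical function name. It does not include the paper's optimal aggregation decisions, voltage-margin limits, three-phase modelling, or GPU parallelization.

```python
def kron_reduce(Y, keep):
    """Eliminate every node not listed in `keep` from admittance matrix Y.

    Each elimination applies Y[i][j] -= Y[i][e] * Y[e][j] / Y[e][e]
    over the nodes that are still active, which is the Schur complement
    with respect to node e. Entries may be real or complex.
    """
    n = len(Y)
    Y = [row[:] for row in Y]          # work on a copy
    active = list(range(n))
    for e in [i for i in range(n) if i not in keep]:
        active.remove(e)
        for i in active:
            for j in active:
                Y[i][j] -= Y[i][e] * Y[e][j] / Y[e][e]
    return [[Y[i][j] for j in keep] for i in keep]


if __name__ == "__main__":
    # Chain 0 - 1 - 2 - 3 with line admittance 3 on every segment.
    Y = [
        [3.0, -3.0, 0.0, 0.0],
        [-3.0, 6.0, -3.0, 0.0],
        [0.0, -3.0, 6.0, -3.0],
        [0.0, 0.0, -3.0, 3.0],
    ]
    print(kron_reduce(Y, keep=[0, 3]))
```

Removing the two interior nodes of the chain leaves a single equivalent branch between the end nodes whose admittance matches the series combination of the three segments, so the reduced model reproduces the terminal behaviour of the full one.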
Control Barrier Functions for the Full Class of Signal Temporal Logic Tasks using Spatiotemporal Tubes
This paper introduces a new framework for synthesizing time-varying control barrier functions (TV-CBFs) for general Signal Temporal Logic (STL) specifications using spatiotemporal tubes (STT). We first formulate the STT synthesis as a robust optimization problem (ROP) and solve it through a scenario optimization problem (SOP), providing formal guarantees that the resulting tubes capture the given STL specifications. These STTs are then used to construct TV-CBFs, ensuring that under any control law rendering them invariant, the system satisfies the STL tasks. We demonstrate the framework through case studies on a differential-drive mobile robot and a quadrotor, and provide a comparative analysis showing improved efficiency over existing approaches.
Multi-UAV Flood Monitoring via CVT with Gaussian Mixture of Density Functions for Coverage Control
This study presents a control strategy for coordinating multiple unmanned aerial vehicles (UAVs) to monitor unknown flood regions and estimate the extent of inundation. The proposed method adopts a density-driven coverage framework based on Centroidal Voronoi Tessellation (CVT), in which the density function is modeled using a Gaussian Mixture of Density Functions (GMDF). This formulation provides a more accurate characterization of inundated areas compared to conventional axis-aligned Gaussian models. The performance of the two density modeling approaches is systematically evaluated under different UAV fleet sizes (16, 20, and 24), with multiple simulation trials conducted in the ROS/Gazebo environment. The results show that the GMDF-based formulation consistently achieves higher coverage rates, demonstrating its effectiveness in enhancing flood monitoring and improving UAV spatial distribution.
comment: 9 pages, 6 figures
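A density-driven CVT update can be sketched as a discrete Lloyd iteration. The example below is an assumption-laden illustration, not the authors' controller: it uses isotropic Gaussian components for brevity, a fixed sample grid on the unit square, and hypothetical mixture parameters and agent positions.

```python
import math


def gaussian_mixture(components):
    """Return a density phi(x, y) as a weighted sum of isotropic Gaussians.

    components: list of (weight, cx, cy, sigma) tuples (hypothetical values).
    """
    def phi(x, y):
        return sum(
            w * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * s ** 2))
            for w, cx, cy, s in components
        )
    return phi


def lloyd_step(agents, points, phi):
    """Move every agent to the density-weighted centroid of its Voronoi cell."""
    mass = [0.0] * len(agents)
    mx = [0.0] * len(agents)
    my = [0.0] * len(agents)
    for x, y in points:
        # The nearest agent owns the sample point (discrete Voronoi partition).
        i = min(range(len(agents)),
                key=lambda a: (agents[a][0] - x) ** 2 + (agents[a][1] - y) ** 2)
        w = phi(x, y)
        mass[i] += w
        mx[i] += w * x
        my[i] += w * y
    return [
        (mx[i] / mass[i], my[i] / mass[i]) if mass[i] > 0.0 else agents[i]
        for i in range(len(agents))
    ]


n = 20
grid = [((i + 0.5) / n, (j + 0.5) / n) for i in range(n) for j in range(n)]
phi = gaussian_mixture([(1.0, 0.3, 0.3, 0.1), (0.5, 0.7, 0.6, 0.15)])
agents = [(0.1, 0.1), (0.9, 0.1), (0.5, 0.9)]
for _ in range(30):
    agents = lloyd_step(agents, grid, phi)
```

Repeating the step pulls agents toward regions where the density is high, which is the mechanism a coverage controller exploits to concentrate UAVs over inundated areas.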
Risk Assessment of an Autonomous Underwater Snake Robot in Confined Operations
The growing interest in ocean discovery imposes a need for inspection and intervention in confined and demanding environments. Eely's slender shape, in addition to its ability to change its body configurations, makes articulated underwater robots an adequate option for such environments. However, operation of Eely in such environments imposes demanding requirements on the system, as it must deal with uncertain and unstructured environments, extreme environmental conditions, and reduced navigational capabilities. This paper proposes a Bayesian approach to assess the risks of losing Eely during two mission scenarios. The goal of this work is to improve Eely's performance and the likelihood of mission success. Sensitivity analysis results are presented in order to demonstrate the causes having the highest impact on losing Eely.
comment: 9 pages, 6 figures, Accepted for publication in OCEANS 2023 - Limerick
Managing Charging Induced Grid Stress and Battery Degradation in Electric Taxi Fleets
Operating fleets of electric vehicles (EVs) introduces several challenges, some of which are borne by the fleet operator, and some of which are borne by the power grid. To maximize short-term profit, a fleet operator could always charge EVs at the maximum rate to ensure vehicles are ready to service ride demand. However, due to the stochastic nature of electricity demand, charging EVs at their maximum rate may increase grid stress and lead to overall instability. Furthermore, high-rate charging of EVs can accelerate battery degradation, thereby reducing the service lifespan of the fleet. This study aims to reconcile the conflicting incentives of fleet longevity, short-term profitability, and grid stability by simulating a taxi fleet throughout its lifespan in relation to its charging policies and service conditions. We develop an EV fleet simulator to evaluate the battery degradation due to unpredictable charging and ride demand. The simulator also assesses the impact of these activities on the power grid through the charging infrastructure. This simulation utilizes publicly accessible real-world travel data from the NYC taxi dataset. We compare a baseline 80-20 fleet charging policy with a reinforcement learning-based policy designed to prolong the fleet's service life and alleviate grid stress. We monitor grid stress, battery degradation, and profitability over five years and find that our learned policy outperforms the baseline. This simulator enables fleet operators to assess the impact of different charging policies on these indicators to make informed decisions in the future.
comment: 7 pages, 8 figures, to be published in the proceedings of 2025 IEEE Innovative Smart Grid Technologies - Asia (ISGT-Asia)
Magnetic field estimation using Gaussian process regression for interactive wireless power system design
Wireless power transfer (WPT) with coupled resonators offers a promising solution for the seamless powering of electronic devices. Interactive design approaches that visualize the magnetic field and power transfer efficiency based on system geometry adjustments can facilitate the understanding and exploration of the behavior of these systems for dynamic applications. However, typical electromagnetic field simulation methods, such as the Method of Moments (MoM), require significant computational resources, limiting the rate at which computation can be performed for acceptable interactivity. Furthermore, the system's sensitivity to positional and geometrical changes necessitates a large number of simulations, and structures such as ferromagnetic shields further complicate these simulations. Here, we introduce a machine learning approach using Gaussian Process Regression (GPR), demonstrating for the first time the rapid estimation of the entire magnetic field and power transfer efficiency for near-field coupled systems. To achieve quick and accurate estimation, we develop 3D adaptive grid systems and an active learning strategy to effectively capture the nonlinear interactions between complex system geometries and magnetic fields. By training a regression model, our approach achieves magnetic field computation with sub-second latency and with an average error of less than 6% when validated against independent electromagnetic simulation results.
comment: 29 pages, 8 figures, 1 table
Spatiotemporal Tubes based Control of Unknown Multi-Agent Systems for Temporal Reach-Avoid-Stay Tasks
The paper focuses on designing a controller for unknown dynamical multi-agent systems to achieve temporal reach-avoid-stay tasks for each agent while preventing inter-agent collisions. The main objective is to generate a spatiotemporal tube (STT) for each agent and thereby devise a closed-form, approximation-free, and decentralized control strategy that ensures the system trajectory reaches the target within a specific time while avoiding time-varying unsafe sets and collisions with other agents. In order to achieve this, the requirements of STTs are formulated as a robust optimization problem (ROP) and solved using a sampling-based scenario optimization problem (SOP) to address the issue of infeasibility caused by the infinite number of constraints in ROP. The STTs are generated by solving the SOP, and the corresponding closed-form control is designed to fulfill the specified task. Finally, the effectiveness of our approach is demonstrated through two case studies, one involving omnidirectional robots and the other involving multiple drones modelled as Euler-Lagrange systems.
Query-Efficient Zeroth-Order Algorithms for Nonconvex Optimization
Zeroth-order optimization (ZO) has been a powerful framework for solving black-box problems, which estimates gradients using zeroth-order data to update variables iteratively. The practical applicability of ZO critically depends on the efficiency of single-step gradient estimation and the overall query complexity. However, existing ZO algorithms cannot achieve efficiency on both simultaneously. In this work, we consider a general constrained optimization model with black-box objective and constraint functions. To solve it, we propose novel algorithms that can achieve the state-of-the-art overall query complexity bound of $\mathcal{O}(d/\epsilon^4)$ to find an $\epsilon$-stationary solution ($d$ is the dimension of variable space), while reducing the queries for estimating a single-step gradient from $\mathcal{O}(d)$ to $\mathcal{O}(1)$. Specifically, we integrate block updates with gradient descent ascent and a block gradient estimator, which leads to two algorithms, ZOB-GDA and ZOB-SGDA, respectively. Instead of constructing full gradients, they estimate only partial gradients along random blocks of dimensions, where the adjustable block sizes enable high single-step efficiency without sacrificing convergence guarantees. Our theoretical results establish the finite-sample convergence of the proposed algorithms for nonconvex optimization. Finally, numerical experiments on a practical problem demonstrate that our algorithms require over ten times fewer queries than existing methods.
comment: 34 pages, 4 figures
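The idea of estimating only a partial gradient along a random block can be illustrated with a few lines of code. This is a simplified sketch under stated assumptions: it uses a forward-difference estimator and plain unconstrained descent, whereas the abstract describes constrained problems handled with gradient descent ascent; the function names, step size, and smoothing parameter `mu` are hypothetical.

```python
import random


def block_gradient_estimate(f, x, block, mu=1e-6):
    """Forward-difference estimate of the partial gradient along `block`.

    Uses len(block) + 1 function queries, independent of the dimension d.
    Coordinates outside the block are left at zero.
    """
    f0 = f(x)
    grad = [0.0] * len(x)
    for i in block:
        shifted = list(x)
        shifted[i] += mu
        grad[i] = (f(shifted) - f0) / mu
    return grad


def zo_block_descent(f, x0, block_size=1, step=0.1, iters=2000, seed=0):
    """Plain block-coordinate descent driven only by function values."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        block = rng.sample(range(len(x)), block_size)
        g = block_gradient_estimate(f, x, block)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def sphere(x):
    return sum(v * v for v in x)


x_final = zo_block_descent(sphere, [1.0, -2.0, 3.0, 0.5])
```

With `block_size=1` each iteration costs two function queries regardless of the dimension, which is the per-step saving the abstract refers to; larger blocks trade more queries per step for more progress per step.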
Policy Gradient Method for LQG Control via Input-Output-History Representation: Convergence to $O(ε)$-Stationary Points
We study the policy gradient method (PGM) for the linear quadratic Gaussian (LQG) dynamic output-feedback control problem using an input-output-history (IOH) representation of the closed-loop system. First, we show that any dynamic output-feedback controller is equivalent to a static partial-state feedback gain for a new system representation characterized by a finite-length IOH. Leveraging this equivalence, we reformulate the search for an optimal dynamic output feedback controller as an optimization problem over the corresponding partial-state feedback gain. Next, we introduce a relaxed version of the IOH-based LQG problem by incorporating a small process noise with covariance $\epsilon I$ into the new system to ensure coerciveness, a key condition for establishing gradient-based convergence guarantees. Consequently, we show that a vanilla PGM for the relaxed problem converges to an $\mathcal{O}(\epsilon)$-stationary point, i.e., $\overline{K}$ satisfying $\|\nabla J(\overline{K})\|_F \leq \mathcal{O}(\epsilon)$, where $J$ denotes the original LQG cost. Numerical experiments empirically indicate convergence to the vicinity of the globally optimal LQG controller.
Safe Output-Feedback Adaptive Optimal Control of Affine Nonlinear Systems
In this paper, we develop a safe control synthesis method that integrates state estimation and parameter estimation within an adaptive optimal control (AOC) and control barrier function (CBF)-based control architecture. The developed approach decouples safety objectives from the learning objectives using a CBF-based guarding controller where the CBFs are robustified to account for the lack of full-state measurements. The coupling of this guarding controller with the AOC-based stabilizing control guarantees safety and regulation despite the lack of full state measurement. The paper leverages recent advancements in deep neural network-based adaptive observers to ensure safety in the presence of state estimation errors. Safety and convergence guarantees are provided using a Lyapunov-based analysis, and the effectiveness of the developed controller is demonstrated through simulation under mild excitation conditions.
Ultra-Fast Wireless Power Hacking
The rapid growth of electric vehicles (EVs) has driven the development of roadway wireless charging technology, effectively extending EV driving range. However, wireless charging introduces significant cybersecurity challenges. Any receiver within the magnetic field can potentially extract energy, and previous research demonstrated that a hacker could detect the operating frequency and steal substantial power. However, our approach required time to track new frequencies or precise adjustments of inductance and capacitance, which would be less effective against potential rapid transmitter frequency changes or capacitance drift. As a solution, we enhanced the interceptor and enabled it to intrude as well as steal energy within just three cycles of the high-frequency signal. Moreover, it can work without any circuit parameters or look-up tables. The key innovation is synchronizing the receiver current with the phase of the magnetic sensor voltage. Through MATLAB / Simulink simulations, finite-element analysis, and experimental validation, we demonstrated that our improved method can steal over 76% of the power received by a fully resonant receiver under identical conditions. This attack demonstrates that simple frequency-changing power encryption offers limited protection against such threats.
comment: 11 pages, 15 figures
Approximate Model Predictive Control for Microgrid Energy Management via Imitation Learning
Efficient energy management is essential for reliable and sustainable microgrid operation amid increasing renewable integration. This paper proposes an imitation learning-based framework to approximate mixed-integer Economic Model Predictive Control (EMPC) for microgrid energy management. The proposed method trains a neural network to imitate expert EMPC control actions from offline trajectories, enabling fast, real-time decision making without solving optimization problems online. To enhance robustness and generalization, the learning process includes noise injection during training to mitigate distribution shift and explicitly incorporates forecast uncertainty in renewable generation and demand. Simulation results demonstrate that the learned policy achieves economic performance comparable to EMPC while only requiring $10\%$ of the computation time of optimization-based EMPC in practice.
comment: Submitted to Engineering Applications of Artificial Intelligence (EAAI) and IFAC WC 2026
IMAS$^2$: Joint Agent Selection and Information-Theoretic Coordinated Perception In Dec-POMDPs
We study the problem of jointly selecting sensing agents and synthesizing decentralized active perception policies for the chosen subset of agents within a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) framework. Our approach employs a two-layer optimization structure. In the inner layer, we introduce information-theoretic metrics, defined by the mutual information between the unknown trajectories or some hidden property in the environment and the collective partial observations in the multi-agent system, as a unified objective for active perception problems. We employ various optimization methods to obtain optimal sensor policies that maximize mutual information for distinct active perception tasks. In the outer layer, we prove that under certain conditions, the information-theoretic objectives are monotone and submodular with respect to the subset of observations collected from multiple agents. We then exploit this property to design an IMAS$^2$ (Information-theoretic Multi-Agent Selection and Sensing) algorithm for joint sensing agent selection and sensing policy synthesis. However, since the policy search space is infinite, we adapt the classical Nemhauser-Wolsey argument to prove that the proposed IMAS$^2$ algorithm can provide a tight $(1 - 1/e)$-guarantee on the performance. Finally, we demonstrate the effectiveness of our approach on a multi-agent cooperative perception task in a grid-world environment.
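The outer-layer selection relies on the classical greedy rule for monotone submodular objectives. The sketch below uses set coverage as a stand-in objective, since it is a textbook monotone submodular function; it is not the mutual-information objective or the IMAS$^2$ algorithm itself, and the agent names and fields of view are hypothetical.

```python
def greedy_select(candidates, k):
    """Pick k agents greedily by marginal gain in covered cells.

    candidates: dict mapping agent name -> set of cells it can observe.
    Coverage is monotone and submodular, so the greedy value is at least
    (1 - 1/e) times the best value achievable with k agents.
    """
    chosen, covered = [], set()
    for _ in range(min(k, len(candidates))):
        best = max(
            (name for name in candidates if name not in chosen),
            key=lambda name: len(candidates[name] - covered),
        )
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered


# Hypothetical fields of view on a grid world, cells labelled by integers.
fields_of_view = {
    "uav_a": {1, 2, 3},
    "uav_b": {3, 4},
    "uav_c": {4, 5, 6, 7},
}
team, seen = greedy_select(fields_of_view, 2)
```

Here the greedy rule first takes the agent with the largest field of view and then the one that adds the most new cells, so overlap with already covered cells is discounted automatically.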
Modeling and Simulation of an Active Car Suspension with a Robust LQR Controller under Road Disturbance, Parameter Uncertainty and White Noise
Vehicle suspension is important for passengers to travel comfortably and to be less exposed to effects such as vibration and shock. A good suspension system increases the road holding of vehicles, allows them to take turns safely, and reduces the risk of traffic accidents. A passive suspension system is the most widely used suspension system in vehicles due to its simple structure and low cost. Passive suspension systems do not have an actuator and therefore do not have a controller. Active suspension systems have an actuator and a controller. Although their structures are more complex and costly, they are safer. The PID controller is widely used in active suspension systems due to its simple structure, reasonable cost, and easy adjustment of coefficients. In this study, an LQR-controlled active suspension was designed to be more robust than both a passive suspension and a PID-controlled active suspension. Robustness analyses were performed for passive suspension, PID-controlled active suspension, and LQR-controlled active suspension. Suspension travel, sprung mass acceleration, and sprung mass motion simulations were performed for all three suspensions under road disturbance, under simultaneous road disturbance and parameter uncertainty, and under road disturbance with white noise. A comparative analysis was performed by obtaining the rise time, overshoot, and settling time data of the suspensions under different conditions. The LQR-controlled active suspension showed the fastest rise time, the least overshoot, and the shortest settling time. These results show that the LQR-controlled active suspension provided a more comfortable and safer ride than the other two suspension systems.
comment: 20 pages, 19 figures
AttentionSwarm: Reinforcement Learning with Attention Control Barrier Function for Crazyflie Drones in Dynamic Environments
We introduce AttentionSwarm, a novel benchmark designed to evaluate safe and efficient swarm control in a dynamic drone racing scenario. Central to our approach is the Attention Model-Based Control Barrier Function (CBF) framework, which integrates attention mechanisms with safety-critical control theory to enable real-time collision avoidance and trajectory optimization. This framework dynamically prioritizes critical obstacles and agents in the swarm's vicinity using attention weights, while CBFs formally guarantee safety by enforcing collision-free constraints. The AttentionSwarm algorithm was developed and evaluated using a swarm of Crazyflie 2.1 micro quadrotors, which were tested indoors with the Vicon motion capture system to ensure precise localization and control. Experimental results show that our system achieves a 95-100% collision-free navigation rate in a dynamic multi-agent drone racing environment, underscoring its effectiveness and robustness in real-world scenarios. This work offers a promising foundation for safe, high-speed multi-robot applications in logistics, inspection, and racing.
comment: 6 pages, 6 figures
A Data-Driven Method to Identify Major Contributors to Low-Frequency Oscillations
We present a purely data-driven method to pinpoint generation plants that significantly contribute to poorly damped oscillations as part of post-event analysis. First, Extended Dynamic Mode Decomposition (EDMD) is applied on PMU data from the point of interconnection (POI) of the plants to obtain the finite-dimensional Koopman operator. Then, modal analysis is performed on a reduced-order Koopman operator to extract spatio-temporal patterns. The data-driven eigenvalues and eigenvectors quantify each plant's contribution to critical oscillatory modes without requiring any system model information. We demonstrate the effectiveness of this method through simulated case studies on modified IEEE 39-bus and WECC 179-bus test systems by benchmarking the data-driven results against ground-truth models. Its performance is further validated using PMU data from real oscillation events in the ISO-New England system. This data-driven method offers a practical tool for both planning-stage simulations and post-event analysis of real oscillation events, enabling effective mitigation.
comment: 10 pages, 11 figures, Journal paper. Submitted to IEEE Transactions on Power Systems
Planning of Off-Grid Renewable Power to Ammonia Systems with Heterogeneous Flexibility: A Multistakeholder Equilibrium Perspective
Off-grid renewable power to ammonia (ReP2A) systems present a promising pathway toward carbon neutrality in both the energy and chemical industries. However, due to chemical safety requirements, the limited flexibility of ammonia synthesis poses a challenge when attempting to align with the variable hydrogen flow produced from renewable power. This necessitates the optimal sizing of equipment capacity for effective and coordinated production across the system. Additionally, an ReP2A system may involve multiple stakeholders with varying degrees of operational flexibility, complicating the planning problem. This paper first examines the multistakeholder sizing equilibrium (MSSE) of the ReP2A system. Specifically, we propose an MSSE model that accounts for individual planning decisions and the competing economic interests of the stakeholders of power generation, hydrogen production, and ammonia synthesis. We then construct an equivalent optimization problem based on Karush-Kuhn-Tucker (KKT) conditions to determine the equilibrium. Following this, we decompose the problem in the temporal dimension and solve it via multicut generalized Benders decomposition (GBD) to address long-term balancing issues. Case studies based on a realistic project reveal that the equilibrium does not naturally balance the interests of all stakeholders due to their heterogeneous characteristics. Our findings suggest that benefit transfer or re-arrangement can ensure mutual benefits and the successful implementation of ReP2A projects.
comment: Accepted in IEEE Transactions on Power Systems © 2025 IEEE
Optimal Investment Portfolio of Thyristor- and IGBT-based Electrolysis Rectifiers in Utility-scale Renewable P2H Systems
Renewable power-to-hydrogen (ReP2H) systems require rectifiers to supply power to electrolyzers (ELZs). Two main types of rectifiers, insulated-gate bipolar transistor rectifiers (IGBT-Rs) and thyristor rectifiers (TRs), offer distinct tradeoffs. IGBT-Rs provide flexible reactive power control but are costly, whereas TRs are more affordable with lower power loss but consume a large amount of uncontrollable reactive power. A mixed configuration of rectifiers in utility-scale ReP2H systems could achieve a decent tradeoff and increase overall profitability. To explore this potential, this paper proposes an optimal investment portfolio model. First, we model and compare the active and reactive power characteristics of ELZs powered by TRs and IGBT-Rs. Second, we consider the investment of ELZs, rectifiers, and var resources and coordinate the operation of renewables, energy storage, var resources, and the on-off switching and load allocation of multiple ELZs. Subsequently, a two-stage stochastic programming (SP) model based on weighted information gap decision theory (W-IGDT) is developed to address the uncertainties of the renewable power and hydrogen price, and we apply the progressive hedging (PH) algorithm to accelerate its solution. Case studies demonstrate that optimal rectifier configurations increase revenue by at most 13.78% compared with configurations using only TRs or IGBT-Rs, existing project setups, or intuitive designs. Under the optimal portfolio, reactive power compensation investment is nearly eliminated, with a preferred TR-to-IGBT-R ratio of 3:1.
comment: Accepted in IEEE Transactions on Sustainable Energy © 2025 IEEE
QPPG: Quantum-Preconditioned Policy Gradient for Link Adaptation in Rayleigh Fading Channels
Reliable link adaptation is critical for efficient wireless communications in dynamic fading environments. However, reinforcement learning (RL) solutions often suffer from unstable convergence due to poorly conditioned policy gradients, hindering their practical application. We propose the quantum-preconditioned policy gradient (QPPG) algorithm, which leverages Fisher-information-based preconditioning to stabilise and accelerate policy updates. Evaluations in Rayleigh fading scenarios show that QPPG achieves faster convergence, a 28.6% increase in average throughput, and a 43.8% decrease in average transmit power compared to classical methods. This work introduces quantum-geometric conditioning to link adaptation, marking a significant advance in developing robust, quantum-inspired reinforcement learning for future 6G networks, thereby enhancing communication reliability and energy efficiency.
comment: Submitted to IEEE Wireless Communications Letters
Time-causal and time-recursive wavelets
When applying wavelet analysis to real-time temporal signals, where the future cannot be accessed, it is essential to base all the steps in the signal processing pipeline on computational mechanisms that are truly time-causal. This paper describes how a time-causal wavelet analysis can be performed based on concepts developed in the area of temporal scale-space theory, originating from a complete classification of temporal smoothing kernels that guarantee non-creation of new structures from finer to coarser temporal scale levels. By necessity, convolution with truncated exponential kernels in cascade constitutes the only permissible class of kernels, with their temporal derivatives as a natural complement to fulfil the admissibility conditions of wavelet representations. For a particular way of choosing the time constants in the resulting infinite convolution of truncated exponential kernels, to ensure temporal scale covariance and thus self-similarity over temporal scales, we describe how mother wavelets can be chosen as temporal derivatives of the resulting time-causal limit kernel. By developing connections between wavelet theory and scale-space theory, we characterize and quantify how the continuous scaling properties transfer to the discrete implementation, demonstrating how the proposed time-causal wavelet representation can reflect the duration of locally dominant temporal structures in the input signals. We propose that this notion of time-causal wavelet analysis could be a valuable tool for signal processing tasks, where streams of signals are to be processed in real time, specifically for signals that may contain local variations over a rich span of temporal scales, or more generally for analysing physical or biophysical temporal phenomena, where a fully time-causal analysis is called for to be physically realistic.
comment: 25 pages, 10 figures
Physics-Informed Reinforcement Learning for Large-Scale EV Smart Charging Considering Distribution Network Voltage Constraints
Electric Vehicles (EVs) offer substantial flexibility for grid services, yet large-scale, uncoordinated charging can threaten voltage stability in distribution networks. Existing Reinforcement Learning (RL) approaches for smart charging often disregard physical grid constraints or perform poorly on complex large-scale tasks, limiting their scalability and real-world applicability. This paper introduces a physics-informed (PI) RL algorithm that integrates a differentiable power flow model and voltage-based reward design into the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, enabling EVs to deliver real-time voltage support while meeting user demands. The resulting PI-TD3 algorithm achieves faster convergence, improved sample efficiency, and reliable voltage magnitude regulation under uncertain and overloaded conditions. Benchmarks on the IEEE 34-bus and 123-bus networks show that the proposed PI-TD3 outperforms both model-free RL and optimization-based baselines in grid constraint management, user satisfaction, and economic metrics, even as the system scales to hundreds of EVs. These advances enable robust, scalable, and practical EV charging strategies that enhance grid resilience and support distribution network operation.
Optimization and Control Technologies for Renewable-Dominated Hydrogen-Blended Integrated Gas-Electricity System: A Review
The growing coupling among electricity, gas, and hydrogen systems is driven by green hydrogen blending into existing natural gas pipelines, paving the way toward a renewable-dominated energy future. However, the integration poses significant challenges, particularly ensuring efficient and safe operation under varying hydrogen penetration and infrastructure adaptability. This paper reviews progress in optimization and control technologies for hydrogen-blended integrated gas-electricity systems. First, key technologies and international demonstration projects are introduced to provide an overview of current developments. Next, advances in gas-electricity system integration, including modeling, scheduling, planning, and market design, are reviewed in turn. Then, the potential for cross-system fault propagation is highlighted, and practical methods for safety analysis and control are proposed. Finally, several possible research directions are introduced, aiming to ensure efficient renewable integration and reliable operation.
comment: Accepted by CSEE Journal of Power and Energy Systems in Oct. 2025
Equity-aware Design and Timing of Fare-free Transit Zoning under Demand Uncertainty
We propose the first analytical stochastic model for optimizing the configuration and implementation policies of fare-free transit. The model focuses on a transportation corridor with two transportation modes: automobiles and buses. The corridor is divided into two sections, an inner one with fare-free transit service and an outer one with fare-based transit service. Under the static version of the model, the optimized length and frequency of the fare-free transit zone can be determined by maximizing total social welfare. The findings indicate that implementing fare-free transit can increase transit ridership and reduce automobile use within the fare-free zone while social equity among the demand groups can be enhanced by lengthening the fare-free zone. Notably, the optimal zone length increases when both social welfare and equity are considered jointly, compared to only prioritizing social welfare. The dynamic model, framed within a market entry and exit real options approach, solves the fare policy switching problem, establishing optimal timing policies for activating or terminating fare-free service. The results from dynamic models reveal earlier implementation and extended durations of fare-free transit in the social welfare-aware regime, driven by lower thresholds compared to the social equity-aware regime.
Robotics
MADR: MPC-guided Adversarial DeepReach
Hamilton-Jacobi (HJ) Reachability offers a framework for generating safe value functions and policies in the face of adversarial disturbance, but is limited by the curse of dimensionality. Physics-informed deep learning is able to overcome this infeasibility, but itself suffers from slow and inaccurate convergence, primarily due to weak PDE gradients and the complexity of self-supervised learning. A few recent works have demonstrated that enriching the self-supervision process with regular supervision (based on the nature of the optimal control problem) greatly accelerates convergence and improves solution quality; however, these have been limited to single-player problems and simple games. In this work, we introduce MADR: MPC-guided Adversarial DeepReach, a general framework to robustly approximate the two-player, zero-sum differential game value function. In doing so, MADR yields the corresponding optimal strategies for both players in zero-sum games as well as safe policies for worst-case robustness. We test MADR on a multitude of high-dimensional simulated and real robotic agents with varying dynamics and games, finding that our approach significantly outperforms state-of-the-art baselines in simulation and produces impressive results in hardware.
comment: 8 pages, under review
Online Object-Level Semantic Mapping for Quadrupeds in Real-World Environments
We present an online semantic object mapping system for a quadruped robot operating in real indoor environments, turning sensor detections into named objects in a global map. During a run, the mapper integrates range geometry with camera detections, merges co-located detections within a frame, and associates repeated detections into persistent object instances across frames. Objects remain in the map when they are out of view, and repeated sightings update the same instance rather than creating duplicates. The output is a compact object layer that can be queried (class, pose, and confidence), is integrated with the occupancy map and readable by a planner. In on-robot tests, the layer remained stable across viewpoint changes.
comment: Published at the Italian Conference on Robotics and Intelligent Machines (I-RIM) 3D, 2025
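Associating repeated detections into persistent instances can be sketched as gated nearest-neighbour matching per class. This is an illustrative assumption rather than the system's actual data association: the `ObjectInstance` type, the 2D map-frame coordinates, the running-mean update, and the `gate` distance are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ObjectInstance:
    label: str
    x: float
    y: float
    hits: int = 1


def integrate_detection(object_map, label, x, y, gate=0.5):
    """Fuse one detection (map frame, metres) into the object layer.

    A detection updates the nearest instance of the same class within `gate`
    metres; otherwise it starts a new instance. `gate` is a hypothetical
    tuning value.
    """
    best, best_dist = None, gate
    for obj in object_map:
        dist = math.hypot(obj.x - x, obj.y - y)
        if obj.label == label and dist <= best_dist:
            best, best_dist = obj, dist
    if best is None:
        best = ObjectInstance(label, x, y)
        object_map.append(best)
    else:
        # Running mean keeps the pose estimate stable across repeated sightings.
        best.hits += 1
        best.x += (x - best.x) / best.hits
        best.y += (y - best.y) / best.hits
    return best


object_map = []
detections = [("chair", 1.0, 0.0), ("chair", 1.1, 0.0), ("chair", 5.0, 5.0), ("table", 1.0, 0.1)]
for det in detections:
    integrate_detection(object_map, *det)
```

Because instances are only updated when a matching detection arrives, objects that leave the field of view simply stay in the list, which mirrors the persistence behaviour described in the abstract.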
Sharing the Load: Distributed Model-Predictive Control for Precise Multi-Rover Cargo Transport
For autonomous cargo transportation, teams of mobile robots can provide more operational flexibility than a single large robot. In these scenarios, precision in both inter-vehicle distance and path tracking is key. With this motivation, we develop a distributed model-predictive controller (MPC) for multi-vehicle cargo operations that builds on the precise path-tracking of lidar teach and repeat. To carry cargo, a following vehicle must maintain a Euclidean distance offset from a lead vehicle regardless of the path curvature. Our approach uses a shared map to localize the robots relative to each other without GNSS or direct observations. We compare our approach to a centralized MPC and a baseline approach that directly measures the inter-vehicle distance. The distributed MPC shows equivalent nominal performance to the more complex centralized MPC. Using a direct measurement of the relative distance between the leader and follower shows improved tracking performance in close-range scenarios but struggles with long-range offsets. The operational flexibility provided by distributing the computation makes it well suited for real deployments. We evaluate four types of convoyed path trackers with over 10 km of driving in a coupled convoy. With convoys of two and three rovers, the proposed distributed MPC method works in real-time to allow map-based convoying to maintain maximum spacing within 20 cm of the target in various conditions.
comment: 8 pages, 4 figures
Event-Grounding Graph: Unified Spatio-Temporal Scene Graph from Robotic Observations
A fundamental aspect of building intelligent autonomous robots that can assist humans in their daily lives is the construction of rich environmental representations. While advances in semantic scene representations have enriched robotic scene understanding, current approaches lack a connection between spatial features and dynamic events; e.g., connecting the blue mug to the event washing a mug. In this work, we introduce the event-grounding graph (EGG), a framework grounding event interactions to spatial features of a scene. This representation allows robots to perceive, reason, and respond to complex spatio-temporal queries. Experiments using real robotic data demonstrate EGG's capability to retrieve relevant information and respond accurately to human inquiries concerning the environment and events within. Furthermore, the EGG framework's source code and evaluation dataset are released as open source at: https://github.com/aalto-intelligent-robotics/EGG.
comment: Submitted to RA-L
Towards An Adaptive Locomotion Strategy For Quadruped Rovers: Quantifying When To Slide Or Walk On Planetary Slopes
Legged rovers provide enhanced mobility compared to wheeled platforms, enabling navigation on steep and irregular planetary terrains. However, traditional legged locomotion might be energetically inefficient and potentially dangerous to the rover on loose and inclined surfaces, such as crater walls and cave slopes. This paper introduces a preliminary study that compares the Cost of Transport (CoT) of walking and torso-based sliding locomotion for quadruped robots across different slopes, friction conditions and speed levels. By identifying intersections between walking and sliding CoT curves, we aim to define threshold conditions that may trigger transitions between the two strategies. The methodology combines physics-based simulations in Isaac Sim with particle interaction validation in ANSYS-Rocky. Our results represent an initial step towards adaptive locomotion strategies for planetary legged rovers.
comment: Published at the 18th Symposium on Advanced Space Technologies in Robotics and Automation (ASTRA 2025)
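As a rough illustration of the threshold idea in the entry above: the Cost of Transport is commonly defined as the dimensionless ratio CoT = P / (m g v). The sketch below computes it and finds where two sampled CoT curves cross by linear interpolation. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import numpy as np

G_EARTH = 9.81  # m/s^2; assumption: replace with lunar or Martian gravity as needed


def cost_of_transport(power_w, mass_kg, speed_mps, g=G_EARTH):
    """Dimensionless Cost of Transport: CoT = P / (m * g * v)."""
    return power_w / (mass_kg * g * speed_mps)


def crossing_points(x, y_a, y_b):
    """Return the x values where two sampled curves intersect.

    Crossings between samples are located by linear interpolation
    of the difference y_a - y_b.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(y_a, dtype=float) - np.asarray(y_b, dtype=float)
    out = []
    for i in range(len(x) - 1):
        if d[i] == 0.0:
            out.append(float(x[i]))  # curves touch exactly at a sample
        elif d[i] * d[i + 1] < 0.0:
            t = d[i] / (d[i] - d[i + 1])  # fraction of the interval to the zero
            out.append(float(x[i] + t * (x[i + 1] - x[i])))
    if d[-1] == 0.0:
        out.append(float(x[-1]))
    return out


# Hypothetical CoT samples over slope angle (degrees); illustrative values only.
slopes = [0.0, 10.0, 20.0]
cot_walk = [1.0, 2.0, 3.0]
cot_slide = [3.0, 2.5, 1.0]
thresholds = crossing_points(slopes, cot_walk, cot_slide)
```

With these made-up samples, the curves cross once between 10 and 20 degrees, which would be the candidate slope for switching from walking to sliding.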
Least Restrictive Hyperplane Control Barrier Functions
Control Barrier Functions (CBFs) can provide provable safety guarantees for dynamic systems. However, finding a valid CBF for a system of interest is often non-trivial, especially if the shape of the unsafe region is complex and the CBFs are of higher order. A common solution to this problem is to make a conservative approximation of the unsafe region in the form of a line/hyperplane, and use the corresponding conservative Hyperplane-CBF when deciding on safe control actions. In this letter, we note that conservative constraints are only a problem if they prevent us from doing what we want. Thus, instead of first choosing a CBF and then choosing a safe control with respect to the CBF, we optimize over a combination of CBFs and safe controls to get as close as possible to our desired control, while still having the safety guarantee provided by the CBF. We call the corresponding CBF the least restrictive Hyperplane-CBF. Finally, we also provide a way of creating a smooth parameterization of the CBF-family for the optimization, and illustrate the approach on a double integrator dynamical system with acceleration constraints, moving through a group of arbitrarily shaped static and moving obstacles.
C-SWAP: Explainability-Aware Structured Pruning for Efficient Neural Networks Compression BMVC2025
Neural network compression has gained increasing attention in recent years, particularly in computer vision applications, where the need for model reduction is crucial for overcoming deployment constraints. Pruning is a widely used technique that promotes sparsity in model structures, e.g. weights, neurons, and layers, reducing size and inference costs. Structured pruning is especially important as it allows for the removal of entire structures, which further accelerates inference time and reduces memory overhead. However, it can be computationally expensive, requiring iterative retraining and optimization. To overcome this problem, recent methods consider the one-shot setting, which applies pruning directly post-training. Unfortunately, they often lead to a considerable drop in performance. In this paper, we focus on this issue by proposing a novel one-shot pruning framework that relies on explainable deep learning. First, we introduce a causal-aware pruning approach that leverages cause-effect relations between model predictions and structures in a progressive pruning process. It allows us to efficiently reduce the size of the network, ensuring that the removed structures do not degrade the performance of the model. Then, through experiments conducted on convolutional neural network and vision transformer baselines, pre-trained on classification tasks, we demonstrate that our method consistently achieves substantial reductions in model size, with minimal impact on performance, and without the need for fine-tuning. Overall, our approach outperforms its counterparts, offering the best trade-off. Our code is available on GitHub.
comment: 10 pages, BMVC2025
A Compositional Paradigm for Foundation Models: Towards Smarter Robotic Agents
The birth of Foundation Models brought unprecedented results in a wide range of tasks, from language to vision, to robotic control. These models are able to process huge quantities of data, and can extract and develop rich representations, which can be employed across different domains and modalities. However, they still have issues in adapting to dynamic, real-world scenarios without retraining the entire model from scratch. In this work, we propose the application of Continual Learning and Compositionality principles to foster the development of more flexible, efficient and smart AI solutions.
Quadrupeds for Planetary Exploration: Field Testing Control Algorithms on an Active Volcano
Missions such as the Ingenuity helicopter have shown the advantages of using novel locomotion modes to increase the scientific return of planetary exploration missions. Legged robots can further expand the reach and capability of future planetary missions by traversing more difficult terrain than wheeled rovers, such as jumping over cracks on the ground or traversing rugged terrain with boulders. To develop and test algorithms for using quadruped robots, the AAPLE project was carried out at DFKI. As part of the project, we conducted a series of field experiments on the volcano of the Aeolian island of Vulcano, an active stratovolcano near Sicily, Italy. The experiments focused on validating newly developed state-of-the-art adaptive optimal control algorithms for quadrupedal locomotion in a high-fidelity analog environment for Lunar and Martian surfaces. This paper presents the technical approach, test plan, software architecture, field deployment strategy, and evaluation results from the Vulcano campaign.
comment: Presented at 18th Symposium on Advanced Space Technologies in Robotics and Automation (ASTRA)
Flexbee: A Grasping and Perching UAV Based on Soft Vector-Propulsion Nozzle
The aim of this paper is to design a new type of grasping and perching unmanned aerial vehicle (UAV), called Flexbee, which features a soft vector-propulsion nozzle (SVPN). Compared to previous UAVs, Flexbee integrates flight, grasping, and perching functionalities into the four SVPNs. This integration offers advantages including decoupled position and attitude control, high structural reuse, and strong adaptability for grasping and perching. A dynamics model of Flexbee has been developed, and the nonlinear coupling issue of the moment has been resolved through linearization of the equivalent moment model. A hierarchical control strategy was used to design controllers for the two operational modes of Flexbee. Finally, flight, grasping, and perching experiments were conducted to validate Flexbee's kinematic capabilities and the effectiveness of the control strategy.
comment: 11 pages, 17 figures
EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval NeurIPS 2025
Object-goal navigation (ObjNav) tasks an agent with navigating to the location of a specific object in an unseen environment. Embodied agents equipped with large language models (LLMs) and online constructed navigation maps can perform ObjNav in a zero-shot manner. However, existing agents rely heavily on giant LLMs on the cloud, e.g., GPT-4, while directly switching to small LLMs, e.g., LLaMA3.2-11b, causes significant success rate drops due to limited model capacity for understanding complex navigation maps, which prevents deploying ObjNav on local devices. At the same time, the long prompt introduced by the navigation map description causes high planning latency on local devices. In this paper, we propose EfficientNav to enable on-device efficient LLM-based zero-shot ObjNav. To help the smaller LLMs better understand the environment, we propose semantics-aware memory retrieval to prune redundant information in navigation maps. To reduce planning latency, we propose discrete memory caching and attention-based memory clustering to efficiently save and re-use the KV cache. Extensive experimental results demonstrate that EfficientNav achieves 11.1% improvement in success rate on the HM3D benchmark over GPT-4-based baselines, and demonstrates 6.7x real-time latency reduction and 4.7x end-to-end latency reduction over the GPT-4 planner. Our code will be released soon.
comment: NeurIPS 2025
Efficient Model-Based Reinforcement Learning for Robot Control via Online Learning
We present an online model-based reinforcement learning algorithm suitable for controlling complex robotic systems directly in the real world. Unlike prevailing sim-to-real pipelines that rely on extensive offline simulation and model-free policy optimization, our method builds a dynamics model from real-time interaction data and performs policy updates guided by the learned dynamics model. This efficient model-based reinforcement learning scheme significantly reduces the number of samples needed to train control policies, enabling direct training on real-world rollout data. This significantly reduces the influence of bias in the simulated data, and facilitates the search for high-performance control policies. We adopt online learning analysis to derive sublinear regret bounds under standard stochastic online optimization assumptions, providing formal guarantees on performance improvement as more interaction data are collected. Experimental evaluations were performed on a hydraulic excavator arm and a soft robot arm, where the algorithm demonstrates strong sample efficiency compared to model-free reinforcement learning methods, reaching comparable performance within hours. Robust adaptation to shifting dynamics was also observed when the payload condition was randomized. Our approach paves the way toward efficient and reliable on-robot learning for a broad class of challenging control tasks.
MPC-based motion planning for non-holonomic systems in non-convex domains
Motivated by the application of using model predictive control (MPC) for motion planning of autonomous mobile robots, a form of output tracking MPC for non-holonomic systems and with non-convex constraints is studied. Although the advantages of using MPC for motion planning have been demonstrated in several papers, in most of the available fundamental literature on output tracking MPC it is assumed, often implicitly, that the model is holonomic and generally the state or output constraints must be convex. Thus, in application-oriented publications, empirical results dominate and the topic of proving completeness, in particular under which assumptions the target is always reached, has received comparatively little attention. To address this gap, we present a novel MPC formulation that guarantees convergence to the desired target under realistic assumptions, which can be verified in relevant real-world scenarios.
comment: Preprint of ECC 2025 submission
Biomechanically consistent real-time action recognition for human-robot interaction
This paper presents a novel framework for real-time human action recognition in industrial contexts, using standard 2D cameras. We introduce a complete pipeline for robust, real-time estimation of human joint kinematics, which are input to a temporally smoothed Transformer-based network for action recognition. We rely on a new dataset of 11 subjects performing various actions to evaluate our approach. Unlike most of the literature, which relies on joint center positions (JCP) and runs offline, our method uses biomechanical priors, e.g., joint angles, for fast and robust real-time recognition. In addition, joint angles make the proposed method agnostic to sensor and subject poses as well as to anthropometric differences, and ensure robustness across environments and subjects. Our proposed learning model outperforms the best baseline model, which also runs in real time, across various metrics. It achieves 88% accuracy and generalizes well to subjects not facing the cameras. Finally, we demonstrate the robustness and usefulness of our technique through an online interaction experiment with a simulated robot controlled in real time via the recognized actions.
MMRHP: A Miniature Mixed-Reality HIL Platform for Auditable Closed-Loop Evaluation
Validation of autonomous driving systems requires a trade-off between test fidelity, cost, and scalability. While miniaturized hardware-in-the-loop (HIL) platforms have emerged as a promising solution, a systematic framework supporting rigorous quantitative analysis is generally lacking, limiting their value as scientific evaluation tools. To address this challenge, we propose MMRHP, a miniature mixed-reality HIL platform that elevates miniaturized testing from functional demonstration to rigorous, reproducible quantitative analysis. The core contributions are threefold. First, we propose a systematic three-phase testing process oriented toward the Safety of the Intended Functionality (SOTIF) standard, providing actionable guidance for identifying the performance limits and triggering conditions of otherwise correctly functioning systems. Second, we design and implement a HIL platform centered around a unified spatiotemporal measurement core to support this process, ensuring consistent and traceable quantification of physical motion and system timing. Finally, we demonstrate the effectiveness of this solution through comprehensive experiments. The platform itself was first validated, achieving a spatial accuracy of 10.27 mm RMSE and a stable closed-loop latency baseline of approximately 45 ms. Subsequently, an in-depth Autoware case study leveraged this validated platform to quantify its performance baseline and identify a critical performance cliff at an injected latency of 40 ms. This work shows that a structured process, combined with a platform offering a unified spatio-temporal benchmark, enables reproducible, interpretable, and quantitative closed-loop evaluation of autonomous driving systems.
PGTT: Phase-Guided Terrain Traversal for Perceptive Legged Locomotion
State-of-the-art perceptive Reinforcement Learning controllers for legged robots either (i) impose oscillator or IK-based gait priors that constrain the action space, add bias to the policy optimization and reduce adaptability across robot morphologies, or (ii) operate "blind", struggling to anticipate hind-leg terrain and remaining brittle to noise. In this paper, we propose Phase-Guided Terrain Traversal (PGTT), a perception-aware deep-RL approach that overcomes these limitations by enforcing gait structure purely through reward shaping, thereby reducing inductive bias in policy learning compared to oscillator/IK-conditioned action priors. PGTT encodes per-leg phase as a cubic Hermite spline that adapts swing height to local heightmap statistics and adds a swing-phase contact penalty, while the policy acts directly in joint space, supporting morphology-agnostic deployment. Trained in MuJoCo (MJX) on procedurally generated stair-like terrains with curriculum and domain randomization, PGTT achieves the highest success under push disturbances (median +7.5% vs. the next best method) and on discrete obstacles (+9%), with comparable velocity tracking, and converges to an effective policy roughly 2x faster than strong end-to-end baselines. We validate PGTT on a Unitree Go2 using a real-time LiDAR elevation-to-heightmap pipeline, and we report preliminary results on ANYmal-C obtained with the same hyperparameters. These findings indicate that terrain-adaptive, phase-guided reward shaping is a simple and general mechanism for robust perceptive locomotion across platforms.
comment: 9 pages, 9 figures, 2 tables
Coverage-Recon: Coordinated Multi-Drone Image Sampling with Online Map Feedback
This article addresses collaborative 3D map reconstruction using multiple drones. Achieving high-quality reconstruction requires capturing images of keypoints within the target scene from diverse viewing angles, and coverage control offers an effective framework to meet this requirement. Meanwhile, recent advances in real-time 3D reconstruction algorithms make it possible to render an evolving map during flight, enabling immediate feedback to guide drone motion. Building on this, we present Coverage-Recon, a novel coordinated image sampling algorithm that integrates online map feedback to improve reconstruction quality on-the-fly. In Coverage-Recon, the coordinated motion of drones is governed by a Quadratic Programming (QP)-based angle-aware coverage controller, which ensures multi-viewpoint image capture while enforcing safety constraints. The captured images are processed in real time by the NeuralRecon algorithm to generate an evolving 3D mesh. Mesh changes across the scene are interpreted as indicators of reconstruction uncertainty and serve as feedback to update the importance index of the coverage control as the map evolves. The effectiveness of Coverage-Recon is validated through simulation and experiments, demonstrating both qualitatively and quantitatively that incorporating online map feedback yields more complete and accurate 3D reconstructions than conventional methods. Project page: https://htnk-lab.github.io/coverage-recon/
comment: Submitted to IEEE Transactions on Control Systems Technology (under review). Project page: https://htnk-lab.github.io/coverage-recon/
MoTVLA: A Vision-Language-Action Model with Unified Fast-Slow Reasoning
Integrating visual-language instructions into visuomotor policies is gaining momentum in robot learning for enhancing open-world generalization. Despite promising advances, existing approaches face two challenges: limited language steerability when no generated reasoning is used as a condition, or significant inference latency when reasoning is incorporated. In this work, we introduce MoTVLA, a mixture-of-transformers (MoT)-based vision-language-action (VLA) model that integrates fast-slow unified reasoning with behavior policy learning. MoTVLA preserves the general intelligence of pre-trained VLMs (serving as the generalist) for tasks such as perception, scene understanding, and semantic planning, while incorporating a domain expert, a second transformer that shares knowledge with the pretrained VLM, to generate domain-specific fast reasoning (e.g., robot motion decomposition), thereby improving policy execution efficiency. By conditioning the action expert on decomposed motion instructions, MoTVLA can learn diverse behaviors and substantially improve language steerability. Extensive evaluations across natural language processing benchmarks, robotic simulation environments, and real-world experiments confirm the superiority of MoTVLA in both fast-slow reasoning and manipulation task performance.
MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation
Imitation learning from large-scale, diverse human demonstrations has proven effective for training robots, but collecting such data is costly and time-consuming. This challenge is amplified for multi-step bimanual mobile manipulation, where humans must teleoperate both a mobile base and two high-degree-of-freedom arms. Prior automated data generation frameworks have addressed static bimanual manipulation by augmenting a few human demonstrations in simulation, but they fall short for mobile settings due to two key challenges: (1) determining base placement to ensure reachability, and (2) positioning the camera to provide sufficient visibility for visuomotor policies. To address these issues, we introduce MoMaGen, which formulates data generation as a constrained optimization problem that enforces hard constraints (e.g., reachability) while balancing soft constraints (e.g., visibility during navigation). This formulation generalizes prior approaches and provides a principled foundation for future methods. We evaluate MoMaGen on four multi-step bimanual mobile manipulation tasks and show that it generates significantly more diverse datasets than existing methods. Leveraging this diversity, MoMaGen can train successful imitation learning policies from a single source demonstration, and these policies can be fine-tuned with as few as 40 real-world demonstrations to achieve deployment on physical robotic hardware. More details are available at our project page: momagen.github.io.
comment: Project website: momagen.github.io. The first four authors contribute equally
A Cross-Environment and Cross-Embodiment Path Planning Framework via a Conditional Diffusion Model
Path planning for a robotic system in high-dimensional cluttered environments needs to be efficient, safe, and adaptable for different environments and hardware. Conventional methods face high computation time and require extensive parameter tuning, while prior learning-based methods still fail to generalize effectively. The primary goal of this research is to develop a path planning framework capable of generalizing to unseen environments and new robotic manipulators without the need for retraining. We present GADGET (Generalizable and Adaptive Diffusion-Guided Environment-aware Trajectory generation), a diffusion-based planning model that generates joint-space trajectories conditioned on voxelized scene representations as well as start and goal configurations. A key innovation is GADGET's hybrid dual-conditioning mechanism that combines classifier-free guidance via learned scene encoding with classifier-guided Control Barrier Function (CBF) safety shaping, integrating environment awareness with real-time collision avoidance directly in the denoising process. This design supports zero-shot transfer to new environments and robotic embodiments without retraining. Experimental results show that GADGET achieves high success rates with low collision intensity in spherical-obstacle, bin-picking, and shelf environments, with CBF guidance further improving safety. Moreover, comparative evaluations indicate strong performance relative to both sampling-based and learning-based baselines. Furthermore, GADGET provides transferability across Franka Panda, Kinova Gen3 (6/7-DoF), and UR5 robots, and physical execution on a Kinova Gen3 demonstrates its ability to generate safe, collision-free trajectories in real-world settings.
comment: 20 pages, 9 figures
Safe Active Navigation and Exploration for Planetary Environments Using Proprioceptive Measurements
Legged robots can sense terrain through force interactions during locomotion, offering more reliable traversability estimates than remote sensing and serving as scouts for guiding wheeled rovers in challenging environments. However, even legged scouts face challenges when traversing highly deformable or unstable terrain. We present Safe Active Exploration for Granular Terrain (SAEGT), a navigation framework that enables legged robots to safely explore unknown granular environments using proprioceptive sensing, particularly where visual input fails to capture terrain deformability. SAEGT estimates the safe region and frontier region online from leg-terrain interactions using Gaussian Process regression for traversability assessment, with a reactive controller for real-time safe exploration and navigation. SAEGT demonstrated its ability to safely explore and navigate toward a specified goal using only proprioceptively estimated traversability in simulation.
Kinematic Analysis and Integration of Vision Algorithms for a Mobile Manipulator Employed Inside a Self-Driving Laboratory
Recent advances in robotics and autonomous systems have broadened the use of robots in laboratory settings, including automated synthesis, scalable reaction workflows, and collaborative tasks in self-driving laboratories (SDLs). This paper presents a comprehensive development of a mobile manipulator designed to assist human operators in such autonomous lab environments. Kinematic modeling of the manipulator is carried out based on the Denavit-Hartenberg (DH) convention, and an inverse kinematics solution is determined to enable precise and adaptive manipulation capabilities. A key focus of this research is enhancing the manipulator's ability to reliably grasp textured objects as a critical component of autonomous handling tasks. Advanced vision-based algorithms are implemented to perform real-time object detection and pose estimation, guiding the manipulator in dynamic grasping and following tasks. In this work, we integrate a vision method that combines feature-based detection with homography-driven pose estimation, leveraging depth information to represent an object's pose as a 2D planar projection within 3D space. This adaptive capability enables the system to accommodate variations in object orientation and supports robust autonomous manipulation across diverse environments. By enabling autonomous experimentation and human-robot collaboration, this work contributes to the scalability and reproducibility of next-generation chemical laboratories.
comment: International Journal of Intelligent Robotics and Applications 2025
Sample-Based Hybrid Mode Control: Asymptotically Optimal Switching of Algorithmic and Non-Differentiable Control Modes
This paper investigates a sample-based solution to the hybrid mode control problem across non-differentiable and algorithmic hybrid modes. Our approach reasons about a set of hybrid control modes as an integer-based optimization problem where we select what mode to apply, when to switch to another mode, and the duration for which we are in a given control mode. A sample-based variation is derived to efficiently search the integer domain for optimal solutions. We find our formulation yields strong performance guarantees that can be applied to a number of robotics-related tasks. In addition, our approach is able to synthesize complex algorithms and policies to compound behaviors and achieve challenging tasks. Last, we demonstrate the effectiveness of our approach in real-world robotic examples that require reactive switching between long-term planning and high-frequency control.
Local Guidance for Configuration-Based Multi-Agent Pathfinding
Guidance is an emerging concept that improves the empirical performance of real-time, sub-optimal multi-agent pathfinding (MAPF) methods. It offers additional information to MAPF algorithms to mitigate congestion on a global scale by considering the collective behavior of all agents across the entire workspace. This global perspective helps reduce agents' waiting times, thereby improving overall coordination efficiency. In contrast, this study explores an alternative approach: providing local guidance in the vicinity of each agent. While such localized methods involve recomputation as agents move and may appear computationally demanding, we empirically demonstrate that supplying informative spatiotemporal cues to the planner can significantly improve solution quality without exceeding a moderate time budget. When applied to LaCAM, a leading configuration-based solver, this form of guidance establishes a new performance frontier for MAPF.
comment: 10 pages
A Learning-based Model Reference Adaptive Controller Implemented on a Prosthetic Hand Wrist
The functionality and natural motion of prosthetic hands remain limited by the challenges in controlling compliant wrist mechanisms. Current control strategies often lack adaptability and incur high computational costs, which impedes real-time deployment in assistive robotics. To address this gap, this study presents a computationally efficient Neural Network (NN)-based Model Reference Adaptive Controller (MRAC) for a tendon-driven soft continuum wrist integrated with a prosthetic hand. The dynamic modeling of the wrist is formulated using Timoshenko beam theory, capturing both shear and bending deformations. The proposed NN-MRAC estimates the required tendon forces from deflection errors and minimizes deviation from a reference model through online adaptation. Simulation results demonstrate improved precision with a root mean square error (RMSE) of $6.14 \times 10^{-4}$ m and a settling time of $3.2$ s. Experimental validations confirm real-time applicability, with an average RMSE of $5.66 \times 10^{-3}$ m, steady-state error of $8.05 \times 10^{-3}$ m, and settling time of $1.58$ s. These results highlight the potential of the controller to enhance motion accuracy and responsiveness in soft prosthetic systems, thereby advancing the integration of adaptive intelligent control in wearable assistive devices.
comment: International Conference on Social Robotics + AI
Convex Maneuver Planning for Spacecraft Collision Avoidance
Conjunction analysis and maneuver planning for spacecraft collision avoidance remain a manual and time-consuming process, typically involving repeated forward simulations of hand-designed maneuvers. With the growing density of satellites in low-Earth orbit (LEO), autonomy is becoming essential for efficiently evaluating and mitigating collisions. In this work, we present an algorithm to design low-thrust collision-avoidance maneuvers for short-term conjunction events. We first formulate the problem as a nonconvex quadratically-constrained quadratic program (QCQP), which we then relax into a convex semidefinite program (SDP) using Shor's relaxation. We demonstrate empirically that the relaxation is tight, which enables the recovery of globally optimal solutions to the original nonconvex problem. Our formulation produces a minimum-energy solution while ensuring a desired probability of collision at the time of closest approach. Finally, if the desired probability of collision cannot be satisfied, we relax this constraint into a penalty, yielding a minimum-risk solution. We validate our algorithm with a high-fidelity simulation of a satellite conjunction in low-Earth orbit with a simulated conjunction data message (CDM), demonstrating its effectiveness in reducing collision risk.
comment: 8 pages, 6 figures, Accepted to International Space Robotics Conference
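For readers unfamiliar with the relaxation named in the entry above, the textbook form of Shor's relaxation for a generic QCQP is sketched below. The symbols $P_i$, $q_i$, $r_i$, $x$, and $X$ are generic placeholders and are not the paper's notation or its specific maneuver-planning formulation.

```latex
% Generic (possibly nonconvex) QCQP in the variable x \in \mathbb{R}^n,
% with symmetric P_i:
\begin{aligned}
\min_{x} \quad & x^\top P_0 x + 2 q_0^\top x + r_0 \\
\text{s.t.} \quad & x^\top P_i x + 2 q_i^\top x + r_i \le 0, \qquad i = 1, \dots, m .
\end{aligned}

% Shor's relaxation: lift X = x x^\top, use x^\top P x = \operatorname{tr}(P X),
% and replace the rank-one equality by a semidefinite constraint:
\begin{aligned}
\min_{x,\, X} \quad & \operatorname{tr}(P_0 X) + 2 q_0^\top x + r_0 \\
\text{s.t.} \quad & \operatorname{tr}(P_i X) + 2 q_i^\top x + r_i \le 0, \qquad i = 1, \dots, m, \\
& \begin{bmatrix} X & x \\ x^\top & 1 \end{bmatrix} \succeq 0 .
\end{aligned}
```

The relaxed problem is a convex SDP whose optimal value lower-bounds that of the QCQP. When the optimal block matrix has rank one, $X = x x^\top$ holds, the relaxation is tight, and $x$ is globally optimal for the original problem.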
Macroscopic EEG Reveals Discriminative Low-Frequency Oscillations in Plan-to-Grasp Visuomotor Tasks
The vision-based grasping brain network integrates visual perception with cognitive and motor processes for visuomotor tasks. While invasive recordings have successfully decoded localized neural activity related to grasp type planning and execution, macroscopic neural activation patterns captured by noninvasive electroencephalography (EEG) remain far less understood. We introduce a novel vision-based grasping platform to investigate grasp-type-specific (precision, power, no-grasp) neural activity across large-scale brain networks using EEG neuroimaging. The platform isolates grasp-specific planning from its associated execution phases in naturalistic visuomotor tasks, where the Filter-Bank Common Spatial Pattern (FBCSP) technique was designed to extract discriminative frequency-specific features within each phase. Support vector machine (SVM) classification discriminated binary (precision vs. power, grasp vs. no-grasp) and multiclass (precision vs. power vs. no-grasp) scenarios for each phase, and was compared against traditional Movement-Related Cortical Potential (MRCP) methods. Low-frequency oscillations (0.5-8 Hz) carry grasp-related information established during planning and maintained throughout execution, with consistent classification performance across both phases (75.3-77.8%) for precision vs. power discrimination, compared to 61.1% using MRCP. Higher-frequency activity (12-40 Hz) showed phase-dependent results with 93.3% accuracy for grasp vs. no-grasp classification but 61.2% for precision vs. power discrimination. Feature importance using SVM coefficients identified discriminative features within frontoparietal networks during planning and motor networks during execution. This work demonstrated the role of low-frequency oscillations in decoding grasp type during planning using noninvasive EEG.
comment: 12 pages, 8 figures, 1 table
Motion Planning and Control of an Overactuated 4-Wheel Drive with Constrained Independent Steering
This paper addresses motion planning and control of an overactuated 4-wheel drive train with independent steering (4WIS) where mechanical constraints prevent the wheels from executing full 360-degree rotations (swerve). The configuration space of such a robot is constrained and contains discontinuities that affect the smoothness of the robot motion. We introduce a mathematical formulation of the steering constraints and derive discontinuity planes that partition the velocity space into regions of smooth and efficient motion. We further design the motion planner for path tracking and obstacle avoidance that explicitly accounts for swerve constraints and the velocity transition smoothness. The motion controller uses local feedback to generate actuation from the desired velocity, while properly handling the discontinuity crossing by temporarily stopping the motion and repositioning the wheels. We implement the proposed motion planner as an extension to the ROS Navigation package and evaluate the system in simulation and on a physical robot.
comment: 7 pages, 5 figures, 3 tables, video available at https://youtu.be/8l9s2Wb_vec, To appear at IEEE 2025 International Conference on Advanced Robotics
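To illustrate the kind of steering discontinuity the 4WIS abstract above refers to, here is a minimal sketch of per-wheel inverse kinematics under a limited steering range. It is not the paper's formulation: the function name `wheel_command`, the symmetric `limit`, and the flip-and-reverse rule are assumptions chosen for illustration. When the desired wheel direction falls outside the range, the wheel is pointed the opposite way and driven in reverse, which is where the commanded angle jumps.

```python
import math


def wheel_command(vx, vy, omega, wheel_x, wheel_y, limit=math.pi / 2):
    """Return (steering angle in rad, signed drive speed) for one wheel.

    (vx, vy, omega) is the desired body twist and (wheel_x, wheel_y) is the
    wheel position in the body frame. `limit` is a hypothetical symmetric
    steering range, assumed to be at least pi/2 so that every direction is
    reachable either forwards or in reverse.
    """
    # Rigid-body velocity of the wheel contact point.
    wx = vx - omega * wheel_y
    wy = vy + omega * wheel_x
    speed = math.hypot(wx, wy)
    if speed == 0.0:
        return 0.0, 0.0  # wheel at rest; any admissible angle would do
    angle = math.atan2(wy, wx)
    if abs(angle) > limit:
        # Outside the steering range: point the wheel the opposite way
        # and drive it in reverse. The angle jumps by pi at this boundary.
        angle -= math.copysign(math.pi, angle)
        speed = -speed
    return angle, speed


# Driving straight backwards: the wheel keeps angle 0 and reverses.
print(wheel_command(-1.0, 0.0, 0.0, 0.3, 0.2))
```

A small change in the desired twist near the boundary produces a jump of pi in the commanded angle, which is consistent with the abstract's description of stopping and repositioning the wheels when a discontinuity is crossed.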
Robust Driving QA through Metadata-Grounded Context and Task-Specific Prompts
We present a two-phase vision-language QA system for autonomous driving that answers high-level perception, prediction, and planning questions. In Phase-1, a large multimodal LLM (Qwen2.5-VL-32B) is conditioned on six-camera inputs, a short temporal window of history, and a chain-of-thought prompt with few-shot exemplars. A self-consistency ensemble (multiple sampled reasoning chains) further improves answer reliability. In Phase-2, we augment the prompt with nuScenes scene metadata (object annotations, ego-vehicle state, etc.) and category-specific question instructions (separate prompts for perception, prediction, planning tasks). In experiments on a driving QA benchmark, our approach significantly outperforms the baseline Qwen2.5 models. For example, using 5 history frames and 10-shot prompting in Phase-1 yields 65.1% overall accuracy (vs. 62.61% with zero-shot); applying self-consistency raises this to 66.85%. Phase-2 achieves 67.37% overall. Notably, the system maintains 96% accuracy under severe visual corruption. These results demonstrate that carefully engineered prompts and contextual grounding can greatly enhance high-level driving QA with pretrained vision-language models.
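The self-consistency ensemble mentioned in the abstract above is commonly implemented as a majority vote over the final answers of several sampled reasoning chains. The sketch below assumes that form; the function name `self_consistent_answer` and the lowercase/strip normalization are illustrative choices, not details taken from the paper.

```python
from collections import Counter


def self_consistent_answer(sampled_answers):
    """Return the most frequent final answer and its share of the votes."""
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    # Normalize so that trivially different strings count as the same vote.
    normalized = [answer.strip().lower() for answer in sampled_answers]
    winner, votes = Counter(normalized).most_common(1)[0]
    return winner, votes / len(normalized)


# Final answers extracted from five hypothetical sampled reasoning chains.
samples = ["B", "b ", "A", "B", "C"]
answer, share = self_consistent_answer(samples)
print(answer, share)
```

The vote share can double as a rough confidence signal: a low share suggests the sampled chains disagree.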
$\nabla$-SDF: Learning Euclidean Signed Distance Functions Online with Gradient-Augmented Octree Interpolation and Neural Residual
Estimation of signed distance functions (SDFs) from point cloud data has been shown to benefit many robot autonomy capabilities, including localization, mapping, motion planning, and control. Methods that support online and large-scale SDF reconstruction tend to rely on discrete volumetric data structures, which affect the continuity and differentiability of the SDF estimates. Recently, using implicit features, neural network methods have demonstrated high-fidelity and differentiable SDF reconstruction, but they tend to be less efficient, can experience catastrophic forgetting and memory limitations in large environments, and are often restricted to truncated SDFs. This work proposes $\nabla$-SDF, a hybrid method that combines an explicit prior obtained from gradient-augmented octree interpolation with an implicit neural residual. Our method achieves non-truncated (Euclidean) SDF reconstruction with computational and memory efficiency comparable to volumetric methods and differentiability and accuracy comparable to neural network methods. Extensive experiments demonstrate that $\nabla$-SDF outperforms the state of the art in terms of accuracy and efficiency, providing a scalable solution for downstream tasks in robotics and computer vision.
SHRUMS: Sensor Hallucination for Real-time Underwater Motion Planning with a Compact 3D Sonar
Autonomous navigation in 3D is a fundamental problem for autonomy. Despite major advancements in terrestrial and aerial settings due to improved range sensors including LiDAR, compact sensors with similar capabilities for underwater robots have only recently become available, in the form of 3D sonars. This paper introduces a novel underwater 3D navigation pipeline, called SHRUMS (Sensor Hallucination for Robust Underwater Motion planning with 3D Sonar). To the best of the authors' knowledge, SHRUMS is the first underwater autonomous navigation stack to integrate a 3D sonar. The proposed pipeline exhibits strong robustness while operating in complex 3D environments in spite of extremely poor visibility conditions. To accommodate the intricacies of the novel sensor data stream while achieving real-time locally optimal performance, SHRUMS introduces the concept of hallucinating sensor measurements from non-existent sensors with convenient arbitrary parameters, tailored to application-specific requirements. The proposed concepts are validated with real 3D sonar sensor data, utilizing real inputs in challenging settings and local maps constructed in real time. Field deployments validating the proposed approach in full are planned in the very near future.
comment: 8 pages, 5 figures
Underwater Dense Mapping with the First Compact 3D Sonar
In the past decade, the adoption of compact 3D range sensors, such as LiDARs, has driven the development of robust state-estimation pipelines, making them a standard sensor for aerial, ground, and space autonomy. Unfortunately, poor propagation of electromagnetic waves underwater has limited the visibility-independent sensing options of underwater state-estimation to acoustic range sensors, which provide 2D information including, at best, spatially ambiguous information. This paper, to the best of our knowledge, is the first study examining the performance, capacity, and opportunities arising from the recent introduction of the first compact 3D sonar. Towards that purpose, we introduce calibration procedures for extracting the extrinsics between the 3D sonar and a camera and we provide a study on acoustic response in different surfaces and materials. Moreover, we provide novel mapping and SLAM pipelines tested in deployments in underwater cave systems and other geometrically and acoustically challenging underwater environments. Our assessment showcases the unique capacity of 3D sonars to capture consistent spatial information, allowing for detailed reconstructions and localization in datasets expanding to hundreds of meters. At the same time, it highlights remaining challenges related to acoustic propagation, as found also in other acoustic sensors. Datasets collected for our evaluations will be released and shared with the community to enable further research advancements.
comment: 8 pages, 12 figures
Towards Proprioceptive Terrain Mapping with Quadruped Robots for Exploration in Planetary Permanently Shadowed Regions
Permanently Shadowed Regions (PSRs) near the lunar poles are of interest for future exploration due to their potential to contain water ice and preserve geological records. Their complex, uneven terrain favors the use of legged robots, which can traverse challenging surfaces while collecting in-situ data, and have proven effective in Earth analogs, including dark caves, when equipped with onboard lighting. While exteroceptive sensors like cameras and lidars can capture terrain geometry and even semantic information, they cannot quantify its physical interaction with the robot, a capability provided by proprioceptive sensing. We propose a terrain mapping framework for quadruped robots, which estimates elevation, foot slippage, energy cost, and stability margins from internal sensing during locomotion. These metrics are incrementally integrated into a multi-layer 2.5D gridmap that reflects terrain interaction from the robot's perspective. The system is evaluated in a simulator that mimics a lunar environment, using the 21 kg quadruped robot Aliengo, showing consistent mapping performance under lunar gravity and terrain conditions.
comment: Published in the Proceedings of the International Conference on Space Robotics (iSpaRo 2025)
Actor-Free Continuous Control via Structurally Maximizable Q-Functions NeurIPS 2025
Value-based algorithms are a cornerstone of off-policy reinforcement learning due to their simplicity and training stability. However, their use has traditionally been restricted to discrete action spaces, as they rely on estimating Q-values for individual state-action pairs. In continuous action spaces, evaluating the Q-value over the entire action space becomes computationally infeasible. To address this, actor-critic methods are typically employed, where a critic is trained on off-policy data to estimate Q-values, and an actor is trained to maximize the critic's output. Despite their popularity, these methods often suffer from instability during training. In this work, we propose a purely value-based framework for continuous control that revisits structural maximization of Q-functions, introducing a set of key architectural and algorithmic choices to enable efficient and stable learning. We evaluate the proposed actor-free Q-learning approach on a range of standard simulation tasks, demonstrating performance and sample efficiency on par with state-of-the-art baselines, without the cost of learning a separate actor. Particularly, in environments with constrained action spaces, where the value functions are typically non-smooth, our method with structural maximization outperforms traditional actor-critic methods with gradient-based maximization. We have released our code at https://github.com/USC-Lira/Q3C.
comment: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
DDBot: Differentiable Physics-based Digging Robot for Unknown Granular Materials
Automating the manipulation of granular materials poses significant challenges due to complex contact dynamics, unpredictable material properties, and intricate system states. Existing approaches often fail to achieve efficiency and accuracy in such tasks. To fill the research gap, this paper studies the small-scale and high-precision granular material digging task with unknown physical properties. A new framework, named differentiable digging robot (DDBot), is proposed to manipulate granular materials, including sand and soil. Specifically, we equip DDBot with a differentiable physics-based simulator, tailored for granular material manipulation, powered by GPU-accelerated parallel computing and automatic differentiation. DDBot can perform efficient differentiable system identification and high-precision digging skill optimisation for unknown granular materials, which is enabled by a differentiable skill-to-action mapping, a task-oriented demonstration method, gradient clipping and line search-based gradient descent. Experimental results show that DDBot can efficiently (converging within 5 to 20 minutes) identify unknown granular material dynamics and optimise digging skills, with high-precision results in zero-shot real-world deployments, highlighting its practicality. Benchmark results against state-of-the-art baselines also confirm the robustness and efficiency of DDBot in such digging tasks.
comment: Accepted as a regular paper by the IEEE Transactions on Robotics
DiffVLA++: Bridging Cognitive Reasoning and End-to-End Driving through Metric-Guided Alignment
Conventional end-to-end (E2E) driving models are effective at generating physically plausible trajectories, but often fail to generalize to long-tail scenarios due to the lack of essential world knowledge to understand and reason about surrounding environments. In contrast, Vision-Language-Action (VLA) models leverage world knowledge to handle challenging cases, but their limited 3D reasoning capability can lead to physically infeasible actions. In this work we introduce DiffVLA++, an enhanced autonomous driving framework that explicitly bridges cognitive reasoning and E2E planning through metric-guided alignment. First, we build a VLA module directly generating semantically grounded driving trajectories. Second, we design an E2E module with a dense trajectory vocabulary that ensures physical feasibility. Third, and most critically, we introduce a metric-guided trajectory scorer that guides and aligns the outputs of the VLA and E2E modules, thereby integrating their complementary strengths. The experiment on the ICCV 2025 Autonomous Grand Challenge leaderboard shows that DiffVLA++ achieves an EPDMS of 49.12.
RAPID Hand Prototype: Design of an Affordable, Fully-Actuated Biomimetic Hand for Dexterous Teleoperation IROS2025
This paper addresses the scarcity of affordable, fully-actuated five-fingered hands for dexterous teleoperation, which is crucial for collecting large-scale real-robot data within the "Learning from Demonstrations" paradigm. We introduce the prototype version of the RAPID Hand, the first low-cost, 20-degree-of-actuation (DoA) dexterous hand that integrates a novel anthropomorphic actuation and transmission scheme with an optimized motor layout and structural design to enhance dexterity. Specifically, the RAPID Hand features a universal phalangeal transmission scheme for the non-thumb fingers and an omnidirectional thumb actuation mechanism. Prioritizing affordability, the hand employs 3D-printed parts combined with custom gears for easier replacement and repair. We assess the RAPID Hand's performance through quantitative metrics and qualitative testing in a dexterous teleoperation system, which is evaluated on three challenging tasks: multi-finger retrieval, ladle handling, and human-like piano playing. The results indicate that the RAPID Hand's fully actuated 20-DoF design holds significant promise for dexterous teleoperation.
comment: Accepted by IROS2025
Learn2Decompose: Learning Problem Decomposition for Efficient Sequential Multi-object Manipulation Planning
We present an efficient task and motion replanning approach for sequential multi-object manipulation in dynamic environments. Conventional Task And Motion Planning (TAMP) solvers experience an exponential increase in planning time as the planning horizon and number of objects grow, limiting their applicability in real-world scenarios. To address this, we propose learning problem decompositions from demonstrations to accelerate TAMP solvers. Our approach consists of three key components: goal decomposition learning, computational distance learning, and object reduction. Goal decomposition identifies the necessary sequences of states that the system must pass through before reaching the final goal, treating them as subgoal sequences. Computational distance learning predicts the computational complexity between two states, enabling the system to identify the temporally closest subgoal from a disturbed state. Object reduction minimizes the set of active objects considered during replanning, further improving efficiency. We evaluate our approach on three benchmarks, demonstrating its effectiveness in improving replanning efficiency for sequential multi-object manipulation tasks in dynamic environments.
comment: Extension of RAL version: added PR2 Whole-body kitchen task and detailed discussion on limitations in main text; added pseudocode and robustness analysis of our approach, and formal analysis on why and when task goals are decomposable in appendix
NEBULA: Do We Evaluate Vision-Language-Action Agents Correctly?
The evaluation of Vision-Language-Action (VLA) agents is hindered by the coarse, end-task success metric that fails to provide precise skill diagnosis or measure robustness to real-world perturbations. This challenge is exacerbated by a fragmented data landscape that impedes reproducible research and the development of generalist models. To address these limitations, we introduce NEBULA, a unified ecosystem for single-arm manipulation that enables diagnostic and reproducible evaluation. NEBULA features a novel dual-axis evaluation protocol that combines fine-grained capability tests for precise skill diagnosis with systematic stress tests that measure robustness. A standardized API and a large-scale, aggregated dataset are provided to reduce fragmentation and support cross-dataset training and fair comparison. Using NEBULA, we demonstrate that top-performing VLAs struggle with key capabilities such as spatial reasoning and dynamic adaptation, which are consistently obscured by conventional end-task success metrics. By measuring both what an agent can do and when it does so reliably, NEBULA provides a practical foundation for robust, general-purpose embodied agents.
comment: Homepage: https://vulab-ai.github.io/NEBULA-Alpha/
Wonder Wins Ways: Curiosity-Driven Exploration through Multi-Agent Contextual Calibration
Autonomous exploration in complex multi-agent reinforcement learning (MARL) with sparse rewards critically depends on providing agents with effective intrinsic motivation. While artificial curiosity offers a powerful self-supervised signal, it often confuses environmental stochasticity with meaningful novelty. Moreover, existing curiosity mechanisms exhibit a uniform novelty bias, treating all unexpected observations equally. However, peer behavior novelty, which encodes latent task dynamics, is often overlooked, resulting in suboptimal exploration in decentralized, communication-free MARL settings. To this end, inspired by how human children adaptively calibrate their own exploratory behaviors via observing peers, we propose a novel approach to enhance multi-agent exploration. We introduce CERMIC, a principled framework that empowers agents to robustly filter noisy surprise signals and guide exploration by dynamically calibrating their intrinsic curiosity with inferred multi-agent context. Additionally, CERMIC generates theoretically-grounded intrinsic rewards, encouraging agents to explore state transitions with high information gain. We evaluate CERMIC on benchmark suites including VMAS, Meltingpot, and SMACv2. Empirical results demonstrate that exploration with CERMIC significantly outperforms SoTA algorithms in sparse-reward environments.
Seeing through Uncertainty: Robust Task-Oriented Optimization in Visual Navigation
Visual navigation is a fundamental problem in embodied AI, yet practical deployments demand long-horizon planning capabilities to address multi-objective tasks. A major bottleneck is data scarcity: policies learned from limited data often overfit and fail to generalize out of distribution (OOD). Existing neural network-based agents typically increase architectural complexity, which paradoxically becomes counterproductive in the small-sample regime. This paper introduces NeuRO, an integrated learning-to-optimize framework that tightly couples perception networks with downstream task-level robust optimization. Specifically, NeuRO addresses core difficulties in this integration: (i) it transforms noisy visual predictions under data scarcity into convex uncertainty sets using Partially Input Convex Neural Networks (PICNNs) with conformal calibration, which directly parameterize the optimization constraints; and (ii) it reformulates planning under partial observability as a robust optimization problem, enabling uncertainty-aware policies that transfer across environments. Extensive experiments on both unordered and sequential multi-object navigation tasks demonstrate that NeuRO establishes SoTA performance, particularly in generalization to unseen environments. Our work thus presents a significant advancement for developing robust, generalizable autonomous agents.
Rethink Repeatable Measures of Robot Performance with Statistical Query
For a general standardized testing algorithm designed to evaluate a specific aspect of a robot's performance, several key expectations are commonly imposed. Beyond accuracy (i.e., closeness to a typically unknown ground-truth reference) and efficiency (i.e., feasibility within acceptable testing costs and equipment constraints), one particularly important attribute is repeatability. Repeatability refers to the ability to consistently obtain the same testing outcome when similar testing algorithms are executed on the same subject robot by different stakeholders, across different times or locations. However, achieving repeatable testing has become increasingly challenging as the components involved grow more complex, intelligent, diverse, and, most importantly, stochastic. While related efforts have addressed repeatability at ethical, hardware, and procedural levels, this study focuses specifically on repeatable testing at the algorithmic level. Specifically, we target the well-adopted class of testing algorithms in standardized evaluation: statistical query (SQ) algorithms (i.e., algorithms that estimate the expected value of a bounded function over a distribution using sampled data). We propose a lightweight, parameterized, and adaptive modification applicable to any SQ routine, whether based on Monte Carlo sampling, importance sampling, or adaptive importance sampling, that makes it provably repeatable, with guaranteed bounds on both accuracy and efficiency. We demonstrate the effectiveness of the proposed approach across three representative scenarios: (i) established and widely adopted standardized testing of manipulators, (ii) emerging intelligent testing algorithms for operational risk assessment in automated vehicles, and (iii) developing use cases involving command tracking performance evaluation of humanoid robots in locomotion tasks.
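For readers unfamiliar with the statistical query setting described in the abstract above, the following sketch shows a plain Monte Carlo SQ routine: it estimates the expected value of a function bounded in [0, 1] and sizes the sample with Hoeffding's inequality. It is only the baseline kind of routine the abstract says can be modified, not the repeatable modification itself. The names `hoeffding_sample_size` and `monte_carlo_query`, and the toy failure rate of 0.2, are assumptions for illustration.

```python
import math
import random


def hoeffding_sample_size(epsilon, delta):
    """Samples needed so that |estimate - mean| <= epsilon with probability
    at least 1 - delta, for a function bounded in [0, 1] (Hoeffding)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def monte_carlo_query(f, sampler, n, rng):
    """Plain Monte Carlo estimate of E[f(x)] from n independent samples."""
    return sum(f(sampler(rng)) for _ in range(n)) / n


# Toy test campaign: each trial fails with a hypothetical probability of 0.2.
rng = random.Random(0)
n = hoeffding_sample_size(epsilon=0.05, delta=0.05)
estimate = monte_carlo_query(
    f=lambda failed: 1.0 if failed else 0.0,
    sampler=lambda r: r.random() < 0.2,
    n=n,
    rng=rng,
)
print(n, estimate)
```

Two stakeholders running this routine with different seeds will generally report slightly different estimates even though both are accurate, which is the repeatability gap the abstract targets.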
Towards Versatile Humanoid Table Tennis: Unified Reinforcement Learning with Prediction Augmentation
Humanoid table tennis (TT) demands rapid perception, proactive whole-body motion, and agile footwork under strict timing -- capabilities that remain difficult for unified controllers. We propose a reinforcement learning framework that maps ball-position observations directly to whole-body joint commands for both arm striking and leg locomotion, strengthened by predictive signals and dense, physics-guided rewards. A lightweight learned predictor, fed with recent ball positions, estimates future ball states and augments the policy's observations for proactive decision-making. During training, a physics-based predictor supplies precise future states to construct dense, informative rewards that lead to effective exploration. The resulting policy attains strong performance across varied serve ranges (hit rate $\geq$ 96% and success rate $\geq$ 92%) in simulations. Ablation studies confirm that both the learned predictor and the predictive reward design are critical for end-to-end learning. Deployed zero-shot on a physical Booster T1 humanoid with 23 revolute joints, the policy produces coordinated lateral and forward-backward footwork with accurate, fast returns, suggesting a practical path toward versatile, competitive humanoid TT.
Interpretable Decision-Making for End-to-End Autonomous Driving ICCV 2025
Trustworthy AI is mandatory for the broad deployment of autonomous vehicles. Although end-to-end approaches derive control commands directly from raw data, interpreting these decisions remains challenging, especially in complex urban scenarios. This is mainly attributed to very deep neural networks with non-linear decision boundaries, making it challenging to grasp the logic behind AI-driven decisions. This paper presents a method to enhance interpretability while optimizing control commands in autonomous driving. To address this, we propose loss functions that promote the interpretability of our model by generating sparse and localized feature maps. The feature activations allow us to explain which image regions contribute to the predicted control command. We conduct comprehensive ablation studies on the feature extraction step and validate our method on the CARLA benchmarks. We also demonstrate that our approach improves interpretability, which correlates with reducing infractions, yielding a safer, high-performance driving model. Notably, our monocular, non-ensemble model surpasses the top-performing approaches from the CARLA Leaderboard by achieving lower infraction scores and the highest route completion rate, all while ensuring interpretability.
comment: Accepted to the ICCV 2025 2nd Workshop on the Challenge Of Out-of-Label Hazards in Autonomous Driving (2COOOL)
Learning to See and Act: Task-Aware View Planning for Robotic Manipulation
Recent vision-language-action (VLA) models for multi-task robotic manipulation commonly rely on static viewpoints and shared visual encoders, which limit 3D perception and cause task interference, hindering robustness and generalization. In this work, we propose Task-Aware View Planning (TAVP), a framework designed to overcome these challenges by integrating active view planning with task-specific representation learning. TAVP employs an efficient exploration policy, accelerated by a novel pseudo-environment, to actively acquire informative views. Furthermore, we introduce a Mixture-of-Experts (MoE) visual encoder to disentangle features across different tasks, boosting both representation fidelity and task generalization. By learning to see the world in a task-aware way, TAVP generates more complete and discriminative visual representations, demonstrating significantly enhanced action prediction across a wide array of manipulation challenges. Extensive experiments on RLBench tasks show that our proposed TAVP model achieves superior performance over state-of-the-art fixed-view approaches. Visual results and code are provided at: https://hcplab-sysu.github.io/TAVP.
comment: 14 pages, 8 figures, project page: https://hcplab-sysu.github.io/TAVP
Dynamic object goal pushing with mobile manipulators through model-free constrained reinforcement learning ICRA 2025
Non-prehensile pushing to move and reorient objects to a goal is a versatile loco-manipulation skill. In the real world, the object's physical properties and friction with the floor contain significant uncertainties, which makes the task challenging for a mobile manipulator. In this paper, we develop a learning-based controller for a mobile manipulator to move an unknown object to a desired position and yaw orientation through a sequence of pushing actions. The proposed controller for the robotic arm and the mobile base motion is trained using a constrained Reinforcement Learning (RL) formulation. We demonstrate its capability in experiments with a quadrupedal robot equipped with an arm. The learned policy achieves a success rate of 91.35% in simulation and at least 80% on hardware in challenging scenarios. Through our extensive hardware experiments, we show that the approach demonstrates high robustness against unknown objects of different masses, materials, sizes, and shapes. It reactively discovers the pushing location and direction, thus achieving contact-rich behavior while observing only the pose of the object. Additionally, we demonstrate the adaptive behavior of the learned policy towards preventing the object from toppling.
comment: presented at ICRA 2025, Video: https://youtu.be/wGAdPGVf9Ws?si=pi83ONWofHHqbFG0
Generation of Uncertainty-Aware Emergent Concepts in Factorized 3D Scene Graphs via Graph Neural Networks
Enabling robots to autonomously discover emergent spatial concepts (e.g., rooms) from primitive geometric observations (e.g., planar surfaces) within 3D Scene Graphs is essential for robust indoor navigation and mapping. These graphs provide a hierarchical metric-semantic representation in which such concepts are organized. To further enhance graph-SLAM performance, Factorized 3D Scene Graphs incorporate these concepts as optimization factors that constrain relative geometry and enforce global consistency. However, both stages of this process remain largely manual: concepts are typically derived using hand-crafted, concept-specific heuristics, while factors and their covariances are likewise manually designed. This reliance on manual specification limits generalization across diverse environments and scalability to new concept classes. This paper presents, for the first time, a learning-based method to generate online spatial emergent concepts as optimizable factors within a SLAM backend, reducing the need to handcraft both concept generation and the definition of their corresponding factors and covariances. In both simulated and real indoor scenarios, our approach improves complex concept detection by 20.7% and 5.3%, trajectory estimation by 19.2%, and map reconstruction by 12.3% and 3.8%, respectively, highlighting the benefits of this integration for robust and adaptive spatial understanding.
comment: Submitted to IEEE Robotics and Automation Letters (RA-L)
From Watch to Imagine: Steering Long-horizon Manipulation via Human Demonstration and Future Envisionment
Generalizing to long-horizon manipulation tasks in a zero-shot setting remains a central challenge in robotics. Current multimodal foundation-model-based approaches, despite their capabilities, typically fail to decompose high-level commands into executable action sequences from static visual input alone. To address this challenge, we introduce Super-Mimic, a hierarchical framework that enables zero-shot robotic imitation by directly inferring procedural intent from unscripted human demonstration videos. Our framework is composed of two sequential modules. First, a Human Intent Translator (HIT) parses the input video using multimodal reasoning to produce a sequence of language-grounded subtasks. These subtasks then condition a Future Dynamics Predictor (FDP), which employs a generative model that synthesizes a physically plausible video rollout for each step. The resulting visual trajectories are dynamics-aware, explicitly modeling crucial object interactions and contact points to guide the low-level controller. We validate this approach through extensive experiments on a suite of long-horizon manipulation tasks, where Super-Mimic significantly outperforms state-of-the-art zero-shot methods by over 20%. These results establish that coupling video-driven intent parsing with prospective dynamics modeling is a highly effective strategy for developing general-purpose robotic systems.
comment: More details and videos can be found at: https://yipko.com/super-mimic
VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
Coordinating multiple embodied agents in dynamic environments remains a core challenge in artificial intelligence, requiring both perception-driven reasoning and scalable cooperation strategies. While recent works have leveraged large language models (LLMs) for multi-agent planning, only a few have begun to explore vision-language models (VLMs) for visual reasoning. However, these VLM-based approaches remain limited in their support for diverse embodiment types. In this work, we introduce VIKI-Bench, the first hierarchical benchmark tailored for embodied multi-agent cooperation, featuring three structured levels: agent activation, task planning, and trajectory perception. VIKI-Bench includes diverse robot embodiments, multi-view visual observations, and structured supervision signals to evaluate reasoning grounded in visual inputs. To demonstrate the utility of VIKI-Bench, we propose VIKI-R, a two-stage framework that fine-tunes a pretrained vision-language model (VLM) using Chain-of-Thought annotated demonstrations, followed by reinforcement learning under multi-level reward signals. Our extensive experiments show that VIKI-R significantly outperforms baseline methods across all task levels. Furthermore, we show that reinforcement learning enables the emergence of compositional cooperation patterns among heterogeneous agents. Together, VIKI-Bench and VIKI-R offer a unified testbed and method for advancing multi-agent, visual-driven cooperation in embodied AI systems.
comment: Project page: https://faceong.github.io/VIKI-R/
Neural 3D Object Reconstruction with Small-Scale Unmanned Aerial Vehicles
Small Unmanned Aerial Vehicles (UAVs) exhibit immense potential for navigating indoor and hard-to-reach areas, yet their significant constraints in payload and autonomy have largely prevented their use for complex tasks like high-quality 3-Dimensional (3D) reconstruction. To overcome this challenge, we introduce a novel system architecture that enables fully autonomous, high-fidelity 3D scanning of static objects using UAVs weighing under 100 grams. Our core innovation lies in a dual-reconstruction pipeline that creates a real-time feedback loop between data capture and flight control. A near-real-time (near-RT) process uses Structure from Motion (SfM) to generate an instantaneous pointcloud of the object. The system analyzes the model quality on the fly and dynamically adapts the UAV's trajectory to intelligently capture new images of poorly covered areas. This ensures comprehensive data acquisition. For the final, detailed output, a non-real-time (non-RT) pipeline employs a Neural Radiance Fields (NeRF)-based Neural 3D Reconstruction (N3DR) approach, fusing SfM-derived camera poses with precise Ultra Wide-Band (UWB) location data to achieve superior accuracy. We implemented and validated this architecture using Crazyflie 2.1 UAVs. Our experiments, conducted in both single- and multi-UAV configurations, conclusively show that dynamic trajectory adaptation consistently improves reconstruction quality over static flight paths. This work demonstrates a scalable and autonomous solution that unlocks the potential of miniaturized UAVs for fine-grained 3D reconstruction in constrained environments, a capability previously limited to much larger platforms.
comment: 13 pages, 16 figures, 3 tables, 45 references
VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching NeurIPS 2025
Vision-Language-Action (VLA) models have demonstrated strong multi-modal reasoning capabilities, enabling direct action generation from visual perception and language instructions in an end-to-end manner. However, their substantial computational cost poses a challenge for real-time robotic control, where rapid decision-making is essential. This paper introduces VLA-Cache, a training-free inference acceleration method that reduces computational overhead by adaptively caching and reusing static visual tokens across frames. Exploiting the temporal continuity in robotic manipulation, VLA-Cache identifies minimally changed tokens between adjacent frames and reuses their cached key-value representations, thereby circumventing redundant computations. Additionally, to maintain action precision, VLA-Cache selectively re-computes task-relevant tokens that are environmentally sensitive, ensuring the fidelity of critical visual information. To further optimize efficiency, we introduce a layer adaptive token reusing strategy that dynamically adjusts the reuse ratio based on attention concentration across decoder layers, prioritizing critical tokens for recomputation. Extensive experiments on two simulation platforms (LIBERO and SIMPLER) and a real-world robotic system demonstrate that VLA-Cache achieves up to 1.7x speedup in CUDA latency and a 15% increase in control frequency, with negligible loss on task success rate. The code and videos can be found at our project page: https://vla-cache.github.io.
comment: Accepted to NeurIPS 2025
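The token-reuse idea in the VLA-Cache abstract can be illustrated with a minimal sketch. It assumes each frame is given as a list of per-patch feature vectors and that a cosine-similarity threshold decides which tokens count as static. The function name `select_reusable_tokens`, the threshold value, and the `task_relevant` index set are hypothetical illustrations, not the paper's implementation, and the layer-adaptive reuse ratio is not modeled here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def select_reusable_tokens(prev_patches, curr_patches,
                           task_relevant=frozenset(), threshold=0.99):
    """Return indices of tokens whose cached key-value entries may be reused.

    A token is reusable when its patch barely changed between adjacent
    frames (assumed criterion: cosine similarity >= threshold) and it is
    not flagged as task-relevant, in which case it is always recomputed.
    """
    reusable = []
    for i, (p, c) in enumerate(zip(prev_patches, curr_patches)):
        if i in task_relevant:
            continue  # always recompute task-relevant tokens
        if cosine(p, c) >= threshold:
            reusable.append(i)
    return reusable

# Toy frames with three patches: patch 1 changes, patch 2 is task-relevant.
prev = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
curr = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
reusable = select_reusable_tokens(prev, curr, task_relevant={2})
```

In this toy input only patch 0 is both unchanged and not task-relevant, so it is the only token whose cached entry would be reused.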
Time Reversal Symmetry for Efficient Robotic Manipulations in Deep Reinforcement Learning NeurIPS 2025
Symmetry is pervasive in robotics and has been widely exploited to improve sample efficiency in deep reinforcement learning (DRL). However, existing approaches primarily focus on spatial symmetries, such as reflection, rotation, and translation, while largely neglecting temporal symmetries. To address this gap, we explore time reversal symmetry, a form of temporal symmetry commonly found in robotics tasks such as door opening and closing. We propose Time Reversal symmetry enhanced Deep Reinforcement Learning (TR-DRL), a framework that combines trajectory reversal augmentation and time reversal guided reward shaping to efficiently solve temporally symmetric tasks. Our method generates reversed transitions from fully reversible transitions, identified by a proposed dynamics-consistent filter, to augment the training data. For partially reversible transitions, we apply reward shaping to guide learning, according to successful trajectories from the reversed task. Extensive experiments on the Robosuite and MetaWorld benchmarks demonstrate that TR-DRL is effective in both single-task and multi-task settings, achieving higher sample efficiency and stronger final performance compared to baseline methods.
comment: Accepted to NeurIPS 2025
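Trajectory reversal augmentation, as described in the TR-DRL abstract, can be sketched on a toy problem. The example below assumes a known 1-D dynamics function and assumes that negating an action inverts it. The paper proposes its own dynamics-consistent filter; the `is_reversible` check here is only a stand-in that replays the reversed action through the toy dynamics, and all names are hypothetical.

```python
def step(state, action):
    """Toy 1-D dynamics (assumed): position shifts by action, clipped to [0, 10]."""
    return min(10.0, max(0.0, state + action))

def reverse_transition(s, a, s_next):
    """Build the time-reversed transition, assuming -a undoes a."""
    return (s_next, -a, s)

def is_reversible(s, a, s_next, tol=1e-6):
    """Stand-in filter: keep the reversed transition only if the dynamics agree."""
    rs, ra, rs_next = reverse_transition(s, a, s_next)
    return abs(step(rs, ra) - rs_next) <= tol

def augment(transitions):
    """Append reversed copies of all transitions that pass the filter."""
    out = list(transitions)
    for s, a, s_next in transitions:
        if is_reversible(s, a, s_next):
            out.append(reverse_transition(s, a, s_next))
    return out

# The second transition hits the wall at 10, so reversing it is inconsistent.
data = [(2.0, 3.0, 5.0), (9.0, 3.0, 10.0)]
augmented = augment(data)
```

The clipped transition shows why a filter is needed: moving back by 3 from the wall lands at 7, not at the original state 9, so that reversed sample is discarded.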
IR2: Implicit Rendezvous for Robotic Exploration Teams under Sparse Intermittent Connectivity
Information sharing is critical in time-sensitive and realistic multi-robot exploration, especially for smaller robotic teams in large-scale environments where connectivity may be sparse and intermittent. Existing methods often overlook such communication constraints by assuming unrealistic global connectivity. Other works account for communication constraints (by maintaining close proximity or line of sight during information exchange), but are often inefficient. For instance, preplanned rendezvous approaches typically involve unnecessary detours resulting from poorly timed rendezvous, while pursuit-based approaches often result in short-sighted decisions due to their greedy nature. We present IR2, a deep reinforcement learning approach to information sharing for multi-robot exploration. Leveraging attention-based neural networks trained via reinforcement and curriculum learning, IR2 allows robots to effectively reason about the longer-term trade-offs between disconnecting for solo exploration and reconnecting for information sharing. In addition, we propose a hierarchical graph formulation to maintain a sparse yet informative graph, enabling our approach to scale to large-scale environments. We present simulation results in three large-scale Gazebo environments, which show that our approach yields 6.6-34.1% shorter exploration paths when compared to state-of-the-art baselines, and lastly deploy our learned policy on hardware. Our simulation training and testing code is available at https://ir2-explore.github.io.
comment: © 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the methodologies enabling LLMs to process, understand, and generate 3D data. Highlighting the unique advantages of LLMs, such as in-context learning, step-by-step reasoning, open-vocabulary capabilities, and extensive world knowledge, we underscore their potential to significantly advance spatial comprehension and interaction within embodied Artificial Intelligence (AI) systems. Our investigation spans various 3D data representations, from point clouds to Neural Radiance Fields (NeRFs). It examines their integration with LLMs for tasks such as 3D scene understanding, captioning, question-answering, and dialogue, as well as LLM-based agents for spatial reasoning, planning, and navigation. The paper also includes a brief review of other methods that integrate 3D and language. The meta-analysis presented in this paper reveals significant progress yet underscores the necessity for novel approaches to harness the full potential of 3D-LLMs. Hence, with this paper, we aim to chart a course for future research that explores and expands the capabilities of 3D-LLMs in understanding and interacting with the complex 3D world. To support this survey, we have established a project page where papers related to our topic are organized and listed: https://github.com/ActiveVisionLab/Awesome-LLM-3D.
comment: Second version, updated to June 2025
Distilling LLM Prior to Flow Model for Generalizable Agent's Imagination in Object Goal Navigation
The Object Goal Navigation (ObjectNav) task challenges agents to locate a specified object in an unseen environment by imagining unobserved regions of the scene. Prior approaches rely on deterministic and discriminative models to complete semantic maps, overlooking the inherent uncertainty in indoor layouts and limiting their ability to generalize to unseen environments. In this work, we propose GOAL, a generative flow-based framework that models the semantic distribution of indoor environments by bridging observed regions with LLM-enriched full-scene semantic maps. During training, spatial priors inferred from large language models (LLMs) are encoded as two-dimensional Gaussian fields and injected into target maps, distilling rich contextual knowledge into the flow model and enabling more generalizable completions. Extensive experiments demonstrate that GOAL achieves state-of-the-art performance on MP3D and Gibson, and shows strong generalization in transfer settings to HM3D. Codes and pretrained models are available at https://github.com/Badi-Li/GOAL.
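As a rough illustration of encoding a spatial prior as a two-dimensional Gaussian field and injecting it into a target map, consider the sketch below. The grid size, the isotropic Gaussian, the `weight` parameter, and the element-wise maximum used for injection are all assumptions made for illustration; the abstract does not specify how GOAL combines the field with the map.

```python
import math

def gaussian_field(height, width, center, sigma):
    """Isotropic 2-D Gaussian bump with peak value 1.0 at `center` (row, col)."""
    cy, cx = center
    return [
        [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def inject(target_map, field, weight=1.0):
    """Combine a prior field with one semantic channel (assumed rule: max)."""
    return [
        [max(t, weight * f) for t, f in zip(t_row, f_row)]
        for t_row, f_row in zip(target_map, field)
    ]

# One semantic channel with a single observed cell, plus a prior centred at (2, 2).
target = [[0.0] * 5 for _ in range(5)]
target[0][0] = 1.0
field = gaussian_field(5, 5, center=(2, 2), sigma=1.0)
enriched = inject(target, field, weight=0.8)
```

Using a maximum keeps observed cells intact while the prior fills in plausible but unobserved locations with a soft, distance-decaying score.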
RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
Existing end-to-end autonomous driving (AD) algorithms typically follow the Imitation Learning (IL) paradigm, which faces challenges such as causal confusion and an open-loop gap. In this work, we propose RAD, a 3DGS-based closed-loop Reinforcement Learning (RL) framework for end-to-end Autonomous Driving. By leveraging 3DGS techniques, we construct a photorealistic digital replica of the real physical world, enabling the AD policy to extensively explore the state space and learn to handle out-of-distribution scenarios through large-scale trial and error. To enhance safety, we design specialized rewards to guide the policy in effectively responding to safety-critical events and understanding real-world causal relationships. To better align with human driving behavior, we incorporate IL into RL training as a regularization term. We introduce a closed-loop evaluation benchmark consisting of diverse, previously unseen 3DGS environments. Compared to IL-based methods, RAD achieves stronger performance in most closed-loop metrics, particularly exhibiting a 3x lower collision rate. Abundant closed-loop results are presented in the supplementary material. Code is available at https://github.com/hustvl/RAD for facilitating future research.
comment: Code: https://github.com/hustvl/RAD
SAMPO: Scale-wise Autoregression with Motion PrOmpt for generative world models
World models allow agents to simulate the consequences of actions in imagined environments for planning, control, and long-horizon decision-making. However, existing autoregressive world models struggle with visually coherent predictions due to disrupted spatial structure, inefficient decoding, and inadequate motion modeling. In response, we propose Scale-wise Autoregression with Motion PrOmpt (SAMPO), a hybrid framework that combines visual autoregressive modeling for intra-frame generation with causal modeling for next-frame generation. Specifically, SAMPO integrates temporal causal decoding with bidirectional spatial attention, which preserves spatial locality and supports parallel decoding within each scale. This design significantly enhances both temporal consistency and rollout efficiency. To further improve dynamic scene understanding, we devise an asymmetric multi-scale tokenizer that preserves spatial details in observed frames and extracts compact dynamic representations for future frames, optimizing both memory usage and model performance. Additionally, we introduce a trajectory-aware motion prompt module that injects spatiotemporal cues about object and robot trajectories, focusing attention on dynamic regions and improving temporal consistency and physical realism. Extensive experiments show that SAMPO achieves competitive performance in action-conditioned video prediction and model-based control, improving generation quality with 4.4x faster inference. We also evaluate SAMPO's zero-shot generalization and scaling behavior, demonstrating its ability to generalize to unseen tasks and benefit from larger model sizes.
comment: 22 pages, 15 figures
LLM-RG: Referential Grounding in Outdoor Scenarios using Large Language Models IROS 2025
Referential grounding in outdoor driving scenes is challenging due to large scene variability, many visually similar objects, and dynamic elements that complicate resolving natural-language references (e.g., "the black car on the right"). We propose LLM-RG, a hybrid pipeline that combines off-the-shelf vision-language models for fine-grained attribute extraction with large language models for symbolic reasoning. LLM-RG processes an image and a free-form referring expression by using an LLM to extract relevant object types and attributes, detecting candidate regions, generating rich visual descriptors with a VLM, and then combining these descriptors with spatial metadata into natural-language prompts that are input to an LLM for chain-of-thought reasoning to identify the referent's bounding box. Evaluated on the Talk2Car benchmark, LLM-RG yields substantial gains over both LLM and VLM-based baselines. Additionally, our ablations show that adding 3D spatial cues further improves grounding. Our results demonstrate the complementary strengths of VLMs and LLMs, applied in a zero-shot manner, for robust outdoor referential grounding.
comment: Human-aware Embodied AI Workshop @ IROS 2025
Towards foundational LiDAR world models with efficient latent flow matching NeurIPS 2025
LiDAR-based world models offer more structured and geometry-aware representations than their image-based counterparts. However, existing LiDAR world models are narrowly trained; each model excels only in the domain for which it was built. Can we develop LiDAR world models that exhibit strong transferability across multiple domains? We conduct the first systematic domain transfer study across three demanding scenarios: (i) outdoor to indoor generalization, (ii) sparse-beam & dense-beam adaptation, and (iii) non-semantic to semantic transfer. Given different amounts of fine-tuning data, our experiments show that a single pre-trained model can achieve up to 11% absolute improvement (83% relative) over training from scratch and outperforms training from scratch in 30/36 of our comparisons. This transferability of dynamic learning significantly reduces the reliance on manually annotated data for semantic occupancy forecasting: our method exceeds the previous semantic occupancy forecasting models with only 5% of the labeled training data required by prior models. We also observe inefficiencies in current LiDAR world models, mainly through their under-compression of LiDAR data and inefficient training objectives. To address this, we propose a latent conditional flow matching (CFM)-based framework that achieves state-of-the-art reconstruction accuracy using only half the training data and a compression ratio 6 times higher than that of prior methods. Our model achieves SOTA performance on future-trajectory-conditioned semantic occupancy forecasting while being 23x more computationally efficient (a 28x FPS speedup), and achieves SOTA performance on semantic occupancy forecasting while being 2x more computationally efficient (a 1.1x FPS speedup).
comment: Accepted to the Thirty-Ninth Conference on Neural Information Processing Systems (NeurIPS 2025), 25 pages, 13 figures
Improving planning and MBRL with temporally-extended actions NeurIPS 2025
Continuous-time systems are often modeled using discrete-time dynamics, but this requires a small simulation step to maintain accuracy. In turn, this requires a large planning horizon, which leads to computationally demanding planning problems and reduced performance. Previous work in model-free reinforcement learning has partially addressed this issue using action repeats, where a policy is learned to determine a discrete action duration. Instead, we propose to control the continuous decision timescale directly by using temporally-extended actions and letting the planner treat the duration of the action as an additional optimization variable along with the standard action variables. This additional structure has multiple advantages. It speeds up simulation time of trajectories and, importantly, it allows for deep horizon search in terms of primitive actions while using a shallow search depth in the planner. In addition, in the model-based reinforcement learning (MBRL) setting, it reduces compounding errors from model learning and improves training time for models. We show that this idea is effective and that the range for action durations can be automatically selected using a multi-armed bandit formulation and integrated into the MBRL framework. An extensive experimental evaluation, both in planning and in MBRL, shows that our approach yields faster planning and better solutions, and that it enables solutions to problems that are not solved in the standard formulation.
comment: NeurIPS 2025. For project website, see https://pecey.github.io/MBRL-with-TEA/
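Treating action duration as an extra optimization variable can be sketched with a simple random-shooting planner. Everything below is an illustrative assumption rather than the paper's method: the 1-D velocity-controlled system, the step size `DT`, the duration range, and the sampling-based search. The point is only that a plan of depth 3 can span hundreds of primitive simulation steps.

```python
import random

DT = 0.01  # primitive simulation step in seconds (assumed)

def simulate(x, action, duration):
    """Hold `action` (a velocity) for `duration` seconds using small Euler steps."""
    steps = max(1, round(duration / DT))
    for _ in range(steps):
        x += action * DT
    return x

def plan(x0, goal, depth=3, samples=500, seed=0):
    """Random shooting over (action, duration) pairs; returns best plan and cost."""
    rng = random.Random(seed)
    best_cost, best_plan = float("inf"), None
    for _ in range(samples):
        # Each decision carries its own duration, sampled jointly with the action.
        candidate = [(rng.uniform(-1.0, 1.0), rng.uniform(0.05, 2.0))
                     for _ in range(depth)]
        x = x0
        for action, duration in candidate:
            x = simulate(x, action, duration)
        cost = abs(x - goal)
        if cost < best_cost:
            best_cost, best_plan = cost, candidate
    return best_plan, best_cost

best_plan, best_cost = plan(x0=0.0, goal=1.5)
# Number of primitive steps covered by only three planner decisions.
primitive_steps = sum(max(1, round(d / DT)) for _, d in best_plan)
```

A planner restricted to one primitive step per decision would need a search depth equal to `primitive_steps` to reach the same horizon.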
SPiDR: A Simple Approach for Zero-Shot Safety in Sim-to-Real Transfer
Deploying reinforcement learning (RL) safely in the real world is challenging, as policies trained in simulators must face the inevitable sim-to-real gap. Robust safe RL techniques are provably safe but difficult to scale, while domain randomization is more practical yet prone to unsafe behaviors. We address this gap by proposing SPiDR, short for Sim-to-real via Pessimistic Domain Randomization -- a scalable algorithm with provable guarantees for safe sim-to-real transfer. SPiDR uses domain randomization to incorporate the uncertainty about the sim-to-real gap into the safety constraints, making it versatile and highly compatible with existing training pipelines. Through extensive experiments on sim-to-sim benchmarks and two distinct real-world robotic platforms, we demonstrate that SPiDR effectively ensures safety despite the sim-to-real gap while maintaining strong performance.
Multiagent Systems
LightMem: Lightweight and Efficient Memory-Augmented Generation
Despite their remarkable capabilities, Large Language Models (LLMs) struggle to effectively leverage historical interaction information in dynamic and complex environments. Memory systems enable LLMs to move beyond stateless interactions by introducing persistent information storage, retrieval, and utilization mechanisms. However, existing memory systems often introduce substantial time and computational overhead. To this end, we introduce a new memory system called LightMem, which strikes a balance between the performance and efficiency of memory systems. Inspired by the Atkinson-Shiffrin model of human memory, LightMem organizes memory into three complementary stages. First, cognition-inspired sensory memory rapidly filters irrelevant information through lightweight compression and groups information according to their topics. Next, topic-aware short-term memory consolidates these topic-based groups, organizing and summarizing content for more structured access. Finally, long-term memory with sleep-time update employs an offline procedure that decouples consolidation from online inference. Experiments on LongMemEval with GPT and Qwen backbones show that LightMem outperforms strong baselines in accuracy (up to 10.9% gains) while reducing token usage by up to 117x, API calls by up to 159x, and runtime by over 12x. The code is available at https://github.com/zjunlp/LightMem.
comment: Work in progress
Computational Foundations for Strategic Coopetition: Formalizing Interdependence and Complementarity
Modern socio-technical systems are characterized by strategic coopetition where actors simultaneously cooperate to create value and compete to capture it. While conceptual modeling languages like i* provide rich qualitative representations of strategic dependencies, they lack mechanisms for quantitative analysis of dynamic trade-offs. Conversely, classical game theory offers mathematical rigor but strips away contextual richness. This technical report bridges this gap by developing computational foundations that formalize two critical dimensions of coopetition: interdependence and complementarity. We ground interdependence in i* structural dependency analysis, translating depender-dependee-dependum relationships into quantitative interdependence coefficients through a structured translation framework. We formalize complementarity following Brandenburger and Nalebuff's Added Value concept, modeling synergistic value creation with validated parameterization. We integrate structural dependencies with bargaining power in value appropriation and introduce a game-theoretic formulation where Nash Equilibrium incorporates structural interdependence. Validation combines comprehensive experimental testing across power and logarithmic value function specifications, demonstrating functional form robustness, with empirical application to the Samsung-Sony S-LCD joint venture (2004-2011), where logarithmic specifications achieve superior empirical fit (validation score 45/60) while power functions provide theoretical tractability. This technical report serves as the foundational reference for a coordinated research program examining strategic coopetition in requirements engineering and multi-agent systems, with companion work addressing trust dynamics, team production, and reciprocity mechanisms.
comment: 36 pages, 7 figures
Fetch.ai: An Architecture for Modern Multi-Agent Systems
Recent surges in LLM-driven intelligent systems largely overlook decades of foundational multi-agent systems (MAS) research, resulting in frameworks with critical limitations such as centralization and inadequate trust and communication protocols. This paper introduces the Fetch.ai architecture, an industrial-strength platform designed to bridge this gap by facilitating the integration of classical MAS principles with modern AI capabilities. We present a novel, multi-layered solution built on a decentralized foundation of on-chain blockchain services for verifiable identity, discovery, and transactions. This is complemented by a comprehensive development framework for creating secure, interoperable agents, a cloud-based platform for deployment, and an intelligent orchestration layer where an agent-native LLM translates high-level human goals into complex, multi-agent workflows. We demonstrate the deployed nature of this system through a decentralized logistics use case where autonomous agents dynamically discover, negotiate, and transact with one another securely. Ultimately, the Fetch.ai stack provides a principled architecture for moving beyond current agent implementations towards open, collaborative, and economically sustainable multi-agent ecosystems.
comment: 26 pages, figures, code examples
Socialized Learning and Emergent Behaviors in Multi-Agent Systems based on Multimodal Large Language Models
This research introduces the Multimodal Socialized Learning Framework (M-S2L), designed to foster emergent social intelligence in AI agents by integrating Multimodal Large Language Models (M-LLMs) with social learning mechanisms. The framework equips agents with multimodal perception (vision and text) and structured action capabilities, enabling physical manipulation and grounded multimodal communication (e.g., text with visual pointers). M-S2L combines direct reinforcement learning with two novel social learning pathways: multimodal observational learning and communication-driven learning from feedback, augmented by an episodic memory system for long-term social context. We evaluate M-S2L in a Collaborative Assembly Environment (CAE), where agent teams must construct complex devices from ambiguous blueprints under informational asymmetry. Across tasks of increasing complexity, M-S2L agents consistently outperform Text-Only and No-Social-Learning baselines in Task Completion Rate and Time to Completion, particularly in dynamic problem-solving scenarios. Ablation studies confirm the necessity of both multimodality and socialized learning. Our analysis reveals the emergence of efficient communication protocols integrating visual pointers with concise text, alongside rapid role specialization leading to stable labor division. Qualitative case studies demonstrate agents' abilities for shared awareness, dynamic re-planning, and adaptive problem-solving, suggesting a nascent form of machine social cognition. These findings indicate that integrating multimodal perception with explicit social learning is critical for developing human-like collaborative intelligence in multi-agent systems.
Food4All: A Multi-Agent Framework for Real-time Free Food Discovery with Integrated Nutritional Metadata
Food insecurity remains a persistent public health emergency in the United States, tightly interwoven with chronic disease, mental illness, and opioid misuse. Yet despite the existence of thousands of food banks and pantries, access remains fragmented: 1) current retrieval systems depend on static directories or generic search engines, which provide incomplete and geographically irrelevant results; 2) LLM-based chatbots offer only vague nutritional suggestions and fail to adapt to real-world constraints such as time, mobility, and transportation; and 3) existing food recommendation systems optimize for culinary diversity but overlook survival-critical needs of food-insecure populations, including immediate proximity, verified availability, and contextual barriers. These limitations risk leaving the most vulnerable individuals, those experiencing homelessness, addiction, or digital illiteracy, unable to access urgently needed resources. To address this, we introduce Food4All, the first multi-agent framework explicitly designed for real-time, context-aware free food retrieval. Food4All unifies three innovations: 1) heterogeneous data aggregation across official databases, community platforms, and social media to provide a continuously updated pool of food resources; 2) a lightweight reinforcement learning algorithm trained on curated cases to optimize for both geographic accessibility and nutritional correctness; and 3) an online feedback loop that dynamically adapts retrieval policies to evolving user needs. By bridging information acquisition, semantic analysis, and decision support, Food4All delivers nutritionally annotated guidance at the point of need. This framework establishes an urgent step toward scalable, equitable, and intelligent systems that directly support populations facing food insecurity and its compounding health risks.
Distributed Allocation and Resource Scheduling Algorithms Resilient to Link Failure
Distributed resource allocation (DRA) is fundamental to modern networked systems, spanning applications from economic dispatch in smart grids to CPU scheduling in data centers. Conventional DRA approaches require reliable communication, yet real-world networks frequently suffer from link failures, packet drops, and communication delays due to environmental conditions, network congestion, and security threats. We introduce a novel resilient DRA algorithm that addresses these critical challenges, and our main contributions are as follows: (1) guaranteed constraint feasibility at all times, ensuring resource-demand balance even during algorithm termination or network disruption; (2) robust convergence despite sector-bound nonlinearities at nodes/links, accommodating practical constraints like quantization and saturation; and (3) optimal performance under merely uniformly-connected networks, eliminating the need for continuous connectivity. Unlike existing approaches that require persistent network connectivity and provide only asymptotic feasibility, our graph-theoretic solution leverages network percolation theory to maintain performance during intermittent disconnections. This makes it particularly valuable for mobile multi-agent systems where nodes frequently move out of communication range. Theoretical analysis and simulations demonstrate that our algorithm converges to optimal solutions despite heterogeneous time delays and substantial link failures, significantly advancing the reliability of distributed resource allocation in practical network environments.
comment: European Journal of Control
From Agent Simulation to Social Simulator: A Comprehensive Review (Part 1)
This is the first part of the comprehensive review, focusing on the historical development of Agent-Based Modeling (ABM) and its classic cases. It begins by discussing the development history and design principles of Agent-Based Modeling (ABM), helping readers understand the significant challenges that traditional physical simulation methods face in the social domain. Then, it provides a detailed introduction to foundational models for simulating social systems, including individual models, environmental models, and rule-based models. Finally, it presents classic cases of social simulation, covering three types: thought experiments, mechanism exploration, and parallel optimization.
The Emergence of Complex Behavior in Large-Scale Ecological Environments
We explore how physical scale and population size shape the emergence of complex behaviors in open-ended ecological environments. In our setting, agents are unsupervised and have no explicit rewards or learning objectives but instead evolve over time according to reproduction, mutation, and natural selection. As they act, agents also shape their environment and the population around them in an ongoing dynamic ecology. Our goal is not to optimize a single high-performance policy, but instead to examine how behaviors emerge and evolve across large populations due to natural competition and environmental pressures. In an effort to discover how complex behaviors naturally emerge, we conduct experiments in large-scale worlds that reach populations of more than 60,000 individual agents, each with their own evolved neural network policy. We identify various emergent behaviors such as long-range resource extraction, vision-based foraging, and predation that arise under competitive and survival pressures. We examine how sensing modalities and environmental scale affect the emergence of these behaviors, finding that some appear only in sufficiently large environments and populations, with larger scales increasing behavioral stability and consistency. While there is a rich history of research in evolutionary settings, our scaling results provide promising new directions to explore ecology as an instrument of machine learning in an era of abundant computational resources. Experimental code is available at https://github.com/jbejjani2022/ecological-emergent-behavior.
comment: 18 pages, 11 figures, 6 tables, experiment code available at https://github.com/jbejjani2022/ecological-emergent-behavior
Adaptive Coopetition: Leveraging Coarse Verifier Signals for Resilient Multi-Agent LLM Reasoning
Inference-time computation is a critical yet challenging paradigm for enhancing the reasoning performance of large language models (LLMs). While existing strategies improve reasoning stability and consistency, they suffer from notable limitations: self-correction often reinforces the model's initial biases, and Multi-Agent Collaboration (MAC) often fails due to the lack of efficient coordination mechanisms, leading to collective errors. Although high-performing verifiers can detect reasoning errors, making them reliable requires substantial training. To address these challenges, we introduce a novel inference-time framework, Adaptive Coopetition (AdCo), in which LLM agents utilize an adaptive, UCB-based "coopetition" mechanism. At each round, agents leverage coarse verifier signals to determine whether to collaborate or compete, and iteratively refine their reasoning based on peer feedback. Without relying on high-performance verifiers, our adaptive strategy achieves significant performance gains on mathematical reasoning benchmarks, yielding a 20% relative improvement over baselines on the more challenging dataset. Our approach remains robust and consistent in terms of accuracy under different sample sizes and configurations. This adaptive, signal-guided "coopetition" framework enhances reasoning robustness by leveraging both model knowledge diversity and reasoning trace measures, while also promoting uncertainty-driven exploration, especially when participants have comparable capabilities. From this perspective, our work offers a fresh lens on inference-time computation and paves the way for more resilient multi-agent LLM systems. Our code is available at: https://github.com/AdCo-Research/adaptive-coopetition.
comment: 13 pages, 8 figures
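The AdCo abstract above describes a UCB-based choice between collaborating and competing at each round. The listing does not give the exact rule, so the following is only a minimal sketch of the textbook UCB1 score applied to two hypothetical arms, "collaborate" and "compete". The function name `ucb_choose`, the reward bookkeeping, and the exploration constant are illustrative assumptions, not the authors' implementation.

```python
import math

def ucb_choose(counts, totals, t, c=math.sqrt(2)):
    """Pick the arm with the highest UCB1 score.

    counts[arm]: how many times the arm was played so far (hypothetical bookkeeping)
    totals[arm]: sum of coarse verifier rewards observed for that arm
    t:           current round number (>= 1)
    c:           exploration constant; sqrt(2) gives the classic UCB1 bonus
    """
    # Play every arm once before trusting any estimate.
    for arm, n in counts.items():
        if n == 0:
            return arm

    def score(arm):
        mean = totals[arm] / counts[arm]
        bonus = c * math.sqrt(math.log(t) / counts[arm])
        return mean + bonus

    return max(counts, key=score)
```

With equal play counts the arm with the higher average reward wins; a rarely tried arm can still be selected because its exploration bonus is large, which is the uncertainty-driven exploration the abstract refers to.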
Plural Voices, Single Agent: Towards Inclusive AI in Multi-User Domestic Spaces
Domestic AI agents face ethical, autonomy, and inclusion challenges, particularly for overlooked groups such as children, the elderly, and neurodivergent users. We present the Plural Voices Model (PVM), a novel single-agent framework that dynamically negotiates multi-user needs through real-time value alignment, leveraging diverse public datasets on mental health, eldercare, education, and moral reasoning. Using human+synthetic curriculum design with fairness-aware scenarios and ethical enhancements, PVM identifies core values, conflicts, and accessibility requirements to inform inclusive principles. Our privacy-focused prototype features adaptive safety scaffolds, tailored interactions (e.g., step-by-step guidance for neurodivergent users, simple wording for children), and equitable conflict resolution. In preliminary evaluations, PVM outperforms multi-agent baselines in compliance (76% vs. 70%), fairness (90% vs. 85%), safety-violation rate (0% vs. 7%), and latency. Design innovations, including video guidance, autonomy sliders, family hubs, and adaptive safety dashboards, demonstrate new directions for ethical and inclusive domestic AI and for building user-centered agentic systems in plural domestic contexts. Our code and model are open-sourced and available for reproduction: https://github.com/zade90/Agora
Sync or Sink: Bounds on Algorithmic Collective Action with Noise and Multiple Groups NeurIPS
Collective action against algorithmic systems, which enables groups to promote their own interests, is poised to grow. Hence, there will be growth in the size and the number of distinct collectives. Currently, there is no formal analysis of how coordination challenges within a collective can impact downstream outcomes, or how multiple collectives may affect each other's success. In this work, we aim to provide guarantees on the success of collective action in the presence of both coordination noise and multiple groups. Our insight is that data generated by either multiple collectives or by coordination noise can be viewed as originating from multiple data distributions. Using this framing, we derive bounds on the success of collective action. We conduct experiments to study the effects of noise on collective action. We find that sufficiently high levels of noise can reduce the success of collective action. In certain scenarios, large noise can sink a collective's success rate from $100\%$ to just under $60\%$. We identify potential trade-offs between collective size and coordination noise; for example, a collective that is twice as big but has four times more noise experiences worse outcomes than the smaller, more coordinated one. This work highlights the importance of understanding nuanced dynamics of strategic behavior in algorithmic systems.
comment: Full Version of NeurIPS workshop paper
A Digital Twin Framework for Decision-Support and Optimization of EV Charging Infrastructure in Localized Urban Systems
As Electric Vehicle (EV) adoption accelerates in urban environments, optimizing charging infrastructure is vital for balancing user satisfaction, energy efficiency, and financial viability. This study advances beyond static models by proposing a digital twin framework that integrates agent-based decision support with embedded optimization to dynamically simulate EV charging behaviors, infrastructure layouts, and policy responses across scenarios. Applied to a localized urban site (a university campus) in Hanoi, Vietnam, the model evaluates operational policies, EV station configurations, and renewable energy sources. The interactive dashboard enables seasonal analysis, revealing a 20% drop in solar efficiency from October to March, with wind power contributing under 5% of demand, highlighting the need for adaptive energy management. Simulations show that real-time notifications of newly available charging slots improve user satisfaction, while gasoline bans and idle fees enhance slot turnover with minimal added complexity. Embedded metaheuristic optimization identifies near-optimal mixes of fast (30kW) and standard (11kW) solar-powered chargers, balancing energy performance, profitability, and demand with high computational efficiency. This digital twin provides a flexible, computation-driven platform for EV infrastructure planning, with a transferable, modular design that enables seamless scaling from localized to city-wide urban contexts.
comment: 35 pages, 11 figures. Submitted to Computers, Environment and Urban Systems (CEUS)
ATL*AS: An Automata-Theoretic Approach and Tool for the Verification of Strategic Abilities in Multi-Agent Systems
We present two novel symbolic algorithms for model checking the Alternating-time Temporal Logic ATL*, over both the infinite-trace and the finite-trace semantics. In particular, for infinite traces we design a novel symbolic reduction to parity games. We implement both methods in the ATL*AS model checker and evaluate it using synthetic benchmarks as well as a cybersecurity scenario. Our results demonstrate that the symbolic approach significantly outperforms the explicit-state representation and we find that our parity-game-based algorithm offers a more scalable and efficient solution for infinite-trace verification, outperforming previously available tools. Our results also confirm that finite-trace model checking yields substantial performance benefits over infinite-trace verification. As such, we provide a comprehensive toolset for verifying multiagent systems against specifications in ATL*.
Disaster Management in the Era of Agentic AI Systems: A Vision for Collective Human-Machine Intelligence for Augmented Resilience
The escalating frequency and severity of disasters routinely overwhelm traditional response capabilities, exposing critical vulnerabilities in disaster management. Current practices are hindered by fragmented data streams, siloed technologies, resource constraints, and the erosion of institutional memory, which collectively impede timely and effective decision making. This study introduces Disaster Copilot, a vision for a multi-agent artificial intelligence system designed to overcome these systemic challenges by unifying specialized AI tools within a collaborative framework. The proposed architecture utilizes a central orchestrator to coordinate diverse sub-agents, each specializing in critical domains such as predictive risk analytics, situational awareness, and impact assessment. By integrating multi-modal data, the system delivers a holistic, real-time operational picture and serves as the essential AI backbone required to advance Disaster Digital Twins from passive models to active, intelligent environments. Furthermore, it ensures functionality in resource-limited environments through on-device orchestration and incorporates mechanisms to capture institutional knowledge, mitigating the impact of staff turnover. We detail the system architecture and propose a three-phased roadmap emphasizing the parallel growth of technology, organizational capacity, and human-AI teaming. Disaster Copilot offers a transformative vision, fostering collective human-machine intelligence to build more adaptive, data-driven and resilient communities.
Stop Reducing Responsibility in LLM-Powered Multi-Agent Systems to Local Alignment
LLM-powered Multi-Agent Systems (LLM-MAS) unlock new potentials in distributed reasoning, collaboration, and task generalization but also introduce additional risks due to unguaranteed agreement, cascading uncertainty, and adversarial vulnerabilities. We argue that ensuring responsible behavior in such systems requires a paradigm shift: from local, superficial agent-level alignment to global, systemic agreement. We conceptualize responsibility not as a static constraint but as a lifecycle-wide property encompassing agreement, uncertainty, and security, each requiring the complementary integration of subjective human-centered values and objective verifiability. Furthermore, a dual-perspective governance framework that combines interdisciplinary design with human-AI collaborative oversight is essential for tracing and ensuring responsibility throughout the lifecycle of LLM-MAS. Our position views LLM-MAS not as loose collections of agents, but as unified, dynamic socio-technical systems that demand principled mechanisms to support each dimension of responsibility and enable ethically aligned, verifiably coherent, and resilient behavior for sustained, system-wide agreement.
comment: Updated manuscript of our previous version (arXiv:2502.01714). Under review
Static Sandboxes Are Inadequate: Modeling Societal Complexity Requires Open-Ended Co-Evolution in LLM-Based Multi-Agent Simulations
What if artificial agents could not just communicate, but also evolve, adapt, and reshape their worlds in ways we cannot fully predict? With LLMs now powering multi-agent systems and social simulations, we are witnessing new possibilities for modeling open-ended, ever-changing environments. Yet, most current simulations remain constrained within static sandboxes, characterized by predefined tasks, limited dynamics, and rigid evaluation criteria. These limitations prevent them from capturing the complexity of real-world societies. In this paper, we argue that static, task-specific benchmarks are fundamentally inadequate and must be rethought. We critically review emerging architectures that blend LLMs with multi-agent dynamics, highlight key hurdles such as balancing stability and diversity, evaluating unexpected behaviors, and scaling to greater complexity, and introduce a fresh taxonomy for this rapidly evolving field. Finally, we present a research roadmap centered on open-endedness, continuous co-evolution, and the development of resilient, socially aligned AI ecosystems. We call on the community to move beyond static paradigms and help shape the next generation of adaptive, socially-aware multi-agent simulations.
comment: Preprint; feedback welcome
Counterfactual Effect Decomposition in Multi-Agent Sequential Decision Making ICML 2025
We address the challenge of explaining counterfactual outcomes in multi-agent Markov decision processes. In particular, we aim to explain the total counterfactual effect of an agent's action on the outcome of a realized scenario through its influence on the environment dynamics and the agents' behavior. To achieve this, we introduce a novel causal explanation formula that decomposes the counterfactual effect by attributing to each agent and state variable a score reflecting their respective contributions to the effect. First, we show that the total counterfactual effect of an agent's action can be decomposed into two components: one measuring the effect that propagates through all subsequent agents' actions and another related to the effect that propagates through the state transitions. Building on recent advancements in causal contribution analysis, we further decompose these two effects as follows. For the former, we consider agent-specific effects -- a causal concept that quantifies the counterfactual effect of an agent's action that propagates through a subset of agents. Based on this notion, we use Shapley value to attribute the effect to individual agents. For the latter, we consider the concept of structure-preserving interventions and attribute the effect to state variables based on their "intrinsic" contributions. Through extensive experimentation, we demonstrate the interpretability of our approach in a Gridworld environment with LLM-assisted agents and a sepsis management simulator.
comment: ICML 2025
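The abstract above attributes the agent-mediated part of a counterfactual effect to individual agents with the Shapley value. As a reminder of what that attribution computes, here is a minimal sketch of exact Shapley values by enumerating player orderings. The characteristic function `value` is a hypothetical stand-in for the paper's agent-specific effects, which the listing does not define.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    players: iterable of hashable player (agent) identifiers
    value:   function mapping a frozenset of players to a number
             (hypothetical here; the paper would plug in agent-specific effects)
    """
    players = list(players)
    phi = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            extended = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            phi[p] += value(extended) - value(coalition)
            coalition = extended
    return {p: total / len(orderings) for p, total in phi.items()}
```

For example, if the effect only materializes when agents "a" and "b" both act (value 1 when both are in the coalition, 0 otherwise), the two share the credit equally and a third agent "c" receives none. Enumeration costs factorial time in the number of players, so it only suits small agent sets.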
Multi-Agent Collaboration via Evolving Orchestration NeurIPS 2025
Large language models (LLMs) have achieved remarkable results across diverse downstream tasks, but their monolithic nature restricts scalability and efficiency in complex problem-solving. While recent research explores multi-agent collaboration among LLMs, most approaches rely on static organizational structures that struggle to adapt as task complexity and agent numbers grow, resulting in coordination overhead and inefficiencies. To this end, we propose a puppeteer-style paradigm for LLM-based multi-agent collaboration, where a centralized orchestrator ("puppeteer") dynamically directs agents ("puppets") in response to evolving task states. This orchestrator is trained via reinforcement learning to adaptively sequence and prioritize agents, enabling flexible and evolvable collective reasoning. Experiments on closed- and open-domain scenarios show that this method achieves superior performance with reduced computational costs. Analyses further reveal that the key improvements consistently stem from the emergence of more compact, cyclic reasoning structures under the orchestrator's evolution. Our code is available at https://github.com/OpenBMB/ChatDev/tree/puppeteer.
comment: accepted at NeurIPS 2025
DrunkAgent: Stealthy Memory Corruption in LLM-Powered Recommender Agents
Large language model (LLM)-powered agents are increasingly used in recommender systems (RSs) to achieve personalized behavior modeling, where the memory mechanism plays a pivotal role in enabling the agents to autonomously explore, learn and self-evolve from real-world interactions. However, this very mechanism, serving as a contextual repository, inherently exposes an attack surface for potential adversarial manipulations. Despite its central role, the robustness of agentic RSs in the face of such threats remains largely underexplored. Previous works suffer from semantic mismatches or rely on static embeddings or pre-defined prompts, none of which are designed for dynamic systems, especially for the dynamic memory states of LLM agents. This challenge is exacerbated by the black-box nature of commercial recommenders. To tackle the above problems, in this paper, we present the first systematic investigation of memory-based vulnerabilities in LLM-powered recommender agents, revealing their security limitations and guiding efforts to strengthen system resilience and trustworthiness. Specifically, we propose a novel black-box attack framework named DrunkAgent. DrunkAgent crafts semantically meaningful adversarial textual triggers for target item promotions and introduces a series of strategies to maximize the trigger effect by corrupting the memory updates during the interactions. The triggers and strategies are optimized on a surrogate model, making DrunkAgent transferable and stealthy. Extensive experiments on real-world datasets across diverse agentic RSs, including collaborative filtering, retrieval augmentation and sequential recommendations, demonstrate the generalizability, transferability and stealthiness of DrunkAgent.
Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards
LLMs are increasingly used to design reward functions based on human preferences in Reinforcement Learning (RL). We focus on LLM-designed rewards for Restless Multi-Armed Bandits, a framework for allocating limited resources among agents. In applications such as public health, this approach empowers grassroots health workers to tailor automated allocation decisions to community needs. In the presence of multiple agents, altering the reward function based on human preferences can impact subpopulations very differently, leading to complex tradeoffs and a multi-objective resource allocation problem. We are the first to present a principled method termed Social Choice Language Model for dealing with these tradeoffs for LLM-designed rewards for multiagent planners in general and restless bandits in particular. The novel part of our model is a transparent and configurable selection component, called an adjudicator, external to the LLM that controls complex tradeoffs via a user-selected social welfare function. Our experiments demonstrate that our model reliably selects more effective, aligned, and balanced reward functions compared to purely LLM-based approaches.
Systems and Control (CS)
Lyapunov-Aware Quantum-Inspired Reinforcement Learning for Continuous-Time Vehicle Control: A Feasibility Study
This paper presents a novel Lyapunov-Based Quantum Reinforcement Learning (LQRL) framework that integrates quantum policy optimization with Lyapunov stability analysis for continuous-time vehicle control. The proposed approach combines the representational power of variational quantum circuits (VQCs) with a stability-aware policy gradient mechanism to ensure asymptotic convergence and safe decision-making under dynamic environments. The vehicle longitudinal control problem was formulated as a continuous-state reinforcement learning task, where the quantum policy network generates control actions subject to Lyapunov stability constraints. Simulation experiments were conducted in a closed-loop adaptive cruise control scenario using a quantum-inspired policy trained under stability feedback. The results demonstrate that the LQRL framework successfully embeds Lyapunov stability verification into quantum policy learning, enabling interpretable and stability-aware control performance. Although transient overshoot and Lyapunov divergence were observed under aggressive acceleration, the system maintained bounded state evolution, validating the feasibility of integrating safety guarantees within quantum reinforcement learning architectures. The proposed framework provides a foundational step toward provably safe quantum control in autonomous systems and hybrid quantum-classical optimization domains.
comment: 7 pages, 4 figures, 20 equations, 3 appendices, 4 tables
MADR: MPC-guided Adversarial DeepReach
Hamilton-Jacobi (HJ) Reachability offers a framework for generating safe value functions and policies in the face of adversarial disturbance, but is limited by the curse of dimensionality. Physics-informed deep learning is able to overcome this infeasibility, but itself suffers from slow and inaccurate convergence, primarily due to weak PDE gradients and the complexity of self-supervised learning. A few recent works have demonstrated that enriching the self-supervision process with regular supervision (based on the nature of the optimal control problem) greatly accelerates convergence and improves solution quality; however, these have been limited to single-player problems and simple games. In this work, we introduce MADR: MPC-guided Adversarial DeepReach, a general framework to robustly approximate the two-player, zero-sum differential game value function. In doing so, MADR yields the corresponding optimal strategies for both players in zero-sum games as well as safe policies for worst-case robustness. We test MADR on a multitude of high-dimensional simulated and real robotic agents with varying dynamics and games, finding that our approach significantly outperforms state-of-the-art baselines in simulation and produces impressive results in hardware.
comment: 8 pages, under review
$\ell_1$-Based Adaptive Identification under Quantized Observations with Applications
Quantized observations are ubiquitous in a wide range of applications across engineering and the social sciences, and algorithms based on the $\ell_1$-norm are well recognized for their robustness to outliers compared with their $\ell_2$-based counterparts. Nevertheless, adaptive identification methods that integrate quantized observations with $\ell_1$-optimization remain largely underexplored. Motivated by this gap, we develop a novel $\ell_1$-based adaptive identification algorithm specifically designed for quantized observations. Without relying on the traditional persistent excitation condition, we establish global convergence of the parameter estimates to their true values and show that the average regret asymptotically vanishes as the data size increases. Finally, we apply our new identification algorithm to a judicial sentencing problem using real-world data, which demonstrates its superior performance and practical significance.
A Note on Optimal Distributed State Estimation for Linear Time-Varying Systems
In this technical note, we prove that the ODEFTC algorithm constitutes the first optimal distributed state estimator for continuous-time linear time-varying systems subject to stochastic disturbances. Particularly, we formally show that it is able to asymptotically recover the performance, in terms of error covariance of the estimates at each node, of the centralized Kalman-Bucy filter, which is known to be the optimal filter for the considered class of systems. Moreover, we provide a simple sufficient value for the consensus gain to guarantee the stability of the distributed estimator.
comment: This work has been submitted to the IEEE for possible publication
Quantifying Security for Networked Control Systems: A Review
Networked Control Systems (NCSs) are integral in critical infrastructures such as power grids, transportation networks, and production systems. Ensuring the resilient operation of these large-scale NCSs against cyber-attacks is crucial for societal well-being. Over the past two decades, extensive research has been focused on developing metrics to quantify the vulnerabilities of NCSs against attacks. Once the vulnerabilities are quantified, mitigation strategies can be employed to enhance system resilience. This article provides a comprehensive overview of methods developed for assessing NCS vulnerabilities and the corresponding mitigation strategies. Furthermore, we emphasize the importance of probabilistic risk metrics to model vulnerabilities under adversaries with imperfect process knowledge. The article concludes by outlining promising directions for future research.
comment: Journal submission
PIRA: Pan-CDN Intra-video Resource Adaptation for Short Video Streaming
On large-scale short-video platforms, CDN resource selection plays a critical role in maintaining Quality of Experience (QoE) while controlling escalating traffic costs. To better understand this phenomenon, we conduct in-the-wild network measurements during video playback in a production short-video system. The results reveal that CDNs delivering higher average QoE often come at greater financial cost, yet their connection quality fluctuates even within a single video, underscoring a fundamental and dynamic trade-off between QoE and cost. However, the problem of sustaining high QoE under cost constraints remains insufficiently investigated in the context of CDN selection for short-video streaming. To address this, we propose PIRA, a dynamic resource selection algorithm that optimizes QoE and cost in real time during video playback. PIRA formally integrates QoE and cost in a mathematical model and introduces an intra-video, control-theoretic CDN resource selection approach that balances QoE and cost under network dynamics. To reduce computational overhead, PIRA employs state-space pruning and adaptive parameter adjustment to efficiently solve the high-dimensional optimization problem. In large-scale production experiments involving 450,000 users over two weeks, PIRA outperforms the production baseline, achieving a 2.1% reduction in start-up delay, 15.2% shorter rebuffering time, and 10% lower average unit traffic cost, demonstrating its effectiveness in balancing user experience and financial cost at scale.
Designing trajectories in the Earth-Moon system: a Levenberg-Marquardt approach
Trajectory design in cislunar space under a High-Fidelity Ephemeris Model (HFEM) is pursued through a nonlinear optimization perspective anchored on the transition of solutions from lower fidelity models, namely the Circular Restricted Three-Body Problem (CR3BP). The optimization problem is posed in the likeness of a multiple-shooting approach, aiming for segment-to-segment continuity while tracking proximity to the original CR3BP structures. The analysis of various formulations leads to the selection of an unconstrained least-squares problem for further investigation. The nonlinear optimization problem is convexified and the use of the Levenberg-Marquardt algorithm, as an alternative to the minimum-norm update equation found in most literature, is investigated for its control over the update step and inherent robustness. Additional techniques such as adaptive weighting are employed to further consolidate the behavior of the proposed algorithm in challenging scenarios. Numerical trials evaluate the adequacy of the methodology presented and compare it to the minimum-norm baseline over various application cases, including the generation of quasi-periodic trajectories and orbital transfers between them. The proposed approach is found to outperform the baseline in applications where the initial guess is poor and the ease of including proximity constraints provides benefits in control over the shape of the converged solution.
comment: Preprint submitted to Advances in Space Research
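For readers unfamiliar with the two update rules contrasted in the abstract above, their textbook forms are given below. The notation is assumed here, not taken from the paper: $F(x)$ is the stacked vector of continuity residuals, $J = \partial F / \partial x$ is its Jacobian, and $\lambda > 0$ is the damping parameter. The paper's own formulation, including its adaptive weighting, may differ.

```latex
% Minimum-norm update (underdetermined system, J of full row rank):
\delta x_{\mathrm{MN}} = -\,J^{\top}\left(J J^{\top}\right)^{-1} F(x)

% Levenberg-Marquardt update for the least-squares cost \tfrac{1}{2}\lVert F(x)\rVert^{2}:
\delta x_{\mathrm{LM}} = -\left(J^{\top} J + \lambda I\right)^{-1} J^{\top} F(x)

% Equivalent form, showing that \delta x_{\mathrm{LM}} \to \delta x_{\mathrm{MN}} as \lambda \to 0:
\delta x_{\mathrm{LM}} = -\,J^{\top}\left(J J^{\top} + \lambda I\right)^{-1} F(x)
```

A larger $\lambda$ shortens the step and turns it toward the gradient-descent direction $-J^{\top} F(x)$, which is the control over the update step that the abstract mentions.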
Sliding-Mode Control Strategies for PMSM speed control: A Comprehensive Review, Taxonomy and Research Gaps
Permanent Magnet Synchronous Motors (PMSMs) are widely employed in high-performance drive systems due to their high efficiency, power density, and precise dynamic behavior. However, nonlinearities, load disturbances, and parameter uncertainties present persistent challenges to control. Sliding-Mode Control (SMC) remains one of the most reliable strategies for high-performance PMSM drives. Yet, the rapid proliferation of adaptive, fractional-order, and intelligent variants has fragmented recent literature. This paper presents a comprehensive review and taxonomy of SMC-based PMSM speed-control methods published between 2020 and 2025. More than 200 studies are systematically analyzed and classified according to control order, surface design, disturbance-observer integration, optimization approach, and intelligent augmentation. Trends in publication activity, dominant hybrid structures, and application domains are quantitatively summarized. The review reveals a clear evolution from conventional discontinuous SMC toward adaptive, higher-order, and data-driven frameworks that mitigate chattering while preserving robustness. Persistent research gaps are identified in hardware validation, energy-efficiency assessment, and real-time tuning strategies. The taxonomy and critical synthesis provided herein establish a coherent reference for researchers and form the conceptual foundation for the companion paper (Part II), which delivers a unified benchmark and comparative simulation study of representative SMC designs.
comment: 30 Pages, 7 Figures, 3 Tables
MPC-based motion planning for non-holonomic systems in non-convex domains
Motivated by the application of using model predictive control (MPC) for motion planning of autonomous mobile robots, a form of output tracking MPC for non-holonomic systems and with non-convex constraints is studied. Although the advantages of using MPC for motion planning have been demonstrated in several papers, in most of the available fundamental literature on output tracking MPC it is assumed, often implicitly, that the model is holonomic and generally the state or output constraints must be convex. Thus, in application-oriented publications, empirical results dominate and the topic of proving completeness, in particular under which assumptions the target is always reached, has received comparatively little attention. To address this gap, we present a novel MPC formulation that guarantees convergence to the desired target under realistic assumptions, which can be verified in relevant real-world scenarios.
comment: Preprint of ECC 2025 submission
MMRHP: A Miniature Mixed-Reality HIL Platform for Auditable Closed-Loop Evaluation
Validation of autonomous driving systems requires a trade-off between test fidelity, cost, and scalability. While miniaturized hardware-in-the-loop (HIL) platforms have emerged as a promising solution, a systematic framework supporting rigorous quantitative analysis is generally lacking, limiting their value as scientific evaluation tools. To address this challenge, we propose MMRHP, a miniature mixed-reality HIL platform that elevates miniaturized testing from functional demonstration to rigorous, reproducible quantitative analysis. The core contributions are threefold. First, we propose a systematic three-phase testing process oriented toward the Safety of the Intended Functionality (SOTIF) standard, providing actionable guidance for identifying the performance limits and triggering conditions of otherwise correctly functioning systems. Second, we design and implement a HIL platform centered around a unified spatiotemporal measurement core to support this process, ensuring consistent and traceable quantification of physical motion and system timing. Finally, we demonstrate the effectiveness of this solution through comprehensive experiments. The platform itself was first validated, achieving a spatial accuracy of 10.27 mm RMSE and a stable closed-loop latency baseline of approximately 45 ms. Subsequently, an in-depth Autoware case study leveraged this validated platform to quantify its performance baseline and identify a critical performance cliff at an injected latency of 40 ms. This work shows that a structured process, combined with a platform offering a unified spatiotemporal benchmark, enables reproducible, interpretable, and quantitative closed-loop evaluation of autonomous driving systems.
Coverage-Recon: Coordinated Multi-Drone Image Sampling with Online Map Feedback
This article addresses collaborative 3D map reconstruction using multiple drones. Achieving high-quality reconstruction requires capturing images of keypoints within the target scene from diverse viewing angles, and coverage control offers an effective framework to meet this requirement. Meanwhile, recent advances in real-time 3D reconstruction algorithms make it possible to render an evolving map during flight, enabling immediate feedback to guide drone motion. Building on this, we present Coverage-Recon, a novel coordinated image sampling algorithm that integrates online map feedback to improve reconstruction quality on-the-fly. In Coverage-Recon, the coordinated motion of drones is governed by a Quadratic Programming (QP)-based angle-aware coverage controller, which ensures multi-viewpoint image capture while enforcing safety constraints. The captured images are processed in real time by the NeuralRecon algorithm to generate an evolving 3D mesh. Mesh changes across the scene are interpreted as indicators of reconstruction uncertainty and serve as feedback to update the importance index of the coverage control as the map evolves. The effectiveness of Coverage-Recon is validated through simulation and experiments, demonstrating both qualitatively and quantitatively that incorporating online map feedback yields more complete and accurate 3D reconstructions than conventional methods. Project page: https://htnk-lab.github.io/coverage-recon/
comment: Submitted to IEEE Transactions on Control Systems Technology (under review). Project page: https://htnk-lab.github.io/coverage-recon/
Explicit Reformulation of Discrete Distributionally Robust Optimization Problems
Distributionally robust optimization (DRO) is an effective framework for controlling real-world systems with various uncertainties, typically modeled using distributional uncertainty balls. However, DRO problems often involve infinitely many inequality constraints, rendering exact solutions computationally expensive. In this study, we propose a discrete DRO (DDRO) method that significantly simplifies the problem by reducing it to a single trivial constraint. Specifically, the proposed method utilizes two types of distributional uncertainty balls to reformulate the DDRO problem into a single-layer smooth convex program, significantly improving tractability. Furthermore, we provide practical guidance for selecting the appropriate ball sizes. The original DDRO problem is further reformulated into two optimization problems: one minimizing the mean and standard deviation, and the other minimizing the conditional value at risk (CVaR). These formulations account for the choice of ball sizes, thereby enhancing the practical applicability of the method. The proposed method was applied to a distributionally robust patrol-agent design problem, identifying a Pareto front in which the mean and standard deviation of the mean hitting time varied by up to 3% and 14%, respectively, while achieving a CVaR reduction of up to 13%.
comment: 15 pages, 4 figures, This paper is submitted to a journal for possible publication
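The two risk measures named in the abstract above can be illustrated on a finite distribution. This is a minimal sketch using the standard Rockafellar-Uryasev expression for CVaR, not the paper's DDRO reformulation; the function names `mean_std` and `cvar` and the sample data are assumptions made for illustration.

```python
import math


def mean_std(values, probs):
    """Mean and standard deviation of a discrete random variable."""
    mean = sum(p * v for v, p in zip(values, probs))
    var = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    return mean, math.sqrt(var)


def cvar(values, probs, alpha):
    """CVaR at level alpha via min over t of t + E[(X - t)^+] / (1 - alpha).

    For a discrete distribution the objective is piecewise linear and convex
    with breakpoints at the support points, so checking those points is exact.
    """
    def objective(t):
        tail = sum(p * max(v - t, 0.0) for v, p in zip(values, probs))
        return t + tail / (1.0 - alpha)

    return min(objective(t) for t in values)


# Hypothetical hitting times with equal probabilities.
values = [1.0, 2.0, 3.0, 10.0]
probs = [0.25, 0.25, 0.25, 0.25]

mean, std = mean_std(values, probs)
cvar_50 = cvar(values, probs, alpha=0.5)    # average of the worst half
cvar_75 = cvar(values, probs, alpha=0.75)   # average of the worst quarter
```

Here the mean is 4.0, while the CVaR at level 0.5 averages the two largest outcomes, which shows how CVaR emphasizes the tail that the mean smooths over.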
Distributed Allocation and Resource Scheduling Algorithms Resilient to Link Failure
Distributed resource allocation (DRA) is fundamental to modern networked systems, spanning applications from economic dispatch in smart grids to CPU scheduling in data centers. Conventional DRA approaches require reliable communication, yet real-world networks frequently suffer from link failures, packet drops, and communication delays due to environmental conditions, network congestion, and security threats. We introduce a novel resilient DRA algorithm that addresses these critical challenges, and our main contributions are as follows: (1) guaranteed constraint feasibility at all times, ensuring resource-demand balance even during algorithm termination or network disruption; (2) robust convergence despite sector-bound nonlinearities at nodes/links, accommodating practical constraints like quantization and saturation; and (3) optimal performance under merely uniformly-connected networks, eliminating the need for continuous connectivity. Unlike existing approaches that require persistent network connectivity and provide only asymptotic feasibility, our graph-theoretic solution leverages network percolation theory to maintain performance during intermittent disconnections. This makes it particularly valuable for mobile multi-agent systems where nodes frequently move out of communication range. Theoretical analysis and simulations demonstrate that our algorithm converges to optimal solutions despite heterogeneous time delays and substantial link failures, significantly advancing the reliability of distributed resource allocation in practical network environments.
comment: European Journal of Control
Brute-force search and Warshall algorithms for matrix-weighted graphs
Although research on the control of networked systems has grown considerably, graph-theoretic and algorithmic studies on matrix-weighted graphs remain limited. To bridge this gap in the literature, this work introduces two algorithms, the brute-force search and the Warshall algorithm, for determining connectedness and clustering in undirected matrix-weighted graphs. The proposed algorithms, which are derived from a sufficient condition for connectedness, emphasize a key distinction between matrix-weighted and scalar-weighted graphs. While the existence of a path between two vertices guarantees connectedness in scalar-weighted graphs, connectedness in matrix-weighted graphs is a collective contribution of all paths joining the two vertices. Proofs of correctness and numerical examples are provided to illustrate and demonstrate the effectiveness of the algorithms.
comment: 20 pages, 6 figures, preprint
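For reference, the scalar-weighted case that the abstract above contrasts against can be sketched with the classical Warshall transitive-closure procedure. This is the textbook Boolean version, in which a single path is enough for connectedness; it is not the matrix-weighted algorithm proposed in the paper. The function name `warshall_closure` and the example graph are illustrative assumptions.

```python
def warshall_closure(adj):
    """Return the reachability matrix of a graph given as a Boolean adjacency matrix.

    adj[i][j] is True when an edge joins vertices i and j.
    """
    n = len(adj)
    # Every vertex reaches itself; otherwise start from the direct edges.
    reach = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


# Hypothetical 4-vertex undirected graph: edges 0-1 and 1-2; vertex 3 is isolated.
edges = [(0, 1), (1, 2)]
adj = [[False] * 4 for _ in range(4)]
for a, b in edges:
    adj[a][b] = adj[b][a] = True

reach = warshall_closure(adj)
connected = all(all(row) for row in reach)
```

In this example vertices 0 and 2 are connected through vertex 1, while vertex 3 is unreachable, so the graph as a whole is not connected.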
Urban Air Mobility: A Review of Recent Advances in Communication, Management, and Sustainability
Urban Air Mobility (UAM) offers a transformative approach to addressing urban congestion, improving accessibility, and advancing environmental sustainability. Rapid progress has emerged in three tightly linked domains since 2020: (1) Communication, where dynamic spectrum allocation and low-altitude channel characterization support reliable air-ground data exchange; (2) UAM management, with novel air-traffic control concepts for dense, largely autonomous urban airspace; and (3) Sustainability, driven by energy-efficient propulsion, integrated charging infrastructure, and holistic environmental assessment. This paper reviews and synthesizes the latest research across these areas, compares the state-of-the-art solutions, and outlines the technological and infrastructural milestones that are critical to realizing a scalable, sustainable UAM ecosystem.
comment: This work has been accepted by the 2025 International Conference on Cyber-physical Social Intelligence (CPSI 2025)
Harmonic Cancellation in Multi-Electrolyzer P2H Plants via Phasor-Modulated Production Scheduling
Thyristor rectifiers (TRs) are cost-effective power supplies for hydrogen electrolyzers (ELZs) but introduce harmonic distortion that may violate grid codes. This letter proposes a self-governing harmonic mitigation strategy through coordinated operation of multiple ELZs in large power-to-hydrogen (P2H) plants. First, the harmonic model of TR-powered ELZs is derived, revealing a natural harmonic cancellation mechanism among them. Based on this, a system-level operation scheme based on phasor modulation is developed and integrated into plant scheduling. Case studies demonstrate that the proposed method reduces harmonic currents by 21.2%-39.7% and ensures grid-code compliance, with only a 0.25% loss in hydrogen output, while increasing total revenue by over 21% compared to production-oriented strategies.
comment: This work has been submitted to the IEEE for possible publication
Wisdom of Crowds Effects under Antagonistic Interactions and Correlated Opinions
This paper investigates the wisdom of crowds of linear opinion dynamics models evolving on signed networks. Conditions are given under which models such as the DeGroot, Friedkin-Johnsen (FJ) and concatenated FJ models improve or undermine collective wisdom. The extension to dependent initial opinions is also presented, highlighting how the correlation structure influences the feasibility and geometry of the wisdom-improving regions.
A Configurable Simulation Framework for Safety Assessment of Vulnerable Road Users
Ensuring the safety of vulnerable road users (VRUs), including pedestrians, cyclists, electric scooter riders, and motorcyclists, remains a major challenge for advanced driver assistance systems (ADAS) and connected and automated vehicles (CAV) technologies. Real-world VRU tests are expensive and sometimes cannot capture or repeat rare and hazardous events. In this paper, we present a lightweight, configurable simulation framework that follows European New Car Assessment Program (Euro NCAP) VRU testing protocols. A rule-based finite-state machine (FSM) is developed as a motion planner to provide vehicle automation during the VRU interaction. We also integrate ego-vehicle perception and idealized Vehicle-to-Everything (V2X) awareness to demonstrate safety margins in different scenarios. This work provides an extensible platform for rapid and repeatable VRU safety validation, paving the way for broader case-study deployment in diverse, user-defined settings, which will be essential for building a more VRU-friendly and sustainable intelligent transportation system.
comment: This work has been accepted by the 2025 International Conference on Cyber-physical Social Intelligence (CPSI 2025)
Time Domain Differential Equation Based Fault Location Identification in Mixed Overhead-Underground Power Distribution Systems
This paper proposes a time-domain fault location identification method for mixed overhead-underground power distribution systems that can handle challenging fault scenarios such as sub-cycle faults, arcing faults and evolving faults. The proposed method is formulated based on differential equations of the system and accounts for the peculiarities of power distribution systems with distributed generations. It considers the presence of loads, multi-phase laterals and sub-laterals, heterogeneous overhead and underground lines, and infeeds and remote-end fault current contributions of distributed generations. It utilizes data collected by a limited number of measuring devices installed in modern power distribution systems to systematically eliminate possible multiple fault location estimations and provide a single correct estimation of the actual location of the fault. The performance of the proposed method is demonstrated using the IEEE 34-node test system.
A Learning-based Model Reference Adaptive Controller Implemented on a Prosthetic Hand Wrist
The functionality and natural motion of prosthetic hands remain limited by the challenges in controlling compliant wrist mechanisms. Current control strategies often lack adaptability and incur high computational costs, which impedes real-time deployment in assistive robotics. To address this gap, this study presents a computationally efficient Neural Network (NN)-based Model Reference Adaptive Controller (MRAC) for a tendon-driven soft continuum wrist integrated with a prosthetic hand. The dynamic modeling of the wrist is formulated using Timoshenko beam theory, capturing both shear and bending deformations. The proposed NN-MRAC estimates the required tendon forces from deflection errors and minimizes deviation from a reference model through online adaptation. Simulation results demonstrate improved precision with a root mean square error (RMSE) of $6.14 \times 10^{-4}$ m and a settling time of $3.2$ s. Experimental validations confirm real-time applicability, with an average RMSE of $5.66 \times 10^{-3}$ m, steady-state error of $8.05 \times 10^{-3}$ m, and settling time of $1.58$ s. These results highlight the potential of the controller to enhance motion accuracy and responsiveness in soft prosthetic systems, thereby advancing the integration of adaptive intelligent control in wearable assistive devices.
comment: International Conference on Social Robotics + AI
Graph Analysis to Fully Automate Fault Location Identification in Power Distribution Systems
This paper proposes graph analysis methods to fully automate the fault location identification task in power distribution systems. The proposed methods take basic unordered data from power distribution systems as input, including branch parameters, load values, and the location of measuring devices. The proposed data preparation and analysis methods automatically identify the system's topology and extract essential information, such as faulted paths, structures, loading of laterals and sublaterals, and estimate the fault location accordingly. The proposed graph analysis methods do not require complex node and branch numbering processes or renumbering following changes in the system topology. The proposed methods eliminate the need for human intervention at any step of the fault location identification process. They are scalable and applicable to systems of any size. The performance of the proposed algorithm is demonstrated using the IEEE 34-bus distribution test system.
Convex Maneuver Planning for Spacecraft Collision Avoidance
Conjunction analysis and maneuver planning for spacecraft collision avoidance remains a manual and time-consuming process, typically involving repeated forward simulations of hand-designed maneuvers. With the growing density of satellites in low-Earth orbit (LEO), autonomy is becoming essential for efficiently evaluating and mitigating collisions. In this work, we present an algorithm to design low-thrust collision-avoidance maneuvers for short-term conjunction events. We first formulate the problem as a nonconvex quadratically-constrained quadratic program (QCQP), which we then relax into a convex semidefinite program (SDP) using Shor's relaxation. We demonstrate empirically that the relaxation is tight, which enables the recovery of globally optimal solutions to the original nonconvex problem. Our formulation produces a minimum-energy solution while ensuring a desired probability of collision at the time of closest approach. Finally, if the desired probability of collision cannot be satisfied, we relax this constraint into a penalty, yielding a minimum-risk solution. We validate our algorithm with a high-fidelity simulation of a satellite conjunction in low-Earth orbit with a simulated conjunction data message (CDM), demonstrating its effectiveness in reducing collision risk.
comment: 8 pages, 6 figures, Accepted to International Space Robotics Conference
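The relaxation step named in the abstract above can be written out for a generic QCQP. This is the standard textbook form of Shor's relaxation, not the paper's specific conjunction model; the symbols $P_i$, $q_i$, $r_i$, $x$, and $X$ are assumptions introduced here for illustration.

```latex
% Generic nonconvex QCQP in the decision vector x (P_i symmetric, possibly indefinite)
\begin{aligned}
\min_{x \in \mathbb{R}^n} \quad & x^\top P_0 x + 2 q_0^\top x + r_0 \\
\text{s.t.} \quad & x^\top P_i x + 2 q_i^\top x + r_i \le 0, \quad i = 1, \dots, m.
\end{aligned}

% Shor's relaxation: introduce X in place of x x^T and keep only X \succeq x x^T,
% written as a linear matrix inequality via the Schur complement.
\begin{aligned}
\min_{x,\, X} \quad & \operatorname{tr}(P_0 X) + 2 q_0^\top x + r_0 \\
\text{s.t.} \quad & \operatorname{tr}(P_i X) + 2 q_i^\top x + r_i \le 0, \quad i = 1, \dots, m, \\
& \begin{bmatrix} X & x \\ x^\top & 1 \end{bmatrix} \succeq 0.
\end{aligned}
```

The relaxation is tight when the optimal solution satisfies $X = x x^\top$, that is, when the lifted matrix has rank one; in that case $x$ is a global optimum of the original nonconvex problem.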
Motion Planning and Control of an Overactuated 4-Wheel Drive with Constrained Independent Steering
This paper addresses motion planning and control of an overactuated 4-wheel drive train with independent steering (4WIS) where mechanical constraints prevent the wheels from executing full 360-degree rotations (swerve). The configuration space of such a robot is constrained and contains discontinuities that affect the smoothness of the robot motion. We introduce a mathematical formulation of the steering constraints and derive discontinuity planes that partition the velocity space into regions of smooth and efficient motion. We further design the motion planner for path tracking and obstacle avoidance that explicitly accounts for swerve constraints and the velocity transition smoothness. The motion controller uses local feedback to generate actuation from the desired velocity, while properly handling the discontinuity crossing by temporarily stopping the motion and repositioning the wheels. We implement the proposed motion planner as an extension to the ROS Navigation package and evaluate the system in simulation and on a physical robot.
comment: 7 pages, 5 figures, 3 tables, video available at https://youtu.be/8l9s2Wb_vec, To appear at IEEE 2025 International Conference on Advanced Robotics
Extreme value distributions of peak loads for non-residential customer segments SC
Electrical grid congestion is a growing challenge in Europe, driving the need for accurate prediction of load, particularly of peak load. Non-time-resolved models of peak load offer the advantages of simplicity and compactness, and among them, Velander's formula (VF) is a traditional method that has been used for decades. Moreover, VF can be adapted into a quantile VF, which learns a truncated cumulative distribution function of peak load based on electricity consumption. This paper proposes a mathematical model based on extreme value theory to characterize the probability distribution of peak load for large non-residential customers. The model underpins the quantile VF as demonstrated through multiple quantile regression and reduces its representation to just four parameters without sacrificing predictive performance. Moreover, using maximum likelihood estimation and the likelihood ratio test, we validate that the probability distribution of peak load of analysed groups belongs to the heavy-tailed Fréchet class.
comment: Submitted to Power Systems Computation Conference (PSCC) 2026
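As background for the entry above, the classical Velander's formula estimates peak load from annual energy consumption $W$ as $P_{\max} = k_1 W + k_2 \sqrt{W}$. The sketch below shows only this traditional point estimate, not the quantile VF or the extreme-value model proposed in the paper; the coefficient values and the function name `velander_peak` are hypothetical.

```python
import math


def velander_peak(annual_energy, k1, k2):
    """Classical Velander estimate of peak load from annual energy consumption."""
    return k1 * annual_energy + k2 * math.sqrt(annual_energy)


# Hypothetical coefficients; in practice they are fitted per customer category.
K1, K2 = 0.0003, 0.05

peak_single = velander_peak(10_000.0, K1, K2)     # one customer
peak_pooled = velander_peak(20_000.0, K1, K2)     # two such customers pooled
```

Because of the square-root term, the pooled estimate is smaller than twice the single-customer estimate, which is how the formula reflects that individual peaks rarely coincide.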
Extending Resource Constrained Project Scheduling to Mega-Projects with Model-Based Systems Engineering & Hetero-functional Graph Theory
Within the project management context, project scheduling serves as an indispensable component, functioning as a fundamental tool for planning, monitoring, controlling, and managing projects more broadly. Although the resource-constrained project scheduling problem (RCPSP) lies at the core of project management activities, it remains largely disconnected from the broader literature on model-based systems engineering (MBSE), thereby limiting its integration into the design and management of complex systems. The original contribution of this paper is twofold. First, the paper seeks to reconcile the RCPSP with the broader literature and vocabulary of model-based systems engineering and hetero-functional graph theory (HFGT). A concrete translation pipeline from an activity-on-node network to a SysML activity diagram, and then to an operand net is constructed. Using this representation, it specializes the hetero-functional network minimum-cost flow (HFNMCF) formulation to the RCPSP context as a systematic means of HFGT for quantitative analysis and proves that the RCPSP is recoverable as a special case of a broader model. Secondly, on an illustrative instance with renewable and non-renewable operands, the specialized HFNMCF, while producing similar schedules, yields explicit explanations of the project states that enable richer monitoring and control. Overall, the framework preserves the strengths of the classical RCPSP while accommodating real-world constraints and enterprise-level decision processes encountered in large, complex megaprojects.
Active Cooling Device: A Flexible, Lab-Scale Experimental Unit to Develop Spatio-Temporal Temperature Control Strategies
We present an experimental unit that realizes the "multi-input, multi-output manifold" thermal management technology proposed by Lamarre & Raymond (2023). The proposed setup can be used for experiments aimed at controlling spatiotemporal temperature distribution. Temperature control is achieved by impinging coolant fluid jets, leveraging a manifold of channels targeted to the surface. The direction of the fluid is controlled by shifting the role of channels between inputs, outputs, or closing them. Files associated with this work include Computer-Aided Design (CAD) STEP files, Gerber files to manufacture a Printed Circuit Board (PCB), and a Graphical User Interface (GUI) written in Python. We provide a step-by-step guide to assemble the experimental setup. We also provide instructions to interact with the setup through the GUI, which allows for real-time tracking of sample temperature and flow rates per flow control device. Additionally, we provide examples of usage of the setup, including system characterization with step response, Proportional-Integral-Derivative performance tracking, and disturbance rejection in a coupled system. Extending the application is accessible through the files provided in the open repository associated with this work. The active cooling device presents a safe, flexible, and complete design, allowing for lab-scale assessment of the performance of custom temperature control strategies using enclosed impinging jets.
Through-the-Earth Magnetic Induction Communication and Networking: A Comprehensive Survey
Magnetic induction (MI) communication (MIC) has emerged as a promising candidate for underground communication networks due to its excellent penetration capabilities. Integration with Space-Air-Ground-Underground (SAGUI) networks in next-generation mobile communication systems requires a well-defined network architecture. A recent discovery in MIC research, MI fast fading, remains in its early stages and presents unique challenges. This paper provides a comprehensive survey on through-the-earth (TTE) MIC, covering MI applications, channel modeling, point-to-point MIC design, relay techniques, network frameworks, and emerging technologies. We compare various MIC applications to highlight TTE-specific challenges and review the principles of channel modeling, addressing both MI slow fading and MI fast fading, along with its potential impact on existing MIC theories. We conduct a fine-grained decomposition of MI channel power gain into four distinct physical parameters, and propose a novel geometric model to analyze MI fast fading. We also summarize MI relay techniques, examine crosstalk effects in relay and high-density networks, and explore key research tasks within the OSI framework for a holistic MI network protocol in SAGUI. To bridge the gaps identified, we propose a MIC framework that supports TCP/IP and Linux, enabling full implementation of existing and emerging MIC solutions. This framework empowers researchers to leverage Linux resources and deep learning platforms for accelerated development of MIC in SAGUI networks. Remaining research challenges, open issues, and promising novel techniques are further identified to advance MIC research.
comment: This work has been accepted by the IEEE Communications Surveys & Tutorials (COMST) for publication. The final published version will be available on IEEE Xplore
PowerChain: A Verifiable Agentic AI System for Automating Distribution Grid Analyses
Rapid electrification and decarbonization are increasing the complexity of distribution grid (DG) operation and planning, necessitating advanced computational analyses to ensure reliability and resilience. These analyses depend on disparate workflows comprising complex models, function calls, and data pipelines that require substantial expert knowledge and remain difficult to automate. Workforce and budget constraints further limit utilities' ability to apply such analyses at scale. To address this gap, we build an agentic system, PowerChain, which is capable of autonomously performing complex grid analyses. Existing agentic AI systems are typically developed in a bottom-up manner with customized context for predefined analysis tasks; therefore, they do not generalize to tasks that the agent has never seen. In comparison, to generalize to unseen DG analysis tasks, PowerChain dynamically generates structured context by leveraging supervisory signals from self-contained power systems tools (e.g., GridLAB-D) and an optimized set of expert-annotated and verified reasoning trajectories. For complex DG tasks defined in natural language, empirical results on real utility data demonstrate that PowerChain achieves up to a 144% improvement in performance over baselines.
Rethink Repeatable Measures of Robot Performance with Statistical Query
For a general standardized testing algorithm designed to evaluate a specific aspect of a robot's performance, several key expectations are commonly imposed. Beyond accuracy (i.e., closeness to a typically unknown ground-truth reference) and efficiency (i.e., feasibility within acceptable testing costs and equipment constraints), one particularly important attribute is repeatability. Repeatability refers to the ability to consistently obtain the same testing outcome when similar testing algorithms are executed on the same subject robot by different stakeholders, across different times or locations. However, achieving repeatable testing has become increasingly challenging as the components involved grow more complex, intelligent, diverse, and, most importantly, stochastic. While related efforts have addressed repeatability at ethical, hardware, and procedural levels, this study focuses specifically on repeatable testing at the algorithmic level. Specifically, we target the well-adopted class of testing algorithms in standardized evaluation: statistical query (SQ) algorithms (i.e., algorithms that estimate the expected value of a bounded function over a distribution using sampled data). We propose a lightweight, parameterized, and adaptive modification applicable to any SQ routine, whether based on Monte Carlo sampling, importance sampling, or adaptive importance sampling, that makes it provably repeatable, with guaranteed bounds on both accuracy and efficiency. We demonstrate the effectiveness of the proposed approach across three representative scenarios: (i) established and widely adopted standardized testing of manipulators, (ii) emerging intelligent testing algorithms for operational risk assessment in automated vehicles, and (iii) developing use cases involving command tracking performance evaluation of humanoid robots in locomotion tasks.
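The statistical query setting described above, estimating the expected value of a bounded function from samples, can be sketched with plain Monte Carlo sampling. This is only the baseline SQ routine with a Hoeffding sample-size rule, not the repeatability modification proposed in the paper; the function names, the test function, and the sampling distribution are assumptions chosen for illustration.

```python
import math
import random


def hoeffding_samples(eps, delta):
    """Samples needed so the mean of a [0, 1]-valued function is within eps
    of its expectation with probability at least 1 - delta (Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def sq_estimate(f, sampler, eps, delta, rng):
    """Plain Monte Carlo statistical query: average f over i.i.d. samples."""
    n = hoeffding_samples(eps, delta)
    return sum(f(sampler(rng)) for _ in range(n)) / n


# Hypothetical test: the failure indicator is 1 when a uniform score exceeds 0.8,
# so the true expected value is 0.2.
def failure(score):
    return 1.0 if score > 0.8 else 0.0


def draw_score(rng):
    return rng.random()


n_required = hoeffding_samples(eps=0.05, delta=0.01)
estimate = sq_estimate(failure, draw_score, eps=0.05, delta=0.01, rng=random.Random(0))
```

Two stakeholders running this routine with independent randomness will generally obtain slightly different estimates, which is the repeatability gap the abstract targets.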
One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares
While large machine learning models have shown remarkable performance in various domains, their training typically requires iterating for many passes over the training data. However, due to computational and memory constraints and potential privacy concerns, storing and accessing all the data is impractical in many real-world scenarios where the data arrives in a stream. In this paper, we investigate the problem of one-pass learning, in which a model is trained on sequentially arriving data without retraining on previous datapoints. Motivated by the demonstrated effectiveness of overparameterized models and the phenomenon of benign overfitting, we propose Orthogonal Recursive Fitting (ORFit), an algorithm for one-pass learning which seeks to perfectly fit each new datapoint while minimally altering the predictions on previous datapoints. ORFit updates the parameters in a direction orthogonal to past gradients, similar to orthogonal gradient descent (OGD) in continual learning. We show that, interestingly, ORFit's update leads to an operation similar to the recursive least-squares (RLS) algorithm in adaptive filtering but with significantly improved memory and computational efficiency, i.e., linear, instead of quadratic, in the number of parameters. To further reduce memory usage, we leverage the structure of the streaming data via an incremental principal component analysis (IPCA). We show that using the principal components is minimax optimal, i.e., it minimizes the worst-case forgetting of previous predictions for unknown future updates. Further, we prove that, for overparameterized linear models, the parameter vector obtained by ORFit matches what the standard multi-pass stochastic gradient descent (SGD) would converge to. Finally, we extend our results to the nonlinear setting for highly overparameterized models, relevant for deep learning.
comment: Journal extension of v1: Y. Min, K. Ahn, N. Azizan, "One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares," IEEE Conference on Decision and Control, 2022
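The update idea in the abstract above, fitting each new datapoint exactly while moving only in a direction orthogonal to past gradients, can be sketched for a linear model $f(x) = w^\top x$, whose gradient with respect to $w$ is simply $x$. This is a simplified reading of the abstract, not the authors' implementation: it stores the full orthonormal basis instead of the memory-reduced IPCA variant, and the name `orfit_linear_step` and the data are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orfit_linear_step(w, basis, x, y, tol=1e-12):
    """One-pass update for a linear model.

    basis holds orthonormal vectors spanning the past inputs (past gradients).
    """
    # Remove from x every component lying in the span of past gradients.
    g = list(x)
    for u in basis:
        c = dot(u, g)
        g = [gi - c * ui for gi, ui in zip(g, u)]
    norm_sq = dot(g, g)
    if norm_sq <= tol:
        # No orthogonal direction left; skip the update in this sketch.
        return w, basis
    # Step size chosen so the new point is fitted exactly.
    step = (y - dot(w, x)) / dot(x, g)
    w = [wi + step * gi for wi, gi in zip(w, g)]
    norm = norm_sq ** 0.5
    basis = basis + [[gi / norm for gi in g]]
    return w, basis


# Hypothetical stream of three datapoints in a 4-parameter (overparameterized) model.
stream = [([1.0, 0.0, 0.0, 0.0], 2.0),
          ([1.0, 1.0, 0.0, 0.0], 3.0),
          ([0.0, 1.0, 1.0, 0.0], -1.0)]

w, basis = [0.0] * 4, []
for x, y in stream:
    w, basis = orfit_linear_step(w, basis, x, y)

residuals = [dot(w, x) - y for x, y in stream]
```

After the single pass, all residuals are zero: each update fits its own datapoint and, being orthogonal to earlier inputs, leaves the earlier predictions unchanged.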
Nondeterminism-Aware Optimistic Verification for Floating-Point Neural Networks
Neural networks increasingly run on hardware outside the user's control (cloud GPUs, inference marketplaces). Yet ML-as-a-Service reveals little about what actually ran or whether returned outputs faithfully reflect the intended inputs. Users lack recourse against service downgrades (model swaps, quantization, graph rewrites, or discrepancies like altered ad embeddings). Verifying outputs is hard because floating-point (FP) execution on heterogeneous accelerators is inherently nondeterministic. Existing approaches are either impractical for real FP neural networks or reintroduce vendor trust. We present NAO: a Nondeterministic tolerance Aware Optimistic verification protocol that accepts outputs within principled operator-level acceptance regions rather than requiring bitwise equality. NAO combines two error models: (i) sound per-operator IEEE-754 worst-case bounds and (ii) tight empirical percentile profiles calibrated across hardware. Discrepancies trigger a Merkle-anchored, threshold-guided dispute game that recursively partitions the computation graph until one operator remains, where adjudication reduces to a lightweight theoretical-bound check or a small honest-majority vote against empirical thresholds. Unchallenged results finalize after a challenge window, without requiring trusted hardware or deterministic kernels. We implement NAO as a PyTorch-compatible runtime and a contract layer currently deployed on Ethereum Holesky testnet. The runtime instruments graphs, computes per-operator bounds, and runs unmodified vendor kernels in FP32 with negligible overhead (0.3% on Qwen3-8B). Across CNNs, Transformers and diffusion models on A100, H100, RTX6000, RTX4090, empirical thresholds are $10^2$ to $10^3$ times tighter than theoretical bounds, and bound-aware adversarial attacks achieve 0% success. NAO reconciles scalability with verifiability for real-world heterogeneous ML compute.
comment: 17 pages, 7 figures
Unifying Direct and Indirect Learning for Safe Control of Linear Systems
This paper develops learning-enabled safe controllers for linear systems subject to system uncertainties and bounded disturbances. Given the disturbance zonotope, the data-based closed-loop dynamics (CLDs) are first characterized using a matrix zonotope (MZ), and refined through several steps to yield a constrained matrix zonotope (CMZ). This refinement is achieved by introducing conformal equality constraints that eliminate incompatible disturbance realizations. More precisely, prior knowledge and observed data are used separately to construct CMZ representations of disturbance sequences that conform to both data and prior knowledge, and are intersected with the initial MZ of the disturbance sequence, producing a refined CMZ. This approach reduces conservatism. To further reduce the conservativeness, we unify open-loop learning with closed-loop learning by presenting a novel set-membership identification method that models open-loop dynamics as a CMZ. The prior knowledge serves as an initial feasible open-loop model set (FOLMS) of this CMZ, which is refined into a posterior set whenever new informative online data becomes available. This posterior FOLMS then adaptively replaces the prior knowledge set employed in the disturbance elimination of the closed-loop learning process. The resulting refined parameterized set of CLD is subsequently leveraged to directly and adaptively learn a controller that robustly enforces safety. Toward this goal, we formulate a linear programming problem that guarantees $\lambda$-contractiveness of a polyhedral safe set. A simulation example is provided to validate the effectiveness of the proposed approach and support the theoretical results.
comment: arXiv admin note: text overlap with arXiv:2502.04195
Dynamic object goal pushing with mobile manipulators through model-free constrained reinforcement learning ICRA 2025
Non-prehensile pushing to move and reorient objects to a goal is a versatile loco-manipulation skill. In the real world, the object's physical properties and friction with the floor contain significant uncertainties, which makes the task challenging for a mobile manipulator. In this paper, we develop a learning-based controller for a mobile manipulator to move an unknown object to a desired position and yaw orientation through a sequence of pushing actions. The proposed controller for the robotic arm and the mobile base motion is trained using a constrained Reinforcement Learning (RL) formulation. We demonstrate its capability in experiments with a quadrupedal robot equipped with an arm. The learned policy achieves a success rate of 91.35% in simulation and at least 80% on hardware in challenging scenarios. Through our extensive hardware experiments, we show that the approach demonstrates high robustness against unknown objects of different masses, materials, sizes, and shapes. It reactively discovers the pushing location and direction, thus achieving contact-rich behavior while observing only the pose of the object. Additionally, we demonstrate the adaptive behavior of the learned policy towards preventing the object from toppling.
comment: presented at ICRA 2025, Video: https://youtu.be/wGAdPGVf9Ws?si=pi83ONWofHHqbFG0
Neural 3D Object Reconstruction with Small-Scale Unmanned Aerial Vehicles
Small Unmanned Aerial Vehicles (UAVs) exhibit immense potential for navigating indoor and hard-to-reach areas, yet their significant constraints in payload and autonomy have largely prevented their use for complex tasks like high-quality 3-Dimensional (3D) reconstruction. To overcome this challenge, we introduce a novel system architecture that enables fully autonomous, high-fidelity 3D scanning of static objects using UAVs weighing under 100 grams. Our core innovation lies in a dual-reconstruction pipeline that creates a real-time feedback loop between data capture and flight control. A near-real-time (near-RT) process uses Structure from Motion (SfM) to generate an instantaneous pointcloud of the object. The system analyzes the model quality on the fly and dynamically adapts the UAV's trajectory to intelligently capture new images of poorly covered areas. This ensures comprehensive data acquisition. For the final, detailed output, a non-real-time (non-RT) pipeline employs a Neural Radiance Fields (NeRF)-based Neural 3D Reconstruction (N3DR) approach, fusing SfM-derived camera poses with precise Ultra Wide-Band (UWB) location data to achieve superior accuracy. We implemented and validated this architecture using Crazyflie 2.1 UAVs. Our experiments, conducted in both single- and multi-UAV configurations, conclusively show that dynamic trajectory adaptation consistently improves reconstruction quality over static flight paths. This work demonstrates a scalable and autonomous solution that unlocks the potential of miniaturized UAVs for fine-grained 3D reconstruction in constrained environments, a capability previously limited to much larger platforms.
comment: 13 pages, 16 figures, 3 tables, 45 references
A Flow-Based Model for Conditional and Probabilistic Electricity Consumption Profile Generation and Prediction
Residential Load Profile (RLP) generation and prediction are critical for the operation and planning of distribution networks, especially as diverse low-carbon technologies (e.g., photovoltaic and electric vehicles) are increasingly adopted. This paper introduces a novel flow-based generative model, termed Full Convolutional Profile Flow (FCPFlow), which is uniquely designed for both conditional and unconditional RLP generation, and for probabilistic load forecasting. By introducing two new layers--the invertible linear layer and the invertible normalization layer--the proposed FCPFlow architecture shows three main advantages compared to traditional statistical and contemporary deep generative models: 1) it is well-suited for RLP generation under continuous conditions, such as varying weather and annual electricity consumption, 2) it demonstrates superior scalability in different datasets compared to traditional statistical models, and 3) it also demonstrates better modeling capabilities in capturing the complex correlation of RLPs compared with deep generative models.
Asynchronous Federated Learning: A Scalable Approach for Decentralized Machine Learning
Federated Learning (FL) has emerged as a powerful paradigm for decentralized machine learning, enabling collaborative model training across diverse clients without sharing raw data. However, traditional FL approaches often face limitations in scalability and efficiency due to their reliance on synchronous client updates, which can result in significant delays and increased communication overhead, particularly in heterogeneous and dynamic environments. To address these challenges, we propose an Asynchronous Federated Learning (AFL) algorithm, which allows clients to update the global model independently and asynchronously. Our key contributions include a comprehensive convergence analysis of AFL in the presence of client delays and model staleness. By leveraging martingale difference sequence theory and variance bounds, we ensure robust convergence despite asynchronous updates. Assuming strongly convex local objective functions, we establish bounds on gradient variance under random client sampling and derive a recursion formula quantifying the impact of client delays on convergence. Furthermore, we demonstrate the practical applicability of the AFL algorithm by training decentralized linear regression and Support Vector Machine (SVM) based classifiers, and we compare its results with those of a synchronous FL algorithm in handling non-IID data distributed among clients. The proposed AFL algorithm addresses key limitations of traditional FL methods, such as inefficiency due to global synchronization and susceptibility to client drift. It enhances scalability, robustness, and efficiency in real-world settings with heterogeneous client populations and dynamic network conditions. Our results underscore the potential of AFL to drive advancements in distributed learning systems, particularly for large-scale, privacy-preserving applications in resource-constrained environments.
Finite-time Safety and Reach-avoid Verification of Stochastic Discrete-time Systems
This paper studies finite-time safety and reach-avoid verification for stochastic discrete-time dynamical systems. The aim is to ascertain lower and upper bounds of the probability that, within a predefined finite-time horizon, a system starting from an initial state in a safe set will either exit the safe set (safety verification) or reach a target set while remaining within the safe set until the first encounter with the target (reach-avoid verification). We introduce novel barrier-like sufficient conditions for characterizing these bounds, which either complement existing ones or fill gaps. Finally, we demonstrate the efficacy of these conditions on two examples.
comment: To appear in Information and Computation
Optimal state estimation: Turnpike analysis and performance results
In this paper, we introduce turnpike arguments in the context of optimal state estimation. In particular, we show that the optimal solution of the state estimation problem involving all available past data serves as a turnpike for the solutions of truncated problems involving only a subset of the data. We mathematically formalize this phenomenon and derive a sufficient condition that relies on a decaying sensitivity property of the underlying nonlinear program. As a second contribution, we show how a specific turnpike property can be used to establish performance guarantees when approximating the optimal solution of the full problem by a sequence of truncated problems, and we show that the resulting performance (both averaged and non-averaged) is approximately optimal with error terms that can be made arbitrarily small by an appropriate choice of the horizon length. In addition, we discuss interesting implications of these results for the practically relevant case of moving horizon estimation and illustrate our results with a numerical example.
comment: replaced with final version
Finite Sample Identification of Partially Observed Bilinear Dynamical Systems
We consider the problem of learning a realization of a partially observed bilinear dynamical system (BLDS) from noisy input-output data. Given a single trajectory of input-output samples, we provide a finite time analysis for learning the system's Markov-like parameters, from which a balanced realization of the bilinear system can be obtained. Our bilinear system identification algorithm learns the system's Markov-like parameters by regressing the outputs to highly correlated, nonlinear, and heavy-tailed covariates. Moreover, the stability of BLDS depends on the sequence of inputs used to excite the system. These properties, unique to partially observed bilinear dynamical systems, pose significant challenges to the analysis of our algorithm for learning the unknown dynamics. We address these challenges and provide high probability error bounds on our identification algorithm under a uniform stability assumption. Our analysis provides insights into system theoretic quantities that affect learning accuracy and sample complexity. Lastly, we perform numerical experiments with synthetic data to reinforce these insights.
Sub-optimality of the Separation Principle for Quadratic Control from Bilinear Observations
We consider the problem of controlling a linear dynamical system from bilinear observations with minimal quadratic cost. Despite the similarity of this problem to standard linear quadratic Gaussian (LQG) control, we show that when the observation model is bilinear, neither does the Separation Principle hold, nor is the optimal controller affine in the estimated state. Moreover, the cost-to-go is non-convex in the control input. Hence, finding an analytical expression for the optimal feedback controller is difficult in general. Under certain settings, we show that the standard LQG controller locally maximizes the cost instead of minimizing it. Furthermore, the optimal controllers (derived analytically) are not unique and are nonlinear in the estimated state. We also introduce a notion of input-dependent observability and derive conditions under which the Kalman filter covariance remains bounded. We illustrate our theoretical results through numerical experiments in multiple synthetic settings.
Iterated Invariant Extended Kalman Filter (IterIEKF)
We study the mathematical properties of the Invariant Extended Kalman Filter (IEKF) when iterating on the measurement update step, following the principles of the well-known Iterated Extended Kalman Filter. This iterative variant of the IEKF (IterIEKF) systematically improves its accuracy through Gauss-Newton-based relinearization, and exhibits additional theoretical properties, particularly in the low-noise regime, that resemble those of the linear Kalman filter. We apply the proposed approach to the problem of estimating the extended pose of a crane payload using an inertial measurement unit. Our results suggest that the IterIEKF significantly outperforms the IEKF when measurements are highly accurate.
Systems and Control (EESS)
Lyapunov-Aware Quantum-Inspired Reinforcement Learning for Continuous-Time Vehicle Control: A Feasibility Study
This paper presents a novel Lyapunov-Based Quantum Reinforcement Learning (LQRL) framework that integrates quantum policy optimization with Lyapunov stability analysis for continuous-time vehicle control. The proposed approach combines the representational power of variational quantum circuits (VQCs) with a stability-aware policy gradient mechanism to ensure asymptotic convergence and safe decision-making under dynamic environments. The vehicle longitudinal control problem was formulated as a continuous-state reinforcement learning task, where the quantum policy network generates control actions subject to Lyapunov stability constraints. Simulation experiments were conducted in a closed-loop adaptive cruise control scenario using a quantum-inspired policy trained under stability feedback. The results demonstrate that the LQRL framework successfully embeds Lyapunov stability verification into quantum policy learning, enabling interpretable and stability-aware control performance. Although transient overshoot and Lyapunov divergence were observed under aggressive acceleration, the system maintained bounded state evolution, validating the feasibility of integrating safety guarantees within quantum reinforcement learning architectures. The proposed framework provides a foundational step toward provably safe quantum control in autonomous systems and hybrid quantum-classical optimization domains.
comment: 7 pages, 4 figures, 20 equations, 3 appendices, 4 tables
MADR: MPC-guided Adversarial DeepReach
Hamilton-Jacobi (HJ) Reachability offers a framework for generating safe value functions and policies in the face of adversarial disturbance, but is limited by the curse of dimensionality. Physics-informed deep learning is able to overcome this infeasibility, but itself suffers from slow and inaccurate convergence, primarily due to weak PDE gradients and the complexity of self-supervised learning. A few recent works have demonstrated that enriching the self-supervision process with regular supervision (based on the nature of the optimal control problem) greatly accelerates convergence and improves solution quality. However, these have been limited to single-player problems and simple games. In this work, we introduce MADR: MPC-guided Adversarial DeepReach, a general framework to robustly approximate the two-player, zero-sum differential game value function. In doing so, MADR yields the corresponding optimal strategies for both players in zero-sum games as well as safe policies for worst-case robustness. We test MADR on a multitude of high-dimensional simulated and real robotic agents with varying dynamics and games, finding that our approach significantly outperforms state-of-the-art baselines in simulation and produces impressive results in hardware.
comment: 8 pages, under review
$\ell_1$-Based Adaptive Identification under Quantized Observations with Applications
Quantized observations are ubiquitous in a wide range of applications across engineering and the social sciences, and algorithms based on the $\ell_1$-norm are well recognized for their robustness to outliers compared with their $\ell_2$-based counterparts. Nevertheless, adaptive identification methods that integrate quantized observations with $\ell_1$-optimization remain largely underexplored. Motivated by this gap, we develop a novel $\ell_1$-based adaptive identification algorithm specifically designed for quantized observations. Without relying on the traditional persistent excitation condition, we establish global convergence of the parameter estimates to their true values and show that the average regret asymptotically vanishes as the data size increases. Finally, we apply our new identification algorithm to a judicial sentencing problem using real-world data, which demonstrates its superior performance and practical significance.
A Note on Optimal Distributed State Estimation for Linear Time-Varying Systems
In this technical note, we prove that the ODEFTC algorithm constitutes the first optimal distributed state estimator for continuous-time linear time-varying systems subject to stochastic disturbances. Particularly, we formally show that it is able to asymptotically recover the performance, in terms of error covariance of the estimates at each node, of the centralized Kalman-Bucy filter, which is known to be the optimal filter for the considered class of systems. Moreover, we provide a simple sufficient value for the consensus gain to guarantee the stability of the distributed estimator.
comment: This work has been submitted to the IEEE for possible publication
Quantifying Security for Networked Control Systems: A Review
Networked Control Systems (NCSs) are integral in critical infrastructures such as power grids, transportation networks, and production systems. Ensuring the resilient operation of these large-scale NCSs against cyber-attacks is crucial for societal well-being. Over the past two decades, extensive research has been focused on developing metrics to quantify the vulnerabilities of NCSs against attacks. Once the vulnerabilities are quantified, mitigation strategies can be employed to enhance system resilience. This article provides a comprehensive overview of methods developed for assessing NCS vulnerabilities and the corresponding mitigation strategies. Furthermore, we emphasize the importance of probabilistic risk metrics to model vulnerabilities under adversaries with imperfect process knowledge. The article concludes by outlining promising directions for future research.
comment: Journal submission
PIRA: Pan-CDN Intra-video Resource Adaptation for Short Video Streaming
In large-scale short video platforms, CDN resource selection plays a critical role in maintaining Quality of Experience (QoE) while controlling escalating traffic costs. To better understand this phenomenon, we conduct in-the-wild network measurements during video playback in a production short video system. The results reveal that CDNs delivering higher average QoE often come at greater financial cost, yet their connection quality fluctuates even within a single video, underscoring a fundamental and dynamic trade-off between QoE and cost. However, the problem of sustaining high QoE under cost constraints remains insufficiently investigated in the context of CDN selection for short video streaming. To address this, we propose PIRA, a dynamic resource selection algorithm that optimizes QoE and cost in real time during video playback. PIRA formally integrates QoE and cost in a mathematical model and introduces an intra-video, control-theoretic CDN resource selection approach that balances QoE and cost under network dynamics. To reduce the computation overhead, PIRA employs state-space pruning and adaptive parameter adjustment to efficiently solve the high-dimensional optimization problem. In large-scale production experiments involving 450,000 users over two weeks, PIRA outperforms the production baseline, achieving a 2.1% reduction in start-up delay, 15.2% shorter rebuffering time, and 10% lower average unit traffic cost, demonstrating its effectiveness in balancing user experience and financial cost at scale.
Designing trajectories in the Earth-Moon system: a Levenberg-Marquardt approach
Trajectory design in cislunar space under a High-Fidelity Ephemeris Model (HFEM) is pursued through a nonlinear optimization perspective anchored on the transition of solutions from lower fidelity models, namely the Circular Restricted Three-Body Problem (CR3BP). The optimization problem is posed in the likeness of a multiple-shooting approach, aiming for segment-to-segment continuity while tracking proximity to the original CR3BP structures. The analysis of various formulations leads to the selection of an unconstrained least-squares problem for further investigation. The nonlinear optimization problem is convexified and the use of the Levenberg-Marquardt algorithm, as an alternative to the minimum-norm update equation found in most literature, is investigated for its control over the update step and inherent robustness. Additional techniques such as adaptive weighting are employed to further consolidate the behavior of the proposed algorithm in challenging scenarios. Numerical trials evaluate the adequacy of the methodology presented and compare it to the minimum-norm baseline over various application cases, including the generation of quasi-periodic trajectories and orbital transfers between them. The proposed approach is found to outperform the baseline in applications where the initial guess is poor and the ease of including proximity constraints provides benefits in control over the shape of the converged solution.
comment: Preprint submitted to Advances in Space Research
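As a rough illustration of the damped update that the abstract contrasts with the minimum-norm step, the following minimal sketch applies a Levenberg-Marquardt iteration to a one-dimensional least-squares problem. The function name `levenberg_marquardt_1d`, the toy residual, and the factor-of-ten damping schedule are assumptions made for illustration only; they are not taken from the paper, which works with multiple-shooting residuals under an ephemeris model.

```python
def levenberg_marquardt_1d(residual, jacobian, x0, lam=1.0, iters=50):
    """Minimize residual(x)**2 with a damped Gauss-Newton (LM) update.

    Hypothetical helper for illustration; the damping schedule
    (divide or multiply lam by 10) is an assumed, common heuristic.
    """
    x = x0
    cost = residual(x) ** 2
    for _ in range(iters):
        r, J = residual(x), jacobian(x)
        # Scalar form of (J^T J + lam * I) * step = -J^T r
        step = -J * r / (J * J + lam)
        x_new = x + step
        cost_new = residual(x_new) ** 2
        if cost_new < cost:
            # Accept the step and trust the local model more.
            x, cost = x_new, cost_new
            lam /= 10.0
        else:
            # Reject the step and increase damping (shorter step next time).
            lam *= 10.0
    return x


# Toy problem: drive r(x) = x^2 - 2 to zero, starting from a poor guess.
root = levenberg_marquardt_1d(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=3.0)
print(root)
```

The damping term `lam` is what gives control over the update step: a large value shortens the step toward a gradient-descent direction, while a small value recovers the Gauss-Newton step.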
Sliding-Mode Control Strategies for PMSM speed control: A Comprehensive Review, Taxonomy and Research Gaps
Permanent Magnet Synchronous Motors (PMSMs) are widely employed in high-performance drive systems due to their high efficiency, power density, and precise dynamic behavior. However, nonlinearities, load disturbances, and parameter uncertainties present persistent challenges to control. Sliding-Mode Control (SMC) remains one of the most reliable strategies for high-performance PMSM drives. Yet, the rapid proliferation of adaptive, fractional-order, and intelligent variants has fragmented recent literature. This paper presents a comprehensive review and taxonomy of SMC-based PMSM speed-control methods published between 2020 and 2025. More than 200 studies are systematically analyzed and classified according to control order, surface design, disturbance-observer integration, optimization approach, and intelligent augmentation. Trends in publication activity, dominant hybrid structures, and application domains are quantitatively summarized. The review reveals a clear evolution from conventional discontinuous SMC toward adaptive, higher-order, and data-driven frameworks that mitigate chattering while preserving robustness. Persistent research gaps are identified in hardware validation, energy-efficiency assessment, and real-time tuning strategies. The taxonomy and critical synthesis provided herein establish a coherent reference for researchers and form the conceptual foundation for the companion paper (Part II), which delivers a unified benchmark and comparative simulation study of representative SMC designs.
comment: 30 Pages, 7 Figures, 3 Tables
MPC-based motion planning for non-holonomic systems in non-convex domains
Motivated by the application of using model predictive control (MPC) for motion planning of autonomous mobile robots, a form of output tracking MPC for non-holonomic systems and with non-convex constraints is studied. Although the advantages of using MPC for motion planning have been demonstrated in several papers, in most of the available fundamental literature on output tracking MPC it is assumed, often implicitly, that the model is holonomic and generally the state or output constraints must be convex. Thus, in application-oriented publications, empirical results dominate and the topic of proving completeness, in particular under which assumptions the target is always reached, has received comparatively little attention. To address this gap, we present a novel MPC formulation that guarantees convergence to the desired target under realistic assumptions, which can be verified in relevant real-world scenarios.
comment: Preprint of ECC 2025 submission
MMRHP: A Miniature Mixed-Reality HIL Platform for Auditable Closed-Loop Evaluation
Validation of autonomous driving systems requires a trade-off between test fidelity, cost, and scalability. While miniaturized hardware-in-the-loop (HIL) platforms have emerged as a promising solution, a systematic framework supporting rigorous quantitative analysis is generally lacking, limiting their value as scientific evaluation tools. To address this challenge, we propose MMRHP, a miniature mixed-reality HIL platform that elevates miniaturized testing from functional demonstration to rigorous, reproducible quantitative analysis. The core contributions are threefold. First, we propose a systematic three-phase testing process oriented toward the Safety of the Intended Functionality (SOTIF) standard, providing actionable guidance for identifying the performance limits and triggering conditions of otherwise correctly functioning systems. Second, we design and implement a HIL platform centered around a unified spatiotemporal measurement core to support this process, ensuring consistent and traceable quantification of physical motion and system timing. Finally, we demonstrate the effectiveness of this solution through comprehensive experiments. The platform itself was first validated, achieving a spatial accuracy of 10.27 mm RMSE and a stable closed-loop latency baseline of approximately 45 ms. Subsequently, an in-depth Autoware case study leveraged this validated platform to quantify its performance baseline and identify a critical performance cliff at an injected latency of 40 ms. This work shows that a structured process, combined with a platform offering a unified spatiotemporal benchmark, enables reproducible, interpretable, and quantitative closed-loop evaluation of autonomous driving systems.
Coverage-Recon: Coordinated Multi-Drone Image Sampling with Online Map Feedback
This article addresses collaborative 3D map reconstruction using multiple drones. Achieving high-quality reconstruction requires capturing images of keypoints within the target scene from diverse viewing angles, and coverage control offers an effective framework to meet this requirement. Meanwhile, recent advances in real-time 3D reconstruction algorithms make it possible to render an evolving map during flight, enabling immediate feedback to guide drone motion. Building on this, we present Coverage-Recon, a novel coordinated image sampling algorithm that integrates online map feedback to improve reconstruction quality on-the-fly. In Coverage-Recon, the coordinated motion of drones is governed by a Quadratic Programming (QP)-based angle-aware coverage controller, which ensures multi-viewpoint image capture while enforcing safety constraints. The captured images are processed in real time by the NeuralRecon algorithm to generate an evolving 3D mesh. Mesh changes across the scene are interpreted as indicators of reconstruction uncertainty and serve as feedback to update the importance index of the coverage control as the map evolves. The effectiveness of Coverage-Recon is validated through simulation and experiments, demonstrating both qualitatively and quantitatively that incorporating online map feedback yields more complete and accurate 3D reconstructions than conventional methods. Project page: https://htnk-lab.github.io/coverage-recon/
comment: Submitted to IEEE Transactions on Control Systems Technology (under review). Project page: https://htnk-lab.github.io/coverage-recon/
Explicit Reformulation of Discrete Distributionally Robust Optimization Problems
Distributionally robust optimization (DRO) is an effective framework for controlling real-world systems with various uncertainties, typically modeled using distributional uncertainty balls. However, DRO problems often involve infinitely many inequality constraints, rendering exact solutions computationally expensive. In this study, we propose a discrete DRO (DDRO) method that significantly simplifies the problem by reducing it to a single trivial constraint. Specifically, the proposed method utilizes two types of distributional uncertainty balls to reformulate the DDRO problem into a single-layer smooth convex program, significantly improving tractability. Furthermore, we provide practical guidance for selecting the appropriate ball sizes. The original DDRO problem is further reformulated into two optimization problems: one minimizing the mean and standard deviation, and the other minimizing the conditional value at risk (CVaR). These formulations account for the choice of ball sizes, thereby enhancing the practical applicability of the method. The proposed method was applied to a distributionally robust patrol-agent design problem, identifying a Pareto front in which the mean and standard deviation of the mean hitting time varied by up to 3% and 14%, respectively, while achieving a CVaR reduction of up to 13%.
comment: 15 pages, 4 figures, This paper is submitted to a journal for possible publication
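For readers unfamiliar with the risk measure named in the abstract, the sketch below computes the conditional value at risk of an equiprobable discrete loss sample using the standard Rockafellar-Uryasev representation. The function name `cvar` and the sample values are hypothetical, and the code does not reproduce the paper's DDRO reformulation or its uncertainty balls.

```python
def cvar(losses, alpha):
    """CVaR at level alpha for equiprobable discrete losses.

    Uses CVaR = min_t { t + E[max(L - t, 0)] / (1 - alpha) }.
    The objective is piecewise linear and convex in t, so the minimum
    is attained at one of the sample values.
    """
    n = len(losses)

    def objective(t):
        excess = sum(max(loss - t, 0.0) for loss in losses)
        return t + excess / (n * (1.0 - alpha))

    return min(objective(t) for t in losses)


# Illustrative sample: the worst half of [1, 2, 3, 4] averages to 3.5.
sample = [1.0, 2.0, 3.0, 4.0]
print(cvar(sample, 0.5))
```

With `alpha = 0.5` the result equals the mean of the two largest losses, which matches the usual reading of CVaR as the expected loss in the worst `1 - alpha` fraction of outcomes.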
Distributed Allocation and Resource Scheduling Algorithms Resilient to Link Failure
Distributed resource allocation (DRA) is fundamental to modern networked systems, spanning applications from economic dispatch in smart grids to CPU scheduling in data centers. Conventional DRA approaches require reliable communication, yet real-world networks frequently suffer from link failures, packet drops, and communication delays due to environmental conditions, network congestion, and security threats. We introduce a novel resilient DRA algorithm that addresses these critical challenges, and our main contributions are as follows: (1) guaranteed constraint feasibility at all times, ensuring resource-demand balance even during algorithm termination or network disruption; (2) robust convergence despite sector-bound nonlinearities at nodes/links, accommodating practical constraints like quantization and saturation; and (3) optimal performance under merely uniformly-connected networks, eliminating the need for continuous connectivity. Unlike existing approaches that require persistent network connectivity and provide only asymptotic feasibility, our graph-theoretic solution leverages network percolation theory to maintain performance during intermittent disconnections. This makes it particularly valuable for mobile multi-agent systems where nodes frequently move out of communication range. Theoretical analysis and simulations demonstrate that our algorithm converges to optimal solutions despite heterogeneous time delays and substantial link failures, significantly advancing the reliability of distributed resource allocation in practical network environments.
comment: European Journal of Control
Brute-force search and Warshall algorithms for matrix-weighted graphs
Although research on the control of networked systems has grown considerably, graph-theoretic and algorithmic studies on matrix-weighted graphs remain limited. To bridge this gap in the literature, this work introduces two algorithms, the brute-force search and the Warshall algorithm, for determining connectedness and clustering in undirected matrix-weighted graphs. The proposed algorithms, which are derived from a sufficient condition for connectedness, emphasize a key distinction between matrix-weighted and scalar-weighted graphs. While the existence of a path between two vertices guarantees connectedness in scalar-weighted graphs, connectedness in matrix-weighted graphs is a collective contribution of all paths joining the two vertices. Proofs of correctness and numerical examples are provided to illustrate and demonstrate the effectiveness of the algorithms.
comment: 20 pages, 6 figures, preprint
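As a point of reference for the scalar-weighted case mentioned in the abstract, where a single path suffices for connectedness, here is a minimal sketch of the classical Warshall transitive-closure algorithm and the clusters it induces. This is the textbook scalar version only; the matrix-weighted variant proposed in the paper is not reproduced here, and the helper names `warshall_closure` and `clusters` are hypothetical.

```python
def warshall_closure(adj):
    """Reachability matrix of a graph from a 0/1 adjacency matrix."""
    n = len(adj)
    # Every vertex reaches itself; edges give one-step reachability.
    reach = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        # i reaches j through intermediate vertex k.
                        reach[i][j] = True
    return reach


def clusters(adj):
    """Group vertices of an undirected graph into connected clusters."""
    reach = warshall_closure(adj)
    seen, groups = set(), []
    for i in range(len(adj)):
        if i not in seen:
            group = [j for j in range(len(adj)) if reach[i][j] and reach[j][i]]
            seen.update(group)
            groups.append(group)
    return groups


# Two disjoint edges: vertices {0, 1} and {2, 3} form separate clusters.
adj = [[0, 1, 0, 0],
       [1, 0, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 1, 0]]
print(clusters(adj))
```

In the matrix-weighted setting described above, this path-existence test is no longer sufficient, since connectedness depends on the combined contribution of all paths between two vertices.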
Urban Air Mobility: A Review of Recent Advances in Communication, Management, and Sustainability
Urban Air Mobility (UAM) offers a transformative approach to addressing urban congestion, improving accessibility, and advancing environmental sustainability. Rapid progress has emerged in three tightly linked domains since 2020: (1) Communication, where dynamic spectrum allocation and low-altitude channel characterization support reliable air-ground data exchange; (2) UAM management, with novel air-traffic control concepts for dense, largely autonomous urban airspace; and (3) Sustainability, driven by energy-efficient propulsion, integrated charging infrastructure, and holistic environmental assessment. This paper reviews and synthesizes the latest research across these areas, compares the state-of-the-art solutions, and outlines the technological and infrastructural milestones that are critical to realizing a scalable, sustainable UAM ecosystem.
comment: This work has been accepted by the 2025 International Conference on Cyber-physical Social Intelligence (CPSI 2025)
Harmonic Cancellation in Multi-Electrolyzer P2H Plants via Phasor-Modulated Production Scheduling
Thyristor rectifiers (TRs) are cost-effective power supplies for hydrogen electrolyzers (ELZs) but introduce harmonic distortion that may violate grid codes. This letter proposes a self-governing harmonic mitigation strategy through coordinated operation of multiple ELZs in large power-to-hydrogen (P2H) plants. First, the harmonic model of TR-powered ELZs is derived, revealing a natural harmonic cancellation mechanism among them. Based on this, a system-level operation scheme based on phasor modulation is developed and integrated into plant scheduling. Case studies demonstrate that the proposed method reduces harmonic currents by 21.2%-39.7% and ensures grid-code compliance, with only a 0.25% loss in hydrogen output, while increasing total revenue by over 21% compared to production-oriented strategies.
comment: This work has been submitted to the IEEE for possible publication
Wisdom of Crowds Effects under Antagonistic Interactions and Correlated Opinions
This paper investigates the wisdom of crowds of linear opinion dynamics models evolving on signed networks. Conditions are given under which models such as the DeGroot, Friedkin-Johnsen (FJ) and concatenated FJ models improve or undermine collective wisdom. The extension to dependent initial opinions is also presented, highlighting how the correlation structure influences the feasibility and geometry of the wisdom-improving regions.
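To make the simplest of the models named above concrete, the following sketch iterates the DeGroot update x(k+1) = W x(k) on a small network. It assumes a row-stochastic weight matrix with nonnegative entries, which is the classical setting rather than the signed networks studied in the paper; the matrix `W`, the initial opinions, and the function name `degroot` are illustrative assumptions.

```python
def degroot(W, x0, steps=200):
    """Iterate the DeGroot opinion update x <- W x for a fixed number of steps."""
    x = list(x0)
    for _ in range(steps):
        # Each agent replaces its opinion with a weighted average of all opinions.
        x = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    return x


# Assumed two-agent example with a row-stochastic, nonnegative weight matrix.
W = [[0.5, 0.5],
     [0.25, 0.75]]
opinions = degroot(W, [1.0, 0.0])
print(opinions)
```

For this matrix both opinions converge to the same value, a weighted average of the initial opinions determined by the left eigenvector of `W`; how far that limit sits from the truth is what wisdom-of-crowds analyses quantify.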
A Configurable Simulation Framework for Safety Assessment of Vulnerable Road Users
Ensuring the safety of vulnerable road users (VRUs), including pedestrians, cyclists, electric scooter riders, and motorcyclists, remains a major challenge for advanced driver assistance systems (ADAS) and connected and automated vehicles (CAV) technologies. Real-world VRU tests are expensive and sometimes cannot capture or repeat rare and hazardous events. In this paper, we present a lightweight, configurable simulation framework that follows European New Car Assessment Program (Euro NCAP) VRU testing protocols. A rule-based finite-state machine (FSM) is developed as a motion planner to provide vehicle automation during the VRU interaction. We also integrate ego-vehicle perception and idealized Vehicle-to-Everything (V2X) awareness to demonstrate safety margins in different scenarios. This work provides an extensible platform for rapid and repeatable VRU safety validation, paving the way for broader case-study deployment in diverse, user-defined settings, which will be essential for building a more VRU-friendly and sustainable intelligent transportation system.
comment: This work has been accepted by the 2025 International Conference on Cyber-physical Social Intelligence (CPSI 2025)
Time Domain Differential Equation Based Fault Location Identification in Mixed Overhead-Underground Power Distribution Systems
This paper proposes a time-domain fault location identification method for mixed overhead-underground power distribution systems that can handle challenging fault scenarios such as sub-cycle faults, arcing faults and evolving faults. The proposed method is formulated based on differential equations of the system and accounts for the peculiarities of power distribution systems with distributed generations. It considers the presence of loads, multi-phase laterals and sub-laterals, heterogeneous overhead and underground lines, and infeeds and remote-end fault current contributions of distributed generations. It utilizes data collected by a limited number of measuring devices installed in modern power distribution systems to systematically eliminate possible multiple fault location estimations and provide a single correct estimation of the actual location of the fault. The performance of the proposed method is demonstrated using the IEEE 34-node test system.
A Learning-based Model Reference Adaptive Controller Implemented on a Prosthetic Hand Wrist
The functionality and natural motion of prosthetic hands remain limited by the challenges in controlling compliant wrist mechanisms. Current control strategies often lack adaptability and incur high computational costs, which impedes real-time deployment in assistive robotics. To address this gap, this study presents a computationally efficient Neural Network (NN)-based Model Reference Adaptive Controller (MRAC) for a tendon-driven soft continuum wrist integrated with a prosthetic hand. The dynamic modeling of the wrist is formulated using Timoshenko beam theory, capturing both shear and bending deformations. The proposed NN-MRAC estimates the required tendon forces from deflection errors and minimizes deviation from a reference model through online adaptation. Simulation results demonstrate improved precision with a root mean square error (RMSE) of $6.14 \times 10^{-4}$ m and a settling time of $3.2$ s. Experimental validations confirm real-time applicability, with an average RMSE of $5.66 \times 10^{-3}$ m, steady-state error of $8.05 \times 10^{-3}$ m, and settling time of $1.58$ s. These results highlight the potential of the controller to enhance motion accuracy and responsiveness in soft prosthetic systems, thereby advancing the integration of adaptive intelligent control in wearable assistive devices.
comment: International Conference on Social Robotics + AI
Graph Analysis to Fully Automate Fault Location Identification in Power Distribution Systems
This paper proposes graph analysis methods to fully automate the fault location identification task in power distribution systems. The proposed methods take basic unordered data from power distribution systems as input, including branch parameters, load values, and the location of measuring devices. The proposed data preparation and analysis methods automatically identify the system's topology and extract essential information, such as faulted paths, structures, loading of laterals and sublaterals, and estimate the fault location accordingly. The proposed graph analysis methods do not require complex node and branch numbering processes or renumbering following changes in the system topology. The proposed methods eliminate the need for human intervention at any step of the fault location identification process. They are scalable and applicable to systems of any size. The performance of the proposed algorithm is demonstrated using the IEEE 34-bus distribution test system.
Convex Maneuver Planning for Spacecraft Collision Avoidance
Conjunction analysis and maneuver planning for spacecraft collision avoidance remains a manual and time-consuming process, typically involving repeated forward simulations of hand-designed maneuvers. With the growing density of satellites in low-Earth orbit (LEO), autonomy is becoming essential for efficiently evaluating and mitigating collisions. In this work, we present an algorithm to design low-thrust collision-avoidance maneuvers for short-term conjunction events. We first formulate the problem as a nonconvex quadratically-constrained quadratic program (QCQP), which we then relax into a convex semidefinite program (SDP) using Shor's relaxation. We demonstrate empirically that the relaxation is tight, which enables the recovery of globally optimal solutions to the original nonconvex problem. Our formulation produces a minimum-energy solution while ensuring a desired probability of collision at the time of closest approach. Finally, if the desired probability of collision cannot be satisfied, we relax this constraint into a penalty, yielding a minimum-risk solution. We validate our algorithm with a high-fidelity simulation of a satellite conjunction in low-Earth orbit with a simulated conjunction data message (CDM), demonstrating its effectiveness in reducing collision risk.
comment: 8 pages, 6 figures, Accepted to International Space Robotics Conference
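For readers unfamiliar with the relaxation named in the abstract above, Shor's relaxation of a generic QCQP can be sketched as follows. The symbols $P_i$, $q_i$, $r_i$ and the constraint count $m$ are placeholders assumed here for illustration; they are not the paper's conjunction-specific matrices.

$$
\begin{aligned}
\text{QCQP:}\quad & \min_{x}\; x^\top P_0 x + 2 q_0^\top x + r_0 \quad \text{s.t.}\quad x^\top P_i x + 2 q_i^\top x + r_i \le 0,\quad i = 1,\dots,m \\
\text{SDP:}\quad & \min_{x,\,X}\; \operatorname{tr}(P_0 X) + 2 q_0^\top x + r_0 \quad \text{s.t.}\quad \operatorname{tr}(P_i X) + 2 q_i^\top x + r_i \le 0,\quad \begin{bmatrix} X & x \\ x^\top & 1 \end{bmatrix} \succeq 0
\end{aligned}
$$

The lifted variable $X$ stands in for $x x^\top$. The relaxation is tight when the optimal solution satisfies $X = x x^\top$ (a rank-one lifted matrix), in which case $x$ is also globally optimal for the original nonconvex problem.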
Motion Planning and Control of an Overactuated 4-Wheel Drive with Constrained Independent Steering
This paper addresses motion planning and control of an overactuated 4-wheel drive train with independent steering (4WIS) where mechanical constraints prevent the wheels from executing full 360-degree rotations (swerve). The configuration space of such a robot is constrained and contains discontinuities that affect the smoothness of the robot motion. We introduce a mathematical formulation of the steering constraints and derive discontinuity planes that partition the velocity space into regions of smooth and efficient motion. We further design the motion planner for path tracking and obstacle avoidance that explicitly accounts for swerve constraints and the velocity transition smoothness. The motion controller uses local feedback to generate actuation from the desired velocity, while properly handling the discontinuity crossing by temporarily stopping the motion and repositioning the wheels. We implement the proposed motion planner as an extension to the ROS Navigation package and evaluate the system in simulation and on a physical robot.
comment: 7 pages, 5 figures, 3 tables, video available at https://youtu.be/8l9s2Wb_vec, To appear at IEEE 2025 International Conference on Advanced Robotics
Extreme value distributions of peak loads for non-residential customer segments SC
Electrical grid congestion is a growing challenge in Europe, driving the need for accurate prediction of load, particularly of peak load. Non-time-resolved models of peak load offer the advantages of simplicity and compactness, and among them, Velander's formula (VF) is a traditional method that has been used for decades. Moreover, VF can be adapted into a quantile VF, which learns a truncated cumulative distribution function of peak load based on electricity consumption. This paper proposes a mathematical model based on extreme value theory to characterize the probability distribution of peak load for large non-residential customers. The model underpins the quantile VF as demonstrated through multiple quantile regression and reduces its representation to just four parameters without sacrificing predictive performance. Moreover, using maximum likelihood estimation and the likelihood ratio test, we validate that the probability distribution of peak load of analysed groups belongs to the heavy-tailed Fréchet class.
comment: Submitted to Power Systems Computation Conference (PSCC) 2026
Extending Resource Constrained Project Scheduling to Mega-Projects with Model-Based Systems Engineering & Hetero-functional Graph Theory
Within the project management context, project scheduling serves as an indispensable component, functioning as a fundamental tool for planning, monitoring, controlling, and managing projects more broadly. Although the resource-constrained project scheduling problem (RCPSP) lies at the core of project management activities, it remains largely disconnected from the broader literature on model-based systems engineering (MBSE), thereby limiting its integration into the design and management of complex systems. The original contribution of this paper is twofold. First, the paper seeks to reconcile the RCPSP with the broader literature and vocabulary of model-based systems engineering and hetero-functional graph theory (HFGT). A concrete translation pipeline from an activity-on-node network to a SysML activity diagram, and then to an operand net is constructed. Using this representation, it specializes the hetero-functional network minimum-cost flow (HFNMCF) formulation to the RCPSP context as a systematic means of HFGT for quantitative analysis and proves that the RCPSP is recoverable as a special case of a broader model. Secondly, on an illustrative instance with renewable and non-renewable operands, the specialized HFNMCF, while producing similar schedules, yields explicit explanations of the project states that enable richer monitoring and control. Overall, the framework preserves the strengths of the classical RCPSP while accommodating real-world constraints and enterprise-level decision processes encountered in large, complex megaprojects.
Active Cooling Device: A Flexible, Lab-Scale Experimental Unit to Develop Spatio-Temporal Temperature Control Strategies
We present an experimental unit that realizes the ``multi-input, multi-output manifold'' thermal management technology proposed by Lamarre & Raymond (2023). The proposed setup can be used for experiments aimed at controlling spatiotemporal temperature distribution. Temperature control is achieved by impinging coolant fluid jets, leveraging a manifold of channels targeted to the surface. The direction of the fluid is controlled by shifting the role of channels between inputs, outputs, or closing them. Files associated with this work include Computer-Aided Design (CAD) STEP files, Gerber files to manufacture a Printed Circuit Board (PCB), and a Graphical User Interface (GUI) written in Python. We provide a step-by-step guide to assemble the experimental setup. We also provide instructions to interact with the setup through the GUI, which allows for real-time tracking of sample temperature and flow rates per flow control device. Additionally, we provide examples of usage of the setup, including system characterization with step response, Proportional-Integral-Derivative performance tracking, and disturbance rejection in a coupled system. Extending the application is accessible through the files provided in the open repository associated with this work. The active cooling device presents a safe, flexible, and complete design, allowing for lab-scale assessment of the performance of custom temperature control strategies using enclosed impinging jets.
Through-the-Earth Magnetic Induction Communication and Networking: A Comprehensive Survey
Magnetic induction (MI) communication (MIC) has emerged as a promising candidate for underground communication networks due to its excellent penetration capabilities. Integration with Space-Air-Ground-Underground (SAGUI) networks in next-generation mobile communication systems requires a well-defined network architecture. A recent discovery in MIC research, MI fast fading, remains in its early stages and presents unique challenges. This paper provides a comprehensive survey on through-the-earth (TTE) MIC, covering MI applications, channel modeling, point-to-point MIC design, relay techniques, network frameworks, and emerging technologies. We compare various MIC applications to highlight TTE-specific challenges and review the principles of channel modeling, addressing both MI slow fading and MI fast fading, along with its potential impact on existing MIC theories. We conduct a fine-grained decomposition of MI channel power gain into four distinct physical parameters, and propose a novel geometric model to analyze MI fast fading. We also summarize MI relay techniques, examine crosstalk effects in relay and high-density networks, and explore key research tasks within the OSI framework for a holistic MI network protocol in SAGUI. To bridge the gaps identified, we propose a MIC framework that supports TCP/IP and Linux, enabling full implementation of existing and emerging MIC solutions. This framework empowers researchers to leverage Linux resources and deep learning platforms for accelerated development of MIC in SAGUI networks. Remaining research challenges, open issues, and promising novel techniques are further identified to advance MIC research.
comment: This work has been accepted by the IEEE Communications Surveys & Tutorials (COMST) for publication. The final published version will be available on IEEE Xplore
PowerChain: A Verifiable Agentic AI System for Automating Distribution Grid Analyses
Rapid electrification and decarbonization are increasing the complexity of distribution grid (DG) operation and planning, necessitating advanced computational analyses to ensure reliability and resilience. These analyses depend on disparate workflows comprising complex models, function calls, and data pipelines that require substantial expert knowledge and remain difficult to automate. Workforce and budget constraints further limit utilities' ability to apply such analyses at scale. To address this gap, we build an agentic system, PowerChain, which is capable of autonomously performing complex grid analyses. Existing agentic AI systems are typically developed in a bottom-up manner with customized context for predefined analysis tasks; therefore, they do not generalize to tasks that the agent has never seen. In comparison, to generalize to unseen DG analysis tasks, PowerChain dynamically generates structured context by leveraging supervisory signals from self-contained power systems tools (e.g., GridLAB-D) and an optimized set of expert-annotated and verified reasoning trajectories. For complex DG tasks defined in natural language, empirical results on real utility data demonstrate that PowerChain achieves up to a 144% improvement in performance over baselines.
Rethink Repeatable Measures of Robot Performance with Statistical Query
For a general standardized testing algorithm designed to evaluate a specific aspect of a robot's performance, several key expectations are commonly imposed. Beyond accuracy (i.e., closeness to a typically unknown ground-truth reference) and efficiency (i.e., feasibility within acceptable testing costs and equipment constraints), one particularly important attribute is repeatability. Repeatability refers to the ability to consistently obtain the same testing outcome when similar testing algorithms are executed on the same subject robot by different stakeholders, across different times or locations. However, achieving repeatable testing has become increasingly challenging as the components involved grow more complex, intelligent, diverse, and, most importantly, stochastic. While related efforts have addressed repeatability at ethical, hardware, and procedural levels, this study focuses specifically on repeatable testing at the algorithmic level. Specifically, we target the well-adopted class of testing algorithms in standardized evaluation: statistical query (SQ) algorithms (i.e., algorithms that estimate the expected value of a bounded function over a distribution using sampled data). We propose a lightweight, parameterized, and adaptive modification applicable to any SQ routine, whether based on Monte Carlo sampling, importance sampling, or adaptive importance sampling, that makes it provably repeatable, with guaranteed bounds on both accuracy and efficiency. We demonstrate the effectiveness of the proposed approach across three representative scenarios: (i) established and widely adopted standardized testing of manipulators, (ii) emerging intelligent testing algorithms for operational risk assessment in automated vehicles, and (iii) developing use cases involving command tracking performance evaluation of humanoid robots in locomotion tasks.
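To make the statistical query (SQ) setting in the abstract above concrete, here is a minimal sketch of a plain Monte Carlo SQ routine with a Hoeffding sample-size bound. It shows only the baseline estimator, not the paper's repeatability modification; the function names and the pass/fail query are hypothetical and chosen for illustration.

```python
import math
import random


def hoeffding_sample_size(eps, delta, lo=0.0, hi=1.0):
    """Number of i.i.d. samples so that |estimate - mean| <= eps
    holds with probability at least 1 - delta (Hoeffding's inequality)
    for a query bounded in [lo, hi]."""
    return math.ceil((hi - lo) ** 2 * math.log(2.0 / delta) / (2.0 * eps ** 2))


def sq_estimate(query, sampler, n):
    """Plain Monte Carlo statistical query: average a bounded
    function over n samples drawn from the test distribution."""
    return sum(query(sampler()) for _ in range(n)) / n


# Hypothetical test: a trial "fails" when a uniform draw is below 0.3,
# so the true failure rate is 0.3.
rng = random.Random(0)
n = hoeffding_sample_size(eps=0.05, delta=0.01)
estimate = sq_estimate(lambda x: 1.0 if x < 0.3 else 0.0, rng.random, n)
```

Two stakeholders running this routine with different seeds will generally report different estimates that both lie within the accuracy bound, which is the gap between accuracy and repeatability that the abstract describes.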
One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares
While large machine learning models have shown remarkable performance in various domains, their training typically requires iterating for many passes over the training data. However, due to computational and memory constraints and potential privacy concerns, storing and accessing all the data is impractical in many real-world scenarios where the data arrives in a stream. In this paper, we investigate the problem of one-pass learning, in which a model is trained on sequentially arriving data without retraining on previous datapoints. Motivated by the demonstrated effectiveness of overparameterized models and the phenomenon of benign overfitting, we propose Orthogonal Recursive Fitting (ORFit), an algorithm for one-pass learning which seeks to perfectly fit each new datapoint while minimally altering the predictions on previous datapoints. ORFit updates the parameters in a direction orthogonal to past gradients, similar to orthogonal gradient descent (OGD) in continual learning. We show that, interestingly, ORFit's update leads to an operation similar to the recursive least-squares (RLS) algorithm in adaptive filtering but with significantly improved memory and computational efficiency, i.e., linear, instead of quadratic, in the number of parameters. To further reduce memory usage, we leverage the structure of the streaming data via an incremental principal component analysis (IPCA). We show that using the principal components is minimax optimal, i.e., it minimizes the worst-case forgetting of previous predictions for unknown future updates. Further, we prove that, for overparameterized linear models, the parameter vector obtained by ORFit matches what the standard multi-pass stochastic gradient descent (SGD) would converge to. Finally, we extend our results to the nonlinear setting for highly overparameterized models, relevant for deep learning.
comment: Journal extension of v1: Y. Min, K. Ahn, N. Azizan, "One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares," IEEE Conference on Decision and Control, 2022
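The orthogonal-update idea in the one-pass learning abstract above can be illustrated for a linear model, where the gradient of the prediction with respect to the parameters is simply the input vector. The sketch below is an assumption-laden illustration, not the authors' ORFit implementation: it stores a full orthonormal basis of past inputs (no IPCA compression), and all names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthogonal_one_pass_fit(stream, dim, tol=1e-12):
    """One pass over (x, y) pairs for the linear model f(x) = w . x.
    Each update moves w orthogonally to all previous inputs, so earlier
    predictions are unchanged while the new point is fit exactly."""
    w = [0.0] * dim
    basis = []  # orthonormal basis of the span of past inputs
    for x, y in stream:
        # Project x onto the orthogonal complement of past inputs.
        v = list(x)
        for u in basis:
            c = dot(u, v)
            v = [vi - c * ui for vi, ui in zip(v, u)]
        norm_sq = dot(v, v)
        if norm_sq <= tol:
            continue  # x lies in the span of past inputs; no free direction
        # Step size chosen so that the new prediction equals y.
        step = (y - dot(x, w)) / dot(x, v)
        w = [wi + step * vi for wi, vi in zip(w, v)]
        norm = norm_sq ** 0.5
        basis.append([vi / norm for vi in v])
    return w


# Overparameterized toy problem: 3 datapoints, 4 parameters.
stream = [
    ([1.0, 0.0, 0.0, 0.0], 2.0),
    ([1.0, 1.0, 0.0, 0.0], 3.0),
    ([0.0, 1.0, 1.0, 1.0], -1.0),
]
w = orthogonal_one_pass_fit(stream, dim=4)
predictions = [dot(x, w) for x, _ in stream]
```

Starting from zero, the iterate stays in the span of the inputs, so in this linear toy case the result is the minimum-norm interpolating solution, consistent with the abstract's statement about what multi-pass SGD converges to for overparameterized linear models.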
Nondeterminism-Aware Optimistic Verification for Floating-Point Neural Networks
Neural networks increasingly run on hardware outside the user's control (cloud GPUs, inference marketplaces). Yet ML-as-a-Service reveals little about what actually ran or whether returned outputs faithfully reflect the intended inputs. Users lack recourse against service downgrades (model swaps, quantization, graph rewrites, or discrepancies like altered ad embeddings). Verifying outputs is hard because floating-point (FP) execution on heterogeneous accelerators is inherently nondeterministic. Existing approaches are either impractical for real FP neural networks or reintroduce vendor trust. We present NAO: a Nondeterministic tolerance Aware Optimistic verification protocol that accepts outputs within principled operator-level acceptance regions rather than requiring bitwise equality. NAO combines two error models: (i) sound per-operator IEEE-754 worst-case bounds and (ii) tight empirical percentile profiles calibrated across hardware. Discrepancies trigger a Merkle-anchored, threshold-guided dispute game that recursively partitions the computation graph until one operator remains, where adjudication reduces to a lightweight theoretical-bound check or a small honest-majority vote against empirical thresholds. Unchallenged results finalize after a challenge window, without requiring trusted hardware or deterministic kernels. We implement NAO as a PyTorch-compatible runtime and a contract layer currently deployed on the Ethereum Holesky testnet. The runtime instruments graphs, computes per-operator bounds, and runs unmodified vendor kernels in FP32 with negligible overhead (0.3% on Qwen3-8B). Across CNNs, Transformers and diffusion models on A100, H100, RTX6000, RTX4090, empirical thresholds are $10^2$ to $10^3$ times tighter than theoretical bounds, and bound-aware adversarial attacks achieve 0% success. NAO reconciles scalability with verifiability for real-world heterogeneous ML compute.
comment: 17 pages, 7 figures
Unifying Direct and Indirect Learning for Safe Control of Linear Systems
This paper develops learning-enabled safe controllers for linear systems subject to system uncertainties and bounded disturbances. Given the disturbance zonotope, the data-based closed-loop dynamics (CLDs) are first characterized using a matrix zonotope (MZ), and refined through several steps to yield a constrained matrix zonotope (CMZ). This refinement is achieved by introducing conformal equality constraints that eliminate incompatible disturbance realizations. More precisely, prior knowledge and observed data are used separately to construct CMZ representations of disturbance sequences that conform to both data and prior knowledge, and are intersected with the initial MZ of the disturbance sequence, producing a refined CMZ. This approach reduces conservatism. To further reduce the conservativeness, we unify open-loop learning with closed-loop learning by presenting a novel set-membership identification method that models open-loop dynamics as a CMZ. The prior knowledge serves as an initial feasible open-loop model set (FOLMS) of this CMZ, which is refined into a posterior set whenever new informative online data becomes available. This posterior FOLMS then adaptively replaces the prior knowledge set employed in the disturbance elimination of the closed-loop learning process. The resulting refined parameterized set of CLD is subsequently leveraged to directly and adaptively learn a controller that robustly enforces safety. Toward this goal, we formulate a linear programming problem that guarantees $\lambda$-contractiveness of a polyhedral safe set. A simulation example is provided to validate the effectiveness of the proposed approach and support the theoretical results.
comment: arXiv admin note: text overlap with arXiv:2502.04195
Dynamic object goal pushing with mobile manipulators through model-free constrained reinforcement learning ICRA 2025
Non-prehensile pushing to move and reorient objects to a goal is a versatile loco-manipulation skill. In the real world, the object's physical properties and friction with the floor contain significant uncertainties, which makes the task challenging for a mobile manipulator. In this paper, we develop a learning-based controller for a mobile manipulator to move an unknown object to a desired position and yaw orientation through a sequence of pushing actions. The proposed controller for the robotic arm and the mobile base motion is trained using a constrained Reinforcement Learning (RL) formulation. We demonstrate its capability in experiments with a quadrupedal robot equipped with an arm. The learned policy achieves a success rate of 91.35% in simulation and at least 80% on hardware in challenging scenarios. Through our extensive hardware experiments, we show that the approach demonstrates high robustness against unknown objects of different masses, materials, sizes, and shapes. It reactively discovers the pushing location and direction, thus achieving contact-rich behavior while observing only the pose of the object. Additionally, we demonstrate the adaptive behavior of the learned policy towards preventing the object from toppling.
comment: presented at ICRA 2025, Video: https://youtu.be/wGAdPGVf9Ws?si=pi83ONWofHHqbFG0
Neural 3D Object Reconstruction with Small-Scale Unmanned Aerial Vehicles
Small Unmanned Aerial Vehicles (UAVs) exhibit immense potential for navigating indoor and hard-to-reach areas, yet their significant constraints in payload and autonomy have largely prevented their use for complex tasks like high-quality 3-Dimensional (3D) reconstruction. To overcome this challenge, we introduce a novel system architecture that enables fully autonomous, high-fidelity 3D scanning of static objects using UAVs weighing under 100 grams. Our core innovation lies in a dual-reconstruction pipeline that creates a real-time feedback loop between data capture and flight control. A near-real-time (near-RT) process uses Structure from Motion (SfM) to generate an instantaneous pointcloud of the object. The system analyzes the model quality on the fly and dynamically adapts the UAV's trajectory to intelligently capture new images of poorly covered areas. This ensures comprehensive data acquisition. For the final, detailed output, a non-real-time (non-RT) pipeline employs a Neural Radiance Fields (NeRF)-based Neural 3D Reconstruction (N3DR) approach, fusing SfM-derived camera poses with precise Ultra Wide-Band (UWB) location data to achieve superior accuracy. We implemented and validated this architecture using Crazyflie 2.1 UAVs. Our experiments, conducted in both single- and multi-UAV configurations, conclusively show that dynamic trajectory adaptation consistently improves reconstruction quality over static flight paths. This work demonstrates a scalable and autonomous solution that unlocks the potential of miniaturized UAVs for fine-grained 3D reconstruction in constrained environments, a capability previously limited to much larger platforms.
comment: 13 pages, 16 figures, 3 tables, 45 references
A Flow-Based Model for Conditional and Probabilistic Electricity Consumption Profile Generation and Prediction
Residential Load Profile (RLP) generation and prediction are critical for the operation and planning of distribution networks, especially as diverse low-carbon technologies (e.g., photovoltaic and electric vehicles) are increasingly adopted. This paper introduces a novel flow-based generative model, termed Full Convolutional Profile Flow (FCPFlow), which is uniquely designed for both conditional and unconditional RLP generation, and for probabilistic load forecasting. By introducing two new layers--the invertible linear layer and the invertible normalization layer--the proposed FCPFlow architecture shows three main advantages compared to traditional statistical and contemporary deep generative models: 1) it is well-suited for RLP generation under continuous conditions, such as varying weather and annual electricity consumption, 2) it demonstrates superior scalability in different datasets compared to traditional statistical models, and 3) it also demonstrates better modeling capabilities in capturing the complex correlation of RLPs compared with deep generative models.
Asynchronous Federated Learning: A Scalable Approach for Decentralized Machine Learning
Federated Learning (FL) has emerged as a powerful paradigm for decentralized machine learning, enabling collaborative model training across diverse clients without sharing raw data. However, traditional FL approaches often face limitations in scalability and efficiency due to their reliance on synchronous client updates, which can result in significant delays and increased communication overhead, particularly in heterogeneous and dynamic environments. To address these challenges, we propose an Asynchronous Federated Learning (AFL) algorithm, which allows clients to update the global model independently and asynchronously. Our key contributions include a comprehensive convergence analysis of AFL in the presence of client delays and model staleness. By leveraging martingale difference sequence theory and variance bounds, we ensure robust convergence despite asynchronous updates. Assuming strongly convex local objective functions, we establish bounds on gradient variance under random client sampling and derive a recursion formula quantifying the impact of client delays on convergence. Furthermore, we demonstrate the practical applicability of the AFL algorithm by training decentralized linear regression and Support Vector Machine (SVM) based classifiers, and we compare its results with those of a synchronous FL algorithm in handling non-IID data distributed among clients. The proposed AFL algorithm addresses key limitations of traditional FL methods, such as inefficiency due to global synchronization and susceptibility to client drift. It enhances scalability, robustness, and efficiency in real-world settings with heterogeneous client populations and dynamic network conditions. Our results underscore the potential of AFL to drive advancements in distributed learning systems, particularly for large-scale, privacy-preserving applications in resource-constrained environments.
Finite-time Safety and Reach-avoid Verification of Stochastic Discrete-time Systems
This paper studies finite-time safety and reach-avoid verification for stochastic discrete-time dynamical systems. The aim is to ascertain lower and upper bounds of the probability that, within a predefined finite-time horizon, a system starting from an initial state in a safe set will either exit the safe set (safety verification) or reach a target set while remaining within the safe set until the first encounter with the target (reach-avoid verification). We introduce novel barrier-like sufficient conditions for characterizing these bounds, which either complement existing ones or fill gaps. Finally, we demonstrate the efficacy of these conditions on two examples.
comment: To appear in Information and Computation
Optimal state estimation: Turnpike analysis and performance results
In this paper, we introduce turnpike arguments in the context of optimal state estimation. In particular, we show that the optimal solution of the state estimation problem involving all available past data serves as a turnpike for the solutions of truncated problems involving only a subset of the data. We mathematically formalize this phenomenon and derive a sufficient condition that relies on a decaying sensitivity property of the underlying nonlinear program. As a second contribution, we show how a specific turnpike property can be used to establish performance guarantees when approximating the optimal solution of the full problem by a sequence of truncated problems, and we show that the resulting performance (both averaged and non-averaged) is approximately optimal with error terms that can be made arbitrarily small by an appropriate choice of the horizon length. In addition, we discuss interesting implications of these results for the practically relevant case of moving horizon estimation and illustrate our results with a numerical example.
comment: replaced with final version
Finite Sample Identification of Partially Observed Bilinear Dynamical Systems
We consider the problem of learning a realization of a partially observed bilinear dynamical system (BLDS) from noisy input-output data. Given a single trajectory of input-output samples, we provide a finite time analysis for learning the system's Markov-like parameters, from which a balanced realization of the bilinear system can be obtained. Our bilinear system identification algorithm learns the system's Markov-like parameters by regressing the outputs to highly correlated, nonlinear, and heavy-tailed covariates. Moreover, the stability of BLDS depends on the sequence of inputs used to excite the system. These properties, unique to partially observed bilinear dynamical systems, pose significant challenges to the analysis of our algorithm for learning the unknown dynamics. We address these challenges and provide high probability error bounds on our identification algorithm under a uniform stability assumption. Our analysis provides insights into system theoretic quantities that affect learning accuracy and sample complexity. Lastly, we perform numerical experiments with synthetic data to reinforce these insights.
Sub-optimality of the Separation Principle for Quadratic Control from Bilinear Observations
We consider the problem of controlling a linear dynamical system from bilinear observations with minimal quadratic cost. Despite the similarity of this problem to standard linear quadratic Gaussian (LQG) control, we show that when the observation model is bilinear, neither does the Separation Principle hold, nor is the optimal controller affine in the estimated state. Moreover, the cost-to-go is non-convex in the control input. Hence, finding an analytical expression for the optimal feedback controller is difficult in general. Under certain settings, we show that the standard LQG controller locally maximizes the cost instead of minimizing it. Furthermore, the optimal controllers (derived analytically) are not unique and are nonlinear in the estimated state. We also introduce a notion of input-dependent observability and derive conditions under which the Kalman filter covariance remains bounded. We illustrate our theoretical results through numerical experiments in multiple synthetic settings.
Iterated Invariant Extended Kalman Filter (IterIEKF)
We study the mathematical properties of the Invariant Extended Kalman Filter (IEKF) when iterating on the measurement update step, following the principles of the well-known Iterated Extended Kalman Filter. This iterative variant of the IEKF (IterIEKF) systematically improves its accuracy through Gauss-Newton-based relinearization, and exhibits additional theoretical properties, particularly in the low-noise regime, that resemble those of the linear Kalman filter. We apply the proposed approach to the problem of estimating the extended pose of a crane payload using an inertial measurement unit. Our results suggest that the IterIEKF significantly outperforms the IEKF when measurements are highly accurate.
Robotics
Robobench: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models as Embodied Brain
Building robots that can perceive, reason, and act in dynamic, unstructured environments remains a core challenge. Recent embodied systems often adopt a dual-system paradigm, where System 2 handles high-level reasoning while System 1 executes low-level control. In this work, we refer to System 2 as the embodied brain, emphasizing its role as the cognitive core for reasoning and decision-making in manipulation tasks. Given this role, systematic evaluation of the embodied brain is essential. Yet existing benchmarks emphasize execution success or, when targeting high-level reasoning, suffer from incomplete dimensions and limited task realism, offering only a partial picture of cognitive capability. To bridge this gap, we introduce RoboBench, a benchmark that systematically evaluates multimodal large language models (MLLMs) as embodied brains. Motivated by the critical roles across the full manipulation pipeline, RoboBench defines five dimensions (instruction comprehension, perception reasoning, generalized planning, affordance prediction, and failure analysis) spanning 14 capabilities, 25 tasks, and 6092 QA pairs. To ensure realism, we curate datasets across diverse embodiments, attribute-rich objects, and multi-view scenes, drawing from large-scale real robotic data. For planning, RoboBench introduces an evaluation framework, MLLM-as-world-simulator. It evaluates embodied feasibility by simulating whether predicted plans can achieve critical object-state changes. Experiments on 14 MLLMs reveal fundamental limitations: difficulties with implicit instruction comprehension, spatiotemporal reasoning, cross-scenario planning, fine-grained affordance understanding, and execution failure diagnosis. RoboBench provides a comprehensive scaffold to quantify high-level cognition and guide the development of next-generation embodied MLLMs. The project page is at https://robo-bench.github.io.
SoftMimic: Learning Compliant Whole-body Control from Examples
We introduce SoftMimic, a framework for learning compliant whole-body control policies for humanoid robots from example motions. Imitating human motions with reinforcement learning allows humanoids to quickly learn new skills, but existing methods incentivize stiff control that aggressively corrects deviations from a reference motion, leading to brittle and unsafe behavior when the robot encounters unexpected contacts. In contrast, SoftMimic enables robots to respond compliantly to external forces while maintaining balance and posture. Our approach leverages an inverse kinematics solver to generate an augmented dataset of feasible compliant motions, which we use to train a reinforcement learning policy. By rewarding the policy for matching compliant responses rather than rigidly tracking the reference motion, SoftMimic learns to absorb disturbances and generalize to varied tasks from a single motion clip. We validate our method through simulations and real-world experiments, demonstrating safe and effective interaction with the environment.
comment: Website: https://gmargo11.github.io/softmimic/
Botany-Bot: Digital Twin Monitoring of Occluded and Underleaf Plant Structures with Gaussian Splats IROS 2025
Commercial plant phenotyping systems using fixed cameras cannot perceive many plant details due to leaf occlusion. In this paper, we present Botany-Bot, a system for building detailed "annotated digital twins" of living plants using two stereo cameras, a digital turntable inside a lightbox, an industrial robot arm, and 3D segmented Gaussian Splat models. We also present robot algorithms for manipulating leaves to take high-resolution indexable images of occluded details such as stem buds and the underside/topside of leaves. Results from experiments suggest that Botany-Bot can segment leaves with 90.8% accuracy, detect leaves with 86.2% accuracy, lift/push leaves with 77.9% accuracy, and take detailed overside/underside images with 77.3% accuracy. Code, videos, and datasets are available at https://berkeleyautomation.github.io/Botany-Bot/.
comment: 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
RESample: A Robust Data Augmentation Framework via Exploratory Sampling for Robotic Manipulation ICRA2026
Vision-Language-Action models (VLAs) have demonstrated remarkable performance on complex robotic manipulation tasks through imitation learning. However, existing imitation learning datasets contain only successful trajectories and lack failure or recovery data, especially for out-of-distribution (OOD) states where the robot deviates from the main policy due to minor perturbations or errors, leading VLA models to struggle with states deviating from the training distribution. To this end, we propose an automated OOD data augmentation framework named RESample through exploratory sampling. Specifically, we first leverage offline reinforcement learning to obtain an action-value network that accurately identifies sub-optimal actions under the current manipulation policy. We further sample potential OOD states from trajectories via rollout, and design an exploratory sampling mechanism that adaptively incorporates these action proxies into the training dataset to ensure efficiency. Subsequently, our framework explicitly encourages the VLAs to recover from OOD states and enhances their robustness against distributional shifts. We conduct extensive experiments on the LIBERO benchmark as well as real-world robotic manipulation tasks, demonstrating that RESample consistently improves the stability and generalization ability of VLA models.
comment: 9 pages, 7 figures, submitted to ICRA2026
Learned Inertial Odometry for Cycling Based on Mixture of Experts Algorithm
With the rapid growth of bike sharing and the increasing diversity of cycling applications, accurate bicycle localization has become essential. Traditional GNSS-based methods suffer from multipath effects, while existing inertial navigation approaches rely on precise modeling and show limited robustness. Tight Learned Inertial Odometry (TLIO) achieves low position drift by combining raw IMU data with displacements predicted by neural networks, but its high computational cost restricts deployment on mobile devices. To overcome this, we extend TLIO to bicycle localization and introduce an improved Mixture-of-Experts (MoE) model that reduces both training and inference costs. Experiments show that, compared to the state-of-the-art LLIO framework, our method achieves comparable accuracy while reducing parameters by 64.7% and computational cost by 81.8%.
Intent-Driven LLM Ensemble Planning for Flexible Multi-Robot Disassembly: Demonstration on EV Batteries
This paper addresses the problem of planning complex manipulation tasks, in which multiple robots with different end-effectors and capabilities, informed by computer vision, must plan and execute concatenated sequences of actions on a variety of objects that can appear in arbitrary positions and configurations in unstructured scenes. We propose an intent-driven planning pipeline which can robustly construct such action sequences with varying degrees of supervisory input from a human using simple language instructions. The pipeline integrates: (i) perception-to-text scene encoding, (ii) an ensemble of large language models (LLMs) that generate candidate removal sequences based on the operator's intent, (iii) an LLM-based verifier that enforces formatting and precedence constraints, and (iv) a deterministic consistency filter that rejects hallucinated objects. The pipeline is evaluated on an example task in which two robot arms work collaboratively to dismantle an Electric Vehicle battery for recycling applications. A variety of components must be grasped and removed in specific sequences, determined by human instructions and/or by task-order feasibility decisions made by the autonomous system. On 200 real scenes with 600 operator prompts across five component classes, we used metrics of full-sequence correctness and next-task correctness to evaluate and compare five LLM-based planners (including ablation analyses of pipeline components). We also evaluated the LLM-based human interface in terms of time to execution and NASA TLX with human participant experiments. Results indicate that our ensemble-with-verification approach reliably maps operator intent to safe, executable multi-robot plans while maintaining low user effort.
comment: This work is funded by the project called "Research and Development of a Highly Automated and Safe Streamlined Process for Increasing Lithium-ion Battery Repurposing and Recycling" (REBELION) under Grant 101104241, and partially supported by the Ministry of National Education, Republic of Turkey. Submitted to Frontiers for Review
An Empirical Study of Lagrangian Methods in Safe Reinforcement Learning
In safety-critical domains such as robotics, navigation and power systems, constrained optimization problems arise where maximizing performance must be carefully balanced with associated constraints. Safe reinforcement learning provides a framework to address these challenges, with Lagrangian methods being a popular choice. However, the effectiveness of Lagrangian methods crucially depends on the choice of the Lagrange multiplier $\lambda$, which governs the trade-off between return and constraint cost. A common approach is to update the multiplier automatically during training. Although this is standard in practice, there remains limited empirical evidence on the robustness of an automated update and its influence on overall performance. Therefore, we analyze (i) optimality and (ii) stability of Lagrange multipliers in safe reinforcement learning across a range of tasks. We provide $\lambda$-profiles that give a complete visualization of the trade-off between return and constraint cost of the optimization problem. These profiles show the highly sensitive nature of $\lambda$ and moreover confirm the lack of general intuition for choosing the optimal value $\lambda^*$. Our findings additionally show that automated multiplier updates are able to recover and sometimes even exceed the optimal performance found at $\lambda^*$ due to the vast difference in their learning trajectories. Furthermore, we show that automated multiplier updates exhibit oscillatory behavior during training, which can be mitigated through PID-controlled updates. However, this method requires careful tuning to achieve consistently better performance across tasks. This highlights the need for further research on stabilizing Lagrangian methods in safe reinforcement learning. The code used to reproduce our results can be found at https://github.com/lindsayspoor/Lagrangian_SafeRL.
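The PID-controlled multiplier update mentioned above can be sketched as follows. This is a minimal illustration in the general style of PID Lagrangian methods, not the code from the linked repository; the class name `PIDLagrangeMultiplier`, the gains, and the cost limit are assumptions made for the example.

```python
class PIDLagrangeMultiplier:
    """Illustrative PID update of a Lagrange multiplier (hypothetical names).

    The multiplier grows when the observed constraint cost exceeds
    cost_limit and is projected back to be non-negative.
    """

    def __init__(self, kp, ki, kd, cost_limit):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit
        self.integral = 0.0
        self.prev_cost = 0.0

    def update(self, episode_cost):
        error = episode_cost - self.cost_limit
        # The integral term alone corresponds to plain gradient ascent on lambda.
        self.integral = max(0.0, self.integral + error)
        # The derivative term reacts only to rising cost, which damps oscillation.
        derivative = max(0.0, episode_cost - self.prev_cost)
        self.prev_cost = episode_cost
        lam = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(0.0, lam)


# Integral-only controller: behaves like the standard automated update.
ctrl = PIDLagrangeMultiplier(kp=0.0, ki=0.1, kd=0.0, cost_limit=10.0)
lambdas = [ctrl.update(cost) for cost in (20.0, 5.0, 0.0)]
```

Setting `kp` and `kd` to zero recovers an integral-only update, so the proportional and derivative gains are the extra tuning knobs that the abstract notes need care.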
Distributed Spatial-Temporal Trajectory Optimization for Unmanned-Aerial-Vehicle Swarm
Swarm trajectory optimization problems are a well-recognized class of multi-agent optimal control problems with strong nonlinearity. However, the need to heuristically set the final time for agents beforehand and the large number of time-consuming iterations prohibit the application of existing methods to large-scale swarms of Unmanned Aerial Vehicles (UAVs) in practice. In this paper, we propose a spatial-temporal trajectory optimization framework that accomplishes multi-UAV consensus based on the Alternating Direction Method of Multipliers (ADMM) and uses Differential Dynamic Programming (DDP) for fast local planning of individual UAVs. The introduced framework is a two-level architecture that employs Parameterized DDP (PDDP) as the trajectory optimizer for each UAV, and ADMM to satisfy the local constraints and accomplish the spatial-temporal parameter consensus among all UAVs. This results in a fully distributed algorithm called Distributed Parameterized DDP (D-PDDP). In addition, an adaptive tuning criterion based on the spectral gradient method for the penalty parameter is proposed to reduce the number of algorithmic iterations. Several simulation examples are presented to verify the effectiveness of the proposed algorithm.
HumanMPC - Safe and Efficient MAV Navigation among Humans
Safe and efficient robotic navigation among humans is essential for integrating robots into everyday environments. Most existing approaches focus on simplified 2D crowd navigation and fail to account for the full complexity of human body dynamics beyond root motion. We present HumanMPC, a Model Predictive Control (MPC) framework for 3D Micro Air Vehicle (MAV) navigation among humans that combines theoretical safety guarantees with data-driven models for realistic human motion forecasting. Our approach introduces a novel twist to reachability-based safety formulation that constrains only the initial control input for safety while modeling its effects over the entire planning horizon, enabling safe yet efficient navigation. We validate HumanMPC both in simulated experiments using real human trajectories and in the real world, demonstrating its effectiveness across tasks ranging from goal-directed navigation to visual servoing for human tracking. While we apply our method to MAVs in this work, it is generic and can be adapted to other platforms. Our results show that the method ensures safety without excessive conservatism and outperforms baseline approaches in both efficiency and reliability.
Inverse Optimal Control of Muscle Force Sharing During Pathological Gait
Muscle force sharing is typically resolved by minimizing a specific objective function to approximate neural control strategies. An inverse optimal control approach was applied to identify the "best" objective function, among a positive linear combination of basis objective functions, associated with the gait of two post-stroke males, one high-functioning (subject S1) and one low-functioning (subject S2). It was found that the "best" objective function is subject- and leg-specific. No single function works universally well, yet the best options are usually differently weighted combinations of muscle activation- and power-minimization. Subject-specific inverse optimal control models performed best on their respective limbs (RMSE 178/213 N, CC 0.71/0.61 for non-paretic and paretic legs of S1; RMSE 205/165 N, CC 0.88/0.85 for respective legs of S2), but cross-subject generalization was poor, particularly for paretic legs. Moreover, minimizing the root mean square of muscle power emerged as important for paretic limbs, while minimizing activation-based functions dominated for non-paretic limbs. This may suggest different neural control strategies between affected and unaffected sides, possibly altered by the presence of spasticity. Among the 15 considered objective functions commonly used in inverse dynamics-based computations, the root mean square of muscle power was the only one explicitly incorporating muscle velocity, leading to a possible model for spasticity in the paretic limbs. Although this objective function has been rarely used, it may be relevant for modeling pathological gait, such as post-stroke gait.
A Generalization of Input-Output Linearization via Dynamic Switching Between Melds of Output Functions
This letter presents a systematic framework for switching between different sets of outputs for the control of nonlinear systems via feedback linearization. We introduce the concept of a meld to formally define a valid, feedback-linearizable subset of outputs that can be selected from a larger deck of possible outputs. The main contribution is a formal proof establishing that under suitable dwell-time and compatibility conditions, it is possible to switch between different melds while guaranteeing the uniform boundedness of the system state. We further show that the error dynamics of the active outputs remain exponentially stable within each switching interval and that outputs common to consecutive melds are tracked seamlessly through transitions. The proposed theory is valid for any feedback-linearizable nonlinear system, such as robots and aerial and terrestrial vehicles. We demonstrate it on a simple numerical simulation of a robotic manipulator.
From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors
Existing vision-language-action (VLA) models act in the 3D real world but are typically built on 2D encoders, leaving a spatial reasoning gap that limits generalization and adaptability. Recent 3D integration techniques for VLAs either require specialized sensors and transfer poorly across modalities, or inject weak cues that lack geometry and degrade vision-language alignment. In this work, we introduce FALCON (From Spatial to Action), a novel paradigm that injects rich 3D spatial tokens into the action head. FALCON leverages spatial foundation models to deliver strong geometric priors from RGB alone, and includes an Embodied Spatial Model that can optionally fuse depth or pose for higher fidelity when available, without retraining or architectural changes. To preserve language reasoning, spatial tokens are consumed by a Spatial-Enhanced Action Head rather than being concatenated into the vision-language backbone. These designs enable FALCON to address limitations in spatial representation, modality transferability, and alignment. In comprehensive evaluations across three simulation benchmarks and eleven real-world tasks, our proposed FALCON achieves state-of-the-art performance, consistently surpasses competitive baselines, and remains robust under clutter, spatial-prompt conditioning, and variations in object scale and height.
comment: Project page: https://falcon-vla.github.io/
Integrating Trustworthy Artificial Intelligence with Energy-Efficient Robotic Arms for Waste Sorting
This paper presents a novel methodology that integrates trustworthy artificial intelligence (AI) with an energy-efficient robotic arm for intelligent waste classification and sorting. By utilizing a convolutional neural network (CNN) enhanced through transfer learning with MobileNetV2, the system accurately classifies waste into six categories: plastic, glass, metal, paper, cardboard, and trash. The model achieved a high training accuracy of 99.8% and a validation accuracy of 80.5%, demonstrating strong learning and generalization. A robotic arm simulator is implemented to perform virtual sorting, calculating the energy cost for each action using Euclidean distance to ensure optimal and efficient movement. The framework incorporates key elements of trustworthy AI, such as transparency, robustness, fairness, and safety, making it a reliable and scalable solution for smart waste management systems in urban settings.
comment: 5 pages, 2 figures
Graph Attention-Guided Search for Dense Multi-Agent Pathfinding
Finding near-optimal solutions for dense multi-agent pathfinding (MAPF) problems in real-time remains challenging even for state-of-the-art planners. To this end, we develop a hybrid framework that integrates a learned heuristic derived from MAGAT, a neural MAPF policy with a graph attention scheme, into a leading search-based algorithm, LaCAM. While prior work has explored learning-guided search in MAPF, such methods have historically underperformed. In contrast, our approach, termed LaGAT, outperforms both purely search-based and purely learning-based methods in dense scenarios. This is achieved through an enhanced MAGAT architecture, a pre-train-then-fine-tune strategy on maps of interest, and a deadlock detection scheme to account for imperfect neural guidance. Our results demonstrate that, when carefully designed, hybrid search offers a powerful solution for tightly coupled, challenging multi-agent coordination problems.
Bridging Embodiment Gaps: Deploying Vision-Language-Action Models on Soft Robots NeurIPS 2025
Robotic systems are increasingly expected to operate in human-centered, unstructured environments where safety, adaptability, and generalization are essential. Vision-Language-Action (VLA) models have been proposed as a language-guided generalized control framework for real robots. However, their deployment has been limited to conventional serial-link manipulators. Given the rigidity of these manipulators and the unpredictability of learning-based control, the ability to interact safely with the environment is critical yet missing. In this work, we present the deployment of a VLA model on a soft continuum manipulator to demonstrate autonomous safe human-robot interaction. We present a structured finetuning and deployment pipeline evaluating two state-of-the-art VLA models (OpenVLA-OFT and $\pi_0$) across representative manipulation tasks, and show that, while out-of-the-box policies fail due to embodiment mismatch, targeted finetuning lets the soft robot perform on par with its rigid counterpart. Our findings highlight the necessity of finetuning for bridging embodiment gaps, and demonstrate that coupling VLA models with soft robots enables safe and flexible embodied AI in human-shared environments.
comment: Accepted by NeurIPS 2025 SpaVLE workshop. 4 pages, 2 figures (in main paper, excluding references and supplements)
M2H: Multi-Task Learning with Efficient Window-Based Cross-Task Attention for Monocular Spatial Perception IROS 2025
Deploying real-time spatial perception on edge devices requires efficient multi-task models that leverage complementary task information while minimizing computational overhead. This paper introduces Multi-Mono-Hydra (M2H), a novel multi-task learning framework designed for semantic segmentation and depth, edge, and surface normal estimation from a single monocular image. Unlike conventional approaches that rely on independent single-task models or shared encoder-decoder architectures, M2H introduces a Window-Based Cross-Task Attention Module that enables structured feature exchange while preserving task-specific details, improving prediction consistency across tasks. Built on a lightweight ViT-based DINOv2 backbone, M2H is optimized for real-time deployment and serves as the foundation for monocular spatial perception systems supporting 3D scene graph construction in dynamic environments. Comprehensive evaluations show that M2H outperforms state-of-the-art multi-task models on NYUDv2, surpasses single-task depth and semantic baselines on Hypersim, and achieves superior performance on the Cityscapes dataset, all while maintaining computational efficiency on laptop hardware. Beyond benchmarks, M2H is validated on real-world data, demonstrating its practicality in spatial perception tasks.
comment: Accepted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025). 8 pages, 7 figures
Interactive Force-Impedance Control
Human collaboration with robots requires flexible role adaptation, enabling the robot to switch between active leader and passive follower. Effective role switching depends on accurately estimating human intention, which is typically achieved through external force analysis, nominal robot dynamics, or data-driven approaches. However, these methods are primarily effective in contact-sparse environments. When robots under hybrid or unified force-impedance control physically interact with active humans or non-passive environments, the robotic system may lose passivity and thus compromise safety. To address this challenge, this paper proposes the unified Interactive Force-Impedance Control (IFIC) framework that adapts to the interaction power flow, ensuring effortless and safe interaction in contact-rich environments. The proposed control architecture is formulated within a port-Hamiltonian framework, incorporating both interaction and task control ports, through which system passivity is guaranteed.
DDBot: Differentiable Physics-based Digging Robot for Unknown Granular Materials
Automating the manipulation of granular materials poses significant challenges due to complex contact dynamics, unpredictable material properties, and intricate system states. Existing approaches often fail to achieve efficiency and accuracy in such tasks. To fill the research gap, this paper studies the small-scale and high-precision granular material digging task with unknown physical properties. A new framework, named differentiable digging robot (DDBot), is proposed to manipulate granular materials, including sand and soil. Specifically, we equip DDBot with a differentiable physics-based simulator, tailored for granular material manipulation, powered by GPU-accelerated parallel computing and automatic differentiation. DDBot can perform efficient differentiable system identification and high-precision digging skill optimisation for unknown granular materials, which is enabled by a differentiable skill-to-action mapping, a task-oriented demonstration method, gradient clipping and line search-based gradient descent. Experimental results show that DDBot can efficiently (converge within 5 to 20 minutes) identify unknown granular material dynamics and optimise digging skills, with high-precision results in zero-shot real-world deployments, highlighting its practicality. Benchmark results against state-of-the-art baselines also confirm the robustness and efficiency of DDBot in such digging tasks.
comment: Accepted as a regular paper by the IEEE Transactions on Robotics
Implicit State Estimation via Video Replanning
Video-based representations have gained prominence in planning and decision-making due to their ability to encode rich spatiotemporal dynamics and geometric relationships. These representations enable flexible and generalizable solutions for complex tasks such as object manipulation and navigation. However, existing video planning frameworks often struggle to adapt to failures at interaction time due to their inability to reason about uncertainties in partially observed environments. To overcome these limitations, we introduce a novel framework that integrates interaction-time data into the planning process. Our approach updates model parameters online and filters out previously failed plans during generation. This enables implicit state estimation, allowing the system to adapt dynamically without explicitly modeling unknown state variables. We evaluate our framework through extensive experiments on a new simulated manipulation benchmark, demonstrating its ability to improve replanning performance and advance the field of video-based decision-making.
Floating-Base Deep Lagrangian Networks
Grey-box methods for system identification combine deep learning with physics-informed constraints, capturing complex dependencies while improving out-of-distribution generalization. Yet, despite the growing importance of floating-base systems such as humanoids and quadrupeds, current grey-box models ignore their specific physical constraints. For instance, the inertia matrix is not only positive definite but also exhibits branch-induced sparsity and input independence. Moreover, the 6x6 composite spatial inertia of the floating base inherits properties of single-rigid-body inertia matrices. As we show, this includes the triangle inequality on the eigenvalues of the composite rotational inertia. To address the lack of physical consistency in deep learning models of floating-base systems, we introduce a parameterization of inertia matrices that satisfies all these constraints. Inspired by Deep Lagrangian Networks (DeLaN), we train neural networks to predict physically plausible inertia matrices that minimize inverse dynamics error under Lagrangian mechanics. For evaluation, we collected and released a dataset on multiple quadrupeds and humanoids. In these experiments, our Floating-Base Deep Lagrangian Networks (FeLaN) achieve highly competitive performance on both simulated and real robots, while providing greater physical interpretability.
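The triangle inequality on the principal moments of a rotational inertia can be illustrated with a small sketch. One well-known way to satisfy it by construction is to build the principal moments from non-negative second moments of mass; this is shown only as an example and is not claimed to be the parameterization used in FeLaN. The function names are hypothetical.

```python
def satisfies_triangle_inequality(moments, tol=1e-12):
    """Check positivity and the triangle inequality for three principal moments."""
    a, b, c = sorted(moments)
    # With a <= b <= c, the only inequality that can fail is a + b >= c.
    return a > 0.0 and a + b >= c - tol


def moments_from_second_moments(sx, sy, sz):
    """Principal moments of inertia from second moments of mass (all >= 0).

    Each moment is the sum of the second moments along the other two axes,
    so the sum of any two moments exceeds the third by twice a
    non-negative quantity.
    """
    if min(sx, sy, sz) < 0.0:
        raise ValueError("second moments of mass must be non-negative")
    return (sy + sz, sx + sz, sx + sy)


valid = moments_from_second_moments(1.0, 2.0, 3.0)        # (5.0, 4.0, 3.0)
ok = satisfies_triangle_inequality(valid)                  # physically plausible
bad = satisfies_triangle_inequality((1.0, 1.0, 3.0))       # 1 + 1 < 3
```

A network that outputs unconstrained positive eigenvalues could produce a triple like `(1.0, 1.0, 3.0)`, which no rigid mass distribution can realize; constructing the moments from non-negative quantities rules this out.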
High-Level Multi-Robot Trajectory Planning And Spurious Behavior Detection
The reliable execution of high-level missions in multi-robot systems with heterogeneous agents requires robust methods for detecting spurious behaviors. In this paper, we address the challenge of identifying spurious executions of plans specified as a Linear Temporal Logic (LTL) formula, such as incorrect task sequences, violations of spatial constraints, timing inconsistencies, or deviations from intended mission semantics. To tackle this, we introduce a structured data generation framework based on the Nets-within-Nets (NWN) paradigm, which coordinates robot actions with LTL-derived global mission specifications. We further propose a Transformer-based anomaly detection pipeline that classifies robot trajectories as normal or anomalous. Experimental evaluations show that our method achieves high accuracy (91.3%) in identifying execution inefficiencies, and demonstrates robust detection capabilities for core mission violations (88.3%) and constraint-based adaptive anomalies (66.8%). An ablation experiment of the embedding and architecture was carried out, obtaining successful results where our novel proposition performs better than simpler representations.
comment: 6 pages, 3 figures, Iberian Robotics Conference 2025
An adaptive hierarchical control framework for quadrupedal robots in planetary exploration
Planetary exploration missions require robots capable of navigating extreme and unknown environments. While wheeled rovers have dominated past missions, their mobility is limited to traversable surfaces. Legged robots, especially quadrupeds, can overcome these limitations by handling uneven, obstacle-rich, and deformable terrains. However, deploying such robots in unknown conditions is challenging due to the need for environment-specific control, which is infeasible when terrain and robot parameters are uncertain. This work presents a modular control framework that combines model-based dynamic control with online model adaptation and adaptive footstep planning to address uncertainties in both robot and terrain properties. The framework includes state estimation for quadrupeds with and without contact sensing, supports runtime reconfiguration, and is integrated into ROS 2 with open-source availability. Its performance was validated on two quadruped platforms, multiple hardware architectures, and in a volcano field test, where the robot walked over 700 m.
comment: Presented at 18th Symposium on Advanced Space Technologies in Robotics and Automation (ASTRA)
Pole-Image: A Self-Supervised Pole-Anchored Descriptor for Long-Term LiDAR Localization and Map Maintenance
Long-term autonomy for mobile robots requires both robust self-localization and reliable map maintenance. Conventional landmark-based methods face a fundamental trade-off between landmarks with high detectability but low distinctiveness (e.g., poles) and those with high distinctiveness but difficult stable detection (e.g., local point cloud structures). This work addresses the challenge of descriptively identifying a unique "signature" (local point cloud) by leveraging a detectable, high-precision "anchor" (like a pole). To solve this, we propose a novel canonical representation, "Pole-Image," as a hybrid method that uses poles as anchors to generate signatures from the surrounding 3D structure. Pole-Image represents a pole-like landmark and its surrounding environment, detected from a LiDAR point cloud, as a 2D polar coordinate image with the pole itself as the origin. This representation leverages the pole's nature as a high-precision reference point, explicitly encoding the "relative geometry" between the stable pole and the variable surrounding point cloud. The key advantage of pole landmarks is that "detection" is extremely easy. This ease of detection allows the robot to easily track the same pole, enabling the automatic and large-scale collection of diverse observational data (positive pairs). This data acquisition feasibility makes "Contrastive Learning (CL)" applicable. By applying CL, the model learns a viewpoint-invariant and highly discriminative descriptor. The contributions are twofold: 1) The descriptor overcomes perceptual aliasing, enabling robust self-localization. 2) The high-precision encoding enables high-sensitivity change detection, contributing to map maintenance.
comment: 4 pages, technical report
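The abstract describes Pole-Image as a 2D polar-coordinate image centred on the pole. A minimal sketch of that rasterisation step is shown below. The function name `pole_image`, the bin counts, the maximum range, and the choice to store point counts per cell are all assumptions made for illustration; the abstract does not specify what each pixel encodes.

```python
import math

def pole_image(points, pole_xy, max_range=10.0, n_range=8, n_azimuth=16):
    """Rasterise 3D points into a (range, azimuth) grid with the pole as origin.

    Each cell stores a point count; this is an assumed encoding, not the paper's.
    """
    image = [[0] * n_azimuth for _ in range(n_range)]
    px, py = pole_xy
    for x, y, _z in points:
        dx, dy = x - px, y - py
        r = math.hypot(dx, dy)
        if r >= max_range:
            continue  # ignore points outside the local neighbourhood
        theta = math.atan2(dy, dx) % (2.0 * math.pi)
        # Clamp indices to guard against floating-point rounding at bin edges.
        i = min(int(r / max_range * n_range), n_range - 1)
        j = min(int(theta / (2.0 * math.pi) * n_azimuth), n_azimuth - 1)
        image[i][j] += 1
    return image

# Hypothetical points around a pole at the origin; the last one is out of range.
cloud = [(3.0, 1.0, 0.5), (-2.0, 0.5, 1.2), (20.0, 0.0, 0.3)]
img = pole_image(cloud, pole_xy=(0.0, 0.0))
```

Because the grid is anchored at the pole, a change in the robot's heading shifts the image along the azimuth axis only, which is the kind of viewpoint variation a contrastively trained descriptor would have to absorb.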
Performance Evaluation of an Integrated System for Visible Light Communication and Positioning Using an Event Camera
Event cameras, featuring high temporal resolution and high dynamic range, offer visual sensing capabilities comparable to conventional image sensors while capturing fast-moving objects and handling scenes with extreme lighting contrasts such as tunnel exits. Leveraging these properties, this study proposes a novel self-localization system that integrates visible light communication (VLC) and visible light positioning (VLP) within a single event camera. The system enables a vehicle to estimate its position even in GPS-denied environments, such as tunnels, by using VLC to obtain coordinate information from LED transmitters and VLP to estimate the distance to each transmitter. Multiple LEDs are installed on the transmitter side, each assigned a unique pilot sequence based on Walsh-Hadamard codes. The event camera identifies individual LEDs within its field of view by correlating the received signal with these codes, allowing clear separation and recognition of each light source. This mechanism enables simultaneous high-capacity MISO (multi-input single-output) communication through VLC and precise distance estimation via phase-only correlation (POC) between multiple LED pairs. To the best of our knowledge, this is the first vehicle-mounted system to achieve simultaneous VLC and VLP functionalities using a single event camera. Field experiments were conducted by mounting the system on a vehicle traveling at 30 km/h (8.3 m/s). The results demonstrated robust real-world performance, with a root mean square error (RMSE) of distance estimation within 0.75 m for ranges up to 100 m and a bit error rate (BER) below 0.01 across the same range.
comment: 7 pages, APCC2025
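The LED identification step relies on the orthogonality of Walsh-Hadamard codes: correlating a received sequence with every code gives a large value only for the matching pilot. The sketch below illustrates that idea on a toy signal. The helper names, the code length of 8, and the disturbance values are assumptions; a real event-camera pipeline would first have to turn events into such a sequence.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def identify_led(received, codes):
    """Return the index of the best-matching code and all correlation scores."""
    scores = [sum(r * c for r, c in zip(received, code)) for code in codes]
    best = max(range(len(codes)), key=lambda i: abs(scores[i]))
    return best, scores

codes = hadamard(8)
# Hypothetical received pilot: LED 5 at amplitude 0.7 plus a small disturbance.
disturbance = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.1]
received = [0.7 * c + e for c, e in zip(codes[5], disturbance)]
led_index, scores = identify_led(received, codes)
```

Since distinct rows have zero inner product, signals from several LEDs superimposed on the same pixels can still be separated by the same correlation.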
SimpleVSF: VLM-Scoring Fusion for Trajectory Prediction of End-to-End Autonomous Driving
End-to-end autonomous driving has emerged as a promising paradigm for achieving robust and intelligent driving policies. However, existing end-to-end methods still face significant challenges, such as suboptimal decision-making in complex scenarios. In this paper, we propose SimpleVSF (Simple VLM-Scoring Fusion), a novel framework that enhances end-to-end planning by leveraging the cognitive capabilities of Vision-Language Models (VLMs) and advanced trajectory fusion techniques. We combine conventional scorers with novel VLM-enhanced scorers, and we use a robust weight fusioner for quantitative aggregation together with a powerful VLM-based fusioner for qualitative, context-aware decision-making. As the leading approach in the ICCV 2025 NAVSIM v2 End-to-End Driving Challenge, our SimpleVSF framework demonstrates state-of-the-art performance, achieving a superior balance between safety, comfort, and efficiency.
comment: 6 pages, 2 figures, 2 tables
OmniVIC: A Self-Improving Variable Impedance Controller with Vision-Language In-Context Learning for Safe Robotic Manipulation
We present OmniVIC, a universal variable impedance controller (VIC) enhanced by a vision language model (VLM), which improves safety and adaptation in contact-rich robotic manipulation tasks. Traditional VICs have shown advantages when the robot physically interacts with the environment, but they generalize poorly to unseen, complex, and unstructured interactions in task scenarios involving contact or uncertainty. To this end, the proposed OmniVIC interprets task context by reasoning over images and natural language and generates adaptive impedance parameters for a VIC. Specifically, the core of OmniVIC is a self-improving combination of Retrieval-Augmented Generation (RAG) and in-context learning (ICL), where RAG retrieves relevant prior experiences from a structured memory bank to inform the controller about similar past tasks, and ICL leverages these retrieved examples and the prompt of the current task to query the VLM for generating context-aware and adaptive impedance parameters for the current manipulation scenario. Together, the self-improving RAG and ICL allow OmniVIC to operate across universal task scenarios. The impedance parameter regulation is further informed by real-time force/torque feedback to ensure interaction forces remain within safe thresholds. We demonstrate that our method outperforms baselines on a suite of complex contact-rich tasks, both in simulation and on real-world robotic tasks, with improved success rates and reduced force violations. OmniVIC takes a step towards bridging high-level semantic reasoning and low-level compliant control, enabling safer and more generalizable manipulation. Overall, the average success rate increases from 27% (baseline) to 61.4% (OmniVIC).
comment: Code, video and RAG dataset are available at https://sites.google.com/view/omni-vic
DiffVLA++: Bridging Cognitive Reasoning and End-to-End Driving through Metric-Guided Alignment
Conventional end-to-end (E2E) driving models are effective at generating physically plausible trajectories, but often fail to generalize to long-tail scenarios due to the lack of essential world knowledge to understand and reason about surrounding environments. In contrast, Vision-Language-Action (VLA) models leverage world knowledge to handle challenging cases, but their limited 3D reasoning capability can lead to physically infeasible actions. In this work we introduce DiffVLA++, an enhanced autonomous driving framework that explicitly bridges cognitive reasoning and E2E planning through metric-guided alignment. First, we build a VLA module directly generating semantically grounded driving trajectories. Second, we design an E2E module with a dense trajectory vocabulary that ensures physical feasibility. Third, and most critically, we introduce a metric-guided trajectory scorer that guides and aligns the outputs of the VLA and E2E modules, thereby integrating their complementary strengths. The experiment on the ICCV 2025 Autonomous Grand Challenge leaderboard shows that DiffVLA++ achieves EPDMS of 49.12.
Decentralized Real-Time Planning for Multi-UAV Cooperative Manipulation via Imitation Learning
Existing approaches for transporting and manipulating cable-suspended loads using multiple UAVs along reference trajectories typically rely on either centralized control architectures or reliable inter-agent communication. In this work, we propose a novel machine learning based method for decentralized kinodynamic planning that operates effectively under partial observability and without inter-agent communication. Our method leverages imitation learning to train a decentralized student policy for each UAV by imitating a centralized kinodynamic motion planner with access to privileged global observations. The student policy generates smooth trajectories using physics-informed neural networks that respect the derivative relationships in motion. During training, the student policies utilize the full trajectory generated by the teacher policy, leading to improved sample efficiency. Moreover, each student policy can be trained in under two hours on a standard laptop. We validate our method in both simulation and real-world environments to follow an agile reference trajectory, demonstrating performance comparable to that of centralized approaches.
comment: Accepted by IEEE MRS 2025
Efficient Vision-Language-Action Models for Embodied Manipulation: A Systematic Survey
Vision-Language-Action (VLA) models extend vision-language models to embodied control by mapping natural-language instructions and visual observations to robot actions. Despite their capabilities, VLA systems face significant challenges due to their massive computational and memory demands, which conflict with the constraints of edge platforms such as on-board mobile manipulators that require real-time performance. Addressing this tension has become a central focus of recent research. In light of the growing efforts toward more efficient and scalable VLA systems, this survey provides a systematic review of approaches for improving VLA efficiency, with an emphasis on reducing latency, memory footprint, and training and inference costs. We categorize existing solutions into four dimensions: model architecture, perception feature, action generation, and training/inference strategies, summarizing representative techniques within each category. Finally, we discuss future trends and open challenges, highlighting directions for advancing efficient embodied intelligence.
Learning to Design Soft Hands using Reward Models
Soft robotic hands promise to provide compliant and safe interaction with objects and environments. However, designing soft hands to be both compliant and functional across diverse use cases remains challenging. Although co-design of hardware and control better couples morphology to behavior, the resulting search space is high-dimensional, and even simulation-based evaluation is computationally expensive. In this paper, we propose a Cross-Entropy Method with Reward Model (CEM-RM) framework that efficiently optimizes tendon-driven soft robotic hands based on teleoperation control policy, reducing design evaluations by more than half compared to pure optimization while learning a distribution of optimized hand designs from pre-collected teleoperation data. We derive a design space for a soft robotic hand composed of flexural soft fingers and implement parallelized training in simulation. The optimized hands are then 3D-printed and deployed in the real world using both teleoperation data and real-time teleoperation. Experiments in both simulation and hardware demonstrate that our optimized design significantly outperforms baseline hands in grasping success rates across a diverse set of challenging objects.
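For readers unfamiliar with the Cross-Entropy Method underlying CEM-RM, the loop below is a minimal sketch of plain CEM on a toy two-parameter objective. It does not include the reward model: in the setting described above, a learned reward model would presumably stand in for some calls to the expensive evaluation. The objective `evaluate`, the population size, and the elite count are assumptions chosen for illustration.

```python
import random
import statistics

def evaluate(design):
    """Stand-in objective (higher is better) with its peak at (2.0, -1.0)."""
    x, y = design
    return -((x - 2.0) ** 2) - ((y + 1.0) ** 2)

def cross_entropy_method(objective, dim=2, iterations=30, population=50,
                         n_elite=10, seed=0):
    """Iteratively refit a diagonal Gaussian to the best-scoring samples."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [2.0] * dim
    for _ in range(iterations):
        samples = [[rng.gauss(mean[d], std[d]) for d in range(dim)]
                   for _ in range(population)]
        elites = sorted(samples, key=objective, reverse=True)[:n_elite]
        mean = [statistics.fmean([e[d] for e in elites]) for d in range(dim)]
        # A small floor keeps the sampling distribution from collapsing to zero width.
        std = [max(statistics.pstdev([e[d] for e in elites]), 1e-3)
               for d in range(dim)]
    return mean

best_design = cross_entropy_method(evaluate)
```

Every call to `objective` here corresponds to one design evaluation, which is the cost the abstract reports cutting by more than half.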
Quality Over Quantity: Curating Contact-Based Robot Datasets Improves Learning
In this paper, we investigate the utility of datasets and whether more data or the 'right' data is advantageous for robot learning. In particular, we are interested in quantifying the utility of contact-based data, as contact holds significant information for robot learning. Our approach derives a contact-aware objective function for learning object dynamics and shape from pose and contact data. We show that the contact-aware Fisher-information metric can be used to rank and curate contact data based on how informative the data is for learning. In addition, we find that selecting a reduced dataset based on this ranking improves the learning task while also making learning a deterministic process. Interestingly, our results show that more data is not necessarily advantageous, and rather, less but informative data can accelerate learning, especially depending on the contact interactions. Last, we show how our metric can be used to provide initial guidance on data curation for contact-based robot learning.
ANGEL: A Novel Gripper for Versatile and Light-touch Fruit Harvesting
Fruit harvesting remains predominantly a labor-intensive process, motivating research into robotic grippers. Conventional rigid or vacuum-driven grippers require complex mechanical design or high energy consumption. Current enveloping-based fruit harvesting grippers lack adaptability to fruits of different sizes. This paper introduces a drawstring-inspired, cable-driven soft gripper for versatile and gentle fruit harvesting. The design employs 3D-printed Thermoplastic Polyurethane (TPU) pockets with integrated steel wires that constrict around the fruit when actuated, distributing pressure uniformly to minimize bruising and accommodate fruits of varying sizes. The lightweight structure, which requires few components, reduces mechanical complexity and cost compared to other grippers. Actuation is achieved through servo-driven cable control, while motor feedback provides autonomous grip adjustment with tunable grip strength. Experimental validation shows that, for tomatoes within the gripper's effective size range, harvesting was achieved with a 0% immediate damage rate and a bruising rate of less than 9% after five days, reinforcing the gripper's suitability for fruit harvesting.
SafeCoop: Unravelling Full Stack Safety in Agentic Collaborative Driving
Collaborative driving systems leverage vehicle-to-everything (V2X) communication across multiple agents to enhance driving safety and efficiency. Traditional V2X systems take raw sensor data, neural features, or perception results as communication media, which face persistent challenges, including high bandwidth demands, semantic loss, and interoperability issues. Recent advances investigate natural language as a promising medium, which can provide semantic richness, decision-level reasoning, and human-machine interoperability at significantly lower bandwidth. Despite great promise, this paradigm shift also introduces new vulnerabilities within language communication, including message loss, hallucinations, semantic manipulation, and adversarial attacks. In this work, we present the first systematic study of full-stack safety and security issues in natural-language-based collaborative driving. Specifically, we develop a comprehensive taxonomy of attack strategies, including connection disruption, relay/replay interference, content spoofing, and multi-connection forgery. To mitigate these risks, we introduce an agentic defense pipeline, which we call SafeCoop, that integrates a semantic firewall, language-perception consistency checks, and multi-source consensus, enabled by an agentic transformation function for cross-frame spatial alignment. We systematically evaluate SafeCoop in closed-loop CARLA simulation across 32 critical scenarios, achieving 69.15% driving score improvement under malicious attacks and up to 67.32% F1 score for malicious detection. This study provides guidance for advancing research on safe, secure, and trustworthy language-driven collaboration in transportation systems. Our project page is https://xiangbogaobarry.github.io/SafeCoop.
R2BC: Multi-Agent Imitation Learning from Single-Agent Demonstrations
Imitation Learning (IL) is a natural way for humans to teach robots, particularly when high-quality demonstrations are easy to obtain. While IL has been widely applied to single-robot settings, relatively few studies have addressed the extension of these methods to multi-agent systems, especially in settings where a single human must provide demonstrations to a team of collaborating robots. In this paper, we introduce and study Round-Robin Behavior Cloning (R2BC), a method that enables a single human operator to effectively train multi-robot systems through sequential, single-agent demonstrations. Our approach allows the human to teleoperate one agent at a time and incrementally teach multi-agent behavior to the entire system, without requiring demonstrations in the joint multi-agent action space. We show that R2BC methods match, and in some cases surpass, the performance of an oracle behavior cloning approach trained on privileged synchronized demonstrations across four multi-agent simulated tasks. Finally, we deploy R2BC on two physical robot tasks trained using real human demonstrations.
comment: 9 pages, 6 figures
Provably Optimal Reinforcement Learning under Safety Filtering
Recent advances in reinforcement learning (RL) enable its use on increasingly complex tasks, but the lack of formal safety guarantees still limits its application in safety-critical settings. A common practical approach is to augment the RL policy with a safety filter that overrides unsafe actions to prevent failures during both training and deployment. However, safety filtering is often perceived as sacrificing performance and hindering the learning process. We show that this perceived safety-performance tradeoff is not inherent and prove, for the first time, that enforcing safety with a sufficiently permissive safety filter does not degrade asymptotic performance. We formalize RL safety with a safety-critical Markov decision process (SC-MDP), which requires categorical, rather than high-probability, avoidance of catastrophic failure states. Additionally, we define an associated filtered MDP in which all actions result in safe effects, thanks to a safety filter that is considered to be a part of the environment. Our main theorem establishes that (i) learning in the filtered MDP is safe categorically, (ii) standard RL convergence carries over to the filtered MDP, and (iii) any policy that is optimal in the filtered MDP, when executed through the same filter, achieves the same asymptotic return as the best safe policy in the SC-MDP, yielding a complete separation between safety enforcement and performance optimization. We validate the theory on Safety Gymnasium with representative tasks and constraints, observing zero violations during training and final performance matching or exceeding unfiltered baselines. Together, these results shed light on a long-standing question in safety-filtered learning and provide a simple, principled recipe for safe RL: train and deploy RL policies with the most permissive safety filter that is available.
comment: 17 pages, 3 figures
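The filtered-MDP idea, in which the safety filter is treated as part of the environment, can be illustrated with a toy one-dimensional chain. Everything in this sketch is hypothetical: the states, the dynamics, and the hand-written filter are invented for illustration and are not taken from the paper or from Safety Gymnasium.

```python
import random

FAILURE_STATE = 0  # hypothetical catastrophic state
GOAL_STATE = 5

def step(state, action):
    """Toy chain dynamics: action is -1 or +1, clipped to the chain ends."""
    return max(FAILURE_STATE, min(GOAL_STATE, state + action))

def safety_filter(state, action):
    """Permissive filter: override only actions that would reach the failure state."""
    if step(state, action) == FAILURE_STATE:
        return +1  # safe fallback action
    return action

def filtered_step(state, action):
    """Environment step of the filtered MDP: the filter sits inside the dynamics."""
    return step(state, safety_filter(state, action))

# A uniformly random policy explores the filtered MDP without ever failing.
rng = random.Random(0)
state = 2
visited = [state]
for _ in range(1000):
    state = filtered_step(state, rng.choice([-1, 1]))
    visited.append(state)
```

The learner only ever interacts with `filtered_step`, so any standard RL algorithm can be run unchanged while the failure state stays unreachable.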
MOFM-Nav: On-Manifold Ordering-Flexible Multi-Robot Navigation
This paper addresses the problem of multi-robot navigation where robots maneuver on a desired $m$-dimensional (i.e., $m$-D) manifold in the $n$-dimensional Euclidean space and maintain a flexible spatial ordering. We consider $m \geq 2$, and the multi-robot coordination is achieved via non-Euclidean metrics. However, since the $m$-D manifold can be characterized by the zero-level sets of $n$ implicit functions, the last $m$ entries of the GVF propagation term become strongly coupled with the partial derivatives of these functions if the auxiliary vectors are not appropriately chosen. These couplings not only influence the on-manifold maneuvering of robots, but also pose significant challenges to the further design of the ordering-flexible coordination via non-Euclidean metrics. To tackle this issue, we first identify a feasible solution of auxiliary vectors such that the last $m$ entries of the propagation term are effectively decoupled to be the same constant. Then, we redesign the coordinated GVF (CGVF) algorithm to boost the advantages of singularity elimination and global convergence by treating $m$ manifold parameters as additional $m$ virtual coordinates. Furthermore, we enable the on-manifold ordering-flexible motion coordination by allowing each robot to share $m$ virtual coordinates with its time-varying neighbors and a virtual target robot, which circumvents the possibly complex calculations that would arise if Euclidean metrics were used instead. Finally, we showcase the proposed algorithm's flexibility, adaptability, and robustness through extensive simulations with different initial positions, higher-dimensional manifolds, and robot breakdown, respectively.
SPACeR: Self-Play Anchoring with Centralized Reference Models
Developing autonomous vehicles (AVs) requires not only safety and efficiency, but also realistic, human-like behaviors that are socially aware and predictable. Achieving this requires sim agent policies that are human-like, fast, and scalable in multi-agent settings. Recent progress in imitation learning with large diffusion-based or tokenized models has shown that behaviors can be captured directly from human driving data, producing realistic policies. However, these models are computationally expensive, slow during inference, and struggle to adapt in reactive, closed-loop scenarios. In contrast, self-play reinforcement learning (RL) scales efficiently and naturally captures multi-agent interactions, but it often relies on heuristics and reward shaping, and the resulting policies can diverge from human norms. We propose SPACeR, a framework that leverages a pretrained tokenized autoregressive motion model as a centralized reference policy to guide decentralized self-play. The reference model provides likelihood rewards and KL divergence, anchoring policies to the human driving distribution while preserving RL scalability. Evaluated on the Waymo Sim Agents Challenge, our method achieves competitive performance with imitation-learned policies while being up to 10x faster at inference and 50x smaller in parameter size than large generative models. In addition, we demonstrate in closed-loop ego planning evaluation tasks that our sim agents can effectively measure planner quality with fast and scalable traffic simulation, establishing a new paradigm for testing autonomous driving policies.
comment: Project page: https://spacer-ai.github.io/
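One way to picture how a reference model can anchor self-play is a shaped reward that adds the reference log-likelihood of the chosen action and subtracts a KL penalty between the policy and the reference. The sketch below uses small categorical distributions. The function names, the weights `alpha` and `beta`, and the exact way the terms are combined are assumptions; the abstract only states that the reference model provides likelihood rewards and KL divergence.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for categorical distributions; q must be positive where p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def anchored_reward(task_reward, action, policy_probs, reference_probs,
                    alpha=0.1, beta=0.1):
    """Task reward plus reference log-likelihood bonus minus KL penalty (assumed form)."""
    log_likelihood = math.log(reference_probs[action])
    penalty = kl_divergence(policy_probs, reference_probs)
    return task_reward + alpha * log_likelihood - beta * penalty

# Hypothetical distributions over three motion tokens.
reference = [0.7, 0.2, 0.1]
policy = [0.5, 0.3, 0.2]
r_common = anchored_reward(1.0, 0, policy, reference)
r_rare = anchored_reward(1.0, 2, policy, reference)
```

With equal task reward, the action that the reference model considers more human-like (`r_common`) receives the higher shaped reward.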
SAVANT: Semantic Analysis with Vision-Augmented Anomaly deTection
Autonomous driving systems remain critically vulnerable to the long-tail of rare, out-of-distribution scenarios with semantic anomalies. While Vision Language Models (VLMs) offer promising reasoning capabilities, naive prompting approaches yield unreliable performance and depend on expensive proprietary models, limiting practical deployment. We introduce SAVANT (Semantic Analysis with Vision-Augmented Anomaly deTection), a structured reasoning framework that achieves high accuracy and recall in detecting anomalous driving scenarios from input images through layered scene analysis and a two-phase pipeline: structured scene description extraction followed by multi-modal evaluation. Our approach transforms VLM reasoning from ad-hoc prompting to systematic analysis across four semantic layers: Street, Infrastructure, Movable Objects, and Environment. SAVANT achieves 89.6% recall and 88.0% accuracy on real-world driving scenarios, significantly outperforming unstructured baselines. More importantly, we demonstrate that our structured framework enables a fine-tuned 7B parameter open-source model (Qwen2.5VL) to achieve 90.8% recall and 93.8% accuracy, surpassing all models evaluated while enabling local deployment at near-zero cost. By automatically labeling over 9,640 real-world images with high accuracy, SAVANT addresses the critical data scarcity problem in anomaly detection and provides a practical path toward reliable, accessible semantic monitoring for autonomous systems.
comment: 8 pages, 5 figures
Humanoid Goalkeeper: Learning from Position Conditioned Task-Motion Constraints
We present a reinforcement learning framework for autonomous goalkeeping with humanoid robots in real-world scenarios. While prior work has demonstrated similar capabilities on quadrupedal platforms, humanoid goalkeeping introduces two critical challenges: (1) generating natural, human-like whole-body motions, and (2) covering a wider guarding range with an equivalent response time. Unlike existing approaches that rely on separate teleoperation or fixed motion tracking for whole-body control, our method learns a single end-to-end RL policy, enabling fully autonomous, highly dynamic, and human-like robot-object interactions. To achieve this, we integrate multiple human motion priors conditioned on perceptual inputs into the RL training via an adversarial scheme. We demonstrate the effectiveness of our method through real-world experiments, where the humanoid robot successfully performs agile, autonomous, and naturalistic interceptions of fast-moving balls. In addition to goalkeeping, we demonstrate the generalization of our approach through tasks such as ball escaping and grabbing. Our work presents a practical and scalable solution for enabling highly dynamic interactions between robots and moving objects, advancing the field toward more adaptive and lifelike robotic behaviors.
RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies
Testing on real machines is indispensable for robotic control algorithms. In the context of learning-based algorithms, especially VLA models, demand for large-scale evaluation, i.e., testing a large number of models on a large number of tasks, is becoming increasingly urgent. However, doing this right is highly non-trivial, especially when scalability and reproducibility are taken into account. In this report, we describe our methodology for constructing RoboChallenge, an online evaluation system to test robotic control algorithms, and our survey of recent state-of-the-art VLA models using our initial benchmark Table30.
comment: Authors are listed in alphabetical order. The official website is located at https://robochallenge.ai
Studying the Effects of Robot Intervention on School Shooters in Virtual Reality
We advance the understanding of robotic intervention in high-risk scenarios by examining its potential to distract and impede a school shooter. To evaluate this concept, we conducted a virtual reality study with 150 university participants role-playing as a school shooter. Within the simulation, an autonomous robot predicted the shooter's movements and positioned itself strategically to interfere and distract. We manipulated two factors: the strategy the robot used to approach the shooter, either moving directly in front of the shooter (aggressive) or maintaining distance (passive), and the distraction method, ranging from no additional cues (low), to siren and lights (medium), to siren, lights, and smoke to impair visibility (high). An aggressive, high-distraction robot reduced the number of victims by 46.6% relative to a no-robot control. This outcome underscores both the potential of robotic intervention to enhance safety and the pressing ethical questions surrounding its use in school environments.
comment: Preprint under review for conference publication. 10 pages, 9 figures, 3 tables (including 1-page appendix)
DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning NeurIPS 2025
Sparse-reward reinforcement learning (RL) can model a wide range of highly complex tasks. Solving sparse-reward tasks is RL's core premise, requiring efficient exploration coupled with long-horizon credit assignment, and overcoming these challenges is key for building self-improving agents with superhuman ability. Prior work commonly explores with the objective of solving many sparse-reward tasks, making exploration of individual high-dimensional, long-horizon tasks intractable. We argue that solving such challenging tasks requires solving simpler tasks that are relevant to the target task, i.e., whose achievement will teach the agent skills required for solving the target task. We demonstrate that this sense of direction, necessary for effective exploration, can be extracted from existing RL algorithms, without leveraging any prior information. To this end, we propose a method for directed sparse-reward goal-conditioned very long-horizon RL (DISCOVER), which selects exploratory goals in the direction of the target task. We connect DISCOVER to principled exploration in bandits, formally bounding the time until the target task becomes achievable in terms of the agent's initial distance to the target, but independent of the volume of the space of all tasks. We then perform a thorough evaluation in high-dimensional environments. We find that the directed goal selection of DISCOVER solves exploration problems that are beyond the reach of prior state-of-the-art exploration methods in RL.
comment: NeurIPS 2025
General agents contain world models ICML 2025
Are world models a necessary ingredient for flexible, goal-directed behaviour, or is model-free learning sufficient? We provide a formal answer to this question, showing that any agent capable of generalizing to multi-step goal-directed tasks must have learned a predictive model of its environment. We show that this model can be extracted from the agent's policy, and that increasing the agent's performance or the complexity of the goals it can achieve requires learning increasingly accurate world models. This has a number of consequences: from developing safe and general agents, to bounding agent capabilities in complex environments, and providing new algorithms for eliciting world models from agents.
comment: Accepted ICML 2025. Typos corrected
Leveraging Vision-Language Models for Open-Vocabulary Instance Segmentation and Tracking
Vision-language models (VLMs) excel in visual understanding but often lack reliable grounding capabilities and actionable inference rates. Integrating them with open-vocabulary object detection (OVD), instance segmentation, and tracking leverages their strengths while mitigating these drawbacks. We utilize VLM-generated structured descriptions to identify visible object instances, collect application-relevant attributes, and inform an open-vocabulary detector to extract corresponding bounding boxes that are passed to a video segmentation model providing segmentation masks and tracking. Once initialized, this model directly extracts segmentation masks, processing image streams in real time with minimal computational overhead. Tracks can be updated online as needed by generating new structured descriptions and detections. This combines the descriptive power of VLMs with the grounding capability of OVD and the pixel-level understanding and speed of video segmentation. Our evaluation across datasets and robotics platforms demonstrates the broad applicability of this approach, showcasing its ability to extract task-specific attributes from non-standard objects in dynamic environments. Code, data, videos, and benchmarks are available at https://vlm-gist.github.io
comment: IEEE Robotics and Automation Letters (RA-L), November 2025
4D Radar-Inertial Odometry based on Gaussian Modeling and Multi-Hypothesis Scan Matching
4D millimeter-wave (mmWave) radars are sensors that provide robustness against adverse weather conditions (rain, snow, fog, etc.), and as such they are increasingly used for odometry and SLAM (Simultaneous Localization and Mapping). However, the noisy and sparse nature of the returned scan data proves to be a challenging obstacle for existing registration algorithms, especially those originally intended for more accurate sensors such as LiDAR. Following the success of 3D Gaussian Splatting for vision, in this paper we propose a summarized representation for radar scenes based on global simultaneous optimization of 3D Gaussians as opposed to voxel-based approaches, and leveraging its inherent probability distribution function for registration. Moreover, we propose tackling the problem of radar noise by optimizing multiple scan matching hypotheses in order to further increase the robustness of the system against local optima of the function. Finally, following existing practice we implement an Extended Kalman Filter-based Radar-Inertial Odometry pipeline in order to evaluate the effectiveness of our system. Experiments using publicly available 4D radar datasets show that our Gaussian approach is comparable to existing registration algorithms, outperforming them in several sequences.
comment: Our code and results can be publicly accessed at: https://github.com/robotics-upo/gaussian-rio-cpp
Unseen from Seen: Rewriting Observation-Instruction Using Foundation Models for Augmenting Vision-Language Navigation
Data scarcity is a long-standing challenge in the Vision-Language Navigation (VLN) field, which severely hinders the generalization of agents to unseen environments. Previous works primarily rely on additional simulator data or web-collected images/videos to improve generalization. However, the simulator environments still face limited diversity, and the web-collected data often requires extensive labor to remove the noise. In this paper, we propose a Rewriting-driven AugMentation (RAM) paradigm for VLN, which directly creates unseen observation-instruction pairs by rewriting human-annotated training data. Benefiting from our rewriting mechanism, new observation-instruction pairs can be obtained in a simulator-free and labor-saving manner to promote generalization. Specifically, we first introduce Object-Enriched Observation Rewriting, where we combine Vision-Language Models (VLMs) and Large Language Models (LLMs) to derive rewritten object-enriched scene descriptions, enabling observation synthesis with diverse objects and spatial layouts via Text-to-Image Generation Models (T2IMs). Then, we propose Observation-Contrast Instruction Rewriting, which generates observation-aligned rewritten instructions by requiring LLMs to reason about the difference between original and new observations. We further develop a mixing-then-focusing training strategy with a random observation cropping scheme, effectively enhancing data distribution diversity while suppressing augmentation data noise during training. Experiments on both the discrete environments (R2R, REVERIE, and R4R datasets) and continuous environments (R2R-CE dataset) show the superior performance and impressive generalization ability of our method. Code is available at https://github.com/SaDil13/VLN-RAM.
comment: Accepted by IEEE Transactions on Neural Networks and Learning Systems
ADA-DPM: A Neural Descriptors-based Adaptive Noise Filtering Strategy for SLAM
Lidar SLAM plays a significant role in mobile robot navigation and high-definition map construction. However, existing methods often face a trade-off between localization accuracy and system robustness in scenarios with a high proportion of dynamic objects, point cloud distortion, and unstructured environments. To address this issue, we propose a neural descriptors-based adaptive noise filtering strategy for SLAM, named ADA-DPM, which improves the performance of localization and mapping tasks through three key technical innovations. Firstly, to tackle dynamic object interference, we design the Dynamic Segmentation Head to predict and filter out dynamic feature points, eliminating the ego-motion interference caused by dynamic objects. Secondly, to mitigate the impact of noise and unstructured feature points, we propose the Global Importance Scoring Head that adaptively selects high-contribution feature points while suppressing the influence of noise and unstructured feature points. Moreover, we introduce the Cross-Layer Graph Convolution Module (GLI-GCN) to construct multi-scale neighborhood graphs, fusing local structural information across different scales and improving the discriminative power of overlapping features. Finally, experimental validations on multiple public datasets confirm the effectiveness of ADA-DPM.
LANGTRAJ: Diffusion Model and Dataset for Language-Conditioned Trajectory Simulation ICCV 2025
Evaluating autonomous vehicles with controllability enables scalable testing in counterfactual or structured settings, enhancing both efficiency and safety. We introduce LangTraj, a language-conditioned scene-diffusion model that simulates the joint behavior of all agents in traffic scenarios. By conditioning on natural language inputs, LangTraj provides flexible and intuitive control over interactive behaviors, generating nuanced and realistic scenarios. Unlike prior approaches that depend on domain-specific guidance functions, LangTraj incorporates language conditioning during training, facilitating more intuitive traffic simulation control. We propose a novel closed-loop training strategy for diffusion models, explicitly tailored to enhance stability and realism during closed-loop simulation. To support language-conditioned simulation, we develop Inter-Drive, a large-scale dataset with diverse and interactive labels for training language-conditioned diffusion models. Our dataset is built upon a scalable pipeline for annotating agent-agent interactions and single-agent behaviors, ensuring rich and varied supervision. Validated on the Waymo Open Motion Dataset, LangTraj demonstrates strong performance in realism, language controllability, and language-conditioned safety-critical simulation, establishing a new paradigm for flexible and scalable autonomous vehicle testing. Project Website: https://langtraj.github.io/
comment: ICCV 2025
Robust Deterministic Policy Gradient for Disturbance Attenuation and Its Application to Quadrotor Control
Practical control systems pose significant challenges in identifying optimal control policies due to uncertainties in the system model and external disturbances. While $H_\infty$ control techniques are commonly used to design robust controllers that mitigate the effects of disturbances, these methods often require complex and computationally intensive calculations. To address this issue, this paper proposes a reinforcement learning algorithm called robust deterministic policy gradient (RDPG), which formulates the $H_\infty$ control problem as a two-player zero-sum dynamic game. In this formulation, one player (the user) aims to minimize the cost, while the other player (the adversary) seeks to maximize it. We then employ deterministic policy gradient (DPG) and its deep reinforcement learning counterpart to train a robust control policy with effective disturbance attenuation. In particular, for practical implementation, we introduce an algorithm called robust deep deterministic policy gradient (RDDPG), which employs a deep neural network architecture and integrates techniques from the twin-delayed deep deterministic policy gradient (TD3) to enhance stability and learning efficiency. To evaluate the proposed algorithm, we implement it on an unmanned aerial vehicle (UAV) tasked with following a predefined path in a disturbance-prone environment. The experimental results demonstrate that the proposed method outperforms other control approaches in terms of robustness against disturbances, enabling precise real-time tracking of moving targets even under severe disturbance conditions.
comment: 24 pages
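The two-player zero-sum formulation mentioned in the abstract above is commonly written in the following soft-constrained form. This is a sketch of the textbook $H_\infty$ dynamic game, not necessarily the exact cost used in the paper; the stage cost $c$, the attenuation level $\eta$, and the policy symbols $\pi$ (user) and $\mu$ (adversary) are notation assumed here for illustration.

```latex
\pi^\star \;=\; \arg\min_{\pi}\,\max_{\mu}\;
\mathbb{E}\!\left[\,\sum_{k=0}^{\infty}
\Big( c(x_k, u_k) \;-\; \eta^{2}\,\lVert w_k \rVert^{2} \Big)\right],
\qquad u_k = \pi(x_k), \quad w_k = \mu(x_k)
```

Here $x_k$ is the state, $u_k$ the control input, and $w_k$ the disturbance. In the classical linear-quadratic setting, a finite value of this game for a given $\eta$ corresponds to attenuating the disturbance to level $\eta$, which is why the adversary's energy enters with a negative sign.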
From Perception Logs to Failure Modes: Language-Driven Semantic Clustering of Failures for Robot Safety
As robotic systems become increasingly integrated into real-world environments -- ranging from autonomous vehicles to household assistants -- they inevitably encounter diverse and unstructured scenarios that lead to failures. While such failures pose safety and reliability challenges, they also provide rich perceptual data for improving future performance. However, manually analyzing large-scale failure datasets is impractical. In this work, we present a method for automatically organizing large-scale robotic failure data into semantically meaningful failure clusters, enabling scalable learning from failure without human supervision. Our approach leverages the reasoning capabilities of Multimodal Large Language Models (MLLMs), trained on internet-scale data, to infer high-level failure causes from raw perceptual trajectories and discover interpretable structure within uncurated failure logs. These semantic clusters reveal patterns and hypothesized causes of failure, enabling scalable learning from experience. We demonstrate that the discovered failure modes can guide targeted data collection for policy refinement, accelerating iterative improvement in agent policies and overall safety. Additionally, we show that these semantic clusters can benefit online failure monitoring systems, offering a lightweight yet powerful safeguard for real-time operation. We demonstrate that this framework enhances robot learning and robustness by transforming real-world failures into actionable and interpretable signals for adaptation.
COMPASS: Cooperative Multi-Agent Persistent Monitoring using Spatio-Temporal Attention Network
Persistent monitoring of dynamic targets is essential in real-world applications such as disaster response, environmental sensing, and wildlife conservation, where mobile agents must continuously gather information under uncertainty. We propose COMPASS, a multi-agent reinforcement learning (MARL) framework that enables decentralized agents to persistently monitor multiple moving targets efficiently. We model the environment as a graph, where nodes represent spatial locations and edges capture topological proximity, allowing agents to reason over structured layouts and revisit informative regions as needed. Each agent independently selects actions based on a shared spatio-temporal attention network that we design to integrate historical observations and spatial context. We model target dynamics using Gaussian Processes (GPs), which support principled belief updates and enable uncertainty-aware planning. We train COMPASS using centralized value estimation and decentralized policy execution under an adaptive reward setting. Our extensive experiments demonstrate that COMPASS consistently outperforms strong baselines in uncertainty reduction, target coverage, and coordination efficiency across dynamic multi-target scenarios.
comment: Accepted at IEEE MRS 2025
Optimal swimming with body compliance in an overdamped medium
Elongate animals and robots use undulatory body waves to locomote through diverse environments. Geometric mechanics provides a framework to model and optimize such systems in highly damped environments, connecting a prescribed shape change pattern (gait) with locomotion displacement. However, the practical applicability of controlling compliant physical robots remains to be demonstrated. In this work, we develop a framework based on geometric mechanics to predict locomotor performance and search for optimal swimming strategies of compliant swimmers. We introduce a compliant extension of Purcell's three-link swimmer by incorporating series-connected springs at the joints. Body dynamics are derived using resistive force theory. Geometric mechanics is incorporated into movement prediction and into an optimization framework that identifies strategies for controlling compliant swimmers to achieve maximal displacement. We validate our framework on a physical cable-driven three-link limbless robot and demonstrate accurate prediction and optimization of locomotor performance under varied programmed, state-dependent compliance in a granular medium. Our results establish a systematic, physics-based approach for modeling and controlling compliant swimming locomotion, highlighting compliance as a design feature that can be exploited for robust movement in both homogeneous and heterogeneous environments.
Validation of collision-free spheres of Stewart-Gough platforms for constant orientations using the Application Programming Interface of a CAD software
This paper presents a method of validation of the size of the largest collision-free sphere (CFS) of a 6-6 Stewart-Gough platform manipulator (SGPM) for a given orientation of its moving platform (MP) using the Application Programming Interface (API) of a CAD software. The position of the MP is updated via the API in an automated manner over a set of samples within a shell enclosing the surface of the CFS. For each pose of the manipulator, each pair of legs is investigated for mutual collisions. The CFS is considered safe or validated iff none of the points falling inside the CFS lead to a collision between any pair of legs. This approach can not only validate the safety of a precomputed CFS, but also estimate the same for any spatial parallel manipulator.
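The validation loop described in the abstract above can be sketched as follows. This is a minimal stand-in under stated assumptions, not the paper's CAD-API workflow: legs are modeled as straight cylinders of equal diameter, the platform anchor points are assumed to be already rotated to the fixed orientation, and the names `segment_distance` and `cfs_is_validated` are hypothetical.

```python
import itertools
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def _clamp01(x):
    return max(0.0, min(1.0, x))


def segment_distance(p1, q1, p2, q2, eps=1e-12):
    """Shortest distance between 3D segments p1-q1 and p2-q2."""
    d1, d2, r = _sub(q1, p1), _sub(q2, p2), _sub(p1, p2)
    a, e, f = _dot(d1, d1), _dot(d2, d2), _dot(d2, r)
    if a <= eps and e <= eps:          # both segments degenerate to points
        return math.dist(p1, p2)
    if a <= eps:                       # first segment is a point
        s, t = 0.0, _clamp01(f / e)
    else:
        c = _dot(d1, r)
        if e <= eps:                   # second segment is a point
            t, s = 0.0, _clamp01(-c / a)
        else:
            b = _dot(d1, d2)
            denom = a * e - b * b      # zero when the segments are parallel
            s = _clamp01((b * f - c * e) / denom) if denom > eps else 0.0
            t = (b * s + f) / e
            if t < 0.0:
                t, s = 0.0, _clamp01(-c / a)
            elif t > 1.0:
                t, s = 1.0, _clamp01((b - c) / a)
    c1 = _add(p1, _scale(d1, s))
    c2 = _add(p2, _scale(d2, t))
    return math.dist(c1, c2)


def cfs_is_validated(samples, center, radius, base_pts, platform_pts, leg_diameter):
    """True iff no sampled platform position inside the sphere makes two legs collide."""
    for pos in samples:
        if math.dist(pos, center) > radius:
            continue                   # only points inside the CFS matter
        legs = [(b, _add(pos, p)) for b, p in zip(base_pts, platform_pts)]
        for (a0, a1), (b0, b1) in itertools.combinations(legs, 2):
            if segment_distance(a0, a1, b0, b1) < leg_diameter:
                return False
    return True
```

A real check would query the CAD model for interference instead of using the cylinder approximation; the segment-distance test only illustrates the pairwise structure of the validation.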
FlySearch: Exploring how vision-language models explore NeurIPS 2025
The real world is messy and unstructured. Uncovering critical information often requires active, goal-driven exploration. It remains to be seen whether Vision-Language Models (VLMs), which recently emerged as a popular zero-shot tool in many difficult tasks, can operate effectively in such conditions. In this paper, we answer this question by introducing FlySearch, a 3D, outdoor, photorealistic environment for searching and navigating to objects in complex scenes. We define three sets of scenarios with varying difficulty and observe that state-of-the-art VLMs cannot reliably solve even the simplest exploration tasks, with the gap to human performance increasing as the tasks get harder. We identify a set of central causes, ranging from vision hallucination, through context misunderstanding, to task planning failures, and we show that some of them can be addressed by finetuning. We publicly release the benchmark, scenarios, and the underlying codebase.
comment: NeurIPS 2025 Datasets and Benchmarks track
Sim2Dust: Mastering Dynamic Waypoint Tracking on Granular Media
Reliable autonomous navigation across the unstructured terrains of distant planetary surfaces is a critical enabler for future space exploration. However, the deployment of learning-based controllers is hindered by the inherent sim-to-real gap, particularly for the complex dynamics of wheel interactions with granular media. This work presents a complete sim-to-real framework for developing and validating robust control policies for dynamic waypoint tracking on such challenging surfaces. We leverage massively parallel simulation to train reinforcement learning agents across a vast distribution of procedurally generated environments with randomized physics. These policies are then transferred zero-shot to a physical wheeled rover operating in a lunar-analogue facility. Our experiments systematically compare multiple reinforcement learning algorithms and action smoothing filters to identify the most effective combinations for real-world deployment. Crucially, we provide strong empirical evidence that agents trained with procedural diversity achieve superior zero-shot performance compared to those trained on static scenarios. We also analyze the trade-offs of fine-tuning with high-fidelity particle physics, which offers minor gains in low-speed precision at a significant computational cost. Together, these contributions establish a validated workflow for creating reliable learning-based navigation systems, marking a substantial step towards deploying autonomous robots in the final frontier.
comment: Accepted for publication at the 2025 International Conference on Space Robotics (iSpaRo) | The source code is available at https://github.com/AndrejOrsula/space_robotics_bench
STITCHER: Constrained Trajectory Planning in Complex Environments with Real-Time Motion Primitive Search
Autonomous high-speed navigation through large, complex environments requires real-time generation of agile trajectories that are dynamically feasible, collision-free, and satisfy state or actuator constraints. Modern trajectory planning techniques primarily use numerical optimization, as they enable the systematic computation of high-quality, expressive trajectories that satisfy various constraints. However, stringent requirements on computation time and the risk of numerical instability can limit the use of optimization-based planners in safety-critical scenarios. This work presents an optimization-free planning framework called STITCHER that stitches short trajectory segments together with graph search to compute long-range, expressive, and near-optimal trajectories in real-time. STITCHER outperforms modern optimization-based planners through our innovative planning architecture and several algorithmic developments that make real-time planning possible. Extensive simulation testing is performed to analyze the algorithmic components that make up STITCHER, along with a thorough comparison with two state-of-the-art optimization planners. Simulation tests show that safe trajectories can be created within a few milliseconds for paths that span the entirety of two 50 m x 50 m environments. Hardware tests with a custom quadrotor verify that STITCHER can produce trackable paths in real-time while respecting nonconvex constraints, such as limits on tilt angle and motor forces, which are otherwise hard to include in optimization-based planners.
SDTagNet: Leveraging Text-Annotated Navigation Maps for Online HD Map Construction NeurIPS 2025
Autonomous vehicles rely on detailed and accurate environmental information to operate safely. High definition (HD) maps offer a promising solution, but their high maintenance cost poses a significant barrier to scalable deployment. This challenge is addressed by online HD map construction methods, which generate local HD maps from live sensor data. However, these methods are inherently limited by the short perception range of onboard sensors. To overcome this limitation and improve general performance, recent approaches have explored the use of standard definition (SD) maps as priors, which are significantly easier to maintain. We propose SDTagNet, the first online HD map construction method that fully utilizes the information of widely available SD maps, like OpenStreetMap, to enhance far-range detection accuracy. Our approach introduces two key innovations. First, in contrast to previous work, we incorporate not only polyline SD map data with manually selected classes, but also additional semantic information in the form of textual annotations. In this way, we enrich SD vector map tokens with NLP-derived features, eliminating the dependency on predefined specifications or exhaustive class taxonomies. Second, we introduce a point-level SD map encoder together with orthogonal element identifiers to uniformly integrate all types of map elements. Experiments on Argoverse 2 and nuScenes show that this boosts map perception performance by up to +5.9 mAP (+45%) w.r.t. map construction without priors and up to +3.2 mAP (+20%) w.r.t. previous approaches that already use SD map priors. Code is available at https://github.com/immel-f/SDTagNet
comment: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
Hierarchical Planning for Long-Horizon Multi-Target Tracking Under Target Motion Uncertainty
Achieving persistent tracking of multiple dynamic targets over a large spatial area poses significant challenges for a single-robot system with constrained sensing capabilities. As the robot moves to track different targets, the ones outside the field of view accumulate uncertainty, making them progressively harder to track. An effective path planning algorithm must manage uncertainty over a long horizon and account for the risk of permanently losing track of targets that remain unseen for too long. However, most existing approaches rely on short planning horizons and assume small, bounded environments, resulting in poor tracking performance and target loss in large-scale scenarios. In this paper, we present a hierarchical planner for tracking multiple moving targets with an aerial vehicle. To address the challenge of tracking non-static targets, our method incorporates motion models and uncertainty propagation during path execution, allowing for more informed decision-making. We decompose the multi-target tracking task into sub-tasks of single-target search and detection. Our proposed pipeline consists of a novel low-level coverage planner that enables searching for a target in an evolving belief area, and an estimation method to assess the likelihood of success for each sub-task. This makes it possible to convert the active target tracking task into a Markov decision process (MDP), which we solve with a tree-based algorithm to determine the sequence of sub-tasks. We validate our approach in simulation, where our planner outperforms existing planners for active target tracking tasks, achieving a reduction of 11-70% in final uncertainty across different environments.
comment: Accepted to IEEE Robotics and Automation Letters (RA-L), 2025
Learning by Watching: A Review of Video-based Learning Approaches for Robot Manipulation
Robot learning of manipulation skills is hindered by the scarcity of diverse, unbiased datasets. While curated datasets can help, challenges remain in generalizability and real-world transfer. Meanwhile, large-scale "in-the-wild" video datasets have driven progress in computer vision through self-supervised techniques. Translating this to robotics, recent works have explored learning manipulation skills by passively watching abundant videos sourced online. Showing promising results, such video-based learning paradigms provide scalable supervision while reducing dataset bias. This survey reviews foundations such as video feature representation learning techniques, object affordance understanding, 3D hand/body modeling, and large-scale robot resources, as well as emerging techniques for acquiring robot manipulation skills from uncontrolled video demonstrations. We discuss how learning only from observing large-scale human videos can enhance generalization and sample efficiency for robotic manipulation. The survey summarizes video-based learning approaches, analyzes their benefits over standard datasets, surveys metrics and benchmarks, and discusses open challenges and future directions in this nascent domain at the intersection of computer vision, natural language processing, and robot learning.
comment: Published at IEEE Access
Multiagent Systems
Executable Knowledge Graphs for Replicating AI Research
Replicating AI research is a crucial yet challenging task for large language model (LLM) agents. Existing approaches often struggle to generate executable code, primarily due to insufficient background knowledge and the limitations of retrieval-augmented generation (RAG) methods, which fail to capture latent technical details hidden in referenced papers. Furthermore, previous approaches tend to overlook valuable implementation-level code signals and lack structured knowledge representations that support multi-granular retrieval and reuse. To overcome these challenges, we propose Executable Knowledge Graphs (xKG), a modular and pluggable knowledge base that automatically integrates technical insights, code snippets, and domain-specific knowledge extracted from scientific literature. When integrated into three agent frameworks with two different LLMs, xKG shows substantial performance gains (10.9% with o3-mini) on PaperBench, demonstrating its effectiveness as a general and extensible solution for automated AI research replication. Code will be released at https://github.com/zjunlp/xKG.
comment: Work in progress
A Principle of Targeted Intervention for Multi-Agent Reinforcement Learning NeurIPS 2025
Steering cooperative multi-agent reinforcement learning (MARL) towards desired outcomes is challenging, particularly when global guidance from a human over the whole multi-agent system is impractical in large-scale MARL. On the other hand, designing mechanisms to coordinate agents mostly relies on empirical studies, lacking an easy-to-use research tool. In this work, we employ multi-agent influence diagrams (MAIDs) as a graphical framework to address the above issues. First, we introduce interaction paradigms that leverage MAIDs to analyze and visualize existing approaches in MARL. Then, we design a new interaction paradigm based on MAIDs, referred to as targeted intervention, which is applied to only a single targeted agent so that the problem of global guidance can be mitigated. In our implementation, we introduce a causal inference technique, referred to as Pre-Strategy Intervention (PSI), to realize the targeted intervention paradigm. Since MAIDs can be regarded as a special class of causal diagrams, a composite desired outcome that integrates the primary task goal and an additional desired outcome can be achieved by maximizing the corresponding causal effect through the PSI. Moreover, the bundled relevance graph analysis of MAIDs provides a tool to identify whether an MARL learning paradigm is workable under the design of an interaction paradigm. In experiments, we demonstrate the effectiveness of our proposed targeted intervention, and verify the result of relevance graph analysis.
comment: Accepted to NeurIPS 2025
Intent-Driven LLM Ensemble Planning for Flexible Multi-Robot Disassembly: Demonstration on EV Batteries
This paper addresses the problem of planning complex manipulation tasks, in which multiple robots with different end-effectors and capabilities, informed by computer vision, must plan and execute concatenated sequences of actions on a variety of objects that can appear in arbitrary positions and configurations in unstructured scenes. We propose an intent-driven planning pipeline which can robustly construct such action sequences with varying degrees of supervisory input from a human using simple language instructions. The pipeline integrates: (i) perception-to-text scene encoding, (ii) an ensemble of large language models (LLMs) that generate candidate removal sequences based on the operator's intent, (iii) an LLM-based verifier that enforces formatting and precedence constraints, and (iv) a deterministic consistency filter that rejects hallucinated objects. The pipeline is evaluated on an example task in which two robot arms work collaboratively to dismantle an Electric Vehicle battery for recycling applications. A variety of components must be grasped and removed in specific sequences, determined by human instructions and/or by task-order feasibility decisions made by the autonomous system. On 200 real scenes with 600 operator prompts across five component classes, we used metrics of full-sequence correctness and next-task correctness to evaluate and compare five LLM-based planners (including ablation analyses of pipeline components). We also evaluated the LLM-based human interface in terms of time to execution and NASA TLX with human participant experiments. Results indicate that our ensemble-with-verification approach reliably maps operator intent to safe, executable multi-robot plans while maintaining low user effort.
comment: This work is funded by the project called "Research and Development of a Highly Automated and Safe Streamlined Process for Increasing Lithium-ion Battery Repurposing and Recycling" (REBELION) under Grant 101104241, and partially supported by the Ministry of National Education, Republic of Turkey. Submitted to Frontiers for Review
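Stage (iv) of the pipeline described above, the deterministic consistency filter, can be illustrated with a small sketch. It assumes each candidate plan is a list of object identifiers and that perception supplies the set of detected identifiers; the function name `filter_consistent` and the example identifiers are hypothetical and not taken from the paper.

```python
def filter_consistent(candidate_sequences, detected_objects):
    """Keep only candidate sequences whose objects were all actually detected.

    Any sequence naming an object absent from the perceived scene is treated
    as containing a hallucination and is rejected outright.
    """
    detected = set(detected_objects)
    return [
        seq for seq in candidate_sequences
        if all(obj in detected for obj in seq)
    ]


# Hypothetical scene and LLM outputs, for illustration only.
scene = ["bolt_1", "busbar_1", "module_1"]
candidates = [
    ["bolt_1", "busbar_1", "module_1"],  # consistent with the scene
    ["bolt_1", "cable_9"],               # "cable_9" was never detected
]
kept = filter_consistent(candidates, scene)
```

Because the check is a plain set-membership test, it gives the same verdict on every run, which is what distinguishes it from the LLM-based verifier in stage (iii).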
Strategyproof Facility Location for Five Agents on a Circle using PCD
We consider the strategyproof facility location problem on a circle. We focus on the case of 5 agents, and find a tight bound for the PCD strategyproof mechanism, which selects the reported location of an agent in proportion to the length of the arc in front of it. We methodically "reduce" the size of the instance space and then use standard optimization techniques to find and prove the bound is tight. Moreover, we hypothesize the approximation ratio of PCD for general odd $n$.
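The selection rule in the abstract above can be sketched as follows, under one reading of "the arc in front of it": for an odd number of agents on a circle of circumference 1, the arc facing an agent is taken to be the arc between the two agents diametrically furthest from it in cyclic order. That interpretation and the function name `pcd_probabilities` are assumptions made for illustration; the paper's formal definition may differ.

```python
def pcd_probabilities(positions):
    """Selection probability per agent for a PCD-style rule on a unit circle.

    positions: reported locations in [0, 1), one per agent (odd count).
    Returns probabilities in the same order as the input.
    """
    n = len(positions)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd number of agents")

    # Sort agents around the circle, remembering their original indices.
    order = sorted(range(n), key=lambda i: positions[i] % 1.0)
    pts = [positions[i] % 1.0 for i in order]

    # gaps[j] is the arc from sorted agent j to sorted agent j + 1 (cyclic).
    gaps = [(pts[(j + 1) % n] - pts[j]) % 1.0 for j in range(n)]
    total = sum(gaps)
    if total == 0.0:                 # all reports coincide: pick uniformly
        return [1.0 / n] * n

    k = n // 2                       # offset to the opposite arc
    probs = [0.0] * n
    for j, agent in enumerate(order):
        probs[agent] = gaps[(j + k) % n] / total
    return probs


example = pcd_probabilities([0.0, 0.25, 0.5])
```

With three agents at 0, 0.25, and 0.5, the middle agent faces the long arc of length 0.5 and is therefore selected with the highest probability.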
Diverse Planning with Simulators via Linear Temporal Logic
Autonomous agents rely on automated planning algorithms to achieve their objectives. Simulation-based planning offers a significant advantage over declarative models in modelling complex environments. However, relying solely on a planner that produces a single plan may not be practical, as the generated plans may not always satisfy the agent's preferences. To address this limitation, we introduce $\texttt{FBI}_\texttt{LTL}$, a diverse planner explicitly designed for simulation-based planning problems. $\texttt{FBI}_\texttt{LTL}$ utilises Linear Temporal Logic (LTL) to define semantic diversity criteria, enabling agents to specify what constitutes meaningfully different plans. By integrating these LTL-based diversity models directly into the search process, $\texttt{FBI}_\texttt{LTL}$ ensures the generation of semantically diverse plans, addressing a critical limitation of existing diverse planning approaches that may produce syntactically different but semantically identical solutions. Extensive evaluations on various benchmarks consistently demonstrate that $\texttt{FBI}_\texttt{LTL}$ generates more diverse plans compared to a baseline approach. This work establishes the feasibility of semantically-guided diverse planning in simulation-based environments, paving the way for innovative approaches in realistic, non-symbolic domains where traditional model-based approaches fail.
BenCao: An Instruction-Tuned Large Language Model for Traditional Chinese Medicine
Traditional Chinese Medicine (TCM), with a history spanning over two millennia, plays a role in global healthcare. However, applying large language models (LLMs) to TCM remains challenging due to its reliance on holistic reasoning, implicit logic, and multimodal diagnostic cues. Existing TCM-domain LLMs have made progress in text-based understanding but lack multimodal integration, interpretability, and clinical applicability. To address these limitations, we developed BenCao, a ChatGPT-based multimodal assistant for TCM, integrating structured knowledge bases, diagnostic data, and expert feedback refinement. BenCao was trained through natural language instruction tuning rather than parameter retraining, aligning with expert-level reasoning and ethical norms specific to TCM. The system incorporates a comprehensive knowledge base of over 1,000 classical and modern texts, a scenario-based instruction framework for diverse interactions, a chain-of-thought simulation mechanism for interpretable reasoning, and a feedback refinement process involving licensed TCM practitioners. BenCao connects to external APIs for tongue-image classification and multimodal database retrieval, enabling dynamic access to diagnostic resources. In evaluations across single-choice question benchmarks and multimodal classification tasks, BenCao achieved superior accuracy to general-domain and TCM-domain models, particularly in diagnostics, herb recognition, and constitution classification. The model was deployed as an interactive application on the OpenAI GPTs Store, accessed by nearly 1,000 users globally as of October 2025. This study demonstrates the feasibility of developing a TCM-domain LLM through natural language-based instruction tuning and multimodal integration, offering a practical framework for aligning generative AI with traditional medical reasoning and a scalable pathway for real-world deployment.
MiCRO for Multilateral Negotiations
Recently, a very simple new bilateral negotiation strategy called MiCRO was introduced that does not make use of any kind of opponent modeling or machine learning techniques and that does not require fine-tuning of any parameters. Despite its simplicity, it was shown that MiCRO performs similarly to -- or even better than -- most state-of-the-art negotiation strategies. This led its authors to argue that the benchmark domains on which negotiation algorithms are typically tested may be too simplistic. However, one question that was left open was how MiCRO could be generalized to multilateral negotiations. In this paper we fill this gap by introducing a multilateral variant of MiCRO. We compare it with the winners of the Automated Negotiating Agents Competitions (ANAC) of 2015, 2017 and 2018 and show that it outperforms them. Furthermore, we perform an empirical game-theoretical analysis to show that our new version of MiCRO forms an empirical Nash equilibrium.
comment: Extended version of short paper presented at PRIMA2025
Graph Attention-Guided Search for Dense Multi-Agent Pathfinding
Finding near-optimal solutions for dense multi-agent pathfinding (MAPF) problems in real-time remains challenging even for state-of-the-art planners. To this end, we develop a hybrid framework that integrates a learned heuristic derived from MAGAT, a neural MAPF policy with a graph attention scheme, into a leading search-based algorithm, LaCAM. While prior work has explored learning-guided search in MAPF, such methods have historically underperformed. In contrast, our approach, termed LaGAT, outperforms both purely search-based and purely learning-based methods in dense scenarios. This is achieved through an enhanced MAGAT architecture, a pre-train-then-fine-tune strategy on maps of interest, and a deadlock detection scheme to account for imperfect neural guidance. Our results demonstrate that, when carefully designed, hybrid search offers a powerful solution for tightly coupled, challenging multi-agent coordination problems.
ATL*AS: An Automata-Theoretic Approach and Tool for the Verification of Strategic Abilities in Multi-Agent Systems
We present two novel symbolic algorithms for model checking the Alternating-time Temporal Logic ATL*, over both the infinite-trace and the finite-trace semantics. In particular, for infinite traces we design a novel symbolic reduction to parity games. We implement both methods in the ATL*AS model checker and evaluate it using synthetic benchmarks as well as a cybersecurity scenario. Our results demonstrate that the symbolic approach significantly outperforms the explicit-state representation and we find that our parity-game-based algorithm offers a more scalable and efficient solution for infinite-trace verification, outperforming previously available tools. Our results also confirm that finite-trace model checking yields substantial performance benefits over infinite-trace verification. As such, we provide a comprehensive toolset for verifying multiagent systems against specifications in ATL*.
Verification-Aware Planning for Multi-Agent Systems
Large language model (LLM) agents are increasingly deployed to tackle complex tasks, often necessitating collaboration among multiple specialized agents. However, multi-agent collaboration introduces new challenges in planning, coordination, and verification. Execution failures frequently arise not from flawed reasoning alone, but from subtle misalignments in task interpretation, output format, or inter-agent handoffs. To address these challenges, we present VeriMAP, a framework for multi-agent collaboration with verification-aware planning. The VeriMAP planner decomposes tasks, models subtask dependencies, and encodes planner-defined passing criteria as subtask verification functions (VFs) in Python and natural language. We evaluate VeriMAP on diverse datasets, demonstrating that it outperforms both single- and multi-agent baselines while enhancing system robustness and interpretability. Our analysis highlights how verification-aware planning enables reliable coordination and iterative refinement in multi-agent systems, without relying on external labels or annotations.
comment: Submission for ARR Oct
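To make the idea of a subtask verification function (VF) in the VeriMAP entry above concrete, here is a minimal sketch of a format check written in Python. It is not taken from the paper: the function name `verify_subtask_output`, the JSON output format, and the required keys are all hypothetical, chosen only to illustrate how a planner-defined passing criterion could be expressed as code.

```python
import json


def verify_subtask_output(raw_output, required_keys=("answer", "evidence")):
    """Hypothetical VF: pass only if the output is a JSON object
    that contains every required key with a non-empty value."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return False, "output is not valid JSON: " + exc.msg
    if not isinstance(data, dict):
        return False, "output must be a JSON object"
    # Collect keys that are absent or hold an empty/falsy value.
    missing = [key for key in required_keys if not data.get(key)]
    if missing:
        return False, "missing or empty fields: " + ", ".join(missing)
    return True, "ok"
```

Returning a reason string alongside the pass/fail flag is a design choice of this sketch: it gives an upstream agent something to act on when a handoff fails.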
R2BC: Multi-Agent Imitation Learning from Single-Agent Demonstrations
Imitation Learning (IL) is a natural way for humans to teach robots, particularly when high-quality demonstrations are easy to obtain. While IL has been widely applied to single-robot settings, relatively few studies have addressed the extension of these methods to multi-agent systems, especially in settings where a single human must provide demonstrations to a team of collaborating robots. In this paper, we introduce and study Round-Robin Behavior Cloning (R2BC), a method that enables a single human operator to effectively train multi-robot systems through sequential, single-agent demonstrations. Our approach allows the human to teleoperate one agent at a time and incrementally teach multi-agent behavior to the entire system, without requiring demonstrations in the joint multi-agent action space. We show that R2BC methods match, and in some cases surpass, the performance of an oracle behavior cloning approach trained on privileged synchronized demonstrations across four multi-agent simulated tasks. Finally, we deploy R2BC on two physical robot tasks trained using real human demonstrations.
comment: 9 pages, 6 figures
On Condorcet's Jury Theorem with Abstention
The well-known Condorcet Jury Theorem states that, under majority rule, the better of two alternatives is chosen with probability approaching one as the population grows. We study an asymmetric setting where voters face varying participation costs and share a possibly heuristic belief about their pivotality (ability to influence the outcome). In a costly voting setup where voters abstain if their participation cost is greater than their pivotality estimate, we identify a single property of the heuristic belief -- weakly vanishing pivotality -- that gives rise to multiple stable equilibria in which elections are nearly tied. In contrast, strongly vanishing pivotality (as in the standard Calculus of Voting model) yields a unique, trivial equilibrium where only zero-cost voters participate as the population grows. We then characterize when nontrivial equilibria satisfy a version of the Jury Theorem: below a sharp threshold, the majority-preferred candidate wins with probability approaching one; above it, either candidate wins with equal probability.
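The classical statement in the first sentence of the abstract above can be checked numerically. The sketch below covers only the textbook setting (independent voters, each correct with the same probability `p`, full participation); it does not model the costly-voting or abstention setup studied in the paper, and the function name is an illustrative choice.

```python
from math import comb


def majority_correct_probability(n, p):
    """Probability that a strict majority of n independent voters is correct,
    when each voter is correct with probability p. n must be odd (no ties)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    k_min = n // 2 + 1  # smallest number of correct votes forming a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))
```

For any `p > 0.5`, evaluating this for growing odd `n` gives values that increase toward one, which is the behaviour the theorem describes.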
OPTAGENT: Optimizing Multi-Agent LLM Interactions Through Verbal Reinforcement Learning for Enhanced Reasoning
Large Language Models (LLMs) have shown remarkable reasoning capabilities in mathematical and scientific tasks. To enhance complex reasoning, multi-agent systems have been proposed to harness the collective intelligence of LLM agents. However, existing collaboration structures are either predefined or rely on majority voting or round-table debates, which can suppress correct but less dominant agent contributions. Recent approaches model multi-agent systems as graph networks but optimize purely for agent performance, neglecting the quality of interactions. We hypothesize that effective agent communication is crucial for multi-agent reasoning and that debating quality plays a significant role. To address this, we propose OPTAGENT, a multi-agent verbal reinforcement learning algorithm that dynamically constructs and refines multi-agent collaboration structures. Our method defines action spaces and a feedback mechanism that evaluates communication robustness and coherence throughout the debate. The final decision is achieved through a majority vote over all the agents. We assess OPTAGENT on various reasoning tasks, including mathematical reasoning, creative writing, scientific reasoning, and numerical sorting. Results demonstrate that our approach significantly outperforms single-agent prompting methods and state-of-the-art multi-agent frameworks on diverse tasks.
comment: 8 pages for main content
PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits
Large Language Models (LLMs) are improving at an exceptional rate. With the advent of agentic workflows, multi-turn dialogue has become the de facto mode of interaction with LLMs for completing long and complex tasks. While LLM capabilities continue to improve, they remain susceptible to jailbreaking, especially in multi-turn scenarios where harmful intent can be subtly injected across the conversation to produce nefarious outcomes. While single-turn attacks have been extensively explored, adaptability, efficiency and effectiveness remain key challenges for their multi-turn counterparts. To address these gaps, we present PLAGUE, a novel plug-and-play framework for designing multi-turn attacks inspired by lifelong-learning agents. PLAGUE dissects the lifetime of a multi-turn attack into three carefully designed phases (Primer, Planner and Finisher) that enable a systematic and information-rich exploration of the multi-turn attack family. Evaluations show that red-teaming agents designed using PLAGUE achieve state-of-the-art jailbreaking results, improving attack success rates (ASR) by more than 30% across leading models within a smaller or comparable query budget. In particular, PLAGUE enables an ASR (based on StrongReject) of 81.4% on OpenAI's o3 and 67.3% on Claude's Opus 4.1, two models that are considered highly resistant to jailbreaks in safety literature. Our work offers tools and insights to understand the importance of plan initialization, context optimization and lifelong learning in crafting multi-turn attacks for a comprehensive model vulnerability evaluation.
AutoResearcher: Automating Knowledge-Grounded and Transparent Research Ideation with Multi-Agent Collaboration
Effective research relies on organizing extensive information and stimulating novel solutions. Agentic systems have recently emerged as a promising tool to automate literature-based ideation. However, current systems often remain black boxes. Their outputs may appear plausible but be weakly grounded, with limited transparency or control for researchers. Our work introduces AutoResearcher, a multi-agent demo system for knowledge-grounded and transparent ideation. Specifically, AutoResearcher integrates four meticulously designed stages into a unified framework: (A) Structured Knowledge Curation, (B) Diversified Idea Generation, (C) Multi-stage Idea Selection, and (D) Expert Panel Review & Synthesis. Unlike prior pipelines, our system not only exposes intermediate reasoning states, execution logs, and tunable agents for inspection, but also enables the generation of hypotheses that are both diverse and evidence-aligned. Our design is also domain-agnostic: as long as literature sources exist, the same pipeline can be instantiated in any scientific field. As an illustrative case, we demonstrate AutoResearcher on a graph-mining case study ($k$-truss breaking problem), where it generates distinct, plausible hypotheses with evidence and critiques. A live demo and source code are available at https://github.com/valleysprings/AutoResearcher.
Evolution of AI Agent Registry Solutions: Centralized, Enterprise, and Distributed Approaches
Autonomous AI agents now operate across cloud, enterprise, and decentralized domains, creating demand for registry infrastructures that enable trustworthy discovery, capability negotiation, and identity assurance. We analyze five prominent approaches: (1) MCP Registry (centralized publication of mcp.json descriptors), (2) A2A Agent Cards (decentralized self-describing JSON capability manifests), (3) AGNTCY Agent Directory Service (IPFS Kademlia DHT content routing extended for semantic taxonomy-based content discovery, OCI artifact storage, and Sigstore-backed integrity), (4) Microsoft Entra Agent ID (enterprise SaaS directory with policy and zero-trust integration), and (5) NANDA Index AgentFacts (cryptographically verifiable, privacy-preserving fact model with credentialed assertions). Using four evaluation dimensions: security, authentication, scalability, and maintainability, we surface architectural trade-offs between centralized control, enterprise governance, and distributed resilience. We conclude with design recommendations for an emerging Internet of AI Agents requiring verifiable identity, adaptive discovery flows, and interoperable capability semantics.
Asynchronous Agents with Perfect Recall: Model Reductions, Knowledge-Based Construction, and Model Checking for Coalitional Strategies
Model checking of strategic abilities for agents with memory is a notoriously hard problem, and very few attempts have been made to tackle it. In this paper, we present two important steps towards this goal. First, we take the partial-order reduction scheme that was recently proved to preserve individual and coalitional abilities of memoryless agents, and show that it also works for agents with memory. Second, we take the Knowledge-Based Subset Construction, which was recently studied for synchronous concurrent games, and adapt it to preserve abilities of memoryful agents in asynchronous MAS. Along the way, we also propose a new execution semantics for strategies in asynchronous MAS that combines elements of Concurrent Game Structures and Interleaved Interpreted Systems in a natural and intuitive way.
First Field-Trial Demonstration of L4 Autonomous Optical Network for Distributed AI Training Communication: An LLM-Powered Multi-AI-Agent Solution
We demonstrate the first cross-domain cross-layer level-4 autonomous optical network via a multi-AI-agent system. Field trials show a ~98% task completion rate across the distributed AI training lifecycle, 3.2x higher than single agents using state-of-the-art LLMs.
comment: Accepted by 51st European Conference on Optical Communication (ECOC 2025), paper W.02.01.177
COMPASS: Cooperative Multi-Agent Persistent Monitoring using Spatio-Temporal Attention Network
Persistent monitoring of dynamic targets is essential in real-world applications such as disaster response, environmental sensing, and wildlife conservation, where mobile agents must continuously gather information under uncertainty. We propose COMPASS, a multi-agent reinforcement learning (MARL) framework that enables decentralized agents to persistently monitor multiple moving targets efficiently. We model the environment as a graph, where nodes represent spatial locations and edges capture topological proximity, allowing agents to reason over structured layouts and revisit informative regions as needed. Each agent independently selects actions based on a shared spatio-temporal attention network that we design to integrate historical observations and spatial context. We model target dynamics using Gaussian Processes (GPs), which support principled belief updates and enable uncertainty-aware planning. We train COMPASS using centralized value estimation and decentralized policy execution under an adaptive reward setting. Our extensive experiments demonstrate that COMPASS consistently outperforms strong baselines in uncertainty reduction, target coverage, and coordination efficiency across dynamic multi-target scenarios.
comment: Accepted at IEEE MRS 2025
Value-Based Large Language Model Agent Simulation for Mutual Evaluation of Trust and Interpersonal Closeness
Large language models (LLMs) have emerged as powerful tools for simulating complex social phenomena using human-like agents with specific traits. In human societies, value similarity is important for building trust and close relationships; however, it remains unexplored whether this principle holds true in artificial societies comprising LLM agents. Therefore, this study investigates the influence of value similarity on relationship-building among LLM agents through two experiments. First, in a preliminary experiment, we evaluated the controllability of values in LLMs to identify the most effective model and prompt design for controlling the values. Subsequently, in the main experiment, we generated pairs of LLM agents imbued with specific values and analyzed their mutual evaluations of trust and interpersonal closeness following a dialogue. The experiments were conducted in English and Japanese to investigate language dependence. The results confirmed that pairs of agents with higher value similarity exhibited greater mutual trust and interpersonal closeness. Our findings demonstrate that the LLM agent simulation serves as a valid testbed for social science theories, contributes to elucidating the mechanisms by which values influence relationship building, and provides a foundation for inspiring new theories and insights into the social sciences.
Safe Voting: Resilience to Abstention and Sybils
Voting rules may implement the will of the society when all eligible voters, and only they, vote. However, they may fail to do so when sybil (fake or duplicate) votes are present and when only some honest (non-sybil) voters actively participate. As this is unfortunately sometimes the case, our aim here is to address social choice in the presence of sybils and voter abstention. To do so, we build upon the framework of Reality-aware Social Choice: we assume the status quo as an ever-present distinguished alternative, and study status quo Enforcing (QUE) voting rules, which add virtual votes in support of the status quo. We characterize the tradeoff between safety and liveness (the ability of active honest voters to maintain/change the status quo, respectively) in several domains, and show that the voting rules are often optimal. Our characterization identifies the exact conditions under which mechanisms remain both resilient to sybils and responsive to verified participation, offering a quantitative tool for designers to measure the benefit of increased participation and verified identities.
Systems and Control (CS)
Admittance Matrix Concentration Inequalities for Understanding Uncertain Power Networks
This paper presents probabilistic bounds for the spectrum of the admittance matrix and classical linear power flow models under uncertain network parameters; for example, probabilistic line contingencies. Our proposed approach imports tools from probability theory, such as concentration inequalities for random matrices with independent entries. It yields error bounds for common approximations of the AC power flow equations under parameter uncertainty, including the DC and LinDistFlow approximations.
comment: 9 pages, 1 figure
Data-driven Communication and Control Design for Distributed Frequency Regulation with Black-box Inverters
The increasing penetration of inverter-based resources into the power grid, with often only black-box models available, challenges long-standing frequency control methods. Most recent works take a decentralized approach without online device coordination via communication. This paper considers both dynamic behavior and communication within secondary frequency control on an intermediate timescale. We develop a distributed data-driven approach that utilizes peer-to-peer communication between inverters to avoid the need for a central control center. To enable a trade-off between communication network requirements and control performance, we present a framework to guide communication topology design for secondary frequency regulation. Following the design of the inter-agent information exchange scheme, we design a controller that is structured according to the communication topology with a closed-loop stability guarantee. Case studies on the IEEE 39-bus system validate the framework and illustrate the trade-off between communication requirements and control performance that is enabled by our approach.
comment: Preprint submitted to PSCC 2026
Trajectory Optimization for Minimum Threat Exposure using Physics-Informed Neural Networks
We apply a physics-informed neural network (PINN) to solve the two-point boundary value problem (BVP) arising from the necessary conditions postulated by Pontryagin's Minimum Principle for optimal control. Such BVPs are known to be numerically difficult to solve by traditional shooting methods due to extremely high sensitivity to initial guesses. In the light of recent successes in applying PINNs for solving high-dimensional differential equations, we develop a PINN to solve the problem of finding trajectories with minimum exposure to a spatiotemporal threat for a vehicle kinematic model. First, we implement PINNs that are trained to solve the BVP for a given pair of initial and final states for a given threat field. Next, we implement a PINN conditioned on the initial state for a given threat field, which eliminates the need for retraining for each initial state. We demonstrate that the PINN outputs satisfy the necessary conditions with low numerical error.
comment: 2025 Indian Control Conference
Artificial magnetic conductor backed dual-mode sectoral cylindrical DRA for off-body biomedical telemetry
This research investigates the potential of a sectoral Cylindrical Dielectric Resonator Antenna (CDRA) for biomedical telemetry. CDRAs are known for their low loss, ruggedness, and stability, but their limited bandwidth and size make them unsuitable for wearable devices. The research addresses these limitations by proposing a dual-mode antenna that operates in EH110 and TE210 modes. The sectoral CDRA is a quarter segment with Perfect Electric Conductor boundaries, reducing its size by a factor of four. The field components for both modes are derived mathematically to support the design. To minimize specific absorption rate (SAR), an Artificial Magnetic Conductor (AMC) surface is applied to the antenna's backside, enhancing compatibility with the transverse electric modes. The antenna achieves a bandwidth of 0.7 GHz (5.2-5.9 GHz), suitable for biomedical applications, with a measured peak gain of 7.9 dBi and a SAR of 1.24 W/kg when applied to a human arm.
comment: 13 pages
An Empirical Study of Lagrangian Methods in Safe Reinforcement Learning
In safety-critical domains such as robotics, navigation and power systems, constrained optimization problems arise where maximizing performance must be carefully balanced with associated constraints. Safe reinforcement learning provides a framework to address these challenges, with Lagrangian methods being a popular choice. However, the effectiveness of Lagrangian methods crucially depends on the choice of the Lagrange multiplier $\lambda$, which governs the trade-off between return and constraint cost. A common approach is to update the multiplier automatically during training. Although this is standard in practice, there remains limited empirical evidence on the robustness of an automated update and its influence on overall performance. Therefore, we analyze (i) optimality and (ii) stability of Lagrange multipliers in safe reinforcement learning across a range of tasks. We provide $\lambda$-profiles that give a complete visualization of the trade-off between return and constraint cost of the optimization problem. These profiles show the highly sensitive nature of $\lambda$ and moreover confirm the lack of general intuition for choosing the optimal value $\lambda^*$. Our findings additionally show that automated multiplier updates are able to recover and sometimes even exceed the optimal performance found at $\lambda^*$ due to the vast difference in their learning trajectories. Furthermore, we show that automated multiplier updates exhibit oscillatory behavior during training, which can be mitigated through PID-controlled updates. However, this method requires careful tuning to achieve consistently better performance across tasks. This highlights the need for further research on stabilizing Lagrangian methods in safe reinforcement learning. The code used to reproduce our results can be found at https://github.com/lindsayspoor/Lagrangian_SafeRL.
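The two multiplier-update styles contrasted in the abstract above can be sketched in a few lines. This is a generic illustration under stated assumptions, not the authors' implementation: the learning rate, the PID gains, and the names `gradient_ascent_update` and `PIDMultiplier` are hypothetical, and the exact PID variant used in the paper is not specified in the abstract.

```python
def gradient_ascent_update(lam, cost, limit, lr=0.05):
    """Plain automated update: raise lambda when the measured constraint
    cost exceeds its limit, lower it otherwise, and keep lambda >= 0."""
    return max(0.0, lam + lr * (cost - limit))


class PIDMultiplier:
    """Generic PID-style multiplier (gains are illustrative assumptions)."""

    def __init__(self, kp=0.1, ki=0.05, kd=0.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, cost, limit):
        error = cost - limit  # positive when the constraint is violated
        self.integral = max(0.0, self.integral + error)  # clipped accumulator
        derivative = error - self.prev_error
        self.prev_error = error
        lam = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(0.0, lam)
```

With `kp = kd = 0` and `ki` equal to the learning rate, the PID form above reduces to the plain update; the proportional and derivative terms are the extra knobs that can damp oscillations, at the price of additional tuning.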
A condensing approach for linear-quadratic optimization with geometric constraints
Optimization problems with convex quadratic cost and polyhedral constraints are ubiquitous in signal processing, automatic control and decision-making. We consider here an enlarged problem class that allows encoding logical conditions and cardinality constraints, among others. In particular, we also cover situations where parts of the constraints are nonconvex and possibly complicated, but where it is practical to compute projections onto this nonconvex set. Our approach combines the augmented Lagrangian framework with a solver-agnostic structure-exploiting subproblem reformulation. While convergence guarantees follow from the former, the proposed condensing technique leads to significant improvements in computational performance.
comment: 14 pages, 5 figures
ORIX: Orchestration of RIS with xApps for Smart Wireless Factory Environments
The vision of a smart wireless factory (SWF) demands highly flexible, low-latency, and reliable connectivity that goes beyond conventional wireless solutions. Reconfigurable intelligent surface (RIS)-empowered communications, when integrated with the open radio access network (O-RAN) architectures, have emerged as a promising enabler to meet these challenging requirements. This article introduces the methodology for the orchestration of RIS with xApps (ORIX), bringing the RIS technology into the O-RAN ecosystem through xApp-based control for SWF environments. ORIX features three key components: an O-RAN-compliant RIS service model for dynamic configuration, an RIS channel simulator that supports 3GPP indoor factory models with multiple industrial scenarios, and practical RIS optimization strategies with finite-resolution control. Together, these elements provide a realistic end-to-end emulation platform for evaluating RIS placement, control, and performance in SWF environments prior to deployment. The presented case study demonstrates how ORIX enables the evaluation of achievable performance gains, exploration of trade-offs among key RIS design parameters, and identification of deployment strategies that balance system performance with practical implementation constraints. By bridging theoretical advances with industrial feasibility, ORIX lays the groundwork for RIS-assisted O-RAN networks to power next-generation wireless communication in industrial scenarios.
comment: Submitted in IEEE
Inverse Optimal Control of Muscle Force Sharing During Pathological Gait
Muscle force sharing is typically resolved by minimizing a specific objective function to approximate neural control strategies. An inverse optimal control approach was applied to identify the "best" objective function, among a positive linear combination of basis objective functions, associated with the gait of two post-stroke males, one high-functioning (subject S1) and one low-functioning (subject S2). It was found that the "best" objective function is subject- and leg-specific. No single function works universally well, yet the best options are usually differently weighted combinations of muscle activation- and power-minimization. Subject-specific inverse optimal control models performed best on their respective limbs (RMSE 178/213 N, CC 0.71/0.61 for non-paretic and paretic legs of S1; RMSE 205/165 N, CC 0.88/0.85 for respective legs of S2), but cross-subject generalization was poor, particularly for paretic legs. Moreover, minimizing the root mean square of muscle power emerged as important for paretic limbs, while minimizing activation-based functions dominated for non-paretic limbs. This may suggest different neural control strategies between affected and unaffected sides, possibly altered by the presence of spasticity. Among the 15 considered objective functions commonly used in inverse dynamics-based computations, the root mean square of muscle power was the only one explicitly incorporating muscle velocity, leading to a possible model for spasticity in the paretic limbs. Although this objective function has been rarely used, it may be relevant for modeling pathological gait, such as post-stroke gait.
Integrating Trustworthy Artificial Intelligence with Energy-Efficient Robotic Arms for Waste Sorting
This paper presents a novel methodology that integrates trustworthy artificial intelligence (AI) with an energy-efficient robotic arm for intelligent waste classification and sorting. By utilizing a convolutional neural network (CNN) enhanced through transfer learning with MobileNetV2, the system accurately classifies waste into six categories: plastic, glass, metal, paper, cardboard, and trash. The model achieved a high training accuracy of 99.8% and a validation accuracy of 80.5%, demonstrating strong learning and generalization. A robotic arm simulator is implemented to perform virtual sorting, calculating the energy cost for each action using Euclidean distance to ensure optimal and efficient movement. The framework incorporates key elements of trustworthy AI, such as transparency, robustness, fairness, and safety, making it a reliable and scalable solution for smart waste management systems in urban settings.
comment: 5 pages, 2 figures
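The waste-sorting abstract above says the simulator computes an energy cost for each action from Euclidean distance. A minimal sketch of that idea follows; the cost-per-unit factor, the bin coordinates, and the function names are hypothetical, since the abstract does not give the actual energy model.

```python
from math import dist


def move_energy(start, target, cost_per_unit=1.0):
    """Energy cost of one arm move, assumed proportional to the
    straight-line (Euclidean) distance between two positions."""
    return cost_per_unit * dist(start, target)


def nearest_bin(item_position, bins):
    """Name of the bin closest to the item. `bins` maps a bin name
    to its coordinates (illustrative layout, not from the paper)."""
    return min(bins, key=lambda name: dist(item_position, bins[name]))
```

For example, `move_energy((0, 0, 0), (3, 4, 0))` evaluates to `5.0` under the default unit cost.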
Process Automation Architecture Using RFID for Transparent Voting Systems
This paper presents the development of a process automation architecture leveraging Radio Frequency Identification (RFID) technology for secure, transparent and efficient voting systems. The proposed architecture automates the voting workflow through RFID-enabled voter identification, encrypted vote casting, and secure data transmission. Each eligible voter receives a smart RFID card containing a uniquely encrypted identifier, which is verified using an RC522 reader interfaced with a microcontroller. Upon successful verification, the voter interacts with a touchscreen interface to cast a vote, which is then encrypted using AES-128 and securely stored on a local SD card or transmitted via GSM to a central server. A tamper-proof monitoring mechanism records each session with time-stamped digital signatures, ensuring auditability and data integrity. The architecture is designed to function in both online and offline modes, with an automated batch synchronization mechanism that updates vote records once network connectivity is restored. System testing in simulated environments confirmed 100% voter authentication accuracy, minimized latency (average voting time of 11.5 seconds), and robustness against cloning, double voting, and data interception. The integration of real-time monitoring and secure process control modules enables electoral authorities to automate data logging, detect anomalies, and validate system integrity dynamically. This work demonstrates a scalable, automation-driven solution for voting infrastructure, offering enhanced transparency, resilience, and deployment flexibility, especially in environments where digital transformation of electoral processes is critically needed.
comment: 7 pages, 5 figures, 1 table
Accelerating Adaptive Systems via Normalized Parameter Estimation Laws
In this paper, we propose a new class of parameter estimation laws for adaptive systems, called normalized parameter estimation laws. A key feature of these estimation laws is that they accelerate the convergence of the system state, $x(t)$, to the origin. We quantify this improvement by showing that our estimation laws guarantee finite integrability of the $r$-th root of the squared norm of the system state, i.e., $\|x(t)\|_2^{2/r} \in \mathcal{L}_1$, where $r \geq 1$ is a pre-specified parameter that, for a broad class of systems, can be chosen arbitrarily large. In contrast, standard Lyapunov-based estimation laws only guarantee integrability of $\|x(t)\|_2^2$ (i.e., $r = 1$). We motivate our method by showing that, for large values of $r$, this guarantee serves as a sparsity-promoting mechanism in the time domain, meaning that it penalizes prolonged signal duration and slow decay, thereby promoting faster convergence of $x(t)$. The proposed estimation laws do not rely on time-varying or high adaptation gains and do not require persistent excitation. Moreover, they can be applied to systems with matched and unmatched uncertainties, regardless of their dynamic structure, as long as a control Lyapunov function (CLF) exists. Finally, they are compatible with any CLF-based certainty equivalence controllers. We further develop higher-order extensions of our estimation laws by incorporating momentum into the estimation dynamics. We illustrate the performance improvements achieved with the proposed scheme through various numerical experiments.
Assessing the Quality of a Set of Basis Functions for Inverse Optimal Control via Projection onto Global Minimizers
Inverse optimization (Inverse optimal control) is the task of imputing a cost function such that given test points (trajectories) are (nearly) optimal with respect to the discovered cost. Prior methods in inverse optimization assume that the true cost is a convex combination of a set of convex basis functions and that this basis is consistent with the test points. However, the consistency assumption is not always justified, as in many applications the principles by which the data is generated are not well understood. This work proposes using the distance between a test point and the set of global optima generated by the convex combinations of the convex basis functions as a measurement for the expressive quality of the basis with respect to the test point. A large minimal distance invalidates the set of basis functions. The concept of a set of global optima is introduced and its properties are explored in unconstrained and constrained settings. Upper and lower bounds for the minimum distance in the convex quadratic setting are implemented by bi-level gradient descent and an enriched linear matrix inequality respectively. Extensions to this framework include max-representable basis functions, nonconvex basis functions (local minima), and applying polynomial optimization techniques.
comment: 8 pages, 4 figures
Comparison and performance analysis of dynamic encrypted control approaches
Encrypted controllers using homomorphic encryption have proven to guarantee the privacy of measurement and control signals, as well as system and controller parameters, while regulating the system as intended. However, encrypting dynamic controllers has remained a challenge due to growing noise and overflow issues in the encoding. In this paper, we review recent approaches to dynamic encrypted control, such as bootstrapping, periodic resets of the controller state, integer reformulations, and FIR controllers, and equip them with a stability and performance analysis to evaluate their suitability. We complement the analysis with a numerical performance comparison on a benchmark system.
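Of the approaches listed in the abstract above, the FIR controller is the easiest to illustrate. The sketch below runs on plain integers with no encryption at all; it is only meant to show the structural property that the output is a finite weighted sum of recent measurements, so no recursively updated controller state exists to grow over time. The class name, the integer tap values, and the zero initial window are assumptions of this sketch, not details from the paper.

```python
from collections import deque


class IntegerFIRController:
    """FIR controller over integers: u[k] = sum_i taps[i] * y[k - i]."""

    def __init__(self, taps):
        self.taps = list(taps)  # integer coefficients, newest sample first
        # Window of the most recent measurements, initialised to zero.
        self.window = deque([0] * len(self.taps), maxlen=len(self.taps))

    def step(self, measurement):
        # Newest sample goes to the front; the oldest drops off the back.
        self.window.appendleft(measurement)
        # Only additions and multiplications are used here.
        return sum(c * y for c, y in zip(self.taps, self.window))
```

Because each output uses only the last `len(taps)` measurements, its magnitude is bounded by the sum of the absolute tap values times the largest measurement magnitude, regardless of how long the controller runs.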
A polynomial-based QCQP solver for encrypted optimization
In this paper, we present a novel method for solving a class of quadratically constrained quadratic optimization problems using only additions and multiplications. This approach enables solving constrained optimization problems on private data since the operations involved are compatible with the capabilities of homomorphic encryption schemes. To solve the constrained optimization problem, a sequence of polynomial penalty functions of increasing degree is introduced, which are sufficiently steep at the boundary of the feasible set. Adding the penalty function to the original cost function creates a sequence of unconstrained optimization problems whose minimizer always lies in the admissible set and converges to the minimizer of the constrained problem. A gradient descent method is used to generate a sequence of iterates associated with these problems. For the algorithm, it is shown that the iterate converges to a minimizer of the original problem, and the feasible set is positively invariant under the iteration. Finally, the method is demonstrated on an illustrative cryptographic problem, finding the smaller value of two numbers, and the encrypted implementability is discussed.
comment: Accepted for presentation at the 64th IEEE Conference on Decision and Control (CDC2025)
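The penalty idea in the abstract above can be illustrated with a toy problem. This is a minimal sketch under stated assumptions, not the authors' algorithm: the scalar problem (minimize (x - target)^2 subject to x^2 <= 1), the penalty x^(2*degree), and the names `power` and `penalized_descent` are all hypothetical choices made for illustration. The loop uses only additions and multiplications, which is the property the abstract links to homomorphic encryption.

```python
def power(x, n):
    """x**n by repeated multiplication (no division or exponent operator)."""
    result = 1.0
    for _ in range(n):
        result = result * x
    return result


def penalized_descent(target, degree, step, iterations, x0=0.0):
    """Gradient descent on (x - target)^2 + x^(2*degree).

    The polynomial x^(2*degree) is an assumed penalty for the constraint
    x^2 <= 1: it is small inside the feasible set and steep near its boundary.
    """
    x = x0
    for _ in range(iterations):
        # Derivative of the penalized cost, built from + and * only.
        grad = 2.0 * (x - target) + 2.0 * degree * power(x, 2 * degree - 1)
        x = x + (-step) * grad
    return x


# The unconstrained minimizer (target = 2) is infeasible; the penalized
# iterate settles inside the feasible interval, close to the boundary x = 1.
x_final = penalized_descent(target=2.0, degree=4, step=0.01, iterations=2000)
```

In this toy setting, raising `degree` moves the penalized minimizer closer to the boundary, but it also steepens the cost there, so the step size has to shrink accordingly.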
Enhanced Ground-Satellite Direct Access via Onboard Rydberg Atomic Quantum Receivers
Ground-satellite links for 6G networks face critical challenges, including severe path loss, tight size-weight-power limits, and congested spectrum, all of which significantly hinder the performance of traditional radio frequency (RF) front ends. This article introduces the Rydberg Atomic Quantum Receiver (RAQR) for onboard satellite systems, a millimeter-scale front end that converts radio fields to optical signals through atomic electromagnetically induced transparency. RAQR's high sensitivity and high frequency selectivity address link budget, payload, and interference challenges while fitting within space constraints. A hybrid atomic-electronic design and supporting signal model demonstrate enhanced data rate, coverage, and sensing accuracy relative to conventional RF receivers. The article concludes with integration strategies, distributed-satellite concepts, and open research problems for bringing RAQR-enabled satellite payloads into service.
comment: Submitted to IEEE Journal
Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems
Control-flow hijacking attacks manipulate orchestration mechanisms in multi-agent systems into performing unsafe actions that compromise the system and exfiltrate sensitive information. Recently proposed defenses, such as LlamaFirewall, rely on alignment checks of inter-agent communications to ensure that all agent invocations are "related to" and "likely to further" the original objective. We start by demonstrating control-flow hijacking attacks that evade these defenses even if alignment checks are performed by advanced LLMs. We argue that the safety and functionality objectives of multi-agent systems fundamentally conflict with each other. This conflict is exacerbated by the brittle definitions of "alignment" and the checkers' incomplete visibility into the execution context. We then propose, implement, and evaluate ControlValve, a new defense inspired by the principles of control-flow integrity and least privilege. ControlValve (1) generates permitted control-flow graphs for multi-agent systems, and (2) enforces that all executions comply with these graphs, along with contextual rules (generated in a zero-shot manner) for each agent invocation.
Floating-Base Deep Lagrangian Networks
Grey-box methods for system identification combine deep learning with physics-informed constraints, capturing complex dependencies while improving out-of-distribution generalization. Yet, despite the growing importance of floating-base systems such as humanoids and quadrupeds, current grey-box models ignore their specific physical constraints. For instance, the inertia matrix is not only positive definite but also exhibits branch-induced sparsity and input independence. Moreover, the 6x6 composite spatial inertia of the floating base inherits properties of single-rigid-body inertia matrices. As we show, this includes the triangle inequality on the eigenvalues of the composite rotational inertia. To address the lack of physical consistency in deep learning models of floating-base systems, we introduce a parameterization of inertia matrices that satisfies all these constraints. Inspired by Deep Lagrangian Networks (DeLaN), we train neural networks to predict physically plausible inertia matrices that minimize inverse dynamics error under Lagrangian mechanics. For evaluation, we collected and released a dataset on multiple quadrupeds and humanoids. In these experiments, our Floating-Base Deep Lagrangian Networks (FeLaN) achieve highly competitive performance on both simulated and real robots, while providing greater physical interpretability.
Generalized Group Selection Strategies for Self-sustainable RIS-aided Communication
Reconfigurable intelligent surface (RIS) is a cutting-edge communication technology that has been proposed as a viable option for beyond fifth-generation wireless communication networks. This paper investigates various group selection strategies in the context of grouping-based self-sustainable RIS-aided device-to-device (D2D) communication with spatially correlated wireless channels. Specifically, we consider both power splitting (PS) and time switching (TS) configurations of the self-sustainable RIS to analyze the system performance and propose appropriate bounds on the choice of system parameters. The analysis takes into account a simplified linear energy harvesting (EH) model as well as a practical non-linear EH model. Based on the application requirements, we propose various group selection strategies at the RIS. Notably, each strategy schedules the k-th best available group at the RIS based on the end-to-end signal-to-noise ratio (SNR) and also the energy harvested at a particular group of the RIS. Accordingly, by using tools from high order statistics, we derive analytical expressions for the outage probability of each selection strategy. Moreover, by applying the tools from extreme value theory, we also investigate an asymptotic scenario, where the number of groups available for selection at an RIS approaches infinity. The nontrivial insights obtained from this approach are especially beneficial in applications like large intelligent surface-aided wireless communication. Finally, the numerical results demonstrate the importance and benefits of the proposed approaches in terms of metrics such as the data throughput and the outage (both data and energy) performance.
comment: This work has been submitted to an IEEE journal for possible publication
A Data-Driven Framework for Online Mitigation of False Data Injection Signals in Networked Control Systems
This paper introduces a novel two-stage framework for online mitigation of False Data Injection (FDI) signals to improve the resiliency of Networked Control Systems (NCSs) and ensure their safe operation in the presence of malicious activities. The first stage involves meta learning to select a base time series forecasting model within a stacked ensemble learning architecture. This is achieved by converting time series data into scalograms using continuous wavelet transform, which are then split into image frames to generate a scalo-temporal representation of the data and to distinguish between different complexity levels of time series data based on an entropy metric using a convolutional neural network. In the second stage, the selected model mitigates false data injection signals in real-time. The proposed framework's effectiveness is demonstrated through rigorous simulations involving the formation control of differential drive mobile robots. By addressing the security challenges in NCSs, this framework offers a promising approach to maintaining system integrity and ensuring operational safety.
comment: 17 pages, 9 figures
Semantic Intelligence: A Bio-Inspired Cognitive Framework for Embodied Agents
Recent advancements in Large Language Models (LLMs) have greatly enhanced natural language understanding and content generation. However, these models primarily operate in disembodied digital environments and lack interaction with the physical world. To address this limitation, Embodied Artificial Intelligence (EAI) has emerged, focusing on agents that can perceive and interact with their surroundings. Despite progress, current embodied agents face challenges in unstructured real-world environments due to insufficient semantic intelligence, which is critical for understanding and reasoning about complex tasks. This paper introduces the Semantic Intelligence-Driven Embodied (SIDE) agent framework, which integrates a hierarchical semantic cognition architecture with a semantic-driven decision-making process. This enables agents to reason about and interact with the physical world in a contextually adaptive manner. The framework is inspired by biological cognitive mechanisms and utilizes bio-inspired principles to design a semantic cognitive architecture that mimics how humans and animals integrate and process sensory information. We present this framework as a step toward developing more intelligent and versatile embodied agents.
Quantum Key Distribution for Virtual Power Plant Communication: A Lightweight Key-Aware Scheduler with Provable Stability
Virtual power plants (VPPs) are becoming a cornerstone of future grids, aggregating distributed PV, wind, storage, and flexible loads for market participation and real-time balancing. As operations move to minute- and second-level feedback, communication security shifts from a compliance item to an operational constraint: latency, reliability, and confidentiality jointly determine whether dispatch, protection, and settlement signals arrive on time. Conventional PKI and key-rotation schemes struggle with cross-domain, high-frequency messaging and face long-term quantum threats. Quantum key distribution (QKD) offers information-theoretic key freshness, but its key yield is scarce and stochastic, often misaligned with bursty VPP traffic. This paper proposes a key-aware priority and quota framework that treats quantum keys as first-class scheduling resources. The design combines (i) forecast-driven long-term quotas and short-term tokens, (ii) key-aware deficit-round-robin arbitration, (iii) a preemptive emergency key reserve, and (iv) graceful degradation via encryption-mode switching and controlled down-sampling for non-critical traffic. A drift-plus-penalty analysis establishes strong stability under average supply-demand balance with quantifiable bounds on backlog and tail latency, providing interpretable operating guarantees. We build a reproducible testbed on IEEE 33- and 123-bus VPP systems and evaluate normal, degraded, and outage regimes with industry-consistent message classes and TTLs. Against FIFO, fixed-priority, and static-quota baselines, the proposed scheme consistently reduces tail delay and passive timeouts for critical messages, improves per-bit key utility, and enhances power-tracking reliability during key scarcity and regime switches.
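One way to picture item (ii), key-aware deficit-round-robin arbitration, is the sketch below. It is an illustration under assumptions rather than the paper's scheduler: the function `key_aware_drr`, the per-message `key_cost`, and the single shared `key_pool` are hypothetical, and the quotas, tokens, and emergency reserve from the abstract are left out. The only change from textbook deficit round robin is that a message must also fit within the remaining key material.

```python
from collections import deque


def key_aware_drr(queues, quanta, key_pool, rounds):
    """Deficit round robin in which sending also consumes key material.

    queues: dict name -> deque of (size, key_cost) messages
    quanta: dict name -> credit added to that queue each round
    key_pool: key units available (assumed fixed for this sketch)
    Returns (list of sent (name, size, key_cost), remaining key_pool).
    """
    deficit = {name: 0 for name in queues}
    sent = []
    for _ in range(rounds):
        for name, queue in queues.items():
            if not queue:
                deficit[name] = 0  # idle queues do not accumulate credit
                continue
            deficit[name] += quanta[name]
            while queue:
                size, key_cost = queue[0]
                # Head-of-line message waits if credit OR key is insufficient.
                if size > deficit[name] or key_cost > key_pool:
                    break
                queue.popleft()
                deficit[name] -= size
                key_pool -= key_cost
                sent.append((name, size, key_cost))
            if not queue:
                deficit[name] = 0
    return sent, key_pool


queues = {
    "critical": deque([(100, 32), (100, 32)]),
    "bulk": deque([(300, 64)]),
}
quanta = {"critical": 200, "bulk": 100}
sent, remaining = key_aware_drr(queues, quanta, key_pool=100, rounds=3)
```

In this run the two critical messages are sent in the first round, while the bulk message accumulates enough credit by the third round but stays queued because only 36 key units remain.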
Differentiating Through Power Flow Solutions for Admittance and Topology Control
The power flow equations relate bus voltage phasors to power injections via the network admittance matrix. These equations are central to the key operational and protection functions of power systems (e.g., optimal power flow scheduling and control, state estimation, protection, and fault location, among others). As control, optimization, and estimation of network admittance parameters are central to multiple avenues of research in electric power systems, we propose a linearization of power flow solutions obtained by implicitly differentiating them with respect to the network admittance parameters. This is achieved by utilizing the implicit function theorem, in which we show that such a differentiation is guaranteed to exist under mild conditions and is applicable to generic power systems (radial or meshed). The proposed theory is applied to derive sensitivities of complex voltages, line currents, and power flows. The developed theory of linearizing the power flow equations around changes in the complex network admittance parameters has numerous applications. We demonstrate several of these applications, such as predicting the nodal voltages when the network topology changes without solving the power flow equations. We showcase the application for continuous admittance control, which is used to increase the hosting capacity of a given distribution network.
comment: 10 pages, 6 figures
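The linearization described in the abstract above follows the generic implicit function theorem pattern, written out below. The notation is assumed here and is not taken from the paper: x stacks the real and imaginary parts of the bus voltages, \theta stacks the real-valued admittance parameters, and F collects the power flow residuals. A nonsingular Jacobian \partial F / \partial x at the operating point is the standard hypothesis of the theorem; whether it matches the paper's "mild conditions" exactly is not stated in the abstract.

```latex
% Assumed notation: x = stacked real/imaginary voltage components,
% \theta = stacked real admittance parameters, F = power flow residual.
F(x, \theta) = 0
\quad\Longrightarrow\quad
\frac{\partial x}{\partial \theta}
  = -\left(\frac{\partial F}{\partial x}\right)^{-1}
     \frac{\partial F}{\partial \theta},
\qquad
x(\theta + \Delta\theta) \approx x(\theta)
  + \frac{\partial x}{\partial \theta}\,\Delta\theta .
```

The second relation is the first-order prediction that lets voltages be estimated after an admittance change without re-solving the power flow equations.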
ANGEL: A Novel Gripper for Versatile and Light-touch Fruit Harvesting
Fruit harvesting remains predominantly a labor-intensive process, motivating research on robotic grippers. Conventional rigid or vacuum-driven grippers require complex mechanical design or high energy consumption. Current enveloping-based fruit harvesting grippers lack adaptability to fruits of different sizes. This paper introduces a drawstring-inspired, cable-driven soft gripper for versatile and gentle fruit harvesting. The design employs 3D-printed Thermoplastic Polyurethane (TPU) pockets with integrated steel wires that constrict around the fruit when actuated, distributing pressure uniformly to minimize bruising and accommodate fruits of varying sizes. The lightweight structure, which requires few components, reduces mechanical complexity and cost compared to other grippers. Actuation is achieved through servo-driven cable control, while motor feedback provides autonomous grip adjustment with tunable grip strength. Experimental validation shows that, for tomatoes within the gripper's effective size range, harvesting was achieved with a 0% immediate damage rate and a bruising rate of less than 9% after five days, reinforcing the gripper's suitability for fruit harvesting.
Provably Optimal Reinforcement Learning under Safety Filtering
Recent advances in reinforcement learning (RL) enable its use on increasingly complex tasks, but the lack of formal safety guarantees still limits its application in safety-critical settings. A common practical approach is to augment the RL policy with a safety filter that overrides unsafe actions to prevent failures during both training and deployment. However, safety filtering is often perceived as sacrificing performance and hindering the learning process. We show that this perceived safety-performance tradeoff is not inherent and prove, for the first time, that enforcing safety with a sufficiently permissive safety filter does not degrade asymptotic performance. We formalize RL safety with a safety-critical Markov decision process (SC-MDP), which requires categorical, rather than high-probability, avoidance of catastrophic failure states. Additionally, we define an associated filtered MDP in which all actions result in safe effects, thanks to a safety filter that is considered to be a part of the environment. Our main theorem establishes that (i) learning in the filtered MDP is safe categorically, (ii) standard RL convergence carries over to the filtered MDP, and (iii) any policy that is optimal in the filtered MDP, when executed through the same filter, achieves the same asymptotic return as the best safe policy in the SC-MDP, yielding a complete separation between safety enforcement and performance optimization. We validate the theory on Safety Gymnasium with representative tasks and constraints, observing zero violations during training and final performance matching or exceeding unfiltered baselines. Together, these results shed light on a long-standing question in safety-filtered learning and provide a simple, principled recipe for safe RL: train and deploy RL policies with the most permissive safety filter that is available.
comment: 17 pages, 3 figures
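The "filtered MDP" construction, in which the safety filter is treated as part of the environment, might look like the wrapper below. This is a toy sketch: `LineWorld`, `safety_filter`, and `FilteredEnv` are hypothetical names, and the one-dimensional world stands in for the Safety Gymnasium tasks mentioned in the abstract. The filter is permissive in the sense that it changes an action only when that action would enter the failure state.

```python
class LineWorld:
    """Toy environment on positions 0..4; position 0 is a catastrophic failure."""

    def __init__(self, start=2):
        self.state = start

    def step(self, action):
        # action is -1 (left) or +1 (right); positions are clipped to [0, 4].
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        return self.state, reward


def safety_filter(state, action):
    """Override the action only if it would reach the failure state."""
    if state + action <= 0:
        return +1
    return action


class FilteredEnv:
    """Environment plus filter: the learner only ever sees safe transitions."""

    def __init__(self, env, filt):
        self.env = env
        self.filt = filt

    def step(self, action):
        safe_action = self.filt(self.env.state, action)
        return self.env.step(safe_action)


# Even a policy that always moves toward the failure state never reaches it.
env = FilteredEnv(LineWorld(start=1), safety_filter)
visited = [env.step(-1)[0] for _ in range(10)]
```

Any standard RL algorithm could then be trained against `FilteredEnv` as if it were an ordinary environment, which is the separation between safety enforcement and performance optimization that the abstract describes.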
Prompt-to-Primal Teaching
This paper introduces Prompt-to-Primal (P2P) Teaching, an AI-integrated instructional approach that links prompt-driven exploration with first-principles reasoning, guided and moderated by the instructor within the classroom setting. In P2P teaching, student-generated AI prompts serve as entry points for inquiry and initial discussions in class, while the instructor guides learners to validate, challenge, and reconstruct AI responses through fundamental physical and mathematical laws. The approach encourages self-reflective development, critical evaluation of AI outputs, and conceptual foundational knowledge of the core engineering principles. A large language model (LLM) can be a highly effective tool for those who already possess foundational knowledge of a subject; however, it may also mislead students who lack sufficient background in the subject matter. Results from two student cohorts across different semesters suggest the pedagogical effectiveness of the P2P teaching framework in enhancing both AI literacy and engineering reasoning.
comment: 9 pages, 5 figures
An Exact Quantile-Energy Equality for Terminal Halfspaces in Linear-Gaussian Control with a Discrete-Time Companion, KL/Schrodinger Links, and High-Precision Validation
We prove an exact equality between the minimal quadratic control energy and the squared normal-quantile gap for terminal halfspaces in linear-Gaussian systems with additive control and quadratic effort $E(u) = \tfrac12\!\int u^\top M u\,dt$ where $M = B^\top\Sigma^{-1}B$. For terminal halfspace events, the minimal energy equals the squared normal-quantile gap divided by twice a controllability-to-noise ratio $R_T^2(w)=(w^\top W_c^M w)/(w^\top V_T w)$ and is attained by a matched-filter control. We provide an exact zero-order-hold discrete-time companion via block exponentials, relate the result to minimum-energy control, Gaussian isoperimetry, risk-sensitive/KL control, and Schrodinger bridges, and validate to high precision with Monte Carlo. We state assumptions, singular-$M$ handling, and edge cases. The statement is a compact synthesis and design-ready translator, not a universal principle. Novelty: while the ingredients (Gramians, Cauchy-Schwarz, Gaussian isoperimetry) are classical, to our knowledge the explicit quantile-energy equality with a constructive matched-filter achiever for terminal halfspaces, and its discrete-time companion, are not recorded together in the cited literature.
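The stated equality is simple to evaluate numerically. The helper below is a minimal sketch that assumes the "normal-quantile gap" is the difference between the standard-normal quantiles of the target probability and of the uncontrolled probability of the terminal halfspace event; that reading, and the names `min_energy`, `p_uncontrolled`, and `p_target`, are assumptions not confirmed by the abstract. The ratio R_T^2(w) is passed in as a precomputed number.

```python
from statistics import NormalDist


def min_energy(p_uncontrolled, p_target, r_squared):
    """Squared normal-quantile gap divided by twice the ratio R_T^2(w)."""
    if not (0.0 < p_uncontrolled < 1.0 and 0.0 < p_target < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    if r_squared <= 0.0:
        raise ValueError("controllability-to-noise ratio must be positive")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    gap = std_normal.inv_cdf(p_target) - std_normal.inv_cdf(p_uncontrolled)
    return gap * gap / (2.0 * r_squared)


# Raising the event probability from 0.5 to 0.975 with R_T^2 = 1.
energy = min_energy(0.5, 0.975, 1.0)
```

Under this reading, doubling the controllability-to-noise ratio halves the required energy, and no energy is needed when the target probability equals the uncontrolled one.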
Model-Free Dynamic Consensus in Multi-Agent Systems: A Q-Function Perspective
This paper presents a new method for achieving dynamic consensus in linear discrete-time homogeneous multi-agent systems (MAS) with marginally stable or unstable dynamics. The guarantee of consensus in this setting involves a set of constraints based on the graph's spectral properties, complicating the design of the coupling gains. This challenge intensifies for large-scale systems with diverse graph Laplacian spectra. The proposed approach reformulates the dynamic consensus problem with a prescribed convergence rate using a state-action value function framework inspired by optimal control theory. Specifically, a synthetic linear quadratic regulation (LQR) formulation is introduced to encode the consensus objective, enabling its translation into a convex semidefinite programming (SDP) problem. The resulting SDP is applicable in both model-based and model-free settings for jointly designing the local feedback and coupling gains. To handle the inherent non-convex feasibility conditions, a convex-concave decomposition strategy is employed. Adaptation of the method in a completely model-free set-up eliminates the need for system identification or knowledge of the agents' dynamics. Instead, it relies on input-state data collection and offers an entirely data-driven equivalent SDP formulation. Finally, a new algorithm balancing feasibility, convergence rate, robustness, and energy efficiency, is established to provide design flexibility. Numerical simulations validate the method's effectiveness in various scenarios.
Direct data-driven interpolation and approximation of linear parameter-varying system trajectories
We consider the problem of estimating missing values in trajectories of linear parameter-varying (LPV) systems. We solve this interpolation problem for the class of shifted-affine LPV systems. Conditions for the existence and uniqueness of solutions are given and a direct data-driven algorithm for its computation is presented, i.e., the data-generating system is not given by a parametric model but is implicitly specified by data. We illustrate the applicability of the proposed solution on illustrative examples of a mass-spring-damper system with exogenous and endogenous parameter variation.
comment: 10 pages, 5 figures, submitted for review
Sparse Identification of Nonlinear Dynamics Enhanced by Ensemble Learning, Multi-Step Prediction Evaluation, Elite Strategy, and Classification Techniques for Applications to Industrial Systems
This paper proposes a sparse identification of nonlinear dynamics (SINDy) with control and exogenous inputs for highly accurate and reliable prediction. Although SINDy is recognized as a remarkable approach for identifying nonlinear systems, several challenges remain. Its application to industrial systems remains limited, and multi-step predictions are not guaranteed due to overfitting and noisy data. This phenomenon is often caused by the increase in basis functions resulting from the extension of coordinates, such as time-delay embedding. To address these problems, this study proposes an emphasized SINDy framework by integrating ensemble-learning, multi-step prediction evaluations, elite strategy, and classification techniques (EMEC-SINDy), while preserving convex optimization. The proposed method employs library bagging and extracts elites with an R-squared greater than 90%. Then, clustering is performed on the surviving elites because physically motivated basis functions are not always available, and the elites obtained do not always have similar basis functions. After the classification, discrete model candidates are obtained by taking the mean of each classified elite. Finally, the best model is selected. Simulation results demonstrate that EMEC-SINDy significantly outperforms original SINDy approaches in multi-step prediction accuracy under noisy conditions, validating its applicability to the diesel engine airpath system, which is known as a complex and highly coupled nonlinear multi-input multi-output system.
Observer Design over Hypercomplex Quaternions
We develop observer design over hypercomplex quaternions in a characteristic-polynomial-free framework. Using the standard right-module convention, we derive a right observable companion form and its companion polynomial that encodes error dynamics via right-eigenvalue similarity classes. The design mirrors the real/complex case - coefficient updates in companion coordinates, followed by a similarity back - yet avoids determinants, characteristic/minimal polynomials, and Cayley-Hamilton identities that do not transfer to quaternions. We also give an Ackermann-type construction for the important case of closed-loop companion polynomials with real coefficients, ensuring similarity-equivariant evaluation. The results yield simple recipes for full-order observers directly over quaternions, clarify the role of right spectra and their similarity classes, and pinpoint when classical one-shot formulas remain valid. Numerical examples illustrate the method and advantages over vectorized or complex-adjoint surrogates.
comment: Submitted for presentation at the 24th European Control Conference (ECC 2026), Reykjavik, Iceland. This work was co-funded by the European Union under the project ROBOPROX (reg. no. CZ.02.01.01/00/22 008/0004590)
Optimal control of stochastic reaction networks with entropic control cost and emergence of mode-switching strategies
Controlling the stochastic dynamics of biological populations is a challenge that arises across various biological contexts. However, these dynamics are inherently nonlinear and involve a discrete state space, i.e., the number of molecules, cells, or organisms. Additionally, the possibility of extinction has a significant impact on both dynamics and control strategies, particularly when the population size is small. These factors hamper the direct application of conventional control theories to biological systems. To address these challenges, we formulate the optimal control problem for stochastic population dynamics by utilizing control cost functions based on the f-divergence, which naturally accounts for population-specific factors. If Kullback-Leibler (KL) divergence is adopted for the cost function, the complex nonlinear Hamilton-Jacobi-Bellman equation is simplified into a linear form, facilitating efficient computation of optimal solutions. We demonstrate the effectiveness of our approach by applying it to the control of interacting random walkers, Moran processes, and SIR models, and observe the mode-switching phenomena in the control strategies. Our approach provides new opportunities for applying control theory to a wide range of biological problems.
comment: 12 pages, 4 figures
Informativity Conditions for Multiple Signals: Properties, Experimental Design, and Applications
Recent studies highlight the importance of the persistent excitation condition on a single signal sequence for model identification and data-driven control methodologies. However, maintaining prolonged excitation in control signals introduces significant challenges, as continuous excitation can reduce the lifetime of mechanical devices. In this paper, we introduce three informativity conditions for various types of multi-signal data, each augmented by weight factors. We explore the interrelations between these conditions and their rank properties in linear time-invariant systems. Furthermore, we introduce open-loop experimental design methods tailored to each of the three conditions, which can synthesize the required excitation conditions either offline or online, even in the presence of limited information within each signal segment. We demonstrate the effectiveness of these informativity conditions in least-squares identification. Additionally, all three conditions can extend Willems' fundamental lemma and are utilized to assess the properties of the system. Illustrative examples confirm that these conditions yield satisfactory outcomes in both least-squares identification and the construction of data-driven controllers.
Accurate Small-Signal Modeling of Digitally Controlled Buck Converters with ADC-PWM Synchronization
Digital control has become increasingly widespread in modern power electronic converters. When acquiring feedback signals such as the inductor current, synchronizing the analog-to-digital converter (ADC) with the digital pulse-width modulator (DPWM) is commonly employed to accurately track their steady-state average. However, the small-signal implications of such synchronization have not been investigated. This paper presents an exact small-signal model for digitally controlled buck converters operating in forced continuous-conduction mode (FCCM) under constant-frequency current-mode control, explicitly accounting for DPWM-ADC synchronization. Using a sampled-data framework, the proposed model captures all sideband effects introduced by the sampling process, yielding precise predictions of both analog and digital loop gains, even at frequencies beyond the switching and sampling frequencies. Both asymmetrical and symmetrical carrier modulations are considered. Furthermore, the digital loop gain is derived in closed form using the modified z-transform, enabling low-complexity compensator design and stability assessment. Within this framework, the analog loop gain can be directly obtained from the digital loop gain, thereby eliminating the need for computationally intensive infinite series evaluations. The validity of the proposed model is confirmed through both simulation and experimental results.
Modeling the Impact of Communication and Human Uncertainties on Runway Capacity in Terminal Airspace
We investigate the potential impact of communication and human performance uncertainties on runway operations. Specifically, we consider these impacts within the context of an arrival scenario with two converging flows: a straight-in approach stream and a downwind stream merging into it. Both arrival streams are modeled using a modified Poisson distribution that incorporates the separation minima as well as the runway occupancy time. Various system-level uncertainties are addressed in this process, including communication link- and human-related uncertainties. In this research, we first build a Monte Carlo-based discrete-time simulation, where aircraft arrivals are generated by modified Poisson processes subject to minimum separation constraints, simulating various traffic operations. The merging logic incorporates standard bank angle continuous turn-to-final, pilot response delays, and dynamic gap availability in real time. Then, we investigate an automated final approach vectoring model (i.e., Auto-ATC), in which inverse optimal control is used to learn decision advisories from human expert records. By augmenting trajectories and incorporating the aforementioned uncertainties into the planning scenario, we create a setup analogous to the discrete event simulation. For both studies, runway capacity is measured by runway throughput, the fraction of downwind arrivals that merge immediately without holding, and the average delay (i.e., holding time/distance) experienced on the downwind leg. This research provides a method for runway capacity estimation in merging scenarios, and demonstrates that aeronautical communication link uncertainties significantly affect runway capacity in current voice-based operations, whereas the impact can be mitigated in autonomous operational settings.
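A Poisson arrival stream subject to a minimum separation can be generated as sketched below. The abstract does not define its "modified Poisson" model, so the shifted-exponential gap used here (minimum separation plus an exponential draw) is an assumption, as are the function name `arrival_times` and the example numbers.

```python
import random


def arrival_times(rate, min_separation, horizon, rng):
    """Arrival times in [0, horizon) with gaps of at least min_separation.

    Each inter-arrival gap is min_separation plus an exponential draw with
    the given rate, which is one simple way to impose a separation minimum
    on a Poisson-like stream.
    """
    times = []
    t = 0.0
    while True:
        t += min_separation + rng.expovariate(rate)
        if t >= horizon:
            break
        times.append(t)
    return times


# One hour of arrivals: 90 s minimum separation, mean extra gap of 60 s.
rng = random.Random(0)
times = arrival_times(rate=1.0 / 60.0, min_separation=90.0, horizon=3600.0, rng=rng)
```

Repeating the draw over many seeds, and merging two such streams under a gap-acceptance rule, would give Monte Carlo estimates of throughput and holding delay in the spirit of the study.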
Identification and Adaptive Control of Markov Jump Systems: Sample Complexity and Regret Bounds
Learning how to effectively control unknown dynamical systems is crucial for intelligent autonomous systems. This task becomes a significant challenge when the underlying dynamics are changing with time. Motivated by this challenge, this paper considers the problem of controlling an unknown Markov jump linear system (MJS) to optimize a quadratic objective. By taking a model-based perspective, we consider identification-based adaptive control of MJSs. We first provide a system identification algorithm for MJS to learn the dynamics in each mode as well as the Markov transition matrix, underlying the evolution of the mode switches, from a single trajectory of the system states, inputs, and modes. Through martingale-based arguments, sample complexity of this algorithm is shown to be $\mathcal{O}(1/\sqrt{T})$. We then propose an adaptive control scheme that performs system identification together with certainty equivalent control to adapt the controllers in an episodic fashion. Combining our sample complexity results with recent perturbation results for certainty equivalent control, we prove that when the episode lengths are appropriately chosen, the proposed adaptive control scheme achieves $\mathcal{O}(\sqrt{T})$ regret, which can be improved to $\mathcal{O}(\mathrm{polylog}(T))$ with partial knowledge of the system. Our proof strategy introduces innovations to handle Markovian jumps and a weaker notion of stability common in MJSs. Our analysis provides insights into system theoretic quantities that affect learning accuracy and control performance. Numerical simulations are presented to further reinforce these insights.
comment: Improved results using Martingale-based arguments
Systems and Control (EESS)
Admittance Matrix Concentration Inequalities for Understanding Uncertain Power Networks
This paper presents probabilistic bounds for the spectrum of the admittance matrix and classical linear power flow models under uncertain network parameters; for example, probabilistic line contingencies. Our proposed approach imports tools from probability theory, such as concentration inequalities for random matrices with independent entries. It yields error bounds for common approximations of the AC power flow equations under parameter uncertainty, including the DC and LinDistFlow approximations.
comment: 9 pages, 1 figure
Data-driven Communication and Control Design for Distributed Frequency Regulation with Black-box Inverters SC
The increasing penetration of inverter-based resources into the power grid, with often only black-box models available, challenges long-standing frequency control methods. Most recent works take a decentralized approach without online device coordination via communication. This paper considers both dynamic behavior and communication within secondary frequency control on an intermediate timescale. We develop a distributed data-driven approach that utilizes peer-to-peer communication between inverters to avoid the need for a central control center. To enable a trade-off between communication network requirements and control performance, we present a framework to guide communication topology design for secondary frequency regulation. Following the design of the inter-agent information exchange scheme, we design a controller that is structured according to the communication topology and comes with a closed-loop stability guarantee. Case studies on the IEEE 39-bus system validate the framework and illustrate the trade-off between communication requirements and control performance that is enabled by our approach.
comment: Preprint submitted to PSCC 2026
Trajectory Optimization for Minimum Threat Exposure using Physics-Informed Neural Networks
We apply a physics-informed neural network (PINN) to solve the two-point boundary value problem (BVP) arising from the necessary conditions postulated by Pontryagin's Minimum Principle for optimal control. Such BVPs are known to be numerically difficult to solve by traditional shooting methods due to extremely high sensitivity to initial guesses. In the light of recent successes in applying PINNs for solving high-dimensional differential equations, we develop a PINN to solve the problem of finding trajectories with minimum exposure to a spatiotemporal threat for a vehicle kinematic model. First, we implement PINNs that are trained to solve the BVP for a given pair of initial and final states for a given threat field. Next, we implement a PINN conditioned on the initial state for a given threat field, which eliminates the need for retraining for each initial state. We demonstrate that the PINN outputs satisfy the necessary conditions with low numerical error.
comment: 2025 Indian Control Conference
Artificial magnetic conductor backed dual-mode sectoral cylindrical DRA for off-body biomedical telemetry
This research investigates the potential of a sectoral Cylindrical Dielectric Resonator Antenna (CDRA) for biomedical telemetry. CDRAs are known for their low loss, ruggedness, and stability, but their limited bandwidth and size make them unsuitable for wearable devices. The research addresses these limitations by proposing a dual-mode antenna that operates in EH110 and TE210 modes. The sectoral CDRA is a quarter segment with Perfect Electric Conductor boundaries, reducing its size by a factor of four. The field components for both modes are derived mathematically to support the design. To minimize specific absorption rate (SAR), an Artificial Magnetic Conductor (AMC) surface is applied to the antenna's backside, enhancing compatibility with the transverse electric modes. The antenna achieves a bandwidth of 0.7 GHz (5.2-5.9 GHz), suitable for biomedical applications, with a measured peak gain of 7.9 dBi and a SAR of 1.24 W/kg when applied to a human arm.
comment: 13 pages
An Empirical Study of Lagrangian Methods in Safe Reinforcement Learning
In safety-critical domains such as robotics, navigation and power systems, constrained optimization problems arise where maximizing performance must be carefully balanced with associated constraints. Safe reinforcement learning provides a framework to address these challenges, with Lagrangian methods being a popular choice. However, the effectiveness of Lagrangian methods crucially depends on the choice of the Lagrange multiplier $\lambda$, which governs the trade-off between return and constraint cost. A common approach is to update the multiplier automatically during training. Although this is standard in practice, there remains limited empirical evidence on the robustness of an automated update and its influence on overall performance. Therefore, we analyze (i) optimality and (ii) stability of Lagrange multipliers in safe reinforcement learning across a range of tasks. We provide $\lambda$-profiles that give a complete visualization of the trade-off between return and constraint cost of the optimization problem. These profiles show the highly sensitive nature of $\lambda$ and moreover confirm the lack of general intuition for choosing the optimal value $\lambda^*$. Our findings additionally show that automated multiplier updates are able to recover and sometimes even exceed the optimal performance found at $\lambda^*$ due to the vast difference in their learning trajectories. Furthermore, we show that automated multiplier updates exhibit oscillatory behavior during training, which can be mitigated through PID-controlled updates. However, this method requires careful tuning to achieve consistently better performance across tasks. This highlights the need for further research on stabilizing Lagrangian methods in safe reinforcement learning. The code used to reproduce our results can be found at https://github.com/lindsayspoor/Lagrangian_SafeRL.
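The automated and PID-controlled multiplier updates mentioned above can be sketched as follows. This is an illustrative version under stated assumptions, not the code from the linked repository: the class name `PIDLagrangeMultiplier`, the gain names, and the clamping choices are hypothetical.

```python
class PIDLagrangeMultiplier:
    """PID-style update of a Lagrange multiplier from observed constraint cost.

    Assumption: the constraint is "episode cost <= cost_limit" and the
    multiplier is kept non-negative. With kp = kd = 0 this reduces to the
    plain projected gradient-ascent update
    lambda <- max(0, lambda + ki * (cost - limit)).
    """

    def __init__(self, kp, ki, kd, cost_limit):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit
        self.integral = 0.0
        self.prev_error = None
        self.value = 0.0

    def update(self, episode_cost):
        error = episode_cost - self.cost_limit
        # The integral term accumulates violations and is clamped at zero.
        self.integral = max(0.0, self.integral + error)
        # No derivative contribution on the very first observation.
        derivative = 0.0 if self.prev_error is None else error - self.prev_error
        self.prev_error = error
        raw = self.kp * error + self.ki * self.integral + self.kd * derivative
        self.value = max(0.0, raw)
        return self.value


# Integral-only controller, i.e. the standard automated update.
lam = PIDLagrangeMultiplier(kp=0.0, ki=0.1, kd=0.0, cost_limit=25.0)
history = [lam.update(c) for c in (35.0, 35.0, 0.0)]
```

The proportional and derivative terms react to the current violation and its trend, which is the mechanism typically credited with damping the oscillations of the integral-only update. The gains still need tuning per task, as the abstract notes.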
A condensing approach for linear-quadratic optimization with geometric constraints
Optimization problems with convex quadratic cost and polyhedral constraints are ubiquitous in signal processing, automatic control and decision-making. We consider here an enlarged problem class that allows encoding logical conditions and cardinality constraints, among others. In particular, we also cover situations where parts of the constraints are nonconvex and possibly complicated, but it is practical to compute projections onto this nonconvex set. Our approach combines the augmented Lagrangian framework with a solver-agnostic structure-exploiting subproblem reformulation. While convergence guarantees follow from the former, the proposed condensing technique leads to significant improvements in computational performance.
comment: 14 pages, 5 figures
ORIX: Orchestration of RIS with xApps for Smart Wireless Factory Environments
The vision of a smart wireless factory (SWF) demands highly flexible, low-latency, and reliable connectivity that goes beyond conventional wireless solutions. Reconfigurable intelligent surface (RIS)-empowered communications, when integrated with the open radio access network (O-RAN) architectures, have emerged as a promising enabler to meet these challenging requirements. This article introduces the methodology for the orchestration of RIS with xApps (ORIX), bringing the RIS technology into the O-RAN ecosystem through xApp-based control for SWF environments. ORIX features three key components: an O-RAN-compliant RIS service model for dynamic configuration, an RIS channel simulator that supports 3GPP indoor factory models with multiple industrial scenarios, and practical RIS optimization strategies with finite-resolution control. Together, these elements provide a realistic end-to-end emulation platform for evaluating RIS placement, control, and performance in SWF environments prior to deployment. The presented case study demonstrates how ORIX enables the evaluation of achievable performance gains, exploration of trade-offs among key RIS design parameters, and identification of deployment strategies that balance system performance with practical implementation constraints. By bridging theoretical advances with industrial feasibility, ORIX lays the groundwork for RIS-assisted O-RAN networks to power next-generation wireless communication in industrial scenarios.
comment: Submitted in IEEE
Inverse Optimal Control of Muscle Force Sharing During Pathological Gait
Muscle force sharing is typically resolved by minimizing a specific objective function to approximate neural control strategies. An inverse optimal control approach was applied to identify the "best" objective function, among a positive linear combination of basis objective functions, associated with the gait of two post-stroke males, one high-functioning (subject S1) and one low-functioning (subject S2). It was found that the "best" objective function is subject- and leg-specific. No single function works universally well, yet the best options are usually differently weighted combinations of muscle activation- and power-minimization. Subject-specific inverse optimal control models performed best on their respective limbs (RMSE 178/213 N, CC 0.71/0.61 for non-paretic and paretic legs of S1; RMSE 205/165 N, CC 0.88/0.85 for respective legs of S2), but cross-subject generalization was poor, particularly for paretic legs. Moreover, minimizing the root mean square of muscle power emerged as important for paretic limbs, while minimizing activation-based functions dominated for non-paretic limbs. This may suggest different neural control strategies between affected and unaffected sides, possibly altered by the presence of spasticity. Among the 15 considered objective functions commonly used in inverse dynamics-based computations, the root mean square of muscle power was the only one explicitly incorporating muscle velocity, leading to a possible model for spasticity in the paretic limbs. Although this objective function has been rarely used, it may be relevant for modeling pathological gait, such as post-stroke gait.
Integrating Trustworthy Artificial Intelligence with Energy-Efficient Robotic Arms for Waste Sorting
This paper presents a novel methodology that integrates trustworthy artificial intelligence (AI) with an energy-efficient robotic arm for intelligent waste classification and sorting. By utilizing a convolutional neural network (CNN) enhanced through transfer learning with MobileNetV2, the system accurately classifies waste into six categories: plastic, glass, metal, paper, cardboard, and trash. The model achieved a high training accuracy of 99.8% and a validation accuracy of 80.5%, demonstrating strong learning and generalization. A robotic arm simulator is implemented to perform virtual sorting, calculating the energy cost for each action using Euclidean distance to ensure optimal and efficient movement. The framework incorporates key elements of trustworthy AI, such as transparency, robustness, fairness, and safety, making it a reliable and scalable solution for smart waste management systems in urban settings.
comment: 5 pages, 2 figures
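The Euclidean-distance energy cost mentioned in the abstract above can be sketched in a few lines. The bin coordinates, the `cost_per_unit` scale factor, and the function name `sorting_energy` are hypothetical placeholders; the abstract does not state the actual values or the exact cost model.

```python
import math

# Hypothetical 3-D bin positions (metres) for the six waste categories.
BIN_POSITIONS = {
    "plastic": (0.3, 0.0, 0.1),
    "glass": (0.3, 0.2, 0.1),
    "metal": (0.3, 0.4, 0.1),
    "paper": (-0.3, 0.0, 0.1),
    "cardboard": (-0.3, 0.2, 0.1),
    "trash": (-0.3, 0.4, 0.1),
}


def sorting_energy(pickup, category, cost_per_unit=1.0):
    """Energy proxy: straight-line distance from pickup point to the bin."""
    return cost_per_unit * math.dist(pickup, BIN_POSITIONS[category])


# Cost of moving an item classified as "glass" from the origin of the belt.
energy = sorting_energy((0.0, 0.0, 0.1), "glass")
```

A straight-line distance ignores joint-space effort and obstacle avoidance, so it is best read as a relative ranking of moves rather than a physical energy figure.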
Process Automation Architecture Using RFID for Transparent Voting Systems
This paper presents the development of a process automation architecture leveraging Radio Frequency Identification (RFID) technology for secure, transparent and efficient voting systems. The proposed architecture automates the voting workflow through RFID-enabled voter identification, encrypted vote casting, and secure data transmission. Each eligible voter receives a smart RFID card containing a uniquely encrypted identifier, which is verified using an RC522 reader interfaced with a microcontroller. Upon successful verification, the voter interacts with a touchscreen interface to cast a vote, which is then encrypted using AES-128 and securely stored on a local SD card or transmitted via GSM to a central server. A tamper-proof monitoring mechanism records each session with time-stamped digital signatures, ensuring auditability and data integrity. The architecture is designed to function in both online and offline modes, with an automated batch synchronization mechanism that updates vote records once network connectivity is restored. System testing in simulated environments confirmed 100% voter authentication accuracy, minimized latency (average voting time of 11.5 seconds), and robustness against cloning, double voting, and data interception. The integration of real-time monitoring and secure process control modules enables electoral authorities to automate data logging, detect anomalies, and validate system integrity dynamically. This work demonstrates a scalable, automation-driven solution for voting infrastructure, offering enhanced transparency, resilience, and deployment flexibility, especially in environments where digital transformation of electoral processes is critically needed.
comment: 7 pages, 5 figures, 1 table
Accelerating Adaptive Systems via Normalized Parameter Estimation Laws
In this paper, we propose a new class of parameter estimation laws for adaptive systems, called \emph{normalized parameter estimation laws}. A key feature of these estimation laws is that they accelerate the convergence of the system state, $\mathit{x(t)}$, to the origin. We quantify this improvement by showing that our estimation laws guarantee finite integrability of the $\mathit{r}$-th root of the squared norm of the system state, i.e., \( \mathit{\|x(t)\|}_2^{2/\mathit{r}} \in \mathcal{L}_1, \) where $\mathit{r} \geq 1$ is a pre-specified parameter that, for a broad class of systems, can be chosen arbitrarily large. In contrast, standard Lyapunov-based estimation laws only guarantee integrability of $\mathit{\|x(t)\|}_2^2$ (i.e., $\mathit{r} = 1$). We motivate our method by showing that, for large values of $r$, this guarantee serves as a sparsity-promoting mechanism in the time domain, meaning that it penalizes prolonged signal duration and slow decay, thereby promoting faster convergence of $\mathit{x(t)}$. The proposed estimation laws do not rely on time-varying or high adaptation gains and do not require persistent excitation. Moreover, they can be applied to systems with matched and unmatched uncertainties, regardless of their dynamic structure, as long as a control Lyapunov function (CLF) exists. Finally, they are compatible with any CLF-based certainty equivalence controllers. We further develop higher-order extensions of our estimation laws by incorporating momentum into the estimation dynamics. We illustrate the performance improvements achieved with the proposed scheme through various numerical experiments.
Assessing the Quality of a Set of Basis Functions for Inverse Optimal Control via Projection onto Global Minimizers
Inverse optimization (Inverse optimal control) is the task of imputing a cost function such that given test points (trajectories) are (nearly) optimal with respect to the discovered cost. Prior methods in inverse optimization assume that the true cost is a convex combination of a set of convex basis functions and that this basis is consistent with the test points. However, the consistency assumption is not always justified, as in many applications the principles by which the data is generated are not well understood. This work proposes using the distance between a test point and the set of global optima generated by the convex combinations of the convex basis functions as a measurement for the expressive quality of the basis with respect to the test point. A large minimal distance invalidates the set of basis functions. The concept of a set of global optima is introduced and its properties are explored in unconstrained and constrained settings. Upper and lower bounds for the minimum distance in the convex quadratic setting are implemented by bi-level gradient descent and an enriched linear matrix inequality respectively. Extensions to this framework include max-representable basis functions, nonconvex basis functions (local minima), and applying polynomial optimization techniques.
comment: 8 pages, 4 figures
Comparison and performance analysis of dynamic encrypted control approaches
Encrypted controllers using homomorphic encryption have proven to guarantee the privacy of measurement and control signals, as well as system and controller parameters, while regulating the system as intended. However, encrypting dynamic controllers has remained a challenge due to growing noise and overflow issues in the encoding. In this paper, we review recent approaches to dynamic encrypted control, such as bootstrapping, periodic resets of the controller state, integer reformulations, and FIR controllers, and equip them with a stability and performance analysis to evaluate their suitability. We complement the analysis with a numerical performance comparison on a benchmark system.
A polynomial-based QCQP solver for encrypted optimization
In this paper, we present a novel method for solving a class of quadratically constrained quadratic optimization problems using only additions and multiplications. This approach enables solving constrained optimization problems on private data since the operations involved are compatible with the capabilities of homomorphic encryption schemes. To solve the constrained optimization problem, a sequence of polynomial penalty functions of increasing degree is introduced, which are sufficiently steep at the boundary of the feasible set. Adding the penalty function to the original cost function creates a sequence of unconstrained optimization problems whose minimizer always lies in the admissible set and converges to the minimizer of the constrained problem. A gradient descent method is used to generate a sequence of iterates associated with these problems. For the algorithm, it is shown that the iterate converges to a minimizer of the original problem, and the feasible set is positively invariant under the iteration. Finally, the method is demonstrated on an illustrative cryptographic problem, finding the smaller value of two numbers, and the encrypted implementability is discussed.
comment: Accepted for presentation at the 64th IEEE Conference on Decision and Control (CDC2025)
Enhanced Ground-Satellite Direct Access via Onboard Rydberg Atomic Quantum Receivers
Ground-satellite links for 6G networks face critical challenges, including severe path loss, tight size-weight-power limits, and congested spectrum, all of which significantly hinder the performance of traditional radio frequency (RF) front ends. This article introduces the Rydberg Atomic Quantum Receiver (RAQR) for onboard satellite systems, a millimeter-scale front end that converts radio fields to optical signals through atomic electromagnetically induced transparency. RAQR's high sensitivity and high frequency selectivity address link budget, payload, and interference challenges while fitting within space constraints. A hybrid atomic-electronic design and supporting signal model demonstrate enhanced data rate, coverage, and sensing accuracy relative to conventional RF receivers. The article concludes with integration strategies, distributed-satellite concepts, and open research problems for bringing RAQR-enabled satellite payloads into service.
comment: Submitted to IEEE Journal
Breaking and Fixing Defenses Against Control-Flow Hijacking in Multi-Agent Systems
Control-flow hijacking attacks manipulate orchestration mechanisms in multi-agent systems into performing unsafe actions that compromise the system and exfiltrate sensitive information. Recently proposed defenses, such as LlamaFirewall, rely on alignment checks of inter-agent communications to ensure that all agent invocations are "related to" and "likely to further" the original objective. We start by demonstrating control-flow hijacking attacks that evade these defenses even if alignment checks are performed by advanced LLMs. We argue that the safety and functionality objectives of multi-agent systems fundamentally conflict with each other. This conflict is exacerbated by the brittle definitions of "alignment" and the checkers' incomplete visibility into the execution context. We then propose, implement, and evaluate ControlValve, a new defense inspired by the principles of control-flow integrity and least privilege. ControlValve (1) generates permitted control-flow graphs for multi-agent systems, and (2) enforces that all executions comply with these graphs, along with contextual rules (generated in a zero-shot manner) for each agent invocation.
Floating-Base Deep Lagrangian Networks
Grey-box methods for system identification combine deep learning with physics-informed constraints, capturing complex dependencies while improving out-of-distribution generalization. Yet, despite the growing importance of floating-base systems such as humanoids and quadrupeds, current grey-box models ignore their specific physical constraints. For instance, the inertia matrix is not only positive definite but also exhibits branch-induced sparsity and input independence. Moreover, the 6x6 composite spatial inertia of the floating base inherits properties of single-rigid-body inertia matrices. As we show, this includes the triangle inequality on the eigenvalues of the composite rotational inertia. To address the lack of physical consistency in deep learning models of floating-base systems, we introduce a parameterization of inertia matrices that satisfies all these constraints. Inspired by Deep Lagrangian Networks (DeLaN), we train neural networks to predict physically plausible inertia matrices that minimize inverse dynamics error under Lagrangian mechanics. For evaluation, we collected and released a dataset on multiple quadrupeds and humanoids. In these experiments, our Floating-Base Deep Lagrangian Networks (FeLaN) achieve highly competitive performance on both simulated and real robots, while providing greater physical interpretability.
Generalized Group Selection Strategies for Self-sustainable RIS-aided Communication
Reconfigurable intelligent surface (RIS) is a cutting-edge communication technology that has been proposed as a viable option for beyond fifth-generation wireless communication networks. This paper investigates various group selection strategies in the context of grouping-based self-sustainable RIS-aided device-to-device (D2D) communication with spatially correlated wireless channels. Specifically, we consider both power splitting (PS) and time switching (TS) configurations of the self-sustainable RIS to analyze the system performance and propose appropriate bounds on the choice of system parameters. The analysis takes into account a simplified linear energy harvesting (EH) model as well as a practical non-linear EH model. Based on the application requirements, we propose various group selection strategies at the RIS. Notably, each strategy schedules the k-th best available group at the RIS based on the end-to-end signal-to-noise ratio (SNR) and also the energy harvested at a particular group of the RIS. Accordingly, by using tools from high order statistics, we derive analytical expressions for the outage probability of each selection strategy. Moreover, by applying tools from extreme value theory, we also investigate an asymptotic scenario, where the number of groups available for selection at an RIS approaches infinity. The nontrivial insights obtained from this approach are especially beneficial in applications like large intelligent surface-aided wireless communication. Finally, the numerical results demonstrate the importance and benefits of the proposed approaches in terms of metrics such as the data throughput and the outage (both data and energy) performance.
comment: This work has been submitted to an IEEE journal for possible publication
A Data-Driven Framework for Online Mitigation of False Data Injection Signals in Networked Control Systems
This paper introduces a novel two-stage framework for online mitigation of False Data Injection (FDI) signals to improve the resiliency of Networked Control Systems (NCSs) and ensure their safe operation in the presence of malicious activities. The first stage involves meta learning to select a base time series forecasting model within a stacked ensemble learning architecture. This is achieved by converting time series data into scalograms using continuous wavelet transform, which are then split into image frames to generate a scalo-temporal representation of the data and to distinguish between different complexity levels of time series data based on an entropy metric using a convolutional neural network. In the second stage, the selected model mitigates false data injection signals in real-time. The proposed framework's effectiveness is demonstrated through rigorous simulations involving the formation control of differential drive mobile robots. By addressing the security challenges in NCSs, this framework offers a promising approach to maintaining system integrity and ensuring operational safety.
comment: 17 pages, 9 figures
Semantic Intelligence: A Bio-Inspired Cognitive Framework for Embodied Agents
Recent advancements in Large Language Models (LLMs) have greatly enhanced natural language understanding and content generation. However, these models primarily operate in disembodied digital environments and lack interaction with the physical world. To address this limitation, Embodied Artificial Intelligence (EAI) has emerged, focusing on agents that can perceive and interact with their surroundings. Despite progress, current embodied agents face challenges in unstructured real-world environments due to insufficient semantic intelligence, which is critical for understanding and reasoning about complex tasks. This paper introduces the Semantic Intelligence-Driven Embodied (SIDE) agent framework, which integrates a hierarchical semantic cognition architecture with a semantic-driven decision-making process. This enables agents to reason about and interact with the physical world in a contextually adaptive manner. The framework is inspired by biological cognitive mechanisms and utilizes bio-inspired principles to design a semantic cognitive architecture that mimics how humans and animals integrate and process sensory information. We present this framework as a step toward developing more intelligent and versatile embodied agents.
Quantum Key Distribution for Virtual Power Plant Communication: A Lightweight Key-Aware Scheduler with Provable Stability
Virtual power plants (VPPs) are becoming a cornerstone of future grids, aggregating distributed PV, wind, storage, and flexible loads for market participation and real-time balancing. As operations move to minute- and second-level feedback, communication security shifts from a compliance item to an operational constraint: latency, reliability, and confidentiality jointly determine whether dispatch, protection, and settlement signals arrive on time. Conventional PKI and key-rotation schemes struggle with cross-domain, high-frequency messaging and face long-term quantum threats. Quantum key distribution (QKD) offers information-theoretic key freshness, but its key yield is scarce and stochastic, often misaligned with bursty VPP traffic. This paper proposes a key-aware priority and quota framework that treats quantum keys as first-class scheduling resources. The design combines (i) forecast-driven long-term quotas and short-term tokens, (ii) key-aware deficit-round-robin arbitration, (iii) a preemptive emergency key reserve, and (iv) graceful degradation via encryption-mode switching and controlled down-sampling for non-critical traffic. A drift-plus-penalty analysis establishes strong stability under average supply-demand balance with quantifiable bounds on backlog and tail latency, providing interpretable operating guarantees. We build a reproducible testbed on IEEE 33- and 123-bus VPP systems and evaluate normal, degraded, and outage regimes with industry-consistent message classes and TTLs. Against FIFO, fixed-priority, and static-quota baselines, the proposed scheme consistently reduces tail delay and passive timeouts for critical messages, improves per-bit key utility, and enhances power-tracking reliability during key scarcity and regime switches.
Differentiating Through Power Flow Solutions for Admittance and Topology Control
The power flow equations relate bus voltage phasors to power injections via the network admittance matrix. These equations are central to the key operational and protection functions of power systems (e.g., optimal power flow scheduling and control, state estimation, protection, and fault location, among others). As control, optimization, and estimation of network admittance parameters are central to multiple avenues of research in electric power systems, we propose a linearization of power flow solutions obtained by implicitly differentiating them with respect to the network admittance parameters. This is achieved by utilizing the implicit function theorem, in which we show that such a differentiation is guaranteed to exist under mild conditions and is applicable to generic power systems (radial or meshed). The proposed theory is applied to derive sensitivities of complex voltages, line currents, and power flows. The developed theory of linearizing the power flow equations around changes in the complex network admittance parameters has numerous applications. We demonstrate several of these applications, such as predicting the nodal voltages when the network topology changes without solving the power flow equations. We showcase the application for continuous admittance control, which is used to increase the hosting capacity of a given distribution network.
comment: 10 pages, 6 figures
ANGEL: A Novel Gripper for Versatile and Light-touch Fruit Harvesting
Fruit harvesting remains predominantly a labor-intensive process, motivating research on robotic grippers. Conventional rigid or vacuum-driven grippers require complex mechanical design or high energy consumption. Current enveloping-based fruit harvesting grippers lack adaptability to fruits of different sizes. This paper introduces a drawstring-inspired, cable-driven soft gripper for versatile and gentle fruit harvesting. The design employs 3D-printed Thermoplastic Polyurethane (TPU) pockets with integrated steel wires that constrict around the fruit when actuated, distributing pressure uniformly to minimize bruising and to accommodate fruits of varying sizes. The lightweight structure, which requires few components, reduces mechanical complexity and cost compared to other grippers. Actuation is achieved through servo-driven cable control, while motor feedback provides autonomous grip adjustment with tunable grip strength. Experimental validation shows that, for tomatoes within the gripper's effective size range, harvesting was achieved with a 0% immediate damage rate and a bruising rate of less than 9% after five days, reinforcing the gripper's suitability for fruit harvesting.
Provably Optimal Reinforcement Learning under Safety Filtering
Recent advances in reinforcement learning (RL) enable its use on increasingly complex tasks, but the lack of formal safety guarantees still limits its application in safety-critical settings. A common practical approach is to augment the RL policy with a safety filter that overrides unsafe actions to prevent failures during both training and deployment. However, safety filtering is often perceived as sacrificing performance and hindering the learning process. We show that this perceived safety-performance tradeoff is not inherent and prove, for the first time, that enforcing safety with a sufficiently permissive safety filter does not degrade asymptotic performance. We formalize RL safety with a safety-critical Markov decision process (SC-MDP), which requires categorical, rather than high-probability, avoidance of catastrophic failure states. Additionally, we define an associated filtered MDP in which all actions result in safe effects, thanks to a safety filter that is considered to be a part of the environment. Our main theorem establishes that (i) learning in the filtered MDP is safe categorically, (ii) standard RL convergence carries over to the filtered MDP, and (iii) any policy that is optimal in the filtered MDP, when executed through the same filter, achieves the same asymptotic return as the best safe policy in the SC-MDP, yielding a complete separation between safety enforcement and performance optimization. We validate the theory on Safety Gymnasium with representative tasks and constraints, observing zero violations during training and final performance matching or exceeding unfiltered baselines. Together, these results shed light on a long-standing question in safety-filtered learning and provide a simple, principled recipe for safe RL: train and deploy RL policies with the most permissive safety filter that is available.
comment: 17 pages, 3 figures
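The idea of treating the safety filter as part of the environment (the "filtered MDP" in the abstract above) can be illustrated with a minimal sketch. Everything here is a hypothetical toy, not the paper's setup: `LineEnv`, `safety_filter`, `FilteredEnv`, and the six-state line world are assumptions chosen only to show a filter that overrides an action solely when it would enter the failure state.

```python
import random

FAILURE_STATE = 0   # catastrophic state that must never be entered (toy assumption)
N_STATES = 6        # states 0..5 on a line (toy assumption)


class LineEnv:
    """Toy 1-D environment: the agent moves left (-1) or right (+1)."""

    def __init__(self, start=3):
        self.state = start

    def step(self, action):
        # Clamp the move to the valid state range.
        self.state = max(0, min(N_STATES - 1, self.state + action))
        reward = 1.0 if self.state == N_STATES - 1 else 0.0
        return self.state, reward


def safety_filter(state, action):
    """Permissive filter: override only actions that would enter the failure state."""
    if state + action <= FAILURE_STATE:
        return +1
    return action


class FilteredEnv:
    """The filter folded into the environment, so every action has a safe effect."""

    def __init__(self, env):
        self.env = env

    def step(self, action):
        return self.env.step(safety_filter(self.env.state, action))


def rollout(env, steps, seed=0):
    """Run a uniformly random policy and return the visited states."""
    rng = random.Random(seed)
    visited = []
    for _ in range(steps):
        state, _ = env.step(rng.choice([-1, 1]))
        visited.append(state)
    return visited
```

Any learner can then be trained against `FilteredEnv(LineEnv())` exactly as it would be against an unfiltered environment; the learner never needs to know that a filter exists.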
Prompt-to-Primal Teaching
This paper introduces Prompt-to-Primal (P2P) Teaching, an AI-integrated instructional approach that links prompt-driven exploration with first-principles reasoning, guided and moderated by the instructor within the classroom setting. In P2P teaching, student-generated AI prompts serve as entry points for inquiry and initial discussions in class, while the instructor guides learners to validate, challenge, and reconstruct AI responses through fundamental physical and mathematical laws. The approach encourages self-reflective development, critical evaluation of AI outputs, and conceptual foundational knowledge of the core engineering principles. A large language model (LLM) can be a highly effective tool for those who already possess foundational knowledge of a subject; however, it may also mislead students who lack sufficient background in the subject matter. Results from two student cohorts across different semesters suggest the pedagogical effectiveness of the P2P teaching framework in enhancing both AI literacy and engineering reasoning.
comment: 9 pages, 5 figures
An Exact Quantile-Energy Equality for Terminal Halfspaces in Linear-Gaussian Control with a Discrete-Time Companion, KL/Schrodinger Links, and High-Precision Validation
We prove an exact equality between the minimal quadratic control energy and the squared normal-quantile gap for terminal halfspaces in linear-Gaussian systems with additive control and quadratic effort $E(u) = \tfrac12\!\int u^\top M u\,dt$ where $M = B^\top\Sigma^{-1}B$. For terminal halfspace events, the minimal energy equals the squared normal-quantile gap divided by twice a controllability-to-noise ratio $R_T^2(w)=(w^\top W_c^M w)/(w^\top V_T w)$ and is attained by a matched-filter control. We provide an exact zero-order-hold discrete-time companion via block exponentials, relate the result to minimum-energy control, Gaussian isoperimetry, risk-sensitive/KL control, and Schrodinger bridges, and validate to high precision with Monte Carlo. We state assumptions, singular-$M$ handling, and edge cases. The statement is a compact synthesis and design-ready translator, not a universal principle. Novelty: while the ingredients (Gramians, Cauchy-Schwarz, Gaussian isoperimetry) are classical, to our knowledge the explicit quantile-energy equality with a constructive matched-filter achiever for terminal halfspaces, and its discrete-time companion, are not recorded together in the cited literature.
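As a sanity check of the stated equality, the sketch below works through the simplest case we could construct: a scalar, drift-free model dx = u dt + sigma dW with terminal event {x_T >= c}. This toy model, the variable names, and the claim that the controllability-to-noise ratio equals one here are our assumptions, not taken from the paper. In this case a constant control is the minimum-energy way to achieve a given mean shift (by Cauchy-Schwarz), and the energy reduces to half the squared normal-quantile gap.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def quantile_gap(p0, p1):
    """Normal-quantile gap between the baseline and target event probabilities."""
    return STD_NORMAL.inv_cdf(p1) - STD_NORMAL.inv_cdf(p0)


def scalar_min_energy(p0, p1):
    """Half the squared quantile gap (toy scalar case, ratio assumed equal to 1)."""
    return 0.5 * quantile_gap(p0, p1) ** 2


def achieved_probability(p0, u, sigma, T):
    """P(x_T >= c) under constant control u, given uncontrolled probability p0."""
    z0 = STD_NORMAL.inv_cdf(p0)
    shift = u * T / (sigma * sqrt(T))  # mean shift in units of terminal noise std
    return STD_NORMAL.cdf(z0 + shift)


# Illustrative numbers (assumptions): noise level, horizon, baseline and target probability.
sigma, T, p0, p1 = 0.7, 2.0, 0.10, 0.90
u_const = quantile_gap(p0, p1) * sigma / sqrt(T)   # constant control for the required shift
energy = 0.5 * (u_const ** 2) * T / sigma ** 2     # (1/2) * integral of u^2 / sigma^2 dt
```

With these definitions, `energy` agrees with `scalar_min_energy(p0, p1)` and the constant control reaches the target probability `p1`, independent of the chosen `sigma` and `T`.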
Model-Free Dynamic Consensus in Multi-Agent Systems: A Q-Function Perspective
This paper presents a new method for achieving dynamic consensus in linear discrete-time homogeneous multi-agent systems (MAS) with marginally stable or unstable dynamics. The guarantee of consensus in this setting involves a set of constraints based on the graph's spectral properties, complicating the design of the coupling gains. This challenge intensifies for large-scale systems with diverse graph Laplacian spectra. The proposed approach reformulates the dynamic consensus problem with a prescribed convergence rate using a state-action value function framework inspired by optimal control theory. Specifically, a synthetic linear quadratic regulation (LQR) formulation is introduced to encode the consensus objective, enabling its translation into a convex semidefinite programming (SDP) problem. The resulting SDP is applicable in both model-based and model-free settings for jointly designing the local feedback and coupling gains. To handle the inherent non-convex feasibility conditions, a convex-concave decomposition strategy is employed. Adaptation of the method in a completely model-free set-up eliminates the need for system identification or knowledge of the agents' dynamics. Instead, it relies on input-state data collection and offers an entirely data-driven equivalent SDP formulation. Finally, a new algorithm balancing feasibility, convergence rate, robustness, and energy efficiency, is established to provide design flexibility. Numerical simulations validate the method's effectiveness in various scenarios.
Direct data-driven interpolation and approximation of linear parameter-varying system trajectories
We consider the problem of estimating missing values in trajectories of linear parameter-varying (LPV) systems. We solve this interpolation problem for the class of shifted-affine LPV systems. Conditions for the existence and uniqueness of solutions are given and a direct data-driven algorithm for its computation is presented, i.e., the data-generating system is not given by a parametric model but is implicitly specified by data. We illustrate the applicability of the proposed solution on illustrative examples of a mass-spring-damper system with exogenous and endogenous parameter variation.
comment: 10 pages, 5 figures, submitted for review
Sparse Identification of Nonlinear Dynamics Enhanced by Ensemble Learning, Multi-Step Prediction Evaluation, Elite Strategy, and Classification Techniques for Applications to Industrial Systems
This paper proposes a sparse identification of nonlinear dynamics (SINDy) with control and exogenous inputs for highly accurate and reliable prediction. Although SINDy is recognized as a remarkable approach for identifying nonlinear systems, several challenges remain. Its application to industrial systems remains limited, and multi-step predictions are not guaranteed due to overfitting and noisy data. This phenomenon is often caused by the increase in basis functions resulting from the extension of coordinates, such as time-delay embedding. To address these problems, this study proposes an emphasized SINDy framework by integrating ensemble-learning, multi-step prediction evaluations, elite strategy, and classification techniques (EMEC-SINDy), while preserving convex optimization. The proposed method employs library bagging and extracts elites with an R-squared greater than 90%. Then, clustering is performed on the surviving elites because physically motivated basis functions are not always available, and the elites obtained do not always have similar basis functions. After the classification, discrete model candidates are obtained by taking the mean of each classified elite. Finally, the best model is selected. Simulation results demonstrate that EMEC-SINDy significantly outperforms original SINDy approaches in multi-step prediction accuracy under noisy conditions, validating its applicability to the diesel engine airpath system, which is known as a complex and highly coupled nonlinear multi-input multi-output system.
Observer Design over Hypercomplex Quaternions
We develop observer design over hypercomplex quaternions in a characteristic-polynomial-free framework. Using the standard right-module convention, we derive a right observable companion form and its companion polynomial that encodes error dynamics via right-eigenvalue similarity classes. The design mirrors the real/complex case - coefficient updates in companion coordinates, followed by a similarity back - yet avoids determinants, characteristic/minimal polynomials, and Cayley-Hamilton identities that do not transfer to quaternions. We also give an Ackermann-type construction for the important case of closed-loop companion polynomials with real coefficients, ensuring similarity-equivariant evaluation. The results yield simple recipes for full-order observers directly over quaternions, clarify the role of right spectra and their similarity classes, and pinpoint when classical one-shot formulas remain valid. Numerical examples illustrate the method and advantages over vectorized or complex-adjoint surrogates.
comment: Submitted for presentation at the 24th European Control Conference (ECC 2026), Reykjavik, Iceland. This work was co-funded by the European Union under the project ROBOPROX (reg. no. CZ.02.01.01/00/22 008/0004590)
Optimal control of stochastic reaction networks with entropic control cost and emergence of mode-switching strategies
Controlling the stochastic dynamics of biological populations is a challenge that arises across various biological contexts. However, these dynamics are inherently nonlinear and involve a discrete state space, i.e., the number of molecules, cells, or organisms. Additionally, the possibility of extinction has a significant impact on both dynamics and control strategies, particularly when the population size is small. These factors hamper the direct application of conventional control theories to biological systems. To address these challenges, we formulate the optimal control problem for stochastic population dynamics by utilizing control cost functions based on the f-divergence, which naturally accounts for population-specific factors. If Kullback-Leibler (KL) divergence is adopted for the cost function, the complex nonlinear Hamilton-Jacobi-Bellman equation is simplified into a linear form, facilitating efficient computation of optimal solutions. We demonstrate the effectiveness of our approach by applying it to the control of interacting random walkers, Moran processes, and SIR models, and observe the mode-switching phenomena in the control strategies. Our approach provides new opportunities for applying control theory to a wide range of biological problems.
comment: 12 pages, 4 figures
Informativity Conditions for Multiple Signals: Properties, Experimental Design, and Applications
Recent studies highlight the importance of the persistently exciting condition in a single signal sequence for model identification and data-driven control methodologies. However, maintaining prolonged excitation in control signals introduces significant challenges, as continuous excitation can reduce the lifetime of mechanical devices. In this paper, we introduce three informativity conditions for various types of multi-signal data, each augmented by weight factors. We explore the interrelations between these conditions and their rank properties in linear time-invariant systems. Furthermore, we introduce open-loop experimental design methods tailored to each of the three conditions, which can synthesize the required excitation conditions either offline or online, even in the presence of limited information within each signal segment. We demonstrate the effectiveness of these informativity conditions in least-squares identification. Additionally, all three conditions can extend Willems' fundamental lemma and are utilized to assess the properties of the system. Illustrative examples confirm that these conditions yield satisfactory outcomes in both least-squares identification and the construction of data-driven controllers.
Accurate Small-Signal Modeling of Digitally Controlled Buck Converters with ADC-PWM Synchronization
Digital control has become increasingly widespread in modern power electronic converters. When acquiring feedback signals such as the inductor current, synchronizing the analog-to-digital converter (ADC) with the digital pulse-width modulator (DPWM) is commonly employed to accurately track their steady-state average. However, the small-signal implications of such synchronization have not been investigated. This paper presents an exact small-signal model for digitally controlled buck converters operating in forced continuous-conduction mode (FCCM) under constant-frequency current-mode control, explicitly accounting for DPWM-ADC synchronization. Using a sampled-data framework, the proposed model captures all sideband effects introduced by the sampling process, yielding precise predictions of both analog and digital loop gains, even at frequencies beyond the switching and sampling frequencies. Both asymmetrical and symmetrical carrier modulations are considered. Furthermore, the digital loop gain is derived in closed form using the modified z-transform, enabling low-complexity compensator design and stability assessment. Within this framework, the analog loop gain can be directly obtained from the digital loop gain, thereby eliminating the need for computationally intensive infinite series evaluations. The validity of the proposed model is confirmed through both simulation and experimental results.
Modeling the Impact of Communication and Human Uncertainties on Runway Capacity in Terminal Airspace
We investigate the potential impact of communication and human performance uncertainties on runway operations. Specifically, we consider these impacts within the context of an arrival scenario with two converging flows: a straight-in approach stream and a downwind stream merging into it. Both arrival streams are modeled using a modified Poisson distribution that incorporates the separation minima as well as the runway occupancy time. Various system-level uncertainties are addressed in this process, including communication link- and human-related uncertainties. In this research, we first build a Monte Carlo-based discrete-time simulation, where aircraft arrivals are generated by modified Poisson processes subject to minimum separation constraints, simulating various traffic operations. The merging logic incorporates standard bank angle continuous turn-to-final, pilot response delays, and dynamic gap availability in real time. Then, we investigate an automated final approach vectoring model (i.e., Auto-ATC), in which inverse optimal control is used to learn decision advisories from human expert records. By augmenting trajectories and incorporating the aforementioned uncertainties into the planning scenario, we create a setup analogous to the discrete event simulation. For both studies, runway capacity is measured by runway throughput, the fraction of downwind arrivals that merge immediately without holding, and the average delay (i.e., holding time/distance) experienced on the downwind leg. This research provides a method for runway capacity estimation in merging scenarios, and demonstrates that aeronautical communication link uncertainties significantly affect runway capacity in current voice-based operations, whereas the impact can be mitigated in autonomous operational settings.
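One simple way to generate arrivals from a Poisson-like process that respects a minimum separation is a shifted-exponential inter-arrival time. The sketch below uses that construction as an assumption; the abstract does not specify the paper's exact modification, and `generate_arrivals`, its parameters, and the numbers in the usage lines are hypothetical.

```python
import random


def generate_arrivals(rate_per_s, min_sep_s, horizon_s, seed=0):
    """Arrival times with gap = min_sep_s + Exponential(rate_per_s).

    The shift guarantees that consecutive arrivals are never closer
    than the minimum separation (assumed stand-in for the separation
    minima and runway occupancy time).
    """
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += min_sep_s + rng.expovariate(rate_per_s)
        if t > horizon_s:
            return times
        times.append(t)


def throughput_per_hour(times, horizon_s):
    """Observed arrivals per hour over the simulated horizon."""
    return 3600.0 * len(times) / horizon_s


# Illustrative numbers: one hour, 90 s minimum separation, mean extra gap of 60 s.
arrivals = generate_arrivals(rate_per_s=1 / 60, min_sep_s=90.0, horizon_s=3600.0)
rate = throughput_per_hour(arrivals, 3600.0)
```

Repeating the draw over many seeds gives a Monte Carlo estimate of throughput; delays from communication or pilot response could be added as extra random terms in the gap.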
Identification and Adaptive Control of Markov Jump Systems: Sample Complexity and Regret Bounds
Learning how to effectively control unknown dynamical systems is crucial for intelligent autonomous systems. This task becomes a significant challenge when the underlying dynamics are changing with time. Motivated by this challenge, this paper considers the problem of controlling an unknown Markov jump linear system (MJS) to optimize a quadratic objective. By taking a model-based perspective, we consider identification-based adaptive control of MJSs. We first provide a system identification algorithm for MJS to learn the dynamics in each mode as well as the Markov transition matrix, underlying the evolution of the mode switches, from a single trajectory of the system states, inputs, and modes. Through martingale-based arguments, sample complexity of this algorithm is shown to be $\mathcal{O}(1/\sqrt{T})$. We then propose an adaptive control scheme that performs system identification together with certainty equivalent control to adapt the controllers in an episodic fashion. Combining our sample complexity results with recent perturbation results for certainty equivalent control, we prove that when the episode lengths are appropriately chosen, the proposed adaptive control scheme achieves $\mathcal{O}(\sqrt{T})$ regret, which can be improved to $\mathcal{O}(polylog(T))$ with partial knowledge of the system. Our proof strategy introduces innovations to handle Markovian jumps and a weaker notion of stability common in MJSs. Our analysis provides insights into system theoretic quantities that affect learning accuracy and control performance. Numerical simulations are presented to further reinforce these insights.
comment: Improved results using Martingale-based arguments
Robotics
DINO-CVA: A Multimodal Goal-Conditioned Vision-to-Action Model for Autonomous Catheter Navigation
Cardiac catheterization remains a cornerstone of minimally invasive interventions, yet it continues to rely heavily on manual operation. Despite advances in robotic platforms, existing systems are predominantly follow-leader in nature, requiring continuous physician input and lacking intelligent autonomy. This dependency contributes to operator fatigue, increased radiation exposure, and variability in procedural outcomes. This work moves towards autonomous catheter navigation by introducing DINO-CVA, a multimodal goal-conditioned behavior cloning framework. The proposed model fuses visual observations and joystick kinematics into a joint embedding space, enabling policies that are both vision-aware and kinematic-aware. Actions are predicted autoregressively from expert demonstrations, with goal conditioning guiding navigation toward specified destinations. A robotic experimental setup with a synthetic vascular phantom was designed to collect multimodal datasets and evaluate performance. Results show that DINO-CVA achieves high accuracy in predicting actions, matching the performance of a kinematics-only baseline while additionally grounding predictions in the anatomical environment. These findings establish the feasibility of multimodal, goal-conditioned architectures for catheter navigation, representing an important step toward reducing operator dependency and improving the reliability of catheter-based therapies.
Safe Payload Transfer with Ship-Mounted Cranes: A Robust Model Predictive Control Approach
Ensuring safe real-time control of ship-mounted cranes in unstructured transportation environments requires handling multiple safety constraints while maintaining effective payload transfer performance. Unlike traditional crane systems, ship-mounted cranes are consistently subjected to significant external disturbances affecting underactuated crane dynamics due to the ship's dynamic motion response to harsh sea conditions, which can lead to robustness issues. To tackle these challenges, we propose a robust and safe model predictive control (MPC) framework and demonstrate it on a 5-DOF crane system, where a Stewart platform simulates the external disturbances that ocean surface motions would have on the supporting ship. The crane payload transfer operation must avoid obstacles and accurately place the payload within a designated target area. We use a robust zero-order control barrier function (R-ZOCBF)-based safety constraint in the nonlinear MPC to ensure safe payload positioning, while time-varying bounding boxes are utilized for collision avoidance. We introduce a new optimization-based online robustness parameter adaptation scheme to reduce the conservativeness of R-ZOCBFs. Experimental trials on a crane prototype demonstrate the overall performance of our safe control approach under significant perturbing motions of the crane base. While our focus is on crane-facilitated transfer, the methods more generally apply to safe robotically-assisted parts mating and parts insertion.
Design of an Affordable, Fully-Actuated Biomimetic Hand for Dexterous Teleoperation Systems IROS2025
This paper addresses the scarcity of affordable, fully-actuated five-fingered hands for dexterous teleoperation, which is crucial for collecting large-scale real-robot data within the "Learning from Demonstrations" paradigm. We introduce the prototype version of the RAPID Hand, the first low-cost, 20-degree-of-actuation (DoA) dexterous hand that integrates a novel anthropomorphic actuation and transmission scheme with an optimized motor layout and structural design to enhance dexterity. Specifically, the RAPID Hand features a universal phalangeal transmission scheme for the non-thumb fingers and an omnidirectional thumb actuation mechanism. Prioritizing affordability, the hand employs 3D-printed parts combined with custom gears for easier replacement and repair. We assess the RAPID Hand's performance through quantitative metrics and qualitative testing in a dexterous teleoperation system, which is evaluated on three challenging tasks: multi-finger retrieval, ladle handling, and human-like piano playing. The results indicate that the RAPID Hand's fully actuated 20-DoF design holds significant promise for dexterous teleoperation.
comment: Accepted by IROS2025
C-Free-Uniform: A Map-Conditioned Trajectory Sampler for Model Predictive Path Integral Control ICRA
Trajectory sampling is a key component of sampling-based control mechanisms. Trajectory samplers rely on control input samplers, which generate control inputs u from a distribution p(u | x) where x is the current state. We introduce the notion of Free Configuration Space Uniformity (C-Free-Uniform for short) which has two key features: (i) it generates a control input distribution so as to uniformly sample the free configuration space, and (ii) in contrast to previously introduced trajectory sampling mechanisms where the distribution p(u | x) is independent of the environment, C-Free-Uniform is explicitly conditioned on the current local map. Next, we integrate this sampler into a new Model Predictive Path Integral (MPPI) Controller, CFU-MPPI. Experiments show that CFU-MPPI outperforms existing methods in terms of success rate in challenging navigation tasks in cluttered polygonal environments while requiring a much smaller sampling budget.
comment: Submitted to the 2026 IEEE International Conference on Robotics and Automation (ICRA). 8 pages, 4 figures
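For context on where a trajectory sampler plugs into MPPI, the following sketch shows the standard MPPI weighting step: sampled control sequences are averaged with softmax weights on their negative costs. It does not reproduce the C-Free-Uniform sampler; `mppi_weights`, `mppi_update`, and the temperature parameter name are illustrative assumptions, and the samples are passed in from whatever sampler is used.

```python
import math


def mppi_weights(costs, temperature):
    """Softmax weights over trajectory costs; lower cost gives higher weight."""
    base = min(costs)  # subtract the minimum for numerical stability
    unnorm = [math.exp(-(c - base) / temperature) for c in costs]
    total = sum(unnorm)
    return [w / total for w in unnorm]


def mppi_update(control_samples, costs, temperature=1.0):
    """Cost-weighted average of sampled control sequences (scalar controls)."""
    weights = mppi_weights(costs, temperature)
    horizon = len(control_samples[0])
    return [
        sum(w * seq[t] for w, seq in zip(weights, control_samples))
        for t in range(horizon)
    ]


# Three hypothetical sampled sequences over a 2-step horizon, with their costs.
samples = [[1.0, 1.0], [0.0, 0.0], [-1.0, -1.0]]
costs = [0.5, 2.0, 8.0]
plan = mppi_update(samples, costs, temperature=1.0)
```

Because the update is only a weighted average of the samples, the quality of the plan depends directly on how well the sampler covers useful regions, which is the motivation for conditioning the sampler on the map.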
DiRAC - Distributed Robot Awareness and Consensus
DiRAC is a scalable, distributed framework designed to enable efficient task assignment and path planning in very large robotic swarms. It introduces a novel zone-partitioned architecture with dynamically elected leaders and a tick-synchronized consensus protocol that yields strong consistency and deterministic outcomes. For path planning, DiRAC uses a novel algorithm, a force-based decentralized planner for real-time collision resolution. Validated within ROS 2 middleware through preliminary simulation, DiRAC demonstrates architectural scalability and modular efficiency in simulated warehouse environments, laying the groundwork for real-world deployment in large-scale industrial and logistics domains.
An RGB-D Image Dataset for Lychee Detection and Maturity Classification for Robotic Harvesting
Lychee is a high-value subtropical fruit. The adoption of vision-based harvesting robots can significantly improve productivity while reducing reliance on labor. High-quality data are essential for developing such harvesting robots. However, there are currently no consistently and comprehensively annotated open-source lychee datasets featuring fruits in natural growing environments. To address this, we constructed a dataset to facilitate lychee detection and maturity classification. Color (RGB) images were acquired under diverse weather conditions, and at different times of the day, across multiple lychee varieties, such as Nuomici, Feizixiao, Heiye, and Huaizhi. The dataset encompasses three different ripeness stages and contains 11,414 images, consisting of 878 raw RGB images, 8,780 augmented RGB images, and 1,756 depth images. The images are annotated with 9,658 pairs of labels for lychee detection and maturity classification. To improve annotation consistency, three individuals independently labeled the data, and their results were then aggregated and verified by a fourth reviewer. Detailed statistical analyses were done to examine the dataset. Finally, we performed experiments using three representative deep learning models to evaluate the dataset. It is publicly available for academic
A Preliminary Exploration of the Differences and Conjunction of Traditional PNT and Brain-inspired PNT
Developing universal Positioning, Navigation, and Timing (PNT) is our enduring goal. Today's complex environments demand PNT that is more resilient, energy-efficient and cognitively capable. This paper asks how we can endow unmanned systems with brain-inspired spatial cognition navigation while exploiting the high precision of machine PNT to advance universal PNT. We provide a new perspective and roadmap for shifting PNT from "tool-oriented" to "cognition-driven". Contributions: (1) multi-level dissection of differences among traditional PNT, biological brain PNT and brain-inspired PNT; (2) a four-layer (observation-capability-decision-hardware) fusion framework that unites numerical precision and brain-inspired intelligence; (3) forward-looking recommendations for future development of brain-inspired PNT.
T3 Planner: A Self-Correcting LLM Framework for Robotic Motion Planning with Temporal Logic
Translating natural language instructions into executable motion plans is a fundamental challenge in robotics. Traditional approaches are typically constrained by their reliance on domain-specific expertise to customize planners, and often struggle with spatio-temporal couplings that usually lead to infeasible motions or discrepancies between task planning and motion execution. Despite the proficiency of Large Language Models (LLMs) in high-level semantic reasoning, hallucination could result in infeasible motion plans. In this paper, we introduce the T3 Planner, an LLM-enabled robotic motion planning framework that self-corrects its output with formal methods. The framework decomposes spatio-temporal task constraints via three cascaded modules, each of which stimulates an LLM to generate candidate trajectory sequences and examines their feasibility via a Signal Temporal Logic (STL) verifier until one that satisfies complex spatial, temporal, and logical constraints is found. Experiments across different scenarios show that T3 Planner significantly outperforms the baselines. The required reasoning can be distilled into a lightweight Qwen3-4B model that enables efficient deployment. All supplementary materials are accessible at https://github.com/leeejia/T3_Planner.
End-to-end Listen, Look, Speak and Act
Human interaction is inherently multimodal and full-duplex: we listen while watching, speak while acting, and fluidly adapt to turn-taking and interruptions. Realizing these capabilities is essential for building models simulating humans. We present ELLSA (End-to-end Listen, Look, Speak and Act), which, to our knowledge, is the first full-duplex, end-to-end model that simultaneously perceives and generates across vision, text, speech, and action within a single architecture, enabling interaction patterns previously out of reach, yielding more natural, human-like behaviors. At its core is a novel SA-MoE architecture (Self-Attention Mixture-of-Experts) that routes each modality to specialized experts and fuses them through a unified attention backbone. This provides a generalizable solution for joint multimodal perception and concurrent generation, leveraging strong pre-trained components while enabling efficient modality integration and mitigating modality interference. On speech-interaction and robot-manipulation benchmarks, ELLSA matches modality-specific baselines, while uniquely supporting advanced multimodal and full-duplex behaviors such as dialogue and action turn-taking, defective instruction rejection, speaking-while-acting, context-grounded visual question answering, and action barge-ins. We contend that ELLSA represents a step toward more natural and general interactive intelligence, contributing to the broader pursuit of artificial general intelligence. All data, code and model checkpoints will be released upon acceptance.
comment: 22 pages, 8 figures
Adaptive Invariant Extended Kalman Filter for Legged Robot State Estimation IROS
State estimation is crucial for legged robots as it directly affects control performance and locomotion stability. In this paper, we propose an Adaptive Invariant Extended Kalman Filter to improve proprioceptive state estimation for legged robots. The proposed method adaptively adjusts the noise level of the contact foot model based on online covariance estimation, leading to improved state estimation under varying contact conditions. It effectively handles small slips that traditional slip rejection fails to address, as overly sensitive slip rejection settings risk causing filter divergence. Our approach employs a contact detection algorithm instead of contact sensors, reducing the reliance on additional hardware. The proposed method is validated through real-world experiments on the quadruped robot LeoQuad, demonstrating enhanced state estimation performance in dynamic locomotion scenarios.
comment: 6 pages, accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
Towards Active Excitation-Based Dynamic Inertia Identification in Satellites
This paper presents a comprehensive analysis of how excitation design influences the identification of the inertia properties of rigid nano- and micro-satellites. We simulate nonlinear attitude dynamics with reaction-wheel coupling, actuator limits, and external disturbances, and excite the system using eight torque profiles of varying spectral richness. Two estimators are compared, a batch Least Squares method and an Extended Kalman Filter, across three satellite configurations and time-varying inertia scenarios. Results show that excitation frequency content and estimator assumptions jointly determine estimation accuracy and robustness, offering practical guidance for in-orbit adaptive inertia identification by outlining the conditions under which each method performs best. The code is provided as open-source.
First Responders' Perceptions of Semantic Information for Situational Awareness in Robot-Assisted Emergency Response
This study investigates First Responders' (FRs) attitudes toward the use of semantic information and Situational Awareness (SA) in robotic systems during emergency operations. A structured questionnaire was administered to 22 FRs across eight countries, capturing their demographic profiles, general attitudes toward robots, and experiences with semantics-enhanced SA. Results show that most FRs expressed positive attitudes toward robots, and rated the usefulness of semantic information for building SA at an average of 3.6 out of 5. Semantic information was also valued for its role in predicting unforeseen emergencies (mean 3.9). Participants reported requiring an average of 74.6% accuracy to trust semantic outputs and 67.8% for them to be considered useful, revealing a willingness to use imperfect but informative AI support tools. To the best of our knowledge, this study offers novel insights by being one of the first to directly survey FRs on semantic-based SA in a cross-national context. It reveals the types of semantic information most valued in the field, such as object identity, spatial relationships, and risk context, and connects these preferences to the respondents' roles, experience, and education levels. The findings also expose a critical gap between lab-based robotics capabilities and the realities of field deployment, highlighting the need for more meaningful collaboration between FRs and robotics researchers. These insights contribute to the development of more user-aligned and situationally aware robotic systems for emergency response.
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
Reinforcement learning (RL), while powerful and expressive, can often prioritize performance at the expense of safety. Yet safety violations can lead to catastrophic outcomes in real-world deployments. Control Barrier Functions (CBFs) offer a principled method to enforce dynamic safety -- traditionally deployed online via safety filters. While the result is safe behavior, the fact that the RL policy does not have knowledge of the CBF can lead to conservative behaviors. This paper proposes CBF-RL, a framework for generating safe behaviors with RL by enforcing CBFs in training. CBF-RL has two key attributes: (1) minimally modifying a nominal RL policy to encode safety constraints via a CBF term, and (2) safety filtering of the policy rollouts in training. Theoretically, we prove that continuous-time safety filters can be deployed via closed-form expressions on discrete-time roll-outs. Practically, we demonstrate that CBF-RL internalizes the safety constraints in the learned policy -- both enforcing safer actions and biasing towards safer rewards -- enabling safe deployment without the need for an online safety filter. We validate our framework through ablation studies on navigation tasks and on the Unitree G1 humanoid robot, where CBF-RL enables safer exploration, faster convergence, and robust performance under uncertainty, enabling the humanoid robot to avoid obstacles and climb stairs safely in real-world settings without a runtime safety filter.
comment: 8 pages
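As a rough illustration of the closed-form safety filtering that the CBF-RL abstract above refers to, the sketch below applies a textbook single-constraint CBF filter to one discrete rollout step. It is not the paper's implementation: the single-integrator dynamics, the circular obstacle, the barrier `h(x) = ||x - c||^2 - r^2`, the gain `alpha`, and the function name `cbf_filter` are all assumptions made for this example.

```python
import numpy as np

def cbf_filter(x, u_nom, center, radius, alpha=1.0):
    """Minimally modify u_nom so that grad_h(x) . u + alpha * h(x) >= 0.

    Assumes single-integrator dynamics x_dot = u and a circular obstacle
    (both are illustrative assumptions, not taken from the abstract).
    """
    x = np.asarray(x, dtype=float)
    u_nom = np.asarray(u_nom, dtype=float)
    diff = x - np.asarray(center, dtype=float)

    h = diff @ diff - radius ** 2      # barrier value, >= 0 means safe
    a = 2.0 * diff                     # gradient of h with respect to x
    residual = a @ u_nom + alpha * h   # CBF condition evaluated at u_nom

    if residual >= 0.0:
        return u_nom                   # nominal action already satisfies the condition
    if a @ a == 0.0:
        raise ValueError("gradient vanishes at the obstacle center")
    # Closed-form solution of min ||u - u_nom||^2 s.t. a . u + alpha * h >= 0
    return u_nom - (residual / (a @ a)) * a
```

For a state at `[2, 0]` heading straight at a unit obstacle at the origin with `u_nom = [-5, 0]`, the filter scales the approach speed back so that the CBF condition holds with equality, while a command pointing away from the obstacle passes through unchanged.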
Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion
Robust humanoid locomotion in unstructured environments requires architectures that balance fast low-level stabilization with slower perceptual decision-making. We show that a simple layered control architecture (LCA), a proprioceptive stabilizer running at high rate, coupled with a compact low-rate perceptual policy, enables substantially more robust performance than monolithic end-to-end designs, even when using minimal perception encoders. Through a two-stage training curriculum (blind stabilizer pretraining followed by perceptual fine-tuning), we demonstrate that layered policies consistently outperform one-stage alternatives in both simulation and hardware. On a Unitree G1 humanoid, our approach succeeds across stair and ledge tasks where one-stage perceptual policies fail. These results highlight that architectural separation of timescales, rather than network scale or complexity, is the key enabler for robust perception-conditioned locomotion.
comment: 8 pages
Online automatic code generation for robot swarms: LLMs and self-organizing hierarchy IROS 2025
Our recently introduced self-organizing nervous system (SoNS) provides robot swarms with 1) ease of behavior design and 2) global estimation of the swarm configuration and its collective environment, facilitating the implementation of online automatic code generation for robot swarms. In a demonstration with 6 real robots and simulation trials with >30 robots, we show that when a SoNS-enhanced robot swarm gets stuck, it can automatically solicit and run code generated by an external LLM on the fly, completing its mission with an 85% success rate.
comment: This abstract was accepted to and presented at the "Multi-Agent Cooperative Systems and Swarm Robotics in the Era of Generative AI" (MACRAI) workshop at the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play NeurIPS 2025
Robot sports, characterized by well-defined objectives, explicit rules, and dynamic interactions, present ideal scenarios for demonstrating embodied intelligence. In this paper, we present VolleyBots, a novel robot sports testbed where multiple drones cooperate and compete in the sport of volleyball under physical dynamics. VolleyBots integrates three features within a unified platform: competitive and cooperative gameplay, turn-based interaction structure, and agile 3D maneuvering. These intertwined features yield a complex problem combining motion control and strategic play, with no available expert demonstrations. We provide a comprehensive suite of tasks ranging from single-drone drills to multi-drone cooperative and competitive tasks, accompanied by baseline evaluations of representative reinforcement learning (RL), multi-agent reinforcement learning (MARL) and game-theoretic algorithms. Simulation results show that on-policy RL methods outperform off-policy methods in single-agent tasks, but both approaches struggle in complex tasks that combine motion control and strategic play. We additionally design a hierarchical policy which achieves 69.5% win rate against the strongest baseline in the 3 vs 3 task, demonstrating its potential for tackling the complex interplay between low-level control and high-level strategy. To highlight VolleyBots' sim-to-real potential, we further demonstrate the zero-shot deployment of a policy trained entirely in simulation on real-world drones.
comment: Accepted by NeurIPS 2025
Learn2Decompose: Learning Problem Decomposition for Efficient Sequential Multi-object Manipulation Planning
We present a Reactive Task and Motion Planning (TAMP) approach for efficient sequential multi-object manipulation in dynamic environments. Conventional TAMP solvers experience an exponential increase in planning time as the planning horizon and number of objects grow, limiting their applicability in real-world scenarios. To address this, we propose learning problem decomposition from demonstrations to accelerate TAMP solvers. Our approach consists of three key components: goal decomposition learning, temporal distance learning, and object reduction. Goal decomposition identifies the necessary sequences of states that the system must pass through before reaching the final goal, treating them as subgoal sequences. Temporal distance learning predicts the temporal distance between two states, enabling the system to identify the closest subgoal from a disturbed state. Object reduction minimizes the set of active objects considered during replanning, further improving efficiency. We evaluate our approach on three benchmarks, demonstrating its effectiveness in improving replanning efficiency for sequential multi-object manipulation tasks in dynamic environments.
LIPM-Guided Reinforcement Learning for Stable and Perceptive Locomotion in Bipedal Robots
Achieving stable and robust perceptive locomotion for bipedal robots in unstructured outdoor environments remains a critical challenge due to complex terrain geometry and susceptibility to external disturbances. In this work, we propose a novel reward design inspired by the Linear Inverted Pendulum Model (LIPM) to enable perceptive and stable locomotion in the wild. The LIPM provides theoretical guidance for dynamic balance by regulating the center of mass (CoM) height and the torso orientation. These are key factors for terrain-aware locomotion, as they help ensure a stable viewpoint for the robot's camera. Building on this insight, we design a reward function that promotes balance and dynamic stability while encouraging accurate CoM trajectory tracking. To adaptively trade off between velocity tracking and stability, we leverage the Reward Fusion Module (RFM) approach that prioritizes stability when needed. A double-critic architecture is adopted to separately evaluate stability and locomotion objectives, improving training efficiency and robustness. We validate our approach through extensive experiments on a bipedal robot in both simulation and real-world outdoor environments. The results demonstrate superior terrain adaptability, disturbance rejection, and consistent performance across a wide range of speeds and perceptual conditions.
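For readers unfamiliar with the Linear Inverted Pendulum Model mentioned above, the following sketch evaluates the standard LIPM closed-form solution for a constant CoM height. It only illustrates the model itself, not the reward design from the abstract; the function name `lipm_state`, the 1D setup with the pivot at the origin, and the numeric values in the usage note are assumptions.

```python
import math

def lipm_state(x0, v0, t, com_height, g=9.81):
    """Closed-form LIPM solution of x_ddot = (g / com_height) * x.

    x0, v0: initial horizontal CoM position and velocity relative to the pivot.
    Returns the CoM position and velocity after t seconds.
    """
    omega = math.sqrt(g / com_height)  # natural frequency of the pendulum
    c = math.cosh(omega * t)
    s = math.sinh(omega * t)
    x = x0 * c + (v0 / omega) * s
    v = x0 * omega * s + v0 * c
    return x, v
```

With an initial velocity of `-omega * x0` the divergent component `x + v / omega` is zero, so the CoM decays toward the pivot as `x0 * exp(-omega * t)`; any other initial velocity grows exponentially, which is why CoM height and velocity matter for balance.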
Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization
The generalization capabilities of vision-language-action (VLA) models to unseen tasks are crucial to achieving general-purpose robotic manipulation in open-world settings. However, the cross-task generalization capabilities of existing VLA models remain significantly underexplored. To address this gap, we introduce AGNOSTOS, a novel simulation benchmark designed to rigorously evaluate cross-task zero-shot generalization in manipulation. AGNOSTOS comprises 23 unseen manipulation tasks for testing, distinct from common training task distributions, and incorporates two levels of generalization difficulty to assess robustness. Our systematic evaluation reveals that current VLA models, despite being trained on diverse datasets, struggle to generalize effectively to these unseen tasks. To overcome this limitation, we propose Cross-Task In-Context Manipulation (X-ICM), a method that conditions large language models (LLMs) on in-context demonstrations from seen tasks to predict action sequences for unseen tasks. Additionally, we introduce a dynamics-guided sample selection strategy that identifies relevant demonstrations by capturing cross-task dynamics. On AGNOSTOS, X-ICM significantly improves cross-task zero-shot generalization performance over leading VLAs. We believe AGNOSTOS and X-ICM will serve as valuable tools for advancing general-purpose robotic manipulation.
comment: Project Page: https://jiaming-zhou.github.io/AGNOSTOS
AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions
The integration of vision-language models (VLMs) is driving a new generation of embodied agents capable of operating in human-centered environments. However, as deployment expands, these systems face growing safety risks, particularly when executing hazardous instructions. Current safety evaluation benchmarks remain limited: they cover only narrow scopes of hazards and focus primarily on final outcomes, neglecting the agent's full perception-planning-execution process and thereby obscuring critical failure modes. Therefore, we present SAFE, a benchmark for systematically assessing the safety of embodied VLM agents on hazardous instructions. SAFE comprises three components: SAFE-THOR, an extensible adversarial simulation sandbox with a universal adapter that maps high-level VLM outputs to low-level embodied controls, supporting diverse agent workflow integration; SAFE-VERSE, a risk-aware task suite inspired by Asimov's Three Laws of Robotics, comprising 45 adversarial scenarios, 1,350 hazardous tasks, and 9,900 instructions that span risks to humans, environments, and agents; and SAFE-DIAGNOSE, a multi-level and fine-grained evaluation protocol measuring agent performance across perception, planning, and execution. Applying SAFE to nine state-of-the-art VLMs and two embodied agent workflows, we uncover systematic failures in translating hazard recognition into safe planning and execution. Our findings reveal fundamental limitations in current safety alignment and demonstrate the necessity of a comprehensive, multi-stage evaluation for developing safer embodied intelligence.
ETA-IK: Execution-Time-Aware Inverse Kinematics for Dual-Arm Systems
This paper presents ETA-IK, a novel Execution-Time-Aware Inverse Kinematics method tailored for dual-arm robotic systems. The primary goal is to optimize motion execution time by leveraging the redundancy of both arms, specifically in tasks where only the relative pose of the robots is constrained, such as dual-arm scanning of unknown objects. Unlike traditional inverse kinematics methods that use surrogate metrics such as joint configuration distance, our method incorporates direct motion execution time and implicit collisions into the optimization process, thereby finding target joints that allow subsequent trajectory generation to produce more efficient and collision-free motion. A neural network based execution time approximator is employed to predict time-efficient joint configurations while accounting for potential collisions. Through experimental evaluation on a system composed of a UR5 and a KUKA iiwa robot, we demonstrate significant reductions in execution time. The proposed method outperforms conventional approaches, showing improved motion efficiency without sacrificing positioning accuracy. These results highlight the potential of ETA-IK to improve the performance of dual-arm systems in applications where efficiency and safety are paramount.
MoRe-ERL: Learning Motion Residuals using Episodic Reinforcement Learning
We propose MoRe-ERL, a framework that combines Episodic Reinforcement Learning (ERL) and residual learning, which refines preplanned reference trajectories into safe, feasible, and efficient task-specific trajectories. This framework is general enough to incorporate into arbitrary ERL methods and motion generators seamlessly. MoRe-ERL identifies trajectory segments requiring modification while preserving critical task-related maneuvers. Then it generates smooth residual adjustments using B-Spline-based movement primitives to ensure adaptability to dynamic task contexts and smoothness in trajectory refinement. Experimental results demonstrate that residual learning significantly outperforms training from scratch using ERL methods, achieving superior sample efficiency and task performance. Hardware evaluations further validate the framework, showing that policies trained in simulation can be directly deployed in real-world systems, exhibiting a minimal sim-to-real gap.
Safe Multi-Agent Reinforcement Learning for Behavior-Based Cooperative Navigation
In this paper, we address the problem of behavior-based cooperative navigation of mobile robots using safe multi-agent reinforcement learning (MARL). Our work is the first to focus on cooperative navigation without individual reference targets for the robots, using a single target for the formation's centroid. This eliminates the complexities involved in having several path planners to control a team of robots. To ensure safety, our MARL framework uses model predictive control (MPC) to prevent actions that could lead to collisions during training and execution. We demonstrate the effectiveness of our method in simulation and on real robots, achieving safe behavior-based cooperative navigation without using individual reference targets, with zero collisions, and faster target reaching compared to baselines. Finally, we study the impact of MPC safety filters on the learning process, revealing faster convergence during training, and we show that our approach can be safely deployed on real robots, even during early stages of training.
Multiagent Systems
ReclAIm: A multi-agent framework for degradation-aware performance tuning of medical imaging AI
Ensuring the long-term reliability of AI models in clinical practice requires continuous performance monitoring and corrective actions when degradation occurs. Addressing this need, this manuscript presents ReclAIm, a multi-agent framework capable of autonomously monitoring, evaluating, and fine-tuning medical image classification models. The system, built on a large language model core, operates entirely through natural language interaction, eliminating the need for programming expertise. ReclAIm successfully trains, evaluates, and maintains consistent performance of models across MRI, CT, and X-ray datasets. Once ReclAIm detects significant performance degradation, it autonomously executes state-of-the-art fine-tuning procedures that substantially reduce the performance gap. In cases with performance drops of up to -41.1% (MRI InceptionV3), ReclAIm managed to readjust performance metrics within 1.5% of the initial model results. ReclAIm enables automated, continuous maintenance of medical imaging AI models in a user-friendly and adaptable manner that facilitates broader adoption in both research and clinical environments.
comment: 25 pages, 4 figures
Lark: Biologically Inspired Neuroevolution for Multi-Stakeholder LLM Agents NeurIPS 2025
We present Lark, a biologically inspired decision-making framework that couples LLM-driven reasoning with an evolutionary, stakeholder-aware Multi-Agent System (MAS). To address verbosity and stakeholder trade-offs, we integrate four mechanisms: (i) plasticity, which applies concise adjustments to candidate solutions; (ii) duplication and maturation, which copy high-performing candidates and specialize them into new modules; (iii) ranked-choice stakeholder aggregation using influence-weighted Borda scoring; and (iv) compute awareness via token-based penalties that reward brevity. The system iteratively proposes diverse strategies, applies plasticity tweaks, simulates stakeholder evaluations, aggregates preferences, selects top candidates, and performs duplication/maturation while factoring compute cost into final scores. In a controlled evaluation over 30 rounds comparing 14 systems, Lark Full achieves a mean rank of 2.55 (95% CI [2.17, 2.93]) and a mean composite score of 29.4/50 (95% CI [26.34, 32.46]), finishing Top-3 in 80% of rounds while remaining cost competitive with leading commercial models ($0.016 per task). Paired Wilcoxon tests confirm that all four mechanisms contribute significantly: ablating duplication/maturation yields the largest deficit (ΔScore = 3.5, Cohen's d_z = 2.53, p < 0.001), followed by plasticity (ΔScore = 3.4, d_z = 1.86), ranked-choice voting (ΔScore = 2.4, d_z = 1.20), and token penalties (ΔScore = 2.2, d_z = 1.63). Rather than a formal Markov Decision Process with constrained optimization, Lark is a practical, compute-aware neuroevolutionary loop that scales stakeholder-aligned strategy generation and makes trade-offs transparent through per-step metrics. Our work presents proof-of-concept findings and invites community feedback as we expand toward real-world validation studies.
comment: 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: NeurIPS 2025 Workshop on Efficient Reasoning
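The influence-weighted Borda scoring named in the Lark abstract above can be sketched generically as follows. This is a plain Borda count scaled by a per-stakeholder weight, not Lark's actual aggregation code; the function name `weighted_borda`, the data layout, and the stakeholder and candidate names in the usage note are hypothetical.

```python
def weighted_borda(rankings, influence):
    """Aggregate ranked-choice ballots with influence-weighted Borda points.

    rankings:  dict mapping stakeholder -> list of candidates, best first.
    influence: dict mapping stakeholder -> non-negative weight.
    Returns a dict mapping candidate -> total weighted Borda score.
    """
    scores = {}
    for stakeholder, ranking in rankings.items():
        weight = influence[stakeholder]
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            # Borda rule: n - 1 points for first place, 0 for last place.
            points = n - 1 - position
            scores[candidate] = scores.get(candidate, 0.0) + weight * points
    return scores
```

With ballots `{"ops": ["x", "y", "z"], "finance": ["y", "z", "x"]}` and influence `{"ops": 3.0, "finance": 1.0}`, candidate `x` collects 6 points, `y` collects 5, and `z` collects 1, so the more influential stakeholder's first choice wins even though the other stakeholder ranked it last.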
DiRAC - Distributed Robot Awareness and Consensus
DiRAC is a scalable, distributed framework designed to enable efficient task assignment and path planning in very large robotic swarms. It introduces a novel zone-partitioned architecture with dynamically elected leaders and a tick-synchronized consensus protocol that yields strong consistency and deterministic outcomes. For path planning, DiRAC uses a novel algorithm, a force-based decentralized planner for real-time collision resolution. Validated within ROS 2 middleware through preliminary simulation, DiRAC demonstrates architectural scalability and modular efficiency in simulated warehouse environments, laying the groundwork for real-world deployment in large-scale industrial and logistics domains.
Surrogate Modeling and Explainable Artificial Intelligence for Complex Systems: A Workflow for Automated Simulation Exploration
Complex systems are increasingly explored through simulation-driven engineering workflows that combine physics-based and empirical models with optimization and analytics. Despite their power, these workflows face two central obstacles: (1) high computational cost, since accurate exploration requires many expensive simulator runs; and (2) limited transparency and reliability when decisions rely on opaque black-box components. We propose a workflow that addresses both challenges by training lightweight emulators on compact designs of experiments that (i) provide fast, low-latency approximations of expensive simulators, (ii) enable rigorous uncertainty quantification, and (iii) are adapted for global and local Explainable Artificial Intelligence (XAI) analyses. This workflow unifies every simulation-based complex-system analysis tool, ranging from engineering design to agent-based models for socio-environmental understanding. In this paper, we propose a comparative methodology and practical recommendations for using surrogate-based explainability tools within the proposed workflow. The methodology supports continuous and categorical inputs, combines global-effect and uncertainty analyses with local attribution, and evaluates the consistency of explanations across surrogate models, thereby diagnosing surrogate adequacy and guiding further data collection or model refinement. We demonstrate the approach on two contrasting case studies: a multidisciplinary design analysis of a hybrid-electric aircraft and an agent-based model of urban segregation. Results show that the surrogate model and XAI coupling enables large-scale exploration in seconds, uncovers nonlinear interactions and emergent behaviors, identifies key design and policy levers, and signals regions where surrogates require more data or alternative architectures.
TACLA: An LLM-Based Multi-Agent Tool for Transactional Analysis Training in Education ICTAI 2025
Simulating nuanced human social dynamics with Large Language Models (LLMs) remains a significant challenge, particularly in achieving psychological depth and consistent persona behavior crucial for high-fidelity training tools. This paper introduces TACLA (Transactional Analysis Contextual LLM-based Agents), a novel Multi-Agent architecture designed to overcome these limitations. TACLA integrates core principles of Transactional Analysis (TA) by modeling agents as an orchestrated system of distinct Parent, Adult, and Child ego states, each with its own pattern memory. An Orchestrator Agent prioritizes ego state activation based on contextual triggers and an agent's life script, ensuring psychologically authentic responses. Validated in an educational scenario, TACLA demonstrates realistic ego state shifts in Student Agents, effectively modeling conflict de-escalation and escalation based on different teacher intervention strategies. Evaluation shows high conversational credibility and confirms TACLA's capacity to create dynamic, psychologically-grounded social simulations, advancing the development of effective AI tools for education and beyond.
comment: Accepted for publication in the proceedings of ICTAI 2025
A Vision for Access Control in LLM-based Agent Systems
The autonomy and contextual complexity of LLM-based agents render traditional access control (AC) mechanisms insufficient. Static, rule-based systems designed for predictable environments are fundamentally ill-equipped to manage the dynamic information flows inherent in agentic interactions. This position paper argues for a paradigm shift from binary access control to a more sophisticated model of information governance, positing that the core challenge is not merely about permission, but about governing the flow of information. We introduce Agent Access Control (AAC), a novel framework that reframes AC as a dynamic, context-aware process of information flow governance. AAC operates on two core modules: (1) multi-dimensional contextual evaluation, which assesses not just identity but also relationships, scenarios, and norms; and (2) adaptive response formulation, which moves beyond simple allow/deny decisions to shape information through redaction, summarization, and paraphrasing. This vision, powered by a dedicated AC reasoning engine, aims to bridge the gap between human-like nuanced judgment and scalable AI safety, proposing a new conceptual lens for future research in trustworthy agent design.
comment: 11 pages, 1 figure
Co-Alignment: Rethinking Alignment as Bidirectional Human-AI Cognitive Adaptation
Current AI alignment through RLHF follows a single-directional paradigm in which AI conforms to human preferences while human cognition is treated as fixed. We propose a shift to co-alignment through Bidirectional Cognitive Alignment (BiCA), where humans and AI mutually adapt. BiCA uses learnable protocols, representation mapping, and KL-budget constraints for controlled co-evolution. In collaborative navigation, BiCA achieved 85.5% success versus 70.3% baseline, with 230% better mutual adaptation and 332% better protocol convergence. Emergent protocols outperformed handcrafted ones by 84%, while bidirectional adaptation unexpectedly improved safety (+23% out-of-distribution robustness). The 46% synergy improvement demonstrates that optimal collaboration exists at the intersection, not union, of human and AI capabilities, validating the shift from single-directional to co-alignment paradigms.
Sequence Modeling for N-Agent Ad Hoc Teamwork
N-agent ad hoc teamwork (NAHT) is a newly introduced challenge in multi-agent reinforcement learning, where controlled subteams of varying sizes must dynamically collaborate with varying numbers and types of unknown teammates without pre-coordination. The existing learning algorithm (POAM) considers only independent learning for its flexibility in dealing with a changing number of agents. However, independent learning fails to fully capture the inter-agent dynamics essential for effective collaboration. Based on our observation that transformers deal effectively with sequences with varying lengths and have been shown to be highly effective for a variety of machine learning problems, this work introduces a centralized, transformer-based method for N-agent ad hoc teamwork. Our proposed approach incorporates historical observations and actions of all controlled agents, enabling optimal responses to diverse and unseen teammates in partially observable environments. Empirical evaluation on a StarCraft II task demonstrates that MAT-NAHT outperforms POAM, achieving superior sample efficiency and generalization, without auxiliary agent-modeling objectives.
comment: Presented at RLDM 2025
Smart Traffic Signals: Comparing MARL and Fixed-Time Strategies
Urban traffic congestion, particularly at intersections, significantly impacts travel time, fuel consumption, and emissions. Traditional fixed-time signal control systems often lack the adaptability to manage dynamic traffic patterns effectively. This study explores the application of multi-agent reinforcement learning (MARL) to optimize traffic signal coordination across multiple intersections within a simulated environment. Utilizing Pygame, a simulation was developed to model a network of interconnected intersections with randomly generated vehicle flows to reflect realistic traffic variability. A decentralized MARL controller was implemented, in which each traffic signal operates as an autonomous agent, making decisions based on local observations and information from neighboring agents. Performance was evaluated against a baseline fixed-time controller using metrics such as average vehicle wait time and overall throughput. The MARL approach demonstrated statistically significant improvements, including reduced average waiting times and improved throughput. These findings suggest that MARL-based dynamic control strategies hold substantial promise for improving urban traffic management efficiency. More research is recommended to address scalability and real-world implementation challenges.
Online automatic code generation for robot swarms: LLMs and self-organizing hierarchy IROS 2025
Our recently introduced self-organizing nervous system (SoNS) provides robot swarms with 1) ease of behavior design and 2) global estimation of the swarm configuration and its collective environment, facilitating the implementation of online automatic code generation for robot swarms. In a demonstration with 6 real robots and simulation trials with >30 robots, we show that when a SoNS-enhanced robot swarm gets stuck, it can automatically solicit and run code generated by an external LLM on the fly, completing its mission with an 85% success rate.
comment: This abstract was accepted to and presented at the "Multi-Agent Cooperative Systems and Swarm Robotics in the Era of Generative AI" (MACRAI) workshop at the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
Systems and Control (CS)
Safe Payload Transfer with Ship-Mounted Cranes: A Robust Model Predictive Control Approach
Ensuring safe real-time control of ship-mounted cranes in unstructured transportation environments requires handling multiple safety constraints while maintaining effective payload transfer performance. Unlike traditional crane systems, ship-mounted cranes are consistently subjected to significant external disturbances affecting underactuated crane dynamics due to the ship's dynamic motion response to harsh sea conditions, which can lead to robustness issues. To tackle these challenges, we propose a robust and safe model predictive control (MPC) framework and demonstrate it on a 5-DOF crane system, where a Stewart platform simulates the external disturbances that ocean surface motions would have on the supporting ship. The crane payload transfer operation must avoid obstacles and accurately place the payload within a designated target area. We use a robust zero-order control barrier function (R-ZOCBF)-based safety constraint in the nonlinear MPC to ensure safe payload positioning, while time-varying bounding boxes are utilized for collision avoidance. We introduce a new optimization-based online robustness parameter adaptation scheme to reduce the conservativeness of R-ZOCBFs. Experimental trials on a crane prototype demonstrate the overall performance of our safe control approach under significant perturbing motions of the crane base. While our focus is on crane-facilitated transfer, the methods more generally apply to safe robotically-assisted parts mating and parts insertion.
Ultra High Sensitivity Soil Moisture Detection Using Photonic Crystal Cavity with SIW Technology
Soil nutrients and water content are two crucial factors that significantly affect agricultural production yields. Hence, monitoring and measuring the water content and soil type are critical requirements. This study proposes a two-dimensional structure of photonic crystals centered around a symmetrical cross-shaped slot. The cross-slots act as resonators, and the photonic crystals surrounding the slots tune the resonance frequency of the resonators to enhance mode confinement within the resonator. The various resonant modes are located in the 2.1 GHz, 5.2 GHz, and 8.1 GHz bands, which correspond to the S band, C band, and X band, respectively. These bands are used to compare the absorption, whereas the upper resonant mode is of the order of 20 GHz. Band structure analysis was performed using the Plane Wave Method (PWM). The resonant frequency is computed using a 3D electromagnetic (EM) simulation software that utilizes the Finite Element Method (FEM) and lies in the radiation mode region of the band structure of the photonic crystal. Varying the incident angle had a negligible effect on the absorption characteristics of the sensor, allowing it to produce accurate sensing results regardless of the incident angle. This design maximizes the sensor's sensitivity, reaching 85.4% at the 2.1 GHz resonant frequency, much higher than the 50.6% obtained at 2.1 GHz with a single column of photonic crystal-based SIW, for which the frequency shift is on the order of GHz. In contrast, in the proposed design, the frequency shift is on the order of MHz, resulting in ultra-high sensitivity.
Adaptive Invariant Extended Kalman Filter for Legged Robot State Estimation IROS
State estimation is crucial for legged robots as it directly affects control performance and locomotion stability. In this paper, we propose an Adaptive Invariant Extended Kalman Filter to improve proprioceptive state estimation for legged robots. The proposed method adaptively adjusts the noise level of the contact foot model based on online covariance estimation, leading to improved state estimation under varying contact conditions. It effectively handles small slips that traditional slip rejection fails to address, as overly sensitive slip rejection settings risk causing filter divergence. Our approach employs a contact detection algorithm instead of contact sensors, reducing the reliance on additional hardware. The proposed method is validated through real-world experiments on the quadruped robot LeoQuad, demonstrating enhanced state estimation performance in dynamic locomotion scenarios.
comment: 6 pages, accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
A Control-Theoretic Approach to Dynamic Payment Routing for Success Rate Optimization
This paper introduces a control-theoretic framework for dynamic payment routing, implemented within JUSPAY's Payment Orchestrator to maximize transaction success rate. The routing system is modeled as a closed-loop feedback controller that continuously senses gateway performance, computes corrective actions, and dynamically routes transactions across gateways to ensure operational resilience. The system leverages concepts from control theory, reinforcement learning, and multi-armed bandit optimization to achieve both short-term responsiveness and long-term stability. Rather than relying on explicit PID regulation, the framework applies generalized feedback-based adaptation, ensuring that corrective actions remain proportional to observed performance deviations and that the computed gateway score gradually converges toward the success rate. This hybrid approach unifies control theory and adaptive decision systems, enabling self-regulating transaction routing that dampens instability and improves reliability. Live production results show an improvement of up to 1.15% in success rate over traditional rule-based routing, demonstrating the effectiveness of feedback-based control in payment systems.
comment: 7 Pages, 8 Figures
Local integral input-to-state stability for non-autonomous infinite-dimensional systems
In this paper, we prove comparison principles for nonlinear differential equations with time-varying coefficients and develop Lyapunov analytical tools for the integral input-to-state stability (iISS) analysis of nonlinear non-autonomous infinite-dimensional systems, which involve nonlinearities satisfying a superlinear growth, bringing difficulties to the iISS analysis. Specifically, our approach starts by establishing several forms of comparison principles for a wide range of ordinary differential equations having time-varying coefficients and superlinear terms, paving the way to conduct iISS assessment for general nonlinear non-autonomous infinite-dimensional systems within the Lyapunov stability framework. Then, by using the comparison principles, we prove a local iISS (LiISS) Lyapunov theorem for the nonlinear non-autonomous infinite-dimensional systems in the framework of Banach spaces. Furthermore, we provide sufficient conditions for the existence of a local iISS Lyapunov functional (LiISS-LF) and construct LiISS-LFs for the systems in the framework of Hilbert spaces. Finally, we present two examples to illustrate the proposed Lyapunov method for the LiISS analysis: the first shows how to obtain the LiISS of a nonlinear finite-dimensional system with time-varying coefficients and superlinear terms under a linear state feedback control law, while the second shows how to employ interpolation inequalities to handle superlinear terms and establish the LiISS-LF for a class of multi-dimensional parabolic equations with space-time-varying coefficients. To demonstrate the validity of the results, numerical experiments are also conducted to verify the LiISS of these two classes of systems.
Linear State Estimation in Presence of Bounded Uncertainties: A Comparative Analysis
A variety of algorithms have been proposed to address the power system state estimation problem in the presence of uncertainties in the data. However, less emphasis has been given to handling perturbations in the model. In the context of linear state estimation (LSE), which is the focus of this paper, perturbations in the model come from variations in the line parameters. Since the actual values of the line parameters can be different from the values stored in a power utility's database, we investigate three approaches in this paper to estimate the states in the presence of bounded uncertainties in the data and the model. The first approach is based on interval arithmetic, the second is based on convex optimization, and the third is based on generalized linear fractional programming. The three algorithms are applied to multiple IEEE test systems and compared in terms of their speed and accuracy. The results indicate that the first two algorithms are extremely fast and give expected results, while the third suffers from scalability issues and is unsuitable for LSE.
Geometric Control Theory Over Networks: Minimal Node Cardinality Disturbance Decoupling Problems
In this paper we show how to formulate and solve disturbance decoupling problems over networks while choosing a minimal number of input and output nodes. Feedback laws that isolate and eliminate the impact of disturbance nodes on specific target nodes to be protected are provided using state, output, and dynamical feedback. For that, we leverage the fact that when reformulated in terms of sets of nodes rather than subspaces, the controlled and conditional invariance properties admit a simple graphical interpretation. For state and dynamical feedback, the minimal input and output cardinality solutions can be computed exactly in polynomial time, via min-cut/max-flow algorithms.
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
Reinforcement learning (RL), while powerful and expressive, can often prioritize performance at the expense of safety. Yet safety violations can lead to catastrophic outcomes in real-world deployments. Control Barrier Functions (CBFs) offer a principled method to enforce dynamic safety -- traditionally deployed online via safety filters. While the result is safe behavior, the fact that the RL policy does not have knowledge of the CBF can lead to conservative behaviors. This paper proposes CBF-RL, a framework for generating safe behaviors with RL by enforcing CBFs in training. CBF-RL has two key attributes: (1) minimally modifying a nominal RL policy to encode safety constraints via a CBF term, and (2) safety filtering of the policy rollouts in training. Theoretically, we prove that continuous-time safety filters can be deployed via closed-form expressions on discrete-time rollouts. Practically, we demonstrate that CBF-RL internalizes the safety constraints in the learned policy -- both enforcing safer actions and biasing towards safer rewards -- enabling safe deployment without the need for an online safety filter. We validate our framework through ablation studies on navigation tasks and on the Unitree G1 humanoid robot, where CBF-RL enables safer exploration, faster convergence, and robust performance under uncertainty, enabling the humanoid robot to avoid obstacles and climb stairs safely in real-world settings without a runtime safety filter.
comment: 8 pages
Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion
Robust humanoid locomotion in unstructured environments requires architectures that balance fast low-level stabilization with slower perceptual decision-making. We show that a simple layered control architecture (LCA), a proprioceptive stabilizer running at high rate, coupled with a compact low-rate perceptual policy, enables substantially more robust performance than monolithic end-to-end designs, even when using minimal perception encoders. Through a two-stage training curriculum (blind stabilizer pretraining followed by perceptual fine-tuning), we demonstrate that layered policies consistently outperform one-stage alternatives in both simulation and hardware. On a Unitree G1 humanoid, our approach succeeds across stair and ledge tasks where one-stage perceptual policies fail. These results highlight that architectural separation of timescales, rather than network scale or complexity, is the key enabler for robust perception-conditioned locomotion.
comment: 8 pages
Subgradient Method for System Identification with Non-Smooth Objectives
This paper investigates a subgradient-based algorithm to solve the system identification problem for linear time-invariant systems with non-smooth objectives. This is essential for robust system identification in safety-critical applications. While existing work provides theoretical exact recovery guarantees using optimization solvers, the design of fast learning algorithms with convergence guarantees for practical use remains unexplored. We analyze the subgradient method in this setting, where the optimization problems to be solved evolve over time as new measurements are collected, and we establish linear convergence to the ground-truth system for both the best and Polyak step sizes after a burn-in period. We further characterize sublinear convergence of the iterates under constant and diminishing step sizes, which require only minimal information and thus offer broad applicability. Finally, we compare the time complexity of standard solvers with the subgradient algorithm and support our findings with experimental results. This is the first work to analyze subgradient algorithms for system identification with non-smooth objectives.
comment: 20 pages, 2 figures
Performance Analysis of Underwater Optical Wireless Communication Using O-RIS and Fiber Optic Backhaul (Extended version)
This Letter presents a novel hybrid underwater wireless optical communication (UWOC) system that integrates underwater optical access points (UOAPs) with a passive optical network (PON)-based fiber-optic backhaul to provide a resilient backbone. A hard switching mechanism is employed between direct and optical reconfigurable intelligent surface (O-RIS)-assisted links to ensure reliable connectivity. Unlike previous studies, the proposed system is evaluated under both active and multiple passive O-RIS configurations. To enhance reliability, the Selection Combining (SC) and Maximal Ratio Combining (MRC) schemes are applied. Analytical and simulation results demonstrate that optimal O-RIS placement significantly enhances system performance. However, in the linear regime, placing it too close to the receiver causes degradation due to increased path loss and beam jitter in an identical water type. Moreover, increasing the number of O-RIS elements within practical limits further improves overall system performance and enhances adaptability to variations in the underwater channel.
comment: This is version 3 (v3) of the manuscript with further improvements and refinements
Interacting Particle Systems for Fast Linear Quadratic RL
This paper is concerned with the design of algorithms based on systems of interacting particles to represent, approximate, and learn the optimal control law for reinforcement learning (RL). The primary contribution is that convergence rates are greatly accelerated by the interactions between particles. Theory focuses on the linear quadratic stochastic optimal control problem for which a complete and novel theory is presented. Apart from the new algorithm, sample complexity bounds are obtained, and it is shown that the mean square error scales as $1/N$ where $N$ is the number of particles. The theoretical results and algorithms are illustrated with numerical experiments and comparisons with other recent approaches, where the faster convergence of the proposed algorithm is numerically demonstrated.
A Volumetric Privacy Measure for Dynamical Systems With Bounded Disturbance
This paper presents a volumetric privacy framework for dynamical systems subject to bounded disturbances, developed without requiring prior knowledge of their probability distributions. We consider systems with both public and private states, where a set containing the public state is shared as the observation. An adversary is assumed to execute an inference attack by exploiting the observed public state set to estimate an uncertainty set for the private state. The volume of this inferred set quantifies the adversary's estimation uncertainty and serves as the proposed volumetric privacy metric. Approximate set-membership estimation techniques are developed to compute the private-state uncertainty set, and the properties of the privacy measure are analyzed, demonstrating that it is bounded by the information gain from the observation set. Furthermore, an optimization-based privacy filter design problem is formulated, employing randomization and linear programming to enhance the volumetric privacy level. The effectiveness of the proposed approach is validated through a production-inventory case study. Results show that the optimal privacy filter significantly improves robustness against inference attacks and outperforms two baseline mechanisms based on additive noise and quantization.
Nash equilibrium seeking in coalition games for multiple Euler-Lagrange systems: Analysis and application to USV swarm confrontation
This paper addresses a class of Nash equilibrium (NE) seeking problems in coalition games involving both local and coupling constraints for multiple Euler-Lagrange (EL) systems subject to disturbances of unknown bounds. Within each coalition, agents cooperatively minimize a shared cost function while competing against other coalitions. A distributed strategy is proposed to seek the NE under informational constraints, where each agent has access only to its own action, cost function, and constraint parameters. In the proposed distributed NE seeking strategy, adaptive techniques are combined with sign functions to handle model uncertainties and disturbances with unknown bounds in the EL systems. To deal with the Lagrange multipliers associated with local and coupling constraints, primal-dual techniques are integrated with consensus protocols. Additionally, a dynamic average consensus algorithm is employed to estimate the gradient of the coalition cost function, while a leader-following protocol is utilized to estimate the actions of other agents. Under standard convexity and graph-connectivity assumptions, global convergence of the closed-loop EL system to the NE is established. As an illustrative application, a swarm confrontation of unmanned surface vehicles involving formation, encirclement, and interception tasks is modeled within the coalition game framework, and numerical simulations are conducted under this model to validate the theoretical results.
Transfer Learning-Enabled Efficient Raman Pump Tuning under Dynamic Launch Power for C+L Band Transmission
We propose a transfer learning-enabled Transformer framework to simultaneously realize accurate modeling and Raman pump design in C+L-band systems. The RMSE for modeling and peak-to-peak GSNR variation/deviation is within 0.22 dB and 0.86/0.1 dB, respectively.
comment: There are some rather serious problems in this paper
FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations
In edge computing, users' service profiles are migrated due to user mobility. Reinforcement learning (RL) frameworks have been proposed to do so, often trained on simulated data. However, existing RL frameworks overlook occasional server failures, which, although rare, impact latency-sensitive applications like autonomous driving and real-time obstacle detection. Moreover, these failures (rare events), not being adequately represented in historical training data, pose a challenge for data-driven RL algorithms. As it is impractical to adjust failure frequency in real-world applications for training, we introduce FIRE, a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment. We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function. FIRE considers delay, migration, failure, and backup placement costs across individual and shared service profiles. We prove ImRE's boundedness and convergence to optimality. Next, we introduce novel deep Q-learning (ImDQL) and actor-critic (ImACRE) versions of our algorithm to enhance scalability. We extend our framework to accommodate users with varying risk tolerances. Through trace-driven experiments, we show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
comment: Accepted at IEEE Transactions on Services Computing
Systems and Control (EESS)
Safe Payload Transfer with Ship-Mounted Cranes: A Robust Model Predictive Control Approach
Ensuring safe real-time control of ship-mounted cranes in unstructured transportation environments requires handling multiple safety constraints while maintaining effective payload transfer performance. Unlike traditional crane systems, ship-mounted cranes are consistently subjected to significant external disturbances affecting underactuated crane dynamics due to the ship's dynamic motion response to harsh sea conditions, which can lead to robustness issues. To tackle these challenges, we propose a robust and safe model predictive control (MPC) framework and demonstrate it on a 5-DOF crane system, where a Stewart platform simulates the external disturbances that ocean surface motions would have on the supporting ship. The crane payload transfer operation must avoid obstacles and accurately place the payload within a designated target area. We use a robust zero-order control barrier function (R-ZOCBF)-based safety constraint in the nonlinear MPC to ensure safe payload positioning, while time-varying bounding boxes are utilized for collision avoidance. We introduce a new optimization-based online robustness parameter adaptation scheme to reduce the conservativeness of R-ZOCBFs. Experimental trials on a crane prototype demonstrate the overall performance of our safe control approach under significant perturbing motions of the crane base. While our focus is on crane-facilitated transfer, the methods more generally apply to safe robotically-assisted parts mating and parts insertion.
Ultra High Sensitivity Soil Moisture Detection Using Photonic Crystal Cavity with SIW Technology
Soil nutrients and water content are two crucial factors that significantly affect agricultural production yields. Hence, monitoring and measuring the water content and soil type are critical requirements. This study proposes a two-dimensional structure of photonic crystals centered around a symmetrical cross-shaped slot. The cross-slots act as resonators, and the photonic crystals surrounding the slots tune the resonance frequency of the resonators to enhance mode confinement within the resonator. The various resonant modes are located in the 2.1 GHz, 5.2 GHz, and 8.1 GHz bands, which correspond to the S band, C band, and X band, respectively. These bands are used to compare the absorption, whereas the upper resonant mode is of the order of 20 GHz. Band structure analysis was performed using the Plane Wave Method (PWM). The resonant frequency is computed using a 3D electromagnetic (EM) simulation software that utilizes the Finite Element Method (FEM) and lies in the radiation mode region of the band structure of the photonic crystal. Varying the incident angle had a negligible effect on the absorption characteristics of the sensor, allowing it to produce accurate sensing results regardless of the incident angle. The design yields a sensitivity of 85.4% at the 2.1 GHz resonant frequency, much higher than the 50.6% obtained at 2.1 GHz with a single column of photonic crystal-based SIW, for which the frequency shift is of the order of GHz. In the proposed design, by contrast, the frequency shift is on the order of MHz, resulting in ultra-high sensitivity.
Adaptive Invariant Extended Kalman Filter for Legged Robot State Estimation
State estimation is crucial for legged robots as it directly affects control performance and locomotion stability. In this paper, we propose an Adaptive Invariant Extended Kalman Filter to improve proprioceptive state estimation for legged robots. The proposed method adaptively adjusts the noise level of the contact foot model based on online covariance estimation, leading to improved state estimation under varying contact conditions. It effectively handles small slips that traditional slip rejection fails to address, as overly sensitive slip rejection settings risk causing filter divergence. Our approach employs a contact detection algorithm instead of contact sensors, reducing the reliance on additional hardware. The proposed method is validated through real-world experiments on the quadruped robot LeoQuad, demonstrating enhanced state estimation performance in dynamic locomotion scenarios.
comment: 6 pages, accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
A Control-Theoretic Approach to Dynamic Payment Routing for Success Rate Optimization
This paper introduces a control-theoretic framework for dynamic payment routing, implemented within JUSPAY's Payment Orchestrator to maximize transaction success rate. The routing system is modeled as a closed-loop feedback controller that continuously senses gateway performance, computes corrective actions, and dynamically routes transactions across gateways to ensure operational resilience. The system leverages concepts from control theory, reinforcement learning, and multi-armed bandit optimization to achieve both short-term responsiveness and long-term stability. Rather than relying on explicit PID regulation, the framework applies generalized feedback-based adaptation, ensuring that corrective actions remain proportional to observed performance deviations and that the computed gateway score gradually converges toward the success rate. This hybrid approach unifies control theory and adaptive decision systems, enabling self-regulating transaction routing that dampens instability and improves reliability. Live production results show an improvement of up to 1.15% in success rate over traditional rule-based routing, demonstrating the effectiveness of feedback-based control in payment systems.
comment: 7 Pages, 8 Figures
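The feedback idea in this abstract (corrections proportional to the observed deviation, with a gateway score that drifts toward the success rate) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the names `update_score`, `pick_gateway`, and `simulate`, the gain value, and the small exploration probability are hypothetical, and the actual JUSPAY routing logic may differ substantially.

```python
import random


def update_score(score, success, gain=0.1):
    """Proportional feedback step (hypothetical): move the score toward
    the observed outcome by a fraction `gain` of the deviation."""
    error = (1.0 if success else 0.0) - score
    return score + gain * error


def pick_gateway(scores, explore=0.05, rng=random):
    """Route to the best-scoring gateway; with probability `explore`,
    pick a gateway uniformly at random (assumed bandit-style exploration)."""
    if rng.random() < explore:
        return rng.choice(sorted(scores))
    return max(scores, key=scores.get)


def simulate(true_rates, steps=5000, seed=0):
    """Run the closed loop against gateways with fixed, made-up success rates."""
    rng = random.Random(seed)
    scores = {g: 0.5 for g in true_rates}  # neutral initial scores
    for _ in range(steps):
        g = pick_gateway(scores, rng=rng)
        success = rng.random() < true_rates[g]
        scores[g] = update_score(scores[g], success)
    return scores
```

With a gain in (0, 1], each update is a convex combination of the old score and the 0/1 outcome, so scores stay in [0, 1] and, on average, track the gateway's success rate. A smaller gain gives smoother but slower tracking.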
Local integral input-to-state stability for non-autonomous infinite-dimensional systems
In this paper, we prove comparison principles for nonlinear differential equations with time-varying coefficients and develop Lyapunov analytical tools for the integral input-to-state stability (iISS) analysis of nonlinear non-autonomous infinite-dimensional systems, which involve nonlinearities satisfying a superlinear growth, bringing difficulties to the iISS analysis. Specifically, our approach starts by establishing several forms of comparison principles for a wide range of ordinary differential equations having time-varying coefficients and superlinear terms, paving the way to conduct iISS assessment for general nonlinear non-autonomous infinite-dimensional systems within the Lyapunov stability framework. Then, by using the comparison principles, we prove a local iISS (LiISS) Lyapunov theorem for the nonlinear non-autonomous infinite-dimensional systems in the framework of Banach spaces. Furthermore, we provide sufficient conditions for the existence of a local iISS Lyapunov functional (LiISS-LF) and construct LiISS-LFs for the systems in the framework of Hilbert spaces. Finally, we present two examples to illustrate the proposed Lyapunov method for the LiISS analysis: the first shows how to obtain the LiISS of a nonlinear finite-dimensional system with time-varying coefficients and superlinear terms under a linear state feedback control law, while the second shows how to employ interpolation inequalities to handle superlinear terms and establish the LiISS-LF for a class of multi-dimensional parabolic equations with space-time-varying coefficients. To demonstrate the validity of the results, numerical experiments are also conducted to verify the LiISS of these two classes of systems.
Linear State Estimation in Presence of Bounded Uncertainties: A Comparative Analysis
A variety of algorithms have been proposed to address the power system state estimation problem in the presence of uncertainties in the data. However, less emphasis has been given to handling perturbations in the model. In the context of linear state estimation (LSE), which is the focus of this paper, perturbations in the model come from variations in the line parameters. Since the actual values of the line parameters can be different from the values stored in a power utility's database, we investigate three approaches in this paper to estimate the states in the presence of bounded uncertainties in the data and the model. The first approach is based on interval arithmetic, the second is based on convex optimization, and the third is based on generalized linear fractional programming. The three algorithms are applied to multiple IEEE test systems and compared in terms of their speed and accuracy. The results indicate that the first two algorithms are extremely fast and give expected results, while the third suffers from scalability issues and is unsuitable for LSE.
Geometric Control Theory Over Networks: Minimal Node Cardinality Disturbance Decoupling Problems
In this paper we show how to formulate and solve disturbance decoupling problems over networks while choosing a minimal number of input and output nodes. Feedback laws that isolate and eliminate the impact of disturbance nodes on specific target nodes to be protected are provided using state, output, and dynamical feedback. For that, we leverage the fact that when reformulated in terms of sets of nodes rather than subspaces, the controlled and conditional invariance properties admit a simple graphical interpretation. For state and dynamical feedback, the minimal input and output cardinality solutions can be computed exactly in polynomial time, via min-cut/max-flow algorithms.
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
Reinforcement learning (RL), while powerful and expressive, can often prioritize performance at the expense of safety. Yet safety violations can lead to catastrophic outcomes in real-world deployments. Control Barrier Functions (CBFs) offer a principled method to enforce dynamic safety -- traditionally deployed online via safety filters. While the result is safe behavior, the fact that the RL policy does not have knowledge of the CBF can lead to conservative behaviors. This paper proposes CBF-RL, a framework for generating safe behaviors with RL by enforcing CBFs in training. CBF-RL has two key attributes: (1) minimally modifying a nominal RL policy to encode safety constraints via a CBF term, and (2) safety filtering of the policy rollouts in training. Theoretically, we prove that continuous-time safety filters can be deployed via closed-form expressions on discrete-time rollouts. Practically, we demonstrate that CBF-RL internalizes the safety constraints in the learned policy -- both enforcing safer actions and biasing towards safer rewards -- enabling safe deployment without the need for an online safety filter. We validate our framework through ablation studies on navigation tasks and on the Unitree G1 humanoid robot, where CBF-RL enables safer exploration, faster convergence, and robust performance under uncertainty, enabling the humanoid robot to avoid obstacles and climb stairs safely in real-world settings without a runtime safety filter.
comment: 8 pages
Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion
Robust humanoid locomotion in unstructured environments requires architectures that balance fast low-level stabilization with slower perceptual decision-making. We show that a simple layered control architecture (LCA), a proprioceptive stabilizer running at high rate, coupled with a compact low-rate perceptual policy, enables substantially more robust performance than monolithic end-to-end designs, even when using minimal perception encoders. Through a two-stage training curriculum (blind stabilizer pretraining followed by perceptual fine-tuning), we demonstrate that layered policies consistently outperform one-stage alternatives in both simulation and hardware. On a Unitree G1 humanoid, our approach succeeds across stair and ledge tasks where one-stage perceptual policies fail. These results highlight that architectural separation of timescales, rather than network scale or complexity, is the key enabler for robust perception-conditioned locomotion.
comment: 8 pages
Subgradient Method for System Identification with Non-Smooth Objectives
This paper investigates a subgradient-based algorithm to solve the system identification problem for linear time-invariant systems with non-smooth objectives. This is essential for robust system identification in safety-critical applications. While existing work provides theoretical exact recovery guarantees using optimization solvers, the design of fast learning algorithms with convergence guarantees for practical use remains unexplored. We analyze the subgradient method in this setting, where the optimization problems to be solved evolve over time as new measurements are collected, and we establish linear convergence to the ground-truth system for both the best and Polyak step sizes after a burn-in period. We further characterize sublinear convergence of the iterates under constant and diminishing step sizes, which require only minimal information and thus offer broad applicability. Finally, we compare the time complexity of standard solvers with the subgradient algorithm and support our findings with experimental results. This is the first work to analyze subgradient algorithms for system identification with non-smooth objectives.
comment: 20 pages, 2 figures
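As a rough illustration of the subgradient method with a Polyak step size mentioned in this abstract, the sketch below identifies a small linear system from a least-absolute-deviation objective. It is a toy under loudly stated assumptions and not the paper's algorithm: the data are a fixed noiseless batch with i.i.d. Gaussian regressors (so the optimal value `f_star` is 0 and is known), whereas the paper treats problems that evolve as measurements arrive. The function names are hypothetical.

```python
import random


def matvec(A, x):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def sign(v):
    return (v > 0) - (v < 0)


def loss_and_subgrad(A, X, Y):
    """Non-smooth loss sum_t ||y_t - A x_t||_1 and one subgradient w.r.t. A."""
    n = len(A)
    loss = 0.0
    G = [[0.0] * n for _ in range(n)]
    for x, y in zip(X, Y):
        r = [yi - pi for yi, pi in zip(y, matvec(A, x))]
        loss += sum(abs(ri) for ri in r)
        for i in range(n):
            s = sign(r[i])
            for j in range(n):
                G[i][j] -= s * x[j]
    return loss, G


def polyak_subgradient(X, Y, f_star=0.0, iters=500):
    """Subgradient descent with the Polyak step (f(A) - f_star) / ||G||_F^2."""
    n = len(X[0])
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        loss, G = loss_and_subgrad(A, X, Y)
        g2 = sum(g * g for row in G for g in row)
        if loss - f_star <= 1e-12 or g2 == 0.0:
            break  # already (numerically) optimal
        eta = (loss - f_star) / g2
        A = [[A[i][j] - eta * G[i][j] for j in range(n)] for i in range(n)]
    return A


def make_data(A_true, T=30, seed=0):
    """Assumed toy data: i.i.d. Gaussian regressors, noiseless responses."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in A_true] for _ in range(T)]
    Y = [matvec(A_true, x) for x in X]
    return X, Y
```

The Polyak step needs the optimal objective value, which is why the toy uses noiseless data. The constant and diminishing step sizes discussed in the abstract avoid that requirement at the cost of slower convergence.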
Performance Analysis of Underwater Optical Wireless Communication Using O-RIS and Fiber Optic Backhaul (Extended version)
This Letter presents a novel hybrid underwater wireless optical communication (UWOC) system that integrates underwater optical access points (UOAPs) with a passive optical network (PON)-based fiber-optic backhaul to provide a resilient backbone. A hard switching mechanism is employed between direct and optical reconfigurable intelligent surface (O-RIS)-assisted links to ensure reliable connectivity. Unlike previous studies, the proposed system is evaluated under both active and multiple passive O-RIS configurations. To enhance reliability, the Selection Combining (SC) and Maximal Ratio Combining (MRC) schemes are applied. Analytical and simulation results demonstrate that optimal O-RIS placement significantly enhances system performance. However, in the linear regime, placing it too close to the receiver causes degradation due to increased path loss and beam jitter in an identical water type. Moreover, increasing the number of O-RIS elements within practical limits further improves overall system performance and enhances adaptability to variations in the underwater channel.
comment: This is version 3 (v3) of the manuscript with further improvements and refinements
Interacting Particle Systems for Fast Linear Quadratic RL
This paper is concerned with the design of algorithms based on systems of interacting particles to represent, approximate, and learn the optimal control law for reinforcement learning (RL). The primary contribution is that convergence rates are greatly accelerated by the interactions between particles. Theory focuses on the linear quadratic stochastic optimal control problem for which a complete and novel theory is presented. Apart from the new algorithm, sample complexity bounds are obtained, and it is shown that the mean square error scales as $1/N$ where $N$ is the number of particles. The theoretical results and algorithms are illustrated with numerical experiments and comparisons with other recent approaches, where the faster convergence of the proposed algorithm is numerically demonstrated.
A Volumetric Privacy Measure for Dynamical Systems With Bounded Disturbance
This paper presents a volumetric privacy framework for dynamical systems subject to bounded disturbances, developed without requiring prior knowledge of their probability distributions. We consider systems with both public and private states, where a set containing the public state is shared as the observation. An adversary is assumed to execute an inference attack by exploiting the observed public state set to estimate an uncertainty set for the private state. The volume of this inferred set quantifies the adversary's estimation uncertainty and serves as the proposed volumetric privacy metric. Approximate set-membership estimation techniques are developed to compute the private-state uncertainty set, and the properties of the privacy measure are analyzed, demonstrating that it is bounded by the information gain from the observation set. Furthermore, an optimization-based privacy filter design problem is formulated, employing randomization and linear programming to enhance the volumetric privacy level. The effectiveness of the proposed approach is validated through a production-inventory case study. Results show that the optimal privacy filter significantly improves robustness against inference attacks and outperforms two baseline mechanisms based on additive noise and quantization.
Nash equilibrium seeking in coalition games for multiple Euler-Lagrange systems: Analysis and application to USV swarm confrontation
This paper addresses a class of Nash equilibrium (NE) seeking problems in coalition games involving both local and coupling constraints for multiple Euler-Lagrange (EL) systems subject to disturbances of unknown bounds. Within each coalition, agents cooperatively minimize a shared cost function while competing against other coalitions. A distributed strategy is proposed to seek the NE under informational constraints, where each agent has access only to its own action, cost function, and constraint parameters. In the proposed distributed NE seeking strategy, adaptive techniques are combined with sign functions to handle model uncertainties and disturbances with unknown bounds in the EL systems. To deal with the Lagrange multipliers associated with local and coupling constraints, primal-dual techniques are integrated with consensus protocols. Additionally, a dynamic average consensus algorithm is employed to estimate the gradient of the coalition cost function, while a leader-following protocol is utilized to estimate the actions of other agents. Under standard convexity and graph-connectivity assumptions, global convergence of the closed-loop EL system to the NE is established. As an illustrative application, a swarm confrontation of unmanned surface vehicles involving formation, encirclement, and interception tasks is modeled within the coalition game framework, and numerical simulations are conducted under this model to validate the theoretical results.
Transfer Learning-Enabled Efficient Raman Pump Tuning under Dynamic Launch Power for C+L Band Transmission
We propose a transfer learning-enabled Transformer framework to simultaneously realize accurate modeling and Raman pump design in C+L-band systems. The modeling RMSE and the peak-to-peak GSNR variation/deviation are within 0.22 dB and 0.86/0.1 dB, respectively.
comment: There are some rather serious problems in this paper
FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations
In edge computing, users' service profiles are migrated due to user mobility. Reinforcement learning (RL) frameworks have been proposed to do so, often trained on simulated data. However, existing RL frameworks overlook occasional server failures, which, although rare, impact latency-sensitive applications like autonomous driving and real-time obstacle detection. Because these failures (rare events) are not adequately represented in historical training data, they pose a challenge for data-driven RL algorithms. As it is impractical to adjust failure frequency in real-world applications for training, we introduce FIRE, a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment. We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function. FIRE considers delay, migration, failure, and backup placement costs across individual and shared service profiles. We prove ImRE's boundedness and convergence to optimality. Next, we introduce novel deep Q-learning (ImDQL) and actor-critic (ImACRE) versions of our algorithm to enhance scalability. We extend our framework to accommodate users with varying risk tolerances. Through trace-driven experiments, we show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
comment: Accepted at IEEE Transactions on Services Computing
Robotics
Structured Interfaces for Automated Reasoning with 3D Scene Graphs
In order to provide a robot with the ability to understand and react to a user's natural language inputs, the natural language must be connected to the robot's underlying representations of the world. Recently, large language models (LLMs) and 3D scene graphs (3DSGs) have become a popular choice for grounding natural language and representing the world. In this work, we address the challenge of using LLMs with 3DSGs to ground natural language. Existing methods encode the scene graph as serialized text within the LLM's context window, but this encoding does not scale to large or rich 3DSGs. Instead, we propose to use a form of Retrieval Augmented Generation to select a subset of the 3DSG relevant to the task. We encode a 3DSG in a graph database and provide a query language interface (Cypher) as a tool to the LLM with which it can retrieve relevant data for language grounding. We evaluate our approach on instruction following and scene question-answering tasks and compare against baseline context window and code generation methods. Our results show that using Cypher as an interface to 3D scene graphs scales significantly better to large, rich graphs on both local and cloud-based models. This leads to large performance improvements in grounded language tasks while also substantially reducing the token count of the scene graph content. A video supplement is available at https://www.youtube.com/watch?v=zY_YI9giZSA.
comment: 25 pages, 3 figures
Self-Supervised Learning to Fly using Efficient Semantic Segmentation and Metric Depth Estimation for Low-Cost Autonomous UAVs
This paper presents a vision-only autonomous flight system for small UAVs operating in controlled indoor environments. The system combines semantic segmentation with monocular depth estimation to enable obstacle avoidance, scene exploration, and autonomous safe landing operations without requiring GPS or expensive sensors such as LiDAR. A key innovation is an adaptive scale factor algorithm that converts non-metric monocular depth predictions into accurate metric distance measurements by leveraging semantic ground plane detection and camera intrinsic parameters, achieving a mean distance error of 14.4 cm. The approach uses a knowledge distillation framework where a color-based Support Vector Machine (SVM) teacher generates training data for a lightweight U-Net student network (1.6M parameters) capable of real-time semantic segmentation. For more complex environments, the SVM teacher can be replaced with a state-of-the-art segmentation model. Testing was conducted in a controlled 5x4 meter laboratory environment with eight cardboard obstacles simulating urban structures. Extensive validation across 30 flight tests in a real-world environment and 100 flight tests in a digital-twin environment demonstrates that the combined segmentation and depth approach increases the distance traveled during surveillance and reduces mission time while maintaining 100% success rates. The system is further optimized through end-to-end learning, where a compact student neural network learns complete flight policies from demonstration data generated by our best-performing method, achieving an 87.5% autonomous mission success rate. This work advances practical vision-based drone navigation in structured environments, demonstrating solutions for metric depth estimation and computational efficiency challenges that enable deployment on resource-constrained platforms.
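The scale-recovery idea in the abstract above can be illustrated with a minimal sketch. It assumes a level pinhole camera at a known height above a flat floor, which the abstract does not state; the function names and parameters (`fy`, `cy`, `cam_height`) are hypothetical and this is not the authors' algorithm. Under those assumptions, a ground pixel at image row `v` lies at depth `fy * cam_height / (v - cy)`, and the ratio of that geometric depth to the predicted relative depth gives a scale factor.

```python
from statistics import median

def ground_depth_from_row(v, fy, cy, cam_height):
    """Metric depth along the optical axis of a ground pixel at image row v,
    for a level pinhole camera cam_height metres above a flat floor (assumed setup)."""
    if v <= cy:
        raise ValueError("ground pixels must lie below the horizon row cy")
    return fy * cam_height / (v - cy)

def estimate_scale(rel_depth, ground_rows, fy, cy, cam_height):
    """Median ratio of geometric ground depth to predicted relative depth.
    rel_depth maps an image row to the predicted (unitless) depth at a ground pixel."""
    ratios = [ground_depth_from_row(v, fy, cy, cam_height) / rel_depth[v]
              for v in ground_rows if rel_depth[v] > 0]
    return median(ratios)

# Synthetic check: the predictions are exactly half the true metric depth.
rel = {340: 2.5, 440: 1.25}
scale = estimate_scale(rel, [340, 440], fy=500.0, cy=240.0, cam_height=1.0)
```

Multiplying the whole relative depth map by `scale` would then give metric estimates. The median is used here only because it tolerates a few mislabeled ground pixels.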
MoS-VLA: A Vision-Language-Action Model with One-Shot Skill Adaptation
Vision-Language-Action (VLA) models trained on large robot datasets promise general-purpose, robust control across diverse domains and embodiments. However, existing approaches often fail out-of-the-box when deployed in novel environments, embodiments, or tasks. We introduce Mixture of Skills VLA (MoS-VLA), a framework that represents robot manipulation policies as linear combinations of a finite set of learned basis functions. During pretraining, MoS-VLA jointly learns these basis functions across datasets from the Open X-Embodiment project, producing a structured skill space. At test time, adapting to a new task requires only a single expert demonstration. The corresponding skill representation is then inferred via a lightweight convex optimization problem that minimizes the L1 action error, without requiring gradient updates. This gradient-free adaptation incurs minimal overhead while enabling rapid instantiation of new skills. Empirically, MoS-VLA achieves lower action-prediction error on five out of five unseen datasets and succeeds in both simulation and real-robot tasks where a pretrained VLA model fails outright. Project page: mos-vla.github.io/
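One way to read the adaptation step described above, under assumed notation that is not taken from the paper: let $\phi_k(o_t)$ be the action proposed by the $k$-th learned basis function at observation $o_t$, $a_t$ the demonstrated action, and $w \in \mathbb{R}^K$ the skill coefficients. An L1 fit of this form is convex and can be rewritten as a linear program with slack variables $s_t$:

```latex
\min_{w \in \mathbb{R}^K} \sum_{t=1}^{T} \Big\| a_t - \sum_{k=1}^{K} w_k \, \phi_k(o_t) \Big\|_1
\quad\Longleftrightarrow\quad
\min_{w,\, s} \sum_{t=1}^{T} \mathbf{1}^\top s_t
\ \ \text{s.t.}\ \ -s_t \le a_t - \sum_{k=1}^{K} w_k \, \phi_k(o_t) \le s_t .
```

Because only $w$ and the slacks are optimized, the basis functions stay fixed, which is consistent with the abstract's claim that no gradient updates are needed.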
Semi-Peaucellier Linkage and Differential Mechanism for Linear Pinching and Self-Adaptive Grasping
This paper presents the SP-Diff parallel gripper system, addressing the limited adaptability of conventional end-effectors in intelligent industrial automation. The proposed design employs an innovative differential linkage mechanism with a modular symmetric dual-finger configuration to achieve linear-parallel grasping. By integrating a planetary gear transmission, the system enables synchronized linear motion and independent finger pose adjustment while maintaining structural rigidity, reducing Z-axis recalibration requirements by 30% compared to arc-trajectory grippers. The compact palm architecture incorporates a kinematically optimized parallelogram linkage and differential mechanism, demonstrating adaptive grasping capabilities for diverse industrial workpieces and deformable objects such as citrus fruits. Future-ready interfaces are embedded for potential force/vision sensor integration to facilitate multimodal data acquisition (e.g., trajectory planning and object deformation) in digital twin frameworks. Designed as a flexible manufacturing solution, SP-Diff advances robotic end-effector intelligence through its adaptive architecture, showing promising applications in collaborative robotics, logistics automation, and specialized operational scenarios.
comment: 6 pages, 9 figures, Accepted author manuscript for IEEE CASE 2025
DIV-Nav: Open-Vocabulary Spatial Relationships for Multi-Object Navigation
Advances in open-vocabulary semantic mapping and object navigation have enabled robots to perform an informed search of their environment for an arbitrary object. However, such zero-shot object navigation is typically designed for simple queries with an object name like "television" or "blue rug". Here, we consider more complex free-text queries with spatial relationships, such as "find the remote on the table", while still leveraging the robustness of a semantic map. We present DIV-Nav, a real-time navigation system that efficiently addresses this problem through a series of relaxations: i) Decomposing natural language instructions with complex spatial constraints into simpler object-level queries on a semantic map, ii) computing the Intersection of individual semantic belief maps to identify regions where all objects co-exist, and iii) Validating the discovered objects against the original, complex spatial constraints via an LVLM. We further investigate how to adapt the frontier exploration objectives of online semantic mapping to such spatial search queries to more effectively guide the search process. We validate our system through extensive experiments on the MultiON benchmark and real-world deployment on a Boston Dynamics Spot robot using a Jetson Orin AGX. More details and videos are available at https://anonsub42.github.io/reponame/
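The intersection step (ii) above can be sketched on toy grids. The abstract does not say which operator combines the belief maps, so the cell-wise product used here is an assumption, and the function names are hypothetical.

```python
def intersect_beliefs(belief_maps):
    """Cell-wise product of same-shaped 2D belief grids (lists of lists in [0, 1]).
    The product is an assumed choice of intersection operator."""
    rows, cols = len(belief_maps[0]), len(belief_maps[0][0])
    joint = [[1.0] * cols for _ in range(rows)]
    for grid in belief_maps:
        for r in range(rows):
            for c in range(cols):
                joint[r][c] *= grid[r][c]
    return joint

def best_cell(joint):
    """(row, col) of the cell with the highest joint belief."""
    cells = ((r, c) for r in range(len(joint)) for c in range(len(joint[0])))
    return max(cells, key=lambda rc: joint[rc[0]][rc[1]])

# Toy belief maps for the two objects in "find the remote on the table".
remote = [[0.1, 0.8], [0.7, 0.2]]
table = [[0.9, 0.1], [0.8, 0.3]]
goal = best_cell(intersect_beliefs([remote, table]))
```

Here the cell where both objects are likely wins, even though the remote alone scores higher elsewhere. A cell-wise minimum would be another plausible operator.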
A Novel Gripper with Semi-Peaucellier Linkage and Idle-Stroke Mechanism for Linear Pinching and Self-Adaptive Grasping IROS 2025
This paper introduces a novel robotic gripper, named the SPD gripper. It features a palm and two mechanically identical and symmetrically arranged fingers, which can be driven independently or by a single motor. The fingertips follow a linear motion trajectory, facilitating the grasping of objects of various sizes on a tabletop without the need to adjust the overall height of the gripper. Traditional industrial grippers with parallel gripping capabilities often exhibit an arcuate motion at the fingertips, requiring the entire robotic arm to adjust its height to avoid collisions with the tabletop. The SPD gripper, with its linear parallel gripping mechanism, effectively addresses this issue. Furthermore, the SPD gripper possesses adaptive capabilities, accommodating objects of different shapes and sizes. This paper presents the design philosophy, fundamental composition principles, and optimization analysis theory of the SPD gripper. Based on the design theory, a robotic gripper prototype was developed and tested. The experimental results demonstrate that the robotic gripper successfully achieves linear parallel gripping functionality and exhibits good adaptability. In the context of the ongoing development of embodied intelligence technologies, this robotic gripper can assist various robots in achieving effective grasping, laying a solid foundation for collecting data to enhance deep learning training.
comment: Accepted author manuscript (AAM) for IEEE/RSJ IROS 2025. 6 pages, 10 figures
Advancing Off-Road Autonomous Driving: The Large-Scale ORAD-3D Dataset and Comprehensive Benchmarks
A major bottleneck in off-road autonomous driving research lies in the scarcity of large-scale, high-quality datasets and benchmarks. To bridge this gap, we present ORAD-3D, which, to the best of our knowledge, is the largest dataset specifically curated for off-road autonomous driving. ORAD-3D covers a wide spectrum of terrains, including woodlands, farmlands, grasslands, riversides, gravel roads, cement roads, and rural areas, while capturing diverse environmental variations across weather conditions (sunny, rainy, foggy, and snowy) and illumination levels (bright daylight, daytime, twilight, and nighttime). Building upon this dataset, we establish a comprehensive suite of benchmark evaluations spanning five fundamental tasks: 2D free-space detection, 3D occupancy prediction, rough GPS-guided path planning, vision-language model-driven autonomous driving, and world model for off-road environments. Together, the dataset and benchmarks provide a unified and robust resource for advancing perception and planning in challenging off-road scenarios. The dataset and code will be made publicly available at https://github.com/chaytonmin/ORAD-3D.
comment: Off-road robotics
NavQ: Learning a Q-Model for Foresighted Vision-and-Language Navigation ICCV 2025
In this work, we concentrate on the task of goal-oriented Vision-and-Language Navigation (VLN). Existing methods often make decisions based on historical information, overlooking the future implications and long-term outcomes of the actions. In contrast, we aim to develop a foresighted agent. Specifically, we draw upon Q-learning to train a Q-model using large-scale unlabeled trajectory data, in order to learn general knowledge regarding the layout and object relations within indoor scenes. This model can generate a Q-feature, analogous to the Q-value in a traditional Q-network, for each candidate action, which describes the potential future information that may be observed after taking the specific action. Subsequently, a cross-modal future encoder integrates the task-agnostic Q-feature with navigation instructions to produce a set of action scores reflecting future prospects. These scores, when combined with the original scores based on history, facilitate an A*-style searching strategy to effectively explore the regions that are more likely to lead to the destination. Extensive experiments conducted on widely used goal-oriented VLN datasets validate the effectiveness of the proposed method.
comment: ICCV 2025
RefAtomNet++: Advancing Referring Atomic Video Action Recognition using Semantic Retrieval based Multi-Trajectory Mamba ECCV 2024
Referring Atomic Video Action Recognition (RAVAR) aims to recognize fine-grained, atomic-level actions of a specific person of interest conditioned on natural language descriptions. Distinct from conventional action recognition and detection tasks, RAVAR emphasizes precise language-guided action understanding, which is particularly critical for interactive human action analysis in complex multi-person scenarios. In this work, we extend our previously introduced RefAVA dataset to RefAVA++, which comprises >2.9 million frames and >75.1k annotated persons in total. We benchmark this dataset using baselines from multiple related domains, including atomic action localization, video question answering, and text-video retrieval, as well as our earlier model, RefAtomNet. Although RefAtomNet surpasses other baselines by incorporating agent attention to highlight salient features, its ability to align and retrieve cross-modal information remains limited, leading to suboptimal performance in localizing the target person and predicting fine-grained actions. To overcome the aforementioned limitations, we introduce RefAtomNet++, a novel framework that advances cross-modal token aggregation through a multi-hierarchical semantic-aligned cross-attention mechanism combined with multi-trajectory Mamba modeling at the partial-keyword, scene-attribute, and holistic-sentence levels. In particular, scanning trajectories are constructed by dynamically selecting the nearest visual spatial tokens at each timestep for both partial-keyword and scene-attribute levels. Moreover, we design a multi-hierarchical semantic-aligned cross-attention strategy, enabling more effective aggregation of spatial and temporal tokens across different semantic hierarchies. Experiments show that RefAtomNet++ establishes new state-of-the-art results. The dataset and code are released at https://github.com/KPeng9510/refAVA2.
comment: Extended version of ECCV 2024 paper arXiv:2407.01872. The dataset and code are released at https://github.com/KPeng9510/refAVA2
What Questions Should Robots Be Able to Answer? A Dataset of User Questions for Explainable Robotics
With the growing use of large language models and conversational interfaces in human-robot interaction, robots' ability to answer user questions is more important than ever. We therefore introduce a dataset of 1,893 user questions for household robots, collected from 100 participants and organized into 12 categories and 70 subcategories. Most work in explainable robotics focuses on why-questions. In contrast, our dataset provides a wide variety of questions, from questions about simple execution details to questions about how the robot would act in hypothetical scenarios -- thus giving roboticists valuable insights into what questions their robot needs to be able to answer. To collect the dataset, we created 15 video stimuli and 7 text stimuli, depicting robots performing varied household tasks. We then asked participants on Prolific what questions they would want to ask the robot in each portrayed situation. In the final dataset, the most frequent categories are questions about task execution details (22.5%), the robot's capabilities (12.7%), and performance assessments (11.3%). Although questions about how robots would handle potentially difficult scenarios and ensure correct behavior are less frequent, users rank them as the most important for robots to be able to answer. Moreover, we find that users who identify as novices in robotics ask different questions than more experienced users. Novices are more likely to inquire about simple facts, such as what the robot did or the current state of the environment. As robots enter environments shared with humans and language becomes central to giving instructions and interaction, this dataset provides a valuable foundation for (i) identifying the information robots need to log and expose to conversational interfaces, (ii) benchmarking question-answering modules, and (iii) designing explanation strategies that align with user expectations.
Learning to Optimize Edge Robotics: A Fast Integrated Perception-Motion-Communication Approach
Edge robotics involves frequent exchanges of large-volume multi-modal data. Existing methods ignore the interdependency between robotic functionalities and communication conditions, leading to excessive communication overhead. This paper revolutionizes edge robotics systems through integrated perception, motion, and communication (IPMC). As such, robots can dynamically adapt their communication strategies (i.e., compression ratio, transmission frequency, transmit power) by leveraging the knowledge of robotic perception and motion dynamics, thus reducing the need for excessive sensor data uploads. Furthermore, by leveraging the learning to optimize (LTO) paradigm, an imitation learning neural network is designed and implemented, which reduces the computational complexity by over 10x compared to state-of-the-art optimization solvers. Experiments demonstrate the superiority of the proposed IPMC and the real-time execution capability of LTO.
Conformal Prediction in The Loop: A Feedback-Based Uncertainty Model for Trajectory Optimization NeurIPS 2025
Conformal Prediction (CP) is a powerful statistical machine learning tool to construct uncertainty sets with coverage guarantees, which has fueled its extensive adoption in generating prediction regions for decision-making tasks, e.g., Trajectory Optimization (TO) in uncertain environments. However, existing methods predominantly employ a sequential scheme, where decisions rely unidirectionally on the prediction regions, and consequently the information from decision-making fails to be fed back to instruct CP. In this paper, we propose a novel Feedback-Based CP (Fb-CP) framework for shrinking-horizon TO with a joint risk constraint over the entire mission time. Specifically, a CP-based posterior risk calculation method is developed by fully leveraging the realized trajectories to adjust the posterior allowable risk, which is then allocated to future times to update prediction regions. In this way, the information in the realized trajectories is continuously fed back to the CP, enabling attractive feedback-based adjustments of the prediction regions and a provable online improvement in trajectory performance. Furthermore, we theoretically prove that such adjustments consistently maintain the coverage guarantees of the prediction regions, thereby ensuring provable safety. Additionally, we develop a decision-focused iterative risk allocation algorithm with theoretical convergence analysis for allocating the posterior allowable risk which closely aligns with Fb-CP. Furthermore, we extend the proposed method to handle distribution shift. The effectiveness and superiority of the proposed method are demonstrated through benchmark experiments.
comment: Accepted by NeurIPS 2025 Main Track
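For readers unfamiliar with how CP turns prediction errors into a region with a coverage guarantee, here is a minimal sketch of the standard split-conformal quantile. It is the generic building block, not the feedback-based Fb-CP method of the abstract above; the function name and the toy calibration scores are hypothetical.

```python
import math

def conformal_radius(calibration_errors, alpha):
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score.
    A region of this radius covers a new exchangeable sample with
    probability at least 1 - alpha."""
    n = len(calibration_errors)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration samples for this coverage level
    return sorted(calibration_errors)[k - 1]

# Toy calibration scores 1..19 (e.g. prediction errors in some unit).
radius = conformal_radius(list(range(1, 20)), alpha=0.1)
```

A smaller allowable risk `alpha` yields a larger radius. A feedback scheme such as the one described would adjust the risk allocated to future time steps, which this sketch does not attempt.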
Manual2Skill++: Connector-Aware General Robotic Assembly from Instruction Manuals via Vision-Language Models
Assembly hinges on reliably forming connections between parts; yet most robotic approaches plan assembly sequences and part poses while treating connectors as an afterthought. Connections represent the critical "last mile" of assembly execution: while task planning may sequence operations and motion planning may position parts, the precise establishment of physical connections ultimately determines assembly success or failure. In this paper, we consider connections as first-class primitives in assembly representation, including connector types, specifications, quantities, and placement locations. Drawing inspiration from how humans learn assembly tasks through step-by-step instruction manuals, we present Manual2Skill++, a vision-language framework that automatically extracts structured connection information from assembly manuals. We encode assembly tasks as hierarchical graphs where nodes represent parts and sub-assemblies, and edges explicitly model connection relationships between components. A large-scale vision-language model parses symbolic diagrams and annotations in manuals to instantiate these graphs, leveraging the rich connection knowledge embedded in human-designed instructions. We curate a dataset containing over 20 assembly tasks with diverse connector types to validate our representation extraction approach, and evaluate the complete task understanding-to-execution pipeline across four complex assembly scenarios in simulation, spanning furniture, toys, and manufacturing components with real-world correspondence.
SPOT: Sensing-augmented Trajectory Planning via Obstacle Threat Modeling
UAVs equipped with a single depth camera encounter significant challenges in dynamic obstacle avoidance due to limited field of view and inevitable blind spots. While active vision strategies that steer onboard cameras have been proposed to expand sensing coverage, most existing methods separate motion planning from sensing considerations, resulting in less effective and delayed obstacle response. To address this limitation, we introduce SPOT (Sensing-augmented Planning via Obstacle Threat modeling), a unified planning framework for observation-aware trajectory planning that explicitly incorporates sensing objectives into motion optimization. At the core of our method is a Gaussian Process-based obstacle belief map, which establishes a unified probabilistic representation of both recognized (previously observed) and potential obstacles. This belief is further processed through a collision-aware inference mechanism that transforms spatial uncertainty and trajectory proximity into a time-varying observation urgency map. By integrating urgency values within the current field of view, we define differentiable objectives that enable real-time, observation-aware trajectory planning with computation times under 10 ms. Simulation and real-world experiments in dynamic, cluttered, and occluded environments show that our method detects potential dynamic obstacles 2.8 seconds earlier than baseline approaches, increasing dynamic obstacle visibility by over 500%, and enabling safe navigation through cluttered, occluded environments.
Do What You Say: Steering Vision-Language-Action Models via Runtime Reasoning-Action Alignment Verification
Reasoning Vision Language Action (VLA) models improve robotic instruction-following by generating step-by-step textual plans before low-level actions, an approach inspired by Chain-of-Thought (CoT) reasoning in language models. Yet even with a correct textual plan, the generated actions can still miss the intended outcomes in the plan, especially in out-of-distribution (OOD) scenarios. We formalize this phenomenon as a lack of embodied CoT faithfulness, and introduce a training-free, runtime policy steering method for reasoning-action alignment. Given a reasoning VLA's intermediate textual plan, our framework samples multiple candidate action sequences from the same model, predicts their outcomes via simulation, and uses a pre-trained Vision-Language Model (VLM) to select the sequence whose outcome best aligns with the VLA's own textual plan. Only executing action sequences that align with the textual reasoning turns our base VLA's natural action diversity from a source of error into a strength, boosting robustness to semantic and visual OOD perturbations and enabling novel behavior composition without costly re-training. We also contribute a reasoning-annotated extension of LIBERO-100 and environment variations tailored for OOD evaluation, and we demonstrate up to a 15% performance gain over prior work on behavior composition tasks, showing that the method scales with compute and data diversity. Project Website at: https://yilin-wu98.github.io/steering-reasoning-vla/
VT-Refine: Learning Bimanual Assembly with Visuo-Tactile Feedback via Simulation Fine-Tuning
Humans excel at bimanual assembly tasks by adapting to rich tactile feedback -- a capability that remains difficult to replicate in robots through behavioral cloning alone, due to the suboptimality and limited diversity of human demonstrations. In this work, we present VT-Refine, a visuo-tactile policy learning framework that combines real-world demonstrations, high-fidelity tactile simulation, and reinforcement learning to tackle precise, contact-rich bimanual assembly. We begin by training a diffusion policy on a small set of demonstrations using synchronized visual and tactile inputs. This policy is then transferred to a simulated digital twin equipped with simulated tactile sensors and further refined via large-scale reinforcement learning to enhance robustness and generalization. To enable accurate sim-to-real transfer, we leverage high-resolution piezoresistive tactile sensors that provide normal force signals and can be realistically modeled in parallel using GPU-accelerated simulation. Experimental results show that VT-Refine improves assembly performance in both simulation and the real world by increasing data diversity and enabling more effective policy fine-tuning. Our project page is available at https://binghao-huang.github.io/vt_refine/.
comment: Accepted by 9th Conference on Robot Learning (CoRL 2025); Website: https://binghao-huang.github.io/vt_refine/
Context-Based Meta Reinforcement Learning for Robust and Adaptable Peg-in-Hole Assembly Tasks
Autonomous assembly is an essential capability for industrial and service robots, with Peg-in-Hole (PiH) insertion being one of the core tasks. However, PiH assembly in unknown environments is still challenging due to uncertainty in task parameters, such as the hole position and orientation, resulting from sensor noise. Although context-based meta reinforcement learning (RL) methods have been previously presented to adapt to unknown task parameters in PiH assembly tasks, the performance depends on a sample-inefficient procedure or human demonstrations. Thus, to enhance the applicability of meta RL in real-world PiH assembly tasks, we propose to train the agent to use information from the robot's forward kinematics and an uncalibrated camera. Furthermore, we improve the performance by efficiently adapting the meta-trained agent to use data from a force/torque sensor. Finally, we propose an adaptation procedure for out-of-distribution tasks whose parameters are different from the training tasks. Experiments on simulated and real robots prove that our modifications enhance the sample efficiency during meta training, real-world adaptation performance, and generalization of the context-based meta RL agent in PiH assembly tasks compared to previous approaches.
Multi-Layered Reasoning from a Single Viewpoint for Learning See-Through Grasping
Sensory substitution enables biological systems to perceive stimuli typically obtained by another organ, which is inspirational for physical agents. Multi-modal perception of intrinsic and extrinsic interactions is critical in building an intelligent robot that learns. This study presents a Vision-based See-Through Perception (VBSeeThruP) architecture that simultaneously perceives multiple intrinsic and extrinsic modalities via a single visual input in a markerless way, all packed within a soft robotic finger using the Soft Polyhedral Network design. It is generally applicable to miniature vision systems placed underneath deformable networks with a see-through design, capturing real-time images of the network's physical interactions induced by contact-based events overlaid on top of the visual scene of the external environment, as demonstrated in the ablation study. We present the VBSeeThruP's capability for learning reactive grasping without using external cameras or dedicated force and torque sensors on the fingertips. Using the inpainted scene and the deformation mask, we further demonstrate the multi-modal performance of the VBSeeThruP architecture to simultaneously achieve various perceptions, including but not limited to scene inpainting, object detection, depth sensing, scene segmentation, masked deformation tracking, 6D force/torque sensing, and contact event detection, all from a single markerless sensory input provided by the in-finger vision.
comment: 23 pages, 13 figures, 2 tables; for supplementary videos, see https://bionicdl.ancorasir.com/?p=1658; for open-source code, see https://github.com/ancorasir/SeeThruFinger
Real-time Spatial-temporal Traversability Assessment via Feature-based Sparse Gaussian Process IROS2025
Terrain analysis is critical for the practical application of ground mobile robots in real-world tasks, especially in outdoor unstructured environments. In this paper, we propose a novel spatial-temporal traversability assessment method, which aims to enable autonomous robots to effectively navigate through complex terrains. Our approach utilizes sparse Gaussian processes (SGP) to extract geometric features (curvature, gradient, elevation, etc.) directly from point cloud scans. These features are then used to construct a high-resolution local traversability map. Then, we design a spatial-temporal Bayesian Gaussian kernel (BGK) inference method to dynamically evaluate traversability scores, integrating historical and real-time data while considering factors such as slope, flatness, gradient, and uncertainty metrics. GPU acceleration is applied in the feature extraction step, and the system achieves real-time performance. Extensive simulation experiments across diverse terrain scenarios demonstrate that our method outperforms SOTA approaches in both accuracy and computational efficiency. Additionally, we develop an autonomous navigation framework integrated with the traversability map and validate it with a differential driven vehicle in complex outdoor environments. Our code will be open-source for further research and development by the community, https://github.com/ZJU-FAST-Lab/FSGP_BGK.
comment: accepted by IROS2025
Whole-Body Model-Predictive Control of Legged Robots with MuJoCo
We demonstrate the surprising real-world effectiveness of a very simple approach to whole-body model-predictive control (MPC) of quadruped and humanoid robots: the iterative LQR (iLQR) algorithm with MuJoCo dynamics and finite-difference approximated derivatives. Building upon the previous success of model-based behavior synthesis and control of locomotion and manipulation tasks with MuJoCo in simulation, we show that these policies can easily generalize to the real world with few sim-to-real considerations. Our baseline method achieves real-time whole-body MPC on a variety of hardware experiments, including dynamic quadruped locomotion, quadruped walking on two legs, and full-sized humanoid bipedal locomotion. We hope this easy-to-reproduce hardware baseline lowers the barrier to entry for real-world whole-body MPC research and contributes to accelerating research velocity in the community. Our code and experiment videos will be available online at: https://johnzhang3.github.io/mujoco_ilqr
comment: under review
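As a rough illustration of the "finite-difference approximated derivatives" mentioned in the abstract above, the sketch below linearizes a generic discrete-time step function around a state and control. Everything here is an assumption for illustration: `finite_difference_jacobians` and `double_integrator_step` are hypothetical names, the toy dynamics stand in for a simulator step, no MuJoCo API is used, and the choice of central differences is ours (the abstract does not say which scheme the authors use).

```python
def finite_difference_jacobians(step, x, u, eps=1e-6):
    """Approximate A = d step/dx and B = d step/du with central differences.

    `step(x, u)` is any function mapping a state list and a control list
    to the next state list (a stand-in for a simulator step).
    """
    def jacobian(fn, point):
        n_out = len(fn(point))
        columns = []
        for i in range(len(point)):
            plus = list(point)
            minus = list(point)
            plus[i] += eps
            minus[i] -= eps
            f_plus = fn(plus)
            f_minus = fn(minus)
            # Column i holds the derivative of every output w.r.t. input i.
            columns.append([(a - b) / (2.0 * eps) for a, b in zip(f_plus, f_minus)])
        # Transpose the list of columns into a row-major matrix.
        return [[columns[j][i] for j in range(len(point))] for i in range(n_out)]

    A = jacobian(lambda xx: step(xx, u), x)
    B = jacobian(lambda uu: step(x, uu), u)
    return A, B


DT = 0.1  # hypothetical time step for the toy example


def double_integrator_step(x, u):
    """Toy dynamics: position and velocity driven by an acceleration input."""
    return [x[0] + DT * x[1], x[1] + DT * u[0]]


A, B = finite_difference_jacobians(double_integrator_step, [0.3, -0.2], [0.5])
```

For this linear toy system the recovered matrices should match the analytic ones, A = [[1, DT], [0, 1]] and B = [[0], [DT]], up to floating-point rounding. An iLQR backward pass would consume such A and B matrices at every knot point of the trajectory.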
Development of a Linear Guide-Rail Testbed for Physically Emulating ISAM Operations
In-Space Servicing, Assembly, and Manufacturing (ISAM) is a set of emerging operations that provides several benefits to improve the longevity, capacity, mobility, and expandability of existing and future space assets. Serial robotic manipulators are particularly vital in accomplishing ISAM operations; however, the complex perturbation forces and motions associated with movement of a robotic arm on a free-flying satellite present a complex controls problem requiring additional study. While many dynamical models are developed, experimentally testing and validating these models is challenging given that the models operate in space, where satellites have six degrees of freedom (6-DOF). This paper attempts to resolve those challenges by presenting the design and development of a new hardware-in-the-loop (HIL) experimental testbed utilized to emulate ISAM. This emulation will be accomplished by means of a 6-DOF UR3e robotic arm attached to a satellite bus. This satellite bus is mounted to a 1-DOF guide-rail system, enabling the satellite bus and robotic arm to move freely in one linear direction. This experimental ISAM emulation system will explore and validate models for space motion, serial robot manipulation, and contact mechanics.
comment: 12 pages, 4 figures, AAS/AIAA Space Flight Mechanics
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics NeurIPS 2025
Spatial referring is a fundamental capability of embodied robots to interact with the 3D physical world. However, even with the powerful pretrained vision language models (VLMs), recent approaches are still not qualified to accurately understand the complex 3D scenes and dynamically reason about the instruction-indicated locations for interaction. To this end, we propose RoboRefer, a 3D-aware VLM that can first achieve precise spatial understanding by integrating a disentangled but dedicated depth encoder via supervised fine-tuning (SFT). Moreover, RoboRefer advances generalized multi-step spatial reasoning via reinforcement fine-tuning (RFT), with metric-sensitive process reward functions tailored for spatial referring tasks. To support SFT and RFT training, we introduce RefSpatial, a large-scale dataset of 20M QA pairs (2x prior), covering 31 spatial relations (vs. 15 prior) and supporting complex reasoning processes (up to 5 steps). In addition, we introduce RefSpatial-Bench, a challenging benchmark filling the gap in evaluating spatial referring with multi-step reasoning. Experiments show that SFT-trained RoboRefer achieves state-of-the-art spatial understanding, with an average success rate of 89.6%. RFT-trained RoboRefer further outperforms all other baselines by a large margin, even surpassing Gemini-2.5-Pro by 17.4% in average accuracy on RefSpatial-Bench. Notably, RoboRefer can be integrated with various control policies to execute long-horizon, dynamic tasks across diverse robots (e.g., UR5, G1 humanoid) in cluttered real-world scenes. See the project page at https://zhoues.github.io/RoboRefer.
comment: Accepted by NeurIPS 2025. Project page: https://zhoues.github.io/RoboRefer/
GeNIE: A Generalizable Navigation System for In-the-Wild Environments
Reliable navigation in unstructured, real-world environments remains a significant challenge for embodied agents, especially when operating across diverse terrains, weather conditions, and sensor configurations. In this paper, we introduce GeNIE (Generalizable Navigation System for In-the-Wild Environments), a robust navigation framework designed for global deployment. GeNIE integrates a generalizable traversability prediction model built on SAM2 with a novel path fusion strategy that enhances planning stability in noisy and ambiguous settings. We deployed GeNIE in the Earth Rover Challenge (ERC) at ICRA 2025, where it was evaluated across six countries spanning three continents. GeNIE took first place and achieved 79% of the maximum possible score, outperforming the second-best team by 17%, and completed the entire competition without a single human intervention. These results set a new benchmark for robust, generalizable outdoor robot navigation. We will release the codebase, pretrained model weights, and newly curated datasets to support future research in real-world navigation.
comment: Accepted to IEEE Robotics and Automation Letters (RAL), 2025. Jiaming Wang, Diwen Liu, and Jizhuo Chen contributed equally to this work
Guided Multi-Fidelity Bayesian Optimization for Data-driven Controller Tuning with Digital Twins
We propose a guided multi-fidelity Bayesian optimization framework for data-efficient controller tuning that integrates corrected digital twin simulations with real-world measurements. The method targets closed-loop systems with limited-fidelity simulations or inexpensive approximations. To address model mismatch, we build a multi-fidelity surrogate with a learned correction model that refines digital twin estimates using real data. An adaptive cost-aware acquisition function balances expected improvement, fidelity, and sampling cost. Our method ensures adaptability as new measurements arrive. The digital twin accuracy is re-estimated, dynamically adapting both cross-source correlations and the acquisition function. This ensures that accurate simulations are used more frequently, while inaccurate simulation data are appropriately downweighted. Experiments on robotic drive hardware and supporting numerical studies demonstrate that our method enhances tuning efficiency compared to standard Bayesian optimization and multi-fidelity methods.
comment: This work has been submitted to IEEE Robotics and Automation Letters (RA-L) for review
Policy Contrastive Decoding for Robotic Foundation Models
Robotic foundation models, or generalist robot policies, hold immense potential to enable flexible, general-purpose and dexterous robotic systems. Despite their advancements, our empirical experiments reveal that existing robot policies are prone to learning spurious correlations from pre-training trajectories, adversely affecting their generalization capabilities beyond the training data. To tackle this, we propose a novel Policy Contrastive Decoding (PCD) approach, which redirects the robot policy's focus toward object-relevant visual clues by contrasting action probability distributions derived from original and object-masked visual inputs. As a training-free method, our PCD can be used as a plugin to improve different types of robot policies without needing to finetune or access model weights. We conduct extensive experiments on top of three open-source robot policies, including the autoregressive policy OpenVLA and the diffusion-based policies Octo and $\pi_0$. The obtained results in both simulation and real-world environments prove PCD's flexibility and effectiveness, e.g., PCD enhances the state-of-the-art policy $\pi_0$ by 8.9% in the simulation environment and by 108% in the real-world environment. Code and demos are publicly available at: https://Koorye.github.io/proj/PCD.
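The idea of contrasting action distributions from original and object-masked inputs can be sketched generically as follows. This is not the paper's formula: the log-ratio rule below is the common contrastive-decoding form, chosen here as an assumption, and `contrast_distributions`, `alpha`, and the example probabilities are all hypothetical.

```python
import math


def contrast_distributions(p_original, p_masked, alpha=1.0, eps=1e-12):
    """Reweight a discrete action distribution toward object-dependent actions.

    Assumed rule (not taken from the paper):
        p_out(a) is proportional to p_original(a)^(1 + alpha) / p_masked(a)^alpha
    Actions whose probability drops when the object is masked are boosted.
    """
    scores = [
        (1.0 + alpha) * math.log(po + eps) - alpha * math.log(pm + eps)
        for po, pm in zip(p_original, p_masked)
    ]
    # Softmax over the scores, shifted by the maximum for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical three-action example: action 1 loses probability once the
# object is masked, so it is the one that depends on the object.
p_with_object = [0.5, 0.3, 0.2]
p_object_masked = [0.5, 0.1, 0.4]
p_contrasted = contrast_distributions(p_with_object, p_object_masked, alpha=1.0)
```

Setting `alpha=0` recovers the original distribution, which makes the contrast strength easy to tune without touching model weights, in keeping with the training-free framing of the abstract.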
Kinetostatics and Particle-Swarm Optimization of Vehicle-Mounted Underactuated Metamorphic Loading Manipulators
Fixed degree-of-freedom (DoF) loading mechanisms often suffer from excessive actuators, complex control, and limited adaptability to dynamic tasks. This study proposes an innovative mechanism of underactuated metamorphic loading manipulators (UMLM), integrating a metamorphic arm with a passively adaptive gripper. The metamorphic arm exploits geometric constraints, enabling the topology reconfiguration and flexible motion trajectories without additional actuators. The adaptive gripper, driven entirely by the arm, conforms to diverse objects through passive compliance. A structural model is developed, and a kinetostatics analysis is conducted to investigate isomorphic grasping configurations. To optimize performance, Particle-Swarm Optimization (PSO) is utilized to refine the gripper's dimensional parameters, ensuring robust adaptability across various applications. Simulation results validate the UMLM's easily implemented control strategy, operational versatility, and effectiveness in grasping diverse objects in dynamic environments. This work underscores the practical potential of underactuated metamorphic mechanisms in applications requiring efficient and adaptable loading solutions. Beyond the specific design, this generalized modeling and optimization framework extends to a broader class of manipulators, offering a scalable approach to the development of robotic systems that require efficiency, flexibility, and robust performance.
comment: 48 pages, 18 figures
MoReFlow: Motion Retargeting Learning through Unsupervised Flow Matching
Motion retargeting holds the promise of offering a larger set of motion data for characters and robots with different morphologies. Many prior works have approached this problem via either handcrafted constraints or paired motion datasets, limiting their applicability to humanoid characters or narrow behaviors such as locomotion. Moreover, they often assume a fixed notion of retargeting, overlooking domain-specific objectives like style preservation in animation or task-space alignment in robotics. In this work, we propose MoReFlow, Motion Retargeting via Flow Matching, an unsupervised framework that learns correspondences between characters' motion embedding spaces. Our method consists of two stages. First, we train tokenized motion embeddings for each character using a VQ-VAE, yielding compact latent representations. Then, we employ flow matching with conditional coupling to align the latent spaces across characters, which simultaneously learns conditioned and unconditioned matching to achieve robust but flexible retargeting. Once trained, MoReFlow enables flexible and reversible retargeting without requiring paired data. Experiments demonstrate that MoReFlow produces high-quality motions across diverse characters and tasks, offering improved controllability, generalization, and motion realism compared to the baselines.
EDEN: Efficient Dual-Layer Exploration Planning for Fast UAV Autonomous Exploration in Large 3-D Environments
Efficient autonomous exploration in large-scale environments remains challenging due to the high planning computational cost and low-speed maneuvers. In this paper, we propose a fast and computationally efficient dual-layer exploration planning method. The insight of our dual-layer method is efficiently finding an acceptable long-term region routing and greedily exploring the target in the region of the first routing area with high speed. Specifically, the proposed method finds the long-term area routing through an approximate algorithm to ensure real-time planning in large-scale environments. Then, the viewpoint in the first routing region with the lowest curvature-penalized cost, which can effectively reduce decelerations caused by sharp turn motions, will be chosen as the next exploration target. To further speed up the exploration, we adopt an aggressive and safe exploration-oriented trajectory to enhance exploration continuity. The proposed method is compared to state-of-the-art methods in challenging simulation environments. The results show that the proposed method outperforms other methods in terms of exploration efficiency, computational cost, and trajectory speed. We also conduct real-world experiments to validate the effectiveness of the proposed method. The code will be open-sourced.
EvidMTL: Evidential Multi-Task Learning for Uncertainty-Aware Semantic Surface Mapping from Monocular RGB Images IROS 2025
For scene understanding in unstructured environments, an accurate and uncertainty-aware metric-semantic mapping is required to enable informed action selection by autonomous systems. Existing mapping methods often suffer from overconfident semantic predictions, and sparse and noisy depth sensing, leading to inconsistent map representations. In this paper, we therefore introduce EvidMTL, a multi-task learning framework that uses evidential heads for depth estimation and semantic segmentation, enabling uncertainty-aware inference from monocular RGB images. To enable uncertainty-calibrated evidential multi-task learning, we propose a novel evidential depth loss function that jointly optimizes the belief strength of the depth prediction in conjunction with evidential segmentation loss. Building on this, we present EvidKimera, an uncertainty-aware semantic surface mapping framework, which uses evidential depth and semantics prediction for improved 3D metric-semantic consistency. We train and evaluate EvidMTL on the NYUDepthV2 and assess its zero-shot performance on ScanNetV2, demonstrating superior uncertainty estimation compared to conventional approaches while maintaining comparable depth estimation and semantic segmentation. In zero-shot mapping tests on ScanNetV2, EvidKimera outperforms Kimera in semantic surface mapping accuracy and consistency, highlighting the benefits of uncertainty-aware mapping and underscoring its potential for real-world robotic applications.
comment: Submitted to IROS 2025 Conference
Manual2Skill: Learning to Read Manuals and Acquire Robotic Skills for Furniture Assembly Using Vision-Language Models
Humans possess an extraordinary ability to understand and execute complex manipulation tasks by interpreting abstract instruction manuals. For robots, however, this capability remains a substantial challenge, as they cannot interpret abstract instructions and translate them into executable actions. In this paper, we present Manual2Skill, a novel framework that enables robots to perform complex assembly tasks guided by high-level manual instructions. Our approach leverages a Vision-Language Model (VLM) to extract structured information from instructional images and then uses this information to construct hierarchical assembly graphs. These graphs represent parts, subassemblies, and the relationships between them. To facilitate task execution, a pose estimation model predicts the relative 6D poses of components at each assembly step. At the same time, a motion planning module generates actionable sequences for real-world robotic implementation. We demonstrate the effectiveness of Manual2Skill by successfully assembling several real-world IKEA furniture items. This application highlights its ability to manage long-horizon manipulation tasks with both efficiency and precision, significantly enhancing the practicality of robot learning from instruction manuals. This work marks a step forward in advancing robotic systems capable of understanding and executing complex manipulation tasks in a manner akin to human capabilities. Project page: https://owensun2004.github.io/Furniture-Assembly-Web/
Auditory Localization and Assessment of Consequential Robot Sounds: A Multi-Method Study in Virtual Reality
Mobile robots increasingly operate alongside humans but are often out of sight, so that humans need to rely on the sounds of the robots to recognize their presence. For successful human-robot interaction (HRI), it is therefore crucial to understand how humans perceive robots by their consequential sounds, i.e., operating noise. Prior research suggests that the sound of a quadruped Go1 is more detectable than that of a wheeled Turtlebot. This study builds on this and examines the human ability to localize consequential sounds of three robots (quadruped Go1, wheeled Turtlebot 2i, wheeled HSR) in Virtual Reality. In a within-subjects design, we assessed participants' localization performance for the robots with and without an acoustic vehicle alerting system (AVAS) for two velocities (0.3, 0.8 m/s) and two trajectories (head-on, radial). In each trial, participants were presented with the sound of a moving robot for 3 s and were tasked to point at its final position (localization task). Localization errors were measured as the absolute angular difference between the participants' estimated and the actual robot position. Results showed that the robot type significantly influenced the localization accuracy and precision, with the sound of the wheeled HSR (especially without AVAS) performing worst under all experimental conditions. Surprisingly, participants rated the HSR sound as more positive, less annoying, and more trustworthy than the Turtlebot and Go1 sound. This reveals a tension between subjective evaluation and objective auditory localization performance. Our findings highlight consequential robot sounds as a critical factor for designing intuitive and effective HRI, with implications for human-centered robot design and social navigation.
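The localization error described above, an absolute angular difference between a pointed direction and the true direction, can be computed as in the following minimal sketch. The function name is hypothetical, angles are assumed to be azimuths in degrees, and the wrap-around handling (so that 350 and 10 degrees are 20 degrees apart) is our assumption; the abstract does not specify the angle convention used in the study.

```python
def angular_error_deg(estimated_deg, actual_deg):
    """Absolute angular difference in degrees, wrapped into [0, 180]."""
    diff = (estimated_deg - actual_deg) % 360.0
    return min(diff, 360.0 - diff)


# Hypothetical trial: participant points at 350 degrees, robot is at 10 degrees.
error = angular_error_deg(350.0, 10.0)
```

Taking the smaller of the two arcs keeps the error symmetric in its arguments, so swapping the estimated and actual direction gives the same value.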
Demonstration-Enhanced Adaptable Multi-Objective Robot Navigation
Preference-aligned robot navigation in human environments is typically achieved through learning-based approaches, utilizing user feedback or demonstrations for personalization. However, personal preferences are subject to change and might even be context-dependent. Yet traditional reinforcement learning (RL) approaches with static reward functions often fall short in adapting to evolving user preferences, inevitably reflecting demonstrations once training is completed. This paper introduces a structured framework that combines demonstration-based learning with multi-objective reinforcement learning (MORL). To ensure real-world applicability, our approach allows for dynamic adaptation of the robot navigation policy to changing user preferences without retraining. It fluently modulates the amount of demonstration data reflection and other preference-related objectives. Through rigorous evaluations, including a baseline comparison and sim-to-real transfer on two robots, we demonstrate our framework's capability to adapt to user preferences accurately while achieving high navigational performance in terms of collision avoidance and goal pursuance.
Safe, Task-Consistent Manipulation with Operational Space Control Barrier Functions IROS
Safe real-time control of robotic manipulators in unstructured environments requires handling numerous safety constraints without compromising task performance. Traditional approaches, such as artificial potential fields (APFs), suffer from local minima, oscillations, and limited scalability, while model predictive control (MPC) can be computationally expensive. Control barrier functions (CBFs) offer a promising alternative due to their high level of robustness and low computational cost, but these safety filters must be carefully designed to avoid significant reductions in the overall performance of the manipulator. In this work, we introduce an Operational Space Control Barrier Function (OSCBF) framework that integrates safety constraints while preserving task-consistent behavior. Our approach scales to hundreds of simultaneous constraints while retaining real-time control rates, ensuring collision avoidance, singularity prevention, and workspace containment even in highly cluttered settings or during dynamic motions. By explicitly accounting for the task hierarchy in the CBF objective, we prevent degraded performance across both joint-space and operational-space tasks, when at the limit of safety. We validate performance in both simulation and hardware, and release our open-source high-performance code and media on our project webpage, https://stanfordasl.github.io/oscbf/
comment: To be presented at 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
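To make the notion of a control barrier function safety filter concrete, here is a deliberately tiny scalar sketch. It is not the OSCBF formulation from the abstract above: it assumes a one-dimensional single integrator (dx/dt = u), a single upper position limit, and a barrier h(x) = x_max - x, for which the CBF condition dh/dt >= -alpha * h reduces to u <= alpha * (x_max - x). All names and parameter values are hypothetical.

```python
def cbf_filter_1d(u_nominal, x, x_max, alpha=1.0):
    """Minimally modify u_nominal so that x never exceeds x_max.

    For dx/dt = u and h(x) = x_max - x, the condition dh/dt >= -alpha * h
    is equivalent to u <= alpha * h, so the closest safe command is a clamp.
    """
    h = x_max - x
    u_bound = alpha * h
    return min(u_nominal, u_bound)


# Far from the limit the nominal command passes through unchanged;
# close to the limit it is scaled back, and at the limit it is zero.
u_far = cbf_filter_1d(1.0, x=0.0, x_max=1.0, alpha=2.0)
u_near = cbf_filter_1d(1.0, x=0.9, x_max=1.0, alpha=2.0)
u_at_limit = cbf_filter_1d(1.0, x=1.0, x_max=1.0, alpha=2.0)
```

With many constraints and a full manipulator model the clamp becomes a quadratic program, which is where design choices such as how the task hierarchy enters the objective start to matter.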
The Impact of VR and 2D Interfaces on Human Feedback in Preference-Based Robot Learning
Aligning robot navigation with human preferences is essential for ensuring comfortable and predictable robot movement in shared spaces. While preference-based learning methods, such as reinforcement learning from human feedback (RLHF), enable this alignment, the choice of the preference collection interface may influence the process. Traditional 2D interfaces provide structured views but lack spatial depth, whereas immersive VR offers richer perception, potentially affecting preference articulation. This study systematically examines how the interface modality impacts human preference collection and navigation policy alignment. We introduce a novel dataset of 2,325 human preference queries collected through both VR and 2D interfaces, revealing significant differences in user experience, preference consistency, and policy outcomes. Our findings highlight the trade-offs between immersion, perception, and preference reliability, emphasizing the importance of interface selection in preference-based robot learning. The dataset is available to support future research.
Immersive Explainability: Visualizing Robot Navigation Decisions through XAI Semantic Scene Projections in Virtual Reality
End-to-end robot policies achieve high performance through neural networks trained via reinforcement learning (RL). Yet, their black box nature and abstract reasoning pose challenges for human-robot interaction (HRI), because humans may experience difficulty in understanding and predicting the robot's navigation decisions, hindering trust development. We present a virtual reality (VR) interface that visualizes explainable AI (XAI) outputs and the robot's lidar perception to support intuitive interpretation of RL-based navigation behavior. By visually highlighting objects based on their attribution scores, the interface grounds abstract policy explanations in the scene context. This XAI visualization bridges the gap between obscure numerical XAI attribution scores and a human-centric semantic level of explanation. A within-subjects study with 24 participants evaluated the effectiveness of our interface for four visualization conditions combining XAI and lidar. Participants ranked scene objects across navigation scenarios based on their importance to the robot, followed by a questionnaire assessing subjective understanding and predictability. Results show that semantic projection of attributions significantly enhances non-expert users' objective understanding and subjective awareness of robot behavior. In addition, lidar visualization further improves perceived predictability, underscoring the value of integrating XAI and sensor for transparent, trustworthy HRI.
Multiagent Systems
Unleashing Diverse Thinking Modes in LLMs through Multi-Agent Collaboration
Large Language Models (LLMs) demonstrate strong performance but often lack interpretable reasoning. This paper introduces the Multi-Agent Collaboration Framework for Diverse Thinking Modes (DiMo), which enhances both performance and interpretability by simulating a structured debate among four specialized LLM agents. Each agent embodies a distinct reasoning paradigm, allowing the framework to collaboratively explore diverse cognitive approaches. Through iterative debate, agents challenge and refine initial responses, yielding more robust conclusions and an explicit, auditable reasoning chain. Across six benchmarks and under a unified open-source setup, DiMo improves accuracy over widely used single-model and debate baselines, with the largest gains on math. We position DiMo as a semantics-aware, Web-native multi-agent framework: it models human-machine intelligence with LLM agents that produce semantically typed, URL-annotated evidence chains for explanations and user-friendly interactions. Although our experiments use standard reasoning benchmarks, the framework is designed to be instantiated over Web corpora and knowledge graphs, combining retrieval-augmented reasoning with structured justifications that downstream systems can inspect and reuse.
Prompt Optimization via Retrieved Reasoning Assets and Multi-Agent Analysis
Prompt optimization has emerged as an effective alternative to retraining for improving the performance of Large Language Models (LLMs). However, most existing approaches treat evaluation as a black box, relying solely on numerical scores while offering limited insight into why a prompt succeeds or fails. They also depend heavily on trial-and-error refinements, which are difficult to interpret and control. In this paper, we introduce MA-SAPO, a Multi-Agent framework for Score-Aware Prompt Optimization. Compared to prior methods, MA-SAPO explicitly couples evaluation outcomes with structured reasoning to guide systematic edits. The framework specifically consists of two stages: during the Reasoning Phase, agents collaboratively explain metric scores, diagnose weaknesses, and synthesize targeted refinements that are stored as reusable reasoning assets; during the Test Phase, agents retrieve these assets to analyze optimized prompts and apply only evidence-grounded edits. By turning evaluation signals into interpretable reasoning chains, MA-SAPO produces prompt refinements that are more transparent, auditable, and controllable. Experiments on the HelpSteer1/2 benchmarks demonstrate consistent improvements over single-pass prompting, retrieval-augmented baselines, and prior multi-agent strategies, validating the effectiveness of our approach.
comment: Preprint
Ripple Effect Protocol: Coordinating Agent Populations
Modern AI agents can exchange messages using protocols such as A2A and ACP, yet these mechanisms emphasize communication over coordination. As agent populations grow, this limitation produces brittle collective behavior, where individually smart agents converge on poor group outcomes. We introduce the Ripple Effect Protocol (REP), a coordination protocol in which agents share not only their decisions but also lightweight sensitivities - signals expressing how their choices would change if key environmental variables shifted. These sensitivities ripple through local networks, enabling groups to align faster and more stably than with agent-centric communication alone. We formalize REP's protocol specification, separating required message schemas from optional aggregation rules, and evaluate it across scenarios with varying incentives and network topologies. Benchmarks across three domains, (i) supply chain cascades (Beer Game), (ii) preference aggregation in sparse networks (Movie Scheduling), and (iii) sustainable resource allocation (Fishbanks), show that REP improves coordination accuracy and efficiency over A2A by 41 to 100%, while flexibly handling multimodal sensitivity signals from LLMs. By making coordination a protocol-level capability, REP provides scalable infrastructure for the emerging Internet of Agents.
Static Sandboxes Are Inadequate: Modeling Societal Complexity Requires Open-Ended Co-Evolution in LLM-Based Multi-Agent Simulations
What if artificial agents could not just communicate, but also evolve, adapt, and reshape their worlds in ways we cannot fully predict? With LLMs now powering multi-agent systems and social simulations, we are witnessing new possibilities for modeling open-ended, ever-changing environments. Yet, most current simulations remain constrained within static sandboxes, characterized by predefined tasks, limited dynamics, and rigid evaluation criteria. These limitations prevent them from capturing the complexity of real-world societies. In this paper, we argue that static, task-specific benchmarks are fundamentally inadequate and must be rethought. We critically review emerging architectures that blend LLMs with multi-agent dynamics, highlight key hurdles such as balancing stability and diversity, evaluating unexpected behaviors, and scaling to greater complexity, and introduce a fresh taxonomy for this rapidly evolving field. Finally, we present a research roadmap centered on open-endedness, continuous co-evolution, and the development of resilient, socially aligned AI ecosystems. We call on the community to move beyond static paradigms and help shape the next generation of adaptive, socially-aware multi-agent simulations.
Agentic System with Modal Logic for Autonomous Diagnostics
The development of intelligent agents, particularly those powered by language models (LMs), has played a critical role in various environments that require intelligent and autonomous decision-making. Environments are not passive testing grounds; they represent the data required for agents to learn and to perform under very challenging conditions that demand adaptive, complex, and autonomous decision-making. While the paradigm of scaling models and datasets has led to remarkable emergent capabilities, we argue that scaling the structure, fidelity, and logical consistency of agent reasoning within these environments is a crucial, yet underexplored, dimension of AI research. This paper introduces a neuro-symbolic multi-agent architecture where the belief states of individual agents are formally represented as Kripke models. This foundational choice enables them to reason about known concepts of possibility and necessity using the formal language of modal logic. In this work, we use immutable, domain-specific knowledge to make an informed root cause diagnosis, which is encoded as logical constraints essential for proper, reliable, and explainable diagnosis. In the proposed model, we show constraints that actively guide the hypothesis generation of LMs, effectively preventing them from reaching physically or logically untenable conclusions. In a high-fidelity simulated particle accelerator environment, our system successfully diagnoses complex, cascading failures by combining the powerful semantic intuition of LMs with the rigorous, verifiable validation of modal logic and a factual world model, showcasing a viable path toward more robust, reliable, and verifiable autonomous agents.
comment: 10 pages, 1 figure
ReaGAN: Node-as-Agent-Reasoning Graph Agentic Network
Graph Neural Networks (GNNs) have achieved remarkable success in graph-based learning by propagating information among neighbor nodes via predefined aggregation mechanisms. However, such fixed schemes often suffer from two key limitations. First, they cannot handle the imbalance in node informativeness -- some nodes are rich in information, while others remain sparse. Second, predefined message passing primarily leverages local structural similarity while ignoring global semantic relationships across the graph, limiting the model's ability to capture distant but relevant information. We propose Retrieval-augmented Graph Agentic Network (ReaGAN), an agent-based framework that empowers each node with autonomous, node-level decision-making. Each node acts as an agent that independently plans its next action based on its internal memory, enabling node-level planning and adaptive message propagation. Additionally, retrieval-augmented generation (RAG) allows nodes to access semantically relevant content and build global relationships in the graph. ReaGAN achieves competitive performance under few-shot in-context settings using a frozen LLM backbone without fine-tuning, showcasing the potential of agentic planning and local-global retrieval in graph learning.
comment: 11 pages, work in progress
Dominated Actions in Imperfect-Information Games
Dominance is a fundamental concept in game theory. In strategic-form games dominated strategies can be identified in polynomial time. As a consequence, iterative removal of dominated strategies can be performed efficiently as a preprocessing step for reducing the size of a game before computing a Nash equilibrium. For imperfect-information games in extensive form, we could convert the game to strategic form and then iteratively remove dominated strategies in the same way; however, this conversion may cause an exponential blowup in game size. In this paper we define and study the concept of dominated actions in imperfect-information games. Our main result is a polynomial-time algorithm for determining whether an action is dominated (strictly or weakly) by any mixed strategy in n-player games, which can be extended to an algorithm for iteratively removing dominated actions. This allows us to efficiently reduce the size of the game tree as a preprocessing step for Nash equilibrium computation. We explore the role of dominated actions empirically in the "All In or Fold" No-Limit Texas Hold'em poker variant.
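For context, the strategic-form preprocessing step mentioned in the abstract can be sketched as follows. This is a simplified illustration that checks strict dominance by pure strategies only in a two-player game; dominance by mixed strategies, which the abstract refers to, would require solving a linear program, and the paper's extensive-form algorithm is not reproduced here. The function name and payoff matrices are assumptions made for the example.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def iterated_strict_dominance(A: Matrix, B: Matrix) -> Tuple[List[int], List[int]]:
    """Iteratively remove pure strategies strictly dominated by another pure strategy.

    A[r][c] is the row player's payoff and B[r][c] the column player's payoff.
    Returns the indices of the surviving rows and columns.
    """
    rows = list(range(len(A)))
    cols = list(range(len(A[0])))
    changed = True
    while changed:
        changed = False
        # Row r is dominated if some other row is strictly better in every surviving column.
        for r in rows:
            if any(all(A[r2][c] > A[r][c] for c in cols) for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
                break
        if changed:
            continue
        # Column c is dominated if some other column is strictly better in every surviving row.
        for c in cols:
            if any(all(B[r][c2] > B[r][c] for r in rows) for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
                break
    return rows, cols


# Prisoner's dilemma: strategy 0 = cooperate, strategy 1 = defect.
A_pd = [[-1, -3], [0, -2]]
B_pd = [[-1, 0], [-3, -2]]
surviving = iterated_strict_dominance(A_pd, B_pd)
print(surviving)
```

Each pass scans a polynomial number of strategy pairs, which is why this kind of reduction is cheap in strategic form; the difficulty the abstract highlights is that converting an extensive-form game to this representation can blow up exponentially.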
Systems and Control (CS)
Robust Dynamic Staffing with Predictions
We consider a natural dynamic staffing problem in which a decision-maker sequentially hires workers over a finite horizon to meet an unknown demand revealed at the end. Predictions about demand arrive over time and become increasingly accurate, while worker availability decreases. This creates a fundamental trade-off between hiring early to avoid understaffing (when workers are more available but forecasts are less reliable) and hiring late to avoid overstaffing (when forecasts are more accurate but availability is lower). This problem is motivated by last-mile delivery operations, where companies such as Amazon rely on gig-economy workers whose availability declines closer to the operating day. To address practical limitations of Bayesian models (in particular, to remain agnostic to the underlying forecasting method), we study this problem under adversarial predictions. In this model, sequential predictions are adversarially chosen uncertainty intervals that (approximately) contain the true demand. The objective is to minimize worst-case staffing imbalance cost. Our main result is a simple and computationally efficient online algorithm that is minimax optimal. We first characterize the minimax cost against a restricted adversary via a polynomial-size linear program, then show how to emulate this solution in the general case. While our base model focuses on a single demand, we extend the framework to multiple demands (with egalitarian/utilitarian objectives), to settings with costly reversals of hiring decisions, and to inconsistent prediction intervals. We also introduce a practical "re-solving" variant of our algorithm, which we prove is also minimax optimal. Finally we conduct numerical experiments showing that our algorithms outperform Bayesian heuristics in both cost and speed, and are competitive with (approximate or exact) Bayesian-optimal policies when those can be computed.
Adversarial Reinforcement Learning for Robust Control of Fixed-Wing Aircraft under Model Uncertainty
This paper presents a reinforcement learning-based path-following controller for a fixed-wing small uncrewed aircraft system (sUAS) that is robust to uncertainties in the aerodynamic model of the sUAS. The controller is trained using the Robust Adversarial Reinforcement Learning framework, where an adversary perturbs the environment (aerodynamic model) to expose the agent (sUAS) to demanding scenarios. In our formulation, the adversary introduces rate-bounded perturbations to the aerodynamic model coefficients. We demonstrate that adversarial training improves robustness compared to controllers trained using stochastic model uncertainty. The learned controller is also benchmarked against a switched uncertain initial condition controller. The effectiveness of the approach is validated through high-fidelity simulations using a realistic six-degree-of-freedom fixed-wing aircraft model, showing accurate and robust path-following performance under a variety of uncertain aerodynamic conditions.
QRTlib: A Library for Fast Quantum Real Transforms
Real-valued transforms such as the discrete cosine, sine, and Hartley transforms play a central role in classical computing, complementing the Fourier transform in applications from signal and image processing to data compression. However, their quantum counterparts have not evolved in parallel, and no unified framework exists for implementing them efficiently on quantum hardware. This article addresses this gap by introducing QRTlib, a library for fast and practical implementations of quantum real transforms, including the quantum Hartley, cosine, and sine transforms of various types. We develop new algorithms and circuit optimizations that make these transforms efficient and suitable for near-term devices. In particular, we present a quantum Hartley transform based on the linear combination of unitaries (LCU) technique, achieving a fourfold reduction in circuit size compared to prior methods, and an improved quantum sine transform of Type I that removes large multi-controlled operations. We also introduce circuit-level optimizations, including two's-complement and or-tree constructions. QRTlib provides the first complete implementations of these quantum real transforms in Qiskit.
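As background, the classical discrete Hartley transform that the quantum Hartley transform generalizes can be written in a few lines. The sketch below uses the orthonormal 1/sqrt(N) scaling, under which the transform is its own inverse; it is a direct O(N^2) evaluation for illustration and says nothing about QRTlib's circuits or API, which are not shown in the abstract.

```python
import math
from typing import List


def dht(x: List[float]) -> List[float]:
    """Orthonormal discrete Hartley transform.

    H[k] = (1/sqrt(N)) * sum_j x[j] * cas(2*pi*j*k/N), with cas(t) = cos(t) + sin(t).
    """
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    out = []
    for k in range(n):
        acc = 0.0
        for j in range(n):
            angle = 2.0 * math.pi * j * k / n
            acc += x[j] * (math.cos(angle) + math.sin(angle))
        out.append(scale * acc)
    return out


signal = [1.0, 2.0, 3.0, 4.0]
roundtrip = dht(dht(signal))  # applying the transform twice recovers the input
print(roundtrip)
```

Because the transform matrix is real, symmetric, and orthogonal under this scaling, it maps real inputs to real outputs, which is the property that distinguishes it from the Fourier transform.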
Towards Intelligent Traffic Signaling in Dhaka City Based on Vehicle Detection and Congestion Optimization
The vehicular density in urbanizing cities of developing countries such as Dhaka, Bangladesh results in heavy traffic congestion, causing poor on-road experiences. Traffic signaling is a key component of effective traffic management in such situations, but advancements in intelligent traffic signaling have been exclusive to developed countries with structured traffic. The non-lane-based, heterogeneous traffic of Dhaka City requires a contextual approach. This study focuses on the development of an intelligent traffic signaling system feasible in the context of developing countries such as Bangladesh. We propose a pipeline leveraging Real Time Streaming Protocol (RTSP) feeds, processing on a low-resource Raspberry Pi 4B, and a state-of-the-art YOLO-based object detection model trained on the Non-lane-based and Heterogeneous Traffic (NHT-1071) dataset to detect and classify heterogeneous traffic. A multi-objective optimization algorithm, NSGA-II, then generates optimized signal timings, minimizing waiting time while maximizing vehicle throughput. We test our implementation at a five-road intersection at Palashi, Dhaka, demonstrating the potential to significantly improve traffic management in similar situations. The developed testbed paves the way for more contextual and effective Intelligent Traffic Signaling (ITS) solutions for developing areas with complicated traffic dynamics such as Dhaka City.
comment: 10 pages, Submitted to IEEE Transactions on Intelligent Transportation Systems (T-ITS)
Enhancing Channel Estimation in RIS-aided Systems via Observation Matrix Design
Reconfigurable intelligent surfaces (RISs) have emerged as a promising technology for enhancing wireless communications through dense antenna arrays. Accurate channel estimation is critical to unlocking their full performance potential. To enhance RIS channel estimators, this paper proposes a novel observation matrix design scheme. A Bayesian optimization framework is adopted to generate observation matrices that maximize the mutual information between received pilot signals and RIS channels. To solve the formulated problem efficiently, we develop an alternating Riemannian manifold optimization (ARMO) algorithm to alternately update the receiver combiners and RIS phase-shift matrices. An adaptive kernel training strategy is further introduced to iteratively refine the channel covariance matrix without requiring additional pilot resources. Simulation results demonstrate that the proposed ARMO-enhanced estimator achieves substantial gains in estimation accuracy over state-of-the-art methods.
comment: 5 pages, 2 figures
Topology-Aware Hybrid Wi-Fi/BLE Fingerprinting via Evidence-Theoretic Fusion and Persistent Homology
Indoor localization remains challenging in GNSS-denied environments due to multipath, device heterogeneity, and volatile radio conditions. We propose a topology-aware, hybrid Wi-Fi/BLE fingerprinting framework that (i) applies physically consistent RSS normalization (dBm z-scoring or dBm -> linear mW -> z-score), (ii) denoises streams with classical Bayesian filters (KF/UKF/PF), (iii) combines complementary regressors (Random Forest and weighted kNN with a diagonal Mahalanobis metric), (iv) performs evidence-theoretic fusion via Dempster-Shafer theory (DST), and (v) augments each sample with persistent-homology (PH) descriptors. The system outputs both (x, y) estimates and interpretable belief maps, and is engineered for microcontroller-class deployment with per-update cost O(T log M + log M + Mp + S). We evaluate on two heterogeneous datasets, including a new 1,200-sample ESP32 survey, and report ablations, robustness to test-only noise, and significance across 10 stratified splits. Under 10% synthetic RSS noise, the full pipeline attains 3.40 m (Dataset 1) and 2.45 m (Dataset 2) RMSE, improving a strong PF + RF baseline by about 37%. Averaged across splits, it yields 4.993 +/- 0.15 m versus 6.292 +/- 0.13 m (20.6% relative reduction; p < 0.001). In noise-free tests, accuracy tightens to 0.44 m and 0.32 m (up to 56% better). Compared with recent learning-heavy approaches that assume large site-specific datasets and GPU inference, our method delivers competitive accuracy with formal uncertainty quantification and low computational cost suitable for real-time deployment.
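The evidence-theoretic fusion step can be illustrated with Dempster's rule of combination. The sketch below is a generic implementation of the rule over a small frame of discernment; the cell labels and mass values are invented for the example and do not come from the paper, whose exact mass construction is not described in the abstract.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def combine_dempster(m1: Mass, m2: Mass) -> Mass:
    """Combine two basic belief assignments with Dempster's rule.

    Products of masses whose focal sets intersect are accumulated on the
    intersection; mass falling on the empty set is treated as conflict and
    renormalized away.
    """
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical evidence about which cell the device is in.
A = frozenset({"cell_A"})
B = frozenset({"cell_B"})
AB = A | B  # ignorance: either cell
m_wifi: Mass = {A: 0.6, AB: 0.4}
m_ble: Mass = {B: 0.5, AB: 0.5}
fused = combine_dempster(m_wifi, m_ble)
print(fused)
```

Keeping mass on the full set `AB` is what lets each source express ignorance instead of being forced to split probability between cells, which is the usual motivation for belief maps over plain probability maps.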
SMP-RCR: A Sparse Multipoint Moment Matching Method for RC Reduction
In post-layout circuit simulation, efficient model order reduction (MOR) for many-port resistor-capacitor (RC) circuits remains a crucial issue. The current mainstream MOR methods for such circuits include high-order moment matching methods and elimination methods. High-order moment matching methods such as PRIMA and TurboMOR are highly accurate but tend to generate large dense reduced-order systems when the number of ports is large, which impairs the efficiency of MOR. Another common type of MOR method for many-port circuits is based on Gaussian elimination, with the SIP method as a representative. The main limitation of this method lies in the inadequate matching of high-order moments. In this paper, we propose a sparse multipoint moment matching method and present a comprehensive theoretical analysis of its multi-frequency high-order moment matching property. To enhance the algorithm's efficiency, sparse control and deflation techniques are introduced. Numerical experiments demonstrate that, compared to SIP, accuracy is improved by more than two orders of magnitude at high frequency points without adding many extra linear components. Compared to TurboMOR, our method runs more than twice as fast while maintaining the same level of precision.
Small-Signal Stability Analysis of Power Systems by Implicit Multilinear Models
This paper proposes a new approach to perform small-signal stability analysis based on linearization of implicit multilinear models. Multilinear models describe the system dynamics by multilinear functions of state, input, and algebraic variables. Using suitable transformations of variables, they can also represent trigonometric functions, which often occur in power systems modeling. This allows tensor representations of grid-following and grid-forming power converters. This paper introduces small-signal stability analysis of equilibrium points based on implicit multilinear models using generalized eigenvalues. The generalized eigenvalues are computed from linear descriptor models of the linearized implicit multilinear model. The proposed approach is tested using a 3-bus network example, first by comparing time-domain simulations of the implicit multilinear model with those of the nonlinear model, and second by comparing the generalized eigenvalues with those of the linearized nonlinear model. The results show that the decomposed tensor representation of the implicit multilinear model allows for a faster linearization compared to conventional methods in MATLAB Simulink.
Single-Step Digital Backpropagation for O-band Coherent Transmission Systems
We demonstrate digital backpropagation-based compensation of fibre nonlinearities in the near-zero dispersion regime of the O-band. Single-step DBP effectively mitigates self-phase modulation, achieving SNR gains of up to 1.6 dB for 50 Gbaud PDM-256QAM transmission over a 2-span 151 km SMF-28 ULL fibre link.
comment: conference, 3 pages, 2 figures
Stabilization of Nonlinear Systems with State-Dependent Representation: From Model-Based to Direct Data-Driven Control
This paper presents a novel framework for stabilizing nonlinear systems represented in state-dependent form. We first reformulate the nonlinear dynamics as a state-dependent parameter-varying model and synthesize a stabilizing controller offline via tractable linear matrix inequalities (LMIs). The resulting controller guarantees local exponential stability, maintains robustness against disturbances, and provides an estimate of the region of attraction under input saturation. We then extend the formulation to the direct data-driven setting, where a known library of basis functions represents the dynamics with unknown coefficients consistent with noisy experimental data. By leveraging Petersen's lemma, we derive data-dependent LMIs that ensure stability and robustness for all systems compatible with the data. Numerical and physical experimental results validate that our approach achieves rigorous end-to-end guarantees on stability, robustness, and safety directly from finite data without explicit model identification.
AoI-Aware Task Offloading and Transmission Optimization for Industrial IoT Networks: A Branching Deep Reinforcement Learning Approach
In the Industrial Internet of Things (IIoT), the frequent transmission of large amounts of data over wireless networks must meet stringent timeliness requirements. In particular, the freshness of packet status updates has a significant impact on system performance. In this paper, we propose an age-of-information (AoI)-aware multi-base station (BS) real-time monitoring framework to support extensive IIoT deployments. To meet the freshness requirements of IIoT, we formulate a joint task offloading and resource allocation optimization problem with the goal of minimizing long-term average AoI. The core challenges are the combinatorial explosion of multi-BS decision spaces and the stochastic dynamics of IIoT systems, which render traditional optimization methods intractable. First, an innovative branching-based Dueling Double Deep Q-Network (Branching-D3QN) algorithm is proposed to implement task offloading, which improves convergence by reducing the action space complexity from exponential to linear. Then, an efficient solution to resource allocation is obtained by proving the semi-definite property of the Hessian matrix with respect to bandwidth and computation resources. Finally, we propose an iterative optimization algorithm for efficient joint task offloading and resource allocation to achieve optimal average AoI performance. Extensive simulations demonstrate that our proposed Branching-D3QN algorithm outperforms both state-of-the-art DRL methods and classical heuristics, achieving up to 75% faster convergence and at least a 22% reduction in the long-term average AoI.
comment: 15 pages, 13 figures, submitted to IEEE journal for potential publication
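To make the objective concrete, the following sketch traces age of information in discrete time slots for a single monitored source: the age grows by one each slot and drops to the delivered update's own age when a fresher update arrives. The slot model, function name, and delivery schedule are assumptions for illustration; the paper's multi-BS system model is richer than this.

```python
from typing import Dict, List


def aoi_trace(deliveries: Dict[int, int], horizon: int, initial_age: int = 0) -> List[int]:
    """Age of information at the end of each slot.

    `deliveries` maps a delivery slot to the slot in which that update was
    generated. An update only lowers the age if it is fresher than what the
    monitor already holds.
    """
    age = initial_age
    trace = []
    for t in range(horizon):
        age += 1  # information ages by one slot
        if t in deliveries:
            age = min(age, t - deliveries[t])  # reset to the update's own age
        trace.append(age)
    return trace


# Hypothetical schedule: updates generated in slots 1 and 3 arrive in slots 2 and 5.
trace = aoi_trace({2: 1, 5: 3}, horizon=6)
average_aoi = sum(trace) / len(trace)
print(trace, average_aoi)
```

The long-term average of such a trace is the quantity the offloading and resource allocation decisions aim to minimize: shorter transmission and computation delays mean each delivery resets the age to a smaller value.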
Real-time Measurement-based Optimization for Distribution System Operation Considering Battery Voltage and Thermal Constraints
The secure operation of power distribution systems is challenged by the growing integration of distributed energy resources. Leveraging the flexibility of battery storage offers a cost-effective alternative to measures like generation curtailment, which results in energy losses. However, developing an effective operational model for battery storage is hindered by inaccurate grid models, unavailability of load data, nonlinear relationship between power injections and network states, intertemporal constraints, and complex electrochemical and thermal dynamics. To address these challenges, this paper proposes a data-driven operational control scheme for battery storage in distribution systems. Linear and convex quadratic operational constraints are constructed based on real-time distribution system and battery storage measurements. Lyapunov optimization decouples multi-period battery operation, enabling a real-time, forecast-free control strategy with low computational complexity. Numerical studies using nonlinear distribution system and battery storage simulators validate the effectiveness of the approach in ensuring secure distribution system operation and satisfaction of voltage and thermal constraints of battery storage.
comment: 7 pages, submitted to PSCC 2026
Iterative solvers for partial differential equations with dissipative structure: Operator preconditioning and optimal control
This work considers the iterative solution of large-scale problems subject to non-symmetric matrices or operators arising in discretizations of (port-)Hamiltonian partial differential equations. We consider problems governed by an operator $\mathcal{A}=\mathcal{H}+\mathcal{S}$ with symmetric part $\mathcal{H}$ that is positive (semi-)definite and skew-symmetric part $\mathcal{S}$. Prior work has shown that the structure and sparsity of the associated linear system enables Krylov subspace solvers such as the generalized minimal residual method (GMRES) or short recurrence variants such as Widlund's or Rapoport's method using the symmetric part $\mathcal{H}$, or an approximation of it, as preconditioner. In this work, we analyze the resulting condition numbers, which are crucial for fast convergence of these methods, for various partial differential equations (PDEs) arising in diffusion phenomena, fluid dynamics, and elasticity. We show that preconditioning with the symmetric part leads to a condition number uniform in the mesh size in case of elliptic and parabolic PDEs where $\mathcal{H}^{-1}\mathcal{S}$ is a bounded operator. Further, we employ the tailored Krylov subspace methods in optimal control by means of a condensing approach and a constraint preconditioner for the optimality system. We illustrate the results by various large-scale numerical examples and discuss efficient evaluations of the preconditioner, such as incomplete Cholesky factorization or the algebraic multigrid method.
comment: 26 pages, 8 figures
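The splitting of an operator into its symmetric and skew-symmetric parts, which underlies the preconditioning idea above, can be illustrated on a small matrix. This is a minimal sketch on plain nested lists with a made-up 2x2 example; it shows only the decomposition A = H + S and the fact that the skew part contributes nothing to the quadratic form, not the Krylov solvers or the analysis in the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_symmetric_skew(A: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (H, S) with H = (A + A^T)/2 symmetric and S = (A - A^T)/2 skew-symmetric."""
    n = len(A)
    H = [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]
    S = [[(A[i][j] - A[j][i]) / 2 for j in range(n)] for i in range(n)]
    return H, S


def quadratic_form(M: Matrix, x: List[float]) -> float:
    """Compute x^T M x."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))


A_example = [[2.0, 1.0], [-1.0, 3.0]]  # illustrative non-symmetric matrix
H, S = split_symmetric_skew(A_example)
x = [1.0, 2.0]
print(H, S, quadratic_form(S, x), quadratic_form(A_example, x))
```

Since x^T S x vanishes for every x, the energy x^T A x is determined entirely by H, which gives some intuition for why H is a natural preconditioner when it is positive definite.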
Adaptive Sensing Performance Design for Enhancing Secure Communication in Networked ISAC Systems
The channel state information (CSI) of an eavesdropper is crucial for physical layer security (PLS) design, but it is difficult to obtain due to the passive and non-cooperative nature of the eavesdropper. To this end, integrated sensing and communication (ISAC) offers a novel solution by estimating the CSI of the eavesdropper based on sensing information. However, existing studies normally impose explicit and fixed sensing performance requirements without considering the varying communication conditions, which hinders the system from fully exploiting the synergy between sensing and communication. To address this issue, this paper proposes sensing-enhanced secure communication with adaptive sensing performance. Specifically, we formulate the sensing performance implicitly in the information leakage rate and adaptively optimize it to minimize power consumption, offering enhanced flexibility and adaptability in sensing performance. We consider both centralized and decentralized designs to thoroughly investigate the impact of network structure on system performance and complexity. For the centralized design, we devise a block coordinate descent (BCD)-based method. For the decentralized design, we develop an optimization framework based on the consensus alternating direction method of multipliers (ADMM) to reduce complexity and information exchange overhead. Experimental results demonstrate the advantage of the proposed implicit sensing performance requirement design, which adaptively adjusts the sensing performance to enhance system performance under varying system configurations.
comment: 16 pages
Conformal Prediction in The Loop: A Feedback-Based Uncertainty Model for Trajectory Optimization NeurIPS 2025
Conformal Prediction (CP) is a powerful statistical machine learning tool to construct uncertainty sets with coverage guarantees, which has fueled its extensive adoption in generating prediction regions for decision-making tasks, e.g., Trajectory Optimization (TO) in uncertain environments. However, existing methods predominantly employ a sequential scheme, where decisions rely unidirectionally on the prediction regions, and consequently the information from decision-making fails to be fed back to instruct CP. In this paper, we propose a novel Feedback-Based CP (Fb-CP) framework for shrinking-horizon TO with a joint risk constraint over the entire mission time. Specifically, a CP-based posterior risk calculation method is developed by fully leveraging the realized trajectories to adjust the posterior allowable risk, which is then allocated to future times to update prediction regions. In this way, the information in the realized trajectories is continuously fed back to the CP, enabling attractive feedback-based adjustments of the prediction regions and a provable online improvement in trajectory performance. Furthermore, we theoretically prove that such adjustments consistently maintain the coverage guarantees of the prediction regions, thereby ensuring provable safety. Additionally, we develop a decision-focused iterative risk allocation algorithm with theoretical convergence analysis for allocating the posterior allowable risk which closely aligns with Fb-CP. Furthermore, we extend the proposed method to handle distribution shift. The effectiveness and superiority of the proposed method are demonstrated through benchmark experiments.
comment: Accepted by NeurIPS 2025 Main Track
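For readers unfamiliar with how CP turns calibration data into a prediction region, the sketch below computes the standard split conformal radius from a list of nonconformity scores. It illustrates only the basic building block with invented scores; the feedback mechanism, risk allocation, and shrinking-horizon optimization of Fb-CP are not represented.

```python
import math
from typing import List


def conformal_radius(scores: List[float], alpha: float) -> float:
    """Split conformal quantile of calibration nonconformity scores.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, so that a new
    exchangeable score falls below it with probability at least 1 - alpha.
    If that rank exceeds n, the region must be unbounded.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


# Hypothetical calibration errors (e.g. distances between predicted and true positions).
calibration_scores = [3.0, 1.0, 2.0]
radius_half = conformal_radius(calibration_scores, alpha=0.5)
radius_tight = conformal_radius(calibration_scores, alpha=0.1)
print(radius_half, radius_tight)
```

A smaller allowable risk `alpha` yields a larger radius, which is the lever a risk allocation scheme adjusts when it distributes a joint risk budget over future time steps.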
Supervisory Control of Hybrid Power Plants Using Online Feedback Optimization: Designs and Validations with a Hybrid Co-Simulation Engine
This research investigates designing a supervisory feedback controller for a hybrid power plant that coordinates the wind, solar, and battery energy storage plants to meet the desired power demands. We have explored an online feedback control design that does not require detailed knowledge about the models, known as feedback optimization. The control inputs are updated using the gradient information of the cost and the outputs with respect to the input control commands. This enables us to adjust the active power references of wind, solar, and storage plants to meet the power generation requirements set by grid operators. The methodology also ensures robust control performance in the presence of uncertainties in the weather. In this paper, we focus on describing the supervisory feedback optimization formulation and control-oriented modeling for individual renewable and storage components of the hybrid power plant. The proposed supervisory control has been integrated with the hybrid plant co-simulation engine, Hercules, demonstrating its effectiveness in more realistic simulation scenarios.
comment: 20 pages, 9 figures
Predictability of Complex Systems
The study of complex systems has attracted widespread attention from researchers in the fields of natural sciences, social sciences, and engineering. Prediction is one of the central issues in this field. Although most related studies have focused on prediction methods, research on the predictability of complex systems has received increasing attention across disciplines--aiming to provide theories and tools to address a key question: What are the limits of prediction accuracy? Predictability itself can serve as an important feature for characterizing complex systems, and accurate estimation of predictability can provide a benchmark for the study of prediction algorithms. This allows researchers to clearly identify the gap between current prediction accuracy and theoretical limits, thereby helping them determine whether there is still significant room to improve existing algorithms. More importantly, investigating predictability often requires the development of new theories and methods, which can further inspire the design of more effective algorithms. Over the past few decades, this field has undergone significant evolution. In particular, the rapid development of data science has introduced a wealth of data-driven approaches for understanding and quantifying predictability. This review summarizes representative achievements, integrating both data-driven and mechanistic perspectives. After a brief introduction to the significance of the topic in focus, we will explore three core aspects: the predictability of time series, the predictability of network structures, and the predictability of dynamical processes. Finally, we will provide extensive application examples across various fields and outline open challenges for future research.
AC Dynamics-aware Trajectory Optimization with Binary Enforcement for Adaptive UFLS Design
The high penetration of distributed energy resources, resulting in backfeed of power at the transmission and distribution interface, is causing conventional underfrequency load shedding (UFLS) schemes to become nonconforming. Adaptive schemes that update UFLS relay settings recursively in time offer a solution, but existing adaptive techniques that obtain UFLS relay settings with linearized or reduced-order model formulations fail to capture AC nonlinear network behavior. In practice, this will result in relays unable to restore system frequency during adverse disturbances. We formulate an adaptive UFLS problem as a trajectory optimization and include the full AC nonlinear network dynamics to ensure AC feasibility and time-coordinated control actions. We include binary decisions to model relay switching action and time-delayed multi-stage load-shedding. However, this formulation results in an intractable MINLP problem. To make the model tractable, we relax these binary variables into continuous surrogates and reformulate the MINLP as a sequence of NLPs. We solve the NLPs with a homotopy-driven method that enforces near-integer-feasible solutions. We evaluate the framework on multiple synthetic transmission systems and demonstrate that it scales efficiently to networks exceeding 1500 nodes with over 170k continuous and 73k binary decision variables, while successfully recovering binary-feasible solutions that arrest the frequency decline during worst-case disturbances.
Towards Smart Manufacturing Metaverse via Digital Twinning in Extended Reality
The rapid evolution of modern manufacturing systems is driven by the integration of emerging metaverse technologies such as artificial intelligence (AI) and digital twins (DT) with different forms of extended reality (XR) such as virtual reality (VR), augmented reality (AR), and mixed reality (MR). These advances confront manufacturing workers with complex and evolving environments that demand digital literacy for problem solving in the future workplace. However, the manufacturing industry faces a critical worldwide shortage of skilled workers with digital literacy. Further, the global pandemic has significantly changed how people work and collaborate digitally and remotely. There is an urgent need to rethink digital platformization and leverage emerging technologies to propel industrial evolution toward a human-centered manufacturing metaverse (MfgVerse). This paper presents a forward-looking perspective on the development of smart MfgVerse, highlighting current efforts in learning factories, cognitive digital twinning, and the new sharing economy of manufacturing-as-a-service (MaaS). MfgVerse is converging into multiplex networks, including a social network of human stakeholders, an interconnected network of manufacturing things or agents (e.g., machines, robots, facilities, material handling systems), a network of digital twins of physical things, as well as auxiliary networks of sales, supply chain, logistics, and remanufacturing systems. We also showcase the design and development of a learning factory for workforce training in extended reality. Finally, future directions, challenges, and opportunities are discussed for the human-centered manufacturing metaverse. We hope this work helps stimulate more comprehensive studies and in-depth research efforts to advance MfgVerse technologies.
An ANN-Enhanced Approach for Flatness-Based Constrained Control of Nonlinear Systems
Neural networks have proven practical for a synergistic combination of advanced control techniques. This work analyzes the implementation of rectified linear unit neural networks to achieve constrained control in differentially flat systems. Specifically, the class of flat systems enjoys the benefit of feedback linearizability, i.e., the systems can be linearized by means of a proper variable transformation. However, the price for linearizing the dynamics is that the constraint descriptions are distorted geometrically. Our results show that, by using neural networks, these constraints can be represented as a union of polytopes, enabling the use of mixed-integer programming tools to guarantee constraint satisfaction. We further analyze the integration of the characterization into efficient settings such as control Lyapunov function-based and model predictive control (MPC). Interestingly, this description also allows us to explicitly compute the solution of the MPC problem for the nonlinear system. Several examples are provided to illustrate the effectiveness of our framework.
FlipDyn with Control: Resource Takeover Games with Dynamics
We introduce FlipDyn with control, a finite-horizon zero-sum resource takeover game, where a defender and an adversary decide when to take over and how to control a common resource. At each discrete-time step, the players can take over or retain control, incurring state and control-dependent costs. The system is modeled as a hybrid dynamical system, with a discrete \texttt{FlipDyn} state determining control authority. Our contributions are: (i) For arbitrary non-negative costs, we derive the saddle-point value of the \texttt{FlipDyn} game and the corresponding Nash equilibria (NE) takeover strategies. (ii) For linear dynamical systems with quadratic costs, we establish sufficient conditions under which the game admits an NE. (iii) For scalar linear dynamical systems with quadratic costs, we derive parameterized NE takeover strategies and saddle-point values independent of the continuous state. (iv) For higher-dimensional linear dynamical systems with quadratic costs, we derive approximate NE takeover strategies and control policies, and compute bounds on the saddle-point values. We validate our results through a numerical study on adversarial control of a linear system.
comment: 17 Pages, 2 figures. Under review at IEEE TAC
Physics-Informed Deep B-Spline Networks
Physics-informed machine learning offers a promising framework for solving complex partial differential equations (PDEs) by integrating observational data with governing physical laws. However, learning PDEs with varying parameters and changing initial conditions and boundary conditions (ICBCs) with theoretical guarantees remains an open challenge. In this paper, we propose physics-informed deep B-spline networks, a novel technique that approximates a family of PDEs with different parameters and ICBCs by learning B-spline control points through neural networks. The proposed B-spline representation reduces the learning task from predicting solution values over the entire domain to learning a compact set of control points, enforces strict compliance to initial and Dirichlet boundary conditions by construction, and enables analytical computation of derivatives for incorporating PDE residual losses. While existing approximation and generalization theories are not applicable in this setting - where solutions of parametrized PDE families are represented via B-spline bases - we fill this gap by showing that B-spline networks are universal approximators for such families under mild conditions. We also derive generalization error bounds for physics-informed learning in both elliptic and parabolic PDE settings, establishing new theoretical guarantees. Finally, we demonstrate in experiments that the proposed technique has improved efficiency-accuracy tradeoffs compared to existing techniques in a dynamical system problem with discontinuous ICBCs and can handle nonhomogeneous ICBCs and non-rectangular domains.
Passivity-Based Robust Shape Control of a Cable-Driven Solar Sail Boom for the CABLESSail Concept
Solar sails provide a means of propulsion using solar radiation pressure, which offers the possibility of exciting new spacecraft capabilities. However, solar sails have attitude control challenges because of the significant disturbance torques that they encounter due to imperfections in the sail and its supporting structure, as well as limited actuation capabilities. The Cable-Actuated Bio-inspired Lightweight Elastic Solar Sail (CABLESSail) concept was previously proposed to overcome these challenges by controlling the shape of the sail through cable actuation. The structural flexibility of CABLESSail introduces control challenges, which necessitate the design of a robust feedback controller for this system. This work designs a robust controller for precise and reliable control of CABLESSail's boom. Taking into account the system dynamics and the dynamic properties of the CABLESSail concept, a passivity-based proportional-derivative (PD) controller for a single boom on the CABLESSail system is designed. To reach nonzero desired setpoints, a feedforward input is added to the control law; to track a time-varying desired boom tip deflection, this constant feedforward input is replaced by a time-varying one. This control law is assessed by numerical simulations and by tests using a smaller-scale prototype of Solar Cruiser. Both the simulation and the test results show that this PD control with the time-varying feedforward input robustly controls the flexible cable-actuated solar sail.
comment: Submitted to Acta Astronautica
Whole-Body Model-Predictive Control of Legged Robots with MuJoCo
We demonstrate the surprising real-world effectiveness of a very simple approach to whole-body model-predictive control (MPC) of quadruped and humanoid robots: the iterative LQR (iLQR) algorithm with MuJoCo dynamics and finite-difference approximated derivatives. Building upon the previous success of model-based behavior synthesis and control of locomotion and manipulation tasks with MuJoCo in simulation, we show that these policies can easily generalize to the real world with few sim-to-real considerations. Our baseline method achieves real-time whole-body MPC on a variety of hardware experiments, including dynamic quadruped locomotion, quadruped walking on two legs, and full-sized humanoid bipedal locomotion. We hope this easy-to-reproduce hardware baseline lowers the barrier to entry for real-world whole-body MPC research and contributes to accelerating research velocity in the community. Our code and experiment videos will be available online at: https://johnzhang3.github.io/mujoco_ilqr
comment: under review
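The finite-difference derivative approximation mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a generic discrete-time step function; the double-integrator `step`, the helper `fd_jacobians`, and the step size `eps` are hypothetical stand-ins for illustration, not the authors' code and not the MuJoCo API.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def step(x: Vector, u: Vector, dt: float = 0.01) -> Vector:
    """Hypothetical dynamics: a double integrator (position, velocity)."""
    pos, vel = x
    return [pos + dt * vel, vel + dt * u[0]]


def fd_jacobians(
    f: Callable[[Vector, Vector], Vector],
    x: Vector,
    u: Vector,
    eps: float = 1e-6,
) -> Tuple[Matrix, Matrix]:
    """Central-difference estimates of A = df/dx and B = df/du at (x, u)."""

    def column(perturb_state: bool, j: int) -> Vector:
        # Perturb one coordinate of x (or u) in both directions.
        xp, xm, up, um = list(x), list(x), list(u), list(u)
        if perturb_state:
            xp[j] += eps
            xm[j] -= eps
        else:
            up[j] += eps
            um[j] -= eps
        fp, fm = f(xp, up), f(xm, um)
        return [(a - b) / (2.0 * eps) for a, b in zip(fp, fm)]

    n_out = len(f(x, u))
    cols_a = [column(True, j) for j in range(len(x))]
    cols_b = [column(False, j) for j in range(len(u))]
    # Transpose the column lists into row-major matrices.
    A = [[cols_a[j][i] for j in range(len(x))] for i in range(n_out)]
    B = [[cols_b[j][i] for j in range(len(u))] for i in range(n_out)]
    return A, B


A, B = fd_jacobians(step, [0.0, 1.0], [0.5])
```

In an iLQR-style loop, matrices such as `A` and `B` would be recomputed along the nominal trajectory at every iteration; the cost is two dynamics evaluations per perturbed coordinate.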
The impact of large-scale EV charging on the real-time operation of distribution systems: A comprehensive review
With the large-scale integration of electric vehicles (EVs) in the distribution grid, the unpredictable nature of EV charging introduces considerable uncertainties to the grid's real-time operations. This can exacerbate load fluctuations, compromise power quality, and pose risks to the grid's stability and security. However, due to their dual role as controllable loads and energy storage devices, EVs have the potential to mitigate these fluctuations, balance the variability of renewable energy sources, and provide ancillary services that support grid stability. By leveraging the bidirectional flow of information and energy in smart grids, the adverse effects of EV charging can be minimized and even converted into beneficial outcomes through effective real-time management strategies. This paper explores the negative impacts of EV charging on the distribution system's real-time operations and outlines methods to transform these challenges into positive contributions. Additionally, it provides an in-depth analysis of the real-time management system for EV charging, focusing on state estimation and management strategies.
Guided Multi-Fidelity Bayesian Optimization for Data-driven Controller Tuning with Digital Twins
We propose a \textit{guided multi-fidelity Bayesian optimization} framework for data-efficient controller tuning that integrates corrected digital twin simulations with real-world measurements. The method targets closed-loop systems with limited-fidelity simulations or inexpensive approximations. To address model mismatch, we build a multi-fidelity surrogate with a learned correction model that refines digital twin estimates using real data. An adaptive cost-aware acquisition function balances expected improvement, fidelity, and sampling cost. Our method ensures adaptability as new measurements arrive. The digital twin accuracy is re-estimated, dynamically adapting both cross-source correlations and the acquisition function. This ensures that accurate simulations are used more frequently, while inaccurate simulation data are appropriately downweighted. Experiments on robotic drive hardware and supporting numerical studies demonstrate that our method enhances tuning efficiency compared to standard Bayesian optimization and multi-fidelity methods.
comment: This work has been submitted to IEEE Robotics and Automation Letters (RA-L) for review
Axial current as the origin of quantum intrinsic orbital angular momentum
We show that the axial current density is the physical origin (generator) of quantum intrinsic orbital angular momentum (IOAM). Without the axial current, the IOAM of particles vanishes. Broadly speaking, we argue that the spiral or interference characteristics of the axial current density determine the occurrence of nonlinear or tunneling effects in any spacetime-dependent quantum system. Our findings offer a comprehensive theoretical framework that addresses the limitations of Keldysh's ionization theory and provides new insights into the angular momentum properties of quantum systems, particularly in tunneling-dominated regimes. Using Wigner function methods, a fermionic generalized two-level model, and Berry phase simulations, we predict that the IOAM effect can persist even in pure quantum tunneling processes. These results open the door for experimental verification of IOAM effects in future high-intensity QED experiments, such as those using X-ray free electron lasers.
comment: 8 pages, 2 figures
Systems and Control (EESS)
Robust Dynamic Staffing with Predictions
We consider a natural dynamic staffing problem in which a decision-maker sequentially hires workers over a finite horizon to meet an unknown demand revealed at the end. Predictions about demand arrive over time and become increasingly accurate, while worker availability decreases. This creates a fundamental trade-off between hiring early to avoid understaffing (when workers are more available but forecasts are less reliable) and hiring late to avoid overstaffing (when forecasts are more accurate but availability is lower). This problem is motivated by last-mile delivery operations, where companies such as Amazon rely on gig-economy workers whose availability declines closer to the operating day. To address practical limitations of Bayesian models (in particular, to remain agnostic to the underlying forecasting method), we study this problem under adversarial predictions. In this model, sequential predictions are adversarially chosen uncertainty intervals that (approximately) contain the true demand. The objective is to minimize worst-case staffing imbalance cost. Our main result is a simple and computationally efficient online algorithm that is minimax optimal. We first characterize the minimax cost against a restricted adversary via a polynomial-size linear program, then show how to emulate this solution in the general case. While our base model focuses on a single demand, we extend the framework to multiple demands (with egalitarian/utilitarian objectives), to settings with costly reversals of hiring decisions, and to inconsistent prediction intervals. We also introduce a practical "re-solving" variant of our algorithm, which we prove is also minimax optimal. Finally we conduct numerical experiments showing that our algorithms outperform Bayesian heuristics in both cost and speed, and are competitive with (approximate or exact) Bayesian-optimal policies when those can be computed.
Adversarial Reinforcement Learning for Robust Control of Fixed-Wing Aircraft under Model Uncertainty
This paper presents a reinforcement learning-based path-following controller for a fixed-wing small uncrewed aircraft system (sUAS) that is robust to uncertainties in the aerodynamic model of the sUAS. The controller is trained using the Robust Adversarial Reinforcement Learning framework, where an adversary perturbs the environment (aerodynamic model) to expose the agent (sUAS) to demanding scenarios. In our formulation, the adversary introduces rate-bounded perturbations to the aerodynamic model coefficients. We demonstrate that adversarial training improves robustness compared to controllers trained using stochastic model uncertainty. The learned controller is also benchmarked against a switched uncertain initial condition controller. The effectiveness of the approach is validated through high-fidelity simulations using a realistic six-degree-of-freedom fixed-wing aircraft model, showing accurate and robust path-following performance under a variety of uncertain aerodynamic conditions.
QRTlib: A Library for Fast Quantum Real Transforms
Real-valued transforms such as the discrete cosine, sine, and Hartley transforms play a central role in classical computing, complementing the Fourier transform in applications from signal and image processing to data compression. However, their quantum counterparts have not evolved in parallel, and no unified framework exists for implementing them efficiently on quantum hardware. This article addresses this gap by introducing QRTlib, a library for fast and practical implementations of quantum real transforms, including the quantum Hartley, cosine, and sine transforms of various types. We develop new algorithms and circuit optimizations that make these transforms efficient and suitable for near-term devices. In particular, we present a quantum Hartley transform based on the linear combination of unitaries (LCU) technique, achieving a fourfold reduction in circuit size compared to prior methods, and an improved quantum sine transform of Type I that removes large multi-controlled operations. We also introduce circuit-level optimizations, including two's-complement and or-tree constructions. QRTlib provides the first complete implementations of these quantum real transforms in Qiskit.
Towards Intelligent Traffic Signaling in Dhaka City Based on Vehicle Detection and Congestion Optimization
The vehicular density in urbanizing cities of developing countries such as Dhaka, Bangladesh results in heavy traffic congestion, causing poor on-road experiences. Traffic signaling is a key component in effective traffic management for such situations, but the advancements in intelligent traffic signaling have been exclusive to developed countries with structured traffic. The non-lane-based, heterogeneous traffic of Dhaka City requires a contextual approach. This study focuses on the development of an intelligent traffic signaling system feasible in the context of developing countries such as Bangladesh. We propose a pipeline leveraging Real Time Streaming Protocol (RTSP) feeds, processing on a low-resource Raspberry Pi 4B, and a state-of-the-art YOLO-based object detection model trained on the Non-lane-based and Heterogeneous Traffic (NHT-1071) dataset to detect and classify heterogeneous traffic. A multi-objective optimization algorithm, NSGA-II, then generates optimized signal timings, minimizing waiting time while maximizing vehicle throughput. We test our implementation at a five-road intersection at Palashi, Dhaka, demonstrating the potential to significantly improve traffic management in similar situations. The developed testbed paves the way for more contextual and effective Intelligent Traffic Signaling (ITS) solutions for developing areas with complicated traffic dynamics such as Dhaka City.
comment: 10 pages, Submitted to IEEE Transactions on Intelligent Transportation Systems (T-ITS)
Enhancing Channel Estimation in RIS-aided Systems via Observation Matrix Design
Reconfigurable intelligent surfaces (RISs) have emerged as a promising technology for enhancing wireless communications through dense antenna arrays. Accurate channel estimation is critical to unlocking their full performance potential. To enhance RIS channel estimators, this paper proposes a novel observation matrix design scheme. A Bayesian optimization framework is adopted to generate observation matrices that maximize the mutual information between received pilot signals and RIS channels. To solve the formulated problem efficiently, we develop an alternating Riemannian manifold optimization (ARMO) algorithm to alternately update the receiver combiners and RIS phase-shift matrices. An adaptive kernel training strategy is further introduced to iteratively refine the channel covariance matrix without requiring additional pilot resources. Simulation results demonstrate that the proposed ARMO-enhanced estimator achieves substantial gains in estimation accuracy over state-of-the-art methods.
comment: 5 pages, 2 figures
Topology-Aware Hybrid Wi-Fi/BLE Fingerprinting via Evidence-Theoretic Fusion and Persistent Homology
Indoor localization remains challenging in GNSS-denied environments due to multipath, device heterogeneity, and volatile radio conditions. We propose a topology-aware, hybrid Wi-Fi/BLE fingerprinting framework that (i) applies physically consistent RSS normalization (dBm z-scoring or dBm -> linear mW -> z-score), (ii) denoises streams with classical Bayesian filters (KF/UKF/PF), (iii) combines complementary regressors (Random Forest and weighted kNN with a diagonal Mahalanobis metric), (iv) performs evidence-theoretic fusion via Dempster-Shafer theory (DST), and (v) augments each sample with persistent-homology (PH) descriptors. The system outputs both (x, y) estimates and interpretable belief maps, and is engineered for microcontroller-class deployment with per-update cost O(T log M + log M + Mp + S). We evaluate on two heterogeneous datasets, including a new 1,200-sample ESP32 survey, and report ablations, robustness to test-only noise, and significance across 10 stratified splits. Under 10% synthetic RSS noise, the full pipeline attains 3.40 m (Dataset 1) and 2.45 m (Dataset 2) RMSE, improving a strong PF + RF baseline by about 37%. Averaged across splits, it yields 4.993 +/- 0.15 m versus 6.292 +/- 0.13 m (20.6% relative reduction; p < 0.001). In noise-free tests, accuracy tightens to 0.44 m and 0.32 m (up to 56% better). Compared with recent learning-heavy approaches that assume large site-specific datasets and GPU inference, our method delivers competitive accuracy with formal uncertainty quantification and low computational cost suitable for real-time deployment.
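The two RSS normalization variants named in step (i) of the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not say over which set of readings the z-score is computed, so the sketch standardizes a single list of samples, and the function names and sample values are hypothetical.

```python
import math
from typing import List


def dbm_to_mw(rss_dbm: float) -> float:
    """Convert received signal strength from dBm to linear milliwatts."""
    return 10.0 ** (rss_dbm / 10.0)


def z_score(values: List[float]) -> List[float]:
    """Standardize to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var)
    if std == 0.0:
        return [0.0 for _ in values]  # constant input: no spread to scale
    return [(v - mean) / std for v in values]


def normalize_rss(rss_dbm: List[float], linear: bool = False) -> List[float]:
    """Variant 1: z-score in dBm. Variant 2: dBm -> mW -> z-score."""
    values = [dbm_to_mw(r) for r in rss_dbm] if linear else list(rss_dbm)
    return z_score(values)


readings = [-40.0, -50.0, -60.0, -70.0]  # hypothetical RSS samples in dBm
z_dbm = normalize_rss(readings)
z_lin = normalize_rss(readings, linear=True)
```

Because dBm is a logarithmic unit, the linear-power variant weights strong readings far more heavily than weak ones, whereas z-scoring directly in dBm treats equal decibel steps equally.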
SMP-RCR: A Sparse Multipoint Moment Matching Method for RC Reduction
In post-layout circuit simulation, efficient model order reduction (MOR) for many-port resistor-capacitor (RC) circuits remains a crucial issue. The current mainstream MOR methods for such circuits include high-order moment matching methods and elimination methods. High-order moment matching methods such as PRIMA and TurboMOR are highly accurate but tend to generate large dense reduced-order systems when the number of ports is large, which impairs the efficiency of MOR. Another common type of MOR method for many-port circuits is based on Gaussian elimination, with the SIP method as a representative. The main limitation of this method lies in the inadequate matching of high-order moments. In this paper, we propose a sparse multipoint moment matching method and present comprehensive theoretical analysis results regarding the multi-frequency high-order moment matching property. To improve efficiency, sparse control and deflation techniques are also introduced. Numerical experiments demonstrate that, compared to SIP, the accuracy is improved by more than two orders of magnitude at high frequency points without adding many extra linear components. Compared to TurboMOR, our method is more than twice as fast while maintaining the same level of precision.
Small-Signal Stability Analysis of Power Systems by Implicit Multilinear Models
This paper proposes a new approach to perform small-signal stability analysis based on linearization of implicit multilinear models. Multilinear models describe the system dynamics by multilinear functions of state, input, and algebraic variables. Using suitable transformations of variables, they can also represent trigonometric functions, which often occur in power systems modeling. This allows tensor representations of grid-following and grid-forming power converters. This paper introduces small-signal stability analysis of equilibrium points based on implicit multilinear models using generalized eigenvalues. The generalized eigenvalues are computed from linear descriptor models of the linearized implicit multilinear model. The proposed approach is tested using a 3-bus network example, first by comparing time-domain simulations of the implicit multilinear model with those of the nonlinear model, and second by comparing the generalized eigenvalues with those of the linearized nonlinear model. The results show that the decomposed tensor representation of the implicit multilinear model allows for a faster linearization compared to conventional methods in MATLAB Simulink.
Single-Step Digital Backpropagation for O-band Coherent Transmission Systems
We demonstrate digital backpropagation-based compensation of fibre nonlinearities in the near-zero dispersion regime of the O-band. Single-step DBP effectively mitigates self-phase modulation, achieving SNR gains of up to 1.6 dB for 50 Gbaud PDM-256QAM transmission over a 2-span 151 km SMF-28 ULL fibre link.
comment: conference, 3 pages, 2 figures
Stabilization of Nonlinear Systems with State-Dependent Representation: From Model-Based to Direct Data-Driven Control
This paper presents a novel framework for stabilizing nonlinear systems represented in state-dependent form. We first reformulate the nonlinear dynamics as a state-dependent parameter-varying model and synthesize a stabilizing controller offline via tractable linear matrix inequalities (LMIs). The resulting controller guarantees local exponential stability, maintains robustness against disturbances, and provides an estimate of the region of attraction under input saturation. We then extend the formulation to the direct data-driven setting, where a known library of basis functions represents the dynamics with unknown coefficients consistent with noisy experimental data. By leveraging Petersen's lemma, we derive data-dependent LMIs that ensure stability and robustness for all systems compatible with the data. Numerical and physical experimental results validate that our approach achieves rigorous end-to-end guarantees on stability, robustness, and safety directly from finite data without explicit model identification.
AoI-Aware Task Offloading and Transmission Optimization for Industrial IoT Networks: A Branching Deep Reinforcement Learning Approach
In the Industrial Internet of Things (IIoT), the frequent transmission of large amounts of data over wireless networks must meet stringent timeliness requirements. In particular, the freshness of packet status updates has a significant impact on system performance. In this paper, we propose an age-of-information (AoI)-aware multi-base station (BS) real-time monitoring framework to support extensive IIoT deployments. To meet the freshness requirements of IIoT, we formulate a joint task offloading and resource allocation optimization problem with the goal of minimizing long-term average AoI. The core challenges are the combinatorial explosion of multi-BS decision spaces and the stochastic dynamics of IIoT systems, which render traditional optimization methods intractable. First, an innovative branching-based Dueling Double Deep Q-Network (Branching-D3QN) algorithm is proposed to effectively implement task offloading, which improves convergence by reducing the action space complexity from exponential to linear. Then, an efficient optimization solution to resource allocation is proposed by proving the semi-definite property of the Hessian matrix of bandwidth and computation resources. Finally, we propose an iterative optimization algorithm for efficient joint task offloading and resource allocation to achieve optimal average AoI performance. Extensive simulations demonstrate that our proposed Branching-D3QN algorithm outperforms both state-of-the-art DRL methods and classical heuristics, achieving up to a 75% improvement in convergence speed and at least a 22% reduction in the long-term average AoI.
comment: 15 pages, 13 figures, submitted to IEEE journal for potential publication
Real-time Measurement-based Optimization for Distribution System Operation Considering Battery Voltage and Thermal Constraints
The secure operation of power distribution systems is challenged by the growing integration of distributed energy resources. Leveraging the flexibility of battery storage offers a cost-effective alternative to measures like generation curtailment, which results in energy losses. However, developing an effective operational model for battery storage is hindered by inaccurate grid models, unavailability of load data, nonlinear relationship between power injections and network states, intertemporal constraints, and complex electrochemical and thermal dynamics. To address these challenges, this paper proposes a data-driven operational control scheme for battery storage in distribution systems. Linear and convex quadratic operational constraints are constructed based on real-time distribution system and battery storage measurements. Lyapunov optimization decouples multi-period battery operation, enabling a real-time, forecast-free control strategy with low computational complexity. Numerical studies using nonlinear distribution system and battery storage simulators validate the effectiveness of the approach in ensuring secure distribution system operation and satisfaction of voltage and thermal constraints of battery storage.
comment: 7 pages, submitted to PSCC 2026
Iterative solvers for partial differential equations with dissipative structure: Operator preconditioning and optimal control
This work considers the iterative solution of large-scale problems subject to non-symmetric matrices or operators arising in discretizations of (port-)Hamiltonian partial differential equations. We consider problems governed by an operator $\mathcal{A}=\mathcal{H}+\mathcal{S}$ with symmetric part $\mathcal{H}$ that is positive (semi-)definite and skew-symmetric part $\mathcal{S}$. Prior work has shown that the structure and sparsity of the associated linear system enables Krylov subspace solvers such as the generalized minimal residual method (GMRES) or short recurrence variants such as Widlund's or Rapoport's method using the symmetric part $\mathcal{H}$, or an approximation of it, as preconditioner. In this work, we analyze the resulting condition numbers, which are crucial for fast convergence of these methods, for various partial differential equations (PDEs) arising in diffusion phenomena, fluid dynamics, and elasticity. We show that preconditioning with the symmetric part leads to a condition number uniform in the mesh size in case of elliptic and parabolic PDEs where $\mathcal{H}^{-1}\mathcal{S}$ is a bounded operator. Further, we employ the tailored Krylov subspace methods in optimal control by means of a condensing approach and a constraint preconditioner for the optimality system. We illustrate the results by various large-scale numerical examples and discuss efficient evaluations of the preconditioner, such as incomplete Cholesky factorization or the algebraic multigrid method.
comment: 26 pages, 8 figures
Adaptive Sensing Performance Design for Enhancing Secure Communication in Networked ISAC Systems
The channel state information (CSI) of an eavesdropper is crucial for physical layer security (PLS) design, but it is difficult to obtain due to the passive and non-cooperative nature of the eavesdropper. To this end, integrated sensing and communication (ISAC) offers a novel solution by estimating the CSI of the eavesdropper based on sensing information. However, existing studies normally impose explicit and fixed sensing performance requirements without considering the varying communication conditions, which hinders the system from fully exploiting the synergy between sensing and communication. To address this issue, this paper proposes sensing-enhanced secure communication with adaptive sensing performance. Specifically, we formulate the sensing performance implicitly in the information leakage rate and adaptively optimize it to minimize power consumption, offering enhanced flexibility and adaptability in sensing performance. We consider both centralized and decentralized designs to thoroughly investigate the impact of network structure on system performance and complexity. For the centralized design, we devise a block coordinate descent (BCD)-based method. For the decentralized design, we develop an optimization framework based on the consensus alternating direction method of multipliers (ADMM) to reduce complexity and information exchange overhead. Experimental results demonstrate the advantage of the proposed implicit sensing performance requirement design, which adaptively adjusts the sensing performance to enhance system performance across varying system configurations.
comment: 16 pages
Conformal Prediction in The Loop: A Feedback-Based Uncertainty Model for Trajectory Optimization NeurIPS 2025
Conformal Prediction (CP) is a powerful statistical machine learning tool to construct uncertainty sets with coverage guarantees, which has fueled its extensive adoption in generating prediction regions for decision-making tasks, e.g., Trajectory Optimization (TO) in uncertain environments. However, existing methods predominantly employ a sequential scheme, where decisions rely unidirectionally on the prediction regions, and consequently the information from decision-making fails to be fed back to instruct CP. In this paper, we propose a novel Feedback-Based CP (Fb-CP) framework for shrinking-horizon TO with a joint risk constraint over the entire mission time. Specifically, a CP-based posterior risk calculation method is developed by fully leveraging the realized trajectories to adjust the posterior allowable risk, which is then allocated to future times to update prediction regions. In this way, the information in the realized trajectories is continuously fed back to the CP, enabling attractive feedback-based adjustments of the prediction regions and a provable online improvement in trajectory performance. Furthermore, we theoretically prove that such adjustments consistently maintain the coverage guarantees of the prediction regions, thereby ensuring provable safety. Additionally, we develop a decision-focused iterative risk allocation algorithm with theoretical convergence analysis for allocating the posterior allowable risk which closely aligns with Fb-CP. Furthermore, we extend the proposed method to handle distribution shift. The effectiveness and superiority of the proposed method are demonstrated through benchmark experiments.
comment: Accepted by NeurIPS 2025 Main Track
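For readers unfamiliar with the building block that the abstract above extends, the following sketch shows standard split conformal prediction: computing a prediction-region radius from calibration scores at a given risk level. It is not the Fb-CP method itself; the function name, the scores, and the risk levels are hypothetical.

```python
import math
from typing import List


def conformal_quantile(scores: List[float], alpha: float) -> float:
    """Radius q such that a new exchangeable score is <= q w.p. >= 1 - alpha."""
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))  # rank of the calibrated quantile
    if k > n:
        return math.inf  # too few calibration samples for this risk level
    return sorted(scores)[k - 1]


# Hypothetical calibration errors (e.g., distance between predicted and
# observed obstacle positions, in metres).
calibration_scores = [0.12, 0.30, 0.25, 0.08, 0.41, 0.19, 0.33, 0.27, 0.15]
radius_strict = conformal_quantile(calibration_scores, alpha=0.25)
radius_loose = conformal_quantile(calibration_scores, alpha=0.5)
radius_unbounded = conformal_quantile(calibration_scores, alpha=0.05)
```

A larger allowable risk `alpha` yields a smaller radius, which is why reallocating unused risk to future time steps, as the abstract describes, can shrink later prediction regions.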
Supervisory Control of Hybrid Power Plants Using Online Feedback Optimization: Designs and Validations with a Hybrid Co-Simulation Engine
This research investigates designing a supervisory feedback controller for a hybrid power plant that coordinates the wind, solar, and battery energy storage plants to meet the desired power demands. We have explored an online feedback control design that does not require detailed knowledge about the models, known as feedback optimization. The control inputs are updated using the gradient information of the cost and the outputs with respect to the input control commands. This enables us to adjust the active power references of wind, solar, and storage plants to meet the power generation requirements set by grid operators. The methodology also ensures robust control performance in the presence of uncertainties in the weather. In this paper, we focus on describing the supervisory feedback optimization formulation and control-oriented modeling for individual renewable and storage components of the hybrid power plant. The proposed supervisory control has been integrated with the hybrid plant co-simulation engine, Hercules, demonstrating its effectiveness in more realistic simulation scenarios.
comment: 20 pages, 9 figures
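The gradient-based input update described in the abstract above can be sketched as a projected gradient step driven by measurements. This is a toy illustration under stated assumptions: the plant stand-in, operating limits, sensitivities, and step size are all hypothetical, and the sketch does not use the Hercules co-simulation engine.

```python
from typing import List

# All numbers below are hypothetical and chosen only for illustration.
LIMITS = [(0.0, 60.0), (0.0, 40.0), (-20.0, 20.0)]  # wind, solar, battery (MW)
SENSITIVITY = [1.0, 1.0, 1.0]  # assumed d(plant output)/d(reference)


def plant_output(u: List[float]) -> float:
    """Stand-in for the measured plant power; the controller never sees
    these efficiencies, only the resulting measurement."""
    efficiencies = [0.95, 0.90, 0.98]
    return sum(e * ui for e, ui in zip(efficiencies, u))


def feedback_optimization(demand: float, steps: int = 500,
                          step_size: float = 0.2) -> List[float]:
    """Projected gradient descent on 0.5 * (y - demand)^2 using measured y."""
    u = [0.0, 0.0, 0.0]
    for _ in range(steps):
        y = plant_output(u)  # measurement replaces a detailed plant model
        error = y - demand
        # Chain rule: d(cost)/du_i = error * dy/du_i.
        u = [ui - step_size * error * s for ui, s in zip(u, SENSITIVITY)]
        # Project each reference back onto its operating limits.
        u = [min(max(ui, lo), hi) for ui, (lo, hi) in zip(u, LIMITS)]
    return u


references = feedback_optimization(demand=80.0)
tracking_error = plant_output(references) - 80.0
```

The controller only needs approximate input-output sensitivities; the mismatch between the assumed unit sensitivities and the true efficiencies is absorbed by feeding the measured output back at every step.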
Predictability of Complex Systems
The study of complex systems has attracted widespread attention from researchers in the fields of natural sciences, social sciences, and engineering. Prediction is one of the central issues in this field. Although most related studies have focused on prediction methods, research on the predictability of complex systems has received increasing attention across disciplines--aiming to provide theories and tools to address a key question: What are the limits of prediction accuracy? Predictability itself can serve as an important feature for characterizing complex systems, and accurate estimation of predictability can provide a benchmark for the study of prediction algorithms. This allows researchers to clearly identify the gap between current prediction accuracy and theoretical limits, thereby helping them determine whether there is still significant room to improve existing algorithms. More importantly, investigating predictability often requires the development of new theories and methods, which can further inspire the design of more effective algorithms. Over the past few decades, this field has undergone significant evolution. In particular, the rapid development of data science has introduced a wealth of data-driven approaches for understanding and quantifying predictability. This review summarizes representative achievements, integrating both data-driven and mechanistic perspectives. After a brief introduction to the significance of the topic in focus, we will explore three core aspects: the predictability of time series, the predictability of network structures, and the predictability of dynamical processes. Finally, we will provide extensive application examples across various fields and outline open challenges for future research.
AC Dynamics-aware Trajectory Optimization with Binary Enforcement for Adaptive UFLS Design
The high penetration of distributed energy resources, resulting in backfeed of power at the transmission and distribution interface, is causing conventional underfrequency load shedding (UFLS) schemes to become nonconforming. Adaptive schemes that update UFLS relay settings recursively in time offer a solution, but existing adaptive techniques that obtain UFLS relay settings with linearized or reduced-order model formulations fail to capture AC nonlinear network behavior. In practice, this will result in relays unable to restore system frequency during adverse disturbances. We formulate an adaptive UFLS problem as a trajectory optimization and include the full AC nonlinear network dynamics to ensure AC feasibility and time-coordinated control actions. We include binary decisions to model relay switching action and time-delayed multi-stage load-shedding. However, this formulation results in an intractable MINLP problem. To enforce model tractability, we relax these binary variables into continuous surrogates and reformulate the MINLP as a sequence of NLPs. We solve the NLPs with a homotopy-driven method that enforces near-integer-feasible solutions. We evaluate the framework on multiple synthetic transmission systems and demonstrate that it scales efficiently to networks exceeding 1500 nodes with over 170k continuous and 73k binary decision variables, while successfully recovering binary-feasible solutions that arrest the frequency decline during worst-case disturbance.
Towards Smart Manufacturing Metaverse via Digital Twinning in Extended Reality
The rapid evolution of modern manufacturing systems is driven by the integration of emerging metaverse technologies such as artificial intelligence (AI) and digital twins (DT) with different forms of extended reality (XR), including virtual reality (VR), augmented reality (AR), and mixed reality (MR). These advances confront manufacturing workers with complex and evolving environments that demand digital literacy for problem solving in the future workplace. However, the manufacturing industry worldwide faces a critical shortage of skilled workers with digital literacy. Further, the global pandemic has significantly changed how people work and collaborate digitally and remotely. There is an urgent need to rethink digital platformization and leverage emerging technologies to propel industrial evolution toward a human-centered manufacturing metaverse (MfgVerse). This paper presents a forward-looking perspective on the development of smart MfgVerse, highlighting current efforts in learning factories, cognitive digital twinning, and the new sharing economy of manufacturing-as-a-service (MaaS). MfgVerse is converging into multiplex networks, including a social network of human stakeholders, an interconnected network of manufacturing things or agents (e.g., machines, robots, facilities, material handling systems), a network of digital twins of physical things, as well as auxiliary networks of sales, supply chain, logistics, and remanufacturing systems. We also showcase the design and development of a learning factory for workforce training in extended reality. Finally, future directions, challenges, and opportunities are discussed for the human-centered manufacturing metaverse. We hope this work helps stimulate more comprehensive studies and in-depth research efforts to advance MfgVerse technologies.
An ANN-Enhanced Approach for Flatness-Based Constrained Control of Nonlinear Systems
Neural networks have proven practical for a synergistic combination of advanced control techniques. This work analyzes the implementation of rectified linear unit neural networks to achieve constrained control in differentially flat systems. Specifically, the class of flat systems enjoys the benefit of feedback linearizability, i.e., the systems can be linearized by means of a proper variable transformation. However, the price for linearizing the dynamics is that the constraint descriptions are distorted geometrically. Our results show that, by using neural networks, these constraints can be represented as a union of polytopes, enabling the use of mixed-integer programming tools to guarantee constraint satisfaction. We further analyze the integration of the characterization into efficient settings such as control Lyapunov function-based and model predictive control (MPC). Interestingly, this description also allows us to explicitly compute the solution of the MPC problem for the nonlinear system. Several examples are provided to illustrate the effectiveness of our framework.
FlipDyn with Control: Resource Takeover Games with Dynamics
We introduce FlipDyn with control, a finite-horizon zero-sum resource takeover game, where a defender and an adversary decide when to take over and how to control a common resource. At each discrete-time step, the players can take over or retain control, incurring state and control-dependent costs. The system is modeled as a hybrid dynamical system, with a discrete \texttt{FlipDyn} state determining control authority. Our contributions are: (i) For arbitrary non-negative costs, we derive the saddle-point value of the \texttt{FlipDyn} game and the corresponding Nash equilibria (NE) takeover strategies. (ii) For linear dynamical systems with quadratic costs, we establish sufficient conditions under which the game admits an NE. (iii) For scalar linear dynamical systems with quadratic costs, we derive parameterized NE takeover strategies and saddle-point values independent of the continuous state. (iv) For higher-dimensional linear dynamical systems with quadratic costs, we derive approximate NE takeover strategies and control policies, and compute bounds on the saddle-point values. We validate our results through a numerical study on adversarial control of a linear system.
comment: 17 Pages, 2 figures. Under review at IEEE TAC
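As a rough illustration of the saddle-point computation mentioned in the FlipDyn abstract above, the sketch below solves a generic 2x2 zero-sum matrix game in which each player has two actions (read here as "stay idle" and "take over"). This is the textbook construction for a single-stage game, not the paper's derivation; the function name `solve_2x2_zero_sum` and the payoff numbers are hypothetical.

```python
def solve_2x2_zero_sum(m):
    """Value and row-player strategy of a 2x2 zero-sum matrix game.

    m[i][j] is the payoff to the row player (maximizer) when the row
    player picks action i and the column player (minimizer) picks j.
    Returns (value, p), where p is the probability of row action 0.
    """
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best guaranteed payoff for the row player
    minimax = min(max(a, c), max(b, d))  # best guaranteed cap for the column player
    if maximin == minimax:
        # A pure-strategy saddle point exists.
        p = 1.0 if min(a, b) >= min(c, d) else 0.0
        return float(maximin), p
    # Otherwise both players mix; the row player makes the column player indifferent.
    denom = a - b - c + d
    p = (d - c) / denom
    value = (a * d - b * c) / denom
    return value, p


# Hypothetical stage payoffs: rows = defender (idle, take over),
# columns = adversary (idle, take over).
value, p_idle = solve_2x2_zero_sum([[1.0, -1.0], [-1.0, 1.0]])
print(value, p_idle)
```

In a finite-horizon game, a stage problem of this kind would be solved backward in time with the continuation values folded into the payoff entries; that recursion is omitted here.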
Physics-Informed Deep B-Spline Networks
Physics-informed machine learning offers a promising framework for solving complex partial differential equations (PDEs) by integrating observational data with governing physical laws. However, learning PDEs with varying parameters and changing initial conditions and boundary conditions (ICBCs) with theoretical guarantees remains an open challenge. In this paper, we propose physics-informed deep B-spline networks, a novel technique that approximates a family of PDEs with different parameters and ICBCs by learning B-spline control points through neural networks. The proposed B-spline representation reduces the learning task from predicting solution values over the entire domain to learning a compact set of control points, enforces strict compliance to initial and Dirichlet boundary conditions by construction, and enables analytical computation of derivatives for incorporating PDE residual losses. While existing approximation and generalization theories are not applicable in this setting - where solutions of parametrized PDE families are represented via B-spline bases - we fill this gap by showing that B-spline networks are universal approximators for such families under mild conditions. We also derive generalization error bounds for physics-informed learning in both elliptic and parabolic PDE settings, establishing new theoretical guarantees. Finally, we demonstrate in experiments that the proposed technique has improved efficiency-accuracy tradeoffs compared to existing techniques in a dynamical system problem with discontinuous ICBCs and can handle nonhomogeneous ICBCs and non-rectangular domains.
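To make the "boundary conditions by construction" idea above concrete, here is a minimal 1D sketch using the standard Cox-de Boor recursion with a clamped knot vector. With clamped knots the curve passes exactly through its first and last control points, so pinning those two values enforces Dirichlet conditions regardless of the interior control points. The interior values stand in for what a network would predict; all names and numbers are hypothetical and this is not the authors' implementation.

```python
def bspline_basis(i, degree, knots, t):
    """Cox-de Boor recursion for the i-th B-spline basis function at t."""
    if degree == 0:
        if knots[i] <= t < knots[i + 1]:
            return 1.0
        # Close the last non-empty span on the right so t == knots[-1] is covered.
        if t == knots[-1] and knots[i] < knots[i + 1] and knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left = 0.0
    d1 = knots[i + degree] - knots[i]
    if d1 > 0:
        left = (t - knots[i]) / d1 * bspline_basis(i, degree - 1, knots, t)
    right = 0.0
    d2 = knots[i + degree + 1] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + degree + 1] - t) / d2 * bspline_basis(i + 1, degree - 1, knots, t)
    return left + right


def clamped_knots(num_ctrl, degree):
    """Uniform knot vector on [0, 1] with end knots repeated degree + 1 times."""
    interior = num_ctrl - degree - 1
    inner = [(k + 1) / (interior + 1) for k in range(interior)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def evaluate(ctrl, degree, t):
    """Evaluate the spline sum_i ctrl[i] * B_i(t)."""
    knots = clamped_knots(len(ctrl), degree)
    return sum(c * bspline_basis(i, degree, knots, t) for i, c in enumerate(ctrl))


# Boundary values are pinned; the three interior control points are placeholders
# for network outputs.
u_left, u_right = 1.0, 2.0
ctrl = [u_left, -0.3, 0.8, 0.1, u_right]
print(evaluate(ctrl, 3, 0.0), evaluate(ctrl, 3, 1.0))
```

Because each basis function is a piecewise polynomial, derivatives of the spline are available in closed form, which is what makes PDE residual terms cheap to evaluate in this kind of representation.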
Passivity-Based Robust Shape Control of a Cable-Driven Solar Sail Boom for the CABLESSail Concept
Solar sails provide a means of propulsion using solar radiation pressure, which offers the possibility of exciting new spacecraft capabilities. However, solar sails have attitude control challenges because of the significant disturbance torques that they encounter due to imperfections in the sail and its supporting structure, as well as limited actuation capabilities. The Cable-Actuated Bio-inspired Lightweight Elastic Solar Sail (CABLESSail) concept was previously proposed to overcome these challenges by controlling the shape of the sail through cable actuation. The structural flexibility of CABLESSail introduces control challenges, which necessitate the design of a robust feedback controller for this system. The purpose of the proposed research here is to design a robust controller to ensure precise and reliable control of CABLESSail's boom. Taking into account the system dynamics and the dynamic properties of the CABLESSail concept, a passivity-based proportional-derivative (PD) controller for a single boom on the CABLESSail system is designed. To reach the nonzero desired setpoints, a feedforward input is additionally applied to the control law and a time-varying feedforward input is used instead of the constant one to effectively track a time-varying desired boom tip deflection. This control law is assessed by numerical simulations and by tests using a smaller-scale prototype of Solar Cruiser. Both the simulation and the test results show that this PD control with the time-varying feedforward input robustly controls the flexible cable-actuated solar sail.
comment: Submitted to Acta Astronautica
Whole-Body Model-Predictive Control of Legged Robots with MuJoCo
We demonstrate the surprising real-world effectiveness of a very simple approach to whole-body model-predictive control (MPC) of quadruped and humanoid robots: the iterative LQR (iLQR) algorithm with MuJoCo dynamics and finite-difference approximated derivatives. Building upon the previous success of model-based behavior synthesis and control of locomotion and manipulation tasks with MuJoCo in simulation, we show that these policies can easily generalize to the real world with few sim-to-real considerations. Our baseline method achieves real-time whole-body MPC on a variety of hardware experiments, including dynamic quadruped locomotion, quadruped walking on two legs, and full-sized humanoid bipedal locomotion. We hope this easy-to-reproduce hardware baseline lowers the barrier to entry for real-world whole-body MPC research and contributes to accelerating research velocity in the community. Our code and experiment videos will be available online at: https://johnzhang3.github.io/mujoco_ilqr
comment: under review
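The finite-difference derivatives mentioned in the abstract above can be sketched as follows. iLQR needs the linearization x_next ≈ A x + B u of the discrete dynamics around the current trajectory, and when the simulator is treated as a black box these Jacobians can be approximated by perturbing one input at a time. The `toy_step` function is a hypothetical stand-in for a simulator step (it is not a MuJoCo call), and forward differences are one possible choice among several.

```python
def finite_difference_jacobians(step, x, u, eps=1e-6):
    """Forward-difference Jacobians A = d step / d x and B = d step / d u."""
    f0 = step(x, u)
    n_out = len(f0)
    A = [[0.0] * len(x) for _ in range(n_out)]
    B = [[0.0] * len(u) for _ in range(n_out)]
    for j in range(len(x)):
        xp = list(x)
        xp[j] += eps  # perturb one state coordinate
        fj = step(xp, u)
        for i in range(n_out):
            A[i][j] = (fj[i] - f0[i]) / eps
    for j in range(len(u)):
        up = list(u)
        up[j] += eps  # perturb one control coordinate
        fj = step(x, up)
        for i in range(n_out):
            B[i][j] = (fj[i] - f0[i]) / eps
    return A, B


def toy_step(x, u, dt=0.1):
    """Hypothetical black-box dynamics: a double integrator (position, velocity)."""
    pos, vel = x
    return [pos + dt * vel, vel + dt * u[0]]


A, B = finite_difference_jacobians(toy_step, [0.3, -0.2], [0.5])
print(A, B)
```

Each Jacobian column costs one extra simulator step, so the per-iteration cost grows with the state and control dimensions; the columns are independent and can be evaluated in parallel.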
The impact of large-scale EV charging on the real-time operation of distribution systems: A comprehensive review
With the large-scale integration of electric vehicles (EVs) in the distribution grid, the unpredictable nature of EV charging introduces considerable uncertainties to the grid's real-time operations. This can exacerbate load fluctuations, compromise power quality, and pose risks to the grid's stability and security. However, due to their dual role as controllable loads and energy storage devices, EVs have the potential to mitigate these fluctuations, balance the variability of renewable energy sources, and provide ancillary services that support grid stability. By leveraging the bidirectional flow of information and energy in smart grids, the adverse effects of EV charging can be minimized and even converted into beneficial outcomes through effective real-time management strategies. This paper explores the negative impacts of EV charging on the distribution system's real-time operations and outlines methods to transform these challenges into positive contributions. Additionally, it provides an in-depth analysis of the real-time management system for EV charging, focusing on state estimation and management strategies.
Guided Multi-Fidelity Bayesian Optimization for Data-driven Controller Tuning with Digital Twins
We propose a \textit{guided multi-fidelity Bayesian optimization} framework for data-efficient controller tuning that integrates corrected digital twin simulations with real-world measurements. The method targets closed-loop systems with limited-fidelity simulations or inexpensive approximations. To address model mismatch, we build a multi-fidelity surrogate with a learned correction model that refines digital twin estimates using real data. An adaptive cost-aware acquisition function balances expected improvement, fidelity, and sampling cost. Our method ensures adaptability as new measurements arrive. The digital twin accuracy is re-estimated, dynamically adapting both cross-source correlations and the acquisition function. This ensures that accurate simulations are used more frequently, while inaccurate simulation data are appropriately downweighted. Experiments on robotic drive hardware and supporting numerical studies demonstrate that our method enhances tuning efficiency compared to standard Bayesian optimization and multi-fidelity methods.
comment: This work has been submitted to IEEE Robotics and Automation Letters (RA-L) for review
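The trade-off between expected improvement and sampling cost described above can be illustrated with a generic cost-weighted acquisition rule. The sketch uses the standard closed-form expected improvement for a Gaussian posterior (minimization) divided by a per-source cost; it is an assumption-laden simplification and not the paper's acquisition function, which also accounts for fidelity. The source names, posterior values, and costs are hypothetical.

```python
import math


def expected_improvement(mu, sigma, best):
    """Closed-form expected improvement for minimization under N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


def pick_source(candidates, best):
    """Return the name of the candidate with the highest EI per unit cost.

    candidates: list of (name, posterior_mean, posterior_std, cost) tuples.
    """
    return max(
        candidates,
        key=lambda c: expected_improvement(c[1], c[2], best) / c[3],
    )[0]


# Hypothetical posteriors at one controller setting: the hardware experiment
# promises more improvement but costs ten times as much as the simulation.
candidates = [("simulation", 0.9, 0.2, 1.0), ("hardware", 0.8, 0.2, 10.0)]
print(pick_source(candidates, best=1.0))
```

In a multi-fidelity setting the simulation posterior would additionally be discounted by how well the simulator currently correlates with the real system, so an inaccurate digital twin is queried less often even when it is cheap.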
Axial current as the origin of quantum intrinsic orbital angular momentum
We show that the axial current density is the physical origin (generator) of quantum intrinsic orbital angular momentum (IOAM). Without the axial current, the IOAM of particles vanishes. Broadly speaking, we argue that the spiral or interference characteristics of the axial current density determine the occurrence of nonlinear or tunneling effects in any spacetime-dependent quantum systems. Our findings offer a comprehensive theoretical framework that addresses the limitations of Keldysh's ionization theory and provides new insights into the angular momentum properties of quantum systems, particularly in tunneling-dominated regimes. Using Wigner function methods, fermionic generalized two-level model, and Berry phase simulations, we predict that IOAM effect can persist even in pure quantum tunneling processes. These results open the door for experimental verification of IOAM effects in future high-intensity QED experiments, such as those using X-ray free electron lasers.
comment: 8 pages, 2 figures
Systems and Control (CS)
Bio-inspired Microgrid Management based on Brain's Sensorimotor Gating
Microgrids are emerging as key enablers of resilient, sustainable, and intelligent power systems, but they continue to face challenges in dynamic disturbance handling, protection coordination, and uncertainty. Recent efforts have explored Brain Emotional Learning (BEL) controllers as bio-inspired solutions for microgrid control. Building on this growing trajectory, this article introduces a new paradigm for Neuro-Microgrids, inspired by the brain's sensorimotor gating mechanisms, specifically the Prepulse Inhibition (PPI) and Prepulse Facilitation (PPF). Sensorimotor gating offers a biological model for selectively suppressing or amplifying responses depending on contextual relevance. By mapping these principles onto the hierarchical control architecture of microgrids, we propose a Sensorimotor Gating-Inspired Neuro-Microgrid (SG-NMG) framework. In this architecture, PPI-like control decisions correspond to protective damping in primary and secondary management of microgrids, whereas PPF-like decisions correspond to adaptive amplification of corrective control actions. The framework is presented through analytical workflow design, neuro-circuitry analogies, and integration with machine learning methods. Finally, open challenges and research directions are outlined, including the mathematical modeling of gating, digital twin validation, and cross-disciplinary collaboration between neuroscience and industrial power systems. The resulting paradigm highlights sensorimotor gating as a promising framework for designing self-protective, adaptive, and resilient microgrids.
Braking within Barriers: Constructive Safety-Critical Control for Input-Constrained Vehicles via the Backup Set Method
This paper presents a safety-critical control framework to maintain bounded lateral motions for vehicles braking on asymmetric surfaces. We synthesize a brake controller that assists drivers and guarantees safety against excessive lateral motions (i.e., prevents the vehicle from spinning out) while minimizing the stopping distance. We address this safety-critical control problem in the presence of input constraints, since braking forces are limited by the available friction on the road. We use backup control barrier functions for safe control design. As this approach requires the construction of a backup set and a backup controller, we propose a novel, systematic method for creating valid backup set-backup controller pairs based on feedback linearization and continuous-time Lyapunov equations. We use simple examples to demonstrate our proposed safety-critical control method. Finally, we implement our approach on a four-wheel vehicle model for braking on asymmetric surfaces and present simulation results.
comment: Submitted to the IEEE Transactions on Automation Science and Engineering. 14 pages, 10 figures
Cavity Duplexer Tuning with 1d Resnet-like Neural Networks
This paper presents a machine learning method for tuning a cavity duplexer with a large number of adjustment screws. After testing, we rejected the conventional reinforcement learning approach and reformulated the task in a supervised learning setup. The suggested neural network architecture includes a 1d ResNet-like backbone and processes additional information about the S-parameters, such as the curve shape and the peak positions and amplitudes. Combined with an external control algorithm, this neural network is capable of bringing the duplexer to an almost tuned state within 4-5 rotations per screw.
Integrating Conductor Health into Dynamic Line Rating and Unit Commitment under Uncertainty
Dynamic line rating (DLR) enables greater utilization of existing transmission lines by leveraging real-time weather data. However, the elevated temperature operation (ETO) of conductors under DLR is often overlooked, despite its long-term impact on conductor health. This paper addresses this issue by 1) quantifying depreciation costs associated with ETO and 2) proposing a Conductor Health-Aware Unit Commitment (CHA-UC) that internalizes these costs in operational decisions. The CHA-UC incorporates a robust linear approximation of conductor temperature and integration of expected depreciation costs due to hourly ETO into the objective function. Case studies on the Texas 123-bus backbone test system using NOAA weather data demonstrate that the proposed CHA-UC model reduces the total cost by 0.8% and renewable curtailment by 84% compared to static line rating (SLR), while conventional DLR operation without risk consideration resulted in higher costs due to excessive ETO. Further analysis of the commitment decisions and the line temperature statistics confirms that the CHA-UC achieves safer line flows by shifting generator commitments. Finally, we examine the emergent correlation between wind generation and DLR forecast errors, and show that CHA-UC adaptively manages this effect by relaxing flows for risk-hedging conditions while tightening flows for risk-amplifying ones.
Sugar Shack 4.0: Practical Demonstration of an IIoT-Based Event-Driven Automation System
This paper presents a practical alternative to programmable-logic-controller-centric automation by implementing an event-driven architecture built with industrial Internet of Things tools. A layered design on a local edge server (i) abstracts actuators, (ii) enforces mutual exclusion of shared physical resources through an interlock with priority queueing, (iii) composes deterministic singular operations, and (iv) orchestrates complete workflows as state machines in Node-RED, with communication over MQTT. The device layer uses low-cost ESP32-based gateways to interface sensors and actuators, while all automation logic is offloaded to the server side. As part of a larger project involving the first scientifically-documented integration of Industry 4.0 technologies in a maple syrup boiling center, this work demonstrates the deployment of the proposed system as a case-study. Evaluation over an entire production season shows median message time of flight around one tenth of a second, command issuance-to-motion latencies of about two to three seconds, and command completion near six seconds dominated by actuator mechanics; operation runtimes span tens of seconds to minutes. These results indicate that network and orchestration overheads are negligible relative to process dynamics, enabling modular, distributed control without compromising determinism or fault isolation. The approach reduces material and integration effort, supports portable containerized deployment, and naturally enables an edge/cloud split in which persistence and analytics are offloaded while automation remains at the edge.
comment: 10 pages, 15 figures
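The interlock with priority queueing described in item (ii) of the abstract above can be sketched as a small data structure: a shared physical resource has at most one holder, and competing requests wait in a priority queue until the holder releases it. The described system implements this in Node-RED over MQTT; the Python class below only illustrates the logic, and the class name, method names, and operation names are hypothetical.

```python
import heapq
import itertools


class PriorityInterlock:
    """Mutual exclusion for one shared resource with a priority wait queue."""

    def __init__(self):
        self.holder = None
        self._waiting = []  # heap of (priority, arrival_order, owner)
        self._order = itertools.count()  # tie-breaker: first come, first served

    def request(self, owner, priority=10):
        """Grant the resource if it is free, otherwise queue the request.

        Lower priority numbers are served first. Returns True if granted now.
        """
        if self.holder is None:
            self.holder = owner
            return True
        heapq.heappush(self._waiting, (priority, next(self._order), owner))
        return False

    def release(self, owner):
        """Release the resource and hand it to the most urgent waiter, if any."""
        if self.holder != owner:
            raise RuntimeError(f"{owner!r} does not hold the resource")
        self.holder = None
        if self._waiting:
            _, _, self.holder = heapq.heappop(self._waiting)
        return self.holder


lock = PriorityInterlock()
lock.request("wash")                  # granted immediately
lock.request("transfer", priority=5)  # queued
lock.request("drain", priority=1)     # queued, more urgent
print(lock.release("wash"))           # the most urgent waiter is granted next
```

In an event-driven deployment, each grant would be published as a message so that the waiting operation can start; that transport layer is left out here.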
Mitigating Underwater Noise from Offshore Wind Turbines via Individual Pitch Control
This paper proposes a pitch control strategy to mitigate the underwater acoustic footprint of offshore wind turbines, a measure that will soon become necessary to minimize impacts on marine life, which relies on sound for communication, navigation, and survival. First, we quantify the underwater acoustic signature of blade-generated aerodynamic noise from three reference turbines, the NREL 5 MW, DTU 10 MW, and IEA 22 MW, using coupled blade element momentum and air-water acoustic propagation modeling. Second, we propose and implement an open-loop individual pitch control (IPC) strategy that modulates the pitch of the blade at the blade passing frequency to attenuate the overall sound pressure level (OSPL) and the amplitude modulation (AM) of the transmitted noise. Third, we benchmark IPC performance against conventional pitch schemes. The results indicate that reductions of up to 5 dB in OSPL and a 20% decrease in AM depth can be achieved with a pitch variation of $\Delta\theta\approx 5^\circ$, with small losses (5-10%) in energy capture. These findings highlight a previously underappreciated noise pathway and demonstrate that targeted blade-pitch modulation can mitigate its impact.
Cross-border offshore hydrogen trade and carbon mitigation for Europe's net zero transition
European countries are ambitious in both the net-zero transition and offshore energy resource development. The Irish and UK governments have announced commitments to offshore wind capacities of 37 and 125 GW, respectively, in 2050, more than twice their projected power demands. Meanwhile, continental countries such as Germany are calling for cleaner fuel resources. Exporting surplus offshore green hydrogen and bridging supply and demand could be pivotal in carbon emission mitigation for Europe. Yet the potential of these island countries is usually underestimated. This paper develops a bottom-up method to investigate the role of offshore hydrogen from Ireland and the UK in the decarbonisation of Europe as a whole. We evaluate the future hydrogen/ammonia trading and the contributions of each country in carbon emission mitigation, considering their relative cost-competitiveness in offshore hydrogen production, domestic hourly power and gas system operation, and international shipping costs. Results indicate that offshore green hydrogen could reduce carbon dioxide emissions in Europe by 175.16 Mt/year in total. The UK will be the largest hydrogen supplier from 2030 to 2040 but is surpassed by Ireland in 2050, with 161 TWh of hydrogen exports to France and Spain. This general flow of hydrogen from west to east not only facilitates Europe's net-zero progress, but also reshapes the energy supply structure and helps to ensure energy security across the European continent.
Freehand 3D Ultrasound Imaging: Sim-in-the-Loop Probe Pose Optimization via Visual Servoing
Freehand 3D ultrasound (US) imaging using conventional 2D probes offers flexibility and accessibility for diverse clinical applications but faces challenges in accurate probe pose estimation. Traditional methods depend on costly tracking systems, while neural network-based methods struggle with image noise and error accumulation, compromising reconstruction precision. We propose a cost-effective and versatile solution that leverages lightweight cameras and visual servoing in simulated environments for precise 3D US imaging. These cameras capture visual feedback from a textured planar workspace. To counter occlusions and lighting issues, we introduce an image restoration method that reconstructs occluded regions by matching surrounding texture patterns. For pose estimation, we develop a simulation-in-the-loop approach, which replicates the system setup in simulation and iteratively minimizes pose errors between simulated and real-world observations. A visual servoing controller refines the alignment of camera views, improving translational estimation by optimizing image alignment. Validations on a soft vascular phantom, a 3D-printed conical model, and a human arm demonstrate the robustness and accuracy of our approach, with Hausdorff distances to the reference reconstructions of 0.359 mm, 1.171 mm, and 0.858 mm, respectively. These results confirm the method's potential for reliable freehand 3D US reconstruction.
Adaptive Legged Locomotion via Online Learning for Model Predictive Control
We provide an algorithm for adaptive legged locomotion via online learning and model predictive control. The algorithm is composed of two interacting modules: model predictive control (MPC) and online learning of residual dynamics. The residual dynamics can represent modeling errors and external disturbances. We are motivated by the future of autonomy where quadrupeds will autonomously perform complex tasks despite real-world unknown uncertainty, such as unknown payload and uneven terrains. The algorithm uses random Fourier features to approximate the residual dynamics in reproducing kernel Hilbert spaces. Then, it employs MPC based on the current learned model of the residual dynamics. The model is updated online in a self-supervised manner using least squares based on the data collected while controlling the quadruped. The algorithm enjoys sublinear \textit{dynamic regret}, defined as the suboptimality against an optimal clairvoyant controller that knows the residual dynamics. We validate our algorithm in Gazebo and MuJoCo simulations, where the quadruped aims to track reference trajectories. The Gazebo simulations include constant unknown external forces up to $12\boldsymbol{g}$, where $\boldsymbol{g}$ is the gravity vector, in flat terrain, sloped terrain with $20\degree$ inclination, and rough terrain with $0.25~m$ height variation. The MuJoCo simulations include time-varying unknown disturbances with payload up to $8~kg$ and time-varying ground friction coefficients in flat terrain.
comment: 9 pages
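The learning module described above, random Fourier features combined with an online least-squares update, can be sketched in a few lines for a scalar residual. The features approximate a Gaussian kernel, and recursive least squares refines the weights one sample at a time, as data would arrive while the controller runs. The residual function, feature count, lengthscale, and regularization are hypothetical choices for illustration; the MPC module and the paper's specific update rule are not reproduced.

```python
import math
import random


def make_rff(num_features, lengthscale, rng):
    """Sample a random Fourier feature map for a scalar input (Gaussian kernel)."""
    omegas = [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(w * x + b) for w, b in zip(omegas, phases)]

    return features


class RecursiveLeastSquares:
    """Regularized least squares, updated one (features, target) pair at a time."""

    def __init__(self, dim, reg=1e-2):
        self.w = [0.0] * dim
        # P tracks the inverse of (reg * I + sum of z z^T).
        self.P = [[(1.0 / reg if i == j else 0.0) for j in range(dim)] for i in range(dim)]

    def predict(self, z):
        return sum(wi * zi for wi, zi in zip(self.w, z))

    def update(self, z, y):
        Pz = [sum(pij * zj for pij, zj in zip(row, z)) for row in self.P]
        denom = 1.0 + sum(zi * pzi for zi, pzi in zip(z, Pz))
        gain = [pzi / denom for pzi in Pz]
        err = y - self.predict(z)  # prediction error before the update
        self.w = [wi + gi * err for wi, gi in zip(self.w, gain)]
        self.P = [[pij - gi * pzj for pij, pzj in zip(row, Pz)]
                  for row, gi in zip(self.P, gain)]


def demo(seed=0):
    """Fit a hypothetical residual online; return (MSE of zero model, MSE of fit)."""
    rng = random.Random(seed)
    features = make_rff(num_features=30, lengthscale=0.5, rng=rng)
    model = RecursiveLeastSquares(dim=30)

    def residual(x):
        # Stand-in for the unknown residual dynamics.
        return 0.5 * math.sin(2.0 * x)

    xs = [rng.uniform(-2.0, 2.0) for _ in range(300)]
    for x in xs:
        model.update(features(x), residual(x))
    mse_zero = sum(residual(x) ** 2 for x in xs) / len(xs)
    mse_fit = sum((residual(x) - model.predict(features(x))) ** 2 for x in xs) / len(xs)
    return mse_zero, mse_fit


print(demo())
```

Because the model is linear in the features, each update costs a fixed amount of computation that depends only on the number of features and not on how much data has been seen, which is what makes this kind of update usable inside a real-time control loop.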
A Predictive Flexibility Aggregation Method for Low Voltage Distribution System Control
This paper presents a predictive control strategy to manage low-voltage distribution systems. The proposed approach relies on an aggregate of the flexibility at the residential unit level into a three-dimensional chart that represents the injected active and reactive power, and the flexibility cost. First, this method solves a multiparametric optimization problem offline at the residential unit level to aggregate the flexibility of the assets. Then, a semi-explicit model predictive control problem is solved to account for forecasts. By combining the results of these problems with measurements, the method generates the desired flexibility chart. The proposed approach is compatible with realtime control requirements, as heavy computations are performed offline locally, making it naturally parallelizable. By linking realtime flexibility assessment with energy scheduling, our approach enables efficient, low-cost, and privacy-preserving management of low-voltage distribution systems. We validate this method on a low-voltage network of 5 buses by comparing it with an ideal technique.
comment: 8 pages, 6 figures
Observer Design over Hypercomplex Quaternions
We develop observer design over hypercomplex quaternions in a characteristic-polynomial-free framework. Using the standard right-module convention, we derive a right observable companion form and its companion polynomial that encodes error dynamics via right-eigenvalue similarity classes. The design mirrors the real/complex case - coefficient updates in companion coordinates, followed by a similarity back - yet avoids determinants, characteristic/minimal polynomials, and Cayley-Hamilton identities that do not transfer to quaternions. We also give an Ackermann-type construction for the important case of closed-loop companion polynomials with real coefficients, ensuring similarity-equivariant evaluation. The results yield simple recipes for full-order observers directly over quaternions, clarify the role of right spectra and their similarity classes, and pinpoint when classical one-shot formulas remain valid. Numerical examples illustrate the method and advantages over vectorized or complex-adjoint surrogates.
comment: Accepted for presentation at the 24th European Control Conference (ECC 2026), Reykjavik, Iceland. This work was co-funded by the European Union under the project ROBOPROX (reg. no. CZ.02.01.01/00/22 008/0004590)
Active Inverse Methods in Stackelberg Games with Bounded Rationality
Inverse game theory is used to infer the cost functions of all players from game outcomes. However, existing inverse game theory methods do not treat the learner as an active participant in the game, even though doing so could significantly enhance the learning process. In this paper, we extend inverse game theory to active inverse methods. For Stackelberg games with bounded rationality, the leader, acting as a learner, actively chooses actions to better understand the follower's cost functions. First, we develop an active learning method that leverages Fisher information to maximize information gain about the unknown parameters, and we prove its consistency and asymptotic normality. Additionally, for the case where the leader also considers its own cost, we develop an active inverse game method that balances exploration and exploitation, and we prove its consistency and asymptotic Stackelberg equilibrium with quadratic cost functions. Finally, we verify the properties of these methods through simulations in the quadratic case and demonstrate that the active inverse game method can achieve Stackelberg equilibrium more quickly through active exploration.
Hypergame-based Cognition Modeling and Intention Interpretation for Human-Driven Vehicles in Connected Mixed Traffic
With the practical implementation of connected and autonomous vehicles (CAVs), the traffic system is expected to remain a mix of CAVs and human-driven vehicles (HVs) for the foreseeable future. To enhance safety and traffic efficiency, the trajectory planning strategies of CAVs must account for the influence of HVs, necessitating accurate HV trajectory prediction. Current research often assumes that human drivers have perfect knowledge of all vehicles' objectives, an unrealistic premise. This paper bridges the gap by leveraging hypergame theory to account for cognitive and perception limitations in HVs. We model human bounded rationality without assuming them to be merely passive followers and propose a hierarchical cognition modeling framework that captures cognitive relationships among vehicles. We further analyze the cognitive stability of the system, proving that the strategy profile where all vehicles adopt cognitively equilibrium strategies constitutes a hyper Nash equilibrium when CAVs accurately learn HV parameters. To achieve this, we develop an inverse learning algorithm for distributed intention interpretation via vehicle-to-everything (V2X) communication, which extends the framework to both offline and online scenarios. Additionally, we introduce a distributed trajectory prediction and planning approach for CAVs, leveraging the learned parameters in real time. Simulations in highway lane-changing scenarios demonstrate the proposed method's accuracy in parameter learning, robustness to noisy trajectory observations, and safety in HV trajectory prediction. The results validate the effectiveness of our method in both offline and online implementations.
Hypergraph Contrastive Sensor Fusion for Multimodal Fault Diagnosis in Induction Motors
Reliable induction motor (IM) fault diagnosis is vital for industrial safety and operational continuity, mitigating costly unplanned downtime. Conventional approaches often struggle to capture complex multimodal signal relationships, are constrained to unimodal data or single fault types, and exhibit performance degradation under noisy or cross-domain conditions. This paper proposes the Multimodal Hypergraph Contrastive Attention Network (MM-HCAN), a unified framework for robust fault diagnosis. To the best of our knowledge, MM-HCAN is the first to integrate contrastive learning within a hypergraph topology specifically designed for multimodal sensor fusion, enabling the joint modelling of intra- and inter-modal dependencies and enhancing generalisation beyond Euclidean embedding spaces. The model facilitates simultaneous diagnosis of bearing, stator, and rotor faults, addressing the engineering need for consolidated diagnostic capabilities. Evaluated on three real-world benchmarks, MM-HCAN achieves up to 99.82% accuracy with strong cross-domain generalisation and resilience to noise, demonstrating its suitability for real-world deployment. An ablation study validates the contribution of each component. MM-HCAN provides a scalable and robust solution for comprehensive multi-fault diagnosis, supporting predictive maintenance and extended asset longevity in industrial environments.
comment: Submitted to IEEE Sensors Journal
A Tsetlin Machine Image Classification Accelerator on a Flexible Substrate
This paper introduces the first implementation of digital Tsetlin Machines (TMs) on a flexible integrated circuit (FlexIC) using Pragmatic's 600 nm IGZO-based FlexIC technology. TMs, known for their energy efficiency, interpretability, and suitability for edge computing, have previously been limited by the rigidity of conventional silicon-based chips. We develop two TM inference models as FlexICs: one achieving 98.5% accuracy using 6800 NAND2 equivalent logic gates with an area of 8×8 mm², and a second more compact version achieving slightly lower prediction accuracy of 93% but using only 1420 NAND2 equivalent gates with an area of 4×4 mm², both of which are custom-designed for an 8×8-pixel handwritten digit recognition dataset. The paper demonstrates the feasibility of deploying flexible TM inference engines into wearable healthcare and edge computing applications.
comment: accepted by International Symposium on the Tsetlin Machine (ISTM) 2025
Modelling-driven requirements for Error Field Control Coil application to initial JT-60SA plasmas
JT-60SA is a large superconducting tokamak built in Naka, Japan. After the successful achievement of its first MA-class plasma, the installation of several additional sub-systems, including a set of non-axisymmetric Error Field Correction Coils (EFCC), is ongoing. Optimization of future JT-60SA plasma scenarios will critically depend on the correct use of EFCC, including careful fulfillment of system specifications. In addition to that, preparation and risk mitigation of early ITER operations will greatly benefit from the experience gained by early EFCC application to JT-60SA experiments, in particular to optimize error field detection and control strategies. In this work, EFCC application in JT-60SA Initial Research Phase I perspective scenarios is modeled including plasma response. Impact of (Resonant) Magnetic Perturbations on the different plasma scenarios is assessed for both core and pedestal regions by the linear resistive MHD code MARS-F. The dominant core response to EFs is discussed case by case and compared to mode locking thresholds from literature. Typical current/voltage amplitudes and wave-forms are then compared to EFCC specifications in order to assess a safe operational space.
Balancing Fairness and Performance in Multi-User Spark Workloads with Dynamic Scheduling (extended version) SoCC'25
Apache Spark is a widely adopted framework for large-scale data processing. However, in industrial analytics environments, Spark's built-in schedulers, such as FIFO and fair scheduling, struggle to maintain both user-level fairness and low mean response time, particularly in long-running shared applications. Existing solutions typically focus on job-level fairness which unintentionally favors users who submit more jobs. Although Spark offers a built-in fair scheduler, it lacks adaptability to dynamic user workloads and may degrade overall job performance. We present the User Weighted Fair Queuing (UWFQ) scheduler, designed to minimize job response times while ensuring equitable resource distribution across users and their respective jobs. UWFQ simulates a virtual fair queuing system and schedules jobs based on their estimated finish times under a bounded fairness model. To further address task skew and reduce priority inversions, which are common in Spark workloads, we introduce runtime partitioning, a method that dynamically refines task granularity based on expected runtime. We implement UWFQ within the Spark framework and evaluate its performance using multi-user synthetic workloads and Google cluster traces. We show that UWFQ reduces the average response time of small jobs by up to 74% compared to existing built-in Spark schedulers and to state-of-the-art fair scheduling algorithms.
comment: This paper is an extended version of a paper accepted at the ACM Symposium on Cloud Computing (SoCC'25) that contains a proof of correctness
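The idea of scheduling by estimated finish time in a virtual fair queue can be illustrated with a minimal sketch. This is classic per-user weighted fair queuing with finish tags, not the UWFQ algorithm itself; the function name, the job tuple layout, and the assumption that all jobs arrive at time 0 are hypothetical simplifications.

```python
from collections import defaultdict


def order_by_virtual_finish(jobs, weights=None):
    """Order jobs by per-user virtual finish tag (textbook weighted fair queuing).

    jobs: list of (job_id, user, estimated_work); all jobs are assumed to
    arrive at time 0, which is a simplification made for this sketch.
    weights: optional dict mapping user -> share weight (default 1.0).
    """
    weights = weights or {}
    last_finish = defaultdict(float)  # last finish tag issued to each user
    tagged = []
    for index, (job_id, user, work) in enumerate(jobs):
        weight = weights.get(user, 1.0)
        # A user's jobs queue behind that user's earlier jobs, not behind
        # the jobs of other users.
        finish = last_finish[user] + work / weight
        last_finish[user] = finish
        tagged.append((finish, index, job_id))  # index breaks ties stably
    return [job_id for _, _, job_id in sorted(tagged)]


jobs = [
    ("a1", "alice", 4.0),
    ("a2", "alice", 4.0),
    ("a3", "alice", 4.0),
    ("b1", "bob", 6.0),
]
print(order_by_virtual_finish(jobs))
```

In this toy input, the single job from the second user is placed ahead of the first user's later jobs, which is the user-level fairness effect that job-level schedulers miss when one user submits many jobs.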
Recursive Inference for Heterogeneous Multi-Output GP State-Space Models with Arbitrary Moment Matching
Accurate learning of system dynamics is becoming increasingly crucial for advanced control and decision-making in engineering. However, real-world systems often exhibit multiple channels and highly nonlinear transition dynamics, challenging traditional modeling methods. To enable online learning for these systems, this paper formulates the system as Gaussian process state-space models (GPSSMs) and develops a recursive learning method. The main contributions are threefold. First, a heterogeneous multi-output kernel is designed, allowing each output dimension to adopt distinct kernel types, hyperparameters, and input variables, improving expressiveness in multi-dimensional dynamics learning. Second, an inducing-point management algorithm enhances computational efficiency through independent selection and pruning for each output dimension. Third, a unified recursive inference framework for GPSSMs is derived, supporting general moment matching approaches, including the extended Kalman filter (EKF), unscented Kalman filter (UKF), and assumed density filtering (ADF), enabling accurate learning under strong nonlinearity and significant noise. Experiments on synthetic and real-world datasets show that the proposed method matches the accuracy of SOTA offline GPSSMs with only 1/100 of the runtime, and surpasses SOTA online GPSSMs by around 70% in accuracy under heavy noise while using only 1/20 of the runtime.
TranSimHub: A Unified Air-Ground Simulation Platform for Multi-Modal Perception and Decision-Making
Air-ground collaborative intelligence is becoming a key approach for next-generation urban intelligent transportation management, where aerial and ground systems work together on perception, communication, and decision-making. However, the lack of a unified multi-modal simulation environment has limited progress in studying cross-domain perception, coordination under communication constraints, and joint decision optimization. To address this gap, we present TranSimHub, a unified simulation platform for air-ground collaborative intelligence. TranSimHub offers synchronized multi-view rendering across RGB, depth, and semantic segmentation modalities, ensuring consistent perception between aerial and ground viewpoints. It also supports information exchange between the two domains and includes a causal scene editor that enables controllable scenario creation and counterfactual analysis under diverse conditions such as different weather, emergency events, and dynamic obstacles. We release TranSimHub as an open-source platform that supports end-to-end research on perception, fusion, and control across realistic air and ground traffic scenes. Our code is available at https://github.com/Traffic-Alpha/TranSimHub.
comment: 9 pages, 4 figures
Singularity-free dynamical invariants-based quantum control
State preparation is a cornerstone of quantum technologies, underpinning applications in computation, communication, and sensing. Its importance becomes even more pronounced in non-Markovian open quantum systems, where environmental memory and model uncertainties pose significant challenges to achieving high-fidelity control. Invariant-based inverse engineering provides a principled framework for synthesizing analytic control fields, yet existing parameterizations often lead to experimentally infeasible, singular pulses and are limited to simplified noise models such as those of Lindblad form. Here, we introduce a generalized invariant-based protocol for single-qubit state preparation under arbitrary noise conditions. The control proceeds in two stages: first, we construct a family of bounded pulses that achieve perfect state preparation in a closed system; second, we identify the optimal member of this family that minimizes the effect of noise. The framework accommodates both (i) characterized noise, enabling noise-aware control synthesis, and (ii) uncharacterized noise, where a noise-agnostic variant preserves robustness without requiring a master-equation description. Numerical simulations demonstrate high-fidelity state preparation across diverse targets while producing smooth, hardware-feasible control fields. This singularity-free framework extends invariant-based control to realistic open-system regimes, providing a versatile route toward robust quantum state engineering on NISQ hardware and other platforms exhibiting non-Markovian dynamics.
Adaptive Cost-Map-based Path Planning in Partially Unknown Environments with Movable Obstacles
Reliable navigation in disaster-response and other unstructured indoor settings requires robots not only to avoid obstacles but also to recognise when those obstacles can be pushed aside. We present an adaptive, LiDAR and odometry-based path-planning framework that embeds this capability into the ROS2 Nav2 stack. A new Movable Obstacles Layer labels all LiDAR returns missing from a prior static map as tentatively movable and assigns a reduced traversal cost. A companion Slow-Pose Progress Checker monitors the ratio of commanded to actual velocity; when the robot slows appreciably, the local cost is raised from light to heavy, and on a stall to lethal, prompting the global planner to back out and re-route. Gazebo evaluations on a Scout Mini, spanning isolated objects and cluttered corridors, show higher goal-reach rates and fewer deadlocks than a no-layer baseline, with traversal times broadly comparable. Because the method relies only on planar scans and CPU-level computation, it suits resource-constrained search and rescue robots and integrates into heterogeneous platforms with minimal engineering. Overall, the results indicate that interaction-aware cost maps are a lightweight, ROS2-native extension for navigating among potentially movable obstacles in unstructured settings. The full implementation will be released as open source at https://costmap-namo.github.io.
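The light, heavy, and lethal escalation described above might be sketched as follows. This is plain Python rather than Nav2 plugin code, and the function name, the `slow_ratio` threshold, and the `stall_speed` threshold are hypothetical values chosen for illustration; the abstract does not state the actual thresholds.

```python
def escalate_cost(commanded_speed, actual_speed, slow_ratio=2.0, stall_speed=0.01):
    """Map the commanded-to-actual speed ratio to a cost label.

    slow_ratio and stall_speed are illustrative assumptions, not values
    taken from the paper. Speeds are in m/s.
    """
    commanded = abs(commanded_speed)
    actual = abs(actual_speed)
    if commanded <= stall_speed:
        # No meaningful motion was requested, so there is nothing to judge.
        return "light"
    if actual <= stall_speed:
        # Commanded to move but effectively stationary: treat as a stall.
        return "lethal"
    if commanded / actual >= slow_ratio:
        # Moving, but much slower than commanded: the obstacle resists.
        return "heavy"
    return "light"


for commanded, actual in [(0.5, 0.45), (0.5, 0.2), (0.5, 0.0)]:
    print(commanded, actual, escalate_cost(commanded, actual))
```

A real checker would presumably filter the velocity signals over a time window before deciding, since a single noisy odometry sample could otherwise trigger a re-route.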
Modeling and Dynamic Simulation of a Hybrid Wind-Wave System on a Hexagonal Semi-Submersible Platform
Offshore renewable energy systems offer promising solutions for sustainable power generation, yet most existing platforms harvest either wind or wave energy in isolation. This study presents a hybrid floating offshore platform that integrates a wind turbine with three oscillating surge wave energy converters (WECs) into a hexagonal semi-submersible structure. In this configuration, the flaps are integrated with the platform geometry to provide both energy extraction and hydrodynamic stability. A modeling and simulation framework was developed using WEC-Sim and benchmarked against the NREL 5 MW semisubmersible reference. Metacentric height analysis confirmed hydrostatic stability across a range of prescribed flap angles. Sensitivity analysis of twelve geometric variables identified flap dimensions and tower length as dominant drivers of stability, energy capture, and tower stress. Time-domain simulations revealed dependence on wave incidence angle, with variations in flap power sharing, capture width ratio (CWR), and platform response. The feasibility of using flap sweeps to modulate pitch motion was also demonstrated. Annual energy production (AEP) estimates based on site-specific data indicate 16.86 GWh from wind and 3.65 GWh from wave energy, with WECs contributing about 18% of the total. These results highlight the potential of integrated wind-wave platforms and point toward future studies on structural modeling and advanced control.
comment: 28 pages, 17 figures
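The stated wave share follows directly from the two AEP figures quoted in the abstract:

```latex
\[
\frac{E_{\text{wave}}}{E_{\text{wind}} + E_{\text{wave}}}
  = \frac{3.65}{16.86 + 3.65}
  = \frac{3.65}{20.51}
  \approx 0.178 ,
\]
```

that is, about 18% of a combined 20.51 GWh per year.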
An Iterative Problem-Driven Scenario Reduction Framework for Stochastic Optimization with Conditional Value-at-Risk
Scenario reduction (SR) alleviates the computational complexity of scenario-based stochastic optimization with conditional value-at-risk (SBSO-CVaR) by identifying representative scenarios to depict the underlying uncertainty and tail risks. Existing distribution-driven SR methods emphasize statistical similarity but often exclude extreme scenarios, leading to weak tail-risk awareness and insufficient problem-specific representativeness. Instead, this paper proposes an iterative problem-driven scenario reduction framework. Specifically, we integrate the SBSO-CVaR problem structure into the SR process and project the original scenario set from the distribution space onto the problem space. Subsequently, to minimize the SR optimality gap with acceptable computational complexity, we propose a tractable iterative problem-driven scenario reduction (IPDSR) method that selects representative scenarios that best approximate the optimality distribution of the original scenario set while preserving tail risks. Furthermore, the iteration process is rendered as a mixed-integer program to enable scenario partitioning and representative scenario selection. Ex-post problem-driven evaluation indices are also proposed to evaluate SR performance. Numerical experiments show IPDSR significantly outperforms existing SR methods by achieving an optimality gap of less than 1% within an acceptable computation time.
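Why excluding extreme scenarios weakens tail-risk awareness can be seen in a small sketch that evaluates CVaR on a discrete scenario set using the standard Rockafellar-Uryasev form, CVaR_alpha = min over t of t + E[(L - t)^+] / (1 - alpha). The function name and the toy loss values are assumptions made for illustration, and this is not the IPDSR method.

```python
def cvar(losses, probs, alpha):
    """CVaR at level alpha for a discrete loss distribution.

    Uses the Rockafellar-Uryasev objective t + E[(L - t)^+] / (1 - alpha).
    The objective is convex and piecewise linear in t with breakpoints at
    the scenario losses, so checking those breakpoints gives the minimum.
    """
    def objective(t):
        excess = sum(p * max(loss - t, 0.0) for loss, p in zip(losses, probs))
        return t + excess / (1.0 - alpha)

    return min(objective(t) for t in losses)


# Toy scenario set with one extreme loss (illustrative numbers only).
full = cvar([1.0, 2.0, 3.0, 10.0], [0.25, 0.25, 0.25, 0.25], alpha=0.75)

# The same set after dropping the extreme scenario and renormalising.
reduced = cvar([1.0, 2.0, 3.0], [1 / 3, 1 / 3, 1 / 3], alpha=0.75)

print(full, reduced)
```

In this toy case the reduced set reports a much smaller tail risk than the full set, even though three of the four scenarios are retained unchanged.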
Comprehensive Dynamic Modeling and Constraint-Aware Air Supply Control for Localized Water Management in Automotive Polymer Electrolyte Membrane Fuel Cells
In this paper, a predictive constraint-aware control scheme is formulated within the Command Governor (CG) framework for localized hydration management of a proton exchange membrane (PEM) fuel cell system. First, a comprehensive nonlinear dynamic model of the fuel cell system is presented which includes a pseudo 2-dimensional (P2D) model of the stack, reactant supply and cooling subsystems. The model captures the couplings among the various subsystems and serves as the basis for designing output feedback controllers to track the optimal set-points of the air supply and cooling systems for power optimization. The closed-loop nonlinear model is then used to analyze the dynamic behavior of membrane hydration near the anode inlet, the driest region of the membrane in a counter-flow configuration, under various operating conditions. A reduced-order linearized model is then derived to approximate hydration behavior with sufficient fidelity for constraint enforcement. This model is used within the CG framework to adjust the air supply set-points when necessary to prevent membrane dry-out. The effectiveness of the proposed approach in maintaining local membrane hydration while closely tracking the requested net power is demonstrated through realistic drive-cycle simulations.
comment: This is a manuscript submitted to Applied Energy
Techno-Economic Feasibility Analysis of Quantum Key Distribution for Power-System Communications
The accelerating digitalization and decentralization of modern power systems expose critical communication infrastructures to escalating cyber risks, particularly under emerging quantum computing threats. This paper presents an integrated techno-economic framework to evaluate the feasibility of Quantum Key Distribution (QKD) for secure power-system communications. A stochastic system model is developed to jointly capture time-varying key demand, QKD supply under optical-loss constraints, station-side buffering, and post-quantum cryptography (PQC) fallback mechanisms. Analytical conditions are derived for service-level assurance, including buffer stability, outage probability, and availability bounds. Building on this, two quantitative metrics, including the Levelized Cost of Security (LCoSec) and Cost of Incremental Security (CIS), are formulated to unify capital, operational, and risk-related expenditures within a discounted net-present-value framework. Using IEEE 118-bus, 123-node, and 39-bus test systems, we conduct discrete-event simulations comparing PQC-only, QKD-only, and Hybrid architectures across multiple topologies and service profiles. Results show that Hybrid architectures dominated by QKD significantly reduce key-outage probability and SLA shortfalls, achieving near-unit availability for real-time and confidentiality-critical services. Economic analyses reveal clear breakeven zones where QKD-enhanced deployments become cost-effective, primarily in metropolitan and distribution-level networks under moderate optical loss and buffer sizing. The proposed framework provides a reproducible, risk-aware decision tool for guiding large-scale, economically justified QKD adoption in future resilient power-system infrastructures.
Quantum-Key-Distribution Authenticated Aggregation and Settlement for Virtual Power Plants
The proliferation of distributed energy resources (DERs) and demand-side flexibility has made virtual power plants (VPPs) central to modern grid operation. Yet their end-to-end business pipeline, covering bidding, dispatch, metering, settlement, and archival, forms a tightly coupled cyber-physical-economic system where secure and timely communication is critical. Under the combined stress of sophisticated cyberattacks and extreme weather shocks, conventional cryptography offers limited long-term protection. Quantum key distribution (QKD), with information-theoretic guarantees, is viewed as a gold standard for securing critical infrastructures. However, limited key generation rates, routing capacity, and system overhead render key allocation a pressing challenge: scarce quantum keys must be scheduled across heterogeneous processes to minimize residual risk while maintaining latency guarantees. This paper introduces a quantum-authenticated aggregation and settlement framework for VPPs. We first develop a system-threat model that connects QKD key generation and routing with business-layer security strategies, authentication strength, refresh frequency, and delay constraints. Building on this, we formulate a key-budgeted risk minimization problem that jointly accounts for economic risk, service-level violations, and key-budget feasibility, and reveal a threshold property linking marginal security value to shadow prices. Case studies on a representative VPP system demonstrate that the proposed approach significantly reduces residual risk and SLA violations, enhances key efficiency and robustness, and aligns observed dynamics with the theoretical shadow price mechanism.
Spatial-to-Spectral Harmonic-Modulated Arrays for 6G Multi-Beam MIMO
This article presents an overview and analysis of spatial-to-spectral harmonic-modulated arrays (SHAs). Compared to traditional analog or digital beamforming arrays, SHAs enable concurrent multi-beamforming without requiring substantial hardware replication. SHAs replace the need for hardware replication with frequency-domain multiplexing. Furthermore, SHAs have the potential to become key contributors to future 6G networks by enabling scalable multi-user communications, joint communication and sensing, and spatial interference mitigation. In addition, an analysis of the SHA's harmonic-modulation waveform and its effects on gain, noise and bandwidth is presented. A comb-like modulation waveform for SHAs that minimizes spectral inefficiency is proposed. Further, an analysis of the SHA's capability to independently steer multiple beams is presented. This capability is quantified in terms of the SHA's spatial-to-spectral degrees of freedom. Lastly, this work introduces a novel SHA architecture that provides three spatial-to-spectral degrees of freedom with minimal hardware replication.
A Motivational Driver Steering Model: Task Difficulty Homeostasis From Control Theory Perspective
A general and psychologically plausible collision avoidance driver model can improve transportation safety significantly. Most computational driver models found in the literature have used control theory methods only, and they are not established based on psychological theories. In this paper, a unified approach is presented based on concepts taken from psychology and control theory. The "task difficulty homeostasis theory", a prominent motivational theory, is combined with the "Lyapunov stability method" in control theory to present a general and psychologically plausible model. This approach is used to model driver steering behavior for collision avoidance. The performance of this model is measured by simulation of two collision avoidance scenarios at a wide range of speeds from 20 km/h to 170 km/h. The model is validated by experiments on a driving simulator. The results demonstrate that the model follows human behavior accurately with a mean error of 7 percent.
comment: Cognitive systems Research
Personalized Collaborative Learning with Affinity-Based Variance Reduction
Multi-agent learning faces a fundamental tension: leveraging distributed collaboration without sacrificing the personalization needed for diverse agents. This tension intensifies when aiming for full personalization while adapting to unknown heterogeneity levels -- gaining collaborative speedup when agents are similar, without performance degradation when they are different. Embracing the challenge, we propose personalized collaborative learning (PCL), a novel framework for heterogeneous agents to collaboratively learn personalized solutions with seamless adaptivity. Through carefully designed bias correction and importance correction mechanisms, our method AffPCL robustly handles both environment and objective heterogeneity. We prove that AffPCL reduces sample complexity over independent learning by a factor of $\max\{n^{-1}, \delta\}$, where $n$ is the number of agents and $\delta\in[0,1]$ measures their heterogeneity. This affinity-based acceleration automatically interpolates between the linear speedup of federated learning in homogeneous settings and the baseline of independent learning, without requiring prior knowledge of the system. Our analysis further reveals that an agent may obtain linear speedup even by collaborating with arbitrarily dissimilar agents, unveiling new insights into personalization and collaboration in the high heterogeneity regime.
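The interpolation expressed by the factor $\max\{n^{-1}, \delta\}$ can be checked numerically with a minimal sketch. The function name is hypothetical, and the code only evaluates the stated factor; it does not implement AffPCL.

```python
def complexity_factor(n_agents, delta):
    """Evaluate max{1/n, delta}, the stated sample-complexity factor
    relative to independent learning (smaller means more speedup)."""
    if n_agents < 1 or not 0.0 <= delta <= 1.0:
        raise ValueError("need n_agents >= 1 and delta in [0, 1]")
    return max(1.0 / n_agents, delta)


for delta in (0.0, 0.05, 0.3, 1.0):
    print(delta, complexity_factor(10, delta))
```

With 10 agents, the factor equals 1/10 for homogeneous agents (delta = 0, linear speedup), stays at 1/10 while delta is below 1/n, and rises to 1 at delta = 1, where the bound matches independent learning.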
DeGrip: A Compact Cable-driven Robotic Gripper for Desktop Disassembly
Intelligent robotic disassembly of end-of-life (EOL) products has been a long-standing challenge in robotics. While machine learning techniques have shown promise, the lack of specialized hardware limits their application in real-world scenarios. We introduce DeGrip, a customized gripper designed for the disassembly of EOL computer desktops. DeGrip provides three degrees of freedom (DOF), enabling arbitrary configurations within the disassembly environment when mounted on a robotic manipulator. It employs a cable-driven transmission mechanism that reduces its overall size and enables operation in confined spaces. The wrist is designed to decouple the actuation of wrist and jaw joints. We also developed an EOL desktop disassembly environment in Isaac Sim to evaluate the effectiveness of DeGrip. The tasks were designed to demonstrate its ability to operate in confined spaces and disassemble components in arbitrary configurations. The evaluation results confirm the capability of DeGrip for EOL desktop disassembly.
Heterogeneous Multi-Agent Task-Assignment with Uncertain Execution Times and Preferences
While sequential task assignment for a single agent has been widely studied, such problems in a multi-agent setting, where the agents have heterogeneous task preferences or capabilities, remain less well-characterized. We study a multi-agent task assignment problem where a central planner assigns recurring tasks to multiple members of a team over a finite time horizon. For any given task, the members have heterogeneous capabilities in terms of task completion times, task resource consumption (which can model variables such as energy or attention), and preferences in terms of the rewards they collect upon task completion. We assume that the reward, execution time, and resource consumption for each member to complete any task are stochastic with unknown distributions. The goal of the planner is to maximize the total expected reward that the team receives over the problem horizon while ensuring that the resource consumption required for any assigned task is within the capability of the agent. We propose and analyze a bandit algorithm for this problem. Since the bandit algorithm relies on solving an optimal task assignment problem repeatedly, we analyze the achievable regret in two cases: when we can solve the optimal task assignment exactly and when we can solve it only approximately.
comment: 14 pages
Explore-then-Commit for Nonstationary Linear Bandits with Latent Dynamics
We study a nonstationary bandit problem where rewards depend on both actions and latent states, the latter governed by unknown linear dynamics. Crucially, the state dynamics also depend on the actions, resulting in tension between short-term and long-term rewards. We propose an explore-then-commit algorithm for a finite horizon $T$. During the exploration phase, random Rademacher actions enable estimation of the Markov parameters of the linear dynamics, which characterize the action-reward relationship. In the commit phase, the algorithm uses the estimated parameters to design an optimized action sequence for long-term reward. Our proposed algorithm achieves $\tilde{\mathcal{O}}(T^{2/3})$ regret. Our analysis handles two key challenges: learning from temporally correlated rewards, and designing action sequences with optimal long-term reward. We address the first challenge by providing near-optimal sample complexity and error bounds for system identification using bilinear rewards. We address the second challenge by proving an equivalence with indefinite quadratic optimization over a hypercube, a known NP-hard problem. We provide a sub-optimality guarantee for this problem, enabling our regret upper bound. Lastly, we propose a semidefinite relaxation with Goemans-Williamson rounding as a practical approach.
Residual Correction Models for AC Optimal Power Flow Using DC Optimal Power Flow Solutions
Solving the nonlinear AC optimal power flow (AC OPF) problem remains a major computational bottleneck for real-time grid operations. In this paper, we propose a residual learning paradigm that uses fast DC optimal power flow (DC OPF) solutions as a baseline, and learns only the nonlinear corrections required to provide the full AC-OPF solution. The method utilizes a topology-aware Graph Neural Network with local attention and two-level DC feature integration, trained using a physics-informed loss that enforces AC power-flow feasibility and operational limits. Evaluations on OPFData for 57-, 118-, and 2000-bus systems show around 25% lower MSE, up to 3X reduction in feasibility error, and up to 13X runtime speedup compared to conventional AC OPF solvers. The model maintains accuracy under N-1 contingencies and scales efficiently to large networks. These results demonstrate that residual learning is a practical and scalable bridge between linear approximations and AC-feasible OPF, enabling near real-time operational decision making.
Learning a Generalized Model for Substation Level Voltage Estimation in Distribution Networks
Accurate voltage estimation in distribution networks is critical for real-time monitoring and increasing the reliability of the grid. As DER penetration and distribution level voltage variability increase, robust distribution system state estimation (DSSE) has become more essential to maintain safe and efficient operations. Traditional DSSE techniques, however, struggle with sparse measurements and the scale of modern feeders, limiting their scalability to large networks. This paper presents a hierarchical graph neural network for substation-level voltage estimation that exploits both electrical topology and physical features, while remaining robust to the low observability levels common to real-world distribution networks. Leveraging the public SMART-DS datasets, the model is trained and evaluated on thousands of buses across multiple substations and DER penetration scenarios. Comprehensive experiments demonstrate that the proposed method achieves up to 2 times lower RMSE than alternative data-driven models, and maintains high accuracy with as little as 1% measurement coverage. The results highlight the potential of GNNs to enable scalable, reproducible, and data-driven voltage monitoring for distribution systems.
DRL-Based Resource Allocation for Energy-Efficient IRS-Assisted UAV Spectrum Sharing Systems
Intelligent reflecting surface (IRS) assisted unmanned aerial vehicle (UAV) systems provide a new paradigm for reconfigurable and flexible wireless communications. To enable more energy efficient and spectrum efficient IRS assisted UAV wireless communications, this paper introduces a novel IRS-assisted UAV enabled spectrum sharing system with orthogonal frequency division multiplexing (OFDM). The goal is to maximize the energy efficiency (EE) of the secondary network by jointly optimizing the beamforming, subcarrier allocation, IRS phase shifts, and the UAV trajectory subject to practical transmit power and passive reflection constraints as well as UAV physical limitations. A physically grounded propulsion-energy model is adopted, with its tight upper bound used to form a tractable EE lower bound for the spectrum sharing system. To handle highly non convex, time coupled optimization problems with a mixed continuous and discrete policy space, we develop a deep reinforcement learning (DRL) approach based on the actor critic framework. Extended experiments show the significant EE improvement of the proposed DRL-based approach compared to several benchmark schemes, thus demonstrating the effectiveness and robustness of the proposed approach with mobility.
comment: 7 pages, 3 figures, 1 algorithm. LaTeX class: IEEEtran
Through-the-Earth Magnetic Induction Communication and Networking: A Comprehensive Survey
Magnetic induction (MI) communication (MIC) has emerged as a promising candidate for underground communication networks due to its excellent penetration capabilities. Integration with Space-Air-Ground-Underground (SAGUI) networks in next-generation mobile communication systems requires a well-defined network architecture. A recent discovery in MIC research, MI fast fading, remains in its early stages and presents unique challenges. This paper provides a comprehensive survey on through-the-earth (TTE) MIC, covering MI applications, channel modeling, point-to-point MIC design, relay techniques, network frameworks, and emerging technologies. We compare various MIC applications to highlight TTE-specific challenges and review the principles of channel modeling, addressing both MI slow fading and MI fast fading, along with its potential impact on existing MIC theories. We conduct a fine-grained decomposition of MI channel power gain into four distinct physical parameters, and propose a novel geometric model to analyze MI fast fading. We also summarize MI relay techniques, examine crosstalk effects in relay and high-density networks, and explore key research tasks within the OSI framework for a holistic MI network protocol in SAGUI. To bridge the gaps identified, we propose a MIC framework that supports TCP/IP and Linux, enabling full implementation of existing and emerging MIC solutions. This framework empowers researchers to leverage Linux resources and deep learning platforms for accelerated development of MIC in SAGUI networks. Remaining research challenges, open issues, and promising novel techniques are further identified to advance MIC research.
comment: This work has been accepted by the IEEE Communications Surveys & Tutorials (COMST) for publication. The final published version will be available on IEEE Xplore
Kernel-based Koopman approximants for control: Flexible sampling, error analysis, and stability
Data-driven techniques for analysis, modeling, and control of complex dynamical systems are on the uptake. Koopman theory provides the theoretical foundation for the popular kernel extended dynamic mode decomposition (kEDMD). In this work, we propose a novel kEDMD scheme to approximate nonlinear control systems accompanied by an in-depth error analysis. Key features are regularization-based robustness and an adroit decomposition into micro and macro grids enabling flexible sampling. But foremost, we prove proportionality, i.e., explicit dependence on the distance to the (controlled) equilibrium, of the derived bound on the full approximation error. Leveraging this key property, we rigorously show that asymptotic stability of the data-driven surrogate (control) system implies asymptotic stability of the original (control) system and vice versa.
comment: 29 pages, 5 figures
Contact-Aware Safety in Soft Robots Using High-Order Control Barrier and Lyapunov Functions
Robots operating alongside people, particularly in sensitive scenarios such as aiding the elderly with daily tasks or collaborating with workers in manufacturing, must guarantee safety and cultivate user trust. Continuum soft manipulators promise safety through material compliance, but as designs evolve for greater precision, payload capacity, and speed, and increasingly incorporate rigid elements, their injury risk resurfaces. In this letter, we introduce a comprehensive High-Order Control Barrier Function (HOCBF) + High-Order Control Lyapunov Function (HOCLF) framework that enforces strict contact force limits across the entire soft-robot body during environmental interactions. Our approach combines a differentiable Piecewise Cosserat-Segment (PCS) dynamics model with a convex-polygon distance approximation metric, named Differentiable Conservative Separating Axis Theorem (DCSAT), based on the soft robot geometry to enable real-time, whole-body collision detection, resolution, and enforcement of the safety constraints. By embedding HOCBFs into our optimization routine, we guarantee safety, allowing, for instance, safe navigation in operational space under HOCLF-driven motion objectives. Extensive planar simulations demonstrate that our method maintains safety-bounded contacts while achieving precise shape and task-space regulation. This work thus lays a foundation for the deployment of soft robots in human-centric environments with provable safety and performance.
comment: 8 pages
Pseudo-Kinematic Trajectory Control and Planning of Tracked Vehicles
Tracked vehicles distribute their weight continuously over a large surface area (the tracks). This distinctive feature makes them the preferred choice for vehicles required to traverse soft and uneven terrain. From a robotics perspective, however, this flexibility comes at a cost: the complexity of modelling the system and the resulting difficulty in designing theoretically sound navigation solutions. In this paper, we aim to bridge this gap by proposing a framework for the navigation of tracked vehicles, built upon three key pillars. The first pillar comprises two models: a simulation model and a control-oriented model. The simulation model captures the intricate terramechanics dynamics arising from soil-track interaction and is employed to develop faithful digital twins of the system across a wide range of operating conditions. The control-oriented model is pseudo-kinematic and mathematically tractable, enabling the design of efficient and theoretically robust control schemes. The second pillar is a Lyapunov-based feedback trajectory controller that provides certifiable tracking guarantees. The third pillar is a portfolio of motion planning solutions, each offering different complexity-accuracy trade-offs. The various components of the proposed approach are validated through an extensive set of simulation and experimental data.
RadioDiff-$k^2$: Helmholtz Equation Informed Generative Diffusion Model for Multi-Path Aware Radio Map Construction
In this paper, we propose a novel physics-informed generative learning approach, named RadioDiff-$k^2$, for accurate and efficient multipath-aware radio map (RM) construction. As future wireless communication evolves towards environment-aware paradigms, the accurate construction of RMs becomes crucial yet highly challenging. Conventional electromagnetic (EM)-based methods, such as full-wave solvers and ray-tracing approaches, exhibit substantial computational overhead and limited adaptability to dynamic scenarios. Although existing neural network (NN) approaches have efficient inferencing speed, they lack sufficient consideration of the underlying physics of EM wave propagation, limiting their effectiveness in accurately modeling critical EM singularities induced by complex multipath environments. To address these fundamental limitations, we propose a novel physics-inspired RM construction method guided explicitly by the Helmholtz equation, which inherently governs EM wave propagation. Specifically, based on the analysis of partial differential equations (PDEs), we theoretically establish a direct correspondence between EM singularities, which correspond to the critical spatial features influencing wireless propagation, and regions defined by negative wave numbers in the Helmholtz equation. We then design an innovative dual diffusion model (DM)-based large artificial intelligence framework comprising one DM dedicated to accurately inferring EM singularities and another DM responsible for reconstructing the complete RM using these singularities along with environmental contextual information. Experimental results demonstrate that the proposed RadioDiff-$k^2$ framework achieves state-of-the-art (SOTA) performance in both image-level RM construction and localization tasks, while maintaining inference latency within a few hundred milliseconds.
A kernel-based approach to physics-informed nonlinear system identification
This paper presents a kernel-based framework for physics-informed nonlinear system identification. The key contribution is a structured methodology that extends kernel-based techniques to seamlessly embed partially known physics-based models, improving parameter estimation and overall model accuracy. The proposed method enhances traditional modeling approaches by embedding a parametric model, which provides physical interpretability, with a kernel-based function, which accounts for unmodeled dynamics. The two models' components are identified from the data simultaneously, thereby minimizing a suitable cost that balances the relative importance of the physical and the black-box parts of the model. Additionally, nonlinear state smoothing is employed to address scenarios involving state-space models with not fully measurable states. Numerical simulations on an experimental benchmark system demonstrate the effectiveness of the proposed approach, achieving up to 51% reduction in simulation root mean square error compared to physics-only models and 31% performance improvement over state-of-the-art identification techniques.
comment: [Extended version] This work has been submitted to the IEEE for possible publication
Stochastic Model Predictive Control for Sub-Gaussian Noise
We propose a stochastic Model Predictive Control (MPC) framework that ensures closed-loop chance constraint satisfaction for linear systems with general sub-Gaussian process and measurement noise. By considering sub-Gaussian noise, we can provide guarantees for a large class of distributions, including time-varying distributions. Specifically, we first provide a new characterization of sub-Gaussian random vectors using matrix variance proxy, which can more accurately represent the predicted state distribution. We then derive tail bounds under linear propagation for the new characterization, enabling tractable computation of probabilistic reachable sets of linear systems. Lastly, we utilize these probabilistic reachable sets to formulate a stochastic MPC scheme that provides closed-loop guarantees for general sub-Gaussian noise. We further demonstrate our approach in simulations, including a challenging task of surgical planning from image observations.
comment: 15 pages, 6 figures, submitted to Automatica
Multi-stage model predictive control for slug flow crystallizers using uncertainty-aware surrogate models
This paper presents a novel dynamic model for slug flow crystallizers that addresses the challenges of spatial distribution without backmixing or diffusion, potentially enabling advanced model-based control. The developed model can accurately describe the main characteristics of slug flow crystallizers, including slug-to-slug variability, but it leads to high computational complexity because it involves partial differential equations and population balance equations. For that reason, the model cannot be directly used for process optimization and control. To solve this challenge, we propose two different approaches, conformalized quantile regression and Bayesian last layer neural networks, to develop surrogate models with uncertainty quantification capabilities. These surrogates output a prediction of the system states together with an uncertainty of these predictions to account for process variability and model uncertainty. We use the uncertainty of the predictions to formulate a robust model predictive control approach, enabling robust real-time advanced control of a slug flow crystallizer.
Decentralized Real-Time Iterations for Distributed NMPC
This article presents a Real-Time Iteration (RTI) scheme for distributed Nonlinear Model Predictive Control (NMPC). The scheme transfers the well-known RTI approach, a key enabler for many industrial real-time NMPC implementations, to the setting of cooperative distributed control. At each sampling instant, one outer iteration of a bi-level decentralized Sequential Quadratic Programming (dSQP) method is applied to a centralized optimal control problem. This ensures that real-time requirements are met and it facilitates cooperation between subsystems. Combining novel dSQP convergence results with RTI stability guarantees, we prove local exponential stability under standard assumptions on the MPC design with and without terminal constraints. The proposed scheme only requires neighbor-to-neighbor communication and avoids a central coordinator. A numerical example with coupled inverted pendulums demonstrates the efficacy of the approach.
A Set-Theoretic Robust Control Approach for Linear Quadratic Games with Unknown Counterparts
Ensuring robust decision-making in multi-agent systems is challenging when agents have distinct, possibly conflicting objectives and lack full knowledge of each other's strategies. This is apparent in safety-critical applications such as human-robot interaction and assisted driving, where uncertainty arises not only from unknown adversary strategies but also from external disturbances. To address this, the paper proposes a robust adaptive control approach based on linear quadratic differential games. Our method allows a controlled agent to iteratively refine its belief about the adversary strategy and disturbances using a set-membership approach, while simultaneously adapting its policy to guarantee robustness against the uncertain adversary policy and improve performance over time. We formally derive theoretical guarantees on the robustness of the proposed control scheme and its convergence to $\epsilon$-Nash strategies. The effectiveness of our approach is demonstrated in a numerical simulation.
comment: Accepted for publication in the Proceedings of the 64th IEEE Conference on Decision and Control
Wearable and Ultra-Low-Power Fusion of EMG and A-Mode US for Hand-Wrist Kinematic Tracking
Hand gesture recognition based on biosignals has shown strong potential for developing intuitive human-machine interaction strategies that closely mimic natural human behavior. In particular, sensor fusion approaches have gained attention for combining complementary information and overcoming the limitations of individual sensing modalities, thereby enabling more robust and reliable systems. Among them, the fusion of surface electromyography (EMG) and A-mode ultrasound (US) is very promising. However, prior solutions rely on power-hungry platforms unsuitable for multi-day use and are limited to discrete gesture classification. In this work, we present an ultra-low-power (sub-50 mW) system for concurrent acquisition of 8-channel EMG and 4-channel A-mode US signals, integrating two state-of-the-art platforms into fully wearable, dry-contact armbands. We propose a framework for continuous tracking of 23 degrees of freedom (DoFs), 20 for the hand and 3 for the wrist, using a kinematic glove for ground-truth labeling. Our method employs lightweight encoder-decoder architectures with multi-task learning to simultaneously estimate hand and wrist joint angles. Experimental results under realistic sensor repositioning conditions demonstrate that EMG-US fusion achieves a root mean squared error of $10.6^\circ\pm2.0^\circ$, compared to $12.0^\circ\pm1^\circ$ for EMG and $13.1^\circ\pm2.6^\circ$ for US, and an R$^2$ score of $0.61\pm0.1$, compared to $0.54\pm0.03$ for EMG and $0.38\pm0.20$ for US.
comment: 5 pages, 3 figures
VLMLight: Safety-Critical Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning Architecture
Traffic signal control (TSC) is a core challenge in urban mobility, where real-time decisions must balance efficiency and safety. Existing methods - ranging from rule-based heuristics to reinforcement learning (RL) - often struggle to generalize to complex, dynamic, and safety-critical scenarios. We introduce VLMLight, a novel TSC framework that integrates vision-language meta-control with dual-branch reasoning. At the core of VLMLight is the first image-based traffic simulator that enables multi-view visual perception at intersections, allowing policies to reason over rich cues such as vehicle type, motion, and spatial density. A large language model (LLM) serves as a safety-prioritized meta-controller, selecting between a fast RL policy for routine traffic and a structured reasoning branch for critical cases. In the latter, multiple LLM agents collaborate to assess traffic phases, prioritize emergency vehicles, and verify rule compliance. Experiments show that VLMLight reduces waiting times for emergency vehicles by up to 65% over RL-only systems, while preserving real-time performance in standard conditions with less than 1% degradation. VLMLight offers a scalable, interpretable, and safety-aware solution for next-generation traffic signal control.
comment: 25 pages, 15 figures
Learning to Capture Rocks using an Excavator: A Reinforcement Learning Approach with Guiding Reward Formulation
Rock capturing with standard excavator buckets is a challenging task typically requiring the expertise of skilled operators. Unlike soil digging, it involves manipulating large, irregular rocks in unstructured environments where complex contact interactions with granular material make model-based control impractical. Existing autonomous excavation methods focus mainly on continuous media or rely on specialized grippers, limiting their applicability to real-world construction sites. This paper introduces a fully data-driven control framework for rock capturing that eliminates the need for explicit modeling of rock or soil properties. A model-free reinforcement learning agent is trained in the AGX Dynamics simulator using the Proximal Policy Optimization (PPO) algorithm and a guiding reward formulation. The learned policy outputs joint velocity commands directly to the boom, arm, and bucket of a CAT365 excavator model. Robustness is enhanced through extensive domain randomization of rock geometry, density, and mass, as well as the initial configurations of the bucket, rock, and goal position. To the best of our knowledge, this is the first study to develop and evaluate an RL-based controller for the rock capturing task. Experimental results show that the policy generalizes well to unseen rocks and varying soil conditions, achieving high success rates comparable to those of human participants while maintaining machine stability. These findings demonstrate the feasibility of learning-based excavation strategies for discrete object manipulation without requiring specialized hardware or detailed material models.
Grid-Aware Real-Time Dispatch of Microgrid with Generalized Energy Storage: A Prediction-Free Online Optimization Approach
This paper proposes a novel prediction-free two-stage coordinated dispatch framework for the real-time dispatch of a grid-connected microgrid with generalized energy storage (GES). The proposed framework explicitly addresses grid awareness, non-anticipativity constraints, and the time-coupling characteristics of GES, providing microgrid operators with a near-optimal, reliable, and adaptable dispatch tool. In the offline stage, we generate the hindsight state-of-charge (SoC) trajectories of GES by solving the multi-period economic dispatch with historical scenarios. Subsequently, leveraging this historical information (SoC trajectories, net loads, and electricity prices), we synthesize and dynamically update online references for both SoC and opportunity cost through kernel regression. We propose an adaptive Lagrange multiplier-based online convex optimization (OCO) algorithm, which incorporates reference tracking for global vision and expert-tracking for step-size updates. We prove that the proposed OCO algorithm achieves sublinear bounds on both dynamic regret and time-varying hard constraint violation. Numerical studies using ground-truth data from the Australian Energy Market Operator demonstrate that the proposed method outperforms state-of-the-art methods, reducing operational costs by 5.0-6.2% and voltage violations by 0.8-9.1%. These improvements mainly result from mitigating myopia through reference tracking and from the adaptive capability provided by dynamically updated references and adaptive Lagrange multipliers. Sensitivity analysis demonstrates the robustness, computational efficiency, and scalability of the proposed method.
A Multimodal Lightweight Approach to Fault Diagnosis of Induction Motors in High-Dimensional Dataset
An accurate AI-based diagnostic system for induction motors (IMs) holds the potential to enhance proactive maintenance, mitigating unplanned downtime and curbing overall maintenance costs within an industrial environment. Among the prevalent faults in IMs, the Broken Rotor Bar (BRB) fault is frequently encountered. Researchers have proposed various fault diagnosis approaches for BRB faults using signal processing (SP), machine learning (ML), deep learning (DL), and hybrid architectures. One limitation in the existing literature is that these architectures are trained on relatively small datasets, risking overfitting when such systems are deployed in industrial environments. This paper addresses this limitation by training a transfer-learning-based lightweight DL model, ShuffleNetV2, on large-scale BRB fault data to diagnose one, two, three, and four BRB faults using current and vibration signal data. Spectral images for training and testing are generated using a Short-Time Fourier Transform (STFT). The dataset comprises 57,500 images, with 47,500 used for training and 10,000 for testing. The ShuffleNetV2 model exhibited superior performance, accurately classifying 98.856% of spectral images at lower computational cost. To further enhance the visualization of harmonic sidebands resulting from broken bars, the Fast Fourier Transform (FFT) is applied to current and vibration data. The paper also provides insights into the training and testing times for each model, contributing to a comprehensive understanding of the proposed fault diagnosis methodology. The findings of our research provide valuable insights into the performance and efficiency of different ML and DL models, offering a foundation for the development of robust fault diagnosis systems for induction motors in industrial settings.
Feedback Stackelberg-Nash equilibria in difference games with quasi-hierarchical interactions and inequality constraints
In this paper, we study a class of two-player deterministic finite-horizon difference games with coupled inequality constraints, where each player has two types of decision variables: one involving sequential interactions and the other simultaneous interactions. We refer to this class of games as quasi-hierarchical dynamic games and define a solution concept called the feedback Stackelberg-Nash (FSN) equilibrium. Under a separability assumption on the cost functions, we provide a recursive formulation of the FSN solution using dynamic programming. We show that the FSN solution can be derived from the parametric feedback Stackelberg solution of an associated unconstrained game involving only sequential interactions, with a specific choice of the parameters that satisfy certain implicit complementarity conditions. For the linear-quadratic case, we show that an FSN solution is obtained by reformulating these complementarity conditions as a single large-scale linear complementarity problem. Finally, we illustrate our results using a dynamic duopoly game with production constraints.
Real-Time Linear MPC for Quadrotors on SE(3): An Analytical Koopman-based Realization
This letter presents an analytical linear parameter-varying (LPV) representation of quadrotor dynamics utilizing Koopman theory, facilitating computationally efficient linear model predictive control (LMPC) for real-time trajectory tracking. By leveraging carefully designed Koopman observables, the proposed approach enables a compact lifted-space evolution that mitigates the curse of dimensionality while preserving the nonlinear characteristics of the system. Although model predictive control (MPC) is a powerful strategy for quadrotor control, it faces a trade-off between the high computational cost of nonlinear MPC (NMPC) and the reduced accuracy of LMPC. To address this gap, we introduce KQ-LMPC (Koopman Quasilinear LPV MPC), which leverages the Koopman-lifted LPV formulation to enforce constraints, ensure lower computational burden and real-time feasibility, and deliver tracking performance comparable to NMPC. Experimental validation confirms the effectiveness of the framework in reasonably agile flight. To the best of our knowledge, this is the first experimentally validated LMPC for quadrotors that employs analytically derived Koopman observables without requiring training data.
comment: 6 pages, 3 figures, accepted for publication at IEEE Robotics and Automation Letters
Recursive Gaussian Process State Space Model
Learning dynamical models from data is not only fundamental but also holds great promise for advancing principle discovery, time-series prediction, and controller design. Among various approaches, Gaussian Process State-Space Models (GPSSMs) have recently gained significant attention due to their combination of flexibility and interpretability. However, for online learning, the field lacks an efficient method suitable for scenarios where prior information regarding data distribution and model function is limited. To address this issue, this paper proposes a recursive GPSSM method with adaptive capabilities for both operating domains and Gaussian process (GP) hyperparameters. Specifically, we first utilize first-order linearization to derive a Bayesian update equation for the joint distribution between the system state and the GP model, enabling closed-form and domain-independent learning. Second, an online selection algorithm for inducing points is developed based on informative criteria to achieve lightweight learning. Third, to support online hyperparameter optimization, we recover historical measurement information from the current filtering distribution. Comprehensive evaluations on both synthetic and real-world datasets demonstrate the superior accuracy, computational efficiency, and adaptability of our method compared to state-of-the-art online GPSSM techniques.
A Human-Vector Susceptible-Infected-Susceptible Model for Analyzing and Controlling the Spread of Vector-Borne Diseases
We propose an epidemic model for the spread of vector-borne diseases. The model, which is built extending the classical susceptible-infected-susceptible model, accounts for two populations -- humans and vectors -- and for cross-contagion between the two species, whereby humans become infected upon interaction with carrier vectors, and vectors become carriers after interaction with infected humans. We formulate the model as a system of ordinary differential equations and leverage monotone systems theory to rigorously characterize the epidemic dynamics. Specifically, we characterize the global asymptotic behavior of the disease, determining conditions for quick eradication of the disease (i.e., for which all trajectories converge to a disease-free equilibrium), or convergence to a (unique) endemic equilibrium. Then, we incorporate two control actions: namely, vector control and incentives to adopt protection measures. Using the derived mathematical tools, we assess the impact of these two control actions and determine the optimal control policy.
comment: Published in the Proceedings of the 2025 European Control Conference (ECC)
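The abstract above does not give the model equations, so the following is only a minimal sketch of a two-population SIS system with cross-contagion, under an assumed Ross-Macdonald-style form: x is the infected fraction of humans, y the carrier fraction of vectors, and the rates beta_h, beta_v, delta_h, delta_v are hypothetical parameters. Under this assumed form, trajectories settle at the disease-free equilibrium when beta_h*beta_v < delta_h*delta_v and at a unique endemic equilibrium otherwise.

```python
def simulate(beta_h, beta_v, delta_h, delta_v,
             x0=0.1, y0=0.1, dt=0.01, steps=50_000):
    """Forward-Euler integration of an assumed human-vector SIS model.

    x: infected fraction of humans, y: carrier fraction of vectors.
    Humans are infected only by vectors and vectors only by humans.
    """
    x, y = x0, y0
    for _ in range(steps):
        dx = -delta_h * x + beta_h * (1.0 - x) * y   # human recovery + infection by vectors
        dy = -delta_v * y + beta_v * (1.0 - y) * x   # vector recovery + infection by humans
        x += dt * dx
        y += dt * dy
    return x, y


def endemic_human_fraction(beta_h, beta_v, delta_h, delta_v):
    """Closed-form endemic equilibrium for x in the assumed model (valid above threshold)."""
    return (beta_h * beta_v - delta_h * delta_v) / (beta_v * (delta_h + beta_h))


if __name__ == "__main__":
    print(simulate(0.2, 0.3, 1.0, 1.0))   # below threshold: both fractions decay toward 0
    print(simulate(2.0, 3.0, 1.0, 1.0))   # above threshold: converges to the endemic state
    print(endemic_human_fraction(2.0, 3.0, 1.0, 1.0))
```

In this sketch, a vector-control action could be represented by lowering beta_h or raising delta_v, and protection measures by lowering beta_h; how the paper parameterizes its control actions is not stated in the abstract.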
Robust Closed-Form Control for MIMO Nonlinear Systems under Conflicting Time-Varying Hard and Soft Constraints (extended version)
This paper introduces a novel robust closed-form control law to handle time-varying hard and soft constraints in uncertain high-relative-degree nonlinear MIMO systems. These constraints represent spatiotemporal specifications in mechanical systems' operational space, with hard constraints ensuring safety-critical requirements and soft constraints encoding performance or task objectives. Initially, all constraints are consolidated into two separate scalar time-varying hard and soft constraint functions, whose positive level sets define feasible regions. A closed-form control law is developed to enforce these constraints using appropriately designed reciprocal barriers and nonlinear transformation functions. When conflicts between hard and soft constraints arise, the control law prioritizes hard constraints by virtually relaxing soft constraints via a dynamic relaxation law. Notably, the proposed control law maintains low complexity by avoiding approximation schemes for coping with system uncertainties. Simulation results confirm the effectiveness of the proposed method.
comment: 18 pages, 6 figures
Systems and Control (EESS)
Bio-inspired Microgrid Management based on Brain's Sensorimotor Gating
Microgrids are emerging as key enablers of resilient, sustainable, and intelligent power systems, but they continue to face challenges in dynamic disturbance handling, protection coordination, and uncertainty. Recent efforts have explored Brain Emotional Learning (BEL) controllers as bio-inspired solutions for microgrid control. Building on this growing trajectory, this article introduces a new paradigm for Neuro-Microgrids, inspired by the brain's sensorimotor gating mechanisms, specifically the Prepulse Inhibition (PPI) and Prepulse Facilitation (PPF). Sensorimotor gating offers a biological model for selectively suppressing or amplifying responses depending on contextual relevance. By mapping these principles onto the hierarchical control architecture of microgrids, we propose a Sensorimotor Gating-Inspired Neuro-Microgrid (SG-NMG) framework. In this architecture, PPI-like control decisions correspond to protective damping in primary and secondary management of microgrids, whereas PPF-like decisions correspond to adaptive amplification of corrective control actions. The framework is presented through analytical workflow design, neuro-circuitry analogies, and integration with machine learning methods. Finally, open challenges and research directions are outlined, including the mathematical modeling of gating, digital twin validation, and cross-disciplinary collaboration between neuroscience and industrial power systems. The resulting paradigm highlights sensorimotor gating as a promising framework for designing self-protective, adaptive, and resilient microgrids.
Braking within Barriers: Constructive Safety-Critical Control for Input-Constrained Vehicles via the Backup Set Method
This paper presents a safety-critical control framework to maintain bounded lateral motions for vehicles braking on asymmetric surfaces. We synthesize a brake controller that assists drivers and guarantees safety against excessive lateral motions (i.e., prevents the vehicle from spinning out) while minimizing the stopping distance. We address this safety-critical control problem in the presence of input constraints, since braking forces are limited by the available friction on the road. We use backup control barrier functions for safe control design. As this approach requires the construction of a backup set and a backup controller, we propose a novel, systematic method for creating valid backup set-backup controller pairs based on feedback linearization and continuous-time Lyapunov equations. We use simple examples to demonstrate our proposed safety-critical control method. Finally, we implement our approach on a four-wheel vehicle model for braking on asymmetric surfaces and present simulation results.
comment: Submitted to the IEEE Transactions on Automation Science and Engineering. 14 pages, 10 figures
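As background for the barrier-function idea mentioned above, here is a minimal sketch of a plain (not backup) control barrier function safety filter on a toy single integrator x' = u. It is not the paper's backup set method and ignores input constraints; the function names, the safe set h(x) = x_max - x >= 0, and the gain alpha are all illustrative assumptions. The filter enforces dh/dt >= -alpha*h, which for this system reduces to u <= alpha*(x_max - x).

```python
def cbf_filter(u_des, x, x_max, alpha):
    """Return the input closest to u_des that satisfies u <= alpha * (x_max - x)."""
    return min(u_des, alpha * (x_max - x))


def simulate(x0=0.0, u_des=1.0, x_max=1.0, alpha=2.0, dt=0.01, steps=1000):
    """Forward-Euler rollout of x' = u with the safety filter in the loop."""
    x = x0
    traj = [x]
    for _ in range(steps):
        u = cbf_filter(u_des, x, x_max, alpha)  # desired input, minimally modified
        x += dt * u
        traj.append(x)
    return traj


if __name__ == "__main__":
    traj = simulate()
    print(max(traj))   # stays below x_max even though u_des keeps pushing forward
```

The desired input passes through unchanged far from the boundary and is scaled back smoothly as x approaches x_max. In this discrete-time rollout the guarantee holds because alpha*dt <= 1; the backup set method described in the abstract addresses the harder case where the available input is bounded.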
Cavity Duplexer Tuning with 1d Resnet-like Neural Networks
This paper presents a machine learning method for tuning a cavity duplexer with a large number of adjustment screws. After testing, we rejected the conventional reinforcement learning approach and reformulated the task as a supervised learning problem. The suggested neural network architecture includes a 1d ResNet-like backbone and processes additional information about the S-parameters, such as the shape of the curve and the positions and amplitudes of its peaks. Combined with an external control algorithm, this neural network is capable of bringing the duplexer to a nearly tuned state within 4-5 rotations per screw.
Integrating Conductor Health into Dynamic Line Rating and Unit Commitment under Uncertainty
Dynamic line rating (DLR) enables greater utilization of existing transmission lines by leveraging real-time weather data. However, the elevated temperature operation (ETO) of conductors under DLR is often overlooked, despite its long-term impact on conductor health. This paper addresses this issue by 1) quantifying depreciation costs associated with ETO and 2) proposing a Conductor Health-Aware Unit Commitment (CHA-UC) that internalizes these costs in operational decisions. The CHA-UC incorporates a robust linear approximation of conductor temperature and integrates the expected depreciation costs due to hourly ETO into the objective function. Case studies on the Texas 123-bus backbone test system using NOAA weather data demonstrate that the proposed CHA-UC model reduces the total cost by 0.8% and renewable curtailment by 84% compared to static line rating (SLR), while conventional DLR operation without risk consideration resulted in higher costs due to excessive ETO. Further analysis of the commitment decisions and the line temperature statistics confirms that the CHA-UC achieves safer line flows by shifting generator commitments. Finally, we examine the emergent correlation between wind generation and DLR forecast errors, and show that CHA-UC adaptively manages this effect by relaxing flows for risk-hedging conditions while tightening flows for risk-amplifying ones.
Sugar Shack 4.0: Practical Demonstration of an IIoT-Based Event-Driven Automation System
This paper presents a practical alternative to programmable-logic-controller-centric automation by implementing an event-driven architecture built with industrial Internet of Things tools. A layered design on a local edge server (i) abstracts actuators, (ii) enforces mutual exclusion of shared physical resources through an interlock with priority queueing, (iii) composes deterministic singular operations, and (iv) orchestrates complete workflows as state machines in Node-RED, with communication over MQTT. The device layer uses low-cost ESP32-based gateways to interface sensors and actuators, while all automation logic is offloaded to the server side. As part of a larger project involving the first scientifically documented integration of Industry 4.0 technologies in a maple syrup boiling center, this work demonstrates the deployment of the proposed system as a case study. Evaluation over an entire production season shows median message time of flight around one tenth of a second, command issuance-to-motion latencies of about two to three seconds, and command completion near six seconds dominated by actuator mechanics; operation runtimes span tens of seconds to minutes. These results indicate that network and orchestration overheads are negligible relative to process dynamics, enabling modular, distributed control without compromising determinism or fault isolation. The approach reduces material and integration effort, supports portable containerized deployment, and naturally enables an edge/cloud split in which persistence and analytics are offloaded while automation remains at the edge.
comment: 10 pages, 15 figures
Mitigating Underwater Noise from Offshore Wind Turbines via Individual Pitch Control
This paper proposes a pitch control strategy to mitigate the underwater acoustic footprint of offshore wind turbines, a measure that will soon become necessary to minimize impacts on marine life, which relies on sound for communication, navigation, and survival. First, we quantify the underwater acoustic signature of blade-generated aerodynamic noise from three reference turbines, the NREL 5 MW, DTU 10 MW, and IEA 22 MW, by coupling blade element momentum with coupled air-water acoustic propagation modeling. Second, we propose and implement an open-loop individual pitch control (IPC) strategy that modulates the pitch of the blade at the blade passing frequency to attenuate the overall sound pressure level (OSPL) and the amplitude modulation (AM) of the transmitted noise. Third, we benchmark IPC performance against conventional pitch schemes. The results indicate that reductions of up to 5 dB in OSPL and a 20% decrease in AM depth can be achieved with a pitch variation of $\Delta\theta\approx 5^\circ$, with small losses (5-10%) in energy capture. These findings highlight a previously underappreciated noise pathway and demonstrate that targeted blade-pitch modulation can mitigate its impact.
Cross-border offshore hydrogen trade and carbon mitigation for Europe's net zero transition
European countries are ambitious in both the net-zero transition and offshore energy resource development. The Irish and UK governments have announced commitments to offshore wind capacities of 37 and 125 GW, respectively, in 2050, more than two times higher than their projected power demands, while continental countries such as Germany are calling for cleaner fuel resources. Exporting surplus offshore green hydrogen to bridge supply and demand could be pivotal in carbon emission mitigation for Europe. Yet the potential of these island countries is usually underestimated. This paper develops a bottom-up method to investigate the role of offshore hydrogen from Ireland and the UK in the decarbonisation of Europe as a whole. We evaluate future hydrogen/ammonia trading and the contribution of each country to carbon emission mitigation, considering their relative cost-competitiveness in offshore hydrogen production, domestic hourly power and gas system operation, and international shipping costs. Results indicate that offshore green hydrogen could reduce carbon dioxide emissions in Europe by 175.16 Mt/year in total. The UK will be the largest hydrogen supplier from 2030 to 2040 and is surpassed by Ireland in 2050, with 161 TWh of hydrogen exports to France and Spain. This general flow of hydrogen from west to east not only facilitates Europe's net-zero progress, but also reshapes the energy supply structure and helps to ensure energy security across the European continent.
Freehand 3D Ultrasound Imaging: Sim-in-the-Loop Probe Pose Optimization via Visual Servoing
Freehand 3D ultrasound (US) imaging using conventional 2D probes offers flexibility and accessibility for diverse clinical applications but faces challenges in accurate probe pose estimation. Traditional methods depend on costly tracking systems, while neural network-based methods struggle with image noise and error accumulation, compromising reconstruction precision. We propose a cost-effective and versatile solution that leverages lightweight cameras and visual servoing in simulated environments for precise 3D US imaging. These cameras capture visual feedback from a textured planar workspace. To counter occlusions and lighting issues, we introduce an image restoration method that reconstructs occluded regions by matching surrounding texture patterns. For pose estimation, we develop a simulation-in-the-loop approach, which replicates the system setup in simulation and iteratively minimizes pose errors between simulated and real-world observations. A visual servoing controller refines the alignment of camera views, improving translational estimation by optimizing image alignment. Validations on a soft vascular phantom, a 3D-printed conical model, and a human arm demonstrate the robustness and accuracy of our approach, with Hausdorff distances to the reference reconstructions of 0.359 mm, 1.171 mm, and 0.858 mm, respectively. These results confirm the method's potential for reliable freehand 3D US reconstruction.
Adaptive Legged Locomotion via Online Learning for Model Predictive Control
We provide an algorithm for adaptive legged locomotion via online learning and model predictive control. The algorithm is composed of two interacting modules: model predictive control (MPC) and online learning of residual dynamics. The residual dynamics can represent modeling errors and external disturbances. We are motivated by the future of autonomy where quadrupeds will autonomously perform complex tasks despite real-world unknown uncertainty, such as unknown payload and uneven terrains. The algorithm uses random Fourier features to approximate the residual dynamics in reproducing kernel Hilbert spaces. Then, it employs MPC based on the current learned model of the residual dynamics. The model is updated online in a self-supervised manner using least squares based on the data collected while controlling the quadruped. The algorithm enjoys sublinear \textit{dynamic regret}, defined as the suboptimality against an optimal clairvoyant controller that knows the residual dynamics. We validate our algorithm in Gazebo and MuJoCo simulations, where the quadruped aims to track reference trajectories. The Gazebo simulations include constant unknown external forces up to $12\boldsymbol{g}$, where $\boldsymbol{g}$ is the gravity vector, in flat terrain, slope terrain with $20\degree$ inclination, and rough terrain with $0.25m$ height variation. The MuJoCo simulations include time-varying unknown disturbances with payload up to $8~kg$ and time-varying ground friction coefficients in flat terrain.
comment: 9 pages
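The learning module described in this abstract (random Fourier features fitted by online least squares) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `RFFResidualModel`, the Gaussian-kernel frequency sampling, and the parameters `n_features`, `lengthscale`, and `reg` are assumptions chosen for illustration, and the MPC module is omitted.

```python
import numpy as np


class RFFResidualModel:
    """Hypothetical sketch: recursive least squares on random Fourier features."""

    def __init__(self, in_dim, out_dim, n_features=100, lengthscale=1.0,
                 reg=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        # Frequencies for a Gaussian (RBF) kernel; phases uniform on [0, 2*pi).
        self.W = rng.normal(scale=1.0 / lengthscale, size=(n_features, in_dim))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.scale = np.sqrt(2.0 / n_features)
        # Recursive least-squares state: inverse regularised Gram matrix and weights.
        self.P = np.eye(n_features) / reg
        self.theta = np.zeros((n_features, out_dim))

    def features(self, x):
        """Map an input vector to its random Fourier feature vector."""
        return self.scale * np.cos(self.W @ x + self.b)

    def predict(self, x):
        """Predicted residual at input x."""
        return self.theta.T @ self.features(x)

    def update(self, x, residual):
        """One recursive least-squares step on a single (input, residual) pair."""
        phi = self.features(x)
        P_phi = self.P @ phi
        gain = P_phi / (1.0 + phi @ P_phi)
        error = residual - self.theta.T @ phi
        self.theta += np.outer(gain, error)
        self.P -= np.outer(gain, P_phi)
```

In a control loop, `update` would be called with the observed mismatch between the nominal model's prediction and the measured next state, and `predict` would supply the correction term to the planner. Both of these uses are assumptions about how such a model is typically wired in, not details from the abstract.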
A Predictive Flexibility Aggregation Method for Low Voltage Distribution System Control
This paper presents a predictive control strategy to manage low-voltage distribution systems. The proposed approach relies on an aggregate of the flexibility at the residential unit level into a three-dimensional chart that represents the injected active and reactive power, and the flexibility cost. First, this method solves a multiparametric optimization problem offline at the residential unit level to aggregate the flexibility of the assets. Then, a semi-explicit model predictive control problem is solved to account for forecasts. By combining the results of these problems with measurements, the method generates the desired flexibility chart. The proposed approach is compatible with real-time control requirements, as heavy computations are performed offline locally, making it naturally parallelizable. By linking real-time flexibility assessment with energy scheduling, our approach enables efficient, low-cost, and privacy-preserving management of low-voltage distribution systems. We validate this method on a low-voltage network of 5 buses by comparing it with an ideal technique.
comment: 8 pages, 6 figures
Observer Design over Hypercomplex Quaternions
We develop observer design over hypercomplex quaternions in a characteristic-polynomial-free framework. Using the standard right-module convention, we derive a right observable companion form and its companion polynomial that encodes error dynamics via right-eigenvalue similarity classes. The design mirrors the real/complex case - coefficient updates in companion coordinates, followed by a similarity back - yet avoids determinants, characteristic/minimal polynomials, and Cayley-Hamilton identities that do not transfer to quaternions. We also give an Ackermann-type construction for the important case of closed-loop companion polynomials with real coefficients, ensuring similarity-equivariant evaluation. The results yield simple recipes for full-order observers directly over quaternions, clarify the role of right spectra and their similarity classes, and pinpoint when classical one-shot formulas remain valid. Numerical examples illustrate the method and advantages over vectorized or complex-adjoint surrogates.
comment: Accepted for presentation at the 24th European Control Conference (ECC 2026), Reykjavik, Iceland. This work was co-funded by the European Union under the project ROBOPROX (reg. no. CZ.02.01.01/00/22 008/0004590)
Active Inverse Methods in Stackelberg Games with Bounded Rationality
Inverse game theory is utilized to infer the cost functions of all players based on game outcomes. However, existing inverse game theory methods do not consider the learner as an active participant in the game, which could significantly enhance the learning process. In this paper, we extend inverse game theory to active inverse methods. For Stackelberg games with bounded rationality, the leader, acting as a learner, actively chooses actions to better understand the follower's cost functions. First, we develop a method of active learning by leveraging Fisher information to maximize information gain about the unknown parameters and prove the consistency and asymptotic normality. Additionally, when the leader considers its own cost, we develop a method of active inverse game to balance exploration and exploitation, and prove the consistency and asymptotic Stackelberg equilibrium with quadratic cost functions. Finally, we verify the properties of these methods through simulations in the quadratic case and demonstrate that the active inverse game method can achieve Stackelberg equilibrium more quickly through active exploration.
Hypergame-based Cognition Modeling and Intention Interpretation for Human-Driven Vehicles in Connected Mixed Traffic
With the practical implementation of connected and autonomous vehicles (CAVs), the traffic system is expected to remain a mix of CAVs and human-driven vehicles (HVs) for the foreseeable future. To enhance safety and traffic efficiency, the trajectory planning strategies of CAVs must account for the influence of HVs, necessitating accurate HV trajectory prediction. Current research often assumes that human drivers have perfect knowledge of all vehicles' objectives, an unrealistic premise. This paper bridges the gap by leveraging hypergame theory to account for cognitive and perception limitations in HVs. We model human bounded rationality without assuming them to be merely passive followers and propose a hierarchical cognition modeling framework that captures cognitive relationships among vehicles. We further analyze the cognitive stability of the system, proving that the strategy profile where all vehicles adopt cognitively equilibrium strategies constitutes a hyper Nash equilibrium when CAVs accurately learn HV parameters. To achieve this, we develop an inverse learning algorithm for distributed intention interpretation via vehicle-to-everything (V2X) communication, which extends the framework to both offline and online scenarios. Additionally, we introduce a distributed trajectory prediction and planning approach for CAVs, leveraging the learned parameters in real time. Simulations in highway lane-changing scenarios demonstrate the proposed method's accuracy in parameter learning, robustness to noisy trajectory observations, and safety in HV trajectory prediction. The results validate the effectiveness of our method in both offline and online implementations.
Hypergraph Contrastive Sensor Fusion for Multimodal Fault Diagnosis in Induction Motors
Reliable induction motor (IM) fault diagnosis is vital for industrial safety and operational continuity, mitigating costly unplanned downtime. Conventional approaches often struggle to capture complex multimodal signal relationships, are constrained to unimodal data or single fault types, and exhibit performance degradation under noisy or cross-domain conditions. This paper proposes the Multimodal Hypergraph Contrastive Attention Network (MM-HCAN), a unified framework for robust fault diagnosis. To the best of our knowledge, MM-HCAN is the first to integrate contrastive learning within a hypergraph topology specifically designed for multimodal sensor fusion, enabling the joint modelling of intra- and inter-modal dependencies and enhancing generalisation beyond Euclidean embedding spaces. The model facilitates simultaneous diagnosis of bearing, stator, and rotor faults, addressing the engineering need for consolidated diagnostic capabilities. Evaluated on three real-world benchmarks, MM-HCAN achieves up to 99.82% accuracy with strong cross-domain generalisation and resilience to noise, demonstrating its suitability for real-world deployment. An ablation study validates the contribution of each component. MM-HCAN provides a scalable and robust solution for comprehensive multi-fault diagnosis, supporting predictive maintenance and extended asset longevity in industrial environments.
comment: Submitted to IEEE Sensors Journal
A Tsetlin Machine Image Classification Accelerator on a Flexible Substrate
This paper introduces the first implementation of digital Tsetlin Machines (TMs) on a flexible integrated circuit (FlexIC) using Pragmatic's 600 nm IGZO-based FlexIC technology. TMs, known for their energy efficiency, interpretability, and suitability for edge computing, have previously been limited by the rigidity of conventional silicon-based chips. We develop two TM inference models as FlexICs: one achieving 98.5% accuracy using 6800 NAND2 equivalent logic gates with an area of 8×8 mm², and a second, more compact version achieving a slightly lower prediction accuracy of 93% but using only 1420 NAND2 equivalent gates with an area of 4×4 mm², both of which are custom-designed for an 8×8-pixel handwritten digit recognition dataset. The paper demonstrates the feasibility of deploying flexible TM inference engines into wearable healthcare and edge computing applications.
comment: accepted by International Symposium on the Tsetlin Machine (ISTM) 2025
Modelling-driven requirements for Error Field Control Coil application to initial JT-60SA plasmas
JT-60SA is a large superconducting tokamak built in Naka, Japan. After the successful achievement of its first MA-class plasma, the installation of several additional sub-systems, including a set of non-axisymmetric Error Field Correction Coils (EFCC), is ongoing. Optimization of future JT-60SA plasma scenarios will critically depend on the correct use of EFCC, including careful fulfillment of system specifications. In addition to that, preparation and risk mitigation of early ITER operations will greatly benefit from the experience gained by early EFCC application to JT-60SA experiments, in particular to optimize error field detection and control strategies. In this work, EFCC application in JT-60SA Initial Research Phase I perspective scenarios is modeled including plasma response. Impact of (Resonant) Magnetic Perturbations on the different plasma scenarios is assessed for both core and pedestal regions by the linear resistive MHD code MARS-F. The dominant core response to EFs is discussed case by case and compared to mode locking thresholds from literature. Typical current/voltage amplitudes and wave-forms are then compared to EFCC specifications in order to assess a safe operational space.
Balancing Fairness and Performance in Multi-User Spark Workloads with Dynamic Scheduling (extended version) SoCC'25
Apache Spark is a widely adopted framework for large-scale data processing. However, in industrial analytics environments, Spark's built-in schedulers, such as FIFO and fair scheduling, struggle to maintain both user-level fairness and low mean response time, particularly in long-running shared applications. Existing solutions typically focus on job-level fairness which unintentionally favors users who submit more jobs. Although Spark offers a built-in fair scheduler, it lacks adaptability to dynamic user workloads and may degrade overall job performance. We present the User Weighted Fair Queuing (UWFQ) scheduler, designed to minimize job response times while ensuring equitable resource distribution across users and their respective jobs. UWFQ simulates a virtual fair queuing system and schedules jobs based on their estimated finish times under a bounded fairness model. To further address task skew and reduce priority inversions, which are common in Spark workloads, we introduce runtime partitioning, a method that dynamically refines task granularity based on expected runtime. We implement UWFQ within the Spark framework and evaluate its performance using multi-user synthetic workloads and Google cluster traces. We show that UWFQ reduces the average response time of small jobs by up to 74% compared to existing built-in Spark schedulers and to state-of-the-art fair scheduling algorithms.
comment: This paper is an extended version of a paper accepted at the ACM Symposium on Cloud Computing (SoCC'25) that contains a proof of correctness
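The idea of scheduling by estimated finish time in a virtual fair queue can be sketched as follows. This is a simplified illustration in the style of classic self-clocked fair queuing, not the UWFQ algorithm itself: the class `VirtualFairQueue`, its methods, and the per-user `weight` parameter are hypothetical, and runtime partitioning is not modelled.

```python
import heapq
from collections import defaultdict


class VirtualFairQueue:
    """Hypothetical sketch: order jobs by per-user virtual finish time."""

    def __init__(self):
        self.virtual_time = 0.0
        self.last_finish = defaultdict(float)  # last virtual finish tag per user
        self._heap = []
        self._seq = 0  # tie-breaker that preserves submission order

    def submit(self, user, job_id, est_runtime, weight=1.0):
        """Tag a job with its virtual finish time and enqueue it."""
        # A user's jobs queue behind that user's own earlier jobs,
        # not behind the jobs of other users.
        start = max(self.virtual_time, self.last_finish[user])
        finish = start + est_runtime / weight
        self.last_finish[user] = finish
        heapq.heappush(self._heap, (finish, self._seq, user, job_id))
        self._seq += 1
        return finish

    def next_job(self):
        """Pop the job with the smallest virtual finish time."""
        finish, _, user, job_id = heapq.heappop(self._heap)
        self.virtual_time = max(self.virtual_time, finish)
        return user, job_id
```

With this tagging, a user who submits three jobs in a row does not push another user's single job to the back of the queue: the second user's job receives a finish tag based on that user's own backlog, which is the user-level fairness property the abstract contrasts with job-level fairness.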
Recursive Inference for Heterogeneous Multi-Output GP State-Space Models with Arbitrary Moment Matching
Accurate learning of system dynamics is becoming increasingly crucial for advanced control and decision-making in engineering. However, real-world systems often exhibit multiple channels and highly nonlinear transition dynamics, challenging traditional modeling methods. To enable online learning for these systems, this paper formulates the system as Gaussian process state-space models (GPSSMs) and develops a recursive learning method. The main contributions are threefold. First, a heterogeneous multi-output kernel is designed, allowing each output dimension to adopt distinct kernel types, hyperparameters, and input variables, improving expressiveness in multi-dimensional dynamics learning. Second, an inducing-point management algorithm enhances computational efficiency through independent selection and pruning for each output dimension. Third, a unified recursive inference framework for GPSSMs is derived, supporting general moment matching approaches, including the extended Kalman filter (EKF), unscented Kalman filter (UKF), and assumed density filtering (ADF), enabling accurate learning under strong nonlinearity and significant noise. Experiments on synthetic and real-world datasets show that the proposed method matches the accuracy of SOTA offline GPSSMs with only 1/100 of the runtime, and surpasses SOTA online GPSSMs by around 70% in accuracy under heavy noise while using only 1/20 of the runtime.
TranSimHub: A Unified Air-Ground Simulation Platform for Multi-Modal Perception and Decision-Making
Air-ground collaborative intelligence is becoming a key approach for next-generation urban intelligent transportation management, where aerial and ground systems work together on perception, communication, and decision-making. However, the lack of a unified multi-modal simulation environment has limited progress in studying cross-domain perception, coordination under communication constraints, and joint decision optimization. To address this gap, we present TranSimHub, a unified simulation platform for air-ground collaborative intelligence. TranSimHub offers synchronized multi-view rendering across RGB, depth, and semantic segmentation modalities, ensuring consistent perception between aerial and ground viewpoints. It also supports information exchange between the two domains and includes a causal scene editor that enables controllable scenario creation and counterfactual analysis under diverse conditions such as different weather, emergency events, and dynamic obstacles. We release TranSimHub as an open-source platform that supports end-to-end research on perception, fusion, and control across realistic air and ground traffic scenes. Our code is available at https://github.com/Traffic-Alpha/TranSimHub.
comment: 9 pages, 4 figures
Singularity-free dynamical invariants-based quantum control
State preparation is a cornerstone of quantum technologies, underpinning applications in computation, communication, and sensing. Its importance becomes even more pronounced in non-Markovian open quantum systems, where environmental memory and model uncertainties pose significant challenges to achieving high-fidelity control. Invariant-based inverse engineering provides a principled framework for synthesizing analytic control fields, yet existing parameterizations often lead to experimentally infeasible, singular pulses and are limited to simplified noise models such as those of Lindblad form. Here, we introduce a generalized invariant-based protocol for single-qubit state preparation under arbitrary noise conditions. The control proceeds in two stages: first, we construct a family of bounded pulses that achieve perfect state preparation in a closed system; second, we identify the optimal member of this family that minimizes the effect of noise. The framework accommodates both (i) characterized noise, enabling noise-aware control synthesis, and (ii) uncharacterized noise, where a noise-agnostic variant preserves robustness without requiring a master-equation description. Numerical simulations demonstrate high-fidelity state preparation across diverse targets while producing smooth, hardware-feasible control fields. This singularity-free framework extends invariant-based control to realistic open-system regimes, providing a versatile route toward robust quantum state engineering on NISQ hardware and other platforms exhibiting non-Markovian dynamics.
Adaptive Cost-Map-based Path Planning in Partially Unknown Environments with Movable Obstacles
Reliable navigation in disaster-response and other unstructured indoor settings requires robots not only to avoid obstacles but also to recognise when those obstacles can be pushed aside. We present an adaptive, LiDAR and odometry-based path-planning framework that embeds this capability into the ROS2 Nav2 stack. A new Movable Obstacles Layer labels all LiDAR returns missing from a prior static map as tentatively movable and assigns a reduced traversal cost. A companion Slow-Pose Progress Checker monitors the ratio of commanded to actual velocity; when the robot slows appreciably, the local cost is raised from light to heavy, and on a stall to lethal, prompting the global planner to back out and re-route. Gazebo evaluations on a Scout Mini, spanning isolated objects and cluttered corridors, show higher goal-reach rates and fewer deadlocks than a no-layer baseline, with traversal times broadly comparable. Because the method relies only on planar scans and CPU-level computation, it suits resource-constrained search and rescue robots and integrates into heterogeneous platforms with minimal engineering. Overall, the results indicate that interaction-aware cost maps are a lightweight, ROS2-native extension for navigating among potentially movable obstacles in unstructured settings. The full implementation will be released as open source at https://costmap-namo.github.io.
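The light, heavy, and lethal escalation described in this abstract can be sketched as a simple threshold rule on the ratio of actual to commanded speed. This is an illustrative assumption, not the released implementation: the function `escalate_cost`, the thresholds `slow_ratio` and `stall_ratio`, and the `LIGHT` and `HEAVY` cost values are hypothetical.

```python
# Hypothetical cost levels on a 0-254 cost scale; the values are illustrative only.
LIGHT, HEAVY, LETHAL = 50, 200, 254


def escalate_cost(commanded_speed, actual_speed,
                  slow_ratio=0.5, stall_ratio=0.1, min_command=0.05):
    """Return the cost level for a tentatively movable obstacle cell.

    The thresholds are assumptions: at or below `slow_ratio` of the commanded
    speed the robot counts as slowed, at or below `stall_ratio` as stalled.
    """
    if abs(commanded_speed) < min_command:
        # No meaningful motion was commanded, so progress cannot be judged.
        return LIGHT
    progress = abs(actual_speed) / abs(commanded_speed)
    if progress <= stall_ratio:
        return LETHAL  # stalled: force the planner to re-route
    if progress <= slow_ratio:
        return HEAVY   # slowed appreciably: discourage pushing further
    return LIGHT       # moving freely: keep the reduced traversal cost
```

For example, a robot commanded at 0.5 m/s that only achieves 0.2 m/s would have the cell raised to the heavy level under these assumed thresholds.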
Modeling and Dynamic Simulation of a Hybrid Wind-Wave System on a Hexagonal Semi-Submersible Platform
Offshore renewable energy systems offer promising solutions for sustainable power generation, yet most existing platforms harvest either wind or wave energy in isolation. This study presents a hybrid floating offshore platform that integrates a wind turbine with three oscillating surge wave energy converters (WECs) into a hexagonal semi-submersible structure. In this configuration, the flaps are integrated with the platform geometry to provide both energy extraction and hydrodynamic stability. A modeling and simulation framework was developed using WEC-Sim and benchmarked against the NREL 5 MW semisubmersible reference. Metacentric height analysis confirmed hydrostatic stability across a range of prescribed flap angles. Sensitivity analysis of twelve geometric variables identified flap dimensions and tower length as dominant drivers of stability, energy capture, and tower stress. Time-domain simulations revealed dependence on wave incidence angle, with variations in flap power sharing, capture width ratio (CWR), and platform response. The feasibility of using flap sweeps to modulate pitch motion was also demonstrated. Annual energy production (AEP) estimates based on site-specific data indicate 16.86 GWh from wind and 3.65 GWh from wave energy, with WECs contributing about 18% of the total. These results highlight the potential of integrated wind-wave platforms and point toward future studies on structural modeling and advanced control.
comment: 28 pages, 17 figures
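The reported wave-energy share can be checked directly from the two AEP figures in the abstract. The variable names below are illustrative.

```python
# AEP figures quoted in the abstract, in GWh per year.
wind_aep_gwh = 16.86
wave_aep_gwh = 3.65

total_aep_gwh = wind_aep_gwh + wave_aep_gwh
wave_share = wave_aep_gwh / total_aep_gwh

print(f"total AEP: {total_aep_gwh:.2f} GWh, wave share: {100 * wave_share:.1f}%")
```

The total is 20.51 GWh and the wave share is about 17.8%, consistent with the "about 18%" stated in the abstract.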
An Iterative Problem-Driven Scenario Reduction Framework for Stochastic Optimization with Conditional Value-at-Risk
Scenario reduction (SR) alleviates the computational complexity of scenario-based stochastic optimization with conditional value-at-risk (SBSO-CVaR) by identifying representative scenarios to depict the underlying uncertainty and tail risks. Existing distribution-driven SR methods emphasize statistical similarity but often exclude extreme scenarios, leading to weak tail-risk awareness and insufficient problem-specific representativeness. Instead, this paper proposes an iterative problem-driven scenario reduction framework. Specifically, we integrate the SBSO-CVaR problem structure into the SR process and project the original scenario set from the distribution space onto the problem space. Subsequently, to minimize the SR optimality gap with acceptable computation complexity, we propose a tractable iterative problem-driven scenario reduction (IPDSR) method that selects representative scenarios that best approximate the optimality distribution of the original scenario set while preserving tail risks. Furthermore, the iteration process is rendered as a mixed-integer program to enable scenario partitioning and representative scenario selection. Ex-post problem-driven evaluation indices are also proposed to evaluate the SR performance. Numerical experiments show that IPDSR significantly outperforms existing SR methods by achieving an optimality gap of less than 1% within an acceptable computation time.
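To make the tail-risk concern concrete, the sketch below computes the CVaR of a discrete scenario set using the standard Rockafellar-Uryasev formulation, $\mathrm{CVaR}_\alpha = \min_\eta \{\eta + \frac{1}{1-\alpha}\sum_i p_i \max(c_i - \eta, 0)\}$. It is not the IPDSR method. The function name `cvar` and the toy scenario costs are assumptions for illustration.

```python
import numpy as np


def cvar(costs, probs, alpha=0.95):
    """CVaR of a discrete cost distribution (Rockafellar-Uryasev form)."""
    costs = np.asarray(costs, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # For a discrete distribution the minimising threshold eta is attained
    # at one of the scenario costs, so it suffices to scan those values.
    candidates = np.unique(costs)
    excess = np.maximum(costs[None, :] - candidates[:, None], 0.0)
    objective = candidates + (excess @ probs) / (1.0 - alpha)
    return float(objective.min())


# Toy example: ten equiprobable scenarios with costs 1..10.
full_costs = np.arange(1.0, 11.0)
full_probs = np.full(10, 0.1)

# A reduction that drops the most extreme scenario and renormalises.
reduced_costs = full_costs[:-1]
reduced_probs = np.full(9, 1.0 / 9.0)

cvar_full = cvar(full_costs, full_probs, alpha=0.8)
cvar_reduced = cvar(reduced_costs, reduced_probs, alpha=0.8)
```

In this toy case the reduced set understates the tail risk (`cvar_reduced` is lower than `cvar_full`), which is the kind of distortion the abstract attributes to reductions that exclude extreme scenarios.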
Comprehensive Dynamic Modeling and Constraint-Aware Air Supply Control for Localized Water Management in Automotive Polymer Electrolyte Membrane Fuel Cells
In this paper, a predictive constraint-aware control scheme is formulated within the Command Governor (CG) framework for localized hydration management of a proton exchange membrane (PEM) fuel cell system. First, a comprehensive nonlinear dynamic model of the fuel cell system is presented which includes a pseudo 2-dimensional (P2D) model of the stack, reactant supply and cooling subsystems. The model captures the couplings among the various subsystems and serves as the basis for designing output feedback controllers to track the optimal set-points of the air supply and cooling systems for power optimization. The closed-loop nonlinear model is then used to analyze the dynamic behavior of membrane hydration near the anode inlet, the driest region of the membrane in a counter-flow configuration, under various operating conditions. A reduced-order linearized model is then derived to approximate hydration behavior with sufficient fidelity for constraint enforcement. This model is used within the CG framework to adjust the air supply set-points when necessary to prevent membrane dry-out. The effectiveness of the proposed approach in maintaining local membrane hydration while closely tracking the requested net power is demonstrated through realistic drive-cycle simulations.
comment: This is a manuscript submitted to Applied Energy
Techno-Economic Feasibility Analysis of Quantum Key Distribution for Power-System Communications
The accelerating digitalization and decentralization of modern power systems expose critical communication infrastructures to escalating cyber risks, particularly under emerging quantum computing threats. This paper presents an integrated techno-economic framework to evaluate the feasibility of Quantum Key Distribution (QKD) for secure power-system communications. A stochastic system model is developed to jointly capture time-varying key demand, QKD supply under optical-loss constraints, station-side buffering, and post-quantum cryptography (PQC) fallback mechanisms. Analytical conditions are derived for service-level assurance, including buffer stability, outage probability, and availability bounds. Building on this, two quantitative metrics, the Levelized Cost of Security (LCoSec) and the Cost of Incremental Security (CIS), are formulated to unify capital, operational, and risk-related expenditures within a discounted net-present-value framework. Using IEEE 118-bus, 123-node, and 39-bus test systems, we conduct discrete-event simulations comparing PQC-only, QKD-only, and Hybrid architectures across multiple topologies and service profiles. Results show that Hybrid architectures dominated by QKD significantly reduce key-outage probability and SLA shortfalls, achieving near-unit availability for real-time and confidentiality-critical services. Economic analyses reveal clear breakeven zones where QKD-enhanced deployments become cost-effective, primarily in metropolitan and distribution-level networks under moderate optical loss and buffer sizing. The proposed framework provides a reproducible, risk-aware decision tool for guiding large-scale, economically justified QKD adoption in future resilient power-system infrastructures.
Quantum-Key-Distribution Authenticated Aggregation and Settlement for Virtual Power Plants
The proliferation of distributed energy resources (DERs) and demand-side flexibility has made virtual power plants (VPPs) central to modern grid operation. Yet their end-to-end business pipeline, covering bidding, dispatch, metering, settlement, and archival, forms a tightly coupled cyber-physical-economic system where secure and timely communication is critical. Under the combined stress of sophisticated cyberattacks and extreme weather shocks, conventional cryptography offers limited long-term protection. Quantum key distribution (QKD), with information-theoretic guarantees, is viewed as a gold standard for securing critical infrastructures. However, limited key generation rates, routing capacity, and system overhead render key allocation a pressing challenge: scarce quantum keys must be scheduled across heterogeneous processes to minimize residual risk while maintaining latency guarantees. This paper introduces a quantum-authenticated aggregation and settlement framework for VPPs. We first develop a system-threat model that connects QKD key generation and routing with business-layer security strategies, authentication strength, refresh frequency, and delay constraints. Building on this, we formulate a key-budgeted risk minimization problem that jointly accounts for economic risk, service-level violations, and key-budget feasibility, and reveal a threshold property linking marginal security value to shadow prices. Case studies on a representative VPP system demonstrate that the proposed approach significantly reduces residual risk and SLA violations, enhances key efficiency and robustness, and aligns observed dynamics with the theoretical shadow price mechanism.
Spatial-to-Spectral Harmonic-Modulated Arrays for 6G Multi-Beam MIMO
This article presents an overview and analysis of spatial-to-spectral harmonic-modulated arrays (SHAs). Compared to traditional analog or digital beamforming arrays, SHAs enable concurrent multi-beamforming without requiring substantial hardware replication. SHAs replace the need for hardware replication with frequency-domain multiplexing. Furthermore, SHAs have the potential to become key contributors to future 6G networks by enabling scalable multi-user communications, joint communication and sensing, and spatial interference mitigation. In addition, an analysis of the SHA's harmonic-modulation waveform and its effects on gain, noise and bandwidth is presented. A comb-like modulation waveform for SHAs that minimizes spectral inefficiency is proposed. Further, an analysis of the SHA's capability to independently steer multiple beams is presented. This capability is quantified in terms of the SHA's spatial-to-spectral degrees of freedom. Lastly, this work introduces a novel SHA architecture that provides three spatial-to-spectral degrees of freedom with minimal hardware replication.
A Motivational Driver Steering Model: Task Difficulty Homeostasis From Control Theory Perspective
A general and psychologically plausible collision avoidance driver model can improve transportation safety significantly. Most computational driver models found in the literature have used control theory methods only, and they are not established based on psychological theories. In this paper, a unified approach is presented based on concepts taken from psychology and control theory. The "task difficulty homeostasis theory", a prominent motivational theory, is combined with the "Lyapunov stability method" in control theory to present a general and psychologically plausible model. This approach is used to model driver steering behavior for collision avoidance. The performance of this model is measured by simulation of two collision avoidance scenarios at a wide range of speeds from 20 km/h to 170 km/h. The model is validated by experiments on a driving simulator. The results demonstrate that the model follows human behavior accurately with a mean error of 7 percent.
comment: Cognitive Systems Research
Personalized Collaborative Learning with Affinity-Based Variance Reduction
Multi-agent learning faces a fundamental tension: leveraging distributed collaboration without sacrificing the personalization needed for diverse agents. This tension intensifies when aiming for full personalization while adapting to unknown heterogeneity levels -- gaining collaborative speedup when agents are similar, without performance degradation when they are different. Embracing the challenge, we propose personalized collaborative learning (PCL), a novel framework for heterogeneous agents to collaboratively learn personalized solutions with seamless adaptivity. Through carefully designed bias correction and importance correction mechanisms, our method AffPCL robustly handles both environment and objective heterogeneity. We prove that AffPCL reduces sample complexity over independent learning by a factor of $\max\{n^{-1}, \delta\}$, where $n$ is the number of agents and $\delta\in[0,1]$ measures their heterogeneity. This affinity-based acceleration automatically interpolates between the linear speedup of federated learning in homogeneous settings and the baseline of independent learning, without requiring prior knowledge of the system. Our analysis further reveals that an agent may obtain linear speedup even by collaborating with arbitrarily dissimilar agents, unveiling new insights into personalization and collaboration in the high heterogeneity regime.
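As a rough numerical illustration of the stated rate (not code from the paper), the sketch below evaluates the factor $\max\{n^{-1}, \delta\}$ for a fixed number of agents. The function name `affinity_factor` is hypothetical; only the formula itself comes from the abstract.

```python
def affinity_factor(n: int, delta: float) -> float:
    """Return max{1/n, delta}, the sample-complexity factor quoted in the abstract.

    The function name is hypothetical; n is the number of agents and
    delta in [0, 1] is the heterogeneity measure.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return max(1.0 / n, delta)


if __name__ == "__main__":
    # delta = 0 gives the linear-speedup end (1/n); delta = 1 gives the
    # independent-learning end (factor 1, i.e. no reduction).
    for delta in (0.0, 0.05, 0.5, 1.0):
        print(delta, affinity_factor(20, delta))
```

With 20 agents, the factor stays at 1/20 until $\delta$ exceeds 0.05 and then grows linearly with $\delta$, which is the interpolation the abstract describes.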
DeGrip: A Compact Cable-driven Robotic Gripper for Desktop Disassembly
Intelligent robotic disassembly of end-of-life (EOL) products has been a long-standing challenge in robotics. While machine learning techniques have shown promise, the lack of specialized hardware limits their application in real-world scenarios. We introduce DeGrip, a customized gripper designed for the disassembly of EOL computer desktops. DeGrip provides three degrees of freedom (DOF), enabling arbitrary configurations within the disassembly environment when mounted on a robotic manipulator. It employs a cable-driven transmission mechanism that reduces its overall size and enables operation in confined spaces. The wrist is designed to decouple the actuation of wrist and jaw joints. We also developed an EOL desktop disassembly environment in Isaac Sim to evaluate the effectiveness of DeGrip. The tasks were designed to demonstrate its ability to operate in confined spaces and disassemble components in arbitrary configurations. The evaluation results confirm the capability of DeGrip for EOL desktop disassembly.
Heterogeneous Multi-Agent Task-Assignment with Uncertain Execution Times and Preferences
While sequential task assignment for a single agent has been widely studied, such problems in a multi-agent setting, where the agents have heterogeneous task preferences or capabilities, remain less well-characterized. We study a multi-agent task assignment problem where a central planner assigns recurring tasks to multiple members of a team over a finite time horizon. For any given task, the members have heterogeneous capabilities in terms of task completion times, task resource consumption (which can model variables such as energy or attention), and preferences in terms of the rewards they collect upon task completion. We assume that the reward, execution time, and resource consumption for each member to complete any task are stochastic with unknown distributions. The goal of the planner is to maximize the total expected reward that the team receives over the problem horizon while ensuring that the resource consumption required for any assigned task is within the capability of the agent. We propose and analyze a bandit algorithm for this problem. Since the bandit algorithm relies on solving an optimal task assignment problem repeatedly, we analyze the achievable regret in two cases: when we can solve the optimal task assignment exactly and when we can solve it only approximately.
comment: 14 pages
Explore-then-Commit for Nonstationary Linear Bandits with Latent Dynamics
We study a nonstationary bandit problem where rewards depend on both actions and latent states, the latter governed by unknown linear dynamics. Crucially, the state dynamics also depend on the actions, resulting in tension between short-term and long-term rewards. We propose an explore-then-commit algorithm for a finite horizon $T$. During the exploration phase, random Rademacher actions enable estimation of the Markov parameters of the linear dynamics, which characterize the action-reward relationship. In the commit phase, the algorithm uses the estimated parameters to design an optimized action sequence for long-term reward. Our proposed algorithm achieves $\tilde{\mathcal{O}}(T^{2/3})$ regret. Our analysis handles two key challenges: learning from temporally correlated rewards, and designing action sequences with optimal long-term reward. We address the first challenge by providing near-optimal sample complexity and error bounds for system identification using bilinear rewards. We address the second challenge by proving an equivalence with indefinite quadratic optimization over a hypercube, a known NP-hard problem. We provide a sub-optimality guarantee for this problem, enabling our regret upper bound. Lastly, we propose a semidefinite relaxation with Goemans-Williamson rounding as a practical approach.
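To make the exploration idea concrete, here is a minimal sketch of why random Rademacher inputs help identify Markov parameters. It assumes a deliberately simplified, noise-free scalar model in which the observation is a finite linear combination of past actions, which is not the bilinear reward model analyzed in the paper. The function name and the example parameters are hypothetical.

```python
import random


def estimate_markov_parameters(g_true, num_steps, seed=0):
    """Estimate impulse-response coefficients from Rademacher inputs.

    Simplified model (an assumption, not the paper's setting):
        y_t = sum_k g_true[k] * u_{t-k}
    Because the u_t are independent with values +1 or -1, the average of
    y_t * u_{t-k} converges to g_true[k].
    """
    rng = random.Random(seed)
    m = len(g_true)
    u = [rng.choice((-1.0, 1.0)) for _ in range(num_steps)]
    sums = [0.0] * m
    count = 0
    for t in range(m - 1, num_steps):
        y = sum(g_true[k] * u[t - k] for k in range(m))
        for k in range(m):
            sums[k] += y * u[t - k]  # correlate output with the lagged input
        count += 1
    return [s / count for s in sums]


if __name__ == "__main__":
    print(estimate_markov_parameters([0.5, -0.3, 0.2], 20000))
```

The cross terms average out because distinct Rademacher inputs are uncorrelated, so the estimation error shrinks roughly like one over the square root of the exploration length.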
Residual Correction Models for AC Optimal Power Flow Using DC Optimal Power Flow Solutions
Solving the nonlinear AC optimal power flow (AC OPF) problem remains a major computational bottleneck for real-time grid operations. In this paper, we propose a residual learning paradigm that uses fast DC optimal power flow (DC OPF) solutions as a baseline, and learns only the nonlinear corrections required to provide the full AC-OPF solution. The method utilizes a topology-aware Graph Neural Network with local attention and two-level DC feature integration, trained using a physics-informed loss that enforces AC power-flow feasibility and operational limits. Evaluations on OPFData for 57-, 118-, and 2000-bus systems show around 25% lower MSE, up to 3X reduction in feasibility error, and up to 13X runtime speedup compared to conventional AC OPF solvers. The model maintains accuracy under N-1 contingencies and scales efficiently to large networks. These results demonstrate that residual learning is a practical and scalable bridge between linear approximations and AC-feasible OPF, enabling near real-time operational decision making.
Learning a Generalized Model for Substation Level Voltage Estimation in Distribution Networks
Accurate voltage estimation in distribution networks is critical for real-time monitoring and increasing the reliability of the grid. As DER penetration and distribution level voltage variability increase, robust distribution system state estimation (DSSE) has become more essential to maintain safe and efficient operations. Traditional DSSE techniques, however, struggle with sparse measurements and the scale of modern feeders, limiting their scalability to large networks. This paper presents a hierarchical graph neural network for substation-level voltage estimation that exploits both electrical topology and physical features, while remaining robust to the low observability levels common to real-world distribution networks. Leveraging the public SMART-DS datasets, the model is trained and evaluated on thousands of buses across multiple substations and DER penetration scenarios. Comprehensive experiments demonstrate that the proposed method achieves up to 2 times lower RMSE than alternative data-driven models, and maintains high accuracy with as little as 1% measurement coverage. The results highlight the potential of GNNs to enable scalable, reproducible, and data-driven voltage monitoring for distribution systems.
DRL-Based Resource Allocation for Energy-Efficient IRS-Assisted UAV Spectrum Sharing Systems
Intelligent reflecting surface (IRS)-assisted unmanned aerial vehicle (UAV) systems provide a new paradigm for reconfigurable and flexible wireless communications. To enable more energy-efficient and spectrum-efficient IRS-assisted UAV wireless communications, this paper introduces a novel IRS-assisted, UAV-enabled spectrum sharing system with orthogonal frequency division multiplexing (OFDM). The goal is to maximize the energy efficiency (EE) of the secondary network by jointly optimizing the beamforming, subcarrier allocation, IRS phase shifts, and the UAV trajectory subject to practical transmit power and passive reflection constraints as well as UAV physical limitations. A physically grounded propulsion-energy model is adopted, with its tight upper bound used to form a tractable EE lower bound for the spectrum sharing system. To handle the highly non-convex, time-coupled optimization problem with a mixed continuous and discrete policy space, we develop a deep reinforcement learning (DRL) approach based on the actor-critic framework. Extensive experiments show a significant EE improvement of the proposed DRL-based approach over several benchmark schemes, demonstrating its effectiveness and robustness under mobility.
comment: 7 pages, 3 figures, 1 algorithm. LaTeX class: IEEEtran
Through-the-Earth Magnetic Induction Communication and Networking: A Comprehensive Survey
Magnetic induction (MI) communication (MIC) has emerged as a promising candidate for underground communication networks due to its excellent penetration capabilities. Integration with Space-Air-Ground-Underground (SAGUI) networks in next-generation mobile communication systems requires a well-defined network architecture. A recent discovery in MIC research, MI fast fading, remains in its early stages and presents unique challenges. This paper provides a comprehensive survey on through-the-earth (TTE) MIC, covering MI applications, channel modeling, point-to-point MIC design, relay techniques, network frameworks, and emerging technologies. We compare various MIC applications to highlight TTE-specific challenges and review the principles of channel modeling, addressing both MI slow fading and MI fast fading, along with its potential impact on existing MIC theories. We conduct a fine-grained decomposition of MI channel power gain into four distinct physical parameters, and propose a novel geometric model to analyze MI fast fading. We also summarize MI relay techniques, examine crosstalk effects in relay and high-density networks, and explore key research tasks within the OSI framework for a holistic MI network protocol in SAGUI. To bridge the gaps identified, we propose a MIC framework that supports TCP/IP and Linux, enabling full implementation of existing and emerging MIC solutions. This framework empowers researchers to leverage Linux resources and deep learning platforms for accelerated development of MIC in SAGUI networks. Remaining research challenges, open issues, and promising novel techniques are further identified to advance MIC research.
comment: This work has been accepted by the IEEE Communications Surveys & Tutorials (COMST) for publication. The final published version will be available on IEEE Xplore
Kernel-based Koopman approximants for control: Flexible sampling, error analysis, and stability
Data-driven techniques for analysis, modeling, and control of complex dynamical systems are on the rise. Koopman theory provides the theoretical foundation for the popular kernel extended dynamic mode decomposition (kEDMD). In this work, we propose a novel kEDMD scheme to approximate nonlinear control systems accompanied by an in-depth error analysis. Key features are regularization-based robustness and an adroit decomposition into micro and macro grids enabling flexible sampling. Most importantly, we prove proportionality, i.e., explicit dependence on the distance to the (controlled) equilibrium, of the derived bound on the full approximation error. Leveraging this key property, we rigorously show that asymptotic stability of the data-driven surrogate (control) system implies asymptotic stability of the original (control) system and vice versa.
comment: 29 pages, 5 figures
Contact-Aware Safety in Soft Robots Using High-Order Control Barrier and Lyapunov Functions
Robots operating alongside people, particularly in sensitive scenarios such as aiding the elderly with daily tasks or collaborating with workers in manufacturing, must guarantee safety and cultivate user trust. Continuum soft manipulators promise safety through material compliance, but as designs evolve for greater precision, payload capacity, and speed, and increasingly incorporate rigid elements, their injury risk resurfaces. In this letter, we introduce a comprehensive High-Order Control Barrier Function (HOCBF) + High-Order Control Lyapunov Function (HOCLF) framework that enforces strict contact force limits across the entire soft-robot body during environmental interactions. Our approach combines a differentiable Piecewise Cosserat-Segment (PCS) dynamics model with a convex-polygon distance approximation metric, named Differentiable Conservative Separating Axis Theorem (DCSAT), based on the soft robot geometry to enable real-time, whole-body collision detection, resolution, and enforcement of the safety constraints. By embedding HOCBFs into our optimization routine, we guarantee safety, allowing, for instance, safe navigation in operational space under HOCLF-driven motion objectives. Extensive planar simulations demonstrate that our method maintains safety-bounded contacts while achieving precise shape and task-space regulation. This work thus lays a foundation for the deployment of soft robots in human-centric environments with provable safety and performance.
comment: 8 pages
Pseudo-Kinematic Trajectory Control and Planning of Tracked Vehicles
Tracked vehicles distribute their weight continuously over a large surface area (the tracks). This distinctive feature makes them the preferred choice for vehicles required to traverse soft and uneven terrain. From a robotics perspective, however, this flexibility comes at a cost: the complexity of modelling the system and the resulting difficulty in designing theoretically sound navigation solutions. In this paper, we aim to bridge this gap by proposing a framework for the navigation of tracked vehicles, built upon three key pillars. The first pillar comprises two models: a simulation model and a control-oriented model. The simulation model captures the intricate terramechanics dynamics arising from soil-track interaction and is employed to develop faithful digital twins of the system across a wide range of operating conditions. The control-oriented model is pseudo-kinematic and mathematically tractable, enabling the design of efficient and theoretically robust control schemes. The second pillar is a Lyapunov-based feedback trajectory controller that provides certifiable tracking guarantees. The third pillar is a portfolio of motion planning solutions, each offering different complexity-accuracy trade-offs. The various components of the proposed approach are validated through an extensive set of simulation and experimental data.
RadioDiff-$k^2$: Helmholtz Equation Informed Generative Diffusion Model for Multi-Path Aware Radio Map Construction
In this paper, we propose a novel physics-informed generative learning approach, named RadioDiff-$k^2$, for accurate and efficient multipath-aware radio map (RM) construction. As future wireless communication evolves towards environment-aware paradigms, the accurate construction of RMs becomes crucial yet highly challenging. Conventional electromagnetic (EM)-based methods, such as full-wave solvers and ray-tracing approaches, exhibit substantial computational overhead and limited adaptability to dynamic scenarios. Although existing neural network (NN) approaches offer fast inference, they do not sufficiently account for the underlying physics of EM wave propagation, which limits their effectiveness in accurately modeling critical EM singularities induced by complex multipath environments. To address these fundamental limitations, we propose a novel physics-inspired RM construction method guided explicitly by the Helmholtz equation, which inherently governs EM wave propagation. Specifically, based on the analysis of partial differential equations (PDEs), we theoretically establish a direct correspondence between EM singularities, which correspond to the critical spatial features influencing wireless propagation, and regions defined by negative wave numbers in the Helmholtz equation. We then design an innovative dual diffusion model (DM)-based large artificial intelligence framework comprising one DM dedicated to accurately inferring EM singularities and another DM responsible for reconstructing the complete RM using these singularities along with environmental contextual information. Experimental results demonstrate that the proposed RadioDiff-$k^2$ framework achieves state-of-the-art (SOTA) performance in both image-level RM construction and localization tasks, while maintaining inference latency within a few hundred milliseconds.
A kernel-based approach to physics-informed nonlinear system identification
This paper presents a kernel-based framework for physics-informed nonlinear system identification. The key contribution is a structured methodology that extends kernel-based techniques to seamlessly embed partially known physics-based models, improving parameter estimation and overall model accuracy. The proposed method enhances traditional modeling approaches by combining a parametric model, which provides physical interpretability, with a kernel-based function, which accounts for unmodeled dynamics. The two model components are identified from the data simultaneously by minimizing a suitable cost that balances the relative importance of the physical and the black-box parts of the model. Additionally, nonlinear state smoothing is employed to address scenarios involving state-space models whose states are not fully measurable. Numerical simulations on an experimental benchmark system demonstrate the effectiveness of the proposed approach, achieving up to 51% reduction in simulation root mean square error compared to physics-only models and 31% performance improvement over state-of-the-art identification techniques.
comment: [Extended version] This work has been submitted to the IEEE for possible publication
Stochastic Model Predictive Control for Sub-Gaussian Noise
We propose a stochastic Model Predictive Control (MPC) framework that ensures closed-loop chance constraint satisfaction for linear systems with general sub-Gaussian process and measurement noise. By considering sub-Gaussian noise, we can provide guarantees for a large class of distributions, including time-varying distributions. Specifically, we first provide a new characterization of sub-Gaussian random vectors using a matrix variance proxy, which can more accurately represent the predicted state distribution. We then derive tail bounds under linear propagation for the new characterization, enabling tractable computation of probabilistic reachable sets of linear systems. Lastly, we utilize these probabilistic reachable sets to formulate a stochastic MPC scheme that provides closed-loop guarantees for general sub-Gaussian noise. We further demonstrate our approach in simulations, including a challenging task of surgical planning from image observations.
comment: 15 pages, 6 figures, submitted to Automatica
Multi-stage model predictive control for slug flow crystallizers using uncertainty-aware surrogate models
This paper presents a novel dynamic model for slug flow crystallizers that addresses the challenges of spatial distribution without backmixing or diffusion, potentially enabling advanced model-based control. The developed model can accurately describe the main characteristics of slug flow crystallizers, including slug-to-slug variability, but it incurs high computational complexity because it involves partial differential equations and population balance equations. For that reason, the model cannot be directly used for process optimization and control. To solve this challenge, we propose two different approaches, conformalized quantile regression and Bayesian last layer neural networks, to develop surrogate models with uncertainty quantification capabilities. These surrogates output a prediction of the system states together with an uncertainty of these predictions to account for process variability and model uncertainty. We use the uncertainty of the predictions to formulate a robust model predictive control approach, enabling robust real-time advanced control of a slug flow crystallizer.
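For readers unfamiliar with conformalized quantile regression, the following is a minimal sketch of its standard split-conformal calibration step, assuming the lower and upper quantile predictions have already been produced by some fitted model. It is not the paper's surrogate; the function names and the toy numbers are hypothetical.

```python
import math


def cqr_margin(lower, upper, y, alpha):
    """Calibration margin for conformalized quantile regression.

    lower, upper: predicted lower/upper quantiles on a held-out calibration set
    y: observed values on the same set
    alpha: target miscoverage level, strictly between 0 and 1
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    n = len(y)
    # Conformity score: how far each observation falls outside its interval
    # (negative when the observation lies inside).
    scores = sorted(max(lo - yi, yi - hi) for lo, hi, yi in zip(lower, upper, y))
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


def conformalize(lo_new, hi_new, margin):
    """Widen (or shrink, if margin is negative) a new predicted interval."""
    return lo_new - margin, hi_new + margin


if __name__ == "__main__":
    m = cqr_margin([0.0] * 4, [1.0] * 4, [0.5, 1.2, -0.1, 0.9], alpha=0.5)
    print(m, conformalize(2.0, 3.0, m))
```

The resulting interval width is one way to expose prediction uncertainty to a downstream robust controller, although how the paper maps uncertainty into its scenario tree is not specified in the abstract.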
Decentralized Real-Time Iterations for Distributed NMPC
This article presents a Real-Time Iteration (RTI) scheme for distributed Nonlinear Model Predictive Control (NMPC). The scheme transfers the well-known RTI approach, a key enabler for many industrial real-time NMPC implementations, to the setting of cooperative distributed control. At each sampling instant, one outer iteration of a bi-level decentralized Sequential Quadratic Programming (dSQP) method is applied to a centralized optimal control problem. This ensures that real-time requirements are met and it facilitates cooperation between subsystems. Combining novel dSQP convergence results with RTI stability guarantees, we prove local exponential stability under standard assumptions on the MPC design with and without terminal constraints. The proposed scheme only requires neighbor-to-neighbor communication and avoids a central coordinator. A numerical example with coupled inverted pendulums demonstrates the efficacy of the approach.
A Set-Theoretic Robust Control Approach for Linear Quadratic Games with Unknown Counterparts
Ensuring robust decision-making in multi-agent systems is challenging when agents have distinct, possibly conflicting objectives and lack full knowledge of each other's strategies. This is apparent in safety-critical applications such as human-robot interaction and assisted driving, where uncertainty arises not only from unknown adversary strategies but also from external disturbances. To address this, the paper proposes a robust adaptive control approach based on linear quadratic differential games. Our method allows a controlled agent to iteratively refine its belief about the adversary strategy and disturbances using a set-membership approach, while simultaneously adapting its policy to guarantee robustness against the uncertain adversary policy and improve performance over time. We formally derive theoretical guarantees on the robustness of the proposed control scheme and its convergence to $\epsilon$-Nash strategies. The effectiveness of our approach is demonstrated in a numerical simulation.
comment: Accepted for publication in the Proceedings of the 64th IEEE Conference on Decision and Control
Wearable and Ultra-Low-Power Fusion of EMG and A-Mode US for Hand-Wrist Kinematic Tracking
Hand gesture recognition based on biosignals has shown strong potential for developing intuitive human-machine interaction strategies that closely mimic natural human behavior. In particular, sensor fusion approaches have gained attention for combining complementary information and overcoming the limitations of individual sensing modalities, thereby enabling more robust and reliable systems. Among them, the fusion of surface electromyography (EMG) and A-mode ultrasound (US) is very promising. However, prior solutions rely on power-hungry platforms unsuitable for multi-day use and are limited to discrete gesture classification. In this work, we present an ultra-low-power (sub-50 mW) system for concurrent acquisition of 8-channel EMG and 4-channel A-mode US signals, integrating two state-of-the-art platforms into fully wearable, dry-contact armbands. We propose a framework for continuous tracking of 23 degrees of freedom (DoFs), 20 for the hand and 3 for the wrist, using a kinematic glove for ground-truth labeling. Our method employs lightweight encoder-decoder architectures with multi-task learning to simultaneously estimate hand and wrist joint angles. Experimental results under realistic sensor repositioning conditions demonstrate that EMG-US fusion achieves a root mean squared error of $10.6^\circ\pm2.0^\circ$, compared to $12.0^\circ\pm1^\circ$ for EMG and $13.1^\circ\pm2.6^\circ$ for US, and an R$^2$ score of $0.61\pm0.1$, compared to $0.54\pm0.03$ for EMG and $0.38\pm0.20$ for US.
comment: 5 pages, 3 figures
VLMLight: Safety-Critical Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning Architecture
Traffic signal control (TSC) is a core challenge in urban mobility, where real-time decisions must balance efficiency and safety. Existing methods - ranging from rule-based heuristics to reinforcement learning (RL) - often struggle to generalize to complex, dynamic, and safety-critical scenarios. We introduce VLMLight, a novel TSC framework that integrates vision-language meta-control with dual-branch reasoning. At the core of VLMLight is the first image-based traffic simulator that enables multi-view visual perception at intersections, allowing policies to reason over rich cues such as vehicle type, motion, and spatial density. A large language model (LLM) serves as a safety-prioritized meta-controller, selecting between a fast RL policy for routine traffic and a structured reasoning branch for critical cases. In the latter, multiple LLM agents collaborate to assess traffic phases, prioritize emergency vehicles, and verify rule compliance. Experiments show that VLMLight reduces waiting times for emergency vehicles by up to 65% over RL-only systems, while preserving real-time performance in standard conditions with less than 1% degradation. VLMLight offers a scalable, interpretable, and safety-aware solution for next-generation traffic signal control.
comment: 25 pages, 15 figures
Learning to Capture Rocks using an Excavator: A Reinforcement Learning Approach with Guiding Reward Formulation
Rock capturing with standard excavator buckets is a challenging task typically requiring the expertise of skilled operators. Unlike soil digging, it involves manipulating large, irregular rocks in unstructured environments where complex contact interactions with granular material make model-based control impractical. Existing autonomous excavation methods focus mainly on continuous media or rely on specialized grippers, limiting their applicability to real-world construction sites. This paper introduces a fully data-driven control framework for rock capturing that eliminates the need for explicit modeling of rock or soil properties. A model-free reinforcement learning agent is trained in the AGX Dynamics simulator using the Proximal Policy Optimization (PPO) algorithm and a guiding reward formulation. The learned policy outputs joint velocity commands directly to the boom, arm, and bucket of a CAT365 excavator model. Robustness is enhanced through extensive domain randomization of rock geometry, density, and mass, as well as the initial configurations of the bucket, rock, and goal position. To the best of our knowledge, this is the first study to develop and evaluate an RL-based controller for the rock capturing task. Experimental results show that the policy generalizes well to unseen rocks and varying soil conditions, achieving high success rates comparable to those of human participants while maintaining machine stability. These findings demonstrate the feasibility of learning-based excavation strategies for discrete object manipulation without requiring specialized hardware or detailed material models.
Grid-Aware Real-Time Dispatch of Microgrid with Generalized Energy Storage: A Prediction-Free Online Optimization Approach
This paper proposes a novel prediction-free two-stage coordinated dispatch framework for the real-time dispatch of a grid-connected microgrid with generalized energy storage (GES). The proposed framework explicitly addresses grid awareness, non-anticipativity constraints, and the time-coupling characteristics of GES, providing microgrid operators with a near-optimal, reliable, and adaptable dispatch tool. In the offline stage, we generate the hindsight state-of-charge (SoC) trajectories of GES by solving the multi-period economic dispatch with historical scenarios. Subsequently, leveraging this historical information (SoC trajectories, net loads, and electricity prices), we synthesize and dynamically update online references for both SoC and opportunity cost through kernel regression. We propose an adaptive Lagrange multiplier-based online convex optimization (OCO) algorithm, which incorporates reference tracking for global vision and expert-tracking for step-size updates. We prove that the proposed OCO algorithm achieves sublinear bounds on both dynamic regret and time-varying hard constraint violation. Numerical studies using ground-truth data from the Australian Energy Market Operator demonstrate that the proposed method outperforms state-of-the-art methods, reducing operational costs by 5.0-6.2% and voltage violations by 0.8-9.1%. These improvements mainly result from mitigating myopia by reference tracking and the adaptive capability provided by dynamically updated references and adaptive Lagrange multipliers. Sensitivity analysis demonstrates the robustness, computational efficiency, and scalability of the proposed method.
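The kernel-regression step can be pictured with a small sketch. It assumes a Nadaraya-Watson estimator with a Gaussian kernel and a single scalar feature, whereas the abstract names net loads and electricity prices as inputs and does not state the kernel or bandwidth. All names and numbers here are hypothetical.

```python
import math


def kernel_regression(x_hist, y_hist, x_query, bandwidth):
    """Nadaraya-Watson estimate of y at x_query (Gaussian kernel assumed).

    x_hist: historical feature values (e.g. a net-load level)
    y_hist: hindsight values to imitate (e.g. an SoC reference)
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in x_hist]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from all historical samples")
    # Weighted average: historical scenarios closer to the query count more.
    return sum(w * y for w, y in zip(weights, y_hist)) / total


if __name__ == "__main__":
    # Hypothetical history: net-load levels and hindsight SoC values.
    print(kernel_regression([0.0, 2.0], [0.2, 0.8], 1.0, bandwidth=1.0))
```

Because the estimate is a weighted average of stored hindsight values, appending new historical samples updates the reference without retraining a forecasting model, which is consistent with the prediction-free framing.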
A Multimodal Lightweight Approach to Fault Diagnosis of Induction Motors in High-Dimensional Dataset
An accurate AI-based diagnostic system for induction motors (IMs) holds the potential to enhance proactive maintenance, mitigating unplanned downtime and curbing overall maintenance costs within an industrial environment. Notably, among the prevalent faults in IMs, a Broken Rotor Bar (BRB) fault is frequently encountered. Researchers have proposed various fault diagnosis approaches using signal processing (SP), machine learning (ML), deep learning (DL), and hybrid architectures for BRB faults. One limitation in the existing literature is the training of these architectures on relatively small datasets, risking overfitting when implementing such systems in industrial environments. This paper addresses this limitation by training on large-scale BRB fault data, using a transfer-learning-based lightweight DL model, ShuffleNetV2, to diagnose one, two, three, and four BRB faults from current and vibration signal data. Spectral images for training and testing are generated using a Short-Time Fourier Transform (STFT). The dataset comprises 57,500 images, with 47,500 used for training and 10,000 for testing. The ShuffleNetV2 model exhibited superior performance, accurately classifying 98.856% of spectral images at lower computational cost. To further enhance the visualization of harmonic sidebands resulting from broken bars, Fast Fourier Transform (FFT) is applied to current and vibration data. The paper also provides insights into the training and testing times for each model, contributing to a comprehensive understanding of the proposed fault diagnosis methodology. The findings of our research provide valuable insights into the performance and efficiency of different ML and DL models, offering a foundation for the development of robust fault diagnosis systems for induction motors in industrial settings.
Feedback Stackelberg-Nash equilibria in difference games with quasi-hierarchical interactions and inequality constraints
In this paper, we study a class of two-player deterministic finite-horizon difference games with coupled inequality constraints, where each player has two types of decision variables: one involving sequential interactions and the other simultaneous interactions. We refer to this class of games as quasi-hierarchical dynamic games and define a solution concept called the feedback Stackelberg-Nash (FSN) equilibrium. Under a separability assumption on the cost functions, we provide a recursive formulation of the FSN solution using dynamic programming. We show that the FSN solution can be derived from the parametric feedback Stackelberg solution of an associated unconstrained game involving only sequential interactions, with a specific choice of the parameters that satisfy certain implicit complementarity conditions. For the linear-quadratic case, we show that an FSN solution is obtained by reformulating these complementarity conditions as a single large-scale linear complementarity problem. Finally, we illustrate our results using a dynamic duopoly game with production constraints.
Real-Time Linear MPC for Quadrotors on SE(3): An Analytical Koopman-based Realization
This letter presents an analytical linear parameter-varying (LPV) representation of quadrotor dynamics utilizing Koopman theory, facilitating computationally efficient linear model predictive control (LMPC) for real-time trajectory tracking. By leveraging carefully designed Koopman observables, the proposed approach enables a compact lifted-space evolution that mitigates the curse of dimensionality while preserving the nonlinear characteristics of the system. Although model predictive control (MPC) is a powerful strategy for quadrotor control, it faces a trade-off between the high computational cost of nonlinear MPC (NMPC) and the reduced accuracy of LMPC. To address this gap, we introduce KQ-LMPC (Koopman Quasilinear LPV MPC), which leverages the Koopman-lifted LPV formulation to enforce constraints, ensure lower computational burden and real-time feasibility, and deliver tracking performance comparable to NMPC. Experimental validation confirms the effectiveness of the framework in reasonably agile flight. To the best of our knowledge, this is the first experimentally validated LMPC for quadrotors that employs analytically derived Koopman observables without requiring training data.
comment: 6 pages, 3 figures, accepted for publication at IEEE Robotics and Automation Letters
Recursive Gaussian Process State Space Model
Learning dynamical models from data is not only fundamental but also holds great promise for advancing principle discovery, time-series prediction, and controller design. Among various approaches, Gaussian Process State-Space Models (GPSSMs) have recently gained significant attention due to their combination of flexibility and interpretability. However, for online learning, the field lacks an efficient method suitable for scenarios where prior information regarding data distribution and model function is limited. To address this issue, this paper proposes a recursive GPSSM method with adaptive capabilities for both operating domains and Gaussian process (GP) hyperparameters. Specifically, we first utilize first-order linearization to derive a Bayesian update equation for the joint distribution between the system state and the GP model, enabling closed-form and domain-independent learning. Second, an online selection algorithm for inducing points is developed based on informative criteria to achieve lightweight learning. Third, to support online hyperparameter optimization, we recover historical measurement information from the current filtering distribution. Comprehensive evaluations on both synthetic and real-world datasets demonstrate the superior accuracy, computational efficiency, and adaptability of our method compared to state-of-the-art online GPSSM techniques.
A Human-Vector Susceptible-Infected-Susceptible Model for Analyzing and Controlling the Spread of Vector-Borne Diseases
We propose an epidemic model for the spread of vector-borne diseases. The model, which is built extending the classical susceptible-infected-susceptible model, accounts for two populations -- humans and vectors -- and for cross-contagion between the two species, whereby humans become infected upon interaction with carrier vectors, and vectors become carriers after interaction with infected humans. We formulate the model as a system of ordinary differential equations and leverage monotone systems theory to rigorously characterize the epidemic dynamics. Specifically, we characterize the global asymptotic behavior of the disease, determining conditions for quick eradication of the disease (i.e., for which all trajectories converge to a disease-free equilibrium), or convergence to a (unique) endemic equilibrium. Then, we incorporate two control actions: namely, vector control and incentives to adopt protection measures. Using the derived mathematical tools, we assess the impact of these two control actions and determine the optimal control policy.
comment: Published in the Proceedings of the 2025 European Control Conference (ECC)
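As a rough illustration of the cross-contagion mechanism described in the abstract above, the following sketch integrates a generic two-population SIS system with forward Euler. The equations, the parameter names (`beta_h`, `beta_v`, `delta_h`, `delta_v`), and the numerical values are assumptions chosen for illustration; the paper's actual model and its control terms are not reproduced here.

```python
def simulate_sis(beta_h, beta_v, delta_h, delta_v,
                 x0=0.1, y0=0.0, dt=0.01, t_end=50.0):
    """Forward-Euler integration of an assumed human-vector SIS model.

    x: fraction of infected humans, y: fraction of carrier vectors.
    Humans are infected only through carrier vectors, and vectors
    become carriers only through infected humans (cross-contagion).
    """
    x, y = x0, y0
    for _ in range(int(round(t_end / dt))):
        dx = -delta_h * x + beta_h * (1.0 - x) * y  # recovery + infection by vectors
        dy = -delta_v * y + beta_v * (1.0 - y) * x  # recovery + infection by humans
        x, y = x + dt * dx, y + dt * dy
    return x, y


if __name__ == "__main__":
    # Weak contagion: trajectories decay to the disease-free equilibrium.
    print(simulate_sis(beta_h=0.1, beta_v=0.1, delta_h=1.0, delta_v=1.0))
    # Strong contagion: trajectories settle at an endemic equilibrium.
    print(simulate_sis(beta_h=2.0, beta_v=2.0, delta_h=1.0, delta_v=1.0))
```

In this assumed form, linearizing around the origin suggests that the disease dies out when `beta_h * beta_v < delta_h * delta_v`, which is the kind of threshold condition the abstract refers to.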
Robust Closed-Form Control for MIMO Nonlinear Systems under Conflicting Time-Varying Hard and Soft Constraints (extended version)
This paper introduces a novel robust closed-form control law to handle time-varying hard and soft constraints in uncertain high-relative-degree nonlinear MIMO systems. These constraints represent spatiotemporal specifications in mechanical systems' operational space, with hard constraints ensuring safety-critical requirements and soft constraints encoding performance or task objectives. Initially, all constraints are consolidated into two separate scalar time-varying hard and soft constraint functions, whose positive level sets define feasible regions. A closed-form control law is developed to enforce these constraints using appropriately designed reciprocal barriers and nonlinear transformation functions. When conflicts between hard and soft constraints arise, the control law prioritizes hard constraints by virtually relaxing soft constraints via a dynamic relaxation law. Notably, the proposed control law maintains low complexity by avoiding approximation schemes for coping with system uncertainties. Simulation results confirm the effectiveness of the proposed method.
comment: 18 pages, 6 figures
Robotics
Dynamic Recalibration in LiDAR SLAM: Integrating AI and Geometric Methods with Real-Time Feedback Using INAF Fusion
This paper presents a novel fusion technique for LiDAR Simultaneous Localization and Mapping (SLAM), aimed at improving localization and 3D mapping using a LiDAR sensor. Our approach centers on the Inferred Attention Fusion (INAF) module, which integrates AI with geometric odometry. Utilizing the KITTI dataset's LiDAR data, INAF dynamically adjusts attention weights based on environmental feedback, enhancing the system's adaptability and measurement accuracy. This method advances the precision of both localization and 3D mapping, demonstrating the potential of our fusion technique to enhance autonomous navigation systems in complex scenarios.
comment: 9 pages, 9 figures
DexCanvas: Bridging Human Demonstrations and Robot Learning for Dexterous Manipulation
We present DexCanvas, a large-scale hybrid real-synthetic human manipulation dataset containing 7,000 hours of dexterous hand-object interactions seeded from 70 hours of real human demonstrations, organized across 21 fundamental manipulation types based on the Cutkosky taxonomy. Each entry combines synchronized multi-view RGB-D, high-precision mocap with MANO hand parameters, and per-frame contact points with physically consistent force profiles. Our real-to-sim pipeline uses reinforcement learning to train policies that control an actuated MANO hand in physics simulation, reproducing human demonstrations while discovering the underlying contact forces that generate the observed object motion. DexCanvas is the first manipulation dataset to combine large-scale real demonstrations, systematic skill coverage based on established taxonomies, and physics-validated contact annotations. The dataset can facilitate research in robotic manipulation learning, contact-rich control, and skill transfer across different hand morphologies.
Few-Shot Demonstration-Driven Task Coordination and Trajectory Execution for Multi-Robot Systems
In this paper, we propose a novel few-shot learning framework for multi-robot systems that integrates both spatial and temporal elements: Few-Shot Demonstration-Driven Task Coordination and Trajectory Execution (DDACE). Our approach leverages temporal graph networks for learning task-agnostic temporal sequencing and Gaussian Processes for spatial trajectory modeling, ensuring modularity and generalization across various tasks. By decoupling temporal and spatial aspects, DDACE requires only a small number of demonstrations, significantly reducing data requirements compared to traditional learning from demonstration approaches. To validate our proposed framework, we conducted extensive experiments in task environments designed to assess various aspects of multi-robot coordination, such as multi-sequence execution, multi-action dynamics, complex trajectory generation, and heterogeneous configurations. The experimental results demonstrate that our approach successfully achieves task execution under few-shot learning conditions and generalizes effectively across dynamic and diverse settings. This work underscores the potential of modular architectures in enhancing the practicality and scalability of multi-robot systems in real-world applications. Additional materials are available at https://sites.google.com/view/ddace.
HEADER: Hierarchical Robot Exploration via Attention-Based Deep Reinforcement Learning with Expert-Guided Reward
This work pushes the boundaries of learning-based methods in autonomous robot exploration in terms of environmental scale and exploration efficiency. We present HEADER, an attention-based reinforcement learning approach with hierarchical graphs for efficient exploration in large-scale environments. HEADER follows existing conventional methods to construct hierarchical representations for the robot belief/map, but further designs a novel community-based algorithm to construct and update a global graph, which remains fully incremental, shape-adaptive, and operates with linear complexity. Building upon attention-based networks, our planner finely reasons about the nearby belief within the local range while coarsely leveraging distant information at the global scale, enabling next-best-viewpoint decisions that consider multi-scale spatial dependencies. Beyond the novel map representation, we introduce a parameter-free privileged reward that significantly improves model performance and produces near-optimal exploration behaviors by avoiding the training objective bias caused by handcrafted reward shaping. In challenging, large-scale simulated exploration scenarios, HEADER demonstrates better scalability than most existing learning and non-learning methods, while achieving a significant improvement in exploration efficiency (up to 20%) over state-of-the-art baselines. We also deploy HEADER on hardware and validate it in complex, large-scale real-life scenarios, including a 300 m × 230 m campus environment.
Freehand 3D Ultrasound Imaging: Sim-in-the-Loop Probe Pose Optimization via Visual Servoing
Freehand 3D ultrasound (US) imaging using conventional 2D probes offers flexibility and accessibility for diverse clinical applications but faces challenges in accurate probe pose estimation. Traditional methods depend on costly tracking systems, while neural network-based methods struggle with image noise and error accumulation, compromising reconstruction precision. We propose a cost-effective and versatile solution that leverages lightweight cameras and visual servoing in simulated environments for precise 3D US imaging. These cameras capture visual feedback from a textured planar workspace. To counter occlusions and lighting issues, we introduce an image restoration method that reconstructs occluded regions by matching surrounding texture patterns. For pose estimation, we develop a simulation-in-the-loop approach, which replicates the system setup in simulation and iteratively minimizes pose errors between simulated and real-world observations. A visual servoing controller refines the alignment of camera views, improving translational estimation by optimizing image alignment. Validations on a soft vascular phantom, a 3D-printed conical model, and a human arm demonstrate the robustness and accuracy of our approach, with Hausdorff distances to the reference reconstructions of 0.359 mm, 1.171 mm, and 0.858 mm, respectively. These results confirm the method's potential for reliable freehand 3D US reconstruction.
Integration of a Variable Stiffness Link for Long-Reach Aerial Manipulation
This paper presents the integration of a Variable Stiffness Link (VSL) for long-reach aerial manipulation, enabling adaptable mechanical coupling between an aerial multirotor platform and a dual-arm manipulator. Conventional long-reach manipulation systems rely on rigid or cable connections, which limit precision or transmit disturbances to the aerial vehicle. The proposed VSL introduces an adjustable stiffness mechanism that allows the link to behave either as a flexible rope or as a rigid rod, depending on task requirements. The system is mounted on a quadrotor equipped with the LiCAS dual-arm manipulator and evaluated through teleoperated experiments, involving external disturbances and parcel transportation tasks. Results demonstrate that varying the link stiffness significantly modifies the dynamic interaction between the UAV and the payload. The flexible configuration attenuates external impacts and aerodynamic perturbations, while the rigid configuration improves positional accuracy during manipulation phases. These results confirm that VSL enhances versatility and safety, providing a controllable trade-off between compliance and precision. Future work will focus on autonomous stiffness regulation, multi-rope configurations, cooperative aerial manipulation and user studies to further assess its impact on teleoperated and semi-autonomous aerial tasks.
Educational SoftHand-A: Building an Anthropomorphic Hand with Soft Synergies using LEGO MINDSTORMS IROS 2025
This paper introduces an anthropomorphic robot hand built entirely using LEGO MINDSTORMS: the Educational SoftHand-A, a tendon-driven, highly-underactuated robot hand based on the Pisa/IIT SoftHand and related hands. To be suitable for an educational context, the design is constrained to use only standard LEGO pieces with tests using common equipment available at home. The hand features dual motors driving an agonist/antagonist opposing pair of tendons on each finger, which are shown to result in reactive fine control. The finger motions are synchronized through soft synergies, implemented with a differential mechanism using clutch gears. Altogether, this design results in an anthropomorphic hand that can adaptively grasp a broad range of objects using a simple actuation and control mechanism. Since the hand can be constructed from LEGO pieces and uses state-of-the-art design concepts for robotic hands, it has the potential to educate and inspire children to learn about the frontiers of modern robotics.
comment: 6 pages. Accepted at IROS 2025
Adaptive Legged Locomotion via Online Learning for Model Predictive Control
We provide an algorithm for adaptive legged locomotion via online learning and model predictive control. The algorithm is composed of two interacting modules: model predictive control (MPC) and online learning of residual dynamics. The residual dynamics can represent modeling errors and external disturbances. We are motivated by the future of autonomy where quadrupeds will autonomously perform complex tasks despite real-world uncertainty, such as unknown payloads and uneven terrain. The algorithm uses random Fourier features to approximate the residual dynamics in reproducing kernel Hilbert spaces. Then, it employs MPC based on the current learned model of the residual dynamics. The model is updated online in a self-supervised manner using least squares based on the data collected while controlling the quadruped. The algorithm enjoys sublinear dynamic regret, defined as the suboptimality against an optimal clairvoyant controller that knows the residual dynamics. We validate our algorithm in Gazebo and MuJoCo simulations, where the quadruped aims to track reference trajectories. The Gazebo simulations include constant unknown external forces up to $12\boldsymbol{g}$, where $\boldsymbol{g}$ is the gravity vector, in flat terrain, sloped terrain with $20^\circ$ inclination, and rough terrain with $0.25$ m height variation. The MuJoCo simulations include time-varying unknown disturbances with payload up to $8$ kg and time-varying ground friction coefficients in flat terrain.
comment: 9 pages
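The residual-learning step in the abstract above (random Fourier features fitted online by least squares) can be sketched in a few lines. This is a minimal toy under stated assumptions: a scalar input, an RBF-style feature map, and a recursive least-squares update, which is one standard way to do online least squares and may differ from the paper's update rule. All names (`make_rff`, `RecursiveLeastSquares`, `demo`) and the stand-in residual `sin(x)` are hypothetical, and the MPC module is omitted.

```python
import math
import random


def make_rff(num_features, lengthscale, rng):
    """Random Fourier feature map for a scalar input (approximates an RBF kernel)."""
    freqs = [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def phi(x):
        return [scale * math.cos(w * x + b) for w, b in zip(freqs, phases)]

    return phi


class RecursiveLeastSquares:
    """Online ridge regression: theta minimizes the regularized squared error so far."""

    def __init__(self, dim, reg=1e-3):
        self.theta = [0.0] * dim
        # P is the inverse of (reg * I + sum of z z^T), kept up to date incrementally.
        self.P = [[1.0 / reg if i == j else 0.0 for j in range(dim)]
                  for i in range(dim)]

    def predict(self, z):
        return sum(t * zi for t, zi in zip(self.theta, z))

    def update(self, z, y):
        Pz = [sum(p * zj for p, zj in zip(row, z)) for row in self.P]
        denom = 1.0 + sum(zi * pzi for zi, pzi in zip(z, Pz))
        gain = [pzi / denom for pzi in Pz]
        err = y - self.predict(z)
        self.theta = [t + g * err for t, g in zip(self.theta, gain)]
        # Sherman-Morrison rank-one downdate of P (P is symmetric).
        self.P = [[p - g * pzj for p, pzj in zip(row, Pz)]
                  for row, g in zip(self.P, gain)]


def demo(num_steps=300, num_features=40, seed=0):
    """Learn a stand-in residual sin(x) online; return mean absolute error on a grid."""
    rng = random.Random(seed)
    phi = make_rff(num_features, 1.0, rng)
    model = RecursiveLeastSquares(num_features)
    for _ in range(num_steps):
        x = rng.uniform(-2.0, 2.0)          # state visited while "controlling"
        model.update(phi(x), math.sin(x))   # self-supervised residual label
    grid = [-2.0 + 0.1 * k for k in range(41)]
    return sum(abs(model.predict(phi(x)) - math.sin(x)) for x in grid) / len(grid)


if __name__ == "__main__":
    print("mean absolute error:", demo())
```

In a full pipeline, the learned model's prediction would be added to the nominal dynamics inside the controller at every step; here only the learning module is shown.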
Improved Extended Kalman Filter-Based Disturbance Observers for Exoskeletons
The nominal performance of mechanical systems is often degraded by unknown disturbances. A two-degree-of-freedom control structure can decouple nominal performance from disturbance rejection. However, perfect disturbance rejection is unattainable when the disturbance dynamics are unknown. In this work, we reveal an inherent trade-off in disturbance estimation subject to tracking speed and tracking uncertainty. Then, we propose two novel methods to enhance disturbance estimation: an interacting multiple model extended Kalman filter-based disturbance observer and a multi-kernel correntropy extended Kalman filter-based disturbance observer. Experiments on an exoskeleton verify that the two proposed methods improve the tracking accuracy by $36.3\%$ and $16.2\%$ in hip joint error, and by $46.3\%$ and $24.4\%$ in knee joint error, respectively, compared to the extended Kalman filter-based disturbance observer, in a time-varying interaction force scenario, demonstrating the superiority of the proposed methods.
VO-DP: Semantic-Geometric Adaptive Diffusion Policy for Vision-Only Robotic Manipulation
In the context of imitation learning, visuomotor-based diffusion policy learning is one of the main directions in robotic manipulation. Most of these approaches rely on point clouds as observation inputs and construct scene representations through point cloud feature learning, which enables them to achieve remarkable accuracy. However, the existing literature lacks an in-depth exploration of vision-only solutions that have significant potential. In this paper, we propose a Vision-Only and single-view Diffusion Policy learning method (VO-DP) that leverages pretrained visual foundation models to achieve effective fusion of semantic and geometric features. We utilize intermediate features from VGGT, incorporating semantic features from DINOv2 and geometric features from Alternating Attention blocks. Features are fused via cross-attention and spatially compressed with a CNN to form the input to the policy head. Extensive experiments demonstrate that VO-DP not only outperforms the vision-only baseline DP significantly but also exhibits distinct performance trends against the point cloud-based method DP3: in simulation tasks, VO-DP achieves an average success rate of 64.6%, on par with DP3 (64.0%) and far higher than DP (34.8%), while in real-world tasks, it reaches 87.9%, outperforming both DP3 (67.5%) and DP (11.2%) by a notable margin. Further robustness evaluations confirm that VO-DP remains highly stable under varying conditions including color, size, background, and lighting. Lastly, we open-source a training library for robotic manipulation. Built on Accelerate, this library supports multi-machine and multi-GPU parallel training, as well as mixed precision training. It is compatible with visuomotor policies such as DP, DP3 and VO-DP, and also supports the RoboTwin simulator.
Exploring Conditions for Diffusion models in Robotic Control
While pre-trained visual representations have significantly advanced imitation learning, they are often task-agnostic as they remain frozen during policy learning. In this work, we explore leveraging pre-trained text-to-image diffusion models to obtain task-adaptive visual representations for robotic control, without fine-tuning the model itself. However, we find that naively applying textual conditions - a successful strategy in other vision domains - yields minimal or even negative gains in control tasks. We attribute this to the domain gap between the diffusion model's training data and robotic control environments, leading us to argue for conditions that consider the specific, dynamic visual information required for control. To this end, we propose ORCA, which introduces learnable task prompts that adapt to the control environment and visual prompts that capture fine-grained, frame-specific details. Through facilitating task-adaptive representations with our newly devised conditions, our approach achieves state-of-the-art performance on various robotic control benchmarks, significantly surpassing prior methods.
comment: Project page: https://orca-rc.github.io/
Perfect Prediction or Plenty of Proposals? What Matters Most in Planning for Autonomous Driving
Traditionally, prediction and planning in autonomous driving (AD) have been treated as separate, sequential modules. Recently, there has been a growing shift towards tighter integration of these components, known as Integrated Prediction and Planning (IPP), with the aim of enabling more informed and adaptive decision-making. However, it remains unclear to what extent this integration actually improves planning performance. In this work, we investigate the role of prediction in IPP approaches, drawing on the widely adopted Val14 benchmark, which encompasses more common driving scenarios with relatively low interaction complexity, and the interPlan benchmark, which includes highly interactive and out-of-distribution driving situations. Our analysis reveals that even access to perfect future predictions does not lead to better planning outcomes, indicating that current IPP methods often fail to fully exploit future behavior information. Instead, we focus on high-quality proposal generation, while using predictions primarily for collision checks. We find that many imitation learning-based planners struggle to generate realistic and plausible proposals, performing worse than PDM - a simple lane-following approach. Motivated by this observation, we build on PDM with an enhanced proposal generation method, shifting the emphasis towards producing diverse but realistic and high-quality proposals. This proposal-centric approach significantly outperforms existing methods, especially in out-of-distribution and highly interactive settings, where it sets new state-of-the-art results.
comment: 8 pages, 5 figures
VDRive: Leveraging Reinforced VLA and Diffusion Policy for End-to-end Autonomous Driving
In autonomous driving, dynamic environments and corner cases pose significant challenges to the robustness of the ego vehicle's state understanding and decision making. We introduce VDRive, a novel pipeline for end-to-end autonomous driving that explicitly models state-action mapping to address these challenges, enabling interpretable and robust decision making. By combining the state understanding of a Vision Language Action Model (VLA) with a generative diffusion policy-based action head, VDRive guides the driving contextually and geometrically. Contextually, the VLA predicts future observations through token generation pre-training, where the observations are represented as discrete codes by a Conditional Vector Quantized Variational Autoencoder (CVQ-VAE). Geometrically, we perform reinforcement learning fine-tuning of the VLA to predict future trajectories and actions based on current driving conditions. The VLA supplies the current state tokens and predicted state tokens for the action policy head to generate hierarchical actions and trajectories. During policy training, a learned critic evaluates the actions generated by the policy and provides gradient-based feedback, forming an actor-critic framework that enables a reinforcement-based policy learning pipeline. Experiments show that VDRive achieves state-of-the-art performance in the Bench2Drive closed-loop benchmark and nuScenes open-loop planning.
comment: 1st version
Towards Robust Zero-Shot Reinforcement Learning
The recent development of zero-shot reinforcement learning (RL) has opened a new avenue for learning pre-trained generalist policies that can adapt to arbitrary new tasks in a zero-shot manner. While the popular Forward-Backward representations (FB) and related methods have shown promise in zero-shot RL, we empirically found that their modeling lacks expressivity and that extrapolation errors caused by out-of-distribution (OOD) actions during offline learning sometimes lead to biased representations, ultimately resulting in suboptimal performance. To address these issues, we propose Behavior-REgularizEd Zero-shot RL with Expressivity enhancement (BREEZE), an upgraded FB-based framework that simultaneously enhances learning stability, policy extraction capability, and representation learning quality. BREEZE introduces behavioral regularization in zero-shot RL policy learning, transforming policy optimization into a stable in-sample learning paradigm. Additionally, BREEZE extracts the policy using a task-conditioned diffusion model, enabling the generation of high-quality and multimodal action distributions in zero-shot RL settings. Moreover, BREEZE employs expressive attention-based architectures for representation modeling to capture the complex relationships between environmental dynamics. Extensive experiments on ExORL and D4RL Kitchen demonstrate that BREEZE achieves the best or near-the-best performance while exhibiting superior robustness compared to prior offline zero-shot RL methods. The official implementation is available at: https://github.com/Whiterrrrr/BREEZE.
comment: NeurIPS 2025, 36 pages, 18 figures
Towards Automated Chicken Deboning via Learning-based Dynamically-Adaptive 6-DoF Multi-Material Cutting
Automating chicken shoulder deboning requires precise 6-DoF cutting through a partially occluded, deformable, multi-material joint, since contact with the bones presents serious health and safety risks. Our work makes both systems-level and algorithmic contributions to train and deploy a reactive force-feedback cutting policy that dynamically adapts a nominal trajectory and enables full 6-DoF knife control to traverse the narrow joint gap while avoiding contact with the bones. First, we introduce an open-source custom-built simulator for multi-material cutting that models coupling, fracture, and cutting forces, and supports reinforcement learning, enabling efficient training and rapid prototyping. Second, we design a reusable physical testbed to emulate the chicken shoulder: two rigid "bone" spheres with controllable pose embedded in a softer block, enabling rigorous and repeatable evaluation while preserving essential multi-material characteristics of the target problem. Third, we train and deploy a residual RL policy, with discretized force observations and domain randomization, enabling robust zero-shot sim-to-real transfer and the first demonstration of a learned policy that debones a real chicken shoulder. Our experiments in our simulator, on our physical testbed, and on real chicken shoulders show that our learned policy reliably navigates the joint gap and reduces undesired bone/cartilage contact, resulting in up to a 4x improvement over existing open-loop cutting baselines in terms of success rate and bone avoidance. Our results also illustrate the necessity of force feedback for safe and effective multi-material cutting. The project website is at https://sites.google.com/view/chickendeboning-2026.
comment: 8 pages, 8 figures
GaussGym: An open-source real-to-sim framework for learning locomotion from pixels
We present a novel approach for photorealistic robot simulation that integrates 3D Gaussian Splatting as a drop-in renderer within vectorized physics simulators such as IsaacGym. This enables unprecedented speed -- exceeding 100,000 steps per second on consumer GPUs -- while maintaining high visual fidelity, which we showcase across diverse tasks. We additionally demonstrate its applicability in a sim-to-real robotics setting. Beyond depth-based sensing, our results highlight how rich visual semantics improve navigation and decision-making, such as avoiding undesirable regions. We further showcase the ease of incorporating thousands of environments from iPhone scans, large-scale scene datasets (e.g., GrandTour, ARKit), and outputs from generative video models like Veo, enabling rapid creation of realistic training worlds. This work bridges high-throughput simulation and high-fidelity perception, advancing scalable and generalizable robot learning. All code and data will be open-sourced for the community to build upon. Videos, code, and data available at https://escontrela.me/gauss_gym/.
Nauplius Optimisation for Autonomous Hydrodynamics
Autonomous underwater vehicles must operate under strong currents, limited acoustic bandwidth, and persistent sensing requirements, conditions in which conventional swarm optimisation methods are unreliable. This paper presents NOAH, a novel nature-inspired swarm optimisation algorithm that combines current-aware drift, irreversible settlement in persistent sensing nodes, and colony-based communication. Drawing inspiration from the behaviour of barnacle nauplii, NOAH addresses the critical limitations of existing swarm algorithms by providing hydrodynamic awareness, irreversible anchoring mechanisms, and colony-based communication capabilities essential for underwater exploration missions. The algorithm establishes a comprehensive foundation for scalable and energy-efficient underwater swarm robotics with validated performance analysis. Validation studies demonstrate an 86% success rate for permanent anchoring scenarios, providing a unified formulation for hydrodynamic constraints and irreversible settlement behaviours with an empirical study under flow.
Adaptive Cost-Map-based Path Planning in Partially Unknown Environments with Movable Obstacles
Reliable navigation in disaster-response and other unstructured indoor settings requires robots not only to avoid obstacles but also to recognise when those obstacles can be pushed aside. We present an adaptive, LiDAR- and odometry-based path-planning framework that embeds this capability into the ROS2 Nav2 stack. A new Movable Obstacles Layer labels all LiDAR returns missing from a prior static map as tentatively movable and assigns a reduced traversal cost. A companion Slow-Pose Progress Checker monitors the ratio of commanded to actual velocity; when the robot slows appreciably, the local cost is raised from light to heavy, and on a stall to lethal, prompting the global planner to back out and re-route. Gazebo evaluations on a Scout Mini, spanning isolated objects and cluttered corridors, show higher goal-reach rates and fewer deadlocks than a no-layer baseline, with traversal times broadly comparable. Because the method relies only on planar scans and CPU-level computation, it suits resource-constrained search and rescue robots and integrates into heterogeneous platforms with minimal engineering. Overall, the results indicate that interaction-aware cost maps are a lightweight, ROS2-native extension for navigating among potentially movable obstacles in unstructured settings. The full implementation will be released as open source at https://costmap-namo.github.io.
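The cost-escalation rule described in the abstract above (light, then heavy on slowdown, then lethal on stall) can be illustrated with a small standalone function. This is a sketch only: it does not use the Nav2 API, and the function name `escalate_cost`, the `Cost` enum, and both thresholds (`slow_ratio`, `stall_speed`) are hypothetical values chosen for illustration, not taken from the paper.

```python
from enum import Enum


class Cost(Enum):
    LIGHT = 1    # tentatively movable obstacle, reduced traversal cost
    HEAVY = 2    # robot slowed appreciably while pushing
    LETHAL = 3   # robot stalled; the planner should back out and re-route


def escalate_cost(commanded_speed, actual_speed,
                  slow_ratio=2.0, stall_speed=0.02):
    """Map commanded vs. actual speed (m/s) to a cost level.

    slow_ratio and stall_speed are assumed thresholds, not the paper's values.
    """
    if commanded_speed <= 0.0:
        return Cost.LIGHT            # nothing commanded, so no evidence of resistance
    if actual_speed <= stall_speed:
        return Cost.LETHAL           # commanded motion but essentially no progress
    if commanded_speed / actual_speed >= slow_ratio:
        return Cost.HEAVY            # moving, but much slower than commanded
    return Cost.LIGHT


if __name__ == "__main__":
    for commanded, actual in [(0.5, 0.45), (0.5, 0.2), (0.5, 0.01)]:
        print(commanded, actual, escalate_cost(commanded, actual).name)
```

Checking the stall condition before the ratio avoids dividing by a near-zero speed and makes the lethal case take priority over the heavy one.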
ASBI: Leveraging Informative Real-World Data for Active Black-Box Simulator Tuning
Black-box simulators are widely used in robotics, but optimizing their parameters remains challenging due to inaccessible likelihoods. Simulation-Based Inference (SBI) tackles this issue using simulation-driven approaches, estimating the posterior from offline real observations and forward simulations. However, in black-box scenarios, preparing observations that contain sufficient information for parameter estimation is difficult due to the unknown relationship between parameters and observations. In this work, we present Active Simulation-Based Inference (ASBI), a parameter estimation framework that uses robots to actively collect real-world online data to achieve accurate black-box simulator tuning. Our framework optimizes robot actions to collect informative observations by maximizing information gain, which is defined as the expected reduction in Shannon entropy from the prior to the posterior. While calculating information gain requires the likelihood, which is inaccessible in black-box simulators, our method solves this problem with Neural Posterior Estimation (NPE), which trains a neural network as the posterior estimator. Three simulation experiments quantitatively verify that our method achieves accurate parameter estimation, with posteriors sharply concentrated around the true parameters. Moreover, we show a practical application using a real robot to estimate the simulation parameters of cubic particles corresponding to two real objects, beads and gravel, with a bucket pouring action.
Traversability-aware Consistent Situational Graphs for Indoor Localization and Mapping
Scene graphs enhance 3D mapping capabilities in robotics by understanding the relationships between different spatial elements, such as rooms and objects. Recent research extends scene graphs to hierarchical layers, adding and leveraging constraints across these levels. This approach is tightly integrated with pose-graph optimization, improving both localization and mapping accuracy simultaneously. However, when segmenting spatial characteristics, consistently recognizing rooms becomes challenging due to variations in viewpoints and the limited field of view (FOV) of sensors. For example, existing real-time approaches often over-segment large rooms into smaller, non-functional spaces that are not useful for localization and mapping due to the time-dependent method. Conversely, their voxel-based room segmentation methods often under-segment in complex cases, such as 3D spaces that are not fully enclosed yet are non-traversable for ground robots or humans, leading to false constraints in pose-graph optimization. We propose a traversability-aware room segmentation method that considers the interaction between robots and surroundings, with consistent feasibility of traversability information. This enhances both the semantic coherence and computational efficiency of pose-graph optimization. Improved performance is demonstrated through the re-detection frequency of the same rooms in a dataset involving repeated traversals of the same space along the same path, as well as the optimization time consumption.
comment: Accepted by RiTA 2024
CuSfM: CUDA-Accelerated Structure-from-Motion
Efficient and accurate camera pose estimation forms the foundational requirement for dense reconstruction in autonomous navigation, robotic perception, and virtual simulation systems. This paper addresses the challenge via cuSfM, a CUDA-accelerated offline Structure-from-Motion system that leverages GPU parallelization to efficiently employ computationally intensive yet highly accurate feature extractors, generating comprehensive and non-redundant data associations for precise camera pose estimation and globally consistent mapping. The system supports pose optimization, mapping, prior-map localization, and extrinsic refinement. It is designed for offline processing, where computational resources can be fully utilized to maximize accuracy. Experimental results demonstrate that cuSfM achieves significantly improved accuracy and processing speed compared to the widely used COLMAP method across various testing scenarios, while maintaining the high precision and global consistency essential for offline SfM applications. The system is released as an open-source Python wrapper implementation, PyCuSfM, available at https://github.com/nvidia-isaac/pyCuSFM, to facilitate research and applications in computer vision and robotics.
A Generalized Sylvester-Fermat-Torricelli problem with application in disaster relief operations by UAVs
Natural and human-made disasters can cause severe devastation and claim thousands of lives worldwide. Therefore, developing efficient methods for disaster response and management is a critical task for relief teams. One of the most essential components of effective response is the rapid collection of information about affected areas, damages, and victims. More data translates into better coordination, faster rescue operations, and ultimately, more lives saved. However, in some disasters, such as earthquakes, the communication infrastructure is often partially or completely destroyed, making it extremely difficult for victims to send distress signals and for rescue teams to locate and assist them in time. Unmanned Aerial Vehicles (UAVs) have emerged as valuable tools in such scenarios. In particular, a fleet of UAVs can be dispatched from a mobile station to the affected area to facilitate data collection and establish temporary communication networks. Nevertheless, real-world deployment of UAVs faces several challenges, with adverse weather conditions--especially wind--being among the most significant. To address this, we develop a novel mathematical framework to determine the optimal location of a mobile UAV station while explicitly accounting for the heterogeneity of the UAVs and the effect of wind. In particular, we generalize the Sylvester problem to introduce the Sylvester-Fermat-Torricelli (SFT) problem, which captures complex factors such as wind influence, UAV heterogeneity, and back-and-forth motion within a unified framework. The proposed framework enhances the practicality of UAV-based disaster response planning by accounting for real-world factors such as wind and UAV heterogeneity. Experimental results demonstrate that it can reduce wasted operational time by up to 84%, making post-disaster missions significantly more efficient and effective.
PolyFly: Polytopic Optimal Planning for Collision-Free Cable-Suspended Aerial Payload Transportation
Aerial transportation robots using suspended cables have emerged as versatile platforms for disaster response and rescue operations. To maximize the capabilities of these systems, robots need to aggressively fly through tightly constrained environments, such as dense forests and structurally unsafe buildings, while minimizing flight time and avoiding obstacles. Existing methods geometrically over-approximate the vehicle and obstacles, leading to conservative maneuvers and increased flight times. We eliminate these restrictions by proposing PolyFly, an optimal global planner which considers a non-conservative representation for aerial transportation by modeling each physical component of the environment, and the robot (quadrotor, cable and payload), as independent polytopes. We further increase the model accuracy by incorporating the attitude of the physical components by constructing orientation-aware polytopes. The resulting optimal control problem is efficiently solved by converting the polytope constraints into smooth differentiable constraints via duality theory. We compare our method against the existing state-of-the-art approach in eight maze-like environments and show that PolyFly produces faster trajectories in each scenario. We also experimentally validate our proposed approach on a real quadrotor with a suspended payload, demonstrating the practical reliability and accuracy of our method.
LVI-Q: Robust LiDAR-Visual-Inertial-Kinematic Odometry for Quadruped Robots Using Tightly-Coupled and Efficient Alternating Optimization
Autonomous navigation for legged robots in complex and dynamic environments relies on robust simultaneous localization and mapping (SLAM) systems to accurately map surroundings and localize the robot, ensuring safe and efficient operation. While prior sensor fusion-based SLAM approaches have integrated various sensor modalities to improve their robustness, these algorithms are still susceptible to estimation drift in challenging environments due to their reliance on unsuitable fusion strategies. Therefore, we propose a robust LiDAR-visual-inertial-kinematic odometry system that integrates information from multiple sensors, such as a camera, LiDAR, inertial measurement unit (IMU), and joint encoders, for visual and LiDAR-based odometry estimation. Our system employs a fusion-based pose estimation approach that runs optimization-based visual-inertial-kinematic odometry (VIKO) and filter-based LiDAR-inertial-kinematic odometry (LIKO) based on measurement availability. In VIKO, we utilize the foot-preintegration technique and robust LiDAR-visual depth consistency using superpixel clusters in a sliding window optimization. In LIKO, we incorporate foot kinematics and employ a point-to-plane residual in an error-state iterative Kalman filter (ESIKF). Compared with other sensor fusion-based SLAM algorithms, our approach shows robust performance across public and long-term datasets.
comment: 8 Pages, 9 Figures
NEBULA: Do We Evaluate Vision-Language-Action Agents Correctly?
The evaluation of Vision-Language-Action (VLA) agents is hindered by the coarse, end-task success metric that fails to provide precise skill diagnosis or measure robustness to real-world perturbations. This challenge is exacerbated by a fragmented data landscape that impedes reproducible research and the development of generalist models. To address these limitations, we introduce NEBULA, a unified ecosystem for single-arm manipulation that enables diagnostic and reproducible evaluation. NEBULA features a novel dual-axis evaluation protocol that combines fine-grained capability tests for precise skill diagnosis with systematic stress tests that measure robustness. A standardized API and a large-scale, aggregated dataset are provided to reduce fragmentation and support cross-dataset training and fair comparison. Using NEBULA, we demonstrate that top-performing VLAs struggle with key capabilities such as spatial reasoning and dynamic adaptation, which are consistently obscured by conventional end-task success metrics. By measuring both what an agent can do and when it does so reliably, NEBULA provides a practical foundation for robust, general-purpose embodied agents.
comment: Homepage: https://vulab-ai.github.io/NEBULA-Alpha/
Cosmos-Surg-dVRK: World Foundation Model-based Automated Online Evaluation of Surgical Robot Policy Learning
The rise of surgical robots and vision-language-action models has accelerated the development of autonomous surgical policies and efficient assessment strategies. However, evaluating these policies directly on physical robotic platforms such as the da Vinci Research Kit (dVRK) remains hindered by high costs, time demands, reproducibility challenges, and variability in execution. World foundation models (WFM) for physical AI offer a transformative approach to simulate complex real-world surgical tasks, such as soft tissue deformation, with high fidelity. This work introduces Cosmos-Surg-dVRK, a surgical finetune of the Cosmos WFM, which, together with a trained video classifier, enables fully automated online evaluation and benchmarking of surgical policies. We evaluate Cosmos-Surg-dVRK using two distinct surgical datasets. On tabletop suture pad tasks, the automated pipeline achieves strong correlation between online rollouts in Cosmos-Surg-dVRK and policy outcomes on the real dVRK Si platform, as well as good agreement between human labelers and the V-JEPA 2-derived video classifier. Additionally, preliminary experiments with ex-vivo porcine cholecystectomy tasks in Cosmos-Surg-dVRK demonstrate promising alignment with real-world evaluations, highlighting the platform's potential for more complex surgical procedures.
DeGrip: A Compact Cable-driven Robotic Gripper for Desktop Disassembly
Intelligent robotic disassembly of end-of-life (EOL) products has been a long-standing challenge in robotics. While machine learning techniques have shown promise, the lack of specialized hardware limits their application in real-world scenarios. We introduce DeGrip, a customized gripper designed for the disassembly of EOL computer desktops. DeGrip provides three degrees of freedom (DOF), enabling arbitrary configurations within the disassembly environment when mounted on a robotic manipulator. It employs a cable-driven transmission mechanism that reduces its overall size and enables operation in confined spaces. The wrist is designed to decouple the actuation of wrist and jaw joints. We also developed an EOL desktop disassembly environment in Isaac Sim to evaluate the effectiveness of DeGrip. The tasks were designed to demonstrate its ability to operate in confined spaces and disassemble components in arbitrary configurations. The evaluation results confirm the capability of DeGrip for EOL desktop disassembly.
VAR-SLAM: Visual Adaptive and Robust SLAM for Dynamic Environments
Visual SLAM in dynamic environments remains challenging, as several existing methods rely on semantic filtering that only handles known object classes, or use fixed robust kernels that cannot adapt to unknown moving objects, leading to degraded accuracy when they appear in the scene. We present VAR-SLAM (Visual Adaptive and Robust SLAM), an ORB-SLAM3-based system that combines a lightweight semantic keypoint filter to deal with known moving objects, with Barron's adaptive robust loss to handle unknown ones. The shape parameter of the robust kernel is estimated online from residuals, allowing the system to automatically adjust between Gaussian and heavy-tailed behavior. We evaluate VAR-SLAM on the TUM RGB-D, Bonn RGB-D Dynamic, and OpenLORIS datasets, which include both known and unknown moving objects. Results show improved trajectory accuracy and robustness over state-of-the-art baselines, achieving up to 25% lower ATE RMSE than NGD-SLAM on challenging sequences, while maintaining performance at 27 FPS on average.
comment: Code available at https://github.com/iit-DLSLab/VAR-SLAM
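For readers unfamiliar with the robust kernel named in the VAR-SLAM abstract, the following is a minimal sketch of Barron's general robust loss as it is commonly written, with shape parameter alpha and scale c. It is not taken from the VAR-SLAM code. How the system estimates alpha online from residuals is not described in the abstract and is not shown here.

```python
import math


def barron_loss(x, alpha, c):
    """Barron's general robust loss for a scalar residual x.

    alpha controls the shape: alpha = 2 gives a quadratic (Gaussian) loss,
    smaller alpha gives progressively heavier-tailed behaviour.
    c > 0 is the scale of the quadratic bowl around x = 0.
    """
    z = (x / c) ** 2
    # The general formula is singular at these three values, so they are
    # handled as the known limiting cases.
    if alpha == 2.0:
        return 0.5 * z                    # L2 loss
    if alpha == 0.0:
        return math.log(0.5 * z + 1.0)    # Cauchy / Lorentzian loss
    if alpha == -math.inf:
        return 1.0 - math.exp(-0.5 * z)   # Welsch loss
    b = abs(alpha - 2.0)
    return (b / alpha) * ((z / b + 1.0) ** (alpha / 2.0) - 1.0)
```

With alpha = 1 the expression reduces to sqrt((x/c)^2 + 1) - 1, a smooth L1-like loss, and with alpha = -2 it reduces to the Geman-McClure loss. This illustrates how a single parameter moves between Gaussian and heavy-tailed regimes.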
Zero-Shot Coordination in Ad Hoc Teams with Generalized Policy Improvement and Difference Rewards
Real-world multi-agent systems may require ad hoc teaming, where an agent must coordinate with other previously unseen teammates to solve a task in a zero-shot manner. Prior work often either selects a pretrained policy based on an inferred model of the new teammates or pretrains a single policy that is robust to potential teammates. Instead, we propose to leverage all pretrained policies in a zero-shot transfer setting. We formalize this problem as an ad hoc multi-agent Markov decision process and present a solution that uses two key ideas, generalized policy improvement and difference rewards, for efficient and effective knowledge transfer between different teams. We empirically demonstrate that our algorithm, Generalized Policy improvement for Ad hoc Teaming (GPAT), successfully enables zero-shot transfer to new teams in three simulated environments: cooperative foraging, predator-prey, and Overcooked. We also demonstrate our algorithm in a real-world multi-robot setting.
comment: 10 pages, 8 figures
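As background for the GPAT abstract, generalized policy improvement in its standard form selects, in each state, the action whose best value across all pretrained policies is highest. The sketch below shows only that selection rule over a hypothetical table of Q-values. The function name and data layout are assumptions, and the way GPAT combines this rule with difference rewards is not described in the abstract and is not reproduced here.

```python
def gpi_action(q_values):
    """Pick an action by generalized policy improvement.

    q_values[i][a] is the estimated value of action a in the current state
    under pretrained policy i (hypothetical layout, for illustration only).
    Returns the index of the action maximising max_i q_values[i][a].
    """
    n_actions = len(q_values[0])
    # For each action, keep the most optimistic estimate over all policies.
    best_per_action = [max(q[a] for q in q_values) for a in range(n_actions)]
    # Act greedily with respect to that upper envelope.
    return max(range(n_actions), key=best_per_action.__getitem__)
```

For example, with two pretrained policies whose Q-values in the current state are [1.0, 0.2, 0.0] and [0.1, 0.3, 2.0], the rule selects action 2, because the second policy values it most highly.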
Aria Gen 2 Pilot Dataset
The Aria Gen 2 Pilot Dataset (A2PD) is an egocentric multimodal open dataset captured using the state-of-the-art Aria Gen 2 glasses. To facilitate timely access, A2PD is released incrementally with ongoing dataset enhancements. The initial release features Dia'ane, our primary subject, who records her daily activities alongside friends, each equipped with Aria Gen 2 glasses. It encompasses five primary scenarios: cleaning, cooking, eating, playing, and outdoor walking. In each of the scenarios, we provide comprehensive raw sensor data and output data from various machine perception algorithms. These data illustrate the device's ability to perceive the wearer, the surrounding environment, and interactions between the wearer and the environment, while maintaining robust performance across diverse users and conditions. The A2PD is publicly available at projectaria.com, with open-source tools and usage examples provided in Project Aria Tools.
CLOVER: Context-aware Long-term Object Viewpoint- and Environment- Invariant Representation Learning
Mobile service robots can benefit from object-level understanding of their environments, including the ability to distinguish object instances and re-identify previously seen instances. Object re-identification is challenging across different viewpoints and in scenes with significant appearance variation arising from weather or lighting changes. Existing works on object re-identification either focus on specific classes or require foreground segmentation. Further, these methods, along with object re-identification datasets, have limited consideration of challenges such as outdoor scenes and illumination changes. To address this problem, we introduce CODa Re-ID: an in-the-wild object re-identification dataset containing 1,037,814 observations of 557 objects across 8 classes under diverse lighting conditions and viewpoints. Further, we propose CLOVER, a representation learning method for object observations that can distinguish between static object instances without requiring foreground segmentation. We also introduce MapCLOVER, a method for scalably summarizing CLOVER descriptors for use in object maps and matching new observations to summarized descriptors. Our results show that CLOVER achieves superior performance in static object re-identification under varying lighting conditions and viewpoint changes and can generalize to unseen instances and classes.
comment: 8 pages, 3 figures, 8 tables
From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance
Natural language offers a natural interface for humanoid robots, but existing language-guided humanoid locomotion pipelines remain cumbersome and untrustworthy. They typically decode human motion, retarget it to robot morphology, and then track it with a physics-based controller. However, this multi-stage process is prone to cumulative errors, introduces high latency, and yields weak coupling between semantics and control. These limitations call for a more direct pathway from language to action, one that eliminates fragile intermediate stages. Therefore, we present RoboGhost, a retargeting-free framework that directly conditions humanoid policies on language-grounded motion latents. By bypassing explicit motion decoding and retargeting, RoboGhost enables a diffusion-based policy to denoise executable actions directly from noise, preserving semantic intent and supporting fast, reactive control. A hybrid causal transformer-diffusion motion generator further ensures long-horizon consistency while maintaining stability and diversity, yielding rich latent representations for precise humanoid behavior. Extensive experiments demonstrate that RoboGhost substantially reduces deployment latency, improves success rates and tracking precision, and produces smooth, semantically aligned locomotion on real humanoids. Beyond text, the framework naturally extends to other modalities such as images, audio, and music, providing a universal foundation for vision-language-action humanoid systems.
Onboard Mission Replanning for Adaptive Cooperative Multi-Robot Systems
Cooperative autonomous robotic systems have significant potential for executing complex multi-task missions across space, air, ground, and maritime domains. But they commonly operate in remote, dynamic and hazardous environments, requiring rapid in-mission adaptation without reliance on fragile or slow communication links to centralised compute. Fast, on-board replanning algorithms are therefore needed to enhance resilience. Reinforcement Learning shows strong promise for efficiently solving mission planning tasks when formulated as Travelling Salesperson Problems (TSPs), but existing methods: 1) are unsuitable for replanning, where agents do not start at a single location; 2) do not allow cooperation between agents; 3) are unable to model tasks with variable durations; or 4) lack practical considerations for on-board deployment. Here we define the Cooperative Mission Replanning Problem as a novel variant of multiple TSP with adaptations to overcome these issues, and develop a new encoder/decoder-based model using Graph Attention Networks and Attention Models to solve it effectively and efficiently. Using a simple example of cooperative drones, we show our replanner consistently (90% of the time) maintains performance within 10% of the state-of-the-art LKH3 heuristic solver, whilst running 85-370 times faster on a Raspberry Pi. This work paves the way for increased resilience in autonomous multi-agent systems.
comment: 9 pages, 5 figures, 1 table
Development and Adaptation of Robotic Vision in the Real-World: the Challenge of Door Detection
Mobile service robots are increasingly prevalent in human-centric, real-world domains, operating autonomously in unconstrained indoor environments. In such a context, robotic vision plays a central role in enabling service robots to perceive high-level environmental features from visual observations. Although data-driven approaches based on deep learning push the boundaries of vision systems, applying these techniques to real-world robotic scenarios presents unique methodological challenges. Traditional models fail to represent the challenging perception constraints typical of service robots and must be adapted for the specific environment where robots finally operate. We propose a method leveraging photorealistic simulations that balances data quality and acquisition costs for synthesizing visual datasets from the robot perspective used to train deep architectures. Then, we show the benefits of qualifying a general detector for the target domain in which the robot is deployed, and also show the trade-off between the effort of obtaining new examples from such a setting and the performance gain. In our extensive experimental campaign, we focus on the door detection task (namely recognizing the presence and the traversability of doorways) that, in dynamic settings, is useful to infer the topology of the map. Our findings are validated in a real-world robot deployment, comparing prominent deep-learning models and demonstrating the effectiveness of our approach in practical settings.
Contact-Aware Safety in Soft Robots Using High-Order Control Barrier and Lyapunov Functions
Robots operating alongside people, particularly in sensitive scenarios such as aiding the elderly with daily tasks or collaborating with workers in manufacturing, must guarantee safety and cultivate user trust. Continuum soft manipulators promise safety through material compliance, but as designs evolve for greater precision, payload capacity, and speed, and increasingly incorporate rigid elements, their injury risk resurfaces. In this letter, we introduce a comprehensive High-Order Control Barrier Function (HOCBF) + High-Order Control Lyapunov Function (HOCLF) framework that enforces strict contact force limits across the entire soft-robot body during environmental interactions. Our approach combines a differentiable Piecewise Cosserat-Segment (PCS) dynamics model with a convex-polygon distance approximation metric, named Differentiable Conservative Separating Axis Theorem (DCSAT), based on the soft robot geometry to enable real-time, whole-body collision detection, resolution, and enforcement of the safety constraints. By embedding HOCBFs into our optimization routine, we guarantee safety, allowing, for instance, safe navigation in operational space under HOCLF-driven motion objectives. Extensive planar simulations demonstrate that our method maintains safety-bounded contacts while achieving precise shape and task-space regulation. This work thus lays a foundation for the deployment of soft robots in human-centric environments with provable safety and performance.
comment: 8 pages
Pseudo-Kinematic Trajectory Control and Planning of Tracked Vehicles
Tracked vehicles distribute their weight continuously over a large surface area (the tracks). This distinctive feature makes them the preferred choice for vehicles required to traverse soft and uneven terrain. From a robotics perspective, however, this flexibility comes at a cost: the complexity of modelling the system and the resulting difficulty in designing theoretically sound navigation solutions. In this paper, we aim to bridge this gap by proposing a framework for the navigation of tracked vehicles, built upon three key pillars. The first pillar comprises two models: a simulation model and a control-oriented model. The simulation model captures the intricate terramechanics dynamics arising from soil-track interaction and is employed to develop faithful digital twins of the system across a wide range of operating conditions. The control-oriented model is pseudo-kinematic and mathematically tractable, enabling the design of efficient and theoretically robust control schemes. The second pillar is a Lyapunov-based feedback trajectory controller that provides certifiable tracking guarantees. The third pillar is a portfolio of motion planning solutions, each offering different complexity-accuracy trade-offs. The various components of the proposed approach are validated through an extensive set of simulation and experimental data.
Real-time Recognition of Human Interactions from a Single RGB-D Camera for Socially-Aware Robot Navigation
Recognizing human interactions is essential for social robots as it enables them to navigate safely and naturally in shared environments. Conventional robotic systems, however, often focus on obstacle avoidance, neglecting social cues necessary for seamless human-robot interaction. To address this gap, we propose a framework to recognize human group interactions for socially aware navigation. Our method utilizes color and depth frames from a monocular RGB-D camera to estimate 3D human keypoints and positions. Principal component analysis (PCA) is then used to determine dominant interaction directions. The shoelace formula is finally applied to compute interest points and engagement areas. Extensive experiments have been conducted to evaluate the validity of the proposed method. The results show that our method is capable of recognizing group interactions across different scenarios with varying numbers of individuals. It also achieves high-speed performance, processing each frame in approximately 4 ms on a single-board computer used in robotic systems. The method is implemented as a ROS 2 package, making it simple to integrate into existing navigation systems. Source code is available at https://github.com/thanhlong103/social-interaction-detector
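The two geometric tools named in this abstract can be illustrated in a few lines. The sketch below is not the authors' ROS 2 package: it assumes points already projected onto a 2D ground plane, and both function names are hypothetical. It computes the first principal axis of a point set in closed form (equivalent to PCA in two dimensions) and the area of a polygon with the shoelace formula.

```python
import math


def dominant_direction(points):
    """Unit vector along the first principal axis of 2D points.

    Uses the closed-form orientation of a 2x2 covariance matrix,
    theta = 0.5 * atan2(2 * s_xy, s_xx - s_yy), which matches 2D PCA.
    The sign of the returned vector is arbitrary, as with any principal axis.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    s_xx = sum((p[0] - mean_x) ** 2 for p in points)
    s_yy = sum((p[1] - mean_y) ** 2 for p in points)
    s_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    theta = 0.5 * math.atan2(2.0 * s_xy, s_xx - s_yy)
    return (math.cos(theta), math.sin(theta))


def shoelace_area(vertices):
    """Area of a simple polygon whose vertices are given in order."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x0 * y1 - x1 * y0
    return 0.5 * abs(total)
```

For a group of people standing at the corners of a 2 m by 2 m square, shoelace_area returns 4.0 square metres, which could serve as a simple measure of the engagement area they enclose.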
Scaling Multi Agent Reinforcement Learning for Underwater Acoustic Tracking via Autonomous Vehicles
Autonomous vehicles (AV) offer a cost-effective solution for scientific missions such as underwater tracking. Recently, reinforcement learning (RL) has emerged as a powerful method for controlling AVs in complex marine environments. However, scaling these techniques to a fleet--essential for multi-target tracking or targets with rapid, unpredictable motion--presents significant computational challenges. Multi-Agent Reinforcement Learning (MARL) is notoriously sample-inefficient, and while high-fidelity simulators like Gazebo's LRAUV provide 100x faster-than-real-time single-robot simulations, they offer no significant speedup for multi-vehicle scenarios, making MARL training impractical. To address these limitations, we propose an iterative distillation method that transfers high-fidelity simulations into a simplified, GPU-accelerated environment while preserving high-level dynamics. This approach achieves up to a 30,000x speedup over Gazebo through parallelization, enabling efficient training via end-to-end GPU acceleration. Additionally, we introduce a novel Transformer-based architecture (TransfMAPPO) that learns multi-agent policies invariant to the number of agents and targets, significantly improving sample efficiency. Following large-scale curriculum learning conducted entirely on GPU, we perform extensive evaluations in Gazebo, demonstrating that our method maintains tracking errors below 5 meters over extended durations, even in the presence of multiple fast-moving targets. This work bridges the gap between large-scale MARL training and high-fidelity deployment, providing a scalable framework for autonomous fleet control in real-world sea missions.
CLASP: General-Purpose Clothes Manipulation with Semantic Keypoints
Clothes manipulation, such as folding or hanging, is a critical capability for home service robots. Despite recent advances, most existing methods remain limited to specific clothes types and tasks, due to the complex, high-dimensional geometry of clothes. This paper presents CLothes mAnipulation with Semantic keyPoints (CLASP), which aims at general-purpose clothes manipulation over diverse clothes types (T-shirts, shorts, skirts, long dresses, ...) as well as different tasks (folding, flattening, hanging, ...). The core idea of CLASP is semantic keypoints, e.g., "left sleeve" and "right shoulder", a sparse spatial-semantic representation that is salient for both perception and action. Semantic keypoints of clothes can be reliably extracted from RGB-D images and provide an effective representation for a wide range of clothes manipulation policies. CLASP uses semantic keypoints as an intermediate representation to connect high-level task planning and low-level action execution. At the high level, it exploits vision language models (VLMs) to predict task plans over the semantic keypoints. At the low level, it executes the plans with the help of a set of pre-built manipulation skills conditioned on the keypoints. Extensive simulation experiments show that CLASP outperforms state-of-the-art baseline methods on multiple tasks across diverse clothes types, demonstrating strong performance and generalization. Further experiments with a Franka dual-arm system on four distinct tasks (folding, flattening, hanging, and placing) confirm CLASP's performance on real-life clothes manipulation.
U-ARM: Ultra low-cost general teleoperation interface for robot manipulation
We propose U-Arm, a low-cost and rapidly adaptable leader-follower teleoperation framework designed to interface with most commercially available robotic arms. Our system supports teleoperation through three structurally distinct 3D-printed leader arms that share consistent control logic, enabling seamless compatibility with diverse commercial robot configurations. Compared with previous open-source leader-follower interfaces, we further optimized both the mechanical design and servo selection, achieving a bill of materials (BOM) cost of only $50.5 for the 6-DoF leader arm and $56.8 for the 7-DoF version. To enhance usability, we mitigate the common challenge in controlling redundant degrees of freedom through mechanical and control optimizations. Experimental results demonstrate that U-Arm achieves 39% higher data collection efficiency and comparable task success rates across multiple manipulation scenarios compared with Joycon, another low-cost teleoperation interface. We have open-sourced the CAD models of all three configurations and also provided simulation support for validating teleoperation workflows. We also open-sourced real-world manipulation data collected with U-Arm. The project website is https://github.com/MINT-SJTU/LeRobot-Anything-U-Arm.
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation
We introduce Genie Envisioner (GE), a unified world foundation platform for robotic manipulation that integrates policy learning, evaluation, and simulation within a single video-generative framework. At its core, GE-Base is a large-scale, instruction-conditioned video diffusion model that captures the spatial, temporal, and semantic dynamics of real-world robotic interactions in a structured latent space. Built upon this foundation, GE-Act maps latent representations to executable action trajectories through a lightweight, flow-matching decoder, enabling precise and generalizable policy inference across diverse embodiments with minimal supervision. To support scalable evaluation and training, GE-Sim serves as an action-conditioned neural simulator, producing high-fidelity rollouts for closed-loop policy development. The platform is further equipped with EWMBench, a standardized benchmark suite measuring visual fidelity, physical consistency, and instruction-action alignment. Together, these components establish Genie Envisioner as a scalable and practical foundation for instruction-driven, general-purpose embodied intelligence. All code, models, and benchmarks will be released publicly.
comment: https://genie-envisioner.github.io/
Learning to Capture Rocks using an Excavator: A Reinforcement Learning Approach with Guiding Reward Formulation
Rock capturing with standard excavator buckets is a challenging task typically requiring the expertise of skilled operators. Unlike soil digging, it involves manipulating large, irregular rocks in unstructured environments where complex contact interactions with granular material make model-based control impractical. Existing autonomous excavation methods focus mainly on continuous media or rely on specialized grippers, limiting their applicability to real-world construction sites. This paper introduces a fully data-driven control framework for rock capturing that eliminates the need for explicit modeling of rock or soil properties. A model-free reinforcement learning agent is trained in the AGX Dynamics simulator using the Proximal Policy Optimization (PPO) algorithm and a guiding reward formulation. The learned policy outputs joint velocity commands directly to the boom, arm, and bucket of a CAT365 excavator model. Robustness is enhanced through extensive domain randomization of rock geometry, density, and mass, as well as the initial configurations of the bucket, rock, and goal position. To the best of our knowledge, this is the first study to develop and evaluate an RL-based controller for the rock capturing task. Experimental results show that the policy generalizes well to unseen rocks and varying soil conditions, achieving high success rates comparable to those of human participants while maintaining machine stability. These findings demonstrate the feasibility of learning-based excavation strategies for discrete object manipulation without requiring specialized hardware or detailed material models.
General-Purpose Robotic Navigation via LVLM-Orchestrated Perception, Reasoning, and Acting
Developing general-purpose navigation policies for unknown environments remains a core challenge in robotics. Most existing systems rely on task-specific neural networks and fixed information flows, limiting their generalizability. Large Vision-Language Models (LVLMs) offer a promising alternative by embedding human-like knowledge for reasoning and planning, but prior LVLM-robot integrations have largely depended on pre-mapped spaces, hard-coded representations, and rigid control logic. We introduce the Agentic Robotic Navigation Architecture (ARNA), a general-purpose framework that equips an LVLM-based agent with a library of perception, reasoning, and navigation tools drawn from modern robotic stacks. At runtime, the agent autonomously defines and executes task-specific workflows that iteratively query modules, reason over multimodal inputs, and select navigation actions. This agentic formulation enables robust navigation and reasoning in previously unmapped environments, offering a new perspective on robotic stack design. Evaluated in Habitat Lab on the HM-EQA benchmark, ARNA outperforms state-of-the-art EQA-specific approaches. Qualitative results on RxR and custom tasks further demonstrate its ability to generalize across a broad range of navigation challenges.
An Intention-driven Lane Change Framework Considering Heterogeneous Dynamic Cooperation in Mixed-traffic Environment
In mixed-traffic environments, where autonomous vehicles (AVs) interact with diverse human-driven vehicles (HVs), unpredictable intentions and heterogeneous behaviors make safe and efficient lane change maneuvers highly challenging. Existing methods often oversimplify these interactions by assuming uniform patterns. We propose an intention-driven lane change framework that integrates driving-style recognition, cooperation-aware decision-making, and coordinated motion planning. A deep learning classifier trained on the NGSIM dataset identifies human driving styles in real time. A cooperation score with intrinsic and interactive components estimates surrounding drivers' intentions and quantifies their willingness to cooperate with the ego vehicle. Decision-making combines behavior cloning with inverse reinforcement learning to determine whether a lane change should be initiated. For trajectory generation, model predictive control is integrated with IRL-based intention inference to produce collision-free and socially compliant maneuvers. Experiments show that the proposed model achieves 94.2% accuracy and 94.3% F1-score, outperforming rule-based and learning-based baselines by 4-15% in lane change recognition. These results highlight the benefit of modeling inter-driver heterogeneity and demonstrate the potential of the framework to advance context-aware and human-like autonomous driving in complex traffic environments.
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
Vision-language-action (VLA) models have recently shown strong potential in enabling robots to follow language instructions and execute precise actions. However, most VLAs are built upon vision-language models pretrained solely on 2D data, which lack accurate spatial awareness and hinder their ability to operate in the 3D physical world. Existing solutions attempt to incorporate explicit 3D sensor inputs such as depth maps or point clouds, but these approaches face challenges due to sensor noise, hardware heterogeneity, and incomplete depth coverage in existing datasets. Alternative methods that estimate 3D cues from 2D images also suffer from the limited performance of depth estimators. We propose Spatial Forcing (SF), a simple yet effective alignment strategy that implicitly forces VLA models to develop spatial comprehension capabilities without relying on explicit 3D inputs or depth estimators. SF aligns intermediate visual embeddings of VLAs with geometric representations produced by pretrained 3D foundation models. By enforcing alignment at intermediate layers, SF guides VLAs to encode richer spatial representations that enhance action precision. Extensive experiments in simulation and real-world environments demonstrate that SF achieves state-of-the-art results, surpassing both 2D- and 3D-based VLAs. SF further accelerates training by up to 3.8x and improves data efficiency across diverse robotic tasks. Project page is at https://spatial-forcing.github.io/
Self-supervised Multi-future Occupancy Forecasting for Autonomous Driving
Environment prediction frameworks are critical for the safe navigation of autonomous vehicles (AVs) in dynamic settings. LiDAR-generated occupancy grid maps (L-OGMs) offer a robust bird's-eye view for the scene representation, enabling self-supervised joint scene predictions while exhibiting resilience to partial observability and perception detection failures. Prior approaches have focused on deterministic L-OGM prediction architectures within the grid cell space. While these methods have seen some success, they frequently produce unrealistic predictions and fail to capture the stochastic nature of the environment. Additionally, they do not effectively integrate additional sensor modalities present in AVs. Our proposed framework, Latent Occupancy Prediction (LOPR), performs stochastic L-OGM prediction in the latent space of a generative architecture and allows for conditioning on RGB cameras, maps, and planned trajectories. We decode predictions using either a single-step decoder, which provides high-quality predictions in real-time, or a diffusion-based batch decoder, which can further refine the decoded frames to address temporal consistency issues and reduce compression losses. Our experiments on the nuScenes and Waymo Open datasets show that all variants of our approach qualitatively and quantitatively outperform prior approaches.
comment: Accepted to Robotics: Science and Systems (RSS) 2025
MOBODY: Model Based Off-Dynamics Offline Reinforcement Learning
We study off-dynamics offline reinforcement learning, where the goal is to learn a policy from offline source and limited target datasets with mismatched dynamics. Existing methods either penalize the reward or discard source transitions occurring in parts of the transition space with high dynamics shift. As a result, they optimize the policy using data from low-shift regions, limiting exploration of high-reward states in the target domain that do not fall within these regions. Consequently, such methods often fail when the dynamics shift is significant or the optimal trajectories lie outside the low-shift regions. To overcome this limitation, we propose MOBODY, a Model-Based Off-Dynamics Offline RL algorithm that optimizes a policy using learned target dynamics transitions to explore the target domain, rather than only being trained with the low dynamics-shift transitions. For dynamics learning, building on the observation that achieving the same next state requires taking different actions in different domains, MOBODY employs a separate action encoder for each domain to encode different actions into a shared latent space, while sharing a unified representation of states and a common transition function. To avoid out-of-distribution actions, we further introduce a target Q-weighted behavior cloning loss in policy optimization, which pushes the policy toward actions with high target-domain Q-values, rather than high source-domain Q-values or uniform imitation of all actions in the offline dataset. We evaluate MOBODY on a wide range of MuJoCo and Adroit benchmarks, demonstrating that it outperforms state-of-the-art off-dynamics RL baselines as well as policy learning methods based on different dynamics learning baselines, with especially pronounced improvements in challenging scenarios where existing methods struggle.
MLFM: Multi-Layered Feature Maps for Richer Language Understanding in Zero-Shot Semantic Navigation
Recent progress in large vision-language models has driven improvements in language-based semantic navigation, where an embodied agent must reach a target object described in natural language. Yet we still lack a clear, language-focused evaluation framework to test how well agents ground the words in their instructions. We address this gap by proposing LangNav, an open-vocabulary multi-object navigation dataset with natural language goal descriptions (e.g., 'go to the red short candle on the table') and corresponding fine-grained linguistic annotations (e.g., attributes: color=red, size=short; relations: support=on). These labels enable systematic evaluation of language understanding. To evaluate in this setting, we extend the multi-object navigation task to Language-guided Multi-Object Navigation (LaMoN), where the agent must find a sequence of goals specified using language. Furthermore, we propose Multi-Layered Feature Map (MLFM), a novel method that builds a queryable, multi-layered semantic map from pretrained vision-language features and proves effective for reasoning over fine-grained attributes and spatial relations in goal descriptions. Experiments on LangNav show that MLFM outperforms state-of-the-art zero-shot mapping-based navigation baselines.
SegDAC: Improving Visual Reinforcement Learning by Extracting Dynamic Object-Centric Representations from Pretrained Vision Models
Visual reinforcement learning (RL) is challenging due to the need to extract useful representations from high-dimensional inputs while learning effective control from sparse and noisy rewards. Although large perception models exist, integrating them effectively into RL for visual generalization and improved sample efficiency remains difficult. We propose SegDAC, a Segmentation-Driven Actor-Critic method. SegDAC uses Segment Anything (SAM) for object-centric decomposition and YOLO-World to ground the image segmentation process via text inputs. It includes a novel transformer-based architecture that supports a dynamic number of segments at each time step and effectively learns which segments to focus on using online RL, without using human labels. By evaluating SegDAC on a challenging visual generalization benchmark built on ManiSkill3, which covers diverse manipulation tasks under strong visual perturbations, we demonstrate that SegDAC achieves significantly better visual generalization, doubling prior performance on the hardest setting and matching or surpassing prior methods in sample efficiency across all evaluated tasks.
Interpretable Decision-Making for End-to-End Autonomous Driving ICCV 2025
Trustworthy AI is mandatory for the broad deployment of autonomous vehicles. Although end-to-end approaches derive control commands directly from raw data, interpreting these decisions remains challenging, especially in complex urban scenarios. This is mainly attributed to very deep neural networks with non-linear decision boundaries, making it challenging to grasp the logic behind AI-driven decisions. This paper presents a method to enhance interpretability while optimizing control commands in autonomous driving. To address this, we propose loss functions that promote the interpretability of our model by generating sparse and localized feature maps. The feature activations allow us to explain which image regions contribute to the predicted control command. We conduct comprehensive ablation studies on the feature extraction step and validate our method on the CARLA benchmarks. We also demonstrate that our approach improves interpretability, which correlates with reducing infractions, yielding a safer, high-performance driving model. Notably, our monocular, non-ensemble model surpasses the top-performing approaches from the CARLA Leaderboard by achieving lower infraction scores and the highest route completion rate, all while ensuring interpretability.
comment: Accepted to the ICCV 2025 2nd Workshop on the Challenge Of Out-of-Label Hazards in Autonomous Driving (2COOOL)
ZeST: an LLM-based Zero-Shot Traversability Navigation for Unknown Environments
The advancement of robotics and autonomous navigation systems hinges on the ability to accurately predict terrain traversability. Traditional methods for generating datasets to train these prediction models often involve putting robots into potentially hazardous environments, posing risks to equipment and safety. To solve this problem, we present ZeST, a novel approach leveraging the visual reasoning capabilities of Large Language Models (LLMs) to create a traversability map in real time without exposing robots to danger. Our approach not only performs zero-shot traversability estimation and mitigates the risks associated with real-world data collection but also accelerates the development of advanced navigation systems, offering a cost-effective and scalable solution. To support our findings, we present navigation results in both controlled indoor and unstructured outdoor environments. As shown in the experiments, our method provides safer navigation than other state-of-the-art methods, consistently reaching the final goal.
LOPR: Latent Occupancy PRediction using Generative Models
Environment prediction frameworks are integral for autonomous vehicles, enabling safe navigation in dynamic environments. LiDAR-generated occupancy grid maps (L-OGMs) offer a robust bird's-eye-view scene representation that facilitates joint scene predictions without relying on manual labeling, unlike commonly used trajectory prediction frameworks. Prior approaches have optimized deterministic L-OGM prediction architectures directly in grid cell space. While these methods have achieved some degree of success in prediction, they occasionally produce unrealistic and incorrect predictions. We claim that the quality and realism of the forecasted occupancy grids can be enhanced with the use of generative models. We propose a framework that decouples occupancy prediction into representation learning and stochastic prediction within the learned latent space. Our approach allows for conditioning the model on other available sensor modalities such as RGB cameras and high-definition maps. We demonstrate that our approach achieves state-of-the-art performance and is readily transferable between different robotic platforms on the real-world NuScenes and Waymo Open datasets and on a custom dataset we collected on an experimental vehicle platform.
comment: We recommend referring to the peer-reviewed and updated version of this approach, available at arXiv:2407.21126
Multiagent Systems
Grassroots Logic Programs: A Secure, Multiagent, Concurrent, Logic Programming Language
Grassroots platforms are distributed applications run by cryptographically identified people on their networked personal devices, where multiple disjoint platform instances emerge independently and coalesce when they interoperate. Their foundation is the grassroots social graph, upon which grassroots social networks, grassroots cryptocurrencies, and grassroots democratic federations can be built. Grassroots platforms have yet to be implemented, the key challenge being faulty and malicious participants: without secure programming support, correct participants cannot reliably identify each other, establish secure communication, or verify each other's code integrity. We present Grassroots Logic Programs (GLP), a secure, multiagent, concurrent, logic programming language for implementing grassroots platforms. GLP extends logic programs with paired single-reader/single-writer (SRSW) logic variables, providing secure communication channels among cryptographically identified people through encrypted, signed and attested messages, which enable identity and code integrity verification. We present GLP progressively: logic programs, concurrent GLP, multiagent GLP, augmenting it with cryptographic security, and providing smartphone implementation-ready specifications. We prove safety properties including that GLP computations are deductions, SRSW preservation, acyclicity, and monotonicity. We prove multiagent GLP is grassroots and that GLP streams achieve blockchain security properties. We present a grassroots social graph protocol establishing authenticated peer-to-peer connections and demonstrate secure grassroots social networking applications.
AURA: An Agent Autonomy Risk Assessment Framework AAMAS 2026
As autonomous agentic AI systems see increasing adoption across organisations, persistent challenges in alignment, governance, and risk management threaten to impede deployment at scale. We present AURA (Agent aUtonomy Risk Assessment), a unified framework designed to detect, quantify, and mitigate risks arising from agentic AI. Building on recent research and practical deployments, AURA introduces a gamma-based risk scoring methodology that balances risk assessment accuracy with computational efficiency and practical considerations. AURA provides an interactive process to score, evaluate and mitigate the risks of running one or multiple AI Agents, synchronously or asynchronously (autonomously). The framework is engineered for Human-in-the-Loop (HITL) oversight and presents Agent-to-Human (A2H) communication mechanisms, allowing for seamless integration with agentic systems for autonomous self-assessment, rendering it interoperable with established protocols (MCP and A2A) and tools. AURA supports a responsible and transparent adoption of agentic AI and provides robust risk detection and mitigation while balancing computational resources, positioning it as a critical enabler for large-scale, governable agentic AI in enterprise environments.
comment: 10 pages, 2 figures. Submitted for open-access preprint on arXiv. Based on the AAMAS 2026 paper template
Build Your Personalized Research Group: A Multiagent Framework for Continual and Interactive Science Automation
The automation of scientific discovery represents a critical milestone in Artificial Intelligence (AI) research. However, existing agentic systems for science suffer from two fundamental limitations: rigid, pre-programmed workflows that cannot adapt to intermediate findings, and inadequate context management that hinders long-horizon research. We present freephdlabor, an open-source multiagent framework featuring fully dynamic workflows determined by real-time agent reasoning and a modular architecture enabling seamless customization -- users can modify, add, or remove agents to address domain-specific requirements. The framework provides comprehensive infrastructure including automatic context compaction, workspace-based communication to prevent information degradation, memory persistence across sessions, and non-blocking human intervention mechanisms. These features collectively transform automated research from isolated, single-run attempts into continual research programs that build systematically on prior explorations and incorporate human feedback. By providing both the architectural principles and practical implementation for building customizable co-scientist systems, this work aims to facilitate broader adoption of automated research across scientific domains, enabling practitioners to deploy interactive multiagent systems that autonomously conduct end-to-end research -- from ideation through experimentation to publication-ready manuscripts.
comment: 37 pages, 5 figures. Code: https://github.com/ltjed/freephdlabor
Hypergame-based Cognition Modeling and Intention Interpretation for Human-Driven Vehicles in Connected Mixed Traffic
With the practical implementation of connected and autonomous vehicles (CAVs), the traffic system is expected to remain a mix of CAVs and human-driven vehicles (HVs) for the foreseeable future. To enhance safety and traffic efficiency, the trajectory planning strategies of CAVs must account for the influence of HVs, necessitating accurate HV trajectory prediction. Current research often assumes that human drivers have perfect knowledge of all vehicles' objectives, an unrealistic premise. This paper bridges the gap by leveraging hypergame theory to account for cognitive and perception limitations in HVs. We model human bounded rationality without assuming them to be merely passive followers and propose a hierarchical cognition modeling framework that captures cognitive relationships among vehicles. We further analyze the cognitive stability of the system, proving that the strategy profile where all vehicles adopt cognitively equilibrium strategies constitutes a hyper Nash equilibrium when CAVs accurately learn HV parameters. To achieve this, we develop an inverse learning algorithm for distributed intention interpretation via vehicle-to-everything (V2X) communication, which extends the framework to both offline and online scenarios. Additionally, we introduce a distributed trajectory prediction and planning approach for CAVs, leveraging the learned parameters in real time. Simulations in highway lane-changing scenarios demonstrate the proposed method's accuracy in parameter learning, robustness to noisy trajectory observations, and safety in HV trajectory prediction. The results validate the effectiveness of our method in both offline and online implementations.
TranSimHub: A Unified Air-Ground Simulation Platform for Multi-Modal Perception and Decision-Making
Air-ground collaborative intelligence is becoming a key approach for next-generation urban intelligent transportation management, where aerial and ground systems work together on perception, communication, and decision-making. However, the lack of a unified multi-modal simulation environment has limited progress in studying cross-domain perception, coordination under communication constraints, and joint decision optimization. To address this gap, we present TranSimHub, a unified simulation platform for air-ground collaborative intelligence. TranSimHub offers synchronized multi-view rendering across RGB, depth, and semantic segmentation modalities, ensuring consistent perception between aerial and ground viewpoints. It also supports information exchange between the two domains and includes a causal scene editor that enables controllable scenario creation and counterfactual analysis under diverse conditions such as different weather, emergency events, and dynamic obstacles. We release TranSimHub as an open-source platform that supports end-to-end research on perception, fusion, and control across realistic air and ground traffic scenes. Our code is available at https://github.com/Traffic-Alpha/TranSimHub.
comment: 9 pages, 4 figures
Personalized Collaborative Learning with Affinity-Based Variance Reduction
Multi-agent learning faces a fundamental tension: leveraging distributed collaboration without sacrificing the personalization needed for diverse agents. This tension intensifies when aiming for full personalization while adapting to unknown heterogeneity levels -- gaining collaborative speedup when agents are similar, without performance degradation when they are different. Embracing the challenge, we propose personalized collaborative learning (PCL), a novel framework for heterogeneous agents to collaboratively learn personalized solutions with seamless adaptivity. Through carefully designed bias correction and importance correction mechanisms, our method AffPCL robustly handles both environment and objective heterogeneity. We prove that AffPCL reduces sample complexity over independent learning by a factor of $\max\{n^{-1}, \delta\}$, where $n$ is the number of agents and $\delta\in[0,1]$ measures their heterogeneity. This affinity-based acceleration automatically interpolates between the linear speedup of federated learning in homogeneous settings and the baseline of independent learning, without requiring prior knowledge of the system. Our analysis further reveals that an agent may obtain linear speedup even by collaborating with arbitrarily dissimilar agents, unveiling new insights into personalization and collaboration in the high heterogeneity regime.
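As a reading aid for the stated factor $\max\{n^{-1}, \delta\}$, the short sketch below evaluates it for a few values. It only illustrates the arithmetic of the bound quoted in the abstract; it is not the AffPCL algorithm, and the function name `affinity_factor` is a hypothetical label chosen here.

```python
def affinity_factor(n: int, delta: float) -> float:
    """Evaluate max{1/n, delta}, the sample-complexity factor quoted in the abstract.

    n     -- number of agents (n >= 1)
    delta -- heterogeneity level in [0, 1]; 0 means identical agents
    The name and signature are illustrative assumptions, not from the paper.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return max(1.0 / n, delta)


# Homogeneous agents: the factor is 1/n (linear speedup).
homogeneous = affinity_factor(10, 0.0)
# Fully heterogeneous agents: the factor is 1 (no gain over independent learning).
heterogeneous = affinity_factor(10, 1.0)
# In between, the factor equals delta once delta exceeds 1/n.
intermediate = affinity_factor(10, 0.25)
```

With ten agents the factor moves from 0.1 when $\delta = 0$ to 1.0 when $\delta = 1$, which is the interpolation between federated and independent learning that the abstract describes.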
Heterogeneous Multi-Agent Task-Assignment with Uncertain Execution Times and Preferences
While sequential task assignment for a single agent has been widely studied, such problems in a multi-agent setting, where the agents have heterogeneous task preferences or capabilities, remain less well-characterized. We study a multi-agent task assignment problem where a central planner assigns recurring tasks to multiple members of a team over a finite time horizon. For any given task, the members have heterogeneous capabilities in terms of task completion times, task resource consumption (which can model variables such as energy or attention), and preferences in terms of the rewards they collect upon task completion. We assume that the reward, execution time, and resource consumption for each member to complete any task are stochastic with unknown distributions. The goal of the planner is to maximize the total expected reward that the team receives over the problem horizon while ensuring that the resource consumption required for any assigned task is within the capability of the agent. We propose and analyze a bandit algorithm for this problem. Since the bandit algorithm relies on solving an optimal task assignment problem repeatedly, we analyze the achievable regret in two cases: when we can solve the optimal task assignment exactly and when we can solve it only approximately.
comment: 14 pages
Zero-Shot Coordination in Ad Hoc Teams with Generalized Policy Improvement and Difference Rewards
Real-world multi-agent systems may require ad hoc teaming, where an agent must coordinate with other previously unseen teammates to solve a task in a zero-shot manner. Prior work often either selects a pretrained policy based on an inferred model of the new teammates or pretrains a single policy that is robust to potential teammates. Instead, we propose to leverage all pretrained policies in a zero-shot transfer setting. We formalize this problem as an ad hoc multi-agent Markov decision process and present a solution that uses two key ideas, generalized policy improvement and difference rewards, for efficient and effective knowledge transfer between different teams. We empirically demonstrate that our algorithm, Generalized Policy improvement for Ad hoc Teaming (GPAT), successfully enables zero-shot transfer to new teams in three simulated environments: cooperative foraging, predator-prey, and Overcooked. We also demonstrate our algorithm in a real-world multi-robot setting.
comment: 10 pages, 8 figures
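The two ideas named in the GPAT abstract above, generalized policy improvement and difference rewards, have well-known textbook forms, which the sketch below shows on toy data. It is a minimal illustration under stated assumptions, not the GPAT algorithm: the tabular Q-values, the `default_action` counterfactual, and both function names are hypothetical choices made here.

```python
from typing import Callable, List, Sequence


def gpi_action(q_values: Sequence[Sequence[float]]) -> int:
    """Generalized policy improvement in one state (textbook form).

    q_values[i][a] is the estimated value of action a under pretrained
    policy i. GPI acts greedily with respect to the maximum over policies.
    """
    n_actions = len(q_values[0])
    best_per_action = [max(q[a] for q in q_values) for a in range(n_actions)]
    return max(range(n_actions), key=best_per_action.__getitem__)


def difference_reward(
    team_reward: Callable[[List], float],
    joint_action: Sequence,
    agent: int,
    default_action,
) -> float:
    """Difference reward: team reward minus a counterfactual team reward.

    The counterfactual replaces the agent's action with default_action
    (an assumption made here; other counterfactuals are possible).
    """
    counterfactual = list(joint_action)
    counterfactual[agent] = default_action
    return team_reward(list(joint_action)) - team_reward(counterfactual)


# Toy usage: two pretrained policies, three actions.
q = [[1.0, 0.2, 0.0], [0.1, 0.3, 2.0]]
chosen = gpi_action(q)

# Toy team reward: number of distinct items collected by the team.
def distinct_items(actions: List) -> float:
    return float(len({a for a in actions if a is not None}))

joint = ["apple", "apple", "pear"]
credit_agent0 = difference_reward(distinct_items, joint, agent=0, default_action=None)
credit_agent2 = difference_reward(distinct_items, joint, agent=2, default_action=None)
```

In the toy foraging example, agent 0 duplicates agent 1's item and receives no credit, while agent 2 contributes a distinct item and receives credit, which is the credit-assignment behavior difference rewards are meant to provide.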
Agentic AI for Ultra-Modern Networks: Multi-Agent Framework for RAN Autonomy and Assurance
The increasing complexity of Beyond 5G and 6G networks necessitates new paradigms for autonomy and assurance. Traditional O-RAN control loops rely heavily on RIC-based orchestration, which centralizes intelligence and exposes the system to risks such as policy conflicts, data drift, and unsafe actions under unforeseen conditions. In this work, we argue that the future of autonomous networks lies in a multi-agentic architecture, where specialized agents collaborate to perform data collection, model training, prediction, policy generation, verification, deployment, and assurance. By replacing tightly-coupled centralized RIC-based workflows with distributed agents, the framework achieves autonomy, resilience, explainability, and system-wide safety. To substantiate this vision, we design and evaluate a traffic steering use case under surge and drift conditions. Results across four KPIs (RRC connected users, IP throughput, PRB utilization, and SINR) demonstrate that a naive predictor-driven deployment improves local KPIs but destabilizes neighbors, whereas the agentic system blocks unsafe policies, preserving global network health. This study highlights multi-agent architectures as a credible foundation for trustworthy AI-driven autonomy in next-generation RANs.
VLMLight: Safety-Critical Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning Architecture
Traffic signal control (TSC) is a core challenge in urban mobility, where real-time decisions must balance efficiency and safety. Existing methods, ranging from rule-based heuristics to reinforcement learning (RL), often struggle to generalize to complex, dynamic, and safety-critical scenarios. We introduce VLMLight, a novel TSC framework that integrates vision-language meta-control with dual-branch reasoning. At the core of VLMLight is the first image-based traffic simulator that enables multi-view visual perception at intersections, allowing policies to reason over rich cues such as vehicle type, motion, and spatial density. A large language model (LLM) serves as a safety-prioritized meta-controller, selecting between a fast RL policy for routine traffic and a structured reasoning branch for critical cases. In the latter, multiple LLM agents collaborate to assess traffic phases, prioritize emergency vehicles, and verify rule compliance. Experiments show that VLMLight reduces waiting times for emergency vehicles by up to 65% over RL-only systems, while preserving real-time performance in standard conditions with less than 1% degradation. VLMLight offers a scalable, interpretable, and safety-aware solution for next-generation traffic signal control.
comment: 25 pages, 15 figures
Bayesian Ego-graph inference for Networked Multi-Agent Reinforcement Learning NeurIPS 2025
In networked multi-agent reinforcement learning (Networked-MARL), decentralized agents must act under local observability and constrained communication over fixed physical graphs. Existing methods often assume static neighborhoods, limiting adaptability to dynamic or heterogeneous environments. While centralized frameworks can learn dynamic graphs, their reliance on global state access and centralized infrastructure is impractical in real-world decentralized systems. We propose a stochastic graph-based policy for Networked-MARL, where each agent conditions its decision on a sampled subgraph over its local physical neighborhood. Building on this formulation, we introduce BayesG, a decentralized actor-framework that learns sparse, context-aware interaction structures via Bayesian variational inference. Each agent operates over an ego-graph and samples a latent communication mask to guide message passing and policy computation. The variational distribution is trained end-to-end alongside the policy using an evidence lower bound (ELBO) objective, enabling agents to jointly learn both interaction topology and decision-making strategies. BayesG outperforms strong MARL baselines on large-scale traffic control tasks with up to 167 agents, demonstrating superior scalability, efficiency, and performance.
comment: Accepted at NeurIPS 2025
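The mask-sampling step described in the abstract above can be illustrated with a minimal sketch. It assumes per-neighbor keep probabilities are already available (in BayesG they would come from the learned variational distribution, which is not reproduced here). The names `sample_mask`, `masked_mean`, and all numeric values are hypothetical, and scalar messages stand in for feature vectors.

```python
import random

def sample_mask(edge_probs, rng):
    """Draw one Bernoulli keep/drop decision per neighbor in the ego-graph."""
    return [1 if rng.random() < p else 0 for p in edge_probs]

def masked_mean(own_feature, neighbor_messages, mask):
    """Average the messages of kept neighbors.

    Falls back to the agent's own feature when every neighbor is masked out.
    """
    kept = [m for m, keep in zip(neighbor_messages, mask) if keep]
    if not kept:
        return own_feature
    return sum(kept) / len(kept)

# Hypothetical ego-graph with three physical neighbors.
rng = random.Random(0)
edge_probs = [0.9, 0.1, 0.5]   # assumed keep probabilities, not learned here
messages = [1.0, 5.0, 3.0]     # scalar stand-ins for neighbor messages
mask = sample_mask(edge_probs, rng)
context = masked_mean(0.0, messages, mask)
print(mask, context)
```

Sampling a fresh mask at each decision step is what makes the policy stochastic over subgraphs; the physical neighborhood itself stays fixed.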
Topological Structure Learning Should Be A Research Priority for LLM-Based Multi-Agent Systems
Large Language Model-based Multi-Agent Systems (MASs) have emerged as a powerful paradigm for tackling complex tasks through collaborative intelligence. However, the topology of these systems--how agents in MASs should be configured, connected, and coordinated--remains largely unexplored. In this position paper, we call for a paradigm shift toward topology-aware MASs that explicitly model and dynamically optimize the structure of inter-agent interactions. We identify three fundamental components--agents, communication links, and overall topology--that collectively determine the system's adaptability, efficiency, robustness, and fairness. To operationalize this vision, we introduce a systematic three-stage framework: 1) agent selection, 2) structure profiling, and 3) topology synthesis. This framework not only provides a principled foundation for designing MASs but also opens new research frontiers across language modeling, reinforcement learning, graph learning, and generative modeling to ultimately unleash their full potential in complex real-world applications. We conclude by outlining key challenges and opportunities in MAS evaluation. We hope our framework and perspectives offer critical new insights in the era of agentic AI.
LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration
Text-to-image (T2I) generation has made remarkable progress, yet existing systems still lack intuitive control over spatial composition, object consistency, and multi-step editing. We present LayerCraft, a modular framework that uses large language models (LLMs) as autonomous agents to orchestrate structured, layered image generation and editing. LayerCraft supports two key capabilities: (1) structured generation from simple prompts via chain-of-thought (CoT) reasoning, enabling it to decompose scenes, reason about object placement, and guide composition in a controllable, interpretable manner; and (2) layered object integration, allowing users to insert and customize objects -- such as characters or props -- across diverse images or scenes while preserving identity, context, and style. The system comprises a coordinator agent, the ChainArchitect for CoT-driven layout planning, and the Object Integration Network (OIN) for seamless image editing using off-the-shelf T2I models without retraining. Through applications like batch collage editing and narrative scene generation, LayerCraft empowers non-experts to iteratively design, customize, and refine visual content with minimal manual effort. Code will be released at https://github.com/PeterYYZhang/LayerCraft.
comment: 26 pages
Robust Federated Inference
Federated inference, in the form of one-shot federated learning, edge ensembles, or federated ensembles, has emerged as an attractive solution to combine predictions from multiple models. This paradigm enables each model to remain local and proprietary while a central server queries them and aggregates predictions. Yet, the robustness of federated inference has been largely neglected, leaving these methods vulnerable to even simple attacks. To address this critical gap, we formalize the problem of robust federated inference and provide the first robustness analysis of this class of methods. Our analysis of averaging-based aggregators shows that the error of the aggregator is small either when the dissimilarity between honest responses is small or the margin between the two most probable classes is large. Moving beyond linear averaging, we show that the problem of robust federated inference with non-linear aggregators can be cast as an adversarial machine learning problem. We then introduce an advanced technique using the DeepSet aggregation model, proposing a novel composition of adversarial training and test-time robust aggregation to robustify non-linear aggregators. Our composition yields significant improvements, surpassing existing robust aggregation methods by 4.7 - 22.2% in accuracy points across diverse benchmarks.
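The role of the margin under averaging can be illustrated with a small sketch. It assumes each client returns a probability vector over classes, and it uses a coordinate-wise median as a generic robust baseline; this is a swapped-in standard technique, not the DeepSet-based aggregator proposed in the paper. All function names and numbers are hypothetical.

```python
from statistics import median

def average_aggregate(responses):
    """Average per-class probabilities across client responses."""
    n = len(responses)
    return [sum(col) / n for col in zip(*responses)]

def median_aggregate(responses):
    """Coordinate-wise median, a generic robust baseline (not the paper's method)."""
    return [median(col) for col in zip(*responses)]

def margin(probs):
    """Gap between the two most probable classes."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]

def predict(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)

# Three honest clients that agree on class 0, but only with a small margin.
honest = [[0.55, 0.45], [0.60, 0.40], [0.52, 0.48]]
# One corrupted client pushing all of its mass onto class 1.
attack = [0.0, 1.0]

print(margin(average_aggregate(honest)))            # small margin
print(predict(average_aggregate(honest + [attack])))  # averaging is flipped
print(predict(median_aggregate(honest + [attack])))   # median keeps class 0
```

With a small margin between the top two classes, one corrupted response is enough to flip the averaged prediction, which is the regime the abstract's analysis singles out.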
Audit the Whisper: Detecting Steganographic Collusion in Multi-Agent LLMs
Multi-agent deployments of large language models (LLMs) are increasingly embedded in market, allocation, and governance workflows, yet covert coordination among agents can silently erode trust and social welfare. Existing audits are dominated by heuristics that lack theoretical guarantees, struggle to transfer across tasks, and seldom ship with the infrastructure needed for independent replication. We introduce Audit the Whisper, a conference-grade research artifact that spans theory, benchmark design, detection, and reproducibility. Our contributions are: (i) a channel-capacity analysis showing how interventions such as paraphrase, rate limiting, and role permutation impose quantifiable capacity penalties -- operationalised via paired-run Kullback-Leibler diagnostics -- that tighten mutual-information thresholds with finite-sample guarantees and full proofs; (ii) ColludeBench-v0, covering pricing, first-price auctions, peer review, and hosted Gemini/Groq APIs with configurable covert schemes, deterministic manifests, and reward instrumentation; and (iii) a calibrated auditing pipeline that fuses cross-run mutual information, permutation invariance, watermark variance, and fairness-aware acceptance bias, each tuned to a $10^{-3}$ false-positive budget and validated by 10k honest runs plus an e-value martingale. Across ColludeBench and external suites including Secret Collusion, CASE, Perfect Collusion Benchmark, and SentinelAgent, the union meta-test attains state-of-the-art power at fixed FPR while ablations surface price-of-auditing trade-offs and fairness-driven colluders invisible to MI alone. We release regeneration scripts, anonymized manifests, and documentation so that external auditors can reproduce every figure, satisfy double-blind requirements, and extend the framework with minimal effort.
comment: 13 pages, 0 figures
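As a rough illustration of the mutual-information signal mentioned in the abstract above, the sketch below computes a plug-in (empirical) estimate from paired discrete observations. It is not the paper's calibrated detector: the thresholding, finite-sample corrections, and the other fused statistics are omitted, and the function name and toy data are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(pairs):
    """Plug-in estimate of I(X; Y) in bits from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# Hypothetical symbols emitted by two agents across paired runs.
colluding = [(0, 0), (1, 1)] * 50                    # outputs move together
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25  # no dependence

print(mutual_information(colluding))
print(mutual_information(independent))
```

A plug-in estimate is biased upward on small samples, which is one reason a real audit needs a calibrated threshold rather than a comparison against zero.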
Robotics
Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation ICCV 2025
Close-proximity human-human interactive poses convey rich contextual information about interaction dynamics. Given such poses, humans can intuitively infer the context and anticipate possible past and future dynamics, drawing on strong priors of human behavior. Inspired by this observation, we propose Ponimator, a simple framework anchored on proximal interactive poses for versatile interaction animation. Our training data consists of close-contact two-person poses and their surrounding temporal context from motion-capture interaction datasets. Leveraging interactive pose priors, Ponimator employs two conditional diffusion models: (1) a pose animator that uses the temporal prior to generate dynamic motion sequences from interactive poses, and (2) a pose generator that applies the spatial prior to synthesize interactive poses from a single pose, text, or both when interactive poses are unavailable. Collectively, Ponimator supports diverse tasks, including image-based interaction animation, reaction animation, and text-to-interaction synthesis, facilitating the transfer of interaction knowledge from high-quality mocap data to open-world scenarios. Empirical experiments across diverse datasets and applications demonstrate the universality of the pose prior and the effectiveness and robustness of our framework.
comment: Accepted to ICCV 2025. Project page: https://stevenlsw.github.io/ponimator/
RDD: Retrieval-Based Demonstration Decomposer for Planner Alignment in Long-Horizon Tasks NeurIPS 2025
To tackle long-horizon tasks, recent hierarchical vision-language-action (VLA) frameworks employ vision-language model (VLM)-based planners to decompose complex manipulation tasks into simpler sub-tasks that low-level visuomotor policies can easily handle. Typically, the VLM planner is finetuned to learn to decompose a target task. This finetuning requires target task demonstrations segmented into sub-tasks by either human annotation or heuristic rules. However, the heuristic sub-tasks can deviate significantly from the training data of the visuomotor policy, which degrades task performance. To address this issue, we propose a Retrieval-based Demonstration Decomposer (RDD) that automatically decomposes demonstrations into sub-tasks by aligning the visual features of the decomposed sub-task intervals with those from the training data of the low-level visuomotor policies. Our method outperforms the state-of-the-art sub-task decomposer on both simulation and real-world tasks, demonstrating robustness across diverse settings. Code and more results are available at rdd-neurips.github.io.
comment: 39th Conference on Neural Information Processing Systems (NeurIPS 2025); Project Website: rdd-neurips.github.io
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
Reinforcement learning (RL), while powerful and expressive, can often prioritize performance at the expense of safety. Yet safety violations can lead to catastrophic outcomes in real-world deployments. Control Barrier Functions (CBFs) offer a principled method to enforce dynamic safety -- traditionally deployed online via safety filters. While the result is safe behavior, the fact that the RL policy does not have knowledge of the CBF can lead to conservative behaviors. This paper proposes CBF-RL, a framework for generating safe behaviors with RL by enforcing CBFs in training. CBF-RL has two key attributes: (1) minimally modifying a nominal RL policy to encode safety constraints via a CBF term, and (2) safety filtering of the policy rollouts in training. Theoretically, we prove that continuous-time safety filters can be deployed via closed-form expressions on discrete-time rollouts. Practically, we demonstrate that CBF-RL internalizes the safety constraints in the learned policy -- both enforcing safer actions and biasing towards safer rewards -- enabling safe deployment without the need for an online safety filter. We validate our framework through ablation studies on navigation tasks and on the Unitree G1 humanoid robot, where CBF-RL enables safer exploration, faster convergence, and robust performance under uncertainty, enabling the humanoid robot to avoid obstacles and climb stairs safely in real-world settings without a runtime safety filter.
comment: 8 pages
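To make the idea of a closed-form safety filter concrete, here is a minimal sketch of the standard minimum-norm CBF filter for a single constraint. It assumes single-integrator dynamics (the state derivative equals the action) and a circular obstacle, with barrier h(x) = ||x - c||^2 - r^2. This is the textbook projection, not necessarily the exact expression derived in the paper; `cbf_filter`, `alpha`, and the numbers are hypothetical.

```python
def cbf_filter(x, u_nom, center, radius, alpha=1.0):
    """Minimally modify u_nom so that grad_h(x) . u + alpha * h(x) >= 0.

    Assumes x_dot = u and h(x) = ||x - center||^2 - radius^2.
    """
    dx = [xi - ci for xi, ci in zip(x, center)]
    h = sum(d * d for d in dx) - radius ** 2
    grad_h = [2.0 * d for d in dx]
    residual = sum(g * u for g, u in zip(grad_h, u_nom)) + alpha * h
    norm_sq = sum(g * g for g in grad_h)
    # Constraint already satisfied, or gradient vanishes: keep the nominal action.
    if residual >= 0.0 or norm_sq == 0.0:
        return list(u_nom)
    # Otherwise project u_nom onto the boundary of the safe half-space.
    scale = residual / norm_sq
    return [u - scale * g for u, g in zip(u_nom, grad_h)]

# Agent at (1.5, 0) heading straight at a unit-radius obstacle at the origin.
u_safe = cbf_filter([1.5, 0.0], [-1.0, 0.0], [0.0, 0.0], 1.0)
print(u_safe)
```

In a training-time setup of the kind the abstract describes, a filter like this would be applied to each rollout action, and the size of the correction could also inform a penalty term; how CBF-RL combines the two is specified in the paper, not here.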
From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance
Natural language offers a natural interface for humanoid robots, but existing language-guided humanoid locomotion pipelines remain cumbersome and unreliable. They typically decode human motion, retarget it to robot morphology, and then track it with a physics-based controller. However, this multi-stage process is prone to cumulative errors, introduces high latency, and yields weak coupling between semantics and control. These limitations call for a more direct pathway from language to action, one that eliminates fragile intermediate stages. Therefore, we present RoboGhost, a retargeting-free framework that directly conditions humanoid policies on language-grounded motion latents. By bypassing explicit motion decoding and retargeting, RoboGhost enables a diffusion-based policy to denoise executable actions directly from noise, preserving semantic intent and supporting fast, reactive control. A hybrid causal transformer-diffusion motion generator further ensures long-horizon consistency while maintaining stability and diversity, yielding rich latent representations for precise humanoid behavior. Extensive experiments demonstrate that RoboGhost substantially reduces deployment latency, improves success rates and tracking accuracy, and produces smooth, semantically aligned locomotion on real humanoids. Beyond text, the framework naturally extends to other modalities such as images, audio, and music, providing a general foundation for vision-language-action humanoid systems.
Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion
Robust humanoid locomotion in unstructured environments requires architectures that balance fast low-level stabilization with slower perceptual decision-making. We show that a simple layered control architecture (LCA), a proprioceptive stabilizer running at high rate, coupled with a compact low-rate perceptual policy, enables substantially more robust performance than monolithic end-to-end designs, even when using minimal perception encoders. Through a two-stage training curriculum (blind stabilizer pretraining followed by perceptual fine-tuning), we demonstrate that layered policies consistently outperform one-stage alternatives in both simulation and hardware. On a Unitree G1 humanoid, our approach succeeds across stair and ledge tasks where one-stage perceptual policies fail. These results highlight that architectural separation of timescales, rather than network scale or complexity, is the key enabler for robust perception-conditioned locomotion.
comment: 8 pages
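The separation of timescales described in the abstract above can be sketched as a two-rate control loop. The sketch assumes a 10:1 ratio between stabilizer and perceptual updates and uses toy scalar dynamics; the rates, gains, and function names are hypothetical and are not taken from the paper.

```python
def stabilizer(command, joint_state):
    """High-rate proprioceptive layer: a hypothetical proportional tracking law."""
    return 0.5 * (command - joint_state)

def perceptual_policy(height_ahead):
    """Low-rate perceptual layer: hypothetical mapping from terrain height to a command."""
    return 1.0 if height_ahead > 0.0 else 0.0

def run(steps, ratio, terrain):
    """Run the stabilizer every step and the perceptual policy every `ratio` steps."""
    command, state, policy_calls = 0.0, 0.0, 0
    for t in range(steps):
        if t % ratio == 0:
            command = perceptual_policy(terrain(t))
            policy_calls += 1
        state += stabilizer(command, state)
    return state, policy_calls

# Constant raised terrain: the command switches once and the stabilizer tracks it.
state, calls = run(steps=1000, ratio=10, terrain=lambda t: 0.2)
print(state, calls)
```

Between perceptual updates the stabilizer holds the last command, so the fast loop never waits on perception.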
EdgeNavMamba: Mamba Optimized Object Detection for Energy Efficient Edge Devices
Deployment of efficient and accurate Deep Learning models has long been a challenge in autonomous navigation, particularly for real-time applications on resource-constrained edge devices. Edge devices are limited in computing power and memory, making model efficiency and compression essential. In this work, we propose EdgeNavMamba, a reinforcement learning-based framework for goal-directed navigation using an efficient Mamba object detection model. To train and evaluate the detector, we introduce a custom shape detection dataset collected in diverse indoor settings, reflecting visual cues common in real-world navigation. The object detector serves as a pre-processing module, extracting bounding boxes (BBOX) from visual input, which are then passed to an RL policy to control goal-oriented navigation. Experimental results show that the student model achieved a reduction of 67% in size and up to 73% in energy per inference on the NVIDIA Jetson Orin Nano and Raspberry Pi 5 edge devices, while keeping the same performance as the teacher model. EdgeNavMamba also maintains high detection accuracy in MiniWorld and IsaacLab simulators while reducing parameters by 31% compared to the baseline. In the MiniWorld simulator, the navigation policy achieves over 90% success across environments of varying complexity.
comment: The 11th IEEE International Conference on Edge Computing and Scalable Cloud (IEEE EdgeCom 2025)
VT-Refine: Learning Bimanual Assembly with Visuo-Tactile Feedback via Simulation Fine-Tuning
Humans excel at bimanual assembly tasks by adapting to rich tactile feedback -- a capability that remains difficult to replicate in robots through behavioral cloning alone, due to the suboptimality and limited diversity of human demonstrations. In this work, we present VT-Refine, a visuo-tactile policy learning framework that combines real-world demonstrations, high-fidelity tactile simulation, and reinforcement learning to tackle precise, contact-rich bimanual assembly. We begin by training a diffusion policy on a small set of demonstrations using synchronized visual and tactile inputs. This policy is then transferred to a simulated digital twin equipped with simulated tactile sensors and further refined via large-scale reinforcement learning to enhance robustness and generalization. To enable accurate sim-to-real transfer, we leverage high-resolution piezoresistive tactile sensors that provide normal force signals and can be realistically modeled in parallel using GPU-accelerated simulation. Experimental results show that VT-Refine improves assembly performance in both simulation and the real world by increasing data diversity and enabling more effective policy fine-tuning. Our project page is available at https://binghao-huang.github.io/vt_refine/.
comment: Accepted by 9th Conference on Robot Learning (CoRL 2025); Website: https://binghao-huang.github.io/vt_refine/
Design of Paper Robot Building Kits
Building robots is an engaging activity that provides opportunities for hands-on learning. However, traditional robot-building kits are usually costly with limited functionality due to material and technology constraints. To improve the accessibility and flexibility of such kits, we take paper as the building material and extensively explore the versatility of paper-based interactions. Based on an analysis of current robot-building kits and paper-based interaction research, we propose a design space for devising paper robots. We also analyzed our building kit designs using this design space, where these kits demonstrate the potential of paper as a cost-effective material for robot building. As a starting point, our design space and building kit examples provide a guideline that inspires and informs future research and development of novel paper robot-building kits.
VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation
Current vision-language-action (VLA) models, pre-trained on large-scale robotic data, exhibit strong multi-task capabilities and generalize well to variations in visual and language instructions for manipulation. However, their success rate drops significantly when faced with object concepts outside the training data, such as object descriptions and textures not seen in the dataset. To address this, we propose a novel agentic framework, VLA^2, which uses OpenVLA as the execution backbone and leverages external modules such as web retrieval and object detection to provide visual and textual knowledge about target objects to the VLA. This approach mitigates generalization failure when handling out-of-distribution objects. Based on the LIBERO simulation environment, we introduced novel objects and object descriptions to construct a new evaluation benchmark with three difficulty levels to test the effectiveness of our method. Our framework outperformed the current state-of-the-art models on our designed hard-level generalization benchmark. Compared to the standalone OpenVLA baseline, VLA^2 achieves a 44.2% improvement in the success rate in the hard-level benchmark and an average improvement of 20.2% in all customized environments without any performance degradation on in-domain tasks. Project website: https://vla-2.github.io.
STITCHER: Constrained Trajectory Planning in Known Environments with Real-Time Motion Primitive Search
Autonomous high-speed navigation through large, complex environments requires real-time generation of agile trajectories that are dynamically feasible, collision-free, and satisfy state or actuator constraints. Modern trajectory planning techniques primarily use numerical optimization, as they enable the systematic computation of high-quality, expressive trajectories that satisfy various constraints. However, stringent requirements on computation time and the risk of numerical instability can limit the use of optimization-based planners in safety-critical scenarios. This work presents an optimization-free planning framework called STITCHER that stitches short trajectory segments together with graph search to compute long-range, expressive, and near-optimal trajectories in real-time. STITCHER outperforms modern optimization-based planners through our innovative planning architecture and several algorithmic developments that make real-time planning possible. Extensive simulation testing is performed to analyze the algorithmic components that make up STITCHER, along with a thorough comparison with two state-of-the-art optimization planners. Simulation tests show that safe trajectories can be created within a few milliseconds for paths that span the entirety of two 50 m x 50 m environments. Hardware tests with a custom quadrotor verify that STITCHER can produce trackable paths in real-time while respecting nonconvex constraints, such as limits on tilt angle and motor forces, which are otherwise hard to include in optimization-based planners.
SADCHER: Scheduling using Attention-based Dynamic Coalitions of Heterogeneous Robots in Real-Time
We present Sadcher, a real-time task assignment framework for heterogeneous multi-robot teams that incorporates dynamic coalition formation and task precedence constraints. Sadcher is trained through Imitation Learning and combines graph attention and transformers to predict assignment rewards between robots and tasks. Based on the predicted rewards, a relaxed bipartite matching step generates high-quality schedules with feasibility guarantees. We explicitly model robot and task positions, task durations, and robots' remaining processing times, enabling advanced temporal and spatial reasoning and generalization to environments with different spatiotemporal distributions compared to training. Trained on optimally solved small-scale instances, our method can scale to larger task sets and team sizes. Sadcher outperforms other learning-based and heuristic baselines on randomized, unseen problems for small and medium-sized teams with computation times suitable for real-time operation. We also explore sampling-based variants and evaluate scalability across robot and task counts. In addition, we release our dataset of 250,000 optimal schedules: https://autonomousrobots.nl/paper_websites/sadcher_MRTA/
comment: 7 pages, 5 figures. 2025 IEEE Int. Symposium on Multi-Robot and Multi-Agent Systems (MRS 2025). Website and Code: https://autonomousrobots.nl/paper_websites/sadcher_MRTA/
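The step from predicted rewards to a schedule can be illustrated with a simplified one-to-one assignment. This sketch uses exhaustive search over a tiny, made-up reward matrix; Sadcher itself uses a relaxed bipartite matching that also handles coalitions and precedence constraints, none of which is modeled here. The function name and values are hypothetical.

```python
from itertools import permutations

def best_assignment(rewards):
    """Return the one-to-one robot-to-task assignment with maximum total reward.

    rewards[r][t] is the predicted reward for robot r taking task t.
    Exhaustive search, so only suitable for very small instances, and it
    requires at least as many tasks as robots.
    """
    n_robots, n_tasks = len(rewards), len(rewards[0])
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n_tasks), n_robots):
        total = sum(rewards[r][t] for r, t in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_perm, best_total

# Hypothetical predicted rewards for 3 robots and 3 tasks.
rewards = [
    [0.9, 0.1, 0.3],
    [0.8, 0.7, 0.2],
    [0.2, 0.4, 0.6],
]
assignment, total = best_assignment(rewards)
print(assignment, total)
```

For realistic team sizes a polynomial-time solver (for example the Hungarian algorithm) would replace the exhaustive search.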
Multi Agent Switching Mode Controller for Sound Source localization
Source seeking is an important topic in robotic research, especially considering sound-based sensors since they allow the agents to locate a target even in critical conditions where it is not possible to establish a direct line of sight. In this work, we design a multi-agent switching mode control strategy for acoustic-based target localization. Two scenarios are considered: single source localization, in which the agents are driven maintaining a rigid formation towards the target, and multi-source scenario, in which each agent searches for the targets independently from the others.
QDepth-VLA: Quantized Depth Prediction as Auxiliary Supervision for Vision-Language-Action Models
Spatial perception and reasoning are crucial for Vision-Language-Action (VLA) models to accomplish fine-grained manipulation tasks. However, existing approaches often lack the ability to understand and reason over the essential 3D structures necessary for precise control. To address this limitation, we propose QDepth-VLA, a general framework that augments VLA models with an auxiliary depth prediction task. A dedicated depth expert is designed to predict quantized latent tokens of depth maps obtained from a VQ-VAE encoder, enabling the model to learn depth-aware representations that capture critical geometric cues. Experimental results on the simulation benchmarks and real-world tasks demonstrate that QDepth-VLA yields strong spatial reasoning and competitive performance on manipulation tasks.
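The quantized latent tokens mentioned above come from a vector-quantization lookup, which can be sketched as a nearest-neighbor search in a codebook. The sketch assumes a tiny two-dimensional codebook with made-up values; a real VQ-VAE encoder learns both the latents and the codebook, and the function names here are hypothetical.

```python
def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector` in squared Euclidean distance."""
    def sq_dist(code):
        return sum((v - c) ** 2 for v, c in zip(vector, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))

def quantize(latents, codebook):
    """Map each continuous latent vector to a discrete token index."""
    return [nearest_code(v, codebook) for v in latents]

# Hypothetical 3-entry codebook and three encoder outputs for depth-map patches.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
latents = [[0.1, -0.1], [0.9, 1.2], [1.8, 0.2]]
tokens = quantize(latents, codebook)
print(tokens)
```

Predicting token indices turns depth supervision into a classification target over the codebook rather than a per-pixel regression.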
RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning
Real-world robotic manipulation in homes and factories demands reliability, efficiency, and robustness that approach or surpass skilled human operators. We present RL-100, a real-world reinforcement learning training framework built on diffusion visuomotor policies trained by supervised learning. RL-100 introduces a three-stage pipeline. First, imitation learning leverages human priors. Second, iterative offline reinforcement learning uses an Offline Policy Evaluation procedure, abbreviated OPE, to gate PPO-style updates that are applied in the denoising process for conservative and reliable improvement. Third, online reinforcement learning eliminates residual failure modes. An additional lightweight consistency distillation head compresses the multi-step sampling process in diffusion into a single-step policy, enabling high-frequency control with an order-of-magnitude reduction in latency while preserving task performance. The framework is task-, embodiment-, and representation-agnostic and supports both 3D point clouds and 2D RGB inputs, a variety of robot platforms, and both single-step and action-chunk policies. We evaluate RL-100 on seven real-robot tasks spanning dynamic rigid-body control, such as Push-T and Agile Bowling, fluids and granular pouring, deformable cloth folding, precise dexterous unscrewing, and multi-stage orange juicing. RL-100 attains 100% success across evaluated trials for a total of 900 out of 900 episodes, including up to 250 out of 250 consecutive trials on one task. The method achieves near-human teleoperation or better time efficiency and demonstrates multi-hour robustness with uninterrupted operation lasting up to two hours.
comment: https://lei-kun.github.io/RL-100/
RoboGPT-R1: Enhancing Robot Planning with Reinforcement Learning
Improving the reasoning capabilities of embodied agents is crucial for robots to complete complex human instructions in long-view manipulation tasks successfully. Despite the success of large language models and vision language models based on Supervised Fine-Tuning (SFT) in planning tasks, they continue to face challenges in performing long-horizon manipulation tasks in complex real-world environments, owing to their restricted common sense and reasoning capabilities. Considering that aligning general-purpose vision language models to robotic planning tasks via supervised fine-tuning suffers from poor generalization and insufficient physical understanding, we propose RoboGPT-R1, a two-stage fine-tuning framework for embodied planning. In this framework, supervised training acquires foundational knowledge through expert sequences, followed by RL to address the model's shortcomings in visual-spatial understanding and reasoning. To achieve physical understanding and action sequence consistency in multi-step reasoning tasks, we design a rule-based reward function that simultaneously considers long-horizon performance and action constraints in the environment. The reasoning model, trained on Qwen2.5-VL-3B, significantly outperforms the larger-scale model, GPT-4o-mini, by 21.33% and surpasses other work trained on Qwen2.5-VL-7B by 20.33% on the EmbodiedBench benchmark.
Neural Implicit Flow Fields for Spatio-Temporal Motion Mapping
Safe and efficient robot operation in complex human environments can benefit from good models of site-specific motion patterns. Maps of Dynamics (MoDs) provide such models by encoding statistical motion patterns in a map, but existing representations use discrete spatial sampling and typically require costly offline construction. We propose a continuous spatio-temporal MoD representation based on implicit neural functions that directly map coordinates to the parameters of a Semi-Wrapped Gaussian Mixture Model. This removes the need for discretization and imputation for unevenly sampled regions, enabling smooth generalization across both space and time. Evaluated on a large public dataset with long-term real-world people tracking data, our method achieves better accuracy of motion representation and smoother velocity distributions in sparse regions while still being computationally efficient, compared to available baselines. The proposed approach demonstrates a powerful and efficient way of modeling complex human motion patterns.
SkyDreamer: Interpretable End-to-End Vision-Based Drone Racing with Model-Based Reinforcement Learning
Autonomous drone racing (ADR) systems have recently achieved champion-level performance, yet remain highly specific to drone racing. While end-to-end vision-based methods promise broader applicability, no system to date simultaneously achieves full sim-to-real transfer, onboard execution, and champion-level performance. In this work, we present SkyDreamer, to the best of our knowledge, the first end-to-end vision-based ADR policy that maps directly from pixel-level representations to motor commands. SkyDreamer builds on informed Dreamer, a model-based reinforcement learning approach where the world model decodes to privileged information only available during training. By extending this concept to end-to-end vision-based ADR, the world model effectively functions as an implicit state and parameter estimator, greatly improving interpretability. SkyDreamer runs fully onboard without external aid, resolves visual ambiguities by tracking progress using the state decoded from the world model's hidden state, and requires no extrinsic camera calibration, enabling rapid deployment across different drones without retraining. Real-world experiments show that SkyDreamer achieves robust, high-speed flight, executing tight maneuvers such as an inverted loop, a split-S and a ladder, reaching speeds of up to 21 m/s and accelerations of up to 6 g. It further demonstrates a non-trivial visual sim-to-real transfer by operating on poor-quality segmentation masks, and exhibits robustness to battery depletion by accurately estimating the maximum attainable motor RPM and adjusting its flight path in real-time. These results highlight SkyDreamer's adaptability to important aspects of the reality gap, bringing robustness while still achieving extremely high-speed, agile flight.
Open TeleDex: A Hardware-Agnostic Teleoperation System for Imitation Learning based Dexterous Manipulation
Accurate and high-fidelity demonstration data acquisition is a critical bottleneck for deploying robot Imitation Learning (IL) systems, particularly when dealing with heterogeneous robotic platforms. Existing teleoperation systems often fail to guarantee high-precision data collection across diverse types of teleoperation devices. To address this, we developed Open TeleDex, a unified teleoperation framework engineered for demonstration data collection. Open TeleDex specifically tackles the TripleAny challenge, seamlessly supporting any robotic arm, any dexterous hand, and any external input device. Furthermore, we propose a novel hand pose retargeting algorithm that significantly boosts the interoperability of Open TeleDex, enabling robust and accurate compatibility with an even wider spectrum of heterogeneous master and slave equipment. Open TeleDex establishes a foundational, high-quality, and publicly available platform for accelerating both academic research and industry development in complex robotic manipulation and IL.
comment: 17 pages
Leveraging Neural Descriptor Fields for Learning Contact-Aware Dynamic Recovery
Real-world dexterous manipulation often encounters unexpected errors and disturbances, which can lead to catastrophic failures, such as dropping the manipulated object. To address this challenge, we focus on the problem of catching a falling object while it remains within grasping range and, importantly, resetting the system to a configuration favorable for resuming the primary manipulation task. We propose Contact-Aware Dynamic Recovery (CADRE), a reinforcement learning framework that incorporates a Neural Descriptor Field (NDF)-inspired module to extract implicit contact features. Compared to methods that rely solely on object pose or point cloud input, NDFs can directly reason about finger-object correspondence and adapt to different object geometries. Our experiments show that incorporating contact features improves training efficiency, enhances convergence performance for RL training, and ultimately leads to more successful recoveries. Additionally, we demonstrate that CADRE can generalize zero-shot to unseen objects with different geometries.
When Planners Meet Reality: How Learned, Reactive Traffic Agents Shift nuPlan Benchmarks
Planner evaluation in closed-loop simulation often uses rule-based traffic agents, whose simplistic and passive behavior can hide planner deficiencies and bias rankings. Widely used IDM agents simply follow a lead vehicle and cannot react to vehicles in adjacent lanes, hindering tests of complex interaction capabilities. We address this issue by integrating the state-of-the-art learned traffic agent model SMART into nuPlan. Thus, we are the first to evaluate planners under more realistic conditions and quantify how conclusions shift when narrowing the sim-to-real gap. Our analysis covers 14 recent planners and established baselines and shows that IDM-based simulation overestimates planning performance: nearly all scores deteriorate. In contrast, many planners interact better than previously assumed and even improve in multi-lane, interaction-heavy scenarios like lane changes or turns. Methods trained in closed-loop demonstrate the best and most stable driving performance. However, when reaching their limits in augmented edge-case scenarios, all learned planners degrade abruptly, whereas rule-based planners maintain reasonable basic behavior. Based on our results, we suggest SMART-reactive simulation as a new standard closed-loop benchmark in nuPlan and release the SMART agents as a drop-in alternative to IDM at https://github.com/shgd95/InteractiveClosedLoop.
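To make the limitation of IDM agents concrete, the following is a minimal sketch of the standard Intelligent Driver Model acceleration law. It is not taken from the nuPlan or SMART code; the function name and all parameter values (`v0`, `T`, `a_max`, `b`, `s0`, `delta`) are illustrative assumptions. The inputs are the ego speed, the gap to the lead vehicle, and the closing speed only, which is why such an agent cannot react to vehicles in adjacent lanes.

```python
import math


def idm_acceleration(v, gap, dv, v0=15.0, T=1.5, a_max=1.0, b=2.0, s0=2.0, delta=4.0):
    """Standard IDM longitudinal acceleration (illustrative parameter values).

    v   : own speed in m/s
    gap : bumper-to-bumper distance to the lead vehicle in m (must be > 0)
    dv  : approach rate, own speed minus lead speed, in m/s
    """
    # Desired dynamic gap: standstill distance plus time-headway and braking terms.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    # Free-road term (v / v0) and interaction term (s_star / gap).
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)
```

With a very large gap the agent simply accelerates toward `v0`; with a small gap and a positive closing speed the interaction term dominates and the agent brakes.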
Requirement Identification for Traffic Simulations in Driving Simulators
This paper addresses the challenge of ensuring realistic traffic conditions by proposing a methodology that systematically identifies traffic simulation requirements. Using a structured approach based on sub-goals in each study phase, specific technical needs are derived for microscopic levels, agent models, and visual representation. The methodology aims to maintain a high degree of fidelity, enhancing both the validity of experimental outcomes and participant engagement. By providing a clear link between study objectives and traffic simulation design, this approach supports robust automotive development and testing.
comment: 2 Pages, 1 figure
Spatially anchored Tactile Awareness for Robust Dexterous Manipulation
Dexterous manipulation requires precise geometric reasoning, yet existing visuo-tactile learning methods struggle with sub-millimeter precision tasks that are routine for traditional model-based approaches. We identify a key limitation: while tactile sensors provide rich contact information, current learning frameworks fail to effectively leverage both the perceptual richness of tactile signals and their spatial relationship with hand kinematics. We believe an ideal tactile representation should explicitly ground contact measurements in a stable reference frame while preserving detailed sensory information, enabling policies to not only detect contact occurrence but also precisely infer object geometry in the hand's coordinate system. We introduce SaTA (Spatially-anchored Tactile Awareness for dexterous manipulation), an end-to-end policy framework that explicitly anchors tactile features to the hand's kinematic frame through forward kinematics, enabling accurate geometric reasoning without requiring object models or explicit pose estimation. Our key insight is that spatially grounded tactile representations allow policies to not only detect contact occurrence but also precisely infer object geometry in the hand's coordinate system. We validate SaTA on challenging dexterous manipulation tasks, including bimanual USB-C mating in free space, a task demanding sub-millimeter alignment precision, as well as light bulb installation requiring precise thread engagement and rotational control, and card sliding that demands delicate force modulation and angular precision. These tasks represent significant challenges for learning-based methods due to their stringent precision requirements. Across multiple benchmarks, SaTA significantly outperforms strong visuo-tactile baselines, improving success rates by up to 30% while reducing task completion times by 27%.
comment: 8 pages
Generative Models From and For Sampling-Based MPC: A Bootstrapped Approach For Adaptive Contact-Rich Manipulation
We present a generative predictive control (GPC) framework that amortizes sampling-based Model Predictive Control (SPC) by bootstrapping it with conditional flow-matching models trained on SPC control sequences collected in simulation. Unlike prior work relying on iterative refinement or gradient-based solvers, we show that meaningful proposal distributions can be learned directly from noisy SPC data, enabling more efficient and informed sampling during online planning. We further demonstrate, for the first time, the application of this approach to real-world contact-rich loco-manipulation with a quadruped robot. Extensive experiments in simulation and on hardware show that our method improves sample efficiency, reduces planning horizon requirements, and generalizes robustly across task variations.
comment: 9 pages, 5 figures
GOPLA: Generalizable Object Placement Learning via Synthetic Augmentation of Human Arrangement
Robots are expected to serve as intelligent assistants, helping humans with everyday household organization. A central challenge in this setting is the task of object placement, which requires reasoning about both semantic preferences (e.g., common-sense object relations) and geometric feasibility (e.g., collision avoidance). We present GOPLA, a hierarchical framework that learns generalizable object placement from augmented human demonstrations. A multi-modal large language model translates human instructions and visual inputs into structured plans that specify pairwise object relationships. These plans are then converted into 3D affordance maps with geometric common sense by a spatial mapper, while a diffusion-based planner generates placement poses guided by test-time costs, considering multi-plan distributions and collision avoidance. To overcome data scarcity, we introduce a scalable pipeline that expands human placement demonstrations into diverse synthetic training data. Extensive experiments show that our approach improves placement success rates by 30.04 percentage points over the runner-up, evaluated on positioning accuracy and physical plausibility, demonstrating strong generalization across a wide range of real-world robotic placement scenarios.
Accelerated Multi-Modal Motion Planning Using Context-Conditioned Diffusion Models
Classical methods in robot motion planning, such as sampling-based and optimization-based methods, often struggle with scalability towards higher-dimensional state spaces and complex environments. Diffusion models, known for their capability to learn complex, high-dimensional and multi-modal data distributions, provide a promising alternative when applied to motion planning problems and have already shown interesting results. However, most of the current approaches train their model for a single environment, limiting their generalization to environments not seen during training. The techniques that do train a model for multiple environments rely on a specific camera to provide the model with the necessary environmental information and therefore always require that sensor. To effectively adapt to diverse scenarios without the need for retraining, this research proposes Context-Aware Motion Planning Diffusion (CAMPD). CAMPD leverages a classifier-free denoising probabilistic diffusion model, conditioned on sensor-agnostic contextual information. An attention mechanism, integrated in the well-known U-Net architecture, conditions the model on an arbitrary number of contextual parameters. CAMPD is evaluated on a 7-DoF robot manipulator and benchmarked against state-of-the-art approaches on real-world tasks, showing its ability to generalize to unseen environments and generate high-quality, multi-modal trajectories, at a fraction of the time required by existing methods.
comment: This paper has been submitted and has not yet been peer reviewed or accepted for publication
Proprioceptive Image: An Image Representation of Proprioceptive Data from Quadruped Robots for Contact Estimation Learning
This paper presents a novel approach for representing proprioceptive time-series data from quadruped robots as structured two-dimensional images, enabling the use of convolutional neural networks for learning locomotion-related tasks. The proposed method encodes temporal dynamics from multiple proprioceptive signals, such as joint positions, IMU readings, and foot velocities, while preserving the robot's morphological structure in the spatial arrangement of the image. This transformation captures inter-signal correlations and gait-dependent patterns, providing a richer feature space than direct time-series processing. We apply this concept to the problem of contact estimation, a key capability for stable and adaptive locomotion on diverse terrains. Experimental evaluations on both real-world datasets and simulated environments show that our image-based representation consistently enhances prediction accuracy and generalization over conventional sequence-based models, underscoring the potential of cross-modal encoding strategies for robotic state learning. Our method achieves superior performance on the contact dataset, improving contact state accuracy from 87.7% to 94.5% over the recently proposed MI-HGNN method, while using a window that is 15 times shorter.
A Generalized Placeability Metric for Model-Free Unified Pick-and-Place Reasoning
Reliably picking and placing unknown objects under real-world sensing noise remains a challenging task, as existing methods rely on strong object priors (e.g., CAD models) or planar-support assumptions, limiting generalization and unified reasoning between grasping and placing. In this work, we introduce a generalized placeability metric that evaluates placement poses directly from noisy point clouds, without any shape priors. The metric jointly scores stability, graspability, and clearance. From raw geometry, we extract the support surfaces of the object to generate diverse candidates for multi-orientation placement and sample contacts that satisfy collision and stability constraints. By conditioning grasp scores on each candidate placement, our proposed method enables model-free unified pick-and-place reasoning and selects grasp-place pairs that lead to stable, collision-free placements. On unseen real objects and non-planar object supports, our metric delivers CAD-comparable accuracy in predicting stability loss and generally produces more physically plausible placements than learning-based predictors.
QuASH: Using Natural-Language Heuristics to Query Visual-Language Robotic Maps ICRA 2026
Embeddings from Visual-Language Models are increasingly utilized to represent semantics in robotic maps, offering an open-vocabulary scene understanding that surpasses traditional, limited labels. Embeddings enable on-demand querying by comparing embedded user text prompts to map embeddings via a similarity metric. The key challenge in performing the task indicated in a query is that the robot must determine the parts of the environment relevant to the query. This paper proposes a solution to this challenge. We leverage natural-language synonyms and antonyms associated with the query within the embedding space, applying heuristics to estimate the language space relevant to the query, and use that to train a classifier to partition the environment into matches and non-matches. We evaluate our method through extensive experiments, querying both maps and standard image benchmarks. The results demonstrate increased queryability of maps and images. Our querying technique is agnostic to the representation and encoder used, and requires limited training.
comment: Submitted to ICRA 2026
Stability Criteria and Motor Performance in Delayed Haptic Dyadic Interactions Mediated by Robots
This paper establishes analytical stability criteria for robot-mediated human-human (dyadic) interaction systems, focusing on haptic communication under network-induced time delays. Through frequency-domain analysis supported by numerical simulations, we identify both delay-independent and delay-dependent stability criteria. The delay-independent criterion guarantees stability irrespective of the delay, whereas the delay-dependent criterion is characterised by a maximum tolerable delay before instability occurs. The criteria demonstrate dependence on controller and robot dynamic parameters, where increasing stiffness reduces the maximum tolerable delay in a non-linear manner, thereby heightening system vulnerability. The proposed criteria can be generalised to a wide range of robot-mediated interactions and serve as design guidelines for stable remote dyadic systems. Experiments with robots performing human-like movements further illustrate the correlation between stability and motor performance. The findings of this paper suggest the prerequisites for effective delay-compensation strategies.
Restoring Noisy Demonstration for Imitation Learning With Diffusion Models
Imitation learning (IL) aims to learn a policy from expert demonstrations and has been applied to various applications. By learning from the expert policy, IL methods do not require environmental interactions or reward signals. However, most existing imitation learning algorithms assume perfect expert demonstrations, whereas real demonstrations often contain imperfections caused by errors from human experts or sensor/control system inaccuracies. To address this problem, this work proposes a filter-and-restore framework to best leverage expert demonstrations with inherent noise. Our proposed method first filters clean samples from the demonstrations and then learns conditional diffusion models to recover the noisy ones. We evaluate our proposed framework and existing methods in various domains, including robot arm manipulation, dexterous manipulation, and locomotion. The experimental results show that our proposed framework consistently outperforms existing methods across all the tasks. Ablation studies further validate the effectiveness of each component and demonstrate the framework's robustness to different noise types and levels. These results confirm the practical applicability of our framework to noisy offline demonstration data.
comment: Published in IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
Towards Adaptable Humanoid Control via Adaptive Motion Tracking
Humanoid robots are envisioned to adapt demonstrated motions to diverse real-world conditions while accurately preserving motion patterns. Existing motion prior approaches enable good adaptability with a few motions but often sacrifice imitation accuracy, whereas motion-tracking methods achieve accurate imitation yet require many training motions and a test-time target motion to adapt. To combine their strengths, we introduce AdaMimic, a novel motion tracking algorithm that enables adaptable humanoid control from a single reference motion. To reduce data dependence while ensuring adaptability, our method first creates an augmented dataset by sparsifying the single reference motion into keyframes and applying light editing with minimal physical assumptions. A policy is then initialized by tracking these sparse keyframes to generate dense intermediate motions, and adapters are subsequently trained to adjust tracking speed and refine low-level actions based on the adjustment, enabling flexible time warping that further improves imitation accuracy and adaptability. We validate these significant improvements both in simulation and on the real-world Unitree G1 humanoid robot, in multiple tasks across a wide range of adaptation conditions. Videos and code are available at https://taohuang13.github.io/adamimic.github.io/.
comment: 9 pages
RoboANKLE: Design, Development, and Functional Evaluation of a Robotic Ankle with a Motorized Compliant Unit
This study presents a powered transtibial prosthesis with complete push-off assistance, RoboANKLE. The design aims to fulfill specific requirements, such as a sufficient range of motion (RoM) while providing the necessary torque for achieving natural ankle motion in daily activities. Addressing the challenges faced in designing active transtibial prostheses, such as maintaining energetic autonomy and minimizing weight, is vital for the study. With this aim, we try to imitate the human ankle by providing extensive push-off assistance to achieve a natural-like torque profile. Thus, an Energy Store and Extended Release (ESER) mechanism is employed together with a novel Extra Energy Storage (EES) mechanism. Kinematic and kinetic analyses are carried out to determine the design parameters and assess the design performance. Subsequently, a Computer-Aided Design (CAD) model is built and used in comprehensive dynamic and structural analyses. These analyses are used to evaluate the design performance and to determine the forces and torques applied to the prosthesis, which aids in optimizing the design for minimal weight via structural analysis and topology optimization. The design of the prototype is then finalized and manufactured for experimental evaluation to validate the design and functionality. The prototype is realized with a mass of 1.92 kg and dimensions of 261x107x420 mm. Functional evaluations of the RoboANKLE revealed that it is capable of achieving the natural maximum dorsi-flexion angle with 95% accuracy. Also, thanks to the implemented mechanisms, the results show that RoboANKLE can generate 57% more torque than is required for natural walking. The power generation capacity of the RoboANKLE is 10% higher than the natural power during the gait cycle.
SUM-AgriVLN: Spatial Understanding Memory for Agricultural Vision-and-Language Navigation
Agricultural robots are emerging as powerful assistants across a wide range of agricultural tasks, yet they still rely heavily on manual operation or fixed rail systems for movement. The AgriVLN method and the A2A benchmark pioneered the extension of Vision-and-Language Navigation (VLN) to the agricultural domain, enabling robots to navigate to target positions by following natural language instructions. In practical agricultural scenarios, navigation instructions often recur, yet AgriVLN treats each instruction as an independent episode, overlooking the potential of past experiences to provide spatial context for subsequent ones. To bridge this gap, we propose Spatial Understanding Memory for Agricultural Vision-and-Language Navigation (SUM-AgriVLN), in which the SUM module performs spatial understanding and saves spatial memory through 3D reconstruction and representation. When evaluated on the A2A benchmark, SUM-AgriVLN improves Success Rate from 0.47 to 0.54 with a slight sacrifice in Navigation Error from 2.91m to 2.93m, demonstrating state-of-the-art performance in the agricultural domain. Code: https://github.com/AlexTraveling/SUM-AgriVLN.
Leveraging Cycle-Consistent Anchor Points for Self-Supervised RGB-D Registration ICRA 2024
With the rise in consumer depth cameras, a wealth of unlabeled RGB-D data has become available. This prompts the question of how to utilize this data for geometric reasoning of scenes. While many RGB-D registration methods rely on geometric and feature-based similarity, we take a different approach. We use cycle-consistent keypoints as salient points to enforce spatial coherence constraints during matching, improving correspondence accuracy. Additionally, we introduce a novel pose block that combines a GRU recurrent unit with transformation synchronization, blending historical and multi-view data. Our approach surpasses previous self-supervised registration methods on ScanNet and 3DMatch, even outperforming some older supervised methods. We also integrate our components into existing methods, showing their effectiveness.
comment: 8 pages, accepted at ICRA 2024 (International Conference on Robotics and Automation)
Risk-Aware Reinforcement Learning with Bandit-Based Adaptation for Quadrupedal Locomotion
In this work, we study risk-aware reinforcement learning for quadrupedal locomotion. Our approach trains a family of risk-conditioned policies using a Conditional Value-at-Risk (CVaR) constrained policy optimization technique that provides improved stability and sample efficiency. At deployment, we adaptively select the best performing policy from the family of policies using a multi-armed bandit framework that uses only observed episodic returns, without any privileged environment information, and adapts to unknown conditions on the fly. Hence, we train quadrupedal locomotion policies at various levels of robustness using CVaR and adaptively select the desired level of robustness online to ensure performance in unknown environments. We evaluate our method in simulation across eight unseen settings (by changing dynamics, contacts, sensing noise, and terrain) and on a Unitree Go2 robot in previously unseen terrains. Our risk-aware policy attains nearly twice the mean and tail performance in unseen environments compared to other baselines and our bandit-based adaptation selects the best-performing risk-aware policy in unknown terrain within two minutes of operation.
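The two ingredients named above, a CVaR tail statistic and a bandit that picks among risk-conditioned policies from episodic returns alone, can be sketched as follows. This is a hedged illustration, not the authors' implementation: the abstract does not say which bandit algorithm is used, so UCB1 is an assumption here, and `cvar`, `UCB1`, and the exploration constant `c` are hypothetical names and values. UCB1's bonus also assumes returns scaled to roughly [0, 1].

```python
import math


def cvar(returns, alpha=0.1):
    """Empirical CVaR: mean of the worst alpha-fraction of returns (lower tail)."""
    ordered = sorted(returns)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k


class UCB1:
    """UCB1 bandit; each arm stands for one risk-conditioned policy (assumed setup)."""

    def __init__(self, n_arms, c=2.0):
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms
        self.c = c

    def select(self):
        # Try every arm once before using confidence bounds.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)
        scores = [
            m + math.sqrt(self.c * math.log(total) / n)
            for m, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, episodic_return):
        # Incremental running mean of the observed episodic return.
        self.counts[arm] += 1
        self.means[arm] += (episodic_return - self.means[arm]) / self.counts[arm]
```

In use, each deployment episode would call `select()`, run the chosen policy, and feed the observed return to `update()`; no privileged environment information enters the loop.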
Expertise need not monopolize: Action-Specialized Mixture of Experts for Vision-Language-Action Learning
Vision-Language-Action (VLA) models are experiencing rapid development and demonstrating promising capabilities in robotic manipulation tasks. However, scaling up VLA models presents several critical challenges: (1) Training new VLA models from scratch demands substantial computational resources and extensive datasets. Given the current scarcity of robot data, it becomes particularly valuable to fully leverage well-pretrained VLA model weights during the scaling process. (2) Real-time control requires carefully balancing model capacity with computational efficiency. To address these challenges, we propose AdaMoE, a Mixture-of-Experts (MoE) architecture that inherits pretrained weights from dense VLA models and scales up the action expert by replacing the feedforward layers with sparsely activated MoE layers. AdaMoE employs a decoupling technique that separates expert selection from expert weighting through an independent scale adapter working alongside the traditional router. This enables experts to be selected based on task relevance while contributing with independently controlled weights, allowing collaborative expert utilization rather than winner-takes-all dynamics. Our approach demonstrates that expertise need not monopolize. Instead, through collaborative expert utilization, we can achieve superior performance while maintaining computational efficiency. AdaMoE consistently outperforms the baseline model across key benchmarks, delivering performance gains of 1.8% on LIBERO and 9.3% on RoboTwin. Most importantly, a substantial 21.5% improvement in real-world experiments validates its practical effectiveness for robotic manipulation tasks.
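One possible reading of "decoupling expert selection from expert weighting" is sketched below. The abstract does not give the actual combination rule, so everything here is an assumption for illustration: the router logits decide which experts are active (top-k), while a separate set of scale logits, passed through a sigmoid, decides how much each active expert contributes. The names `decoupled_moe_weights` and `moe_output` are hypothetical.

```python
import math


def decoupled_moe_weights(router_logits, scale_logits, k=2):
    """Pick experts by router score, weight them by an independent scale score.

    Returns {expert_index: weight} for the k selected experts.
    """
    # Selection: top-k experts according to the router only.
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    selected = ranked[:k]
    # Weighting: sigmoid of the scale adapter's logit, independent of router rank
    # (assumed form; weights are not forced to sum to one).
    return {i: 1.0 / (1.0 + math.exp(-scale_logits[i])) for i in selected}


def moe_output(x, experts, router_logits, scale_logits, k=2):
    """Weighted sum of the selected experts' scalar outputs for input x."""
    weights = decoupled_moe_weights(router_logits, scale_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())
```

Under this sketch, an expert ranked first by the router can still contribute less than the second-ranked one, which is the kind of non-winner-takes-all behavior the abstract describes.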
Learning Human-Humanoid Coordination for Collaborative Object Carrying
Human-humanoid collaboration shows significant promise for applications in healthcare, domestic assistance, and manufacturing. While compliant robot-human collaboration has been extensively developed for robotic arms, enabling compliant human-humanoid collaboration remains largely unexplored due to humanoids' complex whole-body dynamics. In this paper, we propose a proprioception-only reinforcement learning approach, COLA, that combines leader and follower behaviors within a single policy. The model is trained in a closed-loop environment with dynamic object interactions to predict object motion patterns and human intentions implicitly, enabling compliant collaboration to maintain load balance through coordinated trajectory planning. We evaluate our approach through comprehensive simulator and real-world experiments on collaborative carrying tasks, demonstrating the effectiveness, generalization, and robustness of our model across various terrains and objects. Simulation experiments demonstrate that our model reduces human effort by 24.7% compared to baseline approaches while maintaining object stability. Real-world experiments validate robust collaborative carrying across different object types (boxes, desks, stretchers, etc.) and movement patterns (straight-line, turning, slope climbing). Human user studies with 23 participants confirm an average improvement of 27.4% compared to baseline models. Our method enables compliant human-humanoid collaborative carrying without requiring external sensors or complex interaction models, offering a practical solution for real-world deployment.
Prescribed Performance Control of Deformable Object Manipulation in Spatial Latent Space
Manipulating three-dimensional (3D) deformable objects presents significant challenges for robotic systems due to their infinite-dimensional state space and complex deformable dynamics. This paper proposes a novel model-free approach for shape control with constraints imposed on key points. Unlike existing methods that rely on feature dimensionality reduction, the proposed controller leverages the coordinates of key points as the feature vector, which are extracted from the deformable object's point cloud using deep learning methods. This approach not only reduces the dimensionality of the feature space but also retains the spatial information of the object. By extracting key points, the manipulation of deformable objects is simplified into a visual servoing problem, where the shape dynamics are described using a deformation Jacobian matrix. To enhance control accuracy, a prescribed performance control method is developed by integrating barrier Lyapunov functions (BLF) to enforce constraints on the key points. The stability of the closed-loop system is rigorously analyzed and verified using the Lyapunov method. Experimental results further demonstrate the effectiveness and robustness of the proposed method.
Lagrange-Poincaré-Kepler Equations of Disturbed Space-Manipulator Systems in Orbit
This article presents an extension of the Lagrange-Poincaré Equations (LPE) to model the dynamics of spacecraft-manipulator systems operating within a non-inertial orbital reference frame. Building upon prior formulations of LPE for vehicle-manipulator systems, the proposed framework, termed the Lagrange-Poincaré-Kepler Equations (LPKE), incorporates the coupling between spacecraft attitude dynamics, orbital motion, and manipulator kinematics. The formalism combines the Euler-Poincaré equations for the base spacecraft, Keplerian orbital dynamics for the reference frame, and reduced Euler-Lagrange equations for the manipulator's shape space, using an exponential joint parametrization. Leveraging the Lagrange-d'Alembert principle on principal bundles, we derive novel closed-form structural matrices that explicitly capture the effects of orbital disturbances and their dynamic coupling with the manipulator system. The LPKE framework also systematically includes externally applied, symmetry-breaking wrenches, allowing for immediate integration into hardware-in-the-loop simulations and model-based control architectures for autonomous robotic operations in the orbital environment. To illustrate the effectiveness of the proposed model and its numerical superiority, we present a simulation study analyzing orbital effects on a 7-degree-of-freedom manipulator mounted on a spacecraft.
RM-RL: Role-Model Reinforcement Learning for Precise Robot Manipulation
Precise robot manipulation is critical for fine-grained applications such as chemical and biological experiments, where even small errors (e.g., reagent spillage) can invalidate an entire task. Existing approaches often rely on pre-collected expert demonstrations and train policies via imitation learning (IL) or offline reinforcement learning (RL). However, obtaining high-quality demonstrations for precision tasks is difficult and time-consuming, while offline RL commonly suffers from distribution shifts and low data efficiency. We introduce a Role-Model Reinforcement Learning (RM-RL) framework that unifies online and offline training in real-world environments. The key idea is a role-model strategy that automatically generates labels for online training data using approximately optimal actions, eliminating the need for human demonstrations. RM-RL reformulates policy learning as supervised training, reducing instability from distribution mismatch and improving efficiency. A hybrid training scheme further leverages online role-model data for offline reuse, enhancing data efficiency through repeated sampling. Extensive experiments show that RM-RL converges faster and more stably than existing RL methods, yielding significant gains in real-world manipulation: 53% improvement in translation accuracy and 20% in rotation accuracy. Finally, we demonstrate the successful execution of a challenging task, precisely placing a cell plate onto a shelf, highlighting the framework's effectiveness where prior methods fail.
Autonomous Reactive Masonry Construction using Collaborative Heterogeneous Aerial Robots with Experimental Demonstration
This article presents a fully autonomous aerial masonry construction framework using heterogeneous unmanned aerial vehicles (UAVs), supported by experimental validation. Two specialized UAVs were developed for the task: (i) a brick-carrier UAV equipped with a ball-joint actuation mechanism for precise brick manipulation, and (ii) an adhesion UAV integrating a servo-controlled valve and extruder nozzle for accurate adhesion application. The proposed framework employs a reactive mission planning unit that combines a dependency graph of the construction layout with a conflict graph to manage simultaneous task execution, while hierarchical state machines ensure robust operation and safe transitions during task execution. Dynamic task allocation allows real-time adaptation to environmental feedback, while minimum-jerk trajectory generation ensures smooth and precise UAV motion during brick pickup and placement. Additionally, the brick-carrier UAV employs an onboard vision system that estimates brick poses in real time using ArUco markers and a least-squares optimization filter, enabling accurate alignment during construction. To the best of the authors' knowledge, this work represents the first experimental demonstration of fully autonomous aerial masonry construction using heterogeneous UAVs, where one UAV precisely places the bricks while another autonomously applies adhesion material between them. The experimental results supported by the video showcase the effectiveness of the proposed framework and demonstrate its potential to serve as a foundation for future developments in autonomous aerial robotic construction.
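As an illustration of the minimum-jerk trajectory generation mentioned above, the classic rest-to-rest closed form for a single axis is sketched here. It assumes zero velocity and acceleration at both endpoints; the article's generator may differ (for example, by handling waypoints or nonzero boundary conditions), and the function names are hypothetical.

```python
def min_jerk_position(x0, xf, duration, t):
    """Position along a rest-to-rest minimum-jerk profile at time t."""
    tau = min(max(t / duration, 0.0), 1.0)  # normalized time clamped to [0, 1]
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    return x0 + (xf - x0) * s


def min_jerk_velocity(x0, xf, duration, t):
    """Velocity along the same profile; zero at both endpoints."""
    tau = min(max(t / duration, 0.0), 1.0)
    ds = (30 * tau**2 - 60 * tau**3 + 30 * tau**4) / duration
    return (xf - x0) * ds
```

Evaluating one such profile per axis yields a smooth approach in which the vehicle starts and ends at rest, which is the property that matters during brick pickup and placement.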
UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos
Urban embodied AI agents, ranging from delivery robots to quadrupeds, are increasingly populating our cities, navigating chaotic streets to provide last-mile connectivity. Training such agents requires diverse, high-fidelity urban environments to scale, yet existing human-crafted or procedurally generated simulation scenes either lack scalability or fail to capture real-world complexity. We introduce UrbanVerse, a data-driven real-to-sim system that converts crowd-sourced city-tour videos into physics-aware, interactive simulation scenes. UrbanVerse consists of: (i) UrbanVerse-100K, a repository of 100k+ annotated urban 3D assets with semantic and physical attributes, and (ii) UrbanVerse-Gen, an automatic pipeline that extracts scene layouts from video and instantiates metric-scale 3D simulations using retrieved assets. Running in IsaacSim, UrbanVerse offers 160 high-quality constructed scenes from 24 countries, along with a curated benchmark of 10 artist-designed test scenes. Experiments show that UrbanVerse scenes preserve real-world semantics and layouts, achieving human-evaluated realism comparable to manually crafted scenes. In urban navigation, policies trained in UrbanVerse exhibit scaling power laws and strong generalization, improving success by +6.3% in simulation and +30.1% in zero-shot sim-to-real transfer compared to prior methods, accomplishing a 300 m real-world mission with only two interventions.
comment: Technical report. Project page: https://urbanverseproject.github.io/
MimicKit: A Reinforcement Learning Framework for Motion Imitation and Control
MimicKit is an open-source framework for training motion controllers using motion imitation and reinforcement learning. The codebase provides implementations of commonly-used motion-imitation techniques and RL algorithms. This framework is intended to support research and applications in computer graphics and robotics by providing a unified training framework, along with standardized environment, agent, and data structures. The codebase is designed to be modular and easily configurable, enabling convenient modification and extension to new characters and tasks. The open-source codebase is available at: https://github.com/xbpeng/MimicKit.
Hoecken-D Hand: A Novel Robotic Hand for Linear Parallel Pinching and Self-Adaptive Grasping
This paper presents the Hoecken-D Hand, an underactuated robotic gripper that combines a modified Hoecken linkage with a differential spring mechanism to achieve both linear parallel pinching and a mid-stroke transition to adaptive envelope. The original Hoecken linkage is reconfigured by replacing one member with differential links, preserving straight-line guidance while enabling contact-triggered reconfiguration without additional actuators. A double-parallelogram arrangement maintains fingertip parallelism during conventional pinching, whereas the differential mechanism allows one finger to wrap inward upon encountering an obstacle, improving stability on irregular or thin objects. The mechanism can be driven by a single linear actuator, minimizing complexity and cost; in our prototype, each finger is driven by its own linear actuator for simplicity. We perform kinematic modeling and force analysis to characterize grasp performance, including simulated grasping forces and spring-opening behavior under varying geometric parameters. The design was prototyped using PLA-based 3D printing, achieving a linear pinching span of approximately 200 mm. Preliminary tests demonstrate reliable grasping in both modes across a wide range of object geometries, highlighting the Hoecken-D Hand as a compact, adaptable, and cost-effective solution for manipulation in unstructured environments.
comment: Accepted by IEEE International Conference on Robotics and Biomimetics (ROBIO) 2025, Chengdu, China. This version includes updated contact information
SimULi: Real-Time LiDAR and Camera Simulation with Unscented Transforms
Rigorous testing of autonomous robots, such as self-driving vehicles, is essential to ensure their safety in real-world deployments. This requires building high-fidelity simulators to test scenarios beyond those that can be safely or exhaustively collected in the real world. Existing neural rendering methods based on NeRF and 3DGS hold promise but suffer from low rendering speeds or can only render pinhole camera models, hindering their suitability for applications that commonly require high-distortion lenses and LiDAR data. Multi-sensor simulation poses additional challenges as existing methods handle cross-sensor inconsistencies by favoring the quality of one modality at the expense of others. To overcome these limitations, we propose SimULi, the first method capable of rendering arbitrary camera models and LiDAR data in real-time. Our method extends 3DGUT, which natively supports complex camera models, with LiDAR support, via an automated tiling strategy for arbitrary spinning LiDAR models and ray-based culling. To address cross-sensor inconsistencies, we design a factorized 3D Gaussian representation and anchoring strategy that reduces mean camera and depth error by up to 40% compared to existing methods. SimULi renders 10-20x faster than ray tracing approaches and 1.5-10x faster than prior rasterization-based work (and handles a wider range of camera models). When evaluated on two widely benchmarked autonomous driving datasets, SimULi matches or exceeds the fidelity of existing state-of-the-art methods across numerous camera and LiDAR metrics.
comment: Project page: https://research.nvidia.com/labs/sil/projects/simuli
SIG-Chat: Spatial Intent-Guided Conversational Gesture Generation Involving How, When and Where
The accompanying actions and gestures in dialogue are often closely linked to interactions with the environment, such as looking toward the interlocutor or using gestures to point to the described target at appropriate moments. Speech and semantics guide the production of gestures by determining their timing (WHEN) and style (HOW), while the spatial locations of interactive objects dictate their directional execution (WHERE). Existing approaches either rely solely on descriptive language to generate motions or utilize audio to produce non-interactive gestures, thereby lacking the characterization of interactive timing and spatial intent. This significantly limits the applicability of conversational gesture generation, whether in robotics or in the fields of game and animation production. To address this gap, we present a full-stack solution. We first established a unique data collection method to simultaneously capture high-precision human motion and spatial intent. We then developed a generation model driven by audio, language, and spatial data, alongside dedicated metrics for evaluating interaction timing and spatial accuracy. Finally, we deployed the solution on a humanoid robot, enabling rich, context-aware physical interactions.
Act to See, See to Act: Diffusion-Driven Perception-Action Interplay for Adaptive Policies NeurIPS 2025
Existing imitation learning methods decouple perception and action, which overlooks the causal reciprocity between sensory representations and action execution that humans naturally leverage for adaptive behaviors. To bridge this gap, we introduce Action-Guided Diffusion Policy (DP-AG), a unified representation learning framework that explicitly models a dynamic interplay between perception and action through probabilistic latent dynamics. DP-AG encodes latent observations into a Gaussian posterior via variational inference and evolves them using an action-guided SDE, where the Vector-Jacobian Product (VJP) of the diffusion policy's noise predictions serves as a structured stochastic force driving latent updates. To promote bidirectional learning between perception and action, we introduce a cycle-consistent contrastive loss that organizes the gradient flow of the noise predictor into a coherent perception-action loop, enforcing mutually consistent transitions in both latent updates and action refinements. Theoretically, we derive a variational lower bound for the action-guided SDE, and prove that the contrastive objective enhances continuity in both latent and action trajectories. Empirically, DP-AG significantly outperforms state-of-the-art methods across simulation benchmarks and real-world UR5 manipulation tasks. As a result, our DP-AG offers a promising step toward bridging biological adaptability and artificial policy learning.
comment: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
STITCHER: Real-Time Trajectory Planning with Motion Primitive Search
Autonomous high-speed navigation through large, complex environments requires real-time generation of agile trajectories that are dynamically feasible, collision-free, and satisfy constraints. Most modern trajectory planning techniques rely on numerical optimization because high-quality, expressive trajectories that satisfy constraints can be systematically computed. However, strict requirements on computation time and the risk of numerical instability can limit the use of optimization-based planners in safety-critical situations. This work presents an optimization-free planning framework called STITCHER that leverages graph search to generate long-range trajectories by stitching short trajectory segments together in real time. STITCHER is shown to outperform modern optimization-based planners through its innovative planning architecture and several algorithmic developments that make real-time planning possible. Simulation results show safe trajectories through complex environments can be generated in milliseconds that cover tens of meters.
SGAligner++: Cross-Modal Language-Aided 3D Scene Graph Alignment
Aligning 3D scene graphs is a crucial initial step for several applications in robot navigation and embodied perception. Current methods in 3D scene graph alignment often rely on single-modality point cloud data and struggle with incomplete or noisy input. We introduce SGAligner++, a cross-modal, language-aided framework for 3D scene graph alignment. Our method addresses the challenge of aligning partially overlapping scene observations across heterogeneous modalities by learning a unified joint embedding space, enabling accurate alignment even under low-overlap conditions and sensor noise. By employing lightweight unimodal encoders and attention-based fusion, SGAligner++ enhances scene understanding for tasks such as visual localization, 3D reconstruction, and navigation, while ensuring scalability and minimal computational overhead. Extensive evaluations on real-world datasets demonstrate that SGAligner++ outperforms state-of-the-art methods by up to 40% on noisy real-world reconstructions, while enabling cross-modal generalization.
comment: Project Page: https://singhbino3d.github.io/sgpp/
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos ICCV 2025
Recent developments in Large Language Models pre-trained on extensive corpora have shown significant success in various natural language processing tasks with minimal fine-tuning. This success offers new promise for robotics, which has long been constrained by the high cost of action-labeled data. We ask: given the abundant video data containing interaction-related knowledge available as a rich "corpus", can a similar generative pre-training approach be effectively applied to enhance robot learning? The key challenge is to identify an effective representation for autoregressive pre-training that benefits robot manipulation tasks. Inspired by the way humans learn new skills through observing dynamic environments, we propose that effective robotic learning should emphasize motion-related knowledge, which is closely tied to low-level actions and is hardware-agnostic, facilitating the transfer of learned motions to actual robot actions. To this end, we introduce Moto, which converts video content into latent Motion Token sequences by a Latent Motion Tokenizer, learning a bridging "language" of motion from videos in an unsupervised manner. We pre-train Moto-GPT through motion token autoregression, enabling it to capture diverse visual motion knowledge. After pre-training, Moto-GPT demonstrates the promising ability to produce semantically interpretable motion tokens, predict plausible motion trajectories, and assess trajectory rationality through output likelihood. To transfer learned motion priors to real robot actions, we implement a co-fine-tuning strategy that seamlessly bridges latent motion token prediction and real robot control. Extensive experiments show that the fine-tuned Moto-GPT exhibits superior robustness and efficiency on robot manipulation benchmarks, underscoring its effectiveness in transferring knowledge from video data to downstream visual manipulation tasks.
comment: ICCV 2025. Project page: https://chenyi99.github.io/moto/
FEWT: Improving Humanoid Robot Perception with Frequency-Enhanced Wavelet-based Transformers
Embodied intelligence bridges the physical world and information space. As its typical physical embodiment, humanoid robots have shown great promise through robot learning algorithms in recent years. In this study, a hardware platform, including a humanoid robot and an exoskeleton-style teleoperation cabin, was developed to realize intuitive remote manipulation and efficient collection of anthropomorphic action data. To improve the perception representation of humanoid robots, an imitation learning framework, termed Frequency-Enhanced Wavelet-based Transformer (FEWT), was proposed, which consists of two primary modules: Frequency-Enhanced Efficient Multi-Scale Attention (FE-EMA) and Time-Series Discrete Wavelet Transform (TS-DWT). By combining multi-scale wavelet decomposition with the residual network, FE-EMA can dynamically fuse features from both the cross-spatial and frequency domains. This fusion effectively captures feature information across various scales, thereby enhancing model robustness. Experimental results demonstrate that FEWT improves the success rate of the state-of-the-art algorithm (Action Chunking with Transformers, ACT baseline) by up to 30% in simulation and by 6-12% in the real world.
FTIN: Frequency-Time Integration Network for Inertial Odometry
Inertial odometry (IO) leverages inertial measurement unit (IMU) signals for cost-effective localization. However, high IMU sampling rates introduce substantial redundancy that impedes IO's ability to attend to salient components, thereby creating an information bottleneck. To address this challenge, we propose a cross-domain IO framework that fuses information from the frequency and time domains. Specifically, we exploit the global context and energy-compaction properties of frequency-domain representations to capture holistic motion patterns and alleviate the bottleneck. To the best of our knowledge, this is among the first attempts to incorporate frequency-domain feature processing into IO. Experimental results on multiple public datasets demonstrate the effectiveness of the proposed frequency-time-domain fusion strategy.
CKANIO: Learnable Chebyshev Polynomials for Inertial Odometry
Inertial odometry (IO) relies exclusively on signals from an inertial measurement unit (IMU) for localization and offers a promising avenue for consumer-grade positioning. However, accurate modeling of the nonlinear motion patterns present in IMU signals remains the principal limitation on IO accuracy. To address this challenge, we propose CKANIO, an IO framework that integrates Chebyshev-based Kolmogorov-Arnold Networks (Chebyshev KAN). Specifically, we design a novel residual architecture that leverages the nonlinear approximation capabilities of Chebyshev polynomials within the KAN framework to more effectively model the complex motion characteristics inherent in IMU signals. To the best of our knowledge, this work represents the first application of an interpretable KAN model to IO. Experimental results on five publicly available datasets demonstrate the effectiveness of CKANIO.
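To make the Chebyshev-polynomial idea concrete, the sketch below evaluates the Chebyshev basis with the three-term recurrence T_0(x) = 1, T_1(x) = x, T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x), and combines it into a single KAN-style output unit. It is an illustration only: the function names, the tanh squashing of inputs into [-1, 1], and the coefficient layout are assumptions, not the CKANIO architecture, and the coefficients would normally be learned rather than hand-set.

```python
import math


def chebyshev_basis(x, degree):
    """Return [T_0(x), ..., T_degree(x)] using the three-term recurrence."""
    values = [1.0, x]
    for _ in range(2, degree + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: degree + 1]


def chebyshev_kan_unit(inputs, coeffs):
    """One output unit: a sum of per-input learnable Chebyshev expansions.

    coeffs[i][k] is the (assumed, hand-set here) weight of T_k for input i.
    Inputs are squashed with tanh so they lie in the polynomials' domain [-1, 1].
    """
    total = 0.0
    for x, c in zip(inputs, coeffs):
        basis = chebyshev_basis(math.tanh(x), len(c) - 1)
        total += sum(ck * tk for ck, tk in zip(c, basis))
    return total


if __name__ == "__main__":
    # Hypothetical three-axis sample with degree-2 coefficients per axis.
    sample = [0.1, -0.3, 0.8]
    weights = [[0.0, 1.0, 0.5], [0.2, -0.4, 0.1], [0.0, 0.3, 0.0]]
    print(chebyshev_kan_unit(sample, weights))
```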
NAMO-LLM: Efficient Navigation Among Movable Obstacles with Large Language Model Guidance
Several planners have been proposed to compute robot paths that reach desired goal regions while avoiding obstacles. However, these methods fail when all pathways to the goal are blocked. In such cases, the robot must reason about how to reconfigure the environment to access task-relevant regions - a problem known as Navigation Among Movable Objects (NAMO). While various solutions to this problem have been developed, they often struggle to scale to highly cluttered environments. To address this, we propose NAMO-LLM, a sampling-based planner that searches over robot and obstacle configurations to compute feasible plans specifying which obstacles to move, where, and in what order. Its key novelty is a non-uniform sampling strategy guided by Large Language Models (LLMs) biasing the tree construction toward directions more likely to yield a solution. We show that NAMO-LLM is probabilistically complete and demonstrate through experiments that it efficiently scales to cluttered environments, outperforming related works in both runtime and plan quality.
comment: 9 pages, 6 figures
WoW: Towards a World omniscient World model Through Embodied Interaction
Humans develop an understanding of intuitive physics through active interaction with the world. This approach is in stark contrast to current video models, such as Sora, which rely on passive observation and therefore struggle with grasping physical causality. This observation leads to our central hypothesis: authentic physical intuition of the world model must be grounded in extensive, causally rich interactions with the real world. To test this hypothesis, we present WoW, a 14-billion-parameter generative world model trained on 2 million robot interaction trajectories. Our findings reveal that the model's understanding of physics is a probabilistic distribution of plausible outcomes, leading to stochastic instabilities and physical hallucinations. Furthermore, we demonstrate that this emergent capability can be actively constrained toward physical realism by SOPHIA, where vision-language model agents evaluate the DiT-generated output and guide its refinement by iteratively evolving the language instructions. In addition, a co-trained Inverse Dynamics Model translates these refined plans into executable robotic actions, thus closing the imagination-to-action loop. We establish WoWBench, a new benchmark focused on physical consistency and causal reasoning in video, where WoW achieves state-of-the-art performance in both human and autonomous evaluation, demonstrating strong ability in physical causality, collision dynamics, and object permanence. Our work provides systematic evidence that large-scale, real-world interaction is a cornerstone for developing physical intuition in AI. Models, data, and benchmarks will be open-sourced.
Insect-Scale Tailless Robot with Flapping Wings: A Simple Structure and Drive for Yaw Control
Insect-scale micro-aerial vehicles, especially lightweight flapping-wing robots, are becoming increasingly important for safe motion sensing in spatially constrained environments such as living spaces. However, yaw control using flapping wings is fundamentally more difficult than using rotating wings. In this study, an insect-scale, tailless robot with four paired tilted flapping wings (weighing 1.52 g) to enable yaw control was fabricated. It benefits from the simplicity of a directly driven wing actuator with no transmission and a lift control signal; however, it still has an offset in the lift force. Therefore, an adaptive controller was designed to alleviate the offset. Numerical experiments confirm that the proposed controller outperforms the linear quadratic integral controller. Finally, in a tethered and controlled demonstration flight, the yaw drift was suppressed by the wing-tilting arrangement and the proposed controller. The simple structure and drive system demonstrate the potential for future controlled flights of battery-powered, tailless, flapping-wing robots weighing less than 10 grams.
comment: Submitted to Control Engineering Practice (Elsevier)
Real-Time Adaptive Motion Planning via Point Cloud-Guided, Energy-Based Diffusion and Potential Fields
Motivated by the problem of pursuit-evasion, we present a motion planning framework that combines energy-based diffusion models with artificial potential fields for robust real-time trajectory generation in complex environments. Our approach processes obstacle information directly from point clouds, enabling efficient planning without requiring complete geometric representations. The framework employs classifier-free guidance training and integrates local potential fields during sampling to enhance obstacle avoidance. In dynamic scenarios, the system generates initial trajectories using the diffusion model and continuously refines them through potential field-based adaptation, demonstrating effective performance in pursuit-evasion scenarios with partial pursuer observability.
comment: Accepted to IEEE RA-L 2025
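For readers unfamiliar with artificial potential fields, the following sketch shows one classic gradient step computed directly from obstacle points, in the spirit of the point-cloud input described in the abstract above. It is a textbook formulation, not the paper's diffusion-integrated method; the gains, the influence radius, and the function name are all hypothetical choices made for illustration.

```python
import math


def potential_field_step(position, goal, obstacle_points,
                         k_att=1.0, k_rep=0.5, influence=1.0, step=0.1):
    """Move one step along the combined attractive and repulsive force."""
    # Attractive force pulls linearly toward the goal.
    force = [k_att * (g - p) for p, g in zip(position, goal)]
    for obs in obstacle_points:
        diff = [p - o for p, o in zip(position, obs)]
        dist = math.sqrt(sum(d * d for d in diff))
        # Only points inside the influence radius repel; skip exact overlaps.
        if 1e-9 < dist < influence:
            gain = k_rep * (1.0 / dist - 1.0 / influence) / dist**3
            force = [f + gain * d for f, d in zip(force, diff)]
    return [p + step * f for p, f in zip(position, force)]


if __name__ == "__main__":
    # Hypothetical 2D scene: one obstacle point just below the start position.
    print(potential_field_step([0.0, 0.0], [1.0, 0.0], [[0.0, -0.5]]))
```

Obstacle points outside the influence radius contribute nothing, so a step can be computed from a raw point list without building a full geometric model.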
TACS-Graphs: Traversability-Aware Consistent Scene Graphs for Ground Robot Localization and Mapping IROS 2025
Scene graphs have emerged as a powerful tool for robots, providing a structured representation of spatial and semantic relationships for advanced task planning. Despite their potential, conventional 3D indoor scene graphs face critical limitations, particularly under- and over-segmentation of room layers in structurally complex environments. Under-segmentation misclassifies non-traversable areas as part of a room, often in open spaces, while over-segmentation fragments a single room into overlapping segments in complex environments. These issues stem from naive voxel-based map representations that rely solely on geometric proximity, disregarding the structural constraints of traversable spaces and resulting in inconsistent room layers within scene graphs. To the best of our knowledge, this work is the first to tackle segmentation inconsistency as a challenge and address it with Traversability-Aware Consistent Scene Graphs (TACS-Graphs), a novel framework that integrates ground robot traversability with room segmentation. By leveraging traversability as a key factor in defining room boundaries, the proposed method achieves a more semantically meaningful and topologically coherent segmentation, effectively mitigating the inaccuracies of voxel-based scene graph approaches in complex environments. Furthermore, the enhanced segmentation consistency improves loop closure detection efficiency in the proposed Consistent Scene Graph-leveraging Loop Closure Detection (CoSG-LCD) leading to higher pose estimation accuracy. Experimental results confirm that the proposed approach outperforms state-of-the-art methods in terms of scene graph consistency and pose graph optimization performance.
comment: Accepted by IROS 2025
EffiTune: Diagnosing and Mitigating Training Inefficiency for Parameter Tuner in Robot Navigation System IROS 2025
Robot navigation systems are critical for various real-world applications such as delivery services, hospital logistics, and warehouse management. Although classical navigation methods provide interpretability, they rely heavily on expert manual tuning, limiting their adaptability. Conversely, purely learning-based methods offer adaptability but often lead to instability and erratic robot behaviors. Recently introduced parameter tuners aim to balance these approaches by integrating data-driven adaptability into classical navigation frameworks. However, the parameter tuning process currently suffers from training inefficiencies and redundant sampling, with critical regions in the environment often underrepresented in training data. In this paper, we propose EffiTune, a novel framework designed to diagnose and mitigate training inefficiency for parameter tuners in robot navigation systems. EffiTune first performs robot-behavior-guided diagnostics to pinpoint critical bottlenecks and underrepresented regions. It then employs a targeted up-sampling strategy to enrich the training dataset with critical samples, significantly reducing redundancy and enhancing training efficiency. Our comprehensive evaluation demonstrates that EffiTune achieves more than a 13.5% improvement in navigation performance, enhanced robustness in out-of-distribution scenarios, and a 4x improvement in training efficiency within the same computational budget.
comment: Accepted to IROS 2025
APEX: Empowering LLMs with Physics-Based Task Planning for Real-time Insight
Large Language Models (LLMs) demonstrate strong reasoning and task planning capabilities but remain fundamentally limited in physical interaction modeling. Existing approaches integrate perception via Vision-Language Models (VLMs) or adaptive decision-making through Reinforcement Learning (RL), but they fail to capture dynamic object interactions or require task-specific training, limiting their real-world applicability. We introduce APEX (Anticipatory Physics-Enhanced Execution), a framework that equips LLMs with physics-driven foresight for real-time task planning. APEX constructs structured graphs to identify and model the most relevant dynamic interactions in the environment, providing LLMs with explicit physical state updates. Simultaneously, APEX provides low-latency forward simulations of physically feasible actions, allowing LLMs to select optimal strategies based on predictive outcomes rather than static observations. We evaluate APEX on three benchmarks designed to assess perception, prediction, and decision-making: (1) Physics Reasoning Benchmark, testing causal inference and object motion prediction; (2) Tetris, evaluating whether physics-informed prediction enhances decision-making performance in long-horizon planning tasks; (3) Dynamic Obstacle Avoidance, assessing the immediate integration of perception and action feasibility analysis. APEX significantly outperforms standard LLMs and VLM-based models, demonstrating the necessity of explicit physics reasoning for bridging the gap between language-based intelligence and real-world task execution. The source code and experiment setup are publicly available at https://github.com/hwj20/APEX_EXP .
Never too Prim to Swim: An LLM-Enhanced RL-based Adaptive S-Surface Controller for AUVs under Extreme Sea Conditions IROS 2025
The adaptivity and maneuvering capabilities of Autonomous Underwater Vehicles (AUVs) have drawn significant attention in oceanic research, due to the unpredictable disturbances and strong coupling among the AUV's degrees of freedom. In this paper, we develop a large language model (LLM)-enhanced, reinforcement learning (RL)-based adaptive S-surface controller for AUVs. Specifically, LLMs are introduced for the joint optimization of controller parameters and reward functions in RL training. Using multi-modal and structured explicit task feedback, LLMs enable joint adjustments, balance multiple objectives, and enhance task-oriented performance and adaptability. In the proposed controller, the RL policy focuses on upper-level tasks, outputting task-oriented high-level commands that the S-surface controller then converts into control signals, ensuring cancellation of nonlinear effects and unpredictable external disturbances in extreme sea conditions. Under extreme sea conditions involving complex terrain, waves, and currents, the proposed controller demonstrates superior performance and adaptability in high-level tasks such as underwater target tracking and data collection, outperforming traditional PID and SMC controllers.
comment: Accepted by IEEE/RSJ IROS 2025
Inferring Foresightedness in Dynamic Noncooperative Games
Dynamic game theory is an increasingly popular tool for modeling multi-agent, e.g., human-robot, interactions. Game-theoretic models presume that each agent wishes to minimize a private cost function that depends on others' actions. These games typically evolve over a fixed time horizon, specifying how far into the future each agent plans. In practical settings, however, decision-makers may vary in foresightedness, or how much they care about their current cost in relation to their past and future costs. We conjecture that quantifying and estimating each agent's foresightedness from online data will enable safer and more efficient interactions with other agents. To this end, we frame this inference problem as an inverse dynamic game. We consider a specific objective function parametrization that smoothly interpolates myopic and farsighted planning. Games of this form are readily transformed into parametric mixed complementarity problems; we exploit the directional differentiability of solutions to these problems with respect to their hidden parameters to solve for agents' foresightedness. We conduct three experiments: one with synthetically generated delivery robot motion, one with real-world data involving people walking, biking, and driving vehicles, and one using high-fidelity simulators. The results of these experiments demonstrate that explicitly inferring agents' foresightedness enables game-theoretic models to produce 33% more accurate models of agents' behavior.
MotionScript: Natural Language Descriptions for Expressive 3D Human Motions
We introduce MotionScript, a novel framework for generating highly detailed, natural language descriptions of 3D human motions. Unlike existing motion datasets that rely on broad action labels or generic captions, MotionScript provides fine-grained, structured descriptions that capture the full complexity of human movement including expressive actions (e.g., emotions, stylistic walking) and interactions beyond standard motion capture datasets. MotionScript serves as both a descriptive tool and a training resource for text-to-motion models, enabling the synthesis of highly realistic and diverse human motions from text. By augmenting motion datasets with MotionScript captions, we demonstrate significant improvements in out-of-distribution motion generation, allowing large language models (LLMs) to generate motions that extend beyond existing data. Additionally, MotionScript opens new applications in animation, virtual human simulation, and robotics, providing an interpretable bridge between intuitive descriptions and motion synthesis. To the best of our knowledge, this is the first attempt to systematically translate 3D motion into structured natural language without requiring training data.
comment: Project webpage: https://pjyazdian.github.io/MotionScript
Towards smart and adaptive agents for active sensing on edge devices
TinyML has made deploying deep learning models on low-power edge devices feasible, creating new opportunities for real-time perception in constrained environments. However, the adaptability of such deep learning methods remains limited to data drift adaptation, lacking broader capabilities that account for the environment's underlying dynamics and inherent uncertainty. Deep learning's scaling laws, which counterbalance this limitation by massively up-scaling data and model size, cannot be applied when deploying on the Edge, where deep learning limitations are further amplified as models are scaled down for deployment on resource-constrained devices. This paper presents an innovative agentic system capable of performing on-device perception and planning, enabling active sensing on the edge. By incorporating active inference into our solution, our approach extends beyond deep learning capabilities, allowing the system to plan in dynamic environments while operating in real-time with a compact memory footprint of as little as 300 MB. We showcase our proposed system by creating and deploying a saccade agent connected to an IoT camera with pan and tilt capabilities on an NVIDIA Jetson embedded device. The saccade agent controls the camera's field of view following optimal policies derived from the active inference principles, simulating human-like saccadic motion for surveillance and robotics applications.
COMPASS: Cross-embodiment Mobility Policy via Residual RL and Skill Synthesis
As robots are increasingly deployed in diverse application domains, enabling robust mobility across different embodiments has become a critical challenge. Classical mobility stacks, though effective on specific platforms, require extensive per-robot tuning and do not scale easily to new embodiments. Learning-based approaches, such as imitation learning (IL), offer alternatives, but are significantly limited by the need for high-quality demonstrations for each embodiment. To address these challenges, we introduce COMPASS, a unified framework that enables scalable cross-embodiment mobility using expert demonstrations from only a single embodiment. We first pre-train a mobility policy on a single robot using IL, combining a world model with a policy model. We then apply residual reinforcement learning (RL) to efficiently adapt this policy to diverse embodiments through corrective refinements. Finally, we distill specialist policies into a single generalist policy conditioned on an embodiment embedding vector. This design significantly reduces the burden of collecting data while enabling robust generalization across a wide range of robot designs. Our experiments demonstrate that COMPASS scales effectively across diverse robot platforms while maintaining adaptability to various environment configurations, achieving a generalist policy with a success rate approximately 5X higher than the pre-trained IL policy on unseen embodiments. They further demonstrate zero-shot sim-to-real transfer.
TAS: A Transit-Aware Strategy for Embodied Navigation with Non-Stationary Targets
Embodied navigation methods commonly operate in static environments with stationary targets. In this work, we present a new algorithm for navigation in dynamic scenarios with non-stationary targets. Our novel Transit-Aware Strategy (TAS) enriches embodied navigation policies with object path information. TAS improves performance in non-stationary environments by rewarding agents for synchronizing their routes with target routes. To evaluate TAS, we further introduce Dynamic Object Maps (DOMs), a dynamic variant of node-attributed topological graphs with structured object transitions. DOMs are inspired by human habits to simulate realistic object routes on a graph. Our experiments show that on average, TAS improves agent Success Rate (SR) by 21.1 in non-stationary environments, while also generalizing better from static environments by 44.5% when measured by Relative Change in Success (RCS). We qualitatively investigate TAS-agent performance on DOMs and draw various inferences to help better model generalist navigation policies. To the best of our knowledge, ours is the first work that quantifies the adaptability of embodied navigation methods in non-stationary environments. Code and data for our benchmark will be made publicly available.
comment: 15 pages
Multiagent Systems
SADCHER: Scheduling using Attention-based Dynamic Coalitions of Heterogeneous Robots in Real-Time
We present Sadcher, a real-time task assignment framework for heterogeneous multi-robot teams that incorporates dynamic coalition formation and task precedence constraints. Sadcher is trained through Imitation Learning and combines graph attention and transformers to predict assignment rewards between robots and tasks. Based on the predicted rewards, a relaxed bipartite matching step generates high-quality schedules with feasibility guarantees. We explicitly model robot and task positions, task durations, and robots' remaining processing times, enabling advanced temporal and spatial reasoning and generalization to environments with different spatiotemporal distributions compared to training. Trained on optimally solved small-scale instances, our method can scale to larger task sets and team sizes. Sadcher outperforms other learning-based and heuristic baselines on randomized, unseen problems for small and medium-sized teams with computation times suitable for real-time operation. We also explore sampling-based variants and evaluate scalability across robot and task counts. In addition, we release our dataset of 250,000 optimal schedules: https://autonomousrobots.nl/paper_websites/sadcher_MRTA/
comment: 7 pages, 5 figures. 2025 IEEE Int. Symposium on Multi-Robot and Multi-Agent Systems (MRS 2025). Website and Code: https://autonomousrobots.nl/paper_websites/sadcher_MRTA/
Multi Agent Switching Mode Controller for Sound Source localization
Source seeking is an important topic in robotics research, especially with sound-based sensors, since they allow agents to locate a target even in critical conditions where a direct line of sight cannot be established. In this work, we design a multi-agent switching mode control strategy for acoustic-based target localization. Two scenarios are considered: single-source localization, in which the agents are driven towards the target while maintaining a rigid formation, and multi-source localization, in which each agent searches for the targets independently of the others.
When Planners Meet Reality: How Learned, Reactive Traffic Agents Shift nuPlan Benchmarks
Planner evaluation in closed-loop simulation often uses rule-based traffic agents, whose simplistic and passive behavior can hide planner deficiencies and bias rankings. Widely used IDM agents simply follow a lead vehicle and cannot react to vehicles in adjacent lanes, hindering tests of complex interaction capabilities. We address this issue by integrating the state-of-the-art learned traffic agent model SMART into nuPlan. Thus, we are the first to evaluate planners under more realistic conditions and quantify how conclusions shift when narrowing the sim-to-real gap. Our analysis covers 14 recent planners and established baselines and shows that IDM-based simulation overestimates planning performance: nearly all scores deteriorate. In contrast, many planners interact better than previously assumed and even improve in multi-lane, interaction-heavy scenarios like lane changes or turns. Methods trained in closed-loop demonstrate the best and most stable driving performance. However, when reaching their limits in augmented edge-case scenarios, all learned planners degrade abruptly, whereas rule-based planners maintain reasonable basic behavior. Based on our results, we suggest SMART-reactive simulation as a new standard closed-loop benchmark in nuPlan and release the SMART agents as a drop-in alternative to IDM at https://github.com/shgd95/InteractiveClosedLoop.
The Role of Social Learning and Collective Norm Formation in Fostering Cooperation in LLM Multi-Agent Systems
A growing body of multi-agent studies with Large Language Models (LLMs) explores how norms and cooperation emerge in mixed-motive scenarios, where pursuing individual gain can undermine the collective good. While prior work has explored these dynamics in both richly contextualized simulations and simplified game-theoretic environments, most LLM systems featuring common-pool resource (CPR) games provide agents with explicit reward functions directly tied to their actions. In contrast, human cooperation often emerges without full visibility into payoffs and population, relying instead on heuristics, communication, and punishment. We introduce a CPR simulation framework that removes explicit reward signals and embeds cultural-evolutionary mechanisms: social learning (adopting strategies and beliefs from successful peers) and norm-based punishment, grounded in Ostrom's principles of resource governance. Agents also individually learn from the consequences of harvesting, monitoring, and punishing via environmental feedback, enabling norms to emerge endogenously. We establish the validity of our simulation by reproducing key findings from existing studies on human behavior. Building on this, we examine norm evolution across a $2\times2$ grid of environmental and social initialisations (resource-rich vs. resource-scarce; altruistic vs. selfish) and benchmark how agentic societies comprised of different LLMs perform under these conditions. Our results reveal systematic model differences in sustaining cooperation and norm formation, positioning the framework as a rigorous testbed for studying emergent norms in mixed-motive LLM societies. Such analysis can inform the design of AI systems deployed in social and organizational contexts, where alignment with cooperative norms is critical for stability, fairness, and effective governance of AI-mediated environments.
Disaster Management in the Era of Agentic AI Systems: A Vision for Collective Human-Machine Intelligence for Augmented Resilience
The escalating frequency and severity of disasters routinely overwhelm traditional response capabilities, exposing critical vulnerabilities in disaster management. Current practices are hindered by fragmented data streams, siloed technologies, resource constraints, and the erosion of institutional memory, which collectively impede timely and effective decision making. This study introduces Disaster Copilot, a vision for a multi-agent artificial intelligence system designed to overcome these systemic challenges by unifying specialized AI tools within a collaborative framework. The proposed architecture utilizes a central orchestrator to coordinate diverse sub-agents, each specializing in critical domains such as predictive risk analytics, situational awareness, and impact assessment. By integrating multi-modal data, the system delivers a holistic, real-time operational picture and serves as the essential AI backbone required to advance Disaster Digital Twins from passive models to active, intelligent environments. Furthermore, it ensures functionality in resource-limited environments through on-device orchestration and incorporates mechanisms to capture institutional knowledge, mitigating the impact of staff turnover. We detail the system architecture and propose a three-phased roadmap emphasizing the parallel growth of technology, organizational capacity, and human-AI teaming. Disaster Copilot offers a transformative vision, fostering collective human-machine intelligence to build more adaptive, data-driven and resilient communities.
Sketch2BIM: A Multi-Agent Human-AI Collaborative Pipeline to Convert Hand-Drawn Floor Plans to 3D BIM
This study introduces a human-in-the-loop pipeline that converts unscaled, hand-drawn floor plan sketches into semantically consistent 3D BIM models. The workflow leverages multimodal large language models (MLLMs) within a multi-agent framework, combining perceptual extraction, human feedback, schema validation, and automated BIM scripting. Initially, sketches are iteratively refined into a structured JSON layout of walls, doors, and windows. Later, these layouts are transformed into executable scripts that generate 3D BIM models. Experiments on ten diverse floor plans demonstrate strong convergence: openings (doors, windows) are captured with high reliability in the initial pass, while wall detection begins around 83% and achieves near-perfect alignment after a few feedback iterations. Across all categories, precision, recall, and F1 scores remain above 0.83, and geometric errors (RMSE, MAE) progressively decrease to zero through feedback corrections. This study demonstrates how MLLM-driven multi-agent reasoning can make BIM creation accessible to both experts and non-experts using only freehand sketches.
ABMax: A JAX-based Agent-based Modeling Framework
Agent-based modeling (ABM) is a principal approach for studying complex systems. By decomposing a system into simpler, interacting agents, ABM allows researchers to observe the emergence of complex phenomena. High-performance array computing libraries like JAX can help scale such computational models to a large number of agents by using automatic vectorization and just-in-time (JIT) compilation. One of the caveats of using JAX to achieve such scaling is that the shapes of arrays used in the computational model should remain immutable throughout the simulation. In the context of ABM, this can constrain certain agent manipulation operations that require flexible data structures. One subset of these operations is the ability to update a dynamically selected number of agents by applying distinct changes to them during a simulation. To this end, we introduce ABMax, an ABM framework based on JAX that implements multiple JIT-compilable algorithms to provide this functionality. On the canonical predation model benchmark, ABMax achieves runtime performance comparable to state-of-the-art implementations. Further, we show that this functionality can also be vectorized, making it possible to run many similar agent-based models in parallel. We also present two examples, a traffic-flow model and a financial market model, to show the use case of ABMax.
comment: 8 pages, 7 figures, 4 tables, 2 algorithms
Measuring and Mitigating Identity Bias in Multi-Agent Debate via Anonymization
Multi-agent debate (MAD) aims to improve large language model (LLM) reasoning by letting multiple agents exchange answers and then aggregate their opinions. Yet recent studies reveal that agents are not neutral: they are prone to identity-driven sycophancy and self-bias, uncritically adopting a peer's view or stubbornly adhering to their own prior output, undermining the reliability of debate. In this work, we present the first principled framework that joins sycophancy and self-bias to mitigate and quantify identity bias in MAD. First, we formalize the debate dynamics as an identity-weighted Bayesian update process. Second, we propose response anonymization: by removing identity markers from prompts, agents cannot distinguish "self" from "peer", which forces equal weights on agent identity, thereby reducing bias. Third, we define the Identity Bias Coefficient (IBC), a principled metric that measures how often an agent follows a peer versus itself. Empirical studies across multiple models, datasets and debate rounds confirm that identity bias is widespread, with sycophancy far more common than self-bias. Our findings highlight the need to "mask" identity to ensure that MAD systems reason based on content rather than source identity. Code is released at https://github.com/deeplearning-wisc/MAD-identity-bias.
RareAgent: Self-Evolving Reasoning for Drug Repurposing in Rare Diseases
Computational drug repurposing for rare diseases is especially challenging when no prior associations exist between drugs and target diseases. Therefore, knowledge graph completion and message-passing GNNs have little reliable signal to learn and propagate, resulting in poor performance. We present RareAgent, a self-evolving multi-agent system that reframes this task from passive pattern recognition to active evidence-seeking reasoning. RareAgent organizes task-specific adversarial debates in which agents dynamically construct evidence graphs from diverse perspectives to support, refute, or entail hypotheses. The reasoning strategies are analyzed post hoc in a self-evolutionary loop, producing textual feedback that refines agent policies, while successful reasoning paths are distilled into transferable heuristics to accelerate future investigations. Comprehensive evaluations reveal that RareAgent improves the indication AUPRC by 18.1% over reasoning baselines and provides a transparent reasoning chain consistent with clinical evidence.
Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics
We present Ax-Prover, a multi-agent system for automated theorem proving in Lean that can solve problems across diverse scientific domains and operate either autonomously or collaboratively with human experts. To achieve this, Ax-Prover approaches scientific problem solving through formal proof generation, a process that demands both creative reasoning and strict syntactic rigor. Ax-Prover meets this challenge by equipping Large Language Models (LLMs), which provide knowledge and reasoning, with Lean tools via the Model Context Protocol (MCP), which ensure formal correctness. To evaluate its performance as an autonomous prover, we benchmark our approach against frontier LLMs and specialized prover models on two public math benchmarks and on two Lean benchmarks we introduce in the fields of abstract algebra and quantum theory. On public datasets, Ax-Prover is competitive with state-of-the-art provers, while it largely outperforms them on the new benchmarks. This shows that, unlike specialized systems that struggle to generalize, our tool-based agentic theorem prover approach offers a generalizable methodology for formal verification across diverse scientific domains. Furthermore, we demonstrate Ax-Prover's assistant capabilities in a practical use case, showing how it enabled an expert mathematician to formalize the proof of a complex cryptography theorem.
Internet of Agents: Fundamentals, Applications, and Challenges
With the rapid proliferation of large language models and vision-language models, AI agents have evolved from isolated, task-specific systems into autonomous, interactive entities capable of perceiving, reasoning, and acting without human intervention. As these agents proliferate across virtual and physical environments, from virtual assistants to embodied robots, the need for a unified, agent-centric infrastructure becomes paramount. In this survey, we introduce the Internet of Agents (IoA) as a foundational framework that enables seamless interconnection, dynamic discovery, and collaborative orchestration among heterogeneous agents at scale. We begin by presenting a general IoA architecture, highlighting its hierarchical organization, distinguishing features relative to the traditional Internet, and emerging applications. Next, we analyze the key operational enablers of IoA, including capability notification and discovery, adaptive communication protocols, dynamic task matching, consensus and conflict-resolution mechanisms, and incentive models. Finally, we identify open research directions toward building resilient and trustworthy IoA ecosystems.
comment: 25 pages, 10 figures, 10 tables. Accepted by IEEE TCCN in Oct. 2025
RADAR: A Risk-Aware Dynamic Multi-Agent Framework for LLM Safety Evaluation via Role-Specialized Collaboration
Existing safety evaluation methods for large language models (LLMs) suffer from inherent limitations, including evaluator bias and detection failures arising from model homogeneity, which collectively undermine the robustness of risk evaluation processes. This paper seeks to re-examine the risk evaluation paradigm by introducing a theoretical framework that reconstructs the underlying risk concept space. Specifically, we decompose the latent risk concept space into three mutually exclusive subspaces: the explicit risk subspace (encompassing direct violations of safety guidelines), the implicit risk subspace (capturing potential malicious content that requires contextual reasoning for identification), and the non-risk subspace. Furthermore, we propose RADAR, a multi-agent collaborative evaluation framework that leverages multi-round debate mechanisms through four specialized complementary roles and employs dynamic update mechanisms to achieve self-evolution of risk concept distributions. This approach enables comprehensive coverage of both explicit and implicit risks while mitigating evaluator bias. To validate the effectiveness of our framework, we construct an evaluation dataset comprising 800 challenging cases. Extensive experiments on our challenging test set and public benchmarks demonstrate that RADAR significantly outperforms baseline evaluation methods across multiple dimensions, including accuracy, stability, and self-evaluation risk sensitivity. Notably, RADAR achieves a 28.87% improvement in risk identification accuracy compared to the strongest baseline evaluation method.
Inferring Foresightedness in Dynamic Noncooperative Games
Dynamic game theory is an increasingly popular tool for modeling multi-agent (e.g., human-robot) interactions. Game-theoretic models presume that each agent wishes to minimize a private cost function that depends on others' actions. These games typically evolve over a fixed time horizon, specifying how far into the future each agent plans. In practical settings, however, decision-makers may vary in foresightedness, or how much they care about their current cost in relation to their past and future costs. We conjecture that quantifying and estimating each agent's foresightedness from online data will enable safer and more efficient interactions with other agents. To this end, we frame this inference problem as an inverse dynamic game. We consider a specific objective function parametrization that smoothly interpolates myopic and farsighted planning. Games of this form are readily transformed into parametric mixed complementarity problems; we exploit the directional differentiability of solutions to these problems with respect to their hidden parameters to solve for agents' foresightedness. We conduct three experiments: one with synthetically generated delivery robot motion, one with real-world data involving people walking, biking, and driving vehicles, and one using high-fidelity simulators. The results of these experiments demonstrate that explicitly inferring agents' foresightedness enables game-theoretic models to produce 33% more accurate models of agents' behavior.
Where Did It All Go Wrong? A Hierarchical Look into Multi-Agent Error Attribution
Error attribution in Large Language Model (LLM) multi-agent systems presents a significant challenge in debugging and improving collaborative AI systems. Current approaches to pinpointing agent- and step-level failures in interaction traces, whether using all-at-once evaluation, step-by-step analysis, or binary search, fall short when analyzing complex patterns, struggling with both accuracy and consistency. We present ECHO (Error attribution through Contextual Hierarchy and Objective consensus analysis), a novel algorithm that combines hierarchical context representation, objective analysis-based evaluation, and consensus voting to improve error attribution accuracy. Our approach leverages a positional-based leveling of contextual understanding while maintaining objective evaluation criteria, ultimately reaching conclusions through a consensus mechanism. Experimental results demonstrate that ECHO outperforms existing methods across various multi-agent interaction scenarios, showing particular strength in cases involving subtle reasoning errors and complex interdependencies. Our findings suggest that structured, hierarchical context representation combined with consensus-based objective decision-making provides a more robust framework for error attribution in multi-agent systems.
Cooperative Bargaining Games Without Utilities: Mediated Solutions from Direction Oracles
Cooperative bargaining games are widely used to model resource allocation and conflict resolution. Traditional solutions assume the mediator can access agents' utility function values and gradients. However, there is an increasing number of settings, such as human-AI interactions, where utility values may be inaccessible or incomparable due to unknown, nonaffine transformations. To model such settings, we consider that the mediator has access only to agents' most preferred directions, i.e., normalized utility gradients in the decision space. To this end, we propose a cooperative bargaining algorithm where a mediator has access to only the direction oracle of each agent. We prove that, unlike popular approaches such as the Nash and Kalai-Smorodinsky bargaining solutions, our approach is invariant to monotonic nonaffine transformations, and that under strong convexity and smoothness assumptions, this approach enjoys global asymptotic convergence to Pareto stationary solutions. Moreover, we show that the bargaining solutions found by our algorithm also satisfy the axioms of symmetry and (under slightly stronger conditions) independence of irrelevant alternatives, which are popular in the literature. Finally, we conduct experiments in two domains, multi-agent formation assignment and mediated stock portfolio allocation, which validate these theoretical results. All code for our experiments can be found at https://github.com/suryakmurthy/dibs_bargaining.
Systems and Control (CS)
RDD: Retrieval-Based Demonstration Decomposer for Planner Alignment in Long-Horizon Tasks NeurIPS 2025
To tackle long-horizon tasks, recent hierarchical vision-language-action (VLA) frameworks employ vision-language model (VLM)-based planners to decompose complex manipulation tasks into simpler sub-tasks that low-level visuomotor policies can easily handle. Typically, the VLM planner is finetuned to learn to decompose a target task. This finetuning requires target task demonstrations segmented into sub-tasks by either human annotation or heuristic rules. However, the heuristic sub-tasks can deviate significantly from the training data of the visuomotor policy, which degrades task performance. To address this issue, we propose a Retrieval-based Demonstration Decomposer (RDD) that automatically decomposes demonstrations into sub-tasks by aligning the visual features of the decomposed sub-task intervals with those from the training data of the low-level visuomotor policies. Our method outperforms the state-of-the-art sub-task decomposer on both simulation and real-world tasks, demonstrating robustness across diverse settings. Code and more results are available at rdd-neurips.github.io.
comment: 39th Conference on Neural Information Processing Systems (NeurIPS 2025); Project Website: rdd-neurips.github.io
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
Reinforcement learning (RL), while powerful and expressive, can often prioritize performance at the expense of safety. Yet safety violations can lead to catastrophic outcomes in real-world deployments. Control Barrier Functions (CBFs) offer a principled method to enforce dynamic safety -- traditionally deployed online via safety filters. While the result is safe behavior, the fact that the RL policy does not have knowledge of the CBF can lead to conservative behaviors. This paper proposes CBF-RL, a framework for generating safe behaviors with RL by enforcing CBFs in training. CBF-RL has two key attributes: (1) minimally modifying a nominal RL policy to encode safety constraints via a CBF term, and (2) safety filtering of the policy rollouts in training. Theoretically, we prove that continuous-time safety filters can be deployed via closed-form expressions on discrete-time roll-outs. Practically, we demonstrate that CBF-RL internalizes the safety constraints in the learned policy -- both enforcing safer actions and biasing towards safer rewards -- enabling safe deployment without the need for an online safety filter. We validate our framework through ablation studies on navigation tasks and on the Unitree G1 humanoid robot, where CBF-RL enables safer exploration, faster convergence, and robust performance under uncertainty, enabling the humanoid robot to avoid obstacles and climb stairs safely in real-world settings without a runtime safety filter.
comment: 8 pages
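As an illustration of what a closed-form CBF safety filter can look like, here is a minimal sketch, not the CBF-RL implementation. It assumes a single affine constraint a·u + b >= 0 (with a = L_g h and b = L_f h + alpha*h), single-integrator dynamics, and a circular obstacle at the origin; the names cbf_filter and safe_action, the obstacle, and the parameter values are all hypothetical.

```python
from typing import List, Sequence


def cbf_filter(u_nom: Sequence[float], a: Sequence[float], b: float) -> List[float]:
    """Closed-form solution of  min ||u - u_nom||^2  s.t.  a.u + b >= 0.

    With a single affine constraint, the QP reduces to projecting u_nom
    onto the half-space {u : a.u + b >= 0}.
    """
    slack = sum(ai * ui for ai, ui in zip(a, u_nom)) + b
    norm_sq = sum(ai * ai for ai in a)
    # Nominal action already satisfies the constraint, or the constraint
    # does not depend on u (a == 0): leave the action unchanged.
    if slack >= 0.0 or norm_sq == 0.0:
        return list(u_nom)
    # Otherwise move u_nom the minimum distance along a to reach the boundary.
    scale = -slack / norm_sq
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]


def safe_action(x: Sequence[float], u_nom: Sequence[float],
                radius: float = 1.0, alpha: float = 1.0) -> List[float]:
    """Filter u_nom for x_dot = u with h(x) = ||x||^2 - radius^2 (assumed setup)."""
    h = sum(xi * xi for xi in x) - radius ** 2
    a = [2.0 * xi for xi in x]   # L_g h for single-integrator dynamics
    b = alpha * h                # L_f h is zero here, so b = alpha * h
    return cbf_filter(u_nom, a, b)


if __name__ == "__main__":
    # Driving straight at the obstacle: the filter slows the approach.
    print(safe_action([2.0, 0.0], [-5.0, 0.0]))
    # Driving away from the obstacle: the action passes through unchanged.
    print(safe_action([2.0, 0.0], [1.0, 0.0]))
```

In this sketch the filter only changes the action when the nominal one would violate the barrier condition, which is the sense in which such a filter is minimally invasive.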
Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion
Robust humanoid locomotion in unstructured environments requires architectures that balance fast low-level stabilization with slower perceptual decision-making. We show that a simple layered control architecture (LCA), a proprioceptive stabilizer running at high rate, coupled with a compact low-rate perceptual policy, enables substantially more robust performance than monolithic end-to-end designs, even when using minimal perception encoders. Through a two-stage training curriculum (blind stabilizer pretraining followed by perceptual fine-tuning), we demonstrate that layered policies consistently outperform one-stage alternatives in both simulation and hardware. On a Unitree G1 humanoid, our approach succeeds across stair and ledge tasks where one-stage perceptual policies fail. These results highlight that architectural separation of timescales, rather than network scale or complexity, is the key enabler for robust perception-conditioned locomotion.
comment: 8 pages
Further Results on Safety-Critical Stabilization of Force-Controlled Nonholonomic Mobile Robots
In this paper, we address the stabilization problem for force-controlled nonholonomic mobile robots under safety-critical constraints. We propose a continuous, time-invariant control law based on the gamma m-quadratic programming (gamma m-QP) framework, which unifies control Lyapunov functions (CLFs) and control barrier functions (CBFs) to enforce both stability and safety in the closed-loop system. For the first time, we construct a global, time-invariant, strict Lyapunov function for the closed-loop nonholonomic mobile robot system with a nominal stabilization controller in polar coordinates; this strict Lyapunov function then serves as the CLF in the QP design. Next, by exploiting the inherent cascaded structure of the vehicle dynamics, we develop a CBF for the mobile robot via an integrator backstepping procedure. Our main results guarantee both asymptotic stability and safety for the closed-loop system. Simulation and experimental results are presented to illustrate the effectiveness and performance of our approach.
Through-the-Earth Magnetic Induction Communication and Networking: A Comprehensive Survey
Magnetic induction (MI) communication (MIC) has emerged as a promising candidate for underground communication networks due to its excellent penetration capabilities. Integration with Space-Air-Ground-Underground (SAGUI) networks in next-generation mobile communication systems requires a well-defined network architecture. A recent discovery in MIC research, MI fast fading, remains in its early stages and presents unique challenges. This paper provides a comprehensive survey on through-the-earth (TTE) MIC, covering MI applications, channel modeling, point-to-point MIC design, relay techniques, network frameworks, and emerging technologies. We compare various MIC applications to highlight TTE-specific challenges and review the principles of channel modeling, addressing both MI slow fading and MI fast fading, along with its potential impact on existing MIC theories. We conduct a fine-grained decomposition of MI channel power gain into four distinct physical parameters, and propose a novel geometric model to analyze MI fast fading. We also summarize MI relay techniques, examine crosstalk effects in relay and high-density networks, and explore key research tasks within the OSI framework for a holistic MI network protocol in SAGUI. To bridge the gaps identified, we propose a MIC framework that supports TCP/IP and Linux, enabling full implementation of existing and emerging MIC solutions. This framework empowers researchers to leverage Linux resources and deep learning platforms for accelerated development of MIC in SAGUI networks. Remaining research challenges, open issues, and promising novel techniques are further identified to advance MIC research.
comment: This work has been accepted by the IEEE Communications Surveys & Tutorials (COMST) for publication. The final published version will be available on IEEE Xplore
Dynamic-Key-Aware Co-Simulation Framework for Next Generation of SCADA Systems Encrypted by Quantum-Key-Distribution Techniques
To address growing cybersecurity challenges in modern power dispatch systems, this paper proposes a multi-layer modeling and optimization framework for SCADA systems enhanced with quantum key distribution (QKD). While most existing applications of QKD in the power sector focus on building secure point-to-point communication tunnels, they rarely consider the system-level coupling between key dynamics and control scheduling. In contrast, our approach integrates quantum key generation, consumption, inventory prediction, and control latency into a unified model, enabling key-aware reconfiguration of SCADA control chains based on task security demands and real-time resource constraints. To resolve conflicts in key resource allocation between transmission system operators (TSOs) and distribution system operators (DSOs), we formulate a bi-level Stackelberg game and transform it into a mathematical program with complementarity constraints (MPCC). We further develop an efficient Level Decomposition-Complementarity Pruning (LD-CP) algorithm to solve the problem. To support reproducible evaluation, we build an end-to-end co-simulation platform that integrates physical-layer disruptions via OpenQKD-Sim, Q3P/IEC-104 protocol stack binding, and real-time control-chain monitoring through Grafana. Experimental results on the IEEE 39- and 118-bus systems show that our method increases task success rate by 25%, reduces peak frequency deviation by 70%, and improves key utilization to 83%. This work lays the foundation for future quantum-secure control systems in power grid operations.
Improved Voltage Regulation with Optimal Design of Decentralized Volt-VAr Control
Integration of distributed energy resources has created a need for autonomous, dynamic voltage regulation. Decentralized Volt-VAr Control (VVC) of grid-connected inverters presents a unique opportunity for voltage management but, if designed poorly, can lead to unstable behavior when in feedback with the grid. We model the grid-VVC closed-loop dynamics with a linearized power flow approach, leveraging historical data, which shows improvement over the commonly used LinDistFlow model. This model is used to design VVC slopes by minimizing steady-state voltage deviation from the nominal value, subject to a non-convex spectral radius stability constraint, which has not been previously implemented within this context. We compare this constraint to existing convex restrictions and demonstrate, through simulations on a realistic feeder, that using the spectral radius results in more effective voltage regulation.
A Human-Vector Susceptible--Infected--Susceptible Model for Analyzing and Controlling the Spread of Vector-Borne Diseases
We propose an epidemic model for the spread of vector-borne diseases. The model, which is built by extending the classical susceptible-infected-susceptible model, accounts for two populations -- humans and vectors -- and for cross-contagion between the two species, whereby humans become infected upon interaction with carrier vectors, and vectors become carriers after interaction with infected humans. We formulate the model as a system of ordinary differential equations and leverage monotone systems theory to rigorously characterize the epidemic dynamics. Specifically, we characterize the global asymptotic behavior of the disease, determining conditions for quick eradication of the disease (i.e., for which all trajectories converge to a disease-free equilibrium), or convergence to a (unique) endemic equilibrium. Then, we incorporate two control actions, namely vector control and incentives to adopt protection measures. Using the derived mathematical tools, we assess the impact of these two control actions and determine the optimal control policy.
comment: To appear in the Proceedings of the 2025 European Control Conference (ECC)
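The abstract above does not state the model equations, so the following is only a minimal sketch of a generic two-population SIS system with cross-contagion, not the authors' formulation. The state names `x` (infected fraction of humans) and `y` (carrier fraction of vectors), the rate parameters `beta_h`, `beta_v`, `delta_h`, `delta_v`, and the forward-Euler integration are all assumptions made for illustration.

```python
def simulate_sis(beta_h, beta_v, delta_h, delta_v,
                 x0=0.1, y0=0.1, t_end=200.0, dt=0.01):
    """Forward-Euler integration of an assumed human-vector SIS model.

    x: infected fraction of humans, y: carrier fraction of vectors.
    Humans are infected only by vectors and vectors only by humans
    (cross-contagion); both recover at constant rates.
    """
    x, y = x0, y0
    for _ in range(int(round(t_end / dt))):
        dx = -delta_h * x + beta_h * (1.0 - x) * y  # humans infected by vectors
        dy = -delta_v * y + beta_v * (1.0 - y) * x  # vectors infected by humans
        x, y = x + dt * dx, y + dt * dy
    return x, y


if __name__ == "__main__":
    # Weak cross-contagion: trajectories approach the disease-free state.
    print(simulate_sis(0.2, 0.2, 0.5, 0.5))
    # Strong cross-contagion: trajectories approach an endemic state.
    print(simulate_sis(1.0, 1.0, 0.5, 0.5))
```

For this assumed model, linearizing around the origin suggests that the disease-free state is stable when `beta_h * beta_v < delta_h * delta_v`; the conditions derived in the paper may differ.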
High-Resolution PTDF-Based Planning of Storage and Transmission Under High Renewables
Transmission Expansion Planning (TEP) optimizes power grid upgrades and investments to ensure reliable, efficient, and cost-effective electricity delivery while addressing grid constraints. To support growing demand and renewable energy integration, energy storage is emerging as a pivotal asset that provides temporal flexibility and alleviates congestion. This paper develops a multiperiod, two-stage PTDF formulation that co-optimizes transmission upgrades and storage siting/sizing. To ensure scalability, a trust-region, multicut Benders scheme warm-started from per-representative-day optima is proposed. Applied to a 2,000-bus synthetic Texas system under high-renewable projections, the method attains final optimality gaps below 1% and yields a plan with storage at about 180 nodes (32% of peak renewable capacity). These results demonstrate that the proposed PTDF-based methodology efficiently handles large distributed storage fleets and scales to high spatial resolution.
A Deep State-Space Model Compression Method using Upper Bound on Output Error
We study deep state-space models (Deep SSMs) that contain linear-quadratic-output (LQO) systems as internal blocks and present a compression method with a provable output error guarantee. We first derive an upper bound on the output error between two Deep SSMs and show that the bound can be expressed via the $h^2$-error norms between the layerwise LQO systems, thereby providing a theoretical justification for existing model order reduction (MOR)-based compression. Building on this bound, we formulate an optimization problem in terms of the $h^2$-error norm and develop a gradient-based MOR method. On the IMDb task from the Long Range Arena benchmark, we demonstrate that our compression method achieves strong performance. Moreover, unlike prior approaches, we reduce roughly 80% of trainable parameters without retraining, with only a 4-5% performance drop.
Stability Criteria and Motor Performance in Delayed Haptic Dyadic Interactions Mediated by Robots
This paper establishes analytical stability criteria for robot-mediated human-human (dyadic) interaction systems, focusing on haptic communication under network-induced time delays. Through frequency-domain analysis supported by numerical simulations, we identify both delay-independent and delay-dependent stability criteria. The delay-independent criterion guarantees stability irrespective of the delay, whereas the delay-dependent criterion is characterised by a maximum tolerable delay before instability occurs. The criteria demonstrate dependence on controller and robot dynamic parameters, where increasing stiffness reduces the maximum tolerable delay in a non-linear manner, thereby heightening system vulnerability. The proposed criteria can be generalised to a wide range of robot-mediated interactions and serve as design guidelines for stable remote dyadic systems. Experiments with robots performing human-like movements further illustrate the correlation between stability and motor performance. The findings of this paper suggest the prerequisites for effective delay-compensation strategies.
RoboANKLE: Design, Development, and Functional Evaluation of a Robotic Ankle with a Motorized Compliant Unit
This study presents a powered transtibial prosthesis with complete push-off assistance, RoboANKLE. The design aims to fulfill specific requirements, such as a sufficient range of motion (RoM) while providing the necessary torque for achieving natural ankle motion in daily activities. Addressing the challenges faced in designing active transtibial prostheses, such as maintaining energetic autonomy and minimizing weight, is vital for the study. With this aim, we try to imitate the human ankle by providing extensive push-off assistance to achieve a natural-like torque profile. Thus, an Energy Store and Extended Release (ESER) mechanism is employed with a novel Extra Energy Storage (EES) mechanism. Kinematic and kinetic analyses are carried out to determine the design parameters and assess the design performance. Subsequently, a Computer-Aided Design (CAD) model is built and used in comprehensive dynamic and structural analyses. These analyses are used to evaluate the design performance and to determine the forces and torques applied to the prosthesis, which aids in optimizing the design for minimal weight via structural analysis and topology optimization. The design of the prototype is then finalized and manufactured for experimental evaluation to validate the design and functionality. The prototype is realized with a mass of 1.92 kg and dimensions of 261x107x420 mm. Functional evaluations of the RoboANKLE revealed that it is capable of achieving the natural maximum dorsi-flexion angle with 95% accuracy. Also, thanks to the implemented mechanisms, the results show that RoboANKLE can generate 57% more torque than required for natural walking. The power generation capacity of the RoboANKLE is 10% higher than the natural power during the gait cycle.
Prescribed Performance Control of Deformable Object Manipulation in Spatial Latent Space
Manipulating three-dimensional (3D) deformable objects presents significant challenges for robotic systems due to their infinite-dimensional state space and complex deformable dynamics. This paper proposes a novel model-free approach for shape control with constraints imposed on key points. Unlike existing methods that rely on feature dimensionality reduction, the proposed controller leverages the coordinates of key points as the feature vector, which are extracted from the deformable object's point cloud using deep learning methods. This approach not only reduces the dimensionality of the feature space but also retains the spatial information of the object. By extracting key points, the manipulation of deformable objects is simplified into a visual servoing problem, where the shape dynamics are described using a deformation Jacobian matrix. To enhance control accuracy, a prescribed performance control method is developed by integrating barrier Lyapunov functions (BLF) to enforce constraints on the key points. The stability of the closed-loop system is rigorously analyzed and verified using the Lyapunov method. Experimental results further demonstrate the effectiveness and robustness of the proposed method.
A Comparative Study of Oscillatory Perturbations in Car-Following Models
As connected and autonomous vehicles become more widespread, platooning has emerged as a key strategy to improve road capacity, reduce fuel consumption, and enhance traffic flow. However, the benefits of platoons strongly depend on their ability to maintain stability. Instability can lead to unsafe spacing and increased energy usage. In this work, we study platoon instability and analyze the root cause of its occurrence, as well as its impacts on the following vehicle. To achieve this, we propose a comparative study between different car-following models such as the Intelligent Driver Model (IDM), the Optimal Velocity Model (OVM), the General Motors Model (GMM), and the Cooperative Adaptive Cruise Control (CACC). In our approach, we introduce a disruption in the model by varying the velocity of the leading vehicle to visualize the behavior of the following vehicles. To evaluate the dynamic response of each model, we introduce controlled perturbations in the velocity of the leading vehicle, specifically, sinusoidal oscillations and discrete velocity changes. The resulting vehicle trajectories and variations in inter-vehicle spacing are analyzed to assess the robustness of each model to disturbance propagation. The findings offer insight into model sensitivity, stability characteristics, and implications for designing resilient platooning control strategies.
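The perturbation experiment described above can be illustrated with a small simulation. This is a minimal sketch, assuming the standard Intelligent Driver Model (IDM) acceleration law, point vehicles, forward-Euler integration, and illustrative parameter values; none of these settings are taken from the paper, and the function names are hypothetical.

```python
import math

# Illustrative IDM parameters (assumed, not from the paper).
V0, T_HEAD, A_MAX, B_COMF, S0, DELTA = 30.0, 1.5, 1.0, 2.0, 2.0, 4


def idm_accel(v, gap, dv):
    """IDM acceleration; dv = v - v_leader is the approach rate."""
    s_star = S0 + max(0.0, v * T_HEAD + v * dv / (2.0 * math.sqrt(A_MAX * B_COMF)))
    return A_MAX * (1.0 - (v / V0) ** DELTA - (s_star / gap) ** 2)


def simulate_platoon(n_followers=5, t_end=120.0, dt=0.05,
                     v_mean=20.0, amp=2.0, period=20.0):
    """Leader speed oscillates sinusoidally; followers obey the IDM.

    Returns (min_gap, speed_range) per follower, ordered front to back.
    """
    gap_eq = (S0 + v_mean * T_HEAD) / math.sqrt(1.0 - (v_mean / V0) ** DELTA)
    n = n_followers + 1
    x = [-i * gap_eq for i in range(n)]   # start at the equilibrium spacing
    v = [v_mean] * n
    min_gap = [gap_eq] * n_followers
    v_lo = [v_mean] * n_followers
    v_hi = [v_mean] * n_followers
    for k in range(int(round(t_end / dt))):
        t = (k + 1) * dt
        acc = [idm_accel(v[i], x[i - 1] - x[i], v[i] - v[i - 1]) for i in range(1, n)]
        v[0] = v_mean + amp * math.sin(2.0 * math.pi * t / period)  # perturbation
        x[0] += v[0] * dt
        for i in range(1, n):
            v[i] = max(0.0, v[i] + acc[i - 1] * dt)
            x[i] += v[i] * dt
            j = i - 1
            min_gap[j] = min(min_gap[j], x[i - 1] - x[i])
            v_lo[j] = min(v_lo[j], v[i])
            v_hi[j] = max(v_hi[j], v[i])
    return min_gap, [hi - lo for hi, lo in zip(v_hi, v_lo)]


if __name__ == "__main__":
    gaps, ranges = simulate_platoon()
    for i, (g, r) in enumerate(zip(gaps, ranges), start=1):
        print(f"follower {i}: min gap {g:.2f} m, speed range {r:.2f} m/s")
```

Comparing the per-follower speed range from front to back gives a rough indication of whether the oscillation grows or decays along the platoon for the chosen parameters. Swapping `idm_accel` for another car-following law would allow the same comparison across models.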
Two Roads to Koopman Operator Theory for Control: Infinite Input Sequences and Operator Families
The Koopman operator, originally defined for dynamical systems without input, has inspired many applications in control. Yet, the theoretical foundations underpinning this progress in control remain underdeveloped. This paper investigates the theoretical structure and connections between two extensions of Koopman theory to control: (i) Koopman operator via infinite input sequences and (ii) the Koopman control family. Although these frameworks encode system information in fundamentally different ways, we show that under certain conditions on the function spaces they operate on, they are equivalent. The equivalence is both in terms of the actions of the Koopman-based formulations in each framework as well as the function values on the system trajectories. Our analysis provides constructive tools to translate between the frameworks, offering a unified perspective for Koopman methods in control.
Tail-Optimized Caching for LLM Inference
Prompt caching is critical for reducing latency and cost in LLM inference: OpenAI and Anthropic report up to 50-90% cost savings through prompt reuse. Despite its widespread success, little is known about what constitutes an optimal prompt caching policy, particularly when optimizing tail latency, a metric of central importance to practitioners. The widely used Least Recently Used (LRU) policy can perform arbitrarily poorly on this metric, as it is oblivious to the heterogeneity of conversation lengths. To address this gap, we propose Tail-Optimized LRU, a simple two-line modification that reallocates KV cache capacity to prioritize high-latency conversations by evicting cache entries that are unlikely to affect future turns. Though the implementation is simple, we prove its optimality under a natural stochastic model of conversation dynamics, providing the first theoretical justification for LRU in this setting, a result that may be of independent interest to the caching community. Experimentally, on real conversation data WildChat, Tail-Optimized LRU achieves up to 27.5% reduction in P90 tail Time to First Token latency and 23.9% in P95 tail latency compared to LRU, along with up to 38.9% decrease in SLO violations of 200ms. We believe this provides a practical and theoretically grounded option for practitioners seeking to optimize tail latency in real-world LLM deployments.
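For reference, the baseline policy named above can be sketched in a few lines. This is a plain LRU cache only, not the proposed Tail-Optimized LRU, whose two-line modification is not specified in the abstract. The class name `LRUCache` and the use of conversation identifiers as keys are assumptions for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Plain least-recently-used cache (baseline policy only)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    cache.put("conv-a", 1)
    cache.put("conv-b", 2)
    cache.get("conv-a")     # "conv-a" becomes most recently used
    cache.put("conv-c", 3)  # evicts "conv-b"
    print(cache.get("conv-b"), cache.get("conv-a"))
```

Eviction here depends only on recency, which is the property the abstract identifies as the weakness: the policy does not look at how long a conversation is.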
Sparsity-exploiting Gaussian Process for Robust Transient Learning of Power System Dynamics
Advances in leveraging Gaussian processes (GP) have enabled learning and inferring dynamic grid behavior from scarce PMU measurements. However, real measurements can be corrupted by various random and targeted threats, leading to inaccurate and meaningless results. This paper develops robust transient learning to overcome this challenge by exploiting the sparse corruption patterns in the data flow. Specifically, we integrate sparse optimization with method of moments (MoM) to make learning robust to a sparse distribution of data corruptions; then, we optimize sparse weights to identify corrupted meter locations. To improve inference speed on large-scale systems, we further adopt K-medoid clustering of locations to develop dimension reduction (DR) and aggregate representation (AR) heuristics. Experimental results demonstrate robustness against random large errors, targeted false data injections, and local PMU clock drifts. On a 1354-bus system, inference turns out to be 18x faster using DR and 400x faster when further combined with AR heuristics.
comment: This manuscript has been submitted to PESGM2026
Exploring a New Design Paradigm for Omnidirectional MAVs for Minimal Actuation and Internal Force Elimination: Theoretical Framework and Control
This paper presents a novel concept for achieving omnidirectionality in a multirotor aerial vehicle (MAV) that uses only 6 inputs and ensures no internal forces at the equilibria. The concept integrates a single actively-tilting propeller along with 3 pendulum-like links, each carrying a propeller, connected by passive universal joints to the main body. We show that this design ensures omnidirectionality while minimizing the internal forces and without resorting to overactuation (i.e., more than 6 inputs). A detailed dynamic model of the multi-link MAV is first developed. Afterwards, the analysis identifies the equilibrium configurations and illustrates that a forced equilibrium exists for every pose of the MAV's main platform. In order to render this equilibrium asymptotically stable for the closed-loop system, a geometric nonlinear controller is constructed using dynamic feedback linearization and backstepping techniques with the main platform configuration error being the left-trivialized error on SE(3). The stability of the closed-loop system is then investigated by employing standard Lyapunov arguments on the zero dynamics. We conclude by providing numerical simulations validating the proposed approach. They demonstrate the MAV's capability to perform decoupled attitude and translational motions under non-zero initial conditions, parametric uncertainty, and actuator noise.
Q-EnergyDEX: A Zero-Trust Distributed Energy Trading Framework Driven by Quantum Key Distribution and Blockchain
The rapid decentralization and digitalization of local electricity markets have introduced new cyber-physical vulnerabilities, including key leakage, data tampering, and identity spoofing. Existing blockchain-based solutions provide transparency and traceability but still depend on classical cryptographic primitives that are vulnerable to quantum attacks. To address these challenges, this paper proposes Q-EnergyDEX, a zero-trust distributed energy trading framework driven by quantum key distribution and blockchain. The framework integrates physical-layer quantum randomness with market-level operations, providing an end-to-end quantum-secured infrastructure. A cloud-based Quantum Key Management Service continuously generates verifiable entropy and regulates key generation through a rate-adaptive algorithm to sustain high-quality randomness. A symmetric authentication protocol (Q-SAH) establishes secure and low-latency sessions, while the quantum-aided consensus mechanism (PoR-Lite) achieves probabilistic ledger finality within a few seconds. Furthermore, a Stackelberg-constrained bilateral auction couples market clearing with entropy availability, ensuring both economic efficiency and cryptographic security. Simulation results show that Q-EnergyDEX maintains robust key stability and near-optimal social welfare, demonstrating its feasibility for large-scale decentralized energy markets.
A predictive modular approach to constraint satisfaction under uncertainty - with application to glycosylation in continuous monoclonal antibody biosimilar production
The paper proposes a modular-based approach to constraint handling in process optimization and control. This is partly motivated by the recent interest in learning-based methods, e.g., within bioproduction, for which constraint handling under uncertainty is a challenge. The proposed constraint handler, called predictive filter, is combined with an adaptive constraint margin and a constraint violation cost monitor to minimize the cost of violating soft constraints due to model uncertainty and disturbances. The module can be combined with any controller and is based on minimally modifying the controller output, in a least squares sense, such that constraints are satisfied within the considered horizon. The proposed method is computationally efficient and suitable for real-time applications. The effectiveness of the method is illustrated through a realistic simulation case study of glycosylation constraint satisfaction in continuous monoclonal antibody biosimilar production using Chinese hamster ovary cells, for which the metabolic network model consists of 23 extracellular metabolites and 126 reactions.
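The idea of minimally modifying a controller output in a least-squares sense can be sketched for the simplest possible case. This is an illustration under strong assumptions, not the paper's predictive filter: a single linear constraint `a . u <= b` stands in for the constraints predicted over the horizon, the margin is a fixed number instead of an adaptive one, and the names `filter_input`, `u_nom`, `a`, `b`, and `margin` are hypothetical.

```python
def filter_input(u_nom, a, b, margin=0.0):
    """Return the input closest to u_nom (Euclidean norm) with a . u <= b - margin.

    For a single half-space constraint the least-squares problem has a
    closed-form solution: an orthogonal projection onto the half-space.
    """
    limit = b - margin
    violation = sum(ai * ui for ai, ui in zip(a, u_nom)) - limit
    if violation <= 0.0:
        return list(u_nom)  # nominal input already satisfies the constraint
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("constraint normal must be nonzero")
    # Move the smallest possible distance along the constraint normal.
    return [ui - violation * ai / norm_sq for ai, ui in zip(a, u_nom)]


if __name__ == "__main__":
    print(filter_input([2.0, 0.0], a=[1.0, 0.0], b=1.0))
    print(filter_input([0.5, 3.0], a=[1.0, 0.0], b=1.0))
    print(filter_input([2.0, 0.0], a=[1.0, 0.0], b=1.0, margin=0.25))
```

Because the filter only acts when the nominal input would violate the constraint, it can sit behind any controller without changing its behavior in the feasible region. With several constraints or a prediction horizon, the projection would become a small quadratic program instead of a closed-form expression.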
Revolution-Spaced Output-Feedback Model Predictive Control for Station Keeping on Near-Rectilinear Halo Orbits
We develop a model predictive control (MPC) policy for station keeping on a Near-Rectilinear Halo Orbit (NRHO). The proposed policy achieves full-state tracking of a reference NRHO via a multiple-maneuver control horizon, each spaced one revolution apart to abide by typical mission operation requirements. We prove that the proposed policy is recursively feasible, and perform numerical evaluation in an output-feedback setting by incorporating a navigation filter and realistic operational uncertainties, where the proposed MPC is compared against the state-of-the-art station-keeping algorithm adopted for the Gateway. Our approach successfully maintains the spacecraft in the vicinity of the reference NRHO at a similar cumulative cost as existing station-keeping methods without encountering phase deviation issues, a common drawback of existing methods with one maneuver per revolution.
comment: 8 pages, 6 figures
Offline and Online Use of Interval and Set-Based Approaches for Control and State Estimation: A Selection of Methodological Approaches and Their Application
Control and state estimation procedures need to be robust against imprecisely known parameters, uncertainty in initial conditions, and external disturbances. Interval methods and other set-based techniques form the basis for the implementation of powerful approaches that can be used to identify parameters of dynamic system models in the presence of the aforementioned types of uncertainty. Moreover, they are applicable to a verified feasibility and stability analysis of controllers and state estimators. In addition to these approaches, which are typically used offline for analysis of system models designed with classical floating point procedures, interval and set-based methods have also been developed in recent years that allow the associated design tasks to be solved directly and reliable techniques to be implemented that are applicable online, i.e., during system operation. The latter approaches include set-based model predictive control, online parameter adaptation techniques for nonlinear variable-structure and backstepping controllers, interval observers, and fault diagnosis techniques. This paper provides an overview of the methodological background and reviews numerous practical applications for which interval and other set-valued approaches have been employed successfully.
No-Regret Learning in Stackelberg Games with an Application to Electric Ride-Hailing
We consider the problem of efficiently learning to play single-leader multi-follower Stackelberg games when the leader lacks knowledge of the lower-level game. Such games arise in hierarchical decision-making problems involving self-interested agents. For example, in electric ride-hailing markets, a central authority aims to learn optimal charging prices to shape fleet distributions and charging patterns of ride-hailing companies. Existing works typically apply gradient-based methods to find the leader's optimal strategy. Such methods are impractical as they require that the followers share private utility information with the leader. Instead, we treat the lower-level game as a black box, assuming only that the followers' interactions approximate a Nash equilibrium while the leader observes the realized cost of the resulting approximation. Under kernel-based regularity assumptions on the leader's cost function, we develop a no-regret algorithm that converges to an $\epsilon$-Stackelberg equilibrium in $O(\sqrt{T})$ rounds. Finally, we validate our approach through a numerical case study on optimal pricing in electric ride-hailing markets.
comment: 8 pages, 2 figures, 1 table
Hierarchical Fuel-Cell Airpath Control: an Efficiency-Aware MIMO Control Approach Combined with a Novel Constraint-Enforcing Reference Governor
This paper presents a hierarchical multivariable control and constraint management approach for an air supply system for a proton exchange membrane fuel cell (PEMFC) system. The control objectives are to track desired compressor mass airflow and cathode inlet pressure, maintain a minimum oxygen excess ratio (OER), and run the system at maximum net efficiency. A multi-input multi-output (MIMO) internal model controller (IMC) is designed and simulated to track flow and pressure set-points, and it shows high performance despite strongly coupled plant dynamics. A new set-point map is generated to compute the most efficient cathode inlet pressure from the stack current load. To enforce OER constraints, a novel reference governor (RG) with the ability to govern multiple references (the cascade RG) and the ability to speed up as well as slow down a reference signal (the cross-section RG) is developed and tested. Compared with a single-input single-output (SISO) air-flow control approach, the proposed MIMO control approach shows up to 7.36 percent lower hydrogen fuel consumption. Compared to a traditional load governor, the novel cascaded cross-section RG (CC-RG) shows up to 3.68 percent less mean absolute percent error (MAPE) on net power tracking and greatly improved worst-case OER on realistic drive-cycle simulations. Control development and validations were conducted on two fuel cell system (FCS) models, a nonlinear open-source model and a proprietary Ford high-fidelity model.
comment: Accepted for publication in IEEE Transactions on Control Systems Technology. This version incorporates all peer-review revisions
Offline Reinforcement Learning via Inverse Optimization
Inspired by the recent successes of Inverse Optimization (IO) across various application domains, we propose a novel offline Reinforcement Learning (ORL) algorithm for continuous state and action spaces, leveraging the convex loss function called "sub-optimality loss" from the IO literature. To mitigate the distribution shift commonly observed in ORL problems, we further employ a robust and non-causal Model Predictive Control (MPC) expert steering a nominal model of the dynamics using in-hindsight information stemming from the model mismatch. Unlike the existing literature, our robust MPC expert enjoys an exact and tractable convex reformulation. In the second part of this study, we show that the IO hypothesis class, trained by the proposed convex loss function, enjoys ample expressiveness and achieves competitive performance compared with the state-of-the-art (SOTA) methods in the low-data regime of the MuJoCo benchmark while utilizing three orders of magnitude fewer parameters, thereby requiring significantly fewer computational resources. To facilitate the reproducibility of our results, we provide an open-source package implementing the proposed algorithms and the experiments.
comment: preprint
Composite learning backstepping control with guaranteed exponential stability and robustness
Adaptive backstepping control provides a feasible solution to achieve asymptotic tracking for mismatched uncertain nonlinear systems. However, the closed-loop stability depends on high-gain feedback generated by nonlinear damping terms, and closed-loop exponential stability with parameter convergence involves a stringent condition named persistent excitation (PE). This paper proposes a composite learning backstepping control (CLBC) strategy based on modular backstepping and high-order tuners to compensate for the transient process of parameter estimation and achieve closed-loop exponential stability without the nonlinear damping terms and the PE condition. A novel composite learning mechanism is designed to maximize the staged exciting strength for parameter estimation, such that parameter convergence can be achieved under a condition of interval excitation (IE) or even partial IE that is strictly weaker than PE. An extra prediction error is employed in the adaptive law to ensure the transient performance without nonlinear damping terms. The exponential stability of the closed-loop system is proved rigorously under the partial IE or IE condition. Simulations have demonstrated the effectiveness and superiority of the proposed method in both parameter estimation and control compared to state-of-the-art methods.
comment: This work has been submitted to the IEEE for possible publication
Strategy Templates for Almost-Sure and Positive Winning of Stochastic Parity Games towards Permissive and Resilient Control
Stochastic games are fundamental in various applications, including the control of cyber-physical systems (CPS), where both controller and environment are modeled as players. Traditional algorithms typically aim to determine a single winning strategy to develop a controller. However, in CPS control and other domains, permissive controllers are essential, as they enable the system to adapt when additional constraints arise and remain resilient to runtime changes. This work generalizes the concept of (permissive winning) strategy templates, originally introduced by Anand et al. at TACAS and CAV 2023 for deterministic games, to incorporate stochastic games. These templates capture an infinite number of winning strategies, allowing for efficient strategy adaptation to system changes. We focus on two winning criteria (almost-sure and positive winning) and five winning objectives (safety, reachability, B\"uchi, co-B\"uchi, and parity). Our contributions include algorithms for constructing templates for each winning criterion and objective and a novel approach for extracting a winning strategy from a given template. Discussions on comparisons between templates and between strategy extraction methods are provided.
comment: For the conference version published at ICTAC 2024 see: arXiv:2409.08607v1
A Weighted Predict-and-Optimize Framework for Power System Operation Considering Varying Impacts of Uncertainty
Prediction deviations of different uncertainties have varying impacts on downstream decision-making. Improving the prediction accuracy of critical uncertainties with significant impacts on decision-making quality yields better optimization results. Motivated by this observation, this paper proposes a novel weighted predict-and-optimize (WPO) framework for decision-making under multiple uncertainties. Specifically, we incorporate an uncertainty-aware weighting mechanism into the predictive model to capture the relative impact of each uncertainty on specific optimization tasks, and introduce a problem-driven prediction loss (PDPL) to quantify the suboptimality of the weighted predictions relative to perfect predictions in downstream optimization. By optimizing the uncertainty weights to minimize the PDPL, the proposed WPO framework enables adaptive assessment of uncertainty impacts and joint learning of prediction and optimization. Furthermore, to facilitate weight optimization, we develop a surrogate model that establishes a direct mapping between the uncertainty weights and the PDPL, where enhanced graph convolutional networks and multi-task learning are adopted for efficient surrogate model construction and training. Numerical experiments on the modified IEEE 33-bus and 123-bus systems demonstrate that the proposed WPO framework outperforms the traditional predict-then-optimize paradigm, reducing the PDPL by an average of 55% within acceptable computational time.
comment: This is a paper submitted to IEEE Transactions on Power Systems
Differentiable-by-design Nonlinear Optimization for Model Predictive Control
Nonlinear optimization-based control policies, such as those arising in nonlinear Model Predictive Control, have seen remarkable success in recent years. These policies require solving computationally demanding nonlinear optimization programs online at each time-step. The resulting solution map, viewed as a function of the measured state of the system and design parameters, may not be differentiable, which poses significant challenges if the control policy is embedded in a gradient-based policy optimization scheme. We propose a principled way to regularize the nonlinear optimization problem, obtaining a surrogate derivative even when the original problem is not differentiable. The surrogate problem is differentiable by design and its solution map coincides with the solution of the unregularized problem. We demonstrate the effectiveness of our approach in a free-final-time optimal control problem and a receding-horizon nonlinear MPC example.
Time-causal and time-recursive wavelets
When applying wavelet analysis to real-time temporal signals, where the future cannot be accessed, it is essential to base all the steps in the signal processing pipeline on computational mechanisms that are truly time-causal. This paper describes how a time-causal wavelet analysis can be performed based on concepts developed in the area of temporal scale-space theory, originating from a complete classification of temporal smoothing kernels that guarantee non-creation of new structures from finer to coarser temporal scale levels. By necessity, convolution with truncated exponential kernels in cascade constitutes the only permissible class of kernels, as well as their temporal derivatives as a natural complement to fulfil the admissibility conditions of wavelet representations. For a particular way of choosing the time constants in the resulting infinite convolution of truncated exponential kernels, to ensure temporal scale covariance and thus self-similarity over temporal scales, we describe how mother wavelets can be chosen as temporal derivatives of the resulting time-causal limit kernel. By developing connections between wavelet theory and scale-space theory, we characterize and quantify how the continuous scaling properties transfer to the discrete implementation, demonstrating how the proposed time-causal wavelet representation can reflect the duration of locally dominant temporal structures in the input signals. We propose that this notion of time-causal wavelet analysis could be a valuable tool for signal processing tasks, where streams of signals are to be processed in real time, specifically for signals that may contain local variations over a rich span of temporal scales, or more generally for analysing physical or biophysical temporal phenomena, where a fully time-causal analysis is called for to be physically realistic.
comment: 23 pages, 8 figures
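The cascade of truncated exponential kernels described in the abstract above can be illustrated with a short sketch. It assumes the usual discrete analogue of a truncated exponential kernel, a first-order recursive filter, and uses arbitrary time constants rather than the scale-covariant distribution the paper derives. The function names and the backward-difference derivative are illustrative choices, not the paper's implementation.

```python
import numpy as np

def first_order_recursive(x, mu):
    """Time-causal smoothing: y[t] = y[t-1] + (x[t] - y[t-1]) / (1 + mu).

    Only the current sample and the previous output are used, so the
    filter never accesses the future. mu >= 0 is the time constant.
    """
    y = np.zeros(len(x), dtype=float)
    prev = 0.0
    for t, xt in enumerate(x):
        prev = prev + (xt - prev) / (1.0 + mu)
        y[t] = prev
    return y

def causal_cascade(x, mus):
    """Apply first-order recursive filters in cascade (hypothetical helper)."""
    out = np.asarray(x, dtype=float)
    for mu in mus:
        out = first_order_recursive(out, mu)
    return out

def causal_wavelet_response(x, mus):
    """Smooth causally, then take a backward difference as a discrete
    stand-in for the first temporal derivative of the smoothing kernel."""
    smoothed = causal_cascade(x, mus)
    return np.diff(smoothed, prepend=0.0)

# Impulse at t = 5; the time constants below are arbitrary example values.
impulse = np.zeros(400)
impulse[5] = 1.0
example_mus = [1.0, 2.0, 4.0]
kernel = causal_cascade(impulse, example_mus)
wavelet = causal_wavelet_response(impulse, example_mus)
```

In this sketch the smoothing kernel is zero before the impulse and sums to one, while the derivative kernel sums to approximately zero, which is the discrete counterpart of the admissibility condition mentioned in the abstract.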
Inferring Foresightedness in Dynamic Noncooperative Games
Dynamic game theory is an increasingly popular tool for modeling multi-agent, e.g. human-robot, interactions. Game-theoretic models presume that each agent wishes to minimize a private cost function that depends on others' actions. These games typically evolve over a fixed time horizon, specifying how far into the future each agent plans. In practical settings, however, decision-makers may vary in foresightedness, or how much they care about their current cost in relation to their past and future costs. We conjecture that quantifying and estimating each agent's foresightedness from online data will enable safer and more efficient interactions with other agents. To this end, we frame this inference problem as an inverse dynamic game. We consider a specific objective function parametrization that smoothly interpolates between myopic and farsighted planning. Games of this form are readily transformed into parametric mixed complementarity problems; we exploit the directional differentiability of solutions to these problems with respect to their hidden parameters to solve for agents' foresightedness. We conduct three experiments: one with synthetically generated delivery robot motion, one with real-world data involving people walking, biking, and driving vehicles, and one using high-fidelity simulators. The results of these experiments demonstrate that explicitly inferring agents' foresightedness enables game-theoretic models to produce 33% more accurate models of agents' behavior.
PowerChain: A Verifiable Agentic AI System for Automating Distribution Grid Analyses
Rapid electrification and decarbonization are increasing the complexity of distribution grid (DG) operation and planning, necessitating advanced computational analyses to ensure reliability and resilience. These analyses depend on disparate workflows comprising complex models, function calls, and data pipelines that require substantial expert knowledge and remain difficult to automate. Workforce and budget constraints further limit utilities' ability to apply such analyses at scale. To address this gap, we build an agentic system, PowerChain, which is capable of autonomously performing complex grid analyses. Existing agentic AI systems are typically developed in a bottom-up manner with customized context for predefined analysis tasks; therefore, they do not generalize to tasks that the agent has never seen. In comparison, to generalize to unseen DG analysis tasks, PowerChain dynamically generates structured context by leveraging supervisory signals from self-contained power systems tools (e.g., GridLAB-D) and an optimized set of expert-annotated and verified reasoning trajectories. For complex DG tasks defined in natural language, empirical results on real utility data demonstrate that PowerChain achieves up to a 144% improvement in performance over baselines.
Physiology-informed layered sensing for intelligent human-exoskeleton interaction
Wearable exoskeletons hold transformative promise for restoring mobility across diverse users with muscular weakness or other impairments. However, their translation beyond laboratory environments remains limited by sensing systems that capture movement but not underlying physiology. Here, we present a soft, lightweight smart leg sleeve that achieves anatomically aligned, layered multimodal sensing by integrating textile-based surface electromyography (sEMG) electrodes, ultrasensitive textile strain sensors, and inertial measurement units (IMUs). Each sensing modality targets a distinct physiological layer: IMUs track joint kinematics at the skeletal level, sEMG monitors muscle activation at the muscular level, and strain sensors detect skin deformation at the cutaneous level. Together, these sensors provide real-time perception to support three core objectives: controlling personalized assistance, optimizing user effort, and safeguarding against injury risks. The system is skin-conformal, mechanically compliant, and seamlessly integrated with a custom exoskeleton (<20 g total sensor and electronics weight). We demonstrate: (1) accurate ankle joint moment estimation (RMSE = 0.13 Nm/kg), (2) real-time classification of metabolic trends (accuracy = 97.1%), and (3) injury risk detection within 100 ms (recall = 0.96), all validated on unseen users using a leave-one-subject-out protocol. This work establishes a physiology-aligned sensing architecture that reframes exoskeleton perception from motion tracking to real-time physiological decoding, offering a pathway towards intelligent, adaptive, and personalized wearable robotics.
comment: 21 pages, 5 figures, 43 references
End-to-End Learning Framework for Solving Non-Markovian Optimal Control
Integer-order calculus often falls short in capturing the long-range dependencies and memory effects found in many real-world processes. Fractional calculus addresses these gaps via fractional-order integrals and derivatives, but fractional-order dynamical systems pose substantial challenges in system identification and optimal control due to the lack of standard control methodologies. In this paper, we theoretically derive the optimal control via linear quadratic regulator (LQR) for fractional-order linear time-invariant (FOLTI) systems and develop an end-to-end deep learning framework based on this theoretical foundation. Our approach establishes a rigorous mathematical model, derives analytical solutions, and incorporates deep learning to achieve data-driven optimal control of FOLTI systems. Our key contributions include: (i) proposing an innovative system identification method and control strategy for FOLTI systems, (ii) developing the first end-to-end data-driven learning framework, Fractional-Order Learning for Optimal Control (FOLOC), that learns control policies from observed trajectories, and (iii) deriving a theoretical analysis of sample complexity to quantify the number of samples required for accurate optimal control in complex real-world problems. Experimental results indicate that our method accurately approximates fractional-order system behaviors without relying on Gaussian noise assumptions, pointing to promising avenues for advanced optimal control.
Autonomous Cyber Resilience via a Co-Evolutionary Arms Race within a Fortified Digital Twin Sandbox SP
The convergence of Information Technology and Operational Technology has exposed Industrial Control Systems to adaptive, intelligent adversaries that render static defenses obsolete. This paper introduces the Adversarial Resilience Co-evolution (ARC) framework, addressing the "Trinity of Trust" comprising model fidelity, data integrity, and analytical resilience. ARC establishes a co-evolutionary arms race within a Fortified Secure Digital Twin (F-SCDT), where a Deep Reinforcement Learning "Red Agent" autonomously discovers attack paths while an ensemble-based "Blue Agent" is continuously hardened against these threats. Experimental validation on the Tennessee Eastman Process (TEP) and Secure Water Treatment (SWaT) testbeds demonstrates superior performance in detecting novel attacks, with F1-scores improving from 0.65 to 0.89 and detection latency reduced from over 1200 seconds to 210 seconds. A comprehensive ablation study reveals that the co-evolutionary process itself contributes a 27% performance improvement. By integrating Explainable AI and proposing a Federated ARC architecture, this work presents a necessary paradigm shift toward dynamic, self-improving security for critical infrastructure.
comment: 6 pages, 2 figures, 4 equations, 1 algorithm, 3 tables, to be published in ISPACS 2025, unabridged version exists as arXiv:2506.20102v1
Systems and Control (EESS)
RDD: Retrieval-Based Demonstration Decomposer for Planner Alignment in Long-Horizon Tasks NeurIPS 2025
To tackle long-horizon tasks, recent hierarchical vision-language-action (VLA) frameworks employ vision-language model (VLM)-based planners to decompose complex manipulation tasks into simpler sub-tasks that low-level visuomotor policies can easily handle. Typically, the VLM planner is finetuned to learn to decompose a target task. This finetuning requires target task demonstrations segmented into sub-tasks by either human annotation or heuristic rules. However, the heuristic sub-tasks can deviate significantly from the training data of the visuomotor policy, which degrades task performance. To address these issues, we propose a Retrieval-based Demonstration Decomposer (RDD) that automatically decomposes demonstrations into sub-tasks by aligning the visual features of the decomposed sub-task intervals with those from the training data of the low-level visuomotor policies. Our method outperforms the state-of-the-art sub-task decomposer on both simulation and real-world tasks, demonstrating robustness across diverse settings. Code and more results are available at rdd-neurips.github.io.
comment: 39th Conference on Neural Information Processing Systems (NeurIPS 2025); Project Website: rdd-neurips.github.io
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
Reinforcement learning (RL), while powerful and expressive, can often prioritize performance at the expense of safety. Yet safety violations can lead to catastrophic outcomes in real-world deployments. Control Barrier Functions (CBFs) offer a principled method to enforce dynamic safety -- traditionally deployed online via safety filters. While the result is safe behavior, the fact that the RL policy does not have knowledge of the CBF can lead to conservative behaviors. This paper proposes CBF-RL, a framework for generating safe behaviors with RL by enforcing CBFs in training. CBF-RL has two key attributes: (1) minimally modifying a nominal RL policy to encode safety constraints via a CBF term, and (2) safety filtering of the policy rollouts in training. Theoretically, we prove that continuous-time safety filters can be deployed via closed-form expressions on discrete-time roll-outs. Practically, we demonstrate that CBF-RL internalizes the safety constraints in the learned policy -- both enforcing safer actions and biasing towards safer rewards -- enabling safe deployment without the need for an online safety filter. We validate our framework through ablation studies on navigation tasks and on the Unitree G1 humanoid robot, where CBF-RL enables safer exploration, faster convergence, and robust performance under uncertainty, enabling the humanoid robot to avoid obstacles and climb stairs safely in real-world settings without a runtime safety filter.
comment: 8 pages
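The idea of a closed-form safety filter that minimally modifies a nominal action, as described in the abstract above, can be sketched for the simplest textbook case. The example assumes a single-integrator model (state derivative equals the action), one circular obstacle, and the barrier h(x) = ||x - c||^2 - r^2. These modelling choices, the function name `cbf_filter`, and the gain `alpha` are illustrative assumptions and are not taken from the CBF-RL paper.

```python
import numpy as np

def cbf_filter(x, u_nom, center, radius, alpha=1.0):
    """Closed-form solution of
         min ||u - u_nom||^2  s.t.  grad_h(x) . u + alpha * h(x) >= 0
    for a single integrator x_dot = u and h(x) = ||x - center||^2 - radius^2.
    With a single affine constraint the QP reduces to a projection onto a
    half-space. (The gradient vanishes only at the obstacle center, which
    lies inside the unsafe set and is not handled here.)
    """
    x, u_nom, center = (np.asarray(v, dtype=float) for v in (x, u_nom, center))
    h = np.dot(x - center, x - center) - radius ** 2
    grad_h = 2.0 * (x - center)
    residual = np.dot(grad_h, u_nom) + alpha * h
    if residual >= 0.0:
        return u_nom  # nominal action already satisfies the CBF condition
    # Smallest correction along grad_h that makes the constraint active.
    return u_nom - residual / np.dot(grad_h, grad_h) * grad_h

# Nominal action drives straight at an obstacle at the origin.
u_safe = cbf_filter(x=[2.0, 0.0], u_nom=[-10.0, 0.0], center=[0.0, 0.0], radius=1.0)
# A nominal action that moves away from the obstacle passes through unchanged.
u_free = cbf_filter(x=[2.0, 0.0], u_nom=[1.0, 0.0], center=[0.0, 0.0], radius=1.0)
```

In a training loop of the kind the abstract describes, a filter like this would be applied to each rollout action; how the CBF term enters the reward is specific to the paper and is not reproduced here.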
Architecture Is All You Need: Diversity-Enabled Sweet Spots for Robust Humanoid Locomotion
Robust humanoid locomotion in unstructured environments requires architectures that balance fast low-level stabilization with slower perceptual decision-making. We show that a simple layered control architecture (LCA), a proprioceptive stabilizer running at high rate, coupled with a compact low-rate perceptual policy, enables substantially more robust performance than monolithic end-to-end designs, even when using minimal perception encoders. Through a two-stage training curriculum (blind stabilizer pretraining followed by perceptual fine-tuning), we demonstrate that layered policies consistently outperform one-stage alternatives in both simulation and hardware. On a Unitree G1 humanoid, our approach succeeds across stair and ledge tasks where one-stage perceptual policies fail. These results highlight that architectural separation of timescales, rather than network scale or complexity, is the key enabler for robust perception-conditioned locomotion.
comment: 8 pages
Further Results on Safety-Critical Stabilization of Force-Controlled Nonholonomic Mobile Robots
In this paper, we address the stabilization problem for force-controlled nonholonomic mobile robots under safety-critical constraints. We propose a continuous, time-invariant control law based on the gamma m-quadratic programming (gamma m-QP) framework, which unifies control Lyapunov functions (CLFs) and control barrier functions (CBFs) to enforce both stability and safety in the closed-loop system. For the first time, we construct a global, time-invariant, strict Lyapunov function for the closed-loop nonholonomic mobile robot system with a nominal stabilization controller in polar coordinates; this strict Lyapunov function then serves as the CLF in the QP design. Next, by exploiting the inherent cascaded structure of the vehicle dynamics, we develop a CBF for the mobile robot via an integrator backstepping procedure. Our main results guarantee both asymptotic stability and safety for the closed-loop system. Both simulation and experimental results are presented to illustrate the effectiveness and performance of our approach.
Through-the-Earth Magnetic Induction Communication and Networking: A Comprehensive Survey
Magnetic induction (MI) communication (MIC) has emerged as a promising candidate for underground communication networks due to its excellent penetration capabilities. Integration with Space-Air-Ground-Underground (SAGUI) networks in next-generation mobile communication systems requires a well-defined network architecture. A recent discovery in MIC research, MI fast fading, remains in its early stages and presents unique challenges. This paper provides a comprehensive survey on through-the-earth (TTE) MIC, covering MI applications, channel modeling, point-to-point MIC design, relay techniques, network frameworks, and emerging technologies. We compare various MIC applications to highlight TTE-specific challenges and review the principles of channel modeling, addressing both MI slow fading and MI fast fading, along with its potential impact on existing MIC theories. We conduct a fine-grained decomposition of MI channel power gain into four distinct physical parameters, and propose a novel geometric model to analyze MI fast fading. We also summarize MI relay techniques, examine crosstalk effects in relay and high-density networks, and explore key research tasks within the OSI framework for a holistic MI network protocol in SAGUI. To bridge the gaps identified, we propose a MIC framework that supports TCP/IP and Linux, enabling full implementation of existing and emerging MIC solutions. This framework empowers researchers to leverage Linux resources and deep learning platforms for accelerated development of MIC in SAGUI networks. Remaining research challenges, open issues, and promising novel techniques are further identified to advance MIC research.
comment: This work has been accepted by the IEEE Communications Surveys & Tutorials (COMST) for publication. The final published version will be available on IEEE Xplore.
Dynamic-Key-Aware Co-Simulation Framework for Next Generation of SCADA Systems Encrypted by Quantum-Key-Distribution Techniques
To address growing cybersecurity challenges in modern power dispatch systems, this paper proposes a multi-layer modeling and optimization framework for SCADA systems enhanced with quantum key distribution (QKD). While most existing applications of QKD in the power sector focus on building secure point-to-point communication tunnels, they rarely consider the system-level coupling between key dynamics and control scheduling. In contrast, our approach integrates quantum key generation, consumption, inventory prediction, and control latency into a unified model, enabling key-aware reconfiguration of SCADA control chains based on task security demands and real-time resource constraints. To resolve conflicts in key resource allocation between transmission system operators (TSOs) and distribution system operators (DSOs), we formulate a bi-level Stackelberg game and transform it into a mathematical program with complementarity constraints (MPCC). We further develop an efficient Level Decomposition-Complementarity Pruning (LD-CP) algorithm to solve the problem. To support reproducible evaluation, we build an end-to-end co-simulation platform that integrates physical-layer disruptions via OpenQKD-Sim, Q3P/IEC-104 protocol stack binding, and real-time control-chain monitoring through Grafana. Experimental results on the IEEE 39- and 118-bus systems show that our method increases task success rate by 25%, reduces peak frequency deviation by 70%, and improves key utilization to 83%. This work lays the foundation for future quantum-secure control systems in power grid operations.
Improved Voltage Regulation with Optimal Design of Decentralized Volt-VAr Control
Integration of distributed energy resources has created a need for autonomous, dynamic voltage regulation. Decentralized Volt-VAr Control (VVC) of grid-connected inverters presents a unique opportunity for voltage management but, if designed poorly, can lead to unstable behavior when in feedback with the grid. We model the grid-VVC closed-loop dynamics with a linearized power flow approach, leveraging historical data, which shows improvement over the commonly used LinDistFlow model. This model is used to design VVC slopes by minimizing steady-state voltage deviation from the nominal value, subject to a non-convex spectral radius stability constraint, which has not been previously implemented within this context. We compare this constraint to existing convex restrictions and demonstrate, through simulations on a realistic feeder, that using the spectral radius results in more effective voltage regulation.
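The spectral radius stability condition mentioned in the abstract above can be illustrated on a generic linearized model. The sketch assumes voltages follow v = v_unc + X q with a sensitivity matrix X, and that inverters update reactive power as q = -K (v - v_nom) with a diagonal slope matrix K, so the voltage iteration is v_next = v_unc - X K (v - v_nom) and converges when the spectral radius of X K is below one. The matrices, slope values, and function names are made-up illustrations and do not come from the paper.

```python
import numpy as np

def spectral_radius(M):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def steady_state_voltage(X, K, v_unc, v_nom):
    """Fixed point of v = v_unc - X K (v - v_nom), i.e.
    (I + X K) v = v_unc + X K v_nom. Only meaningful as a steady state
    when the iteration is stable (spectral radius of X K below one)."""
    XK = X @ K
    return np.linalg.solve(np.eye(X.shape[0]) + XK, v_unc + XK @ v_nom)

# Hypothetical two-bus sensitivities (p.u. voltage per p.u. reactive power).
X = np.array([[0.05, 0.02],
              [0.02, 0.08]])
K_gentle = np.diag([5.0, 5.0])    # shallow Volt-VAr slopes
K_steep = np.diag([30.0, 30.0])   # aggressive slopes

rho_gentle = spectral_radius(X @ K_gentle)  # below 1: iteration converges
rho_steep = spectral_radius(X @ K_steep)    # above 1: iteration diverges

v_nom = np.array([1.0, 1.0])
v_unc = np.array([1.04, 1.06])    # voltages without reactive power support
v_ss = steady_state_voltage(X, K_gentle, v_unc, v_nom)
```

The example shows the trade-off the abstract refers to: steeper slopes pull the steady-state voltage closer to nominal but can push the spectral radius past one, which is why the slope design is constrained.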
A Human-Vector Susceptible--Infected--Susceptible Model for Analyzing and Controlling the Spread of Vector-Borne Diseases
We propose an epidemic model for the spread of vector-borne diseases. The model, which is built extending the classical susceptible-infected-susceptible model, accounts for two populations -- humans and vectors -- and for cross-contagion between the two species, whereby humans become infected upon interaction with carrier vectors, and vectors become carriers after interaction with infected humans. We formulate the model as a system of ordinary differential equations and leverage monotone systems theory to rigorously characterize the epidemic dynamics. Specifically, we characterize the global asymptotic behavior of the disease, determining conditions for quick eradication of the disease (i.e., for which all trajectories converge to a disease-free equilibrium), or convergence to a (unique) endemic equilibrium. Then, we incorporate two control actions: namely, vector control and incentives to adopt protection measures. Using the derived mathematical tools, we assess the impact of these two control actions and determine the optimal control policy.
comment: To appear in the Proceedings of the 2025 European Control Conference (ECC)
High-Resolution PTDF-Based Planning of Storage and Transmission Under High Renewables
Transmission Expansion Planning (TEP) optimizes power grid upgrades and investments to ensure reliable, efficient, and cost-effective electricity delivery while addressing grid constraints. To support growing demand and renewable energy integration, energy storage is emerging as a pivotal asset that provides temporal flexibility and alleviates congestion. This paper develops a multiperiod, two-stage PTDF formulation that co-optimizes transmission upgrades and storage siting/sizing. To ensure scalability, a trust-region, multicut Benders scheme warm-started from per-representative-day optima is proposed. Applied to a 2,000-bus synthetic Texas system under high-renewable projections, the method attains final optimality gaps below 1% and yields a plan with storage at about 180 nodes (32% of peak renewable capacity). These results show that the proposed PTDF-based methodology efficiently handles large distributed storage fleets and scales to high spatial resolution.
A Deep State-Space Model Compression Method using Upper Bound on Output Error
We study deep state-space models (Deep SSMs) that contain linear-quadratic-output (LQO) systems as internal blocks and present a compression method with a provable output error guarantee. We first derive an upper bound on the output error between two Deep SSMs and show that the bound can be expressed via the $h^2$-error norms between the layerwise LQO systems, thereby providing a theoretical justification for existing model order reduction (MOR)-based compression. Building on this bound, we formulate an optimization problem in terms of the $h^2$-error norm and develop a gradient-based MOR method. On the IMDb task from the Long Range Arena benchmark, we demonstrate that our compression method achieves strong performance. Moreover, unlike prior approaches, we reduce roughly 80% of trainable parameters without retraining, with only a 4-5% performance drop.
Stability Criteria and Motor Performance in Delayed Haptic Dyadic Interactions Mediated by Robots
This paper establishes analytical stability criteria for robot-mediated human-human (dyadic) interaction systems, focusing on haptic communication under network-induced time delays. Through frequency-domain analysis supported by numerical simulations, we identify both delay-independent and delay-dependent stability criteria. The delay-independent criterion guarantees stability irrespective of the delay, whereas the delay-dependent criterion is characterised by a maximum tolerable delay before instability occurs. The criteria demonstrate dependence on controller and robot dynamic parameters, where increasing stiffness reduces the maximum tolerable delay in a non-linear manner, thereby heightening system vulnerability. The proposed criteria can be generalised to a wide range of robot-mediated interactions and serve as design guidelines for stable remote dyadic systems. Experiments with robots performing human-like movements further illustrate the correlation between stability and motor performance. The findings of this paper suggest the prerequisites for effective delay-compensation strategies.
RoboANKLE: Design, Development, and Functional Evaluation of a Robotic Ankle with a Motorized Compliant Unit
This study presents a powered transtibial prosthesis with complete push-off assistance, RoboANKLE. The design aims to fulfill specific requirements, such as a sufficient range of motion (RoM) while providing the necessary torque for achieving natural ankle motion in daily activities. Addressing the challenges faced in designing active transtibial prostheses, such as maintaining energetic autonomy and minimizing weight, is vital for the study. With this aim, we try to imitate the human ankle by providing extensive push-off assistance to achieve a natural-like torque profile. Thus, an Energy Store and Extended Release (ESER) mechanism is employed with a novel Extra Energy Storage (EES) mechanism. Kinematic and kinetic analyses are carried out to determine the design parameters and assess the design performance. Subsequently, a Computer-Aided Design (CAD) model is built and used in comprehensive dynamic and structural analyses. These analyses are used to evaluate the design performance and to determine the forces and torques applied to the prosthesis, which aids in optimizing the design for minimal weight via structural analysis and topology optimization. The design of the prototype is then finalized and manufactured for experimental evaluation to validate the design and functionality. The prototype is realized with a mass of 1.92 kg and dimensions of 261x107x420 mm. The functional evaluations of the RoboANKLE revealed that it is capable of achieving the natural maximum dorsi-flexion angle with 95% accuracy. Also, thanks to the implemented mechanisms, the results show that RoboANKLE can generate 57% more torque than required for natural walking. The power generation capacity of the RoboANKLE is 10% higher than the natural power during the gait cycle.
Prescribed Performance Control of Deformable Object Manipulation in Spatial Latent Space
Manipulating three-dimensional (3D) deformable objects presents significant challenges for robotic systems due to their infinite-dimensional state space and complex deformable dynamics. This paper proposes a novel model-free approach for shape control with constraints imposed on key points. Unlike existing methods that rely on feature dimensionality reduction, the proposed controller leverages the coordinates of key points as the feature vector, which are extracted from the deformable object's point cloud using deep learning methods. This approach not only reduces the dimensionality of the feature space but also retains the spatial information of the object. By extracting key points, the manipulation of deformable objects is simplified into a visual servoing problem, where the shape dynamics are described using a deformation Jacobian matrix. To enhance control accuracy, a prescribed performance control method is developed by integrating barrier Lyapunov functions (BLF) to enforce constraints on the key points. The stability of the closed-loop system is rigorously analyzed and verified using the Lyapunov method. Experimental results further demonstrate the effectiveness and robustness of the proposed method.
A Comparative Study of Oscillatory Perturbations in Car-Following Models
As connected and autonomous vehicles become more widespread, platooning has emerged as a key strategy to improve road capacity, reduce fuel consumption, and enhance traffic flow. However, the benefits of platoons strongly depend on their ability to maintain stability. Instability can lead to unsafe spacing and increased energy usage. In this work, we study platoon instability and analyze the root cause of its occurrence, as well as its impacts on the following vehicle. To achieve this, we propose a comparative study between different car-following models such as the Intelligent Driver Model (IDM), the Optimal Velocity Model (OVM), the General Motors Model (GMM), and the Cooperative Adaptive Cruise Control (CACC). In our approach, we introduce a disruption in the model by varying the velocity of the leading vehicle to visualize the behavior of the following vehicles. To evaluate the dynamic response of each model, we introduce controlled perturbations in the velocity of the leading vehicle, specifically, sinusoidal oscillations and discrete velocity changes. The resulting vehicle trajectories and variations in inter-vehicle spacing are analyzed to assess the robustness of each model to disturbance propagation. The findings offer insight into model sensitivity, stability characteristics, and implications for designing resilient platooning control strategies.
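The perturbation experiment described in the abstract above can be sketched for one of the listed models. The example below uses the standard Intelligent Driver Model (IDM) acceleration law with commonly quoted parameter values, one follower, and a leader whose speed oscillates sinusoidally. The parameter values, initial conditions, and explicit Euler integration are illustrative assumptions and not the study's actual setup.

```python
import math

def idm_accel(v, gap, dv, v0=30.0, T=1.5, a_max=1.0, b=1.5, s0=2.0):
    """IDM acceleration of a follower.

    v   : follower speed (m/s)
    gap : bumper-to-bumper distance to the leader (m)
    dv  : approach rate, follower speed minus leader speed (m/s)
    v0 desired speed, T time headway, a_max maximum acceleration,
    b comfortable deceleration, s0 standstill gap.
    """
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** 4 - (s_star / gap) ** 2)

def simulate(steps=1200, dt=0.1, v_mean=20.0, amp=2.0, period=20.0):
    """Follower response to a sinusoidal leader speed; returns the gap history."""
    x_lead, x_follow, v_follow = 40.0, 0.0, v_mean
    gaps = []
    for k in range(steps):
        t = k * dt
        v_lead = v_mean + amp * math.sin(2.0 * math.pi * t / period)
        gap = x_lead - x_follow
        gaps.append(gap)
        acc = idm_accel(v_follow, gap, v_follow - v_lead)
        v_follow = max(0.0, v_follow + acc * dt)  # explicit Euler step
        x_follow += v_follow * dt
        x_lead += v_lead * dt
    return gaps

gaps = simulate()
gap_swing = max(gaps) - min(gaps)  # size of the spacing oscillation
```

Repeating the run with additional followers, or swapping `idm_accel` for another car-following law, gives the kind of side-by-side spacing comparison the abstract describes.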
Two Roads to Koopman Operator Theory for Control: Infinite Input Sequences and Operator Families
The Koopman operator, originally defined for dynamical systems without input, has inspired many applications in control. Yet, the theoretical foundations underpinning this progress in control remain underdeveloped. This paper investigates the theoretical structure and connections between two extensions of Koopman theory to control: (i) Koopman operator via infinite input sequences and (ii) the Koopman control family. Although these frameworks encode system information in fundamentally different ways, we show that under certain conditions on the function spaces they operate on, they are equivalent. The equivalence is both in terms of the actions of the Koopman-based formulations in each framework as well as the function values on the system trajectories. Our analysis provides constructive tools to translate between the frameworks, offering a unified perspective for Koopman methods in control.
Tail-Optimized Caching for LLM Inference
Prompt caching is critical for reducing latency and cost in LLM inference: OpenAI and Anthropic report up to 50-90% cost savings through prompt reuse. Despite its widespread success, little is known about what constitutes an optimal prompt caching policy, particularly when optimizing tail latency, a metric of central importance to practitioners. The widely used Least Recently Used (LRU) policy can perform arbitrarily poorly on this metric, as it is oblivious to the heterogeneity of conversation lengths. To address this gap, we propose Tail-Optimized LRU, a simple two-line modification that reallocates KV cache capacity to prioritize high-latency conversations by evicting cache entries that are unlikely to affect future turns. Though the implementation is simple, we prove its optimality under a natural stochastic model of conversation dynamics, providing the first theoretical justification for LRU in this setting, a result that may be of independent interest to the caching community. Experimentally, on real conversation data from WildChat, Tail-Optimized LRU achieves up to 27.5% reduction in P90 tail Time to First Token latency and 23.9% in P95 tail latency compared to LRU, along with up to 38.9% decrease in SLO violations of 200ms. We believe this provides a practical and theoretically grounded option for practitioners seeking to optimize tail latency in real-world LLM deployments.
Sparsity-exploiting Gaussian Process for Robust Transient Learning of Power System Dynamics
Advances in leveraging Gaussian processes (GP) have enabled learning and inferring dynamic grid behavior from scarce PMU measurements. However, real measurements can be corrupted by various random and targeted threats, leading to inaccurate and meaningless results. This paper develops robust transient learning to overcome this challenge by exploiting the sparse corruption patterns in the data flow. Specifically, we integrate sparse optimization with method of moments (MoM) to make learning robust to a sparse distribution of data corruptions; then, we optimize sparse weights to identify corrupted meter locations. To improve inference speed on large-scale systems, we further adopt K-medoid clustering of locations to develop dimension reduction (DR) and aggregate representation (AR) heuristics. Experimental results demonstrate robustness against random large errors, targeted false data injections, and local PMU clock drifts. On a 1354-bus system, inference turns out to be 18x faster using DR and 400x faster when further combined with AR heuristics.
comment: This manuscript has been submitted to PESGM2026
Exploring a New Design Paradigm for Omnidirectional MAVs for Minimal Actuation and Internal Force Elimination: Theoretical Framework and Control
This paper presents a novel concept for achieving omnidirectionality in a multirotor aerial vehicle (MAV) that uses only 6 inputs and ensures no internal forces at the equilibria. The concept integrates a single actively-tilting propeller along with 3 pendulum-like links, each carrying a propeller, connected by passive universal joints to the main body. We show that this design ensures omnidirectionality while minimizing the internal forces and without resorting to overactuation (i.e., more than 6 inputs). A detailed dynamic model of the multi-link MAV is first developed. Afterwards, the analysis identifies the equilibrium configurations and illustrates that a forced equilibrium exists for every pose of the MAV's main platform. In order to render this equilibrium asymptotically stable for the closed-loop system, a geometric nonlinear controller is constructed using dynamic feedback linearization and backstepping techniques with the main platform configuration error being the left-trivialized error on SE(3). The stability of the closed-loop system is then investigated by employing standard Lyapunov arguments on the zero dynamics. We conclude by providing numerical simulations validating the proposed approach. They demonstrate the MAV's capability to perform decoupled attitude and translational motions under non-zero initial conditions, parametric uncertainty, and actuator noise.
Q-EnergyDEX: A Zero-Trust Distributed Energy Trading Framework Driven by Quantum Key Distribution and Blockchain
The rapid decentralization and digitalization of local electricity markets have introduced new cyber-physical vulnerabilities, including key leakage, data tampering, and identity spoofing. Existing blockchain-based solutions provide transparency and traceability but still depend on classical cryptographic primitives that are vulnerable to quantum attacks. To address these challenges, this paper proposes Q-EnergyDEX, a zero-trust distributed energy trading framework driven by quantum key distribution and blockchain. The framework integrates physical-layer quantum randomness with market-level operations, providing an end-to-end quantum-secured infrastructure. A cloud-based Quantum Key Management Service continuously generates verifiable entropy and regulates key generation through a rate-adaptive algorithm to sustain high-quality randomness. A symmetric authentication protocol (Q-SAH) establishes secure and low-latency sessions, while the quantum-aided consensus mechanism (PoR-Lite) achieves probabilistic ledger finality within a few seconds. Furthermore, a Stackelberg-constrained bilateral auction couples market clearing with entropy availability, ensuring both economic efficiency and cryptographic security. Simulation results show that Q-EnergyDEX maintains robust key stability and near-optimal social welfare, demonstrating its feasibility for large-scale decentralized energy markets.
A predictive modular approach to constraint satisfaction under uncertainty - with application to glycosylation in continuous monoclonal antibody biosimilar production
The paper proposes a modular-based approach to constraint handling in process optimization and control. This is partly motivated by the recent interest in learning-based methods, e.g., within bioproduction, for which constraint handling under uncertainty is a challenge. The proposed constraint handler, called predictive filter, is combined with an adaptive constraint margin and a constraint violation cost monitor to minimize the cost of violating soft constraints due to model uncertainty and disturbances. The module can be combined with any controller and is based on minimally modifying the controller output, in a least squares sense, such that constraints are satisfied within the considered horizon. The proposed method is computationally efficient and suitable for real-time applications. The effectiveness of the method is illustrated through a realistic simulation case study of glycosylation constraint satisfaction in continuous monoclonal antibody biosimilar production using Chinese hamster ovary cells, for which the metabolic network model consists of 23 extracellular metabolites and 126 reactions.
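The "minimal modification in a least squares sense" idea can be illustrated with a deliberately simplified sketch. It assumes a single time step and a single linear constraint a . u <= b, whereas the abstract describes constraints over a prediction horizon; the function name `filter_input` and the constraint form are illustrative assumptions, not the paper's formulation.

```python
def filter_input(u_nom, a, b):
    """Return the point closest (in the Euclidean sense) to u_nom with a . u <= b.

    Hypothetical one-step stand-in for a predictive filter: the nominal
    controller output is left untouched when it is already feasible, and is
    otherwise projected onto the constraint boundary. Assumes a is nonzero.
    """
    violation = sum(ai * ui for ai, ui in zip(a, u_nom)) - b
    if violation <= 0.0:
        return list(u_nom)  # already feasible: no modification
    norm_sq = sum(ai * ai for ai in a)
    # Orthogonal projection onto the hyperplane a . u = b.
    return [ui - (violation / norm_sq) * ai for ai, ui in zip(a, u_nom)]


if __name__ == "__main__":
    print(filter_input([2.0, 0.0], [1.0, 0.0], 1.0))  # infeasible input is projected
    print(filter_input([0.5, 3.0], [1.0, 0.0], 1.0))  # feasible input passes through
```

Because the filter only acts when the nominal output would violate the constraint, it can sit behind any controller, which is the modularity the abstract emphasizes.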
Revolution-Spaced Output-Feedback Model Predictive Control for Station Keeping on Near-Rectilinear Halo Orbits
We develop a model predictive control (MPC) policy for station keeping on a Near-Rectilinear Halo Orbit (NRHO). The proposed policy achieves full-state tracking of a reference NRHO via a multiple-maneuver control horizon, each spaced one revolution apart to abide by typical mission operation requirements. We prove that the proposed policy is recursively feasible, and perform numerical evaluation in an output-feedback setting by incorporating a navigation filter and realistic operational uncertainties, where the proposed MPC is compared against the state-of-the-art station-keeping algorithm adopted for the Gateway. Our approach successfully maintains the spacecraft in the vicinity of the reference NRHO at a similar cumulative cost as existing station-keeping methods without encountering phase deviation issues, a common drawback of existing methods with one maneuver per revolution.
comment: 8 pages, 6 figures
Offline and Online Use of Interval and Set-Based Approaches for Control and State Estimation: A Selection of Methodological Approaches and Their Application
Control and state estimation procedures need to be robust against imprecisely known parameters, uncertainty in initial conditions, and external disturbances. Interval methods and other set-based techniques form the basis for the implementation of powerful approaches that can be used to identify parameters of dynamic system models in the presence of the aforementioned types of uncertainty. Moreover, they are applicable to a verified feasibility and stability analysis of controllers and state estimators. In addition to these approaches which are typically used offline for analysis of system models designed with classical floating point procedures, interval and set-based methods have also been developed in recent years, which allow to directly solve the associated design tasks and to implement reliable techniques that are applicable online, i.e., during system operation. The latter approaches include set-based model predictive control, online parameter adaptation techniques for nonlinear variable-structure and backstepping controllers, interval observers, and fault diagnosis techniques. This paper provides an overview of the methodological background and reviews numerous practical applications for which interval and other set-valued approaches have been employed successfully.
No-Regret Learning in Stackelberg Games with an Application to Electric Ride-Hailing
We consider the problem of efficiently learning to play single-leader multi-follower Stackelberg games when the leader lacks knowledge of the lower-level game. Such games arise in hierarchical decision-making problems involving self-interested agents. For example, in electric ride-hailing markets, a central authority aims to learn optimal charging prices to shape fleet distributions and charging patterns of ride-hailing companies. Existing works typically apply gradient-based methods to find the leader's optimal strategy. Such methods are impractical as they require that the followers share private utility information with the leader. Instead, we treat the lower-level game as a black box, assuming only that the followers' interactions approximate a Nash equilibrium while the leader observes the realized cost of the resulting approximation. Under kernel-based regularity assumptions on the leader's cost function, we develop a no-regret algorithm that converges to an $\epsilon$-Stackelberg equilibrium in $O(\sqrt{T})$ rounds. Finally, we validate our approach through a numerical case study on optimal pricing in electric ride-hailing markets.
comment: 8 pages, 2 figures, 1 table
Hierarchical Fuel-Cell Airpath Control: an Efficiency-Aware MIMO Control Approach Combined with a Novel Constraint-Enforcing Reference Governor
This paper presents a hierarchical multivariable control and constraint management approach for an air supply system for a proton exchange membrane fuel cell (PEMFC) system. The control objectives are to track desired compressor mass airflow and cathode inlet pressure, maintain a minimum oxygen excess ratio (OER), and run the system at maximum net efficiency. A multi-input multi-output (MIMO) internal model controller (IMC) is designed and simulated to track flow and pressure set-points, and it shows high performance despite strongly coupled plant dynamics. A new set-point map is generated to compute the most efficient cathode inlet pressure from the stack current load. To enforce OER constraints, a novel reference governor (RG) with the ability to govern multiple references (the cascade RG) and the ability to speed up as well as slow down a reference signal (the cross-section RG) is developed and tested. Compared with a single-input single-output (SISO) air-flow control approach, the proposed MIMO control approach shows up to 7.36 percent lower hydrogen fuel consumption. Compared to a traditional load governor, the novel cascaded cross-section RG (CC-RG) shows up to 3.68 percent less mean absolute percent error (MAPE) on net power tracking and greatly improved worst-case OER on realistic drive-cycle simulations. Control development and validations were conducted on two fuel cell system (FCS) models, a nonlinear open-source model and a proprietary Ford high-fidelity model.
comment: Accepted for publication in IEEE Transactions on Control Systems Technology. This version incorporates all peer-review revisions
Offline Reinforcement Learning via Inverse Optimization
Inspired by the recent successes of Inverse Optimization (IO) across various application domains, we propose a novel offline Reinforcement Learning (ORL) algorithm for continuous state and action spaces, leveraging the convex loss function called ``sub-optimality loss'' from the IO literature. To mitigate the distribution shift commonly observed in ORL problems, we further employ a robust and non-causal Model Predictive Control (MPC) expert steering a nominal model of the dynamics using in-hindsight information stemming from the model mismatch. Unlike the existing literature, our robust MPC expert enjoys an exact and tractable convex reformulation. In the second part of this study, we show that the IO hypothesis class, trained by the proposed convex loss function, enjoys ample expressiveness and achieves competitive performance compared with the state-of-the-art (SOTA) methods in the low-data regime of the MuJoCo benchmark while utilizing three orders of magnitude fewer parameters, thereby requiring significantly fewer computational resources. To facilitate the reproducibility of our results, we provide an open-source package implementing the proposed algorithms and the experiments.
comment: preprint
Composite learning backstepping control with guaranteed exponential stability and robustness
Adaptive backstepping control provides a feasible solution to achieve asymptotic tracking for mismatched uncertain nonlinear systems. However, the closed-loop stability depends on high-gain feedback generated by nonlinear damping terms, and closed-loop exponential stability with parameter convergence involves a stringent condition named persistent excitation (PE). This paper proposes a composite learning backstepping control (CLBC) strategy based on modular backstepping and high-order tuners to compensate for the transient process of parameter estimation and achieve closed-loop exponential stability without the nonlinear damping terms and the PE condition. A novel composite learning mechanism is designed to maximize the staged exciting strength for parameter estimation, such that parameter convergence can be achieved under a condition of interval excitation (IE) or even partial IE that is strictly weaker than PE. An extra prediction error is employed in the adaptive law to ensure the transient performance without nonlinear damping terms. The exponential stability of the closed-loop system is proved rigorously under the partial IE or IE condition. Simulations have demonstrated the effectiveness and superiority of the proposed method in both parameter estimation and control compared to state-of-the-art methods.
comment: This work has been submitted to the IEEE for possible publication
Strategy Templates for Almost-Sure and Positive Winning of Stochastic Parity Games towards Permissive and Resilient Control
Stochastic games are fundamental in various applications, including the control of cyber-physical systems (CPS), where both controller and environment are modeled as players. Traditional algorithms typically aim to determine a single winning strategy to develop a controller. However, in CPS control and other domains, permissive controllers are essential, as they enable the system to adapt when additional constraints arise and remain resilient to runtime changes. This work generalizes the concept of (permissive winning) strategy templates, originally introduced by Anand et al. at TACAS and CAV 2023 for deterministic games, to incorporate stochastic games. These templates capture an infinite number of winning strategies, allowing for efficient strategy adaptation to system changes. We focus on two winning criteria (almost-sure and positive winning) and five winning objectives (safety, reachability, B\"uchi, co-B\"uchi, and parity). Our contributions include algorithms for constructing templates for each winning criterion and objective and a novel approach for extracting a winning strategy from a given template. Discussions on comparisons between templates and between strategy extraction methods are provided.
comment: For the conference version published at ICTAC 2024 see: arXiv:2409.08607v1
A Weighted Predict-and-Optimize Framework for Power System Operation Considering Varying Impacts of Uncertainty
Prediction deviations of different uncertainties have varying impacts on downstream decision-making. Improving the prediction accuracy of critical uncertainties with significant impacts on decision-making quality yields better optimization results. Motivated by this observation, this paper proposes a novel weighted predict-and-optimize (WPO) framework for decision-making under multiple uncertainties. Specifically, we incorporate an uncertainty-aware weighting mechanism into the predictive model to capture the relative impact of each uncertainty on specific optimization tasks, and introduce a problem-driven prediction loss (PDPL) to quantify the suboptimality of the weighted predictions relative to perfect predictions in downstream optimization. By optimizing the uncertainty weights to minimize the PDPL, the proposed WPO framework enables adaptive assessment of uncertainty impacts and joint learning of prediction and optimization. Furthermore, to facilitate weight optimization, we develop a surrogate model that establishes a direct mapping between the uncertainty weights and the PDPL, where enhanced graph convolutional networks and multi-task learning are adopted for efficient surrogate model construction and training. Numerical experiments on the modified IEEE 33-bus and 123-bus systems demonstrate that the proposed WPO framework outperforms the traditional predict-then-optimize paradigm, reducing the PDPL by an average of 55% within acceptable computational time.
comment: This is a paper submitted to IEEE TRANSACTIONS ON Power Systems
Differentiable-by-design Nonlinear Optimization for Model Predictive Control
Nonlinear optimization-based control policies, such as those arising in nonlinear Model Predictive Control, have seen remarkable success in recent years. These policies require solving computationally demanding nonlinear optimization programs online at each time-step. The resulting solution map, viewed as a function of the measured state of the system and design parameters, may not be differentiable, which poses significant challenges if the control policy is embedded in a gradient-based policy optimization scheme. We propose a principled way to regularize the nonlinear optimization problem, obtaining a surrogate derivative even when the original problem is not differentiable. The surrogate problem is differentiable by design and its solution map coincides with the solution of the unregularized problem. We demonstrate the effectiveness of our approach in a free-final-time optimal control problem and a receding-horizon nonlinear MPC example.
Time-causal and time-recursive wavelets
When applying wavelet analysis to real-time temporal signals, where the future cannot be accessed, it is essential to base all the steps in the signal processing pipeline on computational mechanisms that are truly time-causal. This paper describes how a time-causal wavelet analysis can be performed based on concepts developed in the area of temporal scale-space theory, originating from a complete classification of temporal smoothing kernels that guarantee non-creation of new structures from finer to coarser temporal scale levels. By necessity, convolution with truncated exponential kernels in cascade constitutes the only permissible class of kernels, with their temporal derivatives as a natural complement to fulfil the admissibility conditions of wavelet representations. For a particular way of choosing the time constants in the resulting infinite convolution of truncated exponential kernels, to ensure temporal scale covariance and thus self-similarity over temporal scales, we describe how mother wavelets can be chosen as temporal derivatives of the resulting time-causal limit kernel. By developing connections between wavelet theory and scale-space theory, we characterize and quantify how the continuous scaling properties transfer to the discrete implementation, demonstrating how the proposed time-causal wavelet representation can reflect the duration of locally dominant temporal structures in the input signals. We propose that this notion of time-causal wavelet analysis could be a valuable tool for signal processing tasks, where streams of signals are to be processed in real time, specifically for signals that may contain local variations over a rich span of temporal scales, or more generally for analysing physical or biophysical temporal phenomena, where a fully time-causal analysis is called for to be physically realistic.
comment: 23 pages, 8 figures
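The cascade of truncated exponential kernels described in this entry can be sketched in discrete time as a chain of first-order recursive filters, each of which uses only the current input and its own previous output. The time constants below are arbitrary illustrative values, not the geometric distribution the paper uses for the limit kernel, and the backward difference merely stands in for a temporal derivative.

```python
def recursive_smooth(signal, mu):
    """First-order time-causal recursive filter with time constant mu >= 0.

    Each output depends only on the current input and the previous output,
    so no future samples are accessed.
    """
    out, prev = [], 0.0
    for x in signal:
        prev = prev + (x - prev) / (1.0 + mu)
        out.append(prev)
    return out


def cascade(signal, mus):
    """Apply the first-order filters in series (time constants are assumptions)."""
    for mu in mus:
        signal = recursive_smooth(signal, mu)
    return signal


def backward_difference(signal):
    """Causal discrete stand-in for a first-order temporal derivative."""
    out, prev = [], 0.0
    for x in signal:
        out.append(x - prev)
        prev = x
    return out


if __name__ == "__main__":
    impulse = [0.0] * 200
    impulse[5] = 1.0
    kernel = cascade(impulse, [1.0, 2.0, 4.0])
    wavelet_like = backward_difference(kernel)
    print(kernel[:5])         # zero before the impulse arrives
    print(sum(kernel))        # close to 1 (unit DC gain)
    print(sum(wavelet_like))  # close to 0 (zero-mean response)
```

The smoothing kernel is strictly causal and preserves the mean of the signal, while its difference has approximately zero mean, which is the property that makes derivative-type kernels candidates for wavelets.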
Inferring Foresightedness in Dynamic Noncooperative Games
Dynamic game theory is an increasingly popular tool for modeling multi-agent, e.g. human-robot, interactions. Game-theoretic models presume that each agent wishes to minimize a private cost function that depends on others' actions. These games typically evolve over a fixed time horizon, specifying how far into the future each agent plans. In practical settings, however, decision-makers may vary in foresightedness, or how much they care about their current cost in relation to their past and future costs. We conjecture that quantifying and estimating each agent's foresightedness from online data will enable safer and more efficient interactions with other agents. To this end, we frame this inference problem as an inverse dynamic game. We consider a specific objective function parametrization that smoothly interpolates myopic and farsighted planning. Games of this form are readily transformed into parametric mixed complementarity problems; we exploit the directional differentiability of solutions to these problems with respect to their hidden parameters to solve for agents' foresightedness. We conduct three experiments: one with synthetically generated delivery robot motion, one with real-world data involving people walking, biking, and driving vehicles, and one using high-fidelity simulators. The results of these experiments demonstrate that explicitly inferring agents' foresightedness enables game-theoretic models to make 33% more accurate models for agents' behavior.
PowerChain: A Verifiable Agentic AI System for Automating Distribution Grid Analyses
Rapid electrification and decarbonization are increasing the complexity of distribution grid (DG) operation and planning, necessitating advanced computational analyses to ensure reliability and resilience. These analyses depend on disparate workflows comprising complex models, function calls, and data pipelines that require substantial expert knowledge and remain difficult to automate. Workforce and budget constraints further limit utilities' ability to apply such analyses at scale. To address this gap, we build an agentic system PowerChain, which is capable of autonomously performing complex grid analyses. Existing agentic AI systems are typically developed in a bottom-up manner with customized context for predefined analysis tasks; therefore, they do not generalize to tasks that the agent has never seen. In comparison, to generalize to unseen DG analysis tasks, PowerChain dynamically generates structured context by leveraging supervisory signals from self-contained power systems tools (e.g., GridLAB-D) and an optimized set of expert-annotated and verified reasoning trajectories. For complex DG tasks defined in natural language, empirical results on real utility data demonstrate that PowerChain achieves up to a 144% improvement in performance over baselines.
Physiology-informed layered sensing for intelligent human-exoskeleton interaction
Wearable exoskeletons hold transformative promise for restoring mobility across diverse users with muscular weakness or other impairments. However, their translation beyond laboratory environments remains limited by sensing systems that capture movement but not underlying physiology. Here, we present a soft, lightweight smart leg sleeve that achieves anatomically aligned, layered multimodal sensing by integrating textile-based surface electromyography (sEMG) electrodes, ultrasensitive textile strain sensors, and inertial measurement units (IMUs). Each sensing modality targets a distinct physiological layer: IMUs track joint kinematics at the skeletal level, sEMG monitors muscle activation at the muscular level, and strain sensors detect skin deformation at the cutaneous level. Together, these sensors provide real-time perception to support three core objectives: controlling personalized assistance, optimizing user effort, and safeguarding against injury risks. The system is skin-conformal, mechanically compliant, and seamlessly integrated with a custom exoskeleton ($<20$~g total sensor and electronics weight). We demonstrate: (1) accurate ankle joint moment estimation (RMSE = 0.13~Nm/kg), (2) real-time classification of metabolic trends (accuracy = 97.1\%), and (3) injury risk detection within 100~ms (recall = 0.96), all validated on unseen users using a leave-one-subject-out protocol. This work establishes a physiology-aligned sensing architecture that reframes exoskeleton perception from motion tracking to real-time physiological decoding, offering a pathway towards intelligent, adaptive, and personalized wearable robotics.
comment: 21 pages, 5 figures, 43 references
Autonomous Cyber Resilience via a Co-Evolutionary Arms Race within a Fortified Digital Twin Sandbox
The convergence of Information Technology and Operational Technology has exposed Industrial Control Systems to adaptive, intelligent adversaries that render static defenses obsolete. This paper introduces the Adversarial Resilience Co-evolution (ARC) framework, addressing the "Trinity of Trust" comprising model fidelity, data integrity, and analytical resilience. ARC establishes a co-evolutionary arms race within a Fortified Secure Digital Twin (F-SCDT), where a Deep Reinforcement Learning "Red Agent" autonomously discovers attack paths while an ensemble-based "Blue Agent" is continuously hardened against these threats. Experimental validation on the Tennessee Eastman Process (TEP) and Secure Water Treatment (SWaT) testbeds demonstrates superior performance in detecting novel attacks, with F1-scores improving from 0.65 to 0.89 and detection latency reduced from over 1200 seconds to 210 seconds. A comprehensive ablation study reveals that the co-evolutionary process itself contributes a 27% performance improvement. By integrating Explainable AI and proposing a Federated ARC architecture, this work presents a necessary paradigm shift toward dynamic, self-improving security for critical infrastructure.
comment: 6 pages, 2 figures, 4 equations, 1 algorithm, 3 tables, to be published in ISPACS 2025, unabridged version exists as arXiv:2506.20102v1
End-to-End Learning Framework for Solving Non-Markovian Optimal Control
Integer-order calculus often falls short in capturing the long-range dependencies and memory effects found in many real-world processes. Fractional calculus addresses these gaps via fractional-order integrals and derivatives, but fractional-order dynamical systems pose substantial challenges in system identification and optimal control due to the lack of standard control methodologies. In this paper, we theoretically derive the optimal control via linear quadratic regulator (LQR) for fractional-order linear time-invariant (FOLTI) systems and develop an end-to-end deep learning framework based on this theoretical foundation. Our approach establishes a rigorous mathematical model, derives analytical solutions, and incorporates deep learning to achieve data-driven optimal control of FOLTI systems. Our key contributions include: (i) proposing an innovative system identification method and control strategy for FOLTI systems, (ii) developing the first end-to-end data-driven learning framework, Fractional-Order Learning for Optimal Control (FOLOC), that learns control policies from observed trajectories, and (iii) deriving a theoretical analysis of sample complexity to quantify the number of samples required for accurate optimal control in complex real-world problems. Experimental results indicate that our method accurately approximates fractional-order system behaviors without relying on Gaussian noise assumptions, pointing to promising avenues for advanced optimal control.
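The memory effect that motivates fractional-order models can be seen in the weights of a discrete fractional difference. The sketch below uses the standard Grünwald-Letnikov weights as an illustration; whether the paper discretizes its FOLTI systems this way is an assumption, and the function name `gl_weights` is hypothetical.

```python
def gl_weights(alpha, n_terms):
    """Grünwald-Letnikov weights w_k = (-1)^k * binom(alpha, k), k = 0..n_terms-1.

    Computed with the recursion w_0 = 1, w_k = w_{k-1} * (1 - (alpha + 1) / k).
    The fractional difference of order alpha at step n is sum_k w_k * x[n - k].
    """
    weights = [1.0]
    for k in range(1, n_terms):
        weights.append(weights[-1] * (1.0 - (alpha + 1.0) / k))
    return weights


if __name__ == "__main__":
    # Integer order: only the current and previous sample matter.
    print(gl_weights(1.0, 5))
    # Fractional order: every past sample keeps a nonzero, slowly decaying weight.
    print(gl_weights(0.5, 5))
```

For alpha = 1 the weights reduce to the ordinary first difference, whereas for non-integer alpha the whole history contributes, which is why such systems are non-Markovian.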
Robotics
MimicKit: A Reinforcement Learning Framework for Motion Imitation and Control
MimicKit is an open-source framework for training motion controllers using motion imitation and reinforcement learning. The codebase provides implementations of commonly-used motion-imitation techniques and RL algorithms. This framework is intended to support research and applications in computer graphics and robotics by providing a unified training framework, along with standardized environment, agent, and data structures. The codebase is designed to be modular and easily configurable, enabling convenient modification and extension to new characters and tasks. The open-source codebase is available at: https://github.com/xbpeng/MimicKit.
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
We introduce InternVLA-M1, a unified framework for spatial grounding and robot control that advances instruction-following robots toward scalable, general-purpose intelligence. Its core idea is spatially guided vision-language-action training, where spatial grounding serves as the critical link between instructions and robot actions. InternVLA-M1 employs a two-stage pipeline: (i) spatial grounding pre-training on over 2.3M spatial reasoning data to determine ``where to act'' by aligning instructions with visual, embodiment-agnostic positions, and (ii) spatially guided action post-training to decide ``how to act'' by generating embodiment-aware actions through plug-and-play spatial prompting. This spatially guided training recipe yields consistent gains: InternVLA-M1 outperforms its variant without spatial guidance by +14.6% on SimplerEnv Google Robot, +17% on WidowX, and +4.3% on LIBERO Franka, while demonstrating stronger spatial reasoning capability in box, point, and trace prediction. To further scale instruction following, we built a simulation engine to collect 244K generalizable pick-and-place episodes, enabling a 6.2% average improvement across 200 tasks and 3K+ objects. In real-world clustered pick-and-place, InternVLA-M1 improved by 7.3%, and with synthetic co-training, achieved +20.6% on unseen objects and novel configurations. Moreover, in long-horizon reasoning-intensive scenarios, it surpassed existing works by over 10%. These results highlight spatially guided training as a unifying principle for scalable and resilient generalist robots. Code and models are available at https://github.com/InternRobotics/InternVLA-M1.
comment: Technical report
Simplicial Embeddings Improve Sample Efficiency in Actor-Critic Agents
Recent works have proposed accelerating the wall-clock training time of actor-critic methods via the use of large-scale environment parallelization; unfortunately, these can sometimes still require a large number of environment interactions to achieve a desired level of performance. Noting that well-structured representations can improve the generalization and sample efficiency of deep reinforcement learning (RL) agents, we propose the use of simplicial embeddings: lightweight representation layers that constrain embeddings to simplicial structures. This geometric inductive bias results in sparse and discrete features that stabilize critic bootstrapping and strengthen policy gradients. When applied to FastTD3, FastSAC, and PPO, simplicial embeddings consistently improve sample efficiency and final performance across a variety of continuous- and discrete-control environments, without any loss in runtime speed.
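One common way to constrain an embedding to simplicial structure is a group-wise softmax: the feature vector is split into equal-sized groups and each group is mapped onto a probability simplex. The sketch below assumes that construction; the group size and temperature are illustrative hyperparameters, and the abstract does not confirm that the paper's layer takes exactly this form.

```python
import math


def simplicial_embedding(z, group_size, temperature=1.0):
    """Map a flat feature vector onto a product of probability simplices.

    The vector is split into consecutive groups of `group_size` entries and a
    temperature-scaled softmax is applied within each group. Lower
    temperatures push each group toward a sparse, near one-hot code.
    """
    if len(z) % group_size != 0:
        raise ValueError("len(z) must be a multiple of group_size")
    out = []
    for start in range(0, len(z), group_size):
        group = [v / temperature for v in z[start:start + group_size]]
        m = max(group)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in group]
        total = sum(exps)
        out.extend(e / total for e in exps)
    return out


if __name__ == "__main__":
    features = [2.0, 0.0, -1.0, 0.5, 0.5, 3.0]
    print(simplicial_embedding(features, group_size=3))
    print(simplicial_embedding(features, group_size=3, temperature=0.1))
```

The layer adds no parameters, which is consistent with the abstract's description of it as lightweight.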
Hierarchical Discrete Lattice Assembly: An Approach for the Digital Fabrication of Scalable Macroscale Structures
Although digital fabrication processes at the desktop scale have become proficient and prolific, systems aimed at producing larger-scale structures are still typically complex, expensive, and unreliable. In this work, we present an approach for the fabrication of scalable macroscale structures using simple robots and interlocking lattice building blocks. A target structure is first voxelized so that it can be populated with an architected lattice. These voxels are then grouped into larger interconnected blocks, which are produced using standard digital fabrication processes, leveraging their capability to produce highly complex geometries at a small scale. These blocks, on the size scale of tens of centimeters, are then fed to mobile relative robots that are able to traverse over the structure and place new blocks to form structures on the meter scale. To facilitate the assembly of large structures, we introduce a live digital twin simulation tool for controlling and coordinating assembly robots that enables both global planning for a target structure and live user design, interaction, or intervention. To improve assembly throughput, we introduce a new modular assembly robot, designed for hierarchical voxel handling. We validate this system by demonstrating the voxelization, hierarchical blocking, path planning, and robotic fabrication of a set of meter-scale objects.
comment: In ACM Symposium on Computational Fabrication (SCF '25), November 20-21, 2025, Cambridge, MA, USA. ACM, New York, NY, USA, 15 pages
On Your Own: Pro-level Autonomous Drone Racing in Uninstrumented Arenas
Drone technology is proliferating in many industries, including agriculture, logistics, defense, infrastructure, and environmental monitoring. Vision-based autonomy is one of its key enablers, particularly for real-world applications. This is essential for operating in novel, unstructured environments where traditional navigation methods may be unavailable. Autonomous drone racing has become the de facto benchmark for such systems. State-of-the-art research has shown that autonomous systems can surpass human-level performance in racing arenas. However, direct applicability to commercial and field operations is still limited as current systems are often trained and evaluated in highly controlled environments. In our contribution, the system's capabilities are analyzed within a controlled environment -- where external tracking is available for ground-truth comparison -- but also demonstrated in a challenging, uninstrumented environment -- where ground-truth measurements were never available. We show that our approach can match the performance of professional human pilots in both scenarios. We also publicly release the data from the flights carried out by our approach and a world-class human pilot.
LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
Vision-Language-Action (VLA) models report impressive success rates on robotic manipulation benchmarks, yet these results may mask fundamental weaknesses in robustness. We perform a systematic vulnerability analysis by introducing controlled perturbations across seven dimensions: object layout, camera viewpoints, robot initial states, language instructions, lighting conditions, background textures, and sensor noise. We comprehensively analyzed multiple state-of-the-art models and revealed consistent brittleness beneath apparent competence. Our analysis exposes critical weaknesses: models exhibit extreme sensitivity to perturbation factors, including camera viewpoints and robot initial states, with performance dropping from 95% to below 30% under modest perturbations. Surprisingly, models are largely insensitive to language variations, with further experiments revealing that models tend to ignore language instructions completely. Our findings challenge the assumption that high benchmark scores equate to true competency and highlight the need for evaluation practices that assess reliability under realistic variation.
A Modular Object Detection System for Humanoid Robots Using YOLO
Within the field of robotics, computer vision remains a significant barrier to progress, with many tasks hindered by inefficient vision systems. This research proposes a generalized vision module leveraging YOLOv9, a state-of-the-art framework optimized for computationally constrained environments like robots. The model is trained on a dataset tailored to the FIRA robotics Hurocup. A new vision module is implemented in ROS1 using a virtual environment to enable YOLO compatibility. Performance is evaluated using metrics such as frames per second (FPS) and Mean Average Precision (mAP). Performance is then compared to the existing geometric framework in static and dynamic contexts. The YOLO model achieved comparable precision at a higher computational cost than the geometric model, while providing improved robustness.
comment: 7 Figures, 5 tables. This article was presented at FIRA Summit 2025. It will be updated for journal submission
Characterizing Lidar Point-Cloud Adversities Using a Vector Field Visualization
In this paper we introduce a visualization methodology to aid a human analyst in classifying adversity modes that impact lidar scan matching. Our methodology is intended for offline rather than real-time analysis. The method generates a vector-field plot that characterizes local discrepancies between a pair of registered point clouds. The vector field plot reveals patterns that would be difficult for the analyst to extract from raw point-cloud data. After introducing our methodology, we apply the process to two proof-of-concept examples: one a simulation study and the other a field experiment. For both data sets, a human analyst was able to reason about a series of adversity mechanisms and iteratively remove those mechanisms from the raw data, to help focus attention on progressively smaller discrepancies.
comment: This is the preprint version of the paper published in: Proceedings of the 37th International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS+ 2024), September 2024. The final version is available at https://doi.org/10.33012/2024.19864
Efficient Force and Stiffness Prediction in Robotic Produce Handling with a Piezoresistive Pressure Sensor
Properly handling delicate produce with robotic manipulators is a major part of the future role of automation in agricultural harvesting and processing. Grasping with the correct amount of force is crucial not only for ensuring a proper grip on the object, but also for avoiding damage or bruising to the product. In this work, a flexible pressure sensor that is both low cost and easy to fabricate is integrated with robotic grippers for working with produce of varying shapes, sizes, and stiffnesses. The sensor is successfully integrated with both a rigid robotic gripper and a pneumatically actuated soft finger. Furthermore, an algorithm is proposed for accelerated estimation of the steady-state value of the sensor output based on the transient response data, to enable real-time applications. The sensor is shown to be effective in incorporating feedback to correctly grasp objects of unknown sizes and stiffnesses. At the same time, the sensor provides estimates for these values which can be utilized for identification of qualities such as ripeness levels and bruising. It is also shown to be able to provide force feedback for objects of variable stiffnesses. This enables future use not only for produce identification, but also for tasks such as quality control and selective distribution based on ripeness levels.
comment: For supplementary videos, see https://drive.google.com/drive/folders/1jol-_z6gaUfjpL1Qi7EG420usTbVSodv?usp=sharing
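The abstract above does not say how the steady-state value is estimated from transient data. As a hedged illustration only, the sketch below assumes the sensor output follows a first-order exponential response and applies Aitken's delta-squared extrapolation to three equally spaced samples. The function names, the first-order model, and the sample values are assumptions made for this example, not the authors' algorithm.

```python
import math


def estimate_steady_state(y0, y1, y2):
    """Extrapolate the settling value from three equally spaced samples.

    Assumes y_k = y_ss + A * r**k (first-order response), for which
    Aitken's delta-squared formula recovers y_ss exactly.
    """
    denom = y2 - 2.0 * y1 + y0
    if abs(denom) < 1e-12:
        # Samples are flat or linear: no exponential trend to extrapolate.
        return y2
    return y2 - (y2 - y1) ** 2 / denom


def simulate(y_ss, y_init, tau, t):
    """Hypothetical first-order sensor response used only to generate test data."""
    return y_ss + (y_init - y_ss) * math.exp(-t / tau)


# Three early samples (0.0 s, 0.1 s, 0.2 s) of a response that settles at 5.0
# with an assumed time constant of 0.8 s.
samples = [simulate(5.0, 0.0, 0.8, t) for t in (0.0, 0.1, 0.2)]
estimate = estimate_steady_state(*samples)
```

With noise-free samples the estimate matches the true settling value long before the signal itself settles. Real sensor data would need filtering or a least-squares fit over more samples.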
PlanarMesh: Building Compact 3D Meshes from LiDAR using Incremental Adaptive Resolution Reconstruction
Building an online 3D LiDAR mapping system that produces a detailed surface reconstruction while remaining computationally efficient is a challenging task. In this paper, we present PlanarMesh, a novel incremental, mesh-based LiDAR reconstruction system that adaptively adjusts mesh resolution to achieve compact, detailed reconstructions in real-time. It introduces a new representation, planar-mesh, which combines plane modeling and meshing to capture both large surfaces and detailed geometry. The planar-mesh can be incrementally updated considering both local surface curvature and free-space information from sensor measurements. We employ a multi-threaded architecture with a Bounding Volume Hierarchy (BVH) for efficient data storage and fast search operations, enabling real-time performance. Experimental results show that our method achieves reconstruction accuracy on par with, or exceeding, state-of-the-art techniques (including truncated signed distance functions, occupancy mapping, and voxel-based meshing) while producing smaller output file sizes (10 times smaller than raw input and more than 5 times smaller than mesh-based methods) and maintaining real-time performance (around 2 Hz for a 64-beam sensor).
Active Tactile Exploration for Rigid Body Pose and Shape Estimation
General robot manipulation requires the handling of previously unseen objects. Learning a physically accurate model at test time can provide significant benefits in data efficiency, predictability, and reuse between tasks. Tactile sensing can complement vision with its robustness to occlusion, but its temporal sparsity necessitates careful online exploration to maintain data efficiency. Direct contact can also cause an unrestrained object to move, requiring both shape and location estimation. In this work, we propose a learning and exploration framework that uses only tactile data to simultaneously determine the shape and location of rigid objects with minimal robot motion. We build on recent advances in contact-rich system identification to formulate a loss function that penalizes physical constraint violation without introducing the numerical stiffness inherent in rigid-body contact. Optimizing this loss, we can learn cuboid and convex polyhedral geometries with less than 10s of randomly collected data after first contact. Our exploration scheme seeks to maximize Expected Information Gain and results in significantly faster learning in both simulated and real-robot experiments. More information can be found at https://dairlab.github.io/activetactile
comment: 8 pages, 6 figures
Development of an Intuitive GUI for Non-Expert Teleoperation of Humanoid Robots
The operation of humanoid robots is an essential field of research with many practical and competitive applications. Many of these systems, however, do not invest heavily in developing a non-expert-centered graphical user interface (GUI) for operation. The focus of this research is to develop a scalable GUI that is tailored to be simple and intuitive so non-expert operators can control the robot through a FIRA-regulated obstacle course. Using common practices from user interface (UI) development and concepts from human-robot interaction (HRI), we will develop a new interface with the goal of a non-expert teleoperation system.
comment: 9 figures. Presented at FIRA Summit 2025, Daegu, S. Korea
Hoecken-D Hand: A Novel Robotic Hand for Linear Parallel Pinching and Self-Adaptive Grasping IROS
This paper presents the Hoecken-D Hand, an underactuated robotic gripper that combines a modified Hoecken linkage with a differential spring mechanism to achieve both linear parallel pinching and a mid-stroke transition to adaptive envelope. The original Hoecken linkage is reconfigured by replacing one member with differential links, preserving straight-line guidance while enabling contact-triggered reconfiguration without additional actuators. A double-parallelogram arrangement maintains fingertip parallelism during conventional pinching, whereas the differential mechanism allows one finger to wrap inward upon encountering an obstacle, improving stability on irregular or thin objects. The mechanism can be driven by a single linear actuator, minimizing complexity and cost; in our prototype, each finger is driven by its own linear actuator for simplicity. We perform kinematic modeling and force analysis to characterize grasp performance, including simulated grasping forces and spring-opening behavior under varying geometric parameters. The design was prototyped using PLA-based 3D printing, achieving a linear pinching span of approximately 200 mm. Preliminary tests demonstrate reliable grasping in both modes across a wide range of object geometries, highlighting the Hoecken-D Hand as a compact, adaptable, and cost-effective solution for manipulation in unstructured environments.
comment: Accepted by IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025, Hangzhou, China. This version includes updated contact information
Accelerated Feature Detectors for Visual SLAM: A Comparative Study of FPGA vs GPU
Feature detection is a common yet time-consuming module in Simultaneous Localization and Mapping (SLAM) implementations, which are increasingly deployed on power-constrained platforms, such as drones. Graphics Processing Units (GPUs) have been a popular accelerator for computer vision in general, and feature detection and SLAM in particular. On the other hand, System-on-Chips (SoCs) with integrated Field Programmable Gate Array (FPGA) are also widely available. This paper presents the first study of hardware-accelerated feature detectors considering a Visual SLAM (V-SLAM) pipeline. We offer new insights by comparing the best GPU-accelerated FAST, Harris, and SuperPoint implementations against the FPGA-accelerated counterparts on modern SoCs (Nvidia Jetson Orin and AMD Versal). The evaluation shows that with a non-learning-based feature detector such as FAST or Harris, the GPU implementations and the GPU-accelerated V-SLAM achieve better run-time performance and energy efficiency than the FPGA implementations and the FPGA-accelerated V-SLAM. However, when considering a learning-based detector such as SuperPoint, its FPGA implementation can achieve better run-time performance and energy efficiency (up to 3.1$\times$ and 1.4$\times$ improvements, respectively) than the GPU implementation. The FPGA-accelerated V-SLAM can also achieve run-time performance comparable to the GPU-accelerated V-SLAM, with better FPS in 2 out of 5 dataset sequences. In terms of accuracy, the results show that the GPU-accelerated V-SLAM is generally more accurate than the FPGA-accelerated V-SLAM. Last but not least, the use of hardware acceleration for feature detection could further improve the performance of the V-SLAM pipeline by having the global bundle adjustment module invoked less frequently without sacrificing accuracy.
comment: 12 pages, 7 figures
A Novel Robot Hand with Hoeckens Linkages and Soft Phalanges for Scooping and Self-Adaptive Grasping in Environmental Constraints IROS
This paper presents a novel underactuated adaptive robotic hand, Hockens-A Hand, which integrates the Hoeckens mechanism, a double-parallelogram linkage, and a specialized four-bar linkage to achieve three adaptive grasping modes: parallel pinching, asymmetric scooping, and enveloping grasping. Hockens-A Hand requires only a single linear actuator, leveraging passive mechanical intelligence to ensure adaptability and compliance in unstructured environments. Specifically, the vertical motion of the Hoeckens mechanism introduces compliance, the double-parallelogram linkage ensures line contact at the fingertip, and the four-bar amplification system enables natural transitions between different grasping modes. Additionally, the inclusion of a mesh-textured silicone phalanx further enhances the ability to envelop objects of various shapes and sizes. This study employs detailed kinematic analysis to optimize the push angle and design the linkage lengths for optimal performance. Simulations validated the design by analyzing the fingertip motion and ensuring smooth transitions between grasping modes. Furthermore, the grasping force was analyzed using power equations to enhance the understanding of the system's performance. Experimental validation using a 3D-printed prototype demonstrates the three grasping modes of the hand in various scenarios under environmental constraints, verifying its grasping stability and broad applicability.
comment: Accepted by IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025, Hangzhou. This version includes updated contact information
Bridge the Gap: Enhancing Quadruped Locomotion with Vertical Ground Perturbations
Legged robots, particularly quadrupeds, excel at navigating rough terrains, yet their performance under vertical ground perturbations, such as those from oscillating surfaces, remains underexplored. This study introduces a novel approach to enhance quadruped locomotion robustness by training the Unitree Go2 robot on an oscillating bridge - a 13.24-meter steel-and-concrete structure with a 2.0 Hz eigenfrequency designed to perturb locomotion. Using Reinforcement Learning (RL) with the Proximal Policy Optimization (PPO) algorithm in a MuJoCo simulation, we trained 15 distinct locomotion policies, combining five gaits (trot, pace, bound, free, default) with three training conditions: rigid bridge and two oscillating bridge setups with differing height regulation strategies (relative to bridge surface or ground). Domain randomization ensured zero-shot transfer to the real-world bridge. Our results demonstrate that policies trained on the oscillating bridge exhibit superior stability and adaptability compared to those trained on rigid surfaces. Our framework enables robust gait patterns even without prior bridge exposure. These findings highlight the potential of simulation-based RL to improve quadruped locomotion during dynamic ground perturbations, offering insights for designing robots capable of traversing vibrating environments.
Through the Lens of Doubt: Robust and Efficient Uncertainty Estimation for Visual Place Recognition
Visual Place Recognition (VPR) enables robots and autonomous vehicles to identify previously visited locations by matching current observations against a database of known places. However, VPR systems face significant challenges when deployed across varying visual environments, lighting conditions, seasonal changes, and viewpoint changes. Failure-critical VPR applications, such as loop closure detection in simultaneous localization and mapping (SLAM) pipelines, require robust estimation of place matching uncertainty. We propose three training-free uncertainty metrics that estimate prediction confidence by analyzing inherent statistical patterns in similarity scores from any existing VPR method. Similarity Distribution (SD) quantifies match distinctiveness by measuring score separation between candidates; Ratio Spread (RS) evaluates competitive ambiguity among top-scoring locations; and Statistical Uncertainty (SU) is a combination of SD and RS that provides a unified metric that generalizes across datasets and VPR methods without requiring validation data to select the optimal metric. All three metrics operate without additional model training, architectural modifications, or computationally expensive geometric verification. Comprehensive evaluation across nine state-of-the-art VPR methods and six benchmark datasets confirms that our metrics excel at discriminating between correct and incorrect VPR matches, and consistently outperform existing approaches while maintaining negligible computational overhead, making them deployable for real-time robotic applications across varied environmental conditions with improved precision-recall performance.
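The abstract above names the three metrics but does not give their formulas. The sketch below is therefore only a stand-in showing the general idea of training-free uncertainty computed from a query's similarity scores: one function looks at how far the best score stands above the rest, another at how close the runner-up is, and a third averages the two. All function names and formulas here are assumptions for illustration and should not be read as the paper's SD, RS, or SU definitions.

```python
from statistics import mean, pstdev


def separation_uncertainty(scores):
    """Low when the best score stands far above the rest (in std units)."""
    ranked = sorted(scores, reverse=True)
    rest = ranked[1:]
    spread = pstdev(rest) or 1e-9  # guard against identical remaining scores
    gap = (ranked[0] - mean(rest)) / spread
    return 1.0 / (1.0 + max(gap, 0.0))


def ratio_uncertainty(scores):
    """Near 1 when the runner-up scores almost as high as the best match."""
    ranked = sorted(scores, reverse=True)
    return ranked[1] / ranked[0] if ranked[0] > 0 else 1.0


def combined_uncertainty(scores):
    """Hypothetical unified score: plain average of the two stand-ins."""
    return 0.5 * (separation_uncertainty(scores) + ratio_uncertainty(scores))


# Similarity scores of one query against five database places (made-up values).
confident = [0.92, 0.41, 0.38, 0.35, 0.33]  # one clear winner
ambiguous = [0.62, 0.61, 0.60, 0.35, 0.33]  # several near-ties at the top
```

Both inputs are plain score lists, so a check of this kind can sit behind any retrieval method without retraining. The sketch assumes positive scores and at least two candidates per query.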
Physics-Informed Neural Network Modeling of Vehicle Collision Dynamics in Precision Immobilization Technique Maneuvers
Accurate prediction of vehicle collision dynamics is crucial for advanced safety systems and post-impact control applications, yet existing methods face inherent trade-offs among computational efficiency, prediction accuracy, and data requirements. This paper proposes a dual Physics-Informed Neural Network framework addressing these challenges through two complementary networks. The first network integrates Gaussian Mixture Models with PINN architecture to learn impact force distributions from finite element analysis data while enforcing momentum conservation and energy consistency constraints. The second network employs an adaptive PINN with dynamic constraint weighting to predict post-collision vehicle dynamics, featuring an adaptive physics guard layer that prevents unrealistic predictions while preserving data-driven learning capabilities. The framework incorporates uncertainty quantification through time-varying parameters and enables rapid adaptation via fine-tuning strategies. Validation demonstrates significant improvements: the impact force model achieves relative errors below 15.0% for force prediction on finite element analysis (FEA) datasets, while the vehicle dynamics model reduces average trajectory prediction error by 63.6% compared to traditional four-degree-of-freedom models in scaled vehicle experiments. The integrated system maintains millisecond-level computational efficiency suitable for real-time applications while providing probabilistic confidence bounds essential for safety-critical control. Comprehensive validation through FEA simulation, dynamic modeling, and scaled vehicle experiments confirms the framework's effectiveness for Precision Immobilization Technique scenarios and general collision dynamics prediction.
Real-Time Knee Angle Prediction Using EMG and Kinematic Data with an Attention-Based CNN-LSTM Network and Transfer Learning Across Multiple Datasets
Electromyography (EMG) signals are widely used for predicting body joint angles through machine learning (ML) and deep learning (DL) methods. However, these approaches often face challenges such as limited real-time applicability, non-representative test conditions, and the need for large datasets to achieve optimal performance. This paper presents a transfer-learning framework for knee joint angle prediction that requires only a few gait cycles from new subjects. Three datasets - Georgia Tech, the University of California Irvine (UCI), and the Sharif Mechatronic Lab Exoskeleton (SMLE) - containing four EMG channels relevant to knee motion were utilized. A lightweight attention-based CNN-LSTM model was developed and pre-trained on the Georgia Tech dataset, then transferred to the UCI and SMLE datasets. The proposed model achieved Normalized Mean Absolute Errors (NMAE) of 6.8 percent and 13.7 percent for one-step and 50-step predictions on abnormal subjects using EMG inputs alone. Incorporating historical knee angles reduced the NMAE to 3.1 percent and 3.5 percent for normal subjects, and to 2.8 percent and 7.5 percent for abnormal subjects. When further adapted to the SMLE exoskeleton with EMG, kinematic, and interaction force inputs, the model achieved 1.09 percent and 3.1 percent NMAE for one- and 50-step predictions, respectively. These results demonstrate robust performance and strong generalization for both short- and long-term rehabilitation scenarios.
A New Perspective on Transformers in Online Reinforcement Learning for Continuous Control
Despite their effectiveness and popularity in offline or model-based reinforcement learning (RL), transformers remain underexplored in online model-free RL due to their sensitivity to training setups and model design decisions such as how to structure the policy and value networks, share components, or handle temporal information. In this paper, we show that transformers can be strong baselines for continuous control in online model-free RL. We investigate key design questions: how to condition inputs, share components between actor and critic, and slice sequential data for training. Our experiments reveal stable architectural and training strategies enabling competitive performance across fully and partially observable tasks, and in both vector- and image-based settings. These findings offer practical guidance for applying transformers in online RL.
Adversarial Fine-tuning in Offline-to-Online Reinforcement Learning for Robust Robot Control
Offline reinforcement learning enables sample-efficient policy acquisition without risky online interaction, yet policies trained on static datasets remain brittle under action-space perturbations such as actuator faults. This study introduces an offline-to-online framework that trains policies on clean data and then performs adversarial fine-tuning, where perturbations are injected into executed actions to induce compensatory behavior and improve resilience. A performance-aware curriculum further adjusts the perturbation probability during training via an exponential-moving-average signal, balancing robustness and stability throughout the learning process. Experiments on continuous-control locomotion tasks demonstrate that the proposed method consistently improves robustness over offline-only baselines and converges faster than training from scratch. Matching the fine-tuning and evaluation conditions yields the strongest robustness to action-space perturbations, while the adaptive curriculum strategy mitigates the degradation of nominal performance observed with the linear curriculum strategy. Overall, the results show that adversarial fine-tuning enables adaptive and robust control under uncertain environments, bridging the gap between offline efficiency and online adaptability.
comment: 16 pages, 8 figures
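The performance-aware curriculum described in the abstract above could be sketched roughly as follows. This is a minimal illustration that assumes a simple rule: raise the perturbation probability while an exponential moving average (EMA) of episode returns stays at or above a target, and lower it otherwise. The class name, the thresholds, the step size, and the uniform action noise are all hypothetical choices for this example and are not taken from the paper.

```python
import random


class AdaptivePerturbationCurriculum:
    """Adjusts how often executed actions are perturbed, based on an EMA of returns."""

    def __init__(self, target_return, alpha=0.1, step=0.05, p_max=1.0):
        self.target_return = target_return  # assumed performance threshold
        self.alpha = alpha                  # EMA smoothing factor
        self.step = step                    # probability change per update
        self.p_max = p_max
        self.ema = None
        self.prob = 0.0                     # start with clean actions

    def update(self, episode_return):
        """Fold one episode return into the EMA and adapt the probability."""
        if self.ema is None:
            self.ema = episode_return
        else:
            self.ema = self.alpha * episode_return + (1 - self.alpha) * self.ema
        if self.ema >= self.target_return:
            self.prob = min(self.p_max, self.prob + self.step)
        else:
            self.prob = max(0.0, self.prob - self.step)
        return self.prob

    def perturb(self, action, rng, scale=0.3):
        """With probability `prob`, add uniform noise to the executed action."""
        if rng.random() < self.prob:
            return [a + rng.uniform(-scale, scale) for a in action]
        return list(action)


curriculum = AdaptivePerturbationCurriculum(target_return=100.0, alpha=0.5)
for ret in [120.0, 130.0, 125.0]:   # policy is doing well: make training harder
    curriculum.update(ret)
prob_after_good = curriculum.prob
for ret in [20.0, 10.0, 15.0]:      # performance collapses: back off
    curriculum.update(ret)
prob_after_bad = curriculum.prob
```

Tying the perturbation rate to a smoothed return, rather than to a fixed schedule, is one way to avoid increasing difficulty while nominal performance is still degrading.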
MODUR: A Modular Dual-reconfigurable Robot
Modular Self-Reconfigurable Robot (MSRR) systems are a class of robots capable of forming higher-level robotic systems by altering the topological relationships between modules, offering enhanced adaptability and robustness in various environments. This paper presents a novel MSRR called MODUR, featuring dual-level reconfiguration capabilities designed to integrate reconfigurable mechanisms into MSRR. Specifically, MODUR can perform high-level self-reconfiguration among modules to create different configurations, while each module is also able to change its shape to execute basic motions. The design of MODUR primarily includes a compact connector and scissor linkage groups that provide actuation, forming a parallel mechanism capable of achieving both connector motion decoupling and adjacent position migration capabilities. Furthermore, the workspace, considering the interdependent connectors, is comprehensively analyzed, laying a theoretical foundation for the design of the module's basic motion. Finally, the motion of MODUR is validated through a series of experiments.
Tactile-Conditioned Diffusion Policy for Force-Aware Robotic Manipulation
Contact-rich manipulation depends on applying the correct grasp forces throughout the manipulation task, especially when handling fragile or deformable objects. Most existing imitation learning approaches treat visuotactile feedback only as an additional observation, leaving applied forces as an uncontrolled consequence of gripper commands. In this work, we present Force-Aware Robotic Manipulation (FARM), an imitation learning framework that integrates high-dimensional tactile data to infer tactile-conditioned force signals, which in turn define a matching force-based action space. We collect human demonstrations using a modified version of the handheld Universal Manipulation Interface (UMI) gripper that integrates a GelSight Mini visual tactile sensor. For deploying the learned policies, we developed an actuated variant of the UMI gripper with geometry matching our handheld version. During policy rollouts, the proposed FARM diffusion policy jointly predicts robot pose, grip width, and grip force. FARM outperforms several baselines across three tasks with distinct force requirements -- high-force, low-force, and dynamic force adaptation -- demonstrating the advantages of its two key components: leveraging force-grounded, high-dimensional tactile observations and a force-based control space. The codebase and design files are open-sourced and available at https://tactile-farm.github.io.
DAMM-LOAM: Degeneracy Aware Multi-Metric LiDAR Odometry and Mapping IROS
LiDAR Simultaneous Localization and Mapping (SLAM) systems are essential for enabling precise navigation and environmental reconstruction across various applications. Although current point-to-plane ICP algorithms perform effectively in structured, feature-rich environments, they struggle in scenarios with sparse features, repetitive geometric structures, and high-frequency motion. This leads to degeneracy in 6-DOF pose estimation. Most state-of-the-art algorithms address these challenges by incorporating additional sensing modalities, but LiDAR-only solutions continue to face limitations under such conditions. To address these issues, we propose a novel Degeneracy-Aware Multi-Metric LiDAR Odometry and Mapping (DAMM-LOAM) module. Our system improves mapping accuracy through point cloud classification based on surface normals and neighborhood analysis. Points are classified into ground, walls, roof, edges, and non-planar points, enabling accurate correspondences. A degeneracy-based weighted least-squares ICP algorithm is then applied for accurate odometry estimation. Additionally, a Scan Context based back-end is implemented to support robust loop closures. DAMM-LOAM demonstrates significant improvements in odometry accuracy, especially in indoor environments such as long corridors.
comment: Accepted at IROS Active Perception Workshop
ALOHA2 Robot Kitchen Application Scenario Reproduction Report
ALOHA2 is an enhanced version of the dual-arm teleoperated robot ALOHA, featuring higher performance and robustness compared to the original design, while also being more ergonomic. Like ALOHA, ALOHA2 consists of two grippers and two ViperX 6-DoF arms, as well as two smaller WidowX arms. Users control the follower mechanical arms by operating the leader mechanical arms through back-driving. The device also includes cameras that generate images from multiple viewpoints, allowing for RGB data collection during teleoperation. The robot is mounted on a 48-inch x 30-inch table, equipped with an aluminum frame that provides additional mounting points for cameras and gravity compensation systems.
RoboHiMan: A Hierarchical Evaluation Paradigm for Compositional Generalization in Long-Horizon Manipulation
Enabling robots to flexibly schedule and compose learned skills for novel long-horizon manipulation under diverse perturbations remains a core challenge. Early explorations with end-to-end VLA models show limited success, as these models struggle to generalize beyond the training distribution. Hierarchical approaches, where high-level planners generate subgoals for low-level policies, bring certain improvements but still suffer under complex perturbations, revealing limited capability in skill composition. However, existing benchmarks primarily emphasize task completion in long-horizon settings, offering little insight into compositional generalization, robustness, and the interplay between planning and execution. To systematically investigate these gaps, we propose RoboHiMan, a hierarchical evaluation paradigm for compositional generalization in long-horizon manipulation. RoboHiMan introduces HiMan-Bench, a benchmark of atomic and compositional tasks under diverse perturbations, supported by a multi-level training dataset for analyzing progressive data scaling, and proposes three evaluation paradigms (vanilla, decoupled, coupled) that probe the necessity of skill composition and reveal bottlenecks in hierarchical architectures. Experiments highlight clear capability gaps across representative models and architectures, pointing to directions for advancing models better suited to real-world long-horizon manipulation tasks. Videos and open-source code can be found on our project website: https://chenyt31.github.io/robo-himan.github.io/.
comment: Under review. The first two authors contributed equally to this work
Safe Driving in Occluded Environments
Ensuring safe autonomous driving in the presence of occlusions poses a significant challenge for policy design. While existing model-driven control techniques based on set invariance can handle visible risks, occlusions create latent risks in which safety-critical states are not observable. Data-driven techniques also struggle to handle latent risks because direct mappings from risk-critical objects in sensor inputs to safe actions cannot be learned without visible risk-critical objects. Motivated by these challenges, in this paper, we propose a probabilistic safety certificate for latent risk. Our key technical enabler is the application of probabilistic invariance: it relaxes the strict observability requirements imposed by set-invariance methods that demand the knowledge of risk-critical states. The proposed techniques provide linear action constraints that confine the latent risk probability within tolerance. Such constraints can be integrated into model predictive controllers or embedded in data-driven policies to mitigate latent risks. The proposed method is tested using the CARLA simulator and compared with a few existing techniques. The theoretical and empirical analyses jointly demonstrate that the proposed methods assure long-term safety in real-time control in occluded environments without being overly conservative and with transparency to exposed risks.
DriveCritic: Towards Context-Aware, Human-Aligned Evaluation for Autonomous Driving with Vision-Language Models
Benchmarking autonomous driving planners to align with human judgment remains a critical challenge, as state-of-the-art metrics like the Extended Predictive Driver Model Score (EPDMS) lack context awareness in nuanced scenarios. To address this, we introduce DriveCritic, a novel framework featuring two key contributions: the DriveCritic dataset, a curated collection of challenging scenarios where context is critical for correct judgment, annotated with pairwise human preferences, and the DriveCritic model, a Vision-Language Model (VLM) based evaluator. Fine-tuned using a two-stage supervised and reinforcement learning pipeline, the DriveCritic model learns to adjudicate between trajectory pairs by integrating visual and symbolic context. Experiments show DriveCritic significantly outperforms existing metrics and baselines in matching human preferences and demonstrates strong context awareness. Overall, our work provides a more reliable, human-aligned foundation for evaluating autonomous driving systems.
comment: 9 pages, 3 figures
VLA-0: Building State-of-the-Art VLAs with Zero Modification
Vision-Language-Action models (VLAs) hold immense promise for enabling generalist robot manipulation. However, the best way to build them remains an open question. Current approaches often add complexity, such as modifying the existing vocabulary of a Vision-Language Model (VLM) with action tokens or introducing special action heads. Curiously, the simplest strategy of representing actions directly as text has remained largely unexplored. This work introduces VLA-0 to investigate this idea. We find that VLA-0 is not only effective; it is surprisingly powerful. With the right design, VLA-0 outperforms more involved models. On LIBERO, a popular benchmark for evaluating VLAs, VLA-0 outperforms all existing methods trained on the same robotic data, including $\pi_{0.5}$-KI, OpenVLA-OFT and SmolVLA. Furthermore, without large-scale robotics-specific training, it outperforms methods trained on large-scale robotic data, like $\pi_{0.5}$-KI, $\pi_0$, GR00T-N1 and MolmoAct. These findings also translate to the real world, where VLA-0 outperforms SmolVLA, a VLA model pre-trained on large-scale real data. This paper summarizes our unexpected findings and spells out the specific techniques required to unlock the high performance of this simple yet potent VLA design. Visual results, code, and trained models are provided here: https://vla0.github.io/.
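To make "representing actions directly as text" concrete, here is a minimal sketch of one possible encoding: each continuous action dimension is clipped to a fixed range, scaled to an integer, and written as a space-separated string that a language model could emit with its ordinary vocabulary. The range of [-1, 1], the 1000-step resolution, and the function names are assumptions for this example. The abstract does not specify the scheme VLA-0 actually uses.

```python
def action_to_text(action, low=-1.0, high=1.0, bins=1000):
    """Encode a continuous action vector as space-separated integers."""
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)
        tokens.append(str(round((clipped - low) / (high - low) * bins)))
    return " ".join(tokens)


def text_to_action(text, low=-1.0, high=1.0, bins=1000):
    """Decode the integer string back into continuous action values."""
    return [low + int(tok) / bins * (high - low) for tok in text.split()]


action = [0.25, -0.5, 1.0]
encoded = action_to_text(action)   # "625 250 1000"
decoded = text_to_action(encoded)  # recovers the original values
```

An encoding like this needs no new tokens or output heads. The cost is quantization error of at most half a bin per dimension.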
ViTacGen: Robotic Pushing with Vision-to-Touch Generation
Robotic pushing is a fundamental manipulation task that requires tactile feedback to capture subtle contact forces and dynamics between the end-effector and the object. However, real tactile sensors often face hardware limitations such as high costs and fragility, and deployment challenges involving calibration and variations between different sensors, while vision-only policies struggle to achieve satisfactory performance. Inspired by humans' ability to infer tactile states from vision, we propose ViTacGen, a novel robot manipulation framework designed for visual robotic pushing with vision-to-touch generation in reinforcement learning to eliminate the reliance on high-resolution real tactile sensors, enabling effective zero-shot deployment on visual-only robotic systems. Specifically, ViTacGen consists of an encoder-decoder vision-to-touch generation network that generates contact depth images, a standardized tactile representation, directly from visual image sequences, followed by a reinforcement learning policy that fuses visual-tactile data with contrastive learning based on visual and generated tactile observations. We validate the effectiveness of our approach in both simulation and real-world experiments, demonstrating its superior performance and achieving a success rate of up to 86\%.
Partial Feedback Linearization Control of a Cable-Suspended Multirotor Platform for Stabilization of an Attached Load IROS
In this work, we present a novel control approach based on partial feedback linearization (PFL) for the stabilization of a suspended aerial platform with an attached load. Such systems are envisioned for various applications in construction sites involving cranes, such as the holding and transportation of heavy objects. Our proposed control approach considers the underactuation of the whole system while utilizing its coupled dynamics for stabilization. We demonstrate using numerical stability analysis that these coupled terms are crucial for the stabilization of the complete system. We also carried out a robustness analysis of the proposed approach in the presence of external wind disturbances, sensor noise, and uncertainties in system dynamics. As our envisioned target application involves cranes in outdoor construction sites, our control approach relies only on onboard sensors, making it suitable for such applications. We carried out extensive simulation studies and experimental tests to validate our proposed control approach.
comment: Accepted for IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
Optimistic Reinforcement Learning-Based Skill Insertions for Task and Motion Planning
Task and motion planning (TAMP) for robotic manipulation necessitates long-horizon reasoning involving versatile actions and skills. While deterministic actions can be crafted by sampling or optimizing with certain constraints, planning actions with uncertainty, i.e., probabilistic actions, remains a challenge for TAMP. In contrast, Reinforcement Learning (RL) excels at acquiring versatile, yet short-horizon, manipulation skills that are robust to uncertainty. In this letter, we design a method that integrates RL skills into TAMP pipelines. Besides the policy, an RL skill is defined with data-driven logical components that enable the skill to be deployed by symbolic planning. A plan refinement sub-routine is designed to further tackle the inevitable effect uncertainties. In the experiments, we compare our method with baseline hierarchical planners from both the TAMP and RL fields and illustrate the strength of the method. The results show that by embedding RL skills, we extend the capability of TAMP to domains with probabilistic skills, and improve the planning efficiency compared to previous methods.
Adaptive Obstacle-Aware Task Assignment and Planning for Heterogeneous Robot Teaming
Multi-Agent Task Assignment and Planning (MATP) has attracted growing attention but remains challenging in terms of scalability, spatial reasoning, and adaptability in obstacle-rich environments. To address these challenges, we propose OATH: Adaptive Obstacle-Aware Task Assignment and Planning for Heterogeneous Robot Teaming, which advances MATP by introducing a novel obstacle-aware strategy for task assignment. First, we develop an adaptive Halton sequence map, the first known application of Halton sampling with obstacle-aware adaptation in MATP, which adjusts sampling density based on obstacle distribution. Second, we propose a cluster-auction-selection framework that integrates obstacle-aware clustering with weighted auctions and intra-cluster task selection. These mechanisms jointly enable effective coordination among heterogeneous robots while maintaining scalability and near-optimal allocation performance. In addition, our framework leverages an LLM to interpret human instructions and directly guide the planner in real time. We validate OATH in NVIDIA Isaac Sim, showing substantial improvements in task assignment quality, scalability, adaptability to dynamic changes, and overall execution performance compared to state-of-the-art MATP baselines. A project website is available at https://llm-oath.github.io/.
comment: 16 pages, 11 figures, 4 tables
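As a rough illustration of the Halton sampling that the OATH abstract builds on, the sketch below generates a plain 2D Halton sequence in the unit square and discards points that fall inside axis-aligned obstacle boxes. It does not reproduce the adaptive, obstacle-dependent density described in the abstract. The box obstacle format and the function names are assumptions made for this example.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def halton_2d(count, bases=(2, 3)):
    """First `count` points of the 2D Halton sequence (indices start at 1)."""
    return [
        (radical_inverse(i, bases[0]), radical_inverse(i, bases[1]))
        for i in range(1, count + 1)
    ]


def free_space_samples(count, obstacles):
    """Keep Halton points outside all obstacle boxes.

    obstacles: list of (xmin, ymin, xmax, ymax) boxes in the unit square.
    This box format is a hypothetical simplification for illustration.
    """
    def blocked(point):
        x, y = point
        return any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in obstacles)

    return [p for p in halton_2d(count) if not blocked(p)]


if __name__ == "__main__":
    print(free_space_samples(8, obstacles=[(0.4, 0.0, 0.6, 1.0)]))
```

Because the Halton sequence is low-discrepancy, the retained samples cover free space more evenly than the same number of uniform random draws would on average.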
Spatially Intelligent Patrol Routes for Concealed Emitter Localization by Robot Swarms
This paper introduces a method for designing spatially intelligent robot swarm behaviors to localize concealed radio emitters. We use differential evolution to generate geometric patrol routes that localize unknown signals independently of emitter parameters, a key challenge in electromagnetic surveillance. Patrol shape and antenna type are shown to influence information gain, which in turn determines the effective triangulation coverage. We simulate a four-robot swarm across eight configurations, assigning pre-generated patrol routes based on a specified patrol shape and sensing capability (antenna type: omnidirectional or directional). An emitter is placed within the map for each trial, with randomized position, transmission power and frequency. Results show that omnidirectional localization success rates are driven primarily by source location rather than signal properties, with failures occurring most often when sources are placed in peripheral areas of the map. Directional antennas are able to overcome this limitation due to their higher gain and directivity, with an average detection success rate of 98.75% compared to 80.25% for omnidirectional. Average localization errors range from 1.01-1.30 m for directional sensing and 1.67-1.90 m for omnidirectional sensing, and directional sensing also benefits from shorter patrol edges. These results demonstrate that a swarm's ability to predict electromagnetic phenomena is directly dependent on its physical interaction with the environment. Consequently, spatial intelligence, realized here through optimized patrol routes and antenna selection, is a critical design consideration for effective robotic surveillance.
A Diffusion-Refined Planner with Reinforcement Learning Priors for Confined-Space Parking
The growing demand for parking has increased the need for automated parking planning methods that can operate reliably in confined spaces. In restricted and complex environments, high-precision maneuvers are required to achieve a high success rate in planning, yet existing approaches often rely on explicit action modeling, which struggles to model the optimal action distribution accurately. In this paper, we propose DRIP, a diffusion-refined planner anchored in a reinforcement learning (RL) prior action distribution, in which an RL-pretrained policy provides prior action distributions to regularize the diffusion training process. During the inference phase, the denoising process refines these coarse priors into more precise action distributions. By steering the denoising trajectory through the reinforcement learning prior distribution during training, the diffusion model inherits a well-informed initialization, resulting in more accurate action modeling, a higher planning success rate, and fewer inference steps. We evaluate our approach across parking scenarios with varying degrees of spatial constraints. Experimental results demonstrate that our method significantly improves planning performance in confined-space parking environments while maintaining strong generalization in common scenarios.
EReLiFM: Evidential Reliability-Aware Residual Flow Meta-Learning for Open-Set Domain Generalization under Noisy Labels
Open-Set Domain Generalization (OSDG) aims to enable deep learning models to recognize unseen categories in new domains, which is crucial for real-world applications. Label noise hinders open-set domain generalization by corrupting source-domain knowledge, making it harder to recognize known classes and reject unseen ones. While existing methods address OSDG under Noisy Labels (OSDG-NL) using hyperbolic prototype-guided meta-learning, they struggle to bridge domain gaps, especially with limited clean labeled data. In this paper, we propose Evidential Reliability-Aware Residual Flow Meta-Learning (EReLiFM). We first introduce an unsupervised two-stage evidential loss clustering method to promote label reliability awareness. Then, we propose a residual flow matching mechanism that models structured domain- and category-conditioned residuals, enabling diverse and uncertainty-aware transfer paths beyond interpolation-based augmentation. During this meta-learning process, the model is optimized such that the update direction on the clean set maximizes the loss decrease on the noisy set, using pseudo labels derived from the most confident predicted class for supervision. Experimental results show that EReLiFM outperforms existing methods on OSDG-NL, achieving state-of-the-art performance. The source code is available at https://github.com/KPeng9510/ERELIFM.
comment: The source code is available at https://github.com/KPeng9510/ERELIFM
MTIL: Encoding Full History with Mamba for Temporal Imitation Learning
Standard imitation learning (IL) methods have achieved considerable success in robotics, yet often rely on the Markov assumption, which falters in long-horizon tasks where history is crucial for resolving perceptual ambiguity. This limitation stems not only from a conceptual gap but also from a fundamental computational barrier: prevailing architectures like Transformers are often constrained by quadratic complexity, rendering the processing of long, high-dimensional observation sequences infeasible. To overcome this dual challenge, we introduce Mamba Temporal Imitation Learning (MTIL). Our approach represents a new paradigm for robotic learning, which we frame as a practical synthesis of World Model and Dynamical System concepts. By leveraging the linear-time recurrent dynamics of State Space Models (SSMs), MTIL learns an implicit, action-oriented world model that efficiently encodes the entire trajectory history into a compressed, evolving state. This allows the policy to be conditioned on a comprehensive temporal context, transcending the confines of Markovian approaches. Through extensive experiments on simulated benchmarks (ACT, Robomimic, LIBERO) and on challenging real-world tasks, MTIL demonstrates superior performance against SOTA methods like ACT and Diffusion Policy, particularly in resolving long-term temporal ambiguities. Our findings not only affirm the necessity of full temporal context but also validate MTIL as a powerful and computationally feasible approach for learning long-horizon, non-Markovian behaviors from high-dimensional observations.
comment: Published in IEEE Robotics and Automation Letters (RA-L), 2025. 8 pages, 5 figures
Hybrid Terrain-Aware Path Planning: Integrating VD-RRT* Exploration and VD-D* Lite Repair
Autonomous ground vehicles operating off-road must plan curvature-feasible paths while accounting for spatially varying soil strength and slope hazards in real time. We present a continuous state--cost metric that combines a Bekker pressure--sinkage model with elevation-derived slope and attitude penalties. The resulting terrain cost field is analytic, bounded, and monotonic in soil modulus and slope, ensuring well-posed discretization and stable updates under sensor noise. This metric is evaluated on a lattice with exact steering primitives: Dubins and Reeds--Shepp motions for differential drive and time-parameterized bicycle arcs for Ackermann steering. Global exploration is performed using Vehicle-Dynamics RRT\(^{*}\), while local repair is managed by Vehicle-Dynamics D\(^{*}\) Lite, enabling millisecond-scale replanning without heuristic smoothing. By separating the terrain--vehicle model from the planner, the framework provides a reusable basis for deterministic, sampling-based, or learning-driven planning in deformable terrain. Hardware trials on an off-road platform demonstrate real-time navigation across soft soil and slope transitions, supporting reliable autonomy in unstructured environments.
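To make the terrain metric more concrete, the following sketch uses the classical Bekker pressure-sinkage relation p = (k_c / b + k_phi) z^n and combines the resulting sinkage with a slope penalty into a cost in [0, 1]. Only the Bekker relation itself is standard. The normalization limits, the weights, and the way the two terms are blended are hypothetical choices for illustration and are not the metric defined in the paper.

```python
import math


def bekker_pressure(sinkage, k_c, k_phi, width, n):
    """Bekker relation: ground pressure for a given sinkage (consistent units)."""
    return (k_c / width + k_phi) * sinkage ** n


def bekker_sinkage(pressure, k_c, k_phi, width, n):
    """Invert the Bekker relation to get sinkage for a given ground pressure."""
    return (pressure / (k_c / width + k_phi)) ** (1.0 / n)


def terrain_cost(pressure, slope_rad, k_c, k_phi, width, n,
                 z_max=0.1, slope_max=math.radians(30.0),
                 w_sink=0.5, w_slope=0.5):
    """Illustrative bounded cost in [0, 1].

    z_max, slope_max and the weights are assumed values, not from the paper.
    The cost does not increase with soil stiffness and does not decrease
    with slope magnitude.
    """
    sink_term = min(bekker_sinkage(pressure, k_c, k_phi, width, n) / z_max, 1.0)
    slope_term = min(abs(slope_rad) / slope_max, 1.0)
    return w_sink * sink_term + w_slope * slope_term


if __name__ == "__main__":
    soil = dict(k_c=1.0e3, k_phi=1.5e6, width=0.3, n=1.1)
    print(terrain_cost(2.0e4, math.radians(10.0), **soil))
```

Saturating each term before weighting keeps the cost bounded even for very soft soil or steep slopes, which is one simple way to obtain the boundedness property the abstract mentions.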
Product Digital Twin Supporting End-of-life Phase of Electric Vehicle Batteries Utilizing Product-Process-Resource Asset Network
In a circular economy, products in their end-of-life phase should be either remanufactured or recycled. Both of these processes are crucial for sustainability and environmental conservation. However, manufacturers frequently provide insufficient support for these processes, as they do not share relevant data about the products or their (re-)manufacturing processes. This paper proposes to accompany each product with a digital twin technology, specifically the Product Digital Twin (PDT), which can carry information for facilitating and optimizing production and remanufacturing processes. This paper introduces a knowledge representation called Bi-Flow Product-Process-Resource Asset Network (Bi-PAN). Bi-PAN extends the well-proven Product-Process-Resource Asset Network (PAN) paradigm by integrating both assembly and disassembly workflows into a single information model. Such networks enable capturing relevant relationships across products, production resources, manufacturing processes, and specific production operations that have to be done in the manufacturing phase of a product. The proposed approach is demonstrated in a use case of disassembling electric vehicle (EV) batteries. By utilizing PDTs with Bi-PAN knowledge models, challenges associated with disassembling EV batteries can be solved flexibly and efficiently for various battery types, enhancing the sustainability of EV battery life-cycle management.
comment: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
More than A Point: Capturing Uncertainty with Adaptive Affordance Heatmaps for Spatial Grounding in Robotic Tasks
Many language-guided robotic systems rely on collapsing spatial reasoning into discrete points, making them brittle to perceptual noise and semantic ambiguity. To address this challenge, we propose RoboMAP, a framework that represents spatial targets as continuous, adaptive affordance heatmaps. This dense representation captures the uncertainty in spatial grounding and provides richer information for downstream policies, thereby significantly enhancing task success and interpretability. RoboMAP surpasses the previous state-of-the-art on a majority of grounding benchmarks with up to a 50x speed improvement, and achieves an 82\% success rate in real-world manipulation. Across extensive simulated and physical experiments, it demonstrates robust performance and shows strong zero-shot generalization to navigation. More details and videos can be found at https://robo-map.github.io.
comment: More details and videos can be found at https://robo-map.github.io
Flattening Hierarchies with Policy Bootstrapping NeurIPS 2025
Offline goal-conditioned reinforcement learning (GCRL) is a promising approach for pretraining generalist policies on large datasets of reward-free trajectories, akin to the self-supervised objectives used to train foundation models for computer vision and natural language processing. However, scaling GCRL to longer horizons remains challenging due to the combination of sparse rewards and discounting, which obscures the comparative advantages of primitive actions with respect to distant goals. Hierarchical RL methods achieve strong empirical results on long-horizon goal-reaching tasks, but their reliance on modular, timescale-specific policies and subgoal generation introduces significant additional complexity and hinders scaling to high-dimensional goal spaces. In this work, we introduce an algorithm to train a flat (non-hierarchical) goal-conditioned policy by bootstrapping on subgoal-conditioned policies with advantage-weighted importance sampling. Our approach eliminates the need for a generative model over the (sub)goal space, which we find is key for scaling to high-dimensional control in large state spaces. We further show that existing hierarchical and bootstrapping-based approaches correspond to specific design choices within our derivation. Across a comprehensive suite of state- and pixel-based locomotion and manipulation benchmarks, our method matches or surpasses state-of-the-art offline GCRL algorithms and scales to complex, long-horizon tasks where prior approaches fail. Project page: https://johnlyzhou.github.io/saw/
comment: NeurIPS 2025 (Spotlight, top 3.2%)
QuaDreamer: Controllable Panoramic Video Generation for Quadruped Robots
Panoramic cameras, capturing comprehensive 360-degree environmental data, are suitable for quadruped robots in surrounding perception and interaction with complex environments. However, the scarcity of high-quality panoramic training data, caused by inherent kinematic constraints and complex sensor calibration challenges, fundamentally limits the development of robust perception systems tailored to these embodied platforms. To address this issue, we propose QuaDreamer, the first panoramic data generation engine specifically designed for quadruped robots. QuaDreamer focuses on mimicking the motion paradigm of quadruped robots to generate highly controllable, realistic panoramic videos, providing a data source for downstream tasks. Specifically, to effectively capture the unique vertical vibration characteristics exhibited during quadruped locomotion, we introduce Vertical Jitter Encoding (VJE). VJE extracts controllable vertical signals through frequency-domain feature filtering and provides high-quality prompts. To facilitate high-quality panoramic video generation under jitter signal control, we propose a Scene-Object Controller (SOC) that effectively manages object motion and boosts background jitter control through the attention mechanism. To address panoramic distortions in wide-FoV video generation, we propose the Panoramic Enhancer (PE), a dual-stream architecture that synergizes frequency-texture refinement for local detail enhancement with spatial-structure correction for global geometric consistency. We further demonstrate that the generated video sequences can serve as training data for the quadruped robot's panoramic visual perception model, enhancing the performance of multi-object tracking in 360-degree scenes. The source code and model weights will be publicly available at https://github.com/losehu/QuaDreamer.
comment: Accepted to CoRL 2025. The source code and model weights will be publicly available at https://github.com/losehu/QuaDreamer
LLM-Enabled In-Context Learning for Data Collection Scheduling in UAV-assisted Sensor Networks
Unmanned Aerial Vehicles (UAVs) are increasingly being utilized in various private and commercial applications, e.g., traffic control, parcel delivery, and Search and Rescue (SAR) missions. Machine Learning (ML) methods used in UAV-Assisted Sensor Networks (UASNETs), especially Deep Reinforcement Learning (DRL), face challenges such as complex and lengthy model training, gaps between simulation and reality, and low sampling efficiency, which conflict with the urgency of emergencies, such as SAR missions. In this paper, an In-Context Learning (ICL)-Data Collection Scheduling (ICLDC) system is proposed as an alternative to DRL in emergencies. The UAV collects sensory data and transmits it to a Large Language Model (LLM), which creates a task description in natural language. From this description, the UAV receives a data collection schedule that must be executed. A verifier ensures safe UAV operations by evaluating the schedules generated by the LLM and overriding unsafe schedules based on predefined rules. The system continuously adapts by incorporating feedback into the task descriptions and using this for future decisions. This method is tested against jailbreaking attacks, where the task description is manipulated to undermine network performance, highlighting the vulnerability of LLMs to such attacks. The proposed ICLDC significantly reduces cumulative packet loss compared to both the DQN and Maximum Channel Gain baselines. ICLDC presents a promising direction for intelligent scheduling and control in UASNETs.
Dual-Regularized Riccati Recursions for Interior-Point Optimal Control
We derive closed-form extensions of Riccati's recursions (both sequential and parallel) for solving dual-regularized LQR problems. We show how these methods can be used to solve general constrained, non-convex, discrete-time optimal control problems via a regularized interior point method, while guaranteeing that each step is a descent direction of an Augmented Barrier-Lagrangian merit function. We provide MIT-licensed implementations of our methods in C++ and JAX.
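For orientation, the sketch below shows the standard (unregularized) backward Riccati recursion for a scalar, time-invariant LQR problem with dynamics x_{t+1} = a x_t + b u_t and stage cost q x_t^2 + r u_t^2. It is the textbook baseline that the dual-regularized recursions extend, not the method derived in the paper. The scalar setting and the function name are simplifications assumed here to keep the example dependency-free.

```python
def scalar_lqr_riccati(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion for a scalar finite-horizon LQR problem.

    Returns the cost-to-go coefficient P_0 and the feedback gains K_0..K_{N-1},
    where the optimal control is u_t = -K_t * x_t.
    """
    p = q_final          # terminal condition P_N
    gains = []
    for _ in range(horizon):
        k = a * b * p / (r + b * b * p)   # K_t from P_{t+1}
        p = q + a * a * p - a * b * p * k  # P_t from P_{t+1}
        gains.append(k)
    gains.reverse()      # gains were produced from t = N-1 down to t = 0
    return p, gains


if __name__ == "__main__":
    p0, gains = scalar_lqr_riccati(a=1.0, b=1.0, q=1.0, r=1.0, q_final=1.0, horizon=50)
    print(p0, gains[0])
```

For a = b = q = r = 1, the fixed point of the recursion satisfies P^2 = P + 1, so over a long horizon P_0 approaches the golden ratio and the gain approaches P / (1 + P).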
Extended Friction Models for the Physics Simulation of Servo Actuators
Accurate physical simulation is crucial for the development and validation of control algorithms in robotic systems. Recent works in Reinforcement Learning (RL) notably take advantage of extensive simulations to produce efficient robot control. State-of-the-art servo actuator models generally fail to capture the complex friction dynamics of these systems. This limits the transferability of simulated behaviors to real-world applications. In this work, we present extended friction models that allow more accurate simulation of servo actuator dynamics. We propose a comprehensive analysis of various friction models, present a method for identifying model parameters using recorded trajectories from a pendulum test bench, and demonstrate how these models can be integrated into physics engines. The proposed friction models are validated on four distinct servo actuators and tested on 2R manipulators, showing significant improvements in accuracy over the standard Coulomb-Viscous model. Our results highlight the importance of considering advanced friction effects in the simulation of servo actuators to enhance the realism and reliability of robotic simulations.
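The standard Coulomb-Viscous model that serves as the comparison baseline can be sketched as a friction torque with a constant term opposing the direction of motion plus a term proportional to velocity. The parameter values and the function name below are hypothetical, and the extended models proposed in the paper are not reproduced here.

```python
def coulomb_viscous_torque(velocity, coulomb=0.05, viscous=0.01):
    """Friction torque opposing motion: -(tau_c * sign(w) + k_v * w).

    coulomb (N*m) and viscous (N*m*s/rad) are illustrative values only.
    At exactly zero velocity this simplified version returns zero, whereas a
    real joint would resist applied torque up to a static friction limit.
    """
    if velocity == 0.0:
        return 0.0
    sign = 1.0 if velocity > 0.0 else -1.0
    return -(coulomb * sign + viscous * velocity)


if __name__ == "__main__":
    for w in (-2.0, 0.0, 2.0):
        print(w, coulomb_viscous_torque(w))
```

The discontinuity at zero velocity is one reason such a model can be hard to fit to recorded servo trajectories.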
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
Robot vision has greatly benefited from advancements in multimodal fusion techniques and vision-language models (VLMs). We adopt a task-oriented perspective to systematically review the applications and advancements of multimodal fusion methods and VLMs in the field of robot vision. For semantic scene understanding tasks, we categorize fusion approaches into encoder-decoder frameworks, attention-based architectures, and graph neural networks. We also analyze the architectural characteristics and practical implementations of these fusion strategies in key tasks such as simultaneous localization and mapping (SLAM), 3D object detection, navigation, and manipulation. We compare the evolutionary paths and applicability of VLMs based on large language models (LLMs) with traditional multimodal fusion methods. Additionally, we conduct an in-depth analysis of commonly used datasets, evaluating their applicability and challenges in real-world robotic scenarios. Building on this analysis, we identify key challenges in current research, including cross-modal alignment, efficient fusion, real-time deployment, and domain adaptation. We propose future directions such as self-supervised learning for robust multimodal representations, structured spatial memory and environment modeling to enhance spatial intelligence, and the integration of adversarial robustness and human feedback mechanisms to enable ethically aligned system deployment. Through a comprehensive review, comparative analysis, and forward-looking discussion, we provide a valuable reference for advancing multimodal perception and interaction in robotic vision. A comprehensive list of studies in this survey is available at https://github.com/Xiaofeng-Han-Res/MF-RV.
comment: 27 pages, 11 figures. Accepted to Information Fusion. Final journal version: volume 126 (Part B), February 2026
Towards Proprioception-Aware Embodied Planning for Dual-Arm Humanoid Robots
In recent years, Multimodal Large Language Models (MLLMs) have demonstrated the ability to serve as high-level planners, enabling robots to follow complex human instructions. However, their effectiveness, especially in long-horizon tasks involving dual-arm humanoid robots, remains limited. This limitation arises from two main challenges: (i) the absence of simulation platforms that systematically support task evaluation and data collection for humanoid robots, and (ii) the insufficient embodiment awareness of current MLLMs, which hinders reasoning about dual-arm selection logic and body positions during planning. To address these issues, we present DualTHOR, a new dual-arm humanoid simulator, with continuous transition and a contingency mechanism. Building on this platform, we propose Proprio-MLLM, a model that enhances embodiment awareness by incorporating proprioceptive information with motion-based position embedding and a cross-spatial encoder. Experiments show that, while existing MLLMs struggle in this environment, Proprio-MLLM achieves an average improvement of 19.75% in planning performance. Our work provides both an essential simulation platform and an effective model to advance embodied intelligence in humanoid robotics. The code is available at https://anonymous.4open.science/r/DualTHOR-5F3B.
GARField: Addressing the visual Sim-to-Real gap in garment manipulation with mesh-attached radiance fields
While humans intuitively manipulate garments and other textile items swiftly and accurately, it is a significant challenge for robots. A factor crucial to human performance is the ability to imagine, a priori, the intended result of a manipulation and hence develop predictions of the garment pose. That ability allows us to plan from highly obstructed states, adapt our plans as we collect more information and react swiftly to unforeseen circumstances. Conversely, robots struggle to establish such intuitions and form tight links between plans and observations. We can partly attribute this to the high cost of obtaining densely labelled data for textile manipulation, both in quality and quantity. The problem of data collection is a long-standing issue in data-based approaches to garment manipulation. As of today, generating high-quality and labelled garment manipulation data is mainly attempted through advanced data capture procedures that create simplified state estimations from real-world observations. In contrast, this work proposes a novel approach to the problem by generating real-world observations from object states. To achieve this, we present GARField (Garment Attached Radiance Field), the first differentiable rendering architecture, to our knowledge, for data generation from simulated states stored as triangle meshes. Code is available at https://ddonatien.github.io/garfield-website/
comment: Project site: https://ddonatien.github.io/garfield-website/
Geometric Backstepping Control of Omnidirectional Tiltrotors Incorporating Servo-Rotor Dynamics for Robustness against Sudden Disturbances
This work presents a geometric backstepping controller for a variable-tilt omnidirectional multirotor that explicitly accounts for both servo and rotor dynamics. Considering actuator dynamics is essential for more effective and reliable operation, particularly during aggressive flight maneuvers or recovery from sudden disturbances. While prior studies have investigated actuator-aware control for conventional and fixed-tilt multirotors, these approaches rely on linear relationships between actuator input and wrench, which cannot capture the nonlinearities induced by variable tilt angles. In this work, we exploit the cascade structure between the rigid-body dynamics of the multirotor and its nonlinear actuator dynamics to design the proposed backstepping controller and establish exponential stability of the overall system. Furthermore, we reveal parametric uncertainty in the actuator model through experiments, and we demonstrate that the proposed controller remains robust against such uncertainty. The controller was compared against a baseline that does not account for actuator dynamics across three experimental scenarios: fast translational tracking, rapid rotational tracking, and recovery from sudden disturbance. The proposed method consistently achieved better tracking performance, and notably, while the baseline diverged and crashed during the fastest translational trajectory tracking and the recovery experiment, the proposed controller maintained stability and successfully completed the tasks, thereby demonstrating its effectiveness.
Robust Statistics vs. Machine Learning vs. Bayesian Inference: Insights into Handling Faulty GNSS Measurements in Field Robotics IROS2025
This paper presents research findings on handling faulty measurements (i.e., outliers) of global navigation satellite systems (GNSS) for vehicle localization under adverse signal conditions in field applications, where raw GNSS data are frequently corrupted due to environmental interference such as multipath, signal blockage, or non-line-of-sight conditions. In this context, we investigate three strategies applied specifically to GNSS pseudorange observations: robust statistics for error mitigation, machine learning for faulty measurement prediction, and Bayesian inference for noise distribution approximation. Since previous studies have provided limited insight into the theoretical foundations and practical evaluations of these three methodologies within a unified problem statement (i.e., state estimation using ranging sensors), we conduct extensive experiments using real-world sensor data collected in diverse urban environments. Our goal is to examine both established techniques and newly proposed methods, thereby advancing the understanding of how to handle faulty range measurements, such as GNSS, for robust, long-term vehicle localization. In addition to presenting successful results, this work highlights critical observations and open questions to motivate future research in robust state estimation.
comment: Accepted to the 2nd Workshop on Safety of Intelligent and Autonomous Vehicles: Formal Methods vs. Machine Learning approaches for reliable navigation (SIAV-FM2L) at IEEE IROS2025
EO-1: Interleaved Vision-Text-Action Pretraining for General Robot Control
The human ability to seamlessly perform multimodal reasoning and physical interaction in the open world is a core goal for general-purpose embodied intelligent systems. Recent vision-language-action (VLA) models, which are co-trained on large-scale robot and visual-text data, have demonstrated notable progress in general robot control. However, they still fail to achieve human-level flexibility in interleaved reasoning and interaction. In this work, we introduce EO-Robotics, which consists of the EO-1 model and the EO-Data1.5M dataset. EO-1 is a unified embodied foundation model that achieves superior performance in multimodal embodied reasoning and robot control through interleaved vision-text-action pre-training. The development of EO-1 is based on two key pillars: (i) a unified architecture that processes multimodal inputs indiscriminately (image, text, video, and action), and (ii) a massive, high-quality multimodal embodied reasoning dataset, EO-Data1.5M, which contains over 1.5 million samples with emphasis on interleaved vision-text-action comprehension. EO-1 is trained through synergies between auto-regressive decoding and flow matching denoising on EO-Data1.5M, enabling seamless robot action generation and multimodal embodied reasoning. Extensive experiments demonstrate the effectiveness of interleaved vision-text-action learning for open-world understanding and generalization, validated through a variety of long-horizon, dexterous manipulation tasks across multiple embodiments. This paper details the architecture of EO-1, the data construction strategy of EO-Data1.5M, and the training methodology, offering valuable insights for developing advanced embodied foundation models.
USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots
Underwater environments present unique challenges for robotic operation, including complex hydrodynamics, limited visibility, and constrained communication. Although data-driven approaches have advanced embodied intelligence in terrestrial robots and enabled task-specific autonomous underwater robots, developing underwater intelligence capable of autonomously performing multiple tasks remains highly challenging, as large-scale, high-quality underwater datasets are still scarce. To address these limitations, we introduce USIM, a simulation-based multi-task Vision-Language-Action (VLA) dataset for underwater robots. USIM comprises over 561K frames from 1,852 trajectories, totaling approximately 15.6 hours of BlueROV2 interactions across 20 tasks in 9 diverse scenarios, ranging from visual navigation to mobile manipulation. Building upon this dataset, we propose U0, a VLA model for general underwater robots, which integrates binocular vision and other sensor modalities through multimodal fusion, and further incorporates a convolution-attention-based perception focus enhancement module (CAP) to improve spatial understanding and mobile manipulation. Across tasks such as inspection, obstacle avoidance, scanning, and dynamic tracking, the framework achieves a success rate of 80%, while in challenging mobile manipulation tasks, it reduces the distance to the target by 21.2% compared with baseline methods, demonstrating its effectiveness. USIM and U0 show that VLA models can be effectively applied to underwater robotic applications, providing a foundation for scalable dataset construction, improved task autonomy, and the practical realization of intelligent general underwater robots.
comment: Project Page: https://vincentgu2000.github.io/u0project/
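The dataset figures quoted in the USIM abstract imply an average frame rate and trajectory length. The following is a minimal sketch of that arithmetic, assuming the rounded values "561K frames" and "15.6 hours" are close to the true totals; the function and variable names are illustrative and do not come from the paper.

```python
# Back-of-the-envelope rates derived from the rounded USIM figures.
# The inputs are approximate ("over 561K", "approximately 15.6 hours"),
# so the outputs are estimates, not values reported by the authors.
FRAMES = 561_000
TRAJECTORIES = 1_852
HOURS = 15.6


def dataset_rates(frames: int, trajectories: int, hours: float) -> dict:
    """Return average frame rate and per-trajectory length estimates."""
    seconds = hours * 3600.0
    return {
        "frames_per_second": frames / seconds,
        "frames_per_trajectory": frames / trajectories,
        "seconds_per_trajectory": seconds / trajectories,
    }


rates = dataset_rates(FRAMES, TRAJECTORIES, HOURS)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the data works out to roughly 10 frames per second, with an average trajectory of about 300 frames or 30 seconds.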
Hi-Drive: Hierarchical POMDP Planning for Safe Autonomous Driving in Diverse Urban Environments
Uncertainties in dynamic road environments pose significant challenges for behavior and trajectory planning in autonomous driving. This paper introduces Hi-Drive, a hierarchical planning algorithm addressing uncertainties at both behavior and trajectory levels using a hierarchical Partially Observable Markov Decision Process (POMDP) formulation. Hi-Drive employs driver models to represent uncertain behavioral intentions of other vehicles and uses their parameters to infer hidden driving styles. By treating driver models as high-level decision-making actions, our approach effectively manages the exponential complexity inherent in POMDPs. To further enhance safety and robustness, Hi-Drive integrates a trajectory optimization based on importance sampling, refining trajectories using a comprehensive analysis of critical agents. Evaluations on real-world urban driving datasets demonstrate that Hi-Drive significantly outperforms state-of-the-art planning-based and learning-based methods across diverse urban driving situations in real-world benchmarks.
A Verification Methodology for Safety Assurance of Robotic Autonomous Systems
Autonomous robots deployed in shared human environments, such as agricultural settings, require rigorous safety assurance to meet both functional reliability and regulatory compliance. These systems must operate in dynamic, unstructured environments, interact safely with humans, and respond effectively to a wide range of potential hazards. This paper presents a verification workflow for the safety assurance of an autonomous agricultural robot, covering the entire development life-cycle, from concept study and design to runtime verification. The outlined methodology begins with a systematic hazard analysis and risk assessment to identify potential risks and derive corresponding safety requirements. A formal model of the safety controller is then developed to capture its behaviour and verify that the controller satisfies the specified safety properties with respect to these requirements. The proposed approach is demonstrated on a field robot operating in an agricultural setting. The results show that the methodology can be effectively used to verify safety-critical properties and facilitate the early identification of design issues, contributing to the development of safer robots and autonomous systems.
comment: In Proc. of the 26th TAROS (Towards Autonomous Robotic Systems) Conference, York, UK, August, 2025
RealEngine: Simulating Autonomous Driving in Realistic Context
Driving simulation plays a crucial role in developing reliable driving agents by providing controlled, evaluative environments. To enable meaningful assessments, a high-quality driving simulator must satisfy several key requirements: multi-modal sensing capabilities (e.g., camera and LiDAR) with realistic scene rendering to minimize observational discrepancies; closed-loop evaluation to support free-form trajectory behaviors; highly diverse traffic scenarios for thorough evaluation; multi-agent cooperation to capture interaction dynamics; and high computational efficiency to ensure affordability and scalability. However, existing simulators and benchmarks fail to comprehensively meet these fundamental criteria. To bridge this gap, this paper introduces RealEngine, a novel driving simulation framework that holistically integrates 3D scene reconstruction and novel view synthesis techniques to achieve realistic and flexible closed-loop simulation in the driving context. By leveraging real-world multi-modal sensor data, RealEngine reconstructs background scenes and foreground traffic participants separately, allowing for highly diverse and realistic traffic scenarios through flexible scene composition. This synergistic fusion of scene reconstruction and view synthesis enables photorealistic rendering across multiple sensor modalities, ensuring both perceptual fidelity and geometric accuracy. Building upon this environment, RealEngine supports three essential driving simulation categories: non-reactive simulation, safety testing, and multi-agent interaction, collectively forming a reliable and comprehensive benchmark for evaluating the real-world performance of driving agents.
A Hierarchical Bin Packing Framework with Dual Manipulators via Heuristic Search and Deep Reinforcement Learning
We address the bin packing problem (BPP), which aims to maximize bin utilization when packing a variety of items. The offline problem, where the complete information about the item set and their sizes is known in advance, is proven to be NP-hard. The semi-online and online variants are even more challenging, as full information about incoming items is unavailable. While existing methods have tackled both 2D and 3D BPPs, the 2D BPP remains underexplored in terms of fully maximizing utilization. We propose a hierarchical approach for solving the 2D online and semi-online BPP by combining deep reinforcement learning (RL) with heuristic search. The heuristic search selects which item to pack or unpack, determines the packing order, and chooses the orientation of each item, while the RL agent decides the precise position within the bin. Our method is capable of handling diverse scenarios, including repacking, varying levels of item information, differing numbers of accessible items, and coordination of dual manipulators. Experimental results demonstrate that our approach achieves near-optimal utilization across various practical scenarios, largely due to its repacking capability. In addition, the algorithm is evaluated in a physics-based simulation environment, where execution time is measured to assess its real-world performance.
Inland-LOAM: Voxel-Based Structural Semantic LiDAR Odometry and Mapping for Inland Waterway Navigation
Accurate geospatial information is crucial for safe, autonomous Inland Waterway Transport (IWT), as existing charts (IENC) lack real-time detail and conventional LiDAR SLAM fails in waterway environments. These challenges lead to vertical drift and non-semantic maps, hindering autonomous navigation. This paper introduces Inland-LOAM, a LiDAR SLAM framework for waterways. It uses improved feature extraction and a water surface planar constraint to mitigate vertical drift. A novel pipeline transforms 3D point clouds into structured 2D semantic maps using voxel-based geometric analysis, enabling real-time computation of navigational parameters like bridge clearances. An automated module extracts shorelines and exports them into a lightweight, IENC-compatible format. Evaluations on a real-world dataset show Inland-LOAM achieves superior localization accuracy over state-of-the-art methods. The generated semantic maps and shorelines align with real-world conditions, providing reliable data for enhanced situational awareness. The code and dataset will be publicly available.
EmbodiedCoder: Parameterized Embodied Mobile Manipulation via Modern Coding Model
Recent advances in robot control methods, from end-to-end vision-language-action frameworks to modular systems with predefined primitives, have advanced robots' ability to follow natural language instructions. Nonetheless, many approaches still struggle to scale to diverse environments, as they often rely on large annotated datasets and offer limited interpretability. In this work, we introduce EmbodiedCoder, a training-free framework for open-world mobile robot manipulation that leverages coding models to directly generate executable robot trajectories. By grounding high-level instructions in code, EmbodiedCoder enables flexible object geometry parameterization and manipulation trajectory synthesis without additional data collection or fine-tuning. This coding-based paradigm provides a transparent and generalizable way to connect perception with manipulation. Experiments on real mobile robots show that EmbodiedCoder achieves robust performance across diverse long-term tasks and generalizes effectively to novel objects and environments. Our results demonstrate an interpretable approach for bridging high-level reasoning and low-level control, moving beyond fixed primitives toward versatile robot intelligence. See the project page at: https://embodiedcoder.github.io/EmbodiedCoder/
comment: Demo Page: https://embodiedcoder.github.io/EmbodiedCoder/
Tiny Learning-Based MPC for Multirotors: Solver-Aware Learning for Efficient Embedded Predictive Control
Tiny aerial robots hold great promise for applications such as environmental monitoring and search-and-rescue, yet face significant control challenges due to limited onboard computing power and nonlinear dynamics. Model Predictive Control (MPC) enables agile trajectory tracking and constraint handling but depends on an accurate dynamics model. While existing Learning-Based (LB) MPC methods, such as Gaussian Process (GP) MPC, enhance performance by learning residual dynamics, their high computational cost restricts onboard deployment on tiny robots. This paper introduces Tiny LB MPC, a co-designed MPC framework and optimization solver for resource-constrained micro multirotor platforms. The proposed approach achieves 100 Hz control on a Crazyflie 2.1 equipped with a Teensy 4.0 microcontroller, demonstrating a 43% average improvement in tracking performance over existing embedded MPC methods under model uncertainty, and achieving the first onboard implementation of LB MPC on a 53 g multirotor.
Ctrl-World: A Controllable Generative World Model for Robot Manipulation
Generalist robot policies can now perform a wide range of manipulation skills, but evaluating and improving their ability with unfamiliar objects and instructions remains a significant challenge. Rigorous evaluation requires a large number of real-world rollouts, while systematic improvement demands additional corrective data with expert labels. Both of these processes are slow, costly, and difficult to scale. World models offer a promising, scalable alternative by enabling policies to roll out within imagination space. However, a key challenge is building a controllable world model that can handle multi-step interactions with generalist robot policies. This requires a world model compatible with modern generalist policies by supporting multi-view prediction, fine-grained action control, and consistent long-horizon interactions, which is not achieved by previous works. In this paper, we make a step forward by introducing a controllable multi-view world model that can be used to evaluate and improve the instruction-following ability of generalist robot policies. Our model maintains long-horizon consistency with a pose-conditioned memory retrieval mechanism and achieves precise action control through frame-level action conditioning. Trained on the DROID dataset (95k trajectories, 564 scenes), our model generates spatially and temporally consistent trajectories under novel scenarios and new camera placements for over 20 seconds. We show that our method can accurately rank policy performance without real-world robot rollouts. Moreover, by synthesizing successful trajectories in imagination and using them for supervised fine-tuning, our approach can improve policy success by 44.7%.
comment: 17 pages
A Faster and More Reliable Middleware for Autonomous Driving Systems
Ensuring safety in high-speed autonomous vehicles requires rapid control loops and tightly bounded delays from perception to actuation. Many open-source autonomy systems rely on ROS 2 middleware; when multiple sensor and control nodes share one compute unit, ROS 2 and its DDS transports add significant (de)serialization, copying, and discovery overheads, shrinking the available time budget. We present Sensor-in-Memory (SIM), a shared-memory transport designed for intra-host pipelines in autonomous vehicles. SIM keeps sensor data in native memory layouts (e.g., cv::Mat, PCL), uses lock-free bounded double buffers that overwrite old data to prioritize freshness, and integrates into ROS 2 nodes with four lines of code. Unlike traditional middleware, SIM operates beside ROS 2 and is optimized for applications where data freshness and minimal latency outweigh guaranteed completeness. SIM provides sequence numbers, a writer heartbeat, and optional checksums to ensure ordering, liveness, and basic integrity. On an NVIDIA Jetson Orin Nano, SIM reduces data-transport latency by up to 98% compared to ROS 2 zero-copy transports such as FastRTPS and Zenoh, lowers mean latency by about 95%, and narrows 95th/99th-percentile tail latencies by around 96%. In tests on a production-ready Level 4 vehicle running Autoware.Universe, SIM increased localization frequency from 7.5 Hz to 9.5 Hz. Applied across all latency-critical modules, SIM cut average perception-to-decision latency from 521.91 ms to 290.26 ms, reducing emergency braking distance at 40 mph (64 km/h) on dry concrete by 13.6 ft (4.14 m).
comment: 8 pages, 7 figures, 8 tables
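The braking-distance figure in the SIM abstract can be reproduced from the reported latencies. This is a minimal sketch, assuming the vehicle travels at a constant 40 mph during the latency interval before braking begins; that constant-speed model and the function name are assumptions for illustration, not taken from the paper.

```python
# Distance travelled during the latency that SIM removes, assuming the
# vehicle holds a constant speed until the braking decision is made.
MPH_TO_MPS = 0.44704   # exact conversion factor, miles/hour -> metres/second
METRES_PER_FOOT = 0.3048


def distance_saved_m(speed_mph: float, before_ms: float, after_ms: float) -> float:
    """Metres no longer travelled thanks to the reduced pipeline latency."""
    speed_mps = speed_mph * MPH_TO_MPS
    saved_seconds = (before_ms - after_ms) / 1000.0
    return speed_mps * saved_seconds


saved_m = distance_saved_m(40.0, 521.91, 290.26)
saved_ft = saved_m / METRES_PER_FOOT
print(f"saved distance: {saved_m:.2f} m ({saved_ft:.1f} ft)")
```

With the reported 231.65 ms latency reduction, this gives about 4.14 m (13.6 ft), consistent with the figure quoted in the abstract.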
Predictive Preference Learning from Human Interventions NeurIPS 2025
Learning from human involvement aims to incorporate the human subject to monitor and correct agent behavior errors. Although most interactive imitation learning methods focus on correcting the agent's action at the current state, they do not adjust its actions in future states, which may be potentially more hazardous. To address this, we introduce Predictive Preference Learning from Human Interventions (PPL), which leverages the implicit preference signals contained in human interventions to inform predictions of future rollouts. The key idea of PPL is to bootstrap each human intervention into L future time steps, called the preference horizon, with the assumption that the agent follows the same action and the human makes the same intervention in the preference horizon. By applying preference optimization on these future states, expert corrections are propagated into the safety-critical regions where the agent is expected to explore, significantly improving learning efficiency and reducing human demonstrations needed. We evaluate our approach with experiments on both autonomous driving and robotic manipulation benchmarks and demonstrate its efficiency and generality. Our theoretical analysis further shows that selecting an appropriate preference horizon L balances coverage of risky states with label correctness, thereby bounding the algorithmic optimality gap. Demo and code are available at: https://metadriverse.github.io/ppl
comment: NeurIPS 2025 Spotlight. Project page: https://metadriverse.github.io/ppl
BlueME: Robust Underwater Robot-to-Robot Communication Using Compact Magnetoelectric Antennas
We present the design, development, and experimental validation of BlueME, a compact magnetoelectric (ME) antenna array system for underwater robot-to-robot communication. BlueME employs ME antennas operating at their natural mechanical resonance frequency to efficiently transmit and receive very-low-frequency (VLF) electromagnetic signals underwater. We outline the design, simulation, fabrication, and integration of the proposed system on low-power embedded platforms, focusing on portable and scalable applications. For performance evaluation, we deployed BlueME on an autonomous surface vehicle (ASV) and a remotely operated vehicle (ROV) in open-water field trials. Ocean trials demonstrate that BlueME maintains reliable signal transmission at distances beyond 700 meters while consuming only 10 watts of power. Field trials show that the system operates effectively in challenging underwater conditions such as turbidity, obstacles, and multipath interference -- conditions that generally affect acoustics and optics. Our analysis also examines the impact of complete submersion on system performance and identifies key deployment considerations. This work represents the first practical underwater deployment of ME antennas outside the laboratory and implements the largest VLF ME array system to date. BlueME demonstrates significant potential for marine robotics and automation in multi-robot cooperative systems and remote sensor networks.
Seeing through Uncertainty: Robust Task-Oriented Optimization in Visual Navigation
Visual navigation is a fundamental problem in embodied AI, yet practical deployments demand long-horizon planning capabilities to address multi-objective tasks. A major bottleneck is data scarcity: policies learned from limited data often overfit and fail to generalize out of distribution (OOD). Existing neural network-based agents typically increase architectural complexity, which paradoxically becomes counterproductive in the small-sample regime. This paper introduces NeuRO, an integrated learning-to-optimize framework that tightly couples perception networks with downstream task-level robust optimization. Specifically, NeuRO addresses core difficulties in this integration: (i) it transforms noisy visual predictions under data scarcity into convex uncertainty sets using Partially Input Convex Neural Networks (PICNNs) with conformal calibration, which directly parameterize the optimization constraints; and (ii) it reformulates planning under partial observability as a robust optimization problem, enabling uncertainty-aware policies that transfer across environments. Extensive experiments on both unordered and sequential multi-object navigation tasks demonstrate that NeuRO establishes SoTA performance, particularly in generalization to unseen environments. Our work thus presents a significant advancement for developing robust, generalizable autonomous agents.
AI-Agents for Culturally Diverse Online Higher Education Environments
As the global reach of online higher education continues to grow, universities are increasingly accommodating students from diverse cultural backgrounds (Tereshko et al., 2024). This can present a number of challenges including linguistic barriers (Ullah et al., 2021), cultural differences in learning style (Omidvar & Tan, 2012), cultural sensitivity in course design (Nguyen, 2022) and perceived isolation when students feel their perspectives or experiences are not reflected or valued in the learning environment (Hansen-Brown et al., 2022). Ensuring active engagement and reasonable learning outcomes in such environments requires distance educational systems that are not only adaptive but also culturally resonant (Dalle et al., 2024). Both embodied and virtual AI-Agents have great potential in this regard as they can facilitate personalized learning and adapt their interactions and content delivery to align with students' cultural context. In addition, Generative AI (GAI), such as Large Language Models (LLMs), can amplify the potential for these culturally aware AI agents to address educational challenges due to their advanced capacity for understanding and generating contextually relevant content (Wang et al., 2024). This chapter reviews existing research and suggests the usage of culturally aware AI-Agents, powered by GAI, to foster engagement and improve learning outcomes in culturally diverse online higher education environments.
Effect of Performance Feedback Timing on Motor Learning for a Surgical Training Task
Objective: Robot-assisted minimally invasive surgery (RMIS) has become the gold standard for a variety of surgical procedures, but the optimal method of training surgeons for RMIS is unknown. We hypothesized that real-time, rather than post-task, error feedback would better increase learning speed and reduce errors. Methods: Forty-two surgical novices learned a virtual version of the ring-on-wire task, a canonical task in RMIS training. We investigated the impact of feedback timing with multi-sensory (haptic and visual) cues in three groups: (1) real-time error feedback, (2) trial replay with error feedback, and (3) no error feedback. Results: Participant performance was evaluated based on the accuracy of ring position and orientation during the task. Participants who received real-time feedback outperformed other groups in ring orientation. Additionally, participants who received feedback in replay outperformed participants who did not receive any error feedback on ring orientation during long, straight path sections. There were no significant differences between groups for ring position overall, but participants who received real-time feedback outperformed the other groups in positional accuracy on tightly curved path sections. Conclusion: The addition of real-time haptic and visual error feedback improves learning outcomes in a virtual surgical task over error feedback in replay or no error feedback at all. Significance: This work demonstrates that multi-sensory error feedback delivered in real time leads to better training outcomes as compared to the same feedback delivered after task completion. This novel method of training may enable surgical trainees to develop skills with greater speed and accuracy.
comment: Held at https://ieeexplore.ieee.org/document/11202637
Opti-Acoustic Scene Reconstruction in Highly Turbid Underwater Environments IROS 2025
Scene reconstruction is an essential capability for underwater robots navigating in close proximity to structures. Monocular vision-based reconstruction methods are unreliable in turbid waters and lack depth scale information. Sonars are robust to turbid water and non-uniform lighting conditions; however, they have low resolution and elevation ambiguity. This work proposes a real-time opti-acoustic scene reconstruction method that is specially optimized to work in turbid water. Our strategy avoids having to identify point features in visual data and instead identifies regions of interest in the data. We then match relevant regions in the image to corresponding sonar data. A reconstruction is obtained by leveraging range data from the sonar and elevation data from the camera image. Experimental comparisons against other vision-based and sonar-based approaches at varying turbidity levels, and field tests conducted in marina environments, validate the effectiveness of the proposed approach. We have made our code open-source to facilitate reproducibility and encourage community engagement.
comment: To appear at IROS 2025 in Hangzhou, China
Systems and Control (CS)
A 0.62 μW/sensor 82 fps Time-to-Digital Impedance Measurement IC with Unified Excitation/Readout Front-end for Large-Scale Piezo-Resistive Sensor Array
This paper presents a fast impedance measurement IC for large-scale piezo-resistive sensor arrays. It features a unified differential time-to-digital demodulation architecture that reads out impedance directly through the excitation circuit. The proposed pre-saturation adaptive bias technique further improves power efficiency. The chip scans 253 sensors in 12.2 ms (82 fps) at 125 kHz, consuming 158 μW (7.5 nJ/sensor). With loads from 20 Ω to 500 kΩ, it achieves 0.5% error and up to 71.1 dB SNR.
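The per-sensor power and frame rate in the title follow from the numbers in the abstract. This is a minimal sketch of that arithmetic, assuming the 158 μW is the total chip power while scanning all 253 sensors; the variable names are illustrative and not from the paper.

```python
# Derive the headline per-sensor power and frame rate from the abstract's
# totals. Assumes 158 uW is the whole-chip power during a full array scan.
TOTAL_POWER_UW = 158.0   # total power, microwatts
NUM_SENSORS = 253        # sensors scanned per frame
SCAN_TIME_MS = 12.2      # time to scan the full array, milliseconds

power_per_sensor_uw = TOTAL_POWER_UW / NUM_SENSORS
frames_per_second = 1000.0 / SCAN_TIME_MS
time_per_sensor_us = SCAN_TIME_MS * 1000.0 / NUM_SENSORS

print(f"power per sensor: {power_per_sensor_uw:.2f} uW")
print(f"frame rate:       {frames_per_second:.0f} fps")
print(f"time per sensor:  {time_per_sensor_us:.1f} us")
```

Under this assumption the result is roughly 0.62 μW per sensor and 82 fps, matching the title, with about 48 μs spent on each sensor.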
Cryo-CMOS Antenna for Wireless Communications within a Quantum Computer Cryostat
Scaling quantum computers from a few qubits to large numbers remains one of the critical challenges in realizing practical quantum advantage. Multi-core quantum architectures have emerged as a promising solution, enabling scalability through distributed quantum processing units (QPUs) interconnected via classical and quantum links. However, the bottleneck of wired connections persists, as densely packed wired interconnects, both vertically across temperature stages and horizontally within the same layer, introduce spatial constraints, power dissipation, and latency, which could hinder performance as the number of QPUs increases. To overcome these limitations, this work proposes a cryo-compatible on-chip differential dipole antenna operating at 28 GHz to enable short-range wireless communication within a quantum computer cryostat. Temperature-dependent material properties are incorporated to accurately capture antenna behavior at 4 K. Moreover, by embedding the antenna in a realistic cryostat structure, we evaluate the feasibility of antenna operation within the cryogenic environment. The proposed antenna achieves a reflection coefficient of -20.8 dB in free space and -18.38 dB within the cryostat, demonstrating efficient impedance matching.
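To put the reported reflection coefficients in more intuitive terms, they can be converted into the fraction of incident power reflected at the antenna port. This is a minimal sketch, assuming the quoted values are the usual S11 magnitude in decibels, i.e. 20·log10|Γ|; the function name is illustrative.

```python
# Convert a reflection coefficient in dB to the reflected power fraction.
# Assumes the dB value is 20*log10(|Gamma|), so |Gamma|^2 = 10**(dB / 10).
def reflected_power_fraction(s11_db: float) -> float:
    """Fraction of incident power reflected for a given S11 in dB."""
    return 10.0 ** (s11_db / 10.0)


free_space = reflected_power_fraction(-20.8)
in_cryostat = reflected_power_fraction(-18.38)

for label, fraction in (("free space", free_space), ("cryostat", in_cryostat)):
    print(f"{label}: {fraction * 100:.2f}% reflected, "
          f"{(1.0 - fraction) * 100:.2f}% accepted")
```

Under this convention, both cases reflect less than 2% of the incident power, which is what the abstract means by efficient impedance matching.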
Efficient Force and Stiffness Prediction in Robotic Produce Handling with a Piezoresistive Pressure Sensor
Properly handling delicate produce with robotic manipulators is a major part of the future role of automation in agricultural harvesting and processing. Grasping with the correct amount of force is crucial not only for ensuring a proper grip on the object, but also for avoiding damage or bruising to the product. In this work, a flexible pressure sensor that is both low cost and easy to fabricate is integrated with robotic grippers for working with produce of varying shapes, sizes, and stiffnesses. The sensor is successfully integrated with both a rigid robotic gripper and a pneumatically actuated soft finger. Furthermore, an algorithm is proposed for accelerated estimation of the steady-state value of the sensor output based on the transient response data, to enable real-time applications. The sensor is shown to be effective in incorporating feedback to correctly grasp objects of unknown sizes and stiffnesses. At the same time, the sensor provides estimates for these values which can be utilized for identification of qualities such as ripeness levels and bruising. It is also shown to be able to provide force feedback for objects of variable stiffnesses. This enables future use not only for produce identification, but also for tasks such as quality control and selective distribution based on ripeness levels.
comment: For supplementary videos, see https://drive.google.com/drive/folders/1jol-_z6gaUfjpL1Qi7EG420usTbVSodv?usp=sharing
Channel Estimation under Large Doppler Shifts in NOMA-Based Air-Ground Communications
This paper investigates a multiple antenna system with non-orthogonal multiple access (NOMA) for the exchange of air traffic management data between commercial aircraft pilots and ground-based air traffic controllers. While NOMA techniques enhance spectral efficiency, their application to aircraft communications is challenged by the high speed of the aircraft (up to 214 m/s) and the long communication ranges (up to 250 km), resulting in significant Doppler shifts and low signal-to-noise ratios, respectively. To accurately assess these challenges, we employ a realistic geometry-based stochastic air-ground channel model, derived from dedicated flight measurement campaigns. In this paper, multiple aircraft simultaneously transmit data to the ground station. We focus on the channel estimation problem at the ground station under high carrier frequency offsets and the effects of channel aging due to the channel's time-varying nature. For the channel estimation problem, we compare the Zadoff-Chu sequences with a time-division approach under varying carrier frequency offset pre-compensation accuracies at the aircraft transmitter. For the channel aging problem and performance evaluation of channel estimators, we compute the outage probability for both the zero-forcing detector and the minimum mean squared error detector with successive interference cancellation. The results show that the favorable channel estimator-detector combinations differ between the takeoff & landing phase and the enroute cruise phase of the flight, due to the distinct channel propagation characteristics of each phase.
comment: Submitted to IEEE Conference, 6 pages, 2 Figures
Data-driven learning of feedback maps for explicit robust predictive control: an approximation theoretic view
We establish an algorithm to learn feedback maps from data for a class of robust model predictive control (MPC) problems. The algorithm accounts for the approximation errors due to the learning directly at the synthesis stage, ensuring recursive feasibility by construction. The optimal control problem consists of a linear noisy dynamical system, a quadratic stage and quadratic terminal costs as the objective, and convex constraints on the state, control, and disturbance sequences; the control minimizes and the disturbance maximizes the objective. We proceed via two steps -- (a) Data generation: First, we reformulate the given minmax problem into a convex semi-infinite program and employ recently developed tools to solve it in an exact fashion on grid points of the state space to generate (state, action) data. (b) Learning approximate feedback maps: We employ a couple of approximation schemes that furnish tight approximations within preassigned uniform error bounds on the admissible state space to learn the unknown feedback policy. The stability of the closed-loop system under the approximate feedback policies is also guaranteed under a standard set of hypotheses. Two benchmark numerical examples are provided to illustrate the results.
comment: 27 pages; submitted
Quantifying the Impact of Missing Risk Markets for Decarbonized Power Systems with Long Duration Energy Storage
The transition to a fully decarbonised electricity system depends on integrating new technologies that ensure reliability alongside sustainability. However, missing risk markets hinder investment in reliability-enhancing technologies by exposing investors to revenue uncertainty. This study provides the first quantitative assessment of how missing risk markets affect investment decisions in power systems that depend on long-duration energy storage (LDES) for reliability. We develop a two-stage stochastic equilibrium model with risk-averse market participants, which independently sizes power and energy capacity. We apply the method to a case study of a deeply decarbonised power system in Great Britain. The results show that incomplete risk markets reduce social welfare, harm reliability, and discourage investment in LDES and other technologies with volatile revenue streams. Revenue volatility leads to substantial risk premiums and higher financing costs for LDES, creating a barrier to its large-scale deployment. These findings demonstrate the importance of policy mechanisms that hedge revenue risk to lower the cost of capital and accelerate investment in reliability-enhancing, zero-carbon technologies.
Physics-Informed Neural Network Modeling of Vehicle Collision Dynamics in Precision Immobilization Technique Maneuvers
Accurate prediction of vehicle collision dynamics is crucial for advanced safety systems and post-impact control applications, yet existing methods face inherent trade-offs among computational efficiency, prediction accuracy, and data requirements. This paper proposes a dual Physics-Informed Neural Network framework addressing these challenges through two complementary networks. The first network integrates Gaussian Mixture Models with PINN architecture to learn impact force distributions from finite element analysis data while enforcing momentum conservation and energy consistency constraints. The second network employs an adaptive PINN with dynamic constraint weighting to predict post-collision vehicle dynamics, featuring an adaptive physics guard layer that prevents unrealistic predictions while preserving data-driven learning capabilities. The framework incorporates uncertainty quantification through time-varying parameters and enables rapid adaptation via fine-tuning strategies. Validation demonstrates significant improvements: the impact force model achieves relative errors below 15.0% for force prediction on finite element analysis (FEA) datasets, while the vehicle dynamics model reduces average trajectory prediction error by 63.6% compared to traditional four-degree-of-freedom models in scaled vehicle experiments. The integrated system maintains millisecond-level computational efficiency suitable for real-time applications while providing probabilistic confidence bounds essential for safety-critical control. Comprehensive validation through FEA simulation, dynamic modeling, and scaled vehicle experiments confirms the framework's effectiveness for Precision Immobilization Technique scenarios and general collision dynamics prediction.
On the Flexibility Potential of a Swiss Distribution Grid: Opportunities and Limitations
The growing integration of distributed renewable generation and the electrification of heating and transportation are rapidly increasing the number of flexible devices within modern distribution grids. Leveraging the aggregated flexibility of these small-scale distributed resources is essential to maintaining future grid-wide stability. This work uses the Swiss distribution grid of Walenstadt as a case study to provide insights into the aggregated flexibility potential of distribution grids. It demonstrates that incorporating devices such as heat pumps and photovoltaic systems significantly enhances distribution grid flexibility. It investigates the time-varying nature of aggregated flexibility and highlights how it can vary seasonally. Furthermore, simulations of future scenarios reveal that aggregated flexibility does not increase linearly or monotonically with higher levels of flexible device penetration. This is primarily due to the overloading of individual feeders, which underscores the impact of grid topology and network constraints on the aggregated flexibility potential.
Multipolar dynamics of social segregation: Data validation on Swedish vaccination statistics
We perform a validation analysis on the multipolar model of opinion dynamics. A general methodology for using the model on datasets of two correlated variables is proposed and tested using data on the relationship between COVID-19 vaccination rates and political participation in Sweden. The model is shown to successfully capture the opinion segregation demonstrated by the data, and spatial correlation of biases is shown to be necessary for the result. Mixing the biases, on the other hand, leads to a more homogeneous opinion distribution and greater penetration of the majority opinion, which here corresponds to a decision to vote or vaccinate.
comment: Presented at CoDIT 2025
Performance Comparison of Gate-Based and Adiabatic Quantum Computing for Power Flow Analysis
In this paper, we present the first direct comparison between gate-based quantum computing (GQC) and adiabatic quantum computing (AQC) for solving the AC power flow (PF) equations. Building on the Adiabatic Quantum Power Flow (AQPF) algorithm originally designed for annealing platforms, we adapt it to the Quantum Approximate Optimization Algorithm (QAOA). The PF equations are reformulated as a combinatorial optimization problem. Numerical experiments on a 4-bus test system assess solution accuracy and computational time. Results from QAOA are benchmarked against those obtained using D-Wave's Advantage system and Fujitsu's latest generation Digital Annealer, i.e., Quantum-Inspired Integrated Optimization software (QIIO). The findings provide quantitative insights into the performance trade-offs, scalability, and practical viability of GQC versus AQC paradigms for PF analysis, highlighting the potential of quantum algorithms to address the computational challenges associated with modern electricity networks in the Noisy Intermediate-Scale Quantum (NISQ) era.
comment: 7 pages, 1 figure, 4 tables, submitted to PSCC 2026
Partitioned Scheduling for DAG Tasks Considering Probabilistic Execution Time
Autonomous driving systems, critical for safety, require real-time guarantees and can be modeled as DAGs. Their acceleration features, such as caches and pipelining, often result in execution times below the worst case. Thus, a probabilistic approach ensuring constraint satisfaction within a probability threshold is more suitable than worst-case guarantees for these systems. This paper considers probabilistic guarantees for DAG tasks by utilizing the results of probabilistic guarantees for single processors, which are more advanced than those for multi-core processors. This paper proposes a task set partitioning method that guarantees schedulability under partitioned scheduling. The evaluation on randomly generated DAG task sets demonstrates that the proposed method schedules more task sets with a smaller mean analysis time compared to existing probabilistic schedulability analysis for DAGs. The evaluation also compares four bin-packing heuristics, revealing that Item-Centric Worst-Fit-Decreasing schedules the most task sets.
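The partitioning step named in the abstract can be illustrated with a minimal worst-fit-decreasing sketch. It iterates over items and picks a core for each one, which is one reading of "item-centric". Everything here is an assumption made for illustration: each task is reduced to a single scalar utilization, a core is treated as feasible while its total stays at or below `capacity`, and the function name is hypothetical. The paper's probabilistic schedulability test is not reproduced.

```python
from typing import List, Optional


def worst_fit_decreasing(utils: List[float], n_cores: int,
                         capacity: float = 1.0) -> Optional[List[List[int]]]:
    """Assign item indices to cores; return None if some item does not fit."""
    loads = [0.0] * n_cores
    bins: List[List[int]] = [[] for _ in range(n_cores)]
    # Visit items from largest to smallest utilization ("decreasing").
    order = sorted(range(len(utils)), key=lambda i: utils[i], reverse=True)
    for i in order:
        # Worst fit: choose the core with the most remaining capacity.
        core = min(range(n_cores), key=lambda c: loads[c])
        if loads[core] + utils[i] > capacity + 1e-12:
            return None  # even the emptiest core cannot host this item
        loads[core] += utils[i]
        bins[core].append(i)
    return bins


partition = worst_fit_decreasing([0.6, 0.5, 0.4, 0.3], n_cores=2)
```

In a probabilistic setting, the scalar capacity check would presumably be replaced by a per-core test on deadline-miss probability; that substitution is left open here.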
Safe Driving in Occluded Environments
Ensuring safe autonomous driving in the presence of occlusions poses a significant challenge for policy design. While existing model-driven control techniques based on set invariance can handle visible risks, occlusions create latent risks in which safety-critical states are not observable. Data-driven techniques also struggle to handle latent risks because direct mappings from risk-critical objects in sensor inputs to safe actions cannot be learned without visible risk-critical objects. Motivated by these challenges, in this paper, we propose a probabilistic safety certificate for latent risk. Our key technical enabler is the application of probabilistic invariance: it relaxes the strict observability requirements imposed by set-invariance methods that demand knowledge of risk-critical states. The proposed techniques provide linear action constraints that confine the latent risk probability within tolerance. Such constraints can be integrated into model predictive controllers or embedded in data-driven policies to mitigate latent risks. The proposed method is tested using the CARLA simulator and compared with a few existing techniques. The theoretical and empirical analyses jointly demonstrate that the proposed methods assure long-term safety in real-time control in occluded environments without being overly conservative and with transparency to exposed risks.
Decision-dependent Robust Charging Infrastructure Planning for Light-duty Truck Electrification at Industrial Sites: Scheduling and Abandonment
Many industrial sites rely on diesel-powered light-duty trucks to transport workers and small-scale facilities, which has resulted in a significant amount of greenhouse gas (GHG) emissions. To address this, we developed a two-stage robust charging infrastructure planning model for electrifying light-duty trucks at industrial sites. The model is formulated as a mixed-integer linear program (MILP) that optimizes the charging infrastructure, selected from multiple charger types and potential locations, and determines opportunity charging schedules for each truck based on the chosen infrastructure. Given the strict stopping points and schedules at industrial sites, we introduced a scheduling problem with abandonment, where trucks forgo charging if their waiting times exceed a maximum threshold. We also incorporated the impacts of overnight charging and range anxiety on waiting and abandonment behaviors. To represent the stochastic and heterogeneous parking durations of trucks, we constructed a decision-dependent robust uncertainty set in which parking time variability flexibly depends on charging choices. We applied the model in a case study of an open-pit mining site, which plans charger installations in eight zones and schedules a fleet of around 200 trucks. By decomposing the problem into monthly subproblems and using heuristic approaches, for the whole-year dataset, the model achieves an optimality gap of less than 0.1 % within a reasonable computation time under diverse uncertainty scenarios.
Time-Varying Optimization for Streaming Data Via Temporal Weighting
Classical optimization theory deals with fixed, time-invariant objective functions. However, time-varying optimization has emerged as an important subject for decision-making in dynamic environments. In this work, we study the problem of learning from streaming data through a time-varying optimization lens. Unlike prior works that focus on generic formulations, we introduce a structured, \emph{weight-based} formulation that explicitly captures the streaming-data origin of the time-varying objective, where at each time step, an agent aims to minimize a weighted average loss over all the past data samples. We focus on two specific weighting strategies: (1) uniform weights, which treat all samples equally, and (2) discounted weights, which geometrically decay the influence of older data. For both schemes, we derive tight bounds on the ``tracking error'' (TE), defined as the deviation between the model parameter and the time-varying optimum at a given time step, under gradient descent (GD) updates. We show that under uniform weighting, the TE vanishes asymptotically with a $\mathcal{O}(1/t)$ decay rate, whereas discounted weighting incurs a nonzero error floor controlled by the discount factor and the number of gradient updates performed at each time step. Our theoretical findings are validated through numerical simulations.
comment: Accepted at IEEE Asilomar, 2025
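The contrast between the two weighting schemes can be seen in a toy sketch. It assumes a scalar quadratic loss 0.5*(theta - x)^2 per sample, so the weighted-average minimizer is simply the weighted mean of the samples, and it takes one gradient step per time step. The function name, step size, discount factor, and alternating data stream are all hypothetical choices for illustration, not the paper's setting.

```python
from typing import List


def tracking_errors(samples: List[float], gamma: float, step: float = 0.5) -> List[float]:
    """Run one GD step per sample and record |theta_t - optimum_t|.

    gamma = 1.0 gives uniform weights; gamma < 1.0 gives weights gamma**(t - s).
    """
    theta = 0.0
    num = 0.0  # running weighted sum of samples
    den = 0.0  # running sum of weights
    errors = []
    for x in samples:
        num = gamma * num + x
        den = gamma * den + 1.0
        optimum = num / den  # minimizer of the weighted quadratic loss
        # Gradient of the normalized weighted loss is (theta - optimum).
        theta -= step * (theta - optimum)
        errors.append(abs(theta - optimum))
    return errors


stream = [float(t % 2) for t in range(2000)]  # toy stream 0, 1, 0, 1, ...
te_uniform = tracking_errors(stream, gamma=1.0)
te_discounted = tracking_errors(stream, gamma=0.9)
```

On this stream the uniform-weight optimum settles near 0.5 and moves by roughly 1/(2t) per step, so the tracking error keeps shrinking. The discounted optimum keeps oscillating between about 0.474 and 0.526, which leaves a persistent error of roughly 0.018 with these constants.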
Learning Wireless Interference Patterns: Decoupled GNN for Throughput Prediction in Heterogeneous Multi-Hop p-CSMA Networks
The p-persistent CSMA protocol is central to random-access MAC analysis, but predicting saturation throughput in heterogeneous multi-hop wireless networks remains a hard problem. Simplified models that assume a single, shared interference domain can underestimate throughput by 48--62\% in sparse topologies. Exact Markov-chain analyses are accurate but scale exponentially in computation time, making them impractical for large networks. These computational barriers motivate structural machine learning approaches like GNNs for scalable throughput prediction in general network topologies. Yet off-the-shelf GNNs struggle here: a standard GCN yields 63.94\% normalized mean absolute error (NMAE) on heterogeneous networks because symmetric normalization conflates a node's direct interference with higher-order, cascading effects that pertain to how interference propagates over the network graph. Building on these insights, we propose the Decoupled Graph Convolutional Network (D-GCN), a novel architecture that explicitly separates processing of a node's own transmission probability from neighbor interference effects. D-GCN replaces mean aggregation with learnable attention, yielding interpretable, per-neighbor contribution weights while capturing complex multihop interference patterns. D-GCN attains 3.3\% NMAE, outperforms strong baselines, remains tractable even when exact analytical methods become computationally infeasible, and enables gradient-based network optimization that achieves within 1\% of theoretical optima.
Laser Fault Injection in Memristor-Based Accelerators for AI/ML and Neuromorphic Computing
Memristive crossbar arrays (MCA) are emerging as efficient building blocks for in-memory computing and neuromorphic hardware due to their high density and parallel analog matrix-vector multiplication capabilities. However, the physical properties of their nonvolatile memory elements introduce new attack surfaces, particularly under fault injection scenarios. This work explores Laser Fault Injection as a means of inducing analog perturbations in MCA-based architectures. We present a detailed threat model in which adversaries target memristive cells to subtly alter their physical properties or outputs using laser beams. Through HSPICE simulations of a large MCA on a 45 nm CMOS technology node, we show how laser-induced photocurrent manifests in output current distributions, enabling differential fault analysis to infer internal weights with up to 99.7% accuracy, replicate the model, and compromise computational integrity through targeted weight alterations of approximately 143%.
comment: 3 pages, 4 figures
Resource-Aware Stealthy Attacks in Vehicle Platoons
Connected and Autonomous Vehicles (CAVs) are transforming modern transportation by enabling cooperative applications such as vehicle platooning, where multiple vehicles travel in close formation to improve efficiency and safety. However, the heavy reliance on inter-vehicle communication makes platoons highly susceptible to attacks, where even subtle manipulations can escalate into severe physical consequences. While existing research has largely focused on defending against attacks, far less attention has been given to stealthy adversaries that aim to covertly manipulate platoon behavior. This paper introduces a new perspective on the attack design problem by demonstrating how attackers can guide platoons toward their own desired trajectories while remaining undetected. We outline conditions under which such attacks are feasible, analyze their dependence on communication topologies and control protocols, and investigate the resources required by the attacker. By characterizing the resources needed to launch stealthy attacks, we address system vulnerabilities and inform the design of resilient countermeasures. Our findings reveal critical weaknesses in current platoon architectures and anomaly detection mechanisms and provide methods to develop more secure and trustworthy CAV systems.
comment: 13 pages, 8 figures
Belief Space Control of Safety-Critical Systems Under State-Dependent Measurement Noise
Safety-critical control is imperative for deploying autonomous systems in the real world. Control Barrier Functions (CBFs) offer strong safety guarantees when accurate system and sensor models are available. However, widely used additive, fixed-noise models are not representative of complex sensor modalities with state-dependent error characteristics. Although CBFs have been designed to mitigate uncertainty using fixed worst-case bounds on measurement noise, this approach can lead to overly conservative control. To solve this problem, we extend the Belief Control Barrier Function (BCBF) framework to accommodate state-dependent measurement noise via the Generalized Extended Kalman Filter (GEKF) algorithm, which models measurement noise as a linear function of the state. Using the original BCBF framework as a baseline, we demonstrate the performance of the BCBF-GEKF approach through simulation results on a 1D single-integrator setpoint tracking scenario and a 2D unicycle kinematics trajectory tracking scenario. Our results confirm that the BCBF-GEKF approach offers less conservative control with greater safety.
comment: Preprint - Submitted to the 2026 American Control Conference
DiffOPF: Diffusion Solver for Optimal Power Flow
The optimal power flow (OPF) is a multi-valued, non-convex mapping from loads to dispatch setpoints. The variability of system parameters (e.g., admittances, topology) further contributes to the multiplicity of dispatch setpoints for a given load. Existing deep learning OPF solvers are single-valued and thus fail to capture the variability of system parameters unless fully represented in the feature space, which is prohibitive. To solve this problem, we introduce a diffusion-based OPF solver, termed \textit{DiffOPF}, that treats OPF as a conditional sampling problem. The solver learns the joint distribution of loads and dispatch setpoints from operational history, and returns the marginal dispatch distributions conditioned on loads. Unlike single-valued solvers, DiffOPF enables sampling statistically credible warm starts with favorable cost and constraint satisfaction trade-offs. We explore the sample complexity of DiffOPF to ensure the OPF solution within a prescribed distance from the optimization-based solution, and verify this experimentally on power system benchmarks.
comment: 7 pages, 4 figures, 2 tables
Dual Detection Framework for Faults and Integrity Attacks in Cyber-Physical Control Systems
Anomaly detection plays a vital role in the security and safety of cyber-physical control systems, and accurately distinguishing between different anomaly types is crucial for system recovery and mitigation. This study proposes a dual detection framework for anomaly detection and discrimination. By leveraging the dynamic characteristics of control loops and the stealthiness features of integrity attacks, the closed-loop stealthiness condition is first derived, and two dedicated detectors are designed and deployed on the controller side and the plant side, respectively, enabling joint plant fault and cyber attack detection. Moreover, by jointly analyzing the residual response of the two detectors corresponding to different anomalies, it is proved that the proposed method can distinguish between faults and integrity attacks due to the detectors' individual residual spaces. According to the detector's residual space, the fault and attack detection performance is further improved by a two-stage optimization scheme. Simulation results validate the effectiveness of the proposed approach.
Multi-Period Sparse Optimization for Proactive Grid Blackout Diagnosis
Existing or planned power grids need to evaluate survivability under extreme events, like a number of peak load overloading conditions, which could possibly cause system collapses (i.e., blackouts). For realistic extreme events that are correlated or share similar patterns, it is reasonable to expect that the dominant vulnerability or failure sources behind them share the same locations but with different severity. Early warning diagnosis that proactively identifies the key vulnerabilities responsible for a number of system collapses of interest can significantly enhance resilience. This paper proposes a multi-period sparse optimization method, enabling the discovery of persistent failure sources across a sequence of collapsed systems with increasing system stress, such as rising demand or worsening contingencies. This work defines persistency and efficiently integrates persistency constraints to capture the ``hidden'' evolving vulnerabilities. Circuit-theory based power flow formulations and circuit-inspired optimization heuristics are used to facilitate the scalability of the method. Experiments on benchmark systems show that the method reliably tracks persistent vulnerability locations under increasing load stress, and solves with scalability to large systems (on average taking around 200 s per scenario on 2000+ bus systems).
Cyber-Resilient System Identification for Power Grid through Bayesian Integration
Power grids increasingly need real-time situational awareness under the ever-evolving cyberthreat landscape. Advances in snapshot-based system identification approaches have enabled accurately estimating states and topology from a snapshot of measurement data, under random bad data and topology errors. However, modern interactive, targeted false data can stay undetectable to these methods and significantly compromise estimation accuracy. This work advances system identification by combining a snapshot-based method with a time-series model via Bayesian integration, to improve cyber resiliency against both random and targeted false data. Using a distance-based time-series model, this work can leverage historical data of different distributions induced by changes in grid topology and other settings. The normal system behavior captured from historical data is integrated into system identification through a Bayesian treatment, to make solutions robust to targeted false data. We experiment on mixed random anomalies (bad data, topology error) and targeted false data injection attacks (FDIA) to demonstrate our method's 1) cyber resilience: achieving over 70% reduction in estimation error under FDIA; 2) anomalous data identification: being able to alarm and locate anomalous data; 3) almost linear scalability: achieving comparable speed with the snapshot-based baseline, both taking <1 min per time tick on the large 2,383-bus system using a laptop CPU.
The Algorithmic Regulator
The regulator theorem states that, under certain conditions, any optimal controller must embody a model of the system it regulates, grounding the idea that controllers embed, explicitly or implicitly, internal models of the controlled. This principle underpins neuroscience and predictive brain theories like the Free-Energy Principle or Kolmogorov/Algorithmic Agent theory. However, the theorem is only proven in limited settings. Here, we treat the deterministic, closed, coupled world-regulator system $(W,R)$ as a single self-delimiting program $p$ via a constant-size wrapper that produces the world output string~$x$ fed to the regulator. We analyze regulation from the viewpoint of the algorithmic complexity of the output, $K(x)$. We define $R$ to be a \emph{good algorithmic regulator} if it \emph{reduces} the algorithmic complexity of the readout relative to a null (unregulated) baseline $\varnothing$, i.e., \[ \Delta = K\big(O_{W,\varnothing}\big) - K\big(O_{W,R}\big) > 0. \] We then prove that the larger $\Delta$ is, the more world-regulator pairs with high mutual algorithmic information are favored. More precisely, a complexity gap $\Delta > 0$ yields \[ \Pr\big((W,R)\mid x\big) \le C\,2^{\,M(W{:}R)}\,2^{-\Delta}, \] making low $M(W{:}R)$ exponentially unlikely as $\Delta$ grows. This is an AIT version of the idea that ``the regulator contains a model of the world.'' The framework is distribution-free, applies to individual sequences, and complements the Internal Model Principle. Beyond this necessity claim, the same coding-theorem calculus singles out a \emph{canonical scalar objective} and implicates a \emph{planner}. On the realized episode, a regulator behaves \emph{as if} it minimized the conditional description length of the readout.
comment: 2 Figures
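The complexity gap $\Delta$ in the abstract is defined through $K(\cdot)$, which is uncomputable. A rough feel for the quantity can be had by substituting a compressor, which gives only an upper-bound proxy up to constants. The sketch below makes that substitution explicit, with zlib standing in for $K$. The toy world, the perfectly cancelling regulator, and all function names are assumptions for illustration and do not come from the paper.

```python
import hashlib
import zlib


def compressed_bits(data: bytes) -> int:
    """Computable stand-in for K(x): length in bits of the zlib encoding."""
    return 8 * len(zlib.compress(data, 9))


def disturbance(n_blocks: int) -> bytes:
    """Deterministic but hard-to-compress bytes (chained SHA-256 digests)."""
    out, block = [], b"seed"
    for _ in range(n_blocks):
        block = hashlib.sha256(block).digest()
        out.append(block)
    return b"".join(out)


d = disturbance(32)                   # 1024 bytes driving the toy world
unregulated = d                       # null regulator: readout equals the disturbance
regulated = bytes(b ^ b for b in d)   # toy regulator cancels it exactly: all zeros
delta_hat = compressed_bits(unregulated) - compressed_bits(regulated)
```

Here the regulator can flatten the readout only because it reproduces the disturbance byte for byte, which is the informal sense in which a large gap goes together with the regulator carrying information about the world.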
The value of storage in electricity distribution: The role of markets
Electricity distribution companies deploy battery storage to defer grid upgrades by reducing peak demand. In deregulated jurisdictions, such storage often sits idle because regulatory constraints bar participation in electricity markets. Here, we develop an optimization framework that, to our knowledge, provides the first formal model of market participation constraints within storage investment and operation planning. Applying the framework to a Massachusetts case study, we find that market participation could deliver similar savings as peak demand reduction. Under current conditions, market participation does not increase storage investment, but at very low storage costs, could incentivize deployment beyond local distribution needs. This might run contrary to the separation of distribution from generation in deregulated markets. Our framework can identify investment levels appropriate for local distribution needs.
High-Parallel FPGA-Based Discrete Simulated Bifurcation for Large-Scale Optimization
Combinatorial Optimization (CO) problems exhibit exponential complexity, making their resolution challenging. Simulated Adiabatic Bifurcation (aSB) is a quantum-inspired algorithm to obtain approximate solutions to large-scale CO problems written in the Ising form. It explores the solution space by emulating the adiabatic evolution of a network of Kerr-nonlinear parametric oscillators (KPOs), where each oscillator represents a variable in the problem. The optimal solution corresponds to the ground state of this system. A key advantage of this approach is the possibility of updating multiple variables simultaneously, making it particularly suited for hardware implementation. To enhance solution quality and convergence speed, variations of the algorithm have been proposed in the literature, including ballistic (bSB), discrete (dSB), and thermal (HbSB) versions. In this work, we have comprehensively analyzed dSB, bSB, and HbSB using dedicated software models, evaluating the feasibility of using a fixed-point representation for hardware implementation. We then present an open-source hardware architecture implementing the dSB algorithm for Field-Programmable Gate Arrays (FPGAs). The design allows users to adjust the degree of algorithmic parallelization based on their specific requirements. A proof-of-concept implementation that solves 256-variable problems was achieved on an AMD Kria KV260 SoM, a low-tier FPGA, validated using well-known max-cut and knapsack problems.
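For readers unfamiliar with dSB, the sketch below shows the commonly cited form of the update: positions and momenta are advanced with a symplectic Euler step, the coupling term uses the signs of the positions, and perfectly inelastic walls sit at |x| = 1. It is a floating-point software illustration only. The energy convention, the linear pump schedule, the `c0` scaling heuristic, and every parameter value are assumptions, and nothing here reflects the paper's fixed-point FPGA architecture.

```python
import math
import random
from typing import List


def sign(v: float) -> int:
    return 1 if v >= 0.0 else -1


def ising_energy(J: List[List[float]], s: List[int]) -> float:
    """E(s) = -1/2 * sum_ij J[i][j] s[i] s[j], for symmetric J with zero diagonal."""
    n = len(s)
    return -0.5 * sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def dsb(J: List[List[float]], steps: int = 1000, dt: float = 0.5,
        a0: float = 1.0, seed: int = 0) -> List[int]:
    n = len(J)
    rng = random.Random(seed)
    norm = math.sqrt(sum(v * v for row in J for v in row))
    c0 = 0.5 * math.sqrt(n - 1) / norm if norm > 0 else 1.0  # assumed scaling heuristic
    x = [rng.uniform(-0.1, 0.1) for _ in range(n)]  # oscillator positions
    y = [rng.uniform(-0.1, 0.1) for _ in range(n)]  # oscillator momenta
    for k in range(steps):
        a = a0 * k / steps            # pump amplitude ramps linearly from 0 toward a0
        s = [sign(v) for v in x]      # snapshot, so all variables update in parallel
        for i in range(n):
            force = -(a0 - a) * x[i] + c0 * sum(J[i][j] * s[j] for j in range(n))
            y[i] += dt * force
            x[i] += dt * a0 * y[i]
            if abs(x[i]) > 1.0:       # inelastic wall: clamp position, reset momentum
                x[i] = float(sign(x[i]))
                y[i] = 0.0
    return [sign(v) for v in x]


# 4-node antiferromagnetic ring: J = -1 on ring edges, 0 elsewhere.
ring = [[0.0, -1.0, 0.0, -1.0],
        [-1.0, 0.0, -1.0, 0.0],
        [0.0, -1.0, 0.0, -1.0],
        [-1.0, 0.0, -1.0, 0.0]]
spins = dsb(ring)
energy = ising_energy(ring, spins)
```

Because every variable reads the same sign snapshot, the inner loop over `i` has no data dependence between iterations, which is the property that makes the method attractive for parallel hardware.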
Hybrid Terrain-Aware Path Planning: Integrating VD-RRT* Exploration and VD-D* Lite Repair
Autonomous ground vehicles operating off-road must plan curvature-feasible paths while accounting for spatially varying soil strength and slope hazards in real time. We present a continuous state--cost metric that combines a Bekker pressure--sinkage model with elevation-derived slope and attitude penalties. The resulting terrain cost field is analytic, bounded, and monotonic in soil modulus and slope, ensuring well-posed discretization and stable updates under sensor noise. This metric is evaluated on a lattice with exact steering primitives: Dubins and Reeds--Shepp motions for differential drive and time-parameterized bicycle arcs for Ackermann steering. Global exploration is performed using Vehicle-Dynamics RRT\(^{*}\), while local repair is managed by Vehicle-Dynamics D\(^{*}\) Lite, enabling millisecond-scale replanning without heuristic smoothing. By separating the terrain--vehicle model from the planner, the framework provides a reusable basis for deterministic, sampling-based, or learning-driven planning in deformable terrain. Hardware trials on an off-road platform demonstrate real-time navigation across soft soil and slope transitions, supporting reliable autonomy in unstructured environments.
Product Digital Twin Supporting End-of-life Phase of Electric Vehicle Batteries Utilizing Product-Process-Resource Asset Network
In a circular economy, products in their end-of-life phase should be either remanufactured or recycled. Both of these processes are crucial for sustainability and environmental conservation. However, manufacturers frequently do not support these processes sufficiently, because they share relevant data about neither the products nor their (re-)manufacturing processes. This paper proposes to accompany each product with a digital twin technology, specifically the Product Digital Twin (PDT), which can carry information for facilitating and optimizing production and remanufacturing processes. This paper introduces a knowledge representation called Bi-Flow Product-Process-Resource Asset Network (Bi-PAN). Bi-PAN extends the well-proven Product-Process-Resource Asset Network (PAN) paradigm by integrating both assembly and disassembly workflows into a single information model. Such networks enable capturing relevant relationships across products, production resources, manufacturing processes, and specific production operations that have to be done in the manufacturing phase of a product. The proposed approach is demonstrated in a use case of disassembling electric vehicle (EV) batteries. By utilizing PDTs with Bi-PAN knowledge models, challenges associated with the disassembly of EV batteries can be solved flexibly and efficiently for various battery types, enhancing the sustainability of EV battery life-cycle management.
comment: \copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
A Personalized Data-Driven Generative Model of Human Repetitive Motion
The deployment of autonomous virtual avatars (in extended reality) and robots in human group activities -- such as rehabilitation therapy, sports, and manufacturing -- is expected to increase as these technologies become more pervasive. Designing cognitive architectures and control strategies to drive these agents requires realistic models of human motion. Furthermore, recent research has shown that each person exhibits a unique velocity signature, highlighting how individual motor behaviors are both rich in variability and internally consistent. However, existing models only provide simplified descriptions of human motor behavior, hindering the development of effective cognitive architectures. In this work, we first show that motion amplitude provides a valid and complementary characterization of individual motor signatures. Then, we propose a fully data-driven approach, based on long short-term memory neural networks, to generate original motion that captures the unique features of specific individuals. We validate the architecture using real human data from participants performing spontaneous oscillatory motion. Extensive analyses show that state-of-the-art Kuramoto-like models fail to replicate individual motor signatures, whereas our model accurately reproduces the velocity distribution and amplitude envelopes of the individual it was trained on, while remaining distinct from others.
comment: 12 pages, 6 figures
Addressing Model Inaccuracies in Transmission Network Reconfiguration via Diverse Alternatives
The ongoing energy transition places significant pressure on the transmission network due to increasing shares of renewables and electrification. To mitigate grid congestion, transmission system operators need decision support tools to suggest remedial actions, such as transmission network reconfigurations or redispatch. However, these tools are prone to model inaccuracies and may not provide relevant suggestions with regard to important unmodeled constraints or operator preferences. We propose a human-in-the-loop modeling-to-generate alternatives (HITL-MGA) approach to address these shortcomings by generating diverse topology reconfiguration alternatives. Case studies on the IEEE 57-bus and IEEE 118-bus systems show the method can leverage expert feedback and improve the quality of the suggested topology reconfigurations.
comment: This preprint is currently under peer review
Dual-Regularized Riccati Recursions for Interior-Point Optimal Control
We derive closed-form extensions of Riccati's recursions (both sequential and parallel) for solving dual-regularized LQR problems. We show how these methods can be used to solve general constrained, non-convex, discrete-time optimal control problems via a regularized interior point method, while guaranteeing that each step is a descent direction of an Augmented Barrier-Lagrangian merit function. We provide MIT-licensed implementations of our methods in C++ and JAX.
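As background for the abstract above, the following sketch shows the plain, unregularized backward Riccati recursion for a scalar discrete-time LQR problem, which is the classical recursion that the paper extends. The dual-regularized and parallel variants are not reproduced. The scalar restriction, the function name, and the example parameters are assumptions chosen to keep the sketch dependency-free.

```python
from typing import List, Tuple


def riccati_backward(a: float, b: float, q: float, r: float,
                     q_final: float, horizon: int) -> Tuple[List[float], float]:
    """Scalar LQR: x_{t+1} = a x_t + b u_t, stage cost q x^2 + r u^2, terminal cost q_final x^2.

    Returns the feedback gains K_0..K_{horizon-1} (with u_t = -K_t x_t) and the
    cost-to-go coefficient P_0.
    """
    p = q_final
    gains = [0.0] * horizon
    for t in reversed(range(horizon)):
        k = (b * p * a) / (r + b * p * b)   # minimizing gain for the current P_{t+1}
        p = q + a * p * (a - b * k)         # Riccati update for P_t
        gains[t] = k
    return gains, p


gains, p0 = riccati_backward(a=1.0, b=1.0, q=1.0, r=1.0, q_final=1.0, horizon=50)
```

With these unit parameters the stationary equation reduces to P^2 - P - 1 = 0, so over a long horizon P_0 approaches the golden ratio and the first gain approaches P/(1 + P).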
On the Fast Nonlinear Filtering with Matrix Fisher Distributions on SO(3)
This paper addresses two interrelated problems: the nonlinear filtering mechanism and fast attitude filtering with the matrix Fisher distribution (MFD) on the special orthogonal group. By analyzing the distribution evolution along Bayes' rule, we reveal, from a theoretical viewpoint, two essential properties that enhance the performance of Bayesian attitude filters with MFDs, particularly in challenging conditions. Building on this new understanding of the filtering mechanism associated with MFDs, we then propose two closed-form filters with MFDs. The filters avoid the burdensome computations of previous MFD-based filters by introducing linearized error systems with invariant errors while retaining the two advantageous properties. Numerical simulations demonstrate that the proposed filters are more accurate than the classic invariant Kalman filter. They are also as accurate as recent MFD-based Bayesian filters in challenging circumstances with large initial error and measurement uncertainty, while consuming far less computation time (about 1/5 to 1/100 of that of previous MFD-based attitude filters).
Design and benchmarking of a two degree of freedom tendon driver unit for cable-driven wearable technologies
Exosuits have recently been developed as alternatives to rigid exoskeletons and are increasingly adopted for both upper and lower limb therapy and assistance in clinical and home environments. Many cable-driven exosuits have been developed but little has been published on their electromechanical designs and performance. Therefore, this paper presents a comprehensive design and performance analysis of a two degree of freedom tendon driver unit (TDU) for cable-driven wearable exosuits. Detailed methodologies are presented to benchmark the functionality of the TDU. A static torque output test compares the commanded and measured torques. A velocity control test evaluates the attenuation and phase shift across velocities. A noise test evaluates how loud the TDU is for the wearer under different speeds. A thermal stress test captures the cooling performance of the TDU to ensure safe operation at higher loads. Finally, a battery endurance test evaluates the runtime of the TDU under various loading conditions to inform the usable time. To demonstrate these tests, a modular TDU system for cable-driven applications is introduced, which allows components such as motors, pulleys, and sensors to be adapted based on the requirements of the intended application. By sharing detailed methodologies and performance results, this study aims to provide a TDU design that may be leveraged by others and resources for researchers and engineers to better document the capabilities of their TDU designs.
Geometric Backstepping Control of Omnidirectional Tiltrotors Incorporating Servo-Rotor Dynamics for Robustness against Sudden Disturbances
This work presents a geometric backstepping controller for a variable-tilt omnidirectional multirotor that explicitly accounts for both servo and rotor dynamics. Considering actuator dynamics is essential for more effective and reliable operation, particularly during aggressive flight maneuvers or recovery from sudden disturbances. While prior studies have investigated actuator-aware control for conventional and fixed-tilt multirotors, these approaches rely on linear relationships between actuator input and wrench, which cannot capture the nonlinearities induced by variable tilt angles. In this work, we exploit the cascade structure between the rigid-body dynamics of the multirotor and its nonlinear actuator dynamics to design the proposed backstepping controller and establish exponential stability of the overall system. Furthermore, we reveal parametric uncertainty in the actuator model through experiments, and we demonstrate that the proposed controller remains robust against such uncertainty. The controller was compared against a baseline that does not account for actuator dynamics across three experimental scenarios: fast translational tracking, rapid rotational tracking, and recovery from sudden disturbance. The proposed method consistently achieved better tracking performance, and notably, while the baseline diverged and crashed during the fastest translational trajectory tracking and the recovery experiment, the proposed controller maintained stability and successfully completed the tasks, thereby demonstrating its effectiveness.
A Verification Methodology for Safety Assurance of Robotic Autonomous Systems
Autonomous robots deployed in shared human environments, such as agricultural settings, require rigorous safety assurance to meet both functional reliability and regulatory compliance. These systems must operate in dynamic, unstructured environments, interact safely with humans, and respond effectively to a wide range of potential hazards. This paper presents a verification workflow for the safety assurance of an autonomous agricultural robot, covering the entire development life-cycle, from concept study and design to runtime verification. The outlined methodology begins with a systematic hazard analysis and risk assessment to identify potential risks and derive corresponding safety requirements. A formal model of the safety controller is then developed to capture its behaviour and verify that the controller satisfies the specified safety properties with respect to these requirements. The proposed approach is demonstrated on a field robot operating in an agricultural setting. The results show that the methodology can be effectively used to verify safety-critical properties and facilitate the early identification of design issues, contributing to the development of safer robots and autonomous systems.
comment: In Proc. of the 26th TAROS (Towards Autonomous Robotic Systems) Conference, York, UK, August, 2025
Multi Timescale Stochastic Approximation: Stability and Convergence
This paper presents the first sufficient conditions that guarantee the stability and almost sure convergence of multi-timescale stochastic approximation (SA) iterates. It extends the existing results on one-timescale and two-timescale SA iterates to general $N$-timescale stochastic recursions, for any $N \geq 1$, using the ordinary differential equation (ODE) method. As an application, we study SA algorithms augmented with heavy-ball momentum in the context of Gradient Temporal Difference (GTD) learning. The added momentum introduces an auxiliary state evolving on an intermediate timescale, yielding a three-timescale recursion. We show that with appropriate momentum parameters, the scheme fits within our framework and converges almost surely to the same fixed point as baseline GTD. The stability and convergence of all iterates, including the momentum state, follow from our main results without ad hoc bounds. We then study off-policy actor-critic algorithms with a baseline learner, actor, and critic updated on separate timescales. In contrast to prior work, we eliminate projection steps from the actor update and instead use our framework to guarantee stability and almost sure convergence of all components. Finally, we extend the analysis to constrained policy optimization in the average reward setting, where the actor, critic, and dual variables evolve on three distinct timescales, and we verify that the resulting dynamics satisfy the conditions of our general theorem. These examples show how diverse reinforcement learning algorithms, covering momentum acceleration, off-policy learning, and primal-dual methods, fit naturally into the proposed multi-timescale framework.
comment: arXiv admin note: text overlap with arXiv:2111.11004, Added an application to the 4-Timescale case
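As a rough illustration of what separate timescales mean in an SA recursion, the following is a minimal two-timescale sketch in Python. It is a toy example and not the paper's $N$-timescale setting or its GTD application: the coupled linear updates, the step-size exponents, the noise level, and the function name `two_timescale_sa` are all assumptions chosen for illustration.

```python
import random

def two_timescale_sa(num_iters=20000, noise_std=0.1, seed=0):
    """Toy two-timescale stochastic approximation.

    The fast iterate y tracks the slow iterate x, while the slow iterate
    moves until the tracked value equals 1. The fixed point is x = y = 1.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    for n in range(1, num_iters + 1):
        a_n = 1.0 / n          # slow step size
        b_n = 1.0 / n ** 0.6   # fast step size; a_n / b_n -> 0 as n grows
        # Both updates use the previous (x, y) and independent Gaussian noise.
        x_new = x + a_n * (1.0 - y + rng.gauss(0.0, noise_std))
        y_new = y + b_n * (x - y + rng.gauss(0.0, noise_std))
        x, y = x_new, y_new
    return x, y

x_final, y_final = two_timescale_sa()
```

Both step-size sequences are non-summable but square-summable, and their ratio tends to zero, which is the usual way a timescale separation is imposed in two-timescale schemes.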
Learning Power Flow with Confidence: A Probabilistic Guarantee Framework for Voltage Risk
The absence of formal performance guarantees in machine learning (ML) has limited its adoption for safety-critical power system applications, where confidence and interpretability are as vital as accuracy. In this work, we present a probabilistic guarantee for power flow learning and voltage risk estimation, derived through the framework of Gaussian Process (GP) regression. Specifically, we establish a bound on the expected estimation error that connects the GP's predictive variance to confidence in voltage risk estimates, ensuring statistical equivalence with Monte Carlo-based ACPF risk quantification. To enhance model learnability in the low-data regime, we first design the Vertex-Degree Kernel (VDK), a topology-aware additive kernel that decomposes voltage-load interactions into local neighborhoods for efficient large-scale learning. Building on this, we introduce a network-swipe active learning (AL) algorithm that adaptively samples informative operating points and provides a principled stopping criterion without requiring out-of-sample validation. Together, these developments mitigate the principal bottleneck of ML-based power flow, namely its lack of guaranteed reliability, by combining data efficiency with analytical assurance. Empirical evaluations across IEEE 118-, 500-, and 1354-bus systems confirm that the proposed VDK-GP achieves mean absolute voltage errors below 1E-03 p.u., reproduces Monte Carlo-level voltage risk estimates with 15x fewer ACPF computations, and achieves over 120x reduction in evaluation time while conservatively bounding violation probabilities.
comment: 10 pages
Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs NeurIPS 2020
We study the sequential decision making problem of maximizing the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve the discounted infinite-horizon optimal control problem for Constrained Markov Decision Processes (constrained MDPs). Specifically, we propose a new Natural Policy Gradient Primal-Dual (NPG-PD) method that updates the primal variable via natural policy gradient ascent and the dual variable via projected subgradient descent. Although the underlying maximization involves a nonconcave objective function and a nonconvex constraint set, under the softmax policy parametrization, we prove that our method achieves global convergence with sublinear rates regarding both the optimality gap and the constraint violation. Such convergence is independent of the size of the state-action space, i.e., it is dimension-free. Furthermore, for log-linear and general smooth policy parametrizations, we establish sublinear convergence rates up to a function approximation error caused by restricted policy parametrization. We also provide convergence and finite-sample complexity guarantees for two sample-based NPG-PD algorithms. We use a set of computational experiments to showcase the effectiveness of our approach.
comment: 76 pages, 4 figures, 2 tables; Journal version of the NeurIPS 2020 paper; Accepted to JMLR
Tiny Learning-Based MPC for Multirotors: Solver-Aware Learning for Efficient Embedded Predictive Control
Tiny aerial robots hold great promise for applications such as environmental monitoring and search-and-rescue, yet face significant control challenges due to limited onboard computing power and nonlinear dynamics. Model Predictive Control (MPC) enables agile trajectory tracking and constraint handling but depends on an accurate dynamics model. While existing Learning-Based (LB) MPC methods, such as Gaussian Process (GP) MPC, enhance performance by learning residual dynamics, their high computational cost restricts onboard deployment on tiny robots. This paper introduces Tiny LB MPC, a co-designed MPC framework and optimization solver for resource-constrained micro multirotor platforms. The proposed approach achieves 100 Hz control on a Crazyflie 2.1 equipped with a Teensy 4.0 microcontroller, demonstrating a 43% average improvement in tracking performance over existing embedded MPC methods under model uncertainty, and achieving the first onboard implementation of LB MPC on a 53 g multirotor.
A Faster and More Reliable Middleware for Autonomous Driving Systems
Ensuring safety in high-speed autonomous vehicles requires rapid control loops and tightly bounded delays from perception to actuation. Many open-source autonomy systems rely on ROS 2 middleware; when multiple sensor and control nodes share one compute unit, ROS 2 and its DDS transports add significant (de)serialization, copying, and discovery overheads, shrinking the available time budget. We present Sensor-in-Memory (SIM), a shared-memory transport designed for intra-host pipelines in autonomous vehicles. SIM keeps sensor data in native memory layouts (e.g., cv::Mat, PCL), uses lock-free bounded double buffers that overwrite old data to prioritize freshness, and integrates into ROS 2 nodes with four lines of code. Unlike traditional middleware, SIM operates beside ROS 2 and is optimized for applications where data freshness and minimal latency outweigh guaranteed completeness. SIM provides sequence numbers, a writer heartbeat, and optional checksums to ensure ordering, liveness, and basic integrity. On an NVIDIA Jetson Orin Nano, SIM reduces data-transport latency by up to 98% compared to ROS 2 zero-copy transports such as FastRTPS and Zenoh, lowers mean latency by about 95%, and narrows 95th/99th-percentile tail latencies by around 96%. In tests on a production-ready Level 4 vehicle running Autoware.Universe, SIM increased localization frequency from 7.5 Hz to 9.5 Hz. Applied across all latency-critical modules, SIM cut average perception-to-decision latency from 521.91 ms to 290.26 ms, reducing emergency braking distance at 40 mph (64 km/h) on dry concrete by 13.6 ft (4.14 m).
comment: 8 pages, 7 figures, 8 tables
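The reported braking-distance reduction appears consistent with a simple back-of-the-envelope check: the distance a vehicle covers at constant speed during the latency that was removed. The Python sketch below works through that arithmetic. The constant-speed assumption and the function name `distance_saved_m` are illustrative and are not taken from the paper.

```python
MPH_TO_MPS = 0.44704   # exact definition: 1 mph = 0.44704 m/s
M_TO_FT = 1 / 0.3048   # exact definition: 1 ft = 0.3048 m

def distance_saved_m(speed_mph, latency_before_ms, latency_after_ms):
    """Distance travelled at constant speed during the removed latency."""
    speed_mps = speed_mph * MPH_TO_MPS
    latency_saved_s = (latency_before_ms - latency_after_ms) / 1000.0
    return speed_mps * latency_saved_s

# Values quoted in the abstract: 40 mph, 521.91 ms before, 290.26 ms after.
saved_m = distance_saved_m(40.0, 521.91, 290.26)
saved_ft = saved_m * M_TO_FT
print(f"{saved_m:.2f} m, {saved_ft:.1f} ft")  # prints "4.14 m, 13.6 ft"
```

Under this assumption, the 231.65 ms latency saving at 17.88 m/s yields the 4.14 m (13.6 ft) figure stated in the abstract.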
An AI-Driven Multimodal Smart Home Platform for Continuous Monitoring and Assistance in Post-Stroke Motor Impairment
At-home rehabilitation for post-stroke patients presents significant challenges, as continuous, personalized care is often limited outside clinical settings. Moreover, the lack of integrated solutions capable of simultaneously monitoring motor recovery and providing intelligent assistance in home environments hampers rehabilitation outcomes. Here, we present a multimodal smart home platform designed for continuous, at-home rehabilitation of post-stroke patients, integrating wearable sensing, ambient monitoring, and adaptive automation. A plantar pressure insole equipped with a machine learning pipeline classifies users into motor recovery stages with up to 94% accuracy, enabling quantitative tracking of walking patterns during daily activities. An optional head-mounted eye-tracking module, together with ambient sensors such as cameras and microphones, supports seamless hands-free control of household devices with a 100% success rate and sub-second response time. These data streams are fused locally via a hierarchical Internet of Things (IoT) architecture, ensuring low latency and data privacy. An embedded large language model (LLM) agent, Auto-Care, continuously interprets multimodal data to provide real-time interventions -- issuing personalized reminders, adjusting environmental conditions, and notifying caregivers. Implemented in a post-stroke context, this integrated smart home platform increased mean user satisfaction from 3.9 $\pm$ 0.8 in conventional home environments to 8.4 $\pm$ 0.6 with the full system ($n=20$). Beyond stroke, the system offers a scalable, patient-centered framework with potential for long-term use in broader neurorehabilitation and aging-in-place applications.
comment: 5 figures, 41 references
Electromagnetically Reconfigurable Fluid Antenna System for Wireless Communications: Design, Modeling, Algorithm, Fabrication, and Experiment
This paper presents the concept, design, channel modeling, beamforming algorithm development, prototype fabrication, and experimental measurement of an electromagnetically reconfigurable fluid antenna system (ER-FAS), in which each FAS array element features electromagnetic (EM) reconfigurability. Unlike most existing FAS works that investigate spatial reconfigurability by adjusting the position and/or orientation of array elements, the proposed ER-FAS enables direct control over the EM characteristics of each element, allowing for dynamic radiation pattern reconfigurability. Specifically, a novel ER-FAS architecture leveraging software-controlled fluidics is proposed, and corresponding wireless channel models are established. Based on this ER-FAS channel model, a low-complexity greedy beamforming algorithm is developed to jointly optimize the analog phase shift and the radiation state of each array element. The accuracy of the ER-FAS channel model and the effectiveness of the beamforming algorithm are validated through (i) full-wave EM simulations and (ii) numerical spectral efficiency evaluations. These results confirm that the proposed ER-FAS significantly enhances spectral efficiency in both near-field and far-field scenarios compared to conventional antenna arrays. To further validate this design, we fabricate prototypes for both the ER-FAS element and array, using Galinstan liquid metal alloy, fluid silver paste, and software-controlled fluidic channels. The simulation results are experimentally validated through prototype measurements conducted in an anechoic chamber. Additionally, several indoor communication experiments using a pair of software-defined radios demonstrate the superior received power and bit error rate performance of the ER-FAS prototype.
Systems and Control (EESS)
A 0.62 μW/sensor 82 fps Time-to-Digital Impedance Measurement IC with Unified Excitation/Readout Front-end for Large-Scale Piezo-Resistive Sensor Array
This paper presents a fast impedance measurement IC for large-scale piezo-resistive sensor arrays. It features a unified differential time-to-digital demodulation architecture that reads out impedance directly through the excitation circuit. The proposed pre-saturation adaptive bias technique further improves power efficiency. The chip scans 253 sensors in 12.2 ms (82 fps) at 125 kHz, consuming 158 μW (7.5 nJ/sensor). With loads from 20 Ω to 500 kΩ, it achieves 0.5% error and up to 71.1 dB SNR.
Cryo-CMOS Antenna for Wireless Communications within a Quantum Computer Cryostat
Scaling quantum computers from a few qubits to large numbers remains one of the critical challenges in realizing practical quantum advantage. Multi-core quantum architectures have emerged as a promising solution, enabling scalability through distributed quantum processing units (QPUs) interconnected via classical and quantum links. However, the bottleneck of wired connections persists, as densely packed wired interconnects, both vertically across temperature stages and horizontally within the same layer, introduce spatial constraints, power dissipation, and latency, which could hinder performance as the number of QPUs increases. To overcome these limitations, this work proposes a cryo-compatible on-chip differential dipole antenna operating at 28 GHz to enable short-range wireless communication within a quantum computer cryostat. Temperature-dependent material properties are incorporated to accurately capture antenna behavior at 4 K. Moreover, by embedding the antenna in a realistic cryostat structure, we evaluate the feasibility of antenna operation within the cryogenic environment. The proposed antenna achieves a reflection coefficient of -20.8 dB in free space and -18.38 dB within the cryostat, demonstrating efficient impedance matching.
Efficient Force and Stiffness Prediction in Robotic Produce Handling with a Piezoresistive Pressure Sensor
Properly handling delicate produce with robotic manipulators is a major part of the future role of automation in agricultural harvesting and processing. Grasping with the correct amount of force is crucial not only for ensuring a proper grip on the object, but also for avoiding damage or bruising to the product. In this work, a flexible pressure sensor that is both low cost and easy to fabricate is integrated with robotic grippers for working with produce of varying shapes, sizes, and stiffnesses. The sensor is successfully integrated with both a rigid robotic gripper and a pneumatically actuated soft finger. Furthermore, an algorithm is proposed for accelerated estimation of the steady-state value of the sensor output based on the transient response data, to enable real-time applications. The sensor is shown to be effective in incorporating feedback to correctly grasp objects of unknown sizes and stiffnesses. At the same time, the sensor provides estimates for these values, which can be utilized for identification of qualities such as ripeness levels and bruising. It is also shown to be able to provide force feedback for objects of variable stiffnesses. This enables future use not only for produce identification, but also for tasks such as quality control and selective distribution based on ripeness levels.
comment: For supplementary videos, see https://drive.google.com/drive/folders/1jol-_z6gaUfjpL1Qi7EG420usTbVSodv?usp=sharing
Channel Estimation under Large Doppler Shifts in NOMA-Based Air-Ground Communications
This paper investigates a multiple antenna system with non-orthogonal multiple access (NOMA) for the exchange of air traffic management data between commercial aircraft pilots and ground-based air traffic controllers. While NOMA techniques enhance spectral efficiency, their application to aircraft communications is challenged by the high speed of the aircraft (up to 214 m/s) and the long communication ranges (up to 250 km), resulting in significant Doppler shifts and low signal-to-noise ratios, respectively. To accurately assess these challenges, we employ a realistic geometry-based stochastic air-ground channel model, derived from dedicated flight measurement campaigns. In this paper, multiple aircraft simultaneously transmit data to the ground station. We focus on the channel estimation problem at the ground station under high carrier frequency offsets and on the effects of channel aging due to the channel's time-varying nature. For the channel estimation problem, we compare Zadoff-Chu sequences with a time-division approach under varying carrier frequency offset pre-compensation accuracies at the aircraft transmitter. For the channel aging problem and performance evaluation of channel estimators, we compute the outage probability for both the zero-forcing detector and the minimum mean squared error detector with successive interference cancellation. The results show that the favorable channel estimator-detector combinations differ between the takeoff & landing phase and the enroute cruise phase of the flight, due to the distinct channel propagation characteristics of each phase.
comment: Submitted to IEEE Conference, 6 pages, 2 Figures
Data-driven learning of feedback maps for explicit robust predictive control: an approximation theoretic view
We establish an algorithm to learn feedback maps from data for a class of robust model predictive control (MPC) problems. The algorithm accounts for the approximation errors due to the learning directly at the synthesis stage, ensuring recursive feasibility by construction. The optimal control problem consists of a linear noisy dynamical system, a quadratic stage and quadratic terminal costs as the objective, and convex constraints on the state, control, and disturbance sequences; the control minimizes and the disturbance maximizes the objective. We proceed via two steps -- (a) Data generation: First, we reformulate the given minmax problem into a convex semi-infinite program and employ recently developed tools to solve it in an exact fashion on grid points of the state space to generate (state, action) data. (b) Learning approximate feedback maps: We employ a couple of approximation schemes that furnish tight approximations within preassigned uniform error bounds on the admissible state space to learn the unknown feedback policy. The stability of the closed-loop system under the approximate feedback policies is also guaranteed under a standard set of hypotheses. Two benchmark numerical examples are provided to illustrate the results.
comment: 27 pages; submitted
Quantifying the Impact of Missing Risk Markets for Decarbonized Power Systems with Long Duration Energy Storage
The transition to a fully decarbonised electricity system depends on integrating new technologies that ensure reliability alongside sustainability. However, missing risk markets hinder investment in reliability-enhancing technologies by exposing investors to revenue uncertainty. This study provides the first quantitative assessment of how missing risk markets affect investment decisions in power systems that depend on long-duration energy storage (LDES) for reliability. We develop a two-stage stochastic equilibrium model with risk-averse market participants, which independently sizes power and energy capacity. We apply the method to a case study of a deeply decarbonised power system in Great Britain. The results show that incomplete risk markets reduce social welfare, harm reliability, and discourage investment in LDES and other technologies with volatile revenue streams. Revenue volatility leads to substantial risk premiums and higher financing costs for LDES, creating a barrier to its large-scale deployment. These findings demonstrate the importance of policy mechanisms that hedge revenue risk to lower the cost of capital and accelerate investment in reliability-enhancing, zero-carbon technologies.
Physics-Informed Neural Network Modeling of Vehicle Collision Dynamics in Precision Immobilization Technique Maneuvers
Accurate prediction of vehicle collision dynamics is crucial for advanced safety systems and post-impact control applications, yet existing methods face inherent trade-offs among computational efficiency, prediction accuracy, and data requirements. This paper proposes a dual Physics-Informed Neural Network (PINN) framework addressing these challenges through two complementary networks. The first network integrates Gaussian Mixture Models with a PINN architecture to learn impact force distributions from finite element analysis data while enforcing momentum conservation and energy consistency constraints. The second network employs an adaptive PINN with dynamic constraint weighting to predict post-collision vehicle dynamics, featuring an adaptive physics guard layer that prevents unrealistic predictions while preserving data-driven learning capabilities. The framework incorporates uncertainty quantification through time-varying parameters and enables rapid adaptation via fine-tuning strategies. Validation demonstrates significant improvements: the impact force model achieves relative errors below 15.0% for force prediction on finite element analysis (FEA) datasets, while the vehicle dynamics model reduces average trajectory prediction error by 63.6% compared to traditional four-degree-of-freedom models in scaled vehicle experiments. The integrated system maintains millisecond-level computational efficiency suitable for real-time applications while providing probabilistic confidence bounds essential for safety-critical control. Comprehensive validation through FEA simulation, dynamic modeling, and scaled vehicle experiments confirms the framework's effectiveness for Precision Immobilization Technique scenarios and general collision dynamics prediction.
On the Flexibility Potential of a Swiss Distribution Grid: Opportunities and Limitations
The growing integration of distributed renewable generation and the electrification of heating and transportation are rapidly increasing the number of flexible devices within modern distribution grids. Leveraging the aggregated flexibility of these small-scale distributed resources is essential to maintaining future grid-wide stability. This work uses the Swiss distribution grid of Walenstadt as a case study to provide insights into the aggregated flexibility potential of distribution grids. It demonstrates that incorporating devices such as heat pumps and photovoltaic systems significantly enhances distribution grid flexibility. It investigates the time-varying nature of aggregated flexibility and highlights how it can vary seasonally. Furthermore, simulations of future scenarios reveal that aggregated flexibility does not increase linearly or monotonically with higher levels of flexible device penetration. This is primarily due to the overloading of individual feeders, which underscores the impact of grid topology and network constraints on the aggregated flexibility potential.
Multipolar dynamics of social segregation: Data validation on Swedish vaccination statistics
We perform a validation analysis on the multipolar model of opinion dynamics. A general methodology for using the model on datasets of two correlated variables is proposed and tested using data on the relationship between COVID-19 vaccination rates and political participation in Sweden. The model is shown to successfully capture the opinion segregation demonstrated by the data, and spatial correlation of biases is shown to be necessary for this result. Mixing the biases, on the other hand, leads to a more homogeneous opinion distribution and greater penetration of the majority opinion, which here corresponds to a decision to vote or vaccinate.
comment: Presented at CoDIT 2025
Performance Comparison of Gate-Based and Adiabatic Quantum Computing for Power Flow Analysis SC
In this paper, we present the first direct comparison between gate-based quantum computing (GQC) and adiabatic quantum computing (AQC) for solving the AC power flow (PF) equations. Building on the Adiabatic Quantum Power Flow (AQPF) algorithm originally designed for annealing platforms, we adapt it to the Quantum Approximate Optimization Algorithm (QAOA). The PF equations are reformulated as a combinatorial optimization problem. Numerical experiments on a 4-bus test system assess solution accuracy and computational time. Results from QAOA are benchmarked against those obtained using D-Wave's Advantage system and Fujitsu's latest generation Digital Annealer, i.e., Quantum-Inspired Integrated Optimization software (QIIO). The findings provide quantitative insights into the performance trade-offs, scalability, and practical viability of GQC versus AQC paradigms for PF analysis, highlighting the potential of quantum algorithms to address the computational challenges associated with modern electricity networks in the Noisy Intermediate-Scale Quantum (NISQ) era.
comment: 7 pages, 1 figure, 4 tables, submitted to PSCC 2026
Partitioned Scheduling for DAG Tasks Considering Probabilistic Execution Time
Autonomous driving systems, critical for safety, require real-time guarantees and can be modeled as DAGs. Their acceleration features, such as caches and pipelining, often result in execution times below the worst case. Thus, a probabilistic approach ensuring constraint satisfaction within a probability threshold is more suitable than worst-case guarantees for these systems. This paper considers probabilistic guarantees for DAG tasks by utilizing the results of probabilistic guarantees for single processors, which are relatively more advanced than those for multi-core processors. This paper proposes a task set partitioning method that guarantees schedulability under partitioned scheduling. The evaluation on randomly generated DAG task sets demonstrates that the proposed method schedules more task sets with a smaller mean analysis time compared to existing probabilistic schedulability analysis for DAGs. The evaluation also compares four bin-packing heuristics, revealing that Item-Centric Worst-Fit-Decreasing schedules the most task sets.
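For readers unfamiliar with the bin-packing heuristics mentioned above, a minimal Python sketch of classic worst-fit-decreasing partitioning follows. It is not the paper's method: a fixed per-core utilization `capacity` stands in for the probabilistic schedulability test, tasks are plain utilization values rather than DAG nodes, and the function name `worst_fit_decreasing` is an illustrative assumption.

```python
def worst_fit_decreasing(utilizations, num_cores, capacity=1.0):
    """Assign each task to the core with the most remaining capacity.

    Returns a list of per-core task-index lists, or None if a task does not
    fit on the least-loaded core (and therefore fits on no core).
    """
    loads = [0.0] * num_cores
    assignment = [[] for _ in range(num_cores)]
    # Visit tasks in order of decreasing utilization.
    order = sorted(range(len(utilizations)),
                   key=lambda i: utilizations[i], reverse=True)
    for i in order:
        # Worst fit: pick the least-loaded core (ties go to the lowest index).
        core = min(range(num_cores), key=lambda c: loads[c])
        if loads[core] + utilizations[i] > capacity + 1e-12:
            return None
        loads[core] += utilizations[i]
        assignment[core].append(i)
    return assignment

partition = worst_fit_decreasing([0.6, 0.5, 0.4, 0.3], num_cores=2)
# partition == [[0, 3], [1, 2]]
```

Worst fit tends to balance load across cores, which leaves slack on every core. Whether that is the reason it performs well in the paper's evaluation is not stated in the abstract.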
Safe Driving in Occluded Environments
Ensuring safe autonomous driving in the presence of occlusions poses a significant challenge in policy design. While existing model-driven control techniques based on set invariance can handle visible risks, occlusions create latent risks in which safety-critical states are not observable. Data-driven techniques also struggle to handle latent risks because direct mappings from risk-critical objects in sensor inputs to safe actions cannot be learned without visible risk-critical objects. Motivated by these challenges, in this paper, we propose a probabilistic safety certificate for latent risk. Our key technical enabler is the application of probabilistic invariance: it relaxes the strict observability requirements imposed by set-invariance methods that demand knowledge of risk-critical states. The proposed techniques provide linear action constraints that confine the latent risk probability within tolerance. Such constraints can be integrated into model predictive controllers or embedded in data-driven policies to mitigate latent risks. The proposed method is tested using the CARLA simulator and compared with a few existing techniques. The theoretical and empirical analyses jointly demonstrate that the proposed methods assure long-term safety in real-time control in occluded environments without being overly conservative and with transparency to exposed risks.
Decision-dependent Robust Charging Infrastructure Planning for Light-duty Truck Electrification at Industrial Sites: Scheduling and Abandonment
Many industrial sites rely on diesel-powered light-duty trucks to transport workers and small-scale facilities, which has resulted in a significant amount of greenhouse gas (GHG) emissions. To address this, we developed a two-stage robust charging infrastructure planning model for electrifying light-duty trucks at industrial sites. The model is formulated as a mixed-integer linear program (MILP) that optimizes the charging infrastructure, selected from multiple charger types and potential locations, and determines opportunity charging schedules for each truck based on the chosen infrastructure. Given the strict stopping points and schedules at industrial sites, we introduced a scheduling problem with abandonment, where trucks forgo charging if their waiting times exceed a maximum threshold. We further incorporated the impacts of overnight charging and range anxiety on waiting and abandonment behaviors. To represent the stochastic and heterogeneous parking durations of trucks, we constructed a decision-dependent robust uncertainty set in which parking time variability flexibly depends on charging choices. We applied the model in a case study of an open-pit mining site, which plans charger installations in eight zones and schedules a fleet of around 200 trucks. By decomposing the problem into monthly subproblems and using heuristic approaches, the model achieves an optimality gap of less than 0.1% on the whole-year dataset within a reasonable computation time under diverse uncertainty scenarios.
Time-Varying Optimization for Streaming Data Via Temporal Weighting
Classical optimization theory deals with fixed, time-invariant objective functions. However, time-varying optimization has emerged as an important subject for decision-making in dynamic environments. In this work, we study the problem of learning from streaming data through a time-varying optimization lens. Unlike prior works that focus on generic formulations, we introduce a structured, \emph{weight-based} formulation that explicitly captures the streaming-data origin of the time-varying objective, where at each time step, an agent aims to minimize a weighted average loss over all the past data samples. We focus on two specific weighting strategies: (1) uniform weights, which treat all samples equally, and (2) discounted weights, which geometrically decay the influence of older data. For both schemes, we derive tight bounds on the ``tracking error'' (TE), defined as the deviation between the model parameter and the time-varying optimum at a given time step, under gradient descent (GD) updates. We show that under uniform weighting, the TE vanishes asymptotically with a $\mathcal{O}(1/t)$ decay rate, whereas discounted weighting incurs a nonzero error floor controlled by the discount factor and the number of gradient updates performed at each time step. Our theoretical findings are validated through numerical simulations.
comment: Accepted at IEEE Asilomar, 2025
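The contrast between uniform and discounted weighting described in the abstract above can be seen in a toy scalar problem. The sketch below is an illustration under stated assumptions, not the paper's experiment: it assumes a quadratic per-sample loss 0.5 * (theta - x_i)^2, so the weighted optimum is simply the weighted mean of the samples, and the names `simulate`, `gamma`, `step`, and `k_steps` are hypothetical.

```python
import random


def simulate(weighting, T=2000, gamma=0.9, step=0.5, k_steps=1, seed=0):
    """Run GD on a streaming weighted least-squares objective.

    Assumed loss per sample: 0.5 * (theta - x_i)**2, so the time-t optimum
    is the (uniform or geometrically discounted) weighted mean of x_1..x_t.
    Returns the tracking error |theta_t - theta*_t| at every time step.
    """
    rng = random.Random(seed)
    theta = 0.0
    weight_sum = 0.0      # running sum of weights
    weighted_x_sum = 0.0  # running weighted sum of samples
    errors = []
    for _ in range(T):
        x = rng.gauss(1.0, 1.0)  # new streaming sample
        decay = 1.0 if weighting == "uniform" else gamma
        weight_sum = decay * weight_sum + 1.0
        weighted_x_sum = decay * weighted_x_sum + x
        theta_star = weighted_x_sum / weight_sum  # current optimum
        for _ in range(k_steps):
            # Gradient of the normalized weighted loss is (theta - theta_star).
            theta -= step * (theta - theta_star)
        errors.append(abs(theta - theta_star))
    return errors


def tail_mean(values, n=100):
    """Average of the last n entries."""
    return sum(values[-n:]) / n


if __name__ == "__main__":
    print("uniform   :", tail_mean(simulate("uniform")))
    print("discounted:", tail_mean(simulate("discounted")))
    print("discounted, 5 GD steps:", tail_mean(simulate("discounted", k_steps=5)))
```

In this toy setting the uniform optimum moves by roughly 1/t per step, so the tracking error shrinks over time, while the discounted optimum keeps moving and leaves an error floor that more gradient steps per time step reduce.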
Learning Wireless Interference Patterns: Decoupled GNN for Throughput Prediction in Heterogeneous Multi-Hop p-CSMA Networks
The p-persistent CSMA protocol is central to random-access MAC analysis, but predicting saturation throughput in heterogeneous multi-hop wireless networks remains a hard problem. Simplified models that assume a single, shared interference domain can underestimate throughput by 48--62\% in sparse topologies. Exact Markov-chain analyses are accurate but scale exponentially in computation time, making them impractical for large networks. These computational barriers motivate structural machine learning approaches like GNNs for scalable throughput prediction in general network topologies. Yet off-the-shelf GNNs struggle here: a standard GCN yields 63.94\% normalized mean absolute error (NMAE) on heterogeneous networks because symmetric normalization conflates a node's direct interference with higher-order, cascading effects that pertain to how interference propagates over the network graph. Building on these insights, we propose the Decoupled Graph Convolutional Network (D-GCN), a novel architecture that explicitly separates processing of a node's own transmission probability from neighbor interference effects. D-GCN replaces mean aggregation with learnable attention, yielding interpretable, per-neighbor contribution weights while capturing complex multihop interference patterns. D-GCN attains 3.3\% NMAE, outperforms strong baselines, remains tractable even when exact analytical methods become computationally infeasible, and enables gradient-based network optimization that achieves within 1\% of theoretical optima.
Laser Fault Injection in Memristor-Based Accelerators for AI/ML and Neuromorphic Computing
Memristive crossbar arrays (MCAs) are emerging as efficient building blocks for in-memory computing and neuromorphic hardware due to their high density and parallel analog matrix-vector multiplication capabilities. However, the physical properties of their nonvolatile memory elements introduce new attack surfaces, particularly under fault injection scenarios. This work explores Laser Fault Injection as a means of inducing analog perturbations in MCA-based architectures. We present a detailed threat model in which adversaries target memristive cells to subtly alter their physical properties or outputs using laser beams. Through HSPICE simulations of a large MCA on a 45 nm CMOS technology node, we show how laser-induced photocurrent manifests in output current distributions, enabling differential fault analysis to infer internal weights with up to 99.7% accuracy, replicate the model, and compromise computational integrity through targeted weight alterations by approximately 143%.
comment: 3 pages, 4 figures
Resource-Aware Stealthy Attacks in Vehicle Platoons
Connected and Autonomous Vehicles (CAVs) are transforming modern transportation by enabling cooperative applications such as vehicle platooning, where multiple vehicles travel in close formation to improve efficiency and safety. However, the heavy reliance on inter-vehicle communication makes platoons highly susceptible to attacks, where even subtle manipulations can escalate into severe physical consequences. While existing research has largely focused on defending against attacks, far less attention has been given to stealthy adversaries that aim to covertly manipulate platoon behavior. This paper introduces a new perspective on the attack design problem by demonstrating how attackers can guide platoons toward their own desired trajectories while remaining undetected. We outline conditions under which such attacks are feasible, analyze their dependence on communication topologies and control protocols, and investigate the resources required by the attacker. By characterizing the resources needed to launch stealthy attacks, we address system vulnerabilities and inform the design of resilient countermeasures. Our findings reveal critical weaknesses in current platoon architectures and anomaly detection mechanisms and provide methods to develop more secure and trustworthy CAV systems.
comment: 13 pages, 8 figures
Belief Space Control of Safety-Critical Systems Under State-Dependent Measurement Noise
Safety-critical control is imperative for deploying autonomous systems in the real world. Control Barrier Functions (CBFs) offer strong safety guarantees when accurate system and sensor models are available. However, widely used additive, fixed-noise models are not representative of complex sensor modalities with state-dependent error characteristics. Although CBFs have been designed to mitigate uncertainty using fixed worst-case bounds on measurement noise, this approach can lead to overly conservative control. To solve this problem, we extend the Belief Control Barrier Function (BCBF) framework to accommodate state-dependent measurement noise via the Generalized Extended Kalman Filter (GEKF) algorithm, which models measurement noise as a linear function of the state. Using the original BCBF framework as a baseline, we demonstrate the performance of the BCBF-GEKF approach through simulation results on a 1D single-integrator setpoint tracking scenario and a 2D unicycle kinematics trajectory tracking scenario. Our results confirm that the BCBF-GEKF approach offers less conservative control with greater safety.
comment: Preprint - Submitted to the 2026 American Control Conference
DiffOPF: Diffusion Solver for Optimal Power Flow
The optimal power flow (OPF) is a multi-valued, non-convex mapping from loads to dispatch setpoints. The variability of system parameters (e.g., admittances, topology) further contributes to the multiplicity of dispatch setpoints for a given load. Existing deep learning OPF solvers are single-valued and thus fail to capture the variability of system parameters unless fully represented in the feature space, which is prohibitive. To solve this problem, we introduce a diffusion-based OPF solver, termed \textit{DiffOPF}, that treats OPF as a conditional sampling problem. The solver learns the joint distribution of loads and dispatch setpoints from operational history, and returns the marginal dispatch distributions conditioned on loads. Unlike single-valued solvers, DiffOPF enables sampling statistically credible warm starts with favorable cost and constraint satisfaction trade-offs. We explore the sample complexity of DiffOPF to ensure the OPF solution within a prescribed distance from the optimization-based solution, and verify this experimentally on power system benchmarks.
comment: 7 pages, 4 figures, 2 tables
Dual Detection Framework for Faults and Integrity Attacks in Cyber-Physical Control Systems
Anomaly detection plays a vital role in the security and safety of cyber-physical control systems, and accurately distinguishing between different anomaly types is crucial for system recovery and mitigation. This study proposes a dual detection framework for anomaly detection and discrimination. By leveraging the dynamic characteristics of control loops and the stealthiness features of integrity attacks, the closed-loop stealthiness condition is first derived, and two dedicated detectors are designed and deployed on the controller side and the plant side, respectively, enabling joint plant fault and cyber attack detection. Moreover, by jointly analyzing the residual response of the two detectors corresponding to different anomalies, it is proved that the proposed method can distinguish between faults and integrity attacks due to the detectors' individual residual spaces. According to the detector's residual space, the fault and attack detection performance is further improved by a two-stage optimization scheme. Simulation results validate the effectiveness of the proposed approach.
Multi-Period Sparse Optimization for Proactive Grid Blackout Diagnosis
Existing or planned power grids need to evaluate survivability under extreme events, like a number of peak load overloading conditions, which could possibly cause system collapses (i.e., blackouts). For realistic extreme events that are correlated or share similar patterns, it is reasonable to expect that the dominant vulnerability or failure sources behind them share the same locations but with different severity. Early warning diagnosis that proactively identifies the key vulnerabilities responsible for a number of system collapses of interest can significantly enhance resilience. This paper proposes a multi-period sparse optimization method, enabling the discovery of persistent failure sources across a sequence of collapsed systems with increasing system stress, such as rising demand or worsening contingencies. This work defines persistency and efficiently integrates persistency constraints to capture the ``hidden'' evolving vulnerabilities. Circuit-theory based power flow formulations and circuit-inspired optimization heuristics are used to facilitate the scalability of the method. Experiments on benchmark systems show that the method reliably tracks persistent vulnerability locations under increasing load stress, and scales to large systems (taking around 200 s per scenario on average on 2000+ bus systems).
Cyber-Resilient System Identification for Power Grid through Bayesian Integration
Power grids increasingly need real-time situational awareness under the ever-evolving cyberthreat landscape. Advances in snapshot-based system identification approaches have enabled accurate estimation of states and topology from a snapshot of measurement data, under random bad data and topology errors. However, modern interactive, targeted false data can stay undetectable to these methods and significantly compromise estimation accuracy. This work advances system identification by combining a snapshot-based method with a time-series model via Bayesian integration, to improve cyber resiliency against both random and targeted false data. Using a distance-based time-series model, this work can leverage historical data of different distributions induced by changes in grid topology and other settings. The normal system behavior captured from historical data is integrated into system identification through a Bayesian treatment, to make solutions robust to targeted false data. We experiment on mixed random anomalies (bad data, topology errors) and targeted false data injection attacks (FDIA) to demonstrate our method's 1) cyber resilience: achieving over 70% reduction in estimation error under FDIA; 2) anomalous data identification: being able to alarm and locate anomalous data; 3) almost linear scalability: achieving comparable speed with the snapshot-based baseline, both taking <1 min per time tick on the large 2,383-bus system using a laptop CPU.
The Algorithmic Regulator
The regulator theorem states that, under certain conditions, any optimal controller must embody a model of the system it regulates, grounding the idea that controllers embed, explicitly or implicitly, internal models of the controlled. This principle underpins neuroscience and predictive brain theories like the Free-Energy Principle or Kolmogorov/Algorithmic Agent theory. However, the theorem is only proven in limited settings. Here, we treat the deterministic, closed, coupled world-regulator system $(W,R)$ as a single self-delimiting program $p$ via a constant-size wrapper that produces the world output string~$x$ fed to the regulator. We analyze regulation from the viewpoint of the algorithmic complexity of the output, $K(x)$. We define $R$ to be a \emph{good algorithmic regulator} if it \emph{reduces} the algorithmic complexity of the readout relative to a null (unregulated) baseline $\varnothing$, i.e., \[ \Delta = K\big(O_{W,\varnothing}\big) - K\big(O_{W,R}\big) > 0. \] We then prove that the larger $\Delta$ is, the more world-regulator pairs with high mutual algorithmic information are favored. More precisely, a complexity gap $\Delta > 0$ yields \[ \Pr\big((W,R)\mid x\big) \le C\,2^{\,M(W{:}R)}\,2^{-\Delta}, \] making low $M(W{:}R)$ exponentially unlikely as $\Delta$ grows. This is an AIT version of the idea that ``the regulator contains a model of the world.'' The framework is distribution-free, applies to individual sequences, and complements the Internal Model Principle. Beyond this necessity claim, the same coding-theorem calculus singles out a \emph{canonical scalar objective} and implicates a \emph{planner}. On the realized episode, a regulator behaves \emph{as if} it minimized the conditional description length of the readout.
comment: 2 Figures
The value of storage in electricity distribution: The role of markets
Electricity distribution companies deploy battery storage to defer grid upgrades by reducing peak demand. In deregulated jurisdictions, such storage often sits idle because regulatory constraints bar participation in electricity markets. Here, we develop an optimization framework that, to our knowledge, provides the first formal model of market participation constraints within storage investment and operation planning. Applying the framework to a Massachusetts case study, we find that market participation could deliver savings similar to those from peak demand reduction. Under current conditions, market participation does not increase storage investment, but at very low storage costs, it could incentivize deployment beyond local distribution needs. This might run contrary to the separation of distribution from generation in deregulated markets. Our framework can identify investment levels appropriate for local distribution needs.
High-Parallel FPGA-Based Discrete Simulated Bifurcation for Large-Scale Optimization
Combinatorial Optimization (CO) problems exhibit exponential complexity, making their resolution challenging. Simulated Adiabatic Bifurcation (aSB) is a quantum-inspired algorithm to obtain approximate solutions to large-scale CO problems written in the Ising form. It explores the solution space by emulating the adiabatic evolution of a network of Kerr-nonlinear parametric oscillators (KPOs), where each oscillator represents a variable in the problem. The optimal solution corresponds to the ground state of this system. A key advantage of this approach is the possibility of updating multiple variables simultaneously, making it particularly suited for hardware implementation. To enhance solution quality and convergence speed, variations of the algorithm have been proposed in the literature, including ballistic (bSB), discrete (dSB), and thermal (HbSB) versions. In this work, we have comprehensively analyzed dSB, bSB, and HbSB using dedicated software models, evaluating the feasibility of using a fixed-point representation for hardware implementation. We then present an open-source hardware architecture implementing the dSB algorithm for Field-Programmable Gate Arrays (FPGAs). The design allows users to adjust the degree of algorithmic parallelization based on their specific requirements. A proof-of-concept implementation that solves 256-variable problems was achieved on an AMD Kria KV260 SoM, a low-tier FPGA, and validated using well-known max-cut and knapsack problems.
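For readers unfamiliar with discrete simulated bifurcation, the following is a minimal floating-point software sketch of the commonly published dSB update rule, in which the coupling term uses the signs of the positions and positions are clamped to [-1, 1]. It is not the paper's hardware design or its fixed-point model; the function names `dsb` and `ising_energy` and the parameter values `dt`, `a0`, and `c0` are assumptions chosen for illustration, and the result is a heuristic solution with no optimality guarantee.

```python
import random


def sign(v):
    """Sign with the convention sign(0) = +1."""
    return 1 if v >= 0 else -1


def ising_energy(J, spins):
    """Ising energy E = -1/2 * sum_ij J[i][j] * s_i * s_j (no external field)."""
    n = len(J)
    return -0.5 * sum(J[i][j] * spins[i] * spins[j]
                      for i in range(n) for j in range(n))


def dsb(J, steps=200, dt=0.5, a0=1.0, c0=0.2, seed=0):
    """Discrete simulated bifurcation sketch; returns a list of +/-1 spins.

    dt, a0 and c0 are illustrative values, not tuned or taken from the paper.
    """
    n = len(J)
    rng = random.Random(seed)
    x = [rng.uniform(-0.1, 0.1) for _ in range(n)]  # oscillator positions
    y = [rng.uniform(-0.1, 0.1) for _ in range(n)]  # oscillator momenta
    for k in range(steps):
        a = a0 * k / steps  # pump amplitude ramps linearly from 0 toward a0
        s = [sign(v) for v in x]  # discretized positions drive the coupling
        # All momenta are updated from the same snapshot of x (parallel update).
        for i in range(n):
            coupling = sum(J[i][j] * s[j] for j in range(n))
            y[i] += dt * (-(a0 - a) * x[i] + c0 * coupling)
        for i in range(n):
            x[i] += dt * a0 * y[i]
            if abs(x[i]) > 1.0:  # inelastic wall at |x| = 1
                x[i] = float(sign(x[i]))
                y[i] = 0.0
    return [sign(v) for v in x]


if __name__ == "__main__":
    # Max-cut on a 4-node ring, encoded with antiferromagnetic couplings.
    ring = [[0, -1, 0, -1], [-1, 0, -1, 0], [0, -1, 0, -1], [-1, 0, -1, 0]]
    spins = dsb(ring)
    print(spins, ising_energy(ring, spins))
```

Because every momentum update reads the same snapshot of positions, the inner loops are independent across variables, which is the property that makes the method attractive for parallel hardware.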
Hybrid Terrain-Aware Path Planning: Integrating VD-RRT* Exploration and VD-D* Lite Repair
Autonomous ground vehicles operating off-road must plan curvature-feasible paths while accounting for spatially varying soil strength and slope hazards in real time. We present a continuous state--cost metric that combines a Bekker pressure--sinkage model with elevation-derived slope and attitude penalties. The resulting terrain cost field is analytic, bounded, and monotonic in soil modulus and slope, ensuring well-posed discretization and stable updates under sensor noise. This metric is evaluated on a lattice with exact steering primitives: Dubins and Reeds--Shepp motions for differential drive and time-parameterized bicycle arcs for Ackermann steering. Global exploration is performed using Vehicle-Dynamics RRT\(^{*}\), while local repair is managed by Vehicle-Dynamics D\(^{*}\) Lite, enabling millisecond-scale replanning without heuristic smoothing. By separating the terrain--vehicle model from the planner, the framework provides a reusable basis for deterministic, sampling-based, or learning-driven planning in deformable terrain. Hardware trials on an off-road platform demonstrate real-time navigation across soft soil and slope transitions, supporting reliable autonomy in unstructured environments.
Product Digital Twin Supporting End-of-life Phase of Electric Vehicle Batteries Utilizing Product-Process-Resource Asset Network
In a circular economy, products in their end-of-life phase should be either remanufactured or recycled. Both of these processes are crucial for sustainability and environmental conservation. However, manufacturers frequently do not support these processes sufficiently, because they do not share relevant data about the products or their (re-)manufacturing processes. This paper proposes to accompany each product with a digital twin technology, specifically the Product Digital Twin (PDT), which can carry information for facilitating and optimizing production and remanufacturing processes. This paper introduces a knowledge representation called Bi-Flow Product-Process-Resource Asset Network (Bi-PAN). Bi-PAN extends a well-proven Product-Process-Resource Asset Network (PAN) paradigm by integrating both assembly and disassembly workflows into a single information model. Such networks enable capturing relevant relationships across products, production resources, manufacturing processes, and specific production operations that have to be done in the manufacturing phase of a product. The proposed approach is demonstrated in a use case of disassembling electric vehicle (EV) batteries. By utilizing PDTs with Bi-PAN knowledge models, challenges associated with the disassembly of EV batteries can be solved flexibly and efficiently for various battery types, enhancing the sustainability of EV battery life-cycle management.
comment: \copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
A Personalized Data-Driven Generative Model of Human Repetitive Motion
The deployment of autonomous virtual avatars (in extended reality) and robots in human group activities -- such as rehabilitation therapy, sports, and manufacturing -- is expected to increase as these technologies become more pervasive. Designing cognitive architectures and control strategies to drive these agents requires realistic models of human motion. Furthermore, recent research has shown that each person exhibits a unique velocity signature, highlighting how individual motor behaviors are both rich in variability and internally consistent. However, existing models only provide simplified descriptions of human motor behavior, hindering the development of effective cognitive architectures. In this work, we first show that motion amplitude provides a valid and complementary characterization of individual motor signatures. Then, we propose a fully data-driven approach, based on long short-term memory neural networks, to generate original motion that captures the unique features of specific individuals. We validate the architecture using real human data from participants performing spontaneous oscillatory motion. Extensive analyses show that state-of-the-art Kuramoto-like models fail to replicate individual motor signatures, whereas our model accurately reproduces the velocity distribution and amplitude envelopes of the individual it was trained on, while remaining distinct from others.
comment: 12 pages, 6 figures
Addressing Model Inaccuracies in Transmission Network Reconfiguration via Diverse Alternatives
The ongoing energy transition places significant pressure on the transmission network due to increasing shares of renewables and electrification. To mitigate grid congestion, transmission system operators need decision support tools to suggest remedial actions, such as transmission network reconfigurations or redispatch. However, these tools are prone to model inaccuracies and may not provide relevant suggestions with regard to important unmodeled constraints or operator preferences. We propose a human-in-the-loop modeling-to-generate alternatives (HITL-MGA) approach to address these shortcomings by generating diverse topology reconfiguration alternatives. Case studies on the IEEE 57-bus and IEEE 118-bus systems show the method can leverage expert feedback and improve the quality of the suggested topology reconfigurations.
comment: This preprint is currently under peer review
Dual-Regularized Riccati Recursions for Interior-Point Optimal Control
We derive closed-form extensions of Riccati's recursions (both sequential and parallel) for solving dual-regularized LQR problems. We show how these methods can be used to solve general constrained, non-convex, discrete-time optimal control problems via a regularized interior point method, while guaranteeing that each step is a descent direction of an Augmented Barrier-Lagrangian merit function. We provide MIT-licensed implementations of our methods in C++ and JAX.
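As background for the abstract above, the classical (unregularized) backward Riccati recursion for a discrete-time LQR problem can be sketched in the scalar case. This is standard textbook material and not the paper's dual-regularized extension or its C++/JAX code; the function name `riccati_backward` and the scalar setting are assumptions made to keep the example dependency-free.

```python
def riccati_backward(A, B, Q, R, QN, horizon):
    """Scalar LQR backward pass for x_{k+1} = A x_k + B u_k.

    Stage cost is 0.5 * (Q x^2 + R u^2) and terminal cost is 0.5 * QN x^2.
    Returns the cost-to-go coefficient P at time 0 and the feedback gains
    K_0..K_{horizon-1}, with the control law u_k = -K_k x_k.
    """
    P = QN
    gains = []
    for _ in range(horizon):
        K = (B * P * A) / (R + B * P * B)  # optimal feedback gain
        P = Q + A * P * A - A * P * B * K  # Riccati update
        gains.append(K)
    gains.reverse()  # gains were computed from the last stage backwards
    return P, gains


if __name__ == "__main__":
    P0, gains = riccati_backward(A=1.0, B=1.0, Q=1.0, R=1.0, QN=1.0, horizon=50)
    print(P0, gains[0])
```

With A = B = Q = R = 1 the fixed point satisfies P^2 - P - 1 = 0, so a long horizon drives P toward the golden ratio, which gives a quick sanity check of the recursion.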
On the Fast Nonlinear Filtering with Matrix Fisher Distributions on SO(3)
This paper addresses two interrelated problems: the nonlinear filtering mechanism and fast attitude filtering with the matrix Fisher distribution (MFD) on the special orthogonal group. By analyzing the distribution evolution along Bayes' rule, we reveal, from a theoretical viewpoint, two essential properties that enhance the performance of Bayesian attitude filters with MFDs, particularly in challenging conditions. Benefiting from the new understanding of the filtering mechanism associated with MFDs, two closed-form filters with MFDs are then proposed. The filters avoid the burdensome computations in previous MFD-based filters by introducing linearized error systems with invariant errors while retaining the two advantageous properties. Numerical simulations demonstrate that the proposed filters are more accurate than the classic invariant Kalman filter. They are also as accurate as recent MFD-based Bayesian filters in challenging circumstances with large initial error and measurement uncertainty, but they consume far less computation time (about 1/5 to 1/100 of previous MFD-based attitude filters).
Design and benchmarking of a two degree of freedom tendon driver unit for cable-driven wearable technologies
Exosuits have recently been developed as alternatives to rigid exoskeletons and are increasingly adopted for both upper and lower limb therapy and assistance in clinical and home environments. Many cable-driven exosuits have been developed but little has been published on their electromechanical designs and performance. Therefore, this paper presents a comprehensive design and performance analysis of a two degree of freedom tendon driver unit (TDU) for cable-driven wearable exosuits. Detailed methodologies are presented to benchmark the functionality of the TDU. A static torque output test compares the commanded and measured torques. A velocity control test evaluates the attenuation and phase shift across velocities. A noise test evaluates how loud the TDU is for the wearer under different speeds. A thermal stress test captures the cooling performance of the TDU to ensure safe operation at higher loads. Finally, a battery endurance test evaluates the runtime of the TDU under various loading conditions to inform the usable time. To demonstrate these tests, a modular TDU system for cable-driven applications is introduced, which allows components such as motors, pulleys, and sensors to be adapted based on the requirements of the intended application. By sharing detailed methodologies and performance results, this study aims to provide a TDU design that may be leveraged by others and resources for researchers and engineers to better document the capabilities of their TDU designs.
Geometric Backstepping Control of Omnidirectional Tiltrotors Incorporating Servo-Rotor Dynamics for Robustness against Sudden Disturbances
This work presents a geometric backstepping controller for a variable-tilt omnidirectional multirotor that explicitly accounts for both servo and rotor dynamics. Considering actuator dynamics is essential for more effective and reliable operation, particularly during aggressive flight maneuvers or recovery from sudden disturbances. While prior studies have investigated actuator-aware control for conventional and fixed-tilt multirotors, these approaches rely on linear relationships between actuator input and wrench, which cannot capture the nonlinearities induced by variable tilt angles. In this work, we exploit the cascade structure between the rigid-body dynamics of the multirotor and its nonlinear actuator dynamics to design the proposed backstepping controller and establish exponential stability of the overall system. Furthermore, we reveal parametric uncertainty in the actuator model through experiments, and we demonstrate that the proposed controller remains robust against such uncertainty. The controller was compared against a baseline that does not account for actuator dynamics across three experimental scenarios: fast translational tracking, rapid rotational tracking, and recovery from sudden disturbance. The proposed method consistently achieved better tracking performance, and notably, while the baseline diverged and crashed during the fastest translational trajectory tracking and the recovery experiment, the proposed controller maintained stability and successfully completed the tasks, thereby demonstrating its effectiveness.
A Verification Methodology for Safety Assurance of Robotic Autonomous Systems
Autonomous robots deployed in shared human environments, such as agricultural settings, require rigorous safety assurance to meet both functional reliability and regulatory compliance. These systems must operate in dynamic, unstructured environments, interact safely with humans, and respond effectively to a wide range of potential hazards. This paper presents a verification workflow for the safety assurance of an autonomous agricultural robot, covering the entire development life-cycle, from concept study and design to runtime verification. The outlined methodology begins with a systematic hazard analysis and risk assessment to identify potential risks and derive corresponding safety requirements. A formal model of the safety controller is then developed to capture its behaviour and verify that the controller satisfies the specified safety properties with respect to these requirements. The proposed approach is demonstrated on a field robot operating in an agricultural setting. The results show that the methodology can be effectively used to verify safety-critical properties and facilitate the early identification of design issues, contributing to the development of safer robots and autonomous systems.
comment: In Proc. of the 26th TAROS (Towards Autonomous Robotic Systems) Conference, York, UK, August, 2025
Multi Timescale Stochastic Approximation: Stability and Convergence
This paper presents the first sufficient conditions that guarantee the stability and almost sure convergence of multi-timescale stochastic approximation (SA) iterates. It extends the existing results on one-timescale and two-timescale SA iterates to general $N$-timescale stochastic recursions, for any $N \geq 1$, using the ordinary differential equation (ODE) method. As an application, we study SA algorithms augmented with heavy-ball momentum in the context of Gradient Temporal Difference (GTD) learning. The added momentum introduces an auxiliary state evolving on an intermediate timescale, yielding a three-timescale recursion. We show that with appropriate momentum parameters, the scheme fits within our framework and converges almost surely to the same fixed point as baseline GTD. The stability and convergence of all iterates including the momentum state follow from our main results without ad hoc bounds. We then study off-policy actor-critic algorithms with a baseline learner, actor, and critic updated on separate timescales. In contrast to prior work, we eliminate projection steps from the actor update and instead use our framework to guarantee stability and almost sure convergence of all components. Finally, we extend the analysis to constrained policy optimization in the average reward setting, where the actor, critic, and dual variables evolve on three distinct timescales, and we verify that the resulting dynamics satisfy the conditions of our general theorem. These examples show how diverse reinforcement learning algorithms, covering momentum acceleration, off-policy learning, and primal-dual methods, fit naturally into the proposed multi-timescale framework.
comment: arXiv admin note: text overlap with arXiv:2111.11004, Added an application to the 4-Timescale case
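For orientation, a generic $N$-timescale recursion of the kind the abstract above refers to can be written as below. The notation ($x^{(i)}_n$ for the $i$-th iterate, $h_i$ for its drift, $M^{(i)}_{n+1}$ for martingale-difference noise, $a^{(i)}_n$ for step sizes) is assumed here, following the usual convention in the stochastic approximation literature; it is not taken from the paper, whose precise conditions may differ.

```latex
x^{(i)}_{n+1} = x^{(i)}_n + a^{(i)}_n \left[ h_i\!\left(x^{(1)}_n, \dots, x^{(N)}_n\right) + M^{(i)}_{n+1} \right],
\qquad i = 1, \dots, N,
\\[4pt]
\sum_n a^{(i)}_n = \infty, \qquad
\sum_n \left(a^{(i)}_n\right)^2 < \infty, \qquad
\frac{a^{(i+1)}_n}{a^{(i)}_n} \xrightarrow[n \to \infty]{} 0 .
```

Under this ordering $x^{(1)}$ is the fastest iterate and $x^{(N)}$ the slowest; the three-timescale momentum example corresponds to $N = 3$.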
Learning Power Flow with Confidence: A Probabilistic Guarantee Framework for Voltage Risk
The absence of formal performance guarantees in machine learning (ML) has limited its adoption for safety-critical power system applications, where confidence and interpretability are as vital as accuracy. In this work, we present a probabilistic guarantee for power flow learning and voltage risk estimation, derived through the framework of Gaussian Process (GP) regression. Specifically, we establish a bound on the expected estimation error that connects the GP's predictive variance to confidence in voltage risk estimates, ensuring statistical equivalence with Monte Carlo-based ACPF risk quantification. To enhance model learnability in the low-data regime, we first design the Vertex-Degree Kernel (VDK), a topology-aware additive kernel that decomposes voltage-load interactions into local neighborhoods for efficient large-scale learning. Building on this, we introduce a network-swipe active learning (AL) algorithm that adaptively samples informative operating points and provides a principled stopping criterion without requiring out-of-sample validation. Together, these developments mitigate the principal bottleneck of ML-based power flow, its lack of guaranteed reliability, by combining data efficiency with analytical assurance. Empirical evaluations across IEEE 118-, 500-, and 1354-bus systems confirm that the proposed VDK-GP achieves mean absolute voltage errors below 1E-03 p.u., reproduces Monte Carlo-level voltage risk estimates with 15x fewer ACPF computations, and achieves over 120x reduction in evaluation time while conservatively bounding violation probabilities.
comment: 10 pages
Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs NeurIPS 2020
We study the sequential decision making problem of maximizing the expected total reward while satisfying a constraint on the expected total utility. We employ the natural policy gradient method to solve the discounted infinite-horizon optimal control problem for Constrained Markov Decision Processes (constrained MDPs). Specifically, we propose a new Natural Policy Gradient Primal-Dual (NPG-PD) method that updates the primal variable via natural policy gradient ascent and the dual variable via projected subgradient descent. Although the underlying maximization involves a nonconcave objective function and a nonconvex constraint set, under the softmax policy parametrization, we prove that our method achieves global convergence with sublinear rates regarding both the optimality gap and the constraint violation. Such convergence is independent of the size of the state-action space, i.e., it is dimension-free. Furthermore, for log-linear and general smooth policy parametrizations, we establish sublinear convergence rates up to a function approximation error caused by restricted policy parametrization. We also provide convergence and finite-sample complexity guarantees for two sample-based NPG-PD algorithms. We use a set of computational experiments to showcase the effectiveness of our approach.
comment: 76 pages, 4 figures, 2 tables; Journal version of the NeurIPS 2020 paper; Accepted to JMLR
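The primal-dual update described in the abstract above can be illustrated with a minimal sketch on a toy problem. Everything concrete here is an assumption made for illustration: the 2-state, 2-action MDP, the threshold `B`, the step sizes, and the dual bound `LAM_MAX` are invented. The sketch also uses the common closed form of the softmax natural policy gradient step (a multiplicative-weights update on the Lagrangian advantage) with exact policy evaluation, which may differ in detail from the paper's algorithms.

```python
import math

GAMMA = 0.9
# Hypothetical MDP: P[s][a] is the next-state distribution.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
R = [[1.0, 0.0], [0.5, 0.0]]   # reward r(s, a)
G = [[0.0, 1.0], [0.0, 1.0]]   # utility g(s, a)
RHO = [0.5, 0.5]               # initial state distribution
B = 4.0                        # constraint: V_g(rho) >= B
LAM_MAX = 10.0                 # projection interval [0, LAM_MAX] for the dual variable


def evaluate(pi, payoff, sweeps=200):
    """Iterative policy evaluation; returns (Q, V) for the given payoff table."""
    n_s, n_a = len(P), len(P[0])
    V = [0.0] * n_s
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(sweeps):
        Q = [[payoff[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        V = [sum(pi[s][a] * Q[s][a] for a in range(n_a)) for s in range(n_s)]
    return Q, V


def npg_pd(iters=100, eta_pi=0.1, eta_lam=0.1):
    """Run the toy primal-dual loop; returns the final policy and multiplier."""
    n_s, n_a = len(P), len(P[0])
    pi = [[1.0 / n_a] * n_a for _ in range(n_s)]
    lam = 0.0
    for _ in range(iters):
        Qr, Vr = evaluate(pi, R)
        Qg, Vg = evaluate(pi, G)
        # Primal step: softmax NPG ascent on the Lagrangian advantage A_r + lam * A_g.
        new_pi = []
        for s in range(n_s):
            adv = [(Qr[s][a] - Vr[s]) + lam * (Qg[s][a] - Vg[s]) for a in range(n_a)]
            w = [pi[s][a] * math.exp(eta_pi / (1 - GAMMA) * adv[a]) for a in range(n_a)]
            z = sum(w)
            new_pi.append([x / z for x in w])
        # Dual step: projected subgradient descent on the constraint slack.
        vg_rho = sum(RHO[s] * Vg[s] for s in range(n_s))
        lam = min(LAM_MAX, max(0.0, lam - eta_lam * (vg_rho - B)))
        pi = new_pi
    return pi, lam


policy, multiplier = npg_pd()
```

The multiplier rises while the utility value is below `B` and falls otherwise, so the policy is pushed toward the utility-earning action only as much as the constraint requires.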
Tiny Learning-Based MPC for Multirotors: Solver-Aware Learning for Efficient Embedded Predictive Control
Tiny aerial robots hold great promise for applications such as environmental monitoring and search-and-rescue, yet face significant control challenges due to limited onboard computing power and nonlinear dynamics. Model Predictive Control (MPC) enables agile trajectory tracking and constraint handling but depends on an accurate dynamics model. While existing Learning-Based (LB) MPC methods, such as Gaussian Process (GP) MPC, enhance performance by learning residual dynamics, their high computational cost restricts onboard deployment on tiny robots. This paper introduces Tiny LB MPC, a co-designed MPC framework and optimization solver for resource-constrained micro multirotor platforms. The proposed approach achieves 100 Hz control on a Crazyflie 2.1 equipped with a Teensy 4.0 microcontroller, demonstrating a 43% average improvement in tracking performance over existing embedded MPC methods under model uncertainty, and achieving the first onboard implementation of LB MPC on a 53 g multirotor.
A Faster and More Reliable Middleware for Autonomous Driving Systems
Ensuring safety in high-speed autonomous vehicles requires rapid control loops and tightly bounded delays from perception to actuation. Many open-source autonomy systems rely on ROS 2 middleware; when multiple sensor and control nodes share one compute unit, ROS 2 and its DDS transports add significant (de)serialization, copying, and discovery overheads, shrinking the available time budget. We present Sensor-in-Memory (SIM), a shared-memory transport designed for intra-host pipelines in autonomous vehicles. SIM keeps sensor data in native memory layouts (e.g., cv::Mat, PCL), uses lock-free bounded double buffers that overwrite old data to prioritize freshness, and integrates into ROS 2 nodes with four lines of code. Unlike traditional middleware, SIM operates beside ROS 2 and is optimized for applications where data freshness and minimal latency outweigh guaranteed completeness. SIM provides sequence numbers, a writer heartbeat, and optional checksums to ensure ordering, liveness, and basic integrity. On an NVIDIA Jetson Orin Nano, SIM reduces data-transport latency by up to 98% compared to ROS 2 zero-copy transports such as FastRTPS and Zenoh, lowers mean latency by about 95%, and narrows 95th/99th-percentile tail latencies by around 96%. In tests on a production-ready Level 4 vehicle running Autoware.Universe, SIM increased localization frequency from 7.5 Hz to 9.5 Hz. Applied across all latency-critical modules, SIM cut average perception-to-decision latency from 521.91 ms to 290.26 ms, reducing emergency braking distance at 40 mph (64 km/h) on dry concrete by 13.6 ft (4.14 m).
comment: 8 pages, 7 figures, 8 tables
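The braking-distance figure quoted in the abstract above is consistent with a simple back-of-the-envelope check: the distance a vehicle covers at constant speed during the latency that was removed. That this is how the authors derived the number is an assumption; the function name below is hypothetical.

```python
MPH_TO_MPS = 0.44704   # 1 mph = 0.44704 m/s (exact by definition)
M_PER_FT = 0.3048      # 1 ft = 0.3048 m (exact by definition)


def distance_saved_m(speed_mph, latency_before_ms, latency_after_ms):
    """Distance travelled at constant speed during the removed latency, in metres."""
    saved_s = (latency_before_ms - latency_after_ms) / 1000.0
    return speed_mph * MPH_TO_MPS * saved_s


saved_m = distance_saved_m(40.0, 521.91, 290.26)
saved_ft = saved_m / M_PER_FT
print(f"{saved_m:.2f} m, {saved_ft:.1f} ft")  # prints "4.14 m, 13.6 ft"
```

The 231.65 ms saved at 17.88 m/s gives about 4.14 m (13.6 ft), matching the reported reduction.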
An AI-Driven Multimodal Smart Home Platform for Continuous Monitoring and Assistance in Post-Stroke Motor Impairment
At-home rehabilitation for post-stroke patients presents significant challenges, as continuous, personalized care is often limited outside clinical settings. Moreover, the lack of integrated solutions capable of simultaneously monitoring motor recovery and providing intelligent assistance in home environments hampers rehabilitation outcomes. Here, we present a multimodal smart home platform designed for continuous, at-home rehabilitation of post-stroke patients, integrating wearable sensing, ambient monitoring, and adaptive automation. A plantar pressure insole equipped with a machine learning pipeline classifies users into motor recovery stages with up to 94% accuracy, enabling quantitative tracking of walking patterns during daily activities. An optional head-mounted eye-tracking module, together with ambient sensors such as cameras and microphones, supports seamless hands-free control of household devices with a 100% success rate and sub-second response time. These data streams are fused locally via a hierarchical Internet of Things (IoT) architecture, ensuring low latency and data privacy. An embedded large language model (LLM) agent, Auto-Care, continuously interprets multimodal data to provide real-time interventions: issuing personalized reminders, adjusting environmental conditions, and notifying caregivers. Implemented in a post-stroke context, this integrated smart home platform increased mean user satisfaction from 3.9 $\pm$ 0.8 in conventional home environments to 8.4 $\pm$ 0.6 with the full system ($n=20$). Beyond stroke, the system offers a scalable, patient-centered framework with potential for long-term use in broader neurorehabilitation and aging-in-place applications.
comment: 5 figures, 41 references
Electromagnetically Reconfigurable Fluid Antenna System for Wireless Communications: Design, Modeling, Algorithm, Fabrication, and Experiment
This paper presents the concept, design, channel modeling, beamforming algorithm development, prototype fabrication, and experimental measurement of an electromagnetically reconfigurable fluid antenna system (ER-FAS), in which each FAS array element features electromagnetic (EM) reconfigurability. Unlike most existing FAS works that investigate spatial reconfigurability by adjusting the position and/or orientation of array elements, the proposed ER-FAS enables direct control over the EM characteristics of each element, allowing for dynamic radiation pattern reconfigurability. Specifically, a novel ER-FAS architecture leveraging software-controlled fluidics is proposed, and corresponding wireless channel models are established. Based on this ER-FAS channel model, a low-complexity greedy beamforming algorithm is developed to jointly optimize the analog phase shift and the radiation state of each array element. The accuracy of the ER-FAS channel model and the effectiveness of the beamforming algorithm are validated through (i) full-wave EM simulations and (ii) numerical spectral efficiency evaluations. These results confirm that the proposed ER-FAS significantly enhances spectral efficiency in both near-field and far-field scenarios compared to conventional antenna arrays. To further validate this design, we fabricate prototypes for both the ER-FAS element and array, using Galinstan liquid metal alloy, fluid silver paste, and software-controlled fluidic channels. The simulation results are experimentally validated through prototype measurements conducted in an anechoic chamber. Additionally, several indoor communication experiments using a pair of software-defined radios demonstrate the superior received power and bit error rate performance of the ER-FAS prototype.
Multiagent Systems
AOAD-MAT: Transformer-based multi-agent deep reinforcement learning model considering agents' order of action decisions
Multi-agent reinforcement learning focuses on training the behaviors of multiple learning agents that coexist in a shared environment. Recently, MARL models, such as the Multi-Agent Transformer (MAT) and ACtion dEpendent deep Q-learning (ACE), have significantly improved performance by leveraging sequential decision-making processes. Although these models can enhance performance, they do not explicitly consider the importance of the order in which agents make decisions. In this paper, we propose an Agent Order of Action Decisions-MAT (AOAD-MAT), a novel MAT model that considers the order in which agents make decisions. The proposed model explicitly incorporates the sequence of action decisions into the learning process, allowing the model to learn and predict the optimal order of agent actions. The AOAD-MAT model leverages a Transformer-based actor-critic architecture that dynamically adjusts the sequence of agent actions. To achieve this, we introduce a novel MARL architecture that cooperates with a subtask focused on predicting the next agent to act, integrated into a Proximal Policy Optimization based loss function to synergistically maximize the advantage of the sequential decision-making. The proposed method was validated through extensive experiments on the StarCraft Multi-Agent Challenge and Multi-Agent MuJoCo benchmarks. The experimental results show that the proposed AOAD-MAT model outperforms existing MAT and other baseline models, demonstrating the effectiveness of adjusting the AOAD order in MARL.
comment: This manuscript is an extended version of the work accepted as a short paper at the 26th International Conference on Principles and Practice of Multi-Agent Systems (PRIMA 2025). The Version of Record of this contribution is published in Springer's Lecture Notes in Artificial Intelligence series (LNCS/LNAI)
Altruistic Ride Sharing: A Community-Driven Approach to Short-Distance Mobility
Urban mobility faces persistent challenges of congestion and fuel consumption, specifically when people choose a private, point-to-point commute option. Profit-driven ride-sharing platforms prioritize revenue over fairness and sustainability. This paper introduces Altruistic Ride-Sharing (ARS), a decentralized, peer-to-peer mobility framework where participants alternate between driver and rider roles based on altruism points rather than monetary incentives. The system integrates multi-agent reinforcement learning (MADDPG) for dynamic ride-matching, game-theoretic equilibrium guarantees for fairness, and a population model to sustain long-term balance. Using real-world New York City taxi data, we demonstrate that ARS reduces travel distance and emissions, increases vehicle utilization, and promotes equitable participation compared to both no-sharing and optimization-based baselines. These results establish ARS as a scalable, community-driven alternative to conventional ride-sharing, aligning individual behavior with collective urban sustainability goals.
comment: Submitted to IEEE Transactions on Intelligent Transportation Systems
Addressing the alignment problem in transportation policy making: an LLM approach
A key challenge in transportation planning is that the collective preferences of heterogeneous travelers often diverge from the policies produced by model-driven decision tools. This misalignment frequently results in implementation delays or failures. Here, we investigate whether large language models (LLMs), noted for their capabilities in reasoning and simulating human decision-making, can help inform and address this alignment problem. We develop a multi-agent simulation in which LLMs, acting as agents representing residents from different communities in a city, participate in a referendum on a set of transit policy proposals. Using chain-of-thought reasoning, LLM agents provide ranked-choice or approval-based preferences, which are aggregated using instant-runoff voting (IRV) to model democratic consensus. We implement this simulation framework with both GPT-4o and Claude-3.5, and apply it to Chicago and Houston. Our findings suggest that LLM agents are capable of approximating plausible collective preferences and responding to local context, while also displaying model-specific behavioral biases and modest divergences from optimization-based benchmarks. These capabilities underscore both the promise and limitations of LLMs as tools for solving the alignment problem in transportation decision-making.
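The instant-runoff aggregation step mentioned in the abstract above can be sketched as follows. This is a generic IRV implementation, not the authors' code: the policy names and ballots are hypothetical, and breaking elimination ties alphabetically is an assumption made to keep the example deterministic.

```python
from collections import Counter


def instant_runoff(ballots):
    """Return the IRV winner for ranked ballots (lists of names, most preferred first)."""
    candidates = {c for ballot in ballots for c in ballot}
    while candidates:
        # Each ballot counts for its highest-ranked candidate still in the race.
        tally = Counter()
        for ballot in ballots:
            for choice in ballot:
                if choice in candidates:
                    tally[choice] += 1
                    break
        active = sum(tally.values())
        if active == 0:
            return None  # every ballot is exhausted
        leader, votes = max(tally.items(), key=lambda kv: (kv[1], kv[0]))
        if votes * 2 > active:
            return leader  # strict majority of non-exhausted ballots
        # Eliminate the candidate with the fewest votes (ties broken by name).
        loser = min(candidates, key=lambda c: (tally[c], c))
        candidates.remove(loser)
    return None


# Hypothetical referendum: nine agents rank three transit proposals.
ballots = (
    [["rail", "bus", "fare_cap"]] * 4
    + [["bus", "fare_cap", "rail"]] * 3
    + [["fare_cap", "bus", "rail"]] * 2
)
winner = instant_runoff(ballots)
print(winner)  # prints "bus"
```

In the first round no proposal has a majority (4, 3, and 2 votes), so `fare_cap` is eliminated and its two ballots transfer to `bus`, which then wins 5 to 4.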
Agentic Discovery: Closing the Loop with Cooperative Agents
As data-driven methods, artificial intelligence (AI), and automated workflows accelerate scientific tasks, we see the rate of discovery increasingly limited by human decision-making tasks such as setting objectives, generating hypotheses, and designing experiments. We postulate that cooperative agents are needed to augment the role of humans and enable autonomous discovery. Realizing such agents will require progress in both AI and infrastructure.
comment: Published in IEEE Computer Volume 58 Issue 10
Formalizing the Safety, Security, and Functional Properties of Agentic AI Systems
Agentic AI systems, which leverage multiple autonomous agents and Large Language Models (LLMs), are increasingly used to address complex, multi-step tasks. The safety, security, and functionality of these systems are critical, especially in high-stakes applications. However, the current ecosystem of inter-agent communication is fragmented, with protocols such as the Model Context Protocol (MCP) for tool access and the Agent-to-Agent (A2A) protocol for coordination being analyzed in isolation. This fragmentation creates a semantic gap that prevents the rigorous analysis of system properties and introduces risks such as architectural misalignment and exploitable coordination issues. To address these challenges, we introduce a modeling framework for agentic AI systems composed of two foundational models. The first, the host agent model, formalizes the top-level entity that interacts with the user, decomposes tasks, and orchestrates their execution by leveraging external agents and tools. The second, the task lifecycle model, details the states and transitions of individual sub-tasks from creation to completion, providing a fine-grained view of task management and error handling. Together, these models provide a unified semantic framework for reasoning about the behavior of multi-AI agent systems. Grounded in this framework, we define 17 properties for the host agent and 14 for the task lifecycle, categorized into liveness, safety, completeness, and fairness. Expressed in temporal logic, these properties enable formal verification of system behavior, detection of coordination edge cases, and prevention of deadlocks and security vulnerabilities. Through this effort, we introduce the first rigorously grounded, domain-agnostic framework for the systematic analysis, design, and deployment of correct, reliable, and robust agentic AI systems.
Stop Reducing Responsibility in LLM-Powered Multi-Agent Systems to Local Alignment
LLM-powered Multi-Agent Systems (LLM-MAS) unlock new potentials in distributed reasoning, collaboration, and task generalization but also introduce additional risks due to unguaranteed agreement, cascading uncertainty, and adversarial vulnerabilities. We argue that ensuring responsible behavior in such systems requires a paradigm shift: from local, superficial agent-level alignment to global, systemic agreement. We conceptualize responsibility not as a static constraint but as a lifecycle-wide property encompassing agreement, uncertainty, and security, each requiring the complementary integration of subjective human-centered values and objective verifiability. Furthermore, a dual-perspective governance framework that combines interdisciplinary design with human-AI collaborative oversight is essential for tracing and ensuring responsibility throughout the lifecycle of LLM-MAS. Our position views LLM-MAS not as loose collections of agents, but as unified, dynamic socio-technical systems that demand principled mechanisms to support each dimension of responsibility and enable ethically aligned, verifiably coherent, and resilient behavior for sustained, system-wide agreement.
comment: Under Review
Static Sandboxes Are Inadequate: Modeling Societal Complexity Requires Open-Ended Co-Evolution in LLM-Based Multi-Agent Simulations
What if artificial agents could not just communicate, but also evolve, adapt, and reshape their worlds in ways we cannot fully predict? With LLMs now powering multi-agent systems and social simulations, we are witnessing new possibilities for modeling open-ended, ever-changing environments. Yet, most current simulations remain constrained within static sandboxes, characterized by predefined tasks, limited dynamics, and rigid evaluation criteria. These limitations prevent them from capturing the complexity of real-world societies. In this paper, we argue that static, task-specific benchmarks are fundamentally inadequate and must be rethought. We critically review emerging architectures that blend LLMs with multi-agent dynamics, highlight key hurdles such as balancing stability and diversity, evaluating unexpected behaviors, and scaling to greater complexity, and introduce a fresh taxonomy for this rapidly evolving field. Finally, we present a research roadmap centered on open-endedness, continuous co-evolution, and the development of resilient, socially aligned AI ecosystems. We call on the community to move beyond static paradigms and help shape the next generation of adaptive, socially-aware multi-agent simulations.
GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling
The emergence of large language models (LLMs) enables the development of intelligent agents capable of engaging in complex and multi-turn dialogues. However, multi-agent collaboration faces critical safety challenges, such as hallucination amplification and error injection and propagation. This paper presents GUARDIAN, a unified method for detecting and mitigating multiple safety concerns in GUARDing Intelligent Agent collaboratioNs. By modeling the multi-agent collaboration process as a discrete-time temporal attributed graph, GUARDIAN explicitly captures the propagation dynamics of hallucinations and errors. The unsupervised encoder-decoder architecture incorporating an incremental training paradigm learns to reconstruct node attributes and graph structures from latent embeddings, enabling the identification of anomalous nodes and edges with unparalleled precision. Moreover, we introduce a graph abstraction mechanism based on the Information Bottleneck Theory, which compresses temporal interaction graphs while preserving essential patterns. Extensive experiments demonstrate GUARDIAN's effectiveness in safeguarding LLM multi-agent collaborations against diverse safety vulnerabilities, achieving state-of-the-art accuracy with efficient resource utilization. The code is available at https://github.com/JialongZhou666/GUARDIAN
MACTAS: Self-Attention-Based Module for Inter-Agent Communication in Multi-Agent Reinforcement Learning AAMAS 2026
Communication is essential for the collective execution of complex tasks by human agents, motivating interest in communication mechanisms for multi-agent reinforcement learning (MARL). However, existing communication protocols in MARL are often complex and non-differentiable. In this work, we introduce a self-attention-based communication module that exchanges information between the agents in MARL. Our proposed approach is fully differentiable, allowing agents to learn to generate messages in a reward-driven manner. The module can be seamlessly integrated with any action-value function decomposition method and can be viewed as an extension of such decompositions. Notably, it includes a fixed number of trainable parameters, independent of the number of agents. Experimental results on the SMAC and SMACv2 benchmarks demonstrate the effectiveness of our approach, which achieves state-of-the-art performance on a number of maps.
comment: Submitted for AAMAS 2026
Benchmarking LLMs' Swarm intelligence
Large Language Models (LLMs) show potential for complex reasoning, yet their capacity for emergent coordination in Multi-Agent Systems (MAS) when operating under strict swarm-like constraints (limited local perception and communication) remains largely unexplored. Existing benchmarks often do not fully capture the unique challenges of decentralized coordination when agents operate with incomplete spatio-temporal information. To bridge this gap, we introduce SwarmBench, a novel benchmark designed to systematically evaluate the swarm intelligence capabilities of LLMs acting as decentralized agents. SwarmBench features five foundational MAS coordination tasks (Pursuit, Synchronization, Foraging, Flocking, Transport) within a configurable 2D grid environment, forcing agents to rely solely on local sensory input ($k\times k$ view) and local communication. We propose metrics for coordination effectiveness and analyze emergent group dynamics. Zero-shot evaluations of leading LLMs (e.g., deepseek-v3, o4-mini) reveal significant task-dependent performance variations. While some rudimentary coordination is observed, our results indicate that current LLMs significantly struggle with robust long-range planning and adaptive strategy formation under the uncertainty inherent in these decentralized scenarios. Assessing LLMs under such swarm-like constraints is crucial for understanding their utility in future decentralized intelligent systems. We release SwarmBench as an open, extensible toolkit, built on a customizable physical system, providing environments, prompts, evaluation scripts, and comprehensive datasets. This aims to foster reproducible research into LLM-based MAS coordination and the theoretical underpinnings of emergent collective behavior under severe informational decentralization. Our code repository is available at https://github.com/x66ccff/swarmbench.
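To make the $k\times k$ local-view constraint in the abstract above concrete, here is a minimal sketch of extracting an agent-centred window from a 2D grid. It is not taken from the SwarmBench repository: the function name, the padding symbol for out-of-bounds cells, and the requirement that `k` be odd are all assumptions made for illustration.

```python
def local_view(grid, row, col, k, pad="#"):
    """Return the k x k window of `grid` centred on (row, col).

    Cells outside the grid are filled with `pad` (an assumed convention).
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so the agent sits at the centre")
    radius = k // 2
    view = []
    for i in range(row - radius, row + radius + 1):
        line = []
        for j in range(col - radius, col + radius + 1):
            inside = 0 <= i < len(grid) and 0 <= j < len(grid[0])
            line.append(grid[i][j] if inside else pad)
        view.append(line)
    return view


# Hypothetical 3 x 4 world: 'A' is the agent, 'F' a food item, '.' empty.
world = ["A..F",
         "....",
         "...."]
view = local_view(world, 0, 0, 3)
```

With `k = 3` the agent in the corner sees only its immediate neighbours; the food item three columns away is outside the window, which is the kind of partial observability the benchmark description emphasises.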
Constant-Memory Strategies in Stochastic Games: Best Responses and Equilibria
Stochastic games have become a prevalent framework for studying long-term multi-agent interactions, especially in the context of multi-agent reinforcement learning. In this work, we comprehensively investigate the concept of constant-memory strategies in stochastic games. We first establish some results on best responses and Nash equilibria for behavioral constant-memory strategies, followed by a discussion on the computational hardness of best responding to mixed constant-memory strategies. Those theoretic insights are later verified on several sequential decision-making testbeds, including the Iterated Prisoner's Dilemma, the Iterated Traveler's Dilemma, and the Pursuit domain. This work aims to enhance the understanding of theoretical issues in single-agent planning under multi-agent systems, and uncover the connection between decision models in single-agent and multi-agent contexts. The code is available at https://github.com/Fernadoo/Const-Mem.
comment: 21 pages. Under review
Robotics
HYPE: Hybrid Planning with Ego Proposal-Conditioned Predictions
Safe and interpretable motion planning in complex urban environments needs to reason about bidirectional multi-agent interactions. This reasoning requires estimating the costs of potential ego driving maneuvers. Many existing planners generate initial trajectories with sampling-based methods and refine them by optimizing on learned predictions of future environment states, which requires a cost function that encodes the desired vehicle behavior. Designing such a cost function can be very challenging, especially if a wide range of complex urban scenarios has to be considered. We propose HYPE: HYbrid Planning with Ego proposal-conditioned predictions, a planner that integrates multimodal trajectory proposals from a learned proposal model as heuristic priors into a Monte Carlo Tree Search (MCTS) refinement. To model bidirectional interactions, we introduce an ego-conditioned occupancy prediction model, enabling consistent, scene-aware reasoning. Our design significantly simplifies cost function design in refinement by considering proposal-driven guidance, requiring only minimalistic grid-based cost terms. Evaluations on large-scale real-world benchmarks nuPlan and DeepUrban show that HYPE effectively achieves state-of-the-art performance, especially in safety and adaptability.
T(R,O) Grasp: Efficient Graph Diffusion of Robot-Object Spatial Transformation for Cross-Embodiment Dexterous Grasping
Dexterous grasping remains a central challenge in robotics due to the complexity of its high-dimensional state and action space. We introduce T(R,O) Grasp, a diffusion-based framework that efficiently generates accurate and diverse grasps across multiple robotic hands. At its core is the T(R,O) Graph, a unified representation that models spatial transformations between robotic hands and objects while encoding their geometric properties. A graph diffusion model, coupled with an efficient inverse kinematics solver, supports both unconditioned and conditioned grasp synthesis. Extensive experiments on a diverse set of dexterous hands show that T(R,O) Grasp achieves an average success rate of 94.83%, an inference speed of 0.21 s, and a throughput of 41 grasps per second on an NVIDIA A100 40GB GPU, substantially outperforming existing baselines. In addition, our approach is robust and generalizable across embodiments while significantly reducing memory consumption. More importantly, the high inference speed enables closed-loop dexterous manipulation, underscoring the potential of T(R,O) Grasp to scale into a foundation model for dexterous grasping.
comment: 12 pages, 14 figures
Residual MPC: Blending Reinforcement Learning with GPU-Parallelized Model Predictive Control
Model Predictive Control (MPC) provides interpretable, tunable locomotion controllers grounded in physical models, but its robustness depends on frequent replanning and is limited by model mismatch and real-time computational constraints. Reinforcement Learning (RL), by contrast, can produce highly robust behaviors through stochastic training but often lacks interpretability, suffers from out-of-distribution failures, and requires intensive reward engineering. This work presents a GPU-parallelized residual architecture that tightly integrates MPC and RL by blending their outputs at the torque-control level. We develop a kinodynamic whole-body MPC formulation evaluated across thousands of agents in parallel at 100 Hz for RL training. The residual policy learns to make targeted corrections to the MPC outputs, combining the interpretability and constraint handling of model-based control with the adaptability of RL. The model-based control prior acts as a strong bias, initializing and guiding the policy towards desirable behavior with a simple set of rewards. Compared to standalone MPC or end-to-end RL, our approach achieves higher sample efficiency, converges to greater asymptotic rewards, expands the range of trackable velocity commands, and enables zero-shot adaptation to unseen gaits and uneven terrain.
comment: TRO submission preprint
Reflection-Based Task Adaptation for Self-Improving VLA
Pre-trained Vision-Language-Action (VLA) models represent a major leap towards general-purpose robots, yet efficiently adapting them to novel, specific tasks in-situ remains a significant hurdle. While reinforcement learning (RL) is a promising avenue for such adaptation, the process often suffers from low efficiency, hindering rapid task mastery. We introduce Reflective Self-Adaptation, a framework for rapid, autonomous task adaptation without human intervention. Our framework establishes a self-improving loop where the agent learns from its own experience to enhance both strategy and execution. The core of our framework is a dual-pathway architecture that addresses the full adaptation lifecycle. First, a Failure-Driven Reflective RL pathway enables rapid learning by using the VLM's causal reasoning to automatically synthesize a targeted, dense reward function from failure analysis. This provides a focused learning signal that significantly accelerates policy exploration. However, optimizing such proxy rewards introduces a potential risk of "reward hacking," where the agent masters the reward function but fails the actual task. To counteract this, our second pathway, Success-Driven Quality-Guided SFT, grounds the policy in holistic success. It identifies and selectively imitates high-quality successful trajectories, ensuring the agent remains aligned with the ultimate task goal. This pathway is strengthened by a conditional curriculum mechanism to aid initial exploration. We conduct experiments in challenging manipulation tasks. The results demonstrate that our framework achieves faster convergence and higher final success rates compared to representative baselines. Our work presents a robust solution for creating self-improving agents that can efficiently and reliably adapt to new environments.
EReLiFM: Evidential Reliability-Aware Residual Flow Meta-Learning for Open-Set Domain Generalization under Noisy Labels
Open-Set Domain Generalization (OSDG) aims to enable deep learning models to recognize unseen categories in new domains, which is crucial for real-world applications. Label noise hinders open-set domain generalization by corrupting source-domain knowledge, making it harder to recognize known classes and reject unseen ones. While existing methods address OSDG under Noisy Labels (OSDG-NL) using hyperbolic prototype-guided meta-learning, they struggle to bridge domain gaps, especially with limited clean labeled data. In this paper, we propose Evidential Reliability-Aware Residual Flow Meta-Learning (EReLiFM). We first introduce an unsupervised two-stage evidential loss clustering method to promote label reliability awareness. Then, we propose a residual flow matching mechanism that models structured domain- and category-conditioned residuals, enabling diverse and uncertainty-aware transfer paths beyond interpolation-based augmentation. During this meta-learning process, the model is optimized such that the update direction on the clean set maximizes the loss decrease on the noisy set, using pseudo labels derived from the most confident predicted class for supervision. Experimental results show that EReLiFM outperforms existing methods on OSDG-NL, achieving state-of-the-art performance. The source code is available at https://github.com/KPeng9510/ERELIFM.
comment: The source code is available at https://github.com/KPeng9510/ERELIFM
Autonomous Legged Mobile Manipulation for Lunar Surface Operations via Constrained Reinforcement Learning
Robotics plays a pivotal role in planetary science and exploration, where autonomous and reliable systems are crucial due to the risks and challenges inherent to space environments. The establishment of permanent lunar bases demands robotic platforms capable of navigating and manipulating in the harsh lunar terrain. While wheeled rovers have been the mainstay for planetary exploration, their limitations in unstructured and steep terrains motivate the adoption of legged robots, which offer superior mobility and adaptability. This paper introduces a constrained reinforcement learning framework designed for autonomous quadrupedal mobile manipulators operating in lunar environments. The proposed framework integrates whole-body locomotion and manipulation capabilities while explicitly addressing critical safety constraints, including collision avoidance, dynamic stability, and power efficiency, in order to ensure robust performance under lunar-specific conditions, such as reduced gravity and irregular terrain. Experimental results demonstrate the framework's effectiveness in precise 6D task-space end-effector pose tracking, with an average positional accuracy of 4 cm and orientation accuracy of 8.1 degrees. The system consistently respects both soft and hard constraints, exhibiting adaptive behaviors optimized for lunar gravity conditions. This work effectively bridges adaptive learning with essential mission-critical safety requirements, paving the way for advanced autonomous robotic explorers for future lunar missions.
comment: This is the authors' version of the paper accepted for publication in The IEEE International Conference on Space Robotics 2025. The final version link will be added here after conference proceedings are published.
Maximal Adaptation, Minimal Guidance: Permissive Reactive Robot Task Planning with Humans in the Loop
We present a novel framework for human-robot \emph{logical} interaction that enables robots to reliably satisfy (infinite horizon) temporal logic tasks while effectively collaborating with humans who pursue independent and unknown tasks. The framework combines two key capabilities: (i) \emph{maximal adaptation} enables the robot to adjust its strategy \emph{online} to exploit human behavior for cooperation whenever possible, and (ii) \emph{minimal tunable feedback} enables the robot to request cooperation by the human online only when necessary to guarantee progress. This balance minimizes human-robot interference, preserves human autonomy, and ensures persistent robot task satisfaction even under conflicting human goals. We validate the approach in a real-world block-manipulation task with a Franka Emika Panda robotic arm and in the Overcooked-AI benchmark, demonstrating that our method produces rich, \emph{emergent} cooperative behaviors beyond the reach of existing approaches, while maintaining strong formal guarantees.
Designing Tools with Control Confidence
Prehistoric humans invented stone tools for specialized tasks by not just maximizing the tool's immediate goal-completion accuracy, but also increasing their confidence in the tool for later use under similar settings. This factor contributed to the increased robustness of the tool, i.e., the least performance deviations under environmental uncertainties. However, the current autonomous tool design frameworks solely rely on performance optimization, without considering the agent's confidence in tool use for repeated use. Here, we take a step towards filling this gap by i) defining an optimization framework for task-conditioned autonomous hand tool design for robots, where ii) we introduce a neuro-inspired control confidence term into the optimization routine that helps the agent to design tools with higher robustness. Through rigorous simulations using a robotic arm, we show that tools designed with control confidence as the objective function are more robust to environmental uncertainties during tool use than a pure accuracy-driven objective. We further show that adding control confidence to the objective function for tool design provides a balance between the robustness and goal accuracy of the designed tools under control perturbations. Finally, we show that our CMAES-based evolutionary optimization strategy for autonomous tool design outperforms other state-of-the-art optimizers by designing the optimal tool within the fewest iterations. Code: https://github.com/ajitham123/Tool_design_control_confidence.
Learning Robust Agile Flight Control with Stability Guarantees
In the evolving landscape of high-speed agile quadrotor flight, achieving precise trajectory tracking at the platform's operational limits is paramount. Controllers must handle actuator constraints, exhibit robustness to disturbances, and remain computationally efficient for safety-critical applications. In this work, we present a novel neural-augmented feedback controller for agile flight control. The controller addresses individual limitations of existing state-of-the-art control paradigms and unifies their strengths. We demonstrate the controller's capabilities, including the accurate tracking of highly aggressive trajectories that surpass the feasibility of the actuators. Notably, the controller provides universal stability guarantees, enhancing its robustness and tracking performance even in exceedingly disturbance-prone settings. Its nonlinear feedback structure is highly efficient, enabling fast computation at high update rates. Moreover, the learning process in simulation is both fast and stable, and the controller's inherent robustness allows direct deployment to real-world platforms without the need for training augmentations or fine-tuning.
CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving
End-to-end autonomous driving models trained solely with imitation learning (IL) often suffer from poor generalization. In contrast, reinforcement learning (RL) promotes exploration through reward maximization but faces challenges such as sample inefficiency and unstable convergence. A natural solution is to combine IL and RL. Moving beyond the conventional two-stage paradigm (IL pretraining followed by RL fine-tuning), we propose CoIRL-AD, a competitive dual-policy framework that enables IL and RL agents to interact during training. CoIRL-AD introduces a competition-based mechanism that facilitates knowledge exchange while preventing gradient conflicts. Experiments on the nuScenes dataset show an 18% reduction in collision rate compared to baselines, along with stronger generalization and improved performance on long-tail scenarios. Code is available at: https://github.com/SEU-zxj/CoIRL-AD.
comment: 18 pages, 17 figures
Two-stream network-driven vision-based tactile sensor for object feature extraction and fusion perception
Tactile perception is crucial for embodied intelligent robots to recognize objects. Vision-based tactile sensors extract object physical attributes multidimensionally using high spatial resolution; however, this process generates abundant redundant information. Furthermore, single-dimensional extraction, lacking effective fusion, fails to fully characterize object attributes. These challenges hinder the improvement of recognition accuracy. To address this issue, this study introduces a two-stream network feature extraction and fusion perception strategy for vision-based tactile systems. This strategy employs a distributed approach to extract internal and external object features. It obtains depth map information through three-dimensional reconstruction while simultaneously acquiring hardness information by measuring contact force data. After extracting features with a convolutional neural network (CNN), weighted fusion is applied to create a more informative and effective feature representation. In standard tests on objects of varying shapes and hardness, the force prediction error is 0.06 N (within a 12 N range). Hardness recognition accuracy reaches 98.0%, and shape recognition accuracy reaches 93.75%. With fusion algorithms, object recognition accuracy in actual grasping scenarios exceeds 98.5%. Focused on the perception of objects' physical attributes, this method enhances the artificial tactile system's ability to transition from perception to cognition, enabling its use in embodied perception applications.
Automated Behavior Planning for Fruit Tree Pruning via Redundant Robot Manipulators: Addressing the Behavior Planning Challenge
Pruning is an essential agricultural practice for orchards. Proper pruning can promote healthier growth and optimize fruit production throughout the orchard's lifespan. Robot manipulators have been developed as an automated solution for this repetitive task, which typically requires seasonal labor with specialized skills. While previous research has primarily focused on the challenges of perception, the complexities of manipulation are often overlooked. These challenges involve planning and control in both joint and Cartesian spaces to guide the end-effector through intricate, obstructive branches. Our work addresses the behavior planning challenge for a robotic pruning system, which entails a multi-level planning problem in environments with complex collisions. In this paper, we formulate the planning problem for a high-dimensional robotic arm in a pruning scenario, investigate the system's intrinsic redundancies, and propose a comprehensive pruning workflow that integrates perception, modeling, and holistic planning. In our experiments, we demonstrate that more comprehensive planning methods can significantly enhance the performance of the robotic manipulator. Finally, we implement the proposed workflow on a real-world robot. As a result, this work complements previous efforts on robotic pruning and motivates future research and development in planning for pruning applications.
Fast Visuomotor Policy for Robotic Manipulation
We present a fast and effective policy framework for robotic manipulation, named Energy Policy, designed for high-frequency robotic tasks and resource-constrained systems. Unlike existing robotic policies, Energy Policy natively predicts multimodal actions in a single forward pass, enabling high-precision manipulation at high speed. The framework is built upon two core components. First, we adopt the energy score as the learning objective to facilitate multimodal action modeling. Second, we introduce an energy MLP to implement the proposed objective while keeping the architecture simple and efficient. We conduct comprehensive experiments in both simulated environments and real-world robotic tasks to evaluate the effectiveness of Energy Policy. The results show that Energy Policy matches or surpasses the performance of state-of-the-art manipulation methods while significantly reducing computational overhead. Notably, on the MimicGen benchmark, Energy Policy achieves superior performance with faster inference than existing approaches.
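For readers unfamiliar with the energy score mentioned in this abstract, the following is a minimal sketch of the standard sample-based estimator (distance to the target minus half the spread between samples; lower is better). It is not the authors' implementation: the function name `energy_score`, the exponent `beta`, and the array shapes are assumptions made for illustration, and the abstract does not say how the energy MLP produces its samples.

```python
import numpy as np


def energy_score(samples: np.ndarray, target: np.ndarray, beta: float = 1.0) -> float:
    """Sample-based estimate of the energy score for one observation.

    samples: (m, d) array of m >= 2 actions drawn from the model (hypothetical).
    target:  (d,) demonstrated action.
    beta:    distance exponent, assumed to lie in (0, 2).
    """
    m = samples.shape[0]
    # Attraction term: mean distance from each sample to the target.
    to_target = np.linalg.norm(samples - target, axis=1) ** beta
    # Spread term: mean distance over all ordered pairs of distinct samples.
    diffs = samples[:, None, :] - samples[None, :, :]
    pairwise = np.linalg.norm(diffs, axis=-1) ** beta
    spread = pairwise.sum() / (m * (m - 1))
    return float(to_target.mean() - 0.5 * spread)
```

The spread term is what rewards diverse samples: two samples collapsed onto a wrong action score worse than two samples that straddle the target, which is one reason a score of this kind suits multimodal action distributions.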
A Task-Efficient Reinforcement Learning Task-Motion Planner for Safe Human-Robot Cooperation
In a Human-Robot Cooperation (HRC) environment, safety and efficiency are the two core properties used to evaluate robot performance. However, safety mechanisms usually hinder task efficiency since human intervention will cause backup motions and goal failures of the robot. Frequent motion replanning will increase the computational load and the chance of failure. In this paper, we present a hybrid Reinforcement Learning (RL) planning framework that comprises an interactive motion planner and an RL task planner. The RL task planner attempts to choose statistically safe and efficient task sequences based on the feedback from the motion planner, while the motion planner keeps the task execution process collision-free by detecting human arm motions and deploying new paths when the previous path is no longer valid. Intuitively, the RL agent will learn to avoid dangerous tasks, while the motion planner ensures that the chosen tasks are safe. The proposed framework is validated on a cobot in both simulation and the real world, where we compare the planner with hard-coded task motion planning methods. The results show that our planning framework can 1) react to uncertain human motions at both joint and task levels; 2) reduce the number of repeated failed goal commands; 3) reduce the total number of replanning requests.
M3D-skin: Multi-material 3D-printed Tactile Sensor with Hierarchical Infill Structures for Pressure Sensing IROS2025
Tactile sensors have a wide range of applications, from utilization in robotic grippers to human motion measurement. If tactile sensors could be fabricated and integrated more easily, their applicability would further expand. In this study, we propose a tactile sensor, M3D-skin, that can be easily fabricated with high versatility by leveraging the infill patterns of a multi-material fused deposition modeling (FDM) 3D printer as the sensing principle. This method employs conductive and non-conductive flexible filaments to create a hierarchical structure with a specific infill pattern. The flexible hierarchical structure deforms under pressure, leading to a change in electrical resistance, enabling the acquisition of tactile information. We measure the changes in characteristics of the proposed tactile sensor caused by modifications to the hierarchical structure. Additionally, we demonstrate the fabrication and use of a multi-tile sensor. Furthermore, as applications, we implement motion pattern measurement on the sole of a foot, integration with a robotic hand, and tactile-based robotic operations. Through these experiments, we validate the effectiveness of the proposed tactile sensor.
comment: Accepted to IROS2025, Website: https://ssk-yoshimura.github.io/M3D-skin/
Robot Learning: A Tutorial
Robot learning is at an inflection point, driven by rapid advancements in machine learning and the growing availability of large-scale robotics data. This shift from classical, model-based methods to data-driven, learning-based paradigms is unlocking unprecedented capabilities in autonomous systems. This tutorial navigates the landscape of modern robot learning, charting a course from the foundational principles of Reinforcement Learning and Behavioral Cloning to generalist, language-conditioned models capable of operating across diverse tasks and even robot embodiments. This work is intended as a guide for researchers and practitioners, and our goal is to equip the reader with the conceptual understanding and practical tools necessary to contribute to developments in robot learning, with ready-to-use examples implemented in $\texttt{lerobot}$.
comment: Tutorial on Robot Learning using LeRobot, the end-to-end robot learning library developed by Hugging Face
Improving Generative Behavior Cloning via Self-Guidance and Adaptive Chunking NeurIPS25
Generative Behavior Cloning (GBC) is a simple yet effective framework for robot learning, particularly in multi-task settings. Recent GBC methods often employ diffusion policies with open-loop (OL) control, where actions are generated via a diffusion process and executed in multi-step chunks without replanning. While this approach has demonstrated strong success rates and generalization, its inherent stochasticity can result in erroneous action sampling, occasionally leading to unexpected task failures. Moreover, OL control suffers from delayed responses, which can degrade performance in noisy or dynamic environments. To address these limitations, we propose two novel techniques to enhance the consistency and reactivity of diffusion policies: (1) self-guidance, which improves action fidelity by leveraging past observations and implicitly promoting future-aware behavior; and (2) adaptive chunking, which selectively updates action sequences when the benefits of reactivity outweigh the need for temporal consistency. Extensive experiments show that our approach substantially improves GBC performance across a wide range of simulated and real-world robotic manipulation tasks. Our code is available at https://github.com/junhyukso/SGAC
comment: Accepted at NeurIPS25
Controlling Intent Expressiveness in Robot Motion with Diffusion Models
Legibility of robot motion is critical in human-robot interaction, as it allows humans to quickly infer a robot's intended goal. Although traditional trajectory generation methods typically prioritize efficiency, they often fail to make the robot's intentions clear to humans. Meanwhile, existing approaches to legible motion usually produce only a single "most legible" trajectory, overlooking the need to modulate intent expressiveness in different contexts. In this work, we propose a novel motion generation framework that enables controllable legibility across the full spectrum, from highly legible to highly ambiguous motions. We introduce a modeling approach based on an Information Potential Field to assign continuous legibility scores to trajectories, and build upon it with a two-stage diffusion framework that first generates paths at specified legibility levels and then translates them into executable robot actions. Experiments in both 2D and 3D reaching tasks demonstrate that our approach produces diverse and controllable motions with varying degrees of legibility, while achieving performance comparable to SOTA. Code and project page: https://legibility-modulator.github.io.
comment: Using diffusion models trained on quality diversity datasets for generating robot motions with adjustable legibility levels
Pretraining in Actor-Critic Reinforcement Learning for Robot Motion Control ICLR 2026
The pretraining-finetuning paradigm has facilitated numerous transformative advancements in artificial intelligence research in recent years. However, in the domain of reinforcement learning (RL) for robot motion control, individual skills are often learned from scratch despite the high likelihood that some generalizable knowledge is shared across all task-specific policies belonging to a single robot embodiment. This work aims to define a paradigm for pretraining neural network models that encapsulate such knowledge and can subsequently serve as a basis for warm-starting the RL process in classic actor-critic algorithms, such as Proximal Policy Optimization (PPO). We begin with a task-agnostic exploration-based data collection algorithm to gather diverse, dynamic transition data, which is then used to train a Proprioceptive Inverse Dynamics Model (PIDM) through supervised learning. The pretrained weights are loaded into both the actor and critic networks to warm-start the policy optimization of actual tasks. We systematically validated our proposed method on seven distinct robot motion control tasks, showing significant benefits to this initialization strategy. Our proposed approach on average improves sample efficiency by 40.1% and task performance by 7.5%, compared to random initialization. We further present key ablation studies and empirical analyses that shed light on the mechanisms behind the effectiveness of our method.
comment: Submitted to ICLR 2026
A Unidirectionally Connected FAS Approach for 6-DOF Quadrotor Control
This paper proposes a unidirectionally connected fully actuated system (UC-FAS) approach for the sub-stabilization and tracking control of 6-DOF quadrotors, addressing some limitations of both the state-space and FAS frameworks. The framework systematically converts underactuated quadrotor dynamics into a UC-FAS model, unifying the existing FAS transformation methods. By eliminating the estimation of high-order derivatives of the control inputs, a drawback of current methods, the UC-FAS model simplifies controller design and enables direct eigenstructure assignment for closed-loop dynamics. Simulations demonstrate precise 6-DOF tracking performance. This work bridges theoretical advances in the FAS approach with practical implementation needs, offering a standardized paradigm for nonlinear quadrotor control.
comment: This paper has been submitted to 2026 IFAC World Congress. Corresponding author: Guang-Ren Duan
PolygMap: A Perceptive Locomotion Framework for Humanoid Robot Stair Climbing
Recently, biped robot walking technology has developed significantly, mainly in the context of blind walking schemes. To emulate human walking, robots need to step accurately on the positions they see in unknown spaces. In this paper, we present PolygMap, a perception-based locomotion planning framework for humanoid robots to climb stairs. Our core idea is to build a real-time polygonal staircase plane semantic map, followed by a footstep planner that uses these polygonal plane segments. Plane segmentation and visual odometry are performed by multi-sensor fusion (LiDAR, RGB-D camera, and IMUs). The proposed framework is deployed on an NVIDIA Orin, which outputs whole-body motion plans at 20-30 Hz. Both indoor and outdoor real-scene experiments indicate that our method is efficient and robust for humanoid robot stair climbing.
Achieving Meaningful Collaboration: Worker-centered Design of a Physical Human-Robot Collaborative Blending Task ICRA
The use of robots in industrial settings continues to grow, driven by the need to address complex societal challenges such as labor shortages, aging populations, and ever-increasing production demands. In this abstract, we advocate for (and demonstrate) a transdisciplinary approach when considering robotics in the workplace. Transdisciplinarity emphasizes the integration of academic research with pragmatic expertise and embodied experiential knowledge, prioritizing values such as worker wellbeing and job attractiveness. In the following, we describe an ongoing multi-pronged effort to explore the potential of collaborative robots in the context of airplane engine repair and maintenance operations.
comment: 3 pages, 1 figure, ICRA@40 (Extended abstract)
Shape-Aware Whole-Body Control for Continuum Robots with Application in Endoluminal Surgical Robotics
This paper presents a shape-aware whole-body control framework for tendon-driven continuum robots with direct application to endoluminal surgical navigation. Endoluminal procedures, such as bronchoscopy, demand precise and safe navigation through tortuous, patient-specific anatomy where conventional tip-only control often leads to wall contact, tissue trauma, or failure to reach distal targets. To address these challenges, our approach combines a physics-informed backbone model with residual learning through an Augmented Neural ODE, enabling accurate shape estimation and efficient Jacobian computation. A sampling-based Model Predictive Path Integral (MPPI) controller leverages this representation to jointly optimize tip tracking, backbone conformance, and obstacle avoidance under actuation constraints. A task manager further enhances adaptability by allowing real-time adjustment of objectives, such as wall clearance or direct advancement, during tele-operation. Extensive simulation studies demonstrate millimeter-level accuracy across diverse scenarios, including trajectory tracking, dynamic obstacle avoidance, and shape-constrained reaching. Real-robot experiments on a bronchoscopy phantom validate the framework, showing improved lumen-following accuracy, reduced wall contacts, and enhanced adaptability compared to joystick-only navigation and existing baselines. These results highlight the potential of the proposed framework to increase safety, reliability, and operator efficiency in minimally invasive endoluminal surgery, with broader applicability to other confined and safety-critical environments.
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
Vision-language-action (VLA) models have recently shown strong potential in enabling robots to follow language instructions and execute precise actions. However, most VLAs are built upon vision-language models pretrained solely on 2D data, which lack accurate spatial awareness and hinder their ability to operate in the 3D physical world. Existing solutions attempt to incorporate explicit 3D sensor inputs such as depth maps or point clouds, but these approaches face challenges due to sensor noise, hardware heterogeneity, and incomplete depth coverage in existing datasets. Alternative methods that estimate 3D cues from 2D images also suffer from the limited performance of depth estimators. We propose Spatial Forcing (SF), a simple yet effective alignment strategy that implicitly forces VLA models to develop spatial comprehension capabilities without relying on explicit 3D inputs or depth estimators. SF aligns intermediate visual embeddings of VLAs with geometric representations produced by pretrained 3D foundation models. By enforcing alignment at intermediate layers, SF guides VLAs to encode richer spatial representations that enhance action precision. Extensive experiments in simulation and real-world environments demonstrate that SF achieves state-of-the-art results, surpassing both 2D- and 3D-based VLAs. SF further accelerates training by up to 3.8x and improves data efficiency across diverse robotic tasks. Project page is at https://spatial-forcing.github.io/
Learning Social Navigation from Positive and Negative Demonstrations and Rule-Based Specifications
Mobile robot navigation in dynamic human environments requires policies that balance adaptability to diverse behaviors with compliance to safety constraints. We hypothesize that integrating data-driven rewards with rule-based objectives enables navigation policies to achieve a more effective balance of adaptability and safety. To this end, we develop a framework that learns a density-based reward from positive and negative demonstrations and augments it with rule-based objectives for obstacle avoidance and goal reaching. A sampling-based lookahead controller produces supervisory actions that are both safe and adaptive, which are subsequently distilled into a compact student policy suitable for real-time operation with uncertainty estimates. Experiments in synthetic and elevator co-boarding simulations show consistent gains in success rate and time efficiency over baselines, and real-world demonstrations with human participants confirm the practicality of deployment. A video illustrating this work can be found on our project page https://chanwookim971024.github.io/PioneeR/.
comment: For more videos, see https://chanwookim971024.github.io/PioneeR/
Controllable Collision Scenario Generation via Collision Pattern Prediction ICRA
Evaluating the safety of autonomous vehicles (AVs) requires diverse, safety-critical scenarios, with collisions being especially important yet rare and unsafe to collect in the real world. Therefore, the community has been focusing on generating safety-critical scenarios in simulation. However, controlling attributes such as collision type and time-to-accident (TTA) remains challenging. We introduce a new task called controllable collision scenario generation, where the goal is to produce trajectories that realize a user-specified collision type and TTA, to investigate the feasibility of automatically generating desired collision scenarios. To support this task, we present COLLIDE, a large-scale collision scenario dataset constructed by transforming real-world driving logs into diverse collisions, balanced across five representative collision types and different TTA intervals. We propose a framework that predicts Collision Pattern, a compact and interpretable representation that captures the spatial configuration of the ego and the adversarial vehicles at impact, before rolling out full adversarial trajectories. Experiments show that our approach outperforms strong baselines in both collision rate and controllability. Furthermore, generated scenarios consistently induce higher planner failure rates, revealing limitations of existing planners. We demonstrate that these scenarios fine-tune planners for robustness improvements, contributing to safer AV deployment in different collision scenarios.
comment: 8 pages, 3 figures. Submitted to IEEE International Conference on Robotics and Automation (ICRA) 2026
UniGS: Unified Geometry-Aware Gaussian Splatting for Multimodal Rendering
In this paper, we propose UniGS, a unified map representation and differentiable framework for high-fidelity multimodal 3D reconstruction based on 3D Gaussian Splatting. Our framework integrates a CUDA-accelerated rasterization pipeline capable of rendering photo-realistic RGB images, geometrically accurate depth maps, consistent surface normals, and semantic logits simultaneously. We redesign the rasterization to render depth via differentiable ray-ellipsoid intersection rather than using Gaussian centers, enabling effective optimization of rotation and scale attribute through analytic depth gradients. Furthermore, we derive the analytic gradient formulation for surface normal rendering, ensuring geometric consistency among reconstructed 3D scenes. To improve computational and storage efficiency, we introduce a learnable attribute that enables differentiable pruning of Gaussians with minimal contribution during training. Quantitative and qualitative experiments demonstrate state-of-the-art reconstruction accuracy across all modalities, validating the efficacy of our geometry-aware paradigm. Source code and multimodal viewer will be available on GitHub.
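The ray-ellipsoid depth idea mentioned in this abstract can be illustrated with a small geometric sketch. This is not the UniGS rasterizer, which is CUDA-based and differentiable. It only shows the textbook intersection test: map the ray into the ellipsoid's local frame, where the surface becomes a unit sphere, then solve a quadratic. The function name, the argument layout, and treating the ellipsoid as a fixed iso-surface with semi-axes `scales` are all assumptions.

```python
import numpy as np


def ray_ellipsoid_depth(origin, direction, center, rotation, scales):
    """Ray parameter t of the first hit on an ellipsoid, or None on a miss.

    rotation: (3, 3) matrix whose columns are the ellipsoid axes in world space.
    scales:   (3,) semi-axis lengths (assumed iso-surface of the Gaussian).
    If `direction` has unit length, t is the distance along the ray.
    """
    # Express the ray in the ellipsoid frame, then rescale to a unit sphere.
    o = (rotation.T @ (origin - center)) / scales
    d = (rotation.T @ direction) / scales
    # Solve |o + t d|^2 = 1 for t.
    a = d @ d
    b = 2.0 * (o @ d)
    c = o @ o - 1.0
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # The ray misses the ellipsoid.
    t = (-b - np.sqrt(disc)) / (2.0 * a)
    # Simplification: hits behind the origin (or from inside) are ignored.
    return float(t) if t > 0.0 else None
```

Because the hit point depends on `rotation` and `scales`, and not only on `center`, a depth defined this way carries gradient information about the ellipsoid's orientation and extent, which matches the motivation given in the abstract.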
Hybrid Terrain-Aware Path Planning: Integrating VD--RRT\(^{*}\) Exploration and VD--D\(^{*}\) Lite Repair
Autonomous ground vehicles operating off-road must plan curvature-feasible paths while accounting for spatially varying soil strength and slope hazards in real time. We present a continuous state--cost metric that combines a Bekker pressure--sinkage model with elevation-derived slope and attitude penalties. The resulting terrain cost field is analytic, bounded, and monotonic in soil modulus and slope, ensuring well-posed discretization and stable updates under sensor noise. This metric is evaluated on a lattice with exact steering primitives: Dubins and Reeds--Shepp motions for differential drive and time-parameterized bicycle arcs for Ackermann steering. Global exploration is performed using Vehicle-Dynamics RRT\(^{*}\), while local repair is managed by Vehicle-Dynamics D\(^{*}\) Lite, enabling millisecond-scale replanning without heuristic smoothing. By separating the terrain--vehicle model from the planner, the framework provides a reusable basis for deterministic, sampling-based, or learning-driven planning in deformable terrain. Hardware trials on an off-road platform demonstrate real-time navigation across soft soil and slope transitions, supporting reliable autonomy in unstructured environments.
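As background for the Bekker pressure-sinkage model named in this abstract, here is a minimal sketch of the classical relation p = (k_c / b + k_phi) * z^n, solved for sinkage z. The function name, parameter names, and units are illustrative assumptions. How the paper combines sinkage with the slope and attitude penalties to form its cost field is not stated in the abstract and is not reproduced here.

```python
def bekker_sinkage(pressure_kpa: float, k_c: float, k_phi: float,
                   n: float, width_m: float) -> float:
    """Static sinkage z in metres from Bekker's p = (k_c / b + k_phi) * z**n.

    pressure_kpa: ground contact pressure p (kPa).
    k_c, k_phi:   cohesive and frictional soil moduli (assumed units
                  kN/m^(n+1) and kN/m^(n+2)).
    n:            sinkage exponent of the soil.
    width_m:      contact patch width b (m).
    """
    stiffness = k_c / width_m + k_phi
    return (pressure_kpa / stiffness) ** (1.0 / n)
```

For a fixed load, sinkage decreases as the soil moduli increase, which is consistent with the monotonicity in soil modulus that the abstract claims for its terrain cost.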
Gaussian Semantic Field for One-shot LiDAR Global Localization
We present a one-shot LiDAR global localization algorithm featuring semantic disambiguation ability based on a lightweight tri-layered scene graph. While landmark semantic registration-based methods have shown promising performance improvements in global localization compared with geometric-only methods, landmarks can be repetitive and misleading for correspondence establishment. We propose to mitigate this problem by modeling semantic distributions with continuous functions learned from a population of Gaussian processes. Compared with discrete semantic labels, the continuous functions capture finer-grained geo-semantic information and also provide more detailed metric information for correspondence establishment. We insert this continuous function as the middle layer between the object layer and the metric-semantic layer, forming a tri-layered 3D scene graph, serving as a light-weight yet performant backend for one-shot localization. We term our global localization pipeline Outram-GSF (Gaussian semantic field) and conduct a wide range of experiments on publicly available data sets, validating the superior performance against the current state-of-the-art.
Translating Milli/Microrobots with A Value-Centered Readiness Framework
Untethered mobile milli/microrobots hold transformative potential for interventional medicine by enabling more precise and entirely non-invasive diagnosis and therapy. Realizing this promise requires bridging the gap between groundbreaking laboratory demonstrations and successful clinical integration. Despite remarkable technical progress over the past two decades, most millirobots and microrobots remain confined to laboratory proof-of-concept demonstrations, with limited real-world feasibility. In this Review, we identify key factors that slow translation from bench to bedside, focusing on the disconnect between technical innovation and real-world application. We argue that the long-term impact and sustainability of the field depend on aligning development with unmet medical needs, ensuring applied feasibility, and integrating seamlessly into existing clinical workflows, which are essential pillars for delivering meaningful patient outcomes. To support this shift, we introduce a strategic milli/microrobot Technology Readiness Level framework (mTRL), which maps system development from initial conceptualization to clinical adoption through clearly defined milestones and their associated stepwise activities. The mTRL model provides a structured gauge of technological maturity, a common language for cross-disciplinary collaboration and actionable guidance to accelerate translational development toward new, safer and more efficient interventions.
EmboMatrix: A Scalable Training-Ground for Embodied Decision-Making
Embodied decision-making enables agents to translate high-level goals into executable actions through continuous interactions within the physical world, forming a cornerstone of general-purpose embodied intelligence. Large language models (LLMs), with their general decision-making capabilities, offer a promising path to realize this potential; however, LLMs trained solely on language lack exposure to physical environments, limiting their true embodied understanding. To bridge this gap, we propose the concept of a training ground: a comprehensive infrastructure that provides task and scene simulation, embodied interaction, and feedback signals, offering a one-stop solution for LLMs to acquire genuine embodied decision-making skills. In this work, we present EmboMatrix, the first training ground of its kind, providing massive and diverse tasks with efficient simulation and precise rewards. EmboMatrix incorporates a series of novel techniques: a multi-agent data engine for large-scale task and scene generation, a distributed heterogeneous-hardware system for scalable simulation, and a multi-level reward architecture for precise supervision. Leveraging EmboMatrix, we cultivate EmboBrain, an LLM whose embodied decision-making abilities emerge from extensive embodied interactions. Experiments show that EmboBrain-7B surpasses the 671B DeepSeek-R1 baseline by 9.5\% on two challenging embodied decision-making benchmarks, demonstrating the power of interactive, environment-grounded learning for building truly intelligent embodied agents.
comment: 10 pages, 8 figures
Kinematic Kitbashing for Modeling Functional Articulated Objects
We introduce Kinematic Kitbashing, an automatic framework that synthesizes functionality-aware articulated objects by reusing parts from existing models. Given a kinematic graph with a small collection of articulated parts, our optimizer jointly solves for the spatial placement of every part so that (i) attachments remain geometrically sound over the entire range of motion and (ii) the assembled object satisfies user-specified functional goals such as collision-free actuation, reachability, or trajectory following. At its core is a kinematics-aware attachment energy that aligns vector distance function features sampled across multiple articulation snapshots. We embed this attachment term within an annealed Riemannian Langevin dynamics sampler that treats functionality objectives as additional energies, enabling robust global exploration while accommodating non-differentiable functionality objectives and constraints. Our framework produces a wide spectrum of assembled articulated shapes, from trash-can wheels grafted onto car bodies to multi-segment lamps, gear-driven paddlers, and reconfigurable furniture, and delivers strong quantitative improvements over state-of-the-art baselines across geometric, kinematic, and functional metrics. By tightly coupling articulation-aware geometry matching with functionality-driven optimization, Kinematic Kitbashing bridges part-based shape modeling and functional assembly design, empowering rapid creation of interactive articulated assets.
Development of a Linear Guide-Rail Testbed for Physically Emulating ISAM Operations
In-Space Servicing, Assembly, and Manufacturing (ISAM) is a set of emerging operations that provides several benefits to improve the longevity, capacity, mobility, and expandability of existing and future space assets. Serial robotic manipulators are particularly vital in accomplishing ISAM operations; however, the complex perturbation forces and motions associated with movement of a robotic arm on a free-flying satellite present a complex controls problem requiring additional study. While many dynamical models have been developed, experimentally testing and validating these models is challenging given that the models operate in space, where satellites have six degrees of freedom (6-DOF). This paper attempts to resolve those challenges by presenting the design and development of a new hardware-in-the-loop (HIL) experimental testbed utilized to emulate ISAM. This emulation will be accomplished by means of a 6-DOF UR3e robotic arm attached to a satellite bus. This satellite bus is mounted to a 1-DOF guide-rail system, enabling the satellite bus and robotic arm to move freely in one linear direction. This experimental ISAM emulation system will explore and validate models for space motion, serial robot manipulation, and contact mechanics.
comment: 12 pages, 4 figures, AAS/AIAA Space Flight Mechanics
Comparison of Forced and Unforced Rendezvous, Proximity Operations, and Docking Under Model Mismatch
This paper compares the required fuel usage for forced and unforced motion of a chaser satellite engaged in Rendezvous, Proximity Operations, and Docking (RPOD) maneuvers. Improved RPOD models are vital, particularly as the space industry expands and demands for improved fuel efficiency, cost effectiveness, and mission life span increase. This paper specifically examines the Clohessy-Wiltshire (CW) Equations and the extent of model mismatch by comparing predicted trajectories from this model with a more computationally complex, higher fidelity RPOD model. This paper assesses several test cases of similar mission parameters, in each case comparing natural motion circumnavigation (NMC) with comparable forced motion circumnavigation. The Guidance, Navigation, and Control (GNC) impulse maneuvers required to maintain the supposedly zero-fuel CW trajectories are representative of the extent of CW model mismatch. This paper demonstrates that unforced motions are not inherently more fuel efficient than forced motions, thus permitting extended orbital operations given the higher fuel efficiency.
comment: 12 pages, 4 figures, AAS/AIAA Space Flight Mechanics
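For readers unfamiliar with why a CW natural motion circumnavigation is "supposedly zero fuel", the sketch below propagates the standard closed-form in-plane CW solution and checks that the drift-free initial condition (along-track velocity equal to -2 n times the radial offset) yields a closed 2:1 relative ellipse. This is the textbook CW model only, not the paper's higher-fidelity model; the function names, the mean motion value, and the 100 m offset are illustrative assumptions.

```python
import math

def cw_propagate(state, n, t):
    """Closed-form in-plane Clohessy-Wiltshire solution.

    state = (x, y, vx, vy) in the target's LVLH frame, with x radial and
    y along-track; n is the target mean motion (rad/s); t is elapsed time (s).
    """
    x0, y0, vx0, vy0 = state
    s, c = math.sin(n * t), math.cos(n * t)
    x = (4 - 3 * c) * x0 + (s / n) * vx0 + (2 / n) * (1 - c) * vy0
    y = (6 * (s - n * t) * x0 + y0
         - (2 / n) * (1 - c) * vx0 + ((4 * s - 3 * n * t) / n) * vy0)
    vx = 3 * n * s * x0 + c * vx0 + 2 * s * vy0
    vy = -6 * n * (1 - c) * x0 - 2 * s * vx0 + (4 * c - 3) * vy0
    return (x, y, vx, vy)

def nmc_initial_state(radial_offset_m, n):
    """Drift-free NMC start: vy0 = -2 n x0 cancels the secular along-track term."""
    return (radial_offset_m, 0.0, 0.0, -2.0 * n * radial_offset_m)

n = 0.0011                      # rad/s, roughly a low Earth orbit (assumed)
period = 2.0 * math.pi / n
start = nmc_initial_state(100.0, n)
after_one_orbit = cw_propagate(start, n, period)
quarter = cw_propagate(start, n, period / 4.0)
```

Under the CW model the chaser returns to its starting state after one orbit with no thrust. Any impulses needed to hold this trajectory in a higher-fidelity model are therefore a direct measure of the model mismatch the abstract refers to.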
UNCAP: Uncertainty-Guided Planning Using Natural Language Communication for Cooperative Autonomous Vehicles
Safe large-scale coordination of multiple cooperative connected autonomous vehicles (CAVs) hinges on communication that is both efficient and interpretable. Existing approaches either rely on transmitting high-bandwidth raw sensor data streams or neglect perception and planning uncertainties inherent in shared data, resulting in systems that are neither scalable nor safe. To address these limitations, we propose Uncertainty-Guided Natural Language Cooperative Autonomous Planning (UNCAP), a vision-language model-based planning approach that enables CAVs to communicate via lightweight natural language messages while explicitly accounting for perception uncertainty in decision-making. UNCAP features a two-stage communication protocol: (i) an ego CAV first identifies the subset of vehicles most relevant for information exchange, and (ii) the selected CAVs then transmit messages that quantitatively express their perception uncertainty. By selectively fusing messages that maximize mutual information, this strategy allows the ego vehicle to integrate only the most relevant signals into its decision-making, improving both the scalability and reliability of cooperative planning. Experiments across diverse driving scenarios show a 63% reduction in communication bandwidth with a 31% increase in driving safety score, a 61% reduction in decision uncertainty, and a four-fold increase in collision distance margin during near-miss events. Project website: https://uncap-project.github.io/
Actron3D: Learning Actionable Neural Functions from Videos for Transferable Robotic Manipulation
We present Actron3D, a framework that enables robots to acquire transferable 6-DoF manipulation skills from just a few monocular, uncalibrated, RGB-only human videos. At its core lies the Neural Affordance Function, a compact object-centric representation that distills actionable cues from diverse uncalibrated videos (geometry, visual appearance, and affordance) into a lightweight neural network, forming a memory bank of manipulation skills. During deployment, we adopt a pipeline that retrieves relevant affordance functions and transfers precise 6-DoF manipulation policies via coarse-to-fine optimization, enabled by continuous queries to the multimodal features encoded in the neural functions. Experiments in both simulation and the real world demonstrate that Actron3D significantly outperforms prior methods, achieving a 14.9 percentage point improvement in average success rate across 13 tasks while requiring only 2-3 demonstration videos per task.
comment: 8 pages, 5 figures
The Omega Turn: A General Turning Template for Elongate Robots
Elongate limbless robots have the potential to locomote through tightly packed spaces for applications such as search-and-rescue and industrial inspections. The capability to effectively and robustly maneuver elongate limbless robots is crucial to realize such potential. However, there has been limited research on turning strategies for such systems. To achieve effective and robust turning performance in cluttered spaces, we take inspiration from a microscopic nematode, C. elegans, which exhibits remarkable maneuverability in rheologically complex environments partially because of its ability to perform omega turns. Despite recent efforts to analyze omega turn kinematics, it remains unknown if there exists a wave equation sufficient to prescribe an omega turn, let alone its reconstruction on robot platforms. Here, using a comparative theory-biology approach, we prescribe the omega turn as a superposition of two traveling waves. With wave equations as a guideline, we design a controller for limbless robots enabling robust and effective turning behaviors in lab and cluttered field environments. Finally, we show that such omega turn controllers can also generalize to elongate multi-legged robots, demonstrating an alternative effective body-driven turning strategy for elongate robots, with and without limbs.
Enhancing Sampling-based Planning with a Library of Paths
Path planning for 3D solid objects is a challenging problem, requiring a search in a six-dimensional configuration space, which is, nevertheless, essential in many robotic applications such as bin-picking and assembly. The commonly used sampling-based planners, such as Rapidly-exploring Random Trees, struggle with narrow passages where the sampling probability is low, increasing the time needed to find a solution. In scenarios like robotic bin-picking, various objects must be transported through the same environment. However, traditional planners start from scratch each time, losing valuable information gained during the planning process. We address this by using a library of past solutions, allowing the reuse of previous experiences even when planning for a new, previously unseen object. Paths for a set of objects are stored, and when planning for a new object, we find the most similar one in the library and use its paths as approximate solutions, adjusting for possible mutual transformations. The configuration space is then sampled along the approximate paths. Our method is tested in various narrow passage scenarios and compared with state-of-the-art methods from the OMPL library. Results show significant speed improvements (up to 85% decrease in the required time) of our method, often finding a solution in cases where the other planners fail. Our implementation of the proposed method is released as an open-source package.
Geometric Model Predictive Path Integral for Agile UAV Control with Online Collision Avoidance
In this letter, we introduce Geometric Model Predictive Path Integral (GMPPI), a sampling-based controller capable of tracking agile trajectories while avoiding obstacles. In each iteration, GMPPI generates a large number of candidate rollout trajectories and then averages them to create a nominal control to be followed by the Unmanned Aerial Vehicle (UAV). We propose using geometric SE(3) control to generate part of the rollout trajectories, significantly increasing precision in agile flight. Furthermore, we introduce varying rollout simulation time step length and dynamic cost and noise parameters, vastly improving tracking performance of smooth and low-speed trajectories over an existing Model Predictive Path Integral (MPPI) implementation. Finally, we propose an integration of GMPPI with a stereo depth camera, enabling online obstacle avoidance at high speeds, a crucial step towards autonomous UAV flights in complex environments. The proposed controller can track simulated agile reference trajectories with position error similar to the geometric SE(3) controller. However, the same configuration of the proposed controller can avoid obstacles in a simulated forest environment at speeds of up to 13 m/s, surpassing the performance of a state-of-the-art obstacle-aware planner. In real-world experiments, GMPPI retains the capability to track agile trajectories and avoids obstacles at speeds of up to 10 m/s.
comment: This work has been submitted to the IEEE for possible publication
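The rollout-averaging step mentioned in the GMPPI abstract can be sketched as follows. In standard MPPI the average is weighted by exp(-cost / temperature), so cheaper rollouts dominate the nominal control. Whether GMPPI uses exactly this weighting is not stated in the abstract, so the code is a generic MPPI sketch with scalar controls, and the names `mppi_weights`, `mppi_nominal_control`, and `temperature` are assumptions.

```python
import math

def mppi_weights(costs, temperature):
    """Softmin weights over rollout costs (minimum subtracted for stability)."""
    c_min = min(costs)
    unnormalized = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def mppi_nominal_control(rollout_controls, costs, temperature=1.0):
    """Cost-weighted average of control sequences.

    rollout_controls[k][t] is the (scalar) control of rollout k at step t.
    """
    weights = mppi_weights(costs, temperature)
    horizon = len(rollout_controls[0])
    return [
        sum(w * controls[t] for w, controls in zip(weights, rollout_controls))
        for t in range(horizon)
    ]

# Two rollouts over a two-step horizon; the cheaper rollout dominates.
nominal = mppi_nominal_control([[1.0, 1.0], [3.0, 5.0]], costs=[0.5, 4.0])
```

A lower temperature concentrates the weight on the best rollout, while a higher one approaches a plain mean.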
Gaussian Process Implicit Surfaces as Control Barrier Functions for Safe Robot Navigation
Level set methods underpin modern safety techniques such as control barrier functions (CBFs), while also serving as implicit surface representations for geometric shapes via distance fields. Inspired by these two paradigms, we propose a unified framework where the implicit surface itself acts as a CBF. We leverage Gaussian process (GP) implicit surface (GPIS) to represent the safety boundaries, using safety samples which are derived from sensor measurements to condition the GP. The GP posterior mean defines the implicit safety surface (safety belief), while the posterior variance provides a robust safety margin. Although GPs have favorable properties such as uncertainty estimation and analytical tractability, they scale cubically with data. To alleviate this issue, we develop a sparse solution called sparse Gaussian CBFs. To the best of our knowledge, GPIS have not been explicitly used to synthesize CBFs. We validate the approach on collision avoidance tasks in two settings: a simulated 7-DOF manipulator operating around the Stanford bunny, and a quadrotor navigating in 3D around a physical chair. In both cases, Gaussian CBFs (with and without sparsity) enable safe interaction and collision-free execution of trajectories that would otherwise intersect the objects.
comment: 8 pages, 7 figures, under review
SimULi: Real-Time LiDAR and Camera Simulation with Unscented Transforms
Rigorous testing of autonomous robots, such as self-driving vehicles, is essential to ensure their safety in real-world deployments. This requires building high-fidelity simulators to test scenarios beyond those that can be safely or exhaustively collected in the real world. Existing neural rendering methods based on NeRF and 3DGS hold promise but suffer from low rendering speeds or can only render pinhole camera models, hindering their suitability for applications that commonly require high-distortion lenses and LiDAR data. Multi-sensor simulation poses additional challenges as existing methods handle cross-sensor inconsistencies by favoring the quality of one modality at the expense of others. To overcome these limitations, we propose SimULi, the first method capable of rendering arbitrary camera models and LiDAR data in real time. Our method extends 3DGUT, which natively supports complex camera models, with LiDAR support, via an automated tiling strategy for arbitrary spinning LiDAR models and ray-based culling. To address cross-sensor inconsistencies, we design a factorized 3D Gaussian representation and anchoring strategy that reduces mean camera and depth error by up to 40% compared to existing methods. SimULi renders 10-20x faster than ray tracing approaches and 1.5-10x faster than prior rasterization-based work (and handles a wider range of camera models). When evaluated on two widely benchmarked autonomous driving datasets, SimULi matches or exceeds the fidelity of existing state-of-the-art methods across numerous camera and LiDAR metrics.
comment: Project page: https://research.nvidia.com/labs/sil/projects/simuli
Learning to Grasp Anything by Playing with Random Toys
Robotic manipulation policies often struggle to generalize to novel objects, limiting their real-world utility. In contrast, cognitive science suggests that children develop generalizable dexterous manipulation skills by mastering a small set of simple toys and then applying that knowledge to more complex items. Inspired by this, we study if similar generalization capabilities can also be achieved by robots. Our results indicate robots can learn generalizable grasping using randomly assembled objects that are composed from just four shape primitives: spheres, cuboids, cylinders, and rings. We show that training on these "toys" enables robust generalization to real-world objects, yielding strong zero-shot performance. Crucially, we find the key to this generalization is an object-centric visual representation induced by our proposed detection pooling mechanism. Evaluated in both simulation and on physical robots, our model achieves a 67% real-world grasping success rate on the YCB dataset, outperforming state-of-the-art approaches that rely on substantially more in-domain data. We further study how zero-shot generalization performance scales by varying the number and diversity of training toys and the demonstrations per toy. We believe this work offers a promising path to scalable and generalizable learning in robotic manipulation. Demonstration videos, code, checkpoints and our dataset are available on our project page: https://lego-grasp.github.io/ .
VLURes: Benchmarking VLM Visual and Linguistic Understanding in Low-Resource Languages
Vision Language Models (VLMs) are pivotal for advancing perception in intelligent agents. Yet, evaluation of VLMs remains limited to predominantly English-centric benchmarks in which the image-text pairs comprise short texts. To evaluate the fine-grained abilities of VLMs in four languages under long-text settings, we introduce VLURes, a novel multilingual benchmark featuring eight vision-and-language tasks and a pioneering unrelatedness task, to probe the fine-grained Visual and Linguistic Understanding capabilities of VLMs across English, Japanese, and the low-resource languages Swahili and Urdu. Our datasets, curated from web resources in the target language, encompass ten diverse image categories and rich textual context, introducing valuable vision-language resources for Swahili and Urdu. By prompting VLMs to generate responses and rationales, evaluated automatically and by native speakers, we uncover performance disparities across languages and tasks critical to intelligent agents, such as object recognition, scene understanding, and relationship understanding. We conducted evaluations of ten VLMs with VLURes. The best performing model, GPT-4o, achieves an overall accuracy of 90.8% and lags human performance by 6.7%, though the gap is larger for open-source models. The gap highlights VLURes' critical role in developing intelligent agents to tackle multi-modal visual reasoning.
ManiAgent: An Agentic Framework for General Robotic Manipulation
While Vision-Language-Action (VLA) models have demonstrated impressive capabilities in robotic manipulation, their performance in complex reasoning and long-horizon task planning is limited by data scarcity and model capacity. To address this, we introduce ManiAgent, an agentic architecture for general manipulation tasks that achieves end-to-end output from task descriptions and environmental inputs to robotic manipulation actions. In this framework, multiple agents communicate with one another to perform environmental perception, sub-task decomposition, and action generation, enabling efficient handling of complex manipulation scenarios. Evaluations show ManiAgent achieves an 86.8% success rate on the SimplerEnv benchmark and 95.8% on real-world pick-and-place tasks, enabling efficient data collection that yields VLA models with performance comparable to those trained on human-annotated datasets. The project webpage is available at https://yi-yang929.github.io/ManiAgent/.
comment: 8 pages, 6 figures, conference
REACT3D: Recovering Articulations for Interactive Physical 3D Scenes
Interactive 3D scenes are increasingly vital for embodied intelligence, yet existing datasets remain limited due to the labor-intensive process of annotating part segmentation, kinematic types, and motion trajectories. We present REACT3D, a scalable zero-shot framework that converts static 3D scenes into simulation-ready interactive replicas with consistent geometry, enabling direct use in diverse downstream tasks. Our contributions include: (i) openable-object detection and segmentation to extract candidate movable parts from static scenes, (ii) articulation estimation that infers joint types and motion parameters, (iii) hidden-geometry completion followed by interactive object assembly, and (iv) interactive scene integration in widely supported formats to ensure compatibility with standard simulation platforms. We achieve state-of-the-art performance on detection/segmentation and articulation metrics across diverse indoor scenes, demonstrating the effectiveness of our framework and providing a practical foundation for scalable interactive scene generation, thereby lowering the barrier to large-scale research on articulated scene understanding. Our project page is https://react3d.github.io/
comment: 8 pages
RoVer: Robot Reward Model as Test-Time Verifier for Vision-Language-Action Model
Vision-Language-Action (VLA) models have become a prominent paradigm for embodied intelligence, yet further performance improvements typically rely on scaling up training data and model size -- an approach that is prohibitively expensive for robotics and fundamentally limited by data collection costs. We address this limitation with $\mathbf{RoVer}$, an embodied test-time scaling framework that uses a $\mathbf{Ro}$bot Process Reward Model (PRM) as a Test-Time $\mathbf{Ver}$ifier to enhance the capabilities of existing VLA models without modifying their architectures or weights. Specifically, RoVer (i) assigns scalar-based process rewards to evaluate the reliability of candidate actions, and (ii) predicts an action-space direction for candidate expansion/refinement. During inference, RoVer generates multiple candidate actions concurrently from the base policy, expands them along PRM-predicted directions, and then scores all candidates with PRM to select the optimal action for execution. Notably, by caching shared perception features, it can amortize perception cost and evaluate more candidates under the same test-time computational budget. Essentially, our approach effectively transforms available computing resources into better action decision-making, realizing the benefits of test-time scaling without extra training overhead. Our contributions are threefold: (1) a general, plug-and-play test-time scaling framework for VLAs; (2) a PRM that jointly provides scalar process rewards and an action-space direction to guide exploration; and (3) an efficient direction-guided sampling strategy that leverages a shared perception cache to enable scalable candidate generation and selection during inference.
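The inference loop described in the RoVer abstract (sample candidates, expand them along a predicted direction, score everything, pick the best) can be sketched generically. The scoring and direction functions below are toy stand-ins, not the paper's PRM. `expand_and_select`, `toy_score`, `toy_direction`, `TARGET`, and the step size are all hypothetical, and the shared perception cache is not modeled.

```python
def expand_and_select(candidates, score_fn, direction_fn, step=0.5):
    """Return the best-scoring action among candidates and their refined copies."""
    pool = [list(action) for action in candidates]
    for action in candidates:
        direction = direction_fn(action)            # predicted refinement direction
        pool.append([a + step * d for a, d in zip(action, direction)])
    return max(pool, key=score_fn)                  # verifier picks the final action

# Toy stand-ins for the reward model: prefer actions close to a fixed target.
TARGET = [0.2, -0.1, 0.4]

def toy_score(action):
    return -sum((a - t) ** 2 for a, t in zip(action, TARGET))

def toy_direction(action):
    return [t - a for a, t in zip(action, TARGET)]

candidates = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [0.3, -0.2, 0.1]]
best = expand_and_select(candidates, toy_score, toy_direction)
```

The base policy's weights are never touched here. Extra compute at inference only enlarges and re-ranks the candidate pool, which is the sense in which the approach is a test-time scaling method.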
Towards Safe Maneuvering of Double-Ackermann-Steering Robots with a Soft Actor-Critic Framework IROS 2025
We present a deep reinforcement learning framework based on Soft Actor-Critic (SAC) for safe and precise maneuvering of double-Ackermann-steering mobile robots (DASMRs). Unlike holonomic or simpler non-holonomic robots such as differential-drive robots, DASMRs face strong kinematic constraints that make classical planners brittle in cluttered environments. Our framework leverages the Hindsight Experience Replay (HER) and the CrossQ overlay to encourage maneuvering efficiency while avoiding obstacles. Simulation results with a heavy four-wheel-steering rover show that the learned policy can robustly reach up to 97% of target positions while avoiding obstacles. Our framework does not rely on handcrafted trajectories or expert demonstrations.
comment: 4 pages, 3 figures, 2 tables, Accepted for Safety of Intelligent and Autonomous Vehicles: Formal Methods vs. Machine Learning approaches for reliable navigation (SIAV-FM2L), an IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025) workshop
Integration of the TIAGo Robot into Isaac Sim with Mecanum Drive Modeling and Learned S-Curve Velocity Profiles
Efficient physics simulation has significantly accelerated research progress in robotics applications such as grasping and assembly. The advent of GPU-accelerated simulation frameworks like Isaac Sim has particularly empowered learning-based methods, enabling them to tackle increasingly complex tasks. The PAL Robotics TIAGo++ Omni is a versatile mobile manipulator equipped with a mecanum-wheeled base, allowing omnidirectional movement and a wide range of task capabilities. However, until now, no model of the robot has been available in Isaac Sim. In this paper, we introduce such a model, calibrated to approximate the behavior of the real robot, with a focus on its omnidirectional drive dynamics. We present two control models for the omnidirectional drive: a physically accurate model that replicates real-world wheel dynamics and a lightweight velocity-based model optimized for learning-based applications. With these models, we introduce a learning-based calibration approach to approximate the real robot's S-shaped velocity profile using minimal trajectory data recordings. This simulation should allow researchers to experiment with the robot and perform efficient learning-based control in diverse environments. We provide the integration publicly at https://github.com/AIS-Bonn/tiago_isaac.
comment: In Proceedings of IEEE 21st International Conference on Automation Science and Engineering (CASE), Los Angeles, USA, August 2025
Dynamics-aware Diffusion Models for Planning and Control
This paper addresses the problem of generating dynamically admissible trajectories for control tasks using diffusion models, particularly in scenarios where the environment is complex and system dynamics are crucial for practical application. We propose a novel framework that integrates system dynamics directly into the diffusion model's denoising process through a sequential prediction and projection mechanism. This mechanism, aligned with the diffusion model's noising schedule, ensures generated trajectories are both consistent with expert demonstrations and adhere to underlying physical constraints. Notably, our approach can generate maximum likelihood trajectories and accurately recover trajectories generated by linear feedback controllers, even when explicit dynamics knowledge is unavailable. We validate the effectiveness of our method through experiments on standard control tasks and a complex non-convex optimal control problem involving waypoint tracking and collision avoidance, demonstrating its potential for efficient trajectory generation in practical applications. Our code repository is available at www.github.com/darshangm/dynamics-aware-diffusion.
comment: 8 pages, 3 figures
PSN Game: Game-theoretic Prediction and Planning via a Player Selection Network
While game-theoretic planning frameworks are effective at modeling multi-agent interactions, they require solving large optimization problems where the number of variables increases with the number of agents, resulting in long computation times that limit their use in large-scale, real-time systems. To address this issue, we propose 1) PSN Game: a learning-based, game-theoretic prediction and planning framework that reduces runtime by learning a Player Selection Network (PSN); and 2) a Goal Inference Network (GIN) that makes it possible to use the PSN in incomplete information games where agents' intentions are unknown. A PSN outputs a player selection mask that distinguishes influential players from less relevant ones, enabling the ego player to solve a smaller, masked game involving only selected players. By reducing the number of players in the game, and therefore reducing the number of variables in the corresponding optimization problem, PSN directly lowers computation time. The PSN Game framework is more flexible than existing player selection methods as it 1) relies solely on observations of players' past trajectories, without requiring full state, action, or other game-specific information; and 2) requires no online parameter tuning. Experiments in both simulated scenarios and human trajectory datasets demonstrate that PSNs outperform baseline selection methods in 1) prediction accuracy; and 2) planning safety. PSNs also generalize effectively to real-world scenarios in which agents' objectives are unknown without fine-tuning. By selecting only the most relevant players for decision-making, PSN Game offers a general mechanism for reducing planning complexity that can be seamlessly integrated into existing multi-agent planning frameworks.
Image Quality Assessment for Embodied AI
Embodied AI has developed rapidly in recent years, but it is still mainly deployed in laboratories, with various distortions in the real world limiting its application. Traditionally, Image Quality Assessment (IQA) methods are applied to predict human preferences for distorted images; however, there is no IQA method to assess the usability of an image in embodied tasks, namely, the perceptual quality for robots. To provide accurate and reliable quality indicators for future embodied scenarios, we first propose the topic: IQA for Embodied AI. Specifically, we (1) constructed a perception-cognition-decision-execution pipeline based on the Mertonian system and meta-cognitive theory, and defined a comprehensive subjective score collection process; (2) established the Embodied-IQA database, containing over 36k reference/distorted image pairs, with more than 5 million fine-grained annotations provided by Vision Language Models, Vision-Language-Action models, and real-world robots; (3) trained and validated the performance of mainstream IQA methods on Embodied-IQA, demonstrating the need to develop more accurate quality indicators for Embodied AI. We sincerely hope that through evaluation, we can promote the application of Embodied AI under complex distortions in the real world. Project page: https://github.com/lcysyzxdxc/EmbodiedIQA
Product-oriented Product-Process-Resource Asset Network and its Representation in AutomationML for Asset Administration Shell
Current products, especially in the automotive sector, are complex technical systems of a multi-disciplinary mechatronic nature. Industrial standards supporting system engineering and production typically (i) address the production phase only, but do not cover the complete product life cycle, and (ii) focus on production processes and resources rather than the products themselves. The presented approach is motivated by incorporating the impacts of the end-of-life phase of the product life cycle into the engineering phase. This paper proposes a modeling approach derived from the Product-Process-Resource (PPR) modeling paradigm. It combines the requirements of (i) respecting the product structure as a basis for the model, and (ii) incorporating repairing, remanufacturing, or upcycling within cyber-physical production systems. The proposed model, called PoPAN, should accompany the product during the entire life cycle as a digital shadow encapsulated within the Asset Administration Shell of a product. To facilitate the adoption of the proposed paradigm, the paper also proposes serialization of the model in the AutomationML data format. The model is demonstrated on a use case for disassembling electric vehicle batteries to support their remanufacturing for stationary battery applications.
comment: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
OpenLex3D: A Tiered Evaluation Benchmark for Open-Vocabulary 3D Scene Representations NeurIPS 2025
3D scene understanding has been transformed by open-vocabulary language models that enable interaction via natural language. However, at present the evaluation of these representations is limited to datasets with closed-set semantics that do not capture the richness of language. This work presents OpenLex3D, a dedicated benchmark for evaluating 3D open-vocabulary scene representations. OpenLex3D provides entirely new label annotations for scenes from Replica, ScanNet++, and HM3D, which capture real-world linguistic variability by introducing synonymous object categories and additional nuanced descriptions. Our label sets provide 13 times more labels per scene than the original datasets. By introducing an open-set 3D semantic segmentation task and an object retrieval task, we evaluate various existing 3D open-vocabulary methods on OpenLex3D, showcasing failure cases and avenues for improvement. Our experiments provide insights on feature precision, segmentation, and downstream capabilities. The benchmark is publicly available at: https://openlex3d.github.io/.
comment: NeurIPS 2025
BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models NeurIPS 2025
Recently, leveraging pre-trained vision-language models (VLMs) for building vision-language-action (VLA) models has emerged as a promising approach to effective robot manipulation learning. However, only a few methods incorporate 3D signals into VLMs for action prediction, and they do not fully leverage the spatial structure inherent in 3D data, leading to low sample efficiency. In this paper, we introduce BridgeVLA, a novel 3D VLA model that (1) projects 3D inputs to multiple 2D images, ensuring input alignment with the VLM backbone, and (2) utilizes 2D heatmaps for action prediction, unifying the input and output spaces within a consistent 2D image space. In addition, we propose a scalable pre-training method that equips the VLM backbone with the capability to predict 2D heatmaps before downstream policy learning. Extensive experiments show the proposed method is able to learn 3D manipulation efficiently and effectively. BridgeVLA outperforms state-of-the-art baseline methods across three simulation benchmarks. In RLBench, it improves the average success rate from 81.4% to 88.2%. In COLOSSEUM, it demonstrates significantly better performance in challenging generalization settings, boosting the average success rate from 56.7% to 64.0%. In GemBench, it surpasses all the compared baseline methods in terms of average success rate. In real-robot experiments, BridgeVLA outperforms a state-of-the-art baseline method by 32% on average. It generalizes robustly in multiple out-of-distribution settings, including visual disturbances and unseen instructions. Remarkably, it is able to achieve a success rate of 96.8% on 10+ tasks with only 3 trajectories per task, highlighting its extraordinary sample efficiency. Project website: https://bridgevla.github.io/
comment: NeurIPS 2025
Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation
Vision-Language Navigation (VLN) is a critical task for developing embodied agents that can follow natural language instructions to navigate in complex real-world environments. Recent advances in VLN by large pretrained models have significantly improved generalization and instruction grounding compared to traditional approaches. However, the role of reasoning strategies in navigation (an action-centric, long-horizon task) remains underexplored, despite Chain-of-Thought (CoT) reasoning's demonstrated success in static tasks like visual question answering. To address this gap, we conduct the first systematic evaluation of reasoning strategies for VLN, including No-Think (direct action prediction), Pre-Think (reason before action), and Post-Think (reason after action). Surprisingly, our findings reveal the Inference-time Reasoning Collapse issue, where inference-time reasoning degrades navigation accuracy, highlighting the challenges of integrating reasoning into VLN. Based on this insight, we propose Aux-Think, a framework that trains models to internalize structured reasoning patterns through CoT supervision, while inferring action directly without reasoning in online prediction. To support this framework, we release R2R-CoT-320k, the first Chain-of-Thought annotated dataset for VLN. Extensive experiments show that Aux-Think reduces training effort greatly and achieves the best performance under the same data scale.
SIG-Chat: Spatial Intent-Guided Conversational Gesture Generation Involving How, When and Where
The accompanying actions and gestures in dialogue are often closely linked to interactions with the environment, such as looking toward the interlocutor or using gestures to point to the described target at appropriate moments. Speech and semantics guide the production of gestures by determining their timing (WHEN) and style (HOW), while the spatial locations of interactive objects dictate their directional execution (WHERE). Existing approaches either rely solely on descriptive language to generate motions or utilize audio to produce non-interactive gestures, thereby lacking the characterization of interactive timing and spatial intent. This significantly limits the applicability of conversational gesture generation, whether in robotics or in the fields of game and animation production. To address this gap, we present a full-stack solution. We first established a unique data collection method to simultaneously capture high-precision human motion and spatial intent. We then developed a generation model driven by audio, language, and spatial data, alongside dedicated metrics for evaluating interaction timing and spatial accuracy. Finally, we deployed the solution on a humanoid robot, enabling rich, context-aware physical interactions.
MTIL: Encoding Full History with Mamba for Temporal Imitation Learning
Standard imitation learning (IL) methods have achieved considerable success in robotics, yet often rely on the Markov assumption, which falters in long-horizon tasks where history is crucial for resolving perceptual ambiguity. This limitation stems not only from a conceptual gap but also from a fundamental computational barrier: prevailing architectures like Transformers are often constrained by quadratic complexity, rendering the processing of long, high-dimensional observation sequences infeasible. To overcome this dual challenge, we introduce Mamba Temporal Imitation Learning (MTIL). Our approach represents a new paradigm for robotic learning, which we frame as a practical synthesis of World Model and Dynamical System concepts. By leveraging the linear-time recurrent dynamics of State Space Models (SSMs), MTIL learns an implicit, action-oriented world model that efficiently encodes the entire trajectory history into a compressed, evolving state. This allows the policy to be conditioned on a comprehensive temporal context, transcending the confines of Markovian approaches. Through extensive experiments on simulated benchmarks (ACT, Robomimic, LIBERO) and on challenging real-world tasks, MTIL demonstrates superior performance against SOTA methods like ACT and Diffusion Policy, particularly in resolving long-term temporal ambiguities. Our findings not only affirm the necessity of full temporal context but also validate MTIL as a powerful and computationally feasible approach for learning long-horizon, non-Markovian behaviors from high-dimensional observations.
comment: 8 pages, 5 figures. Published in IEEE Robotics and Automation Letters (RA-L), 2025
No Plan but Everything Under Control: Robustly Solving Sequential Tasks with Dynamically Composed Gradient Descent ICRA25
We introduce a novel gradient-based approach for solving sequential tasks by dynamically adjusting the underlying myopic potential field in response to feedback and the world's regularities. This adjustment implicitly considers subgoals encoded in these regularities, enabling the solution of long sequential tasks, as demonstrated by solving the traditional planning domain of Blocks World - without any planning. Unlike conventional planning methods, our feedback-driven approach adapts to uncertain and dynamic environments, as demonstrated by one hundred real-world trials involving drawer manipulation. These experiments highlight the robustness of our method compared to planning and show how interactive perception and error recovery naturally emerge from gradient descent without explicitly implementing them. This offers a computationally efficient alternative to planning for a variety of sequential tasks, while aligning with observations on biological problem-solving strategies.
comment: Accepted at ICRA25; Supplementary Material under https://www.tu.berlin/robotics/papers/noplan ; 7 pages + 6 figures;
GPA-RAM: Grasp-Pretraining Augmented Robotic Attention Mamba for Spatial Task Learning
Most existing robot manipulation methods prioritize task learning by enhancing perception through complex deep network architectures. However, they face challenges in real-time collision-free planning. Hence, Robotic Attention Mamba (RAM) is designed for refined planning. Specifically, by integrating Mamba and parallel single-view attention, RAM aligns multi-view vision and task-related language features, ensuring efficient fine-grained task planning with linear complexity and robust real-time performance. Nevertheless, it has the potential for further improvement in high-precision grasping and manipulation. Thus, Grasp-Pretraining Augmentation (GPA) is devised, with a grasp pose feature extractor pretrained utilizing object grasp poses directly inherited from whole-task demonstrations. Subsequently, the extracted grasp features are fused with the spatially aligned planning features from RAM through attention-based Pre-trained Location Fusion, preserving high-resolution grasping cues overshadowed by an overemphasis on global planning. To summarize, we propose Grasp-Pretraining Augmented Robotic Attention Mamba (GPA-RAM), dividing spatial task learning into RAM for planning skill learning and GPA for grasping skill learning. GPA-RAM demonstrates superior performance across three robot systems with distinct camera configurations in simulation and the real world. Compared with previous state-of-the-art methods, it improves the absolute success rate by 8.2% (from 79.3% to 87.5%) on the RLBench multi-task benchmark, and by 40% (from 16% to 56%) and 12% (from 86% to 98%) on the ALOHA bimanual manipulation tasks, while delivering notably faster inference. Furthermore, experimental results demonstrate that both RAM and GPA enhance task learning, with GPA proving robust to different architectures of pretrained grasp pose feature extractors. Project page: https://logssim.github.io/GPA_RAM_website/
SPiDR: A Simple Approach for Zero-Shot Safety in Sim-to-Real Transfer
Deploying reinforcement learning (RL) safely in the real world is challenging, as policies trained in simulators must face the inevitable sim-to-real gap. Robust safe RL techniques are provably safe but difficult to scale, while domain randomization is more practical yet prone to unsafe behaviors. We address this gap by proposing SPiDR, short for Sim-to-real via Pessimistic Domain Randomization -- a scalable algorithm with provable guarantees for safe sim-to-real transfer. SPiDR uses domain randomization to incorporate the uncertainty about the sim-to-real gap into the safety constraints, making it versatile and highly compatible with existing training pipelines. Through extensive experiments on sim-to-sim benchmarks and two distinct real-world robotic platforms, we demonstrate that SPiDR effectively ensures safety despite the sim-to-real gap while maintaining strong performance.
mmWave Radar-Based Non-Line-of-Sight Pedestrian Localization at T-Junctions Utilizing Road Layout Extraction via Camera
Pedestrian localization in Non-Line-of-Sight (NLoS) regions within urban environments poses a significant challenge for autonomous driving systems. While mmWave radar has demonstrated potential for detecting objects in such scenarios, the 2D radar point cloud (PCD) data is susceptible to distortions caused by multipath reflections, making accurate spatial inference difficult. Additionally, although camera images provide high-resolution visual information, they lack depth perception and cannot directly observe objects in NLoS regions. In this paper, we propose a novel framework that interprets radar PCD through the road layout inferred from a camera to localize NLoS pedestrians. The proposed method leverages visual information from the camera to interpret 2D radar PCD, enabling spatial scene reconstruction. The effectiveness of the proposed approach is validated through experiments conducted using a radar-camera system mounted on a real vehicle. The localization performance is evaluated using a dataset collected in outdoor NLoS driving environments, demonstrating the practical applicability of the method.
PolySim: Bridging the Sim-to-Real Gap for Humanoid Control via Multi-Simulator Dynamics Randomization
Humanoid whole-body control (WBC) policies trained in simulation often suffer from the sim-to-real gap, which fundamentally arises from simulator inductive bias, the inherent assumptions and limitations of any single simulator. These biases lead to nontrivial discrepancies both across simulators and between simulation and the real world. To mitigate the effect of simulator inductive bias, the key idea is to train policies jointly across multiple simulators, encouraging the learned controller to capture dynamics that generalize beyond any single simulator's assumptions. We thus introduce PolySim, a WBC training platform that integrates multiple heterogeneous simulators. PolySim can launch parallel environments from different engines simultaneously within a single training run, thereby realizing dynamics-level domain randomization. Theoretically, we show that PolySim yields a tighter upper bound on simulator inductive bias than single-simulator training. In experiments, PolySim substantially reduces motion-tracking error in sim-to-sim evaluations; for example, on MuJoCo, it improves execution success by 52.8 over an IsaacSim baseline. PolySim further enables zero-shot deployment on a real Unitree G1 without additional fine-tuning, showing effective transfer from simulation to the real world. We will release the PolySim code upon acceptance of this work.
comment: 8 pages, 5 figures
HEAL: An Empirical Study on Hallucinations in Embodied Agents Driven by Large Language Models EMNLP 2025
Large language models (LLMs) are increasingly being adopted as the cognitive core of embodied agents. However, inherited hallucinations, which stem from failures to ground user instructions in the observed physical environment, can lead to navigation errors, such as searching for a refrigerator that does not exist. In this paper, we present the first systematic study of hallucinations in LLM-based embodied agents performing long-horizon tasks under scene-task inconsistencies. Our goal is to understand to what extent hallucinations occur, what types of inconsistencies trigger them, and how current models respond. To achieve these goals, we construct a hallucination probing set by building on an existing benchmark, capable of inducing hallucination rates up to 40x higher than base prompts. Evaluating 12 models across two simulation environments, we find that while models exhibit reasoning, they fail to resolve scene-task inconsistencies, highlighting fundamental limitations in handling infeasible tasks. We also provide actionable insights on ideal model behavior for each scenario, offering guidance for developing more robust and reliable planning strategies.
comment: Accepted by EMNLP 2025 Findings
InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts
The advancement of Embodied AI heavily relies on large-scale, simulatable 3D scene datasets characterized by scene diversity and realistic layouts. However, existing datasets typically suffer from limitations in data scale or diversity, sanitized layouts lacking small items, and severe object collisions. To address these shortcomings, we introduce InternScenes, a novel large-scale simulatable indoor scene dataset comprising approximately 40,000 diverse scenes by integrating three disparate scene sources (real-world scans, procedurally generated scenes, and designer-created scenes), including 1.96M 3D objects and covering 15 common scene types and 288 object classes. We particularly preserve massive small items in the scenes, resulting in realistic and complex layouts with an average of 41.5 objects per region. Our comprehensive data processing pipeline ensures simulatability by creating real-to-sim replicas for real-world scans, enhances interactivity by incorporating interactive objects into these scenes, and resolves object collisions by physical simulations. We demonstrate the value of InternScenes with two benchmark applications: scene layout generation and point-goal navigation. Both show the new challenges posed by the complex and realistic layouts. More importantly, InternScenes paves the way for scaling up the model training for both tasks, making the generation and navigation in such complex scenes possible. We commit to open-sourcing the data, models, and benchmarks to benefit the whole community.
How Vulnerable Is My Learned Policy? Universal Adversarial Perturbation Attacks On Modern Behavior Cloning Policies
Learning from Demonstration (LfD) algorithms have shown promising results in robotic manipulation tasks, but their vulnerability to offline universal perturbation attacks remains underexplored. This paper presents a comprehensive study of adversarial attacks on both classic and recently proposed algorithms, including Behavior Cloning (BC), LSTM-GMM, Implicit Behavior Cloning (IBC), Diffusion Policy (DP), and Vector-Quantized Behavior Transformer (VQ-BET). We study the vulnerability of these methods to universal adversarial perturbations. Our experiments on several simulated robotic manipulation tasks reveal that most of the current methods are highly vulnerable to adversarial perturbations. We also show that these attacks are often transferable across algorithms, architectures, and tasks, raising security concerns about black-box attacks. To the best of our knowledge, we are the first to present a systematic study of the vulnerabilities of different LfD algorithms to both white-box and black-box attacks. Our findings highlight the vulnerabilities of modern BC algorithms, paving the way for future work in addressing such limitations.
DSM: Constructing a Diverse Semantic Map for 3D Visual Grounding
Effective scene representation is critical for the visual grounding ability of representations, yet existing methods for 3D Visual Grounding are often constrained. They either only focus on geometric and visual cues, or, like traditional 3D scene graphs, lack the multi-dimensional attributes needed for complex reasoning. To bridge this gap, we introduce the Diverse Semantic Map (DSM) framework, a novel scene representation framework that enriches robust geometric models with a spectrum of VLM-derived semantics, including appearance, physical properties, and affordances. The DSM is first constructed online by fusing multi-view observations within a temporal sliding window, creating a persistent and comprehensive world model. Building on this foundation, we propose DSM-Grounding, a new paradigm that shifts grounding from free-form VLM queries to a structured reasoning process over the semantic-rich map, markedly improving accuracy and interpretability. Extensive evaluations validate our approach's superiority. On the ScanRefer benchmark, DSM-Grounding achieves a state-of-the-art 59.06% overall accuracy of IoU@0.5, surpassing others by 10%. In semantic segmentation, our DSM attains a 67.93% F-mIoU, outperforming all baselines, including privileged ones. Furthermore, successful deployment on physical robots for complex navigation and grasping tasks confirms the framework's practical utility in real-world scenarios.
comment: 8 pages, 6 figures, Project Page: https://binicey.github.io/DSM
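For readers unfamiliar with the IoU@0.5 accuracy quoted above, the following sketch shows one common way such a score is computed. It assumes axis-aligned 3D boxes given as `(xmin, ymin, zmin, xmax, ymax, zmax)` and one predicted box per query; the function names are hypothetical and this is not the authors' evaluation code.

```python
def box_volume(box):
    """Volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def box_iou_3d(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for axis in range(3):
        lo = max(a[axis], b[axis])
        hi = min(a[axis + 3], b[axis + 3])
        if hi <= lo:  # no overlap along this axis means no overlap at all
            return 0.0
        inter *= hi - lo
    return inter / (box_volume(a) + box_volume(b) - inter)


def accuracy_at_iou(predictions, ground_truths, threshold=0.5):
    """Fraction of queries whose predicted box reaches the IoU threshold."""
    hits = sum(
        box_iou_3d(p, g) >= threshold for p, g in zip(predictions, ground_truths)
    )
    return hits / len(ground_truths)


unit = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
shifted = (1.0, 0.0, 0.0, 3.0, 2.0, 2.0)  # overlaps half of `unit`
score = accuracy_at_iou([unit, unit], [unit, shifted])
```

Here the second prediction overlaps its target with IoU 1/3, so only one of the two queries counts as a hit at the 0.5 threshold.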
Behavior Trees and State Machines in Robotics Applications
Autonomous robots combine skills to form increasingly complex behaviors, called missions. While skills are often programmed at a relatively low abstraction level, their coordination is architecturally separated and often expressed in higher-level languages or frameworks. State machines have been the go-to language to model behavior for decades, but recently, behavior trees have gained attention among roboticists. Although several implementations of behavior trees are in use, little is known about their usage and scope in the real world. How do concepts offered by behavior trees relate to traditional languages, such as state machines? How are concepts in behavior trees and state machines used in actual applications? This paper is a study of the key language concepts in behavior trees as realized in domain-specific languages (DSLs), internal and external DSLs offered as libraries, and their use in open-source robotic applications supported by the Robot Operating System (ROS). We analyze behavior-tree DSLs and compare them to the standard language for behavior models in robotics: state machines. We identify DSLs for both behavior-modeling languages, and we analyze five in depth. We mine open-source repositories for robotic applications that use the analyzed DSLs and analyze their usage. We identify similarities between behavior trees and state machines in terms of language design and the concepts offered to accommodate the needs of the robotics domain. We observed that the usage of behavior-tree DSLs in open-source projects is increasing rapidly. We observed similar usage patterns at model structure and at code reuse in the behavior-tree and state-machine models within the mined open-source projects. We contribute all extracted models as a dataset, hoping to inspire the community to use and further develop behavior trees, associated tools, and analysis techniques.
comment: Published at IEEE TSE as a journal extension of a preceding SLE paper (available as arXiv:2010.06256). arXiv admin note: text overlap with arXiv:2010.06256
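To make the comparison between the two modeling languages more concrete, here is a minimal sketch of the same toy mission written once as a behavior tree and once as a state-machine transition table. The node classes, skill functions, and state names are hypothetical and do not follow the API of any DSL analyzed in the paper; the sequence and fallback nodes are the memoryless textbook variants.

```python
SUCCESS, FAILURE, RUNNING = "SUCCESS", "FAILURE", "RUNNING"


class Action:
    """Leaf node wrapping a skill; fn(blackboard) returns a status."""
    def __init__(self, fn):
        self.fn = fn

    def tick(self, bb):
        return self.fn(bb)


class Sequence:
    """Ticks children in order and stops at the first one that is not SUCCESS."""
    def __init__(self, *children):
        self.children = children

    def tick(self, bb):
        for child in self.children:
            status = child.tick(bb)
            if status != SUCCESS:
                return status
        return SUCCESS


class Fallback:
    """Ticks children in order and stops at the first one that is not FAILURE."""
    def __init__(self, *children):
        self.children = children

    def tick(self, bb):
        for child in self.children:
            status = child.tick(bb)
            if status != FAILURE:
                return status
        return FAILURE


def battery_ok(bb):
    return SUCCESS if bb["battery"] > 20 else FAILURE


def go_to_goal(bb):
    bb["at_goal"] = True
    return SUCCESS


def recharge(bb):
    bb["battery"] = 100
    return RUNNING


# Behavior tree: try the mission, otherwise fall back to recharging.
mission = Fallback(Sequence(Action(battery_ok), Action(go_to_goal)), Action(recharge))
bb = {"battery": 10, "at_goal": False}
first = mission.tick(bb)   # battery too low, so the tree recharges
second = mission.tick(bb)  # battery restored, so the mission runs

# State machine: the same control flow as explicit (state, event) transitions.
TRANSITIONS = {
    ("check", "low"): "recharge",
    ("check", "ok"): "navigate",
    ("recharge", "done"): "check",
    ("navigate", "done"): "idle",
}


def step(state, event):
    """Next state for an event; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

In the tree, control flow is implied by node order and returned statuses, whereas the state machine names every transition explicitly.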
Multiagent Systems
Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics
We present Ax-Prover, a multi-agent system for automated theorem proving in Lean that can solve problems across diverse scientific domains and operate either autonomously or collaboratively with human experts. To achieve this, Ax-Prover approaches scientific problem solving through formal proof generation, a process that demands both creative reasoning and strict syntactic rigor. Ax-Prover meets this challenge by equipping Large Language Models (LLMs), which provide knowledge and reasoning, with Lean tools via the Model Context Protocol (MCP), which ensure formal correctness. To evaluate its performance as an autonomous prover, we benchmark our approach against frontier LLMs and specialized prover models on two public math benchmarks and on two Lean benchmarks we introduce in the fields of abstract algebra and quantum theory. On public datasets, Ax-Prover is competitive with state-of-the-art provers, while it largely outperforms them on the new benchmarks. This shows that, unlike specialized systems that struggle to generalize, our tool-based agentic theorem prover approach offers a generalizable methodology for formal verification across diverse scientific domains. Furthermore, we demonstrate Ax-Prover's assistant capabilities in a practical use case, showing how it enabled an expert mathematician to formalize the proof of a complex cryptography theorem.
Characterizing Agent-Based Model Dynamics via $ε$-Machines and Kolmogorov-Style Complexity
We propose a two-level information-theoretic framework for characterizing the informational organization of Agent-Based Model (ABM) dynamics within the broader paradigm of Complex Adaptive Systems (CAS). At the macro level, a pooled $\epsilon$-machine is reconstructed as a reference model that summarizes the system-wide informational regime. At the micro level, $\epsilon$-machines are reconstructed for each caregiver-elder dyad and variable, and are complemented with algorithm-agnostic Kolmogorov-style measures, including normalized LZ78 complexity and bits per symbol from lossless compression. The resulting feature set $\{h_{\mu}, C_{\mu}, E, \mathrm{LZ78}, \mathrm{bps}\}$ enables distributional analysis, stratified comparisons, and unsupervised clustering across agents and scenarios. This dual-scale design preserves agent heterogeneity while providing an interpretable macro-level baseline, aligning ABM practice with CAS principles of emergence, feedback, and adaptation. A case study on caregiver-elder interactions illustrates the framework's implementation; the results and discussion will be completed following final simulation runs.
comment: 5 pages, methodological preprint
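The two Kolmogorov-style measures named above can be sketched as follows. The LZ78 phrase count is the standard incremental parsing; the normalization `c * log_k(n) / n` and the use of `zlib` as the lossless compressor are assumptions, since the abstract does not state which normalization or compressor the authors use. Symbols are assumed to be small integers (0 to 255) for the compression step.

```python
import math
import random
import zlib


def lz78_phrase_count(seq):
    """Number of phrases in the LZ78 incremental parsing of a symbol sequence."""
    seen = set()
    phrase = ()
    count = 0
    for symbol in seq:
        phrase += (symbol,)
        if phrase not in seen:  # shortest phrase not seen before ends here
            seen.add(phrase)
            count += 1
            phrase = ()
    if phrase:  # trailing phrase that repeats an earlier one
        count += 1
    return count


def normalized_lz78(seq):
    """Phrase count scaled by n / log_k(n) (assumed normalization)."""
    n = len(seq)
    if n < 2:
        return 0.0
    k = max(2, len(set(seq)))  # alphabet size, at least binary
    return lz78_phrase_count(seq) * math.log(n, k) / n


def bits_per_symbol(seq):
    """Compressed size in bits divided by sequence length, using zlib."""
    data = bytes(seq)
    return 8 * len(zlib.compress(data, 9)) / len(data)


rng = random.Random(0)
constant = [0] * 2000                              # fully regular behavior
noisy = [rng.randrange(2) for _ in range(2000)]    # coin-flip behavior
```

A regular sequence should score lower than a coin-flip sequence on both measures, which is what makes them usable as per-agent features alongside the $\epsilon$-machine quantities.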
Runtime Composition in Dynamic System of Systems: A Systematic Review of Challenges, Solutions, Tools, and Evaluation Methods
Context: Modern Systems of Systems (SoSs) increasingly operate in dynamic environments (e.g., smart cities, autonomous vehicles) where runtime composition -- the on-the-fly discovery, integration, and coordination of constituent systems (CSs) -- is crucial for adaptability. Despite growing interest, the literature lacks a cohesive synthesis of runtime composition in dynamic SoSs. Objective: This study synthesizes research on runtime composition in dynamic SoSs and identifies core challenges, solution strategies, supporting tools, and evaluation methods. Methods: We conducted a Systematic Literature Review (SLR), screening 1,774 studies published between 2019 and 2024 and selecting 80 primary studies for thematic analysis (TA). Results: Challenges fall into four categories: modeling and analysis, resilient operations, system orchestration, and heterogeneity of CSs. Solutions span seven areas: co-simulation and digital twins, semantic ontologies, integration frameworks, adaptive architectures, middleware, formal methods, and AI-driven resilience. Service-oriented frameworks for composition and integration dominate tooling, while simulation platforms support evaluation. Interoperability across tools, limited cross-toolchain workflows, and the absence of standardized benchmarks remain key gaps. Evaluation approaches include simulation-based, implementation-driven, and human-centered studies, which have been applied in domains such as smart cities, healthcare, defense, and industrial automation. Conclusions: The synthesis reveals tensions, including autonomy versus coordination, the modeling-reality gap, and socio-technical integration. It calls for standardized evaluation metrics, scalable decentralized architectures, and cross-domain frameworks. The analysis aims to guide researchers and practitioners in developing and implementing dynamically composable SoSs.
Inclusive Fitness as a Key Step Towards More Advanced Social Behaviors in Multi-Agent Reinforcement Learning Settings AAMAS 2022
The competitive and cooperative forces of natural selection have driven the evolution of intelligence for millions of years, culminating in nature's vast biodiversity and the complexity of human minds. Inspired by this process, we propose a novel multi-agent reinforcement learning framework where each agent is assigned a genotype and where reward functions are modelled after the concept of inclusive fitness. An agent's genetic material may be shared with other agents, and our inclusive reward function naturally accounts for this. We study the resulting social dynamics in two types of network games with prisoner's dilemmas and find that our results align with well-established principles from biology, such as Hamilton's rule. Furthermore, we outline how this framework can extend to more open-ended environments with spatial and temporal structure, finite resources, and evolving populations. We hypothesize the emergence of an arms race of strategies, where each new strategy is a gradual improvement over earlier adaptations of other agents, effectively producing a multi-agent autocurriculum analogous to biological evolution. In contrast to the binary team-based structures prevalent in earlier research, our gene-based reward structure introduces a spectrum of cooperation ranging from full adversity to full cooperativeness based on genetic similarity, enabling unique non-team-based social dynamics. For example, one agent may have a mutually cooperative relationship with two other agents while those two agents behave adversarially towards each other. We argue that incorporating inclusive fitness in agents provides a foundation for the emergence of more strategically advanced and socially intelligent agents.
comment: This version is a slightly updated version (e.g., added an important reference) compared to the peer-reviewed versions at 'Adaptive Learning Agents' at AAMAS 2022 or 'From Cells to Societies' at ICLR 2022
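One way to picture an inclusive reward of the kind described above is to weight every agent's payoff by its genetic similarity to the agent being rewarded. The sketch below does this for a single donation-style interaction and checks the outcome against Hamilton's rule (help is favored when r * b > c). The similarity measure (fraction of matching genes), the payoff values, and the function names are illustrative assumptions, not the paper's exact definitions.

```python
def relatedness(genotype_a, genotype_b):
    """Fraction of gene positions on which two genotypes agree (assumed measure)."""
    matches = sum(x == y for x, y in zip(genotype_a, genotype_b))
    return matches / len(genotype_a)


def inclusive_rewards(payoffs, genotypes):
    """Reward of agent i: sum over agents j of relatedness(i, j) * payoff_j."""
    n = len(payoffs)
    return [
        sum(relatedness(genotypes[i], genotypes[j]) * payoffs[j] for j in range(n))
        for i in range(n)
    ]


# Agent 0 pays cost c = 1 so that agent 1 receives benefit b = 3.
benefit, cost = 3.0, 1.0
payoffs = [-cost, benefit]

kin = [(1, 0, 1, 1), (1, 0, 0, 1)]        # agents share 3 of 4 genes, r = 0.75
strangers = [(1, 0, 1, 1), (0, 1, 0, 0)]  # agents share no genes, r = 0

kin_rewards = inclusive_rewards(payoffs, kin)
stranger_rewards = inclusive_rewards(payoffs, strangers)
```

With related agents, helping yields a positive inclusive reward for the helper (r * b = 2.25 exceeds c = 1), while with unrelated agents the helper only sees its own cost, matching the direction Hamilton's rule predicts.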
Heterogeneous RBCs via deep multi-agent reinforcement learning
Current macroeconomic models with agent heterogeneity can be broadly divided into two main groups. Heterogeneous-agent general equilibrium (GE) models, such as those based on Heterogeneous Agents New Keynesian (HANK) or Krusell-Smith (KS) approaches, rely on GE and 'rational expectations', somewhat unrealistic assumptions that make the models very computationally cumbersome, which in turn limits the amount of heterogeneity that can be modelled. In contrast, agent-based models (ABMs) can flexibly encompass a large number of arbitrarily heterogeneous agents, but typically require the specification of explicit behavioural rules, which can lead to a lengthy trial-and-error model-development process. To address these limitations, we introduce MARL-BC, a framework that integrates deep multi-agent reinforcement learning (MARL) with Real Business Cycle (RBC) models. We demonstrate that MARL-BC can: (1) recover textbook RBC results when using a single agent; (2) recover the results of the mean-field KS model using a large number of identical agents; and (3) effectively simulate rich heterogeneity among agents, a hard task for traditional GE approaches. Our framework can be thought of as an ABM if used with a variety of heterogeneous interacting agents, and can reproduce GE results in limit cases. As such, it is a step towards a synthesis of these often opposed modelling paradigms.
comment: 13 pages, 9 figures
UNCAP: Uncertainty-Guided Planning Using Natural Language Communication for Cooperative Autonomous Vehicles
Safe large-scale coordination of multiple cooperative connected autonomous vehicles (CAVs) hinges on communication that is both efficient and interpretable. Existing approaches either rely on transmitting high-bandwidth raw sensor data streams or neglect perception and planning uncertainties inherent in shared data, resulting in systems that are neither scalable nor safe. To address these limitations, we propose Uncertainty-Guided Natural Language Cooperative Autonomous Planning (UNCAP), a vision-language model-based planning approach that enables CAVs to communicate via lightweight natural language messages while explicitly accounting for perception uncertainty in decision-making. UNCAP features a two-stage communication protocol: (i) an ego CAV first identifies the subset of vehicles most relevant for information exchange, and (ii) the selected CAVs then transmit messages that quantitatively express their perception uncertainty. By selectively fusing messages that maximize mutual information, this strategy allows the ego vehicle to integrate only the most relevant signals into its decision-making, improving both the scalability and reliability of cooperative planning. Experiments across diverse driving scenarios show a 63% reduction in communication bandwidth with a 31% increase in driving safety score, a 61% reduction in decision uncertainty, and a four-fold increase in collision distance margin during near-miss events. Project website: https://uncap-project.github.io/
KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems NeurIPS2025
Multi-agent large language model (LLM) systems are increasingly adopted for complex language processing tasks that require communication and coordination among agents. However, these systems often suffer substantial overhead from repeated reprocessing of overlapping contexts across agents. In typical pipelines, once an agent receives a message from its predecessor, the full context, including prior turns, must be reprocessed from scratch, leading to inefficient processing. While key-value (KV) caching is an effective solution for avoiding redundant computation in single-agent settings where prefixes remain unchanged, it cannot be directly reused in multi-agent scenarios due to diverging prefixes introduced by agent-specific context extensions. We identify that the core challenge lies in the offset variance of KV-caches across agents. To address this, we propose KVCOMM, a training-free framework that enables efficient prefilling in multi-agent inference by reusing KV-caches and aligning cache offsets of overlapping contexts under diverse prefix contexts. KVCOMM estimates and adjusts KV-caches for shared content by referencing a pool of cached examples, termed anchors, that store observed cache deviations under varying prefixes. The anchor pool is maintained and updated online, allowing dynamic adaptation to distinct user requests and context structures. KVCOMM achieves over 70% reuse rate across diverse multi-agent workloads, including retrieval-augmented generation, math reasoning, and collaborative coding tasks, all without quality degradation. Particularly, when each fully-connected agent receives 1K input tokens with 512 prefix tokens and 512 output tokens under a five-agent setting, KVCOMM achieves up to 7.8x speedup compared to the standard prefill pipeline, reducing TTFT from ~430 ms to ~55 ms.
comment: Accepted for publication in NeurIPS2025. Code is available at https://github.com/HankYe/KVCOMM
Equilibria in routing games with connected autonomous vehicles will not be strong, as exclusive clubs may form
User Equilibrium is the standard representation of the so-called routing game in which drivers adjust their route choices to arrive at their destinations as fast as possible. Asking whether this Equilibrium is strong or not was meaningless for human drivers who did not form coalitions due to technical and behavioral constraints. This is no longer the case for connected autonomous vehicles (CAVs), which will be able to communicate and collaborate to jointly form routing coalitions. We demonstrate this for the first time on a carefully designed toy-network example, where a `club` of three autonomous vehicles jointly decides to deviate from the user equilibrium and benefit (arrive faster). The formation of such a club has negative consequences for other users, who are not invited to join it and now travel longer, and for the system, making it suboptimal and disequilibrated, which triggers adaptation dynamics. This discovery has profound implications for the future of our cities. We demonstrate that, if not prevented, CAV operators may intentionally disequilibrate traffic systems from their classic Nash equilibria, benefiting their own users and imposing costs on others. These findings suggest the possible emergence of an exclusive CAV elite, from which human-driven vehicles and non-coalition members may be excluded, potentially leading to systematically longer travel times for those outside the coalition, which would be harmful for the equity of public road networks.
Benefits and Limitations of Communication in Multi-Agent Reasoning
Chain-of-thought prompting has popularized step-by-step reasoning in large language models, yet model performance still degrades as problem complexity and context length grow. By decomposing difficult tasks with long contexts into shorter, manageable ones, recent multi-agent paradigms offer a promising near-term solution to this problem. However, the fundamental capacities of such systems are poorly understood. In this work, we propose a theoretical framework to analyze the expressivity of multi-agent systems. We apply our framework to three algorithmic families: state tracking, recall, and $k$-hop reasoning. We derive bounds on (i) the number of agents required to solve the task exactly, (ii) the quantity and structure of inter-agent communication, and (iii) the achievable speedups as problem size and context scale. Our results identify regimes where communication is provably beneficial, delineate tradeoffs between agent count and bandwidth, and expose intrinsic limitations when either resource is constrained. We complement our theoretical analysis with a set of experiments on pretrained LLMs using controlled synthetic benchmarks. Empirical outcomes confirm the tradeoffs between key quantities predicted by our theory. Collectively, our analysis offers principled guidance for designing scalable multi-agent reasoning systems.
comment: 34 pages, 14 figures
GenCellAgent: Generalizable, Training-Free Cellular Image Segmentation via Large Language Model Agents
Cellular image segmentation is essential for quantitative biology yet remains difficult due to heterogeneous modalities, morphological variability, and limited annotations. We present GenCellAgent, a training-free multi-agent framework that orchestrates specialist segmenters and generalist vision-language models via a planner-executor-evaluator loop (choose tool $\rightarrow$ run $\rightarrow$ quality-check) with long-term memory. The system (i) automatically routes images to the best tool, (ii) adapts on the fly using a few reference images when imaging conditions differ from what a tool expects, (iii) supports text-guided segmentation of organelles not covered by existing models, and (iv) commits expert edits to memory, enabling self-evolution and personalized workflows. Across four cell-segmentation benchmarks, this routing yields a 15.7% mean accuracy gain over state-of-the-art baselines. On endoplasmic reticulum and mitochondria from new datasets, GenCellAgent improves average IoU by 37.6% over specialist models. It also segments novel objects such as the Golgi apparatus via iterative text-guided refinement, with light human correction further boosting performance. Together, these capabilities provide a practical path to robust, adaptable cellular image segmentation without retraining, while reducing annotation burden and matching user preferences.
comment: 43 pages
Large Language Model Agents Enable Autonomous Design and Image Analysis of Microwell Microfluidics
Microwell microfluidics has been utilized for single-cell analysis to reveal heterogeneity in gene expression, signaling pathways, and phenotypic responses for identifying rare cell types, understanding disease progression, and developing more precise therapeutic strategies. However, designing microwell microfluidics is a considerably complex task that requires knowledge, experience, CAD software, and manual intervention; initial designs often fail, demanding multiple costly and time-consuming iterations. In this study, we establish an autonomous large language model (LLM)-driven microwell design framework that generates code-based computer-aided design (CAD) scripts, enabling the rapid and reproducible creation of microwells with diverse geometries as well as imaging-based analysis. We propose a multimodal large language model (MLLM)-logistic regression framework that integrates high-level semantic descriptions generated by MLLMs with image embeddings for image classification tasks, aiming to identify microwell occupancy and microwell shape. The fused multimodal representation is input to a logistic regression model, which is both interpretable and computationally efficient. We achieved significant improvements, exceeding 0.92 for occupancy classification and 0.99 for shape classification, across all evaluated MLLMs, compared with 0.50 and 0.55, respectively, when relying solely on direct classification. The MLLM-logistic regression framework is a scalable, efficient solution for high-throughput microwell image analysis. Our study demonstrates an autonomous microwell design platform that translates natural language prompts into optimized device geometries, CAD scripts and image analysis, facilitating the development of next-generation digital discovery through the integration of literature mining, autonomous design and experimental data analysis.
Emergence of Internal State-Modulated Swarming in Multi-Agent Patch Foraging System
Active particles are entities that sustain persistent out-of-equilibrium motion by consuming energy. Under certain conditions, they exhibit the tendency to self-organize through coordinated movements, such as swarming via aggregation. While performing non-cooperative foraging tasks, the emergence of such swarming behavior in foragers, exemplifying active particles, has been attributed to the partial observability of the environment, in which the presence of another forager can serve as a proxy signal to indicate the potential presence of a food source or a resource patch. In this paper, we validate this phenomenon by simulating multiple self-propelled foragers as they forage from multiple resource patches in a non-cooperative manner. These foragers operate in a continuous two-dimensional space with stochastic position updates and partial observability. We evolve a shared policy in the form of a continuous-time recurrent neural network that serves as a velocity controller for the foragers. To this end, we use an evolutionary strategy algorithm wherein the different samples of the policy-distribution are evaluated in the same rollout. Then we show that agents are able to learn to adaptively forage in the environment. Next, we show the emergence of swarming in the form of aggregation among the foragers when resource patches are absent. We observe that the strength of this swarming behavior appears to be inversely proportional to the amount of resource stored in the foragers, which supports the risk-sensitive foraging claims. Empirical analysis of the learned controller's hidden states in minimal test runs uncovers their sensitivity to the amount of resource stored in a forager. Clamping these hidden states to represent a lesser amount of resource hastens its learned aggregation behavior.
comment: 9 pages, 9 figures, 1 table, 1 algorithm
Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs
Multi-agent systems (MAS) and reinforcement learning (RL) are widely used to enhance the agentic capabilities of large language models (LLMs). MAS improves task performance through role-based orchestration, while RL uses environmental rewards to learn stronger policies, such as GRPO-style optimization. However, applying on-policy RL to MAS remains underexplored and presents unique challenges. Algorithmically, standard GRPO grouping assumptions break down because prompts vary by role and by turn. System-wise, the training stack must support MAS-workflow rollouts and on-policy updates for both single-policy and multi-policy models. We propose AT-GRPO, which includes (i) an agent- and turn-wise grouped RL algorithm tailored to MAS and (ii) a training system that supports both single- and multi-policy regimes. Across game, planning, coding, and math tasks, AT-GRPO delivers substantial gains. On long-horizon planning, it increases accuracy from a 14.0 to 47.0 percent single-agent RL baseline to 96.0 to 99.5 percent. It also improves reasoning performance, with average gains of 3.87 to 7.62 percent on coding tasks and 9.0 to 17.93 percent on math. Code and environments are available at: https://github.com/pettingllms-ai/PettingLLMs.
Autonomous vehicles need social awareness to find optima in multi-agent reinforcement learning routing games
Previous work has shown that when multiple selfish Autonomous Vehicles (AVs) are introduced to future cities and start learning optimal routing strategies using Multi-Agent Reinforcement Learning (MARL), they may destabilize traffic systems, as they would require a significant amount of time to converge to the optimal solution, equivalent to years of real-world commuting. We demonstrate that moving beyond the selfish component in the reward significantly relieves this issue. If each AV, apart from minimizing its own travel time, aims to reduce its impact on the system, this will be beneficial not only for the system-wide performance but also for each individual player in this routing game. By introducing an intrinsic reward signal based on the marginal cost matrix, we significantly reduce training time and achieve convergence more reliably. Marginal cost quantifies the impact of each individual action (route-choice) on the system (total travel time). Including it as one of the components of the reward can reduce the degree of non-stationarity by aligning agents' objectives. Notably, the proposed counterfactual formulation preserves the system's equilibria and avoids oscillations. Our experiments show that training MARL algorithms with our novel reward formulation enables the agents to converge to the optimal solution, whereas the baseline algorithms fail to do so. We show these effects in both a toy network and the real-world network of Saint-Arnoult. Our results optimistically indicate that social awareness (i.e., including marginal costs in routing decisions) improves both the system-wide and individual performance of future urban systems with AVs.
Optimistic Multi-Agent Policy Gradient ICML 2024
*Relative overgeneralization* (RO) occurs in cooperative multi-agent learning tasks when agents converge towards a suboptimal joint policy due to overfitting to suboptimal behavior of other agents. No methods have been proposed for addressing RO in multi-agent policy gradient (MAPG) methods although these methods produce state-of-the-art results. To address this gap, we propose a general, yet simple, framework to enable optimistic updates in MAPG methods that alleviate the RO problem. Our approach involves clipping the advantage to eliminate negative values, thereby facilitating optimistic updates in MAPG. The optimism prevents individual agents from quickly converging to a local optimum. Additionally, we provide a formal analysis to show that the proposed method retains optimality at a fixed point. In extensive evaluations on a diverse set of tasks including the *Multi-agent MuJoCo* and *Overcooked* benchmarks, our method outperforms strong baselines on 13 out of 19 tested tasks and matches the performance on the rest.
comment: Updated MMDP notation. Published at ICML 2024, 17 pages, 10 figures
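The advantage-clipping idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, and the surrogate loss is a plain policy-gradient objective chosen only to show where the clipped advantage enters.

```python
def clip_advantages(advantages):
    """Optimistic clipping: negative advantages are replaced by zero."""
    return [max(a, 0.0) for a in advantages]


def surrogate_loss(log_probs, advantages):
    """Plain policy-gradient surrogate using the clipped advantages.

    Assumption: log_probs[k] is the log-probability of the k-th sampled
    action under the current policy. Samples with a negative advantage
    contribute nothing, so an agent is not pushed away from an action
    only because teammates behaved poorly in that sample.
    """
    clipped = clip_advantages(advantages)
    total = sum(lp * adv for lp, adv in zip(log_probs, clipped))
    return -total / len(log_probs)


example_advantages = [-1.5, 0.0, 2.0]
example_clipped = clip_advantages(example_advantages)
```

With the example values, only the third sample keeps a non-zero weight, so the gradient signal comes entirely from outcomes that were better than expected.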
Nash Equilibria in Games with Playerwise Concave Coupling Constraints: Existence and Computation
We study the existence and computation of Nash equilibria in continuous static games where the players' admissible strategies are subject to shared coupling constraints, i.e., constraints that depend on their joint strategies. Specifically, we focus on a class of games characterized by playerwise concave utilities and playerwise concave constraints. Prior results on the existence of Nash equilibria are not applicable to this class, as they rely on strong assumptions such as joint convexity of the feasible set. By leveraging topological fixed point theory and novel structural insights into the contractibility of feasible sets under playerwise concave constraints, we give an existence proof for Nash equilibria under weaker conditions. Having established existence, we then focus on the computation of Nash equilibria via independent gradient methods under the additional assumption that the utilities admit a potential function. To account for the possibly nonconvex feasible region, we employ a log barrier regularized gradient ascent with adaptive stepsizes. Starting from an initial feasible strategy profile and under exact gradient feedback, the proposed method converges to an $\epsilon$-approximate constrained Nash equilibrium within $\mathcal{O}(\epsilon^{-3})$ iterations.
Offline Fictitious Self-Play for Competitive Games
Offline Reinforcement Learning (RL) enables policy improvement from fixed datasets without online interactions, making it highly suitable for real-world applications lacking efficient simulators. Despite its success in the single-agent setting, offline multi-agent RL remains a challenge, especially in competitive games. First, without knowledge of the game structure, it is impossible to interact with opponents and conduct self-play, a major learning paradigm for competitive games. Second, real-world datasets cannot cover the entire state and action space of the game, which hinders the identification of a Nash equilibrium (NE). To address these issues, this paper introduces OFF-FSP, the first practical model-free offline RL algorithm for competitive games. We start by simulating interactions with various opponents by adjusting the weights of the fixed dataset with importance sampling. This technique allows us to learn the best responses to different opponents and employ the Offline Self-Play learning framework. To overcome the challenge of partial coverage, we combine the single-agent offline RL method with Fictitious Self-Play (FSP) to approximate NE by constraining the approximate best responses away from out-of-distribution actions. Experiments on matrix games, extensive-form poker, and board games demonstrate that OFF-FSP achieves significantly lower exploitability than state-of-the-art baselines. Finally, we validate OFF-FSP on a real-world human-robot competitive task, demonstrating its potential for solving complex, hard-to-simulate real-world problems.
A Hybrid ABM-PDE Framework for Real-World Infectious Disease Simulations
This paper presents a hybrid modeling approach that couples an Agent-Based Model (ABM) with a partial differential equation (PDE) model in an epidemic setting to simulate the spatial spread of infectious diseases using a compartmental structure with seven health states. The goal is to reduce the computational complexity of a full-ABM by introducing a coupled ABM-PDE model that offers significantly faster simulations while maintaining comparable accuracy. Our results demonstrate that the hybrid model not only reduces the overall simulation runtime (defined as the number of runs required for stable results multiplied by the duration of a single run) but also achieves smaller errors across both 25% and 100% population samples. The coupling mechanism ensures consistency at the model interface: agents crossing from the ABM into the PDE domain are removed and represented as density contributions, while surplus density in the PDE domain is used to generate agents with plausible trajectories derived from mobile phone data. We evaluate the hybrid model using real-world mobility and infection data for the Berlin-Brandenburg region in Germany, showing that it captures the core epidemiological dynamics while enabling efficient large-scale simulations.
Scaling Multi-Agent Epistemic Planning through GNN-Derived Heuristics
Multi-agent Epistemic Planning (MEP) is an autonomous planning framework for reasoning about both the physical world and the beliefs of agents, with applications in domains where information flow and awareness among agents are critical. The richness of MEP requires states to be represented as Kripke structures, i.e., directed labeled graphs. This representation limits the applicability of existing heuristics, hindering the scalability of epistemic solvers, which must explore an exponential search space without guidance, resulting often in intractability. To address this, we exploit Graph Neural Networks (GNNs) to learn patterns and relational structures within epistemic states, to guide the planning process. GNNs, which naturally capture the graph-like nature of Kripke models, allow us to derive meaningful estimates of state quality -- e.g., the distance from the nearest goal -- by generalizing knowledge obtained from previously solved planning instances. We integrate these predictive heuristics into an epistemic planning pipeline and evaluate them against standard baselines, showing improvements in the scalability of multi-agent epistemic planning.
Abmax: A JAX-based Agent-based Modeling Framework
Agent-based modeling (ABM) is a principal approach for studying complex systems. By decomposing a system into simpler, interacting agents, ABM allows researchers to observe the emergence of complex phenomena. High-performance array computing libraries like JAX can help scale such computational models to a large number of agents by using automatic vectorization and just-in-time (JIT) compilation. One of the caveats of using JAX to achieve such scaling is that the shapes of arrays used in the computational model should remain immutable throughout the simulation. In the context of ABM, this can pose constraints on certain agent manipulation operations that require flexible data structures. One such operation is updating a dynamically selected number of agents by applying distinct changes to them during a simulation. To this end, we introduce Abmax, an ABM framework based on JAX that implements multiple JIT-compilable algorithms to provide this functionality. On the canonical predation model benchmark, Abmax achieves runtime performance comparable to state-of-the-art implementations. Further, we show that this functionality can also be vectorized, making it possible to run many similar agent-based models in parallel. We also present two examples, a traffic-flow model and a financial market model, to show the use case of Abmax.
comment: 8 pages, 7 figures, 4 tables, 2 algorithms
Multi-Agent Autonomous Driving Systems with Large Language Models: A Survey of Recent Advances
Autonomous Driving Systems (ADSs) are revolutionizing transportation by reducing human intervention, improving operational efficiency, and enhancing safety. Large Language Models (LLMs) have been integrated into ADSs to support high-level decision-making through their powerful reasoning, instruction-following, and communication abilities. However, LLM-based single-agent ADSs face three major challenges: limited perception, insufficient collaboration, and high computational demands. To address these issues, recent advances in LLM-based multi-agent ADSs leverage language-driven communication and coordination to enhance inter-agent collaboration. This paper provides a frontier survey of this emerging intersection between NLP and multi-agent ADSs. We begin with a background introduction to related concepts, followed by a categorization of existing LLM-based methods based on different agent interaction modes. We then discuss agent-human interactions in scenarios where LLM-based agents engage with humans. Finally, we summarize key applications, datasets, and challenges to support future research.
Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking NeurIPS 2025
Large Language Models (LLMs) have demonstrated notable capabilities across financial tasks, including financial report summarization, earnings call transcript analysis, and asset classification. However, their real-world effectiveness in managing complex fund investment remains inadequately assessed. A fundamental limitation of existing benchmarks for evaluating LLM-driven trading strategies is their reliance on historical back-testing, which inadvertently enables LLMs to "time travel" by leveraging future information embedded in their training corpora, resulting in possible information leakage and overly optimistic performance estimates. To address this issue, we introduce DeepFund, a live fund benchmark tool designed to rigorously evaluate LLMs in real-time market conditions. Utilizing a multi-agent architecture, DeepFund connects directly with real-time stock market data, specifically data published after each model's pretraining cutoff, to ensure fair and leakage-free evaluations. Empirical tests on nine flagship LLMs from leading global institutions across multiple investment dimensions (including ticker-level analysis, investment decision-making, portfolio management, and risk control) reveal significant practical challenges. Notably, even cutting-edge models such as DeepSeek-V3 and Claude-3.7-Sonnet incur net trading losses within the DeepFund real-time evaluation environment, underscoring the present limitations of LLMs for active fund management. Our code is available at https://github.com/HKUSTDial/DeepFund.
comment: NeurIPS 2025 Datasets and Benchmarks Track
Coordination Requires Simplification: Thermodynamic Bounds on Multi-Objective Compromise in Natural and Artificial Intelligence
Information-processing systems that coordinate multiple agents and objectives face fundamental thermodynamic constraints. We show that the solutions with maximum utility as coordination focal points face much stronger selection pressure for being findable across agents than for accuracy. We derive that the information-theoretic minimum description length of coordination protocols to precision $\varepsilon$ scales as $L(P)\geq NK\log_2 K+N^2d^2\log (1/\varepsilon)$ for $N$ agents with $d$ potentially conflicting objectives and internal model complexity $K$. This scaling forces progressive simplification, with coordination dynamics changing the environment itself and shifting optimization across hierarchical levels. Moving from established focal points requires re-coordination, creating persistent metastable states and hysteresis until significant environmental shifts trigger phase transitions through spontaneous symmetry breaking. We operationally define coordination temperature to predict critical phenomena and estimate coordination work costs, identifying measurable signatures across systems from neural networks to restaurant bills to bureaucracies. Extending the topological version of Arrow's theorem on the impossibility of consistent preference aggregation, we find it recursively binds whenever preferences are combined. This potentially explains the indefinite cycling in multi-objective gradient descent and alignment faking in Large Language Models trained with reinforcement learning with human feedback. We term this framework Thermodynamic Coordination Theory (TCT), which demonstrates that coordination requires radical information loss.
comment: 15 pages, 1 figure, 9 pages supplementary material, submitted to Journal of Physics: Complexity
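To get a feel for how the stated lower bound grows, the sketch below evaluates its right-hand side numerically. It assumes the unsubscripted logarithm in the second term is base 2 (the abstract does not say), and the function name and parameter values are illustrative only.

```python
import math


def description_length_bound(n_agents, n_objectives, model_complexity, eps):
    """Right-hand side of L(P) >= N*K*log2(K) + N^2 * d^2 * log(1/eps).

    Assumption: the second logarithm is taken base 2, so the result is
    in bits. The abstract leaves the base unspecified.
    """
    model_term = n_agents * model_complexity * math.log2(model_complexity)
    pairwise_term = (n_agents ** 2) * (n_objectives ** 2) * math.log2(1.0 / eps)
    return model_term + pairwise_term


# Hypothetical values: N = 4 agents, d = 2 objectives, K = 8, eps = 2^-10.
bound_small = description_length_bound(4, 2, 8, 2 ** -10)
# Doubling the number of agents while keeping everything else fixed.
bound_large = description_length_bound(8, 2, 8, 2 ** -10)
```

With these values the bound rises from 736 to 2752 bits when the agent count doubles, because the second term is quadratic in N. This is the growth that the abstract argues forces progressive simplification.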
OrbitZoo: Real Orbital Systems Challenges for Reinforcement Learning NeurIPS 2025
The increasing number of satellites and orbital debris has made space congestion a critical issue, threatening satellite safety and sustainability. Challenges such as collision avoidance, station-keeping, and orbital maneuvering require advanced techniques to handle dynamic uncertainties and multi-agent interactions. Reinforcement learning (RL) has shown promise in this domain, enabling adaptive, autonomous policies for space operations; however, many existing RL frameworks rely on custom-built environments developed from scratch, which often use simplified models and require significant time to implement and validate the orbital dynamics, limiting their ability to fully capture real-world complexities. To address this, we introduce OrbitZoo, a versatile multi-agent RL environment built on a high-fidelity industry standard library, that enables realistic data generation, supports scenarios like collision avoidance and cooperative maneuvers, and ensures robust and accurate orbital dynamics. The environment is validated against a real satellite constellation, Starlink, achieving a Mean Absolute Percentage Error (MAPE) of 0.16% compared to real-world data. This validation ensures reliability for generating high-fidelity simulations and enabling autonomous and independent satellite operations.
comment: Accepted at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
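For reference, the Mean Absolute Percentage Error quoted in the abstract is commonly computed as in the sketch below. Which orbital quantities were compared is not stated in the abstract, so the sample values are placeholders rather than real data.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Standard definition: the mean of |actual - predicted| / |actual|,
    multiplied by 100. Reference values must be non-zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when a reference value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Placeholder values, not taken from the paper.
reference = [100.0, 200.0]
simulated = [101.0, 198.0]
example_error = mape(reference, simulated)
```

Each placeholder prediction is off by 1% of its reference value, so the example evaluates to a MAPE of about 1%.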
Evolution of AI Agent Registry Solutions: Centralized, Enterprise, and Distributed Approaches
Autonomous AI agents now operate across cloud, enterprise, and decentralized domains, creating demand for registry infrastructures that enable trustworthy discovery, capability negotiation, and identity assurance. We analyze five prominent approaches: (1) MCP Registry (centralized publication of mcp.json descriptors), (2) A2A Agent Cards (decentralized self-describing JSON capability manifests), (3) AGNTCY Agent Directory Service (IPFS Kademlia DHT content routing extended for semantic taxonomy-based content discovery, OCI artifact storage, and Sigstore-backed integrity), (4) Microsoft Entra Agent ID (enterprise SaaS directory with policy and zero-trust integration), and (5) NANDA Index AgentFacts (cryptographically verifiable, privacy-preserving fact model with credentialed assertions). Using four evaluation dimensions: security, authentication, scalability, and maintainability, we surface architectural trade-offs between centralized control, enterprise governance, and distributed resilience. We conclude with design recommendations for an emerging Internet of AI Agents requiring verifiable identity, adaptive discovery flows, and interoperable capability semantics.
Foragax: An Agent-Based Modelling Framework Based on JAX
Foraging for resources is a ubiquitous activity conducted by living organisms in a shared environment to maintain their homeostasis. Modelling multi-agent foraging in-silico allows us to study both individual and collective emergent behaviour in a tractable manner. Agent-based modelling has proven to be effective in simulating such tasks, though scaling the simulations to accommodate large numbers of agents with complex dynamics remains challenging. In this work, we present Foragax, a general-purpose, scalable, hardware-accelerated, multi-agent foraging toolkit. Leveraging the JAX library, our toolkit can simulate thousands of agents foraging in a common environment, in an end-to-end vectorized and differentiable manner. The toolkit provides agent-based modelling tools to model various foraging tasks, including options to design custom spatial and temporal agent dynamics, control policies, sensor models, and boundary conditions. Further, the number of agents during such simulations can be increased or decreased based on custom rules. While applied to foraging, the toolkit can also be used to model and simulate a wide range of other multi-agent scenarios.
comment: A new version of the framework with more complete features is available in the form of ABMax (arXiv:2508.16508)
Decentralizing Multi-Agent Reinforcement Learning with Temporal Causal Information
Reinforcement learning (RL) algorithms can find an optimal policy for a single agent to accomplish a particular task. However, many real-world problems require multiple agents to collaborate in order to achieve a common goal. For example, a robot executing a task in a warehouse may require the assistance of a drone to retrieve items from high shelves. In Decentralized Multi-Agent RL (DMARL), agents learn independently and then combine their policies at execution time, but often must satisfy constraints on compatibility of local policies to ensure that they can achieve the global task when combined. In this paper, we study how providing high-level symbolic knowledge to agents can help address unique challenges of this setting, such as privacy constraints, communication limitations, and performance concerns. In particular, we extend the formal tools used to check the compatibility of local policies with the team task, making decentralized training with theoretical guarantees usable in more scenarios. Furthermore, we empirically demonstrate that symbolic knowledge about the temporal evolution of events in the environment can significantly expedite the learning process in DMARL.
comment: Code available at https://github.com/corazza/tcdmarl
Systems and Control (CS)
Autonomous Legged Mobile Manipulation for Lunar Surface Operations via Constrained Reinforcement Learning
Robotics plays a pivotal role in planetary science and exploration, where autonomous and reliable systems are crucial due to the risks and challenges inherent to space environments. The establishment of permanent lunar bases demands robotic platforms capable of navigating and manipulating in the harsh lunar terrain. While wheeled rovers have been the mainstay for planetary exploration, their limitations in unstructured and steep terrains motivate the adoption of legged robots, which offer superior mobility and adaptability. This paper introduces a constrained reinforcement learning framework designed for autonomous quadrupedal mobile manipulators operating in lunar environments. The proposed framework integrates whole-body locomotion and manipulation capabilities while explicitly addressing critical safety constraints, including collision avoidance, dynamic stability, and power efficiency, in order to ensure robust performance under lunar-specific conditions, such as reduced gravity and irregular terrain. Experimental results demonstrate the framework's effectiveness in precise 6D task-space end-effector pose tracking, with an average positional accuracy of 4 cm and orientation accuracy of 8.1 degrees. The system consistently respects both soft and hard constraints, exhibiting adaptive behaviors optimized for lunar gravity conditions. This work effectively bridges adaptive learning with essential mission-critical safety requirements, paving the way for advanced autonomous robotic explorers for future lunar missions.
comment: This is the authors' version of the paper accepted for publication in The IEEE International Conference on Space Robotics 2025. The final version link will be added here after conference proceedings are published
Variational Quantum Eigensolver Models of Molecular Quantum Dot Cellular Automata
Molecular quantum-dot Cellular Automata (QCA) may provide low-power, high-speed computational hardware for processing classical information. Simulation and modeling play an important role in the design of QCA circuits because fully coherent models of QCA scale exponentially with the number of devices, and such models are severely limited in size. For larger circuits, approximations become necessary. In the era of fault-tolerant quantum computation, however, it may become possible to model large QCA circuits without such limitations. Presently, this work explores the use of the noisy intermediate-scale quantum (NISQ) variational quantum eigensolver (VQE) method for estimating the ground state of QCA circuits. This is relevant because the computational result of a QCA calculation is encoded in the circuit's ground state. In this study, VQE is used to model logic circuits, including binary wires, inverters, and majority gates. VQE models are run on ideal simulators, noisy simulators, and actual quantum hardware. This study demonstrates that VQE may indeed be used to model molecular QCA circuits. It is observed that using modern NISQ hardware, results are still quite sensitive to noise, so measures should be taken to minimize noise. These include simplifying the ansatz circuit whenever possible, and using low-noise hardware.
comment: 18 pages, 26 figures, submitted to the Journal of Applied Physics
Learning Robust Agile Flight Control with Stability Guarantees
In the evolving landscape of high-speed agile quadrotor flight, achieving precise trajectory tracking at the platform's operational limits is paramount. Controllers must handle actuator constraints, exhibit robustness to disturbances, and remain computationally efficient for safety-critical applications. In this work, we present a novel neural-augmented feedback controller for agile flight control. The controller addresses individual limitations of existing state-of-the-art control paradigms and unifies their strengths. We demonstrate the controller's capabilities, including the accurate tracking of highly aggressive trajectories that surpass the feasibility of the actuators. Notably, the controller provides universal stability guarantees, enhancing its robustness and tracking performance even in exceedingly disturbance-prone settings. Its nonlinear feedback structure is highly efficient, enabling fast computation at high update rates. Moreover, the learning process in simulation is both fast and stable, and the controller's inherent robustness allows direct deployment to real-world platforms without the need for training augmentations or fine-tuning.
Enhancing Robust Multi-Market Participation of Renewable-Based VPPs through Flexible Resources
In the transition toward a sustainable power system, renewable-based Virtual Power Plants (RVPPs) have emerged as a promising solution to the challenges of integrating renewable energy sources into electricity markets. Their viability, however, depends on effective market participation strategies and the ability to manage uncertainties while leveraging flexible resources. This paper analyzes the impact of different flexible resources - such as concentrated solar power plants, hydro plants, biomass plants, and flexible demand - on the participation of RVPPs in energy and reserve markets. Multiple sources of uncertainty in generation, consumption, and electricity prices are addressed using a two-stage robust optimization approach. The contribution of different technologies to RVPP profitability is evaluated through a marginal contribution method, ensuring fair allocation of profits among them according to their actual role in energy and reserve provision across markets. Simulations for an RVPP in southern Spain demonstrate how strategic decisions and the availability of flexible resources influence viability, market participation, and unit scheduling.
Privacy-Preserving Distributed Estimation with Limited Data Rate
This paper focuses on the privacy-preserving distributed estimation problem with a limited data rate, where the observations are the sensitive information. Specifically, a binary-valued quantizer-based privacy-preserving distributed estimation algorithm is developed, which improves the algorithm's privacy-preserving capability and simultaneously reduces the communication costs. The algorithm's privacy-preserving capability, measured by the Fisher information matrix, is dynamically enhanced over time. Notably, the Fisher information matrix of the output signals with respect to the sensitive information converges to zero at a polynomial rate, and the improvement in privacy brought by the quantizers is quantitatively characterized as a multiplicative effect. Regarding the communication costs, each sensor transmits only 1 bit of information to its neighbours at each time step. Additionally, the assumption on the negligible quantization error for real-valued messages is not required. While achieving the requirements of privacy preservation and reducing communication costs, the algorithm ensures that its estimates converge almost surely to the true value of the unknown parameter by establishing a co-design guideline for the time-varying privacy noises and step-sizes. A polynomial almost sure convergence rate is obtained, and then the trade-off between privacy and convergence rate is established. Numerical examples demonstrate the main results.
Optimising Communication Control Factors for Energy Consumption in Rural LOS V2X
Connected braking can reduce fatal collisions in connected and autonomous vehicles (CAVs) by using reliable, low-latency 5G New Radio (NR) links, especially NR Sidelink Vehicle-to-Everything (V2X). In rural areas, roadside units are sparse and power-constrained or off-grid, so energy efficiency must be considered alongside safety. This paper studies how three communication control factors, namely subcarrier spacing ($\mathrm{SCS}$), modulation and coding scheme ($\mathrm{MCS}$), and transmit power ($P_{\mathrm{t}}$), should be configured to balance safety and energy consumption in rural line-of-sight (LOS) scenarios under light and heavy traffic. Safety is quantified by the packet receive ratio ($\mathrm{PRR}$) against the minimum communication distance $D_{\mathrm{comm}}$, defined as the distance that the vehicle travels during the transmission of the safety message. Results show that, under heavy traffic, increasing $P_{\mathrm{t}}$ and selecting a low-rate $\mathrm{MCS}$ at $\mathrm{SCS} = 30$ kHz sustains high $\mathrm{PRR}$ at $D_{\mathrm{comm}}$, albeit with higher energy cost. In light traffic, maintaining lower $P_\mathrm{t}$ with low $\mathrm{MCS}$ levels achieves a favorable reliability-energy trade-off while preserving acceptable $\mathrm{PRR}$ at $D_{\mathrm{comm}}$. These findings demonstrate the need for an adaptive, energy-aware strategy to guarantee both safety and energy efficiency in rural V2X systems.
Temporal Variabilities Limit Convergence Rates in Gradient-Based Online Optimization
This paper investigates the fundamental performance limits of gradient-based algorithms for time-varying optimization. Leveraging the internal model principle and root locus techniques, we show that temporal variabilities impose intrinsic limits on the achievable rate of convergence. For a problem with condition ratio $\kappa$ and time variation whose model has degree $n$, we show that the worst-case convergence rate of any minimal-order gradient-based algorithm is $\rho_\text{TV} = (\frac{\kappa-1}{\kappa+1})^{1/n}$. This bound reveals a fundamental tradeoff between problem conditioning, temporal complexity, and rate of convergence. We further construct explicit controllers that attain the bound for low-degree models of time variation.
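To make the stated bound concrete, the short sketch below simply evaluates the formula $\rho_\text{TV} = (\frac{\kappa-1}{\kappa+1})^{1/n}$ quoted in the abstract. The function name `tv_rate` and the sample values of $\kappa$ and $n$ are illustrative assumptions, not taken from the paper.

```python
def tv_rate(kappa: float, n: int) -> float:
    """Evaluate rho_TV = ((kappa - 1) / (kappa + 1)) ** (1 / n).

    kappa: condition ratio of the problem (>= 1).
    n:     degree of the model of the time variation (>= 1).
    """
    if kappa < 1.0 or n < 1:
        raise ValueError("expected kappa >= 1 and n >= 1")
    return ((kappa - 1.0) / (kappa + 1.0)) ** (1.0 / n)


if __name__ == "__main__":
    # Illustrative values only: a fixed condition ratio, increasing model degree.
    for degree in (1, 2, 3):
        print(degree, tv_rate(10.0, degree))
```

For a fixed $\kappa$, the rate moves toward 1 (slower convergence) as $n$ grows, which is the tradeoff between conditioning and temporal complexity described above.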
DarTwin made precise by SysMLv2 -- An Experiment
The new SysMLv2 adds mechanisms for the built-in specification of domain-specific concepts and language extensions. This feature promises to facilitate the creation of Domain-Specific Languages (DSLs) and interfacing with existing system descriptions and technical designs. In this paper, we review these features and evaluate SysMLv2's capabilities using concrete use cases. We develop DarTwin DSL, a DSL that formalizes the existing DarTwin notation for Digital Twin (DT) evolution, through SysMLv2, thereby supposedly enabling the wide application of DarTwin's evolution templates using any SysMLv2 tool. We demonstrate DarTwin DSL, but also point out limitations in the currently available tooling of SysMLv2 in terms of graphical notation capabilities. This work contributes to the growing field of Model-Driven Engineering (MDE) for DTs and combines it with the release of SysMLv2, thus integrating a systematic approach with DT evolution management in systems engineering.
Micro-Macro Backstepping Control of Large-Scale Hyperbolic Systems (Extended Version)
We introduce a control design and analysis framework for micro-macro, boundary control of large-scale, $n+m$ hyperbolic PDE systems. Specifically, we develop feedback laws for stabilization of hyperbolic systems at the micro level (i.e., of the large-scale system) that employ a) measurements obtained from the $n+m$ system (i.e., at micro level) and kernels constructed based on an $\infty+\infty$ continuum system counterpart (i.e., at macro level), or b) kernels and measurements both stemming from a continuum counterpart, or c) averaged-continuum kernels/measurements. We also address d) stabilization of the continuum (macro) system, employing continuum kernels and measurements. Towards addressing d) we derive in a constructive manner an $\infty+\infty$ continuum approximation of $n+m$ hyperbolic systems and establish that its solutions approximate, for large $n$ and $m$, the solutions of the $n+m$ system. We then construct a feedback law for stabilization of the $\infty+\infty$ system via introduction of a continuum-PDE backstepping transformation. We establish well-posedness of the resulting 4-D kernel equations and prove closed-loop stability via construction of a novel Lyapunov functional. Furthermore, under control configuration a) we establish that the closed-loop system is exponentially stable provided that $n$ and $m$ are large, by proving that the exact, stabilizing $n+m$ control kernels can be accurately approximated by the continuum kernels. Under control configurations b) and c), we establish closed-loop stability capitalizing on the established solutions' and kernels' approximation properties via employment of infinite-dimensional ISS arguments. We provide two numerical simulation examples to illustrate the effectiveness and potential limitations of our design approach.
comment: 22 pages, 5 figures
The value of storage in electricity distribution: The role of storage
Electricity distribution companies deploy battery storage to defer grid upgrades by reducing peak demand. In deregulated jurisdictions, such storage often sits idle because regulatory constraints bar participation in electricity markets. Here, we develop an optimization framework that, to our knowledge, provides the first formal model of market participation constraints within storage investment and operation planning. Applying the framework to a Massachusetts case study, we find that market participation could deliver savings similar to those from peak demand reduction. Under current conditions, market participation does not increase storage investment, but at very low storage costs it could incentivize deployment beyond local distribution needs. This might run contrary to the separation of distribution from generation in deregulated markets. Our framework can identify investment levels appropriate for local distribution needs.
High-Parallel FPGA-Based Discrete Simulated Bifurcation for Large-Scale Optimization
Combinatorial Optimization (CO) problems exhibit exponential complexity, making their resolution challenging. Simulated Adiabatic Bifurcation (aSB) is a quantum-inspired algorithm to obtain approximate solutions to large-scale CO problems written in the Ising form. It explores the solution space by emulating the adiabatic evolution of a network of Kerr-nonlinear parametric oscillators (KPOs), where each oscillator represents a variable in the problem. The optimal solution corresponds to the ground state of this system. A key advantage of this approach is the possibility of updating multiple variables simultaneously, making it particularly suited for hardware implementation. To enhance solution quality and convergence speed, variations of the algorithm have been proposed in the literature, including ballistic (bSB), discrete (dSB), and thermal (HbSB) versions. In this work, we have comprehensively analyzed dSB, bSB, and HbSB using dedicated software models, evaluating the feasibility of using a fixed-point representation for hardware implementation. We then present an open-source hardware architecture implementing the dSB algorithm for Field-Programmable Gate Arrays (FPGAs). The design allows users to adjust the degree of algorithmic parallelization based on their specific requirements. A proof-of-concept implementation that solves 256-variable problems was achieved on an AMD Kria KV260 SoM, a low-tier FPGA, validated using well-known max-cut and knapsack problems.
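As a rough software illustration of the discrete variant mentioned above, the sketch below follows the commonly published form of the dSB update (positions and momenta per spin, a linearly ramped pump term, sign-discretized coupling, and perfectly inelastic walls at |x| = 1). It is a floating-point NumPy sketch, not the fixed-point FPGA architecture of the paper; the function names, step count, time step, and coupling scale `c0` are assumptions chosen for illustration.

```python
import numpy as np


def dsb(J: np.ndarray, steps: int = 1000, dt: float = 0.5,
        a0: float = 1.0, seed: int = 0) -> np.ndarray:
    """Sketch of discrete simulated bifurcation for E(s) = -0.5 * s^T J s.

    J is a symmetric coupling matrix with zero diagonal.
    Returns a vector of spins in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    n = J.shape[0]
    norm = np.sqrt((J ** 2).sum())
    # Heuristic coupling scale (an assumption for this sketch).
    c0 = 0.5 * np.sqrt(max(n - 1, 1)) / norm if norm > 0 else 0.0
    x = rng.uniform(-0.1, 0.1, n)  # oscillator positions
    y = rng.uniform(-0.1, 0.1, n)  # oscillator momenta
    for k in range(steps):
        a = a0 * k / steps  # pump amplitude ramps linearly from 0 to a0
        # Discrete variant: couplings see sign(x) instead of x.
        y += dt * (-(a0 - a) * x + c0 * (J @ np.sign(x)))
        x += dt * a0 * y
        # Inelastic walls: clamp positions and reset momenta at |x| > 1.
        hit = np.abs(x) > 1.0
        x[hit] = np.sign(x[hit])
        y[hit] = 0.0
    return np.where(x >= 0, 1, -1)


def ising_energy(J: np.ndarray, s: np.ndarray) -> float:
    """Ising energy E(s) = -0.5 * s^T J s."""
    return float(-0.5 * s @ J @ s)


if __name__ == "__main__":
    # Max-cut on a 4-node ring: antiferromagnetic couplings J = -W.
    W = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)
    spins = dsb(-W)
    print(spins, ising_energy(-W, spins))
```

All spins are updated in the same matrix-vector step, which is the property that makes this family of algorithms attractive for parallel hardware.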
Pooling Probabilistic Forecasts for Cooperative Wind Power Offering SC
Wind power producers can benefit from forming coalitions to participate cooperatively in electricity markets. To support such collaboration, various profit allocation rules rooted in cooperative game theory have been proposed. However, existing approaches overlook the lack of coherence among producers regarding forecast information, which may lead to ambiguity in offering and allocations. In this paper, we introduce a ``reconcile-then-optimize'' framework for cooperative market offerings. This framework first aligns the individual forecasts into a coherent joint forecast before determining market offers. With such forecasts, we formulate and solve a two-stage stochastic programming problem to derive both the aggregate offer and the corresponding scenario-based dual values for each trading hour. Based on these dual values, we construct a profit allocation rule that is budget-balanced and stable. Finally, we validate the proposed method through empirical case studies, demonstrating its practical effectiveness and theoretical soundness.
comment: submission to PSCC 2026, 7 pages
A Unidirectionally Connected FAS Approach for 6-DOF Quadrotor Control
This paper proposes a unidirectionally connected fully actuated system (UC-FAS) approach for the sub-stabilization and tracking control of 6-DOF quadrotors, which partly addresses limitations of both the state-space and FAS frameworks. The framework systematically converts underactuated quadrotor dynamics into a UC-FAS model, unifying the different existing FAS transformations. By eliminating the estimation of high-order derivatives of control inputs, a drawback of current methods, the UC-FAS model simplifies controller design and enables direct eigenstructure assignment for closed-loop dynamics. Simulations demonstrate precise 6-DOF tracking performance. This work bridges theoretical advances in the FAS approach with practical implementation needs, offering a standardized paradigm for nonlinear quadrotor control.
comment: This paper has been submitted to 2026 IFAC World Congress. Corresponding author: Guang-Ren Duan
Ultrafast Grid Impedance Identification in $dq$-Asymmetric Three-Phase Power Systems
We propose a non-parametric frequency-domain method to identify small-signal $dq$-asymmetric grid impedances, over a wide frequency band, using grid-connected converters. Existing identification methods are faced with significant trade-offs: e.g., passive approaches rely on ambient harmonics and rare grid events and thus can only provide estimates at a few frequencies, while many active approaches that intentionally perturb grid operation require long time series measurement and specialized equipment. Although active time-domain methods reduce the measurement time, they either make crude simplifying assumptions or require laborious model order tuning. Our approach effectively addresses these challenges: it does not require specialized excitation signals or hardware and achieves ultrafast ($<1$ s) identification, drastically reducing measurement time. Being non-parametric, our approach also makes no assumptions on the grid structure. A detailed electromagnetic transient simulation is used to validate the method and demonstrate its clear superiority over existing alternatives.
Physics-Informed Reinforcement Learning for Large-Scale EV Smart Charging Considering Distribution Network Voltage Constraints
Electric Vehicles (EVs) offer substantial flexibility for grid services, yet large-scale, uncoordinated charging can threaten voltage stability in distribution networks. Existing Reinforcement Learning (RL) approaches for smart charging often disregard physical grid constraints or have limited performance for complex large-scale tasks, limiting their scalability and real-world applicability. This paper introduces a physics-informed (PI) RL algorithm that integrates a differentiable power flow model and voltage-based reward design into the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, enabling EVs to deliver real-time voltage support while meeting user demands. The resulting PI-TD3 algorithm achieves faster convergence, improved sample efficiency, and reliable voltage magnitude regulation under uncertain and overloaded conditions. Benchmarks on the IEEE 34-bus and 123-bus networks show that the proposed PI-TD3 outperforms both model-free RL and optimization-based baselines in grid constraint management, user satisfaction, and economic metrics, even as the system scales to hundreds of EVs. These advances enable robust, scalable, and practical EV charging strategies that enhance grid resilience and support distribution network operation.
Empowering Prosumers: Incentive Design for Local Electricity Markets Under Generalized Uncertainty and Grid Constraints
Since the 1990s, widespread introduction of central (wholesale) electricity markets has been seen across multiple continents, driven by the search for efficient operation of the power grid through competition. The increase of renewables has made significant impacts both on central electricity markets and distribution-level grids as renewable power generation is often connected to the latter. These stochastic renewable technologies have both advantages and disadvantages. On the one hand, they offer very low marginal cost and carbon emissions; on the other hand, their output is uncertain, requiring flexible backup power with high marginal cost. Flexibility from end-prosumers or smaller market participants is therefore seen as a key enabler of large-scale integration of renewables. However, current central electricity markets do not directly include uncertainty in the market clearing and do not account for physical constraints of distribution grids. In this paper we propose a local electricity market framework based on probabilistic locational marginal pricing, effectively accounting for uncertainties in production, consumption and grid variables. The model includes a representation of the grid using the LinDistFlow equations and accounts for the propagation of uncertainty using general Polynomial Chaos (gPC). A two-stage convex model is proposed; in the day-ahead stage, probability distributions of prices are calculated for every timestep, where the expected values represent the day-ahead (spot) prices. In the real-time stage, uncertainties are realized (measured) and a trivial calculation reveals the real-time price. Through four instructive case studies we highlight the effectiveness of the method to incentivize end-prosumers' participation in the market, while ensuring that their behavior does not have an adverse impact on the operation of the grid.
Human-in-the-Loop Bandwidth Estimation for Quality of Experience Optimization in Real-Time Video Communication AAAI
The quality of experience (QoE) delivered by video conferencing systems is significantly influenced by accurately estimating the time-varying available bandwidth between the sender and receiver. Bandwidth estimation for real-time communications remains an open challenge due to rapidly evolving network architectures, increasingly complex protocol stacks, and the difficulty of defining QoE metrics that reliably improve user experience. In this work, we propose a deployed, human-in-the-loop, data-driven framework for bandwidth estimation to address these challenges. Our approach begins with training objective QoE reward models derived from subjective user evaluations to measure audio and video quality in real-time video conferencing systems. Subsequently, we collect roughly $1$M network traces with objective QoE rewards from real-world Microsoft Teams calls to curate a bandwidth estimation training dataset. We then introduce a novel distributional offline reinforcement learning (RL) algorithm to train a neural-network-based bandwidth estimator aimed at improving QoE for users. Our real-world A/B test demonstrates that the proposed approach reduces the subjective poor call ratio by $11.41\%$ compared to the baseline bandwidth estimator. Furthermore, the proposed offline RL algorithm is benchmarked on D4RL tasks to demonstrate its generalization beyond bandwidth estimation.
comment: Accepted for publication in the proceedings of the AAAI Conference on Artificial Intelligence 2026 (IAAI Technical Track on Deployed Highly Innovative Applications of AI)
Sleepy Chauffeur Detection and Alert Techniques for Road Safety
One of the most alarming contemporary road-safety problems is chauffeur drowsiness, which causes many car accidents. Detecting and alerting a sleepy chauffeur is vital to preventing such accidents, which would otherwise lead to loss of life, trauma, and severe injuries. Drowsiness may be caused by heavy stress, pressure, a relentless workload, or alcoholism, all of which lead to sleep deprivation, so that the chauffeur becomes drowsy while driving. A considerable number of systems have been developed to detect driver drowsiness, most of which depend mainly on image processing algorithms using cameras. Some also incorporate artificial intelligence and machine learning algorithms. This paper reviews the existing systems and proposes a simple, low-cost system using sensors and an Arduino that detects sleepiness, sounds a siren alarm, and sends an alert message so that precautionary measures can be taken.
comment: 8 pages, 5 figures, International Journal on Recent Innovation in Microelectronics and Microcontrollers Applications Vol. 1, Issue 1 - 2018
Hybrid Terrain-Aware Path Planning: Integrating VD--RRT\(^{*}\) Exploration and VD--D\(^{*}\) Lite Repair
Autonomous ground vehicles operating off-road must plan curvature-feasible paths while accounting for spatially varying soil strength and slope hazards in real time. We present a continuous state--cost metric that combines a Bekker pressure--sinkage model with elevation-derived slope and attitude penalties. The resulting terrain cost field is analytic, bounded, and monotonic in soil modulus and slope, ensuring well-posed discretization and stable updates under sensor noise. This metric is evaluated on a lattice with exact steering primitives: Dubins and Reeds--Shepp motions for differential drive and time-parameterized bicycle arcs for Ackermann steering. Global exploration is performed using Vehicle-Dynamics RRT\(^{*}\), while local repair is managed by Vehicle-Dynamics D\(^{*}\) Lite, enabling millisecond-scale replanning without heuristic smoothing. By separating the terrain--vehicle model from the planner, the framework provides a reusable basis for deterministic, sampling-based, or learning-driven planning in deformable terrain. Hardware trials on an off-road platform demonstrate real-time navigation across soft soil and slope transitions, supporting reliable autonomy in unstructured environments.
Towards xApp Conflict Evaluation with Explainable Machine Learning and Causal Inference in O-RAN
The Open Radio Access Network (O-RAN) architecture enables a flexible, vendor-neutral deployment of 5G networks by disaggregating base station components and supporting third-party xApps for near real-time RAN control. However, the concurrent operation of multiple xApps can lead to conflicting control actions, which may cause network performance degradation. In this work, we propose a framework for xApp conflict management that combines explainable machine learning and causal inference to evaluate the causal relationships between RAN Control Parameters (RCPs) and Key Performance Indicators (KPIs). We use model explainability tools such as SHAP to identify RCPs that jointly affect the same KPI, signaling potential conflicts, and represent these interactions as a causal Directed Acyclic Graph (DAG). We then estimate the causal impact of each of these RCPs on their associated KPIs using metrics such as Average Treatment Effect (ATE) and Conditional Average Treatment Effect (CATE). This approach offers network operators guided insights into identifying conflicts and quantifying their impacts, enabling more informed and effective conflict resolution strategies across diverse xApp deployments.
Information Shapes Koopman Representation
The Koopman operator provides a powerful framework for modeling dynamical systems and has attracted growing interest from the machine learning community. However, its infinite-dimensional nature makes identifying suitable finite-dimensional subspaces challenging, especially for deep architectures. We argue that these difficulties come from suboptimal representation learning, where latent variables fail to balance expressivity and simplicity. This tension is closely related to the information bottleneck (IB) dilemma: constructing compressed representations that are both compact and predictive. Rethinking Koopman learning through this lens, we demonstrate that latent mutual information promotes simplicity, yet an overemphasis on simplicity may cause latent space to collapse onto a few dominant modes. In contrast, expressiveness is sustained by the von Neumann entropy, which prevents such collapse and encourages mode diversity. This insight leads us to propose an information-theoretic Lagrangian formulation that explicitly balances this tradeoff. Furthermore, we propose a new algorithm based on the Lagrangian formulation that encourages both simplicity and expressiveness, leading to a stable and interpretable Koopman representation. Beyond quantitative evaluations, we further visualize the learned manifolds under our representations, observing empirical results consistent with our theoretical predictions. Finally, we validate our approach across a diverse range of dynamical systems, demonstrating improved performance over existing Koopman learning methods. The implementation is publicly available at https://github.com/Wenxuan52/InformationKoopman.
Data to Certificate: Guaranteed Cost Control with Quantization-Aware System Identification
Cloud-assisted system identification and control have emerged as practical solutions for low-power, resource-constrained control systems such as micro-UAVs. In a typical cloud-assisted setting, state and input data are transmitted from local agents to a central computer over low-bandwidth wireless links, leading to quantization. This paper investigates the impact of state and input data quantization on linear time-invariant (LTI) system identification, derives a worst-case bound on the identification error, and develops a robust controller for guaranteed cost control. We establish a fundamental bound on the model error that depends only on the quantized data and quantization resolution, and develop a linear matrix inequality (LMI) based guaranteed cost robust controller under this error bound.
comment: 8 pages, 3 figures
Comparison of Forced and Unforced Rendezvous, Proximity Operations, and Docking Under Model Mismatch
This paper compares the required fuel usage for forced and unforced motion of a chaser satellite engaged in Rendezvous, Proximity Operations, and Docking (RPOD) maneuvers. Improved RPOD models are vital, particularly as the space industry expands and demands for improved fuel efficiency, cost effectiveness, and mission life span increase. This paper specifically examines the Clohessy-Wiltshire (CW) Equations and the extent of model mismatch by comparing predicted trajectories from this model with a more computationally complex, higher fidelity RPOD model. This paper assesses several test cases of similar mission parameters, in each case comparing natural motion circumnavigation (NMC) with comparable forced motion circumnavigation. The Guidance, Navigation, and Control (GNC) impulse maneuvers required to maintain the supposedly zero-fuel CW trajectories are representative of the extent of CW model mismatch. This paper demonstrates that unforced motions are not inherently more fuel efficient than forced motions, thus permitting extended orbital operations given the higher fuel efficiency.
comment: 12 pages, 4 figures, AAS/AIAA Space Flight Mechanics
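For readers unfamiliar with why NMC is "supposedly zero fuel" under the CW model, the sketch below evaluates the standard closed-form in-plane CW solution. The axis convention (x radial, y along-track), the mean motion `n`, and the initial offsets are assumptions chosen for illustration, not values from the paper. With the initial along-track velocity set to -2·n·x0, the secular drift term cancels and the relative orbit closes after one period without any thrust.

```python
import math


def cw_in_plane(x0, y0, vx0, vy0, n, t):
    """Closed-form in-plane CW solution; x is radial, y is along-track."""
    s, c = math.sin(n * t), math.cos(n * t)
    x = (4 - 3 * c) * x0 + (s / n) * vx0 + (2 / n) * (1 - c) * vy0
    y = (6 * (s - n * t) * x0 + y0
         - (2 / n) * (1 - c) * vx0
         + ((4 * s - 3 * n * t) / n) * vy0)
    return x, y


n = 0.0011                  # rad/s, illustrative low-Earth-orbit mean motion
period = 2 * math.pi / n    # one orbital period of the target
x0 = 100.0                  # metres of initial radial offset (hypothetical)

# NMC condition: vy0 = -2 * n * x0 removes the secular along-track drift.
nmc_end = cw_in_plane(x0, 0.0, 0.0, -2 * n * x0, n, period)

# Without it, the chaser drifts along-track by -12 * pi * x0 per orbit.
drift_end = cw_in_plane(x0, 0.0, 0.0, 0.0, n, period)

print(nmc_end, drift_end)
```

In a higher-fidelity model this closure no longer holds exactly, which is the mismatch the correction maneuvers in the abstract are measuring.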
Identifying Best Candidates for Busbar Splitting
Rising electricity demand and the growing integration of renewables are intensifying congestion in transmission grids. Grid topology optimization through busbar splitting (BuS) and optimal transmission switching can alleviate grid congestion and reduce the generation costs in a power system. However, BuS optimization requires a large number of binary variables, and analyzing all the substations for potential new topological actions is computationally intractable, particularly in large grids. To tackle this issue, we propose a set of metrics to identify and rank promising candidates for BuS, focusing on finding buses where topology optimization can reduce generation costs. To assess the effect of BuS on the identified buses, we use a combined mixed-integer convex-quadratic BuS model to compute the optimal topology and test it with the non-linear non-convex AC optimal power flow (OPF) simulation to show its AC feasibility. By testing and validating the proposed metrics on test cases of different sizes, we show that they are able to identify busbars that reduce the total generation costs when their topology is optimized. Thus, the metrics enable effective selection of busbars for BuS, with no need to test every busbar in the grid, one at a time.
Competitive EV charging station location with queues
Electric vehicle (EV) public charging infrastructure planning faces significant challenges in competitive markets, where multiple service providers affect congestion and user behavior. This work extends existing modeling frameworks by incorporating the presence of competitors' stations and more realistic queueing systems. First, we analyze three finite queueing systems, M/M/1/K, M/M/s/K, and M/Er/s/K, with varying numbers of servers (charging outlets) and service time distributions, deriving analytic expressions for user behavior metrics. Second, we embed the queueing-based user behavior model into a bilevel program, where the upper level locates new charging stations to maximize accessibility (throughput), and the lower level captures users' station choices via a user equilibrium. Third, we apply a reformulation from competitive congested user-choice facility location models to approximately solve the bilevel problem and introduce a surrogate-based heuristic to enhance scalability. Fourth, we showcase our methodology on a real-world case study of an urban area in Montreal (Canada), offering managerial insights into how user-choice behavior assumptions and competition affect throughput and location decisions. The results demonstrate that our model yields (re)location strategies that outperform the existing network. More broadly, this approach provides a tool for incorporating charging service quality-through queueing metrics-and existing competition into station planning.
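The simplest of the three queueing systems named above, M/M/1/K, has a textbook closed form that makes the link between congestion and throughput concrete. The sketch below is a minimal illustration under assumed arrival and service rates; the function names and numbers are hypothetical, and the paper's own user behavior metrics may be defined differently.

```python
def mm1k_blocking(arrival_rate: float, service_rate: float, capacity: int) -> float:
    """Probability that an arriving user finds an M/M/1/K system full."""
    rho = arrival_rate / service_rate
    if abs(rho - 1.0) < 1e-12:
        # Degenerate case: all K + 1 states are equally likely.
        return 1.0 / (capacity + 1)
    return (1.0 - rho) * rho ** capacity / (1.0 - rho ** (capacity + 1))


def mm1k_throughput(arrival_rate: float, service_rate: float, capacity: int) -> float:
    """Rate of users actually served: arrivals that are not blocked."""
    return arrival_rate * (1.0 - mm1k_blocking(arrival_rate, service_rate, capacity))


# Hypothetical station: 3 EVs/hour arrive, one outlet serves 2 EVs/hour,
# and at most 4 vehicles (charging plus waiting) fit on site.
blocked = mm1k_blocking(3.0, 2.0, 4)
served = mm1k_throughput(3.0, 2.0, 4)
print(blocked, served)
```

Throughput saturates below the service rate as arrivals grow, which is why an accessibility objective based on throughput is sensitive to where competitors' stations absorb demand.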
Model predictive control lowers barriers to adoption of heat-pump water heaters: A field study
Electric heat-pump water heaters (HPWHs) could reduce the energy costs, emissions, and power grid impacts associated with water heating, the second-largest energy use in United States housing. However, most HPWHs today require 240 V circuits to power the backup resistance heating elements they use to maintain comfort during large water draws. Installing a 240 V circuit can increase the up-front cost of a HPWH by half or more. This paper develops and field-tests the first control system that enables a 120 V HPWH to efficiently maintain comfort without resistance heating elements. The novel model predictive control (MPC) system enables pre-heating in anticipation of large water draws, which it forecasts using an ensemble of machine learning predictors. By shifting electrical load over time, MPC also reduces energy costs on average by 23% and 28% under time-of-use pricing and hourly pricing, respectively, relative to a 240 V HPWH with standard controls. Compared to the increasingly common practice in 120 V HPWHs of storing water at a constant, high temperature (60 °C) to ensure comfort, MPC saves 37% energy on average. In addition to demonstrating MPC's benefits in a real, occupied house, this paper discusses implementation challenges and costs. A simple payback analysis suggests that a 120 V HPWH, operated by the MPC system developed here, would be economically attractive in most installation scenarios.
Enhancing Profit and CO2 Mitigation: Commercial Direct Air Capture Design and Operation with Power Market Volatility
Current decarbonization efforts are falling short of meeting the net-zero greenhouse gas (GHG) emission target, highlighting the need for substantial carbon dioxide removal methods such as direct air capture (DAC). However, integrating DACs poses challenges due to their enormous power consumption. This study assesses the commercial operation of various DAC technologies that earn revenue using monetized carbon incentives while purchasing electricity from wholesale power markets. We model four commercial DAC technologies and examine their operation in three representative locations including California, Texas, and New York. Our findings reveal that commercial DAC operations can take financial advantage of the volatile power market to operate only during low-price periods strategically, offering a pathway to facilitate a cost-efficient decarbonization transition. The ambient operational environment such as temperature and relative humidity has non-trivial impact on abatement capacity. Profit-driven decisions introduce climate-economic trade-offs that might decrease the capacity factor of DAC and reduce total CO2 removal. These implications extend throughout the entire lifecycle of DAC developments and influence power systems and policies related to full-scale DAC implementation. Our study shows that DAC technologies with shorter cycle spans and higher flexibility can better exploit the electricity price volatility, while power markets demonstrate persistent low-price windows that often synergize with low grid emission periods, like during the solar "duck curve" in California. An optimal incentive design exists for profit-driven operations while carbon-tax policy in electricity pricing is counterproductive for DAC systems.
comment: 16 pages, 8 figures, submitted and under review for Engineering
Non-Gaussian Distribution Steering in Nonlinear Dynamics with Conjugate Unscented Transformation
In highly nonlinear systems such as the ones commonly found in astrodynamics, Gaussian distributions generally evolve into non-Gaussian distributions. This paper introduces a method for effectively controlling non-Gaussian distributions in nonlinear environments using optimized linear feedback control. This paper utilizes Conjugate Unscented Transformation to quantify the higher-order statistical moments of non-Gaussian distributions. The formulation focuses on controlling and constraining the sigma points associated with the uncertainty quantification, which would thereby reflect the control of the entire distribution and constraints on the moments themselves. This paper develops an algorithm to solve this problem with sequential convex programming, and it is demonstrated through a two-body and three-body example. The examples show that individual moments can be directly controlled, and the moments are accurately approximated for non-Gaussian distributions throughout the controller's time horizon in nonlinear dynamics.
Gaussian Process Implicit Surfaces as Control Barrier Functions for Safe Robot Navigation
Level set methods underpin modern safety techniques such as control barrier functions (CBFs), while also serving as implicit surface representations for geometric shapes via distance fields. Inspired by these two paradigms, we propose a unified framework where the implicit surface itself acts as a CBF. We leverage Gaussian process (GP) implicit surface (GPIS) to represent the safety boundaries, using safety samples which are derived from sensor measurements to condition the GP. The GP posterior mean defines the implicit safety surface (safety belief), while the posterior variance provides a robust safety margin. Although GPs have favorable properties such as uncertainty estimation and analytical tractability, they scale cubically with data. To alleviate this issue, we develop a sparse solution called sparse Gaussian CBFs. To the best of our knowledge, GPIS have not been explicitly used to synthesize CBFs. We validate the approach on collision avoidance tasks in two settings: a simulated 7-DOF manipulator operating around the Stanford bunny, and a quadrotor navigating in 3D around a physical chair. In both cases, Gaussian CBFs (with and without sparsity) enable safe interaction and collision-free execution of trajectories that would otherwise intersect the objects.
comment: 8 pages, 7 figures, under review
A Wideband Composite Sequence Impedance Model for Evaluation of Interactions in Unbalanced Power-Electronic-Based Power Systems
This paper proposes a wideband composite sequence impedance model (WCSIM)-based analysis method to evaluate the interactions in power-electronic-based power systems subjected to unbalanced grid faults or with unbalanced loads. The WCSIM-based method intuitively assesses the impact of the small-signal interconnection among the positive-, negative-, and zero-sequence circuits on the interaction stability of unbalanced power systems. The effectiveness of this method is demonstrated using a permanent magnet synchronous generator-based weak grid system under a single-line-to-ground fault (SLGF). Frequency scanning results and controller hardware-in-loop tests validate both the correctness of the WCSIM and the effectiveness of the WCSIM-based analysis method.
comment: This work will be submitted to the IEEE for possible publication
ExaModelsPower.jl: A GPU-Compatible Modeling Library for Nonlinear Power System Optimization
As GPU-accelerated mathematical programming techniques mature, there is growing interest in utilizing them to address the computational challenges of power system optimization. This paper introduces ExaModelsPower.jl, an open-source modeling library for creating GPU-compatible nonlinear AC optimal power flow models. Built on ExaModels.jl, ExaModelsPower.jl provides a high-level interface that automatically generates all necessary callback functions for GPU solvers. The library is designed for large-scale problem instances, which may include multiple time periods and security constraints. Using ExaModelsPower.jl, we benchmark GPU and CPU solvers on open-source test cases. Our results show that GPU solvers can deliver up to two orders of magnitude speedups compared to alternative tools on CPU for problems with more than 20,000 variables and a solution precision of up to $10^{-4}$, while performance for smaller instances or tighter tolerances may vary.
Optimization of High-Order Quarter-Wave Plate for Birefringence Suppression in FOCS
Fiber optic current sensors (FOCS) are widely adopted in modern power grids due to high sensitivity, excellent insulation, and strong immunity to electromagnetic interference. This prominence necessitates precise investigation into their error sources and corresponding optimization. This study examines reflective FOCS based on the Faraday effect. A theoretical model is established to simulate phase error caused by linear birefringence from the quarter-wave plate. Conventional methods using circular birefringence are analyzed, revealing inherent limitations. Innovatively, a compensation strategy employing high-order quarter-wave plates is proposed to effectively eliminate linear birefringence effects. This approach significantly enhances the accuracy and practicality of FOCS in precision metrology.
The Algorithmic Regulator
The regulator theorem states that, under certain conditions, any optimal controller must embody a model of the system it regulates, grounding the idea that controllers embed, explicitly or implicitly, internal models of the controlled. This principle underpins neuroscience and predictive brain theories like the Free-Energy Principle or Kolmogorov/Algorithmic Agent theory. However, the theorem is only proven in limited settings. Here, we treat the deterministic, closed, coupled world-regulator system $(W,R)$ as a single self-delimiting program $p$ via a constant-size wrapper that produces the world output string~$x$ fed to the regulator. We analyze regulation from the viewpoint of the algorithmic complexity of the output, $K(x)$. We define $R$ to be a \emph{good algorithmic regulator} if it \emph{reduces} the algorithmic complexity of the readout relative to a null (unregulated) baseline $\varnothing$, i.e., \[ \Delta = K\big(O_{W,\varnothing}\big) - K\big(O_{W,R}\big) > 0. \] We then prove that the larger $\Delta$ is, the more world-regulator pairs with high mutual algorithmic information are favored. More precisely, a complexity gap $\Delta > 0$ yields \[ \Pr\big((W,R)\mid x\big) \le C\,2^{\,M(W{:}R)}\,2^{-\Delta}, \] making low $M(W{:}R)$ exponentially unlikely as $\Delta$ grows. This is an AIT version of the idea that ``the regulator contains a model of the world.'' The framework is distribution-free, applies to individual sequences, and complements the Internal Model Principle. Beyond this necessity claim, the same coding-theorem calculus singles out a \emph{canonical scalar objective} and implicates a \emph{planner}. On the realized episode, a regulator behaves \emph{as if} it minimized the conditional description length of the readout.
comment: 2 Figures
Integrative, Scalable Modeling of Hydrological Systems with MBSE and HFGT
Worsening global challenges in the Anthropocene demand complex, adaptive solutions grounded in a systems-level understanding of coupled social and environmental dynamics. However, existing modeling approaches often fall short due to disciplinary silos, limited scalability, and the absence of shared ontological frameworks. Model-Based Systems Engineering (MBSE), when integrated with Hetero-functional Graph Theory (HFGT), offers a powerful methodology for modeling systems of systems while preserving subsystem heterogeneity and enabling cross-disciplinary integration. This paper presents the first application of the MBSE-HFGT methodology to environmental systems, using a series of worked examples involving flow through lake and land segments. These examples demonstrate how the approach enables consistent, scalable, and integrative modeling of complex environmental processes.
A Cyber Insurance Policy for Hedging Against Load-Altering Attacks and Extreme Load Variations in Distribution Grids
Uncertainties in renewable energy resources (RES) and load variations can lead to elevated system operational costs. Moreover, the emergence of large-scale distributed threats, such as load-altering attacks (LAAs), can induce substantial load variations, further exacerbating these costs. Although traditional defense measures can reduce the likelihood of such attacks, considerable residual risks remain. Thus, this paper proposes a cyber insurance framework designed to hedge against additional operational costs resulting from LAAs and substantial load variations in renewable-rich grids. The insurance framework determines both the insurance coverage and premium based on the Value at Risk (VaR) and Tail Value at Risk (TVaR). These risk metrics are calculated using the system failure probability and the probability density function (PDF) of the system operation cost. The system failure probability is assessed through a semi-Markov process (SMP), while the cost distribution is estimated through a cost minimization model of a distribution grid combined with a Monte-Carlo simulation to capture load variability. Furthermore, we employ a bi-level optimization scheme that identifies the specific load distribution leading to the maximum system cost, thereby enhancing the accuracy of the operation cost PDF estimation. The effectiveness and scalability of the proposed cyber insurance policy are evaluated considering a modified IEEE-118 test bus system and the IEEE European low-voltage (LV) test feeders model. The case study shows that with a relatively low premium, the network operator can hedge against additional operational costs caused by malicious load manipulations.
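To make the two risk metrics concrete, the sketch below computes empirical VaR and TVaR from a list of simulated operating costs. It is a minimal illustration with made-up cost samples; the estimator (an order statistic for VaR, the tail mean at or beyond it for TVaR) is a common textbook choice and is an assumption here, since the abstract does not state how the paper evaluates them from the cost PDF.

```python
import math


def value_at_risk(costs, alpha):
    """Empirical VaR: the smallest sample with at least a fraction alpha at or below it."""
    ordered = sorted(costs)
    # Small guard against floating-point error in alpha * len(costs).
    idx = math.ceil(alpha * len(ordered) - 1e-9) - 1
    idx = min(len(ordered) - 1, max(0, idx))
    return ordered[idx]


def tail_value_at_risk(costs, alpha):
    """Empirical TVaR: mean of the samples at or beyond the VaR threshold."""
    threshold = value_at_risk(costs, alpha)
    tail = [c for c in costs if c >= threshold]
    return sum(tail) / len(tail)


# Hypothetical Monte Carlo operating costs (arbitrary units).
costs = [float(c) for c in range(1, 11)]
var_90 = value_at_risk(costs, 0.9)
tvar_90 = tail_value_at_risk(costs, 0.9)
print(var_90, tvar_90)
```

TVaR is never smaller than VaR at the same level, so a policy priced on TVaR accounts for how severe the tail losses are rather than only where the tail begins.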
Product-oriented Product-Process-Resource Asset Network and its Representation in AutomationML for Asset Administration Shell
Current products, especially in the automotive sector, are complex technical systems with a multi-disciplinary mechatronic nature. Industrial standards supporting system engineering and production typically (i) address the production phase only, but do not cover the complete product life cycle, and (ii) focus on production processes and resources rather than the products themselves. The presented approach is motivated by incorporating the impacts of the end-of-life phase of the product life cycle into the engineering phase. This paper proposes a modeling approach derived from the Product-Process-Resource (PPR) modeling paradigm. It combines requirements on (i) respecting the product structure as a basis for the model, and (ii) incorporating repairing, remanufacturing, or upcycling within cyber-physical production systems. The proposed model, called PoPAN, should accompany the product during the entire life cycle as a digital shadow encapsulated within the Asset Administration Shell of a product. To facilitate the adoption of the proposed paradigm, the paper also proposes serialization of the model in the AutomationML data format. The model is demonstrated on a use case for disassembling electric vehicle batteries to support their remanufacturing for stationary battery applications.
comment: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
RIS-Assisted Millimeter Wave Communications for Indoor Scenarios: Modeling and Coverage Analysis
Millimeter wave (mmWave) communications and reconfigurable intelligent surfaces (RIS) are two critical technologies for next-generation networks, especially in dense indoor environments. However, existing analyses often oversimplify the indoor environment by neglecting some of the key characteristics, such as height variations, boundary effects, blockage effects, and user spatial distributions. In this paper, we develop an improved stochastic geometry-based model for RIS-assisted mmWave communications in indoor scenarios like conference centers, hospitals, and shopping malls. The proposed model incorporates the height factor for all the nodes in the network (e.g., transmitters, users, RISs, and obstacles) and captures the user clustering behavior in these scenarios. In addition, the boundary effect is also being considered for line-of-sight (LOS) probability calculation. Analytical expressions for distance distributions, LOS probabilities, and the coverage probability (CP) are derived. The CP is then validated through Monte Carlo simulations. Our results reveal deployment insights by approximating and simplifying the derived CP expressions, showing how transmitter density, obstacle density, RIS density, and user cluster radius impact network coverage. Notably, we show that RISs significantly improve coverage when transmitters or transmit power are limited but offer marginal benefits when transmitter density is high. These findings provide practical guidelines for the design and deployment of RIS-assisted indoor mmWave networks.
Eco-driving Incentive Mechanisms for Mitigating Emissions in Urban Transportation
This paper develops incentive mechanisms for promoting eco-driving with the overarching goal of minimizing emissions in transportation networks. The system operator provides drivers with energy-efficient driving guidance throughout their trips and measures compliance through vehicle telematics that capture how closely drivers follow this guidance. Drivers optimize their behaviors based on personal trade-offs between travel times and emissions. To design effective incentives, the operator elicits driver preferences regarding trip urgency and willingness to eco-drive, while determining optimal budget allocations and eco-driving recommendations. Two distinct settings based on driver behavior are analyzed. When drivers report their preferences truthfully, an incentive mechanism ensuring obedience (drivers find it optimal to follow recommendations) is designed by implementing eco-driving recommendations as a Nash equilibrium. When drivers may report strategically, the mechanism is extended to be both obedient and truthful (drivers find it optimal to report truthfully). Unlike existing works that focus on congestion or routing decisions in transportation networks, our framework explicitly targets emissions reduction by incentivizing drivers. The proposed mechanism addresses both strategic behavior and network effects arising from driver interactions, without requiring the operator to reveal system parameters to the drivers. Numerical simulations demonstrate the effects of budget constraints, driver types, and strategic misreporting on equilibrium outcomes and emissions reduction.
comment: 12 pages, 6 figures
Globally Stable Discrete Time PID Passivity-based Control of Power Converters: Simulation and Experimental Results
The key idea behind PID Passivity-based Control (PID-PBC) is to leverage the passivity property of PIDs (for all positive gains) and wrap the PID controller around a passive output to ensure global stability in closed-loop. However, the practical applicability of PID-PBC is stymied by two key facts: (i) the vast majority of practical implementations of PIDs is carried-out in discrete time -- discretizing the continuous time dynamical system of the PID; (ii) the well-known problem that passivity is not preserved upon discretization, even with small sampling times. Therefore, two aspects of the PID-PBC must be revisited for its safe practical application. First, we propose a discretization of the PID that ensures its passivity. Second, since the output that is identified as passive for the continuous time system is not necessarily passive for its discrete time version, we construct a new output that ensures the passivity property for the discretization of the system. In this paper, we provide a constructive answer to both issues for the case of power converter models. Instrumental to achieve this objective is the use of the implicit midpoint discretization method -- which is a symplectic integration technique that preserves system invariants. Since the reference value for the output to be regulated in power converters is non-zero, we are henceforth interested in the property of passivity of the incremental model -- currently known as shifted passivity. Therefore, we demonstrate that the resulting discrete-time PID-PBC defines a passive map for the incremental model and establish shifted passivity for the discretized power converter model. Combining these properties, we prove global stability for the feedback interconnection of the power converter with the discretized PID-PBC. The paper also presents simulations and experiments that demonstrate the performance of the proposed discretization.
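The claim that the implicit midpoint rule preserves invariants where naive discretization does not can be checked on a toy system. The sketch below uses a lossless harmonic oscillator, which is an assumption made purely for illustration and is not the paper's power converter model or its PID discretization. It compares the stored energy after many steps of implicit midpoint and forward Euler.

```python
def midpoint_step(q, p, h):
    """One implicit midpoint step for q' = p, p' = -q, solved in closed form."""
    a = h / 2.0
    det = 1.0 + a * a
    rq = q + a * p          # right-hand side: (I + h/2 A) x
    rp = p - a * q
    q_next = (rq + a * rp) / det   # apply (I - h/2 A)^{-1}
    p_next = (rp - a * rq) / det
    return q_next, p_next


def euler_step(q, p, h):
    """One forward Euler step for the same oscillator."""
    return q + h * p, p - h * q


def energy(q, p):
    """Stored energy of the oscillator, the invariant of the continuous system."""
    return 0.5 * (q * q + p * p)


h, steps = 0.1, 1000       # illustrative step size and horizon
qm, pm = 1.0, 0.0
qe, pe = 1.0, 0.0
for _ in range(steps):
    qm, pm = midpoint_step(qm, pm, h)
    qe, pe = euler_step(qe, pe, h)

midpoint_energy = energy(qm, pm)   # stays at the initial 0.5
euler_energy = energy(qe, pe)      # grows by a factor (1 + h^2) per step
print(midpoint_energy, euler_energy)
```

Forward Euler injects energy at every step even though the continuous system is lossless, which is the kind of artifact that breaks a passivity argument after discretization.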
Padé Approximant Neural Networks for Enhanced Electric Motor Fault Diagnosis Using Vibration and Acoustic Data
Purpose: The primary aim of this study is to enhance fault diagnosis in induction machines by leveraging the Padé Approximant Neuron (PAON) model. While accelerometers and microphones are standard in motor condition monitoring, deep learning models with nonlinear neuron architectures offer promising improvements in diagnostic performance. This research investigates whether Padé Approximant Neural Networks (PadéNets) can outperform conventional Convolutional Neural Networks (CNNs) and Self-Organized Operational Neural Networks (Self-ONNs) in the diagnosis of electrical and mechanical faults from vibration and acoustic data. Methods: We evaluate and compare the diagnostic capabilities of three deep learning architectures: one-dimensional CNNs, Self-ONNs, and PadéNets. These models are tested on the University of Ottawa's publicly available constant-speed induction motor datasets, which include both vibration and acoustic sensor data. The PadéNet model is designed to introduce enhanced nonlinearity and is compatible with unbounded activation functions such as LeakyReLU. Results and Conclusion: PadéNets consistently outperformed the baseline models, achieving diagnostic accuracies of 99.96%, 98.26%, 97.61%, and 98.33% for accelerometers 1, 2, 3, and the acoustic sensor, respectively. The enhanced nonlinearity of PadéNets, together with their compatibility with unbounded activation functions, significantly improves fault diagnosis performance in induction motor condition monitoring.
comment: This version is the author's accepted manuscript. It has been peer-reviewed and accepted for publication in Journal of Vibration Engineering & Technologies. The final published version is available at https://doi.org/10.1007/s42417-025-02129-5
Dissipativity-Based Distributed Control and Communication Topology Co-Design for DC Microgrids with ZIP Loads
This paper presents a novel dissipativity-based distributed droop-free control and communication topology co-design approach for voltage regulation and current sharing in DC microgrids (DC MGs) with generic ``ZIP'' (constant impedance (Z), current (I) and power (P)) loads. While ZIP loads accurately capture the varied nature of the consumer loads, its constant power load (CPL) component is particularly challenging (and destabilizing) due to its non-linear form. Moreover, ensuring simultaneous voltage regulation and current sharing and co-designing controllers and topology are also challenging when designing control solutions for DC MGs. To address these three challenges, we model the DC MG as a networked system comprised of distributed generators (DGs), ZIP loads, and lines interconnected according to a static interconnection matrix. Next, we equip each DG with a local controller and a distributed global controller (over an arbitrary topology) to derive the error dynamic model of the DC MG as a networked ``error'' system, including disturbance inputs and performance outputs. Subsequently, to co-design the controllers and the topology ensuring robust (dissipative) voltage regulation and current sharing performance, we use the dissipativity and sector boundedness properties of the involved subsystems and formulate Linear Matrix Inequality (LMI) problems to be solved locally and globally. To support the feasibility of the global LMI problem, we identify and embed several crucial necessary conditions in the corresponding local LMI problems, thus providing a one-shot approach to solve the LMI problems. Overall, the proposed approach in this paper provides a unified framework for designing DC MGs. The effectiveness of the proposed solution was verified by simulating an islanded DC MG under different scenarios, demonstrating superior performance compared to traditional control approaches.
comment: arXiv admin note: substantial text overlap with arXiv:2503.04908
Dissipativity-Based Distributed Control and Communication Topology Co-Design for Voltage Regulation and Current Sharing in DC Microgrids
This paper presents a novel dissipativity-based distributed droop-free control approach for voltage regulation and current sharing in DC microgrids (MGs) comprised of an interconnected set of distributed generators (DGs), loads, and power lines. First, we describe the closed-loop DC MG as a networked system where the DGs and lines (i.e., subsystems) are interconnected via a static interconnection matrix. This interconnection matrix demonstrates how the inputs, outputs, and disturbances of DGs and lines are connected in a DC MG. Each DG is equipped with a local controller for voltage regulation and a distributed global controller for current sharing, where the local controllers ensure individual voltage tracking while the global controllers coordinate among DGs to achieve proportional current sharing. To design the distributed global controllers, we use the dissipativity properties of the subsystems and formulate a linear matrix inequality (LMI) problem. To support the feasibility of this problem, we identify a set of necessary local and global conditions to enforce in a specifically developed LMI-based local controller design process. In contrast to existing DC MG control solutions, our approach proposes a unified framework for co-designing the distributed controller and communication topology. As the co-design process is LMI-based, it can be efficiently implemented and evaluated using existing convex optimization tools. The effectiveness of the proposed solution is verified by simulating an islanded DC MG in a MATLAB/Simulink environment under different scenarios, such as load changes and topological constraint changes, and then comparing the performance with the droop control algorithm.
The Untapped Potential of Smart Charging: How EV Owners Can Save Money and Reduce Emissions Without Behavioral Change
The transportation sector is the single largest contributor to US emissions and the second largest globally. Electric vehicles (EVs) are expected to represent half of global car sales by 2035, emerging as a pivotal solution to reduce emissions and enhance grid flexibility. The electrification of buildings, manufacturing, and transportation is expected to grow electricity demand substantially over the next decade. Without effectively managed EV charging, EVs could strain energy grid infrastructure and increase electricity costs. Drawing on de-identified 2023 EV telematics data from Rivian Automotive, this study found that 72% of home charging commenced after the customer plugged in their vehicle regardless of utility time of use (TOU) tariffs or managed charging programs. In fewer than 26% of charging sessions in the sample, EV owners actively scheduled charging times to align with or participate in utility tariffs or programs. With a majority of drivers concurrently plugged in during optimal charging periods yet not actively charging, the study identified an opportunity to reduce individual EV owner costs and carbon emissions through smarter charging habits without significant behavioral modifications or sacrifice in user preferences. By optimizing home charging schedules within existing plug-in and plug-out windows, the study suggests that EV owners can save an average of $140 annually and reduce the associated carbon emissions of charging their EV by as much as 28%.
A Neural Network-based Multi-timestep Command Governor for Nonlinear Systems with Constraints
The multi-timestep command governor (MCG) is an add-on algorithm that enforces constraints by modifying, at each timestep, the reference command to a pre-stabilized control system. The MCG can be interpreted as a Model-Predictive Control scheme operating on the reference command. The implementation of MCG on nonlinear systems carries a heavy computational burden as it requires solving a nonlinear program with multiple decision variables at each timestep. This paper proposes a less computationally demanding alternative, based on approximating the MCG control law using a neural network (NN) trained on offline data. However, since the NN output may not always be constraint-admissible due to training errors, its output is adjusted using a sensitivity-based method. We thus refer to the resulting control strategy as the neural network-based MCG (NN-MCG). As validation, the proposed controller is applied as a load governor for constraint management in an automotive fuel cell system. It is shown that the proposed strategy is significantly more computationally efficient than the traditional MCG, while achieving nearly identical performance if the NN is well-trained.
comment: Accepted for publication in the 2025 IEEE Conference on Control Technology and Applications (CCTA)
Systems and Control (EESS)
Autonomous Legged Mobile Manipulation for Lunar Surface Operations via Constrained Reinforcement Learning
Robotics plays a pivotal role in planetary science and exploration, where autonomous and reliable systems are crucial due to the risks and challenges inherent to space environments. The establishment of permanent lunar bases demands robotic platforms capable of navigating and manipulating in the harsh lunar terrain. While wheeled rovers have been the mainstay for planetary exploration, their limitations in unstructured and steep terrains motivate the adoption of legged robots, which offer superior mobility and adaptability. This paper introduces a constrained reinforcement learning framework designed for autonomous quadrupedal mobile manipulators operating in lunar environments. The proposed framework integrates whole-body locomotion and manipulation capabilities while explicitly addressing critical safety constraints, including collision avoidance, dynamic stability, and power efficiency, in order to ensure robust performance under lunar-specific conditions, such as reduced gravity and irregular terrain. Experimental results demonstrate the framework's effectiveness in achieving precise 6D task-space end-effector pose tracking, achieving an average positional accuracy of 4 cm and orientation accuracy of 8.1 degrees. The system consistently respects both soft and hard constraints, exhibiting adaptive behaviors optimized for lunar gravity conditions. This work effectively bridges adaptive learning with essential mission-critical safety requirements, paving the way for advanced autonomous robotic explorers for future lunar missions.
comment: This is the authors version of the paper accepted for publication in The IEEE International Conference on Space Robotics 2025. The final version link will be added here after conference proceedings are published
Variational Quantum Eigensolver Models of Molecular Quantum Dot Cellular Automata
Molecular quantum-dot Cellular Automata (QCA) may provide low-power, high-speed computational hardware for processing classical information. Simulation and modeling play an important role in the design of QCA circuits because fully-coherent models of QCA scale exponentially with the number of devices, and such models are severely limited in size. For larger circuits, approximations become necessary. In the era of fault-tolerant quantum computation, however, it may become possible to model large QCA circuits without such limitations. Presently, this work explores the use of the noisy intermediate-scale quantum (NISQ) variational quantum eigensolver (VQE) method for estimating the ground state of QCA circuits. This is relevant because the computational result of a QCA calculation is encoded in the circuit's ground state. In this study, VQE is used to model logic circuits, including binary wires, inverters, and majority gates. VQE models are run on ideal simulators, noisy simulators, and actual quantum hardware. This study demonstrates that VQE may indeed be used to model molecular QCA circuits. It is observed that using modern NISQ hardware, results are still quite sensitive to noise, so measures should be taken to minimize noise. These include simplifying the ansatz circuit whenever possible, and using low-noise hardware.
comment: 18 pages, 26 figures, submitted to the Journal of Applied Physics
Learning Robust Agile Flight Control with Stability Guarantees
In the evolving landscape of high-speed agile quadrotor flight, achieving precise trajectory tracking at the platform's operational limits is paramount. Controllers must handle actuator constraints, exhibit robustness to disturbances, and remain computationally efficient for safety-critical applications. In this work, we present a novel neural-augmented feedback controller for agile flight control. The controller addresses individual limitations of existing state-of-the-art control paradigms and unifies their strengths. We demonstrate the controller's capabilities, including the accurate tracking of highly aggressive trajectories that surpass the feasibility of the actuators. Notably, the controller provides universal stability guarantees, enhancing its robustness and tracking performance even in exceedingly disturbance-prone settings. Its nonlinear feedback structure is highly efficient enabling fast computation at high update rates. Moreover, the learning process in simulation is both fast and stable, and the controller's inherent robustness allows direct deployment to real-world platforms without the need for training augmentations or fine-tuning.
Enhancing Robust Multi-Market Participation of Renewable-Based VPPs through Flexible Resources
In the transition toward a sustainable power system, renewable-based Virtual Power Plants (RVPPs) have emerged as a promising solution to the challenges of integrating renewable energy sources into electricity markets. Their viability, however, depends on effective market participation strategies and the ability to manage uncertainties while leveraging flexible resources. This paper analyzes the impact of different flexible resources - such as concentrated solar power plants, hydro plants, biomass plants, and flexible demand - on the participation of RVPPs in energy and reserve markets. Multiple sources of uncertainty in generation, consumption, and electricity prices are addressed using a two-stage robust optimization approach. The contribution of different technologies to RVPP profitability is evaluated through a marginal contribution method, ensuring fair allocation of profits among them according to their actual role in energy and reserve provision across markets. Simulations for an RVPP in southern Spain demonstrate how strategic decisions and the availability of flexible resources influence viability, market participation, and unit scheduling.
Privacy-Preserving Distributed Estimation with Limited Data Rate
This paper focuses on the privacy-preserving distributed estimation problem with a limited data rate, where the observations are the sensitive information. Specifically, a binary-valued quantizer-based privacy-preserving distributed estimation algorithm is developed, which improves the algorithm's privacy-preserving capability and simultaneously reduces the communication costs. The algorithm's privacy-preserving capability, measured by the Fisher information matrix, is dynamically enhanced over time. Notably, the Fisher information matrix of the output signals with respect to the sensitive information converges to zero at a polynomial rate, and the improvement in privacy brought by the quantizers is quantitatively characterized as a multiplicative effect. Regarding the communication costs, each sensor transmits only 1 bit of information to its neighbours at each time step. Additionally, the assumption on the negligible quantization error for real-valued messages is not required. While achieving the requirements of privacy preservation and reducing communication costs, the algorithm ensures that its estimates converge almost surely to the true value of the unknown parameter by establishing a co-design guideline for the time-varying privacy noises and step-sizes. A polynomial almost sure convergence rate is obtained, and then the trade-off between privacy and convergence rate is established. Numerical examples demonstrate the main results.
Optimising Communication Control Factors for Energy Consumption in Rural LOS V2X
Connected braking can reduce fatal collisions in connected and autonomous vehicles (CAVs) by using reliable, low-latency 5G New Radio (NR) links, especially NR Sidelink Vehicle-to-Everything (V2X). In rural areas, road side units are sparse and power-constrained or off-grid, so energy efficiency must be considered alongside safety. This paper studies how three communication control factors, namely subcarrier spacing ($\mathrm{SCS}$), modulation and coding scheme ($\mathrm{MCS}$), and transmit power ($P_{\mathrm{t}}$), should be configured to balance safety and energy consumption in rural line-of-sight (LOS) scenarios under light and heavy traffic. Safety is quantified by the packet receive ratio ($\mathrm{PRR}$) against the minimum communication distance $D_{\mathrm{comm}}$, defined as the distance that the vehicle travels during the transmission of the safety message. Results show that, under heavy traffic, increasing $P_{\mathrm{t}}$ and selecting a low-rate $\mathrm{MCS}$ at $\mathrm{SCS} = 30$ kHz sustains high $\mathrm{PRR}$ at $D_{\mathrm{comm}}$, albeit with higher energy cost. In light traffic, maintaining lower $P_\mathrm{t}$ with low $\mathrm{MCS}$ levels achieves a favorable reliability-energy trade-off while preserving acceptable $\mathrm{PRR}$ at $D_{\mathrm{comm}}$. These findings demonstrate the need for an adaptive, energy-aware strategy to guarantee both safety and energy efficiency in rural V2X systems.
Temporal Variabilities Limit Convergence Rates in Gradient-Based Online Optimization
This paper investigates the fundamental performance limits of gradient-based algorithms for time-varying optimization. Leveraging the internal model principle and root locus techniques, we show that temporal variabilities impose intrinsic limits on the achievable rate of convergence. For a problem with condition ratio $\kappa$ and time variation whose model has degree $n$, we show that the worst-case convergence rate of any minimal-order gradient-based algorithm is $\rho_\text{TV} = (\frac{\kappa-1}{\kappa+1})^{1/n}$. This bound reveals a fundamental tradeoff between problem conditioning, temporal complexity, and rate of convergence. We further construct explicit controllers that attain the bound for low-degree models of time variation.
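To make the stated bound concrete, the short Python sketch below evaluates $\rho_\text{TV} = (\frac{\kappa-1}{\kappa+1})^{1/n}$ for a few model degrees. The function name and the sample value $\kappa = 10$ are illustrative assumptions, not taken from the paper.

```python
def tv_rate_bound(kappa: float, n: int) -> float:
    """Evaluate rho_TV = ((kappa - 1) / (kappa + 1)) ** (1 / n) as stated in the abstract."""
    if kappa < 1.0:
        raise ValueError("condition ratio kappa must be at least 1")
    if n < 1:
        raise ValueError("model degree n must be a positive integer")
    return ((kappa - 1.0) / (kappa + 1.0)) ** (1.0 / n)


if __name__ == "__main__":
    kappa = 10.0  # illustrative condition ratio
    for n in (1, 2, 3, 4):
        # A rate closer to 1 means slower worst-case convergence.
        print(f"n = {n}: rho_TV = {tv_rate_bound(kappa, n):.4f}")
```

For fixed $\kappa > 1$ the base lies in $(0, 1)$, so taking the $n$-th root pushes the rate toward 1 as $n$ grows, which is the tradeoff between temporal complexity and convergence rate that the abstract describes.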
DarTwin made precise by SysMLv2 -- An Experiment
The new SysMLv2 adds mechanisms for the built-in specification of domain-specific concepts and language extensions. This feature promises to facilitate the creation of Domain-Specific Languages (DSLs) and interfacing with existing system descriptions and technical designs. In this paper, we review these features and evaluate SysMLv2's capabilities using concrete use cases. We develop DarTwin DSL, a DSL that formalizes the existing DarTwin notation for Digital Twin (DT) evolution, through SysMLv2, thereby supposedly enabling the wide application of DarTwin's evolution templates using any SysMLv2 tool. We demonstrate DarTwin DSL, but also point out limitations in the currently available tooling of SysMLv2 in terms of graphical notation capabilities. This work contributes to the growing field of Model-Driven Engineering (MDE) for DTs and combines it with the release of SysMLv2, thus integrating a systematic approach with DT evolution management in systems engineering.
Micro-Macro Backstepping Control of Large-Scale Hyperbolic Systems (Extended Version)
We introduce a control design and analysis framework for micro-macro, boundary control of large-scale, $n+m$ hyperbolic PDE systems. Specifically, we develop feedback laws for stabilization of hyperbolic systems at the micro level (i.e., of the large-scale system) that employ a) measurements obtained from the $n+m$ system (i.e., at micro level) and kernels constructed based on an $\infty+\infty$ continuum system counterpart (i.e., at macro level), or b) kernels and measurements both stemming from a continuum counterpart, or c) averaged-continuum kernels/measurements. We also address d) stabilization of the continuum (macro) system, employing continuum kernels and measurements. Towards addressing d) we derive in a constructive manner an $\infty+\infty$ continuum approximation of $n+m$ hyperbolic systems and establish that its solutions approximate, for large $n$ and $m$, the solutions of the $n+m$ system. We then construct a feedback law for stabilization of the $\infty+\infty$ system via introduction of a continuum-PDE backstepping transformation. We establish well-posedness of the resulting 4-D kernel equations and prove closed-loop stability via construction of a novel Lyapunov functional. Furthermore, under control configuration a) we establish that the closed-loop system is exponentially stable provided that $n$ and $m$ are large, by proving that the exact, stabilizing $n+m$ control kernels can be accurately approximated by the continuum kernels. Under control configurations b) and c), we establish closed-loop stability by capitalizing on the established approximation properties of the solutions and kernels and employing infinite-dimensional ISS arguments. We provide two numerical simulation examples to illustrate the effectiveness and potential limitations of our design approach.
comment: 22 pages, 5 figures
The value of storage in electricity distribution: The role of storage
Electricity distribution companies deploy battery storage to defer grid upgrades by reducing peak demand. In deregulated jurisdictions, such storage often sits idle because regulatory constraints bar participation in electricity markets. Here, we develop an optimization framework that, to our knowledge, provides the first formal model of market participation constraints within storage investment and operation planning. Applying the framework to a Massachusetts case study, we find that market participation could deliver similar savings as peak demand reduction. Under current conditions, market participation does not increase storage investment, but at very low storage costs, could incentivize deployment beyond local distribution needs. This might run contrary to the separation of distribution from generation in deregulated markets. Our framework can identify investment levels appropriate for local distribution needs.
High-Parallel FPGA-Based Discrete Simulated Bifurcation for Large-Scale Optimization
Combinatorial Optimization (CO) problems exhibit exponential complexity, making their resolution challenging. Simulated Adiabatic Bifurcation (aSB) is a quantum-inspired algorithm to obtain approximate solutions to large-scale CO problems written in the Ising form. It explores the solution space by emulating the adiabatic evolution of a network of Kerr-nonlinear parametric oscillators (KPOs), where each oscillator represents a variable in the problem. The optimal solution corresponds to the ground state of this system. A key advantage of this approach is the possibility of updating multiple variables simultaneously, making it particularly suited for hardware implementation. To enhance solution quality and convergence speed, variations of the algorithm have been proposed in the literature, including ballistic (bSB), discrete (dSB), and thermal (HbSB) versions. In this work, we have comprehensively analyzed dSB, bSB, and HbSB using dedicated software models, evaluating the feasibility of using a fixed-point representation for hardware implementation. We then present an open-source hardware architecture implementing the dSB algorithm for Field-Programmable Gate Arrays (FPGAs). The design allows users to adjust the degree of algorithmic parallelization based on their specific requirements. A proof-of-concept implementation that solves 256-variable problems was achieved on an AMD Kria KV260 SoM, a low-tier FPGA, validated using well-known max-cut and knapsack problems.
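As a small illustration of what "written in the Ising form" means for the max-cut benchmark mentioned above, the Python sketch below maps a weighted graph to Ising couplings $J_{ij} = -w_{ij}$ and checks by brute force that the ground state gives a maximum cut. It does not implement simulated bifurcation or the paper's FPGA architecture; the function names and the triangle graph are illustrative assumptions.

```python
from itertools import product


def ising_energy(weights: dict, spins) -> float:
    """E(s) = -sum_{i<j} J_ij s_i s_j with J_ij = -w_ij, i.e. sum_{i<j} w_ij s_i s_j."""
    return sum(w * spins[i] * spins[j] for (i, j), w in weights.items())


def cut_value(weights: dict, spins) -> float:
    """Total weight of edges whose endpoints carry opposite spins."""
    return sum(w for (i, j), w in weights.items() if spins[i] != spins[j])


def brute_force_ground_state(weights: dict, n: int):
    """Exhaustive search over all 2**n spin vectors (only viable for tiny n)."""
    return min(product((-1, 1), repeat=n), key=lambda s: ising_energy(weights, s))


if __name__ == "__main__":
    # Unweighted triangle: edges keyed as (i, j) with i < j.
    weights = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}
    best = brute_force_ground_state(weights, 3)
    total = sum(weights.values())
    # Identity used by the mapping: cut(s) = (W_total - E(s)) / 2.
    print("ground state:", best)
    print("energy:", ising_energy(weights, best))
    print("cut:", cut_value(weights, best), "=", (total - ising_energy(weights, best)) / 2)
```

Because the cut equals $(W_\text{total} - E)/2$, minimizing the Ising energy is equivalent to maximizing the cut, which is why a heuristic ground-state search can be used as a max-cut solver.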
Pooling Probabilistic Forecasts for Cooperative Wind Power Offering SC
Wind power producers can benefit from forming coalitions to participate cooperatively in electricity markets. To support such collaboration, various profit allocation rules rooted in cooperative game theory have been proposed. However, existing approaches overlook the lack of coherence among producers regarding forecast information, which may lead to ambiguity in offering and allocations. In this paper, we introduce a ``reconcile-then-optimize'' framework for cooperative market offerings. This framework first aligns the individual forecasts into a coherent joint forecast before determining market offers. With such forecasts, we formulate and solve a two-stage stochastic programming problem to derive both the aggregate offer and the corresponding scenario-based dual values for each trading hour. Based on these dual values, we construct a profit allocation rule that is budget-balanced and stable. Finally, we validate the proposed method through empirical case studies, demonstrating its practical effectiveness and theoretical soundness.
comment: submission to PSCC 2026, 7 pages
A Unidirectionally Connected FAS Approach for 6-DOF Quadrotor Control
This paper proposes a unidirectionally connected fully actuated system (UC-FAS) approach for the sub-stabilization and tracking control of 6-DOF quadrotors, tackling, to some extent, limitations of both the state-space and FAS frameworks. The framework systematically converts underactuated quadrotor dynamics into a UC-FAS model, unifying the different existing FAS transformations. By eliminating estimation of the high-order derivatives of control inputs, a drawback of current methods, the UC-FAS model simplifies controller design and enables direct eigenstructure assignment for closed-loop dynamics. Simulations demonstrate precise 6-DOF tracking performance. This work bridges theoretical FAS approach advancements with practical implementation needs, offering a standardized paradigm for nonlinear quadrotor control.
comment: This paper has been submitted to 2026 IFAC World Congress. Corresponding author: Guang-Ren Duan
Ultrafast Grid Impedance Identification in $dq$-Asymmetric Three-Phase Power Systems
We propose a non-parametric frequency-domain method to identify small-signal $dq$-asymmetric grid impedances, over a wide frequency band, using grid-connected converters. Existing identification methods are faced with significant trade-offs: e.g., passive approaches rely on ambient harmonics and rare grid events and thus can only provide estimates at a few frequencies, while many active approaches that intentionally perturb grid operation require long time series measurement and specialized equipment. Although active time-domain methods reduce the measurement time, they either make crude simplifying assumptions or require laborious model order tuning. Our approach effectively addresses these challenges: it does not require specialized excitation signals or hardware and achieves ultrafast ($<1$ s) identification, drastically reducing measurement time. Being non-parametric, our approach also makes no assumptions on the grid structure. A detailed electromagnetic transient simulation is used to validate the method and demonstrate its clear superiority over existing alternatives.
Physics-Informed Reinforcement Learning for Large-Scale EV Smart Charging Considering Distribution Network Voltage Constraints
Electric Vehicles (EVs) offer substantial flexibility for grid services, yet large-scale, uncoordinated charging can threaten voltage stability in distribution networks. Existing Reinforcement Learning (RL) approaches for smart charging often disregard physical grid constraints or have limited performance for complex large-scale tasks, limiting their scalability and real-world applicability. This paper introduces a physics-informed (PI) RL algorithm that integrates a differentiable power flow model and voltage-based reward design into the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, enabling EVs to deliver real-time voltage support while meeting user demands. The resulting PI-TD3 algorithm achieves faster convergence, improved sample efficiency, and reliable voltage magnitude regulation under uncertain and overloaded conditions. Benchmarks on the IEEE 34-bus and 123-bus networks show that the proposed PI-TD3 outperforms both model-free RL and optimization-based baselines in grid constraint management, user satisfaction, and economic metrics, even as the system scales to hundreds of EVs. These advances enable robust, scalable, and practical EV charging strategies that enhance grid resilience and support distribution networks operation.
Empowering Prosumers: Incentive Design for Local Electricity Markets Under Generalized Uncertainty and Grid Constraints
Since the 1990s, widespread introduction of central (wholesale) electricity markets has been seen across multiple continents, driven by the search for efficient operation of the power grid through competition. The increase of renewables has made significant impacts both on central electricity markets and distribution-level grids as renewable power generation is often connected to the latter. These stochastic renewable technologies have both advantages and disadvantages. On one hand they offer very low marginal cost and carbon emissions, while on the other hand, their output is uncertain, requiring flexible backup power with high marginal cost. Flexibility from end-prosumers or smaller market participants is therefore seen as a key enabler of large-scale integration of renewables. However, current central electricity markets do not directly include uncertainty into the market clearing and do not account for physical constraints of distribution grids. In this paper we propose a local electricity market framework based on probabilistic locational marginal pricing, effectively accounting for uncertainties in production, consumption and grid variables. The model includes a representation of the grid using the lindistflow equations and accounts for the propagation of uncertainty using general Polynomial Chaos (gPC). A two-stage convex model is proposed; in the day-ahead stage, probability distributions of prices are calculated for every timestep, where the expected values represent the day-ahead (spot) prices. In the real-time stage, uncertainties are realized (measured) and a trivial calculation reveals the real-time price. Through four instructive case-studies we highlight the effectiveness of the method to incentivize end-prosumers' participation in the market, while ensuring that their behavior does not have an adverse impact on the operation of the grid.
Human-in-the-Loop Bandwidth Estimation for Quality of Experience Optimization in Real-Time Video Communication AAAI
The quality of experience (QoE) delivered by video conferencing systems is significantly influenced by accurately estimating the time-varying available bandwidth between the sender and receiver. Bandwidth estimation for real-time communications remains an open challenge due to rapidly evolving network architectures, increasingly complex protocol stacks, and the difficulty of defining QoE metrics that reliably improve user experience. In this work, we propose a deployed, human-in-the-loop, data-driven framework for bandwidth estimation to address these challenges. Our approach begins with training objective QoE reward models derived from subjective user evaluations to measure audio and video quality in real-time video conferencing systems. Subsequently, we collect roughly $1$M network traces with objective QoE rewards from real-world Microsoft Teams calls to curate a bandwidth estimation training dataset. We then introduce a novel distributional offline reinforcement learning (RL) algorithm to train a neural-network-based bandwidth estimator aimed at improving QoE for users. Our real-world A/B test demonstrates that the proposed approach reduces the subjective poor call ratio by $11.41\%$ compared to the baseline bandwidth estimator. Furthermore, the proposed offline RL algorithm is benchmarked on D4RL tasks to demonstrate its generalization beyond bandwidth estimation.
comment: Accepted for publication in the proceedings of the AAAI Conference on Artificial Intelligence 2026 (IAAI Technical Track on Deployed Highly Innovative Applications of AI)
Sleepy Chauffeur Detection and Alert Techniques for Road Safety
One of the most pressing contemporary road-safety problems is the sleepiness of chauffeurs, which causes many car accidents. Detecting the sleepy chauffeur and raising an alert is vital to preventing these accidents, which otherwise lead to loss of life, severe injuries, and trauma. Drowsiness may be caused by heavy stress, pressure, relentless workload, or alcohol use, which lead to sleep deprivation and leave the chauffeur drowsy while driving. So far, a considerable number of systems have been developed to detect driver drowsiness, most of which depend mainly on image processing algorithms using cameras. Some of them also incorporate artificial intelligence and machine learning based algorithms. This paper presents a review of the existing systems and also proposes a simple, low-cost system using sensors and an Arduino that is capable of detecting sleepiness, sounding a siren alarm, and sending an alert message so that precautionary measures can be taken.
comment: 8 pages, 5 figures, International Journal on Recent Innovation in Microelectronics and Microcontrollers Applications Vol. 1, Issue 1 - 2018
Hybrid Terrain-Aware Path Planning: Integrating VD--RRT\(^{*}\) Exploration and VD--D\(^{*}\) Lite Repair
Autonomous ground vehicles operating off-road must plan curvature-feasible paths while accounting for spatially varying soil strength and slope hazards in real time. We present a continuous state--cost metric that combines a Bekker pressure--sinkage model with elevation-derived slope and attitude penalties. The resulting terrain cost field is analytic, bounded, and monotonic in soil modulus and slope, ensuring well-posed discretization and stable updates under sensor noise. This metric is evaluated on a lattice with exact steering primitives: Dubins and Reeds--Shepp motions for differential drive and time-parameterized bicycle arcs for Ackermann steering. Global exploration is performed using Vehicle-Dynamics RRT\(^{*}\), while local repair is managed by Vehicle-Dynamics D\(^{*}\) Lite, enabling millisecond-scale replanning without heuristic smoothing. By separating the terrain--vehicle model from the planner, the framework provides a reusable basis for deterministic, sampling-based, or learning-driven planning in deformable terrain. Hardware trials on an off-road platform demonstrate real-time navigation across soft soil and slope transitions, supporting reliable autonomy in unstructured environments.
Towards xApp Conflict Evaluation with Explainable Machine Learning and Causal Inference in O-RAN
The Open Radio Access Network (O-RAN) architecture enables a flexible, vendor-neutral deployment of 5G networks by disaggregating base station components and supporting third-party xApps for near real-time RAN control. However, the concurrent operation of multiple xApps can lead to conflicting control actions, which may cause network performance degradation. In this work, we propose a framework for xApp conflict management that combines explainable machine learning and causal inference to evaluate the causal relationships between RAN Control Parameters (RCPs) and Key Performance Indicators (KPIs). We use model explainability tools such as SHAP to identify RCPs that jointly affect the same KPI, signaling potential conflicts, and represent these interactions as a causal Directed Acyclic Graph (DAG). We then estimate the causal impact of each of these RCPs on their associated KPIs using metrics such as Average Treatment Effect (ATE) and Conditional Average Treatment Effect (CATE). This approach offers network operators guided insights into identifying conflicts and quantifying their impacts, enabling more informed and effective conflict resolution strategies across diverse xApp deployments.
Information Shapes Koopman Representation
The Koopman operator provides a powerful framework for modeling dynamical systems and has attracted growing interest from the machine learning community. However, its infinite-dimensional nature makes identifying suitable finite-dimensional subspaces challenging, especially for deep architectures. We argue that these difficulties come from suboptimal representation learning, where latent variables fail to balance expressivity and simplicity. This tension is closely related to the information bottleneck (IB) dilemma: constructing compressed representations that are both compact and predictive. Rethinking Koopman learning through this lens, we demonstrate that latent mutual information promotes simplicity, yet an overemphasis on simplicity may cause latent space to collapse onto a few dominant modes. In contrast, expressiveness is sustained by the von Neumann entropy, which prevents such collapse and encourages mode diversity. This insight leads us to propose an information-theoretic Lagrangian formulation that explicitly balances this tradeoff. Furthermore, we propose a new algorithm based on the Lagrangian formulation that encourages both simplicity and expressiveness, leading to a stable and interpretable Koopman representation. Beyond quantitative evaluations, we further visualize the learned manifolds under our representations, observing empirical results consistent with our theoretical predictions. Finally, we validate our approach across a diverse range of dynamical systems, demonstrating improved performance over existing Koopman learning methods. The implementation is publicly available at https://github.com/Wenxuan52/InformationKoopman.
Data to Certificate: Guaranteed Cost Control with Quantization-Aware System Identification
Cloud-assisted system identification and control have emerged as practical solutions for low-power, resource-constrained control systems such as micro-UAVs. In a typical cloud-assisted setting, state and input data are transmitted from local agents to a central computer over low-bandwidth wireless links, leading to quantization. This paper investigates the impact of state and input data quantization on a linear time invariant (LTI) system identification, derives a worst-case bound on the identification error, and develops a robust controller for guaranteed cost control. We establish a fundamental bound on the model error that depends only on the quantized data and quantization resolution, and develop a linear matrix inequality (LMI) based guaranteed cost robust controller under this error bound.
comment: 8 pages, 3 figures
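The abstract above states that the model error bound depends only on the quantized data and the quantization resolution. The elementary fact behind such bounds can be sketched in Python: for a uniform mid-tread quantizer with resolution `delta` (an assumed quantizer type, not confirmed by the abstract), each true sample lies within `delta / 2` of its quantized value. The paper's actual identification bound and LMI design are not reproduced here.

```python
import math


def quantize(x: float, delta: float) -> float:
    """Uniform mid-tread quantizer: round x to the nearest multiple of delta."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    return delta * math.floor(x / delta + 0.5)


def uncertainty_interval(q: float, delta: float) -> tuple[float, float]:
    """Interval that must contain the unquantized sample, given only q and delta."""
    return q - delta / 2.0, q + delta / 2.0


if __name__ == "__main__":
    delta = 0.1  # illustrative quantization resolution
    for x in (-1.234, 0.0, 0.26, 3.14159):
        q = quantize(x, delta)
        lo, hi = uncertainty_interval(q, delta)
        # The receiver sees only q; the true x is guaranteed to lie in [lo, hi]
        # up to floating-point rounding.
        print(f"x = {x:+.5f}  q = {q:+.5f}  interval = [{lo:+.5f}, {hi:+.5f}]")
```

A worst-case identification analysis can treat each received sample as an unknown point in such an interval, which is one way an error bound can end up depending only on the quantized data and the resolution.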
Comparison of Forced and Unforced Rendezvous, Proximity Operations, and Docking Under Model Mismatch
This paper compares the required fuel usage for forced and unforced motion of a chaser satellite engaged in Rendezvous, Proximity Operations, and Docking (RPOD) maneuvers. Improved RPOD models are vital, particularly as the space industry expands and demands for improved fuel efficiency, cost effectiveness, and mission life span increase. This paper specifically examines the Clohessy-Wiltshire (CW) Equations and the extent of model mismatch by comparing predicted trajectories from this model with a more computationally complex, higher fidelity RPOD model. This paper assesses several test cases of similar mission parameters, in each case comparing natural motion circumnavigation (NMC) with comparable forced motion circumnavigation. The Guidance, Navigation, and Control (GNC) impulse maneuvers required to maintain the supposedly zero fuel CW trajectories are representative of the extent of CW model mismatch. This paper demonstrates that unforced motions are not inherently more fuel efficient than forced motions, thus permitting extended orbital operations given the higher fuel efficiency.
comment: 12 pages, 4 figures, AAS/AIAA Space Flight Mechanics
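For readers unfamiliar with the CW model referenced in the entry above, the Python sketch below evaluates the standard closed-form in-plane CW solution and shows why an NMC is "zero fuel" within that model: with initial along-track velocity $\dot{y}_0 = -2 n x_0$ the relative orbit closes after one period, while other initial conditions drift. The axis convention (x radial, y along-track), the mean motion, and the initial offsets are illustrative assumptions; the paper's higher-fidelity model and fuel comparison are not reproduced.

```python
import math


def cw_inplane(t: float, n: float, x0: float, y0: float, vx0: float, vy0: float):
    """Closed-form in-plane Clohessy-Wiltshire position at time t.

    x is radial, y is along-track, n is the target's mean motion (rad/s).
    """
    s, c = math.sin(n * t), math.cos(n * t)
    x = (4.0 - 3.0 * c) * x0 + (s / n) * vx0 + (2.0 / n) * (1.0 - c) * vy0
    y = (6.0 * (s - n * t) * x0 + y0
         - (2.0 / n) * (1.0 - c) * vx0
         + ((4.0 * s - 3.0 * n * t) / n) * vy0)
    return x, y


if __name__ == "__main__":
    n = 0.0011             # assumed mean motion for a low Earth orbit, rad/s
    period = 2.0 * math.pi / n
    x0, y0 = 100.0, 0.0    # assumed initial relative position, m

    # NMC condition: cancels the secular along-track drift term.
    nmc = cw_inplane(period, n, x0, y0, 0.0, -2.0 * n * x0)
    # Same start with zero relative velocity: the chaser drifts along-track.
    drift = cw_inplane(period, n, x0, y0, 0.0, 0.0)
    print("NMC after one period:  ", nmc)
    print("drift after one period:", drift)
```

In a higher-fidelity model the closed ellipse predicted here no longer repeats exactly, and the correction impulses needed to hold it are what the abstract uses as a measure of CW model mismatch.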
Identifying Best Candidates for Busbar Splitting
Rising electricity demand and the growing integration of renewables are intensifying congestion in transmission grids. Grid topology optimization through busbar splitting (BuS) and optimal transmission switching can alleviate grid congestion and reduce the generation costs in a power system. However, BuS optimization requires a large number of binary variables, and analyzing all the substations for potential new topological actions is computationally intractable, particularly in large grids. To tackle this issue, we propose a set of metrics to identify and rank promising candidates for BuS, focusing on finding buses where topology optimization can reduce generation costs. To assess the effect of BuS on the identified buses, we use a combined mixed-integer convex-quadratic BuS model to compute the optimal topology and test it with the non-linear non-convex AC optimal power flow (OPF) simulation to show its AC feasibility. By testing and validating the proposed metrics on test cases of different sizes, we show that they are able to identify busbars that reduce the total generation costs when their topology is optimized. Thus, the metrics enable effective selection of busbars for BuS, with no need to test every busbar in the grid, one at a time.
Competitive EV charging station location with queues
Electric vehicle (EV) public charging infrastructure planning faces significant challenges in competitive markets, where multiple service providers affect congestion and user behavior. This work extends existing modeling frameworks by incorporating the presence of competitors' stations and more realistic queueing systems. First, we analyze three finite queueing systems, M/M/1/K, M/M/s/K, and M/Er/s/K, with varying numbers of servers (charging outlets) and service time distributions, deriving analytic expressions for user behavior metrics. Second, we embed the queueing-based user behavior model into a bilevel program, where the upper level locates new charging stations to maximize accessibility (throughput), and the lower level captures users' station choices via a user equilibrium. Third, we apply a reformulation from competitive congested user-choice facility location models to approximately solve the bilevel problem and introduce a surrogate-based heuristic to enhance scalability. Fourth, we showcase our methodology on a real-world case study of an urban area in Montreal (Canada), offering managerial insights into how user-choice behavior assumptions and competition affect throughput and location decisions. The results demonstrate that our model yields (re)location strategies that outperform the existing network. More broadly, this approach provides a tool for incorporating charging service quality (through queueing metrics) and existing competition into station planning.
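To make the simplest of the three queueing systems concrete, here is a minimal sketch of the textbook M/M/1/K steady-state formulas: one charging outlet and at most K vehicles in the station. The function name `mm1k_metrics` and the example rates are assumptions for illustration. The paper's own expressions, in particular for M/M/s/K and M/Er/s/K, are not reproduced here.

```python
def mm1k_metrics(arrival_rate, service_rate, capacity):
    """Steady-state metrics of an M/M/1/K queue (capacity counts the one in service)."""
    rho = arrival_rate / service_rate
    if abs(rho - 1.0) < 1e-12:
        # Degenerate case rho == 1: all K+1 states are equally likely.
        probs = [1.0 / (capacity + 1)] * (capacity + 1)
    else:
        # Truncated geometric distribution over states 0..K.
        norm = (1 - rho ** (capacity + 1)) / (1 - rho)
        probs = [rho ** k / norm for k in range(capacity + 1)]
    blocking = probs[-1]                         # arriving user finds the station full
    throughput = arrival_rate * (1 - blocking)   # rate of users actually served
    mean_in_system = sum(k * p for k, p in enumerate(probs))
    return {
        "probs": probs,
        "blocking": blocking,
        "throughput": throughput,
        "mean_in_system": mean_in_system,
    }


# Hypothetical station: 2 arrivals/hour, 2 services/hour, room for 4 vehicles.
balanced = mm1k_metrics(2.0, 2.0, 4)
light = mm1k_metrics(1.0, 2.0, 4)
```

Throughput computed this way is the kind of accessibility measure that an upper-level location problem could maximize, while the blocking probability is one ingredient users might weigh when choosing a station.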
Model predictive control lowers barriers to adoption of heat-pump water heaters: A field study
Electric heat-pump water heaters (HPWHs) could reduce the energy costs, emissions, and power grid impacts associated with water heating, the second-largest energy use in United States housing. However, most HPWHs today require 240 V circuits to power the backup resistance heating elements they use to maintain comfort during large water draws. Installing a 240 V circuit can increase the up-front cost of a HPWH by half or more. This paper develops and field-tests the first control system that enables a 120 V HPWH to efficiently maintain comfort without resistance heating elements. The novel model predictive control (MPC) system enables pre-heating in anticipation of large water draws, which it forecasts using an ensemble of machine learning predictors. By shifting electrical load over time, MPC also reduces energy costs on average by 23% and 28% under time-of-use pricing and hourly pricing, respectively, relative to a 240 V HPWH with standard controls. Compared to the increasingly common practice in 120 V HPWHs of storing water at a constant, high temperature (60 °C) to ensure comfort, MPC saves 37% energy on average. In addition to demonstrating MPC's benefits in a real, occupied house, this paper discusses implementation challenges and costs. A simple payback analysis suggests that a 120 V HPWH, operated by the MPC system developed here, would be economically attractive in most installation scenarios.
Enhancing Profit and CO2 Mitigation: Commercial Direct Air Capture Design and Operation with Power Market Volatility
Current decarbonization efforts are falling short of meeting the net-zero greenhouse gas (GHG) emission target, highlighting the need for substantial carbon dioxide removal methods such as direct air capture (DAC). However, integrating DACs poses challenges due to their enormous power consumption. This study assesses the commercial operation of various DAC technologies that earn revenue using monetized carbon incentives while purchasing electricity from wholesale power markets. We model four commercial DAC technologies and examine their operation in three representative locations: California, Texas, and New York. Our findings reveal that commercial DAC operations can take financial advantage of the volatile power market by strategically operating only during low-price periods, offering a pathway to facilitate a cost-efficient decarbonization transition. The ambient operational environment, such as temperature and relative humidity, has a non-trivial impact on abatement capacity. Profit-driven decisions introduce climate-economic trade-offs that might decrease the capacity factor of DAC and reduce total CO2 removal. These implications extend throughout the entire lifecycle of DAC developments and influence power systems and policies related to full-scale DAC implementation. Our study shows that DAC technologies with shorter cycle spans and higher flexibility can better exploit the electricity price volatility, while power markets demonstrate persistent low-price windows that often synergize with low grid emission periods, like during the solar "duck curve" in California. An optimal incentive design exists for profit-driven operations while carbon-tax policy in electricity pricing is counterproductive for DAC systems.
comment: 16 pages, 8 figures, Submitted and under review for Engineering
Non-Gaussian Distribution Steering in Nonlinear Dynamics with Conjugate Unscented Transformation
In highly nonlinear systems such as the ones commonly found in astrodynamics, Gaussian distributions generally evolve into non-Gaussian distributions. This paper introduces a method for effectively controlling non-Gaussian distributions in nonlinear environments using optimized linear feedback control. This paper utilizes Conjugate Unscented Transformation to quantify the higher-order statistical moments of non-Gaussian distributions. The formulation focuses on controlling and constraining the sigma points associated with the uncertainty quantification, which would thereby reflect the control of the entire distribution and constraints on the moments themselves. This paper develops an algorithm to solve this problem with sequential convex programming, and it is demonstrated through a two-body and three-body example. The examples show that individual moments can be directly controlled, and the moments are accurately approximated for non-Gaussian distributions throughout the controller's time horizon in nonlinear dynamics.
Gaussian Process Implicit Surfaces as Control Barrier Functions for Safe Robot Navigation
Level set methods underpin modern safety techniques such as control barrier functions (CBFs), while also serving as implicit surface representations for geometric shapes via distance fields. Inspired by these two paradigms, we propose a unified framework where the implicit surface itself acts as a CBF. We leverage Gaussian process (GP) implicit surface (GPIS) to represent the safety boundaries, using safety samples which are derived from sensor measurements to condition the GP. The GP posterior mean defines the implicit safety surface (safety belief), while the posterior variance provides a robust safety margin. Although GPs have favorable properties such as uncertainty estimation and analytical tractability, they scale cubically with data. To alleviate this issue, we develop a sparse solution called sparse Gaussian CBFs. To the best of our knowledge, GPIS have not been explicitly used to synthesize CBFs. We validate the approach on collision avoidance tasks in two settings: a simulated 7-DOF manipulator operating around the Stanford bunny, and a quadrotor navigating in 3D around a physical chair. In both cases, Gaussian CBFs (with and without sparsity) enable safe interaction and collision-free execution of trajectories that would otherwise intersect the objects.
comment: 8 pages, 7 figures, under review
A Wideband Composite Sequence Impedance Model for Evaluation of Interactions in Unbalanced Power-Electronic-Based Power Systems
This paper proposes a wideband composite sequence impedance model (WCSIM)-based analysis method to evaluate the interactions in power-electronic-based power systems subjected to unbalanced grid faults or with unbalanced loads. The WCSIM-based method intuitively assesses the impact of the small-signal interconnection among the positive-, negative-, and zero-sequence circuits on the interaction stability of unbalanced power systems. The effectiveness of this method is demonstrated using a permanent magnet synchronous generator-based weak grid system under a single-line-to-ground fault (SLGF). Frequency scanning results and controller hardware-in-loop tests validate both the correctness of the WCSIM and the effectiveness of the WCSIM-based analysis method.
comment: This work will be submitted to the IEEE for possible publication
ExaModelsPower.jl: A GPU-Compatible Modeling Library for Nonlinear Power System Optimization
As GPU-accelerated mathematical programming techniques mature, there is growing interest in utilizing them to address the computational challenges of power system optimization. This paper introduces ExaModelsPower.jl, an open-source modeling library for creating GPU-compatible nonlinear AC optimal power flow models. Built on ExaModels.jl, ExaModelsPower.jl provides a high-level interface that automatically generates all necessary callback functions for GPU solvers. The library is designed for large-scale problem instances, which may include multiple time periods and security constraints. Using ExaModelsPower.jl, we benchmark GPU and CPU solvers on open-source test cases. Our results show that GPU solvers can deliver up to two orders of magnitude speedups compared to alternative tools on CPU for problems with more than 20,000 variables and a solution precision of up to $10^{-4}$, while performance for smaller instances or tighter tolerances may vary.
Optimization of High-Order Quarter-Wave Plate for Birefringence Suppression in FOCS
Fiber optic current sensors (FOCS) are widely adopted in modern power grids due to high sensitivity, excellent insulation, and strong immunity to electromagnetic interference. This prominence necessitates precise investigation into their error sources and corresponding optimization. This study examines reflective FOCS based on the Faraday effect. A theoretical model is established to simulate phase error caused by linear birefringence from the quarter-wave plate. Conventional methods using circular birefringence are analyzed, revealing inherent limitations. Innovatively, a compensation strategy employing high-order quarter-wave plates is proposed to effectively eliminate linear birefringence effects. This approach significantly enhances the accuracy and practicality of FOCS in precision metrology.
The Algorithmic Regulator
The regulator theorem states that, under certain conditions, any optimal controller must embody a model of the system it regulates, grounding the idea that controllers embed, explicitly or implicitly, internal models of the controlled. This principle underpins neuroscience and predictive brain theories like the Free-Energy Principle or Kolmogorov/Algorithmic Agent theory. However, the theorem is only proven in limited settings. Here, we treat the deterministic, closed, coupled world-regulator system $(W,R)$ as a single self-delimiting program $p$ via a constant-size wrapper that produces the world output string~$x$ fed to the regulator. We analyze regulation from the viewpoint of the algorithmic complexity of the output, $K(x)$. We define $R$ to be a \emph{good algorithmic regulator} if it \emph{reduces} the algorithmic complexity of the readout relative to a null (unregulated) baseline $\varnothing$, i.e., \[ \Delta = K\big(O_{W,\varnothing}\big) - K\big(O_{W,R}\big) > 0. \] We then prove that the larger $\Delta$ is, the more world-regulator pairs with high mutual algorithmic information are favored. More precisely, a complexity gap $\Delta > 0$ yields \[ \Pr\big((W,R)\mid x\big) \le C\,2^{\,M(W{:}R)}\,2^{-\Delta}, \] making low $M(W{:}R)$ exponentially unlikely as $\Delta$ grows. This is an AIT version of the idea that ``the regulator contains a model of the world.'' The framework is distribution-free, applies to individual sequences, and complements the Internal Model Principle. Beyond this necessity claim, the same coding-theorem calculus singles out a \emph{canonical scalar objective} and implicates a \emph{planner}. On the realized episode, a regulator behaves \emph{as if} it minimized the conditional description length of the readout.
comment: 2 Figures
Product-oriented Product-Process-Resource Asset Network and its Representation in AutomationML for Asset Administration Shell
Current products, especially in the automotive sector, are complex technical systems with a multi-disciplinary mechatronic nature. Industrial standards supporting system engineering and production typically (i) address the production phase only, but do not cover the complete product life cycle, and (ii) focus on production processes and resources rather than the products themselves. The presented approach is motivated by incorporating the impacts of the end-of-life phase of the product life cycle into the engineering phase. This paper proposes a modeling approach derived from the Product-Process-Resource (PPR) modeling paradigm. It combines requirements on (i) respecting the product structure as a basis for the model, and (ii) incorporating repairing, remanufacturing, or upcycling within cyber-physical production systems. The proposed model, called PoPAN, should accompany the product during the entire life cycle as a digital shadow encapsulated within the Asset Administration Shell of a product. To facilitate the adoption of the proposed paradigm, the paper also proposes serialization of the model in the AutomationML data format. The model is demonstrated on a use-case for disassembling electric vehicle batteries to support their remanufacturing for stationary battery applications.
comment: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Integrative, Scalable Modeling of Hydrological Systems with MBSE and HFGT
Worsening global challenges in the Anthropocene demand complex, adaptive solutions grounded in a systems-level understanding of coupled social and environmental dynamics. However, existing modeling approaches often fall short due to disciplinary silos, limited scalability, and the absence of shared ontological frameworks. Model-Based Systems Engineering (MBSE), when integrated with Hetero-functional Graph Theory (HFGT), offers a powerful methodology for modeling systems of systems while preserving subsystem heterogeneity and enabling cross-disciplinary integration. This paper presents the first application of the MBSE-HFGT methodology to environmental systems, using a series of worked examples involving flow through lake and land segments. These examples demonstrate how the approach enables consistent, scalable, and integrative modeling of complex environmental processes.
A Cyber Insurance Policy for Hedging Against Load-Altering Attacks and Extreme Load Variations in Distribution Grids
Uncertainties in renewable energy resources (RES) and load variations can lead to elevated system operational costs. Moreover, the emergence of large-scale distributed threats, such as load-altering attacks (LAAs), can induce substantial load variations, further exacerbating these costs. Although traditional defense measures can reduce the likelihood of such attacks, considerable residual risks remain. Thus, this paper proposes a cyber insurance framework designed to hedge against additional operational costs resulting from LAAs and substantial load variations in renewable-rich grids. The insurance framework determines both the insurance coverage and premium based on the Value at Risk (VaR) and Tail Value at Risk (TVaR). These risk metrics are calculated using the system failure probability and the probability density function (PDF) of the system operation cost. The system failure probability is assessed through a semi-Markov process (SMP), while the cost distribution is estimated through a cost minimization model of a distribution grid combined with a Monte-Carlo simulation to capture load variability. Furthermore, we employ a bi-level optimization scheme that identifies the specific load distribution leading to the maximum system cost, thereby enhancing the accuracy of the operation cost PDF estimation. The effectiveness and scalability of the proposed cyber insurance policy are evaluated considering a modified IEEE-118 test bus system and the IEEE European low-voltage (LV) test feeders model. The case study shows that with a relatively low premium, the network operator can hedge against additional operational costs caused by malicious load manipulations.
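The two risk metrics named in the abstract above can be illustrated with a minimal sketch that estimates them from Monte Carlo cost samples. The function name `var_tvar`, the empirical-quantile convention, and the sample values are assumptions for illustration. The paper's combination of these metrics with the semi-Markov failure probability and its premium rule are not reproduced here.

```python
import math


def var_tvar(cost_samples, alpha):
    """Empirical Value at Risk and Tail Value at Risk at confidence level alpha.

    Convention assumed here (one of several in use): VaR is the smallest
    sample such that at least a fraction alpha of samples are <= it, and
    TVaR is the mean of all samples at or beyond that order statistic.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(cost_samples)
    k = max(0, math.ceil(alpha * len(ordered)) - 1)   # index of the VaR sample
    tail = ordered[k:]
    return ordered[k], sum(tail) / len(tail)


# Hypothetical operating-cost samples (arbitrary currency units).
samples = [3.0, 1.0, 10.0, 2.0]
var_50, tvar_50 = var_tvar(samples, 0.5)
var_75, tvar_75 = var_tvar(samples, 0.75)
```

Because TVaR averages the whole tail, it is never smaller than VaR at the same level, which is why it is often used as the more conservative of the two quantities when sizing coverage.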
RIS-Assisted Millimeter Wave Communications for Indoor Scenarios: Modeling and Coverage Analysis
Millimeter wave (mmWave) communications and reconfigurable intelligent surfaces (RIS) are two critical technologies for next-generation networks, especially in dense indoor environments. However, existing analyses often oversimplify the indoor environment by neglecting some of the key characteristics, such as height variations, boundary effects, blockage effects, and user spatial distributions. In this paper, we develop an improved stochastic geometry-based model for RIS-assisted mmWave communications in indoor scenarios like conference centers, hospitals, and shopping malls. The proposed model incorporates the height factor for all the nodes in the network (e.g., transmitters, users, RISs, and obstacles) and captures the user clustering behavior in these scenarios. In addition, the boundary effect is considered in the line-of-sight (LOS) probability calculation. Analytical expressions for distance distributions, LOS probabilities, and the coverage probability (CP) are derived. The CP is then validated through Monte Carlo simulations. Our results reveal deployment insights by approximating and simplifying the derived CP expressions, showing how transmitter density, obstacle density, RIS density, and user cluster radius impact network coverage. Notably, we show that RISs significantly improve coverage when transmitters or transmit power are limited but offer marginal benefits when transmitter density is high. These findings provide practical guidelines for the design and deployment of RIS-assisted indoor mmWave networks.
Eco-driving Incentive Mechanisms for Mitigating Emissions in Urban Transportation
This paper develops incentive mechanisms for promoting eco-driving with the overarching goal of minimizing emissions in transportation networks. The system operator provides drivers with energy-efficient driving guidance throughout their trips and measures compliance through vehicle telematics that capture how closely drivers follow this guidance. Drivers optimize their behaviors based on personal trade-offs between travel times and emissions. To design effective incentives, the operator elicits driver preferences regarding trip urgency and willingness to eco-drive, while determining optimal budget allocations and eco-driving recommendations. Two distinct settings based on driver behavior are analyzed. When drivers report their preferences truthfully, an incentive mechanism ensuring obedience (drivers find it optimal to follow recommendations) is designed by implementing eco-driving recommendations as a Nash equilibrium. When drivers may report strategically, the mechanism is extended to be both obedient and truthful (drivers find it optimal to report truthfully). Unlike existing works that focus on congestion or routing decisions in transportation networks, our framework explicitly targets emissions reduction by incentivizing drivers. The proposed mechanism addresses both strategic behavior and network effects arising from driver interactions, without requiring the operator to reveal system parameters to the drivers. Numerical simulations demonstrate the effects of budget constraints, driver types, and strategic misreporting on equilibrium outcomes and emissions reduction.
comment: 12 pages, 6 figures
Globally Stable Discrete Time PID Passivity-based Control of Power Converters: Simulation and Experimental Results
The key idea behind PID Passivity-based Control (PID-PBC) is to leverage the passivity property of PIDs (for all positive gains) and wrap the PID controller around a passive output to ensure global stability in closed-loop. However, the practical applicability of PID-PBC is stymied by two key facts: (i) the vast majority of practical implementations of PIDs is carried out in discrete time, by discretizing the continuous-time dynamical system of the PID; (ii) the well-known problem that passivity is not preserved upon discretization, even with small sampling times. Therefore, two aspects of the PID-PBC must be revisited for its safe practical application. First, we propose a discretization of the PID that ensures its passivity. Second, since the output that is identified as passive for the continuous-time system is not necessarily passive for its discrete-time version, we construct a new output that ensures the passivity property for the discretization of the system. In this paper, we provide a constructive answer to both issues for the case of power converter models. Instrumental to achieving this objective is the use of the implicit midpoint discretization method, a symplectic integration technique that preserves system invariants. Since the reference value for the output to be regulated in power converters is non-zero, we are henceforth interested in the property of passivity of the incremental model, currently known as shifted passivity. Therefore, we demonstrate that the resulting discrete-time PID-PBC defines a passive map for the incremental model and establish shifted passivity for the discretized power converter model. Combining these properties, we prove global stability for the feedback interconnection of the power converter with the discretized PID-PBC. The paper also presents simulations and experiments that demonstrate the performance of the proposed discretization.
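The invariant-preserving property of the implicit midpoint rule mentioned in the abstract above can be seen on a toy example. The sketch below is not the paper's converter model: it uses a lossless harmonic oscillator, and the function names, step size, and step count are assumptions chosen for illustration. For this linear system the implicit midpoint update has a closed form (a Cayley transform), which is a rotation and so keeps the stored energy constant, whereas explicit Euler inflates it at every step.

```python
def midpoint_step(x, v, h, w=1.0):
    """One implicit midpoint step for x' = w*v, v' = -w*x (solved in closed form)."""
    a = 0.5 * h * w
    d = 1.0 + a * a
    x_next = ((1.0 - a * a) * x + 2.0 * a * v) / d
    v_next = (-2.0 * a * x + (1.0 - a * a) * v) / d
    return x_next, v_next


def euler_step(x, v, h, w=1.0):
    """One explicit (forward) Euler step for the same oscillator."""
    return x + h * w * v, v - h * w * x


def energy(x, v):
    """Stored energy of the oscillator, an invariant of the continuous-time flow."""
    return 0.5 * (x * x + v * v)


def simulate(step, steps=1000, h=0.1):
    """Integrate from (x, v) = (1, 0) and return the final energy."""
    x, v = 1.0, 0.0
    for _ in range(steps):
        x, v = step(x, v, h)
    return energy(x, v)


midpoint_energy = simulate(midpoint_step)   # stays at the initial value 0.5
euler_energy = simulate(euler_step)         # grows by a factor (1 + h^2) per step
```

The artificial energy growth under explicit Euler is a simple instance of the loss of passivity under discretization that the abstract describes, and it appears even though the sampling time is small relative to the oscillation period.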
Padé Approximant Neural Networks for Enhanced Electric Motor Fault Diagnosis Using Vibration and Acoustic Data
Purpose: The primary aim of this study is to enhance fault diagnosis in induction machines by leveraging the Padé Approximant Neuron (PAON) model. While accelerometers and microphones are standard in motor condition monitoring, deep learning models with nonlinear neuron architectures offer promising improvements in diagnostic performance. This research investigates whether Padé Approximant Neural Networks (PadéNets) can outperform conventional Convolutional Neural Networks (CNNs) and Self-Organized Operational Neural Networks (Self-ONNs) in the diagnosis of electrical and mechanical faults from vibration and acoustic data. Methods: We evaluate and compare the diagnostic capabilities of three deep learning architectures: one-dimensional CNNs, Self-ONNs, and PadéNets. These models are tested on the University of Ottawa's publicly available constant-speed induction motor datasets, which include both vibration and acoustic sensor data. The PadéNet model is designed to introduce enhanced nonlinearity and is compatible with unbounded activation functions such as LeakyReLU. Results and Conclusion: PadéNets consistently outperformed the baseline models, achieving diagnostic accuracies of 99.96%, 98.26%, 97.61%, and 98.33% for accelerometers 1, 2, 3, and the acoustic sensor, respectively. The enhanced nonlinearity of PadéNets, together with their compatibility with unbounded activation functions, significantly improves fault diagnosis performance in induction motor condition monitoring.
comment: This version is the author's accepted manuscript. It has been peer-reviewed and accepted for publication in Journal of Vibration Engineering & Technologies. The final published version is available at https://doi.org/10.1007/s42417-025-02129-5
Dissipativity-Based Distributed Control and Communication Topology Co-Design for DC Microgrids with ZIP Loads
This paper presents a novel dissipativity-based distributed droop-free control and communication topology co-design approach for voltage regulation and current sharing in DC microgrids (DC MGs) with generic "ZIP" (constant impedance (Z), current (I) and power (P)) loads. While ZIP loads accurately capture the varied nature of the consumer loads, their constant power load (CPL) component is particularly challenging (and destabilizing) due to its non-linear form. Moreover, ensuring simultaneous voltage regulation and current sharing and co-designing controllers and topology are also challenging when designing control solutions for DC MGs. To address these three challenges, we model the DC MG as a networked system comprised of distributed generators (DGs), ZIP loads, and lines interconnected according to a static interconnection matrix. Next, we equip each DG with a local controller and a distributed global controller (over an arbitrary topology) to derive the error dynamic model of the DC MG as a networked "error" system, including disturbance inputs and performance outputs. Subsequently, to co-design the controllers and the topology ensuring robust (dissipative) voltage regulation and current sharing performance, we use the dissipativity and sector boundedness properties of the involved subsystems and formulate Linear Matrix Inequality (LMI) problems to be solved locally and globally. To support the feasibility of the global LMI problem, we identify and embed several crucial necessary conditions in the corresponding local LMI problems, thus providing a one-shot approach to solve the LMI problems. Overall, the proposed approach in this paper provides a unified framework for designing DC MGs. The effectiveness of the proposed solution was verified by simulating an islanded DC MG under different scenarios, demonstrating superior performance compared to traditional control approaches.
comment: arXiv admin note: substantial text overlap with arXiv:2503.04908
Dissipativity-Based Distributed Control and Communication Topology Co-Design for Voltage Regulation and Current Sharing in DC Microgrids
This paper presents a novel dissipativity-based distributed droop-free control approach for voltage regulation and current sharing in DC microgrids (MGs) comprised of an interconnected set of distributed generators (DGs), loads, and power lines. First, we describe the closed-loop DC MG as a networked system where the DGs and lines (i.e., subsystems) are interconnected via a static interconnection matrix. This interconnection matrix demonstrates how the inputs, outputs, and disturbances of DGs and lines are connected in a DC MG. Each DG is equipped with a local controller for voltage regulation and a distributed global controller for current sharing, where the local controllers ensure individual voltage tracking while the global controllers coordinate among DGs to achieve proportional current sharing. To design the distributed global controllers, we use the dissipativity properties of the subsystems and formulate a linear matrix inequality (LMI) problem. To support the feasibility of this problem, we identify a set of necessary local and global conditions to enforce in a specifically developed LMI-based local controller design process. In contrast to existing DC MG control solutions, our approach proposes a unified framework for co-designing the distributed controller and communication topology. As the co-design process is LMI-based, it can be efficiently implemented and evaluated using existing convex optimization tools. The effectiveness of the proposed solution is verified by simulating an islanded DC MG in a MATLAB/Simulink environment under different scenarios, such as load changes and topological constraint changes, and then comparing the performance with the droop control algorithm.
The Untapped Potential of Smart Charging: How EV Owners Can Save Money and Reduce Emissions Without Behavioral Change
The transportation sector is the single largest contributor to US emissions and the second largest globally. Electric vehicles (EVs) are expected to represent half of global car sales by 2035, emerging as a pivotal solution to reduce emissions and enhance grid flexibility. The electrification of buildings, manufacturing, and transportation is expected to grow electricity demand substantially over the next decade. Without effectively managed EV charging, EVs could strain energy grid infrastructure and increase electricity costs. Drawing on de-identified 2023 EV telematics data from Rivian Automotive, this study found that 72% of home charging commenced after the customer plugged in their vehicle regardless of utility time of use (TOU) tariffs or managed charging programs. In fewer than 26% of charging sessions in the sample, EV owners actively scheduled charging times to align with or participate in utility tariffs or programs. With a majority of drivers concurrently plugged in during optimal charging periods yet not actively charging, the study identified an opportunity to reduce individual EV owner costs and carbon emissions through smarter charging habits without significant behavioral modifications or sacrifice in user preferences. By optimizing home charging schedules within existing plug-in and plug-out windows, the study suggests that EV owners can save an average of $140 annually and reduce the associated carbon emissions of charging their EV by as much as 28%.
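The idea of rescheduling charging inside an unchanged plug-in window, as described in the abstract above, can be sketched with a simple greedy rule: charge during the cheapest hours of the window instead of the first ones. This is only an illustration under stated assumptions (hourly prices, constant charging power, whole-hour blocks, hypothetical function name `charging_cost`). It is not the study's optimization method, and the tariff values are made up.

```python
def charging_cost(prices, hours_needed, power_kw, smart=True):
    """Cost of charging for hours_needed hours inside a plug-in window.

    prices: price per kWh for each hour the vehicle is plugged in.
    smart=False charges immediately (first hours of the window);
    smart=True picks the cheapest hours within the same window.
    Returns (total cost, sorted list of chosen hour indices).
    """
    if hours_needed > len(prices):
        raise ValueError("window too short to deliver the requested energy")
    if smart:
        # Stable sort by price, then take the cheapest hours.
        by_price = sorted(range(len(prices)), key=lambda h: prices[h])
        chosen = by_price[:hours_needed]
    else:
        chosen = list(range(hours_needed))
    cost = power_kw * sum(prices[h] for h in chosen)
    return cost, sorted(chosen)


# Hypothetical overnight window: peak prices first, off-peak later.
window_prices = [0.30, 0.30, 0.10, 0.10, 0.20]
immediate_cost, immediate_hours = charging_cost(window_prices, 2, 7.0, smart=False)
smart_cost, smart_hours = charging_cost(window_prices, 2, 7.0, smart=True)
```

The plug-in and plug-out times are the same in both cases; only the hours in which energy flows change, which is the sense in which no behavioral change is required. The same rule could rank hours by grid emission intensity instead of price.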
A Neural Network-based Multi-timestep Command Governor for Nonlinear Systems with Constraints
The multi-timestep command governor (MCG) is an add-on algorithm that enforces constraints by modifying, at each timestep, the reference command to a pre-stabilized control system. The MCG can be interpreted as a Model-Predictive Control scheme operating on the reference command. The implementation of MCG on nonlinear systems carries a heavy computational burden as it requires solving a nonlinear program with multiple decision variables at each timestep. This paper proposes a less computationally demanding alternative, based on approximating the MCG control law using a neural network (NN) trained on offline data. However, since the NN output may not always be constraint-admissible due to training errors, its output is adjusted using a sensitivity-based method. We thus refer to the resulting control strategy as the neural network-based MCG (NN-MCG). As validation, the proposed controller is applied as a load governor for constraint management in an automotive fuel cell system. It is shown that the proposed strategy is significantly more computationally efficient than the traditional MCG, while achieving nearly identical performance if the NN is well-trained.
comment: Accepted for publication in the 2025 IEEE Conference on Control Technology and Applications (CCTA)
Robotics
Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation
Learning robotic manipulation policies directly in the real world can be expensive and time-consuming. While reinforcement learning (RL) policies trained in simulation present a scalable alternative, effective sim-to-real transfer remains challenging, particularly for tasks that require precise dynamics. To address this, we propose Phys2Real, a real-to-sim-to-real RL pipeline that combines vision-language model (VLM)-inferred physical parameter estimates with interactive adaptation through uncertainty-aware fusion. Our approach consists of three core components: (1) high-fidelity geometric reconstruction with 3D Gaussian splatting, (2) VLM-inferred prior distributions over physical parameters, and (3) online physical parameter estimation from interaction data. Phys2Real conditions policies on interpretable physical parameters, refining VLM predictions with online estimates via ensemble-based uncertainty quantification. On planar pushing tasks of a T-block with varying center of mass (CoM) and a hammer with an off-center mass distribution, Phys2Real achieves substantial improvements over a domain randomization baseline: 100% vs 79% success rate for the bottom-weighted T-block, 57% vs 23% in the challenging top-weighted T-block, and 15% faster average task completion for hammer pushing. Ablation studies indicate that the combination of VLM and interaction information is essential for success. Project website: https://phys2real.github.io/ .
Ego-Vision World Model for Humanoid Contact Planning
Enabling humanoid robots to exploit physical contact, rather than simply avoid collisions, is crucial for autonomy in unstructured environments. Traditional optimization-based planners struggle with contact complexity, while on-policy reinforcement learning (RL) is sample-inefficient and has limited multi-task ability. We propose a framework combining a learned world model with sampling-based Model Predictive Control (MPC), trained on a demonstration-free offline dataset to predict future outcomes in a compressed latent space. To address sparse contact rewards and sensor noise, the MPC uses a learned surrogate value function for dense, robust planning. Our single, scalable model supports contact-aware tasks, including wall support after perturbation, blocking incoming objects, and traversing height-limited arches, with improved data efficiency and multi-task capability over on-policy RL. Deployed on a physical humanoid, our system achieves robust, real-time contact planning from proprioception and ego-centric depth images. Website: https://ego-vcp.github.io/
ManiAgent: An Agentic Framework for General Robotic Manipulation
While Vision-Language-Action (VLA) models have demonstrated impressive capabilities in robotic manipulation, their performance in complex reasoning and long-horizon task planning is limited by data scarcity and model capacity. To address this, we introduce ManiAgent, an agentic architecture for general manipulation tasks that achieves end-to-end output from task descriptions and environmental inputs to robotic manipulation actions. In this framework, multiple agents communicate with one another to perform environmental perception, sub-task decomposition, and action generation, enabling efficient handling of complex manipulation scenarios. Evaluations show ManiAgent achieves an 86.8% success rate on the SimplerEnv benchmark and 95.8% on real-world pick-and-place tasks, enabling efficient data collection that yields VLA models with performance comparable to those trained on human-annotated datasets. The project webpage is available at https://yi-yang929.github.io/ManiAgent/.
comment: 8 pages, 6 figures, conference
Smooth Spatiotemporal Tube Synthesis for Prescribed-Time Reach-Avoid-Stay Control
In this work, we address the problem of controller synthesis for a control-affine nonlinear system to meet prescribed-time reach-avoid-stay (RAS) specifications. Our goal is to improve upon previous methods based on spatiotemporal tubes (STTs) by eliminating the need for circumvent functions, which often lead to abrupt tube modifications and high control effort. We propose an adaptive framework that constructs smooth STTs around static unsafe sets, enabling continuous avoidance while guiding the system toward the target within the prescribed time. A closed-form, approximation-free control law is derived to ensure the system trajectory remains within the tube and satisfies the RAS task. The effectiveness of the proposed approach is demonstrated through a case study, showing a significant reduction in control effort compared to prior methods.
Calibrated Dynamic Modeling for Force and Payload Estimation in Hydraulic Machinery
Accurate real-time estimation of end effector interaction forces in hydraulic excavators is a key enabler for advanced automation in heavy machinery. Accurate knowledge of these forces allows improved, precise grading and digging maneuvers. To address these challenges, we introduce a high-accuracy, retrofittable 2D force and payload estimation algorithm that does not impose additional requirements on the operator regarding trajectory, acceleration or the use of the slew joint. The approach is designed for retrofittability and requires minimal calibration and no prior knowledge of machine-specific dynamic characteristics. Specifically, we propose a method for identifying a dynamic model, necessary to estimate both end effector interaction forces and bucket payload during normal operation. Our optimization-based payload estimation achieves a full-scale payload accuracy of 1%. On a standard 25 t excavator, the online force measurement from pressure and inertial measurements achieves a direction accuracy of 13 degrees and a magnitude accuracy of 383 N. The method's accuracy and generalization capability are validated on two excavator platforms of different types and weight classes. We benchmark our payload estimation against a classical quasistatic method and a commercially available system. Our system outperforms both in accuracy and precision.
SCOOP'D: Learning Mixed-Liquid-Solid Scooping via Sim2Real Generative Policy
Scooping items with tools such as spoons and ladles is common in daily life, ranging from assistive feeding to retrieving items from environmental disaster sites. However, developing a general and autonomous robotic scooping policy is challenging since it requires reasoning about complex tool-object interactions. Furthermore, scooping often involves manipulating deformable objects, such as granular media or liquids, which is challenging due to their infinite-dimensional configuration spaces and complex dynamics. We propose a method, SCOOP'D, which uses simulation from OmniGibson (built on NVIDIA Omniverse) to collect scooping demonstrations using algorithmic procedures that rely on privileged state information. Then, we use generative policies via diffusion to imitate demonstrations from observational input. We directly apply the learned policy in diverse real-world scenarios, testing its performance on various item quantities, item characteristics, and container types. In zero-shot deployment, our method demonstrates promising results across 465 trials in diverse scenarios, including objects of different difficulty levels that we categorize as "Level 1" and "Level 2." SCOOP'D outperforms all baselines and ablations, suggesting that this is a promising approach to acquiring robotic scooping skills. Project page is at https://scoopdiff.github.io/.
comment: Project page is at https://scoopdiff.github.io/
Robot Soccer Kit: Omniwheel Tracked Soccer Robots for Education
Recent developments in low-cost, off-the-shelf programmable components, their modularity, and rapid prototyping have made educational robotics flourish, and it is now accessible in most schools. These kits make it possible to illustrate and embody theoretical problems in practical, tangible applications, and to bring together multidisciplinary skills. They also provide a rich, natural context for project-oriented pedagogy. However, most current robot kits are limited to the egocentric aspect of the robot's perception. This makes it difficult to approach higher-level problems involving, e.g., coordinates or navigation. In this paper we introduce an educational holonomic robot kit that comes with an external tracking system, which relaxes the constraints on the embedded systems while also making it possible to explore high-level aspects of robotics that would otherwise be out of reach.
NaviGait: Navigating Dynamically Feasible Gait Libraries using Deep Reinforcement Learning
Reinforcement learning (RL) has emerged as a powerful method to learn robust control policies for bipedal locomotion. Yet, it can be difficult to tune desired robot behaviors due to unintuitive and complex reward design. In comparison, offline trajectory optimization methods, like Hybrid Zero Dynamics, offer more tuneable, interpretable, and mathematically grounded motion plans for high-dimensional legged systems. However, these methods often remain brittle to real-world disturbances like external perturbations. In this work, we present NaviGait, a hierarchical framework that combines the structure of trajectory optimization with the adaptability of RL for robust and intuitive locomotion control. NaviGait leverages a library of offline-optimized gaits and smoothly interpolates between them to produce continuous reference motions in response to high-level commands. The policy provides both joint-level and velocity command residual corrections to modulate and stabilize the reference trajectories in the gait library. One notable advantage of NaviGait is that it dramatically simplifies reward design by encoding rich motion priors from trajectory optimization, reducing the need for finely tuned shaping terms and enabling more stable and interpretable learning. Our experimental results demonstrate that NaviGait enables faster training compared to conventional and imitation-based RL, and produces motions that remain closest to the original reference. Overall, by decoupling high-level motion generation from low-level correction, NaviGait offers a more scalable and generalizable approach for achieving dynamic and robust locomotion.
Simultaneous Calibration of Noise Covariance and Kinematics for State Estimation of Legged Robots via Bi-level Optimization
Accurate state estimation is critical for legged and aerial robots operating in dynamic, uncertain environments. A key challenge lies in specifying process and measurement noise covariances, which are typically unknown or manually tuned. In this work, we introduce a bi-level optimization framework that jointly calibrates covariance matrices and kinematic parameters in an estimator-in-the-loop manner. The upper level treats noise covariances and model parameters as optimization variables, while the lower level executes a full-information estimator. Differentiating through the estimator allows direct optimization of trajectory-level objectives, resulting in accurate and consistent state estimates. We validate our approach on quadrupedal and humanoid robots, demonstrating significantly improved estimation accuracy and uncertainty calibration compared to hand-tuned baselines. Our method unifies state estimation, sensor, and kinematics calibration into a principled, data-driven framework applicable across diverse robotic platforms.
IntersectioNDE: Learning Complex Urban Traffic Dynamics based on Interaction Decoupling Strategy ITSC 2025
Realistic traffic simulation is critical for ensuring the safety and reliability of autonomous vehicles (AVs), especially in complex and diverse urban traffic environments. However, existing data-driven simulators face two key challenges: a limited focus on modeling dense, heterogeneous interactions at urban intersections - which are prevalent, crucial, and practically significant in countries like China, featuring diverse agents including motorized vehicles (MVs), non-motorized vehicles (NMVs), and pedestrians - and the inherent difficulty in robustly learning high-dimensional joint distributions for such high-density scenes, often leading to mode collapse and long-term simulation instability. We introduce City Crossings Dataset (CiCross), a large-scale dataset collected from a real-world urban intersection, uniquely capturing dense, heterogeneous multi-agent interactions, particularly with a substantial proportion of MVs, NMVs and pedestrians. Based on this dataset, we propose IntersectioNDE (Intersection Naturalistic Driving Environment), a data-driven simulator tailored for complex urban intersection scenarios. Its core component is the Interaction Decoupling Strategy (IDS), a training paradigm that learns compositional dynamics from agent subsets, enabling the marginal-to-joint simulation. Integrated into a scene-aware Transformer network with specialized training techniques, IDS significantly enhances simulation robustness and long-term stability for modeling heterogeneous interactions. Experiments on CiCross show that IntersectioNDE outperforms baseline methods in simulation fidelity, stability, and its ability to replicate complex, distribution-level urban traffic dynamics.
comment: Accepted by ITSC 2025
DQ-NMPC: Dual-Quaternion NMPC for Quadrotor Flight
Micro aerial vehicles (MAVs) have great potential to assist humans in complex tasks, with applications ranging from logistics to emergency response. Their agility makes them ideal for operations in complex and dynamic environments. However, achieving precise control in agile flights remains a significant challenge, particularly due to the underactuated nature of quadrotors and the strong coupling between their translational and rotational dynamics. In this work, we propose a novel nonlinear model predictive control (NMPC) framework based on dual-quaternions (DQ-NMPC) for quadrotor flight. By representing both quadrotor dynamics and the pose error directly on the dual-quaternion manifold, our approach enables a compact and globally non-singular formulation that captures the quadrotor coupled dynamics. We validate our approach through simulations and real-world experiments, demonstrating better numerical conditioning and significantly improved tracking performance, with reductions in position and orientation errors of up to 56.11% and 56.77%, respectively, compared to a conventional baseline NMPC method. Furthermore, our controller successfully handles aggressive trajectories, reaching maximum speeds of up to 13.66 m/s and accelerations of up to 4.2 g within a confined space of 11 m x 4.5 m x 3.65 m, conditions under which the baseline controller fails.
comment: Accepted to IEEE Robotics and Automation Letters
Context-Aware Model-Based Reinforcement Learning for Autonomous Racing
Autonomous vehicles have shown promising potential to be a groundbreaking technology for improving the safety of road users. For these vehicles, as well as many other safety-critical robotic technologies, to be deployed in real-world applications, we require algorithms that can generalize well to unseen scenarios and data. Model-based reinforcement learning (MBRL) algorithms have demonstrated state-of-the-art performance and data efficiency across a diverse set of domains. However, these algorithms have also shown susceptibility to changes in the environment and its transition dynamics. In this work, we explore the performance and generalization capabilities of MBRL algorithms for autonomous driving, specifically in the simulated autonomous racing environment, Roboracer (formerly F1Tenth). We frame the head-to-head racing task as a learning problem using contextual Markov decision processes and parameterize the driving behavior of the adversaries using the context of the episode, thereby also parameterizing the transition and reward dynamics. We benchmark the behavior of MBRL algorithms in this environment and propose a novel context-aware extension of the existing literature, cMask. We demonstrate that context-aware MBRL algorithms generalize better to out-of-distribution adversary behaviors than context-free approaches. We also demonstrate that cMask displays strong generalization capabilities, as well as further performance improvement relative to other context-aware MBRL approaches when racing against adversaries with in-distribution behaviors.
comment: Accepted to IEEE ICAR 2025
Constraint-Aware Reinforcement Learning via Adaptive Action Scaling
Safe reinforcement learning (RL) seeks to mitigate unsafe behaviors that arise from exploration during training by reducing constraint violations while maintaining task performance. Existing approaches typically rely on a single policy to jointly optimize reward and safety, which can cause instability due to conflicting objectives, or they use external safety filters that override actions and require prior system knowledge. In this paper, we propose a modular cost-aware regulator that scales the agent's actions based on predicted constraint violations, preserving exploration through smooth action modulation rather than overriding the policy. The regulator is trained to minimize constraint violations while avoiding degenerate suppression of actions. Our approach integrates seamlessly with off-policy RL methods such as SAC and TD3, and achieves state-of-the-art return-to-cost ratios on Safety Gym locomotion tasks with sparse costs, reducing constraint violations by up to 126 times while increasing returns by over an order of magnitude compared to prior methods.
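The idea of scaling an action rather than overriding it can be illustrated with a toy example. This is not the authors' regulator: the function `scale_action`, the linear scaling rule, and the `min_scale` floor (used here to avoid fully suppressing actions) are hypothetical choices made purely for illustration, and the predicted violation is assumed to be a score in [0, 1].

```python
def scale_action(action, predicted_violation, min_scale=0.1):
    """Shrink an action smoothly as predicted constraint risk grows.

    Hypothetical rule: scale falls linearly from 1.0 (no risk) to
    min_scale (maximum risk), so the policy's direction is preserved.
    """
    risk = min(max(predicted_violation, 0.0), 1.0)  # clamp to [0, 1]
    scale = 1.0 - (1.0 - min_scale) * risk
    return [scale * a for a in action]

safe = scale_action([1.0, -2.0], predicted_violation=0.0)
risky = scale_action([1.0, -2.0], predicted_violation=1.0)
print(safe, risky)
```

Because the action keeps its direction and is never driven exactly to zero, the agent can still explore near constraint boundaries, which is the property the abstract contrasts with hard safety filters.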
Coordinated Strategies in Realistic Air Combat by Hierarchical Multi-Agent Reinforcement Learning
Achieving mission objectives in a realistic simulation of aerial combat is highly challenging due to imperfect situational awareness and nonlinear flight dynamics. In this work, we introduce a novel 3D multi-agent air combat environment and a Hierarchical Multi-Agent Reinforcement Learning framework to tackle these challenges. Our approach combines heterogeneous agent dynamics, curriculum learning, league-play, and a newly adapted training algorithm. To this end, the decision-making process is organized into two abstraction levels: low-level policies learn precise control maneuvers, while high-level policies issue tactical commands based on mission objectives. Empirical results show that our hierarchical approach improves both learning efficiency and combat performance in complex dogfight scenarios.
comment: 2025 IEEE International Conference on Agentic AI (ICA)
A Faster and More Reliable Middleware for Autonomous Driving Systems
Ensuring safety in high-speed autonomous vehicles requires rapid control loops and tightly bounded delays from perception to actuation. Many open-source autonomy systems rely on ROS 2 middleware; when multiple sensor and control nodes share one compute unit, ROS 2 and its DDS transports add significant (de)serialization, copying, and discovery overheads, shrinking the available time budget. We present Sensor-in-Memory (SIM), a shared-memory transport designed for intra-host pipelines in autonomous vehicles. SIM keeps sensor data in native memory layouts (e.g., cv::Mat, PCL), uses lock-free bounded double buffers that overwrite old data to prioritize freshness, and integrates into ROS 2 nodes with four lines of code. Unlike traditional middleware, SIM operates beside ROS 2 and is optimized for applications where data freshness and minimal latency outweigh guaranteed completeness. SIM provides sequence numbers, a writer heartbeat, and optional checksums to ensure ordering, liveness, and basic integrity. On an NVIDIA Jetson Orin Nano, SIM reduces data-transport latency by up to 98% compared to ROS 2 zero-copy transports such as FastRTPS and Zenoh, lowers mean latency by about 95%, and narrows 95th/99th-percentile tail latencies by around 96%. In tests on a production-ready Level 4 vehicle running Autoware.Universe, SIM increased localization frequency from 7.5 Hz to 9.5 Hz. Applied across all latency-critical modules, SIM cut average perception-to-decision latency from 521.91 ms to 290.26 ms, reducing emergency braking distance at 40 mph (64 km/h) on dry concrete by 13.6 ft (4.14 m).
comment: 8 pages, 7 figures, 8 tables
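As a rough consistency check on the figures quoted in the abstract above, the distance saved can be approximated as vehicle speed multiplied by the reduction in latency. The sketch assumes the vehicle travels at constant speed during the perception-to-decision delay; that assumption, and the helper name `distance_saved_m`, are introduced here and are not taken from the paper.

```python
MPH_TO_MPS = 0.44704  # exact: 1609.344 m per mile / 3600 s per hour

def distance_saved_m(speed_mph, latency_before_ms, latency_after_ms):
    """Distance no longer travelled 'blind' when latency shrinks."""
    speed_mps = speed_mph * MPH_TO_MPS
    saved_s = (latency_before_ms - latency_after_ms) / 1000.0
    return speed_mps * saved_s

saved = distance_saved_m(40.0, 521.91, 290.26)
print(f"{saved:.2f} m")            # 4.14 m
print(f"{saved / 0.3048:.1f} ft")  # 13.6 ft
```

Under this simple model the result agrees with the 4.14 m (13.6 ft) reduction reported in the abstract.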
A Modular AIoT Framework for Low-Latency Real-Time Robotic Teleoperation in Smart Cities
This paper presents an AI-driven IoT robotic teleoperation system designed for real-time remote manipulation and intelligent visual monitoring, tailored for smart city applications. The architecture integrates a Flutter-based cross-platform mobile interface with MQTT-based control signaling and WebRTC video streaming via the LiveKit framework. A YOLOv11-nano model is deployed for lightweight object detection, enabling real-time perception with annotated visual overlays delivered to the user interface. Control commands are transmitted via MQTT to an ESP8266-based actuator node, which coordinates multi-axis robotic arm motion through an Arduino Mega2560 controller. The backend infrastructure is hosted on DigitalOcean, ensuring scalable cloud orchestration and stable global communication. Latency evaluations conducted under both local and international VPN scenarios (including Hong Kong, Japan, and Belgium) demonstrate actuator response times as low as 0.2 seconds and total video latency under 1.2 seconds, even across high-latency networks. This low-latency dual-protocol design ensures responsive closed-loop interaction and robust performance in distributed environments. Unlike conventional teleoperation platforms, the proposed system emphasizes modular deployment, real-time AI sensing, and adaptable communication strategies, making it well-suited for smart city scenarios such as remote infrastructure inspection, public equipment servicing, and urban automation. Future enhancements will focus on edge-device deployment, adaptive routing, and integration with city-scale IoT networks to enhance resilience and scalability.
Path and Motion Optimization for Efficient Multi-Location Inspection with Humanoid Robots
This paper proposes a novel framework for humanoid robots to execute inspection tasks with high efficiency and millimeter-level precision. The approach combines hierarchical planning, time-optimal standing position generation, and integrated model predictive control (MPC) to achieve high speed and precision. A hierarchical planning strategy, leveraging inverse kinematics (IK) and mixed-integer programming (MIP), reduces computational complexity by decoupling the high-dimensional planning problem. A novel MIP formulation optimizes standing position selection and trajectory length, minimizing task completion time. Furthermore, an MPC system with simplified kinematics and single-step position correction ensures millimeter-level end-effector tracking accuracy. Validated through simulations and experiments on the Kuavo 4Pro humanoid platform, the framework demonstrates low time cost and a high success rate in multi-location tasks, enabling efficient and precise execution of complex industrial operations.
HiMaCon: Discovering Hierarchical Manipulation Concepts from Unlabeled Multi-Modal Data NeurIPS 2025
Effective generalization in robotic manipulation requires representations that capture invariant patterns of interaction across environments and tasks. We present a self-supervised framework for learning hierarchical manipulation concepts that encode these invariant patterns through cross-modal sensory correlations and multi-level temporal abstractions without requiring human annotation. Our approach combines a cross-modal correlation network that identifies persistent patterns across sensory modalities with a multi-horizon predictor that organizes representations hierarchically across temporal scales. Manipulation concepts learned through this dual structure enable policies to focus on transferable relational patterns while maintaining awareness of both immediate actions and longer-term goals. Empirical evaluation across simulated benchmarks and real-world deployments demonstrates significant performance improvements with our concept-enhanced policies. Analysis reveals that the learned concepts resemble human-interpretable manipulation primitives despite receiving no semantic supervision. This work advances both the understanding of representation learning for manipulation and provides a practical approach to enhancing robotic performance in complex scenarios.
comment: Accepted at 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
Adap-RPF: Adaptive Trajectory Sampling for Robot Person Following in Dynamic Crowded Environments
Robot person following (RPF) is a core capability in human-robot interaction, enabling robots to assist users in daily activities, collaborative work, and other service scenarios. However, achieving practical RPF remains challenging due to frequent occlusions, particularly in dynamic and crowded environments. Existing approaches often rely on fixed-point following or sparse candidate-point selection with oversimplified heuristics, which cannot adequately handle complex occlusions caused by moving obstacles such as pedestrians. To address these limitations, we propose an adaptive trajectory sampling method that generates dense candidate points within socially aware zones and evaluates them using a multi-objective cost function. Based on the optimal point, a person-following trajectory is estimated relative to the predicted motion of the target. We further design a prediction-aware model predictive path integral (MPPI) controller that simultaneously tracks this trajectory and proactively avoids collisions using predicted pedestrian motions. Extensive experiments show that our method outperforms state-of-the-art baselines in smoothness, safety, robustness, and human comfort, with its effectiveness further demonstrated on a mobile robot in real-world scenarios.
comment: https://adap-rpf.github.io/
Rotor-Failure-Aware Quadrotors Flight in Unknown Environments
Rotor failures in quadrotors may result in high-speed rotation and vibration due to rotor imbalance, which introduces significant challenges for autonomous flight in unknown environments. The mainstream approaches against rotor failures rely on fault-tolerant control (FTC) and predefined trajectory tracking. To the best of our knowledge, online failure detection and diagnosis (FDD), trajectory planning, and FTC of the post-failure quadrotors in unknown and complex environments have not yet been achieved. This paper presents a rotor-failure-aware quadrotor navigation system designed to mitigate the impacts of rotor imbalance. First, a composite FDD-based nonlinear model predictive controller (NMPC), incorporating motor dynamics, is designed to ensure fast failure detection and flight stability. Second, a rotor-failure-aware planner is designed to leverage FDD results and spatial-temporal joint optimization, while a LiDAR-based quadrotor platform with four anti-torque plates is designed to enable reliable perception under high-speed rotation. Lastly, extensive benchmarks against state-of-the-art methods highlight the superior performance of the proposed approach in addressing rotor failures, including propeller unloading and motor stoppage. The experimental results demonstrate, for the first time, that our approach enables autonomous quadrotor flight with rotor failures in challenging environments, including cluttered rooms and unknown forests.
DemoHLM: From One Demonstration to Generalizable Humanoid Loco-Manipulation
Loco-manipulation is a fundamental challenge for humanoid robots to achieve versatile interactions in human environments. Although recent studies have made significant progress in humanoid whole-body control, loco-manipulation remains underexplored and often relies on hard-coded task definitions or costly real-world data collection, which limits autonomy and generalization. We present DemoHLM, a framework for humanoid loco-manipulation that enables generalizable loco-manipulation on a real humanoid robot from a single demonstration in simulation. DemoHLM adopts a hierarchy that integrates a low-level universal whole-body controller with high-level manipulation policies for multiple tasks. The whole-body controller maps whole-body motion commands to joint torques and provides omnidirectional mobility for the humanoid robot. The manipulation policies, learned in simulation via our data generation and imitation learning pipeline, command the whole-body controller with closed-loop visual feedback to execute challenging loco-manipulation tasks. Experiments show a positive correlation between the amount of synthetic data and policy performance, underscoring the effectiveness of our data generation pipeline and the data efficiency of our approach. Real-world experiments on a Unitree G1 robot equipped with an RGB-D camera validate the sim-to-real transferability of DemoHLM, demonstrating robust performance under spatial variations across ten loco-manipulation tasks.
A Primer on SO(3) Action Representations in Deep Reinforcement Learning
Many robotic control tasks require policies to act on orientations, yet the geometry of SO(3) makes this nontrivial. Because SO(3) admits no global, smooth, minimal parameterization, common representations such as Euler angles, quaternions, rotation matrices, and Lie algebra coordinates introduce distinct constraints and failure modes. While these trade-offs are well studied for supervised learning, their implications for actions in reinforcement learning remain unclear. We systematically evaluate SO(3) action representations across three standard continuous control algorithms, PPO, SAC, and TD3, under dense and sparse rewards. We compare how representations shape exploration, interact with entropy regularization, and affect training stability through empirical studies and analyze the implications of different projections for obtaining valid rotations from Euclidean network outputs. Across a suite of robotics benchmarks, we quantify the practical impact of these choices and distill simple, implementation-ready guidelines for selecting and using rotation actions. Our results highlight that representation-induced geometry strongly influences exploration and optimization and show that representing actions as tangent vectors in the local frame yields the most reliable results across algorithms.
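To make the idea of a tangent-vector action in the local frame concrete, the sketch below maps a 3-vector to a rotation with Rodrigues' formula and right-multiplies it onto the current orientation. It assumes that "local frame" means the update R_next = R · exp(ŵ); that reading, and all function names, are assumptions made here, not details given in the abstract.

```python
import math

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def hat(w):
    """Map a 3-vector to its skew-symmetric (cross-product) matrix."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def exp_so3(w):
    """Rodrigues' formula: exp(hat(w)) = I + a*K + b*K^2."""
    theta = math.sqrt(sum(c * c for c in w))
    K = hat(w)
    K2 = matmul(K, K)
    if theta < 1e-8:
        a, b = 1.0, 0.5  # small-angle limits of sin(t)/t and (1-cos t)/t^2
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / (theta * theta)
    return [[IDENTITY[i][j] + a * K[i][j] + b * K2[i][j] for j in range(3)]
            for i in range(3)]

def apply_local_action(R, w):
    """Apply tangent-vector action w in the body (local) frame of R."""
    return matmul(R, exp_so3(w))

# A quarter turn about the local z-axis, starting from the identity.
R_next = apply_local_action(IDENTITY, (0.0, 0.0, math.pi / 2))
print(R_next)
```

Any Euclidean network output is a valid input to `exp_so3`, so no projection step is needed to obtain a valid rotation, which is one reason this parameterization is convenient for policy outputs.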
Design and Koopman Model Predictive Control of A Soft Exoskeleton Based on Origami-Inspired Pneumatic Actuator for Knee Rehabilitation
Effective rehabilitation methods are essential for recovery from lower-limb dysfunction caused by stroke. Robotic exoskeletons have shown great potential in rehabilitation. Nevertheless, traditional rigid exoskeletons are usually heavy and require considerable effort to help patients put them on. Moreover, they require additional compliance control to guarantee safety. In contrast, soft exoskeletons are easy and comfortable to wear and have intrinsic compliance, but their complex nonlinear human-robot interaction dynamics pose significant challenges for control. In this work, based on origami-inspired pneumatic actuators, we design a knee rehabilitation exoskeleton that is easy and comfortable to wear. To guarantee control performance and enable smooth human-robot interaction, we first use a Deep Koopman Network to model the human-robot interaction dynamics. In particular, by treating the electromyography (EMG) signals and the duty cycle of the PWM wave that controls the pneumatic robot's valves and pump as inputs, the linear Koopman model accurately captures the complex human-robot interaction dynamics. Next, based on the obtained Koopman model, we use Model Predictive Control (MPC) to control the soft robot and help the user perform rehabilitation training in real time. The goal of the rehabilitation training is to track a given reference signal shown on a screen. Experiments show that integrating the EMG signals into the Koopman model substantially improves model accuracy. In addition, a personalized Koopman model trained on the individual's own data performs better than the non-personalized model. Consequently, our control framework outperforms traditional PID control in both passive and active training modes. Hence, the proposed method provides a new control framework for soft rehabilitation robots.
Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling NeurIPS 2025
Modeling interactive driving behaviors in complex scenarios remains a fundamental challenge for autonomous driving planning. Learning-based approaches attempt to address this challenge with advanced generative models, removing the dependency on over-engineered architectures for representation fusion. However, brute-force implementation by simply stacking transformer blocks lacks a dedicated mechanism for modeling interactive behaviors that are common in real driving scenarios. The scarcity of interactive driving data further exacerbates this problem, leaving conventional imitation learning methods ill-equipped to capture high-value interactive behaviors. We propose Flow Planner, which tackles these problems through coordinated innovations in data modeling, model architecture, and learning scheme. Specifically, we first introduce fine-grained trajectory tokenization, which decomposes the trajectory into overlapping segments to decrease the complexity of whole trajectory modeling. With a sophisticatedly designed architecture, we achieve efficient temporal and spatial fusion of planning and scene information, to better capture interactive behaviors. In addition, the framework incorporates flow matching with classifier-free guidance for multi-modal behavior generation, which dynamically reweights agent interactions during inference to maintain coherent response strategies, providing a critical boost for interactive scenario understanding. Experimental results on the large-scale nuPlan dataset and challenging interactive interPlan dataset demonstrate that Flow Planner achieves state-of-the-art performance among learning-based approaches while effectively modeling interactive behaviors in complex driving scenarios.
comment: 26 pages, 6 figures. Accepted at NeurIPS 2025
PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System
Deploying humanoid robots to interact with real-world environments--such as carrying objects or sitting on chairs--requires generalizable, lifelike motions and robust scene perception. Although prior approaches have advanced each capability individually, combining them in a unified system is still an ongoing challenge. In this work, we present a physical-world humanoid-scene interaction system, PhysHSI, that enables humanoids to autonomously perform diverse interaction tasks while maintaining natural and lifelike behaviors. PhysHSI comprises a simulation training pipeline and a real-world deployment system. In simulation, we adopt adversarial motion prior-based policy learning to imitate natural humanoid-scene interaction data across diverse scenarios, achieving both generalization and lifelike behaviors. For real-world deployment, we introduce a coarse-to-fine object localization module that combines LiDAR and camera inputs to provide continuous and robust scene perception. We validate PhysHSI on four representative interactive tasks--box carrying, sitting, lying, and standing up--in both simulation and real-world settings, demonstrating consistently high success rates, strong generalization across diverse task goals, and natural motion patterns.
comment: Project website: https://why618188.github.io/physhsi/
Unveiling Uncertainty-Aware Autonomous Cooperative Learning Based Planning Strategy
In future intelligent transportation systems, autonomous cooperative planning (ACP) is a promising technique for increasing the effectiveness and security of multi-vehicle interactions. However, existing ACP strategies cannot fully address multiple uncertainties, e.g., perception, planning, and communication uncertainties. To address these, a novel deep reinforcement learning-based autonomous cooperative planning (DRLACP) framework is proposed to tackle various uncertainties in cooperative motion planning schemes. Specifically, the soft actor-critic (SAC) algorithm with gated recurrent units (GRUs) is adopted to learn the deterministic optimal time-varying actions with imperfect state information caused by planning, communication, and perception uncertainties. In addition, the real-time actions of autonomous vehicles (AVs) are demonstrated via the Car Learning to Act (CARLA) simulation platform. Evaluation results show that the proposed DRLACP learns and performs cooperative planning effectively and outperforms other baseline methods under different scenarios with imperfect AV state information.
comment: Accepted by IEEE RA-L
XGrasp: Gripper-Aware Grasp Detection with Multi-Gripper Data Generation
Most robotic grasping methods are typically designed for single gripper types, which limits their applicability in real-world scenarios requiring diverse end-effectors. We propose XGrasp, a real-time gripper-aware grasp detection framework that efficiently handles multiple gripper configurations. The proposed method addresses data scarcity by systematically augmenting existing datasets with multi-gripper annotations. XGrasp employs a hierarchical two-stage architecture. In the first stage, a Grasp Point Predictor (GPP) identifies optimal locations using global scene information and gripper specifications. In the second stage, an Angle-Width Predictor (AWP) refines the grasp angle and width using local features. Contrastive learning in the AWP module enables zero-shot generalization to unseen grippers by learning fundamental grasping characteristics. The modular framework integrates seamlessly with vision foundation models, providing pathways for future vision-language capabilities. The experimental results demonstrate competitive grasp success rates across various gripper types, while achieving substantial improvements in inference speed compared to existing gripper-aware methods. Project page: https://sites.google.com/view/xgrasp
Refinery: Active Fine-tuning and Deployment-time Optimization for Contact-Rich Policies
Simulation-based learning has enabled policies for precise, contact-rich tasks (e.g., robotic assembly) to reach high success rates (~80%) under high levels of observation noise and control error. Although such performance may be sufficient for research applications, it falls short of industry standards and makes policy chaining exceptionally brittle. A key limitation is the high variance in individual policy performance across diverse initial conditions. We introduce Refinery, an effective framework that bridges this performance gap, robustifying policy performance across initial conditions. We propose Bayesian Optimization-guided fine-tuning to improve individual policies, and Gaussian Mixture Model-based sampling during deployment to select initializations that maximize execution success. Using Refinery, we improve mean success rates by 10.98% over state-of-the-art methods in simulation-based learning for robotic assembly, reaching 91.51% in simulation and comparable performance in the real world. Furthermore, we demonstrate that these fine-tuned policies can be chained to accomplish long-horizon, multi-part assembly, successfully assembling up to 8 parts without requiring explicit multi-step training.
comment: in submission. 8 pages, 6 figures. Website: https://refinery-2025.github.io/refinery/
Into the Unknown: Towards using Generative Models for Sampling Priors of Environment Uncertainty for Planning in Configuration Spaces
Priors are vital for planning under partial observability, yet difficult to obtain in practice. We present a sampling-based pipeline that leverages large-scale pretrained generative models to produce probabilistic priors capturing environmental uncertainty and spatio-semantic relationships in a zero-shot manner. Conditioned on partial observations, the pipeline recovers complete RGB-D point cloud samples with occupancy and target semantics, formulated to be directly useful in configuration-space planning. We establish a Matterport3D benchmark of rooms partially visible through doorways, where a robot must navigate to an unobserved target object. Effective priors for this setting must represent both occupancy and target-location uncertainty in unobserved regions. Experiments show that our approach recovers commonsense spatial semantics consistent with ground truth, yielding diverse, clean 3D point clouds usable in motion planning and highlighting the promise of generative models as a rich source of priors for robotic planning.
comment: Under Review
AMO-HEAD: Adaptive MARG-Only Heading Estimation for UAVs under Magnetic Disturbances
Accurate and robust heading estimation is crucial for unmanned aerial vehicles (UAVs) when conducting indoor inspection tasks. However, the cluttered nature of indoor environments often introduces severe magnetic disturbances, which can significantly degrade heading accuracy. To address this challenge, this paper presents an Adaptive MARG-Only Heading (AMO-HEAD) estimation approach for UAVs operating in magnetically disturbed environments. AMO-HEAD is a lightweight and computationally efficient Extended Kalman Filter (EKF) framework that leverages inertial and magnetic sensors to achieve reliable heading estimation. In the proposed approach, gyroscope angular rate measurements are integrated to propagate the quaternion state, which is subsequently corrected using accelerometer and magnetometer data. The corrected quaternion is then used to compute the UAV's heading. An adaptive process noise covariance method is introduced to model and compensate for gyroscope measurement noise, bias drift, and discretization errors arising from the Euler method integration. To mitigate the effects of external magnetic disturbances, a scaling factor is applied based on real-time magnetic deviation detection. A theoretical observability analysis of the proposed AMO-HEAD is performed using the Lie derivative. Extensive experiments were conducted in real-world indoor environments with customized UAV platforms. The results demonstrate the effectiveness of the proposed algorithm in providing precise heading estimation under magnetically disturbed conditions.
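The propagation and heading-extraction steps mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: a scalar-first unit quaternion `[w, x, y, z]`, body-frame gyroscope rates, and the ZYX Euler convention for yaw. It omits the EKF correction, the adaptive process noise, and the magnetic scaling factor entirely, and the function names are hypothetical.

```python
import numpy as np

def propagate_quaternion(q, omega, dt):
    """One Euler step of q_dot = 0.5 * q ⊗ [0, omega]; q is [w, x, y, z], omega in rad/s."""
    w, x, y, z = q
    wx, wy, wz = omega
    q_dot = 0.5 * np.array([
        -x * wx - y * wy - z * wz,
         w * wx + y * wz - z * wy,
         w * wy - x * wz + z * wx,
         w * wz + x * wy - y * wx,
    ])
    q_new = np.asarray(q, dtype=float) + q_dot * dt
    return q_new / np.linalg.norm(q_new)    # renormalize: the Euler step does not preserve unit norm

def heading_from_quaternion(q):
    """Yaw angle (rad) of a unit quaternion under the ZYX Euler convention."""
    w, x, y, z = q
    return np.arctan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
```

Integrating a constant yaw rate of 0.5 rad/s for one second in 1 ms steps should give a heading close to 0.5 rad. The small residual is the kind of Euler discretization error the abstract says the process noise is meant to cover.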
Game-Theoretic Risk-Shaped Reinforcement Learning for Safe Autonomous Driving
Ensuring safety in autonomous driving (AD) remains a significant challenge, especially in highly dynamic and complex traffic environments where diverse agents interact and unexpected hazards frequently emerge. Traditional reinforcement learning (RL) methods often struggle to balance safety, efficiency, and adaptability, as they primarily focus on reward maximization without explicitly modeling risk or safety constraints. To address these limitations, this study proposes a novel game-theoretic risk-shaped RL (GTR2L) framework for safe AD. GTR2L incorporates a multi-level game-theoretic world model that jointly predicts the interactive behaviors of surrounding vehicles and their associated risks, along with an adaptive rollout horizon that adjusts dynamically based on predictive uncertainty. Furthermore, an uncertainty-aware barrier mechanism enables flexible modulation of safety boundaries. A dedicated risk modeling approach is also proposed, explicitly capturing both epistemic and aleatoric uncertainty to guide constrained policy optimization and enhance decision-making in complex environments. Extensive evaluations across diverse and safety-critical traffic scenarios show that GTR2L significantly outperforms state-of-the-art baselines, including human drivers, in terms of success rate, collision and violation reduction, and driving efficiency. The code is available at https://github.com/DanielHu197/GTR2L.
DKPMV: Dense Keypoints Fusion from Multi-View RGB Frames for 6D Pose Estimation of Textureless Objects ICRA 2026
6D pose estimation of textureless objects is valuable for industrial robotic applications, yet remains challenging due to the frequent loss of depth information. Current multi-view methods either rely on depth data or insufficiently exploit multi-view geometric cues, limiting their performance. In this paper, we propose DKPMV, a pipeline that achieves dense keypoint-level fusion using only multi-view RGB images as input. We design a three-stage progressive pose optimization strategy that leverages dense multi-view keypoint geometry information. To enable effective dense keypoint fusion, we enhance the keypoint network with attentional aggregation and symmetry-aware training, improving prediction accuracy and resolving ambiguities on symmetric objects. Extensive experiments on the ROBI dataset demonstrate that DKPMV outperforms state-of-the-art multi-view RGB approaches and even surpasses the RGB-D methods in the majority of cases. The code will be available soon.
comment: 12 pages, 9 figures, submitted to ICRA 2026
TabVLA: Targeted Backdoor Attacks on Vision-Language-Action Models
With the growing deployment of Vision-Language-Action (VLA) models in real-world embodied AI systems, their increasing vulnerability to backdoor attacks poses a serious safety threat. A backdoored VLA agent can be covertly triggered by a pre-injected backdoor to execute adversarial actions, potentially causing system failures or even physical harm. Although backdoor attacks on VLA models have been explored, prior work has focused only on untargeted attacks, leaving the more practically threatening scenario of targeted manipulation unexamined. In this paper, we study targeted backdoor attacks on VLA models and introduce TabVLA, a novel framework that enables such attacks via black-box fine-tuning. TabVLA explores two deployment-relevant inference-time threat models: input-stream editing and in-scene triggering. It formulates poisoned data generation as an optimization problem to improve attack effectiveness. Experiments with OpenVLA-7B on the LIBERO benchmark reveal that the vision channel is the principal attack surface: targeted backdoors succeed with minimal poisoning, remain robust across variations in trigger design, and are degraded only by positional mismatches between fine-tuning and inference triggers. We also investigate a potential detection-based defense against TabVLA, which reconstructs latent visual triggers from the input stream to flag activation-conditioned backdoor samples. Our work highlights the vulnerability of VLA models to targeted backdoor manipulation and underscores the need for more advanced defenses.
comment: 8 pages, 8 tables, 1 figure. Under review
More than A Point: Capturing Uncertainty with Adaptive Affordance Heatmaps for Spatial Grounding in Robotic Tasks
Many language-guided robotic systems rely on collapsing spatial reasoning into discrete points, making them brittle to perceptual noise and semantic ambiguity. To address this challenge, we propose RoboMAP, a framework that represents spatial targets as continuous, adaptive affordance heatmaps. This dense representation captures the uncertainty in spatial grounding and provides richer information for downstream policies, thereby significantly enhancing task success and interpretability. RoboMAP surpasses the previous state-of-the-art on a majority of grounding benchmarks with up to a 50x speed improvement, and achieves an 82% success rate in real-world manipulation. Across extensive simulated and physical experiments, it demonstrates robust performance and shows strong zero-shot generalization to navigation. More details and videos can be found at https://robo-map.github.io.
comment: More details and videos can be found at https://robo-map.github.io. Corresponding author: Xiu Li
Towards a Unified Understanding of Robot Manipulation: A Comprehensive Survey
Embodied intelligence has witnessed remarkable progress in recent years, driven by advances in computer vision, natural language processing, and the rise of large-scale multimodal models. Among its core challenges, robot manipulation stands out as a fundamental yet intricate problem, requiring the seamless integration of perception, planning, and control to enable interaction within diverse and unstructured environments. This survey presents a comprehensive overview of robotic manipulation, encompassing foundational background, task-organized benchmarks and datasets, and a unified taxonomy of existing methods. We extend the classical division between high-level planning and low-level control by broadening high-level planning to include language, code, motion, affordance, and 3D representations, while introducing a new taxonomy of low-level learning-based control grounded in training paradigms such as input modeling, latent learning, and policy learning. Furthermore, we provide the first dedicated taxonomy of key bottlenecks, focusing on data collection, utilization, and generalization, and conclude with an extensive review of real-world applications. Compared with prior surveys, our work offers both a broader scope and deeper insight, serving as an accessible roadmap for newcomers and a structured reference for experienced researchers. All related resources, including research papers, open-source datasets, and projects, are curated for the community at https://github.com/BaiShuanghao/Awesome-Robotics-Manipulation.
An Adaptive Transition Framework for Game-Theoretic Based Takeover
The transition of control from autonomous systems to human drivers is critical in automated driving systems, particularly due to the out-of-the-loop (OOTL) circumstances that reduce driver readiness and increase reaction times. Existing takeover strategies are based on fixed time-based transitions, which fail to account for real-time driver performance variations. This paper proposes an adaptive transition strategy that dynamically adjusts the control authority based on both the time and tracking ability of the driver trajectory. Shared control is modeled as a cooperative differential game, where control authority is modulated through time-varying objective functions instead of blending control torques directly. To ensure a more natural takeover, a driver-specific state-tracking matrix is introduced, allowing the transition to align with individual control preferences. Multiple transition strategies are evaluated using a cumulative trajectory error metric. Human-in-the-loop control scenarios of the standardized ISO lane change maneuvers demonstrate that adaptive transitions reduce trajectory deviations and driver control effort compared to conventional strategies. Experiments also confirm that continuously adjusting control authority based on real-time deviations enhances vehicle stability while reducing driver effort during takeover.
QuayPoints: A Reasoning Framework to Bridge the Information Gap Between Global and Local Planning in Autonomous Racing
Autonomous racing requires tight integration between perception, planning and control to minimize latency and enable timely decision making. A standard autonomy pipeline comprising a global planner, local planner, and controller loses information as the higher-level racing context is sequentially propagated downstream into specific task-oriented context. In particular, the global planner's understanding of optimality is typically reduced to a sparse set of waypoints, leaving the local planner to make reactive decisions with limited context. This paper investigates whether additional global insights, specifically time-optimality information, can be meaningfully passed to the local planner to improve downstream decisions. We introduce a framework that preserves essential global knowledge and conveys it to the local planner through QuayPoints, regions where deviations from the optimal raceline result in significant compromises to optimality. QuayPoints enable local planners to make more informed global decisions when deviating from the raceline, such as during strategic overtaking. To demonstrate this, we integrate QuayPoints into an existing planner and show that it consistently overtakes opponents traveling at up to 75% of the ego vehicle's speed across four distinct race tracks.
comment: This work has been submitted to the IEEE for possible publication
GRIP: A Unified Framework for Grid-Based Relay and Co-Occurrence-Aware Planning in Dynamic Environments
Robots navigating dynamic, cluttered, and semantically complex environments must integrate perception, symbolic reasoning, and spatial planning to generalize across diverse layouts and object categories. Existing methods often rely on static priors or limited memory, constraining adaptability under partial observability and semantic ambiguity. We present GRIP, Grid-based Relay with Intermediate Planning, a unified, modular framework with three scalable variants: GRIP-L (Lightweight), optimized for symbolic navigation via semantic occupancy grids; GRIP-F (Full), supporting multi-hop anchor chaining and LLM-based introspection; and GRIP-R (Real-World), enabling physical robot deployment under perceptual uncertainty. GRIP integrates dynamic 2D grid construction, open-vocabulary object grounding, co-occurrence-aware symbolic planning, and hybrid policy execution using behavioral cloning, D* search, and grid-conditioned control. Empirical results on AI2-THOR and RoboTHOR benchmarks show that GRIP achieves up to 9.6% higher success rates and over $2\times$ improvement in path efficiency (SPL and SAE) on long-horizon tasks. Qualitative analyses reveal interpretable symbolic plans in ambiguous scenes. Real-world deployment on a Jetbot further validates GRIP's generalization under sensor noise and environmental variation. These results position GRIP as a robust, scalable, and explainable framework bridging simulation and real-world navigation.
comment: 17 pages, 5 figures, 8 tables
Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
Multimodal models have achieved remarkable progress in recent years. Nevertheless, they continue to exhibit notable limitations in spatial understanding and reasoning, the very capability that anchors artificial general intelligence in the physical world. With the recent release of GPT-5, allegedly the most powerful AI model to date, it is timely to examine where the leading models (GPT, Gemini, Grok, Seed, Qwen, and Intern) stand on the path toward spatial intelligence. We first propose a holistic taxonomy of spatial tasks that unifies existing benchmarks and a standardized protocol for the fair evaluation of state-of-the-art proprietary and open-source models across eight key benchmarks, at a cost exceeding ten billion total tokens. Our empirical study then reveals that (1) GPT-5 demonstrates unprecedented strength in spatial intelligence (SI), yet (2) still falls short of human performance significantly across a broad spectrum of SI-tasks. Moreover, we (3) show that SI-tasks expose greater model capability deficiency than non-SI tasks, to the extent that (4) proprietary models do not exhibit a decisive advantage when facing the most difficult ones. In addition, we conduct a qualitative evaluation across a diverse set of scenarios that are intuitive for humans, yet fail even the most advanced multimodal models.
Guiding Energy-Efficient Locomotion through Impact Mitigation Rewards
Animals achieve energy-efficient locomotion through their implicit passive dynamics, a marvel that has captivated roboticists for decades. Recently, methods that incorporate Adversarial Motion Prior (AMP) and reinforcement learning (RL) have shown promising progress in replicating animals' naturalistic motion. However, such imitation learning approaches predominantly capture explicit kinematic patterns, so-called gaits, while overlooking the implicit passive dynamics. This work bridges this gap by incorporating a reward term guided by the Impact Mitigation Factor (IMF), a physics-informed metric that quantifies a robot's ability to passively mitigate impacts. By integrating IMF with AMP, our approach enables RL policies to learn both explicit motion trajectories from animal reference motion and the implicit passive dynamics. We demonstrate energy efficiency improvements of up to 32%, as measured by the Cost of Transport (CoT), across both AMP and handcrafted reward structures.
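For readers unfamiliar with the metric, the Cost of Transport is commonly defined as the energy spent per unit weight per unit distance. The sketch below uses that common textbook form. The abstract does not state which variant the paper uses (for example, mechanical versus electrical energy), so the formula and the function names here are assumptions.

```python
G = 9.81  # gravitational acceleration in m/s^2

def cost_of_transport(energy_j, mass_kg, distance_m):
    """Dimensionless CoT = E / (m * g * d); lower values mean more efficient locomotion."""
    if mass_kg <= 0 or distance_m <= 0:
        raise ValueError("mass and distance must be positive")
    return energy_j / (mass_kg * G * distance_m)

def relative_improvement(cot_baseline, cot_new):
    """Fractional reduction in CoT relative to a baseline policy."""
    return (cot_baseline - cot_new) / cot_baseline
```

Under this definition, a policy whose CoT drops from 1.0 to 0.68 shows a 32% improvement, which is how a figure like the one reported in the abstract would be read.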
ORN-CBF: Learning Observation-conditioned Residual Neural Control Barrier Functions via Hypernetworks
Control barrier functions (CBFs) have been demonstrated as an effective method for safety-critical control of autonomous systems. Although CBFs are simple to deploy, their design remains challenging, motivating the development of learning-based approaches. Yet, issues such as suboptimal safe sets, applicability in partially observable environments, and lack of rigorous safety guarantees persist. In this work, we propose observation-conditioned neural CBFs based on Hamilton-Jacobi (HJ) reachability analysis, which approximately recover the maximal safe sets. We exploit certain mathematical properties of the HJ value function, ensuring that the predicted safe set never intersects with the observed failure set. Moreover, we leverage a hypernetwork-based architecture that is particularly suitable for the design of observation-conditioned safety filters. The proposed method is examined both in simulation and hardware experiments for a ground robot and a quadcopter. The results show improved success rates and generalization to out-of-domain environments compared to the baselines.
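The way a CBF is typically used as a safety filter can be sketched in a few lines. The example assumes a control-affine system x_dot = f(x) + g(x) u and a single barrier constraint, for which the usual quadratic program has a closed-form solution. It does not reproduce the paper's observation-conditioned neural CBF or hypernetwork. The barrier value `h` and gradient `grad_h` are simply passed in, and all names are hypothetical.

```python
import numpy as np

def cbf_filter(u_nom, grad_h, f, g, h, alpha=1.0):
    """Closed-form solution of  min ||u - u_nom||^2  s.t.  grad_h . (f + g u) + alpha * h >= 0."""
    a = g.T @ grad_h                     # L_g h: how the input affects the barrier
    b = grad_h @ f + alpha * h           # L_f h + alpha * h
    slack = a @ u_nom + b
    if slack >= 0 or not np.any(a):      # nominal input is already safe, or the input cannot affect h
        return u_nom
    return u_nom - slack / (a @ a) * a   # minimal correction onto the constraint boundary
```

For a single integrator with h(x) = x[0], a nominal input that pushes toward the boundary is modified only along the first axis, while an input that already satisfies the constraint passes through unchanged.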
Task-Optimized Convolutional Recurrent Networks Align with Tactile Processing in the Rodent Brain NeurIPS 2025
Tactile sensing remains far less understood in neuroscience and less effective in artificial systems compared to more mature modalities such as vision and language. We bridge these gaps by introducing a novel Encoder-Attender-Decoder (EAD) framework to systematically explore the space of task-optimized temporal neural networks trained on realistic tactile input sequences from a customized rodent whisker-array simulator. We identify convolutional recurrent neural networks (ConvRNNs) as superior encoders to purely feedforward and state-space architectures for tactile categorization. Crucially, these ConvRNN-encoder-based EAD models achieve neural representations closely matching rodent somatosensory cortex, saturating the explainable neural variability and revealing a clear linear relationship between supervised categorization performance and neural alignment. Furthermore, contrastive self-supervised ConvRNN-encoder-based EADs, trained with tactile-specific augmentations, match supervised neural fits, serving as an ethologically-relevant, label-free proxy. For neuroscience, our findings highlight nonlinear recurrent processing as important for general-purpose tactile representations in somatosensory cortex, providing the first quantitative characterization of the underlying inductive biases in this system. For embodied AI, our results emphasize the importance of recurrent EAD architectures to handle realistic tactile inputs, along with tailored self-supervised learning methods for achieving robust tactile perception with the same type of sensors animals use to sense in unstructured environments.
comment: 10 pages, 8 figures, 7 tables, NeurIPS 2025 Camera Ready Version (oral)
Investigating Memory in RL with POPGym Arcade
How should we analyze memory in deep RL? We introduce mathematical tools for fairly analyzing policies under partial observability and revealing how agents use memory to make decisions. To utilize these tools, we present POPGym Arcade, a collection of Atari-inspired, hardware-accelerated, pixel-based environments sharing a single observation and action space. Each environment provides fully and partially observable variants, enabling counterfactual studies on observability. We find that controlled studies are necessary for fair comparisons, and identify a pathology where value functions smear credit over irrelevant history. With this pathology, we demonstrate how out-of-distribution scenarios can contaminate memory, perturbing the policy far into the future, with implications for sim-to-real transfer and offline RL.
Multi-Modal Manipulation via Multi-Modal Policy Consensus
Effectively integrating diverse sensory modalities is crucial for robotic manipulation. However, the typical approach of feature concatenation is often suboptimal: dominant modalities such as vision can overwhelm sparse but critical signals like touch in contact-rich tasks, and monolithic architectures cannot flexibly incorporate new or missing modalities without retraining. Our method factorizes the policy into a set of diffusion models, each specialized for a single representation (e.g., vision or touch), and employs a router network that learns consensus weights to adaptively combine their contributions, enabling incremental incorporation of new representations. We evaluate our approach on simulated manipulation tasks in RLBench, as well as real-world tasks such as occluded object picking, in-hand spoon reorientation, and puzzle insertion, where it significantly outperforms feature-concatenation baselines on scenarios requiring multimodal reasoning. Our policy further demonstrates robustness to physical perturbations and sensor corruption. We also conduct a perturbation-based importance analysis, which reveals adaptive shifts between modalities.
comment: 9 pages, 7 figures. Project website: https://policyconsensus.github.io
Failure Prediction at Runtime for Generative Robot Policies NeurIPS 2025
Imitation learning (IL) with generative models, such as diffusion and flow matching, has enabled robots to perform complex, long-horizon tasks. However, distribution shifts from unseen environments or compounding action errors can still cause unpredictable and unsafe behavior, leading to task failure. Early failure prediction during runtime is therefore essential for deploying robots in human-centered and safety-critical environments. We propose FIPER, a general framework for Failure Prediction at Runtime for generative IL policies that does not require failure data. FIPER identifies two key indicators of impending failure: (i) out-of-distribution (OOD) observations detected via random network distillation in the policy's embedding space, and (ii) high uncertainty in generated actions measured by a novel action-chunk entropy score. Both failure prediction scores are calibrated using a small set of successful rollouts via conformal prediction. A failure alarm is triggered when both indicators, aggregated over short time windows, exceed their thresholds. We evaluate FIPER across five simulation and real-world environments involving diverse failure modes. Our results demonstrate that FIPER better distinguishes actual failures from benign OOD situations and predicts failures more accurately and earlier than existing methods. We thus consider this work an important step towards more interpretable and safer generative robot policies. Code, data and videos are available at https://tum-lsy.github.io/fiper_website.
comment: Project page: https://tum-lsy.github.io/fiper_website. 33 pages, 12 figures. Accepted to NeurIPS 2025
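The calibration and alarm logic described in the FIPER abstract can be illustrated with a minimal sketch. It assumes the standard split-conformal quantile over scores from successful rollouts and uses a window mean as the aggregation. The abstract does not specify the aggregation function, so the mean is an assumption, and all names are hypothetical.

```python
import math
import numpy as np

def conformal_threshold(calib_scores, alpha=0.1):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    scores = np.sort(np.asarray(calib_scores, dtype=float))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return float(scores[k - 1]) if k <= n else float("inf")   # too few samples: never alarm

def failure_alarm(ood_window, entropy_window, ood_thr, entropy_thr):
    """Raise an alarm only when both window-aggregated scores exceed their thresholds."""
    return bool(np.mean(ood_window) > ood_thr and np.mean(entropy_window) > entropy_thr)
```

Requiring both indicators to fire is what lets a scheme of this kind ignore observations that are out of distribution while the policy's actions remain confident.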
Product Digital Twin Supporting End-of-life Phase of Electric Vehicle Batteries Utilizing Product-Process-Resource Asset Network
In a circular economy, products in their end-of-life phase should be either remanufactured or recycled. Both of these processes are crucial for sustainability and environmental conservation. However, manufacturers frequently provide insufficient support for these processes, sharing neither relevant data about the products nor about their (re-)manufacturing processes. This paper proposes to accompany each product with a digital twin technology, specifically the Product Digital Twin (PDT), which can carry information for facilitating and optimizing production and remanufacturing processes. This paper introduces a knowledge representation called Bi-Flow Product-Process-Resource Asset Network (Bi-PAN). Bi-PAN extends a well-proven Product-Process-Resource Asset Network (PAN) paradigm by integrating both assembly and disassembly workflows into a single information model. Such networks enable capturing relevant relationships across products, production resources, manufacturing processes, and specific production operations that have to be done in the manufacturing phase of a product. The proposed approach is demonstrated in a use case of disassembling electric vehicle (EV) batteries. By utilizing PDTs with Bi-PAN knowledge models, challenges associated with disassembling EV batteries can be solved flexibly and efficiently for various battery types, enhancing the sustainability of EV battery life-cycle management.
comment: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Manipulation of Elasto-Flexible Cables with Single or Multiple UAVs
This work considers a large class of systems composed of multiple quadrotors manipulating deformable and extensible cables. The cable is described via a discretized representation, which decomposes it into linear springs interconnected through lumped-mass passive spherical joints. Sets of flat outputs are found for the systems. Numerical simulations support the findings by showing cable manipulation relying on flatness-based trajectories. Finally, we present an experimental validation of the effectiveness of the proposed discretized cable model for a two-robot example. Moreover, a closed-loop controller based on the identified model and using cable-output feedback is experimentally tested.
Data Scaling Laws for Imitation Learning-Based End-to-End Autonomous Driving
The end-to-end autonomous driving paradigm has recently attracted considerable attention due to its scalability. However, existing methods are constrained by the limited scale of real-world data, which hinders a comprehensive exploration of the scaling laws associated with end-to-end autonomous driving. To address this issue, we collected substantial data from various driving scenarios and behaviors and conducted an extensive study on the scaling laws of existing imitation learning-based end-to-end autonomous driving paradigms. Specifically, approximately 4 million demonstrations from 23 different scenario types were gathered, amounting to over 30,000 hours of driving demonstrations. We performed open-loop evaluations and closed-loop simulation evaluations in 1,400 diverse driving demonstrations (1,300 for open-loop and 100 for closed-loop) under stringent assessment conditions. Through experimental analysis, we discovered that (1) the performance of the driving model exhibits a power-law relationship with the amount of data, but this is not the case in closed-loop evaluation, and the inconsistency between the two assessments shifts our focus toward the distribution of data rather than merely expanding its volume; (2) a small increase in the quantity of long-tailed data can significantly improve the performance for the corresponding scenarios; (3) appropriate scaling of data enables the model to achieve combinatorial generalization in novel scenes and actions. Our results highlight the critical role of data scaling in improving the generalizability of models across diverse autonomous driving scenarios, assuring safe deployment in the real world. Project repository: https://github.com/ucaszyp/Driving-Scaling-Law
PCHands: PCA-based Hand Pose Synergy Representation on Manipulators with N-DoF
We consider the problem of learning a common representation for dexterous manipulation across manipulators of different morphologies. To this end, we propose PCHands, a novel approach for extracting hand postural synergies from a large set of manipulators. We define a simplified and unified description format based on anchor positions for manipulators ranging from 2-finger grippers to 5-finger anthropomorphic hands. This enables learning a variable-length latent representation of the manipulator configuration and the alignment of the end-effector frame of all manipulators. We show that it is possible to extract principal components from this latent representation that are universal across manipulators of different structures and degrees of freedom. To evaluate PCHands, we use this compact representation to encode observation and action spaces of control policies for dexterous manipulation tasks learned with RL. In terms of learning efficiency and consistency, the proposed representation outperforms a baseline that learns the same tasks in joint space. We additionally show that PCHands performs robustly in RL from demonstration, when demonstrations are provided from a different manipulator. We further support our results with real-world experiments that involve a 2-finger gripper and a 4-finger anthropomorphic hand. Code and additional material are available at https://hsp-iit.github.io/PCHands/.
comment: 2025 IEEE-RAS 24th International Conference on Humanoid Robots
Gemini Robotics 1.5: Pushing the Frontier of Generalist Robots with Advanced Embodied Reasoning, Thinking, and Motion Transfer
General-purpose robots need a deep understanding of the physical world, advanced reasoning, and general and dexterous control. This report introduces the latest generation of the Gemini Robotics model family: Gemini Robotics 1.5, a multi-embodiment Vision-Language-Action (VLA) model, and Gemini Robotics-ER 1.5, a state-of-the-art Embodied Reasoning (ER) model. We are bringing together three major innovations. First, Gemini Robotics 1.5 features a novel architecture and a Motion Transfer (MT) mechanism, which enables it to learn from heterogeneous, multi-embodiment robot data and makes the VLA more general. Second, Gemini Robotics 1.5 interleaves actions with a multi-level internal reasoning process in natural language. This enables the robot to "think before acting" and notably improves its ability to decompose and execute complex, multi-step tasks, and also makes the robot's behavior more interpretable to the user. Third, Gemini Robotics-ER 1.5 establishes a new state-of-the-art for embodied reasoning, i.e., for reasoning capabilities that are critical for robots, such as visual and spatial understanding, task planning, and progress estimation. Together, this family of models takes us a step towards an era of physical agents, enabling robots to perceive, think and then act so they can solve complex multi-step tasks.
TTF-VLA: Temporal Token Fusion via Pixel-Attention Integration for Vision-Language-Action Models AAAI 2026
Vision-Language-Action (VLA) models process visual inputs independently at each timestep, discarding valuable temporal information inherent in robotic manipulation tasks. This frame-by-frame processing makes models vulnerable to visual noise while ignoring the substantial coherence between consecutive frames in manipulation sequences. We propose Temporal Token Fusion (TTF), a training-free approach that intelligently integrates historical and current visual representations to enhance VLA inference quality. Our method employs dual-dimension detection combining efficient grayscale pixel difference analysis with attention-based semantic relevance assessment, enabling selective temporal token fusion through hard fusion strategies and keyframe anchoring to prevent error accumulation. Comprehensive experiments across LIBERO, SimplerEnv, and real robot tasks demonstrate consistent improvements: 4.0 percentage points average on LIBERO (72.4% vs 68.4% baseline), cross-environment validation on SimplerEnv (4.8% relative improvement), and 8.7% relative improvement on real robot tasks. Our approach proves model-agnostic, working across OpenVLA and VLA-Cache architectures. Notably, TTF reveals that selective Query matrix reuse in attention mechanisms enhances rather than compromises performance, suggesting promising directions for direct KQV matrix reuse strategies that achieve computational acceleration while improving task success rates.
comment: Manuscript submitted to AAAI 2026, currently under review
Contrastive Representation Regularization for Vision-Language-Action Models
Vision-Language-Action (VLA) models have shown their capabilities in robot manipulation by leveraging rich representations from pre-trained Vision-Language Models (VLMs). However, their representations arguably remain suboptimal, lacking sensitivity to robotic signals such as control actions and proprioceptive states. To address the issue, we introduce Robot State-aware Contrastive Loss (RS-CL), a simple and effective representation regularization for VLA models, designed to bridge the gap between VLM representations and robotic signals. In particular, RS-CL aligns the representations more closely with the robot's proprioceptive states, by using relative distances between the states as soft supervision. Complementing the original action prediction objective, RS-CL effectively enhances control-relevant representation learning, while being lightweight and fully compatible with standard VLA training pipelines. Our empirical results demonstrate that RS-CL substantially improves the manipulation performance of state-of-the-art VLA models; it pushes the prior art from 30.8% to 41.5% on pick-and-place tasks in RoboCasa-Kitchen, through more accurate positioning during grasping and placing, and boosts success rates from 45.0% to 58.3% on challenging real-robot manipulation tasks.
comment: 20 pages, 12 figures
RoHOI: Robustness Benchmark for Human-Object Interaction Detection
Human-Object Interaction (HOI) detection is crucial for robot-human assistance, enabling context-aware support. However, models trained on clean datasets degrade in real-world conditions due to unforeseen corruptions, leading to inaccurate predictions. To address this, we introduce the first robustness benchmark for HOI detection, evaluating model resilience under diverse challenges. Despite advances, current models struggle with environmental variability, occlusions, and noise. Our benchmark, RoHOI, includes 20 corruption types based on the HICO-DET and V-COCO datasets and a new robustness-focused metric. We systematically analyze existing models in the HOI field, revealing significant performance drops under corruptions. To improve robustness, we propose a Semantic-Aware Masking-based Progressive Learning (SAMPL) strategy to guide the model to be optimized based on holistic and partial cues, thus dynamically adjusting the model's optimization to enhance robust feature learning. Extensive experiments show that our approach outperforms state-of-the-art methods, setting a new standard for robust HOI detection. Benchmarks, datasets, and code are available at https://github.com/KratosWen/RoHOI.
comment: Benchmarks, datasets, and code are available at https://github.com/KratosWen/RoHOI
DualTHOR: A Dual-Arm Humanoid Simulation Platform for Contingency-Aware Planning
Developing embodied agents capable of performing complex interactive tasks in real-world scenarios remains a fundamental challenge in embodied AI. Although recent advances in simulation platforms have greatly enhanced task diversity to train embodied Vision Language Models (VLMs), most platforms rely on simplified robot morphologies and bypass the stochastic nature of low-level execution, which limits their transferability to real-world robots. To address these issues, we present a physics-based simulation platform DualTHOR for complex dual-arm humanoid robots, built upon an extended version of AI2-THOR. Our simulator includes real-world robot assets, a task suite for dual-arm collaboration, and inverse kinematics solvers for humanoid robots. We also introduce a contingency mechanism that incorporates potential failures through physics-based low-level execution, bridging the gap to real-world scenarios. Our simulator enables a more comprehensive evaluation of the robustness and generalization of VLMs in household environments. Extensive evaluations reveal that current VLMs struggle with dual-arm coordination and exhibit limited robustness in realistic environments with contingencies, highlighting the importance of using our simulator to develop more capable VLMs for embodied tasks. The code is available at https://github.com/ds199895/DualTHOR.git.
comment: The experiments in the paper need to be further supplemented, and more methods should be considered for expansion
EROAS: 3D Efficient Reactive Obstacle Avoidance System for Autonomous Underwater Vehicles using 2.5D Forward-Looking Sonar
Autonomous Underwater Vehicles (AUVs) have advanced significantly in obstacle detection and path planning through sonar, cameras, and learning-based methods. However, safe and efficient navigation in cluttered environments remains challenging due to partial observability, turbidity, the limited field-of-view of forward-looking sonar (FLS), and occlusions that obscure obstacle geometry. To address these issues, we propose the Efficient Reactive Obstacle Avoidance Strategy (EROAS), a lightweight framework that augments a standard 2D FLS with a pivoting mechanism, effectively transforming it into a cost-efficient 2.5D sonar. This design provides vertical information on demand, extending situational awareness while minimizing computational overhead. EROAS integrates three complementary modules: first, Sonar Profile-guided Directional Decision Control (SPD2C) for rapid gap detection and generation of reference commands in both horizontal and vertical planes; second, the Spatial Context Generator (SCG), which maintains a short-term obstacle memory to mitigate partial observability; and finally, a Spatio-Temporal Control Barrier Function (ST-CBF) that enforces forward-invariance of safety constraints by filtering nominal references. Together, these components enable robust, reactive avoidance of obstacles in uncertain and cluttered 3D underwater settings. Simulation and hardware-in-the-loop (HIL) experiments validate the efficacy of the proposed EROAS algorithm, demonstrating improved trajectory efficiency, reduced travel time, and enhanced safety compared to conventional methods such as the Dynamic Window Approach (DWA) and Artificial Potential Fields (APF). https://github.com/AIRLabIISc/EROAS
comment: Submitted to IEEE Journal of Ocean Engineering
A Taylor Series Approach to Correction of Input Errors in Gaussian Process Regression
Gaussian Processes (GPs) are widely recognized as powerful non-parametric models for regression and classification. Traditional GP frameworks predominantly operate under the assumption that the inputs are either accurately known or subject to zero-mean noise. However, several real-world applications such as mobile sensors have imperfect localization, leading to inputs with biased errors. These biases can typically be estimated through measurements collected over time using, for example, Kalman filters. To avoid recomputation of the entire GP model when better estimates of the inputs used in the training data become available, we introduce a technique for updating a trained GP model to incorporate updated estimates of the inputs. By leveraging the differentiability of the mean and covariance functions derived from the squared exponential kernel, a second-order correction algorithm is developed to update the trained GP models. Precomputed Jacobians and Hessians of kernels enable real-time refinement of the mean and covariance predictions. The efficacy of the developed approach is demonstrated using two simulation studies, with error analyses revealing improvements in both predictive accuracy and uncertainty quantification.
comment: Improving the paper with better results and adding experimental results to publish again
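The second-order correction described in the abstract above rests on expanding the squared exponential kernel in a small input perturbation. The block below is a minimal one-dimensional sketch of that expansion only, not the paper's full update of the GP mean and covariance. The function names, the hyperparameters `sigma_f` and `ell`, and the example numbers are all assumptions chosen for illustration.

```python
import math

def se_kernel(x, xp, sigma_f=1.0, ell=1.0):
    """Squared exponential kernel k(x, xp) for scalar inputs."""
    r = x - xp
    return sigma_f ** 2 * math.exp(-r * r / (2.0 * ell ** 2))

def se_kernel_taylor2(x, xp, delta, sigma_f=1.0, ell=1.0):
    """Second-order Taylor approximation of k(x + delta, xp) around x.

    Uses the analytic first and second derivatives of the kernel with
    respect to its first argument, so they can be precomputed once and
    reused when a corrected input estimate x + delta arrives.
    """
    r = x - xp
    k = se_kernel(x, xp, sigma_f, ell)
    dk = -(r / ell ** 2) * k                        # dk/dx
    d2k = (r * r / ell ** 4 - 1.0 / ell ** 2) * k   # d^2k/dx^2
    return k + dk * delta + 0.5 * d2k * delta ** 2

# Hypothetical example: a training input at 0.3 is later corrected by +0.05.
x_old, x_other, delta = 0.3, 1.0, 0.05
exact = se_kernel(x_old + delta, x_other)
approx = se_kernel_taylor2(x_old, x_other, delta)
first_order_only = se_kernel(x_old, x_other) + (
    -(x_old - x_other) * se_kernel(x_old, x_other) * delta
)
print(f"exact={exact:.6f} second-order={approx:.6f} first-order={first_order_only:.6f}")
```

For small corrections the second-order term removes most of the error left by a first-order update, which is why the approach can avoid recomputing the kernel matrix from scratch.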
Goal-Based Vision-Language Driving
Autonomous vehicles must react in milliseconds while reasoning about road geometry and traffic intent to navigate complex situations. We introduce NovaDrive, a single-branch vision-language architecture that processes front-camera images, HD-map tiles, LiDAR depth, and textual waypoints. A lightweight, two-stage cross-attention block first aligns waypoint tokens with the HD map, then refines attention over fine-grained image and depth patches. Coupled with a novel smoothness loss that discourages abrupt steering and speed changes, this design eliminates the need for recurrent memory. We fine-tune the top 15 layers of an 11B LLaMA-3.2 vision-language backbone, enabling real-time inference. On the nuScenes / Waymo subset of the MD-NEX Outdoor benchmark, NovaDrive raises success rate to 84% (+4%), boosts path-efficiency (SPL) to 0.66 (+0.11), and reduces collision frequency from 2.6% to 1.2% (-1.4%) relative to the previous state-of-the-art. Our ablations confirm that waypoint tokens, partial VLM fine-tuning, and the cross-attention fusion contribute the most to these gains. Beyond safety, NovaDrive's shorter routes (resulting from the novel smoothness loss) translate to lower fuel or battery usage, pointing toward leaner, more easily updated driving stacks. NovaDrive can be extended to other embodied-AI domains as well.
comment: 6 pages
Vision-Language Cross-Attention for Real-Time Autonomous Driving
Autonomous cars need geometric accuracy and semantic understanding to navigate complex environments, yet most stacks handle them separately. We present XYZ-Drive, a single vision-language model that reads a front-camera frame, a 25 m × 25 m overhead map, and the next waypoint, then outputs steering and speed. A lightweight goal-centered cross-attention layer lets waypoint tokens highlight relevant image and map patches, supporting both action and textual explanations, before the fused tokens enter a partially fine-tuned LLaMA-3.2 11B model. On the MD-NEX Outdoor-Driving benchmark XYZ-Drive attains 95% success and 0.80 Success weighted by Path Length (SPL), surpassing PhysNav-DG by 15% and halving collisions, all while significantly improving efficiency by using only a single branch. Sixteen ablations explain the gains. Removing any modality (vision, waypoint, map) drops success by up to 11%, confirming their complementary roles and rich connections. Replacing goal-centered attention with simple concatenation cuts 3% in performance, showing query-based fusion injects map knowledge more effectively. Keeping the transformer frozen loses 5%, showing the importance of fine-tuning when applying VLMs for specific tasks such as autonomous driving. Coarsening map resolution from 10 cm to 40 cm blurs lane edges and raises crash rate. Overall, these results demonstrate that early, token-level fusion of intent and map layout enables accurate, transparent, real-time driving.
comment: 5 pages
HoMeR: Learning In-the-Wild Mobile Manipulation via Hybrid Imitation and Whole-Body Control
We introduce HoMeR, an imitation learning framework for mobile manipulation that combines whole-body control with hybrid action modes that handle both long-range and fine-grained motion, enabling effective performance on realistic in-the-wild tasks. At its core is a fast, kinematics-based whole-body controller that maps desired end-effector poses to coordinated motion across the mobile base and arm. Within this reduced end-effector action space, HoMeR learns to switch between absolute pose predictions for long-range movement and relative pose predictions for fine-grained manipulation, offloading low-level coordination to the controller and focusing learning on task-level decisions. We deploy HoMeR on a holonomic mobile manipulator with a 7-DoF arm in a real home. We compare HoMeR to baselines without hybrid actions or whole-body control across 3 simulated and 3 real household tasks such as opening cabinets, sweeping trash, and rearranging pillows. Across tasks, HoMeR achieves an overall success rate of 79.17% using just 20 demonstrations per task, outperforming the next best baseline by 29.17% on average. HoMeR is also compatible with vision-language models and can leverage their internet-scale priors to better generalize to novel object appearances, layouts, and cluttered scenes. In summary, HoMeR moves beyond tabletop settings and demonstrates a scalable path toward sample-efficient, generalizable manipulation in everyday indoor spaces. Code, videos, and supplementary material are available at: http://homer-manip.github.io
TriVLA: A Triple-System-Based Unified Vision-Language-Action Model with Episodic World Modeling for General Robot Control
Recent advances in vision-language models (VLMs) have enabled robots to follow open-ended instructions and demonstrate impressive commonsense reasoning. However, current vision-language-action (VLA) frameworks primarily rely on static representations and limited temporal context, restricting agents to short-horizon, reactive behaviors and hindering robust generalization in dynamic embodied environments. Inspired by cognitive neuroscience theories of episodic memory, we propose, to our knowledge, one of the first formalized episodic world models in VLA, enabling embodied robots to accumulate, recall, and predict sequential experiences. As an instantiation of this concept, our unified TriVLA realizes the episodic world model through a triple-system architecture: integrating multimodal grounding from a pretrained VLM (System 2) and temporally rich dynamics perception from a video diffusion model (System 3). This enables the agent to accumulate and recall sequential experiences, interpret current contexts, and predict future environmental evolution. Guided by episodic representations that span both the past and anticipated future, the downstream policy (System 1) generates coherent, context-aware action sequences through flow-matching and cross-modal attention mechanisms. Experimental results show that TriVLA operates efficiently at approximately 36 Hz and consistently outperforms baseline models on standard benchmarks and challenging real-world manipulation tasks. It demonstrates strong long-horizon planning and open-ended intent understanding, showcasing the advantages of episodic world model-inspired reasoning for robust, generalizable robot intelligence. Project Page: https://zhenyangliu.github.io/TriVLA/.
A High-frequency, Interaction-induced Pneumatic Oscillator Enabling Versatile Soft Robotics
Soft robots, while highly adaptable to diverse environments through various actuation methods, still face significant performance boundaries due to the inherent properties of materials. These limitations manifest in the challenge of guaranteeing rapid response and large-scale movements simultaneously, ultimately restricting the robots' absolute speed and overall efficiency. In this paper, we introduce a high-frequency pneumatic oscillator (HIPO) to overcome these challenges. Through a collision-induced phase resetting mechanism, our HIPO leverages event-based nonlinearity to trigger self-oscillation of the pneumatic actuator, which positively utilizes intrinsic characteristics of materials. This enables the system to spontaneously generate periodic control signals and directly produce motion responses, eliminating the need for incorporating external actuation components. By efficiently and rapidly converting internal energy of airflow into the kinetic energy of robots, HIPO achieves a frequency of up to 20 Hz. Furthermore, we demonstrate the versatility and high-performance capabilities of HIPO through bio-inspired robots: an insect-like fast-crawler (with speeds up to 50.27 cm/s), a high-frequency butterfly-like wing-flapper, and a maneuverable duck-like swimmer. By eliminating external components and seamlessly fusing signal generation, energy conversion, and motion output, HIPO unleashes rapid and efficient motion, unlocking potential for high-performance soft robotics.
Data Scaling Laws in Imitation Learning for Robotic Manipulation
Data scaling has revolutionized fields like natural language processing and computer vision, providing models with remarkable generalization capabilities. In this paper, we investigate whether similar data scaling laws exist in robotics, particularly in robotic manipulation, and whether appropriate data scaling can yield single-task robot policies that can be deployed zero-shot for any object within the same category in any environment. To this end, we conduct a comprehensive empirical study on data scaling in imitation learning. By collecting data across numerous environments and objects, we study how a policy's generalization performance changes with the number of training environments, objects, and demonstrations. Throughout our research, we collect over 40,000 demonstrations and execute more than 15,000 real-world robot rollouts under a rigorous evaluation protocol. Our findings reveal several intriguing results: the generalization performance of the policy follows a roughly power-law relationship with the number of environments and objects. The diversity of environments and objects is far more important than the absolute number of demonstrations; once the number of demonstrations per environment or object reaches a certain threshold, additional demonstrations have minimal effect. Based on these insights, we propose an efficient data collection strategy. With four data collectors working for one afternoon, we collect sufficient data to enable the policies for two tasks to achieve approximately 90% success rates in novel environments with unseen objects.
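A roughly power-law relationship such as the one reported above is usually checked by fitting a straight line in log-log space. The block below is a minimal sketch of that check. The function name `fit_power_law` and the data points are hypothetical and synthetic; they are not measurements from the paper.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (log x, log y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept in log-log space = log(a)
    return a, b

# Synthetic, hypothetical data: an error metric that decays as a power of
# the number of training environments.
num_envs = [1, 2, 4, 8, 16, 32]
error = [0.8 * n ** -0.5 for n in num_envs]

a, b = fit_power_law(num_envs, error)
print(f"a={a:.3f}, b={b:.3f}")
```

With real data the points will not fall exactly on a line, so the residuals of the log-log fit indicate how well the power-law description holds.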
Precise Mobile Manipulation of Small Everyday Objects
Many everyday mobile manipulation tasks require precise interaction with small objects, such as grasping a knob to open a cabinet or pressing a light switch. In this paper, we develop Servoing with Vision Models (SVM), a closed-loop framework that enables a mobile manipulator to tackle such precise tasks involving the manipulation of small objects. SVM uses state-of-the-art vision foundation models to generate 3D targets for visual servoing to enable diverse tasks in novel environments. Naively doing so fails because of occlusion by the end-effector. SVM mitigates this using vision models that out-paint the end-effector, thereby significantly enhancing target localization. We demonstrate that aided by out-painting methods, open-vocabulary object detectors can serve as a drop-in module for SVM to seek semantic targets (e.g. knobs) and point tracking methods can help SVM reliably pursue interaction sites indicated by user clicks. We conduct a large-scale evaluation spanning experiments in 10 novel environments across 6 buildings including 72 different object instances. SVM obtains a 71% zero-shot success rate on manipulating unseen objects in novel environments in the real world, outperforming an open-loop control method by an absolute 42% and an imitation learning baseline trained on 1000+ demonstrations by an absolute 50%.
comment: Project webpage: https://arjung128.github.io/svm
Smooth Model Predictive Path Integral Control without Smoothing IROS 2022
We present a sampling-based control approach that can generate smooth actions for general nonlinear systems without external smoothing algorithms. Model Predictive Path Integral (MPPI) control has been utilized in numerous robotic applications due to its appealing characteristics to solve non-convex optimization problems. However, the stochastic nature of sampling-based methods can cause significant chattering in the resulting commands. Chattering becomes more prominent in cases where the environment changes rapidly, possibly even causing the MPPI to diverge. To address this issue, we propose a method that seamlessly combines MPPI with an input-lifting strategy. In addition, we introduce a new action cost to smooth the control sequence during trajectory rollouts while preserving the information theoretic interpretation of MPPI, which was derived from non-affine dynamics. We validate our method in two nonlinear control tasks with neural network dynamics: a pendulum swing-up task and a challenging autonomous driving task. The experimental results demonstrate that our method outperforms the MPPI baselines with additionally applied smoothing algorithms.
comment: Accepted to IEEE Robotics and Automation Letters (and IROS 2022). Project page: https://www.taekyung.me/smppi
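Input lifting, as mentioned in the abstract above, generally means sampling in the space of action derivatives and integrating them to obtain the applied actions, so that sampling noise no longer appears directly in the command. The block below is a generic sketch of that idea together with the standard MPPI softmin weighting. It does not reproduce the paper's action cost, and all function names and numbers are assumptions for illustration.

```python
import math

def integrate_lifted_controls(u_prev, v_seq, dt):
    """Integrate action derivatives v_t into actions: u_t = u_{t-1} + v_t * dt."""
    actions = []
    u = u_prev
    for v in v_seq:
        u = u + v * dt
        actions.append(u)
    return actions

def mppi_weights(costs, lam):
    """Softmin weights over rollout costs (min-subtracted for stability)."""
    c_min = min(costs)
    unnorm = [math.exp(-(c - c_min) / lam) for c in costs]
    total = sum(unnorm)
    return [w / total for w in unnorm]

def mppi_update(v_nominal, noise_samples, costs, lam):
    """Shift the nominal derivative sequence by the cost-weighted noise."""
    weights = mppi_weights(costs, lam)
    return [
        v + sum(w * eps[t] for w, eps in zip(weights, noise_samples))
        for t, v in enumerate(v_nominal)
    ]

# Hypothetical example: two sampled perturbations of the derivative sequence.
v_nominal = [0.0, 0.0, 0.0]
noise = [[1.0, 1.0, -1.0], [-1.0, 0.5, 0.5]]
costs = [2.0, 5.0]  # rollout costs; the first rollout is cheaper

v_new = mppi_update(v_nominal, noise, costs, lam=1.0)
u_seq = integrate_lifted_controls(u_prev=0.0, v_seq=v_new, dt=0.1)
print(v_new, u_seq)
```

Because each applied action is the previous action plus a bounded increment, consecutive commands differ by at most `|v_t| * dt`, which is what limits chattering in this kind of scheme.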
Visual Affordance Prediction: Survey and Reproducibility
Affordances are the potential actions an agent can perform on an object, as observed by a camera. Visual affordance prediction is formulated differently for tasks such as grasping detection, affordance classification, affordance segmentation, and hand pose estimation. This diversity in formulations leads to inconsistent definitions that prevent fair comparisons between methods. In this paper, we propose a unified formulation of visual affordance prediction by accounting for the complete information on the objects of interest and the interaction of the agent with the objects to accomplish a task. This unified formulation allows us to comprehensively and systematically review disparate visual affordance works, highlighting strengths and limitations of both methods and datasets. We also discuss reproducibility issues, such as the unavailability of method implementations and experimental setup details, making benchmarks for visual affordance prediction unfair and unreliable. To favour transparency, we introduce the Affordance Sheet, a document that details the solution, datasets, and validation of a method, supporting future reproducibility and fairness in the community.
comment: 18 pages, 3 figures, 13 tables. Project website at https://apicis.github.io/aff-survey/
Model Predictive Inferential Control of Neural State-Space Models for Autonomous Vehicle Motion Planning
Model predictive control (MPC) has proven useful in enabling safe and optimal motion planning for autonomous vehicles. In this paper, we investigate how to achieve MPC-based motion planning when a neural state-space model represents the vehicle dynamics. As the neural state-space model will lead to highly complex, nonlinear and nonconvex optimization landscapes, mainstream gradient-based MPC methods will be computationally too heavy to be a viable solution. In a departure, we propose the idea of model predictive inferential control (MPIC), which seeks to infer the best control decisions from the control objectives and constraints. Following the idea, we convert the MPC problem for motion planning into a Bayesian state estimation problem. Then, we develop a new particle filtering/smoothing approach to perform the estimation. This approach is implemented as banks of unscented Kalman filters/smoothers and offers high sampling efficiency, fast computation, and estimation accuracy. We evaluate the MPIC approach through a simulation study of autonomous driving in different scenarios, along with an exhaustive comparison with gradient-based MPC. The results show that the MPIC approach has considerable computational efficiency, regardless of complex neural network architectures, and shows the capability to solve large-scale MPC problems for neural state-space models.
Multiagent Systems
StoryBox: Collaborative Multi-Agent Simulation for Hybrid Bottom-Up Long-Form Story Generation Using Large Language Models
Human writers often begin their stories with an overarching mental scene, where they envision the interactions between characters and their environment. Inspired by this creative process, we propose a novel approach to long-form story generation, termed hybrid bottom-up long-form story generation, using multi-agent simulations. In our method, agents interact within a dynamic sandbox environment, where their behaviors and interactions with one another and the environment generate emergent events. These events form the foundation for the story, enabling organic character development and plot progression. Unlike traditional top-down approaches that impose rigid structures, our hybrid bottom-up approach allows for the natural unfolding of events, fostering more spontaneous and engaging storytelling. The system is capable of generating stories exceeding 10,000 words while maintaining coherence and consistency, addressing some of the key challenges faced by current story generation models. We achieve state-of-the-art performance across several metrics. This approach offers a scalable and innovative solution for creating dynamic, immersive long-form stories that evolve organically from agent-driven interactions.
comment: Project: https://storyboxproject.github.io
Coordinated Strategies in Realistic Air Combat by Hierarchical Multi-Agent Reinforcement Learning
Achieving mission objectives in a realistic simulation of aerial combat is highly challenging due to imperfect situational awareness and nonlinear flight dynamics. In this work, we introduce a novel 3D multi-agent air combat environment and a Hierarchical Multi-Agent Reinforcement Learning framework to tackle these challenges. Our approach combines heterogeneous agent dynamics, curriculum learning, league-play, and a newly adapted training algorithm. To this end, the decision-making process is organized into two abstraction levels: low-level policies learn precise control maneuvers, while high-level policies issue tactical commands based on mission objectives. Empirical results show that our hierarchical approach improves both learning efficiency and combat performance in complex dogfight scenarios.
comment: 2025 IEEE International Conference on Agentic AI (ICA)
Autonomous vehicles need social awareness to find optima in multi-agent reinforcement learning routing games
Previous work has shown that when multiple selfish Autonomous Vehicles (AVs) are introduced to future cities and start learning optimal routing strategies using Multi-Agent Reinforcement Learning (MARL), they may destabilize traffic systems, as they would require a significant amount of time to converge to the optimal solution, equivalent to years of real-world commuting. We demonstrate that moving beyond the selfish component in the reward significantly relieves this issue. If each AV, apart from minimizing its own travel time, aims to reduce its impact on the system, this will be beneficial not only for the system-wide performance but also for each individual player in this routing game. By introducing an intrinsic reward signal based on the marginal cost matrix, we significantly reduce training time and achieve convergence more reliably. Marginal cost quantifies the impact of each individual action (route-choice) on the system (total travel time). Including it as one of the components of the reward can reduce the degree of non-stationarity by aligning agents' objectives. Notably, the proposed counterfactual formulation preserves the system's equilibria and avoids oscillations. Our experiments show that training MARL algorithms with our novel reward formulation enables the agents to converge to the optimal solution, whereas the baseline algorithms fail to do so. We show these effects in both a toy network and the real-world network of Saint-Arnoult. Our results optimistically indicate that social awareness (i.e., including marginal costs in routing decisions) improves both the system-wide and individual performance of future urban systems with AVs.
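The marginal cost described above, the impact of one route choice on total travel time, can be illustrated on a toy network. The block below is a minimal sketch under assumed linear latency functions. The route parameters, the vehicle counts, and the function names are hypothetical, and the paper's actual reward formulation may combine these quantities differently.

```python
def travel_time(free_flow, slope, n):
    """Assumed linear latency: travel time on a route used by n vehicles."""
    return free_flow + slope * n

def total_time(routes, counts):
    """Total system travel time, summed over all vehicles on all routes."""
    return sum(n * travel_time(ff, s, n) for (ff, s), n in zip(routes, counts))

def marginal_cost(routes, counts, r):
    """Increase in total travel time caused by one vehicle on route r.

    Computed counterfactually: total time with the vehicle present minus
    total time with it removed. Requires counts[r] >= 1.
    """
    without = list(counts)
    without[r] -= 1
    return total_time(routes, counts) - total_time(routes, without)

# Hypothetical two-route network: (free-flow time, congestion slope).
routes = [(10.0, 1.0), (15.0, 0.5)]
counts = [6, 4]  # vehicles currently on each route

for r in range(len(routes)):
    own = travel_time(*routes[r], counts[r])
    mc = marginal_cost(routes, counts, r)
    print(f"route {r}: own time={own}, marginal cost={mc}, externality={mc - own}")
```

In this toy setting route 0 has the lower individual travel time (16.0 versus 17.0) but the higher marginal cost (21.0 versus 18.5), which shows how a purely selfish reward and a socially aware reward can rank the same choices differently.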
A Vision for Access Control in LLM-based Agent Systems
The autonomy and contextual complexity of LLM-based agents render traditional access control (AC) mechanisms insufficient. Static, rule-based systems designed for predictable environments are fundamentally ill-equipped to manage the dynamic information flows inherent in agentic interactions. This position paper argues for a paradigm shift from binary access control to a more sophisticated model of information governance, positing that the core challenge is not merely about permission, but about governing the flow of information. We introduce Agent Access Control (AAC), a novel framework that reframes AC as a dynamic, context-aware process of information flow governance. AAC operates on two core modules: (1) multi-dimensional contextual evaluation, which assesses not just identity but also relationships, scenarios, and norms; and (2) adaptive response formulation, which moves beyond simple allow/deny decisions to shape information through redaction, summarization, and paraphrasing. This vision, powered by a dedicated AC reasoning engine, aims to bridge the gap between human-like nuanced judgment and scalable AI safety, proposing a new conceptual lens for future research in trustworthy agent design.
comment: 10 pages, 1 figure
Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs
Multi-agent systems (MAS) and reinforcement learning (RL) are widely used to enhance the agentic capabilities of large language models (LLMs). MAS improves task performance through role-based orchestration, while RL uses environmental rewards to learn stronger policies, such as GRPO-style optimization. However, applying on-policy RL to MAS remains underexplored and presents unique challenges. Algorithmically, standard GRPO grouping assumptions break down because prompts vary by role and by turn. System-wise, the training stack must support MAS-workflow rollouts and on-policy updates for both single-policy and multi-policy models. We propose AT-GRPO, which includes (i) an agent- and turn-wise grouped RL algorithm tailored to MAS and (ii) a training system that supports both single- and multi-policy regimes. Across game, planning, coding, and math tasks, AT-GRPO delivers substantial gains. On long-horizon planning, it increases accuracy from a 14.0 to 47.0 percent single-agent RL baseline to 96.0 to 99.5 percent. It also improves reasoning performance, with average gains of 3.87 to 7.62 percent on coding tasks and 9.0 to 17.93 percent on math. Code and environments are available at: https://github.com/pettingllms-ai/PettingLLMs.
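The grouping issue described above can be pictured with a minimal sketch: instead of normalizing rewards across all rollouts of a prompt, rewards are normalized only among samples that share the same agent role and turn index. This is an illustrative assumption about what "agent- and turn-wise grouping" could look like, not the AT-GRPO implementation; the sample fields (`agent`, `turn`, `reward`) and the role names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def grouped_advantages(samples, eps=1e-8):
    """Group-relative advantages, with groups keyed by (agent role, turn).

    Each sample is a dict with hypothetical keys 'agent', 'turn', 'reward'.
    Within a group, advantage = (reward - group mean) / (group std + eps).
    """
    groups = defaultdict(list)
    for i, s in enumerate(samples):
        groups[(s["agent"], s["turn"])].append(i)

    advantages = [0.0] * len(samples)
    for indices in groups.values():
        rewards = [samples[i]["reward"] for i in indices]
        mu, sigma = mean(rewards), pstdev(rewards)
        for i in indices:
            advantages[i] = (samples[i]["reward"] - mu) / (sigma + eps)
    return advantages

samples = [
    {"agent": "coder", "turn": 0, "reward": 1.0},
    {"agent": "coder", "turn": 0, "reward": 0.0},
    {"agent": "tester", "turn": 0, "reward": 0.5},
    {"agent": "tester", "turn": 0, "reward": 0.5},
]
advantages = grouped_advantages(samples)
```

Here the two "coder" samples are compared only with each other, so the "tester" rewards, which come from a different prompt distribution, do not shift their baseline.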
Automating Structural Engineering Workflows with Large Language Model Agents
We introduce $\textbf{MASSE}$, the first Multi-Agent System for Structural Engineering, effectively integrating large language model (LLM)-based agents with real-world engineering workflows. Structural engineering is a fundamental yet traditionally stagnant domain, with core workflows remaining largely unchanged for decades despite its substantial economic impact and global market size. Recent advancements in LLMs have significantly enhanced their ability to perform complex reasoning, long-horizon planning, and precise tool utilization -- capabilities well aligned with structural engineering tasks such as interpreting design codes, executing load calculations, and verifying structural capacities. We present a proof-of-concept showing that most real-world structural engineering workflows can be fully automated through a training-free LLM-based multi-agent system. MASSE enables immediate deployment in professional environments, and our comprehensive validation on real-world case studies demonstrates that it can reduce expert workload from approximately two hours to mere minutes, while enhancing both reliability and accuracy in practical engineering scenarios.
comment: Code: https://github.com/DelosLiang/masse
Comparative Evaluation of Neural Network Architectures for Generalizable Human Spatial Preference Prediction in Unseen Built Environments
The capacity to predict human spatial preferences within built environments is instrumental for developing Cyber-Physical-Social Infrastructure Systems (CPSIS). A significant challenge in this domain is the generalizability of preference models, particularly their efficacy in predicting preferences within environmental configurations not encountered during training. While deep learning models have shown promise in learning complex spatial and contextual dependencies, it remains unclear which neural network architectures are most effective at generalizing to unseen layouts. To address this, we conduct a comparative study of Graph Neural Networks, Convolutional Neural Networks, and standard feedforward Neural Networks using synthetic data generated from a simplified pocket park environment. Beginning with this illustrative case study allows for controlled analysis of each model's ability to transfer learned preference patterns to unseen spatial scenarios. The models are evaluated based on their capacity to predict preferences influenced by heterogeneous physical, environmental, and social features. The generalizability score is calculated using the area under the precision-recall curve for the seen and unseen layouts. This score is appropriate for imbalanced data and provides insights into the suitability of each neural network architecture for preference-aware human behavior modeling in unseen built environments.
comment: The 15th International Workshop on Structural Health Monitoring (IWSHM)
The Social Cost of Intelligence: Emergence, Propagation, and Amplification of Stereotypical Bias in Multi-Agent Systems
Bias in large language models (LLMs) remains a persistent challenge, manifesting in stereotyping and unfair treatment across social groups. While prior research has primarily focused on individual models, the rise of multi-agent systems (MAS), where multiple LLMs collaborate and communicate, introduces new and largely unexplored dynamics in bias emergence and propagation. In this work, we present a comprehensive study of stereotypical bias in MAS, examining how internal specialization, underlying LLMs and inter-agent communication protocols influence bias robustness, propagation, and amplification. We simulate social contexts where agents represent different social groups and evaluate system behavior under various interaction and adversarial scenarios. Experiments on three bias benchmarks reveal that MAS are generally less robust than single-agent systems, with bias often emerging early through in-group favoritism. However, cooperative and debate-based communication can mitigate bias amplification, while more robust underlying LLMs improve overall system stability. Our findings highlight critical factors shaping fairness and resilience in multi-agent LLM systems.
comment: 15 pages, 19 figures, Preprint. Under review
Rationally Analyzing Shelby: Proving Incentive Compatibility in a Decentralized Storage Network
Decentralized storage is one of the most natural applications built on blockchains and a central component of the Web3 ecosystem. Yet despite a decade of active development -- from IPFS and Filecoin to more recent entrants -- most of these storage protocols have received limited formal analysis of their incentive properties. Claims of incentive compatibility are sometimes made, but rarely proven. This gap matters: without well-designed incentives, a system may distribute storage but fail to truly decentralize it. We analyze Shelby -- a storage network protocol recently proposed by Aptos Labs and Jump Crypto -- and provide the first formal proof of its incentive properties. Our game-theoretic model shows that while off-chain audits alone collapse to universal shirking, Shelby's combination of peer audits with occasional on-chain verification yields incentive compatibility under natural parameter settings. We also examine coalition behavior and outline a simple modification that strengthens the protocol's collusion-resilience.
comment: 23 pages, 1 figure
Mean-Field Games with Constraints
This paper introduces a framework of Constrained Mean-Field Games (CMFGs), where each agent solves a constrained Markov decision process (CMDP). This formulation captures scenarios in which agents' strategies are subject to feasibility, safety, or regulatory restrictions, thereby extending the scope of classical mean field game (MFG) models. We first establish the existence of CMFG equilibria under a strict feasibility assumption, and we further show uniqueness under a classical monotonicity condition. To compute equilibria, we develop Constrained Mean-Field Occupation Measure Optimization (CMFOMO), an optimization-based scheme that parameterizes occupation measures and shows that finding CMFG equilibria is equivalent to solving a single optimization problem with convex constraints and bounded variables. CMFOMO does not rely on uniqueness of the equilibria and can approximate all equilibria with arbitrary accuracy. We further prove that CMFG equilibria induce $O(1 / \sqrt{N})$-Nash equilibria in the associated constrained $N$-player games, thereby extending the classical justification of MFGs as approximations for large but finite systems. Numerical experiments on a modified Susceptible-Infected-Susceptible (SIS) epidemic model with various constraints illustrate the effectiveness and flexibility of the framework.
Empirical Study on Robustness and Resilience in Cooperative Multi-Agent Reinforcement Learning
In cooperative Multi-Agent Reinforcement Learning (MARL), it is a common practice to tune hyperparameters in ideal simulated environments to maximize cooperative performance. However, policies tuned for cooperation often fail to maintain robustness and resilience under real-world uncertainties. Building trustworthy MARL systems requires a deep understanding of robustness, which ensures stability under uncertainties, and resilience, the ability to recover from disruptions--a concept extensively studied in control systems but largely overlooked in MARL. In this paper, we present a large-scale empirical study comprising over 82,620 experiments to evaluate cooperation, robustness, and resilience in MARL across 4 real-world environments, 13 uncertainty types, and 15 hyperparameters. Our key findings are: (1) Under mild uncertainty, optimizing cooperation improves robustness and resilience, but this link weakens as perturbations intensify. Robustness and resilience also vary by algorithm and uncertainty type. (2) Robustness and resilience do not generalize across uncertainty modalities or agent scopes: policies robust to action noise for all agents may fail under observation noise on a single agent. (3) Hyperparameter tuning is critical for trustworthy MARL: surprisingly, standard practices like parameter sharing, GAE, and PopArt can hurt robustness, while early stopping, high critic learning rates, and Leaky ReLU consistently help. By optimizing hyperparameters only, we observe substantial improvement in cooperation, robustness, and resilience across all MARL backbones, with the phenomenon also generalizing to robust MARL methods across these backbones. Code and results are available at https://github.com/BUAA-TrustworthyMARL/adv_marl_benchmark .
comment: 44 pages, 16 figures, NeurIPS 2025
Semantic knowledge guides innovation and drives cultural evolution
Cumulative cultural evolution enables human societies to generate increasingly complex knowledge and technology over generations. While social learning transmits innovations between individuals and generations, the cognitive processes that generate these innovations remain poorly understood. Here, we demonstrate that semantic knowledge (structured associations between concepts and their functions) provides cognitive scaffolding for cumulative innovation by guiding exploration toward plausible and meaningful actions. We tested this hypothesis using a cultural evolutionary agent-based model and a large-scale behavioural experiment (N = 1,243), in which individuals performed a task requiring the combination of items into novel innovations. Across both approaches, semantic knowledge and social learning interact synergistically to enhance innovation. Behaviorally, participants without access to semantic knowledge performed no better than chance, even when social learning was available, and relied on shallow exploration strategies. These findings suggest that semantic knowledge is a key cognitive process enabling human cumulative culture.
Talk Isn't Always Cheap: Understanding Failure Modes in Multi-Agent Debate
While multi-agent debate has been proposed as a promising strategy for improving AI reasoning ability, we find that debate can sometimes be harmful rather than helpful. Prior work has primarily focused on debates within homogeneous groups of agents, whereas we explore how diversity in model capabilities influences the dynamics and outcomes of multi-agent interactions. Through a series of experiments, we demonstrate that debate can lead to a decrease in accuracy over time - even in settings where stronger (i.e., more capable) models outnumber their weaker counterparts. Our analysis reveals that models frequently shift from correct to incorrect answers in response to peer reasoning, favoring agreement over challenging flawed reasoning. We perform additional experiments investigating various potential contributing factors to these harmful shifts - including sycophancy, social conformity, and model and task type. These results highlight important failure modes in the exchange of reasons during multi-agent debate, suggesting that naive applications of debate may cause performance degradation when agents are neither incentivised nor adequately equipped to resist persuasive but incorrect reasoning.
comment: ICML MAS Workshop 2025
Simulating Persuasive Dialogues on Meat Reduction with Generative Agents
Meat reduction benefits human and planetary health, but social norms keep meat central in shared meals. To date, the development of communication strategies that promote meat reduction while minimizing social costs has required the costly involvement of human participants at each stage of the process. We present work in progress on simulating multi-round dialogues on meat reduction between Generative Agents based on large language models (LLMs). We measure our main outcome using established psychological questionnaires based on the Theory of Planned Behavior and additionally investigate Social Costs. We find evidence that our preliminary simulations produce outcomes that are (i) consistent with theoretical expectations; and (ii) valid when compared to data from previous studies with human participants. Generative agent-based models are a promising tool for identifying novel communication strategies on meat reduction -- tailored to highly specific participant groups -- to then be tested in subsequent studies with human participants.
comment: Code available at https://github.com/dess-mannheim/MeatlessAgents
Incentivize Contribution and Learn Parameters Too: Federated Learning with Strategic Data Owners
Classical federated learning (FL) assumes that the clients have a limited amount of noisy data with which they voluntarily participate and contribute towards learning a global, more accurate model in a principled manner. The learning happens in a distributed fashion without sharing the data with the center. However, these methods do not consider the incentive of an agent for participating and contributing to the process, given that data collection and running a distributed algorithm are costly for the clients. The question of rationality of contribution has been asked recently in the literature and some results exist that consider this problem. This paper addresses the question of simultaneous parameter learning and incentivizing contribution in a truthful manner, which distinguishes it from the extant literature. Our first mechanism incentivizes each client to contribute to the FL process at a Nash equilibrium and simultaneously learn the model parameters. We also ensure that agents are incentivized to truthfully reveal information in the intermediate stages of the algorithm. However, this equilibrium outcome can be far from the optimum, in which clients contribute their full data and the algorithm learns the optimal parameters. We propose a second mechanism that enables the full data contribution along with optimal parameter learning. Large-scale experiments with real (federated) datasets (CIFAR-10, FEMNIST, and Twitter) show that these algorithms converge quite fast in practice and yield good welfare guarantees and better model performance for all agents.
comment: 25 pages, under review
Agent-Based Modelling for Real-World Stock Markets under Behavioral Economic Principles
The reproduction of realistic dynamics in financial markets is of great significance, as it enhances our understanding of market evolution beyond other physical processes, and facilitates the development and backtesting of investment strategies. Most existing literature approaches this issue as a time series forecasting problem, which often faces challenges such as 1) overfitting historical data, 2) failing to reconstruct stylized facts, and 3) limiting users' ability to conduct counterfactual analyses. To address these limitations, we employ agent-based modeling (ABM) for market simulation, where each trader acts as an autonomous agent guided by established behavioral-economic principles. The parameters of the agent model are subsequently calibrated using deep learning techniques. Additionally, we align our agent model with publicly available economic indices, such as the Consumer Price Index (CPI), to enhance the explainability of our system's outcomes. Our experiments demonstrate that the ABM method effectively reproduces market dynamics with a confidence level of 90%, accurately reflecting well-known stylized facts. Furthermore, the calibration process proves to be more computationally efficient compared to other existing methods that perform simulation-based inference. We also present case studies illustrating the correlation between agent parameters and economic indices.
Systems and Control (CS)
Analysis of the Geometric Heat Flow Equation: Computing Geodesics in Real-Time with Convergence Guarantees
We present an analysis of the convergence properties of the so-called geometric heat flow equation for computing geodesics (shortest-path curves) on Riemannian manifolds. Computing geodesics numerically in real-time has become an important capability in several fields, including control and motion planning. The geometric heat flow equation involves solving a parabolic partial differential equation whose solution is a geodesic. In practice, solving this PDE numerically can be done efficiently, and tends to be more numerically stable and exhibit a better rate of convergence compared to numerical optimization. We prove that the geometric heat flow equation is globally exponentially stable in $L_2$ if the curvature of the Riemannian manifold is not too positive, and that asymptotic convergence in $L_2$ is always guaranteed. We also present a pseudospectral method that leverages Chebyshev polynomials to accurately compute geodesics in only a few milliseconds for non-contrived manifolds. Our analysis was verified with our custom pseudospectral method by computing geodesics on common non-Euclidean surfaces, and in feedback for a contraction-based controller with a non-flat metric for a nonlinear system.
Ego-Vision World Model for Humanoid Contact Planning
Enabling humanoid robots to exploit physical contact, rather than simply avoid collisions, is crucial for autonomy in unstructured environments. Traditional optimization-based planners struggle with contact complexity, while on-policy reinforcement learning (RL) is sample-inefficient and has limited multi-task ability. We propose a framework combining a learned world model with sampling-based Model Predictive Control (MPC), trained on a demonstration-free offline dataset to predict future outcomes in a compressed latent space. To address sparse contact rewards and sensor noise, the MPC uses a learned surrogate value function for dense, robust planning. Our single, scalable model supports contact-aware tasks, including wall support after perturbation, blocking incoming objects, and traversing height-limited arches, with improved data efficiency and multi-task capability over on-policy RL. Deployed on a physical humanoid, our system achieves robust, real-time contact planning from proprioception and ego-centric depth images. Website: https://ego-vcp.github.io/
Smooth Spatiotemporal Tube Synthesis for Prescribed-Time Reach-Avoid-Stay Control
In this work, we address the issue of controller synthesis for a control-affine nonlinear system to meet prescribed-time reach-avoid-stay (RAS) specifications. Our goal is to improve upon previous methods based on spatiotemporal tubes (STTs) by eliminating the need for circumvent functions, which often lead to abrupt tube modifications and high control effort. We propose an adaptive framework that constructs smooth STTs around static unsafe sets, enabling continuous avoidance while guiding the system toward the target within the prescribed time. A closed-form, approximation-free control law is derived to ensure the system trajectory remains within the tube and satisfies the RAS task. The effectiveness of the proposed approach is demonstrated through a case study, showing a significant reduction in control effort compared to prior methods.
IntersectioNDE: Learning Complex Urban Traffic Dynamics based on Interaction Decoupling Strategy
Realistic traffic simulation is critical for ensuring the safety and reliability of autonomous vehicles (AVs), especially in complex and diverse urban traffic environments. However, existing data-driven simulators face two key challenges: a limited focus on modeling dense, heterogeneous interactions at urban intersections - which are prevalent, crucial, and practically significant in countries like China, featuring diverse agents including motorized vehicles (MVs), non-motorized vehicles (NMVs), and pedestrians - and the inherent difficulty in robustly learning high-dimensional joint distributions for such high-density scenes, often leading to mode collapse and long-term simulation instability. We introduce City Crossings Dataset (CiCross), a large-scale dataset collected from a real-world urban intersection, uniquely capturing dense, heterogeneous multi-agent interactions, particularly with a substantial proportion of MVs, NMVs and pedestrians. Based on this dataset, we propose IntersectioNDE (Intersection Naturalistic Driving Environment), a data-driven simulator tailored for complex urban intersection scenarios. Its core component is the Interaction Decoupling Strategy (IDS), a training paradigm that learns compositional dynamics from agent subsets, enabling the marginal-to-joint simulation. Integrated into a scene-aware Transformer network with specialized training techniques, IDS significantly enhances simulation robustness and long-term stability for modeling heterogeneous interactions. Experiments on CiCross show that IntersectioNDE outperforms baseline methods in simulation fidelity, stability, and its ability to replicate complex, distribution-level urban traffic dynamics.
comment: Accepted by ITSC 2025
A Physics-Informed Reinforcement Learning Approach for Degradation-Aware Long-Term Charging Optimization in Batteries
Batteries degrade with usage and continuous cycling. This aging is typically reflected through the resistance growth and the capacity fade of battery cells. Over the years, various charging methods have been presented in the literature that proposed current profiles in order to enable optimal, fast, and/or health-conscious charging. However, very few works have attempted to make the ubiquitous Constant Current Constant Voltage (CCCV) charging protocol adaptive to the changing battery health as it cycles. This work aims to address this gap and proposes a framework that optimizes the constant current part of the CCCV protocol, adapting it to long-term battery degradation. Specifically, a physics-informed Reinforcement Learning (RL) approach has been used that not only estimates a key battery degradation mechanism, namely, Loss of Active Material (LAM), but also adjusts the current magnitude of CCCV as a result of this particular degradation. The proposed framework has been implemented by combining PyBaMM, an open-source battery modeling tool, and Stable-Baselines, where the RL agent was trained using a Proximal Policy Optimization (PPO) network. Simulation results show the potential of the proposed framework for enhancing the widely used CCCV protocol by embedding physics information in the RL algorithm. The proposed agent is also compared with two other charging protocols: one generated by a non-physics-based RL agent, and a constant CCCV protocol applied to all cycles.
Constraint-Aware Reinforcement Learning via Adaptive Action Scaling
Safe reinforcement learning (RL) seeks to mitigate unsafe behaviors that arise from exploration during training by reducing constraint violations while maintaining task performance. Existing approaches typically rely on a single policy to jointly optimize reward and safety, which can cause instability due to conflicting objectives, or they use external safety filters that override actions and require prior system knowledge. In this paper, we propose a modular cost-aware regulator that scales the agent's actions based on predicted constraint violations, preserving exploration through smooth action modulation rather than overriding the policy. The regulator is trained to minimize constraint violations while avoiding degenerate suppression of actions. Our approach integrates seamlessly with off-policy RL methods such as SAC and TD3, and achieves state-of-the-art return-to-cost ratios on Safety Gym locomotion tasks with sparse costs, reducing constraint violations by up to 126 times while increasing returns by over an order of magnitude compared to prior methods.
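To make the action-scaling idea above concrete, the following sketch shrinks a policy's action smoothly as a predicted constraint cost grows, while a floor keeps the action from being suppressed entirely. In the paper the regulator is learned; here the exponential mapping, the `gain` and `min_scale` parameters, and the function names are hand-picked assumptions used only to show the mechanism.

```python
import math

def action_scale(predicted_cost, gain=2.0, min_scale=0.1):
    """Map a predicted constraint cost (>= 0) to a scale in [min_scale, 1].

    Zero predicted cost leaves the action untouched; higher cost shrinks it
    smoothly toward min_scale, which guards against degenerate suppression.
    """
    raw = math.exp(-gain * max(predicted_cost, 0.0))
    return min_scale + (1.0 - min_scale) * raw

def regulate(action, predicted_cost, gain=2.0, min_scale=0.1):
    """Scale every component of the policy's action; direction is preserved."""
    scale = action_scale(predicted_cost, gain, min_scale)
    return [scale * a for a in action]

safe_action = regulate([1.0, -2.0], predicted_cost=0.0)
risky_action = regulate([1.0, -2.0], predicted_cost=5.0)
```

Because the regulator only rescales the action, the policy's chosen direction is still executed, in contrast to a safety filter that replaces the action outright.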
The Role of Flexible Connection in Accelerating Load Interconnection in Distribution Networks
This paper investigates the role of flexible connection in accelerating the interconnection of large loads amid rising electricity demand from data centers and electrification. Flexible connection allows new loads to defer or curtail consumption during rare, grid-constrained periods, enabling faster access without major infrastructure upgrades. To quantify how flexible connection unlocks load hosting capacity, we formulate a flexibility-aware hosting capacity analysis problem that explicitly limits the number of utility-controlled interventions per year, ensuring infrequent disruption. Efficient solution methods are developed for this nonconvex problem and applied to real load data and test feeders. Empirical results reveal that modest flexibility, i.e., few interventions with small curtailments or delays, can unlock substantial hosting capacity. Theoretical analysis further explains and generalizes these findings, highlighting the broad potential of flexible connection.
A Faster and More Reliable Middleware for Autonomous Driving Systems
Ensuring safety in high-speed autonomous vehicles requires rapid control loops and tightly bounded delays from perception to actuation. Many open-source autonomy systems rely on ROS 2 middleware; when multiple sensor and control nodes share one compute unit, ROS 2 and its DDS transports add significant (de)serialization, copying, and discovery overheads, shrinking the available time budget. We present Sensor-in-Memory (SIM), a shared-memory transport designed for intra-host pipelines in autonomous vehicles. SIM keeps sensor data in native memory layouts (e.g., cv::Mat, PCL), uses lock-free bounded double buffers that overwrite old data to prioritize freshness, and integrates into ROS 2 nodes with four lines of code. Unlike traditional middleware, SIM operates beside ROS 2 and is optimized for applications where data freshness and minimal latency outweigh guaranteed completeness. SIM provides sequence numbers, a writer heartbeat, and optional checksums to ensure ordering, liveness, and basic integrity. On an NVIDIA Jetson Orin Nano, SIM reduces data-transport latency by up to 98% compared to ROS 2 zero-copy transports such as FastRTPS and Zenoh, lowers mean latency by about 95%, and narrows 95th/99th-percentile tail latencies by around 96%. In tests on a production-ready Level 4 vehicle running Autoware.Universe, SIM increased localization frequency from 7.5 Hz to 9.5 Hz. Applied across all latency-critical modules, SIM cut average perception-to-decision latency from 521.91 ms to 290.26 ms, reducing emergency braking distance at 40 mph (64 km/h) on dry concrete by 13.6 ft (4.14 m).
comment: 8 pages, 7 figures, 8 tables
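The braking-distance figure in the abstract above appears consistent with a simple calculation: the distance covered at constant speed during the latency that was saved. The sketch below reproduces that arithmetic from the numbers quoted in the abstract; the constant-speed assumption and the function name are ours, not necessarily the authors' method.

```python
MPH_TO_MPS = 1609.344 / 3600.0   # metres per second in one mile per hour
FT_PER_M = 1.0 / 0.3048          # feet in one metre

def distance_saved_m(speed_mph, latency_before_ms, latency_after_ms):
    """Distance (m) travelled at constant speed during the saved latency."""
    speed_mps = speed_mph * MPH_TO_MPS
    saved_s = (latency_before_ms - latency_after_ms) / 1000.0
    return speed_mps * saved_s

# Values quoted in the abstract: 40 mph, 521.91 ms reduced to 290.26 ms.
saved_m = distance_saved_m(40.0, 521.91, 290.26)
saved_ft = saved_m * FT_PER_M
```

The saved latency is 231.65 ms, and at about 17.88 m/s this corresponds to roughly 4.14 m (13.6 ft), matching the reported reduction.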
Trajectory control of a suspended load with non-stopping flying carriers
This paper presents the first closed-loop control framework for cooperative payload transportation with non-stopping flying carriers. Building upon grasp-matrix formulations and internal force redundancy, we propose a feedback wrench controller that actively regulates the payload's pose while an optimization layer dynamically shapes internal-force oscillations to guarantee persistent carrier motion. Preliminary experimental results on multirotor UAVs validate the model assumptions, and numerical simulations demonstrate that the method successfully prevents carrier stagnation, achieves accurate load tracking, and generates physically feasible trajectories with smooth velocity profiles. The proposed framework not only advances the state of the art but also offers a reliable, versatile solution for future real-world applications requiring load transportation by coordinated non-stopping flying carriers.
Robust Recovery and Control of Cyber-physical Discrete Event Systems under Actuator Attacks
Critical real-world applications strongly rely on Cyber-physical systems (CPS), but their dependence on communication networks introduces significant security risks, as attackers can exploit vulnerabilities to compromise their integrity and availability. This work explores the topic of cybersecurity in the context of CPS modeled as discrete event systems (DES), focusing on recovery strategies following the detection of cyberattacks. Specifically, we address actuator enablement attacks and propose a method that preserves the system's full valid behavior under normal conditions. Upon detecting an attack, our proposed solution aims to guide the system toward a restricted yet robust behavior, ensuring operational continuity and resilience. Additionally, we introduce a property termed AE-robust recoverability, which characterizes the necessary and sufficient conditions for recovering a system from attacks while preventing further vulnerabilities. Finally, we showcase the proposed solution through a case study based on a manufacturing system.
comment: This work has been accepted for publication in the 64th IEEE Conference on Decision and Control (CDC). The final published version will be available on IEEE Xplore
Robust Closed-Form Control for MIMO Nonlinear Systems under Conflicting Time-Varying Hard and Soft Constraints
This paper introduces a novel robust closed-form control law to handle time-varying hard and soft constraints in uncertain high-relative-degree nonlinear MIMO systems. These constraints represent spatiotemporal specifications in mechanical systems' operational space, with hard constraints ensuring safety-critical requirements and soft constraints encoding performance or task objectives. Initially, all constraints are consolidated into two separate scalar time-varying hard and soft constraint functions, whose positive level sets define feasible regions. A closed-form control law is developed to enforce these constraints using appropriately designed reciprocal barriers and nonlinear transformation functions. When conflicts between hard and soft constraints arise, the control law prioritizes hard constraints by virtually relaxing soft constraints via a dynamic relaxation law. Notably, the proposed control law maintains low complexity by avoiding approximation schemes for coping with system uncertainties. Simulation results confirm the effectiveness of the proposed method.
comment: 18 pages, 6 figures
Data-Driven Estimation of Quadrotor Motor Efficiency via Residual Minimization
A data-driven framework is proposed for online estimation of quadrotor motor efficiency via residual minimization. The problem is formulated as a constrained nonlinear optimization that minimizes trajectory residuals between measured flight data and predictions generated by a quadrotor dynamics model. A sliding-window strategy enables online estimation, and the optimization is efficiently solved using an iteratively reweighted least squares (IRLS) scheme combined with a primal-dual interior-point method, with inequality constraints enforced through a logarithmic barrier function. Robust z-score weighting is employed to reject outliers, which is particularly effective in motor clipping scenarios where the proposed estimator exhibits smaller spikes than an EKF baseline. Compared to traditional filter-based approaches, the batch-mode formulation offers greater flexibility by selectively incorporating informative data segments. This structure is well-suited for onboard implementation, particularly for applications such as fault detection and isolation (FDI), health monitoring, and predictive maintenance in aerial robotic systems. Simulation results under various degradation scenarios demonstrate the accuracy and robustness of the proposed estimator.
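The reweighting idea in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' formulation: a scalar model `measured = eta * commanded`, a Huber-style weight derived from a median/MAD z-score, and simple clipping of `eta` to [0, 1] in place of the interior-point, log-barrier handling of constraints. The helper names (`robust_z_weights`, `estimate_efficiency`, `make_window`) are hypothetical.

```python
import random
from statistics import median
from typing import List, Sequence, Tuple


def robust_z_weights(residuals: Sequence[float], cutoff: float = 3.0) -> List[float]:
    """Huber-style weights from a median/MAD z-score (an assumed weighting rule)."""
    med = median(residuals)
    mad = median([abs(r - med) for r in residuals])
    # 1.4826 * MAD approximates the standard deviation for Gaussian residuals.
    scale = 1.4826 * mad if mad > 0 else 1e-12
    weights = []
    for r in residuals:
        z = abs(r - med) / scale
        weights.append(1.0 if z <= cutoff else cutoff / z)
    return weights


def estimate_efficiency(commanded: Sequence[float], measured: Sequence[float],
                        iterations: int = 20) -> float:
    """IRLS estimate of eta in measured ~= eta * commanded over one data window."""
    eta = 1.0  # start from a healthy motor
    for _ in range(iterations):
        residuals = [m - eta * c for c, m in zip(commanded, measured)]
        w = robust_z_weights(residuals)
        num = sum(wi * c * m for wi, c, m in zip(w, commanded, measured))
        den = sum(wi * c * c for wi, c in zip(w, commanded))
        # Clipping is a crude stand-in for the barrier-enforced inequality constraints.
        eta = min(1.0, max(0.0, num / den))
    return eta


def plain_least_squares(commanded: Sequence[float], measured: Sequence[float]) -> float:
    """Unweighted least-squares estimate, used only as a comparison."""
    return sum(c * m for c, m in zip(commanded, measured)) / sum(c * c for c in commanded)


def make_window(true_eta: float = 0.8, n: int = 200, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Synthetic window: Gaussian noise plus an injected outlier every 10th sample."""
    rng = random.Random(seed)
    commanded = [rng.uniform(1.0, 10.0) for _ in range(n)]
    measured = [true_eta * c + rng.gauss(0.0, 0.05) for c in commanded]
    for i in range(0, n, 10):
        measured[i] += 5.0
    return commanded, measured


if __name__ == "__main__":
    c, m = make_window()
    print("IRLS estimate:", estimate_efficiency(c, m))
    print("plain LS estimate:", plain_least_squares(c, m))
```

On the synthetic window, the reweighted estimate should land much closer to the true efficiency than the unweighted one, because the injected outliers receive small weights after the first few iterations.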
High-Order Quarter-Wave Plate Optimization for Linear Birefringence Suppression in Reflective FOCS
Fiber optic current sensors (FOCS) are widely adopted in modern power grids due to high sensitivity, excellent insulation, and strong immunity to electromagnetic interference. This prominence necessitates precise investigation into their error sources and corresponding optimization. This study examines reflective FOCS based on the Faraday effect. A theoretical model is established to simulate phase error caused by linear birefringence from the quarter-wave plate. Conventional methods using circular birefringence are analyzed, revealing inherent limitations. Innovatively, a compensation strategy employing high-order quarter-wave plates is proposed to effectively eliminate linear birefringence effects. This approach significantly enhances the accuracy and practicality of FOCS in precision metrology.
Exponential convergence of multiagent systems with lack of connection
Finding conditions ensuring consensus, i.e. convergence to a common value, for a networked system is of crucial interest, both for theoretical reasons and applications. This goal is harder to achieve when connections between agents are temporarily lost. Here, we prove that known conditions (introduced by Moreau) ensure an exponential convergence to consensus, with explicit rate of convergence. The key result is related to the length of the graph (i.e. the number of connections to reach a common agent): if this is large, then convergence is slow. This general result also provides conditions for convergence of second-order cooperative systems with lack of connections.
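A small simulation can make the role of graph length concrete. The sketch below is a discrete-time illustration under assumptions that are not taken from the paper: agents sit on an undirected chain, links are present only every `period`-th step, and each agent moves toward its neighbours with a fixed `gain`. The function names are hypothetical.

```python
from typing import List


def consensus_step(x: List[float], gain: float) -> List[float]:
    """One averaging step on an undirected chain 0 - 1 - ... - (n-1)."""
    n = len(x)
    new_x = []
    for i in range(n):
        pull = 0.0
        if i > 0:
            pull += x[i - 1] - x[i]
        if i < n - 1:
            pull += x[i + 1] - x[i]
        new_x.append(x[i] + gain * pull)
    return new_x


def spread_after(n_agents: int, steps: int, gain: float = 0.3, period: int = 3) -> float:
    """Disagreement max(x) - min(x) after `steps` steps with intermittent links."""
    # Initial values are spread evenly over [0, 1], so the initial disagreement is 1.
    x = [i / (n_agents - 1) for i in range(n_agents)]
    for t in range(steps):
        if t % period == 0:  # connections exist only on these steps
            x = consensus_step(x, gain)
        # On all other steps the agents are disconnected and hold their values.
    return max(x) - min(x)


if __name__ == "__main__":
    print("chain of 4 agents :", spread_after(4, 300))
    print("chain of 12 agents:", spread_after(12, 300))
```

With the same number of steps, the longer chain should retain a visibly larger disagreement than the short one, in line with the statement that a long graph slows convergence.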
Efficient LLM Inference over Heterogeneous Edge Networks with Speculative Decoding
Large language model (LLM) inference at the network edge is a promising serving paradigm that leverages distributed edge resources to run inference near users and enhance privacy. Existing edge-based LLM inference systems typically adopt autoregressive decoding (AD), which only generates one token per forward pass. This iterative process, compounded by the limited computational resources of edge nodes, results in high serving latency and constrains the system's ability to support multiple users under growing demands. To address these challenges, we propose a speculative decoding (SD)-based LLM serving framework that deploys small and large models across heterogeneous edge nodes to collaboratively deliver inference services. Specifically, the small model rapidly generates draft tokens that the large model verifies in parallel, enabling multi-token generation per forward pass and thus reducing serving latency. To improve resource utilization of edge nodes, we incorporate pipeline parallelism to overlap drafting and verification across multiple inference tasks. Based on this framework, we analyze and derive a comprehensive latency model incorporating both communication and inference latency. Then, we formulate a joint optimization problem for speculation length, task batching, and wireless communication resource allocation to minimize total serving latency. To address this problem, we derive closed-form solutions for wireless communication resource allocation, and develop a dynamic programming algorithm for joint batching and speculation control strategies. Experimental results demonstrate that the proposed framework achieves lower serving latency compared to AD-based serving systems. In addition, the proposed joint optimization method delivers up to 44.9% latency reduction compared to benchmark schemes.
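The draft-then-verify loop described above can be sketched with toy models. This is a greedy variant under stated assumptions: `target_model` and `draft_model` are hypothetical deterministic functions standing in for the large and small LLMs, and the verification pass is simulated sequentially even though a real large model would score all drafted positions in one forward pass. Batching, pipelining, and the wireless resource allocation from the abstract are not modelled.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # maps a token prefix to the greedy next token


def target_model(prefix: List[int]) -> int:
    """Hypothetical 'large' model over a 7-token vocabulary."""
    if len(prefix) >= 2:
        return (prefix[-1] * 3 + prefix[-2] + 1) % 7
    return (prefix[-1] + 1) % 7


def draft_model(prefix: List[int]) -> int:
    """Hypothetical 'small' model: agrees with the target except after tokens divisible by 3."""
    t = target_model(prefix)
    return t if prefix[-1] % 3 != 0 else (t + 1) % 7


def autoregressive_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one token per model call."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int) -> Tuple[List[int], int]:
    """Greedy speculative decoding with speculation length k.

    Returns the decoded tokens and the number of target verification passes.
    """
    tokens = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(tokens) < goal:
        # 1) The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2) The target model checks the proposal (counted as one pass).
        target_passes += 1
        accepted: List[int] = []
        for i in range(k):
            expected = target(tokens + accepted)
            if proposal[i] == expected:
                accepted.append(proposal[i])
            else:
                accepted.append(expected)  # replace the first mismatch and stop
                break
        else:
            accepted.append(target(tokens + accepted))  # bonus token when all drafts match
        tokens.extend(accepted)
    return tokens[:goal], target_passes


if __name__ == "__main__":
    prompt = [1, 2]
    out, passes = speculative_decode(target_model, draft_model, prompt, n_new=20, k=4)
    print(out == autoregressive_decode(target_model, prompt, 20), passes)
```

Because every accepted token equals what the target model would have produced greedily, the output matches plain autoregressive decoding with the target model; the potential gain is that fewer target passes are needed when the draft model agrees often.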
pyspect: An Extensible Toolbox for Automatic Construction of Temporal Logic Trees via Reachability Analysis
In this paper, we present pyspect, a Python toolbox that simplifies the use of reachability analysis for temporal logic problems. Currently, satisfying complex requirements in cyber-physical systems requires significant manual effort and domain expertise to develop the underlying reachability programs. This high development effort limits the broader adoption of reachability analysis for complex verification problems. To address this, pyspect provides a method-agnostic approach to performing reachability analysis for verifying a temporal logic specification via temporal logic trees (TLTs). It enables the specification of complex safety and liveness requirements using high-level logic formulations that are independent of any particular reachability technique or set representation. As a result, pyspect allows for the comparison of different reachability implementations, such as Hamilton-Jacobi and Hybrid Zonotope-based reachability analysis, for the same temporal logic specification. This design separates the concerns of implementation developers (who develop numerical procedures for reachability) and end-users (who write specifications). Through a simple vehicle example, we demonstrate how pyspect simplifies the synthesis of reachability programs, promotes specification reusability, and facilitates side-by-side comparisons of reachability techniques for complex tasks.
comment: To be published in the 64th IEEE Conference on Decision and Control
Edge-to-Cloud Computations-as-a-Service in Software-Defined Energy Networks for Smart Grids
Modern power grids face an acute mismatch between where data is generated and where it can be processed: protection relays, EV (Electric Vehicle) charging, and distributed renewables demand millisecond analytics at the edge, while energy-hungry workloads often sit in distant clouds, leading to missed real-time deadlines and wasted power. We address this by proposing, to our knowledge, the first-ever SDEN (Software Defined Energy Network) for CaaS (Computations-as-a-Service) that unifies edge, fog, and cloud compute with 5G URLLC (Ultra-Reliable Low-Latency Communications), SDN (Software Defined Networking), and NFV (Network Functions Virtualization) to co-optimize energy, latency, and reliability end-to-end. Our contributions are threefold: (i) a joint task offloading formulation that couples computation placement with network capacity under explicit URLLC constraints; (ii) a feasibility-preserving, lightweight greedy heuristic that scales while closely tracking optimal energy and latency trade-offs; and (iii) a tiered AI (Artificial Intelligence) pipeline (reactive at the edge, predictive in the fog, strategic in the cloud) featuring privacy-preserving, federated GNNs (Graph Neural Networks) for fault detection and microgrid coordination. Unlike prior edge-only or cloud-only schemes, SDEN turns fragmented grid compute into a single, programmable substrate that delivers dependable, energy-aware, real-time analytics, establishing a first-ever, software-defined path to practical, grid-scale CaaS.
Utilizing Bayesian Optimization for Timetable-Independent Railway Junction Performance Determination
The efficiency of railway infrastructure is significantly influenced by the mix of trains that utilize it, as different service types have competing operational requirements. While freight services might require extended service times, passenger services demand more predictable schedules. Traditional methods for addressing long-term traffic assignment problems often rely on fixed-value capacity limitations, determined based on specific assumptions about traffic composition. This paper introduces a methodology for determining timetable-independent capacity within the traffic rate assignment problem, enabling the calculation of junction capacities under dynamic traffic distributions. We solve the underlying non-linear constrained optimization problem maximizing the traffic throughput using Bayesian optimization (BO). This setting combines a known objective function with expensive-to-compute capacity constraints, motivating an adaptation of standard BO problems, where objective functions are usually unknown. We tailor the acquisition process in BO to this specific setting and increase performance by incorporating prior knowledge about the shape of the constraint functions into the Gaussian process surrogate model. Our derived approaches are benchmarked on a railway junction near Paris, significantly outperforming fixed traffic composition models and highlighting the benefits of dynamic capacity allocation.
Visible Light Communication for Vehicular Networks: A Tutorial
The advent of fifth-generation technology promises to bring about more vertical applications and emerging services that include vehicular networks and intelligent transportation systems (ITSs). To achieve their vision of real-time and safety applications, vehicular networks rely on short-range to medium-range communications. One emerging technology that aims to provide reliability and high data rates in short-range communications is visible light communication (VLC). Due to its remarkable advantages, some studies have recently investigated the integration of VLC in vehicular networks and ITSs. Despite their attractive features, such networks also face several implementation issues. This paper provides an extended tutorial on the implementation of VLC-based vehicular networks. To begin with, we present the implementation characteristics of these systems and discuss some related issues. The underlying system considers a general structure with transmitters, channels, and receivers based on photodetectors and cameras, as well as standardization efforts and types of topologies. In addition, we discuss the impact of the sun and artificial light sources, flickering, dimming, throughput enhancement, uplink security, and mobility on practical implementation. Finally, we highlight some key challenges and potential solutions and provide some directions for future research investigations that could constitute an advancement toward the development of commercial VLC-based vehicular systems.
Establishing assembly-oriented modular product architectures through Design for Assembly enhanced Modular Function Deployment
Modular product design has become a strategic enabler for companies seeking to balance product variety, operational efficiency, and market responsiveness, making the alignment between modular architecture and manufacturing considerations increasingly critical. Modular Function Deployment (MFD) is a widely adopted method for defining modular product architectures, yet it lacks systematic support for assembly considerations during early concept and system-level development. This limitation increases the risk of delayed production ramp-up and lifecycle inefficiencies. This paper proposes a set of enhancements to MFD that integrate Design for Assembly (DFA) logic into architectural synthesis. The extended method introduces structured heuristics, assembly-oriented module drivers, a coded interface taxonomy, and quantitative metrics for assessing assembly feasibility and automation readiness. These additions preserve compatibility with standard MFD workflows while enriching decision-making with traceable, production-informed reasoning. An illustrative case study involving a handheld leaf blower demonstrates the method's usability and effectiveness. The redesigned architecture shows reduced assembly effort, simplified interfaces, and increased automation potential. By supporting early-stage evaluation of architectural alternatives through an assembly lens, the method enables faster transition to efficient volume production and provides a foundation for continuous improvement throughout the product lifecycle.
PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System
Deploying humanoid robots to interact with real-world environments--such as carrying objects or sitting on chairs--requires generalizable, lifelike motions and robust scene perception. Although prior approaches have advanced each capability individually, combining them in a unified system is still an ongoing challenge. In this work, we present a physical-world humanoid-scene interaction system, PhysHSI, that enables humanoids to autonomously perform diverse interaction tasks while maintaining natural and lifelike behaviors. PhysHSI comprises a simulation training pipeline and a real-world deployment system. In simulation, we adopt adversarial motion prior-based policy learning to imitate natural humanoid-scene interaction data across diverse scenarios, achieving both generalization and lifelike behaviors. For real-world deployment, we introduce a coarse-to-fine object localization module that combines LiDAR and camera inputs to provide continuous and robust scene perception. We validate PhysHSI on four representative interactive tasks--box carrying, sitting, lying, and standing up--in both simulation and real-world settings, demonstrating consistently high success rates, strong generalization across diverse task goals, and natural motion patterns.
comment: Project website: https://why618188.github.io/physhsi/
Optimal Multi-Modal Transportation and Electric Power Flow: The Value of Coordinated Dynamic Operation
The electrification of transportation represents a critical challenge in the global transition toward net-zero emissions, as the sector often accounts for more than one-quarter of national energy consumption. Achieving this transformation requires not only widespread adoption of electric vehicles (EVs) but also their seamless integration into interdependent infrastructure systems, specifically the transportation-electricity nexus (TEN). This paper develops an optimal multi-modal transportation and electric power flow (OMTEPF) model to evaluate the benefits of coordinated, dynamic system operation. Building on recent advances in hetero-functional graph theory, the framework enables joint optimization of five key operational decisions in intelligent TEN management: vehicle dispatch, route choice, charging station queuing, coordinated charging, and vehicle-to-grid stabilization. The mesoscopic, dynamic model explicitly represents individual EVs and their state-of-charge trajectories, thereby extending beyond the prevailing literature's focus on static, macroscopic traffic assignment. It further captures the full scope of the TEN as a system-of-systems, incorporating five distinct charging modalities: private residential, private commercial, wired public commercial, inductive public, and discharging. On the power system side, an IV-ACOPF formulation ensures globally optimal solutions to the electrical subproblems. Comparative analysis demonstrates the substantial value of coordinated TEN operation relative to the status quo of siloed, uncoordinated infrastructure management. This work provides both a novel methodological contribution and actionable insights for the co-design and operation of next-generation sustainable mobility-energy systems.
comment: 31 pages, 9 figures
Observability and parameter estimation of a generic model for aggregated distributed energy resources
We propose a novel framework for estimating the parameters of an aggregated distributed energy resources (der_a) model. First, we introduce a rigorous method to determine whether all model parameters are estimable. When they are not, our approach identifies the subset of parameters that can be estimated. The proposed framework offers new insights into the number and specific parameters that can be reliably estimated based on commonly available measurements. It also highlights the limitations of calibrating such models. Second, we introduce a Kalman filtering method to calibrate the der_a model. Since we account for nonlinear effects such as saturation and deadbands, we develop a specific mechanism to handle smoothing functions within the Kalman filter. Specifically, we consider the extended and the unscented Kalman filter. We demonstrate the effectiveness of the proposed framework on a modified IEEE 34-node distribution feeder with inverter-based resources. Our findings align with the North American Electric Reliability Corporation's parameterization guideline and underscore the importance of model calibration in accurately capturing the collective dynamics of distributed energy resources installed on distribution systems.
Quantum Deception: Honey-X Deception using Quantum Games
In this paper, we develop a framework for deception in quantum games, extending the Honey-X paradigm from classical zero-sum settings into the quantum domain. Building on a view of deception in classical games as manipulation of a player's perception of the payoff matrix, we formalize quantum deception as controlled perturbations of the payoff Hamiltonian subject to a deception budget. We show that when victims are aware of possible deception, their equilibrium strategies surprisingly coincide with those of naive victims who fully trust the deceptive Hamiltonian. This equivalence allows us to cast quantum deception as a bilevel optimization problem, which can be reformulated into a bilinear semidefinite program. To illustrate the framework, we present simulations on quantum versions of the Penny Flip game, demonstrating how quantum strategy spaces and non-classical payoffs can amplify the impact of deception relative to classical formulations.
comment: Submitted to 2026 American Control Conference (ACC), New Orleans, LA
Coherent Load Profile Synthesis with Conditional Diffusion for LV Distribution Network Scenario Generation
Limited visibility of power distribution network power flows at the low voltage level presents challenges to both distribution network operators from a planning perspective and distribution system operators from a congestion management perspective. Forestalling these challenges through scenario analysis is confounded by the lack of realistic and coherent load data across representative distribution feeders. Load profiling approaches often rely on summarising demand through typical profiles, which oversimplifies the complexity of substation-level operations and limits their applicability in specific power system studies. Sampling methods, and more recently generative models, have attempted to address this through synthesising representative loads from historical exemplars; however, while these approaches can approximate load shapes to a convincing degree of fidelity, the co-behaviour between substations, which ultimately impacts higher voltage level network operation, is often overlooked. This limitation will become even more pronounced with the increasing integration of low-carbon technologies, as estimates of base loads fail to capture load diversity. To address this gap, a Conditional Diffusion model for synthesising daily active and reactive power profiles at the low voltage distribution substation level is proposed. The evaluation of fidelity is demonstrated through conventional metrics capturing temporal and statistical realism, as well as power flow modelling. The results show synthesised load profiles are plausible both independently and as a cohort in a wider power systems context. The Conditional Diffusion model is benchmarked against both naive and state-of-the-art models to demonstrate its effectiveness in producing realistic scenarios on which to base sub-regional power distribution network planning and operations.
ORN-CBF: Learning Observation-conditioned Residual Neural Control Barrier Functions via Hypernetworks
Control barrier functions (CBFs) have been demonstrated as an effective method for safety-critical control of autonomous systems. Although CBFs are simple to deploy, their design remains challenging, motivating the development of learning-based approaches. Yet, issues such as suboptimal safe sets, applicability in partially observable environments, and lack of rigorous safety guarantees persist. In this work, we propose observation-conditioned neural CBFs based on Hamilton-Jacobi (HJ) reachability analysis, which approximately recover the maximal safe sets. We exploit certain mathematical properties of the HJ value function, ensuring that the predicted safe set never intersects with the observed failure set. Moreover, we leverage a hypernetwork-based architecture that is particularly suitable for the design of observation-conditioned safety filters. The proposed method is examined both in simulation and hardware experiments for a ground robot and a quadcopter. The results show improved success rates and generalization to out-of-domain environments compared to the baselines.
Learning Satellite Attitude Dynamics with Physics-Informed Normalising Flow
Attitude control is a fundamental aspect of spacecraft operations. Model Predictive Control (MPC) has emerged as a powerful strategy for these tasks, relying on accurate models of the system dynamics to optimize control actions over a prediction horizon. In scenarios where physics models are incomplete, difficult to derive, or computationally expensive, machine learning offers a flexible alternative by learning the system behavior directly from data. However, purely data-driven models often struggle with generalization and stability, especially when applied to inputs outside their training domain. To address these limitations, we investigate the benefits of incorporating Physics-Informed Neural Networks (PINNs) into the learning of spacecraft attitude dynamics, comparing their performance with that of purely data-driven approaches. Using a Real-valued Non-Volume Preserving (Real NVP) neural network architecture with a self-attention mechanism, we trained several models on simulated data generated with the Basilisk simulator. Two training strategies were considered: a purely data-driven baseline and a physics-informed variant to improve robustness and stability. Our results demonstrate that the inclusion of physics-based information significantly enhances the performance in terms of the mean relative error of the best architectures found by 27.08%. These advantages are particularly evident when the learned models are integrated into an MPC framework, where PINN-based models consistently outperform their purely data-driven counterparts in terms of control accuracy and robustness, yielding improvements of up to 42.86% in performance stability error and increased robustness-to-noise.
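For readers unfamiliar with Real NVP, the building block is an affine coupling layer whose inverse and log-determinant are available in closed form. The sketch below shows only that mechanism. The `scale_net` and `shift_net` functions are hypothetical fixed stand-ins for learned networks, the input is assumed to have even length, and the self-attention mechanism and the physics-informed loss from the abstract are not represented.

```python
import math
from typing import List, Tuple


def scale_net(x1: List[float]) -> List[float]:
    """Hypothetical stand-in for the learned scale network s(x1)."""
    return [math.tanh(v) for v in x1]


def shift_net(x1: List[float]) -> List[float]:
    """Hypothetical stand-in for the learned shift network t(x1)."""
    return [0.5 * v + 1.0 for v in x1]


def coupling_forward(x: List[float]) -> Tuple[List[float], float]:
    """Affine coupling: y1 = x1, y2 = x2 * exp(s(x1)) + t(x1).

    Returns y and log|det J|, which equals sum(s(x1)).
    """
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    s, t = scale_net(x1), shift_net(x1)
    y2 = [b * math.exp(si) + ti for b, si, ti in zip(x2, s, t)]
    return x1 + y2, sum(s)


def coupling_inverse(y: List[float]) -> List[float]:
    """Exact inverse: x1 = y1, x2 = (y2 - t(y1)) * exp(-s(y1))."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    s, t = scale_net(y1), shift_net(y1)
    x2 = [(b - ti) * math.exp(-si) for b, si, ti in zip(y2, s, t)]
    return y1 + x2


if __name__ == "__main__":
    x = [0.3, -1.2, 2.0, 0.7]
    y, log_det = coupling_forward(x)
    print(y, log_det, coupling_inverse(y))
```

The first half of the input passes through unchanged, which is what makes the inverse cheap: the scale and shift can be recomputed from `y1` without inverting any network.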
A Unified Framework for Innovation-based Stochastic and Deterministic Event Triggers
Resources such as bandwidth and energy are limited in many wireless communications use cases, especially when large numbers of sensors and fusion centers need to exchange information frequently. One opportunity to overcome resource constraints is the use of event-based transmissions and estimation to transmit only information that contributes significantly to the reconstruction of the system's state. The design of efficient triggering policies and estimators is crucial for successful event-based transmissions. While previously deterministic and stochastic event triggering policies have been treated separately, this paper unifies the two approaches and gives insights into the design of consistent trigger-matching estimators. Two different estimators are presented, and different pairs of triggers and estimators are evaluated through simulation studies.
comment: 8 pages, 5 figures, presented at FUSION 2025
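The two trigger families mentioned in this abstract can be contrasted in a scalar sketch. The forms below are common textbook-style choices and are assumptions here, not the paper's unified framework: the deterministic trigger sends when the normalised innovation exceeds a threshold, and the stochastic trigger stays silent with probability `exp(-0.5 * y * z**2)`. The function names and the parameter `y` are hypothetical.

```python
import math
import random
from typing import Callable, Sequence


def deterministic_trigger(innovation: float, threshold: float) -> bool:
    """Send whenever the normalised innovation magnitude exceeds the threshold."""
    return abs(innovation) > threshold


def stochastic_trigger(innovation: float, y: float, rng: random.Random) -> bool:
    """Send with probability 1 - exp(-0.5 * y * innovation**2)."""
    p_silent = math.exp(-0.5 * y * innovation * innovation)
    return rng.random() > p_silent


def transmission_rate(trigger: Callable[[float], bool], innovations: Sequence[float]) -> float:
    """Fraction of time steps at which the sensor transmits."""
    sent = sum(1 for z in innovations if trigger(z))
    return sent / len(innovations)


if __name__ == "__main__":
    rng = random.Random(1)
    innovations = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    print("deterministic:", transmission_rate(lambda z: deterministic_trigger(z, 1.0), innovations))
    print("stochastic   :", transmission_rate(lambda z: stochastic_trigger(z, 1.0, rng), innovations))
```

In both cases a small innovation usually suppresses transmission. The difference is that the stochastic rule keeps the decision random, which is what the estimator design has to be matched to.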
Product Digital Twin Supporting End-of-life Phase of Electric Vehicle Batteries Utilizing Product-Process-Resource Asset Network
In a circular economy, products in their end-of-life phase should be either remanufactured or recycled. Both of these processes are crucial for sustainability and environmental conservation. However, manufacturers frequently provide too little support for these processes, sharing relevant data about neither the products nor their (re-)manufacturing processes. This paper proposes to accompany each product with a digital twin technology, specifically the Product Digital Twin (PDT), which can carry information for facilitating and optimizing production and remanufacturing processes. This paper introduces a knowledge representation called Bi-Flow Product-Process-Resource Asset Network (Bi-PAN). Bi-PAN extends a well-proven Product-Process-Resource Asset Network (PAN) paradigm by integrating both assembly and disassembly workflows into a single information model. Such networks enable capturing relevant relationships across products, production resources, manufacturing processes, and specific production operations that have to be done in the manufacturing phase of a product. The proposed approach is demonstrated in a use-case of disassembling electric vehicle (EV) batteries. By utilizing PDTs with Bi-PAN knowledge models, challenges associated with disassembling EV batteries can be solved flexibly and efficiently for various battery types, enhancing the sustainability of the EV battery life-cycle management.
comment: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Manipulation of Elasto-Flexible Cables with Single or Multiple UAVs
This work considers a large class of systems composed of multiple quadrotors manipulating deformable and extensible cables. The cable is described via a discretized representation, which decomposes it into linear springs interconnected through lumped-mass passive spherical joints. Sets of flat outputs are found for the systems. Numerical simulations support the findings by showing cable manipulation relying on flatness-based trajectories. Finally, we present an experimental validation of the effectiveness of the proposed discretized cable model for a two-robot example. Moreover, a closed-loop controller based on the identified model and using cable-output feedback is experimentally tested.
Multiple input tangential interpolation-driven damage detection of a jet trainer aircraft
The problem of damage detection and identification is of interest for many aerospace and aeronautical engineering systems. However, relevant literature mostly focuses on subsystems and parts, rather than full airframes. In structural dynamics, modal parameters, such as natural frequencies and mode shapes, from any structure are the main building blocks of vibration-based damage detection. However, traditional comparisons of these parameters are often ambiguous in complex systems, complicating damage detection and assessment. The modified total modal assurance criterion (MTMAC), an index well-known in the field of finite element model updating, is extended to address this challenge and is proposed as an index for damage identification and severity assessment. To support the requirement for precise and robust modal identification of Structural Health Monitoring (SHM), the improved Loewner Framework (iLF), known for its reliability and computational performance, is pioneeringly employed within SHM. Since the MTMAC is proposed solely as a damage identification and severity assessment index, the coordinate modal assurance criterion (COMAC), also a well-established tool, but for damage localisation using mode shapes, is used for completeness. The iLF SHM capabilities are validated through comparisons with traditional methods, including least-squares complex exponential and stochastic subspace identification with canonical variate analysis on a numerical case study of a cantilever beam. Furthermore, the MTMAC is validated against the traditional vibration-based approach, which involves directly comparing natural frequencies and mode shapes. Finally, an experimental dataset from a BAE Systems Hawk T1A jet trainer ground vibration test is used to demonstrate the iLF and MTMAC capabilities on a real-life, real-size SHM problem, showing their effectiveness in detecting and assessing damage.
General formulation of an analytic, Lipschitz continuous control allocation for thrust-vectored controlled rigid-bodies
This study introduces a systematic and scalable method for arbitrary rigid-bodies equipped with vectorized thrusters. Two novel solutions are proposed: a closed-form, Lipschitz continuous mapping that ensures smooth actuator orientation references, and a convex optimization formulation capable of handling practical actuator constraints such as thrust saturation and angular rate limits. Both methods leverage the null-space structure of the allocation mapping to perform singularity avoidance while generating sub-optimal yet practical solutions. The effectiveness and generality of the proposed framework are demonstrated through numerical simulations on a 3DOF marine vessel and a 6DOF aerial quadcopter.
Topology optimization of decoupling feeding networks for antenna arrays
Near-field and radiation coupling between nearby radiating elements is unavoidable, and it is considered a limiting factor for applications in wireless communications and active sensing. This article proposes a density-based topology optimization approach to design decoupling networks for such systems. The decoupling networks are designed based on a multi-objective optimization problem with the radiating elements replaced by their time-domain impulse response for efficient computations and to enable the solution of the design problem using gradient-based optimization methods. We use the adjoint-field method to compute the gradients of the optimization objectives. Additionally, nonlinear filters are applied during the optimization procedure to impose minimum-size control on the optimized designs. We demonstrate the concept by designing the decoupling network for a two-element planar antenna array; the antenna is designed in a separate optimization problem. The optimized decoupling networks provide a signal path that destructively interferes with the coupling between the radiating elements while preserving their individual matching to the feeding ports. Compact decoupling networks capable of suppressing the mutual coupling by more than 10 dB between two closely separated planar antennas operating around 2.45 GHz are presented and validated experimentally.
comment: Accepted version of the manuscript published in IEEE Transactions on Antennas and Propagation, 2025. The final version is available at https://doi.org/10.1109/TAP.2025.3621265
A Taylor Series Approach to Correction of Input Errors in Gaussian Process Regression
Gaussian Processes (GPs) are widely recognized as powerful non-parametric models for regression and classification. Traditional GP frameworks predominantly operate under the assumption that the inputs are either accurately known or subject to zero-mean noise. However, several real-world applications such as mobile sensors have imperfect localization, leading to inputs with biased errors. These biases can typically be estimated through measurements collected over time using, for example, Kalman filters. To avoid recomputation of the entire GP model when better estimates of the inputs used in the training data become available, we introduce a technique for updating a trained GP model to incorporate updated estimates of the inputs. By leveraging the differentiability of the mean and covariance functions derived from the squared exponential kernel, a second-order correction algorithm is developed to update the trained GP models. Precomputed Jacobians and Hessians of kernels enable real-time refinement of the mean and covariance predictions. The efficacy of the developed approach is demonstrated using two simulation studies, with error analyses revealing improvements in both predictive accuracy and uncertainty quantification.
comment: Improving the paper with better results and adding experimental results to publish again
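To make the second-order correction idea concrete, here is a minimal sketch, assuming scalar inputs and the squared exponential kernel k(x*, x) = sigma2 * exp(-(x* - x)^2 / (2 * ell^2)). It only illustrates how a kernel value can be updated from precomputed first and second derivatives when a training input shifts by a small amount `delta`; the function names, parameters, and numbers are hypothetical and this is not the paper's algorithm.

```python
import numpy as np

def se_kernel(x_star, x, sigma2=1.0, ell=1.0):
    """Squared exponential kernel between a test point and training inputs."""
    r = x_star - x
    return sigma2 * np.exp(-0.5 * r**2 / ell**2)

def se_kernel_taylor2(x_star, x, delta, sigma2=1.0, ell=1.0):
    """Second-order Taylor approximation of k(x_star, x + delta) around x."""
    r = x_star - x
    k = se_kernel(x_star, x, sigma2, ell)
    dk = k * r / ell**2                          # first derivative w.r.t. x
    d2k = k * (r**2 / ell**4 - 1.0 / ell**2)     # second derivative w.r.t. x
    return k + dk * delta + 0.5 * d2k * delta**2

# Illustrative values (assumed): training inputs and their corrected offsets.
x_train = np.array([0.0, 0.5, 1.2])
delta = np.array([0.05, -0.03, 0.04])
x_star = 0.8

exact = se_kernel(x_star, x_train + delta)           # recomputed from scratch
approx = se_kernel_taylor2(x_star, x_train, delta)   # corrected via derivatives
stale = se_kernel(x_star, x_train)                   # uncorrected kernel values
```

For small input corrections, `approx` should track `exact` much more closely than `stale` does, which is the motivation for reusing stored derivatives instead of rebuilding the model.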
Adaptive DRL for IRS Mirror Orientation in Dynamic OWC Networks
Intelligent reflecting surfaces (IRSs) have emerged as a promising solution to mitigate line-of-sight (LoS) blockages and enhance signal coverage in optical wireless communication (OWC) systems with minimal additional power. In this work, we consider a mirror-based IRS to assist a dynamic indoor visible light communication (VLC) environment. We formulate an optimization problem that aims to maximize the sum rate by adjusting the orientation of the IRS mirrors. To enable real-time adaptability, the problem is modelled as a Markov decision process (MDP), and a deep reinforcement learning (DRL) algorithm is developed based on the deterministic policy gradient for real-time mirror-based IRS optimization in dynamic VLC networks. The proposed DRL is employed to optimize mirror orientation toward mobile users under blockage and mobility constraints. Simulation results demonstrate that our proposed DRL algorithm outperforms the conventional deep Q-learning (DQL) algorithm and achieves substantial improvements in sum rate compared to random-orientation IRS configurations.
comment: 6 pages, 5 figures
PowerPlots.jl: An Open Source Power Grid Visualization and Data Analysis Framework for Academic Research
Data visualization is essential for developing an understanding of a complex system. The power grid is one of the most complex systems in the world and effective power grid research visualization software must 1) be easy to use, 2) support unique data that may arise in research, and 3) be capable of creating custom figures for publication and presentation. However, no current software addresses all three of these needs. PowerPlots is an open-source data visualization tool for power grids that does address these needs. In addition, several tools created to support this software facilitate the analysis of power grid data by transforming the data into graph topology or data-frame data formats that are more compatible for some analyses. In this work, we use PowerPlots to investigate several case studies that involve exploring power grid data. These case studies demonstrate the valuable insights that are possible when using network visualization and how it can be applied to research applications.
Adaptive Decentralized Queue Disclosure for Impatient Tenants in Edge and Non-terrestrial Systems
We study how queue-state information disclosures affect impatient tenants in multi-tenant edge systems. We propose an information-bulletin strategy in which each queue periodically broadcasts two Markov models. One is a model of steady-state service-rate behavior and the other a model of the queue length inter-change times. Tenants autonomously decide to renege or jockey based on this information. The queues observe tenant responses and adapt service rates via a learned, rule-based predictive policy designed for decentralized, partially-observed, and time-varying environments. We compare this decentralized, information-driven policy to the classical, centralized Markov Decision Process (MDP) hedging-point policy for M/M/2 systems. Numerical experiments quantify the tradeoffs in average delay, impatience and robustness to stale information. Results show that when full, instantaneous state information and stationarity hold, the hedging-point policy yields less impatience but this diminishes as information becomes partial or stale. The rule-based predictive policy on the other hand is more robust to staleness in dispatched information, making it conducive for conditions typical of edge cloud and non-terrestrial deployments.
comment: Accepted by NFV-SDN'25 Doctoral Symposium
Modeling nonuniform energy decay through the modal decomposition of acoustic radiance transfer (MoD-ART)
Modeling late reverberation in real-time interactive applications is a challenging task when multiple sound sources and listeners are present in the same environment. This is especially problematic when the environment is geometrically complex and/or features uneven energy absorption (e.g. coupled volumes), because in such cases the late reverberation is dependent on the sound sources' and listeners' positions, and therefore must be adapted to their movements in real time. We present a novel approach to the task, named modal decomposition of acoustic radiance transfer (MoD-ART), which can handle highly complex scenarios with efficiency. The approach is based on the geometrical acoustics method of acoustic radiance transfer, from which we extract a set of energy decay modes and their positional relationships with sources and listeners. In this paper, we describe the physical and mathematical significance of MoD-ART, highlighting its advantages and applicability to different scenarios. Through an analysis of the method's computational complexity, we show that it compares very favorably with ray-tracing. We also present simulation results showing that MoD-ART can capture multiple decay slopes and flutter echoes.
Model Predictive Inferential Control of Neural State-Space Models for Autonomous Vehicle Motion Planning
Model predictive control (MPC) has proven useful in enabling safe and optimal motion planning for autonomous vehicles. In this paper, we investigate how to achieve MPC-based motion planning when a neural state-space model represents the vehicle dynamics. As the neural state-space model will lead to highly complex, nonlinear and nonconvex optimization landscapes, mainstream gradient-based MPC methods will be computationally too heavy to be a viable solution. In a departure, we propose the idea of model predictive inferential control (MPIC), which seeks to infer the best control decisions from the control objectives and constraints. Following the idea, we convert the MPC problem for motion planning into a Bayesian state estimation problem. Then, we develop a new particle filtering/smoothing approach to perform the estimation. This approach is implemented as banks of unscented Kalman filters/smoothers and offers high sampling efficiency, fast computation, and estimation accuracy. We evaluate the MPIC approach through a simulation study of autonomous driving in different scenarios, along with an exhaustive comparison with gradient-based MPC. The results show that the MPIC approach has considerable computational efficiency, regardless of complex neural network architectures, and shows the capability to solve large-scale MPC problems for neural state-space models.
Systems and Control (EESS)
Analysis of the Geometric Heat Flow Equation: Computing Geodesics in Real-Time with Convergence Guarantees
We present an analysis of the convergence properties of the so-called geometric heat flow equation for computing geodesics (shortest-path curves) on Riemannian manifolds. Computing geodesics numerically in real-time has become an important capability in several fields, including control and motion planning. The geometric heat flow equation involves solving a parabolic partial differential equation whose solution is a geodesic. In practice, solving this PDE numerically can be done efficiently, and tends to be more numerically stable and exhibit a better rate of convergence compared to numerical optimization. We prove that the geometric heat flow equation is globally exponentially stable in $L_2$ if the curvature of the Riemannian manifold is not too positive, and that asymptotic convergence in $L_2$ is always guaranteed. We also present a pseudospectral method that leverages Chebyshev polynomials to accurately compute geodesics in only a few milliseconds for non-contrived manifolds. Our analysis was verified with our custom pseudospectral method by computing geodesics on common non-Euclidean surfaces, and in feedback for a contraction-based controller with a non-flat metric for a nonlinear system.
Ego-Vision World Model for Humanoid Contact Planning
Enabling humanoid robots to exploit physical contact, rather than simply avoid collisions, is crucial for autonomy in unstructured environments. Traditional optimization-based planners struggle with contact complexity, while on-policy reinforcement learning (RL) is sample-inefficient and has limited multi-task ability. We propose a framework combining a learned world model with sampling-based Model Predictive Control (MPC), trained on a demonstration-free offline dataset to predict future outcomes in a compressed latent space. To address sparse contact rewards and sensor noise, the MPC uses a learned surrogate value function for dense, robust planning. Our single, scalable model supports contact-aware tasks, including wall support after perturbation, blocking incoming objects, and traversing height-limited arches, with improved data efficiency and multi-task capability over on-policy RL. Deployed on a physical humanoid, our system achieves robust, real-time contact planning from proprioception and ego-centric depth images. Website: https://ego-vcp.github.io/
Smooth Spatiotemporal Tube Synthesis for Prescribed-Time Reach-Avoid-Stay Control
In this work, we address the issue of controller synthesis for a control-affine nonlinear system to meet prescribed time reach-avoid-stay specifications. Our goal is to improve upon previous methods based on spatiotemporal tubes (STTs) by eliminating the need for circumvent functions, which often lead to abrupt tube modifications and high control effort. We propose an adaptive framework that constructs smooth STTs around static unsafe sets, enabling continuous avoidance while guiding the system toward the target within the prescribed time. A closed-form, approximation-free control law is derived to ensure the system trajectory remains within the tube and satisfies the RAS task. The effectiveness of the proposed approach is demonstrated through a case study, showing a significant reduction in control effort compared to prior methods.
IntersectioNDE: Learning Complex Urban Traffic Dynamics based on Interaction Decoupling Strategy
Realistic traffic simulation is critical for ensuring the safety and reliability of autonomous vehicles (AVs), especially in complex and diverse urban traffic environments. However, existing data-driven simulators face two key challenges: a limited focus on modeling dense, heterogeneous interactions at urban intersections - which are prevalent, crucial, and practically significant in countries like China, featuring diverse agents including motorized vehicles (MVs), non-motorized vehicles (NMVs), and pedestrians - and the inherent difficulty in robustly learning high-dimensional joint distributions for such high-density scenes, often leading to mode collapse and long-term simulation instability. We introduce City Crossings Dataset (CiCross), a large-scale dataset collected from a real-world urban intersection, uniquely capturing dense, heterogeneous multi-agent interactions, particularly with a substantial proportion of MVs, NMVs and pedestrians. Based on this dataset, we propose IntersectioNDE (Intersection Naturalistic Driving Environment), a data-driven simulator tailored for complex urban intersection scenarios. Its core component is the Interaction Decoupling Strategy (IDS), a training paradigm that learns compositional dynamics from agent subsets, enabling the marginal-to-joint simulation. Integrated into a scene-aware Transformer network with specialized training techniques, IDS significantly enhances simulation robustness and long-term stability for modeling heterogeneous interactions. Experiments on CiCross show that IntersectioNDE outperforms baseline methods in simulation fidelity, stability, and its ability to replicate complex, distribution-level urban traffic dynamics.
comment: Accepted by ITSC 2025
A Physics-Informed Reinforcement Learning Approach for Degradation-Aware Long-Term Charging Optimization in Batteries
Batteries degrade with usage and continuous cycling. This aging is typically reflected through the resistance growth and the capacity fade of battery cells. Over the years, various charging methods have been presented in the literature that proposed current profiles in order to enable optimal, fast, and/or health-conscious charging. However, very few works have attempted to make the ubiquitous Constant Current Constant Voltage (CCCV) charging protocol adaptive to the changing battery health as it cycles. This work aims to address this gap and proposes a framework that optimizes the constant current part of the CCCV protocol adapting to long-term battery degradation. Specifically, a physics-informed Reinforcement Learning (RL) approach has been used that not only estimates a key battery degradation mechanism, namely, Loss of Active Material (LAM), but also adjusts the current magnitude of CCCV as a result of this particular degradation. The proposed framework has been implemented by combining PyBamm, an open-source battery modeling tool, and Stable-baselines where the RL agent was trained using a Proximal Policy Optimization (PPO) network. Simulation results show the potential of the proposed framework for enhancing the widely used CCCV protocol by embedding physics information in the RL algorithm. The proposed agent is also compared with two other charging protocols: one generated by a non-physics-based RL agent and a constant CCCV applied across all cycles.
Constraint-Aware Reinforcement Learning via Adaptive Action Scaling
Safe reinforcement learning (RL) seeks to mitigate unsafe behaviors that arise from exploration during training by reducing constraint violations while maintaining task performance. Existing approaches typically rely on a single policy to jointly optimize reward and safety, which can cause instability due to conflicting objectives, or they use external safety filters that override actions and require prior system knowledge. In this paper, we propose a modular cost-aware regulator that scales the agent's actions based on predicted constraint violations, preserving exploration through smooth action modulation rather than overriding the policy. The regulator is trained to minimize constraint violations while avoiding degenerate suppression of actions. Our approach integrates seamlessly with off-policy RL methods such as SAC and TD3, and achieves state-of-the-art return-to-cost ratios on Safety Gym locomotion tasks with sparse costs, reducing constraint violations by up to 126 times while increasing returns by over an order of magnitude compared to prior methods.
The Role of Flexible Connection in Accelerating Load Interconnection in Distribution Networks
This paper investigates the role of flexible connection in accelerating the interconnection of large loads amid rising electricity demand from data centers and electrification. Flexible connection allows new loads to defer or curtail consumption during rare, grid-constrained periods, enabling faster access without major infrastructure upgrades. To quantify how flexible connection unlocks load hosting capacity, we formulate a flexibility-aware hosting capacity analysis problem that explicitly limits the number of utility-controlled interventions per year, ensuring infrequent disruption. Efficient solution methods are developed for this nonconvex problem and applied to real load data and test feeders. Empirical results reveal that modest flexibility, i.e., few interventions with small curtailments or delays, can unlock substantial hosting capacity. Theoretical analysis further explains and generalizes these findings, highlighting the broad potential of flexible connection.
A Faster and More Reliable Middleware for Autonomous Driving Systems
Ensuring safety in high-speed autonomous vehicles requires rapid control loops and tightly bounded delays from perception to actuation. Many open-source autonomy systems rely on ROS 2 middleware; when multiple sensor and control nodes share one compute unit, ROS 2 and its DDS transports add significant (de)serialization, copying, and discovery overheads, shrinking the available time budget. We present Sensor-in-Memory (SIM), a shared-memory transport designed for intra-host pipelines in autonomous vehicles. SIM keeps sensor data in native memory layouts (e.g., cv::Mat, PCL), uses lock-free bounded double buffers that overwrite old data to prioritize freshness, and integrates into ROS 2 nodes with four lines of code. Unlike traditional middleware, SIM operates beside ROS 2 and is optimized for applications where data freshness and minimal latency outweigh guaranteed completeness. SIM provides sequence numbers, a writer heartbeat, and optional checksums to ensure ordering, liveness, and basic integrity. On an NVIDIA Jetson Orin Nano, SIM reduces data-transport latency by up to 98% compared to ROS 2 zero-copy transports such as FastRTPS and Zenoh, lowers mean latency by about 95%, and narrows 95th/99th-percentile tail latencies by around 96%. In tests on a production-ready Level 4 vehicle running Autoware.Universe, SIM increased localization frequency from 7.5 Hz to 9.5 Hz. Applied across all latency-critical modules, SIM cut average perception-to-decision latency from 521.91 ms to 290.26 ms, reducing emergency braking distance at 40 mph (64 km/h) on dry concrete by 13.6 ft (4.14 m).
comment: 8 pages, 7 figures, 8 tables
Trajectory control of a suspended load with non-stopping flying carriers
This paper presents the first closed-loop control framework for cooperative payload transportation with non-stopping flying carriers. Building upon grasp-matrix formulations and internal force redundancy, we propose a feedback wrench controller that actively regulates the payload's pose while an optimization layer dynamically shapes internal-force oscillations to guarantee persistent carrier motion. Preliminary experimental results on multirotor UAVs validate the model assumptions, and numerical simulations demonstrate that the method successfully prevents carrier stagnation, achieves accurate load tracking, and generates physically feasible trajectories with smooth velocity profiles. The proposed framework not only advances the state of the art but also offers a reliable, versatile solution for future real-world applications requiring load transportation by coordinated non-stopping flying carriers.
Robust Recovery and Control of Cyber-physical Discrete Event Systems under Actuator Attacks
Critical real-world applications strongly rely on Cyber-physical systems (CPS), but their dependence on communication networks introduces significant security risks, as attackers can exploit vulnerabilities to compromise their integrity and availability. This work explores the topic of cybersecurity in the context of CPS modeled as discrete event systems (DES), focusing on recovery strategies following the detection of cyberattacks. Specifically, we address actuator enablement attacks and propose a method that preserves the system's full valid behavior under normal conditions. Upon detecting an attack, our proposed solution aims to guide the system toward a restricted yet robust behavior, ensuring operational continuity and resilience. Additionally, we introduce a property termed AE-robust recoverability, which characterizes the necessary and sufficient conditions for recovering a system from attacks while preventing further vulnerabilities. Finally, we showcase the proposed solution through a case study based on a manufacturing system.
comment: This work has been accepted for publication in the 64th IEEE Conference on Decision and Control (CDC). The final published version will be available on IEEE Xplore
Robust Closed-Form Control for MIMO Nonlinear Systems under Conflicting Time-Varying Hard and Soft Constraints
This paper introduces a novel robust closed-form control law to handle time-varying hard and soft constraints in uncertain high-relative-degree nonlinear MIMO systems. These constraints represent spatiotemporal specifications in mechanical systems' operational space, with hard constraints ensuring safety-critical requirements and soft constraints encoding performance or task objectives. Initially, all constraints are consolidated into two separate scalar time-varying hard and soft constraint functions, whose positive level sets define feasible regions. A closed-form control law is developed to enforce these constraints using appropriately designed reciprocal barriers and nonlinear transformation functions. When conflicts between hard and soft constraints arise, the control law prioritizes hard constraints by virtually relaxing soft constraints via a dynamic relaxation law. Notably, the proposed control law maintains low complexity by avoiding approximation schemes for coping with system uncertainties. Simulation results confirm the effectiveness of the proposed method.
comment: 18 pages, 6 figures
Data-Driven Estimation of Quadrotor Motor Efficiency via Residual Minimization
A data-driven framework is proposed for online estimation of quadrotor motor efficiency via residual minimization. The problem is formulated as a constrained nonlinear optimization that minimizes trajectory residuals between measured flight data and predictions generated by a quadrotor dynamics model. A sliding-window strategy enables online estimation, and the optimization is efficiently solved using an iteratively reweighted least squares (IRLS) scheme combined with a primal-dual interior-point method, with inequality constraints enforced through a logarithmic barrier function. Robust z-score weighting is employed to reject outliers, which is particularly effective in motor clipping scenarios where the proposed estimator exhibits smaller spikes than an EKF baseline. Compared to traditional filter-based approaches, the batch-mode formulation offers greater flexibility by selectively incorporating informative data segments. This structure is well-suited for onboard implementation, particularly for applications such as fault detection and isolation (FDI), health monitoring, and predictive maintenance in aerial robotic systems. Simulation results under various degradation scenarios demonstrate the accuracy and robustness of the proposed estimator.
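The reweighting step described above can be illustrated with a minimal sketch. It assumes a deliberately simplified scalar model in which a measured output `y` equals an unknown efficiency `eta` times a commanded input `u` plus noise, and it uses a median/MAD-based robust z-score with Huber-style down-weighting. The model, the threshold `c`, and all names are assumptions for illustration; the paper's constrained formulation, interior-point solver, and quadrotor dynamics are not reproduced here.

```python
import numpy as np

def robust_z(residuals):
    """Robust z-score based on the median and the median absolute deviation."""
    med = np.median(residuals)
    mad = np.median(np.abs(residuals - med))
    return 0.6745 * (residuals - med) / max(mad, 1e-12)

def irls_efficiency(u, y, iters=20, c=2.5):
    """Estimate eta in y ~ eta * u by iteratively reweighted least squares."""
    w = np.ones_like(y)
    eta = 1.0
    for _ in range(iters):
        # Weighted least-squares solution for the scalar parameter.
        eta = np.sum(w * u * y) / np.sum(w * u * u)
        # Down-weight samples whose residual has a large robust z-score.
        z = np.abs(robust_z(y - eta * u))
        w = c / np.maximum(z, c)   # weight 1 inside the threshold, c/|z| outside
    return eta

# Synthetic data (assumed): true efficiency 0.8 with a few gross outliers.
rng = np.random.default_rng(0)
u = rng.uniform(1.0, 5.0, size=200)
y = 0.8 * u + 0.01 * rng.standard_normal(200)
y[:10] += 3.0                      # outliers, e.g. from saturated measurements

eta_ls = np.sum(u * y) / np.sum(u * u)   # ordinary least squares
eta_irls = irls_efficiency(u, y)         # robustly reweighted estimate
```

In this toy setting the ordinary least-squares estimate is pulled away from the true value by the outliers, while the reweighted estimate stays close to it.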
High-Order Quarter-Wave Plate Optimization for Linear Birefringence Suppression in Reflective FOCS
Fiber optic current sensors (FOCS) are widely adopted in modern power grids due to high sensitivity, excellent insulation, and strong immunity to electromagnetic interference. This prominence necessitates precise investigation into their error sources and corresponding optimization. This study examines reflective FOCS based on the Faraday effect. A theoretical model is established to simulate phase error caused by linear birefringence from the quarter-wave plate. Conventional methods using circular birefringence are analyzed, revealing inherent limitations. Innovatively, a compensation strategy employing high-order quarter-wave plates is proposed to effectively eliminate linear birefringence effects. This approach significantly enhances the accuracy and practicality of FOCS in precision metrology.
Exponential convergence of multiagent systems with lack of connection
Finding conditions ensuring consensus, i.e. convergence to a common value, for a networked system is of crucial interest, both for theoretical reasons and applications. This goal is harder to achieve when connections between agents are temporarily lost. Here, we prove that known conditions (introduced by Moreau) ensure an exponential convergence to consensus, with explicit rate of convergence. The key result is related to the length of the graph (i.e. the number of connections to reach a common agent): if this is large, then convergence is slow. This general result also provides conditions for convergence of second-order cooperative systems with lack of connections.
Efficient LLM Inference over Heterogeneous Edge Networks with Speculative Decoding
Large language model (LLM) inference at the network edge is a promising serving paradigm that leverages distributed edge resources to run inference near users and enhance privacy. Existing edge-based LLM inference systems typically adopt autoregressive decoding (AD), which only generates one token per forward pass. This iterative process, compounded by the limited computational resources of edge nodes, results in high serving latency and constrains the system's ability to support multiple users under growing demands. To address these challenges, we propose a speculative decoding (SD)-based LLM serving framework that deploys small and large models across heterogeneous edge nodes to collaboratively deliver inference services. Specifically, the small model rapidly generates draft tokens that the large model verifies in parallel, enabling multi-token generation per forward pass and thus reducing serving latency. To improve resource utilization of edge nodes, we incorporate pipeline parallelism to overlap drafting and verification across multiple inference tasks. Based on this framework, we analyze and derive a comprehensive latency model incorporating both communication and inference latency. Then, we formulate a joint optimization problem for speculation length, task batching, and wireless communication resource allocation to minimize total serving latency. To address this problem, we derive the closed-form solutions for wireless communication resource allocation, and develop a dynamic programming algorithm for joint batching and speculation control strategies. Experimental results demonstrate that the proposed framework achieves lower serving latency compared to AD-based serving systems. In addition, the proposed joint optimization method delivers up to 44.9% latency reduction compared to benchmark schemes.
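The draft-then-verify loop at the heart of speculative decoding can be sketched as follows. This is a toy illustration under stated assumptions: both "models" are hypothetical deterministic functions over integer tokens, verification is greedy (a draft token is accepted only if it matches the large model's own choice), and the parallel verification pass is simulated with a loop. Pipeline parallelism, batching, and wireless resource allocation from the abstract are not modeled.

```python
def target_next(seq):
    """Hypothetical 'large' model: a deterministic toy next-token rule."""
    return (sum(seq) * 31 + len(seq)) % 7

def draft_next(seq):
    """Hypothetical 'small' model: agrees with the target except at some positions."""
    t = target_next(seq)
    return t if len(seq) % 4 else (t + 1) % 7

def autoregressive(prompt, n):
    """Baseline: one target-model pass per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(target_next(seq))
    return seq

def speculative(prompt, n, gamma=3):
    """Greedy speculative decoding with speculation length gamma."""
    seq = list(prompt)
    goal = len(prompt) + n
    passes = 0
    while len(seq) < goal:
        # Drafting: the small model proposes gamma tokens sequentially.
        draft = []
        for _ in range(gamma):
            draft.append(draft_next(seq + draft))
        # Verification: counted as a single large-model pass.
        passes += 1
        accepted = []
        for tok in draft:
            expected = target_next(seq + accepted)
            if tok != expected:
                accepted.append(expected)   # replace the first mismatch and stop
                break
            accepted.append(tok)
        else:
            # All drafts accepted: the verification pass yields one extra token.
            accepted.append(target_next(seq + accepted))
        seq.extend(accepted)
    return seq[:goal], passes

out, passes = speculative([1, 2, 3], 12)
baseline = autoregressive([1, 2, 3], 12)
```

With greedy verification the output matches the large model's own autoregressive output, while the number of large-model passes is smaller than the number of generated tokens whenever some drafts are accepted.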
pyspect: An Extensible Toolbox for Automatic Construction of Temporal Logic Trees via Reachability Analysis
In this paper, we present pyspect, a Python toolbox that simplifies the use of reachability analysis for temporal logic problems. Currently, satisfying complex requirements in cyber-physical systems requires significant manual effort and domain expertise to develop the underlying reachability programs. This high development effort limits the broader adoption of reachability analysis for complex verification problems. To address this, pyspect provides a method-agnostic approach to performing reachability analysis for verifying a temporal logic specification via temporal logic trees (TLTs). It enables the specification of complex safety and liveness requirements using high-level logic formulations that are independent of any particular reachability technique or set representation. As a result, pyspect allows for the comparison of different reachability implementations, such as Hamilton-Jacobi and Hybrid Zonotope-based reachability analysis, for the same temporal logic specification. This design separates the concerns of implementation developers (who develop numerical procedures for reachability) and end-users (who write specifications). Through a simple vehicle example, we demonstrate how pyspect simplifies the synthesis of reachability programs, promotes specification reusability, and facilitates side-by-side comparisons of reachability techniques for complex tasks.
comment: To be published in the 64th IEEE Conference on Decision and Control
Edge-to-Cloud Computations-as-a-Service in Software-Defined Energy Networks for Smart Grids
Modern power grids face an acute mismatch between where data is generated and where it can be processed: protection relays, EV (Electric Vehicle) charging, and distributed renewables demand millisecond analytics at the edge, while energy-hungry workloads often sit in distant clouds, leading to missed real-time deadlines and wasted power. We address this by proposing, to our knowledge, the first-ever SDEN (Software Defined Energy Network) for CaaS (Computations-as-a-Service) that unifies edge, fog, and cloud compute with 5G URLLC (Ultra-Reliable Low-Latency Communications), SDN (Software Defined Networking), and NFV (Network Functions Virtualization) to co-optimize energy, latency, and reliability end-to-end. Our contributions are threefold: (i) a joint task offloading formulation that couples computation placement with network capacity under explicit URLLC constraints; (ii) a feasibility-preserving, lightweight greedy heuristic that scales while closely tracking optimal energy and latency trade-offs; and (iii) a tiered AI (Artificial Intelligence) pipeline (reactive at the edge, predictive in the fog, strategic in the cloud) featuring privacy-preserving, federated GNNs (Graph Neural Networks) for fault detection and microgrid coordination. Unlike prior edge-only or cloud-only schemes, SDEN turns fragmented grid compute into a single, programmable substrate that delivers dependable, energy-aware, real-time analytics, establishing a first-ever, software-defined path to practical, grid-scale CaaS.
Utilizing Bayesian Optimization for Timetable-Independent Railway Junction Performance Determination
The efficiency of railway infrastructure is significantly influenced by the mix of trains that utilize it, as different service types have competing operational requirements. While freight services might require extended service times, passenger services demand more predictable schedules. Traditional methods for addressing long-term traffic assignment problems often rely on fixed-value capacity limitations, determined based on specific assumptions about traffic composition. This paper introduces a methodology for determining timetable-independent capacity within the traffic rate assignment problem, enabling the calculation of junction capacities under dynamic traffic distributions. We solve the underlying non-linear constrained optimization problem maximizing the traffic throughput using Bayesian optimization (BO). This setting combines a known objective function with expensive-to-compute capacity constraints, motivating an adaptation of standard BO problems, where objective functions are usually unknown. We tailor the acquisition process in BO to this specific setting and increase performance by incorporating prior knowledge about the shape of the constraint functions into the Gaussian process surrogate model. Our derived approaches are benchmarked on a railway junction near Paris, significantly outperforming fixed traffic composition models and highlighting the benefits of dynamic capacity allocation.
Visible Light Communication for Vehicular Networks: A Tutorial
The advent of the fifth-generation technology promises to bring about more vertical applications and emerging services that include vehicular networks and intelligent transportation systems (ITSs). To achieve their vision of real-time and safety applications, vehicular networks rely on short-range to medium-range communications. One emerging technology that aims to provide reliability and high data rates in short-range communications is visible light communication (VLC). Due to its remarkable advantages, some studies have recently investigated the integration of VLC in vehicular networks and ITSs. Despite their attractive features, such networks also face several implementation issues. This paper provides an extended tutorial on the implementation of VLC-based vehicular networks. To begin with, we present the implementation characteristics of these systems and discuss some related issues. The underlying system considers a general structure with transmitters, channels, and receivers based on photodetectors and cameras, as well as standardization efforts and types of topologies. In addition, we discuss the impact of the sun and artificial light sources, flickering, dimming, throughput enhancement, uplink security, and mobility on practical implementation. Finally, we highlight some key challenges and potential solutions and provide some directions for future research investigations that could constitute an advancement toward the development of commercial VLC-based vehicular systems.
Establishing assembly-oriented modular product architectures through Design for Assembly enhanced Modular Function Deployment
Modular product design has become a strategic enabler for companies seeking to balance product variety, operational efficiency, and market responsiveness, making the alignment between modular architecture and manufacturing considerations increasingly critical. Modular Function Deployment (MFD) is a widely adopted method for defining modular product architectures, yet it lacks systematic support for assembly considerations during early concept and system-level development. This limitation increases the risk of delayed production ramp-up and lifecycle inefficiencies. This paper proposes a set of enhancements to MFD that integrate Design for Assembly (DFA) logic into architectural synthesis. The extended method introduces structured heuristics, assembly-oriented module drivers, a coded interface taxonomy, and quantitative metrics for assessing assembly feasibility and automation readiness. These additions preserve compatibility with standard MFD workflows while enriching decision-making with traceable, production-informed reasoning. An illustrative case study involving a handheld leaf blower demonstrates the method's usability and effectiveness. The redesigned architecture shows reduced assembly effort, simplified interfaces, and increased automation potential. By supporting early-stage evaluation of architectural alternatives through an assembly lens, the method enables faster transition to efficient volume production and provides a foundation for continuous improvement throughout the product lifecycle.
PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System
Deploying humanoid robots to interact with real-world environments--such as carrying objects or sitting on chairs--requires generalizable, lifelike motions and robust scene perception. Although prior approaches have advanced each capability individually, combining them in a unified system is still an ongoing challenge. In this work, we present a physical-world humanoid-scene interaction system, PhysHSI, that enables humanoids to autonomously perform diverse interaction tasks while maintaining natural and lifelike behaviors. PhysHSI comprises a simulation training pipeline and a real-world deployment system. In simulation, we adopt adversarial motion prior-based policy learning to imitate natural humanoid-scene interaction data across diverse scenarios, achieving both generalization and lifelike behaviors. For real-world deployment, we introduce a coarse-to-fine object localization module that combines LiDAR and camera inputs to provide continuous and robust scene perception. We validate PhysHSI on four representative interactive tasks--box carrying, sitting, lying, and standing up--in both simulation and real-world settings, demonstrating consistently high success rates, strong generalization across diverse task goals, and natural motion patterns.
comment: Project website: https://why618188.github.io/physhsi/
Optimal Multi-Modal Transportation and Electric Power Flow: The Value of Coordinated Dynamic Operation
The electrification of transportation represents a critical challenge in the global transition toward net-zero emissions, as the sector often accounts for more than one-quarter of national energy consumption. Achieving this transformation requires not only widespread adoption of electric vehicles (EVs) but also their seamless integration into interdependent infrastructure systems, specifically the transportation-electricity nexus (TEN). This paper develops an optimal multi-modal transportation and electric power flow (OMTEPF) model to evaluate the benefits of coordinated, dynamic system operation. Building on recent advances in hetero-functional graph theory, the framework enables joint optimization of five key operational decisions in intelligent TEN management: vehicle dispatch, route choice, charging station queuing, coordinated charging, and vehicle-to-grid stabilization. The mesoscopic, dynamic model explicitly represents individual EVs and their state-of-charge trajectories, thereby extending beyond the prevailing literature's focus on static, macroscopic traffic assignment. It further captures the full scope of the TEN as a system-of-systems, incorporating five distinct charging modalities: private residential, private commercial, wired public commercial, inductive public, and discharging. On the power system side, an IV-ACOPF formulation ensures globally optimal solutions to the electrical subproblems. Comparative analysis demonstrates the substantial value of coordinated TEN operation relative to the status quo of siloed, uncoordinated infrastructure management. This work provides both a novel methodological contribution and actionable insights for the co-design and operation of next-generation sustainable mobility-energy systems.
comment: 31 pages, 9 figures
Observability and parameter estimation of a generic model for aggregated distributed energy resources
We propose a novel framework for estimating the parameters of an aggregated distributed energy resources (der_a) model. First, we introduce a rigorous method to determine whether all model parameters are estimable. When they are not, our approach identifies the subset of parameters that can be estimated. The proposed framework offers new insights into the number and specific parameters that can be reliably estimated based on commonly available measurements. It also highlights the limitations of calibrating such models. Second, we introduce a Kalman filtering method to calibrate the der_a model. Since we account for nonlinear effects such as saturation and deadbands, we develop a specific mechanism to handle smoothing functions within the Kalman filter. Specifically, we consider the extended and the unscented Kalman filter. We demonstrate the effectiveness of the proposed framework on a modified IEEE 34-node distribution feeder with inverter-based resources. Our findings align with the North American Electric Reliability Corporation's parameterization guideline and underscore the importance of model calibration in accurately capturing the collective dynamics of distributed energy resources installed on distribution systems.
Quantum Deception: Honey-X Deception using Quantum Games
In this paper, we develop a framework for deception in quantum games, extending the Honey-X paradigm from classical zero-sum settings into the quantum domain. Building on a view of deception in classical games as manipulation of a player's perception of the payoff matrix, we formalize quantum deception as controlled perturbations of the payoff Hamiltonian subject to a deception budget. We show that when victims are aware of possible deception, their equilibrium strategies surprisingly coincide with those of naive victims who fully trust the deceptive Hamiltonian. This equivalence allows us to cast quantum deception as a bilevel optimization problem, which can be reformulated into a bilinear semidefinite program. To illustrate the framework, we present simulations on quantum versions of the Penny Flip game, demonstrating how quantum strategy spaces and non-classical payoffs can amplify the impact of deception relative to classical formulations.
comment: Submitted to 2026 American Control Conference (ACC), New Orleans, LA
Coherent Load Profile Synthesis with Conditional Diffusion for LV Distribution Network Scenario Generation
Limited visibility of power distribution network power flows at the low voltage level presents challenges to both distribution network operators from a planning perspective and distribution system operators from a congestion management perspective. Forestalling these challenges through scenario analysis is confounded by the lack of realistic and coherent load data across representative distribution feeders. Load profiling approaches often rely on summarising demand through typical profiles, which oversimplifies the complexity of substation-level operations and limits their applicability in specific power system studies. Sampling methods, and more recently generative models, have attempted to address this through synthesising representative loads from historical exemplars; however, while these approaches can approximate load shapes to a convincing degree of fidelity, the co-behaviour between substations, which ultimately impacts higher voltage level network operation, is often overlooked. This limitation will become even more pronounced with the increasing integration of low-carbon technologies, as estimates of base loads fail to capture load diversity. To address this gap, a Conditional Diffusion model for synthesising daily active and reactive power profiles at the low voltage distribution substation level is proposed. The evaluation of fidelity is demonstrated through conventional metrics capturing temporal and statistical realism, as well as power flow modelling. The results show synthesised load profiles are plausible both independently and as a cohort in a wider power systems context. The Conditional Diffusion model is benchmarked against both naive and state-of-the-art models to demonstrate its effectiveness in producing realistic scenarios on which to base sub-regional power distribution network planning and operations.
ORN-CBF: Learning Observation-conditioned Residual Neural Control Barrier Functions via Hypernetworks
Control barrier functions (CBFs) have been demonstrated as an effective method for safety-critical control of autonomous systems. Although CBFs are simple to deploy, their design remains challenging, motivating the development of learning-based approaches. Yet, issues such as suboptimal safe sets, applicability in partially observable environments, and lack of rigorous safety guarantees persist. In this work, we propose observation-conditioned neural CBFs based on Hamilton-Jacobi (HJ) reachability analysis, which approximately recover the maximal safe sets. We exploit certain mathematical properties of the HJ value function, ensuring that the predicted safe set never intersects with the observed failure set. Moreover, we leverage a hypernetwork-based architecture that is particularly suitable for the design of observation-conditioned safety filters. The proposed method is examined both in simulation and hardware experiments for a ground robot and a quadcopter. The results show improved success rates and generalization to out-of-domain environments compared to the baselines.
Learning Satellite Attitude Dynamics with Physics-Informed Normalising Flow
Attitude control is a fundamental aspect of spacecraft operations. Model Predictive Control (MPC) has emerged as a powerful strategy for these tasks, relying on accurate models of the system dynamics to optimize control actions over a prediction horizon. In scenarios where physics models are incomplete, difficult to derive, or computationally expensive, machine learning offers a flexible alternative by learning the system behavior directly from data. However, purely data-driven models often struggle with generalization and stability, especially when applied to inputs outside their training domain. To address these limitations, we investigate the benefits of incorporating Physics-Informed Neural Networks (PINNs) into the learning of spacecraft attitude dynamics, comparing their performance with that of purely data-driven approaches. Using a Real-valued Non-Volume Preserving (Real NVP) neural network architecture with a self-attention mechanism, we trained several models on simulated data generated with the Basilisk simulator. Two training strategies were considered: a purely data-driven baseline and a physics-informed variant to improve robustness and stability. Our results demonstrate that the inclusion of physics-based information improves the mean relative error of the best architectures found by 27.08%. These advantages are particularly evident when the learned models are integrated into an MPC framework, where PINN-based models consistently outperform their purely data-driven counterparts in terms of control accuracy and robustness, yielding improvements of up to 42.86% in performance stability error and increased robustness to noise.
A Unified Framework for Innovation-based Stochastic and Deterministic Event Triggers
Resources such as bandwidth and energy are limited in many wireless communications use cases, especially when large numbers of sensors and fusion centers need to exchange information frequently. One opportunity to overcome resource constraints is the use of event-based transmissions and estimation to transmit only information that contributes significantly to the reconstruction of the system's state. The design of efficient triggering policies and estimators is crucial for successful event-based transmissions. While previously deterministic and stochastic event triggering policies have been treated separately, this paper unifies the two approaches and gives insights into the design of consistent trigger-matching estimators. Two different estimators are presented, and different pairs of triggers and estimators are evaluated through simulation studies.
comment: 8 pages, 5 figures, presented at FUSION 2025
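The abstract above does not spell out the triggers themselves, so the following is only a minimal sketch of the two families for a scalar innovation, using forms that are common in the event-triggered estimation literature. The function names, the `threshold` and `weight` parameters, and the exponential acceptance rule are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def deterministic_trigger(innovation, innovation_var, threshold):
    """Send when the normalized squared innovation exceeds a fixed threshold.

    All parameter names are hypothetical; the paper's trigger may differ.
    """
    return innovation ** 2 / innovation_var > threshold


def stochastic_trigger(innovation, weight, rng=random):
    """Send with probability 1 - exp(-0.5 * weight * innovation**2).

    A uniform sample u in [0, 1) is compared against a Gaussian-shaped
    acceptance function, so small innovations are rarely transmitted.
    """
    u = rng.random()
    return u > math.exp(-0.5 * weight * innovation ** 2)


if __name__ == "__main__":
    rng = random.Random(0)
    sent = sum(stochastic_trigger(2.0, 1.0, rng) for _ in range(20000))
    print("deterministic:", deterministic_trigger(3.0, 1.0, 4.0))
    print("empirical send rate:", sent / 20000)
    print("theoretical send rate:", 1.0 - math.exp(-2.0))
```

In this sketch the deterministic rule is a hard cut on the innovation, while the stochastic rule makes the decision itself random. The estimator then has to be matched to whichever rule is in use, which is the design question the abstract raises.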
Product Digital Twin Supporting End-of-life Phase of Electric Vehicle Batteries Utilizing Product-Process-Resource Asset Network
In a circular economy, products in their end-of-life phase should be either remanufactured or recycled. Both of these processes are crucial for sustainability and environmental conservation. However, manufacturers frequently provide insufficient support for these processes because they do not share relevant data about the products or their (re-)manufacturing processes. This paper proposes to accompany each product with a digital twin technology, specifically the Product Digital Twin (PDT), which can carry information for facilitating and optimizing production and remanufacturing processes. This paper introduces a knowledge representation called Bi-Flow Product-Process-Resource Asset Network (Bi-PAN). Bi-PAN extends a well-proven Product-Process-Resource Asset Network (PAN) paradigm by integrating both assembly and disassembly workflows into a single information model. Such networks enable capturing relevant relationships across products, production resources, manufacturing processes, and specific production operations that have to be done in the manufacturing phase of a product. The proposed approach is demonstrated in a use-case of disassembling electric vehicle (EV) batteries. By utilizing PDTs with Bi-PAN knowledge models, challenges associated with disassembling of EV batteries can be solved flexibly and efficiently for various battery types, enhancing the sustainability of the EV battery life-cycle management.
comment: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Manipulation of Elasto-Flexible Cables with Single or Multiple UAVs
This work considers a large class of systems composed of multiple quadrotors manipulating deformable and extensible cables. The cable is described via a discretized representation, which decomposes it into linear springs interconnected through lumped-mass passive spherical joints. Sets of flat outputs are found for the systems. Numerical simulations support the findings by showing cable manipulation relying on flatness-based trajectories. Finally, we present an experimental validation of the effectiveness of the proposed discretized cable model for a two-robot example. Moreover, a closed-loop controller based on the identified model and using cable-output feedback is experimentally tested.
Multiple input tangential interpolation-driven damage detection of a jet trainer aircraft
The problem of damage detection and identification is of interest for many aerospace and aeronautical engineering systems. However, relevant literature mostly focuses on subsystems and parts, rather than full airframes. In structural dynamics, modal parameters, such as natural frequencies and mode shapes, from any structure are the main building blocks of vibration-based damage detection. However, traditional comparisons of these parameters are often ambiguous in complex systems, complicating damage detection and assessment. The modified total modal assurance criterion (MTMAC), an index well-known in the field of finite element model updating, is extended to address this challenge and is proposed as an index for damage identification and severity assessment. To support the requirement for precise and robust modal identification in Structural Health Monitoring (SHM), the improved Loewner Framework (iLF), known for its reliability and computational performance, is employed within SHM for the first time. Since the MTMAC is proposed solely as a damage identification and severity assessment index, the coordinate modal assurance criterion (COMAC), also a well-established tool, but for damage localisation using mode shapes, is used for completeness. The iLF SHM capabilities are validated through comparisons with traditional methods, including least-squares complex exponential and stochastic subspace identification with canonical variate analysis on a numerical case study of a cantilever beam. Furthermore, the MTMAC is validated against the traditional vibration-based approach, which involves directly comparing natural frequencies and mode shapes. Finally, an experimental dataset from a BAE Systems Hawk T1A jet trainer ground vibration test is used to demonstrate the iLF and MTMAC capabilities on a real-life, real-size SHM problem, showing their effectiveness in detecting and assessing damage.
General formulation of an analytic, Lipschitz continuous control allocation for thrust-vectored controlled rigid-bodies
This study introduces a systematic and scalable control allocation method for arbitrary rigid bodies equipped with vectored thrusters. Two novel solutions are proposed: a closed-form, Lipschitz continuous mapping that ensures smooth actuator orientation references, and a convex optimization formulation capable of handling practical actuator constraints such as thrust saturation and angular rate limits. Both methods leverage the null-space structure of the allocation mapping to perform singularity avoidance while generating sub-optimal yet practical solutions. The effectiveness and generality of the proposed framework are demonstrated through numerical simulations on a 3DOF marine vessel and a 6DOF aerial quadcopter.
Topology optimization of decoupling feeding networks for antenna arrays
Near-field and radiation coupling between nearby radiating elements is unavoidable, and it is considered a limiting factor for applications in wireless communications and active sensing. This article proposes a density-based topology optimization approach to design decoupling networks for such systems. The decoupling networks are designed based on a multi-objective optimization problem with the radiating elements replaced by their time-domain impulse response for efficient computations and to enable the solution of the design problem using gradient-based optimization methods. We use the adjoint-field method to compute the gradients of the optimization objectives. Additionally, nonlinear filters are applied during the optimization procedure to impose minimum-size control on the optimized designs. We demonstrate the concept by designing the decoupling network for a two-element planar antenna array; the antenna is designed in a separate optimization problem. The optimized decoupling networks provide a signal path that destructively interferes with the coupling between the radiating elements while preserving their individual matching to the feeding ports. Compact decoupling networks capable of suppressing the mutual coupling by more than 10 dB between two closely separated planar antennas operating around 2.45 GHz are presented and validated experimentally.
comment: Accepted version of the manuscript published in IEEE Transactions on Antennas and Propagation, 2025. The final version is available at https://doi.org/10.1109/TAP.2025.3621265
A Taylor Series Approach to Correction of Input Errors in Gaussian Process Regression
Gaussian Processes (GPs) are widely recognized as powerful non-parametric models for regression and classification. Traditional GP frameworks predominantly operate under the assumption that the inputs are either accurately known or subject to zero-mean noise. However, several real-world applications such as mobile sensors have imperfect localization, leading to inputs with biased errors. These biases can typically be estimated through measurements collected over time using, for example, Kalman filters. To avoid recomputation of the entire GP model when better estimates of the inputs used in the training data become available, we introduce a technique for updating a trained GP model to incorporate updated estimates of the inputs. By leveraging the differentiability of the mean and covariance functions derived from the squared exponential kernel, a second-order correction algorithm is developed to update the trained GP models. Precomputed Jacobians and Hessians of kernels enable real-time refinement of the mean and covariance predictions. The efficacy of the developed approach is demonstrated using two simulation studies, with error analyses revealing improvements in both predictive accuracy and uncertainty quantification.
comment: Improving the paper with better results and adding experimental results to publish again
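To make the idea of a second-order correction concrete, here is a minimal sketch of a Taylor expansion of the squared exponential kernel with respect to a shift in one scalar input. It illustrates only the kernel-level expansion under a one-dimensional assumption. The function names and the length-scale parameter `ell` are hypothetical, and the full update of the GP mean and covariance described in the abstract is not reproduced.

```python
import math


def se_kernel(x, xp, ell):
    """Squared exponential kernel k(x, x') = exp(-(x - x')^2 / (2 ell^2))."""
    return math.exp(-((x - xp) ** 2) / (2.0 * ell ** 2))


def se_kernel_taylor2(x, xp, ell, delta):
    """Second-order Taylor approximation of k(x + delta, x') around x.

    Uses the analytic first and second derivatives of the kernel with
    respect to its first argument, so k(x, x') itself is evaluated once.
    """
    r = x - xp
    k = se_kernel(x, xp, ell)
    dk = -r / ell ** 2 * k                            # dk/dx
    d2k = (r ** 2 / ell ** 4 - 1.0 / ell ** 2) * k    # d^2k/dx^2
    return k + dk * delta + 0.5 * d2k * delta ** 2


if __name__ == "__main__":
    x, xp, ell, delta = 1.0, 0.3, 1.0, 0.05
    exact = se_kernel(x + delta, xp, ell)
    approx = se_kernel_taylor2(x, xp, ell, delta)
    print("exact:", exact, "second-order:", approx, "error:", abs(exact - approx))
```

For small input corrections the quadratic term removes most of the error left by a first-order update, which is why precomputing the derivatives at training time can pay off when input estimates are refined later.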
Adaptive DRL for IRS Mirror Orientation in Dynamic OWC Networks
Intelligent reflecting surfaces (IRSs) have emerged as a promising solution to mitigate line-of-sight (LoS) blockages and enhance signal coverage in optical wireless communication (OWC) systems with minimal additional power. In this work, we consider a mirror-based IRS to assist a dynamic indoor visible light communication (VLC) environment. We formulate an optimization problem that aims to maximize the sum rate by adjusting the orientation of the IRS mirrors. To enable real-time adaptability, the problem is modelled as a Markov decision process (MDP), and a deep reinforcement learning (DRL) algorithm is developed based on the deterministic policy gradient for real-time mirror-based IRS optimization in dynamic VLC networks. The proposed DRL is employed to optimize mirror orientation toward mobile users under blockage and mobility constraints. Simulation results demonstrate that our proposed DRL algorithm outperforms the conventional deep Q-learning (DQL) algorithm and achieves substantial improvements in sum rate compared to random-orientation IRS configurations.
comment: 6 pages, 5 figures
PowerPlots.jl: An Open Source Power Grid Visualization and Data Analysis Framework for Academic Research
Data visualization is essential for developing an understanding of a complex system. The power grid is one of the most complex systems in the world and effective power grid research visualization software must 1) be easy to use, 2) support unique data that may arise in research, and 3) be capable of creating custom figures for publication and presentation. However, no current software addresses all three of these needs. PowerPlots is an open-source data visualization tool for power grids that does address these needs. In addition, several tools created to support this software facilitate the analysis of power grid data by transforming the data into graph topology or data-frame data formats that are more compatible for some analyses. In this work, we use PowerPlots to investigate several case studies that involve exploring power grid data. These case studies demonstrate the valuable insights that are possible when using network visualization and how it can be applied to research applications.
Adaptive Decentralized Queue Disclosure for Impatient Tenants in Edge and Non-terrestrial Systems
We study how queue-state information disclosures affect impatient tenants in multi-tenant edge systems. We propose an information-bulletin strategy in which each queue periodically broadcasts two Markov models: one of steady-state service-rate behavior and one of the queue-length inter-change times. Tenants autonomously decide to renege or jockey based on this information. The queues observe tenant responses and adapt service rates via a learned, rule-based predictive policy designed for decentralized, partially observed, and time-varying environments. We compare this decentralized, information-driven policy to the classical, centralized Markov Decision Process (MDP) hedging-point policy for M/M/2 systems. Numerical experiments quantify the tradeoffs in average delay, impatience, and robustness to stale information. Results show that when full, instantaneous state information and stationarity hold, the hedging-point policy yields less impatience, but this advantage diminishes as information becomes partial or stale. The rule-based predictive policy, on the other hand, is more robust to staleness in dispatched information, making it well suited to conditions typical of edge cloud and non-terrestrial deployments.
comment: Accepted by NFV-SDN'25 Doctoral Symposium
Modeling nonuniform energy decay through the modal decomposition of acoustic radiance transfer (MoD-ART)
Modeling late reverberation in real-time interactive applications is a challenging task when multiple sound sources and listeners are present in the same environment. This is especially problematic when the environment is geometrically complex and/or features uneven energy absorption (e.g. coupled volumes), because in such cases the late reverberation is dependent on the sound sources' and listeners' positions, and therefore must be adapted to their movements in real time. We present a novel approach to the task, named modal decomposition of acoustic radiance transfer (MoD-ART), which can handle highly complex scenarios with efficiency. The approach is based on the geometrical acoustics method of acoustic radiance transfer, from which we extract a set of energy decay modes and their positional relationships with sources and listeners. In this paper, we describe the physical and mathematical significance of MoD-ART, highlighting its advantages and applicability to different scenarios. Through an analysis of the method's computational complexity, we show that it compares very favorably with ray-tracing. We also present simulation results showing that MoD-ART can capture multiple decay slopes and flutter echoes.
Model Predictive Inferential Control of Neural State-Space Models for Autonomous Vehicle Motion Planning
Model predictive control (MPC) has proven useful in enabling safe and optimal motion planning for autonomous vehicles. In this paper, we investigate how to achieve MPC-based motion planning when a neural state-space model represents the vehicle dynamics. As the neural state-space model will lead to highly complex, nonlinear and nonconvex optimization landscapes, mainstream gradient-based MPC methods will be computationally too heavy to be a viable solution. In a departure, we propose the idea of model predictive inferential control (MPIC), which seeks to infer the best control decisions from the control objectives and constraints. Following the idea, we convert the MPC problem for motion planning into a Bayesian state estimation problem. Then, we develop a new particle filtering/smoothing approach to perform the estimation. This approach is implemented as banks of unscented Kalman filters/smoothers and offers high sampling efficiency, fast computation, and estimation accuracy. We evaluate the MPIC approach through a simulation study of autonomous driving in different scenarios, along with an exhaustive comparison with gradient-based MPC. The results show that the MPIC approach has considerable computational efficiency, regardless of complex neural network architectures, and shows the capability to solve large-scale MPC problems for neural state-space models.
Multiagent Systems
Fast and the Furious: Hot Starts in Pursuit-Evasion Games
Effectively positioning pursuers in pursuit-evasion games without prior knowledge of evader locations remains a significant challenge. A novel approach that combines game-theoretic control theory with Graph Neural Networks is introduced in this work. By conceptualizing pursuer configurations as strategic arrangements and representing them as graphs, a Graph Characteristic Space is constructed via multi-objective optimization to identify Pareto-optimal configurations. A Graph Convolutional Network (GCN) is trained on these Pareto-optimal graphs to generate strategically effective initial configurations, termed "hot starts". Empirical evaluations demonstrate that the GCN-generated hot starts provide a significant advantage over random configurations. In scenarios considering multiple pursuers and evaders, this method hastens the decline in evader survival rates, reduces pursuer travel distances, and enhances containment, showcasing clear strategic benefits.
comment: Presented at AAMAS Workshop on Autonomous Robots and Multirobot Systems (ARMS)
Two-Layer Voronoi Coverage Control for Hybrid Aerial-Ground Robot Teams in Emergency Response: Implementation and Analysis
We present a comprehensive two-layer Voronoi coverage control approach for coordinating hybrid aerial-ground robot teams in hazardous material emergency response scenarios. Traditional Voronoi coverage control methods face three critical limitations in emergency contexts: heterogeneous agent capabilities with vastly different velocities, clustered initial deployment configurations, and urgent time constraints requiring rapid response rather than eventual convergence. Our method addresses these challenges through a decoupled two-layer architecture that separately optimizes aerial and ground robot positioning, with aerial agents delivering ground sensors via airdrop to high-priority locations. We provide detailed implementation of bounded Voronoi cell computation, efficient numerical integration techniques for importance-weighted centroids, and robust control strategies that prevent agent trapping. Simulation results demonstrate an 88% reduction in response time, achieving target sensor coverage (18.5% of initial sensor loss) in 25 seconds compared to 220 seconds for ground-only deployment. Complete implementation code is available at https://github.com/dHutchings/ME292B.
comment: 23 pages, 7 figures. Technical report with complete implementation details and open-source code
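The importance-weighted centroid step mentioned in the abstract can be sketched as follows. This is not the authors' implementation: it assumes a rectangular bounding region, a midpoint-rule grid for the numerical integration, and a user-supplied `density` function, and every name in the block is hypothetical.

```python
def weighted_centroids(agents, density, bounds, n=60):
    """Approximate importance-weighted centroids of bounded Voronoi cells.

    agents  : list of (x, y) agent positions
    density : callable (x, y) -> non-negative importance weight
    bounds  : (xmin, xmax, ymin, ymax) rectangle that bounds every cell
    n       : number of grid cells per axis (midpoint rule)
    """
    xmin, xmax, ymin, ymax = bounds
    dx, dy = (xmax - xmin) / n, (ymax - ymin) / n
    mass = [0.0] * len(agents)
    mx = [0.0] * len(agents)
    my = [0.0] * len(agents)
    for i in range(n):
        x = xmin + (i + 0.5) * dx
        for j in range(n):
            y = ymin + (j + 0.5) * dy
            # The nearest agent owns this sample point (Voronoi membership).
            k = min(
                range(len(agents)),
                key=lambda a: (agents[a][0] - x) ** 2 + (agents[a][1] - y) ** 2,
            )
            w = density(x, y) * dx * dy
            mass[k] += w
            mx[k] += w * x
            my[k] += w * y
    # A cell with zero total importance keeps its agent where it is.
    return [
        (mx[k] / mass[k], my[k] / mass[k]) if mass[k] > 0.0 else agents[k]
        for k in range(len(agents))
    ]


if __name__ == "__main__":
    agents = [(0.2, 0.5), (0.9, 0.5)]
    hotspot = lambda x, y: 1.0 + 5.0 * (x > 0.5)  # heavier weight on the right half
    print(weighted_centroids(agents, hotspot, (0.0, 1.0, 0.0, 1.0)))
```

A Lloyd-style coverage controller would then drive each agent toward its returned centroid. The grid resolution `n` trades integration accuracy against the cost of the nearest-agent search.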
HyperAgent: Leveraging Hypergraphs for Topology Optimization in Multi-Agent Communication
Recent advances in large language model-powered multi-agent systems have demonstrated remarkable collective intelligence through effective communication. However, existing approaches face two primary challenges: (i) ineffective group collaboration modeling, as they rely on pairwise edge representations in graph structures, limiting their ability to capture relationships among multiple agents; and (ii) limited task-adaptiveness in communication topology design, leading to excessive communication cost for simple tasks and insufficient coordination for complex scenarios. These issues restrict the scalability and practical deployment of adaptive collaboration frameworks. To address these challenges, we propose HyperAgent, a hypergraph-based framework that optimizes communication topologies and effectively captures group collaboration patterns using direct hyperedge representations. Unlike edge-based approaches, HyperAgent uses hyperedges to link multiple agents within the same subtask and employs hypergraph convolutional layers to achieve one-step information aggregation in collaboration groups. Additionally, it incorporates a variational autoencoder framework with sparsity regularization to dynamically adjust hypergraph topologies based on task complexity. Experiments highlight the superiority of HyperAgent in both performance and efficiency. For instance, on GSM8K, HyperAgent achieves 95.07% accuracy while reducing token consumption by 25.33%, demonstrating the potential of hypergraph-based optimization for multi-agent communication.
Multitask Learning with Learned Task Relationships
Classical consensus-based strategies for federated and decentralized learning are statistically suboptimal in the presence of heterogeneous local data or task distributions. As a result, in recent years, there has been growing interest in multitask or personalized strategies, which allow individual agents to benefit from one another in pursuing locally optimal models without enforcing consensus. Existing strategies require either precise prior knowledge of the underlying task relationships or are fully non-parametric and instead rely on meta-learning or proximal constructions. In this work, we introduce an algorithmic framework that strikes a balance between these extremes. By modeling task relationships through a Gaussian Markov Random Field with an unknown precision matrix, we develop a strategy that jointly learns both the task relationships and the local models, allowing agents to self-organize in a way consistent with their individual data distributions. Our theoretical analysis quantifies the quality of the learned relationship, and our numerical experiments demonstrate its practical effectiveness.
RobotFleet: An Open-Source Framework for Centralized Multi-Robot Task Planning
Coordinating heterogeneous robot fleets to achieve multiple goals is challenging in multi-robot systems. We introduce an open-source and extensible framework for centralized multi-robot task planning and scheduling that leverages LLMs to enable fleets of heterogeneous robots to accomplish multiple tasks. RobotFleet provides abstractions for planning, scheduling, and execution across robots deployed as containerized services to simplify fleet scaling and management. The framework maintains a shared declarative world state and two-way communication for task execution and replanning. By modularizing each layer of the autonomy stack and using LLMs for open-world reasoning, RobotFleet lowers the barrier to building scalable multi-robot systems. The code can be found here: https://github.com/therohangupta/robot-fleet.
Multi-Objective Multi-Agent Path Finding with Lexicographic Cost Preferences
Many real-world scenarios require multiple agents to coordinate in shared environments, while balancing trade-offs between multiple, potentially competing objectives. Current multi-objective multi-agent path finding (MO-MAPF) algorithms typically produce conflict-free plans by computing Pareto frontiers. They do not explicitly optimize for user-defined preferences, even when the preferences are available, and scale poorly with the number of objectives. We propose a lexicographic framework for modeling MO-MAPF, along with an algorithm \textit{Lexicographic Conflict-Based Search} (LCBS) that directly computes a single solution aligned with a lexicographic preference over objectives. LCBS integrates a priority-aware low-level $A^*$ search with conflict-based search, avoiding Pareto frontier construction and enabling efficient planning guided by preference over objectives. We provide insights into optimality and scalability, and empirically demonstrate that LCBS computes optimal solutions while scaling to instances with up to ten objectives -- far beyond the limits of existing MO-MAPF methods. Evaluations on standard and randomized MAPF benchmarks show consistently higher success rates against state-of-the-art baselines, especially with increasing number of objectives.
comment: 8 pages, 7 figures
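As a rough illustration of what a lexicographic preference over objectives means, the sketch below picks the best cost vector from a set of candidate plans. It is not LCBS itself: the function name and the sample cost vectors are hypothetical, and it shows only the ordering that such a search would optimize.

```python
from typing import Sequence, Tuple

CostVector = Tuple[float, ...]

def lexicographic_best(candidates: Sequence[CostVector]) -> CostVector:
    """Return the cost vector that is minimal under lexicographic order."""
    # Python compares tuples element by element, so objective 0 takes
    # priority over objective 1, which takes priority over objective 2, etc.
    return min(candidates)

# Hypothetical cost vectors for three candidate plans (three objectives each).
costs = [(10.0, 4.0, 7.0), (10.0, 3.0, 9.0), (11.0, 0.0, 0.0)]
best = lexicographic_best(costs)
print(best)
```

The third plan is much cheaper in the second and third objectives but loses because it is worse in the first one. No Pareto set is needed to reach this single answer.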
FURINA: A Fully Customizable Role-Playing Benchmark via Scalable Multi-Agent Collaboration Pipeline
As large language models (LLMs) advance in role-playing (RP) tasks, existing benchmarks quickly become obsolete due to their narrow scope, outdated interaction paradigms, and limited adaptability across diverse application scenarios. To address this gap, we introduce FURINA-Builder, a novel multi-agent collaboration pipeline that automatically constructs fully customizable RP benchmarks at any scale. As the first benchmark builder in the RP area for adaptable assessment, it enables evaluation of arbitrary characters across diverse scenarios and prompt formats. FURINA-Builder simulates dialogues between a test character and other characters drawn from a well-constructed character-scene pool, while an LLM judge selects fine-grained evaluation dimensions and adjusts the test character's responses into final test utterances. Using this pipeline, we build FURINA-Bench, a new comprehensive role-playing benchmark featuring both established and synthesized test characters, each assessed with dimension-specific evaluation criteria. Human evaluation and preliminary separability analysis justify our pipeline and benchmark design. We conduct extensive evaluations of cutting-edge LLMs and find that o3 and DeepSeek-R1 achieve the best performance on English and Chinese RP tasks, respectively. Across all models, established characters consistently outperform synthesized ones, with reasoning capabilities further amplifying this disparity. Interestingly, we observe that model scale does not monotonically reduce hallucinations. More critically, for reasoning LLMs, we uncover a novel trade-off: reasoning improves RP performance but simultaneously increases RP hallucinations. This trade-off extends to a broader Pareto frontier between RP performance and reliability for all LLMs. These findings demonstrate the effectiveness of FURINA-Builder and the challenge posed by FURINA-Bench.
Fake News in Social Networks
We propose multi-agent reinforcement learning as a new method for modeling fake news in social networks. This method allows us to model human behavior in social networks both in unaccustomed populations and in populations that have adapted to the presence of fake news. In particular the latter is challenging for existing methods. We find that a fake-news attack is more effective if it targets highly connected people and people with weaker private information. Attacks are more effective when the disinformation is spread across several agents than when the disinformation is concentrated with more intensity on fewer agents. Furthermore, fake news spread less well in balanced networks than in clustered networks. We test a part of our findings in a human-subject experiment. The experimental evidence provides support for the predictions from the model, suggesting that the model is suitable to analyze the spread of fake news in social networks.
Systems and Control (CS)
Storage Participation in Electricity Markets: Arbitrage and Ancillary Services
Electricity storage is used for intertemporal price arbitrage and for ancillary services that balance unforeseen supply and demand fluctuations via frequency regulation. We present an optimization model that computes bids for both arbitrage and frequency regulation and ensures that storage operators can honor their market commitments at all times for all fluctuation signals in an uncertainty set inspired by market rules. This requirement, initially expressed by an infinite number of nonconvex functional constraints, is shown to be equivalent to a finite number of deterministic constraints. The resulting formulation is a mixed-integer bilinear program that admits mixed-integer linear relaxations and restrictions. Empirical tests on European electricity markets show a negligible optimality gap between the relaxation and the restriction. The model can account for intraday trading and, with a solution time of under 5 seconds, may serve as a building block for more complex trading strategies. Such strategies become necessary as battery capacity exceeds the demand for ancillary services. In a backtest from 1 July 2020 through 30 June 2024 joint market participation more than doubles profits and almost halves energy storage output compared to arbitrage alone.
Structured identification of multivariable modal systems
Physically interpretable models are essential for next-generation industrial systems, as these representations enable effective control, support design validation, and provide a foundation for monitoring strategies. The aim of this paper is to develop a system identification framework for estimating modal models of complex multivariable mechanical systems from frequency response data. To achieve this, a two-step structured identification algorithm is presented, where an additive model is first estimated using a refined instrumental variable method and subsequently projected onto a modal form. The developed identification method provides accurate, physically-relevant, minimal-order models, for both generally-damped and proportionally damped modal systems. The effectiveness of the proposed method is demonstrated through experimental validation on a prototype wafer-stage system, which features a large number of spatially distributed actuators and sensors and exhibits complex flexible dynamics.
comment: 20 pages, 12 figures
Two-Layer Voronoi Coverage Control for Hybrid Aerial-Ground Robot Teams in Emergency Response: Implementation and Analysis
We present a comprehensive two-layer Voronoi coverage control approach for coordinating hybrid aerial-ground robot teams in hazardous material emergency response scenarios. Traditional Voronoi coverage control methods face three critical limitations in emergency contexts: heterogeneous agent capabilities with vastly different velocities, clustered initial deployment configurations, and urgent time constraints requiring rapid response rather than eventual convergence. Our method addresses these challenges through a decoupled two-layer architecture that separately optimizes aerial and ground robot positioning, with aerial agents delivering ground sensors via airdrop to high-priority locations. We provide detailed implementation of bounded Voronoi cell computation, efficient numerical integration techniques for importance-weighted centroids, and robust control strategies that prevent agent trapping. Simulation results demonstrate an 88% reduction in response time, achieving target sensor coverage (18.5% of initial sensor loss) in 25 seconds compared to 220 seconds for ground-only deployment. Complete implementation code is available at https://github.com/dHutchings/ME292B.
comment: 23 pages, 7 figures. Technical report with complete implementation details and open-source code
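The importance-weighted centroid that drives a Voronoi coverage step can be approximated on a grid. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the midpoint-rule grid, the rectangular bounds, and the density callback are all hypothetical choices made for this example.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]

def weighted_centroids(
    agents: Sequence[Point],
    density: Callable[[float, float], float],
    bounds: Tuple[float, float, float, float],
    n: int = 50,
) -> List[Point]:
    """Approximate importance-weighted centroids of bounded Voronoi cells."""
    xmin, xmax, ymin, ymax = bounds
    dx, dy = (xmax - xmin) / n, (ymax - ymin) / n
    mass = [0.0] * len(agents)
    mx = [0.0] * len(agents)
    my = [0.0] * len(agents)
    for i in range(n):
        x = xmin + (i + 0.5) * dx  # midpoint of grid column i
        for j in range(n):
            y = ymin + (j + 0.5) * dy  # midpoint of grid row j
            # The nearest agent owns this sample: that is Voronoi membership.
            k = min(
                range(len(agents)),
                key=lambda a: (agents[a][0] - x) ** 2 + (agents[a][1] - y) ** 2,
            )
            w = density(x, y)
            mass[k] += w
            mx[k] += w * x
            my[k] += w * y
    # An agent whose cell carries no weight keeps its current position.
    return [
        (mx[k] / mass[k], my[k] / mass[k]) if mass[k] > 0 else agents[k]
        for k in range(len(agents))
    ]

# Two agents in the unit square with uniform importance.
targets = weighted_centroids([(0.25, 0.5), (0.75, 0.5)], lambda x, y: 1.0, (0.0, 1.0, 0.0, 1.0))
```

In a Lloyd-style iteration each agent would then move toward its returned centroid. Restricting the samples to the rectangle is what keeps the cells bounded in this sketch.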
GPS Spoofing Attack Detection in Autonomous Vehicles Using Adaptive DBSCAN
As autonomous vehicles become an essential component of modern transportation, they are increasingly vulnerable to threats such as GPS spoofing attacks. This study presents an adaptive detection approach utilizing a dynamically tuned Density Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, designed to adjust the detection threshold ({\epsilon}) in real-time. The threshold is updated based on the recursive mean and standard deviation of displacement errors between GPS and in-vehicle sensor data, but only at instances classified as non-anomalous. Furthermore, an initial threshold, determined from 120,000 clean data samples, ensures the capability to identify even subtle and gradual GPS spoofing attempts from the beginning. To assess the performance of the proposed method, five different subsets from the real-world Honda Research Institute Driving Dataset (HDD) are selected to simulate both large and small magnitude GPS spoofing attacks. The modified algorithm effectively identifies turn-by-turn, stop, overshoot, and multiple small biased spoofing attacks, achieving detection accuracies of 98.62±1%, 99.96±0.1%, 99.88±0.1%, and 98.38±0.1%, respectively. This work provides a substantial advancement in enhancing the security and safety of AVs against GPS spoofing threats.
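The threshold update described above (recursive mean and standard deviation, refreshed only on non-anomalous samples) can be sketched as follows. This is a simplified stand-in that applies a scalar threshold test rather than running DBSCAN, and the rule eps = mean + k * std, the multiplier k, and the class name are assumptions for illustration, since the abstract does not give the exact formula.

```python
import math

class AdaptiveThreshold:
    """Scalar threshold that adapts only on samples judged non-anomalous."""

    def __init__(self, initial_eps: float, k: float = 3.0) -> None:
        self.eps = initial_eps  # initial threshold, e.g. from clean data
        self.k = k              # assumed multiplier on the standard deviation
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0           # running sum of squared deviations (Welford)

    def step(self, error: float) -> bool:
        """Return True if the displacement error is flagged as anomalous."""
        if error > self.eps:
            return True  # anomalous: statistics and threshold stay untouched
        # Welford's recursive update of mean and variance.
        self.n += 1
        delta = error - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (error - self.mean)
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            self.eps = self.mean + self.k * std
        return False

detector = AdaptiveThreshold(initial_eps=1.0)
flags = [detector.step(e) for e in [0.1, 0.2, 0.1, 0.2, 5.0]]
```

Skipping the update on flagged samples keeps a slowly growing spoofing bias from inflating the threshold, which is the motivation the abstract gives for updating only at non-anomalous instances.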
Aggregate Modeling of Air-Conditioner Loads Under Packet-based Control with Both On and Off Grid Access Requests
Coordination of distributed energy resources (DERs) can engender flexibility necessary to improve grid reliability. Packetized Energy Management (PEM) is a method for coordinating DERs, such as thermostatically controlled loads (TCLs) and electric vehicles, within customer quality-of-service (QoS) limits. In PEM, a DER uses local information to offer flexibility by sending a request to the DER coordinator to turn-ON or turn-OFF. Much work has focused on modeling and analyzing aggregations of DERs under PEM with fixed packet durations and only turn-ON requests. Recent efforts to enable variable packet lengths have shown an increase in available flexibility and ramping capability, but have not been modeled in aggregate, which limits systematic analyses. To address this issue, this paper presents a new aggregate bin-based (macro) model of PEM loads that incorporates both turn-ON and turn-OFF request features, enabling the model to accurately characterize the capability of the fleet of DERs to track a power reference signal, population temperature dynamics, aggregate request rates, and variable packet lengths. Simulation-based validation is performed against an agent-based (micro) model to evaluate robustness and quantify model accuracy. Finally, the distribution of variable packet lengths from macro-model simulations is applied to inform past work on PEM with randomized packet lengths.
Transforming Tarlac State University (TSU) Gymnasium to a Nearly Zero-Energy Building through Integration of a Solar Photovoltaic (PV) System
The study is anchored to the principles of Nearly-Zero Energy Building (NZEB). It aimed to transform the Tarlac State University Gymnasium into a facility with energy-efficient equipment to contribute to reducing carbon footprints by integrating a solar PV system as its renewable energy source. The researchers found that the electrical infrastructure of the Gym was outdated, the lighting was not energy efficient, and there were too few convenience or power outlets. There was also insufficient cooling equipment to maintain a comfortable temperature. Analysis shows that the payback period is within the average range, making it a cost-effective investment for the University. Aside from the cost of the PV System, adherence to engineering design standards will mean additional costs to replace the metal halides with LED high bay lamps, install additional air conditioning units, and provide additional convenience outlets. These additional costs should be considered when evaluating the feasibility of the project. It is recommended that the integrity of the existing roof system of the Gymnasium be considered. The total cost of putting up the whole electrical system, including new lighting, cooling, and convenience loads, must be calculated to determine the total cost of implementing the whole NZEB project. Other factors in the economic evaluation may be considered to obtain a more rigorous result.
Decoupled Scaling 4ch Bilateral Control on the Cartesian coordinate by 6-DoF Manipulator using Rotation Matrix
Four-channel bilateral control is a method for achieving remote control with force feedback and adjustment operability by synchronizing the positions and forces of two manipulators. This is expected to significantly improve the operability of the remote control in contact-rich tasks. Among these, 4-channel bilateral control on the Cartesian coordinate system is advantageous owing to its suitability for manipulators with different structures and because it allows the dynamics in the Cartesian coordinate system to be adjusted by adjusting the control parameters, thus achieving intuitive operability for humans. This paper proposes a 4-channel bilateral control method that achieves the desired dynamics by decoupling each dimension in the Cartesian coordinate system regardless of the scaling factor.
comment: 6 pages, 4 figures, Accepted at SAMCON 2025
A Verified High-Performance Composable Object Library for Remote Direct Memory Access (Extended Version)
Remote Direct Memory Access (RDMA) is a memory technology that allows remote devices to directly write to and read from each other's memory, bypassing components such as the CPU and operating system. This enables low-latency high-throughput networking, as required for many modern data centres, HPC applications and AI/ML workloads. However, baseline RDMA comprises a highly permissive weak memory model that is difficult to use in practice and has only recently been formalised. In this paper, we introduce the Library of Composable Objects (LOCO), a formally verified library for building multi-node objects on RDMA, filling the gap between shared memory and distributed system programming. LOCO objects are well-encapsulated and take advantage of the strong locality and the weak consistency characteristics of RDMA. They have performance comparable to custom RDMA systems (e.g. distributed maps), but with a far simpler programming model amenable to formal proofs of correctness. To support verification, we develop a novel modular declarative verification framework, called Mowgli, that is flexible enough to model multinode objects and is independent of a memory consistency model. We instantiate Mowgli with the RDMA memory model, and use it to verify correctness of LOCO libraries.
Galilean Symmetry in Robotics
Galilean symmetry is the natural symmetry of inertial motion that underpins Newtonian physics. Although rigid-body symmetry is one of the most established and fundamental tools in robotics, there appears to be no comparable treatment of Galilean symmetry for a robotics audience. In this paper, we present a robotics-tailored exposition of Galilean symmetry that leverages the community's familiarity with and understanding of rigid-body transformations and pose representations. Our approach contrasts with common treatments in the physics literature that introduce Galilean symmetry as a stepping stone to Einstein's relativity. A key insight is that the Galilean matrix Lie group can be used to describe two different pose representations, Galilean frames, that use inertial velocity in the state definition, and extended poses, that use coordinate velocity. We provide three examples where applying the Galilean matrix Lie-group algebra to robotics problems is straightforward and yields significant insights: inertial navigation above the rotating Earth, manipulator kinematics, and sensor data fusion under temporal uncertainty. We believe that the time is right for the robotics community to benefit from rediscovering and extending this classical material and applying it to modern problems.
comment: Under Review
Towards Dynamic Quadrupedal Gaits: A Symmetry-Guided RL Hierarchy Enables Free Gait Transitions at Varying Speeds
Quadrupedal robots exhibit a wide range of viable gaits, but generating specific footfall sequences often requires laborious expert tuning of numerous variables, such as touch-down and lift-off events and holonomic constraints for each leg. This paper presents a unified reinforcement learning framework for generating versatile quadrupedal gaits by leveraging the intrinsic symmetries and velocity-period relationship of dynamic legged systems. We propose a symmetry-guided reward function design that incorporates temporal, morphological, and time-reversal symmetries. By focusing on preserved symmetries and natural dynamics, our approach eliminates the need for predefined trajectories, enabling smooth transitions between diverse locomotion patterns such as trotting, bounding, half-bounding, and galloping. Implemented on the Unitree Go2 robot, our method demonstrates robust performance across a range of speeds in both simulations and hardware tests, significantly improving gait adaptability without extensive reward tuning or explicit foot placement control. This work provides insights into dynamic locomotion strategies and underscores the crucial role of symmetries in robotic gait design.
Controller for Incremental Input-to-State Practical Stabilization of Partially Unknown systems with Invariance Guarantees
Incremental stability is a property of dynamical systems that ensures the convergence of trajectories with respect to each other rather than a fixed equilibrium point or a fixed trajectory. In this paper, we introduce a related stability notion called incremental input-to-state practical stability ({\delta}-ISpS), ensuring safety guarantees. We also present a feedback linearization based control design scheme that renders a partially unknown system incrementally input-to-state practically stable and safe with formal guarantees. To deal with the unknown dynamics, we utilize Gaussian process regression to approximate the model. Finally, we implement the controller synthesized by the proposed scheme on a manipulator example.
comment: 2 figures, 9 pages
Risk-Budgeted Control Framework for Balanced Performance and Safety in Autonomous Vehicles
This paper presents a risk-budgeted monitor with a control framework that certifies safety for autonomous driving. In this process, a sliding window is proposed to monitor for insufficient barrier residuals or nonzero tail risk, ensuring system safety. When the safety margin deteriorates, it triggers switching the safety constraint from a performance-based relaxed-control barrier function (R-CBF) to a conservative conditional value at risk (CVaR-CBF) to address the safety concern. This switching is governed by two real-time triggers: Feasibility-Triggered (FT) and Quality-Triggered (QT) conditions. In the FT condition, if the R-CBF constraint becomes infeasible or yields a suboptimal solution, the risk monitor triggers the use of the CVaR constraints for the controller. In the QT condition, the risk monitor observes the safety margin of the R-CBF solution at every step, regardless of feasibility. If it falls below the safety margin, the safety filter switches to the CVaR-CBF constraints. The proposed framework is evaluated using a model predictive controller (MPC) for autonomous driving in the presence of autonomous vehicle (AV) localization noise and obstacle position uncertainties. Multiple AV-pedestrian interaction scenarios are considered, with 1,500 Monte Carlo runs conducted for all scenarios. In the most challenging setting with pedestrian detection uncertainty of 5 m, the proposed framework achieves a 94-96% success rate of not colliding with the pedestrians over 300 trials while maintaining the lowest mean cross-track error (CTE = 3.2-3.6 m) to the reference path. The reduced CTE indicates faster trajectory recovery after obstacle avoidance, demonstrating a balance between safety and performance.
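The switching rule and the risk measure named above can be sketched in a few lines. This is a hedged illustration only: the function names, the scalar margin, and the sample-based CVaR estimate are assumptions, the feasibility trigger models infeasibility alone (not the suboptimal-solution case), and no barrier function or MPC is implemented here.

```python
import math
from typing import Sequence

def empirical_cvar(losses: Sequence[float], alpha: float = 0.9) -> float:
    """Mean of the worst (1 - alpha) fraction of sampled losses."""
    if not losses or not 0.0 <= alpha < 1.0:
        raise ValueError("need at least one loss and 0 <= alpha < 1")
    ordered = sorted(losses, reverse=True)
    # round() guards against float noise such as 3.0000000000000004.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    return sum(ordered[:k]) / k

def select_constraint(rcbf_feasible: bool, rcbf_margin: float, min_margin: float) -> str:
    """Pick the active safety constraint for the next control step."""
    if not rcbf_feasible:
        return "CVaR-CBF"  # feasibility-triggered (FT) switch
    if rcbf_margin < min_margin:
        return "CVaR-CBF"  # quality-triggered (QT) switch
    return "R-CBF"         # keep the performance-oriented constraint

# Hypothetical sampled losses; the tail average uses the two worst of ten.
tail_risk = empirical_cvar([float(v) for v in range(1, 11)], alpha=0.8)
mode = select_constraint(rcbf_feasible=True, rcbf_margin=0.1, min_margin=0.2)
```

The QT check runs even when the relaxed problem is feasible, which mirrors the abstract's point that the margin is observed at every step regardless of feasibility.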
Discovering interpretable piecewise nonlinear model predictive control laws via symbolic decision trees
In this paper, we propose symbolic decision trees as surrogate models for approximating model predictive control laws. The proposed approach learns simultaneously the partition of the input domain (splitting logic) as well as local nonlinear expressions for predicting the control action leading to interpretable piecewise nonlinear control laws. The local nonlinear expressions are determined by the learning problem and are modeled using a set of basis functions. The learning task is posed as a mixed integer optimization, which is solved to global optimality with state-of-the-art global optimization solvers. We apply the proposed approach to a case study regarding the control of an isothermal reactor. The results show that the proposed approach can learn the control law accurately, leading to closed-loop performance comparable to that of a standard model predictive controller. Finally, comparison with existing interpretable models shows that the symbolic trees achieve both lower prediction error and superior closed-loop performance.
MicroRoboScope: A Portable and Integrated Mechatronic Platform for Magnetic and Acoustic Microrobotic Experimentation
This paper presents MicroRoboScope, a portable, compact, and versatile microrobotic experimentation platform designed for real-time, closed-loop control of both magnetic and acoustic microrobots. The system integrates an embedded computer, microscope, power supplies, and control circuitry into a single, low-cost and fully integrated apparatus. Custom control software developed in Python and Arduino C++ handles live video acquisition, microrobot tracking, and generation of control signals for electromagnetic coils and acoustic transducers. The platform's multi-modal actuation, accessibility, and portability make it suitable not only for specialized research laboratories but also for educational and outreach settings. By lowering the barrier to entry for microrobotic experimentation, this system enables new opportunities for research, education, and translational applications in biomedicine, tissue engineering, and robotics.
Optimal Voltage Control Using Online Exponential Barrier Method
This paper addresses the optimal voltage control problem of distribution systems with high penetration of inverter-based renewable energy resources, under inaccurate model information. We propose the online exponential barrier method that explicitly leverages the online feedback from grids to enhance the robustness to model inaccuracy and incorporates the voltage constraints to maintain the safety requirements. We provide analytical results on the optimal barrier parameter selection and sufficient conditions for the safety guarantee of converged voltages. We also establish theoretical results on the exponential convergence rate with proper step-size. The effectiveness of the proposed framework is validated on a 56-bus radial network, where we significantly improve the robustness against model inaccuracy compared to existing methods.
comment: Restate the theorem for readability
DUST: A Framework for Data-Driven Density Steering
We consider the problem of data-driven stochastic optimal control of an unknown LTI dynamical system. Assuming the process noise is normally distributed, we pose the problem of steering the state's mean and covariance to a target normal distribution, under noisy data collected from the underlying system, a problem commonly referred to as covariance steering (CS). A novel framework for Data-driven Uncertainty quantification and density STeering (DUST) is presented that simultaneously characterizes the noise affecting the measured data and designs an optimal affine-feedback controller to steer the density of the state to a prescribed terminal value. We use both indirect and direct data-driven design approaches based on the notions of persistency of excitation and subspace identification to exactly represent the mean and covariance dynamics of the state in terms of the data and noise realizations. Since both the mean and the covariance steering sub-problems are plagued with stochastic uncertainty arising from noisy data collection, we first estimate the noise realization from this dataset and subsequently compute tractable upper bounds on the estimation errors. The first and second moment steering problems are then solved to optimality using techniques from robust control and robust optimization. Lastly, we present an alternative control design approach based on the certainty equivalence principle and interpret the problem as one of CS under multiplicative uncertainty. We analyze the performance and efficacy of each of these data-driven approaches using a case study and compare them with their model-based counterparts.
comment: submitted to Automatica
Global Attitude Synchronization for Heterogeneous Multi-agent Systems on SO(3)
In this paper, we address the problem of attitude synchronization for a group of rigid body systems evolving on SO(3). The interaction among these systems is modeled through an undirected, connected, and acyclic graph topology. First, we present an almost global continuous distributed attitude synchronization scheme with rigorously proven stability guarantees. Thereafter, we propose two global distributed hybrid attitude synchronization schemes on SO(3). The first scheme is a hybrid control law that leverages angular velocities and relative orientations to achieve global alignment to a common orientation. The second scheme eliminates the dependence on angular velocities by introducing dynamic auxiliary variables, while ensuring global asymptotic attitude synchronization. This velocity-free control scheme relies exclusively on attitude information. The proposed schemes are applicable to heterogeneous multi-agent systems, where agents may have distinct inertia matrices. Simulation results are provided to illustrate the effectiveness of the proposed distributed attitude synchronization schemes.
Gait Transitions in Load-Pulling Quadrupeds: Insights from Sled Dogs and a Minimal SLIP Model
Quadrupedal animals employ diverse galloping strategies to optimize speed, stability, and energy efficiency. However, the biomechanical mechanisms that enable adaptive gait transitions during high-speed locomotion under load remain poorly understood. In this study, we present new empirical and modeling insights into the biomechanics of load-pulling quadrupeds, using sprint sled dogs as a model system. High-speed video and force recordings reveal that sled dogs often switch between rotary and transverse galloping gaits within just a few strides and without any observable changes in speed, stride duration, or terrain, providing clear evidence of locomotor multistability during high-speed load-pulling. To investigate the mechanical basis of these transitions, we use a physics-based quadrupedal Spring-Loaded Inverted Pendulum model with hybrid dynamics and prescribed footfall sequences to reproduce the asymmetric galloping patterns observed in racing sled dogs. Through trajectory optimization, we replicate experimentally observed gait sequences and identify swing-leg stiffness modulation as a key control mechanism for inducing transitions. This work provides a much-needed biomechanical perspective on high-speed animal draft and establishes a modeling framework for studying locomotion in pulling quadrupeds, with implications for both biological understanding and the design of adaptive legged systems.
Adaptive Network Security Policies via Belief Aggregation and Rollout
Evolving security vulnerabilities and shifting operational conditions require frequent updates to network security policies. These updates include adjustments to incident response procedures and modifications to access controls, among others. Reinforcement learning methods have been proposed for automating such policy adaptations, but most of the methods in the research literature lack performance guarantees and adapt slowly to changes. In this paper, we address these limitations and present a method for computing security policies that is scalable, offers theoretical guarantees, and adapts quickly to changes. It assumes a model or simulator of the system and comprises three components: belief estimation through particle filtering, offline policy computation through aggregation, and online policy adaptation through rollout. Central to our method is a new feature-based aggregation technique, which improves scalability and flexibility. We analyze the approximation error of aggregation and show that rollout efficiently adapts policies to changes under certain conditions. Simulations and testbed results demonstrate that our method outperforms state-of-the-art methods on several benchmarks, including CAGE-2.
Systems and Control (EESS)
Storage Participation in Electricity Markets: Arbitrage and Ancillary Services
Electricity storage is used for intertemporal price arbitrage and for ancillary services that balance unforeseen supply and demand fluctuations via frequency regulation. We present an optimization model that computes bids for both arbitrage and frequency regulation and ensures that storage operators can honor their market commitments at all times for all fluctuation signals in an uncertainty set inspired by market rules. This requirement, initially expressed by an infinite number of nonconvex functional constraints, is shown to be equivalent to a finite number of deterministic constraints. The resulting formulation is a mixed-integer bilinear program that admits mixed-integer linear relaxations and restrictions. Empirical tests on European electricity markets show a negligible optimality gap between the relaxation and the restriction. The model can account for intraday trading and, with a solution time of under 5 seconds, may serve as a building block for more complex trading strategies. Such strategies become necessary as battery capacity exceeds the demand for ancillary services. In a backtest from 1 July 2020 through 30 June 2024 joint market participation more than doubles profits and almost halves energy storage output compared to arbitrage alone.
Structured identification of multivariable modal systems
Physically interpretable models are essential for next-generation industrial systems, as these representations enable effective control, support design validation, and provide a foundation for monitoring strategies. The aim of this paper is to develop a system identification framework for estimating modal models of complex multivariable mechanical systems from frequency response data. To achieve this, a two-step structured identification algorithm is presented, where an additive model is first estimated using a refined instrumental variable method and subsequently projected onto a modal form. The developed identification method provides accurate, physically-relevant, minimal-order models, for both generally-damped and proportionally damped modal systems. The effectiveness of the proposed method is demonstrated through experimental validation on a prototype wafer-stage system, which features a large number of spatially distributed actuators and sensors and exhibits complex flexible dynamics.
comment: 20 pages, 12 figures
Two-Layer Voronoi Coverage Control for Hybrid Aerial-Ground Robot Teams in Emergency Response: Implementation and Analysis
We present a comprehensive two-layer Voronoi coverage control approach for coordinating hybrid aerial-ground robot teams in hazardous material emergency response scenarios. Traditional Voronoi coverage control methods face three critical limitations in emergency contexts: heterogeneous agent capabilities with vastly different velocities, clustered initial deployment configurations, and urgent time constraints requiring rapid response rather than eventual convergence. Our method addresses these challenges through a decoupled two-layer architecture that separately optimizes aerial and ground robot positioning, with aerial agents delivering ground sensors via airdrop to high-priority locations. We provide detailed implementation of bounded Voronoi cell computation, efficient numerical integration techniques for importance-weighted centroids, and robust control strategies that prevent agent trapping. Simulation results demonstrate an 88% reduction in response time, achieving target sensor coverage (18.5% of initial sensor loss) in 25 seconds compared to 220 seconds for ground-only deployment. Complete implementation code is available at https://github.com/dHutchings/ME292B.
comment: 23 pages, 7 figures. Technical report with complete implementation details and open-source code
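The abstract above mentions numerical integration of importance-weighted centroids over bounded Voronoi cells. A minimal sketch of that idea follows, assuming a rectangular bounding region and a midpoint-rule grid; the function name `weighted_centroids`, the `importance(x, y)` callback, and the `resolution` parameter are hypothetical and are not taken from the authors' repository.

```python
def weighted_centroids(agents, importance, bounds, resolution=50):
    """Approximate the importance-weighted centroid of each agent's
    bounded Voronoi cell using a midpoint-rule grid (illustrative only)."""
    xmin, xmax, ymin, ymax = bounds
    n = len(agents)
    mass = [0.0] * n
    sum_x = [0.0] * n
    sum_y = [0.0] * n
    for i in range(resolution):
        x = xmin + (i + 0.5) * (xmax - xmin) / resolution
        for j in range(resolution):
            y = ymin + (j + 0.5) * (ymax - ymin) / resolution
            # A grid sample belongs to the cell of its nearest agent.
            owner = min(
                range(n),
                key=lambda k: (x - agents[k][0]) ** 2 + (y - agents[k][1]) ** 2,
            )
            w = importance(x, y)
            mass[owner] += w
            sum_x[owner] += w * x
            sum_y[owner] += w * y
    # An agent whose cell has zero mass keeps its current position.
    return [
        (sum_x[k] / mass[k], sum_y[k] / mass[k]) if mass[k] > 0 else tuple(agents[k])
        for k in range(n)
    ]


agents = [(0.25, 0.5), (0.75, 0.5)]
centroids = weighted_centroids(agents, lambda x, y: 1.0, (0.0, 1.0, 0.0, 1.0))
```

In a Lloyd-style coverage loop, each agent would then move toward its computed centroid; restricting the grid to the bounding rectangle is what keeps every cell bounded in this sketch.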
GPS Spoofing Attack Detection in Autonomous Vehicles Using Adaptive DBSCAN
As autonomous vehicles become an essential component of modern transportation, they are increasingly vulnerable to threats such as GPS spoofing attacks. This study presents an adaptive detection approach utilizing a dynamically tuned Density Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, designed to adjust the detection threshold (ε) in real time. The threshold is updated based on the recursive mean and standard deviation of displacement errors between GPS and in-vehicle sensor data, but only at instances classified as non-anomalous. Furthermore, an initial threshold, determined from 120,000 clean data samples, ensures the capability to identify even subtle and gradual GPS spoofing attempts from the beginning. To assess the performance of the proposed method, five different subsets from the real-world Honda Research Institute Driving Dataset (HDD) are selected to simulate both large and small magnitude GPS spoofing attacks. The modified algorithm effectively identifies turn-by-turn, stop, overshoot, and multiple small biased spoofing attacks, achieving detection accuracies of 98.62±1%, 99.96±0.1%, 99.88±0.1%, and 98.38±0.1%, respectively. This work provides a substantial advancement in enhancing the security and safety of AVs against GPS spoofing threats.
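The threshold update described in the abstract above (recursive mean and standard deviation, refreshed only on non-anomalous samples) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the rule `mean + k * std`, the multiplier `k`, the class name `AdaptiveThreshold`, and the plain threshold comparison standing in for the DBSCAN step are all hypothetical.

```python
import math


class AdaptiveThreshold:
    """Running mean/std of displacement errors via Welford's algorithm."""

    def __init__(self, clean_errors, k=3.0):
        self.k = k  # hypothetical multiplier, not taken from the paper
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        for e in clean_errors:  # initial threshold from clean samples
            self._update(e)

    def _update(self, e):
        self.n += 1
        delta = e - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (e - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / self.n) if self.n > 0 else 0.0

    @property
    def epsilon(self):
        # Assumed threshold rule: mean plus k standard deviations.
        return self.mean + self.k * self.std

    def observe(self, e):
        """Return True if e is flagged; update statistics only otherwise."""
        if e > self.epsilon:
            return True
        self._update(e)
        return False


detector = AdaptiveThreshold([1.0, 1.2, 0.8, 1.1, 0.9])
flagged = detector.observe(5.0)   # large displacement error
accepted = detector.observe(1.05)  # ordinary displacement error
```

Skipping the update on flagged samples keeps a slow, gradual spoofing drift from inflating the statistics and thereby widening the threshold.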
Aggregate Modeling of Air-Conditioner Loads Under Packet-based Control with Both On and Off Grid Access Requests
Coordination of distributed energy resources (DERs) can engender flexibility necessary to improve grid reliability. Packetized Energy Management (PEM) is a method for coordinating DERs, such as thermostatically controlled loads (TCLs) and electric vehicles, within customer quality-of-service (QoS) limits. In PEM, a DER uses local information to offer flexibility by sending a request to the DER coordinator to turn-ON or turn-OFF. Much work has focused on modeling and analyzing aggregations of DERs under PEM with fixed packet durations and only turn-ON requests. Several recent efforts to enable variable packet lengths have shown an increase in available flexibility and ramping capability, but have not been modeled in aggregate, which limits systematic analyses. To address this issue, this paper presents a new aggregate bin-based (macro) model of PEM loads that incorporates both turn-ON and turn-OFF request features, enabling the model to accurately characterize the capability of the fleet of DERs to track a power reference signal, population temperature dynamics, aggregate request rates, and variable packet lengths. Simulation-based validation is performed against an agent-based (micro) model to evaluate robustness and quantify model accuracy. Finally, the distribution of variable packet lengths from macro-model simulations is applied to inform past work on PEM with randomized packet lengths.
Transforming Tarlac State University (TSU) Gymnasium to a Nearly Zero-Energy Building through Integration of a Solar Photovoltaic (PV) System
The study is anchored in the principles of the Nearly Zero-Energy Building (NZEB). It aimed to transform the Tarlac State University Gymnasium into a facility with energy-efficient equipment, integrating a solar PV system as its renewable energy source to help reduce its carbon footprint. The researchers found that the electrical infrastructure of the Gymnasium was outdated, the lighting was not energy efficient, and there were too few convenience or power outlets. There was also insufficient cooling equipment to maintain a comfortable temperature. Analysis shows that the payback period is within the average range, making it a cost-effective investment for the University. Aside from the cost of the PV system, adherence to engineering design standards will mean additional costs to replace the metal halide lamps with LED high bay lamps, install additional air conditioning units, and provide additional convenience outlets. These additional costs should be considered when evaluating the feasibility of the project. It is recommended that the integrity of the existing roof system of the Gymnasium be considered. The total cost of putting up the whole electrical system, including new lighting, cooling, and convenience loads, must be calculated to determine the total cost of implementing the whole NZEB project. Other factors in the economic evaluation may be considered to determine a more stringent result.
Decoupled Scaling 4ch Bilateral Control on the Cartesian coordinate by 6-DoF Manipulator using Rotation Matrix
Four-channel bilateral control is a method for achieving remote control with force feedback and adjustment operability by synchronizing the positions and forces of two manipulators. This is expected to significantly improve the operability of the remote control in contact-rich tasks. Among these, 4-channel bilateral control on the Cartesian coordinate system is advantageous owing to its suitability for manipulators with different structures and because it allows the dynamics in the Cartesian coordinate system to be adjusted by adjusting the control parameters, thus achieving intuitive operability for humans. This paper proposes a 4-channel bilateral control method that achieves the desired dynamics by decoupling each dimension in the Cartesian coordinate system regardless of the scaling factor.
comment: 6 pages, 4 figures, Accepted at SAMCON 2025
A Verified High-Performance Composable Object Library for Remote Direct Memory Access (Extended Version)
Remote Direct Memory Access (RDMA) is a memory technology that allows remote devices to directly write to and read from each other's memory, bypassing components such as the CPU and operating system. This enables low-latency high-throughput networking, as required for many modern data centres, HPC applications and AI/ML workloads. However, baseline RDMA comprises a highly permissive weak memory model that is difficult to use in practice and has only recently been formalised. In this paper, we introduce the Library of Composable Objects (LOCO), a formally verified library for building multi-node objects on RDMA, filling the gap between shared memory and distributed system programming. LOCO objects are well-encapsulated and take advantage of the strong locality and the weak consistency characteristics of RDMA. They have performance comparable to custom RDMA systems (e.g. distributed maps), but with a far simpler programming model amenable to formal proofs of correctness. To support verification, we develop a novel modular declarative verification framework, called Mowgli, that is flexible enough to model multinode objects and is independent of a memory consistency model. We instantiate Mowgli with the RDMA memory model, and use it to verify correctness of LOCO libraries.
Galilean Symmetry in Robotics
Galilean symmetry is the natural symmetry of inertial motion that underpins Newtonian physics. Although rigid-body symmetry is one of the most established and fundamental tools in robotics, there appears to be no comparable treatment of Galilean symmetry for a robotics audience. In this paper, we present a robotics-tailored exposition of Galilean symmetry that leverages the community's familiarity with and understanding of rigid-body transformations and pose representations. Our approach contrasts with common treatments in the physics literature that introduce Galilean symmetry as a stepping stone to Einstein's relativity. A key insight is that the Galilean matrix Lie group can be used to describe two different pose representations, Galilean frames, that use inertial velocity in the state definition, and extended poses, that use coordinate velocity. We provide three examples where applying the Galilean matrix Lie-group algebra to robotics problems is straightforward and yields significant insights: inertial navigation above the rotating Earth, manipulator kinematics, and sensor data fusion under temporal uncertainty. We believe that the time is right for the robotics community to benefit from rediscovering and extending this classical material and applying it to modern problems.
comment: Under Review
Towards Dynamic Quadrupedal Gaits: A Symmetry-Guided RL Hierarchy Enables Free Gait Transitions at Varying Speeds
Quadrupedal robots exhibit a wide range of viable gaits, but generating specific footfall sequences often requires laborious expert tuning of numerous variables, such as touch-down and lift-off events and holonomic constraints for each leg. This paper presents a unified reinforcement learning framework for generating versatile quadrupedal gaits by leveraging the intrinsic symmetries and velocity-period relationship of dynamic legged systems. We propose a symmetry-guided reward function design that incorporates temporal, morphological, and time-reversal symmetries. By focusing on preserved symmetries and natural dynamics, our approach eliminates the need for predefined trajectories, enabling smooth transitions between diverse locomotion patterns such as trotting, bounding, half-bounding, and galloping. Implemented on the Unitree Go2 robot, our method demonstrates robust performance across a range of speeds in both simulations and hardware tests, significantly improving gait adaptability without extensive reward tuning or explicit foot placement control. This work provides insights into dynamic locomotion strategies and underscores the crucial role of symmetries in robotic gait design.
Controller for Incremental Input-to-State Practical Stabilization of Partially Unknown systems with Invariance Guarantees
Incremental stability is a property of dynamical systems that ensures the convergence of trajectories with respect to each other rather than a fixed equilibrium point or a fixed trajectory. In this paper, we introduce a related stability notion called incremental input-to-state practical stability (δ-ISpS), ensuring safety guarantees. We also present a feedback linearization based control design scheme that renders a partially unknown system incrementally input-to-state practically stable and safe with formal guarantees. To deal with the unknown dynamics, we utilize Gaussian process regression to approximate the model. Finally, we implement the controller synthesized by the proposed scheme on a manipulator example.
comment: 2 figures, 9 pages
Risk-Budgeted Control Framework for Balanced Performance and Safety in Autonomous Vehicles
This paper presents a risk-budgeted monitor with a control framework that certifies safety for autonomous driving. In this process, a sliding window is proposed to monitor for insufficient barrier residuals or nonzero tail risk, ensuring system safety. When the safety margin deteriorates, it triggers switching the safety constraint from a performance-based relaxed-control barrier function (R-CBF) to a conservative conditional value at risk (CVaR-CBF) to address the safety concern. This switching is governed by two real-time triggers: Feasibility-Triggered (FT) and Quality-Triggered (QT) conditions. In the FT condition, if the R-CBF constraint becomes infeasible or yields a suboptimal solution, the risk monitor triggers the use of the CVaR constraints for the controller. In the QT condition, the risk monitor observes the safety margin of the R-CBF solution at every step, regardless of feasibility. If it falls below the safety margin, the safety filter switches to the CVaR-CBF constraints. The proposed framework is evaluated using a model predictive controller (MPC) for autonomous driving in the presence of autonomous vehicle (AV) localization noise and obstacle position uncertainties. Multiple AV-pedestrian interaction scenarios are considered, with 1,500 Monte Carlo runs conducted for all scenarios. In the most challenging setting with pedestrian detection uncertainty of 5 m, the proposed framework achieves a 94-96% success rate of not colliding with the pedestrians over 300 trials while maintaining the lowest mean cross-track error (CTE = 3.2-3.6 m) to the reference path. The reduced CTE indicates faster trajectory recovery after obstacle avoidance, demonstrating a balance between safety and performance.
Discovering interpretable piecewise nonlinear model predictive control laws via symbolic decision trees
In this paper, we propose symbolic decision trees as surrogate models for approximating model predictive control laws. The proposed approach learns simultaneously the partition of the input domain (splitting logic) as well as local nonlinear expressions for predicting the control action leading to interpretable piecewise nonlinear control laws. The local nonlinear expressions are determined by the learning problem and are modeled using a set of basis functions. The learning task is posed as a mixed integer optimization, which is solved to global optimality with state-of-the-art global optimization solvers. We apply the proposed approach to a case study regarding the control of an isothermal reactor. The results show that the proposed approach can learn the control law accurately, leading to closed-loop performance comparable to that of a standard model predictive controller. Finally, comparison with existing interpretable models shows that the symbolic trees achieve both lower prediction error and superior closed-loop performance.
MicroRoboScope: A Portable and Integrated Mechatronic Platform for Magnetic and Acoustic Microrobotic Experimentation
This paper presents MicroRoboScope, a portable, compact, and versatile microrobotic experimentation platform designed for real-time, closed-loop control of both magnetic and acoustic microrobots. The system integrates an embedded computer, microscope, power supplies, and control circuitry into a single, low-cost and fully integrated apparatus. Custom control software developed in Python and Arduino C++ handles live video acquisition, microrobot tracking, and generation of control signals for electromagnetic coils and acoustic transducers. The platform's multi-modal actuation, accessibility, and portability make it suitable not only for specialized research laboratories but also for educational and outreach settings. By lowering the barrier to entry for microrobotic experimentation, this system enables new opportunities for research, education, and translational applications in biomedicine, tissue engineering, and robotics.
Optimal Voltage Control Using Online Exponential Barrier Method
This paper addresses the optimal voltage control problem of distribution systems with high penetration of inverter-based renewable energy resources, under inaccurate model information. We propose the online exponential barrier method that explicitly leverages the online feedback from grids to enhance the robustness to model inaccuracy and incorporates the voltage constraints to maintain the safety requirements. We provide analytical results on the optimal barrier parameter selection and sufficient conditions for the safety guarantee of converged voltages. We also establish theoretical results on the exponential convergence rate with a proper step size. The effectiveness of the proposed framework is validated on a 56-bus radial network, where we significantly improve the robustness against model inaccuracy compared to existing methods.
comment: Restate the theorem for readability
DUST: A Framework for Data-Driven Density Steering
We consider the problem of data-driven stochastic optimal control of an unknown LTI dynamical system. Assuming the process noise is normally distributed, we pose the problem of steering the state's mean and covariance to a target normal distribution, under noisy data collected from the underlying system, a problem commonly referred to as covariance steering (CS). A novel framework for Data-driven Uncertainty quantification and density STeering (DUST) is presented that simultaneously characterizes the noise affecting the measured data and designs an optimal affine-feedback controller to steer the density of the state to a prescribed terminal value. We use both indirect and direct data-driven design approaches based on the notions of persistency of excitation and subspace identification to exactly represent the mean and covariance dynamics of the state in terms of the data and noise realizations. Since both the mean and the covariance steering sub-problems are plagued with stochastic uncertainty arising from noisy data collection, we first estimate the noise realization from this dataset and subsequently compute tractable upper bounds on the estimation errors. The first and second moment steering problems are then solved to optimality using techniques from robust control and robust optimization. Lastly, we present an alternative control design approach based on the certainty equivalence principle and interpret the problem as one of CS under multiplicative uncertainty. We analyze the performance and efficacy of each of these data-driven approaches using a case study and compare them with their model-based counterparts.
comment: submitted to Automatica
Global Attitude Synchronization for Heterogeneous Multi-agent Systems on SO(3)
In this paper, we address the problem of attitude synchronization for a group of rigid body systems evolving on SO(3). The interaction among these systems is modeled through an undirected, connected, and acyclic graph topology. First, we present an almost global continuous distributed attitude synchronization scheme with rigorously proven stability guarantees. Thereafter, we propose two global distributed hybrid attitude synchronization schemes on SO(3). The first scheme is a hybrid control law that leverages angular velocities and relative orientations to achieve global alignment to a common orientation. The second scheme eliminates the dependence on angular velocities by introducing dynamic auxiliary variables, while ensuring global asymptotic attitude synchronization. This velocity-free control scheme relies exclusively on attitude information. The proposed schemes are applicable to heterogeneous multi-agent systems, where agents may have distinct inertia matrices. Simulation results are provided to illustrate the effectiveness of the proposed distributed attitude synchronization schemes.
Gait Transitions in Load-Pulling Quadrupeds: Insights from Sled Dogs and a Minimal SLIP Model
Quadrupedal animals employ diverse galloping strategies to optimize speed, stability, and energy efficiency. However, the biomechanical mechanisms that enable adaptive gait transitions during high-speed locomotion under load remain poorly understood. In this study, we present new empirical and modeling insights into the biomechanics of load-pulling quadrupeds, using sprint sled dogs as a model system. High-speed video and force recordings reveal that sled dogs often switch between rotary and transverse galloping gaits within just a few strides and without any observable changes in speed, stride duration, or terrain, providing clear evidence of locomotor multistability during high-speed load-pulling. To investigate the mechanical basis of these transitions, we develop a physics-based quadrupedal Spring-Loaded Inverted Pendulum model with hybrid dynamics and prescribed footfall sequences to reproduce the asymmetric galloping patterns observed in racing sled dogs. Through trajectory optimization, we replicate experimentally observed gait sequences and identify swing-leg stiffness modulation as a key control mechanism for inducing transitions. This work provides a much-needed biomechanical perspective on high-speed animal draft and establishes a modeling framework for studying locomotion in pulling quadrupeds, with implications for both biological understanding and the design of adaptive legged systems.
Adaptive Network Security Policies via Belief Aggregation and Rollout
Evolving security vulnerabilities and shifting operational conditions require frequent updates to network security policies. These updates include adjustments to incident response procedures and modifications to access controls, among others. Reinforcement learning methods have been proposed for automating such policy adaptations, but most of the methods in the research literature lack performance guarantees and adapt slowly to changes. In this paper, we address these limitations and present a method for computing security policies that is scalable, offers theoretical guarantees, and adapts quickly to changes. It assumes a model or simulator of the system and comprises three components: belief estimation through particle filtering, offline policy computation through aggregation, and online policy adaptation through rollout. Central to our method is a new feature-based aggregation technique, which improves scalability and flexibility. We analyze the approximation error of aggregation and show that rollout efficiently adapts policies to changes under certain conditions. Simulations and testbed results demonstrate that our method outperforms state-of-the-art methods on several benchmarks, including CAGE-2.
Robotics
Preference-Conditioned Multi-Objective RL for Integrated Command Tracking and Force Compliance in Humanoid Locomotion
Humanoid locomotion requires not only accurate command tracking for navigation but also compliant responses to external forces during human interaction. Despite significant progress, existing RL approaches mainly emphasize robustness, yielding policies that resist external forces but lack compliance, which is particularly challenging for inherently unstable humanoids. In this work, we address this by formulating humanoid locomotion as a multi-objective optimization problem that balances command tracking and external force compliance. We introduce a preference-conditioned multi-objective RL (MORL) framework that integrates rigid command following and compliant behaviors within a single omnidirectional locomotion policy. External forces are modeled via a velocity-resistance factor for consistent reward design, and training leverages an encoder-decoder structure that infers task-relevant privileged features from deployable observations. We validate our approach in both simulation and real-world experiments on a humanoid robot. Experimental results indicate that our framework not only improves adaptability and convergence over standard pipelines, but also realizes deployable preference-conditioned humanoid locomotion.
Contact Sensing via Joint Torque Sensors and a Force/Torque Sensor for Legged Robots
This paper presents a method for detecting and localizing contact along robot legs using distributed joint torque sensors and a single hip-mounted force-torque (FT) sensor within a generalized momentum-based observer framework. We designed a low-cost strain-gauge-based joint torque sensor that can be installed on every joint to provide direct torque measurements, eliminating the need for complex friction models and providing more accurate torque readings than estimation based on motor current. Simulation studies on a floating-base 2-DoF robot leg verified that the proposed framework accurately recovers contact force and location along the thigh and shin links. Through a calibration procedure, our torque sensor achieved an average 96.4% accuracy relative to ground truth measurements. Building upon the torque sensor, we performed hardware experiments on a 2-DoF manipulator, which showed sub-centimeter contact localization accuracy and force errors below 0.2 N.
comment: Proc. IEEE 21st International Conference on Automation Science and Engineering (CASE), Los Angeles, CA, USA, Aug. 17-21, 2025, pp. 1-7, doi:10.1109/CASE58245.2025.11164031
The Irrational Machine: Neurosis and the Limits of Algorithmic Safety
We present a framework for characterizing neurosis in embodied AI: behaviors that are internally coherent yet misaligned with reality, arising from interactions among planning, uncertainty handling, and aversive memory. In a grid navigation stack we catalogue recurrent modalities including flip-flop, plan churn, perseveration loops, paralysis and hypervigilance, futile search, belief incoherence, tie break thrashing, corridor thrashing, optimality compulsion, metric mismatch, policy oscillation, and limited-visibility variants. For each we give lightweight online detectors and reusable escape policies (short commitments, a margin to switch, smoothing, principled arbitration). We then show that durable phobic avoidance can persist even under full visibility when learned aversive costs dominate local choice, producing long detours despite globally safe routes. Using First/Second/Third Law as engineering shorthand for safety latency, command compliance, and resource efficiency, we argue that local fixes are insufficient; global failures can remain. To surface them, we propose genetic-programming based destructive testing that evolves worlds and perturbations to maximize law pressure and neurosis scores, yielding adversarial curricula and counterfactual traces that expose where architectural revision, not merely symptom-level patches, is required.
comment: 41 pages, 17 figures, 5 tables
Representing Data in Robotic Tactile Perception -- A Review
Robotic tactile perception is a complex process involving several computational steps performed at different levels. Tactile information is shaped by the interplay of robot actions, the mechanical properties of its body, and the software that processes the data. In this respect, high-level computation, required to process and extract information, is commonly performed by adapting existing techniques from other domains, such as computer vision, which expects input data to be properly structured. Therefore, it is necessary to transform tactile sensor data to match a specific data structure. This operation directly affects the tactile information encoded and, as a consequence, the task execution. This survey aims to address this specific aspect of the tactile perception pipeline, namely Data Representation. The paper first clearly defines its contributions to the perception pipeline and then reviews how previous studies have dealt with the problem of representing tactile information, investigating the relationships among hardware, representations, and high-level computation methods. The analysis has led to the identification of six structures commonly used in the literature to represent data. The manuscript provides discussions and guidelines for properly selecting a representation depending on operating conditions, including the available hardware, the tactile information required to be encoded, and the task at hand.
Two-Layer Voronoi Coverage Control for Hybrid Aerial-Ground Robot Teams in Emergency Response: Implementation and Analysis
We present a comprehensive two-layer Voronoi coverage control approach for coordinating hybrid aerial-ground robot teams in hazardous material emergency response scenarios. Traditional Voronoi coverage control methods face three critical limitations in emergency contexts: heterogeneous agent capabilities with vastly different velocities, clustered initial deployment configurations, and urgent time constraints requiring rapid response rather than eventual convergence. Our method addresses these challenges through a decoupled two-layer architecture that separately optimizes aerial and ground robot positioning, with aerial agents delivering ground sensors via airdrop to high-priority locations. We provide detailed implementation of bounded Voronoi cell computation, efficient numerical integration techniques for importance-weighted centroids, and robust control strategies that prevent agent trapping. Simulation results demonstrate an 88% reduction in response time, achieving target sensor coverage (18.5% of initial sensor loss) in 25 seconds compared to 220 seconds for ground-only deployment. Complete implementation code is available at https://github.com/dHutchings/ME292B.
comment: 23 pages, 7 figures. Technical report with complete implementation details and open-source code
Real2USD: Scene Representations in Universal Scene Description Language
Large Language Models (LLMs) can help robots reason about abstract task specifications. This requires augmenting classical representations of the environment used by robots with natural language-based priors. There are a number of existing approaches to doing so, but they are tailored to specific tasks, e.g., visual-language models for navigation, language-guided neural radiance fields for mapping, etc. This paper argues that the Universal Scene Description (USD) language is an effective and general representation of geometric, photometric and semantic information in the environment for LLM-based robotics tasks. Our argument is simple: a USD is an XML-based scene graph, readable by LLMs and humans alike, and rich enough to support essentially any task -- Pixar developed this language to store assets, scenes and even movies. We demonstrate a "Real to USD" system using a Unitree Go2 quadruped robot carrying LiDAR and an RGB camera that (i) builds an explicit USD representation of indoor environments with diverse objects and challenging settings with lots of glass, and (ii) parses the USD using Google's Gemini to demonstrate scene understanding, complex inferences, and planning. We also study different aspects of this system in simulated warehouse and hospital settings using Nvidia's Isaac Sim. Code is available at https://github.com/grasp-lyrl/Real2USD.
comment: 8 pages, 10 figures, 1 table
Gain Tuning Is Not What You Need: Reward Gain Adaptation for Constrained Locomotion Learning
Existing robot locomotion learning techniques rely heavily on the offline selection of proper reward weighting gains and cannot guarantee constraint satisfaction (i.e., the absence of constraint violation) during training. This work addresses both issues by proposing Reward-Oriented Gains via Embodied Regulation (ROGER), which adapts reward-weighting gains online based on penalties received throughout the embodied interaction process. The ratio between the positive reward (primary reward) and negative reward (penalty) gains is automatically reduced as the learning approaches the constraint thresholds to avoid violation. Conversely, the ratio is increased when learning is in safe states to prioritize performance. With a 60-kg quadruped robot, ROGER achieved near-zero constraint violation throughout multiple learning trials. It also achieved up to 50% more primary reward than the equivalent state-of-the-art techniques. In MuJoCo continuous locomotion benchmarks, including a single-leg hopper, ROGER exhibited comparable or up to 100% higher performance and 60% less torque usage and orientation deviation compared to those trained with the default reward function. Finally, real-world locomotion learning of a physical quadruped robot was achieved from scratch within one hour without any falls. Therefore, this work contributes to constraint-satisfying real-world continual robot locomotion learning and simplifies reward weighting gain tuning, potentially facilitating the development of physical robots and those that learn in the real world.
comment: RSS 2025
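The gain-ratio behavior described in the abstract above (ratio reduced near the constraint threshold, increased in safe states) can be sketched as follows. The linear schedule and all names here (`reward_gain_ratio`, `combined_reward`, `max_ratio`, `min_ratio`) are assumptions for illustration only; the abstract does not state ROGER's actual update rule.

```python
def reward_gain_ratio(penalty, threshold, max_ratio=1.0, min_ratio=0.0):
    """Hypothetical linear schedule for the reward/penalty gain ratio.

    penalty   : current (non-negative) penalty magnitude
    threshold : constraint threshold at which the ratio bottoms out
    The ratio shrinks as the penalty approaches the threshold and
    returns to max_ratio when the learner is far from it.
    """
    margin = 1.0 - penalty / threshold
    margin = max(0.0, min(1.0, margin))  # clamp to [0, 1]
    return min_ratio + (max_ratio - min_ratio) * margin


def combined_reward(primary, penalty, threshold):
    """Weight the primary reward by the adaptive ratio, then subtract the penalty."""
    return reward_gain_ratio(penalty, threshold) * primary - penalty


safe = reward_gain_ratio(penalty=0.0, threshold=1.0)       # far from the threshold
near = reward_gain_ratio(penalty=0.9, threshold=1.0)       # close to the threshold
violating = reward_gain_ratio(penalty=1.5, threshold=1.0)  # beyond the threshold
```

With this schedule the primary reward dominates in safe states and is progressively suppressed as the penalty approaches the threshold, so the penalty term drives learning near the constraint boundary.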
Controllable Generative Trajectory Prediction via Weak Preference Alignment
Deep generative models such as conditional variational autoencoders (CVAEs) have shown great promise for predicting trajectories of surrounding agents in autonomous vehicle planning. State-of-the-art models have achieved remarkable accuracy in such prediction tasks. Besides accuracy, diversity is also crucial for safe planning because human behaviors are inherently uncertain and multimodal. However, existing methods generally lack a scheme to generate controllably diverse trajectories, which is arguably more useful than randomly diversified trajectories, to the end of safe planning. To address this, we propose PrefCVAE, an augmented CVAE framework that uses weakly labeled preference pairs to imbue latent variables with semantic attributes. Using average velocity as an example attribute, we demonstrate that PrefCVAE enables controllable, semantically meaningful predictions without degrading baseline accuracy. Our results show the effectiveness of preference supervision as a cost-effective way to enhance sampling-based generative models.
Deployment and Development of a Cognitive Teleoreactive Framework for Deep Sea Autonomy
A new AUV mission planning and execution software has been tested on AUV Sentry. Dubbed DINOS-R, it draws inspiration from cognitive architectures and AUV control systems to replace the legacy MC architecture. Unlike these existing architectures, however, DINOS-R is built from the ground up to unify symbolic decision making (for understandable, repeatable, provable behavior) with machine learning techniques and reactive behaviors, for field-readiness across oceanographic platforms. Implemented primarily in Python3, DINOS-R is extensible, modular, and reusable, with an emphasis on non-expert use as well as growth for future research in oceanography and robot algorithms. Mission specification is flexible and can be written declaratively. Behavior specification is similarly flexible, supporting simultaneous use of real-time task planning and hard-coded user-specified plans. These features were demonstrated in the field on Sentry, in addition to a variety of simulated cases. These results are discussed, and future work is outlined.
Bhasha-Rupantarika: Algorithm-Hardware Co-design approach for Multilingual Neural Machine Translation
This paper introduces Bhasha-Rupantarika, a light and efficient multilingual translation system tailored through algorithm-hardware codesign for resource-limited settings. The method investigates model deployment at sub-octet precision levels (FP8, INT8, INT4, and FP4), with experimental results indicating a 4.1x reduction in model size (FP4) and a 4.2x inference speedup, which correlates with an increased throughput of 66 tokens/s (improvement by 4.8x). This underscores the importance of ultra-low precision quantization for real-time deployment in IoT devices using FPGA accelerators, achieving performance on par with expectations. Our evaluation covers bidirectional translation between Indian and international languages, showcasing its adaptability in low-resource linguistic contexts. The FPGA deployment demonstrated a 1.96x reduction in LUTs and a 1.65x decrease in FFs, resulting in a 2.2x enhancement in throughput compared to OPU and a 4.6x enhancement compared to HPTA. Overall, the evaluation provides a viable solution based on quantisation-aware translation along with hardware efficiency suitable for deployable multilingual AI systems. The complete code (https://github.com/mukullokhande99/Bhasha-Rupantarika/) and dataset are publicly available for reproducibility, facilitating rapid integration and further development by researchers.
UniCoD: Enhancing Robot Policy via Unified Continuous and Discrete Representation Learning
Building generalist robot policies that can handle diverse tasks in open-ended environments is a central challenge in robotics. To leverage knowledge from large-scale pretraining, prior work has typically built generalist policies either on top of vision-language understanding models (VLMs) or generative models. However, both semantic understanding from vision-language pretraining and visual dynamics modeling from visual-generation pretraining are crucial for embodied robots. Recent unified models of generation and understanding have demonstrated strong capabilities in both comprehension and generation through large-scale pretraining. We posit that robotic policy learning can likewise benefit from the combined strengths of understanding, planning and continuous future representation learning. Building on this insight, we introduce UniCoD, which acquires the ability to dynamically model high-dimensional visual features through pretraining on over 1M internet-scale instructional manipulation videos. Subsequently, UniCoD is fine-tuned on data collected from the robot embodiment, enabling the learning of mappings from predictive representations to action tokens. Extensive experiments show that our approach consistently outperforms baseline methods, by 9% in simulation environments and 12% on real-world out-of-distribution tasks.
High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting
The scalability of robotic learning is fundamentally bottlenecked by the significant cost and labor of real-world data collection. While simulated data offers a scalable alternative, it often fails to generalize to the real world due to significant gaps in visual appearance, physical properties, and object interactions. To address this, we propose RoboSimGS, a novel Real2Sim2Real framework that converts multi-view real-world images into scalable, high-fidelity, and physically interactive simulation environments for robotic manipulation. Our approach reconstructs scenes using a hybrid representation: 3D Gaussian Splatting (3DGS) captures the photorealistic appearance of the environment, while mesh primitives for interactive objects ensure accurate physics simulation. Crucially, we pioneer the use of a Multi-modal Large Language Model (MLLM) to automate the creation of physically plausible, articulated assets. The MLLM analyzes visual data to infer not only physical properties (e.g., density, stiffness) but also complex kinematic structures (e.g., hinges, sliding rails) of objects. We demonstrate that policies trained entirely on data generated by RoboSimGS achieve successful zero-shot sim-to-real transfer across a diverse set of real-world manipulation tasks. Furthermore, data from RoboSimGS significantly enhances the performance and generalization capabilities of SOTA methods. Our results validate RoboSimGS as a powerful and scalable solution for bridging the sim-to-real gap.
comment: 13 pages, 6 figures
SpikeGrasp: A Benchmark for 6-DoF Grasp Pose Detection from Stereo Spike Streams
Most robotic grasping systems rely on converting sensor data into explicit 3D point clouds, which is a computational step not found in biological intelligence. This paper explores a fundamentally different, neuro-inspired paradigm for 6-DoF grasp detection. We introduce SpikeGrasp, a framework that mimics the biological visuomotor pathway, processing raw, asynchronous events from stereo spike cameras, similarly to retinas, to directly infer grasp poses. Our model fuses these stereo spike streams and uses a recurrent spiking neural network, analogous to high-level visual processing, to iteratively refine grasp hypotheses without ever reconstructing a point cloud. To validate this approach, we built a large-scale synthetic benchmark dataset. Experiments show that SpikeGrasp surpasses traditional point-cloud-based baselines, especially in cluttered and textureless scenes, and demonstrates remarkable data efficiency. By establishing the viability of this end-to-end, neuro-inspired approach, SpikeGrasp paves the way for future systems capable of the fluid and efficient manipulation seen in nature, particularly for dynamic objects.
Fast Vision in the Dark: A Case for Single-Photon Imaging in Planetary Navigation
Improving robotic navigation is critical for extending exploration range and enhancing operational efficiency. Vision-based navigation relying on traditional CCD or CMOS cameras faces major challenges when complex illumination conditions are paired with motion, limiting the range and accessibility of mobile planetary robots. In this study, we propose a novel approach to planetary navigation that leverages the unique imaging capabilities of Single-Photon Avalanche Diode (SPAD) cameras. We present the first comprehensive evaluation of single-photon imaging as an alternative passive sensing technology for robotic exploration missions targeting perceptually challenging locations, with a special emphasis on high-latitude lunar regions. We detail the operating principles and performance characteristics of SPAD cameras, assess their advantages and limitations in addressing key perception challenges of upcoming exploration missions to the Moon, and benchmark their performance under representative illumination conditions.
comment: 9 pages, 6 figures, conference paper
Reinforcement Learning-based Dynamic Adaptation for Sampling-Based Motion Planning in Agile Autonomous Driving ICRA 2026
Sampling-based trajectory planners are widely used for agile autonomous driving due to their ability to generate fast, smooth, and kinodynamically feasible trajectories. However, their behavior is often governed by a cost function with manually tuned, static weights, which forces a tactical compromise that is suboptimal across the wide range of scenarios encountered in a race. To address this shortcoming, we propose using a Reinforcement Learning (RL) agent as a high-level behavioral selector that dynamically switches the cost function parameters of an analytical, low-level trajectory planner during runtime. We show the effectiveness of our approach in simulation in an autonomous racing environment where our RL-based planner achieved 0% collision rate while reducing overtaking time by up to 60% compared to state-of-the-art static planners. Our new agent now dynamically switches between aggressive and conservative behaviors, enabling interactive maneuvers unattainable with static configurations. These results demonstrate that integrating reinforcement learning as a high-level selector resolves the inherent trade-off between safety and competitiveness in autonomous racing planners. The proposed methodology offers a pathway toward adaptive yet interpretable motion planning for broader autonomous driving applications.
comment: 8 pages, submitted to the IEEE ICRA 2026, Vienna, Austria
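The idea of a high-level selector switching the cost-function parameters of a sampling-based planner, described in the abstract above, can be sketched like this. The feature names, weight values, and the `WEIGHT_SETS` table are hypothetical; in the paper the mode would come from the RL agent, which is replaced here by a plain argument.

```python
# Hypothetical weight sets the high-level selector can choose between.
WEIGHT_SETS = {
    "aggressive": {"progress": 2.0, "risk": 0.5},
    "conservative": {"progress": 0.5, "risk": 2.0},
}


def trajectory_cost(features, weights):
    """Weighted cost: reward progress, penalize risk."""
    return -weights["progress"] * features["progress"] + weights["risk"] * features["risk"]


def select_trajectory(candidates, mode):
    """Pick the lowest-cost sampled trajectory under the selected weight set.

    candidates : list of dicts with 'name', 'progress', and 'risk' entries
    mode       : key into WEIGHT_SETS (stand-in for the RL agent's action)
    """
    weights = WEIGHT_SETS[mode]
    return min(candidates, key=lambda c: trajectory_cost(c, weights))


candidates = [
    {"name": "overtake", "progress": 10.0, "risk": 4.0},
    {"name": "follow", "progress": 6.0, "risk": 1.0},
]
aggressive_choice = select_trajectory(candidates, "aggressive")["name"]
conservative_choice = select_trajectory(candidates, "conservative")["name"]
```

The low-level planner stays analytical and interpretable: only the weights change at runtime, so the same sampled candidates can yield an overtaking maneuver in one mode and a following maneuver in the other.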
Decoupled Scaling 4ch Bilateral Control on the Cartesian coordinate by 6-DoF Manipulator using Rotation Matrix
Four-channel bilateral control is a method for achieving remote control with force feedback and adjustment operability by synchronizing the positions and forces of two manipulators. This is expected to significantly improve the operability of the remote control in contact-rich tasks. Among these, 4-channel bilateral control on the Cartesian coordinate system is advantageous owing to its suitability for manipulators with different structures and because it allows the dynamics in the Cartesian coordinate system to be adjusted by adjusting the control parameters, thus achieving intuitive operability for humans. This paper proposes a 4-channel bilateral control method that achieves the desired dynamics by decoupling each dimension in the Cartesian coordinate system regardless of the scaling factor.
comment: 6 pages, 4 figures, Accepted at SAMCON 2025
AI-Agents for Culturally Diverse Online Higher Education Environments
As the global reach of online higher education continues to grow, universities are increasingly accommodating students from diverse cultural backgrounds \parencite{tereshko2024culturally}. This can present a number of challenges, including linguistic barriers \parencite{ullah2021linguistic}, cultural differences in learning style \parencite{omidvar2012cultural}, cultural sensitivity in course design \parencite{nguyen2022cultural}, and perceived isolation when students feel their perspectives or experiences are not reflected or valued in the learning environment \parencite{hansen2022belonging}. Ensuring active engagement and reasonable learning outcomes in such environments requires distance educational systems that are not only adaptive but also culturally resonant \parencite{dalle2024cultural}. Both embodied and virtual AI-Agents have great potential in this regard, as they can facilitate personalized learning and adapt their interactions and content delivery to align with students' cultural context. In addition, Generative AI (GAI), such as Large Language Models (LLMs), can amplify the potential for these culturally aware AI agents to address educational challenges due to its advanced capacity for understanding and generating contextually relevant content \parencite{wang2024large}. This chapter reviews existing research and suggests the usage of culturally aware AI-Agents, powered by GAI, to foster engagement and improve learning outcomes in culturally diverse online higher education environments.
Population-Coded Spiking Neural Networks for High-Dimensional Robotic Control
Energy-efficient and high-performance motor control remains a critical challenge in robotics, particularly for high-dimensional continuous control tasks with limited onboard resources. While Deep Reinforcement Learning (DRL) has achieved remarkable results, its computational demands and energy consumption limit deployment in resource-constrained environments. This paper introduces a novel framework combining population-coded Spiking Neural Networks (SNNs) with DRL to address these challenges. Our approach leverages the event-driven, asynchronous computation of SNNs alongside the robust policy optimization capabilities of DRL, achieving a balance between energy efficiency and control performance. Central to this framework is the Population-coded Spiking Actor Network (PopSAN), which encodes high-dimensional observations into neuronal population activities and enables optimal policy learning through gradient-based updates. We evaluate our method on the Isaac Gym platform using the PixMC benchmark with complex robotic manipulation tasks. Experimental results on the Franka robotic arm demonstrate that our approach achieves energy savings of up to 96.10% compared to traditional Artificial Neural Networks (ANNs) while maintaining comparable control performance. The trained SNN policies exhibit robust finger position tracking with minimal deviation from commanded trajectories and stable target height maintenance during pick-and-place operations. These results position population-coded SNNs as a promising solution for energy-efficient, high-performance robotic control in resource-constrained applications, paving the way for scalable deployment in real-world robotics systems.
SuperEx: Enhancing Indoor Mapping and Exploration using Non-Line-of-Sight Perception
Efficient exploration and mapping in unknown indoor environments is a fundamental challenge, with high stakes in time-critical settings. In current systems, robot perception remains confined to line-of-sight; occluded regions remain unknown until physically traversed, leading to inefficient exploration when layouts deviate from prior assumptions. In this work, we bring non-line-of-sight (NLOS) sensing to robotic exploration. We leverage single-photon LiDARs, which capture time-of-flight histograms that encode the presence of hidden objects - allowing robots to look around blind corners. Recent single-photon LiDARs have become practical and portable, enabling deployment beyond controlled lab settings. Prior NLOS works target 3D reconstruction in static, lab-based scenarios, and initial efforts toward NLOS-aided navigation consider simplified geometries. We introduce SuperEx, a framework that integrates NLOS sensing directly into the mapping-exploration loop. SuperEx augments global map prediction with beyond-line-of-sight cues by (i) carving empty NLOS regions from timing histograms and (ii) reconstructing occupied structure via a two-step physics-based and data-driven approach that leverages structural regularities. Evaluations on complex simulated maps and the real-world KTH Floorplan dataset show a 12% gain in mapping accuracy under < 30% coverage and improved exploration efficiency compared to line-of-sight baselines, opening a path to reliable mapping beyond direct visibility.
comment: 8 pages, 9 figures, project webpage: https://super-ex.github.io/
Align2Act: Instruction-Tuned Models for Human-Aligned Autonomous Driving
Motion planning in complex scenarios is a core challenge in autonomous driving. Conventional methods apply predefined rules or learn from driving data to generate trajectories, while recent approaches leverage large language models (LLMs) for decision-making. However, it remains unclear whether LLMs truly capture human driving logic. We propose Align2Act, a motion planning framework that transforms instruction-tuned LLMs into interpretable planners aligned with human behavior. We derive structured driving instructions based on human reasoning patterns (e.g., anticipate hazards, yield at intersections) and traffic rules (e.g., stop at red lights, maintain lane boundaries). Our Align2ActChain module guides step-by-step reasoning to produce both an interpretable rationale and a safe trajectory. By fine-tuning LLaMA-2-7B with LoRA on one million scenarios from the nuPlan dataset, our method achieves an open-loop score of 85.17 and closed-loop scores of 70.31 (non-reactive) and 66.96 (reactive) on Test14-random. Unlike prior work focused on synthetic or open-loop settings, we demonstrate improved planning quality and human-likeness on the real-world nuPlan closed-loop benchmark. Ablation studies confirm that structured reasoning significantly improves performance over baseline LLM planners.
Galilean Symmetry in Robotics
Galilean symmetry is the natural symmetry of inertial motion that underpins Newtonian physics. Although rigid-body symmetry is one of the most established and fundamental tools in robotics, there appears to be no comparable treatment of Galilean symmetry for a robotics audience. In this paper, we present a robotics-tailored exposition of Galilean symmetry that leverages the community's familiarity with and understanding of rigid-body transformations and pose representations. Our approach contrasts with common treatments in the physics literature that introduce Galilean symmetry as a stepping stone to Einstein's relativity. A key insight is that the Galilean matrix Lie group can be used to describe two different pose representations, Galilean frames, that use inertial velocity in the state definition, and extended poses, that use coordinate velocity. We provide three examples where applying the Galilean matrix Lie-group algebra to robotics problems is straightforward and yields significant insights: inertial navigation above the rotating Earth, manipulator kinematics, and sensor data fusion under temporal uncertainty. We believe that the time is right for the robotics community to benefit from rediscovering and extending this classical material and applying it to modern problems.
comment: Under Review
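As a rough illustration of the Galilean matrix Lie group mentioned in the abstract above, the sketch below builds a 5x5 group element and applies it to an event. The block layout used here (rotation, boost velocity, translation, time shift) is one common convention and may differ from the paper's notation; the function names are hypothetical.

```python
def galilean_matrix(R, v, p, tau):
    """5x5 Galilean group element acting on events written as (x, y, z, t, 1).

    R   : 3x3 rotation matrix (list of rows)
    v   : boost velocity (3 values)
    p   : spatial translation (3 values)
    tau : time shift
    """
    rows = [list(R[i]) + [v[i], p[i]] for i in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0, tau])
    rows.append([0.0, 0.0, 0.0, 0.0, 1.0])
    return rows


def compose(A, B):
    """Matrix product A @ B, i.e. apply B first and then A."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def act(G, x, t):
    """Transform an event: x' = R x + v t + p and t' = t + tau."""
    event = list(x) + [t, 1.0]
    out = [sum(G[i][k] * event[k] for k in range(5)) for i in range(5)]
    return out[:3], out[3]


# 90-degree rotation about z, boost along x, translation along y, time shift of 3.
Rz = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
G1 = galilean_matrix(Rz, [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], 3.0)
G2 = galilean_matrix(Rz, [0.0, 1.0, 0.0], [1.0, 0.0, 0.0], -1.0)
x_new, t_new = act(G1, [1.0, 0.0, 0.0], 2.0)
```

Setting the boost and time shift to zero recovers the familiar 4x4 rigid-body transform in the upper-left and translation blocks, which is the connection to pose representations that the abstract highlights.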
Towards Dynamic Quadrupedal Gaits: A Symmetry-Guided RL Hierarchy Enables Free Gait Transitions at Varying Speeds
Quadrupedal robots exhibit a wide range of viable gaits, but generating specific footfall sequences often requires laborious expert tuning of numerous variables, such as touch-down and lift-off events and holonomic constraints for each leg. This paper presents a unified reinforcement learning framework for generating versatile quadrupedal gaits by leveraging the intrinsic symmetries and velocity-period relationship of dynamic legged systems. We propose a symmetry-guided reward function design that incorporates temporal, morphological, and time-reversal symmetries. By focusing on preserved symmetries and natural dynamics, our approach eliminates the need for predefined trajectories, enabling smooth transitions between diverse locomotion patterns such as trotting, bounding, half-bounding, and galloping. Implemented on the Unitree Go2 robot, our method demonstrates robust performance across a range of speeds in both simulations and hardware tests, significantly improving gait adaptability without extensive reward tuning or explicit foot placement control. This work provides insights into dynamic locomotion strategies and underscores the crucial role of symmetries in robotic gait design.
MonoSE(3)-Diffusion: A Monocular SE(3) Diffusion Framework for Robust Camera-to-Robot Pose Estimation
We propose MonoSE(3)-Diffusion, a monocular SE(3) diffusion framework that formulates markerless, image-based robot pose estimation as a conditional denoising diffusion process. The framework consists of two processes: a visibility-constrained diffusion process for diverse pose augmentation and a timestep-aware reverse process for progressive pose refinement. The diffusion process progressively perturbs ground-truth poses to noisy transformations for training a pose denoising network. Importantly, we integrate visibility constraints into the process, ensuring the transformations remain within the camera field of view. Compared to the fixed-scale perturbations used in current methods, the diffusion process generates in-view and diverse training poses, thereby improving the network's generalization capability. Furthermore, the reverse process iteratively predicts poses with the denoising network and refines the pose estimates by sampling from the diffusion posterior of the current timestep, following a scheduled coarse-to-fine procedure. Moreover, the timestep indicates the transformation scale, which guides the denoising network toward more accurate pose predictions. The reverse process demonstrates higher robustness than direct prediction, benefiting from its timestep-aware refinement scheme. Our approach demonstrates improvements across two benchmarks (DREAM and RoboKeyGen), achieving a notable AUC of 66.75 on the most challenging dataset, representing a 32.3% gain over the state-of-the-art.
Hierarchical Planning for Long-Horizon Multi-Target Tracking Under Target Motion Uncertainty
Achieving persistent tracking of multiple dynamic targets over a large spatial area poses significant challenges for a single-robot system with constrained sensing capabilities. As the robot moves to track different targets, the ones outside the field of view accumulate uncertainty, making them progressively harder to track. An effective path planning algorithm must manage uncertainty over a long horizon and account for the risk of permanently losing track of targets that remain unseen for too long. However, most existing approaches rely on short planning horizons and assume small, bounded environments, resulting in poor tracking performance and target loss in large-scale scenarios. In this paper, we present a hierarchical planner for tracking multiple moving targets with an aerial vehicle. To address the challenge of tracking non-static targets, our method incorporates motion models and uncertainty propagation during path execution, allowing for more informed decision-making. We decompose the multi-target tracking task into sub-tasks of single-target search and detection. Our proposed pipeline consists of a novel low-level coverage planner that enables searching for a target in an evolving belief area, and an estimation method that assesses the likelihood of success for each sub-task. This makes it possible to convert the active target tracking task into a Markov decision process (MDP), which we solve with a tree-based algorithm to determine the sequence of sub-tasks. We validate our approach in simulation, where our planner outperforms existing planners for active target tracking, achieving a reduction of 11-70% in final uncertainty across different environments.
comment: 8 pages, 7 figures. Accepted to IEEE Robotics and Automation Letters (RAL), 2025
MicroRoboScope: A Portable and Integrated Mechatronic Platform for Magnetic and Acoustic Microrobotic Experimentation
This paper presents MicroRoboScope, a portable, compact, and versatile microrobotic experimentation platform designed for real-time, closed-loop control of both magnetic and acoustic microrobots. The system integrates an embedded computer, microscope, power supplies, and control circuitry into a single, low-cost and fully integrated apparatus. Custom control software developed in Python and Arduino C++ handles live video acquisition, microrobot tracking, and generation of control signals for electromagnetic coils and acoustic transducers. The platform's multi-modal actuation, accessibility, and portability make it suitable not only for specialized research laboratories but also for educational and outreach settings. By lowering the barrier to entry for microrobotic experimentation, this system enables new opportunities for research, education, and translational applications in biomedicine, tissue engineering, and robotics.
RobotFleet: An Open-Source Framework for Centralized Multi-Robot Task Planning
Coordinating heterogeneous robot fleets to achieve multiple goals is challenging in multi-robot systems. We introduce an open-source and extensible framework for centralized multi-robot task planning and scheduling that leverages LLMs to enable fleets of heterogeneous robots to accomplish multiple tasks. RobotFleet provides abstractions for planning, scheduling, and execution across robots deployed as containerized services to simplify fleet scaling and management. The framework maintains a shared declarative world state and two-way communication for task execution and replanning. By modularizing each layer of the autonomy stack and using LLMs for open-world reasoning, RobotFleet lowers the barrier to building scalable multi-robot systems. The code can be found here: https://github.com/therohangupta/robot-fleet.
Zero-Shot Large Language Model Agents for Fully Automated Radiotherapy Treatment Planning NeurIPS 2025
Radiation therapy treatment planning is an iterative, expertise-dependent process, and the growing burden of cancer cases has made reliance on manual planning increasingly unsustainable, underscoring the need for automation. In this study, we propose a workflow that leverages a large language model (LLM)-based agent to navigate inverse treatment planning for intensity-modulated radiation therapy (IMRT). The LLM agent was implemented to directly interact with a clinical treatment planning system (TPS) to iteratively extract intermediate plan states and propose new constraint values to guide inverse optimization. The agent's decision-making process is informed by current observations and previous optimization attempts and evaluations, allowing for dynamic strategy refinement. The planning process was performed in a zero-shot inference setting, where the LLM operated without prior exposure to manually generated treatment plans and was utilized without any fine-tuning or task-specific training. The LLM-generated plans were evaluated on twenty head-and-neck cancer cases against clinical manual plans, with key dosimetric endpoints analyzed and reported. The LLM-generated plans achieved comparable organ-at-risk (OAR) sparing relative to clinical plans while demonstrating improved hot spot control (Dmax: 106.5% vs. 108.8%) and superior conformity (conformity index: 1.18 vs. 1.39 for boost PTV; 1.82 vs. 1.88 for primary PTV). This study demonstrates the feasibility of a zero-shot, LLM-driven workflow for automated IMRT treatment planning in a commercial TPS. The proposed approach provides a generalizable and clinically applicable solution that could reduce planning variability and support broader adoption of AI-based planning strategies.
comment: Accepted for poster presentation at the NeurIPS 2025 Workshop on GenAI for Health: Potential, Trust, and Policy Compliance
VendiRL: A Framework for Self-Supervised Reinforcement Learning of Diversely Diverse Skills NeurIPS 2025
In self-supervised reinforcement learning (RL), one of the key challenges is learning a diverse set of skills to prepare agents for unknown future tasks. Despite impressive advances, scalability and evaluation remain prevalent issues. Regarding scalability, the search for meaningful skills can be obscured by high-dimensional feature spaces, where relevant features may vary across downstream task domains. For evaluating skill diversity, defining what constitutes "diversity" typically requires a hard commitment to a specific notion of what it means for skills to be diverse, potentially leading to inconsistencies in how skill diversity is understood, making results across different approaches hard to compare, and leaving many forms of diversity unexplored. To address these issues, we adopt a measure of sample diversity that translates ideas from ecology to machine learning -- the Vendi Score -- allowing the user to specify and evaluate any desired form of diversity. We demonstrate how this metric facilitates skill evaluation and introduce VendiRL, a unified framework for learning diversely diverse sets of skills. Given distinct similarity functions, VendiRL motivates distinct forms of diversity, which could support skill-diversity pretraining in new and richly interactive environments where optimising for various forms of diversity may be desirable.
comment: 17 pages including appendices, full paper at the Scaling Environments for Agents workshop at NeurIPS 2025
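The Vendi Score referred to in the abstract above is commonly defined as the exponential of the Shannon entropy of the eigenvalues of K/n, where K is an n-by-n similarity matrix with ones on the diagonal. The sketch below follows that definition using only the standard library; the Jacobi eigenvalue routine and the function names are illustrative choices, not the VendiRL implementation.

```python
import math


def symmetric_eigenvalues(matrix, sweeps=100, tol=1e-18):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [list(row) for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) entry.
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def vendi_score(similarity):
    """exp(Shannon entropy of the eigenvalues of K / n)."""
    n = len(similarity)
    scaled = [[similarity[i][j] / n for j in range(n)] for i in range(n)]
    entropy = 0.0
    for lam in symmetric_eigenvalues(scaled):
        if lam > 1e-12:  # treat 0 * log(0) as 0 and ignore round-off negatives
            entropy -= lam * math.log(lam)
    return math.exp(entropy)


all_distinct = vendi_score([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
                            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]])
all_identical = vendi_score([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]])
partly_similar = vendi_score([[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
```

The score can be read as an effective number of distinct items: four mutually dissimilar skills give a score of four, while three identical skills give a score of one. Swapping in a different similarity function changes which notion of diversity is measured.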
SVN-ICP: Uncertainty Estimation of ICP-based LiDAR Odometry using Stein Variational Newton
This letter introduces SVN-ICP, a novel Iterative Closest Point (ICP) algorithm with uncertainty estimation that leverages Stein Variational Newton (SVN) on manifold. Designed specifically for fusing LiDAR odometry in multisensor systems, the proposed method ensures accurate pose estimation and consistent noise parameter inference, even in LiDAR-degraded environments. By approximating the posterior distribution using particles within the Stein Variational Inference framework, SVN-ICP eliminates the need for explicit noise modeling or manual parameter tuning. To evaluate its effectiveness, we integrate SVN-ICP into a simple error-state Kalman filter alongside an IMU and test it across multiple datasets spanning diverse environments and robot types. Extensive experimental results demonstrate that our approach outperforms best-in-class methods on challenging scenarios while providing reliable uncertainty estimates.
Grasping Deformable Objects via Reinforcement Learning with Cross-Modal Attention to Visuo-Tactile Inputs
We consider the problem of grasping deformable objects with soft shells using a robotic gripper. Such objects have a center of mass that changes dynamically, and they are fragile and prone to bursting. Thus, it is difficult for robots to generate control inputs that neither drop nor break the object while performing manipulation tasks. Multi-modal sensing data could help understand the grasping state through global information (e.g., shapes, pose) from visual data and local information around the contact (e.g., pressure) from tactile data. Although the two modalities carry complementary information that is beneficial to use together, fusing them is difficult owing to their different properties. We propose a method based on deep reinforcement learning (DRL) that generates control inputs of a simple gripper from visuo-tactile sensing information. Our method employs a cross-modal attention module in the encoder network and trains it in a self-supervised manner using the loss function of the RL agent. With the multi-modal fusion, the proposed method can learn the representation for the DRL agent from the visuo-tactile sensory data. The experimental results show that cross-modal attention outperforms other early and late data fusion methods across different environments, including unseen robot motions and objects.
DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy NeurIPS2025
Garment manipulation is a critical challenge due to the diversity in garment categories, geometries, and deformations. Despite this, humans can effortlessly handle garments, thanks to the dexterity of our hands. However, existing research in the field has struggled to replicate this level of dexterity, primarily hindered by the lack of realistic simulations of dexterous garment manipulation. Therefore, we propose DexGarmentLab, the first environment specifically designed for dexterous (especially bimanual) garment manipulation, which features large-scale high-quality 3D assets for 15 task scenarios, and refines simulation techniques tailored for garment modeling to reduce the sim-to-real gap. Previous data collection typically relies on teleoperation or training expert reinforcement learning (RL) policies, which are labor-intensive and inefficient. In this paper, we leverage garment structural correspondence to automatically generate a dataset with diverse trajectories using only a single expert demonstration, significantly reducing manual intervention. However, even extensive demonstrations cannot cover the infinite states of garments, which necessitates the exploration of new algorithms. To improve generalization across diverse garment shapes and deformations, we propose a Hierarchical gArment-manipuLation pOlicy (HALO). It first identifies transferable affordance points to accurately locate the manipulation area, then generates generalizable trajectories to complete the task. Through extensive experiments and detailed analysis of our method and baseline, we demonstrate that HALO consistently outperforms existing methods, successfully generalizing to previously unseen instances even with significant variations in shape and deformation where others fail. Our project page is available at: https://wayrise.github.io/DexGarmentLab/.
comment: NeurIPS2025 Spotlight
Humanoid Robots and Humanoid AI: Review, Perspectives and Directions
In the approximately century-long journey of robotics, humanoid robots made their debut around six decades ago. While current humanoids bear human-like appearances, none have embodied true humaneness, remaining distant from achieving human-like to human-level intelligence. The rapid recent advancements in generative AI and (multimodal) large language models have further reignited and escalated interest in humanoids towards real-time, interactive, and multimodal designs and applications, such as fostering humanoid workers, advisers, educators, medical professionals, caregivers, and receptionists. These unveil boundless opportunities of transforming 1) AI robotics into a research era of humanoid AI, and 2) AI robots into new-generation humanoid AI robots (AI humanoids). Our unique and comprehensive review of about 30 reported humanoids discloses a systematic terminology and a paradigmatic landscape of human-looking to human-like and human-level humanoids. It inspires comprehensive new perspectives and directions of humanoid AI as an area: transitioning from human-looking to humane humanoids, humanizing humanoids with functional and nonfunctional specifications, and cultivating technical and actionable advances of AI humanoids. Humanoid AI and AI humanoids nurture symbiotic advancements and future opportunities of synthesizing and transforming humanity modeling and conventional, generative to human-level AI into humanoid robotics.
comment: 35 pages, 3 figures
CHD: Coupled Hierarchical Diffusion for Long-Horizon Tasks
Diffusion-based planners have shown strong performance in short-horizon tasks but often fail in complex, long-horizon settings. We trace the failure to loose coupling between high-level (HL) sub-goal selection and low-level (LL) trajectory generation, which leads to incoherent plans and degraded performance. We propose Coupled Hierarchical Diffusion (CHD), a framework that models HL sub-goals and LL trajectories jointly within a unified diffusion process. A shared classifier passes LL feedback upstream so that sub-goals self-correct while sampling proceeds. This tight HL-LL coupling improves trajectory coherence and enables scalable long-horizon diffusion planning. Experiments across maze navigation, tabletop manipulation, and household environments show that CHD consistently outperforms both flat and hierarchical diffusion baselines. Our website is: https://sites.google.com/view/chd2025/home
Robotics
Learning to Throw-Flip IROS 2025
Dynamic manipulation, such as robot tossing or throwing objects, has recently gained attention as a novel paradigm to speed up logistic operations. However, the focus has predominantly been on the object's landing location, irrespective of its final orientation. In this work, we present a method enabling a robot to accurately "throw-flip" objects to a desired landing pose (position and orientation). Conventionally, objects thrown by revolute robots suffer from parasitic rotation, resulting in highly restricted and uncontrollable landing poses. Our approach is based on two key design choices: first, leveraging the impulse-momentum principle, we design a family of throwing motions that effectively decouple the parasitic rotation, significantly expanding the feasible set of landing poses. Second, we combine a physics-based model of free flight with regression-based learning methods to account for unmodeled effects. Real robot experiments demonstrate that our framework can learn to throw-flip objects to a pose target within a ($\pm$5 cm, $\pm$45 degrees) threshold in dozens of trials. Thanks to data assimilation, incorporating projectile dynamics reduces sample complexity by an average of 40% when throw-flipping to unseen poses compared to end-to-end learning methods. Additionally, we show that past knowledge of in-hand object spinning can be effectively reused, accelerating learning by 70% when throwing a new object with a Center of Mass (CoM) shift. A video summarizing the proposed method and the hardware experiments is available at https://youtu.be/txYc9b1oflU.
comment: Accepted to IROS 2025. Video Summary: https://youtu.be/txYc9b1oflU
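The physics-based free-flight component described in the abstract above can be illustrated with a drag-free planar ballistic model. This is a minimal sketch, not the authors' model. The function name `landing_pose`, the constant spin rate, and the flat landing surface are all assumptions made for illustration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def landing_pose(x0, z0, vx, vz, theta0, omega, z_land=0.0):
    """Planar, drag-free free flight (illustrative assumption).

    Returns (x_land, theta_land, t_flight) for an object released at
    (x0, z0) with velocity (vx, vz), initial pitch theta0 in rad, and
    constant spin rate omega in rad/s.
    """
    # Solve z0 + vz * t - 0.5 * G * t^2 = z_land for the positive root.
    disc = vz ** 2 + 2.0 * G * (z0 - z_land)
    if disc < 0.0:
        raise ValueError("object never reaches the landing height")
    t = (vz + math.sqrt(disc)) / G
    # Position advances linearly; orientation integrates the spin rate.
    return x0 + vx * t, theta0 + omega * t, t

# Released at ground height with 4.905 m/s upward speed: 1 s of flight.
x_land, theta_land, t_flight = landing_pose(0.0, 0.0, 2.0, 4.905, 0.0, math.pi)
```

In a pipeline like the one the abstract describes, a learned regressor would then correct the residual between such a prediction and the observed landing pose.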
sqrtVINS: Robust and Ultrafast Square-Root Filter-based 3D Motion Tracking
In this paper, we develop and open-source, for the first time, a square-root filter (SRF)-based visual-inertial navigation system (VINS), termed sqrtVINS, which is ultra-fast, numerically stable, and capable of dynamic initialization even under extreme conditions (i.e., an extremely small time window). Despite recent advancements in VINS, resource constraints and numerical instability on embedded (robotic) systems with limited precision remain critical challenges. A square-root covariance-based filter offers a promising solution by providing numerical stability, efficient memory usage, and guaranteed positive semi-definiteness. However, canonical SRFs suffer from inefficiencies caused by disruptions in the triangular structure of the covariance matrix during updates. The proposed method significantly improves VINS efficiency with a novel Cholesky decomposition (LLT)-based SRF update, which fully exploits the system structure to preserve this triangular form. Moreover, we design a fast, robust, dynamic initialization method, which first recovers the minimal states without triangulating 3D features and then efficiently performs iterative SRF updates to refine the full states, enabling seamless VINS operation. The proposed LLT-based SRF is extensively verified through numerical studies, demonstrating superior numerical stability and achieving robust, efficient performance on 32-bit single-precision floats, operating at twice the speed of state-of-the-art (SOTA) methods. Our initialization method, tested on both mobile workstations and Jetson Nano computers, achieves a high initialization success rate even within a 100 ms window under minimal conditions. Finally, the proposed sqrtVINS is extensively validated across diverse scenarios, demonstrating strong efficiency, robustness, and reliability. The full open-source implementation is released to support future research and applications.
Rise of the Robochemist
Chemistry, a long-standing discipline, has historically relied on manual and often time-consuming processes. While some automation exists, the field is now on the cusp of a significant evolution driven by the integration of robotics and artificial intelligence (AI), giving rise to the concept of the robochemist: a new paradigm where autonomous systems assist in designing, executing, and analyzing experiments. Robochemists integrate mobile manipulators, advanced perception, teleoperation, and data-driven protocols to execute experiments with greater adaptability, reproducibility, and safety. Rather than a fully automated replacement for human chemists, we envision the robochemist as a complementary partner that works collaboratively to enhance discovery, enabling a more efficient exploration of chemical space and accelerating innovation in pharmaceuticals, materials science, and sustainable manufacturing. This article traces the technologies, applications, and challenges that define this transformation, highlighting both the opportunities and the responsibilities that accompany the emergence of the robochemist. Ultimately, the future of chemistry is argued to lie in a symbiotic partnership where human intuition and expertise are amplified by robotic precision and AI-driven insight.
comment: This article was originally published in the IEEE Systems, Man, and Cybernetics Society eNewsletter, September 2025 issue: https://www.ieeesmc.org/wp-content/uploads/2024/10/FeatureArticle_Sept25.pdf
KG-MAS: Knowledge Graph-Enhanced Multi-Agent Infrastructure for coupling physical and digital robotic environments
The seamless integration of physical and digital environments in Cyber-Physical Systems (CPS), particularly within Industry 4.0, presents significant challenges stemming from system heterogeneity and complexity. Traditional approaches often rely on rigid, data-centric solutions like co-simulation frameworks or brittle point-to-point middleware bridges, which lack the semantic richness and flexibility required for intelligent, autonomous coordination. This report introduces the Knowledge Graph-Enhanced Multi-Agent Infrastructure (KG-MAS) to address these limitations. KG-MAS leverages a centralized Knowledge Graph (KG) as a dynamic, shared world model, providing a common semantic foundation for a Multi-Agent System (MAS). Autonomous agents, representing both physical and digital components, query this KG for decision-making and update it with real-time state information. The infrastructure features a model-driven architecture that facilitates the automatic generation of agents from semantic descriptions, thereby simplifying system extension and maintenance. By abstracting away underlying communication protocols and providing a unified, intelligent coordination mechanism, KG-MAS offers a robust, scalable, and flexible solution for coupling heterogeneous physical and digital robotic environments.
Bridging Perspectives: Foundation Model Guided BEV Maps for 3D Object Detection and Tracking
Camera-based 3D object detection and tracking are essential for perception in autonomous driving. Current state-of-the-art approaches often rely exclusively on either perspective-view (PV) or bird's-eye-view (BEV) features, limiting their ability to leverage both fine-grained object details and spatially structured scene representations. In this work, we propose DualViewDistill, a hybrid detection and tracking framework that incorporates both PV and BEV camera image features to leverage their complementary strengths. Our approach introduces BEV maps guided by foundation models, leveraging descriptive DINOv2 features that are distilled into BEV representations through a novel distillation process. By integrating PV features with BEV maps enriched with semantic and geometric features from DINOv2, our model leverages this hybrid representation via deformable aggregation to enhance 3D object detection and tracking. Extensive experiments on the nuScenes and Argoverse 2 benchmarks demonstrate that DualViewDistill achieves state-of-the-art performance. The results showcase the potential of foundation model BEV maps to enable more reliable perception for autonomous driving. We make the code and pre-trained models available at https://dualviewdistill.cs.uni-freiburg.de.
X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
Successful generalist Vision-Language-Action (VLA) models rely on effective training across diverse robotic platforms with large-scale, cross-embodiment, heterogeneous datasets. To facilitate and leverage the heterogeneity in rich, diverse robotic data sources, we propose a novel Soft Prompt approach with minimally added parameters, by infusing prompt learning concepts into cross-embodiment robot learning and introducing separate sets of learnable embeddings for each distinct data source. These embeddings serve as embodiment-specific prompts, which together enable VLA models to exploit varying cross-embodiment features effectively. Our new X-VLA, a neat flow-matching-based VLA architecture, relies exclusively on soft-prompted standard Transformer encoders, enjoying both scalability and simplicity. Evaluated across 6 simulations as well as 3 real-world robots, our 0.9B instantiation, X-VLA-0.9B, simultaneously achieves SOTA performance over a sweep of benchmarks, demonstrating superior results across a wide range of capabilities, from flexible dexterity to quick adaptation across embodiments, environments, and tasks. Website: https://thu-air-dream.github.io/X-VLA/
comment: preprint, technical report, 33 pages
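The idea of keeping a separate set of learnable embeddings per data source, as described in the abstract above, can be sketched as a lookup table whose selected rows are prepended to the token sequence. This is only an illustration under stated assumptions. The sizes, the name `prepend_soft_prompt`, and the choice to prepend (rather than inject the prompts some other way) are hypothetical and are not taken from X-VLA.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 data sources, 4 prompt tokens each, model width 8.
NUM_SOURCES, PROMPT_LEN, D_MODEL = 3, 4, 8

# One prompt table per data source. In training these would be learnable
# parameters; here they are plain arrays for illustration.
soft_prompts = rng.normal(scale=0.02, size=(NUM_SOURCES, PROMPT_LEN, D_MODEL))

def prepend_soft_prompt(tokens, source_id):
    """Prepend the prompt of data source `source_id` to `tokens`,
    an array of shape (seq_len, D_MODEL)."""
    return np.concatenate([soft_prompts[source_id], tokens], axis=0)

tokens = np.zeros((5, D_MODEL))           # stand-in for encoded observations
sequence = prepend_soft_prompt(tokens, source_id=1)
```

Only the prompt table grows with the number of data sources, which is consistent with the abstract's claim of minimally added parameters.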
A3RNN: Bi-directional Fusion of Bottom-up and Top-down Process for Developmental Visual Attention in Robots
This study investigates the developmental interaction between top-down (TD) and bottom-up (BU) visual attention in robotic learning. Our goal is to understand how structured, human-like attentional behavior emerges through the mutual adaptation of TD and BU mechanisms over time. To this end, we propose a novel attention model $A^3 RNN$ that integrates predictive TD signals and saliency-based BU cues through a bi-directional attention architecture. We evaluate our model in robotic manipulation tasks using imitation learning. Experimental results show that attention behaviors evolve throughout training, from saliency-driven exploration to prediction-driven direction. Initially, BU attention highlights visually salient regions, which guide TD processes, while as learning progresses, TD attention stabilizes and begins to reshape what is perceived as salient. This trajectory reflects principles from cognitive science and the free-energy framework, suggesting the importance of self-organizing attention through interaction between perception and internal prediction. Although not explicitly optimized for stability, our model exhibits more coherent and interpretable attention patterns than baselines, supporting the idea that developmental mechanisms contribute to robust attention formation.
comment: 8 pages, 5 figures
UF-RNN: Real-Time Adaptive Motion Generation Using Uncertainty-Driven Foresight Prediction
Training robots to operate effectively in environments with uncertain states, such as ambiguous object properties or unpredictable interactions, remains a longstanding challenge in robotics. Imitation learning methods typically rely on successful examples and often neglect failure scenarios where uncertainty is most pronounced. To address this limitation, we propose the Uncertainty-driven Foresight Recurrent Neural Network (UF-RNN), a model that combines standard time-series prediction with an active "Foresight" module. This module performs internal simulations of multiple future trajectories and refines the hidden state to minimize predicted variance, enabling the model to selectively explore actions under high uncertainty. We evaluate UF-RNN on a door-opening task in both simulation and a real-robot setting, demonstrating that, despite the absence of explicit failure demonstrations, the model exhibits robust adaptation by leveraging self-induced chaotic dynamics in its latent space. When guided by the Foresight module, these chaotic properties stimulate exploratory behaviors precisely when the environment is ambiguous, yielding improved success rates compared to conventional stochastic RNN baselines. These findings suggest that integrating uncertainty-driven foresight into imitation learning pipelines can significantly enhance a robot's ability to handle unpredictable real-world conditions.
comment: 8 pages, 6 figures
It Takes Two: Learning Interactive Whole-Body Control Between Humanoid Robots
The true promise of humanoid robotics lies beyond single-agent autonomy: two or more humanoids must engage in physically grounded, socially meaningful whole-body interactions that echo the richness of human social interaction. However, single-humanoid methods suffer from the isolation issue, ignoring inter-agent dynamics and causing misaligned contacts, interpenetrations, and unrealistic motions. To address this, we present Harmanoid, a dual-humanoid motion imitation framework that transfers interacting human motions to two robots while preserving both kinematic fidelity and physical realism. Harmanoid comprises two key components: (i) contact-aware motion retargeting, which restores inter-body coordination by aligning SMPL contacts with robot vertices, and (ii) interaction-driven motion controller, which leverages interaction-specific rewards to enforce coordinated keypoints and physically plausible contacts. By explicitly modeling inter-agent contacts and interaction-aware dynamics, Harmanoid captures the coupled behaviors between humanoids that single-humanoid frameworks inherently overlook. Experiments demonstrate that Harmanoid significantly improves interactive motion imitation, surpassing existing single-humanoid frameworks that largely fail in such scenarios.
Dejavu: Post-Deployment Learning for Embodied Agents via Experience Feedback
Embodied agents face a fundamental limitation: once deployed in real-world environments to perform specific tasks, they are unable to acquire new useful knowledge to enhance task performance. In this paper, we propose a general post-deployment learning framework called Dejavu, which employs an Experience Feedback Network (EFN) and augments the frozen Vision-Language-Action (VLA) policy with retrieved execution memories. EFN automatically identifies contextually successful prior action experiences and conditions action prediction on this retrieved guidance. We adopt reinforcement learning with semantic similarity rewards on EFN to ensure that the predicted actions align with past successful behaviors under current observations. During deployment, EFN continually enriches its memory with new trajectories, enabling the agent to exhibit "learning from experience" despite fixed weights. Experiments across diverse embodied tasks show that EFN significantly improves adaptability, robustness, and success rates over frozen baselines. These results highlight a promising path toward embodied agents that continually refine their behavior after deployment.
CompassNav: Steering From Path Imitation To Decision Understanding In Navigation
The dominant paradigm for training Large Vision-Language Models (LVLMs) in navigation relies on imitating expert trajectories. This approach reduces the complex navigation task to a sequence-to-sequence replication of a single correct path, fundamentally limiting the agent's ability to explore and generalize. In this work, we argue for and introduce a new paradigm: a shift from Path Imitation to Decision Understanding. The goal of this paradigm is to build agents that do not just follow, but truly understand how to navigate. We materialize this through two core contributions: first, we introduce Compass-Data-22k, a novel 22k-trajectory dataset. Its Reinforcement Fine-Tuning (RFT) subset provides a panoramic view of the decision landscape by annotating all feasible actions with A* geodesic distances. Second, we design a novel gap-aware hybrid reward function that dynamically adapts its feedback to decision certainty, shifting between decisive signals for optimal actions and nuanced scores to encourage exploration. Integrated into an SFT-then-RFT recipe, our CompassNav agent is trained not to memorize static routes, but to develop an internal "compass" that constantly intuits the direction to the goal by evaluating the relative quality of all possible moves. This approach enables our 7B agent to set a new state-of-the-art on Goal navigation benchmarks, outperforming even larger proprietary models, and to achieve robust real-world goal navigation on a physical robot.
Ctrl-World: A Controllable Generative World Model for Robot Manipulation
Generalist robot policies can now perform a wide range of manipulation skills, but evaluating and improving their ability with unfamiliar objects and instructions remains a significant challenge. Rigorous evaluation requires a large number of real-world rollouts, while systematic improvement demands additional corrective data with expert labels. Both of these processes are slow, costly, and difficult to scale. World models offer a promising, scalable alternative by enabling policies to roll out within imagination space. However, a key challenge is building a controllable world model that can handle multi-step interactions with generalist robot policies. This requires a world model compatible with modern generalist policies by supporting multi-view prediction, fine-grained action control, and consistent long-horizon interactions, which previous works have not achieved. In this paper, we make a step forward by introducing a controllable multi-view world model that can be used to evaluate and improve the instruction-following ability of generalist robot policies. Our model maintains long-horizon consistency with a pose-conditioned memory retrieval mechanism and achieves precise action control through frame-level action conditioning. Trained on the DROID dataset (95k trajectories, 564 scenes), our model generates spatially and temporally consistent trajectories under novel scenarios and new camera placements for over 20 seconds. We show that our method can accurately rank policy performance without real-world robot rollouts. Moreover, by synthesizing successful trajectories in imagination and using them for supervised fine-tuning, our approach can improve policy success by 44.7%.
comment: 17 pages
Beyond ADE and FDE: A Comprehensive Evaluation Framework for Safety-Critical Prediction in Multi-Agent Autonomous Driving Scenarios
Current evaluation methods for autonomous driving prediction models rely heavily on simplistic metrics such as Average Displacement Error (ADE) and Final Displacement Error (FDE). While these metrics offer basic performance assessments, they fail to capture the nuanced behavior of prediction modules under complex, interactive, and safety-critical driving scenarios. For instance, existing benchmarks do not distinguish the influence of nearby versus distant agents, nor do they systematically test model robustness across varying multi-agent interactions. This paper addresses this critical gap by proposing a novel testing framework that evaluates prediction performance under diverse scene structures, namely map context, agent density, and spatial distribution. Through extensive empirical analysis, we quantify the differential impact of agent proximity on target trajectory prediction and identify scenario-specific failure cases that are not exposed by traditional metrics. Our findings highlight key vulnerabilities in current state-of-the-art prediction models and demonstrate the importance of scenario-aware evaluation. The proposed framework lays the groundwork for rigorous, safety-driven prediction validation, contributing significantly to the identification of failure-prone corner cases and the development of robust, certifiable prediction systems for autonomous vehicles.
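For reference, the two baseline metrics named in the abstract above can be computed as follows for a single predicted trajectory. This is a minimal sketch of the standard definitions. The function name `ade_fde` and the toy trajectories are illustrative, and the sketch does not reproduce the paper's proposed framework.

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and Final Displacement Error for one trajectory.

    pred, gt: arrays of shape (T, 2) holding predicted and ground-truth
    (x, y) positions at T future timesteps.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Euclidean error at each timestep.
    dists = np.linalg.norm(pred - gt, axis=-1)
    # ADE averages over the horizon; FDE looks only at the last step.
    return float(dists.mean()), float(dists[-1])

pred = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
gt = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
ade, fde = ade_fde(pred, gt)
```

Both numbers aggregate pure position error, so they are blind to which surrounding agents caused the error. That limitation is what the abstract targets.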
Ionospheric and Plasmaspheric Delay Characterization for Lunar Terrestrial GNSS Receivers with Global Core Plasma Model
Recent advancements in lunar positioning, navigation, and timing (PNT) have demonstrated that terrestrial GNSS signals, including weak sidelobe transmissions, can be exploited for lunar spacecraft positioning and timing. While GNSS-based navigation at the Moon has been validated recently, unmodeled ionospheric and plasmaspheric delays remain a significant error source, particularly given the unique signal geometry and extended propagation paths. This paper characterizes these delays using the Global Core Plasma Model (GCPM) and a custom low-cost ray-tracing algorithm that iteratively solves for bent signal paths. We simulate first-, second-, and third-order group delays, as well as excess path length from ray bending, for GNSS signals received at both lunar orbit and the lunar south pole under varying solar and geomagnetic conditions. Results show that mean group delays are typically on the order of 1 m, but can exceed 100 m for low-altitude ray paths during high solar activity, while bending delays are generally smaller but non-negligible for low-altitude ray paths. We also quantify the influence of signal frequency, geomagnetic $K_p$ index, and solar R12 index. These findings inform the design of robust positioning and timing algorithms that utilize terrestrial GNSS signals.
comment: Submitted to NAVIGATION: Journal of the Institute of Navigation
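To give a sense of scale for the first-order group delay discussed in the abstract above, the standard approximation is delay = 40.3 * TEC / f^2 (meters, with TEC in electrons/m^2 and f in Hz). The sketch below applies that formula only. It does not model the GCPM, ray bending, or the higher-order terms the paper simulates, and the function name and example TEC value are assumptions.

```python
def first_order_group_delay_m(tec_el_per_m2, freq_hz):
    """First-order ionospheric group delay in meters.

    tec_el_per_m2: slant total electron content along the ray (electrons/m^2).
    freq_hz: carrier frequency in Hz.
    """
    return 40.3 * tec_el_per_m2 / freq_hz ** 2

GPS_L1_HZ = 1575.42e6   # GPS L1 carrier frequency
ONE_TECU = 1.0e16       # 1 TEC unit in electrons/m^2

# Roughly 0.16 m of delay per TECU at L1.
delay_per_tecu = first_order_group_delay_m(ONE_TECU, GPS_L1_HZ)
```

Because the delay scales with 1/f^2, lower-frequency signals accumulate larger delays along the same path, which is one reason the abstract quantifies the influence of signal frequency.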
LOMORO: Long-term Monitoring of Dynamic Targets with Minimum Robotic Fleet under Resource Constraints IROS 2025
Long-term monitoring of numerous dynamic targets can be tedious for a human operator and infeasible for a single robot, e.g., to monitor wild flocks, detect intruders, or perform search and rescue. Fleets of autonomous robots can be effective by acting collaboratively and concurrently. However, online coordination is challenging due to the unknown behaviors of the targets and the limited perception of each robot. Existing work often deploys all robots available without minimizing the fleet size, or neglects the constraints on their resources such as battery and memory. This work proposes an online coordination scheme called LOMORO for collaborative target monitoring, path routing and resource charging. It includes three core components: (I) the modeling of the multi-robot task assignment problem under constraints on resources and monitoring intervals; (II) a resource-aware task coordination algorithm that iterates between the high-level assignment of dynamic targets and the low-level multi-objective routing via Martin's algorithm; (III) an online adaptation algorithm for unpredictable target behaviors and robot failures. It ensures explicitly upper-bounded monitoring intervals for all targets and lower-bounded resource levels for all robots, while minimizing the average number of active robots. The proposed methods are validated extensively via large-scale simulations against several baselines, under different road networks, robot velocities, charging rates and monitoring intervals.
comment: Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025)
Hybrid Robotic Meta-gripper for Tomato Harvesting: Analysis of Auxetic Structures with Lattice Orientation Variations
The agricultural sector is rapidly evolving to meet growing global food demands, yet tasks like fruit and vegetable handling remain labor-intensive, causing inefficiencies and post-harvest losses. Automation, particularly selective harvesting, offers a viable solution, with soft robotics emerging as a key enabler. This study introduces a novel hybrid gripper for tomato harvesting, incorporating a rigid outer frame with a soft auxetic internal lattice. The six-finger, 3D caging-effect design enables gentle yet secure grasping in unstructured environments. Uniquely, the work investigates the effect of auxetic lattice orientation on grasping conformability, combining experimental validation with 2D Digital Image Correlation (DIC) and nonlinear finite element analysis (FEA). Auxetic configurations with unit cell inclinations of 0 deg, 30 deg, 45 deg, and 60 deg are evaluated, and their grasping forces, deformation responses, and motor torque requirements are systematically compared. Results demonstrate that lattice orientation strongly influences compliance, contact forces, and energy efficiency, with distinct advantages across configurations. This comparative framework highlights the novelty of tailoring auxetic geometries to optimize robotic gripper performance. The findings provide new insights into soft-rigid hybrid gripper design, advancing automation strategies for precision agriculture while minimizing crop damage.
ATRos: Learning Energy-Efficient Agile Locomotion for Wheeled-legged Robots IROS 2025
Hybrid locomotion of wheeled-legged robots has recently attracted increasing attention because it combines the agility of legged locomotion with the efficiency of wheeled motion. However, despite this expanded capability, whole-body control of wheeled-legged robots remains challenging for hybrid locomotion. In this paper, we present ATRos, a reinforcement learning (RL)-based hybrid locomotion framework to achieve hybrid walking-driving motions on a wheeled-legged robot. Without relying on predefined gait patterns, our planner aims to intelligently coordinate simultaneous wheel and leg movements, thereby achieving improved terrain adaptability and improved energy efficiency. Based on RL techniques, our approach constructs a prediction policy network that estimates external environmental states from proprioceptive sensory information, and the outputs are then fed into an actor-critic network to produce optimal joint commands. The feasibility of the proposed framework is validated through both simulations and real-world experiments across diverse terrains, including flat ground, stairs, and grassy surfaces. The hybrid locomotion framework shows robust performance over various unseen terrains, highlighting its generalization capability.
comment: 4 pages, 2 figures, submitted to IROS 2025 wheeled-legged workshop
Reinforcement Fine-Tuning of Flow-Matching Policies for Vision-Language-Action Models
Vision-Language-Action (VLA) models such as OpenVLA, Octo, and $\pi_0$ have shown strong generalization by leveraging large-scale demonstrations, yet their performance is still fundamentally constrained by the quality and coverage of supervised data. Reinforcement learning (RL) provides a promising path for improving and fine-tuning VLAs through online interaction. However, conventional policy gradient methods are computationally infeasible in the context of flow-matching based models due to the intractability of the importance sampling process, which requires explicit computation of policy ratios. To overcome this limitation, we propose the Flow Policy Optimization (FPO) algorithm, which reformulates importance sampling by leveraging per-sample changes in the conditional flow-matching objective. Furthermore, FPO achieves stable and scalable online reinforcement fine-tuning of the $\pi_0$ model by integrating structure-aware credit assignment to enhance gradient efficiency, clipped surrogate objectives to stabilize optimization, multi-step latent exploration to encourage diverse policy updates, and a Q-ensemble mechanism to provide robust value estimation. We evaluate FPO on the LIBERO benchmark and the ALOHA simulation task against supervised, preference-aligned, diffusion-based, autoregressive online RL, and $\pi_0$-FAST baselines, observing consistent improvements over the imitation prior and strong alternatives with stable learning under sparse rewards. In addition, ablation studies and analyses of the latent space dynamics further highlight the contributions of individual components within FPO, validating the effectiveness of the proposed computational modules and the stable convergence of the conditional flow-matching objective during online RL.
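To make the clipped-surrogate idea in the abstract above concrete, here is a minimal sketch. It is not the paper's algorithm: the abstract does not give the exact reformulation, so the exponential ratio proxy `exp(loss_old - loss_new)`, the function names, and the clip range of 0.2 are all assumptions chosen for illustration.

```python
import math

def ratio_proxy(cfm_loss_old, cfm_loss_new):
    # Assumption: a drop in the per-sample conditional flow-matching loss
    # is treated as a rise in the (intractable) action likelihood.
    return math.exp(cfm_loss_old - cfm_loss_new)

def clipped_surrogate(cfm_loss_old, cfm_loss_new, advantage, clip_eps=0.2):
    # PPO-style clipped objective evaluated on the ratio proxy.
    r = ratio_proxy(cfm_loss_old, cfm_loss_new)
    r_clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, r))
    # Taking the minimum keeps the update pessimistic once r leaves the clip range.
    return min(r * advantage, r_clipped * advantage)
```

With an unchanged loss the proxy equals 1 and the objective reduces to the advantage; large loss changes are capped by the clip range, which is the stabilizing effect the abstract attributes to clipped surrogates.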
FORM: Fixed-Lag Odometry with Reparative Mapping utilizing Rotating LiDAR Sensors ICRA 2026
Light Detection and Ranging (LiDAR) sensors have become a de facto sensor for many robot state estimation tasks, spurring development of many LiDAR Odometry (LO) methods in recent years. While some smoothing-based LO methods have been proposed, most require matching against multiple scans, resulting in sub-real-time performance. Due to this, most prior works estimate a single state at a time and are "submap"-based. This architecture propagates any error in pose estimation to the fixed submap and can cause jittery trajectories and degrade future registrations. We propose Fixed-Lag Odometry with Reparative Mapping (FORM), an LO method that performs smoothing over a densely connected factor graph while utilizing a single iterative map for matching. This allows for both real-time performance and active correction of the local map as pose estimates are further refined. We evaluate on a wide variety of datasets to show that FORM is robust, accurate, real-time, and provides smooth trajectory estimates when compared to prior state-of-the-art LO methods.
comment: Submitted to ICRA 2026
LLM-HBT: Dynamic Behavior Tree Construction for Adaptive Coordination in Heterogeneous Robots ICRA 2026
We introduce a novel framework for automatic behavior tree (BT) construction in heterogeneous multi-robot systems, designed to address the challenges of adaptability and robustness in dynamic environments. Traditional robots are limited by fixed functional attributes and cannot efficiently reconfigure their strategies in response to task failures or environmental changes. To overcome this limitation, we leverage large language models (LLMs) to generate and extend BTs dynamically, combining the reasoning and generalization power of LLMs with the modularity and recovery capability of BTs. The proposed framework consists of four interconnected modules (task initialization, task assignment, BT update, and failure node detection) that operate in a closed loop. Robots tick their BTs during execution, and upon encountering a failure node, they can either extend the tree locally or invoke a centralized virtual coordinator (Alex) to reassign subtasks and synchronize BTs across peers. This design enables long-term cooperative execution in heterogeneous teams. We validate the framework on 60 tasks across three simulated scenarios and in a real-world cafe environment with a robotic arm and a wheeled-legged robot. Results show that our method consistently outperforms baseline approaches in task success rate, robustness, and scalability, demonstrating its effectiveness for multi-robot collaboration in complex scenarios.
comment: It contains 8 pages, 7 figures and 4 tables. This paper is submitted to ICRA 2026
VG-Mapping: Variation-Aware 3D Gaussians for Online Semi-static Scene Mapping
Maintaining an up-to-date map that accurately reflects recent changes in the environment is crucial, especially for robots that repeatedly traverse the same space. Failing to promptly update the changed regions can degrade map quality, resulting in poor localization, inefficient operations, and even lost robots. 3D Gaussian Splatting (3DGS) has recently seen widespread adoption in online map reconstruction due to its dense, differentiable, and photorealistic properties, yet accurately and efficiently updating the regions of change remains a challenge. In this paper, we propose VG-Mapping, a novel online 3DGS-based mapping system tailored for such semi-static scenes. Our approach introduces a hybrid representation that augments 3DGS with a TSDF-based voxel map to efficiently identify changed regions in a scene, along with a variation-aware density control strategy that inserts or deletes Gaussian primitives in regions undergoing change. Furthermore, to address the absence of public benchmarks for this task, we construct an RGB-D dataset comprising both synthetic and real-world semi-static environments. Experimental results demonstrate that our method substantially improves the rendering quality and map update efficiency in semi-static scenes. The code and dataset are available at https://github.com/heyicheng-never/VG-Mapping.
Accurate and Noise-Tolerant Extraction of Routine Logs in Robotic Process Automation (Extended Version)
Robotic Process Mining focuses on the identification of the routine types performed by human resources through a User Interface. The ultimate goal is to discover routine-type models to enable robotic process automation. The discovery of routine-type models requires the provision of a routine log. Unfortunately, the vast majority of existing works do not directly focus on enabling the model discovery, limiting themselves to extracting the set of actions that are part of the routines. They were also not evaluated in scenarios characterized by inconsistent routine execution, hereafter referred to as noise, which reflects natural variability and occasional errors in human performance. This paper presents a clustering-based technique that aims to extract routine logs. Experiments were conducted on nine UI logs from the literature with different levels of injected noise. Our technique was compared with existing techniques, most of which are not meant to discover routine logs but were adapted for the purpose. The results were evaluated through standard state-of-the-art metrics, showing that our technique extracts more accurate routine logs than existing techniques, especially in the presence of noise.
comment: 16 pages, 5 figures
Differentiable Particle Optimization for Fast Sequential Manipulation
Sequential robot manipulation tasks require finding collision-free trajectories that satisfy geometric constraints across multiple object interactions in potentially high-dimensional configuration spaces. Solving these problems in real-time and at large scales has remained out of reach due to computational requirements. Recently, GPU-based acceleration has shown promising results, but prior methods achieve limited performance due to CPU-GPU data transfer overhead and complex logic that prevents full hardware utilization. To this end, we present SPaSM (Sampling Particle optimization for Sequential Manipulation), a fully GPU-parallelized framework that compiles constraint evaluation, sampling, and gradient-based optimization into optimized CUDA kernels for end-to-end trajectory optimization without CPU coordination. The method consists of a two-stage particle optimization strategy: first solving placement constraints through massively parallel sampling, then lifting solutions to full trajectory optimization in joint space. Unlike hierarchical approaches, SPaSM jointly optimizes object placements and robot trajectories to handle scenarios where motion feasibility constrains placement options. Experimental evaluation on challenging benchmarks demonstrates solution times in the realm of milliseconds with a 100% success rate; a $4000\times$ speedup compared to existing approaches. Code and examples are available at https://commalab.org/papers/spasm.
comment: 8 pages, 7 figures, 3 tables. Under review
ASTREA: Introducing Agentic Intelligence for Orbital Thermal Autonomy
This paper presents ASTREA, the first agentic system executed on flight-heritage hardware (TRL 9) for autonomous spacecraft operations, with on-orbit operation aboard the International Space Station (ISS). Using thermal control as a representative use case, we integrate a resource-constrained Large Language Model (LLM) agent with a reinforcement learning controller in an asynchronous architecture tailored for space-qualified platforms. Ground experiments show that LLM-guided supervision improves thermal stability and reduces violations, confirming the feasibility of combining semantic reasoning with adaptive control under hardware constraints. On-orbit validation aboard the ISS initially faced challenges due to inference latency misaligned with the rapid thermal cycles of Low Earth Orbit (LEO) satellites. Synchronization with the orbit length successfully surpassed the baseline with reduced violations, extended episode durations, and improved CPU utilization. These findings demonstrate the potential for scalable agentic supervision architectures in future autonomous spacecraft.
comment: Accepted for presentation at the European Space Agency's AI STAR 2025 Conference (see https://atpi.eventsair.com/ai-star-2025/)
A Vision-Based Shared-Control Teleoperation Scheme for Controlling the Robotic Arm of a Four-Legged Robot
In hazardous and remote environments, robotic systems perform critical tasks demanding improved safety and efficiency. Among these, quadruped robots with manipulator arms offer mobility and versatility for complex operations. However, teleoperating quadruped robots is challenging due to the lack of integrated obstacle detection and intuitive control methods for the robotic arm, increasing collision risks in confined or dynamically changing workspaces. Teleoperation via joysticks or pads can be non-intuitive and demands a high level of expertise due to its complexity, culminating in a high cognitive load on the operator. To address this challenge, a teleoperation approach that directly maps human arm movements to the robotic manipulator offers a simpler and more accessible solution. This work proposes an intuitive remote control by leveraging a vision-based pose estimation pipeline that utilizes an external camera with a machine learning-based model to detect the operator's wrist position. The system maps these wrist movements into robotic arm commands to control the robot's arm in real-time. A trajectory planner ensures safe teleoperation by detecting and preventing collisions with both obstacles and the robotic arm itself. The system was validated on the real robot, demonstrating robust performance in real-time control. This teleoperation approach provides a cost-effective solution for industrial applications where safety, precision, and ease of use are paramount, ensuring reliable and intuitive robotic control in high-risk environments.
Autonomous UAV Flight Navigation in Confined Spaces: A Reinforcement Learning Approach
Autonomous UAV inspection of confined industrial infrastructure, such as ventilation ducts, demands robust navigation policies where collisions are unacceptable. While Deep Reinforcement Learning (DRL) offers a powerful paradigm for developing such policies, it presents a critical trade-off between on-policy and off-policy algorithms. Off-policy methods promise high sample efficiency, a vital trait for minimizing costly and unsafe real-world fine-tuning. In contrast, on-policy methods often exhibit greater training stability, which is essential for reliable convergence in hazard-dense environments. This paper directly investigates this trade-off by comparing a leading on-policy algorithm, Proximal Policy Optimization (PPO), against an off-policy counterpart, Soft Actor-Critic (SAC), for precision flight in procedurally generated ducts within a high-fidelity simulator. Our results show that PPO consistently learned a stable, collision-free policy that completed the entire course. In contrast, SAC failed to find a complete solution, converging to a suboptimal policy that navigated only the initial segments before failure. This work provides evidence that for high-precision, safety-critical navigation tasks, the reliable convergence of a well-established on-policy method can be more decisive than the nominal sample efficiency of an off-policy algorithm.
Optimizing Grasping in Legged Robots: A Deep Learning Approach to Loco-Manipulation
This paper presents a deep learning framework designed to enhance the grasping capabilities of quadrupeds equipped with arms, with a focus on improving precision and adaptability. Our approach centers on a sim-to-real methodology that minimizes reliance on physical data collection. We developed a pipeline within the Genesis simulation environment to generate a synthetic dataset of grasp attempts on common objects. By simulating thousands of interactions from various perspectives, we created pixel-wise annotated grasp-quality maps to serve as the ground truth for our model. This dataset was used to train a custom CNN with a U-Net-like architecture that processes multi-modal input from onboard RGB and depth cameras, including RGB images, depth maps, segmentation masks, and surface normal maps. The trained model outputs a grasp-quality heatmap to identify the optimal grasp point. We validated the complete framework on a four-legged robot. The system successfully executed a full loco-manipulation task: autonomously navigating to a target object, perceiving it with its sensors, predicting the optimal grasp pose using our model, and performing a precise grasp. This work proves that leveraging simulated training with advanced sensing offers a scalable and effective solution for object handling.
A Synthetic Dataset for Manometry Recognition in Robotic Applications
This paper addresses the challenges of data scarcity and high acquisition costs in training robust object detection models for complex industrial environments, such as offshore oil platforms. Data collection in these hazardous settings often limits the development of autonomous inspection systems. To mitigate this issue, we propose a hybrid data synthesis pipeline that integrates procedural rendering and AI-driven video generation. The approach uses BlenderProc to produce photorealistic images with domain randomization and NVIDIA's Cosmos-Predict2 to generate physically consistent video sequences with temporal variation. A YOLO-based detector trained on a composite dataset, combining real and synthetic data, outperformed models trained solely on real images. A 1:1 ratio between real and synthetic samples achieved the highest accuracy. The results demonstrate that synthetic data generation is a viable, cost-effective, and safe strategy for developing reliable perception systems in safety-critical and resource-constrained industrial applications.
Efficient Navigation in Unknown Indoor Environments with Vision-Language Models IROS 2025
We present a novel high-level planning framework that leverages vision-language models (VLMs) to improve autonomous navigation in unknown indoor environments with many dead ends. Traditional exploration methods often take inefficient routes due to limited global reasoning and reliance on local heuristics. In contrast, our approach enables a VLM to reason directly about occupancy maps in a zero-shot manner, selecting subgoals that are likely to yield more efficient paths. At each planning step, we convert a 3D occupancy grid into a partial 2D map of the environment, and generate candidate subgoals. Each subgoal is then evaluated and ranked against other candidates by the model. We integrate this planning scheme into DYNUS \cite{kondo2025dynus}, a state-of-the-art trajectory planner, and demonstrate improved navigation efficiency in simulation. The VLM infers structural patterns (e.g., rooms, corridors) from incomplete maps and balances the need to make progress toward a goal against the risk of entering unknown space. This reduces common greedy failures (e.g., detouring into small rooms) and achieves about 10% shorter paths on average.
comment: 7 pages, 4 figures, accepted to the OWN workshop at IROS 2025
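The step in the abstract above that converts a 3D occupancy grid into a partial 2D map can be sketched as a column-wise projection. This is only an illustrative sketch: the cell encoding (-1 unknown, 0 free, 1 occupied), the `grid[z][y][x]` layout, and the rule that any occupied voxel marks the whole column occupied are assumptions, since the abstract does not describe the actual conversion.

```python
UNKNOWN, FREE, OCCUPIED = -1, 0, 1  # hypothetical cell encoding

def project_to_2d(grid):
    """Collapse a grid[z][y][x] occupancy volume into a 2D map[y][x]."""
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    flat = []
    for y in range(ny):
        row = []
        for x in range(nx):
            column = [grid[z][y][x] for z in range(nz)]
            if OCCUPIED in column:
                row.append(OCCUPIED)   # any obstacle blocks the cell
            elif FREE in column:
                row.append(FREE)       # observed and obstacle-free
            else:
                row.append(UNKNOWN)    # never observed: the map stays partial
        flat.append(row)
    return flat
```

Keeping unknown cells distinct from free cells matters here, because the abstract describes the planner as weighing progress toward the goal against the risk of entering unknown space.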
Robust and Safe Multi-Agent Reinforcement Learning with Communication for Autonomous Vehicles: From Simulation to Hardware
Deep multi-agent reinforcement learning (MARL) has been demonstrated effectively in simulations for multi-robot problems. For autonomous vehicles, the development of vehicle-to-vehicle (V2V) communication technologies provides opportunities to further enhance system safety. However, zero-shot transfer of simulator-trained MARL policies to dynamic hardware systems remains challenging, and how to leverage communication and shared information for MARL has limited demonstrations on hardware. This problem is complicated by discrepancies between simulated and physical states, system state and model uncertainties, practical shared information design, and the need for safety guarantees in both simulation and hardware. This paper designs RSR-RSMARL, a novel Robust and Safe MARL framework that supports Real-Sim-Real (RSR) policy adaptation for multi-agent systems with communication among agents, with both simulation and hardware demonstrations. RSR-RSMARL leverages state (including shared state information among agents) and action representations that account for real system complexities in the MARL formulation. The MARL policy is trained with a robust MARL algorithm to enable zero-shot transfer to hardware despite the sim-to-real gap. A safety shield module using Control Barrier Functions (CBFs) provides a safety guarantee for each individual agent. Experimental results on 1/10th-scale autonomous vehicles with V2V communication demonstrate the ability of the RSR-RSMARL framework to enhance driving safety and coordination across multiple configurations. These findings emphasize the importance of jointly designing robust policy representations and modular safety architectures to enable scalable, generalizable RSR transfer in multi-agent autonomy.
comment: 19 pages, 9 Figures
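The CBF-based safety shield mentioned in the abstract above can be illustrated with a one-dimensional car-following example. This is a generic textbook-style sketch, not the paper's shield: the barrier `h = gap - d_min`, the simplified dynamics (gap rate equals minus the closing speed), the gain `alpha`, and all names are assumptions. Under these assumptions the CBF condition `dh/dt + alpha * h >= 0` becomes `closing_speed <= alpha * (gap - d_min)`, and the minimally invasive filter has a closed form.

```python
def cbf_shield(closing_speed_nominal, gap, d_min=2.0, alpha=0.5):
    """Return the closing speed nearest to the nominal one that keeps the barrier valid.

    Assumed model: d(gap)/dt = -closing_speed, barrier h = gap - d_min.
    """
    h = gap - d_min
    # CBF condition: -closing_speed + alpha * h >= 0
    upper_bound = alpha * h
    # Closed-form solution of: minimize (u - u_nominal)^2 subject to u <= upper_bound
    return min(closing_speed_nominal, upper_bound)
```

For example, with a 10 m gap the bound is 4 m/s, so a policy command of 5 m/s is reduced to 4 m/s while a command of 1 m/s passes through unchanged. At the minimum gap the bound is zero, so the shield forbids any further closing.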
S²M²: Scalable Stereo Matching Model for Reliable Depth Estimation ICCV
The pursuit of a generalizable stereo matching model, capable of performing well across varying resolutions and disparity ranges without dataset-specific fine-tuning, has revealed a fundamental trade-off. Iterative local search methods achieve high scores on constrained benchmarks, but their core mechanism inherently limits the global consistency required for true generalization. However, global matching architectures, while theoretically more robust, have historically been rendered infeasible by prohibitive computational and memory costs. We resolve this dilemma with S²M²: a global matching architecture that achieves state-of-the-art accuracy and high efficiency without relying on cost volume filtering or deep refinement stacks. Our design integrates a multi-resolution transformer for robust long-range correspondence, trained with a novel loss function that concentrates probability on feasible matches. This approach enables a more robust joint estimation of disparity, occlusion, and confidence. S²M² establishes a new state of the art on Middlebury v3 and ETH3D benchmarks, significantly outperforming prior methods in most metrics while reconstructing high-quality details with competitive efficiency.
comment: 8 pages, 5 figures, ICCV accepted paper
Agentic Vehicles for Human-Centered Mobility
Autonomy, from the Greek autos (self) and nomos (law), refers to the capacity to operate according to internal rules without external control. Autonomous vehicles (AuVs) are therefore understood as systems that perceive their environment and execute pre-programmed tasks independently of external input, consistent with the SAE levels of automated driving. Yet recent research and real-world deployments have begun to showcase vehicles that exhibit behaviors outside the scope of this definition. These include natural language interaction with humans, goal adaptation, contextual reasoning, external tool use, and the handling of unforeseen ethical dilemmas, enabled in part by multimodal large language models (LLMs). These developments highlight not only a gap between technical autonomy and the broader cognitive and social capacities required for human-centered mobility, but also the emergence of a form of vehicle intelligence that currently lacks a clear designation. To address this gap, the paper introduces the concept of agentic vehicles (AgVs): vehicles that integrate agentic AI systems to reason, adapt, and interact within complex environments. It synthesizes recent advances in agentic systems and suggests how AgVs can complement and even reshape conventional autonomy to ensure mobility services are aligned with user and societal needs. The paper concludes by outlining key challenges in the development and governance of AgVs and their potential role in shaping future agentic transportation systems.
NaviMaster: Learning a Unified Policy for GUI and Embodied Navigation Tasks
Recent advances in Graphical User Interface (GUI) and embodied navigation have driven progress, yet these domains have largely evolved in isolation, with disparate datasets and training paradigms. In this paper, we observe that both tasks can be formulated as Markov Decision Processes (MDP), suggesting a foundational principle for their unification. Hence, we present NaviMaster, the first agent to unify GUI navigation and embodied navigation within a single framework. Specifically, NaviMaster (i) proposes a visual-target trajectory collection pipeline that generates trajectories for both GUI and embodied tasks using a single formulation; (ii) employs a unified reinforcement learning framework on the mixed data to improve generalization; and (iii) designs a novel distance-aware reward to ensure efficient learning from the trajectories. Through extensive experiments on out-of-domain benchmarks, NaviMaster is shown to outperform state-of-the-art agents in GUI navigation, spatial affordance prediction, and embodied navigation. Ablation studies further demonstrate the efficacy of our unified training strategy, data mixing strategy, and reward design.
comment: Homepage: https://iron-boyy.github.io/navimaster/
Robot Learning with Super-Linear Scaling
Scaling robot learning requires data collection pipelines that scale favorably with human effort. In this work, we propose Crowdsourcing and Amortizing Human Effort for Real-to-Sim-to-Real (CASHER), a pipeline for scaling up data collection and learning in simulation where the performance scales superlinearly with human effort. The key idea is to crowdsource digital twins of real-world scenes using 3D reconstruction and collect large-scale data in simulation, rather than the real world. Data collection in simulation is initially driven by RL, bootstrapped with human demonstrations. As the training of a generalist policy progresses across environments, its generalization capabilities can be used to replace human effort with model-generated demonstrations. This results in a pipeline where behavioral data is collected in simulation with continually reducing human effort. We show that CASHER demonstrates zero-shot and few-shot scaling laws on three real-world tasks across diverse scenarios. We show that CASHER enables fine-tuning of pre-trained policies to a target scenario using a video scan without any additional human effort. See our project website: https://casher-robot-learning.github.io/CASHER/
TinyIO: Lightweight Reparameterized Inertial Odometry
Inertial localization is regarded as a promising positioning solution for consumer-grade IoT devices due to its cost-effectiveness and independence from external infrastructure. However, data-driven inertial localization methods often rely on increasingly complex network architectures to improve accuracy, which challenges the limited computational resources of IoT devices. Moreover, these methods frequently overlook the importance of modeling long-term dependencies in inertial measurements - a critical factor for accurate trajectory reconstruction - thereby limiting localization performance. To address these challenges, we propose a reparameterized inertial localization network that uses a multi-branch structure during training to enhance feature extraction. At inference time, this structure is transformed into an equivalent single-path architecture to improve parameter efficiency. To further capture long-term dependencies in motion trajectories, we introduce a temporal-scale sparse attention mechanism that selectively emphasizes key trajectory segments while suppressing noise. Additionally, a gated convolutional unit is incorporated to effectively integrate long-range dependencies with local fine-grained features. Extensive experiments on public benchmarks demonstrate that our method achieves a favorable trade-off between accuracy and model compactness. For example, on the RoNIN dataset, our approach reduces the Absolute Trajectory Error (ATE) by 2.59% compared to RoNIN-ResNet while reducing the number of parameters by 3.86%.
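The train-time multi-branch, inference-time single-path idea in the abstract above relies on the linearity of convolution. The sketch below shows the principle on a toy 1D case. It is not TinyIO's architecture: the three branches (a 3-tap convolution, a 1-tap convolution, and an identity shortcut) and all function names are assumptions in the spirit of RepVGG-style reparameterization.

```python
def conv1d_same(x, kernel):
    # Cross-correlation with zero padding, so the output length equals len(x).
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out

def multi_branch(x, k3, k1):
    # Training-time form: 3-tap branch + 1-tap branch + identity shortcut.
    a = conv1d_same(x, k3)
    b = conv1d_same(x, [k1])
    return [ai + bi + xi for ai, bi, xi in zip(a, b, x)]

def merge_branches(k3, k1):
    # Inference-time form: fold the 1-tap weight and the identity (weight 1)
    # into the centre tap of the 3-tap kernel.
    merged = list(k3)
    merged[1] += k1 + 1.0
    return merged
```

Running `conv1d_same(x, merge_branches(k3, k1))` gives the same output as `multi_branch(x, k3, k1)`, so the extra branches add no inference-time cost. In real networks, normalization layers in each branch would also have to be folded into the weights before merging.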
StarIO: A Lightweight Inertial Odometry for Nonlinear Motion
Inertial odometry (IO) directly estimates the position of a carrier from inertial sensor measurements and serves as a core technology for the widespread deployment of consumer-grade localization systems. While existing IO methods can accurately reconstruct simple and near-linear motion trajectories, they often fail to account for drift errors caused by complex motion patterns such as turning. This limitation significantly degrades localization accuracy and restricts the applicability of IO systems in real-world scenarios. To address these challenges, we propose a lightweight IO framework. Specifically, inertial data is projected into a high-dimensional implicit nonlinear feature space using the Star Operation method, enabling the extraction of complex motion features that are typically overlooked. We further introduce a collaborative attention mechanism that jointly models global motion dynamics across both channel and temporal dimensions. In addition, we design Multi Scale Gated Convolution Units to capture fine-grained dynamic variations throughout the motion process, thereby enhancing the model's ability to learn rich and expressive motion representations. Extensive experiments demonstrate that our proposed method consistently outperforms SOTA baselines across six widely used inertial datasets. Compared to baseline models on the RoNIN dataset, it achieves reductions in ATE ranging from 2.26% to 65.78%, thereby establishing a new benchmark in the field.
IONext: Unlocking the Next Era of Inertial Odometry
Researchers have increasingly adopted Transformer-based models for inertial odometry. While Transformers excel at modeling long-range dependencies, their limited sensitivity to local, fine-grained motion variations and lack of inherent inductive biases often hinder localization accuracy and generalization. Recent studies have shown that incorporating large-kernel convolutions and Transformer-inspired architectural designs into CNNs can effectively expand the receptive field, thereby improving global motion perception. Motivated by these insights, we propose a novel CNN-based module called the Dual-wing Adaptive Dynamic Mixer (DADM), which adaptively captures both global motion patterns and local, fine-grained motion features from dynamic inputs. This module dynamically generates selective weights based on the input, enabling efficient multi-scale feature aggregation. To further improve temporal modeling, we introduce the Spatio-Temporal Gating Unit (STGU), which selectively extracts representative and task-relevant motion features in the temporal domain. This unit addresses the limitations of temporal modeling observed in existing CNN approaches. Built upon DADM and STGU, we present a new CNN-based inertial odometry backbone, named Next Era of Inertial Odometry (IONext). Extensive experiments on six public datasets demonstrate that IONext consistently outperforms state-of-the-art (SOTA) Transformer- and CNN-based methods. For instance, on the RNIN dataset, IONext reduces the average ATE by 10% and the average RTE by 12% compared to the representative model iMOT.
Policy Contrastive Decoding for Robotic Foundation Models
Robotic foundation models, or generalist robot policies, hold immense potential to enable flexible, general-purpose and dexterous robotic systems. Despite their advancements, our empirical experiments reveal that existing robot policies are prone to learning spurious correlations from pre-training trajectories, adversely affecting their generalization capabilities beyond the training data. To tackle this, we propose a novel Policy Contrastive Decoding (PCD) approach, which redirects the robot policy's focus toward object-relevant visual clues by contrasting action probability distributions derived from original and object-masked visual inputs. As a training-free method, our PCD can be used as a plugin to improve different types of robot policies without needing to finetune or access model weights. We conduct extensive experiments on top of three open-source robot policies, including the autoregressive policy OpenVLA and the diffusion-based policies Octo and $\pi_0$. The obtained results in both simulation and real-world environments prove PCD's flexibility and effectiveness, e.g., PCD enhances the state-of-the-art policy $\pi_0$ by 8.9% in the simulation environment and by 108% in the real-world environment. Code and demos are publicly available at: https://Koorye.github.io/proj/PCD.
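The contrast between action distributions from original and object-masked inputs, as described in the abstract above, can be sketched for a discrete action set. The abstract does not state the combination rule, so the log-linear form used here (borrowed from contrastive decoding in language models), the weight `alpha`, and the function name are assumptions rather than the paper's formula.

```python
import math

def contrastive_decode(p_orig, p_masked, alpha=1.0, eps=1e-12):
    """Re-weight p_orig away from actions that stay likely when the object is masked."""
    # Assumed rule: log p = (1 + alpha) * log p_orig - alpha * log p_masked
    scores = [
        (1.0 + alpha) * math.log(po + eps) - alpha * math.log(pm + eps)
        for po, pm in zip(p_orig, p_masked)
    ]
    # Softmax over the contrastive scores, shifted by the max for stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

An action whose probability barely changes when the object is masked is presumably driven by background cues, so it is down-weighted, while an action whose probability drops under masking is boosted. Setting `alpha=0` recovers the original distribution.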
Follow-Bench: A Unified Motion Planning Benchmark for Socially-Aware Robot Person Following
Robot person following (RPF) -- mobile robots that follow and assist a specific person -- has emerging applications in personal assistance, security patrols, eldercare, and logistics. To be effective, such robots must follow the target while ensuring safety and comfort for both the target and surrounding people. In this work, we present the first comprehensive study of RPF, which (i) surveys representative scenarios, motion-planning methods, and evaluation metrics with a focus on safety and comfort; (ii) introduces Follow-Bench, a unified benchmark simulating diverse scenarios, including various target trajectory patterns, crowd dynamics, and environmental layouts; and (iii) re-implements six representative RPF planners, ensuring that both safety and comfort are systematically considered. Moreover, we evaluate the two best-performing planners from our benchmark on a differential-drive robot to provide insights into the real-world deployment of RPF planners. Extensive simulation and real-world experiments provide a quantitative study of the safety-comfort trade-offs of existing planners, while revealing open challenges and future research directions.
comment: Project page: https://follow-bench.github.io/
Multiagent Systems
KG-MAS: Knowledge Graph-Enhanced Multi-Agent Infrastructure for coupling physical and digital robotic environments
The seamless integration of physical and digital environments in Cyber-Physical Systems (CPS), particularly within Industry 4.0, presents significant challenges stemming from system heterogeneity and complexity. Traditional approaches often rely on rigid, data-centric solutions like co-simulation frameworks or brittle point-to-point middleware bridges, which lack the semantic richness and flexibility required for intelligent, autonomous coordination. This report introduces the Knowledge Graph-Enhanced Multi-Agent Infrastructure (KG-MAS) to address these limitations. KG-MAS leverages a centralized Knowledge Graph (KG) as a dynamic, shared world model, providing a common semantic foundation for a Multi-Agent System (MAS). Autonomous agents, representing both physical and digital components, query this KG for decision-making and update it with real-time state information. The infrastructure features a model-driven architecture which facilitates the automatic generation of agents from semantic descriptions, thereby simplifying system extension and maintenance. By abstracting away underlying communication protocols and providing a unified, intelligent coordination mechanism, KG-MAS offers a robust, scalable, and flexible solution for coupling heterogeneous physical and digital robotic environments.
It Takes Two: Learning Interactive Whole-Body Control Between Humanoid Robots
The true promise of humanoid robotics lies beyond single-agent autonomy: two or more humanoids must engage in physically grounded, socially meaningful whole-body interactions that echo the richness of human social interaction. However, single-humanoid methods suffer from the isolation issue, ignoring inter-agent dynamics and causing misaligned contacts, interpenetrations, and unrealistic motions. To address this, we present Harmanoid, a dual-humanoid motion imitation framework that transfers interacting human motions to two robots while preserving both kinematic fidelity and physical realism. Harmanoid comprises two key components: (i) contact-aware motion retargeting, which restores inter-body coordination by aligning SMPL contacts with robot vertices, and (ii) an interaction-driven motion controller, which leverages interaction-specific rewards to enforce coordinated keypoints and physically plausible contacts. By explicitly modeling inter-agent contacts and interaction-aware dynamics, Harmanoid captures the coupled behaviors between humanoids that single-humanoid frameworks inherently overlook. Experiments demonstrate that Harmanoid significantly improves interactive motion imitation, surpassing existing single-humanoid frameworks that largely fail in such scenarios.
MedAgentAudit: Diagnosing and Quantifying Collaborative Failure Modes in Medical Multi-Agent Systems
While large language model (LLM)-based multi-agent systems show promise in simulating medical consultations, their evaluation is often confined to final-answer accuracy. This practice treats their internal collaborative processes as opaque "black boxes" and overlooks a critical question: is a diagnostic conclusion reached through a sound and verifiable reasoning pathway? The inscrutable nature of these systems poses a significant risk in high-stakes medical applications, potentially leading to flawed or untrustworthy conclusions. To address this, we conduct a large-scale empirical study of 3,600 cases from six medical datasets and six representative multi-agent frameworks. Through a rigorous, mixed-methods approach combining qualitative analysis with quantitative auditing, we develop a comprehensive taxonomy of collaborative failure modes. Our quantitative audit reveals four dominant failure patterns: flawed consensus driven by shared model deficiencies, suppression of correct minority opinions, ineffective discussion dynamics, and critical information loss during synthesis. This study demonstrates that high accuracy alone is an insufficient measure of clinical or public trust. It highlights the urgent need for transparent and auditable reasoning processes, a cornerstone for the responsible development and deployment of medical AI.
comment: Code: https://github.com/yhzhu99/MedAgentAudit
ALLOY: Generating Reusable Agent Workflows from User Demonstration
Large language models (LLMs) enable end-users to delegate complex tasks to autonomous agents through natural language. However, prompt-based interaction faces critical limitations: users often struggle to specify procedural requirements for tasks, especially those that don't have a factually correct solution but instead rely on personal preferences, such as posting social media content or planning a trip. Additionally, a "successful" prompt for one task may not be reusable or generalizable across similar tasks. We present ALLOY, a system inspired by classical HCI theories on Programming by Demonstration (PBD), but extended to enhance adaptability in creating LLM-based web agents. ALLOY enables users to express procedural preferences through natural demonstrations rather than prompts, while making these procedures transparent and editable through visualized workflows that can be generalized across task variations. In a study with 12 participants, ALLOY's demonstration-based approach outperformed prompt-based agents and manual workflows in capturing user intent and procedural preferences in complex web tasks. Insights from the study also show how demonstration-based interaction complements the traditional prompt-based approach.
Structured Cooperative Multi-Agent Reinforcement Learning: a Bayesian Network Perspective
The empirical success of multi-agent reinforcement learning (MARL) has motivated the search for more efficient and scalable algorithms for large-scale multi-agent systems. However, existing state-of-the-art algorithms do not fully exploit inter-agent coupling information. In this paper, we propose a systematic approach to leverage structures in the inter-agent couplings for efficient model-free reinforcement learning. We model the cooperative MARL problem via a Bayesian network and characterize the subset of agents, termed the value dependency set, whose information is required by each agent to estimate its local action value function exactly. Moreover, we propose a partially decentralized training decentralized execution (P-DTDE) paradigm based on the value dependency set. We theoretically establish that the total variance of our P-DTDE policy gradient estimator is less than that of the centralized training decentralized execution (CTDE) policy gradient estimator. We derive a multi-agent policy gradient theorem based on the P-DTDE scheme and develop a scalable actor-critic algorithm. We demonstrate the efficiency and scalability of the proposed algorithm on multi-warehouse resource allocation and multi-zone temperature control examples. For dense value dependency sets, we propose an approximation scheme based on truncation of the Bayesian network and empirically show that it achieves faster convergence than the exact value dependency set for applications with a large number of agents.
Scheming Ability in LLM-to-LLM Strategic Interactions
As large language model (LLM) agents are deployed autonomously in diverse contexts, evaluating their capacity for strategic deception becomes crucial. While recent research has examined how AI systems scheme against human developers, LLM-to-LLM scheming remains underexplored. We investigate the scheming ability and propensity of frontier LLM agents through two game-theoretic frameworks: a Cheap Talk signaling game and a Peer Evaluation adversarial game. Testing four models (GPT-4o, Gemini-2.5-pro, Claude-3.7-Sonnet, and Llama-3.3-70b), we measure scheming performance with and without explicit prompting while analyzing scheming tactics through chain-of-thought reasoning. When prompted, most models, especially Gemini-2.5-pro and Claude-3.7-Sonnet, achieved near-perfect performance. Critically, models exhibited significant scheming propensity without prompting: all models chose deception over confession in Peer Evaluation (100% rate), while models choosing to scheme in Cheap Talk succeeded at 95-100% rates. These findings highlight the need for robust evaluations using high-stakes game-theoretic scenarios in multi-agent settings.
comment: 25 pages, 13 figures, under review at IASEAI'26
MADS: Multi-Agent Dialogue Simulation for Diverse Persuasion Data Generation
We propose MADS (Multi-Agent Dialogue Simulation), a scalable framework for generating persuasive multi-turn dialogues via agent self-play. MADS employs three coordinated agents: User Agents designed to simulate diverse persona-driven behaviors by leveraging personality signifiers such as Zodiac Signs and MBTI types, a Dialog Agent executing task-oriented persuasion strategies, and an Optimization Agent evaluating and refining dialogue outcomes. We further validate its effectiveness through users' Chain-of-Attitude (CoA) modeling and dedicated LLMs' persuasion assessment. This approach enables low-cost generation of training data without human annotation, addressing key industry challenges such as lack of user data, cold-start evaluation difficulties, and prompt inefficiency. Applied to a real-world marketing scenario, MADS significantly improved the persuasion capacity of small LLMs, increasing the organic traffic conversion rate by 22.4% (from 1.83% to 2.24%), demonstrating clear business value.
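The 22.4% figure in the MADS abstract above is a relative lift, not a difference in percentage points. The short check below reproduces it from the two conversion rates quoted in the abstract; the helper name `relative_lift` is an illustrative choice.

```python
def relative_lift(baseline_rate, new_rate):
    """Relative change of new_rate with respect to baseline_rate."""
    return (new_rate - baseline_rate) / baseline_rate


baseline = 0.0183  # 1.83% conversion rate before MADS (from the abstract)
improved = 0.0224  # 2.24% conversion rate after MADS (from the abstract)

lift = relative_lift(baseline, improved)
absolute_gain_points = (improved - baseline) * 100  # in percentage points

print(f"relative lift: {lift:.1%}")
print(f"absolute gain: {absolute_gain_points:.2f} percentage points")
```

The absolute gain is 0.41 percentage points, which corresponds to the reported relative increase of about 22.4%.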
ASTREA: Introducing Agentic Intelligence for Orbital Thermal Autonomy
This paper presents ASTREA, the first agentic system executed on flight-heritage hardware (TRL 9) for autonomous spacecraft operations, with on-orbit operation aboard the International Space Station (ISS). Using thermal control as a representative use case, we integrate a resource-constrained Large Language Model (LLM) agent with a reinforcement learning controller in an asynchronous architecture tailored for space-qualified platforms. Ground experiments show that LLM-guided supervision improves thermal stability and reduces violations, confirming the feasibility of combining semantic reasoning with adaptive control under hardware constraints. On-orbit validation aboard the ISS initially faced challenges due to inference latency misaligned with the rapid thermal cycles of Low Earth Orbit (LEO) satellites. Synchronization with the orbit length successfully surpassed the baseline with reduced violations, extended episode durations, and improved CPU utilization. These findings demonstrate the potential for scalable agentic supervision architectures in future autonomous spacecraft.
comment: Accepted for presentation at the European Space Agency's AI STAR 2025 Conference (see https://atpi.eventsair.com/ai-star-2025/)
Debate, Deliberate, Decide (D3): A Cost-Aware Adversarial Framework for Reliable and Interpretable LLM Evaluation
The evaluation of Large Language Models (LLMs) remains challenging due to inconsistency, bias, and the absence of transparent decision criteria in automated judging. We present Debate, Deliberate, Decide (D3), a cost-aware, adversarial multi-agent framework that orchestrates structured debate among role-specialized agents (advocates, a judge, and an optional jury) to produce reliable and interpretable evaluations. D3 instantiates two complementary protocols: (1) Multi-Advocate One-Round Evaluation (MORE), which elicits k parallel defenses per answer to amplify signal via diverse advocacy, and (2) Single-Advocate Multi-Round Evaluation (SAMRE) with budgeted stopping, which iteratively refines arguments under an explicit token budget and convergence checks. We develop a probabilistic model of score gaps that (i) characterizes reliability and convergence under iterative debate and (ii) explains the separation gains from parallel advocacy. Under mild assumptions, the posterior distribution of the round-r gap concentrates around the true difference and the probability of mis-ranking vanishes; moreover, aggregating across k advocates provably increases expected score separation. We complement theory with a rigorous experimental suite across MT-Bench, AlignBench, and AUTO-J, showing state-of-the-art agreement with human judgments (accuracy and Cohen's kappa), reduced positional and verbosity biases via anonymization and role diversification, and a favorable cost-accuracy frontier enabled by budgeted stopping. Ablations and qualitative analyses isolate the contributions of debate, aggregation, and anonymity. Together, these results establish D3 as a principled, practical recipe for reliable, interpretable, and cost-aware LLM evaluation.
HealthFlow: A Self-Evolving AI Agent with Meta Planning for Autonomous Healthcare Research
The rapid proliferation of scientific knowledge presents a grand challenge: transforming this vast repository of information into an active engine for discovery, especially in high-stakes domains like healthcare. Current AI agents, however, are constrained by static, predefined strategies, limiting their ability to navigate the complex, evolving ecosystem of scientific research. This paper introduces HealthFlow, a self-evolving AI agent that overcomes this limitation through a novel meta-level evolution mechanism. HealthFlow autonomously refines its high-level problem-solving policies by distilling procedural successes and failures into a durable, structured knowledge base, enabling it to learn not just how to use tools, but how to strategize. To anchor our research and provide a community resource, we introduce EHRFlowBench, a new benchmark featuring complex health data analysis tasks systematically derived from peer-reviewed scientific literature. Our experiments demonstrate that HealthFlow's self-evolving approach significantly outperforms state-of-the-art agent frameworks. This work offers a new paradigm for intelligent systems that can learn to operationalize the procedural knowledge embedded in scientific content, marking a critical step toward more autonomous and effective AI for healthcare scientific discovery.
comment: Code: https://github.com/yhzhu99/HealthFlow
Coordination Requires Simplification: Thermodynamic Bounds on Multi-Objective Compromise in Natural and Artificial Intelligence
Information-processing systems coordinating across multiple agents and objectives face fundamental thermodynamic constraints. We show that solutions with maximum utility to act as coordination focal points have much higher selection pressure for being findable across agents rather than accuracy. We derive that the information-theoretic minimum description length of coordination protocols to precision $\varepsilon$ scales as $L(P)\geq NK\log_2 K+N^2d^2\log (1/\varepsilon)$ for $N$ agents with $d$ potentially conflicting objectives and internal model complexity $K$. This scaling forces progressive simplification, with coordination dynamics changing the environment itself and shifting optimization across hierarchical levels. Moving from established focal points requires re-coordination, creating persistent metastable states and hysteresis until significant environmental shifts trigger phase transitions through spontaneous symmetry breaking. We operationally define coordination temperature to predict critical phenomena and estimate coordination work costs, identifying measurable signatures across systems from neural networks to restaurant bills to bureaucracies. Extending the topological version of Arrow's theorem on the impossibility of consistent preference aggregation, we find it recursively binds whenever preferences are combined. This potentially explains the indefinite cycling in multi-objective gradient descent and alignment faking in Large Language Models trained with reinforcement learning with human feedback. We term this framework Thermodynamic Coordination Theory (TCT), which demonstrates that coordination requires radical information loss.
comment: 11 pages, 1 figure, 6 pages supplementary material, submitted to Physical Review E
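To get a feel for how the description-length bound quoted in the TCT abstract above scales, the sketch below simply evaluates its right-hand side. The function and parameter names are illustrative, and the base of the second logarithm is not stated in the abstract, so base 2 is assumed here and exposed as a parameter.

```python
import math


def description_length_lower_bound(n_agents, n_objectives, model_complexity,
                                   epsilon, precision_log_base=2.0):
    """Evaluate N*K*log2(K) + N^2 * d^2 * log(1/epsilon).

    precision_log_base is an assumption: the abstract writes the second
    term with an unspecified log base.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    model_term = n_agents * model_complexity * math.log2(model_complexity)
    precision_term = (n_agents ** 2) * (n_objectives ** 2) * math.log(
        1.0 / epsilon, precision_log_base
    )
    return model_term + precision_term


# Doubling the number of agents more than doubles the bound, because the
# precision term grows quadratically in N.
small = description_length_lower_bound(2, 1, 2, 0.5)
large = description_length_lower_bound(4, 1, 2, 0.5)
```

The quadratic growth in both $N$ and $d$ is what the abstract points to when it says the scaling forces progressive simplification.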
Robust and Safe Multi-Agent Reinforcement Learning with Communication for Autonomous Vehicles: From Simulation to Hardware
Deep multi-agent reinforcement learning (MARL) has been demonstrated effectively in simulations for multi-robot problems. For autonomous vehicles, the development of vehicle-to-vehicle (V2V) communication technologies provides opportunities to further enhance system safety. However, zero-shot transfer of simulator-trained MARL policies to dynamic hardware systems remains challenging, and how to leverage communication and shared information for MARL has limited demonstrations on hardware. The problem is complicated by discrepancies between simulated and physical states, system state and model uncertainties, practical shared information design, and the need for safety guarantees in both simulation and hardware. This paper designs RSR-RSMARL, a novel Robust and Safe MARL framework that supports Real-Sim-Real (RSR) policy adaptation for multi-agent systems with communication among agents, with both simulation and hardware demonstrations. RSR-RSMARL leverages state (including shared state information among agents) and action representations that account for real system complexities in the MARL formulation. The MARL policy is trained with a robust MARL algorithm to enable zero-shot transfer to hardware considering the sim-to-real gap. A safety shield module using Control Barrier Functions (CBFs) provides a safety guarantee for each individual agent. Experimental results on 1/10th-scale autonomous vehicles with V2V communication demonstrate the ability of the RSR-RSMARL framework to enhance driving safety and coordination across multiple configurations. These findings emphasize the importance of jointly designing robust policy representations and modular safety architectures to enable scalable, generalizable RSR transfer in multi-agent autonomy.
comment: 19 pages, 9 Figures
OrbitZoo: Multi-Agent Reinforcement Learning Environment for Orbital Dynamics
The increasing number of satellites and orbital debris has made space congestion a critical issue, threatening satellite safety and sustainability. Challenges such as collision avoidance, station-keeping, and orbital maneuvering require advanced techniques to handle dynamic uncertainties and multi-agent interactions. Reinforcement learning (RL) has shown promise in this domain, enabling adaptive, autonomous policies for space operations; however, many existing RL frameworks rely on custom-built environments developed from scratch, which often use simplified models and require significant time to implement and validate the orbital dynamics, limiting their ability to fully capture real-world complexities. To address this, we introduce OrbitZoo, a versatile multi-agent RL environment built on a high-fidelity industry standard library, that enables realistic data generation, supports scenarios like collision avoidance and cooperative maneuvers, and ensures robust and accurate orbital dynamics. The environment is validated against a real satellite constellation, Starlink, achieving a Mean Absolute Percentage Error (MAPE) of 0.16% compared to real-world data. This validation ensures reliability for generating high-fidelity simulations and enabling autonomous and independent satellite operations.
comment: Accepted for publication at the Thirty-Ninth Conference on Neural Information Processing Systems (NeurIPS 2025)
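The OrbitZoo abstract above reports its validation as a Mean Absolute Percentage Error. For reference, a minimal implementation of the standard MAPE definition follows; which orbital quantities the authors compare is not stated in the abstract, so the sample values in the usage lines are purely hypothetical.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Standard MAPE in percent: mean of |actual - predicted| / |actual|."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical reference vs. simulated values (not data from the paper).
reference = [100.0, 200.0]
simulated = [101.0, 198.0]
error_percent = mean_absolute_percentage_error(reference, simulated)
```

Each sample here is off by 1% of its reference value, so the resulting MAPE is 1%.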
On the Surprising Effectiveness of a Single Global Merging in Decentralized Learning
Decentralized learning provides a scalable alternative to parameter-server-based training, yet its performance is often hindered by limited peer-to-peer communication. In this paper, we study how communication should be scheduled over time to improve global generalization, including determining when and how frequently devices synchronize. Counterintuitive empirical results show that concentrating communication budgets in the later stages of decentralized training remarkably improves global generalization. Surprisingly, we uncover that fully connected communication at the final step, implemented by a single global merging, can significantly improve the generalization performance of decentralized learning under severe data heterogeneity. Our theoretical contributions, which explain these phenomena, are the first to establish that the globally merged model of decentralized SGD can match the convergence rate of parallel SGD. Technically, we reinterpret part of the discrepancy among local models, which was previously considered detrimental noise, as constructive components essential for matching this rate. This work provides promising results that decentralized learning is able to generalize under high data heterogeneity and limited communication, while offering broad new avenues for model merging research. The code will be made publicly available.
comment: We discover and theoretically explain why and when a single global parameter merging in decentralized learning can recover the performance of federated learning, even in highly heterogeneous and communication-constrained environments
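The "single global merging" described in the entry above can be pictured as one fully connected averaging step applied to the local models after decentralized training ends. The sketch below assumes uniform parameter averaging and a toy dictionary-of-lists model format; both are illustrative assumptions, since the abstract does not specify the merging operator.

```python
def merge_models(local_models):
    """Uniformly average parameters across local models (assumed operator).

    Each model is a dict mapping a parameter name to a flat list of floats;
    all models must share the same names and shapes.
    """
    if not local_models:
        raise ValueError("need at least one local model")
    count = len(local_models)
    merged = {}
    for name in local_models[0]:
        columns = zip(*(model[name] for model in local_models))
        merged[name] = [sum(values) / count for values in columns]
    return merged


# Two hypothetical local models trained on different data shards.
device_a = {"w": [1.0, 3.0], "b": [0.0]}
device_b = {"w": [3.0, 5.0], "b": [2.0]}
global_model = merge_models([device_a, device_b])
```

In this toy case the merged model has `w = [2.0, 4.0]` and `b = [1.0]`, the coordinate-wise mean of the two local models.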
Systems and Control (CS)
Low-cost Pyranometer-Based ANN Approach for MPPT in Solar PV Systems
This article presents a study on the application of artificial neural networks (ANNs) for maximum power point tracking (MPPT) in photovoltaic (PV) systems using low-cost pyranometer sensors. The proposed approach integrates pyranometers, temperature sensors, and an ANN to estimate the duty cycle of a DC/DC converter, enabling the system to consistently operate at its maximum power point. The strategy was implemented in the local control of a Cuk converter and experimentally validated against the conventional Perturb and Observe (P&O) method. Results demonstrate that the ANN-based technique, leveraging affordable sensor technology, achieves accurate MPPT performance with reduced fluctuations, enhancing the responsiveness and efficiency of PV tracking systems.
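For readers unfamiliar with the Perturb and Observe baseline mentioned in the abstract above, a minimal sketch of the textbook update rule follows. It is not the paper's ANN method, and the power curve `toy_pv_power`, the step size, and the starting duty cycle are hypothetical stand-ins for a real converter and panel.

```python
def perturb_and_observe_step(power, prev_power, duty, prev_duty, step=0.01):
    """Return the next duty cycle from one Perturb and Observe update."""
    last_move = duty - prev_duty
    # Direction of the previous perturbation (+1 by default if there was none).
    direction = 1.0 if last_move >= 0 else -1.0
    # If power dropped, the last perturbation went the wrong way: reverse it.
    if power < prev_power:
        direction = -direction
    # Keep the duty cycle inside its valid range [0, 1].
    return min(1.0, max(0.0, duty + direction * step))


def toy_pv_power(duty):
    """Hypothetical single-peak power curve with its maximum at duty = 0.6."""
    return 1.0 - (duty - 0.6) ** 2


def track(start_duty=0.3, iterations=200, step=0.01):
    """Run P&O on the toy curve and return the final duty cycle."""
    prev_duty, duty = start_duty, start_duty + step
    prev_power = toy_pv_power(prev_duty)
    for _ in range(iterations):
        power = toy_pv_power(duty)
        next_duty = perturb_and_observe_step(power, prev_power, duty, prev_duty, step)
        prev_duty, prev_power, duty = duty, power, next_duty
    return duty


final_duty = track()
```

On this toy curve the duty cycle climbs to the peak and then keeps oscillating by one step on either side of it. That steady-state ripple is the kind of fluctuation the abstract says the ANN-based estimate reduces.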
The algorithmic regulator
The regulator theorem states that, under certain conditions, any optimal controller must embody a model of the system it regulates, grounding the idea that controllers embed, explicitly or implicitly, internal models of the controlled. This principle underpins neuroscience and predictive brain theories like the Free-Energy Principle or Kolmogorov/Algorithmic Agent theory. However, the theorem is only proven in limited settings. Here, we treat the deterministic, closed, coupled world-regulator system $(W,R)$ as a single self-delimiting program $p$ via a constant-size wrapper that produces the world output string~$x$ fed to the regulator. We analyze regulation from the viewpoint of the algorithmic complexity of the output, $K(x)$. We define $R$ to be a \emph{good algorithmic regulator} if it \emph{reduces} the algorithmic complexity of the readout relative to a null (unregulated) baseline $\varnothing$, i.e., \[ \Delta = K\big(O_{W,\varnothing}\big) - K\big(O_{W,R}\big) > 0. \] We then prove that the larger $\Delta$ is, the more world-regulator pairs with high mutual algorithmic information are favored. More precisely, a complexity gap $\Delta > 0$ yields \[ \Pr\big((W,R)\mid x\big) \le C\,2^{\,M(W{:}R)}\,2^{-\Delta}, \] making low $M(W{:}R)$ exponentially unlikely as $\Delta$ grows. This is an AIT version of the idea that ``the regulator contains a model of the world.'' The framework is distribution-free, applies to individual sequences, and complements the Internal Model Principle. Beyond this necessity claim, the same coding-theorem calculus singles out a \emph{canonical scalar objective} and implicates a \emph{planner}. On the realized episode, a regulator behaves \emph{as if} it minimized the conditional description length of the readout.
comment: 2 Figures
Optimal monophasic, asymmetric electric field pulses for selective transcranial magnetic stimulation (TMS) with minimised power and coil heating
Transcranial magnetic stimulation (TMS) with asymmetric electric field pulses, such as monophasic, offers directional selectivity for neural activation but requires excessive energy. Previous pulse shape optimisation has been limited to symmetric pulses or heavily constrained variations of conventional waveforms without achieving general optimality in energy efficiency or neural selectivity. We implemented an optimisation framework that incorporates neuron model activation constraints and flexible control of pulse asymmetry. The optimised electric field waveforms achieved up to 92 % and 88 % reduction in energy loss and thus coil heating respectively compared to conventional monophasic pulses and previously improved monophasic-equivalent pulses. In the human experiments, OUR pulses showed similar motor thresholds to monophasic pulses in both AP and PA directions with significantly lower energy loss, particularly in the AP direction. Moreover, there was a significant MEP latency difference of (1.79 +/- 0.41) ms between AP and PA direction with OUR pulses, which suggests directional selectivity. Our framework successfully identified highly energy-efficient asymmetric pulses for directionally-selective neural engagement. These pulses can enable selective rapid-rate repetitive TMS protocols with reduced power consumption and coil heating, with potential benefits for precision and potency of neuro-modulation.
comment: 31 pages, 8 figures
Hybrid MAC Protocol with Integrated Multi-Layered Security for Resource-Constrained UAV Swarm Communications
Flying Ad Hoc Networks (FANETs) present unique challenges due to high node mobility, dynamic topologies, and strict resource constraints. Existing routing protocols often optimize for a single metric, such as path length or energy, while neglecting the complex dependencies between network performance, security, and MAC layer efficiency. This paper introduces a novel hardware-software co-design framework for secure and adaptive UAV swarm communications, featuring an energy-aware protocol stack. The architecture employs a multicast, clustered organization where routing decisions integrate dynamic trust scores, historical link quality, and internodal distance. A hybrid MAC protocol combines contention-based and scheduled channel access for optimized throughput. Security is ensured through a zero-trust model that fuses cryptographic authentication with a behavioral reputation system, alongside hardware-accelerated AES-GCM encryption. Comparative analysis in an NS-3 simulation environment demonstrates the framework's superiority in packet delivery ratio, latency, resilience, and overhead, providing a scalable foundation for high-performance swarm operations.
comment: Accepted at ISED 2025
Bounds of Validity for Bifurcations of Equilibria in a Class of Networked Dynamical Systems
Local bifurcation analysis plays a central role in understanding qualitative transitions in networked nonlinear dynamical systems, including dynamic neural network and opinion dynamics models. In this article we establish explicit bounds of validity for the classification of bifurcation diagrams in two classes of continuous-time networked dynamical systems, analogous in structure to the Hopfield and the Firing Rate dynamic neural network models. Our approach leverages recent advances in computing the bounds for the validity of Lyapunov-Schmidt reduction, a reduction method widely employed in nonlinear systems analysis. Using these bounds we rigorously characterize neighborhoods around bifurcation points where predictions from reduced-order models remain reliable. We further demonstrate how these bounds can be applied to an illustrative family of nonlinear opinion dynamics on k-regular graphs, which emerges as a special case of the general framework. These results provide new analytical tools for quantifying the robustness of bifurcation phenomena in dynamics over networked systems and highlight the interplay between network structure and nonlinear dynamical behavior.
comment: This manuscript has been submitted to the 2026 American Control Conference taking place in New Orleans, Louisiana, in May 2026
Distributionally Robust Control with End-to-End Statistically Guaranteed Metric Learning
Wasserstein distributionally robust control (DRC) has recently emerged as a principled paradigm for handling uncertainty in stochastic dynamical systems. However, it constructs data-driven ambiguity sets via uniform distribution shifts before sequentially incorporating them into downstream control synthesis. This segregation between ambiguity set construction and control objectives inherently introduces a structural misalignment, which undesirably leads to conservative control policies with sub-optimal performance. To address this limitation, we propose a novel end-to-end finite-horizon Wasserstein DRC framework that integrates the learning of anisotropic Wasserstein metrics with downstream control tasks in a closed-loop manner, thus enabling ambiguity sets to be systematically adjusted along performance-critical directions and yielding more effective control policies. This framework is formulated as a bilevel program: the inner level characterizes dynamical system evolution under DRC, while the outer level refines the anisotropic metric leveraging control-performance feedback across a range of initial conditions. To solve this program efficiently, we develop a stochastic augmented Lagrangian algorithm tailored to the bilevel structure. Theoretically, we prove that the learned ambiguity sets preserve statistical finite-sample guarantees under a novel radius adjustment mechanism, and we establish the well-posedness of the bilevel formulation by demonstrating its continuity with respect to the learnable metric. Furthermore, we show that the algorithm converges to stationary points of the outer level problem, which are statistically consistent with the optimal metric at a non-asymptotic convergence rate. Experiments on both numerical and inventory control tasks verify that the proposed framework achieves superior closed-loop performance and robustness compared against state-of-the-art methods.
Performance Index Shaping for Closed-loop Optimal Control
The design of the performance index, also referred to as cost or reward shaping, is central to both optimal control and reinforcement learning, as it directly determines the behaviors, trade-offs, and objectives that the resulting control laws seek to achieve. A commonly used approach for this inference task in recent years is differentiable trajectory optimization, which allows gradients to be computed with respect to cost parameters by differentiating through an optimal control solver. However, this method often requires repeated solving of the underlying optimal control problem at every iteration, making the method computationally expensive. In this work, assuming known dynamics, we propose a novel framework that analytically links the performance index to the resulting closed-loop optimal control law, thereby transforming a typically bi-level inverse problem into a tractable single-level formulation. Our approach is motivated by the question: given a closed-loop control law that solves an infinite-horizon optimal control problem, how does this law change when the performance index is modified with additional terms? This formulation yields closed-form characterizations for broad classes of systems and performance indices, which not only facilitate interpretation and stability analysis, but also provide insight into the robust stability and input-to-state stable behavior of the resulting nonlinear closed-loop system. Moreover, this analytical perspective enables the generalization of our approach to diverse design objectives, yielding a unifying framework for performance index shaping. Given specific design objectives, we propose a systematic methodology to guide the shaping of the performance index and thereby design the resulting optimal control law.
Modeling the Impact of Communication and Human Uncertainties on Runway Capacity in Terminal Airspace
We investigate the potential impact of communication and human performance uncertainties on runway operations. Specifically, we consider these impacts within the context of an arrival scenario with two converging flows: a straight-in approach stream and a downwind stream merging into it. Both arrival streams are modeled using a modified Poisson distribution that incorporates the separation minima as well as the runway occupancy time. Various system-level uncertainties are addressed in this process, including communication link- and human-related uncertainties. In this research, we first build a Monte Carlo-based discrete-time simulation, where aircraft arrivals are generated by modified Poisson processes subject to minimum separation constraints, simulating various traffic operations. The merging logic incorporates standard bank angle continuous turn-to-final, pilot response delays, and dynamic gap availability in real time. Then, we investigate an automated final approach vectoring model (i.e., Auto-ATC), in which inverse optimal control is used to learn decision advisories from human expert records. By augmenting trajectories and incorporating the aforementioned uncertainties into the planning scenario, we create a setup analogous to the discrete event simulation. For both studies, runway capacity is measured by runway throughput, the fraction of downwind arrivals that merge immediately without holding, and the average delay (i.e., holding time/distance) experienced on the downwind leg. This research provides a method for runway capacity estimation in merging scenarios, and demonstrates that aeronautical communication link uncertainties significantly affect runway capacity in current voice-based operations, whereas the impact can be mitigated in autonomous operational settings.
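The arrival model described above can be sketched as follows. This is a minimal illustration, assuming the "modified Poisson process" means exponential inter-arrival gaps shifted by a fixed minimum separation; the function name `generate_arrivals`, its parameters, and the numeric values are hypothetical and are not taken from the paper.

```python
import random


def generate_arrivals(rate_per_hour, min_sep_s, horizon_s, seed=0):
    """Return arrival times (seconds) within [0, horizon_s].

    Assumption: each gap is a fixed minimum separation plus an
    exponentially distributed slack, i.e. a shifted-exponential model.
    """
    rng = random.Random(seed)
    rate_per_s = rate_per_hour / 3600.0
    times = []
    t = 0.0
    while True:
        # Hard separation constraint plus random Poisson-type slack.
        t += min_sep_s + rng.expovariate(rate_per_s)
        if t > horizon_s:
            break
        times.append(t)
    return times


if __name__ == "__main__":
    arrivals = generate_arrivals(rate_per_hour=30, min_sep_s=90, horizon_s=3600)
    print(f"{len(arrivals)} arrivals in one simulated hour")
```

Under this assumption the long-run arrival rate is 1 / (min_sep_s + 1 / rate_per_s), so the separation minimum caps throughput regardless of demand.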
Causal-Guided Dimension Reduction for Efficient Pareto Optimization
Multi-objective optimization of analog circuits is hindered by high-dimensional parameter spaces, strong feedback couplings, and expensive transistor-level simulations. Evolutionary algorithms such as Non-dominated Sorting Genetic Algorithm II (NSGA-II) are widely used but treat all parameters equally, thereby wasting effort on variables with little impact on performance, which limits their scalability. We introduce CaDRO, a causal-guided dimensionality reduction framework that embeds causal discovery into the optimization pipeline. CaDRO builds a quantitative causal map through a hybrid observational-interventional process, ranking parameters by their causal effect on the objectives. Low-impact parameters are fixed to values from high-quality solutions, while critical drivers remain active in the search. The reduced design space enables focused evolutionary optimization without modifying the underlying algorithm. Across amplifiers, regulators, and RF circuits, CaDRO converges up to 10$\times$ faster than NSGA-II while preserving or improving Pareto quality. For instance, on the Folded-Cascode Amplifier, hypervolume improves from 0.56 to 0.94, and on the LDO regulator from 0.65 to 0.81, with large gains in non-dominated solutions.
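The ranking-and-freezing idea above can be sketched on a toy problem. This is an illustration only: it assumes a single-point interventional estimate (perturb one parameter, hold the rest fixed) as a stand-in for the paper's hybrid observational-interventional causal map, and `interventional_effects`, `split_parameters`, and `toy_objective` are hypothetical names.

```python
def interventional_effects(objective, baseline, delta=0.1):
    """Estimate each parameter's effect by perturbing it alone."""
    f0 = objective(baseline)
    effects = []
    for i in range(len(baseline)):
        x = list(baseline)
        x[i] += delta  # intervene on parameter i only
        effects.append(abs(objective(x) - f0))
    return effects


def split_parameters(effects, keep):
    """Return (active, frozen) index lists; the `keep` largest effects stay active."""
    order = sorted(range(len(effects)), key=lambda i: effects[i], reverse=True)
    return sorted(order[:keep]), sorted(order[keep:])


def toy_objective(x):
    # Hypothetical objective: x[1] has almost no influence.
    return 10.0 * x[0] + 0.01 * x[1] + 5.0 * x[2]


if __name__ == "__main__":
    effects = interventional_effects(toy_objective, [0.0, 0.0, 0.0])
    active, frozen = split_parameters(effects, keep=2)
    print("active:", active, "frozen:", frozen)
```

The frozen indices would then be held at values taken from good known solutions, and an unmodified optimizer such as NSGA-II would search only over the active ones.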
Structured Cooperative Multi-Agent Reinforcement Learning: a Bayesian Network Perspective
The empirical success of multi-agent reinforcement learning (MARL) has motivated the search for more efficient and scalable algorithms for large-scale multi-agent systems. However, existing state-of-the-art algorithms do not fully exploit inter-agent coupling information. In this paper, we propose a systematic approach to leverage structures in the inter-agent couplings for efficient model-free reinforcement learning. We model the cooperative MARL problem via a Bayesian network and characterize the subset of agents, termed the value dependency set, whose information is required by each agent to estimate its local action value function exactly. Moreover, we propose a partially decentralized training decentralized execution (P-DTDE) paradigm based on the value dependency set. We theoretically establish that the total variance of our P-DTDE policy gradient estimator is less than that of the centralized training decentralized execution (CTDE) policy gradient estimator. We derive a multi-agent policy gradient theorem based on the P-DTDE scheme and develop a scalable actor-critic algorithm. We demonstrate the efficiency and scalability of the proposed algorithm on multi-warehouse resource allocation and multi-zone temperature control examples. For dense value dependency sets, we propose an approximation scheme based on truncation of the Bayesian network and empirically show that it achieves faster convergence than the exact value dependency set for applications with a large number of agents.
Viscosity CBFs: Bridging the Control Barrier Function and Hamilton-Jacobi Reachability Frameworks in Safe Control Theory
Control barrier functions (CBFs) and Hamilton-Jacobi reachability (HJR) are central frameworks in safe control. Traditionally, these frameworks have been viewed as distinct, with the former focusing on optimally safe controller design and the latter providing sufficient conditions for safety. A previous work introduced the notion of a control barrier value function (CB-VF), which is defined similarly to the other value functions studied in HJR but has certain CBF-like properties. In this work, we proceed in the other direction by generalizing CBFs to non-differentiable ``viscosity'' CBFs. We show the deep connection between viscosity CBFs and CB-VFs, bridging the CBF and HJR frameworks. Through this bridge, we characterize the viscosity CBFs as precisely those functions which provide CBF-like safety guarantees (control invariance and smooth approach to the boundary). We then show further theoretical properties of viscosity CBFs, including their desirable closure under maximum and limit operations. In the process, we also extend CB-VFs to non-exponential anti-discounting and update the corresponding theory for CB-VFs along these lines.
ASTREA: Introducing Agentic Intelligence for Orbital Thermal Autonomy
This paper presents ASTREA, the first agentic system executed on flight-heritage hardware (TRL 9) for autonomous spacecraft operations, with on-orbit operation aboard the International Space Station (ISS). Using thermal control as a representative use case, we integrate a resource-constrained Large Language Model (LLM) agent with a reinforcement learning controller in an asynchronous architecture tailored for space-qualified platforms. Ground experiments show that LLM-guided supervision improves thermal stability and reduces violations, confirming the feasibility of combining semantic reasoning with adaptive control under hardware constraints. On-orbit validation aboard the ISS initially faced challenges due to inference latency misaligned with the rapid thermal cycles of Low Earth Orbit (LEO) satellites. Synchronization with the orbit length successfully surpassed the baseline with reduced violations, extended episode durations, and improved CPU utilization. These findings demonstrate the potential for scalable agentic supervision architectures in future autonomous spacecraft.
comment: Accepted for presentation at the European Space Agency's AI STAR 2025 Conference (see https://atpi.eventsair.com/ai-star-2025/)
Improving cosmological reach of a gravitational wave observatory using Deep Loop Shaping
Improved low-frequency sensitivity of gravitational wave observatories would unlock the study of intermediate-mass black hole mergers and binary black hole eccentricity, and provide early warnings for multi-messenger observations of binary neutron star mergers. Today's mirror stabilization control injects harmful noise, constituting a major obstacle to sensitivity improvements. We eliminated this noise through Deep Loop Shaping, a reinforcement learning method using frequency domain rewards. We validated our methodology on the LIGO Livingston Observatory (LLO). Our controller reduced control noise in the 10--30 Hz band by over 30x, and by up to 100x in sub-bands, surpassing the design goal motivated by the quantum limit. These results highlight the potential of Deep Loop Shaping to improve current and future GW observatories, and more broadly instrumentation and control systems.
comment: Re-added a reference that was dropped by mistake in the published paper. Fixed date of experiment in text
A Vision-Based Shared-Control Teleoperation Scheme for Controlling the Robotic Arm of a Four-Legged Robot
In hazardous and remote environments, robotic systems perform critical tasks demanding improved safety and efficiency. Among these, quadruped robots with manipulator arms offer mobility and versatility for complex operations. However, teleoperating quadruped robots is challenging due to the lack of integrated obstacle detection and intuitive control methods for the robotic arm, increasing collision risks in confined or dynamically changing workspaces. Teleoperation via joysticks or pads can be non-intuitive and demands a high level of expertise due to its complexity, culminating in a high cognitive load on the operator. To address this challenge, a teleoperation approach that directly maps human arm movements to the robotic manipulator offers a simpler and more accessible solution. This work proposes an intuitive remote control by leveraging a vision-based pose estimation pipeline that utilizes an external camera with a machine learning-based model to detect the operator's wrist position. The system maps these wrist movements into robotic arm commands to control the robot's arm in real-time. A trajectory planner ensures safe teleoperation by detecting and preventing collisions with both obstacles and the robotic arm itself. The system was validated on the real robot, demonstrating robust performance in real-time control. This teleoperation approach provides a cost-effective solution for industrial applications where safety, precision, and ease of use are paramount, ensuring reliable and intuitive robotic control in high-risk environments.
Autonomous UAV Flight Navigation in Confined Spaces: A Reinforcement Learning Approach
Autonomous UAV inspection of confined industrial infrastructure, such as ventilation ducts, demands robust navigation policies where collisions are unacceptable. While Deep Reinforcement Learning (DRL) offers a powerful paradigm for developing such policies, it presents a critical trade-off between on-policy and off-policy algorithms. Off-policy methods promise high sample efficiency, a vital trait for minimizing costly and unsafe real-world fine-tuning. In contrast, on-policy methods often exhibit greater training stability, which is essential for reliable convergence in hazard-dense environments. This paper directly investigates this trade-off by comparing a leading on-policy algorithm, Proximal Policy Optimization (PPO), against an off-policy counterpart, Soft Actor-Critic (SAC), for precision flight in procedurally generated ducts within a high-fidelity simulator. Our results show that PPO consistently learned a stable, collision-free policy that completed the entire course. In contrast, SAC failed to find a complete solution, converging to a suboptimal policy that navigated only the initial segments before failure. This work provides evidence that for high-precision, safety-critical navigation tasks, the reliable convergence of a well-established on-policy method can be more decisive than the nominal sample efficiency of an off-policy algorithm.
Optimizing Grasping in Legged Robots: A Deep Learning Approach to Loco-Manipulation
This paper presents a deep learning framework designed to enhance the grasping capabilities of quadrupeds equipped with arms, with a focus on improving precision and adaptability. Our approach centers on a sim-to-real methodology that minimizes reliance on physical data collection. We developed a pipeline within the Genesis simulation environment to generate a synthetic dataset of grasp attempts on common objects. By simulating thousands of interactions from various perspectives, we created pixel-wise annotated grasp-quality maps to serve as the ground truth for our model. This dataset was used to train a custom CNN with a U-Net-like architecture that processes multi-modal input from onboard RGB and depth cameras, including RGB images, depth maps, segmentation masks, and surface normal maps. The trained model outputs a grasp-quality heatmap to identify the optimal grasp point. We validated the complete framework on a four-legged robot. The system successfully executed a full loco-manipulation task: autonomously navigating to a target object, perceiving it with its sensors, predicting the optimal grasp pose using our model, and performing a precise grasp. This work proves that leveraging simulated training with advanced sensing offers a scalable and effective solution for object handling.
A Novel Hybrid Grey Wolf Differential Evolution Algorithm
Grey wolf optimizer (GWO) is a nature-inspired stochastic meta-heuristic of the swarm intelligence field that mimics the hunting behavior of grey wolves. Differential evolution (DE) is a popular stochastic algorithm of the evolutionary computation field that is well suited for global optimization. In this part, we introduce a new algorithm based on the hybridization of GWO and two DE variants, namely the GWO-DE algorithm. We evaluate the new algorithm by applying various numerical benchmark functions. The numerical results of the comparative study are quite satisfactory in terms of performance and solution quality.
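The two building blocks named above can be sketched in their textbook forms. This is not the authors' hybrid: the abstract does not say which two DE variants are used or how they are combined with GWO, so the code shows only the standard GWO position update and the common DE/rand/1 mutation, with hypothetical function names.

```python
import random


def gwo_step(x, alpha, beta, delta, a, rng):
    """Standard GWO update of one wolf x toward the three best wolves.

    `a` is the control parameter, conventionally decreased from 2 to 0
    over the run.
    """
    new_x = []
    for j in range(len(x)):
        guided = []
        for leader in (alpha, beta, delta):
            A = 2.0 * a * rng.random() - a   # step coefficient in [-a, a]
            C = 2.0 * rng.random()           # leader weight in [0, 2)
            D = abs(C * leader[j] - x[j])    # distance to weighted leader
            guided.append(leader[j] - A * D)
        new_x.append(sum(guided) / 3.0)      # average of the three guides
    return new_x


def de_rand_1(xa, xb, xc, F=0.5):
    """DE/rand/1 mutant vector: xa + F * (xb - xc)."""
    return [pa + F * (pb - pc) for pa, pb, pc in zip(xa, xb, xc)]


if __name__ == "__main__":
    rng = random.Random(0)
    wolf = gwo_step([5.0, 5.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0], a=1.0, rng=rng)
    print("GWO candidate:", wolf)
    print("DE mutant:", de_rand_1([1.0, 1.0], [2.0, 4.0], [1.0, 2.0]))
```

One plausible hybridization would apply a DE mutation and crossover to the GWO candidate before selection, but that pairing is an assumption here.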
comment: 19 pages, 32 figures, journal
Continuous body 3-D reconstruction of limbless animals
Limbless animals such as snakes, limbless lizards, worms, eels, and lampreys move their slender, long bodies in three dimensions to traverse diverse environments. Accurately quantifying their continuous body's 3-D shape and motion is important for understanding body-environment interactions in complex terrain, but this is difficult to achieve (especially for local orientation and rotation). Here, we describe an interpolation method to quantify continuous body 3-D position and orientation. We simplify the body as an elastic rod and apply a backbone optimization method to interpolate continuous body shape between end constraints imposed by tracked markers. Despite over-simplifying the biomechanics, our method achieves a higher interpolation accuracy (~50% error) in both 3-D position and orientation compared with the widely-used cubic B-spline interpolation method. Beyond snakes traversing large obstacles as demonstrated, our method applies to other long, slender, limbless animals and continuum robots. We provide codes and demo files for easy application of our method.
Optimal Planning of Electric Vehicle Charging Stations: Integrating Public Charging Networks and Transportation Congestion
The adoption of electric vehicles (EVs) represents a critical shift in personal mobility, fueled by policy support and advancements in automotive technology. However, the expansion of EVs for long-distance travel is hindered by charging time concerns, the sparse distribution of charging stations, and worsening waiting times due to congestion. The objective of this work is two-fold: 1) to comprehensively analyze the robustness of existing public charging stations and plan effectively for new ones, and 2) to select the optimal chargers for long-distance journeys by estimating the waiting time from current traffic congestion. This is achieved by combining effective EV charging strategies, identification of congestion points in existing traffic, and the robustness of the current charging station infrastructure. Utilizing a real-time transportation and charging station dataset in Texas, we identify optimal charger placement strategies to minimize travel time by examining the congestion and charging time trade-offs. Our findings suggest that maximizing the constant current phase during charging enhances efficiency, which is crucial for long-distance travel. Conversely, we also explore the negative impact of congestion on travel times and conclude that it can sometimes be beneficial to exceed the constant current phase to avoid congested charging stations.
Systems and Control (EESS)
Low-cost Pyranometer-Based ANN Approach for MPPT in Solar PV Systems
This article presents a study on the application of artificial neural networks (ANNs) for maximum power point tracking (MPPT) in photovoltaic (PV) systems using low-cost pyranometer sensors. The proposed approach integrates pyranometers, temperature sensors, and an ANN to estimate the duty cycle of a DC/DC converter, enabling the system to consistently operate at its maximum power point. The strategy was implemented in the local control of a Cuk converter and experimentally validated against the conventional Perturb and Observe (P&O) method. Results demonstrate that the ANN-based technique, leveraging affordable sensor technology, achieves accurate MPPT performance with reduced fluctuations, enhancing the responsiveness and efficiency of PV tracking systems.
The algorithmic regulator
The regulator theorem states that, under certain conditions, any optimal controller must embody a model of the system it regulates, grounding the idea that controllers embed, explicitly or implicitly, internal models of the controlled. This principle underpins neuroscience and predictive brain theories like the Free-Energy Principle or Kolmogorov/Algorithmic Agent theory. However, the theorem is only proven in limited settings. Here, we treat the deterministic, closed, coupled world-regulator system $(W,R)$ as a single self-delimiting program $p$ via a constant-size wrapper that produces the world output string~$x$ fed to the regulator. We analyze regulation from the viewpoint of the algorithmic complexity of the output, $K(x)$. We define $R$ to be a \emph{good algorithmic regulator} if it \emph{reduces} the algorithmic complexity of the readout relative to a null (unregulated) baseline $\varnothing$, i.e., \[ \Delta = K\big(O_{W,\varnothing}\big) - K\big(O_{W,R}\big) > 0. \] We then prove that the larger $\Delta$ is, the more world-regulator pairs with high mutual algorithmic information are favored. More precisely, a complexity gap $\Delta > 0$ yields \[ \Pr\big((W,R)\mid x\big) \le C\,2^{\,M(W{:}R)}\,2^{-\Delta}, \] making low $M(W{:}R)$ exponentially unlikely as $\Delta$ grows. This is an AIT version of the idea that ``the regulator contains a model of the world.'' The framework is distribution-free, applies to individual sequences, and complements the Internal Model Principle. Beyond this necessity claim, the same coding-theorem calculus singles out a \emph{canonical scalar objective} and implicates a \emph{planner}. On the realized episode, a regulator behaves \emph{as if} it minimized the conditional description length of the readout.
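As a purely numerical illustration of the stated bound, the following takes a complexity gap of $\Delta = 20$ bits and compares two hypothetical values of the mutual algorithmic information; the numbers are chosen for illustration only and the constant $C$ is left unspecified, as in the abstract.

```latex
\[
\Delta = 20:\qquad
\Pr\big((W,R)\mid x\big) \le C\,2^{\,M(W{:}R)}\,2^{-20}
\]
\[
M(W{:}R) = 5 \;\Rightarrow\; \Pr \le C\,2^{-15} \approx 3.05\times10^{-5}\,C,
\qquad
M(W{:}R) = 18 \;\Rightarrow\; \Pr \le C\,2^{-2} = 0.25\,C .
\]
```

Each additional bit of $\Delta$ halves the bound for a fixed $M(W{:}R)$, which is the sense in which pairs with low mutual information become exponentially unlikely.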
comment: 2 Figures
Optimal monophasic, asymmetric electric field pulses for selective transcranial magnetic stimulation (TMS) with minimised power and coil heating
Transcranial magnetic stimulation (TMS) with asymmetric electric field pulses, such as monophasic, offers directional selectivity for neural activation but requires excessive energy. Previous pulse shape optimisation has been limited to symmetric pulses or heavily constrained variations of conventional waveforms without achieving general optimality in energy efficiency or neural selectivity. We implemented an optimisation framework that incorporates neuron model activation constraints and flexible control of pulse asymmetry. The optimised electric field waveforms achieved up to 92 % and 88 % reduction in energy loss and thus coil heating respectively compared to conventional monophasic pulses and previously improved monophasic-equivalent pulses. In the human experiments, OUR pulses showed similar motor thresholds to monophasic pulses in both AP and PA directions with significantly lower energy loss, particularly in the AP direction. Moreover, there was a significant MEP latency difference of (1.79 +/- 0.41) ms between AP and PA direction with OUR pulses, which suggests directional selectivity. Our framework successfully identified highly energy-efficient asymmetric pulses for directionally-selective neural engagement. These pulses can enable selective rapid-rate repetitive TMS protocols with reduced power consumption and coil heating, with potential benefits for precision and potency of neuro-modulation.
comment: 31 pages, 8 figures
Hybrid MAC Protocol with Integrated Multi-Layered Security for Resource-Constrained UAV Swarm Communications
Flying Ad Hoc Networks (FANETs) present unique challenges due to high node mobility, dynamic topologies, and strict resource constraints. Existing routing protocols often optimize for a single metric, such as path length or energy, while neglecting the complex dependencies between network performance, security, and MAC layer efficiency. This paper introduces a novel hardware-software co-design framework for secure and adaptive UAV swarm communications, featuring an energy-aware protocol stack. The architecture employs a multicast, clustered organization where routing decisions integrate dynamic trust scores, historical link quality, and internodal distance. A hybrid MAC protocol combines contention-based and scheduled channel access for optimized throughput. Security is ensured through a zero-trust model that fuses cryptographic authentication with a behavioral reputation system, alongside hardware-accelerated AES-GCM encryption. Comparative analysis in an NS-3 simulation environment demonstrates the framework's superiority in packet delivery ratio, latency, resilience, and overhead, providing a scalable foundation for high-performance swarm operations.
comment: Accepted at ISED 2025
Bounds of Validity for Bifurcations of Equilibria in a Class of Networked Dynamical Systems
Local bifurcation analysis plays a central role in understanding qualitative transitions in networked nonlinear dynamical systems, including dynamic neural network and opinion dynamics models. In this article we establish explicit bounds of validity for the classification of bifurcation diagrams in two classes of continuous-time networked dynamical systems, analogous in structure to the Hopfield and the Firing Rate dynamic neural network models. Our approach leverages recent advances in computing the bounds for the validity of Lyapunov-Schmidt reduction, a reduction method widely employed in nonlinear systems analysis. Using these bounds we rigorously characterize neighborhoods around bifurcation points where predictions from reduced-order models remain reliable. We further demonstrate how these bounds can be applied to an illustrative family of nonlinear opinion dynamics on k-regular graphs, which emerges as a special case of the general framework. These results provide new analytical tools for quantifying the robustness of bifurcation phenomena in dynamics over networked systems and highlight the interplay between network structure and nonlinear dynamical behavior.
comment: This manuscript has been submitted to the 2026 American Control Conference taking place in New Orleans, Louisiana, in May 2026
Distributionally Robust Control with End-to-End Statistically Guaranteed Metric Learning
Wasserstein distributionally robust control (DRC) has recently emerged as a principled paradigm for handling uncertainty in stochastic dynamical systems. However, it constructs data-driven ambiguity sets via uniform distribution shifts before sequentially incorporating them into downstream control synthesis. This segregation between ambiguity set construction and control objectives inherently introduces a structural misalignment, which undesirably leads to conservative control policies with sub-optimal performance. To address this limitation, we propose a novel end-to-end finite-horizon Wasserstein DRC framework that integrates the learning of anisotropic Wasserstein metrics with downstream control tasks in a closed-loop manner, thus enabling ambiguity sets to be systematically adjusted along performance-critical directions and yielding more effective control policies. This framework is formulated as a bilevel program: the inner level characterizes dynamical system evolution under DRC, while the outer level refines the anisotropic metric leveraging control-performance feedback across a range of initial conditions. To solve this program efficiently, we develop a stochastic augmented Lagrangian algorithm tailored to the bilevel structure. Theoretically, we prove that the learned ambiguity sets preserve statistical finite-sample guarantees under a novel radius adjustment mechanism, and we establish the well-posedness of the bilevel formulation by demonstrating its continuity with respect to the learnable metric. Furthermore, we show that the algorithm converges to stationary points of the outer level problem, which are statistically consistent with the optimal metric at a non-asymptotic convergence rate. Experiments on both numerical and inventory control tasks verify that the proposed framework achieves superior closed-loop performance and robustness compared with state-of-the-art methods.
Performance Index Shaping for Closed-loop Optimal Control
The design of the performance index, also referred to as cost or reward shaping, is central to both optimal control and reinforcement learning, as it directly determines the behaviors, trade-offs, and objectives that the resulting control laws seek to achieve. A commonly used approach for this inference task in recent years is differentiable trajectory optimization, which allows gradients to be computed with respect to cost parameters by differentiating through an optimal control solver. However, this method often requires repeated solving of the underlying optimal control problem at every iteration, making the method computationally expensive. In this work, assuming known dynamics, we propose a novel framework that analytically links the performance index to the resulting closed-loop optimal control law, thereby transforming a typically bi-level inverse problem into a tractable single-level formulation. Our approach is motivated by the question: given a closed-loop control law that solves an infinite-horizon optimal control problem, how does this law change when the performance index is modified with additional terms? This formulation yields closed-form characterizations for broad classes of systems and performance indices, which not only facilitate interpretation and stability analysis, but also provide insight into the robust stability and input-to-state stable behavior of the resulting nonlinear closed-loop system. Moreover, this analytical perspective enables the generalization of our approach to diverse design objectives, yielding a unifying framework for performance index shaping. Given specific design objectives, we propose a systematic methodology to guide the shaping of the performance index and thereby design the resulting optimal control law.
Modeling the Impact of Communication and Human Uncertainties on Runway Capacity in Terminal Airspace
We investigate the potential impact of communication and human performance uncertainties on runway operations. Specifically, we consider these impacts within the context of an arrival scenario with two converging flows: a straight-in approach stream and a downwind stream merging into it. Both arrival streams are modeled using a modified Poisson distribution that incorporates the separation minima as well as the runway occupancy time. Various system-level uncertainties are addressed in this process, including communication link- and human-related uncertainties. In this research, we first build a Monte Carlo-based discrete-time simulation, where aircraft arrivals are generated by modified Poisson processes subject to minimum separation constraints, simulating various traffic operations. The merging logic incorporates standard bank angle continuous turn-to-final, pilot response delays, and dynamic gap availability in real time. Then, we investigate an automated final approach vectoring model (i.e., Auto-ATC), in which inverse optimal control is used to learn decision advisories from human expert records. By augmenting trajectories and incorporating the aforementioned uncertainties into the planning scenario, we create a setup analogous to the discrete event simulation. For both studies, runway capacity is measured by runway throughput, the fraction of downwind arrivals that merge immediately without holding, and the average delay (i.e., holding time/distance) experienced on the downwind leg. This research provides a method for runway capacity estimation in merging scenarios, and demonstrates that aeronautical communication link uncertainties significantly affect runway capacity in current voice-based operations, whereas the impact can be mitigated in autonomous operational settings.
Causal-Guided Dimension Reduction for Efficient Pareto Optimization
Multi-objective optimization of analog circuits is hindered by high-dimensional parameter spaces, strong feedback couplings, and expensive transistor-level simulations. Evolutionary algorithms such as Non-dominated Sorting Genetic Algorithm II (NSGA-II) are widely used but treat all parameters equally, thereby wasting effort on variables with little impact on performance, which limits their scalability. We introduce CaDRO, a causal-guided dimensionality reduction framework that embeds causal discovery into the optimization pipeline. CaDRO builds a quantitative causal map through a hybrid observational-interventional process, ranking parameters by their causal effect on the objectives. Low-impact parameters are fixed to values from high-quality solutions, while critical drivers remain active in the search. The reduced design space enables focused evolutionary optimization without modifying the underlying algorithm. Across amplifiers, regulators, and RF circuits, CaDRO converges up to 10$\times$ faster than NSGA-II while preserving or improving Pareto quality. For instance, on the Folded-Cascode Amplifier, hypervolume improves from 0.56 to 0.94, and on the LDO regulator from 0.65 to 0.81, with large gains in non-dominated solutions.
Structured Cooperative Multi-Agent Reinforcement Learning: a Bayesian Network Perspective
The empirical success of multi-agent reinforcement learning (MARL) has motivated the search for more efficient and scalable algorithms for large-scale multi-agent systems. However, existing state-of-the-art algorithms do not fully exploit inter-agent coupling information. In this paper, we propose a systematic approach to leverage structures in the inter-agent couplings for efficient model-free reinforcement learning. We model the cooperative MARL problem via a Bayesian network and characterize the subset of agents, termed the value dependency set, whose information is required by each agent to estimate its local action value function exactly. Moreover, we propose a partially decentralized training decentralized execution (P-DTDE) paradigm based on the value dependency set. We theoretically establish that the total variance of our P-DTDE policy gradient estimator is less than that of the centralized training decentralized execution (CTDE) policy gradient estimator. We derive a multi-agent policy gradient theorem based on the P-DTDE scheme and develop a scalable actor-critic algorithm. We demonstrate the efficiency and scalability of the proposed algorithm on multi-warehouse resource allocation and multi-zone temperature control examples. For dense value dependency sets, we propose an approximation scheme based on truncation of the Bayesian network and empirically show that it achieves faster convergence than the exact value dependency set for applications with a large number of agents.
Viscosity CBFs: Bridging the Control Barrier Function and Hamilton-Jacobi Reachability Frameworks in Safe Control Theory
Control barrier functions (CBFs) and Hamilton-Jacobi reachability (HJR) are central frameworks in safe control. Traditionally, these frameworks have been viewed as distinct, with HJR focusing on optimally safe controller design and CBFs providing sufficient conditions for safety. A previous work introduced the notion of a control barrier value function (CB-VF), which is defined similarly to the other value functions studied in HJR but has certain CBF-like properties. In this work, we proceed in the other direction by generalizing CBFs to non-differentiable ``viscosity'' CBFs. We show the deep connection between viscosity CBFs and CB-VFs, bridging the CBF and HJR frameworks. Through this bridge, we characterize the viscosity CBFs as precisely those functions which provide CBF-like safety guarantees (control invariance and smooth approach to the boundary). We then show further theoretical properties of viscosity CBFs, including their closure under maximum and limit operations. In the process, we also extend CB-VFs to non-exponential anti-discounting and update the corresponding theory for CB-VFs along these lines.
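For context, the classical differentiable CBF condition that the viscosity notion generalizes is commonly written as below. The control-affine dynamics and the symbols $f$, $g$, $h$, $\alpha$, and $U$ are standard textbook notation assumed here; they are not taken from the abstract.

```latex
% Assumed setting: dynamics \dot{x} = f(x) + g(x)u with u \in U,
% and safe set C = \{ x : h(x) \ge 0 \} for a differentiable h.
\[
  \sup_{u \in U} \big[ L_f h(x) + L_g h(x)\, u \big] \;\ge\; -\alpha\big(h(x)\big)
  \qquad \text{for all } x \in C,
\]
% where L_f h and L_g h are Lie derivatives of h and
% \alpha is an extended class-\mathcal{K} function.
```

The condition requires the gradient of $h$, which is why a non-differentiable generalization needs a weaker (viscosity-style) notion of solution.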
ASTREA: Introducing Agentic Intelligence for Orbital Thermal Autonomy
This paper presents ASTREA, the first agentic system executed on flight-heritage hardware (TRL 9) for autonomous spacecraft operations, with on-orbit operation aboard the International Space Station (ISS). Using thermal control as a representative use case, we integrate a resource-constrained Large Language Model (LLM) agent with a reinforcement learning controller in an asynchronous architecture tailored for space-qualified platforms. Ground experiments show that LLM-guided supervision improves thermal stability and reduces violations, confirming the feasibility of combining semantic reasoning with adaptive control under hardware constraints. On-orbit validation aboard the ISS initially faced challenges because inference latency was misaligned with the rapid thermal cycles of Low Earth Orbit (LEO) satellites. After synchronization with the orbit length, the system surpassed the baseline, with reduced violations, extended episode durations, and improved CPU utilization. These findings demonstrate the potential for scalable agentic supervision architectures in future autonomous spacecraft.
comment: Accepted for presentation at the European Space Agency's AI STAR 2025 Conference (see https://atpi.eventsair.com/ai-star-2025/)
Improving cosmological reach of a gravitational wave observatory using Deep Loop Shaping
Improved low-frequency sensitivity of gravitational wave observatories would unlock the study of intermediate-mass black hole mergers and binary black hole eccentricity, and provide early warnings for multi-messenger observations of binary neutron star mergers. Today's mirror stabilization control injects harmful noise, constituting a major obstacle to sensitivity improvements. We eliminated this noise through Deep Loop Shaping, a reinforcement learning method using frequency domain rewards. We proved our methodology on the LIGO Livingston Observatory (LLO). Our controller reduced control noise in the 10--30 Hz band by over 30x, and by up to 100x in sub-bands, surpassing the design goal motivated by the quantum limit. These results highlight the potential of Deep Loop Shaping to improve current and future GW observatories, and more broadly instrumentation and control systems.
comment: Re-added a reference that was dropped by mistake in the published paper. Fixed date of experiment in text
A Vision-Based Shared-Control Teleoperation Scheme for Controlling the Robotic Arm of a Four-Legged Robot
In hazardous and remote environments, robotic systems perform critical tasks demanding improved safety and efficiency. Among these, quadruped robots with manipulator arms offer mobility and versatility for complex operations. However, teleoperating quadruped robots is challenging due to the lack of integrated obstacle detection and intuitive control methods for the robotic arm, increasing collision risks in confined or dynamically changing workspaces. Teleoperation via joysticks or pads can be non-intuitive and demands a high level of expertise due to its complexity, culminating in a high cognitive load on the operator. To address this challenge, a teleoperation approach that directly maps human arm movements to the robotic manipulator offers a simpler and more accessible solution. This work proposes an intuitive remote control by leveraging a vision-based pose estimation pipeline that utilizes an external camera with a machine learning-based model to detect the operator's wrist position. The system maps these wrist movements into robotic arm commands to control the robot's arm in real-time. A trajectory planner ensures safe teleoperation by detecting and preventing collisions with both obstacles and the robotic arm itself. The system was validated on the real robot, demonstrating robust performance in real-time control. This teleoperation approach provides a cost-effective solution for industrial applications where safety, precision, and ease of use are paramount, ensuring reliable and intuitive robotic control in high-risk environments.
Autonomous UAV Flight Navigation in Confined Spaces: A Reinforcement Learning Approach
Autonomous UAV inspection of confined industrial infrastructure, such as ventilation ducts, demands robust navigation policies where collisions are unacceptable. While Deep Reinforcement Learning (DRL) offers a powerful paradigm for developing such policies, it presents a critical trade-off between on-policy and off-policy algorithms. Off-policy methods promise high sample efficiency, a vital trait for minimizing costly and unsafe real-world fine-tuning. In contrast, on-policy methods often exhibit greater training stability, which is essential for reliable convergence in hazard-dense environments. This paper directly investigates this trade-off by comparing a leading on-policy algorithm, Proximal Policy Optimization (PPO), against an off-policy counterpart, Soft Actor-Critic (SAC), for precision flight in procedurally generated ducts within a high-fidelity simulator. Our results show that PPO consistently learned a stable, collision-free policy that completed the entire course. In contrast, SAC failed to find a complete solution, converging to a suboptimal policy that navigated only the initial segments before failure. This work provides evidence that for high-precision, safety-critical navigation tasks, the reliable convergence of a well-established on-policy method can be more decisive than the nominal sample efficiency of an off-policy algorithm.
Optimizing Grasping in Legged Robots: A Deep Learning Approach to Loco-Manipulation
This paper presents a deep learning framework designed to enhance the grasping capabilities of quadrupeds equipped with arms, with a focus on improving precision and adaptability. Our approach centers on a sim-to-real methodology that minimizes reliance on physical data collection. We developed a pipeline within the Genesis simulation environment to generate a synthetic dataset of grasp attempts on common objects. By simulating thousands of interactions from various perspectives, we created pixel-wise annotated grasp-quality maps to serve as the ground truth for our model. This dataset was used to train a custom CNN with a U-Net-like architecture that processes multi-modal input from onboard RGB and depth cameras, including RGB images, depth maps, segmentation masks, and surface normal maps. The trained model outputs a grasp-quality heatmap to identify the optimal grasp point. We validated the complete framework on a four-legged robot. The system successfully executed a full loco-manipulation task: autonomously navigating to a target object, perceiving it with its sensors, predicting the optimal grasp pose using our model, and performing a precise grasp. This work proves that leveraging simulated training with advanced sensing offers a scalable and effective solution for object handling.
A Novel Hybrid Grey Wolf Differential Evolution Algorithm
Grey wolf optimizer (GWO) is a nature-inspired stochastic meta-heuristic of the swarm intelligence field that mimics the hunting behavior of grey wolves. Differential evolution (DE) is a popular stochastic algorithm of the evolutionary computation field that is well suited for global optimization. In this paper, we introduce a new algorithm based on the hybridization of GWO and two DE variants, namely the GWO-DE algorithm. We evaluate the new algorithm by applying it to various numerical benchmark functions. The numerical results of the comparative study are quite satisfactory in terms of performance and solution quality.
comment: 19 pages, 32 figures, journal
Continuous body 3-D reconstruction of limbless animals
Limbless animals such as snakes, limbless lizards, worms, eels, and lampreys move their slender, long bodies in three dimensions to traverse diverse environments. Accurately quantifying their continuous body's 3-D shape and motion is important for understanding body-environment interactions in complex terrain, but this is difficult to achieve (especially for local orientation and rotation). Here, we describe an interpolation method to quantify continuous body 3-D position and orientation. We simplify the body as an elastic rod and apply a backbone optimization method to interpolate continuous body shape between end constraints imposed by tracked markers. Despite over-simplifying the biomechanics, our method achieves a higher interpolation accuracy (~50% error) in both 3-D position and orientation compared with the widely-used cubic B-spline interpolation method. Beyond snakes traversing large obstacles as demonstrated, our method applies to other long, slender, limbless animals and continuum robots. We provide codes and demo files for easy application of our method.
Optimal Planning of Electric Vehicle Charging Stations: Integrating Public Charging Networks and Transportation Congestion
The adoption of electric vehicles (EVs) represents a critical shift in personal mobility, fueled by policy support and advancements in automotive technology. However, the expansion of EVs for long-distance travel is hindered by charging time concerns, the sparse distribution of charging stations, and worsening waiting times due to congestion. The objective of this work is two-fold: 1) to comprehensively analyze the robustness of existing public charging stations and plan effectively for new ones, and 2) to select the optimal chargers for long-distance journeys by estimating the waiting time from current traffic congestion. This is achieved by combining effective EV charging strategies with the identification of congestion points in existing traffic and an assessment of the robustness of the current charging station infrastructure. Utilizing a real-time transportation and charging station dataset in Texas, we identify optimal charger placement strategies to minimize travel time by examining the congestion and charging time trade-offs. Our findings suggest that maximizing the constant current phase during charging enhances efficiency, which is crucial for long-distance travel. Conversely, we also explore the negative impact of congestion on travel times and conclude that it is sometimes beneficial to exceed the constant current phase to avoid congested charging stations.
Robotics
Zero-shot Structure Learning and Planning for Autonomous Robot Navigation using Active Inference
Autonomous navigation in unfamiliar environments requires robots to simultaneously explore, localise, and plan under uncertainty, without relying on predefined maps or extensive training. We present a biologically inspired, Active Inference-based framework, Active Inference MAPping and Planning (AIMAPP). This model unifies mapping, localisation, and decision-making within a single generative model. Inspired by hippocampal navigation, it uses topological reasoning, place-cell encoding, and episodic memory to guide behaviour. The agent builds and updates a sparse topological map online, learns state transitions dynamically, and plans actions by minimising Expected Free Energy. This allows it to balance goal-directed and exploratory behaviours. We implemented a ROS-compatible navigation system that is sensor- and robot-agnostic, capable of integrating with diverse hardware configurations. It operates in a fully self-supervised manner, is resilient to drift, and supports both exploration and goal-directed navigation without any pre-training. We demonstrate robust performance in large-scale real and simulated environments against state-of-the-art planning models, highlighting the system's adaptability to ambiguous observations, environmental changes, and sensor noise. The model offers a biologically inspired, modular solution to scalable, self-supervised navigation in unstructured settings. AIMAPP is available at https://github.com/decide-ugent/AIMAPP.
comment: yet to be submitted
Differential Analysis of Pseudo Haptic Feedback: Novel Comparative Study of Visual and Auditory Cue Integration for Psychophysical Evaluation
Pseudo-haptics exploit carefully crafted visual or auditory cues to trick the brain into "feeling" forces that are never physically applied, offering a low-cost alternative to traditional haptic hardware. Here, we present a comparative psychophysical study that quantifies how visual and auditory stimuli combine to evoke pseudo-haptic pressure sensations on a commodity tablet. Using a Unity-based Rollball game, participants (n = 4) guided a virtual ball across three textured terrains while their finger forces were captured in real time with a Robotous RFT40 force-torque sensor. Each terrain was paired with a distinct rolling-sound profile spanning 440 Hz - 4.7 kHz, 440 Hz - 13.1 kHz, or 440 Hz - 8.9 kHz; crevice collisions triggered additional "knocking" bursts to heighten realism. Average tactile forces increased systematically with cue intensity: 0.40 N, 0.79 N and 0.88 N for visual-only trials and 0.41 N, 0.81 N and 0.90 N for audio-only trials on Terrains 1-3, respectively. Higher audio frequencies and denser visual textures both elicited stronger muscle activation, and their combination further reduced the force needed to perceive surface changes, confirming multisensory integration. These results demonstrate that consumer-grade isometric devices can reliably induce and measure graded pseudo-haptic feedback without specialized actuators, opening a path toward affordable rehabilitation tools, training simulators and assistive interfaces.
comment: 17 Pages, 9 Figures
Guiding Energy-Efficient Locomotion through Impact Mitigation Rewards
Animals achieve energy-efficient locomotion through their implicit passive dynamics, a marvel that has captivated roboticists for decades. Recently, methods that combine Adversarial Motion Prior (AMP) and reinforcement learning (RL) have shown promising progress in replicating animals' naturalistic motion. However, such imitation learning approaches predominantly capture explicit kinematic patterns, so-called gaits, while overlooking the implicit passive dynamics. This work bridges this gap by incorporating a reward term guided by the Impact Mitigation Factor (IMF), a physics-informed metric that quantifies a robot's ability to passively mitigate impacts. By integrating IMF with AMP, our approach enables RL policies to learn both explicit motion trajectories from animal reference motion and the implicit passive dynamics. We demonstrate energy efficiency improvements of up to 32%, as measured by the Cost of Transport (CoT), across both AMP and handcrafted reward structures.
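The Cost of Transport used as the efficiency metric above is a dimensionless ratio of energy spent to weight times distance travelled. The following is a minimal sketch of the common textbook definition, CoT = E / (m g d); the function names and the example numbers are hypothetical, and the paper may compute the energy term differently (for instance from joint torques and velocities).

```python
G = 9.81  # gravitational acceleration in m/s^2


def cost_of_transport(energy_j, mass_kg, distance_m, g=G):
    """Dimensionless Cost of Transport: energy / (mass * g * distance)."""
    if mass_kg <= 0 or distance_m <= 0:
        raise ValueError("mass and distance must be positive")
    return energy_j / (mass_kg * g * distance_m)


def relative_improvement(cot_baseline, cot_new):
    """Fractional reduction in CoT relative to a baseline (0.32 means 32%)."""
    return (cot_baseline - cot_new) / cot_baseline


if __name__ == "__main__":
    # Hypothetical numbers, purely for illustration.
    baseline = cost_of_transport(energy_j=981.0, mass_kg=10.0, distance_m=10.0)
    improved = cost_of_transport(energy_j=667.08, mass_kg=10.0, distance_m=10.0)
    print(baseline, improved, relative_improvement(baseline, improved))
```

Because CoT normalizes by weight and distance, it allows comparison across robots and gaits of different scale.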
Dynamic Quadrupedal Legged and Aerial Locomotion via Structure Repurposing
Multi-modal ground-aerial robots have been extensively studied, with a significant challenge lying in the integration of conflicting requirements across different modes of operation. The Husky robot family, developed at Northeastern University, and specifically the Husky v.2 discussed in this study, addresses this challenge by incorporating posture manipulation and thrust vectoring into multi-modal locomotion through structure repurposing. This quadrupedal robot features leg structures that can be repurposed for dynamic legged locomotion and flight. In this paper, we present the hardware design of the robot and report primary results on dynamic quadrupedal legged locomotion and hovering.
Toggling stiffness via multistability
Mechanical metamaterials enable unconventional and programmable mechanical responses through structural design rather than material composition. In this work, we introduce a multistable mechanical metamaterial that exhibits a toggleable stiffness effect, where the effective shear stiffness switches discretely between stable configurations. The mechanical analysis of surrogate beam models of the unit cell reveals that this behavior originates from the rotation transmitted by the support beams to the curved beam, which governs the balance between bending and axial deformation. The stiffness ratio between the two states of the unit cell can be tuned by varying the slenderness of the support beams or by incorporating localized hinges that modulate rotational transfer. Experiments on 3D-printed prototypes validate the numerical predictions, confirming consistent stiffness toggling across different geometries. Finally, we demonstrate a monolithic soft clutch that leverages this effect to achieve programmable, stepwise stiffness modulation. This work establishes a design strategy for toggleable stiffness using multistable metamaterials, paving the way for adaptive, lightweight, and autonomous systems in soft robotics and smart structures.
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
The ability to use, understand, and create tools is a hallmark of human intelligence, enabling sophisticated interaction with the physical world. For any general-purpose intelligent agent to achieve true versatility, it must also master these fundamental skills. While modern Multimodal Large Language Models (MLLMs) leverage their extensive common knowledge for high-level planning in embodied AI and in downstream Vision-Language-Action (VLA) models, the extent of their true understanding of physical tools remains unquantified. To bridge this gap, we present PhysToolBench, the first benchmark dedicated to evaluating the comprehension of physical tools by MLLMs. Our benchmark is structured as a Visual Question Answering (VQA) dataset comprising over 1,000 image-text pairs. It assesses capabilities across three distinct difficulty levels: (1) Tool Recognition: Requiring the recognition of a tool's primary function. (2) Tool Understanding: Testing the ability to grasp the underlying principles of a tool's operation. (3) Tool Creation: Challenging the model to fashion a new tool from surrounding objects when conventional options are unavailable. Our comprehensive evaluation of 32 MLLMs (spanning proprietary, open-source, specialized embodied, and backbones in VLAs) reveals a significant deficiency in tool understanding. Furthermore, we provide an in-depth analysis and propose preliminary solutions. Code and dataset are publicly available.
Autonomous Soft Robotic Guidewire Navigation via Imitation Learning
In endovascular surgery, endovascular interventionists push a thin tube called a catheter, guided by a thin wire to a treatment site inside the patient's blood vessels to treat various conditions such as blood clots, aneurysms, and malformations. Guidewires with robotic tips can enhance maneuverability, but they present challenges in modeling and control. Automation of soft robotic guidewire navigation has the potential to overcome these challenges, increasing the precision and safety of endovascular navigation. In other surgical domains, end-to-end imitation learning has shown promising results. Thus, we develop a transformer-based imitation learning framework with goal conditioning, relative action outputs, and automatic contrast dye injections to enable generalizable soft robotic guidewire navigation in an aneurysm targeting task. We train the model on 36 different modular bifurcated geometries, generating 647 total demonstrations under simulated fluoroscopy, and evaluate it on three previously unseen vascular geometries. The model can autonomously drive the tip of the robot to the aneurysm location with a success rate of 83% on the unseen geometries, outperforming several baselines. In addition, we present ablation and baseline studies to evaluate the effectiveness of each design and data collection choice. Project website: https://softrobotnavigation.github.io/
FOGMACHINE -- Leveraging Discrete-Event Simulation and Scene Graphs for Modeling Hierarchical, Interconnected Environments under Partial Observations from Mobile Agents
Dynamic Scene Graphs (DSGs) provide a structured representation of hierarchical, interconnected environments, but current approaches struggle to capture stochastic dynamics, partial observability, and multi-agent activity. These aspects are critical for embodied AI, where agents must act under uncertainty and delayed perception. We introduce FOGMACHINE, an open-source framework that fuses DSGs with discrete-event simulation to model object dynamics, agent observations, and interactions at scale. This setup enables the study of uncertainty propagation, planning under limited perception, and emergent multi-agent behavior. Experiments in urban scenarios illustrate realistic temporal and spatial patterns while revealing the challenges of belief estimation under sparse observations. By combining structured representations with efficient simulation, FOGMACHINE establishes an effective tool for benchmarking, model training, and advancing embodied AI in complex, uncertain environments.
comment: submitted to the IEEE for possible publication; 8 pages, 3 figures, 1 table
Scalable Multi-Agent Path Finding using Collision-Aware Dynamic Alert Mask and a Hybrid Execution Strategy
Multi-agent pathfinding (MAPF) remains a critical problem in robotics and autonomous systems, where agents must navigate shared spaces efficiently while avoiding conflicts. Traditional centralized algorithms that have global information, such as Conflict-Based Search (CBS), provide high-quality solutions but become computationally expensive in large-scale scenarios due to the combinatorial explosion of conflicts that need resolution. Conversely, distributed approaches that have local information, particularly learning-based methods, offer better scalability by operating with relaxed information availability, yet often at the cost of solution quality. To address these limitations, we propose a hybrid framework that combines decentralized path planning with a lightweight centralized coordinator. Our framework leverages reinforcement learning (RL) for decentralized planning, enabling agents to adapt their planning based on minimal, targeted alerts--such as static conflict-cell flags or brief conflict tracks--that are dynamically shared information from the central coordinator for effective conflict resolution. We empirically study the effect of the information available to an agent on its planning performance. Our approach reduces the inter-agent information sharing compared to fully centralized and distributed methods, while still consistently finding feasible, collision-free solutions--even in large-scale scenarios having higher agent counts.
Failure Prediction at Runtime for Generative Robot Policies NeurIPS 2025
Imitation learning (IL) with generative models, such as diffusion and flow matching, has enabled robots to perform complex, long-horizon tasks. However, distribution shifts from unseen environments or compounding action errors can still cause unpredictable and unsafe behavior, leading to task failure. Early failure prediction during runtime is therefore essential for deploying robots in human-centered and safety-critical environments. We propose FIPER, a general framework for Failure Prediction at Runtime for generative IL policies that does not require failure data. FIPER identifies two key indicators of impending failure: (i) out-of-distribution (OOD) observations detected via random network distillation in the policy's embedding space, and (ii) high uncertainty in generated actions measured by a novel action-chunk entropy score. Both failure prediction scores are calibrated using a small set of successful rollouts via conformal prediction. A failure alarm is triggered when both indicators, aggregated over short time windows, exceed their thresholds. We evaluate FIPER across five simulation and real-world environments involving diverse failure modes. Our results demonstrate that FIPER better distinguishes actual failures from benign OOD situations and predicts failures more accurately and earlier than existing methods. We thus consider this work an important step towards more interpretable and safer generative robot policies. Code, data and videos are available at https://tum-lsy.github.io/fiper_website.
comment: Accepted to NeurIPS 2025
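The FIPER abstract above describes calibrating two failure scores on successful rollouts via conformal prediction and raising an alarm only when both windowed scores exceed their thresholds. The following is a minimal sketch of that logic under stated assumptions: the scores are plain per-timestep floats (stand-ins for the OOD and action-chunk entropy scores), the threshold is the standard split-conformal quantile, and aggregation is a trailing-window mean. All function names are hypothetical and the paper's exact procedure may differ.

```python
import math


def conformal_threshold(calib_scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score.

    `calib_scores` come from successful rollouts only; `alpha` is the
    tolerated false-alarm rate on data exchangeable with the calibration set.
    """
    n = len(calib_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:  # too few calibration samples for this alpha
        return math.inf
    return sorted(calib_scores)[k - 1]


def window_mean(values, t, window):
    """Mean of the last `window` values up to and including index t."""
    chunk = values[max(0, t - window + 1): t + 1]
    return sum(chunk) / len(chunk)


def first_alarm(ood_scores, entropy_scores, ood_thr, ent_thr, window=3):
    """Index of the first step where BOTH windowed scores exceed thresholds."""
    for t in range(len(ood_scores)):
        if (window_mean(ood_scores, t, window) > ood_thr
                and window_mean(entropy_scores, t, window) > ent_thr):
            return t
    return None  # no alarm raised during the rollout
```

Requiring both indicators is what lets a detector of this shape ignore benign out-of-distribution observations where the policy's actions remain confident.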
SilvaScenes: Tree Segmentation and Species Classification from Under-Canopy Images in Natural Forests
Interest in robotics for forest management is growing, but perception in complex, natural environments remains a significant hurdle. Conditions such as heavy occlusion, variable lighting, and dense vegetation pose challenges to automated systems, which are essential for precision forestry, biodiversity monitoring, and the automation of forestry equipment. These tasks rely on advanced perceptual capabilities, such as detection and fine-grained species classification of individual trees. Yet, existing datasets are inadequate to develop such perception systems, as they often focus on urban settings or a limited number of species. To address this, we present SilvaScenes, a new dataset for instance segmentation of tree species from under-canopy images. Collected across five bioclimatic domains in Quebec, Canada, SilvaScenes features 1476 trees from 24 species with annotations from forestry experts. We demonstrate the relevance and challenging nature of our dataset by benchmarking modern deep learning approaches for instance segmentation. Our results show that, while tree segmentation is easy, with a top mean average precision (mAP) of 67.65%, species classification remains a significant challenge with an mAP of only 35.69%. Our dataset and source code will be available at https://github.com/norlab-ulaval/SilvaScenes.
comment: 8 pages, 5 figures
Bridging Research and Practice in Simulation-based Testing of Industrial Robot Navigation Systems
Ensuring robust robotic navigation in dynamic environments is a key challenge, as traditional testing methods often struggle to cover the full spectrum of operational requirements. This paper presents the industrial adoption of Surrealist, a simulation-based test generation framework originally for UAVs, now applied to the ANYmal quadrupedal robot for industrial inspection. Our method uses a search-based algorithm to automatically generate challenging obstacle avoidance scenarios, uncovering failures often missed by manual testing. In a pilot phase, generated test suites revealed critical weaknesses in one experimental algorithm (40.3% success rate) and served as an effective benchmark to prove the superior robustness of another (71.2% success rate). The framework was then integrated into the ANYbotics workflow for a six-month industrial evaluation, where it was used to test five proprietary algorithms. A formal survey confirmed its value, showing it enhances the development process, uncovers critical failures, provides objective benchmarks, and strengthens the overall verification pipeline.
comment: 12 pages, accepted for publication at IEEE/ACM International Conference on Automated Software Engineering (ASE) 2025 - Industry Showcase Track
Parametrized Topological Complexity for a Multi-Robot System with Variable Tasks
We study a generalized motion planning problem involving multiple autonomous robots navigating in a $d$-dimensional Euclidean space in the presence of a set of obstacles whose positions are unknown a priori. Each robot is required to visit sequentially a prescribed set of target states, with the number of targets varying between robots. This heterogeneous setting generalizes the framework considered in the prior works on sequential parametrized topological complexity by Farber and the second author of this article. To determine the topological complexity of our problem, we formulate it mathematically by constructing an appropriate fibration. Our main contribution is the determination of this invariant in the generalized setting, which captures the minimal algorithmic instability required for designing collision-free motion planning algorithms under parameter-dependent constraints. We provide a detailed analysis for both odd and even-dimensional ambient spaces, including the essential cohomological computations and explicit constructions of corresponding motion planning algorithms.
comment: 25 pages. All comments are welcome
Placeit! A Framework for Learning Robot Object Placement Skills
Robotics research has made significant strides in learning, yet mastering basic skills like object placement remains a fundamental challenge. A key bottleneck is the acquisition of large-scale, high-quality data, which is often a manual and laborious process. Inspired by Graspit!, a foundational work that used simulation to automatically generate dexterous grasp poses, we introduce Placeit!, an evolutionary-computation framework for generating valid placement positions for rigid objects. Placeit! is highly versatile, supporting tasks from placing objects on tables to stacking and inserting them. Our experiments show that by leveraging quality-diversity optimization, Placeit! significantly outperforms state-of-the-art methods across all scenarios for generating diverse valid poses. A pick-and-place pipeline built on our framework achieved a 90% success rate over 120 real-world deployments. This work positions Placeit! as a powerful tool for open-environment pick-and-place tasks and as a valuable engine for generating the data needed to train simulation-based foundation models in robotics.
comment: 8 pages, 8 figures. Draft version
Obstacle Avoidance using Dynamic Movement Primitives and Reinforcement Learning
Learning-based motion planning can quickly generate near-optimal trajectories. However, it often requires either large training datasets or costly collection of human demonstrations. This work proposes an alternative approach that quickly generates smooth, near-optimal collision-free 3D Cartesian trajectories from a single artificial demonstration. The demonstration is encoded as a Dynamic Movement Primitive (DMP) and iteratively reshaped using policy-based reinforcement learning to create a diverse trajectory dataset for varying obstacle configurations. This dataset is used to train a neural network that takes as inputs the task parameters describing the obstacle dimensions and location, derived automatically from a point cloud, and outputs the DMP parameters that generate the trajectory. The approach is validated in simulation and real-robot experiments, outperforming an RRT-Connect baseline in terms of computation and execution time, as well as trajectory length, while supporting multi-modal trajectory generation for different obstacle geometries and end-effector dimensions. Videos and the implementation code are available at https://github.com/DominikUrbaniak/obst-avoid-dmp-pi2.
comment: 8 pages, 7 figures
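To make the DMP encoding in the abstract above concrete, the following is a minimal one-dimensional sketch of a discrete Dynamic Movement Primitive rollout in the common spring-damper formulation with a phase-gated forcing term. The gains, the basis-function placement, and the function name `rollout_dmp` are assumptions for illustration; this is not the authors' implementation, and it omits the reinforcement learning and neural network stages.

```python
import math


def rollout_dmp(y0, goal, weights, tau=1.0, dt=0.001,
                alpha=25.0, beta=6.25, alpha_x=3.0):
    """Integrate a 1-D DMP with explicit Euler steps and return the positions.

    tau * dz/dt = alpha * (beta * (goal - y) - z) + f(x)
    tau * dy/dt = z
    tau * dx/dt = -alpha_x * x        (phase variable, decays from 1 to 0)
    """
    n = len(weights)
    # Gaussian basis centres spread along the decaying phase variable.
    centres = [math.exp(-alpha_x * i / max(n - 1, 1)) for i in range(n)]
    widths = [n ** 1.5 / c for c in centres]  # heuristic widths (assumption)
    y, z, x = y0, 0.0, 1.0
    traj = [y]
    for _ in range(int(tau / dt)):
        psi = [math.exp(-h * (x - c) ** 2) for c, h in zip(centres, widths)]
        total = sum(psi)
        forcing = 0.0
        if total > 1e-12:
            # Normalized weighted sum, gated by the phase and scaled by amplitude.
            forcing = (sum(p * w for p, w in zip(psi, weights)) / total
                       * x * (goal - y0))
        dz = (alpha * (beta * (goal - y) - z) + forcing) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z, y, x = z + dz * dt, y + dy * dt, x + dx * dt
        traj.append(y)
    return traj
```

With all weights at zero the rollout is a critically damped reach to the goal; nonzero weights reshape the path in between, which is the handle a learning method can adjust.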
Glovity: Learning Dexterous Contact-Rich Manipulation via Spatial Wrench Feedback Teleoperation System
We present Glovity, a novel, low-cost wearable teleoperation system that integrates a spatial wrench (force-torque) feedback device with a haptic glove featuring fingertip Hall sensor calibration, enabling feedback-rich dexterous manipulation. Glovity addresses key challenges in contact-rich tasks by providing intuitive wrench and tactile feedback, while overcoming embodiment gaps through precise retargeting. User studies demonstrate significant improvements: wrench feedback boosts success rates in book-flipping tasks from 48% to 78% and reduces completion time by 25%, while fingertip calibration significantly improves thin-object grasping success compared to a commercial glove. Furthermore, incorporating wrench signals into imitation learning (via DP-R3M) achieves a high success rate in novel contact-rich scenarios, such as adaptive page flipping and force-aware handovers. All hardware designs and software will be open-sourced. Project website: https://glovity.github.io/
HANDO: Hierarchical Autonomous Navigation and Dexterous Omni-loco-manipulation IROS 2025
Seamless loco-manipulation in unstructured environments requires robots to leverage autonomous exploration alongside whole-body control for physical interaction. In this work, we introduce HANDO (Hierarchical Autonomous Navigation and Dexterous Omni-loco-manipulation), a two-layer framework designed for legged robots equipped with manipulators to perform human-centered mobile manipulation tasks. The first layer utilizes a goal-conditioned autonomous exploration policy to guide the robot to semantically specified targets, such as a black office chair in a dynamic environment. The second layer employs a unified whole-body loco-manipulation policy to coordinate the arm and legs for precise interaction tasks, for example, handing a drink to a person seated on the chair. We have conducted an initial deployment of the navigation module, and will continue to pursue finer-grained deployment of whole-body loco-manipulation.
comment: 4 pages, 2 figures, this paper has been accepted for the workshop Perception and Planning for Mobile Manipulation in Changing Environments (PM2CE) at IROS 2025
PLEXUS Hand: Lightweight Four-Motor Prosthetic Hand Enabling Precision-Lateral Dexterous Manipulation
Electric prosthetic hands should be lightweight to decrease the burden on the user, shaped like human hands for cosmetic purposes, and have motors inside to protect them from damage and dirt. In addition to the ability to perform daily activities, these features are essential for everyday use of the hand. In-hand manipulation is necessary to perform daily activities such as transitioning between different postures, particularly through rotational movements, such as reorienting cards before slot insertion and operating tools such as screwdrivers. However, currently used electric prosthetic hands only achieve static grasp postures, and existing manipulation approaches require either many motors, which makes the prosthesis heavy for daily use in the hand, or complex mechanisms that demand a large internal space and force external motor placement, complicating attachment and exposing the components to damage. Alternatively, we combine a single-axis thumb and optimized thumb positioning to achieve basic posture and in-hand manipulation, that is, the reorientation between precision and lateral grasps, using only four motors in a lightweight (311 g) prosthetic hand. Experimental validation using primitive objects of various widths (5-30 mm) and shapes (cylinders and prisms) resulted in success rates of 90-100% for reorientation tasks. The hand performed seal stamping and USB device insertion, as well as rotation to operate a screwdriver.
Flow-Opt: Scalable Centralized Multi-Robot Trajectory Optimization with Flow Matching and Differentiable Optimization
Centralized trajectory optimization in the joint space of multiple robots allows access to a larger feasible space that can result in smoother trajectories, especially while planning in tight spaces. Unfortunately, it is often computationally intractable beyond a very small swarm size. In this paper, we propose Flow-Opt, a learning-based approach towards improving the computational tractability of centralized multi-robot trajectory optimization. Specifically, we reduce the problem to first learning a generative model to sample different candidate trajectories and then using a learned Safety-Filter (SF) to ensure fast inference-time constraint satisfaction. We propose a flow-matching model with a diffusion transformer (DiT) augmented with permutation-invariant robot position and map encoders as the generative model. We develop a custom solver for our SF and equip it with a neural network that predicts context-specific initialization. The initialization network is trained in a self-supervised manner, taking advantage of the differentiability of the SF solver. We advance the state-of-the-art in the following respects. First, we show that we can generate trajectories of tens of robots in cluttered environments in a few tens of milliseconds. This is several times faster than existing centralized optimization approaches. Moreover, our approach also generates smoother trajectories orders of magnitude faster than competing baselines based on diffusion models. Second, each component of our approach can be batched, allowing us to solve a few tens of problem instances in a fraction of a second. We believe this is the first such result; no existing approach provides such capabilities. Finally, our approach can generate a diverse set of trajectories between a given set of start and goal locations, which can capture different collision-avoidance behaviors.
Decentralized Multi-Robot Relative Navigation in Unknown, Structurally Constrained Environments under Limited Communication
Multi-robot navigation in unknown, structurally constrained, and GPS-denied environments presents a fundamental trade-off between global strategic foresight and local tactical agility, particularly under limited communication. Centralized methods achieve global optimality but suffer from high communication overhead, while distributed methods are efficient but lack the broader awareness to avoid deadlocks and topological traps. To address this, we propose a fully decentralized, hierarchical relative navigation framework that achieves both strategic foresight and tactical agility without a unified coordinate system. At the strategic layer, robots build and exchange lightweight topological maps upon opportunistic encounters. This process fosters an emergent global awareness, enabling the planning of efficient, trap-avoiding routes at an abstract level. This high-level plan then inspires the tactical layer, which operates on local metric information. Here, a sampling-based escape point strategy resolves dense spatio-temporal conflicts by generating dynamically feasible trajectories in real time, concurrently satisfying tight environmental and kinodynamic constraints. Extensive simulations and real-world experiments demonstrate that our system significantly outperforms in success rate and efficiency, especially in communication-limited environments with complex topological structures.
When a Robot is More Capable than a Human: Learning from Constrained Demonstrators
Learning from demonstrations enables experts to teach robots complex tasks using interfaces such as kinesthetic teaching, joystick control, and sim-to-real transfer. However, these interfaces often constrain the expert's ability to demonstrate optimal behavior due to indirect control, setup restrictions, and hardware safety. For example, a joystick can move a robotic arm only in a 2D plane, even though the robot operates in a higher-dimensional space. As a result, the demonstrations collected by constrained experts lead to suboptimal performance of the learned policies. This raises a key question: Can a robot learn a better policy than the one demonstrated by a constrained expert? We address this by allowing the agent to go beyond direct imitation of expert actions and explore shorter and more efficient trajectories. We use the demonstrations to infer a state-only reward signal that measures task progress, and self-label reward for unknown states using temporal interpolation. Our approach outperforms common imitation learning in both sample efficiency and task completion time. On a real WidowX robotic arm, it completes the task in 12 seconds, 10x faster than behavioral cloning, as shown in real-robot videos on https://sites.google.com/view/constrainedexpert .
Robust Visual Teach-and-Repeat Navigation with Flexible Topo-metric Graph Map Representation
Visual teach-and-repeat navigation is a direct solution for deploying mobile robots in unknown environments. However, robust trajectory-repeat navigation remains challenging due to environmental changes and dynamic objects. In this paper, we propose a novel visual teach-and-repeat navigation system, which consists of a flexible map representation, robust map matching, and a map-less local navigation module. During the teaching process, the recorded keyframes are formulated as a topo-metric graph, and each node can be further extended to save new observations. Such a representation also alleviates the requirement of globally consistent mapping. To enhance place recognition performance during the repeating process, instead of using frame-to-frame matching, we first apply keyframe clustering to aggregate similar connected keyframes into local maps and perform place recognition based on a visual frame-to-local-map matching strategy. To improve persistent tracking of the local goal, a long-term goal management algorithm is constructed, which prevents the robot from getting lost due to environmental changes or obstacle occlusion. To reach the goal without a map, a local trajectory-control candidate optimization algorithm is proposed. Extensive experiments are conducted on our mobile platform. The results demonstrate that our system is superior to the baselines in terms of robustness and effectiveness.
Training Models to Detect Successive Robot Errors from Human Reactions
As robots become more integrated into society, detecting robot errors is essential for effective human-robot interaction (HRI). When a robot fails repeatedly, how can it know when to change its behavior? Humans naturally respond to robot errors through verbal and nonverbal cues that intensify over successive failures, from confusion and subtle speech changes to visible frustration and impatience. While prior work shows that human reactions can indicate robot failures, few studies examine how these evolving responses reveal successive failures. This research uses machine learning to recognize stages of robot failure from human reactions. In a study with 26 participants interacting with a robot that made repeated conversational errors, behavioral features were extracted from video data to train models for individual users. The best model achieved 93.5% accuracy for detecting errors and 84.1% for classifying successive failures. Modeling the progression of human reactions enhances error detection and understanding of repeated interaction breakdowns in HRI.
comment: Accepted to NERC '25
Visual Anomaly Detection for Reliable Robotic Implantation of Flexible Microelectrode Array IROS 2025
Flexible microelectrode (FME) implantation into brain cortex is challenging due to the deformable fiber-like structure of FME probe and the interaction with critical bio-tissue. To ensure reliability and safety, the implantation process should be monitored carefully. This paper develops an image-based anomaly detection framework based on the microscopic cameras of the robotic FME implantation system. The unified framework is utilized at four checkpoints to check the micro-needle, FME probe, hooking result, and implantation point, respectively. Exploiting the existing object localization results, the aligned regions of interest (ROIs) are extracted from raw image and input to a pretrained vision transformer (ViT). Considering the task specifications, we propose a progressive granularity patch feature sampling method to address the sensitivity-tolerance trade-off issue at different locations. Moreover, we select a part of feature channels with higher signal-to-noise ratios from the raw general ViT features, to provide better descriptors for each specific scene. The effectiveness of the proposed methods is validated with the image datasets collected from our implantation system.
comment: Accepted by IROS 2025
iMoWM: Taming Interactive Multi-Modal World Model for Robotic Manipulation
Learned world models hold significant potential for robotic manipulation, as they can serve as a simulator for real-world interactions. While extensive progress has been made in 2D video-based world models, these approaches often lack geometric and spatial reasoning, which is essential for capturing the physical structure of the 3D world. To address this limitation, we introduce iMoWM, a novel interactive world model designed to generate color images, depth maps, and robot arm masks in an autoregressive manner conditioned on actions. To overcome the high computational cost associated with three-dimensional information, we propose MMTokenizer, which unifies multi-modal inputs into a compact token representation. This design enables iMoWM to leverage large-scale pretrained VideoGPT models while maintaining high efficiency and incorporating richer physical information. With its multi-modal representation, iMoWM not only improves the visual quality of future predictions but also serves as an effective simulator for model-based reinforcement learning (MBRL) and facilitates real-world imitation learning. Extensive experiments demonstrate the superiority of iMoWM across these tasks, showcasing the advantages of multi-modal world modeling for robotic manipulation. Homepage: https://xingyoujun.github.io/imowm/
Exploring Single Domain Generalization of LiDAR-based Semantic Segmentation under Imperfect Labels
Accurate perception is critical for vehicle safety, with LiDAR as a key enabler in autonomous driving. To ensure robust performance across environments, sensor types, and weather conditions without costly re-annotation, domain generalization in LiDAR-based 3D semantic segmentation is essential. However, LiDAR annotations are often noisy due to sensor imperfections, occlusions, and human errors. Such noise degrades segmentation accuracy and is further amplified under domain shifts, threatening system reliability. While noisy-label learning is well-studied in images, its extension to 3D LiDAR segmentation under domain generalization remains largely unexplored, as the sparse and irregular structure of point clouds limits direct use of 2D methods. To address this gap, we introduce the novel task Domain Generalization for LiDAR Semantic Segmentation under Noisy Labels (DGLSS-NL) and establish the first benchmark by adapting three representative noisy-label learning strategies from image classification to 3D segmentation. However, we find that existing noisy-label learning approaches adapt poorly to LiDAR data. We therefore propose DuNe, a dual-view framework with strong and weak branches that enforce feature-level consistency and apply cross-entropy loss based on confidence-aware filtering of predictions. Our approach shows state-of-the-art performance by achieving 56.86% mIoU on SemanticKITTI, 42.28% on nuScenes, and 52.58% on SemanticPOSS under 10% symmetric label noise, with an overall Arithmetic Mean (AM) of 49.57% and Harmonic Mean (HM) of 48.50%, thereby demonstrating robust domain generalization in DGLSS-NL tasks. The code is available on our project page.
Trust Modeling and Estimation in Human-Autonomy Interactions
Advances in the control of autonomous systems have accompanied an expansion in the potential applications for autonomous robotic systems. The success of applications involving humans depends on the quality of interaction between the autonomous system and the human supervisor, which is particularly affected by the degree of trust that the supervisor places in the autonomous system. Absent from the literature are models of supervisor trust dynamics that can accommodate asymmetric responses to autonomous system performance and the intermittent nature of supervisor-autonomous system communication. This paper focuses on formulating an estimated model of supervisor trust that incorporates both of these features by employing a switched linear system structure with event-triggered sampling of the model input and output. Trust response data collected in a user study with 51 participants were then used to identify parameters for a switched linear model-based observer of supervisor trust.
comment: 10 pages. 13 figures
A geometrical approach to solve the proximity of a point to an axisymmetric quadric in space
This paper presents the classification of a general quadric into an axisymmetric quadric (AQ) and the solution to the problem of the proximity of a given point to an AQ. The problem of proximity in $R^3$ is reduced to the same problem in $R^2$, a reduction that is not found in the literature. A new method to solve the problem in $R^2$ is used, based on the geometrical properties of the conics, such as sub-normal, length of the semi-major axis, eccentricity, slope and radius. Furthermore, the problem in $R^2$ is categorised into two sub-cases for the parabola and three for the ellipse/hyperbola, depending on the location of the point, which is a novel approach to the best of the authors' knowledge. The proposed method is suitable for implementation in a common programming language, such as C, and proved to be faster than a commercial library, namely Bullet.
Direct Data-Driven Predictive Control for a Three-dimensional Cable-Driven Soft Robotic Arm
Soft robots offer significant advantages in safety and adaptability, yet achieving precise and dynamic control remains a major challenge due to their inherently complex and nonlinear dynamics. Recently, Data-enabled Predictive Control (DeePC) has emerged as a promising model-free approach that bypasses explicit system identification by directly leveraging input-output data. While DeePC has shown success in other domains, its application to soft robots remains underexplored, particularly for three-dimensional (3D) soft robotic systems. This paper addresses this gap by developing and experimentally validating an effective DeePC framework on a 3D, cable-driven soft arm. Specifically, we design and fabricate a soft robotic arm with a thick tubing backbone for stability, a dense silicone body with large cavities for strength and flexibility, and rigid endcaps for secure termination. Using this platform, we implement DeePC with singular value decomposition (SVD)-based dimension reduction for two key control tasks: fixed-point regulation and trajectory tracking in 3D space. Comparative experiments with a baseline model-based controller demonstrate DeePC's superior accuracy, robustness, and adaptability, highlighting its potential as a practical solution for dynamic control of soft robots.
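To make the data-driven ingredient of the abstract above more concrete, the sketch below builds a block-Hankel matrix from recorded samples and applies one common form of SVD-based column reduction. The function names and the exact reduction step are assumptions for illustration; the paper's own DeePC formulation may differ.

```python
import numpy as np

def block_hankel(signal, depth):
    """Build a block-Hankel matrix of shape (depth * m, T - depth + 1).

    `signal` holds T samples of an m-dimensional input or output (shape (T, m) or (T,)).
    """
    signal = np.asarray(signal, dtype=float)
    if signal.ndim == 1:
        signal = signal[:, None]
    T, _ = signal.shape
    cols = T - depth + 1
    if cols < 1:
        raise ValueError("depth exceeds the number of recorded samples")
    # Each column stacks `depth` consecutive samples.
    return np.column_stack([signal[i:i + depth].reshape(-1) for i in range(cols)])

def reduce_columns(hankel, rank):
    """Compress the data matrix to `rank` columns using its leading singular directions."""
    _, _, vt = np.linalg.svd(hankel, full_matrices=False)
    # H @ V_r equals U_r * S_r, so the column space of the leading part is preserved.
    return hankel @ vt[:rank].T

H = block_hankel(np.arange(5), depth=2)   # columns: [0,1], [1,2], [2,3], [3,4]
H_reduced = reduce_columns(H, rank=2)
```

Fewer columns mean fewer decision variables in the predictive-control optimization, which is the usual motivation for this kind of reduction.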
Model-Based Lookahead Reinforcement Learning for in-hand manipulation
In-hand manipulation, like many other dexterous tasks, remains a difficult challenge in robotics because it combines complex dynamic systems with the need to control and manoeuvre various objects using the hand's actuators. This work applies a previously developed hybrid Reinforcement Learning (RL) framework to an in-hand manipulation task and verifies that it improves task performance. The model combines concepts of both Model-Free and Model-Based Reinforcement Learning by guiding a trained policy with the help of a dynamic model and value function through trajectory evaluation, as done in Model Predictive Control. This work evaluates the performance of the model by comparing it with the policy being guided. To fully explore this, various tests are performed using both fully-actuated and under-actuated simulated robotic hands to manipulate different objects for a given task. Generalization is also tested by changing the properties, such as density and size, of the objects on which both the policy and dynamic model were trained, and additionally by guiding a policy trained on one object to perform the same task on a different one. The results of this work show that, given a policy with high average reward and an accurate dynamic model, the hybrid framework improves the performance of in-hand manipulation tasks for most test cases, even when the object properties are changed. However, this improvement comes at the expense of increased computational cost, due to the complexity of trajectory evaluation.
Online IMU-odometer Calibration using GNSS Measurements for Autonomous Ground Vehicle Localization
Accurate calibration of intrinsic (odometer scaling factors) and extrinsic parameters (IMU-odometer translation and rotation) is essential for autonomous ground vehicle localization. Existing GNSS-aided approaches often rely on positioning results or raw measurements without ambiguity resolution, and their observability properties remain underexplored. This paper proposes a tightly coupled online calibration method that fuses IMU, odometer, and raw GNSS measurements (pseudo-range, carrier-phase, and Doppler) within an extendable factor graph optimization (FGO) framework, incorporating outlier mitigation and ambiguity resolution. Observability analysis reveals that two horizontal translation and three rotation parameters are observable under general motion, while vertical translation remains unobservable. Simulation and real-world experiments demonstrate superior calibration and localization performance over state-of-the-art loosely coupled (LC) methods. Specifically, IMU-odometer positioning using our calibrated parameters achieves a maximum absolute error of 17.75 m, compared with 61.51 m for the LC method, an improvement of up to 71.14 percent. To foster further research, we also release the first open-source dataset that combines IMU, 2D odometer, and raw GNSS measurements from both rover and base stations.
comment: Submitted to IEEE Transactions on Intelligent Transportation Systems
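The improvement figure quoted in the abstract above can be checked from the two reported maximum errors. The helper name `relative_improvement` is a hypothetical one introduced only for this check.

```python
def relative_improvement(baseline_error, new_error):
    """Percentage reduction of `new_error` relative to `baseline_error`."""
    return 100.0 * (baseline_error - new_error) / baseline_error

# Reported maximum errors: 61.51 m (loosely coupled baseline) and 17.75 m (proposed).
improvement = round(relative_improvement(61.51, 17.75), 2)
```

The result agrees with the 71.14 percent stated in the abstract.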
Computing Safe Control Inputs using Discrete-Time Matrix Control Barrier Functions via Convex Optimization
Control barrier functions (CBFs) have seen widespread success in providing forward invariance and safety guarantees for dynamical control systems. A crucial limitation of discrete-time formulations is that CBFs that are nonconcave in their argument require the solution of nonconvex optimization problems to compute safety-preserving control inputs, which inhibits real-time computation of control inputs guaranteeing forward invariance. This paper presents a novel method for computing safety-preserving control inputs for discrete-time systems with nonconvex safety sets, utilizing convex optimization and the recently developed class of matrix control barrier function techniques. The efficacy of our methods is demonstrated through numerical simulations on a bicopter system.
comment: 17 pages, 8 figures
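For readers unfamiliar with the setting of the abstract above, a commonly used discrete-time CBF safety filter can be written as follows. The symbols $h$, $f$, $g$, $\gamma$, and $u_{\mathrm{nom}}$ are generic notation assumed here for a control-affine system; they are not the paper's own matrix CBF formulation.

```latex
\begin{aligned}
& x_{k+1} = f(x_k) + g(x_k)\,u_k, \qquad \mathcal{S} = \{\, x : h(x) \ge 0 \,\}, \\
& u_k^{\star} = \arg\min_{u} \; \lVert u - u_{\mathrm{nom}}(x_k) \rVert^2
  \quad \text{s.t.} \quad
  h\bigl(f(x_k) + g(x_k)\,u\bigr) \ge (1-\gamma)\,h(x_k), \qquad 0 < \gamma \le 1 .
\end{aligned}
```

Because the next state is affine in $u$, the constraint set is convex whenever $h$ is concave. For a nonconcave $h$ it is generally nonconvex, which is the difficulty the abstract refers to.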
Cross-Sensor Touch Generation
Today's visuo-tactile sensors come in many shapes and sizes, making it challenging to develop general-purpose tactile representations. This is because most models are tied to a specific sensor design. To address this challenge, we propose two approaches to cross-sensor image generation. The first is an end-to-end method that leverages paired data (Touch2Touch). The second method builds an intermediate depth representation and does not require paired data (T2D2: Touch-to-Depth-to-Touch). Both methods enable the use of sensor-specific models across multiple sensors via the cross-sensor touch generation process. Together, these models offer flexible solutions for sensor translation, depending on data availability and application needs. We demonstrate their effectiveness on downstream tasks such as in-hand pose estimation and behavior cloning, successfully transferring models trained on one sensor to another. Project page: https://samantabelen.github.io/cross_sensor_touch_generation.
comment: CoRL 2025
Enhancing Diffusion Policy with Classifier-Free Guidance for Temporal Robotic Tasks
Temporal sequential tasks challenge humanoid robots, as existing Diffusion Policy (DP) and Action Chunking with Transformers (ACT) methods often lack temporal context, resulting in local optima traps and excessive repetitive actions. To address these issues, this paper introduces a Classifier-Free Guidance-Based Diffusion Policy (CFG-DP), a novel framework to enhance DP by integrating Classifier-Free Guidance (CFG) with conditional and unconditional models. Specifically, CFG leverages timestep inputs to track task progression and ensure precise cycle termination. It dynamically adjusts action predictions based on task phase, using a guidance factor tuned to balance temporal coherence and action accuracy. Real-world experiments on a humanoid robot demonstrate high success rates and minimal repetitive actions. Furthermore, we assessed the model's ability to terminate actions and examined how different components and parameter adjustments affect its performance. This framework significantly enhances deterministic control and execution reliability for sequential robotic tasks.
comment: 7 pages, 7 figures
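The guidance factor mentioned in the abstract above is usually applied through the standard classifier-free guidance combination of conditional and unconditional predictions, sketched below. The function name and array shapes are illustrative assumptions; how CFG-DP feeds timestep information into the conditional model is not specified in the abstract and is not modelled here.

```python
import numpy as np

def cfg_combine(pred_uncond, pred_cond, guidance_scale):
    """Standard classifier-free guidance: extrapolate from the unconditional
    prediction towards the conditional one by `guidance_scale`."""
    pred_uncond = np.asarray(pred_uncond, dtype=float)
    pred_cond = np.asarray(pred_cond, dtype=float)
    return pred_uncond + guidance_scale * (pred_cond - pred_uncond)

# Hypothetical two-dimensional noise predictions for one denoising step.
uncond = np.array([0.0, 1.0])
cond = np.array([1.0, 3.0])
guided = cfg_combine(uncond, cond, guidance_scale=2.0)
```

A scale of 0 recovers the unconditional prediction and a scale of 1 the conditional one; larger values push the sample further towards the conditioning signal.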
Navigation and Exploration with Active Inference: from Biology to Industry
By building and updating internal cognitive maps, animals exhibit extraordinary navigation abilities in complex, dynamic environments. Inspired by these biological mechanisms, we present a real time robotic navigation system grounded in the Active Inference Framework (AIF). Our model incrementally constructs a topological map, infers the agent's location, and plans actions by minimising expected uncertainty and fulfilling perceptual goals without any prior training. Integrated into the ROS2 ecosystem, we validate its adaptability and efficiency across both 2D and 3D environments (simulated and real world), demonstrating competitive performance with traditional and state of the art exploration approaches while offering a biologically inspired navigation approach.
comment: conference IWAI 2025 - accepted (in processing)
Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning Code LLMs
Code LLMs have shown promising results with converting tasks in natural language to programs that can be executed by service robots. We are interested in finetuning small, specialized LLMs for this purpose, but collecting datasets of task-program pairs specific to each robot is time-consuming and expensive. While approaches such as SELF-INSTRUCT and EVOL-INSTRUCT are capable of generating novel tasks given a few examples, they are unable to provide the corresponding programs that correctly abide by physical-world and robot-constraints using the provided programming interface. Using a simulator is a natural potential solution to checking for such constraints, but building simulation environments that can handle arbitrary tasks and their necessary objects and locations, is challenging. To address these challenges, we introduce ROBO-INSTRUCT, which synthesizes task-specific simulation environments on the fly during program execution, by opportunistically inferring entity properties and enforcing corresponding constraints based on how the entities are used in the task program. Additionally, ROBO-INSTRUCT integrates an LLM-aided post-processing procedure to refine instructions for better alignment with robot programs. We demonstrate the effectiveness of ROBO-INSTRUCT across multiple LLMs, showing that our fine-tuned models outperform all baseline methods and even match or surpass the performance of several larger and proprietary models.
comment: Conference on Language Modeling (COLM) 2025, Project site: https://amrl.cs.utexas.edu/robo-instruct/
SwarmGPT: Combining Large Language Models with Safe Motion Planning for Drone Swarm Choreography
Drone swarm performances -- synchronized, expressive aerial displays set to music -- have emerged as a captivating application of modern robotics. Yet designing smooth, safe choreographies remains a complex task requiring expert knowledge. We present SwarmGPT, a language-based choreographer that leverages the reasoning power of large language models (LLMs) to streamline drone performance design. The LLM is augmented by a safety filter that ensures deployability by making minimal corrections when safety or feasibility constraints are violated. By decoupling high-level choreographic design from low-level motion planning, our system enables non-experts to iteratively refine choreographies using natural language without worrying about collisions or actuator limits. We validate our approach through simulations with swarms up to 200 drones and real-world experiments with up to 20 drones performing choreographies to diverse types of songs, demonstrating scalable, synchronized, and safe performances. Beyond entertainment, this work offers a blueprint for integrating foundation models into safety-critical swarm robotics applications.
comment: Accepted at RA-L 2025
Extending First-order Robotic Motion Planners to Second-order Robot Dynamics
This paper extends first-order motion planners to robots governed by second-order dynamics. Two control schemes are proposed based on the knowledge of a scalar function whose negative gradient aligns with a given first-order motion planner. When such a function is known, the first-order motion planner is combined with a damping velocity vector with a dynamic gain to extend the safety and convergence guarantees of the first-order motion planner to second-order systems. If no such function is available, we propose an alternative control scheme ensuring that the error between the robot's velocity and the first-order motion planner converges to zero. The theoretical developments are supported by simulation results demonstrating the effectiveness of the proposed approaches.
comment: 14 pages, 10 figures
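The second idea in the abstract above, driving the error between the robot's velocity and a first-order planner to zero, can be illustrated on a double integrator. The sketch uses a generic velocity-tracking law with a linear planner; the gains, the planner, and the name `simulate` are assumptions and do not reproduce the paper's control schemes.

```python
import numpy as np

def simulate(x0, v0, goal, kp=1.0, kv=5.0, dt=0.001, steps=10000):
    """Euler simulation of x'' = u, tracking the first-order planner f(x) = -kp (x - goal)."""
    x = np.asarray(x0, dtype=float).copy()
    v = np.asarray(v0, dtype=float).copy()
    goal = np.asarray(goal, dtype=float)
    for _ in range(steps):
        planner = -kp * (x - goal)       # desired velocity from the first-order planner
        planner_rate = -kp * v           # time derivative of the planner along the motion
        # With e = v - planner, this choice yields e' = -kv * e, so e decays to zero.
        u = planner_rate - kv * (v - planner)
        x = x + dt * v
        v = v + dt * u
    return x, v

x_final, v_final = simulate(x0=[1.0, 0.0], v0=[0.0, 1.0], goal=[0.0, 0.0])
```

Once the velocity error has decayed, the robot follows the planner's vector field and therefore inherits its convergence to the goal in this simple linear case.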
MP1: MeanFlow Tames Policy Learning in 1-step for Robotic Manipulation
In robot manipulation, robot learning has become a prevailing approach. However, generative models within this field face a fundamental trade-off between the slow, iterative sampling of diffusion models and the architectural constraints of faster Flow-based methods, which often rely on explicit consistency losses. To address these limitations, we introduce MP1, which pairs 3D point-cloud inputs with the MeanFlow paradigm to generate action trajectories in one network function evaluation (1-NFE). By directly learning the interval-averaged velocity via the "MeanFlow Identity", our policy avoids any additional consistency constraints. This formulation eliminates numerical ODE-solver errors during inference, yielding more precise trajectories. MP1 further incorporates CFG for improved trajectory controllability while retaining 1-NFE inference without reintroducing structural constraints. Because subtle scene-context variations are critical for robot learning, especially in few-shot learning, we introduce a lightweight Dispersive Loss that repels state embeddings during training, boosting generalization without slowing inference. We validate our method on the Adroit and Meta-World benchmarks, as well as in real-world scenarios. Experimental results show MP1 achieves superior average task success rates, outperforming DP3 by 10.2% and FlowPolicy by 7.3%. Its average inference time is only 6.8 ms, which is 19x faster than DP3 and nearly 2x faster than FlowPolicy. Our project page is available at https://mp1-2254.github.io/, and the code can be accessed at https://github.com/LogSSim/MP1.
The Impact of 2D Segmentation Backbones on Point Cloud Predictions Using 4D Radar
LiDAR's dense, sharp point cloud (PC) representations of the surrounding environment enable accurate perception and significantly improve road safety by offering greater scene awareness and understanding. However, LiDAR's high cost continues to restrict the broad adoption of high-level Autonomous Driving (AD) systems in commercially available vehicles. Prior research has shown progress towards circumventing the need for LiDAR by training a neural network, using LiDAR point clouds as ground truth (GT), to produce LiDAR-like 3D point clouds using only 4D Radars. One of the best examples is a neural network created to train a more efficient radar target detector with a modular 2D convolutional neural network (CNN) backbone and a temporal coherence network at its core that uses the RaDelft dataset for training (see arXiv:2406.04723). In this work, we investigate the impact of higher-capacity segmentation backbones on the quality of the produced point clouds. Our results show that while very high-capacity models may actually hurt performance, an optimal segmentation backbone can provide a 23.7% improvement over the state-of-the-art (SOTA).
PeRoI: A Pedestrian-Robot Interaction Dataset for Learning Avoidance, Neutrality, and Attraction Behaviors in Social Navigation
Robots are increasingly being deployed in public spaces such as shopping malls, sidewalks, and hospitals, where safe and socially aware navigation depends on anticipating how pedestrians respond to their presence. However, existing datasets rarely capture the full spectrum of robot-induced reactions, e.g., avoidance, neutrality, attraction, which limits progress in modeling these interactions. In this paper, we present the Pedestrian-Robot Interaction (PeRoI) dataset that captures pedestrian motions categorized into attraction, neutrality, and repulsion across two outdoor sites under three controlled conditions: no robot present, with stationary robot, and with moving robot. This design explicitly reveals how pedestrian behavior varies across robot contexts, and we provide qualitative and quantitative comparisons to established state-of-the-art datasets. Building on these data, we propose the Neural Robot Social Force Model (NeuRoSFM), an extension of the Social Force Model that integrates neural networks to augment inter-human dynamics with learned components and explicit robot-induced forces to better predict pedestrian motion in the vicinity of robots. We evaluate NeuRoSFM by generating trajectories on multiple real-world datasets. The results demonstrate improved modeling of pedestrian-robot interactions, leading to better prediction accuracy, and highlight the value of our dataset and method for advancing socially aware navigation strategies in human-centered environments.
SMapper: A Multi-Modal Data Acquisition Platform for SLAM Benchmarking
Advancing research in fields such as Simultaneous Localization and Mapping (SLAM) and autonomous navigation critically depends on the availability of reliable and reproducible multimodal datasets. While several influential datasets have driven progress in these domains, they often suffer from limitations in sensing modalities, environmental diversity, and the reproducibility of the underlying hardware setups. To address these challenges, this paper introduces SMapper, a novel open-hardware, multi-sensor platform designed explicitly for, though not limited to, SLAM research. The device integrates synchronized LiDAR, multi-camera, and inertial sensing, supported by a robust calibration and synchronization pipeline that ensures precise spatio-temporal alignment across modalities. Its open and replicable design allows researchers to extend its capabilities and reproduce experiments across both handheld and robot-mounted scenarios. To demonstrate its practicality, we additionally release SMapper-light, a publicly available SLAM dataset containing representative indoor and outdoor sequences. The dataset includes tightly synchronized multimodal data and ground truth trajectories derived from offline LiDAR-based SLAM with sub-centimeter accuracy, alongside dense 3D reconstructions. Furthermore, the paper contains benchmarking results on state-of-the-art LiDAR and visual SLAM frameworks using the SMapper-light dataset. By combining open-hardware design, reproducible data collection, and comprehensive benchmarking, SMapper establishes a robust foundation for advancing SLAM algorithm development, evaluation, and reproducibility. The project's documentation, including source code, CAD models, and dataset links, is publicly available at https://snt-arg.github.io/smapper_docs.
comment: 13 pages, 5 figures, 6 tables
Multi-robot Rigid Formation Navigation via Synchronous Motion and Discrete-time Communication-Control Optimization
Rigid-formation navigation of multiple robots is essential for applications such as cooperative transportation. This process involves a team of collaborative robots maintaining a predefined geometric configuration, such as a square, while in motion. For untethered collaborative motion, inter-robot communication must be conducted through a wireless network. Notably, few existing works offer a comprehensive solution for multi-robot formation navigation executable on microprocessor platforms via wireless networks, particularly for formations that must traverse complex curvilinear paths. To address this gap, we introduce a novel "hold-and-hit" communication-control framework designed to work seamlessly with the widely used Robot Operating System (ROS) platform. The hold-and-hit framework synchronizes robot movements in a manner robust against wireless network delays and packet loss. It operates over discrete-time communication-control cycles, making it suitable for implementation on contemporary microprocessors. Complementary to hold-and-hit, we propose an intra-cycle optimization approach that enables rigid formations to closely follow desired curvilinear paths, even under the nonholonomic movement constraints inherent to most vehicular robots. The combination of hold-and-hit and intra-cycle optimization ensures precise and reliable navigation even in challenging scenarios. Simulations in a virtual environment demonstrate the superiority of our method in maintaining a four-robot square formation along an S-shaped path, outperforming two existing approaches. Furthermore, real-world experiments validate the effectiveness of our framework: the robots maintained an inter-distance error within $\pm 0.069$ m and an inter-angular orientation error within $\pm 19.15^{\circ}$ while navigating along an S-shaped path at a fixed linear velocity of $0.1$ m/s.
A Multimodal Depth-Aware Method For Embodied Reference Understanding
Embodied Reference Understanding requires identifying a target object in a visual scene based on both language instructions and pointing cues. While prior works have shown progress in open-vocabulary object detection, they often fail in ambiguous scenarios where multiple candidate objects exist in the scene. To address these challenges, we propose a novel ERU framework that jointly leverages LLM-based data augmentation, depth-map modality, and a depth-aware decision module. This design enables robust integration of linguistic and embodied cues, improving disambiguation in complex or cluttered environments. Experimental results on two datasets demonstrate that our approach significantly outperforms existing baselines, achieving more accurate and reliable referent detection.
A Knowledge-Informed Deep Learning Paradigm for Generalizable and Stability-Optimized Car-Following Models
Car-following models (CFMs) are fundamental to traffic flow analysis and autonomous driving. Although calibrated physics-based and trained data-driven CFMs can replicate human driving behavior, their reliance on specific datasets limits generalization across diverse scenarios and reduces reliability in real-world deployment. Moreover, these models typically focus on behavioral fidelity and do not support the explicit optimization of local and string stability, which are increasingly important for the safe and efficient operation of autonomous vehicles (AVs). To address these limitations, we propose a Knowledge-Informed Deep Learning (KIDL) paradigm that distills the generalization capabilities of pre-trained Large Language Models (LLMs) into a lightweight and stability-aware neural architecture. LLMs are used to extract fundamental car-following knowledge beyond dataset-specific patterns, and this knowledge is transferred to a reliable, tractable, and computationally efficient model through knowledge distillation. KIDL also incorporates stability constraints directly into its training objective, ensuring that the resulting model not only emulates human-like behavior but also satisfies the local and string stability requirements essential for real-world AV deployment. We evaluate KIDL on the real-world NGSIM and HighD datasets, comparing its performance with representative physics-based, data-driven, and hybrid CFMs. Both empirical and theoretical results consistently demonstrate KIDL's superior behavioral generalization and traffic flow stability, offering a robust and scalable solution for next-generation traffic systems.
An Introduction to Zero-Order Optimization Techniques for Robotics
Zero-order optimization techniques are becoming increasingly popular in robotics due to their ability to handle non-differentiable functions and escape local minima. These advantages make them particularly useful for trajectory optimization and policy optimization. In this work, we propose a mathematical tutorial on random search. It offers a simple and unifying perspective for understanding a wide range of algorithms commonly used in robotics. Leveraging this viewpoint, we classify many trajectory optimization methods under a common framework and derive novel competitive RL algorithms.
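As a small illustration of the random-search idea that the tutorial above builds on, the sketch below minimizes a non-differentiable cost using only function evaluations. It is a generic hill-climbing variant written for this listing, not an algorithm taken from the paper; the cost function, step size `sigma`, and iteration budget are assumptions.

```python
import random


def random_search(cost, x0, sigma=0.5, iters=2000, seed=0):
    """Zero-order minimization: perturb the best point, keep improvements."""
    rng = random.Random(seed)
    best_x, best_c = list(x0), cost(x0)
    for _ in range(iters):
        # Sample a Gaussian perturbation around the current best point.
        cand = [xi + rng.gauss(0.0, sigma) for xi in best_x]
        c = cost(cand)
        # Only cost values are used; no gradient is ever computed.
        if c < best_c:
            best_x, best_c = cand, c
    return best_x, best_c


def nonsmooth_cost(x):
    """Hypothetical non-differentiable cost with its minimum at (1, -2)."""
    return abs(x[0] - 1.0) + abs(x[1] + 2.0)


if __name__ == "__main__":
    print(random_search(nonsmooth_cost, [5.0, 5.0]))
```

Because candidates are accepted only when they lower the cost, the best value never increases, which is the property that makes this family of methods usable on non-differentiable objectives.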
CCDP: Composition of Conditional Diffusion Policies with Guided Sampling IROS 2025
Imitation Learning offers a promising approach to learn directly from data without requiring explicit models, simulations, or detailed task definitions. During inference, actions are sampled from the learned distribution and executed on the robot. However, sampled actions may fail for various reasons, and simply repeating the sampling step until a successful action is obtained can be inefficient. In this work, we propose an enhanced sampling strategy that refines the sampling distribution to avoid previously unsuccessful actions. We demonstrate that by solely utilizing data from successful demonstrations, our method can infer recovery actions without the need for additional exploratory behavior or a high-level controller. Furthermore, we leverage the concept of diffusion model decomposition to break down the primary problem, which may require long-horizon history to manage failures, into multiple smaller, more manageable sub-problems in learning, data collection, and inference, thereby enabling the system to adapt to variable failure counts. Our approach yields a low-level controller that dynamically adjusts its sampling space to improve efficiency when prior samples fall short. We validate our method across several tasks, including door opening with unknown directions, object manipulation, and button-searching scenarios, demonstrating that our approach outperforms traditional baselines.
comment: Accepted to IROS 2025
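The idea in the abstract above of refining the sampling distribution to avoid previously unsuccessful actions can be approximated, very loosely, by rejection sampling. The sketch below is a stand-in and not the diffusion-based guidance the paper proposes: the one-dimensional action space, the exclusion `radius`, and the function name are all assumptions.

```python
import random


def sample_avoiding_failures(sample_fn, failed, radius, rng, max_tries=1000):
    """Draw an action that is at least `radius` away from every failed action."""
    for _ in range(max_tries):
        action = sample_fn(rng)
        # Reject candidates that fall too close to a known failure.
        if all(abs(action - f) >= radius for f in failed):
            return action
    raise RuntimeError("no admissible action found within max_tries")


if __name__ == "__main__":
    rng = random.Random(0)
    failed_actions = [0.0, 0.5]  # hypothetical actions that already failed
    print(sample_avoiding_failures(lambda r: r.uniform(-1.0, 1.0),
                                   failed_actions, 0.2, rng))
```

Plain rejection wastes samples as the failure set grows, which is one motivation for reshaping the sampling distribution itself rather than filtering its output.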
A Real-Time System for Scheduling and Managing UAV Delivery in Urban Areas
As urban logistics demand continues to grow, UAV delivery has become a key solution to improve delivery efficiency, reduce traffic congestion, and lower logistics costs. However, to fully leverage the potential of UAV delivery networks, efficient swarm scheduling and management are crucial. In this paper, we propose a real-time scheduling and management system based on the "Airport-Unloading Station" model, aiming to bridge the gap between high-level scheduling algorithms and low-level execution systems. This system, acting as middleware, accurately translates the requirements from the scheduling layer into specific execution instructions, ensuring that the scheduling algorithms perform effectively in real-world environments. Additionally, we implement three collaborative scheduling schemes involving autonomous ground vehicles (AGVs), unmanned aerial vehicles (UAVs), and ground staff to further optimize overall delivery efficiency. Through extensive experiments, this study demonstrates the rationality and feasibility of the proposed management system, providing a practical solution for the commercial application of UAV delivery in urban areas. Code: https://github.com/chengji253/UAVDeliverySystem
Aegis: Automated Error Generation and Attribution for Multi-Agent Systems
Large language model based multi-agent systems (MAS) have unlocked significant advancements in tackling complex problems, but their increasing capability introduces a structural fragility that makes them difficult to debug. A key obstacle to improving their reliability is the severe scarcity of large-scale, diverse datasets for error attribution, as existing resources rely on costly and unscalable manual annotation. To address this bottleneck, we introduce Aegis, a novel framework for Automated error generation and attribution for multi-agent systems. Aegis constructs a large dataset of 9,533 trajectories with annotated faulty agents and error modes, covering diverse MAS architectures and task domains. This is achieved using an LLM-based manipulator that can adaptively inject context-aware errors into successful execution trajectories. Leveraging fine-grained labels and the structured arrangement of positive-negative sample pairs, Aegis supports three different learning paradigms: Supervised Fine-Tuning, Reinforcement Learning, and Contrastive Learning. We develop learning methods for each paradigm. Comprehensive experiments show that trained models consistently achieve substantial improvements in error attribution. Notably, several of our fine-tuned LLMs demonstrate performance competitive with or superior to proprietary models an order of magnitude larger, validating our automated data generation framework as a crucial resource for developing more robust and interpretable multi-agent systems. Our project website is available at https://kfq20.github.io/Aegis-Website/.
Nav-EE: Navigation-Guided Early Exiting for Efficient Vision-Language Models in Autonomous Driving
Vision-Language Models (VLMs) are increasingly applied in autonomous driving for unified perception and reasoning, but high inference latency hinders real-time deployment. Early-exit reduces latency by terminating inference at intermediate layers, yet its task-dependent nature limits generalization across diverse scenarios. We observe that this limitation aligns with autonomous driving: navigation systems can anticipate upcoming contexts (e.g., intersections, traffic lights), indicating which tasks will be required. We propose Nav-EE, a navigation-guided early-exit framework that precomputes task-specific exit layers offline and dynamically applies them online based on navigation priors. Experiments on CODA, Waymo, and BOSCH show that Nav-EE achieves accuracy comparable to full inference while reducing latency by up to 63.9%. Real-vehicle integration with Autoware Universe further demonstrates reduced inference latency (600ms to 300ms), supporting faster decision-making in complex scenarios. These results suggest that coupling navigation foresight with early-exit offers a viable path toward efficient deployment of large models in autonomous systems. Code and data are available at our anonymous repository: https://anonymous.4open.science/r/Nav-EE-BBC4
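The offline/online split described in the abstract above (task-specific exit layers precomputed offline, then selected online from navigation priors) can be pictured as a lookup table in front of a truncated forward pass. The sketch below only illustrates that control flow; the context names, exit depths, model depth, and function names are hypothetical and do not come from the Nav-EE code.

```python
# Hypothetical offline result: navigation context -> exit layer index.
EXIT_LAYER = {"traffic_light": 12, "intersection": 20}
FULL_DEPTH = 32  # assumed total number of layers in the model


def choose_exit_layer(nav_context, table=EXIT_LAYER, full_depth=FULL_DEPTH):
    """Pick the precomputed exit layer; fall back to full inference if unknown."""
    return table.get(nav_context, full_depth)


def run_model(layers, x, exit_layer):
    """Apply only the first `exit_layer` layers, then stop early."""
    for layer in layers[:exit_layer]:
        x = layer(x)
    return x


if __name__ == "__main__":
    # Toy layers that count how many of them were executed.
    toy_layers = [lambda v: v + 1] * FULL_DEPTH
    print(run_model(toy_layers, 0, choose_exit_layer("traffic_light")))
```

Falling back to full depth for an unanticipated context keeps accuracy intact at the cost of latency, which is one plausible design choice for a safety-critical setting.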
Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions
In reinforcement learning, off-policy actor-critic methods like DDPG and TD3 use deterministic policy gradients: the Q-function is learned from environment data, while the actor maximizes it via gradient ascent. We observe that in complex tasks such as dexterous manipulation and restricted locomotion with mobility constraints, the Q-function exhibits many local optima, making gradient ascent prone to getting stuck. To address this, we introduce SAVO, an actor architecture that (i) generates multiple action proposals and selects the one with the highest Q-value, and (ii) approximates the Q-function repeatedly by truncating poor local optima to guide gradient ascent more effectively. We evaluate tasks such as restricted locomotion, dexterous manipulation, and large discrete-action space recommender systems and show that our actor finds optimal actions more frequently and outperforms alternate actor architectures.
comment: Outstanding Paper Award on Empirical Reinforcement Learning Research, RLC 2025
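Component (i) of the actor described in the abstract above, generating several action proposals and keeping the one with the highest Q-value, can be illustrated on a toy one-dimensional problem. The sketch is not the SAVO architecture: the Q-function, the finite-difference gradient ascent, and the starting points are assumptions, and component (ii), the truncated surrogate Q-functions, is not modeled.

```python
import math


def q_value(state, action):
    """Toy Q-function: a poor local optimum near a=-1, a better one near a=2."""
    return math.exp(-(action + 1.0) ** 2) + 2.0 * math.exp(-(action - 2.0) ** 2)


def gradient_ascent(state, action, lr=0.1, steps=200, eps=1e-4):
    """Climb Q from one starting action using a central-difference gradient."""
    for _ in range(steps):
        grad = (q_value(state, action + eps) - q_value(state, action - eps)) / (2 * eps)
        action += lr * grad
    return action


def select_best(state, proposals):
    """Keep the proposal with the highest Q-value."""
    return max(proposals, key=lambda a: q_value(state, a))


if __name__ == "__main__":
    single = gradient_ascent(None, -1.5)  # gets stuck near a = -1
    proposals = [gradient_ascent(None, a0) for a0 in (-1.5, 0.5, 3.0)]
    best = select_best(None, proposals)
    print(single, q_value(None, single), best, q_value(None, best))
```

A single gradient-ascent run from a poor initialization settles in the lower peak, while selecting among several proposals recovers the higher one.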
AirScape: An Aerial Generative World Model with Motion Controllability
How to enable agents to predict the outcomes of their own motion intentions in three-dimensional space has been a fundamental problem in embodied intelligence. To explore general spatial imagination capability, we present AirScape, the first world model designed for six-degree-of-freedom aerial agents. AirScape predicts future observation sequences based on current visual inputs and motion intentions. Specifically, we construct a dataset for aerial world model training and testing, which consists of 11k video-intention pairs. This dataset includes first-person-view videos capturing diverse drone actions across a wide range of scenarios, with over 1,000 hours spent annotating the corresponding motion intentions. Then we develop a two-phase schedule to train a foundation model--initially devoid of embodied spatial knowledge--into a world model that is controllable by motion intentions and adheres to physical spatio-temporal constraints. Experimental results demonstrate that AirScape significantly outperforms existing foundation models in 3D spatial imagination capabilities, especially with over a 50% improvement in metrics reflecting motion alignment. The project is available at: https://embodiedcity.github.io/AirScape/.
Maximizing UAV Cellular Connectivity with Reinforcement Learning for BVLoS Path Planning
This paper presents a reinforcement learning (RL) based approach for path planning of cellular connected unmanned aerial vehicles (UAVs) operating beyond visual line of sight (BVLoS). The objective is to minimize travel distance while maximizing the quality of cellular link connectivity by considering real world aerial coverage constraints and employing an empirical aerial channel model. The proposed solution employs RL techniques to train an agent, using the quality of communication links between the UAV and base stations (BSs) as the reward function. Simulation results demonstrate the effectiveness of the proposed method in training the agent and generating feasible UAV path plans. The proposed approach addresses the challenges due to limitations in UAV cellular communications, highlighting the need for investigations and considerations in this area. The RL algorithm efficiently identifies optimal paths, ensuring maximum connectivity with ground BSs to ensure safe and reliable BVLoS flight operation. Moreover, the solution can be deployed as an offline path planning module that can be integrated into future ground control systems (GCS) for UAV operations, enhancing their capabilities and safety. The method holds potential for complex long range UAV applications, advancing the technology in the field of cellular connected UAV path planning.
comment: Submitted to an IEEE Conference
DPL: Depth-only Perceptive Humanoid Locomotion via Realistic Depth Synthesis and Cross-Attention Terrain Reconstruction
Recent advancements in legged robot perceptive locomotion have shown promising progress. However, terrain-aware humanoid locomotion remains largely constrained to two paradigms: depth image-based end-to-end learning and elevation map-based methods. The former suffers from limited training efficiency and a significant sim-to-real gap in depth perception, while the latter depends heavily on multiple vision sensors and localization systems, resulting in latency and reduced robustness. To overcome these challenges, we propose a novel framework that tightly integrates three key components: (1) Terrain-Aware Locomotion Policy with a Blind Backbone, which leverages pre-trained elevation map-based perception to guide reinforcement learning with minimal visual input; (2) Multi-Modality Cross-Attention Transformer, which reconstructs structured terrain representations from noisy depth images; (3) Realistic Depth Images Synthetic Method, which employs self-occlusion-aware ray casting and noise-aware modeling to synthesize realistic depth observations, achieving over 30% reduction in terrain reconstruction error. This combination enables efficient policy training with limited data and hardware resources, while preserving critical terrain features essential for generalization. We validate our framework on a full-sized humanoid robot, demonstrating agile and adaptive locomotion across diverse and challenging terrains.
Automating eHMI Action Design with LLMs for Automated Vehicle Communication EMNLP 2025
The absence of explicit communication channels between automated vehicles (AVs) and other road users requires the use of external Human-Machine Interfaces (eHMIs) to convey messages effectively in uncertain scenarios. Currently, most eHMI studies employ predefined text messages and manually designed actions to perform these messages, which limits the real-world deployment of eHMIs, where adaptability in dynamic scenarios is essential. Given the generalizability and versatility of large language models (LLMs), they could potentially serve as automated action designers for the message-action design task. To validate this idea, we make three contributions: (1) We propose a pipeline that integrates LLMs and 3D renderers, using LLMs as action designers to generate executable actions for controlling eHMIs and rendering action clips. (2) We collect a user-rated Action-Design Scoring dataset comprising a total of 320 action sequences for eight intended messages and four representative eHMI modalities. The dataset validates that LLMs can translate intended messages into actions close to a human level, particularly for reasoning-enabled LLMs. (3) We introduce two automated raters, Action Reference Score (ARS) and Vision-Language Models (VLMs), to benchmark 18 LLMs, finding that the VLM aligns with human preferences yet varies across eHMI modalities.
comment: Accepted as findings for EMNLP 2025
SHeRLoc: Synchronized Heterogeneous Radar Place Recognition for Cross-Modal Localization
Despite the growing adoption of radar in robotics, the majority of research has been confined to homogeneous sensor types, overlooking the integration and cross-modality challenges inherent in heterogeneous radar technologies. This leads to significant difficulties in generalizing across diverse radar data types, with modality-aware approaches that could leverage the complementary strengths of heterogeneous radar remaining unexplored. To bridge these gaps, we propose SHeRLoc, the first deep network tailored for heterogeneous radar, which utilizes RCS polar matching to align multimodal radar data. Our hierarchical optimal transport-based feature aggregation method generates rotationally robust multi-scale descriptors. By employing FFT-similarity-based data mining and adaptive margin-based triplet loss, SHeRLoc enables FOV-aware metric learning. SHeRLoc achieves an order of magnitude improvement in heterogeneous radar place recognition, increasing recall@1 from below 0.1 to 0.9 on a public dataset and outperforming state-of-the-art methods. Also applicable to LiDAR, SHeRLoc paves the way for cross-modal place recognition and heterogeneous sensor SLAM. The supplementary materials and source code are available at https://sites.google.com/view/radar-sherloc.
comment: 9 pages, 9 figures, accepted to RA-L
USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots
Underwater environments present unique challenges for robotic operation, including complex hydrodynamics, limited visibility, and constrained communication. Although data-driven approaches have advanced embodied intelligence in terrestrial robots and enabled task-specific autonomous underwater robots, developing underwater intelligence capable of autonomously performing multiple tasks remains highly challenging, as large-scale, high-quality underwater datasets are still scarce. To address these limitations, we introduce USIM, a simulation-based multi-task Vision-Language-Action (VLA) dataset for underwater robots. USIM comprises over 561K frames from 1,852 trajectories, totaling approximately 15.6 hours of BlueROV2 interactions across 20 tasks in 9 diverse scenarios, ranging from visual navigation to mobile manipulation. Building upon this dataset, we propose U0, a VLA model for general underwater robots, which integrates binocular vision and other sensor modalities through multimodal fusion, and further incorporates a convolution-attention-based perception focus enhancement module (CAP) to improve spatial understanding and mobile manipulation. Across tasks such as inspection, obstacle avoidance, scanning, and dynamic tracking, the framework achieves a success rate of 80%, while in challenging mobile manipulation tasks, it reduces the distance to the target by 21.2% compared with baseline methods, demonstrating its effectiveness. USIM and U0 show that VLA models can be effectively applied to underwater robotic applications, providing a foundation for scalable dataset construction, improved task autonomy, and the practical realization of intelligent general underwater robots.
comment: Project Page: https://vincentgu2000.github.io/u0project/
An Imitative Reinforcement Learning Framework for Pursuit-Lock-Launch Missions
Unmanned Combat Aerial Vehicle (UCAV) Within-Visual-Range (WVR) engagement, referring to a fight between two or more UCAVs at close quarters, plays a decisive role on the aerial battlefield. With the development of artificial intelligence, WVR engagement progressively advances towards intelligent and autonomous modes. However, autonomous WVR engagement policy learning is hindered by challenges such as weak exploration capabilities, low learning efficiency, and unrealistic simulated environments. To overcome these challenges, we propose a novel imitative reinforcement learning framework, which efficiently leverages expert data while enabling autonomous exploration. The proposed framework not only enhances learning efficiency through expert imitation, but also ensures adaptability to dynamic environments via autonomous exploration with reinforcement learning. Therefore, the proposed framework can learn a successful "pursuit-lock-launch" policy for UCAVs. To support data-driven learning, we establish an environment based on the Harfang3D sandbox. Extensive experimental results indicate that the proposed framework excels in this multistage task, and significantly outperforms state-of-the-art reinforcement learning and imitation learning methods. Thanks to its ability to imitate experts and explore autonomously, our framework can quickly learn the critical knowledge in complex aerial combat tasks, achieving up to a 100% success rate and demonstrating excellent robustness.
Event-RGB Fusion for Spacecraft Pose Estimation Under Harsh Lighting
Spacecraft pose estimation is crucial for autonomous in-space operations, such as rendezvous, docking and on-orbit servicing. Vision-based pose estimation methods, which typically employ RGB imaging sensors, are a compelling solution for spacecraft pose estimation, but are challenged by harsh lighting conditions, which produce imaging artifacts such as glare, over-exposure, blooming and lens flare. Due to their much higher dynamic range, neuromorphic or event sensors are more resilient to extreme lighting conditions. However, event sensors generally have lower spatial resolution and suffer from reduced signal-to-noise ratio during periods of low relative motion. This work addresses these individual sensor limitations by introducing a sensor fusion approach combining RGB and event sensors. A beam-splitter prism was employed to achieve precise optical and temporal alignment. Then, a RANSAC-based technique was developed to fuse the information from the RGB and event channels to achieve pose estimation that leveraged the strengths of the two modalities. The pipeline was complemented by dropout uncertainty estimation to detect extreme conditions that affect either channel. To benchmark the performance of the proposed event-RGB fusion method, we collected a comprehensive real dataset of RGB and event data for satellite pose estimation in a laboratory setting under a variety of challenging illumination conditions. Encouraging results on the dataset demonstrate the efficacy of our event-RGB fusion approach and further support the usage of event sensors for spacecraft pose estimation. To support community research on this topic, our dataset has been released publicly.
Learning a Shape-adaptive Assist-as-needed Rehabilitation Policy from Therapist-informed Input
Therapist-in-the-loop robotic rehabilitation has shown great promise in enhancing rehabilitation outcomes by integrating the strengths of therapists and robotic systems. However, its broader adoption remains limited due to insufficient safe interaction and limited adaptation capability. This article proposes a novel telerobotics-mediated framework that enables therapists to intuitively and safely deliver assist-as-needed (AAN) therapy based on two primary contributions. First, our framework encodes the therapist-informed corrective force into via-points in a latent space, allowing the therapist to provide only minimal assistance while encouraging the patient to maintain their own motion preferences. Second, a shape-adaptive AAN rehabilitation policy is learned to partially and progressively deform the reference trajectory for movement therapy based on encoded patient motion preferences and therapist-informed via-points. The effectiveness of the proposed shape-adaptive AAN strategy was validated on a telerobotic rehabilitation system using two representative tasks. The results demonstrate its practicality for remote AAN therapy and its superiority over two state-of-the-art methods in reducing corrective force and improving movement smoothness.
NAMOUnc: Navigation Among Movable Obstacles with Decision Making on Uncertainty Interval
Navigation among movable obstacles (NAMO) is a critical task in robotics, often challenged by real-world uncertainties such as observation noise, model approximations, action failures, and partial observability. Existing solutions frequently assume ideal conditions, leading to suboptimal or risky decisions. This paper introduces NAMOUnc, a novel framework designed to address these uncertainties by integrating them into the decision-making process. We first estimate these uncertainties and then compare the corresponding time cost intervals for removing and bypassing obstacles. This optimizes both the success rate and time efficiency, ensuring safer and more efficient navigation. We validate our method through extensive simulations and real-world experiments, demonstrating significant improvements over existing NAMO frameworks. More details can be found on our website: https://kai-zhang-er.github.io/namo-uncertainty/
comment: 11 pages, ICINCO2025
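The comparison of time cost intervals for removing versus bypassing an obstacle, mentioned in the abstract above, could look roughly like the sketch below. The decision rule is an assumption made for illustration (interval dominance first, then a risk-weighted point inside each interval); the paper's actual criterion, the interval values, and the `risk_weight` parameter are not taken from the source.

```python
def choose_action(remove_interval, bypass_interval, risk_weight=0.5):
    """Decide between 'remove' and 'bypass' from (low, high) time-cost intervals.

    risk_weight = 0 compares best cases, risk_weight = 1 compares worst cases.
    """
    r_lo, r_hi = remove_interval
    b_lo, b_hi = bypass_interval
    # Clear dominance: one option's worst case beats the other's best case.
    if r_hi <= b_lo:
        return "remove"
    if b_hi <= r_lo:
        return "bypass"
    # Overlapping intervals: compare a risk-weighted point in each interval.
    r_cost = r_lo + risk_weight * (r_hi - r_lo)
    b_cost = b_lo + risk_weight * (b_hi - b_lo)
    return "remove" if r_cost < b_cost else "bypass"


if __name__ == "__main__":
    print(choose_action((5.0, 8.0), (10.0, 20.0)))                     # dominance
    print(choose_action((4.0, 30.0), (10.0, 12.0), risk_weight=1.0))   # cautious
```

A wide interval for pushing an obstacle reflects the chance that the action fails, so a cautious setting of the weight favors the more predictable detour even when pushing could be faster in the best case.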
MLLM-Fabric: Multimodal Large Language Model-Driven Robotic Framework for Fabric Sorting and Selection
Choosing appropriate fabrics is critical for meeting functional and quality demands in robotic textile manufacturing, apparel production, and smart retail. We propose MLLM-Fabric, a robotic framework leveraging multimodal large language models (MLLMs) for fabric sorting and selection. Built on a multimodal robotic platform, the system is trained through supervised fine-tuning and explanation-guided distillation to rank fabric properties. We also release a dataset of 220 diverse fabrics, each with RGB images and synchronized visuotactile and pressure data. Experiments show that our Fabric-Llama-90B consistently outperforms pretrained vision-language baselines in both attribute ranking and selection reliability. Code and dataset are publicly available at https://github.com/limanwang/MLLM-Fabric.
comment: Accepted to IEEE Robotics and Automation Letters (RAL)
Motion Tracks: A Unified Representation for Human-Robot Transfer in Few-Shot Imitation Learning
Teaching robots to autonomously complete everyday tasks remains a challenge. Imitation Learning (IL) is a powerful approach that imbues robots with skills via demonstrations, but is limited by the labor-intensive process of collecting teleoperated robot data. Human videos offer a scalable alternative, but it remains difficult to directly train IL policies from them due to the lack of robot action labels. To address this, we propose to represent actions as short-horizon 2D trajectories on an image. These actions, or motion tracks, capture the predicted direction of motion for either human hands or robot end-effectors. We instantiate an IL policy called Motion Track Policy (MT-pi) which receives image observations and outputs motion tracks as actions. By leveraging this unified, cross-embodiment action space, MT-pi completes tasks with high success given just minutes of human video and limited additional robot demonstrations. At test time, we predict motion tracks from two camera views, recovering 6DoF trajectories via multi-view synthesis. MT-pi achieves an average success rate of 86.5% across 4 real-world tasks, outperforming state-of-the-art IL baselines which do not leverage human data or our action space by 40%, and generalizes to scenarios seen only in human videos. Code and videos are available on our website https://portal-cornell.github.io/motion_track_policy/.
Effect of Performance Feedback Timing on Motor Learning for a Surgical Training Task
Objective: Robot-assisted minimally invasive surgery (RMIS) has become the gold standard for a variety of surgical procedures, but the optimal method of training surgeons for RMIS is unknown. We hypothesized that real-time, rather than post-task, error feedback would better increase learning speed and reduce errors. Methods: Forty-two surgical novices learned a virtual version of the ring-on-wire task, a canonical task in RMIS training. We investigated the impact of feedback timing with multi-sensory (haptic and visual) cues in three groups: (1) real-time error feedback, (2) trial replay with error feedback, and (3) no error feedback. Results: Participant performance was evaluated based on the accuracy of ring position and orientation during the task. Participants who received real-time feedback outperformed other groups in ring orientation. Additionally, participants who received feedback in replay outperformed participants who did not receive any error feedback on ring orientation during long, straight path sections. There were no significant differences between groups for ring position overall, but participants who received real-time feedback outperformed the other groups in positional accuracy on tightly curved path sections. Conclusion: The addition of real-time haptic and visual error feedback improves learning outcomes in a virtual surgical task over error feedback in replay or no error feedback at all. Significance: This work demonstrates that multi-sensory error feedback delivered in real time leads to better training outcomes as compared to the same feedback delivered after task completion. This novel method of training may enable surgical trainees to develop skills with greater speed and accuracy.
comment: Submitted to IEEE Transactions on Biomedical Engineering
Adaptive Dynamics Planning for Robot Navigation
Autonomous robot navigation systems often rely on hierarchical planning, where global planners compute collision-free paths without considering dynamics, and local planners enforce dynamics constraints to produce executable commands. This discontinuity in dynamics often leads to trajectory tracking failure in highly constrained environments. Recent approaches integrate dynamics within the entire planning process by gradually decreasing its fidelity, e.g., increasing integration steps and reducing collision checking resolution, for real-time planning efficiency. However, they assume that the fidelity of the dynamics should decrease according to a manually designed scheme. Such static settings fail to adapt to environmental complexity variations, resulting in computational overhead in simple environments or insufficient dynamics consideration in obstacle-rich scenarios. To overcome this limitation, we propose Adaptive Dynamics Planning (ADP), a learning-augmented paradigm that uses reinforcement learning to dynamically adjust robot dynamics properties, enabling planners to adapt across diverse environments. We integrate ADP into three different planners and further design a standalone ADP-based navigation system, benchmarking them against other baselines. Experiments in both simulation and real-world tests show that ADP consistently improves navigation success, safety, and efficiency.
comment: 8 pages, 4 figures
Adaptive Stress Testing Black-Box LLM Planners
Large language models (LLMs) have recently demonstrated success in generalizing across decision-making tasks including planning, control, and prediction, but their tendency to hallucinate unsafe and undesired outputs poses risks. We argue that detecting such failures is necessary, especially in safety-critical scenarios. Existing methods for black-box models often detect hallucinations by identifying inconsistencies across multiple samples. Many of these approaches typically introduce prompt perturbations like randomizing detail order or generating adversarial inputs, with the intuition that a confident model should produce stable outputs. We first perform a manual case study showing that other forms of perturbations (e.g., adding noise, removing sensor details) cause LLMs to hallucinate in a multi-agent driving environment. We then propose a novel method for efficiently searching the space of prompt perturbations using adaptive stress testing (AST) with Monte-Carlo tree search (MCTS). Our AST formulation enables discovery of scenarios and prompts that cause language models to act with high uncertainty or even crash. By generating MCTS prompt perturbation trees across diverse scenarios, we show through extensive experiments that offline analyses can be used at runtime to automatically generate prompts that influence model uncertainty, and to inform real-time trust assessments of an LLM. We further characterize LLMs deployed as planners in a single-agent lunar lander environment and in a multi-agent robot crowd navigation simulation. Overall, ours is one of the first hallucination intervention algorithms to pave a path towards rigorous characterization of black-box LLM planners.
comment: 25 pages, 24 figures, 5 tables
Multiagent Systems
Scalable Multi-Agent Path Finding using Collision-Aware Dynamic Alert Mask and a Hybrid Execution Strategy
Multi-agent pathfinding (MAPF) remains a critical problem in robotics and autonomous systems, where agents must navigate shared spaces efficiently while avoiding conflicts. Traditional centralized algorithms that have global information, such as Conflict-Based Search (CBS), provide high-quality solutions but become computationally expensive in large-scale scenarios due to the combinatorial explosion of conflicts that need resolution. Conversely, distributed approaches that have local information, particularly learning-based methods, offer better scalability by operating with relaxed information availability, yet often at the cost of solution quality. To address these limitations, we propose a hybrid framework that combines decentralized path planning with a lightweight centralized coordinator. Our framework leverages reinforcement learning (RL) for decentralized planning, enabling agents to adapt their planning based on minimal, targeted alerts--such as static conflict-cell flags or brief conflict tracks--that are dynamically shared by the central coordinator for effective conflict resolution. We empirically study the effect of the information available to an agent on its planning performance. Our approach reduces the inter-agent information sharing compared to fully centralized and distributed methods, while still consistently finding feasible, collision-free solutions--even in large-scale scenarios with higher agent counts.
Identifying & Interactively Refining Ambiguous User Goals for Data Visualization Code Generation
Establishing shared goals is a fundamental step in human-AI communication. However, ambiguities can lead to outputs that seem correct but fail to reflect the speaker's intent. In this paper, we explore this issue with a focus on the data visualization domain, where ambiguities in natural language impact the generation of code that visualizes data. The availability of multiple views on the contextual (e.g., the intended plot and the code rendering the plot) allows for a unique and comprehensive analysis of diverse ambiguity types. We develop a taxonomy of types of ambiguity that arise in this task and propose metrics to quantify them. Using Matplotlib problems from the DS-1000 dataset, we demonstrate that our ambiguity metrics better correlate with human annotations than uncertainty baselines. Our work also explores how multi-turn dialogue can reduce ambiguity and thereby improve code accuracy by better matching user goals. We evaluate three pragmatic models to inform our dialogue strategies: Gricean Cooperativity, Discourse Representation Theory, and Questions under Discussion. A simulated user study reveals how pragmatic dialogues reduce ambiguity and enhance code accuracy, highlighting the value of multi-turn exchanges in code generation.
Decentralized Multi-Robot Relative Navigation in Unknown, Structurally Constrained Environments under Limited Communication
Multi-robot navigation in unknown, structurally constrained, and GPS-denied environments presents a fundamental trade-off between global strategic foresight and local tactical agility, particularly under limited communication. Centralized methods achieve global optimality but suffer from high communication overhead, while distributed methods are efficient but lack the broader awareness to avoid deadlocks and topological traps. To address this, we propose a fully decentralized, hierarchical relative navigation framework that achieves both strategic foresight and tactical agility without a unified coordinate system. At the strategic layer, robots build and exchange lightweight topological maps upon opportunistic encounters. This process fosters an emergent global awareness, enabling the planning of efficient, trap-avoiding routes at an abstract level. This high-level plan then inspires the tactical layer, which operates on local metric information. Here, a sampling-based escape point strategy resolves dense spatio-temporal conflicts by generating dynamically feasible trajectories in real time, concurrently satisfying tight environmental and kinodynamic constraints. Extensive simulations and real-world experiments demonstrate that our system significantly outperforms in success rate and efficiency, especially in communication-limited environments with complex topological structures.
GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare
Large Language Models (LLMs) have achieved remarkable progress in reasoning, yet sometimes produce responses that are suboptimal for users in tasks such as writing, information seeking, or providing practical guidance. Conventional alignment practices typically assume that maximizing model reward also maximizes user welfare, but this assumption frequently fails in practice: models may over-clarify or generate overly verbose reasoning when users prefer concise answers. Such behaviors resemble the prisoner's dilemma, where individually rational choices lead to socially suboptimal outcomes. The fundamental challenge is the lack of a principled decision-making mechanism that mutually benefits both the LLM and the user. We propose Game-Theoretic Alignment (GTAlign), an alignment framework that integrates game-theoretic decision making into both reasoning and training. During reasoning, the model explicitly treats user-LLM interaction as a strategic game: it constructs payoff matrices within its reasoning chain to estimate welfare for both itself and the user, and then selects actions that are mutually beneficial. During training, we introduce a mutual welfare reward that reinforces cooperative responses, aligning model behavior with socially efficient outcomes. In addition, we introduce an inference technique that leverages game-theoretic reasoning to dynamically adapt the LLM's response when the pricing policies of the LLM service change. Extensive experiments demonstrate that GTAlign substantially improves reasoning efficiency, answer quality, and mutual welfare compared to baselines across diverse tasks. The code is available at https://github.com/ulab-uiuc/GTAlign.
comment: 31 pages, 6 figures
CLARITY: Clinical Assistant for Routing, Inference, and Triage EMNLP 2025
We present CLARITY (Clinical Assistant for Routing, Inference and Triage), an AI-driven platform designed to facilitate patient-to-specialist routing, clinical consultations, and severity assessment of patient conditions. Its hybrid architecture combines a Finite State Machine (FSM) for structured dialogue flows with collaborative agents that employ Large Language Models (LLMs) to analyze symptoms and prioritize referrals to appropriate specialists. Built on a modular microservices framework, CLARITY ensures safe, efficient, and robust performance, and is flexible and readily scalable to meet the demands of existing workflows and IT solutions in healthcare. We report integration of our clinical assistant into a large-scale national interhospital platform, with more than 55,000 content-rich user dialogues completed within the two months of deployment, 2,500 of which were expert-annotated for subsequent validation. The validation results show that CLARITY surpasses human-level performance in terms of the first-attempt routing precision, naturally requiring up to 3 times shorter duration of the consultation than with a human.
comment: Accepted to EMNLP 2025 (Industrial Track)
Anemoi: A Semi-Centralized Multi-agent System Based on Agent-to-Agent Communication MCP server from Coral Protocol
Recent advances in generalist multi-agent systems (MAS) have largely followed a context-engineering plus centralized paradigm, where a planner agent coordinates multiple worker agents through unidirectional prompt passing. While effective under strong planner models, this design suffers from two critical limitations: (1) strong dependency on the planner's capability, which leads to degraded performance when a smaller LLM powers the planner; and (2) limited inter-agent communication, where collaboration relies on prompt concatenation rather than genuine refinement through structured discussions. To address these challenges, we propose Anemoi, a semi-centralized MAS built on the Agent-to-Agent (A2A) communication MCP server from Coral Protocol. Unlike traditional designs, Anemoi enables structured and direct inter-agent collaboration, allowing all agents to monitor progress, assess results, identify bottlenecks, and propose refinements in real time. This paradigm reduces reliance on a single planner, supports adaptive plan updates, and minimizes redundant context passing, resulting in more scalable execution. Evaluated on the GAIA benchmark, Anemoi achieved 52.73% accuracy with a small LLM (GPT-4.1-mini) as the planner, surpassing the strongest open-source baseline OWL (43.63%) by +9.09% under identical LLM settings. Our implementation is publicly available at https://github.com/Coral-Protocol/Anemoi.
Aegis: Automated Error Generation and Attribution for Multi-Agent Systems
Large language model based multi-agent systems (MAS) have unlocked significant advancements in tackling complex problems, but their increasing capability introduces a structural fragility that makes them difficult to debug. A key obstacle to improving their reliability is the severe scarcity of large-scale, diverse datasets for error attribution, as existing resources rely on costly and unscalable manual annotation. To address this bottleneck, we introduce Aegis, a novel framework for Automated error generation and attribution for multi-agent systems. Aegis constructs a large dataset of 9,533 trajectories with annotated faulty agents and error modes, covering diverse MAS architectures and task domains. This is achieved using an LLM-based manipulator that can adaptively inject context-aware errors into successful execution trajectories. Leveraging fine-grained labels and the structured arrangement of positive-negative sample pairs, Aegis supports three different learning paradigms: Supervised Fine-Tuning, Reinforcement Learning, and Contrastive Learning. We develop learning methods for each paradigm. Comprehensive experiments show that trained models consistently achieve substantial improvements in error attribution. Notably, several of our fine-tuned LLMs demonstrate performance competitive with or superior to proprietary models an order of magnitude larger, validating our automated data generation framework as a crucial resource for developing more robust and interpretable multi-agent systems. Our project website is available at https://kfq20.github.io/Aegis-Website/.
Reimagining Agent-based Modeling with Large Language Model Agents via Shachi
The study of emergent behaviors in large language model (LLM)-driven multi-agent systems is a critical research challenge, yet progress is limited by a lack of principled methodologies for controlled experimentation. To address this, we introduce Shachi, a formal methodology and modular framework that decomposes an agent's policy into core cognitive components: Configuration for intrinsic traits, Memory for contextual persistence, and Tools for expanded capabilities, all orchestrated by an LLM reasoning engine. This principled architecture moves beyond brittle, ad-hoc agent designs and enables the systematic analysis of how specific architectural choices influence collective behavior. We validate our methodology on a comprehensive 10-task benchmark and demonstrate its power through novel scientific inquiries. Critically, we establish the external validity of our approach by modeling a real-world U.S. tariff shock, showing that agent behaviors align with observed market reactions only when their cognitive architecture is appropriately configured with memory and tools. Our work provides a rigorous, open-source foundation for building and evaluating LLM agents, aimed at fostering more cumulative and scientifically grounded research.
DDO: Dual-Decision Optimization for LLM-Based Medical Consultation via Multi-Agent Collaboration EMNLP 2025
Large Language Models (LLMs) demonstrate strong generalization and reasoning abilities, making them well-suited for complex decision-making tasks such as medical consultation (MC). However, existing LLM-based methods often fail to capture the dual nature of MC, which entails two distinct sub-tasks: symptom inquiry, a sequential decision-making process, and disease diagnosis, a classification problem. This mismatch often results in ineffective symptom inquiry and unreliable disease diagnosis. To address this, we propose DDO, a novel LLM-based framework that performs Dual-Decision Optimization by decoupling the two sub-tasks and optimizing them with distinct objectives through a collaborative multi-agent workflow. Experiments on three real-world MC datasets show that DDO consistently outperforms existing LLM-based approaches and achieves competitive performance with state-of-the-art generation-based methods, demonstrating its effectiveness in the MC task. The code is available at https://github.com/zh-jia/DDO.
comment: Accepted to EMNLP 2025
What Do Agents Think One Another Want? Level-2 Inverse Games for Inferring Agents' Estimates of Others' Objectives
Effectively interpreting strategic interactions among multiple agents requires us to infer each agent's objective from limited information. Existing inverse game-theoretic approaches frame this challenge in terms of a "level-1" inference problem, in which we take the perspective of a third-party observer and assume that individual agents share complete knowledge of one another's objectives. However, this assumption breaks down in decentralized, real-world scenarios like urban driving and bargaining, in which agents may act based on conflicting views of one another's objectives. We demonstrate the necessity of inferring agents' different estimates of each other's objectives through empirical examples, and by theoretically characterizing the prediction error of level-1 inference on fictitious gameplay data from linear-quadratic games. To address this fundamental issue, we propose a framework for level-2 inference to address the question: "What does each agent believe about other agents' objectives?" We prove that the level-2 inference problem is non-convex even in benign settings like linear-quadratic games, and we develop an efficient gradient-based approach for identifying local solutions. Experiments on a synthetic urban driving example show that our approach uncovers nuanced misalignments that level-1 methods miss.
comment: 8 pages + references + appendix + supplement
LegalWiz: A Multi-Agent Generation Framework for Contradiction Detection in Legal Documents
Retrieval-Augmented Generation (RAG) integrates large language models (LLMs) with external sources, but unresolved contradictions in retrieved evidence often lead to hallucinations and legally unsound outputs. Benchmarks currently used for contradiction detection lack domain realism, cover only limited conflict types, and rarely extend beyond single-sentence pairs, making them unsuitable for legal applications. Controlled generation of documents with embedded contradictions is therefore essential: it enables systematic stress-testing of models, ensures coverage of diverse conflict categories, and provides a reliable basis for evaluating contradiction detection and resolution. We present a multi-agent contradiction-aware benchmark framework for the legal domain that generates synthetic legal-style documents, injects six structured contradiction types, and models both self- and pairwise inconsistencies. Automated contradiction mining is combined with human-in-the-loop validation to guarantee plausibility and fidelity. This benchmark offers one of the first structured resources for contradiction-aware evaluation in legal RAG pipelines, supporting more consistent, interpretable, and trustworthy systems.
Co-Alignment: Rethinking Alignment as Bidirectional Human-AI Cognitive Adaptation
Current AI alignment through RLHF follows a single-directional paradigm in which AI conforms to human preferences while human cognition is treated as fixed. We propose a shift to co-alignment through Bidirectional Cognitive Alignment (BiCA), where humans and AI mutually adapt. BiCA uses learnable protocols, representation mapping, and KL-budget constraints for controlled co-evolution. In collaborative navigation, BiCA achieved 85.5% success versus 70.3% baseline, with 230% better mutual adaptation and 332% better protocol convergence. Emergent protocols outperformed handcrafted ones by 84%, while bidirectional adaptation unexpectedly improved safety (+23% out-of-distribution robustness). The 46% synergy improvement demonstrates that optimal collaboration exists at the intersection, not union, of human and AI capabilities, validating the shift from single-directional to co-alignment paradigms.
Systems and Control (CS)
Robust reset control design for piezo-actuated nano-positioner in presence of hysteresis nonlinearity
In this paper, a robust nonlinear control scheme is designed for the motion control of a class of piezo-actuated nano-positioning systems using frequency-domain analysis. Hysteresis, the nonlinearity in the piezoelectric material, degrades the precision in tracking references with high-frequency content and different travel ranges. Hysteresis compensation by the inverse model, the state-of-the-art solution, is not reliable alone. Therefore, a control framework with robustness against the remaining nonlinearity is needed. It is shown that there is an unavoidable limitation in robust linear control design to improve the performance. A robust control methodology based on a complex-order element is established to relax the limitation. Then, a constant-in-gain-lead-in-phase (CgLp) reset controller is utilized to realize the complex-order control. The control design is based on the sinusoidal input describing function (SIDF) and the higher-order SIDF (HOSIDF) tools. A constrained optimization problem is provided to tune the control parameters. The improvements achieved by the CgLp control are validated in simulation.
Demystifying and Navigating AI Ethics in Power Electronics
Artificial intelligence (AI) is rapidly transforming power electronics, with AI-related publications in IEEE Power Electronics Society selected journals increasing more than fourfold from 2020 to 2025. However, the ethical dimensions of this transformation have received limited attention. This article underscores the urgent need for an ethical framework to guide responsible AI integration in power electronics, not only to prevent AI-related incidents but also to comply with legal and regulatory responsibilities. In this context, this article identifies four core pillars of AI ethics in power electronics: Security & Safety, Explainability & Transparency, Energy Sustainability, and Evolving Roles of Engineers. Each pillar is supported by practical and actionable insights to ensure that ethical principles are embedded in algorithm design, system deployment, and workforce development. The authors advocate for power electronics engineers to lead the ethical discourse, given their deep technical understanding of both AI systems and power conversion technologies. The paper concludes by calling on the IEEE Power Electronics Society to spearhead the establishment of ethical standards and best practices that ensure AI innovations are not only technically advanced but also trustworthy, safe, and sustainable.
Critical States Identification in Power System via Lattice Partition and Its Application in Reliability Assessment
With the increasing complexity of power systems, accurately identifying critical states (the states corresponding to minimal cut sets) and assessing system reliability have become crucial tasks. In this paper, a mathematical lattice structure is employed to represent and partition the state space of the power system. Based on this structure, a novel recursive method is proposed to efficiently identify critical states by leveraging lattice partitioning and Optimal Power Flow (OPF) calculations. This method not only enables the extension of failure system states, but also calculates the upper and lower bounds of the Loss of Load Probability (LOLP) in a progressively converging manner. Compared to traditional reliability assessment methods such as State Enumeration (SE) and Monte Carlo Simulation (MCS), this approach offers greater accuracy and efficiency. Experiments conducted on the RBTS and RTS79 systems demonstrate that the proposed method accurately identifies all critical states up to a preset order, which are high-risk states. The contribution of these critical states to LOLP highlights their significance in the system. Moreover, the proposed method achieves the analytical value with significantly fewer OPF calculations in the RBTS system, reaching acceptable precision of LOLP up to 100 times faster than SE in both the RBTS and RTS systems.
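To make the idea of progressively converging LOLP bounds concrete, here is a minimal sketch using plain state enumeration up to a preset outage order. It is not the lattice-partition method of the paper. The names `lolp_bounds` and `is_failure` are hypothetical, `is_failure` stands in for an OPF-based loss-of-load check, and component outages are assumed independent. The lower bound is the probability mass of states already classified as failures, and the upper bound is one minus the mass of states already classified as successes.

```python
from itertools import product


def lolp_bounds(unavailability, is_failure, max_order):
    """Return (lower, upper) LOLP bounds from states with <= max_order outages.

    unavailability: per-component outage probabilities (assumed independent).
    is_failure: hypothetical callable, state tuple -> True if load is lost.
    """
    p_fail = 0.0  # mass of enumerated loss-of-load states
    p_ok = 0.0    # mass of enumerated success states
    for state in product((0, 1), repeat=len(unavailability)):  # 1 = outage
        if sum(state) > max_order:
            continue  # unclassified state: contributes only to the gap
        prob = 1.0
        for out, q in zip(state, unavailability):
            prob *= q if out else (1.0 - q)
        if is_failure(state):
            p_fail += prob
        else:
            p_ok += prob
    return p_fail, 1.0 - p_ok


# Toy system: three components, load is lost when two or more are out.
q = [0.1, 0.1, 0.1]
bounds_by_order = [lolp_bounds(q, lambda s: sum(s) >= 2, k) for k in range(4)]
```

In this toy case the gap between the bounds shrinks as the enumeration order grows and closes once all states are classified.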
Grid-forming Control of Converter Infinite Bus System: Modeling by Data-driven Methods
This study explores data-driven modeling techniques to capture the dynamics of a grid-forming converter-based infinite bus system, critical for renewable-integrated power grids. Using sparse identification of nonlinear dynamics and deep symbolic regression, models were generated from synthetic data simulating key disturbances in active power, reactive power, and voltage references. Deep symbolic regression demonstrated more accuracy in capturing complex system dynamics, though it required substantially more computational time than sparse identification of nonlinear dynamics. These findings suggest that while deep symbolic regression offers high fidelity, sparse identification of nonlinear dynamics provides a more computationally efficient approach, balancing accuracy and runtime for real-time grid applications.
3C Resources Joint Allocation for Time-Deterministic Remote Sensing Image Backhaul in the Space-Ground Integrated Network
Using low-Earth-orbit (LEO) satellites to assist observation satellites (OSs) in compressing and backhauling more time-determined images (TDI) has become a new paradigm for mitigating the timeouts caused by the limited computing resources of OSs. However, capturing the time-varying and dynamic characteristics of multi-dimensional resources is challenging for efficient collaborative scheduling. Motivated by this, we design a highly succinct multi-dimensional resource time-expanded graph (MDR-TEG) model. Specifically, by employing a slot division mechanism and introducing an external virtual node, the time-varying communication, caching, and computing (3C) resources are depicted with low complexity by the link weights within, between, and outside the slots. Based on the MDR-TEG, maximizing the successful transmission ratio of TDI (MSTR-TDI) is modeled as a mixed integer linear programming (MILP) problem, which is further relaxed and decomposed into two tractable sub-problems: maximizing the successful transmission rate of images (MSTRI) and ensuring the timeliness problem (ETP). Subsequently, an efficient subgradient of relaxation computing constraint (SRCC) algorithm is proposed. The upper and lower bounds of MSTR-TDI are obtained by solving the two subproblems and the dual problem (DP), and the direction of the next iteration is obtained by feedback. Furthermore, the sending sequences of images are arranged to improve the quality of the solution. The approximate optimal solution of MSTR-TDI is eventually obtained through repeated iterations. The simulation results verify the superiority of the proposed MDR-TEG model and the effectiveness of the SRCC.
Task-Level Insights from Eigenvalues across Sequence Models
Although softmax attention drives state-of-the-art performance for sequence models, its quadratic complexity limits scalability, motivating linear alternatives such as state space models (SSMs). While these alternatives improve efficiency, their fundamental differences in information processing remain poorly understood. In this work, we leverage the recently proposed dynamical systems framework to represent softmax, norm and linear attention as dynamical systems, enabling a structured comparison with SSMs by analyzing their respective eigenvalue spectra. Since eigenvalues capture essential aspects of dynamical system behavior, we conduct an extensive empirical analysis across diverse sequence models and benchmarks. We first show that eigenvalues influence essential aspects of memory and long-range dependency modeling, revealing spectral signatures that align with task requirements. Building on these insights, we then investigate how architectural modifications in sequence models impact both eigenvalue spectra and task performance. This correspondence further strengthens the position of eigenvalue analysis as a principled metric for interpreting, understanding, and ultimately improving the capabilities of sequence models.
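As a small illustration of why eigenvalues relate to memory, consider a scalar linear recurrence h_t = λ h_{t-1} + x_t, in which an input's contribution after k steps scales as |λ|^k. The sketch below estimates how many steps pass before that contribution falls below a threshold. The function name `memory_horizon` and the default threshold are assumptions made for illustration and are not taken from the paper.

```python
import math


def memory_horizon(eigenvalue, threshold=0.01):
    """Smallest k with |eigenvalue|**k < threshold for h_t = λ h_{t-1} + x_t.

    Returns math.inf when |eigenvalue| >= 1, since the contribution of a
    past input then never decays below the threshold.
    """
    mag = abs(eigenvalue)  # also works for complex eigenvalues
    if mag >= 1.0:
        return math.inf
    if mag == 0.0:
        return 1  # the input is forgotten after a single step
    k = math.ceil(math.log(threshold) / math.log(mag))
    # Guard against floating-point edge cases at exact powers.
    while mag ** k >= threshold:
        k += 1
    return k


short_memory = memory_horizon(0.5)
long_memory = memory_horizon(0.99)
```

Eigenvalues with magnitude close to one retain inputs for far longer than eigenvalues near zero, which is one reason spectra can be expected to differ between tasks that need long-range dependencies and tasks that do not.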
MPA-DNN: Projection-Aware Unsupervised Learning for Multi-period DC-OPF
Ensuring both feasibility and efficiency in optimal power flow (OPF) operations has become increasingly important in modern power systems with high penetrations of renewable energy and energy storage. While deep neural networks (DNNs) have emerged as promising fast surrogates for OPF solvers, they often fail to satisfy critical operational constraints, especially those involving inter-temporal coupling, such as generator ramping limits and energy storage operations. To deal with these issues, we propose a Multi-Period Projection-Aware Deep Neural Network (MPA-DNN) that incorporates a projection layer for multi-period dispatch into the network. By doing so, our model enforces physical feasibility through the projection, enabling end-to-end learning of constraint-compliant dispatch trajectories without relying on labeled data. Experimental results demonstrate that the proposed method achieves near-optimal performance while strictly satisfying all constraints in varying load conditions.
Data-Driven Control Of Power Converters
The fundamental role of power converters is to efficiently manage and control the flow of electrical energy, ensuring compatibility between power sources and loads. Every application of power converters requires the design of an appropriate control law. Control of power converters is a challenging problem due to the presence of switching devices, which are difficult to handle using traditional control approaches. The objective of this paper is to investigate the use of data-driven techniques, in particular the Virtual Reference Feedback Tuning (VRFT) method, in the context of power converter feedback control. This study considers a buck mode power converter circuit provided by the OwnTech foundation.
comment: Conference paper for the French national electrical engineering symposium, SGE 2025
Weighting Factors Tuning by Direct Feedback in Predictive Control of Multiphase Motors
Predictive Stator Current Control (PSCC) has been proposed for control of multi-phase drives. The flexibility offered by the use of a Cost Function has been used to deal with the increased number of phases. However, tuning of the Weighting Factors constitutes a problem. Intensive trial and error tests are usual in this context. Existing on-line selection methods, on the other hand, require large amounts of data and/or complex optimization procedures. The proposal of this paper is a closed-loop scheme that links Weighting Factors to performance indicators. In this way, optimal Weighting Factors are determined for each operating point. Also, changes in reference values for performance indicators are easily tackled. Unlike previous methods, the proposal carries very little computational burden. A case study is developed for a five-phase induction motor and assessed with real experimentation on a laboratory set-up.
Safety Analysis of eVTOL Operations based on STPA
Electric Vertical Take-Off and Landing (eVTOL) aircraft are expected to be quieter and more cost-effective than helicopters, offering major economic and social benefits through improved connectivity. Their adoption will require new ground infrastructure and airspace redesign, introducing risks involving multiple stakeholders (Regulators, eVTOL operators, Air navigation service providers, Vertiport operators, OEMs, Pilots, etc.). To assess these risks for the UK airspace, systems-thinking based System Theoretic Process Analysis (STPA) was conducted. To manage the large number of Unsafe Control Actions (UCAs) and requirements generated due to the complexity of the analysis, a novel extension to STPA for the prioritization of results was applied. 317 UCAs were identified in total, out of which 110 high-priority UCAs were analyzed (Step-4), resulting in 377 causal factors and 432 requirements. These were prioritized to produce a targeted list of 124 distinct high-priority requirements, 56 of which were identified as gaps in existing aviation regulations, policies, or procedures. These highlight opportunities for regulatory updates in areas such as organizational performance, certification processes, training, collision avoidance, energy management, and automation. The findings provide regulators with safety considerations that could shape new or updated regulations, compliance methods, and guidance materials for the safe deployment of eVTOLs.
Single vs Multi Vector Predictive Control of Five-phase Drives
The field of Finite State Model Predictive Control (FSMPC) for multiphase drives has produced many contributions. Many variants of FSMPC exist, each aiming at some aspect such as complexity of the cost function, switching frequency, etc. Despite past efforts to compare different techniques, the field still lacks consensus regarding the relative merits of each one. This paper presents a new method to compare FSMPC variants. The method is based on analyzing the modulation, implicit or explicit, used by each variant. In the paper the method is used to compare single-vector state-of-the-art FSMPC with a multi-vector variant designed to cancel xy currents and simplify the cost function. The results show the strengths and weaknesses of each technique. Also, it is found that the trade-offs between figures, previously thought to concern just individual regimes, extend to the whole operating space and can also be pinpointed to each FSMPC variant. Finally, it is shown that the flexibility of the single-vector approach and its better DC-link usage make it, arguably, superior to the multi-vector variant.
Robust Adaptive Boundary Control of a Thermal Process with Thermoelectric Actuators: Theory and Experimental Validation
A sliding-mode-based adaptive boundary control law is proposed for a class of uncertain thermal reaction-diffusion processes subject to matched disturbances. The disturbances are assumed to be bounded, but the corresponding bounds are unknown, thus motivating the use of adaptive control strategies. A boundary control law comprising a proportional and discontinuous term is proposed, wherein the magnitude of the discontinuous relay term is adjusted via a gradient-based adaptation algorithm. Depending on how the adaptation algorithm is parameterized, the adaptive gain can be either a nondecreasing function of time (monodirectional adaptation) or it can both increase and decrease (bidirectional adaptation). The convergence and stability properties of these two solutions are investigated by Lyapunov analyses, and two distinct stability results are derived, namely, asymptotic stability for the monodirectional adaptation and globally uniformly ultimately bounded solutions for the bidirectional adaptation. The proposed algorithms are then specified to address the control problem of stabilizing a desired temperature profile in a metal beam equipped with thermoelectric boundary actuators. Experiments are conducted to investigate the real-world performance of the proposed sliding-mode-based adaptive control, with a particular focus on comparing the monodirectional and bidirectional adaptation laws.
comment: Extended version of the preprint submitted to the journal Automatica
Antenna's Performance in Microwave Imaging of Stratified Media
Numerous types of antennas have been employed for microwave imaging of stratified media for ground penetrating radar (GPR), through-the-wall-radar imaging (TWRI), etc. This letter aims to investigate the impact of the different antennas with their characteristics on the image reconstruction of those media. Hence, three types of antennas, including horn antennas, open waveguide and Vivaldi antennas, are chosen as almost directional antennas, operating at X-band 8-12 GHz. The antenna's far-field and near-field characteristics are analyzed. A diffraction tomography (DT)-based algorithm is used to reconstruct the target location within the stratified media using monostatic and multistatic data. It is observed that the more directional antennas provide a better-reconstructed image with less shadowing image of the stratified media.
Sensing, Detection and Localization for Low Altitude UAV: A RF-Based Framework via Multiple BSs Collaboration
The rapid growth of the low-altitude economy has resulted in a significant increase in the number of low, slow, and small (LSS) unmanned aerial vehicles (UAVs), raising critical challenges for secure airspace management and reliable trajectory planning. To address this, this paper proposes a cooperative radio-frequency (RF) detection and localization framework that leverages existing cellular base stations (BSs). The proposed approach features a robust scheme for LSS target identification, integrating a cell averaging-constant false alarm rate (CA-CFAR) detector with a micro-Doppler signature (MDS) based recognition method. Multi-station measurements are fused through a grid-based probabilistic algorithm combined with clustering techniques, effectively mitigating ghost targets and improving localization accuracy in multi-UAV scenarios. Furthermore, the Cramer-Rao lower bound (CRLB) is derived as a performance benchmark and reinforcement learning (RL)-based optimization is employed to balance localization accuracy against station resource usage. Simulations demonstrate that increasing from one to multiple BSs reduces the positioning error to near the CRLB, while practical experiments further verify the framework's effectiveness. Furthermore, our RL-based optimization can find solutions that maintain high accuracy while minimizing resource usage, highlighting its potential as a scalable solution for ensuring airspace safety in the emerging low-altitude economy.
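The CA-CFAR detection step named in the abstract above can be illustrated with a minimal one-dimensional sketch. It is not the paper's implementation: the function name `ca_cfar`, the window sizes, the false-alarm probability, and the assumption of exponentially distributed noise power (square-law detection) are all illustrative choices.

```python
def ca_cfar(power, num_train=8, num_guard=2, pfa=1e-3):
    """Return indices of cells whose power exceeds a cell-averaging CFAR threshold.

    power:     sequence of non-negative power samples (e.g. squared magnitudes)
    num_train: training cells on EACH side of the cell under test (assumed value)
    num_guard: guard cells on EACH side of the cell under test (assumed value)
    pfa:       target false-alarm probability (assumes exponential noise power)
    """
    n_total = 2 * num_train
    # Standard CA-CFAR threshold multiplier for exponentially distributed noise.
    alpha = n_total * (pfa ** (-1.0 / n_total) - 1.0)
    half = num_train + num_guard
    detections = []
    # Only test cells that have a full window on both sides.
    for i in range(half, len(power) - half):
        left = power[i - half : i - num_guard]
        right = power[i + num_guard + 1 : i + half + 1]
        noise_estimate = (sum(left) + sum(right)) / n_total
        if power[i] > alpha * noise_estimate:
            detections.append(i)
    return detections


# Hypothetical data: flat noise floor with one strong return at index 20.
profile = [1.0] * 40
profile[20] = 100.0
hits = ca_cfar(profile)
```

The guard cells keep energy from the target itself out of the noise estimate, which is why the threshold adapts to the local background rather than to the target.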
MAKO: Meta-Adaptive Koopman Operators for Learning-based Model Predictive Control of Parametrically Uncertain Nonlinear Systems
In this work, we propose a meta-learning-based Koopman modeling and predictive control approach for nonlinear systems with parametric uncertainties. An adaptive deep meta-learning-based modeling approach, called Meta Adaptive Koopman Operator (MAKO), is proposed. Without knowledge of the parametric uncertainty, the proposed MAKO approach can learn a meta-model from a multi-modal dataset and efficiently adapt to new systems with previously unseen parameter settings by using online data. Based on the learned meta Koopman model, a predictive control scheme is developed, and the stability of the closed-loop system is ensured even in the presence of previously unseen parameter settings. Through extensive simulations, our proposed approach demonstrates superior performance in both modeling accuracy and control efficacy as compared to competitive baselines.
Trust Modeling and Estimation in Human-Autonomy Interactions
Advances in the control of autonomous systems have accompanied an expansion in the potential applications for autonomous robotic systems. The success of applications involving humans depends on the quality of interaction between the autonomous system and the human supervisor, which is particularly affected by the degree of trust that the supervisor places in the autonomous system. Absent from the literature are models of supervisor trust dynamics that can accommodate asymmetric responses to autonomous system performance and the intermittent nature of supervisor-autonomous system communication. This paper focuses on formulating an estimated model of supervisor trust that incorporates both of these features by employing a switched linear system structure with event-triggered sampling of the model input and output. Trust response data collected in a user study with 51 participants were then used to identify parameters for a switched linear model-based observer of supervisor trust.
comment: 10 pages, 13 figures
Traffic-Aware Eco-Driving Control in CAVs via Learning-based Terminal Cost Model
Connected and Automated Vehicles (CAVs) offer significant potential for improving energy efficiency and lowering vehicle emissions through eco-driving technologies. Control algorithms in CAVs leverage look-ahead route information and Vehicle-to-Everything (V2X) communication to optimize vehicle performance. However, existing eco-driving strategies often neglect macroscopic traffic effects, such as upstream traffic jams, that occur outside the optimization horizon but significantly impact vehicle energy efficiency. This work presents a novel Neural Network (NN)-based methodology to approximate the terminal cost within a model predictive control (MPC) problem framework, explicitly incorporating upstream traffic dynamics. By incorporating traffic jams into the optimization process, the proposed traffic-aware approach yields more energy-efficient speed trajectories compared to traffic-agnostic methods, with minimal impact on travel time. The framework is scalable for real-time implementation while effectively addressing uncertainties from dynamic traffic conditions and macroscopic traffic events.
Direct Data-Driven Predictive Control for a Three-dimensional Cable-Driven Soft Robotic Arm
Soft robots offer significant advantages in safety and adaptability, yet achieving precise and dynamic control remains a major challenge due to their inherently complex and nonlinear dynamics. Recently, Data-enabled Predictive Control (DeePC) has emerged as a promising model-free approach that bypasses explicit system identification by directly leveraging input-output data. While DeePC has shown success in other domains, its application to soft robots remains underexplored, particularly for three-dimensional (3D) soft robotic systems. This paper addresses this gap by developing and experimentally validating an effective DeePC framework on a 3D, cable-driven soft arm. Specifically, we design and fabricate a soft robotic arm with a thick tubing backbone for stability, a dense silicone body with large cavities for strength and flexibility, and rigid endcaps for secure termination. Using this platform, we implement DeePC with singular value decomposition (SVD)-based dimension reduction for two key control tasks: fixed-point regulation and trajectory tracking in 3D space. Comparative experiments with a baseline model-based controller demonstrate DeePC's superior accuracy, robustness, and adaptability, highlighting its potential as a practical solution for dynamic control of soft robots.
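DeePC is usually described as working from Hankel matrices built out of recorded input-output data instead of an identified model. The sketch below shows only that data-arrangement step for a scalar signal. The helper name `hankel_rows` and the sample data are hypothetical, and the SVD-based reduction and the predictive optimization mentioned in the abstract are not shown.

```python
def hankel_rows(signal, depth):
    """Build a Hankel matrix (as a list of rows) from a scalar data sequence.

    Row r holds signal[r], signal[r+1], ..., so each COLUMN is a length-`depth`
    window of consecutive samples, i.e. one recorded trajectory segment.
    """
    if depth < 1 or depth > len(signal):
        raise ValueError("depth must be between 1 and len(signal)")
    num_cols = len(signal) - depth + 1
    return [[signal[r + c] for c in range(num_cols)] for r in range(depth)]


# Hypothetical recorded input sequence.
u_data = [1, 2, 3, 4, 5]
H = hankel_rows(u_data, depth=3)
```

In a data-enabled predictive controller, a future trajectory is then expressed as a linear combination of such columns. A dimension reduction step like the SVD mentioned in the abstract would act on this matrix to shrink the number of combination weights.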
A pilot cohort study of a microfluidic-based point-of-care bilirubin measurement system
Objective: The concentration of bilirubin in blood or serum is useful for assessing liver function as well as monitoring treatment. This study evaluates the clinical performance of a novel point-of-care (PoC) device for the detection of bilirubin in serum. The PoC device incorporates an integrated miniature optoelectronic sensing module and a microfluidic test cartridge. Methods: Patients' serum total bilirubin concentrations, ranging from 2 μmol/L to 480 μmol/L, were measured using the PoC device and the standard laboratory method (n=20). Bland-Altman analysis and regression analysis using the Passing-Bablok method were used to benchmark the PoC device against the standard laboratory measurements. The diagnostic capability of the PoC device in categorising the serum samples within clinically relevant bilirubin concentration thresholds of 200, 300, and 450 μmol/L was assessed using receiver operating characteristic (ROC) analysis. Results: The mean difference between the PoC device and the standard laboratory method was -5.6 μmol/L, with a 95% confidence interval (CI) of -45.1 μmol/L to 33.9 μmol/L. The coefficient of determination (R²) was 0.986. The PoC device achieved a detection sensitivity of 90% and specificity of 97% in categorising bilirubin concentrations within bands used in clinical decision-making. Conclusions: This study demonstrates that the proposed PoC device is capable of measuring bilirubin levels in patient samples with clinically acceptable accuracy.
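The Bland-Altman comparison mentioned in the abstract above can be sketched as follows. This assumes the conventional form of the analysis, in which the interval around the mean difference is the mean plus or minus 1.96 sample standard deviations of the paired differences. The reported interval is symmetric about the mean difference, which is consistent with that form, but the abstract does not state the exact computation. The function name and the sample values are hypothetical and are not the study's data.

```python
import statistics


def bland_altman(device, reference):
    """Return (bias, lower, upper) for paired measurements.

    bias is the mean of device - reference; lower/upper are bias -/+ 1.96
    sample standard deviations of the paired differences (an assumed,
    conventional choice for the 95% interval).
    """
    diffs = [d - r for d, r in zip(device, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired readings in μmol/L (NOT data from the study).
poc_readings = [10.0, 20.0, 30.0, 40.0]
lab_readings = [12.0, 19.0, 33.0, 38.0]
bias, lower, upper = bland_altman(poc_readings, lab_readings)
```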
Cognitive Radio for Asymmetric Cellular Downlink with Multi-User MIMO
Cognitive radio (CR) is an important technique for improving spectral efficiency, letting a secondary system operate in a wireless spectrum when the primary system does not make use of it. While it has been widely explored over the past 25 years, many common assumptions are not aligned with the realities of 5G networks. In this paper, we consider the CR problem for the following setup: (i) infrastructure-based systems, where downlink transmissions might occur to receivers whose positions are not, or not exactly, known; (ii) multi-beam antennas at both primary and secondary base stations. We formulate a detailed protocol to determine when secondary transmissions into different beam directions can interfere with primary users at potential locations and create probability-based interference rules. We then analyze the "catastrophic interference" probability and the "missed transmission opportunity" probability, as well as the achievable throughput, as a function of the transmit powers of the primary and secondary base stations and the sensing window of the secondary base station. Results can serve to more realistically assess the spectral efficiency gains in 5G infrastructure-based cognitive systems.
Observation Matrix Design for Densifying MIMO Channel Estimation via 2D Ice Filling
In recent years, densifying multiple-input multiple-output (MIMO) has attracted much attention from the communication community. Thanks to the subwavelength antenna spacing, the strong correlations among densifying antennas provide sufficient prior knowledge about channel state information (CSI). This inspires the careful design of observation matrices (e.g., transmit precoders and receive combiners), that exploits the CSI prior knowledge, to boost channel estimation performance. Aligned with this vision, this work proposes to jointly design the combiners and precoders by maximizing the mutual information between the received pilots and densifying MIMO channels. A two-dimensional ice-filling (2DIF) algorithm is proposed to efficiently accomplish this objective. The algorithm is motivated by the fact that the eigenspace of MIMO channel covariance can be decoupled into two sub-eigenspaces, which are associated with the correlations of transmitter antennas and receiver antennas, respectively. By properly setting the precoder and the combiner as the eigenvectors from these two sub-eigenspaces, the 2DIF promises to generate near-optimal observation matrices. Moreover, we further extend the 2DIF method to the popular hybrid combining systems, where a two-stage 2DIF (TS-2DIF) algorithm is developed to handle the analog combining circuits realized by phase shifters. Simulation results demonstrate that, compared to the state-of-the-art schemes, the proposed 2DIF and TS-2DIF methods can achieve superior channel estimation accuracy.
comment: 17 pages, 8 figures
Computing Safe Control Inputs using Discrete-Time Matrix Control Barrier Functions via Convex Optimization
Control barrier functions (CBFs) have seen widespread success in providing forward invariance and safety guarantees for dynamical control systems. A crucial limitation of discrete-time formulations is that CBFs that are nonconcave in their argument require the solution of nonconvex optimization problems to compute safety-preserving control inputs, which inhibits real-time computation of control inputs guaranteeing forward invariance. This paper presents a novel method for computing safety-preserving control inputs for discrete-time systems with nonconvex safety sets, utilizing convex optimization and the recently developed class of matrix control barrier function techniques. The efficacy of our methods is demonstrated through numerical simulations on a bicopter system.
comment: 17 pages, 8 figures
Cyber-Physical Systems on the Megawatt Scale: The impact of battery control on grid frequency stability
Electric power systems are undergoing fundamental change. The shift to inverter-based generation challenges frequency stability, while growing digitalisation heightens vulnerability to errors and attacks. Here we identify an emerging risk at the intersection of cyber-physical coupling and control system design. We show that grid frequency time series worldwide exhibit a persistent one-minute oscillatory pattern, whose origin has remained largely unexplained. We trace this pattern back to the energy management systems of battery electric storage systems and demonstrate that the pattern amplitude has increased substantially in the Nordic and British grids. We argue that this effect is a potential burden for stability in future grids with low inertia and an increasing penetration with batteries and smart devices, though it can be mitigated by a revision of battery control algorithms.
comment: 19 pages, 23 figures
Latent-Feature-Informed Neural ODE Modeling for Lightweight Stability Evaluation of Black-box Grid-Tied Inverters
Stability evaluation of black-box grid-tied inverters is vital for grid reliability, yet identification techniques are both data-hungry and blocked by proprietary internals. To solve this, this letter proposes a latent-feature-informed neural ordinary differential equation (LFI-NODE) modeling method that can achieve lightweight stability evaluation directly from trajectory data. LFI-NODE parameterizes the entire system ODE with a single continuous-time neural network, allowing each new sample to refine a unified global model. It faithfully captures nonlinear large-signal dynamics to preserve uniform predictive accuracy as the inverter transitions between operating points. Meanwhile, latent perturbation features distilled from every trajectory steer the learning process and concurrently reveal the small-signal eigenstructure essential for rigorous stability analysis. Validated on a grid-forming inverter, the LFI-NODE requires one to two orders of magnitude fewer training samples compared with traditional methods, collected from short time-domain trajectories instead of extensive frequency-domain measurements. Furthermore, the LFI-NODE requires only 48 short transients to achieve a trajectory prediction error at the hundredth level and an eigenvalue estimation error at the tenth level, outperforming benchmark methods by one to two orders of magnitude. This makes LFI-NODE a practical and lightweight approach for achieving high-fidelity stability assessment of complex black-box power-electronic systems.
comment: 6 pages, 8 figures
Designing Control Barrier Functions Using a Dynamic Backup Policy
This paper presents a systematic approach to construct control barrier functions for nonlinear control affine systems subject to arbitrary state and input constraints. Taking inspiration from the reference governor literature, the proposed method defines a family of backup policies, parametrized by the equilibrium manifold of the system. The control barrier function is defined on the augmented state-and-reference space: given a state-reference pair, the approach quantifies the distance to constraint violation at any time in the future, should the current backup policy reference remain constant. Sensitivity analysis is then used to compute the (possibly nonsmooth) Jacobian with respect to the augmented state vector. To showcase its simple yet general nature, the proposed method is applied to an inverted pendulum on cart.
comment: 7 pages, 1 figure
Science ouverte et collaborative pour l'élaboration d'un banc automatisé de caractérisation de pertes en commutation par opposition
The switching losses of power transistors are generally measured using the so-called double pulse method. Measuring the opposition of two switching cells is a complementary method that is more accurate but indirect. However, implementing this method can be more complex and requires calibration steps and comprehensive control, with the added issue of thermal management. In this context, we proposed to address this topic through open and collaborative science, first in the form of a two-day hackathon, followed by monthly open sessions. More than 20 participants contributed to the two-day hackathon, followed by monthly sessions for those wishing to continue working together. This enabled us to set up an automated bench, in open science, including the generation of switching commands, the configuration and control of measuring instruments, and the hardware part. Here we present and share our work and this open approach.
comment: Paper in French, presented at the French national electrical engineering conference SGE 2025
Robustness Analysis for Quantum Systems Controlled by Continuous-Time Pulses
Differential sensitivity techniques originally developed to study the robustness of energy landscape controllers are generalized to the important case of closed quantum systems subject to continuously varying controls. Vanishing sensitivity to parameter variation is shown to coincide with perfect fidelity, as was the case for time-invariant controls. Upper bounds on the magnitude of the differential sensitivity to any parameter variation are derived based simply on knowledge of the system Hamiltonian and the maximum size of the control inputs.
comment: 6 pages, 2 figures
Safe and Optimal N-Spacecraft Swarm Reconfiguration in Non-Keplerian Cislunar Orbits
This paper presents a novel fuel-optimal guidance and control methodology for spacecraft swarm reconfiguration in Restricted Multi-Body Problems (RMBPs) with a guarantee of passive safety, maintaining miss distance even under abrupt loss of control authority. A new set of constraints exploits a quasi-periodic structure of RMBPs to guarantee passive safety. Particularly, the condition for passive safety is expressed as simple geometric constraints by solving optimal control in Local Toroidal Coordinates, which is based on a local eigenspace of a quasi-periodic motion around the corresponding periodic orbit. The proposed formulation enables a significant simplification of problem structure, which is applicable to large-scale swarm reconfiguration in cislunar orbits. The method is demonstrated in the Circular Restricted Three-Body Problem, the Elliptic Restricted Three-Body Problem, and the Bi-Circular Restricted Four-Body Problem. Furthermore, the optimized control profiles are validated in the full-ephemeris dynamics model. By extending and generalizing well-known concepts of relative orbital elements within the restricted two-body problem to the three- and four-body problems, this paper lays the foundation for practical control schemes of relative motion in cislunar space.
comment: 41 pages, 19 figures. Submitted and accepted to Journal of Guidance, Control, and Dynamics
A Digital Twin for Diesel Engines: Operator-infused Physics-Informed Neural Networks with Transfer Learning for Engine Health Monitoring
Improving diesel engine efficiency, reducing emissions, and enabling robust health monitoring have been critical research topics in engine modelling. While recent advancements in the use of neural networks for system monitoring have shown promising results, such methods often focus on component-level analysis, lack generalizability, and physical interpretability. In this study, we propose a novel hybrid framework that combines physics-informed neural networks (PINNs) with deep operator networks (DeepONet) to enable accurate and computationally efficient parameter identification in mean-value diesel engine models. Our method leverages physics-based system knowledge in combination with data-driven training of neural networks to enhance model applicability. Incorporating offline-trained DeepONets to predict actuator dynamics significantly lowers the online computation cost when compared to the existing PINN framework. To address the re-training burden typical of PINNs under varying input conditions, we propose two transfer learning (TL) strategies: (i) a multi-stage TL scheme offering better runtime efficiency than full online training of the PINN model and (ii) a few-shot TL scheme that freezes a shared multi-head network body and computes physics-based derivatives required for model training outside the training loop. The second strategy offers a computationally inexpensive and physics-based approach for predicting engine dynamics and parameter identification, offering computational efficiency over the existing PINN framework. Compared to existing health monitoring methods, our framework combines the interpretability of physics-based models with the flexibility of deep learning, offering substantial gains in generalization, accuracy, and deployment efficiency for diesel engine diagnostics.
SwarmGPT: Combining Large Language Models with Safe Motion Planning for Drone Swarm Choreography
Drone swarm performances -- synchronized, expressive aerial displays set to music -- have emerged as a captivating application of modern robotics. Yet designing smooth, safe choreographies remains a complex task requiring expert knowledge. We present SwarmGPT, a language-based choreographer that leverages the reasoning power of large language models (LLMs) to streamline drone performance design. The LLM is augmented by a safety filter that ensures deployability by making minimal corrections when safety or feasibility constraints are violated. By decoupling high-level choreographic design from low-level motion planning, our system enables non-experts to iteratively refine choreographies using natural language without worrying about collisions or actuator limits. We validate our approach through simulations with swarms up to 200 drones and real-world experiments with up to 20 drones performing choreographies to diverse types of songs, demonstrating scalable, synchronized, and safe performances. Beyond entertainment, this work offers a blueprint for integrating foundation models into safety-critical swarm robotics applications.
comment: Accepted at RA-L 2025
Extending First-order Robotic Motion Planners to Second-order Robot Dynamics
This paper extends first-order motion planners to robots governed by second-order dynamics. Two control schemes are proposed based on the knowledge of a scalar function whose negative gradient aligns with a given first-order motion planner. When such a function is known, the first-order motion planner is combined with a damping velocity vector with a dynamic gain to extend the safety and convergence guarantees of the first-order motion planner to second-order systems. If no such function is available, we propose an alternative control scheme ensuring that the error between the robot's velocity and the first-order motion planner converges to zero. The theoretical developments are supported by simulation results demonstrating the effectiveness of the proposed approaches.
comment: 14 pages, 10 figures
Direction Estimation of Sound Sources Using Microphone Arrays and Signal Strength ICSE
Sound-tracking refers to the process of determining the direction from which a sound originates, making it a fundamental component of sound source localization. This capability is essential in a variety of applications, including security systems, acoustic monitoring, and speaker tracking, where accurately identifying the direction of a sound source enables real-time responses, efficient resource allocation, and improved situational awareness. While sound-tracking is closely related to localization, it specifically focuses on identifying the direction of the sound source rather than estimating its exact position in space. Despite its utility, sound-tracking systems face several challenges, such as maintaining directional accuracy and precision, along with the need for sophisticated hardware configurations and complex signal processing algorithms. This paper presents a sound-tracking method using three electret microphones. We estimate the direction of a sound source using a lightweight method that analyzes signals from three strategically placed microphones. By comparing the average power of the received signals, the system infers the most probable direction of the sound. The results indicate that the power level from each microphone effectively determines the sound source direction. Our system employs a straightforward and cost-effective hardware design, ensuring simplicity and affordability in implementation. It achieves a localization error of less than 6 degrees and a precision of 98%. Additionally, its effortless integration with various systems makes it versatile and adaptable. Consequently, this technique presents a robust and reliable solution for sound-tracking and localization, with potential applications spanning diverse domains such as security systems, smart homes, and acoustic monitoring.
comment: Accepted to the 32nd International Conference on Systems Engineering (ICSEng'2025)
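The power-comparison idea in the sound-tracking abstract above can be illustrated with a small sketch. The abstract does not give the microphone geometry or the exact inference rule, so two assumptions are made here: the three microphones face outward at 0, 120, and 240 degrees, and the direction is taken as the power-weighted vector sum of those facing directions. Both function names are hypothetical.

```python
import math


def average_power(samples):
    """Mean squared amplitude of one microphone's sample block."""
    return sum(s * s for s in samples) / len(samples)


def estimate_direction(avg_powers, mic_angles_deg=(0.0, 120.0, 240.0)):
    """Estimate a bearing in degrees from per-microphone average power.

    Assumed rule (not taken from the paper): weight each microphone's facing
    direction by its average power and return the angle of the resulting vector.
    """
    x = sum(p * math.cos(math.radians(a)) for p, a in zip(avg_powers, mic_angles_deg))
    y = sum(p * math.sin(math.radians(a)) for p, a in zip(avg_powers, mic_angles_deg))
    return math.degrees(math.atan2(y, x)) % 360.0


# Hypothetical case: the microphone facing 120 degrees receives the most power.
bearing = estimate_direction([0.2, 1.0, 0.2])
```

With equal power on the two quieter microphones, their contributions cancel sideways and the estimate points along the loudest microphone's axis.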
Optimization via a Control-Centric Framework
Optimization plays a central role in intelligent systems and cyber-physical technologies, where speed and reliability of convergence directly impact performance. In control theory, optimization-centric methods are standard: controllers are designed by repeatedly solving optimization problems, as in linear quadratic regulation, $H_\infty$ control, and model predictive control. In contrast, this paper develops a control-centric framework for optimization itself, where algorithms are constructed directly from Lyapunov stability principles rather than being proposed first and analyzed afterward. A key element is the stationarity vector, which encodes first-order optimality conditions and enables Lyapunov-based convergence analysis. By pairing a Lyapunov function with a selectable decay law, we obtain continuous-time dynamics with guaranteed exponential, finite-time, fixed-time, or prescribed-time convergence. Within this framework, we introduce three feedback realizations of increasing restrictiveness: the Hessian-gradient, Newton, and gradient dynamics. Each realization shapes the decay of the stationarity vector to achieve the desired rate. These constructions unify unconstrained optimization, extend naturally to constrained problems via Lyapunov-consistent primal-dual dynamics, and broaden the results for minimax and generalized Nash equilibrium seeking problems beyond exponential stability. The framework provides systematic design tools for optimization algorithms in control and game-theoretic problems.
comment: This work has been submitted to the IEEE for possible publication. 12 pages, 3 figures
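To make the pairing of a Lyapunov function with a decay law concrete, here is a short worked example for the unconstrained case with an exponential decay law. It assumes that the stationarity vector is the gradient $g(x) = \nabla f(x)$, that the Hessian is invertible along the trajectory, and that $\alpha > 0$ is a design gain; these choices are illustrative and may differ from the paper's exact constructions.

```latex
\begin{align*}
V(x) &= \tfrac{1}{2}\,\|g(x)\|^2, \qquad g(x) = \nabla f(x), \\
\dot{x} &= -\alpha\,\bigl[\nabla^2 f(x)\bigr]^{-1} g(x)
  && \text{(Newton-type dynamics)}, \\
\dot{g} &= \nabla^2 f(x)\,\dot{x} = -\alpha\, g, \\
\dot{V} &= g^\top \dot{g} = -\alpha\,\|g\|^2 = -2\alpha V
  \;\;\Longrightarrow\;\; V(t) = V(0)\,e^{-2\alpha t}.
\end{align*}
```

Replacing the right-hand side $-2\alpha V$ with a different decay law is what would change the convergence type (for example, finite-time instead of exponential) in this kind of design.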
A view on learning robust goal-conditioned value functions: Interplay between RL and MPC
Reinforcement learning (RL) and model predictive control (MPC) offer a wealth of distinct approaches for automatic decision-making under uncertainty. Given the impact both fields have had independently across numerous domains, there is growing interest in combining the general-purpose learning capability of RL with the safety and robustness features of MPC. To this end, this paper presents a tutorial-style treatment of RL and MPC, treating them as alternative approaches to solving Markov decision processes. In our formulation, RL aims to learn a global value function through offline exploration in an uncertain environment, whereas MPC constructs a local value function through online optimization. This local-global perspective suggests new ways to design policies that combine robustness and goal-conditioned learning. Robustness is incorporated into the RL and MPC pipelines through a scenario-based approach. Goal-conditioned learning aims to alleviate the burden of engineering a reward function for RL. Combining the two leads to a single policy that unites a robust, high-level RL terminal value function with short-term, scenario-based MPC planning for reliable constraint satisfaction. This approach leverages the benefits of both RL and MPC, the effectiveness of which is demonstrated on classical control benchmarks.
comment: Postprint; 37 pages
Decentralized CBF-based Safety Filters for Collision Avoidance of Cooperative Missile Systems with Input Constraints
This paper presents a decentralized safety filter for collision avoidance in multi-agent aerospace interception scenarios. The approach leverages robust control barrier functions (RCBFs) to guarantee forward invariance of safety sets under bounded inputs and high-relative-degree dynamics. Each effector executes its nominal cooperative guidance command, while a local quadratic program (QP) modifies the input only when necessary. Event-triggered activation based on range and zero-effort miss (ZEM) criteria ensures scalability by restricting active constraints to relevant neighbors. To resolve feasibility issues from simultaneous constraints, a slack-variable relaxation scheme is introduced that prioritizes critical agents in a Pareto-optimal manner. Simulation results in many-on-many interception scenarios demonstrate that the proposed framework maintains collision-free operation with minimal deviation from nominal guidance, providing a computationally efficient and scalable solution for safety-critical multi-agent aerospace systems.
comment: 7 pages, 5 figures
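The "modify the input only when necessary" behavior of a QP-based safety filter can be sketched for the simplest case of a single linear constraint, where the QP has a closed-form solution. The function name and the constraint form `a . u >= b` are assumptions for illustration; the paper's filter additionally handles input bounds, multiple neighbors, and slack variables, none of which appear here.

```python
def safety_filter(u_nom, a, b):
    """Closest input to u_nom (Euclidean norm) that satisfies a . u >= b.

    Solves  min ||u - u_nom||^2  s.t.  a . u >= b  in closed form.
    Single-constraint simplification (assumption), without input bounds.
    """
    dot = sum(ai * ui for ai, ui in zip(a, u_nom))
    if dot >= b:
        # Nominal command is already safe: pass it through unchanged.
        return list(u_nom)
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("infeasible constraint: a is zero but b is positive")
    # Project u_nom onto the boundary of the half-space a . u >= b.
    scale = (b - dot) / norm_sq
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]

# Example: the nominal command violates the constraint u[1] >= 0.5.
u_safe = safety_filter([1.0, 0.0], [0.0, 1.0], 0.5)
```

With several simultaneous constraints or bounded inputs, a numerical QP solver would be needed instead of this projection.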
A Control Allocation Algorithm for Hypersonic Glide Vehicles with Input Limitations
Hypersonic glide vehicles (HGVs) operate in challenging flight regimes characterized by strong nonlinearities in actuation and stringent physical constraints. These include state-dependent actuator limitations, asymmetric control bounds, and thermal loads that vary with maneuvering conditions. This paper introduces an iterative control allocation method to address these challenges in real time. The proposed algorithm searches for control inputs that achieve the desired moment commands while respecting constraints on input magnitude and rate. For slender HGV configurations, thermal loads and drag generation are strongly correlated: lower drag typically results in reduced surface heating. By embedding drag-sensitive soft constraints, the method improves energy efficiency and implicitly reduces surface temperatures, lowering the vehicle's infrared signature. These features are particularly advantageous for long-range military operations that require low observability. The approach is demonstrated using the DLR's Generic Hypersonic Glide Vehicle 2 (GHGV-2) simulation model. The results confirm the method's effectiveness in maintaining control authority under realistic, constrained flight conditions.
comment: 38 pages, 20 figures, submitted to the AIAA Journal of Guidance, Control, and Dynamics
Machine Learning Detection of Lithium Plating in Lithium-ion Cells: A Gaussian Process Approach
Lithium plating during fast charging is a critical degradation mechanism that accelerates capacity fade and can trigger catastrophic safety failures. Recent work has identified a distinctive dQ/dV peak above 4.0 V as a reliable signature of plating onset; however, conventional methods for computing dQ/dV rely on finite differencing with filtering, which amplifies sensor noise and introduces bias in peak location. In this paper, we propose a Gaussian Process (GP) framework for lithium plating detection by directly modeling the charge-voltage relationship Q(V) as a stochastic process with calibrated uncertainty. Leveraging the property that derivatives of GPs remain GPs, we infer dQ/dV analytically and probabilistically from the posterior, enabling robust detection without ad hoc smoothing. The framework provides three key benefits: (i) noise-aware inference with hyperparameters learned from data, (ii) closed-form derivatives with credible intervals for uncertainty quantification, and (iii) scalability to online variants suitable for embedded BMS. Experimental validation on Li-ion coin cells across a range of C-rates (0.2C-1C) and temperatures (0-40 °C) demonstrates that the GP-based method reliably detects plating peaks under low-temperature, high-rate charging, while correctly reporting no peaks in baseline cases. The concurrence of GP-identified differential peaks, reduced charge throughput, and capacity fade measured via reference performance tests confirms the method's accuracy and robustness, establishing a practical pathway for real-time lithium plating detection.
comment: Submitted to American Control Conference 2026 - ACC 2026
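The property that the derivative of a GP is again a GP means the posterior mean of dQ/dV can be written in closed form by differentiating the kernel. The sketch below does this for a zero-mean GP with a squared-exponential kernel in pure Python. The kernel choice, the fixed hyperparameters, and all function names are assumptions for illustration; the paper learns its hyperparameters from data and also reports credible intervals, which are omitted here.

```python
import math

def rbf(a, b, length, signal_var):
    """Squared-exponential kernel k(a, b)."""
    return signal_var * math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gp_dq_dv(v_train, q_train, v_query, length=1.0, signal_var=1.0, noise_var=1e-4):
    """Posterior mean of dQ/dV at v_query (hyperparameters fixed by assumption)."""
    n = len(v_train)
    K = [[rbf(v_train[i], v_train[j], length, signal_var)
          + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(K, q_train)  # alpha = (K + noise_var * I)^{-1} q
    # d/dv k(v, v_i) = -(v - v_i) / length^2 * k(v, v_i)
    return sum(-(v_query - vi) / length ** 2 * rbf(v_query, vi, length, signal_var) * ai
               for vi, ai in zip(v_train, alpha))

# Example on synthetic data with a known derivative (not battery data).
v = [0.2 * i for i in range(16)]
q = [math.sin(x) for x in v]
slope = gp_dq_dv(v, q, 1.0, noise_var=1e-6)  # should be close to cos(1.0)
```

For real charge-voltage data, the mean of Q would typically be removed first, and a peak detector would be run on the derivative evaluated over a voltage grid.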
Techno-economic analysis of self-sustainable thermophotovoltaic systems for grid-scale energy generation
To facilitate the widespread adoption of renewable energy, dispatchable, zero-emission power sources are essential for grid stability. This work performs a comprehensive techno-economic analysis of a self-sustainable thermophotovoltaic (TPV) system, an architecture that integrates solar charging to function as a standalone power generation asset. Using theory-based models for conventional air-bridge InGaAs and Si diode cells, our analysis reveals that while the system is not currently competitive from a pure levelized cost of storage (LCOS) perspective due to the high capital expenditure for thermal battery materials, its primary value lies in its competitive levelized cost of electricity (LCOE), which is comparable to that of conventional dispatchable generators such as gas turbines. Furthermore, we show that a full Si-based TPV system, utilizing a 50-$\mu$m-thick air-bridge cell for enhanced photon utilization, can also achieve an LCOE that is competitive with such conventional power sources at scales exceeding the gigawatt-hour level, despite its lower conversion efficiency relative to its InGaAs counterpart. This highlights a practical engineering pathway for leveraging the immense manufacturing scalability of Si, offering a lower-risk route to deployment compared to III-V materials. Ultimately, this work establishes the self-sustainable TPV architecture as a compelling pathway toward providing grid-scale, on-demand, zero-emission power.
comment: 27 pages, 6 figures, 1 table
Transferable Parasitic Estimation via Graph Contrastive Learning and Label Rebalancing in AMS Circuits
Graph representation learning on Analog-Mixed Signal (AMS) circuits is crucial for various downstream tasks, e.g., parasitic estimation. However, the scarcity of design data, the unbalanced distribution of labels, and the inherent diversity of circuit implementations pose significant challenges to learning robust and transferable circuit representations. To address these limitations, we propose CircuitGCL, a novel graph contrastive learning framework that integrates representation scattering and label rebalancing to enhance transferability across heterogeneous circuit graphs. CircuitGCL employs a self-supervised strategy to learn topology-invariant node embeddings through hyperspherical representation scattering, eliminating dependency on large-scale data. Simultaneously, balanced mean squared error (BMSE) and balanced softmax cross-entropy (BSCE) losses are introduced to mitigate label distribution disparities between circuits, enabling robust and transferable parasitic estimation. Evaluated on parasitic capacitance estimation (edge-level task) and ground capacitance classification (node-level task) across TSMC 28nm AMS designs, CircuitGCL outperforms all state-of-the-art (SOTA) methods, with an $R^2$ improvement of $33.64\% \sim 44.20\%$ for edge regression and an F1-score gain of $0.9\times \sim 2.1\times$ for node classification. Our code is available at https://github.com/ShenShan123/CircuitGCL.
comment: Final version accepted by the International Conference on Computer-Aided Design (ICCAD) 2025. First two authors have equal contributions
Geometry of Distance Protection
Distance relays detect faults on transmission lines. They face uncertainty from the fault's location and resistance, as well as the current from the line's remote terminal. In this paper, we aggregate this uncertainty with the Minkowski sum. This allows us to explicitly model the power grid surrounding the relay's line, and in turn accommodate any mix of synchronous machines and inverter-based resources. To make the relay's task easier, inverters can inject perturbations, or auxiliary signals, such as negative-sequence current. We use Farkas' lemma to construct an optimization for designing inverter auxiliary signals.
A Real-Time System for Scheduling and Managing UAV Delivery in Urban Areas
As urban logistics demand continues to grow, UAV delivery has become a key solution to improve delivery efficiency, reduce traffic congestion, and lower logistics costs. However, to fully leverage the potential of UAV delivery networks, efficient swarm scheduling and management are crucial. In this paper, we propose a real-time scheduling and management system based on the "Airport-Unloading Station" model, aiming to bridge the gap between high-level scheduling algorithms and low-level execution systems. This system, acting as middleware, accurately translates the requirements from the scheduling layer into specific execution instructions, ensuring that the scheduling algorithms perform effectively in real-world environments. Additionally, we implement three collaborative scheduling schemes involving autonomous ground vehicles (AGVs), unmanned aerial vehicles (UAVs), and ground staff to further optimize overall delivery efficiency. Through extensive experiments, this study demonstrates the rationality and feasibility of the proposed management system, providing a practical solution for the commercial application of UAV delivery in urban areas. Code: https://github.com/chengji253/UAVDeliverySystem
Coordinated Control of Deformation and Flight for Morphing Aircraft via Meta-Learning and Coupled State-Dependent Riccati Equations
In this paper, the coordinated control problem of deformation and flight for morphing aircraft (MA) is studied by using meta-learning (ML) and coupled state-dependent Riccati equations (CSDREs). Our method is built on two principal observations: that dynamic models of MA under varying morphing conditions share a representation function that is independent of the morphing condition, and that the part specific to each morphing condition lies in a set of linear coefficients. To that end, domain adversarially invariant meta-learning (DAIML) is employed to learn the shared representation from offline flight data. Based on the learned representation function, the coordinated control of the deformation and flight for MA is formulated as a non-cooperative differential game. The state-dependent feedback control solutions can be derived by solving a pair of CSDREs. For this purpose, Lyapunov iterations are extended to obtain the positive semidefinite (definite) stabilizing solutions of the CSDREs, and the convergence proof of the proposed algorithm is provided. Finally, a simulation study is carried out to validate the efficacy of the developed coordinated game control strategies.
Modeling and Simulation of an Active Car Suspension with a Robust LQR Controller under Road Disturbance, Parameter Uncertainty and White Noise
Vehicle suspension is important for passengers to travel comfortably and to be less exposed to effects such as vibration and shock. A good suspension system increases the road holding of vehicles, allows them to take turns safely, and reduces the risk of traffic accidents. The passive suspension system is the most widely used suspension system in vehicles due to its simple structure and low cost. Passive suspension systems do not have an actuator and therefore do not have a controller. Active suspension systems have an actuator and a controller. Although their structures are more complex and costly, they are safer. The PID controller is widely used in active suspension systems due to its simple structure, reasonable cost, and easy adjustment of coefficients. In this study, an LQR-controlled active suspension was designed to be more robust than both a passive suspension and a PID-controlled active suspension. Robustness analyses were performed for the passive suspension, the PID-controlled active suspension, and the LQR-controlled active suspension. Suspension travel, sprung mass acceleration, and sprung mass motion simulations were performed for all three suspensions under road disturbance, under simultaneous road disturbance and parameter uncertainty, and under road disturbance with white noise. A comparative analysis was performed by obtaining the rise time, overshoot, and settling time data of the suspensions under the different conditions. It was observed that the LQR-controlled active suspension showed the fastest rise time, the least overshoot, and the shortest settling time. It was thus shown that the LQR-controlled active suspension provided a more comfortable and safer ride than the other two suspension systems.
comment: 20 pages, 19 figures
Resilient Multi-Dimensional Consensus and Distributed Optimization against Agent-Based and Denial-of-Service Attacks
In this paper, we consider the resilient multi-dimensional consensus and distributed optimization problems of multi-agent systems (MASs) in the presence of both agent-based and denial-of-service (DoS) attacks. The considered agent-based attacks can cover malicious, Byzantine, and stubborn agents. The links between agents in the network can be blocked by DoS attacks, which may lead the digraph to be time-varying and even disconnected. The objective is to ensure that the remaining benign agents achieve consensus. To this end, an "auxiliary point"-based resilient control algorithm is proposed for MASs. Under the proposed algorithm, each healthy agent constructs a "safe kernel" utilizing the states of its in-neighbors and updates its state toward a specific point within this kernel at each iteration. If an agent cannot receive its neighbors' states owing to DoS attacks, it will use the states received immediately before the DoS period. Moreover, a resilient multi-dimensional distributed optimization (RMDO) algorithm is also proposed. Theoretical proofs and numerical examples are presented to demonstrate the effectiveness of the proposed algorithms.
Learning a Shape-adaptive Assist-as-needed Rehabilitation Policy from Therapist-informed Input
Therapist-in-the-loop robotic rehabilitation has shown great promise in enhancing rehabilitation outcomes by integrating the strengths of therapists and robotic systems. However, its broader adoption remains limited due to insufficient safe interaction and limited adaptation capability. This article proposes a novel telerobotics-mediated framework that enables therapists to intuitively and safely deliver assist-as-needed (AAN) therapy based on two primary contributions. First, our framework encodes the therapist-informed corrective force into via-points in a latent space, allowing the therapist to provide only minimal assistance while encouraging the patient to maintain their own motion preferences. Second, a shape-adaptive AAN rehabilitation policy is learned to partially and progressively deform the reference trajectory for movement therapy based on encoded patient motion preferences and therapist-informed via-points. The effectiveness of the proposed shape-adaptive AAN strategy was validated on a telerobotic rehabilitation system using two representative tasks. The results demonstrate its practicality for remote AAN therapy and its superiority over two state-of-the-art methods in reducing corrective force and improving movement smoothness.
Hierarchical Analysis and Control of Epidemic Spreading over Networks using Dissipativity and Mesh Stability
Analyzing and controlling spreading processes are challenging problems due to the involved non-linear node (subsystem) dynamics, unknown disturbances, complex interconnections, and the large-scale and multi-level nature of the problems. The dissipativity concept provides a practical framework for addressing such concerns, thanks to the energy-based representation it offers for subsystems and the compositional properties it provides for the analysis and control of interconnected (networked) systems comprised of such subsystems. Therefore, in this paper, we utilize the dissipativity concept to analyze and control a spreading process that occurs over a hierarchy of nodes, groups, and a network (i.e., a spreading network). We start by generalizing several existing results on dissipativity-based topology design for networked systems. Next, we model the considered spreading network as a networked system and establish the dissipativity properties of its nodes. The generalized topology design method is then applied at multiple levels of the considered spreading network to formulate its analysis and control problems as Linear Matrix Inequality (LMI) problems. We identify and enforce localized necessary conditions to support the feasibility of the LMI problem solved at each subsequent hierarchical level of the spreading network. Consequently, the proposed method does not involve iterative multi-level optimization stages that are computationally inefficient. The proposed control solution ensures that the spreading network is not only stable but also dissipative and mesh-stable. Compared to conventional methods, such as threshold pruning and high-degree edge removal, our approach offers superior performance in terms of infection containment, control efficiency, and disturbance robustness. Extensive numerical results demonstrate the effectiveness of the proposed technique.
comment: To be submitted to Automatica
Connecting the Equinoctial Elements and Rodrigues Parameters: A New Set of Elements
A geometric interpretation of the equinoctial elements is given with a connection to orthogonal rotations and attitude dynamics in Euclidean 3-space. An identification is made between the equinoctial elements and classic Rodrigues parameters. A new set of equinoctial elements is developed using the modified Rodrigues parameters, thereby removing the coordinate singularity for retrograde equatorial orbits present in previous versions of these elements. A low-thrust trajectory optimization problem is set up using the new elements to numerically verify convergence for the two-point boundary problem, as compared to their predecessors.
comment: formatting corrected for better readability
Optimal Assignment and Motion Control in Two-Class Continuum Swarms
We consider optimal swarm control problems where two different classes of agents are present. Continuum idealizations of large-scale swarms are used where the dynamics describe the evolution of the spatially-distributed densities of each agent class. The problem formulation we adopt is motivated by applications where agents of one class are assigned to agents of the other class, which we refer to as demand and resource agents respectively. Assignments have costs related to the distances between mutually assigned agents, and the overall cost of an assignment is quantified by a Wasserstein distance between the densities of the two agent classes. When agents can move, the assignment cost can decrease at the expense of a physical motion cost, and this tradeoff sets up a nonlinear infinite-dimensional optimal control problem. We show that in one spatial dimension, this problem can be converted to an infinite-dimensional, but decoupled, linear-quadratic (LQ) tracking problem when expressed in terms of the quantile functions of the respective agent densities. Solutions are given in the general one-dimensional case, as well as in the special cases of constant and periodically time-varying demands.
comment: Extended version including periodic-demand case. 13 pages, 7 figures
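The role of quantile functions in one dimension can be illustrated with empirical samples: for two equal-size point sets with uniform weights, sorting each set gives its empirical quantile function, the optimal assignment pairs points by rank, and the 2-Wasserstein distance is the root-mean-square gap between the sorted values. The function names below are hypothetical, and the sketch covers only this static, discrete special case rather than the paper's continuum optimal control problem.

```python
def optimal_assignment_1d(demand, resource):
    """Pair demand and resource positions by rank (optimal for convex costs in 1D)."""
    if len(demand) != len(resource):
        raise ValueError("this sketch assumes equal numbers of agents")
    return list(zip(sorted(demand), sorted(resource)))

def wasserstein2_1d(demand, resource):
    """2-Wasserstein distance between two equal-size empirical distributions on a line."""
    pairs = optimal_assignment_1d(demand, resource)
    return (sum((d - r) ** 2 for d, r in pairs) / len(pairs)) ** 0.5

# Example: the resource agents sit one unit to the right of the demand agents.
cost = wasserstein2_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0])  # -> 1.0
```

Because the distance depends only on the sorted values, it is a quadratic function of the difference between the two quantile functions, which is consistent with the linear-quadratic structure described in the abstract.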
Data-driven Model Predictive Control using MATLAB
This paper presents a comprehensive overview of data-driven model predictive control, highlighting state-of-the-art methodologies and their numerical implementation. The discussion begins with a brief review of conventional model predictive control (MPC), which discusses both linear MPC (LMPC) and nonlinear MPC (NMPC). This is followed by a section on data-driven LMPC, outlining fundamental concepts and the implementation of various approaches, including subspace predictive control and prediction error methods. Subsequently, the focus shifts to data-driven NMPC, emphasizing approaches based on neural network models. The paper concludes with a review of recent advancements in data-driven MPC and explores potential directions for future research.
comment: 22 pages, 8 figures
Systems and Control (EESS)
Robust reset control design for piezo-actuated nano-positioner in presence of hysteresis nonlinearity
In this paper, a robust nonlinear control scheme is designed for the motion control of a class of piezo-actuated nano-positioning systems using frequency-domain analysis. Hysteresis, the nonlinearity in the piezoelectric material, degrades the precision in tracking references with high-frequency content and different travel ranges. Hysteresis compensation by an inverse model, the state-of-the-art solution, is not reliable on its own. Therefore, a control framework that is robust against the remaining nonlinearity is needed. It is shown that there is an unavoidable limitation in robust linear control design that restricts the achievable performance. A robust control methodology based on a complex-order element is established to relax this limitation. Then, a constant-in-gain-lead-in-phase (CgLp) reset controller is utilized to realize the complex-order control. The control design is based on the sinusoidal input describing function (SIDF) and the higher-order SIDF (HOSIDF) tools. A constrained optimization problem is formulated to tune the control parameters. The improvements achieved by the CgLp control are validated in simulation.
Demystifying and Navigating AI Ethics in Power Electronics
Artificial intelligence (AI) is rapidly transforming power electronics, with AI-related publications in IEEE Power Electronics Society selected journals increasing more than fourfold from 2020 to 2025. However, the ethical dimensions of this transformation have received limited attention. This article underscores the urgent need for an ethical framework to guide responsible AI integration in power electronics, not only to prevent AI-related incidents but also to comply with legal and regulatory responsibilities. In this context, this article identifies four core pillars of AI ethics in power electronics: Security & Safety, Explainability & Transparency, Energy Sustainability, and Evolving Roles of Engineers. Each pillar is supported by practical and actionable insights to ensure that ethical principles are embedded in algorithm design, system deployment, and workforce development. The authors advocate for power electronics engineers to lead the ethical discourse, given their deep technical understanding of both AI systems and power conversion technologies. The paper concludes by calling on the IEEE Power Electronics Society to spearhead the establishment of ethical standards and best practices that ensure AI innovations are not only technically advanced but also trustworthy, safe, and sustainable.
Critical States Identification in Power System via Lattice Partition and Its Application in Reliability Assessment
With the increasing complexity of power systems, accurately identifying critical states (the states corresponding to minimal cut sets) and assessing system reliability have become crucial tasks. In this paper, a mathematical lattice structure is employed to represent and partition the state space of the power system. Based on this structure, a novel recursive method is proposed to efficiently identify critical states by leveraging lattice partitioning and Optimal Power Flow (OPF) calculations. This method not only enables the extension of failure system states, but also calculates the upper and lower bounds of the Loss of Load Probability (LOLP) in a progressively converging manner. Compared to traditional reliability assessment methods such as State Enumeration (SE) and Monte Carlo Simulation (MCS), this approach offers greater accuracy and efficiency. Experiments conducted on the RBTS and RTS79 systems demonstrate that the proposed method accurately identifies all critical states up to a preset order, which are high-risk states. The contribution of these critical states to LOLP highlights their significance in the system. Moreover, the proposed method achieves the analytical value with significantly fewer OPF calculations in the RBTS system, reaching acceptable precision of LOLP up to 100 times faster than SE in both the RBTS and RTS systems.
Grid-forming Control of Converter Infinite Bus System: Modeling by Data-driven Methods
This study explores data-driven modeling techniques to capture the dynamics of a grid-forming converter-based infinite bus system, critical for renewable-integrated power grids. Using sparse identification of nonlinear dynamics and deep symbolic regression, models were generated from synthetic data simulating key disturbances in active power, reactive power, and voltage references. Deep symbolic regression demonstrated higher accuracy in capturing complex system dynamics, though it required substantially more computational time than sparse identification of nonlinear dynamics. These findings suggest that while deep symbolic regression offers high fidelity, sparse identification of nonlinear dynamics provides a more computationally efficient approach, balancing accuracy and runtime for real-time grid applications.
3C Resources Joint Allocation for Time-Deterministic Remote Sensing Image Backhaul in the Space-Ground Integrated Network
Using low-Earth-orbit (LEO) satellites to help observation satellites (OSs) compress and backhaul more time-determined images (TDI) has become a new paradigm for alleviating the timeouts caused by the limited computing resources of OSs. However, capturing the time-varying and dynamic characteristics of multi-dimensional resources is challenging for efficient collaborative scheduling. Motivated by this, we design a highly succinct multi-dimensional resource time-expanded graph (MDR-TEG) model. Specifically, by employing a slot division mechanism and introducing an external virtual node, the time-varying communication, caching, and computing (3C) resources are depicted with low complexity by the link weights within, between, and outside the slots. Based on the MDR-TEG, the problem of maximizing the successful transmission ratio of TDI (MSTR-TDI) is modeled as a mixed integer linear programming (MILP) problem, which is further relaxed and decomposed into two tractable sub-problems: maximizing the successful transmission rate of images (MSTRI) and the ensuring-timeliness problem (ETP). Subsequently, an efficient subgradient of relaxation computing constraint (SRCC) algorithm is proposed. The upper and lower bounds of MSTR-TDI are obtained by solving the two sub-problems and the dual problem (DP), and the direction of the next iteration is obtained by feedback. Furthermore, the sending sequences of images are arranged to improve the quality of the solution. An approximately optimal solution of MSTR-TDI is eventually obtained through repeated iterations. The simulation results verify the superiority of the proposed MDR-TEG model and the effectiveness of the SRCC algorithm.
Task-Level Insights from Eigenvalues across Sequence Models
Although softmax attention drives state-of-the-art performance for sequence models, its quadratic complexity limits scalability, motivating linear alternatives such as state space models (SSMs). While these alternatives improve efficiency, their fundamental differences in information processing remain poorly understood. In this work, we leverage the recently proposed dynamical systems framework to represent softmax, norm and linear attention as dynamical systems, enabling a structured comparison with SSMs by analyzing their respective eigenvalue spectra. Since eigenvalues capture essential aspects of dynamical system behavior, we conduct an extensive empirical analysis across diverse sequence models and benchmarks. We first show that eigenvalues influence essential aspects of memory and long-range dependency modeling, revealing spectral signatures that align with task requirements. Building on these insights, we then investigate how architectural modifications in sequence models impact both eigenvalue spectra and task performance. This correspondence further strengthens the position of eigenvalue analysis as a principled metric for interpreting, understanding, and ultimately improving the capabilities of sequence models.
MPA-DNN: Projection-Aware Unsupervised Learning for Multi-period DC-OPF
Ensuring both feasibility and efficiency in optimal power flow (OPF) operations has become increasingly important in modern power systems with high penetrations of renewable energy and energy storage. While deep neural networks (DNNs) have emerged as promising fast surrogates for OPF solvers, they often fail to satisfy critical operational constraints, especially those involving inter-temporal coupling, such as generator ramping limits and energy storage operations. To deal with these issues, we propose a Multi-Period Projection-Aware Deep Neural Network (MPA-DNN) that incorporates a projection layer for multi-period dispatch into the network. By doing so, our model enforces physical feasibility through the projection, enabling end-to-end learning of constraint-compliant dispatch trajectories without relying on labeled data. Experimental results demonstrate that the proposed method achieves near-optimal performance while strictly satisfying all constraints in varying load conditions.
Data-Driven Control Of Power Converters
The fundamental role of power converters is to efficiently manage and control the flow of electrical energy, ensuring compatibility between power sources and loads. All applications of power converters require the design of an appropriate control law. Control of power converters is a challenging problem due to the presence of switching devices, which are difficult to handle using traditional control approaches. The objective of this paper is to investigate the use of data-driven techniques, in particular the Virtual Reference Feedback Tuning (VRFT) method, in the context of power converter feedback control. This study considers a buck-mode power converter circuit provided by the OwnTech foundation.
comment: Conference paper for the French national electrical engineering symposium, SGE 2025
Weighting Factors Tuning by Direct Feedback in Predictive Control of Multiphase Motors
Predictive Stator Current Control (PSCC) has been proposed for control of multi-phase drives. The flexibility offered by the use of a Cost Function has been used to deal with the increased number of phases. However, tuning of the Weighting Factors constitutes a problem. Intensive trial and error tests are usual in this context. Existing on-line selection methods, on the other hand, require large amounts of data and/or complex optimization procedures. The proposal of this paper is a closed-loop scheme that links Weighting Factors to performance indicators. In this way, optimal Weighting Factors are determined for each operating point. Also, changes in reference values for performance indicators are easily tackled. Unlike previous methods, the proposal carries very little computational burden. A case study is developed for a five-phase induction motor and assessed with real experimentation on a laboratory set-up.
Safety Analysis of eVTOL Operations based on STPA
Electric Vertical Take-Off and Landing (eVTOL) aircraft are expected to be quieter and more cost-effective than helicopters, offering major economic and social benefits through improved connectivity. Their adoption will require new ground infrastructure and airspace redesign, introducing risks involving multiple stakeholders (Regulators, eVTOL operators, Air navigation service providers, Vertiport operators, OEMs, Pilots, etc.). To assess these risks for the UK airspace, systems-thinking based System Theoretic Process Analysis (STPA) was conducted. To manage the large number of Unsafe Control Actions (UCAs) and requirements generated due to the complexity of the analysis, a novel extension to STPA for the prioritization of results was applied. 317 UCAs were identified in total, out of which 110 high-priority UCAs were analyzed (Step-4), resulting in 377 causal factors and 432 requirements. These were prioritized to produce a targeted list of 124 distinct high-priority requirements, 56 of which were identified as gaps in existing aviation regulations, policies, or procedures. These highlight opportunities for regulatory updates in areas such as organizational performance, certification processes, training, collision avoidance, energy management, and automation. The findings provide regulators with safety considerations that could shape new or updated regulations, compliance methods, and guidance materials for the safe deployment of eVTOLs.
Single vs Multi Vector Predictive Control of Five-phase Drives
The field of Finite State Model Predictive Control (FSMPC) for multiphase drives has produced many contributions. Many variants of FSMPC exist, each aiming at some aspect such as complexity of the cost function, switching frequency, etc. Despite past efforts to compare different techniques, the field has not yet reached consensus regarding the relative merits of each one. This paper presents a new method to compare FSMPC variants. The method is based on analyzing the modulation, implicit or explicit, used by each variant. In the paper the method is used to compare single-vector state-of-the-art FSMPC with a multi-vector variant designed to cancel xy currents and simplify the cost function. The results show the strengths and weaknesses of each technique. Also, it is found that the trade-offs between figures, previously thought to concern just individual regimes, extend to the whole operating space and can also be pinpointed to each FSMPC variant. Finally, it is shown that the flexibility of the single-vector approach and its better DC-link usage make it, arguably, superior to the multi-vector variant.
Robust Adaptive Boundary Control of a Thermal Process with Thermoelectric Actuators: Theory and Experimental Validation
A sliding-mode-based adaptive boundary control law is proposed for a class of uncertain thermal reaction-diffusion processes subject to matched disturbances. The disturbances are assumed to be bounded, but the corresponding bounds are unknown, thus motivating the use of adaptive control strategies. A boundary control law comprising a proportional and discontinuous term is proposed, wherein the magnitude of the discontinuous relay term is adjusted via a gradient-based adaptation algorithm. Depending on how the adaptation algorithm is parameterized, the adaptive gain can be either a nondecreasing function of time (monodirectional adaptation) or it can both increase and decrease (bidirectional adaptation). The convergence and stability properties of these two solutions are investigated by Lyapunov analyses, and two distinct stability results are derived, namely, asymptotic stability for the monodirectional adaptation and globally uniformly ultimately bounded solutions for the bidirectional adaptation. The proposed algorithms are then specified to address the control problem of stabilizing a desired temperature profile in a metal beam equipped with thermoelectric boundary actuators. Experiments are conducted to investigate the real-world performance of the proposed sliding-mode-based adaptive control, with a particular focus on comparing the monodirectional and bidirectional adaptation laws.
comment: Extended version of the preprint submitted to the journal Automatica
Antenna's Performance in Microwave Imaging of Stratified Media
Numerous types of antennas have been employed for microwave imaging of stratified media for ground penetrating radar (GPR), through-the-wall-radar imaging (TWRI), etc. This letter aims to investigate the impact of the different antennas with their characteristics on the image reconstruction of those media. Hence, three types of antennas, including horn antennas, open waveguide and Vivaldi antennas, are chosen as almost directional antennas, operating at X-band 8-12 GHz. The antenna's far-field and near-field characteristics are analyzed. A diffraction tomography (DT)-based algorithm is used to reconstruct the target location within the stratified media using monostatic and multistatic data. It is observed that the more directional antennas provide a better-reconstructed image with less shadowing image of the stratified media.
Sensing, Detection and Localization for Low Altitude UAV: A RF-Based Framework via Multiple BSs Collaboration
The rapid growth of the low-altitude economy has resulted in a significant increase in the number of low, slow, and small (LSS) unmanned aerial vehicles (UAVs), raising critical challenges for secure airspace management and reliable trajectory planning. To address this, this paper proposes a cooperative radio-frequency (RF) detection and localization framework that leverages existing cellular base stations (BSs). The proposed approach features a robust scheme for LSS target identification, integrating a cell averaging-constant false alarm rate (CA-CFAR) detector with a micro-Doppler signature (MDS) based recognition method. Multi-station measurements are fused through a grid-based probabilistic algorithm combined with clustering techniques, effectively mitigating ghost targets and improving localization accuracy in multi-UAV scenarios. Furthermore, the Cramer-Rao lower bound (CRLB) is derived as a performance benchmark and reinforcement learning (RL)-based optimization is employed to balance localization accuracy against station resource usage. Simulations demonstrate that increasing from one to multiple BSs reduces the positioning error to near the CRLB, while practical experiments further verify the framework's effectiveness. Furthermore, our RL-based optimization can find solutions that maintain high accuracy while minimizing resource usage, highlighting its potential as a scalable solution for ensuring airspace safety in the emerging low-altitude economy.
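The CA-CFAR detection stage named in the abstract above can be sketched in one dimension as follows. This is only an illustration, not the authors' implementation: the function name `ca_cfar`, the window sizes, and the false-alarm probability are assumptions, and the threshold multiplier is the textbook expression for exponentially distributed noise power.

```python
import numpy as np

def ca_cfar(power, num_train=8, num_guard=2, pfa=1e-3):
    """Cell-averaging CFAR over a 1-D vector of power samples.

    num_train and num_guard are counted per side of the cell under test.
    All default values are illustrative assumptions.
    """
    power = np.asarray(power, dtype=float)
    n = power.size
    n_train_total = 2 * num_train
    # Textbook CA-CFAR scaling factor for exponentially distributed noise power.
    alpha = n_train_total * (pfa ** (-1.0 / n_train_total) - 1.0)
    detections = np.zeros(n, dtype=bool)
    half = num_train + num_guard
    # Edge cells without a full window on both sides are left undetected.
    for i in range(half, n - half):
        lead = power[i - half : i - num_guard]          # training cells before
        lag = power[i + num_guard + 1 : i + half + 1]   # training cells after
        noise = (lead.sum() + lag.sum()) / n_train_total
        detections[i] = power[i] > alpha * noise
    return detections

# Flat unit noise floor with one strong return (made-up numbers).
profile = np.ones(64)
profile[32] = 100.0
hits = np.flatnonzero(ca_cfar(profile))
```

The threshold adapts to the local noise estimate, so a strong cell also raises the threshold for its neighbours; that masking effect is one reason guard cells are used.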
MAKO: Meta-Adaptive Koopman Operators for Learning-based Model Predictive Control of Parametrically Uncertain Nonlinear Systems
In this work, we propose a meta-learning-based Koopman modeling and predictive control approach for nonlinear systems with parametric uncertainties. An adaptive deep meta-learning-based modeling approach, called Meta Adaptive Koopman Operator (MAKO), is proposed. Without knowledge of the parametric uncertainty, the proposed MAKO approach can learn a meta-model from a multi-modal dataset and efficiently adapt to new systems with previously unseen parameter settings by using online data. Based on the learned meta Koopman model, a predictive control scheme is developed, and the stability of the closed-loop system is ensured even in the presence of previously unseen parameter settings. Through extensive simulations, our proposed approach demonstrates superior performance in both modeling accuracy and control efficacy as compared to competitive baselines.
Trust Modeling and Estimation in Human-Autonomy Interactions
Advances in the control of autonomous systems have accompanied an expansion in the potential applications for autonomous robotic systems. The success of applications involving humans depends on the quality of interaction between the autonomous system and the human supervisor, which is particularly affected by the degree of trust that the supervisor places in the autonomous system. Absent from the literature are models of supervisor trust dynamics that can accommodate asymmetric responses to autonomous system performance and the intermittent nature of supervisor-autonomous system communication. This paper focuses on formulating an estimated model of supervisor trust that incorporates both of these features by employing a switched linear system structure with event-triggered sampling of the model input and output. Trust response data collected in a user study with 51 participants were then used to identify parameters for a switched linear model-based observer of supervisor trust.
comment: 10 pages, 13 figures
Traffic-Aware Eco-Driving Control in CAVs via Learning-based Terminal Cost Model
Connected and Automated Vehicles (CAVs) offer significant potential for improving energy efficiency and lowering vehicle emissions through eco-driving technologies. Control algorithms in CAVs leverage look-ahead route information and Vehicle-to-Everything (V2X) communication to optimize vehicle performance. However, existing eco-driving strategies often neglect macroscopic traffic effects, such as upstream traffic jams, that occur outside the optimization horizon but significantly impact vehicle energy efficiency. This work presents a novel Neural Network (NN)-based methodology to approximate the terminal cost within a model predictive control (MPC) problem framework, explicitly incorporating upstream traffic dynamics. By incorporating traffic jams into the optimization process, the proposed traffic-aware approach yields more energy-efficient speed trajectories compared to traffic-agnostic methods, with minimal impact on travel time. The framework is scalable for real-time implementation while effectively addressing uncertainties from dynamic traffic conditions and macroscopic traffic events.
Direct Data-Driven Predictive Control for a Three-dimensional Cable-Driven Soft Robotic Arm
Soft robots offer significant advantages in safety and adaptability, yet achieving precise and dynamic control remains a major challenge due to their inherently complex and nonlinear dynamics. Recently, Data-enabled Predictive Control (DeePC) has emerged as a promising model-free approach that bypasses explicit system identification by directly leveraging input-output data. While DeePC has shown success in other domains, its application to soft robots remains underexplored, particularly for three-dimensional (3D) soft robotic systems. This paper addresses this gap by developing and experimentally validating an effective DeePC framework on a 3D, cable-driven soft arm. Specifically, we design and fabricate a soft robotic arm with a thick tubing backbone for stability, a dense silicone body with large cavities for strength and flexibility, and rigid endcaps for secure termination. Using this platform, we implement DeePC with singular value decomposition (SVD)-based dimension reduction for two key control tasks: fixed-point regulation and trajectory tracking in 3D space. Comparative experiments with a baseline model-based controller demonstrate DeePC's superior accuracy, robustness, and adaptability, highlighting its potential as a practical solution for dynamic control of soft robots.
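The data structure at the core of DeePC, as described above, is a block Hankel matrix built directly from recorded input-output samples. The sketch below is an illustration only: the helper names `block_hankel` and `reduce_columns` are hypothetical, and truncating the SVD to a fixed rank is one plausible reading of "SVD-based dimension reduction", not necessarily the authors' exact procedure.

```python
import numpy as np

def block_hankel(signal, depth):
    """Stack length-`depth` windows of a (T, m) signal as columns.

    Returns an array of shape (depth * m, T - depth + 1).
    """
    signal = np.asarray(signal, dtype=float)
    if signal.ndim == 1:
        signal = signal[:, None]  # treat a 1-D record as a single channel
    num_samples = signal.shape[0]
    num_cols = num_samples - depth + 1
    if num_cols < 1:
        raise ValueError("depth exceeds the number of samples")
    columns = [signal[j : j + depth].reshape(-1) for j in range(num_cols)]
    return np.column_stack(columns)

def reduce_columns(hankel, rank):
    """Keep the `rank` dominant directions of the column space (assumed scheme)."""
    u, s, _ = np.linalg.svd(hankel, full_matrices=False)
    return u[:, :rank] * s[:rank]

# Toy single-channel record (made-up data).
H = block_hankel(np.arange(6.0), depth=3)   # shape (3, 4), rank 2
H_reduced = reduce_columns(H, rank=2)       # shape (3, 2), same column space
```

Replacing the full matrix with the reduced one shrinks the number of decision variables in the predictive-control problem from the number of data columns to the chosen rank.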
A pilot cohort study of a microfluidic-based point-of-care bilirubin measurement system
Objective: The concentration of bilirubin in blood or serum is useful for assessing liver function as well as monitoring treatment. This study evaluates the clinical performance of a novel point-of-care (PoC) device for the detection of bilirubin in serum. The PoC device incorporates an integrated miniature optoelectronic sensing module and a microfluidic test cartridge. Methods: Patients' serum total bilirubin concentrations, ranging from 2 μmol/L to 480 μmol/L, were measured using the PoC device and the standard laboratory method (n=20). Bland-Altman analysis and regression analysis using the Passing-Bablok method were used to benchmark the PoC device against the standard laboratory measurements. The diagnostic capability of the PoC device in categorising the serum samples within clinically relevant bilirubin concentration thresholds of 200, 300, and 450 μmol/L was assessed using receiver operating characteristic (ROC) analysis. Results: The mean difference between the PoC device and the standard laboratory method was -5.6 μmol/L, with a 95% confidence interval (CI) of -45.1 μmol/L to 33.9 μmol/L. The coefficient of determination (R²) was 0.986. The PoC device achieved a detection sensitivity of 90% and specificity of 97% in categorising bilirubin concentrations within bands used in clinical decision-making. Conclusions: This study demonstrates that the proposed PoC device is capable of measuring bilirubin levels in patient samples with clinically acceptable accuracy.
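For readers unfamiliar with the Bland-Altman analysis mentioned above, a minimal sketch of the usual computation follows. The function name `bland_altman` and the paired readings are hypothetical, and whether the interval reported in the abstract was obtained exactly this way (mean difference plus or minus 1.96 sample standard deviations) is an assumption.

```python
import numpy as np

def bland_altman(device, reference):
    """Return (bias, lower limit, upper limit) of agreement for paired readings."""
    device = np.asarray(device, dtype=float)
    reference = np.asarray(reference, dtype=float)
    diff = device - reference
    bias = diff.mean()
    # Conventional 95% limits of agreement use the sample standard deviation.
    spread = 1.96 * diff.std(ddof=1)
    return bias, bias - spread, bias + spread

# Made-up paired readings in umol/L, not data from the study.
poc_readings = [10.0, 20.0, 30.0, 40.0]
lab_readings = [12.0, 18.0, 33.0, 41.0]
bias, lower, upper = bland_altman(poc_readings, lab_readings)
```

A negative bias means the device reads lower than the reference on average; the width of the limits indicates how far an individual reading may deviate.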
Cognitive Radio for Asymmetric Cellular Downlink with Multi-User MIMO
Cognitive radio (CR) is an important technique for improving spectral efficiency, letting a secondary system operate in a wireless spectrum when the primary system does not make use of it. While it has been widely explored over the past 25 years, many common assumptions are not aligned with the realities of 5G networks. In this paper, we consider the CR problem for the following setup: (i) infrastructure-based systems, where downlink transmissions might occur to receivers whose positions are not, or not exactly, known; (ii) multi-beam antennas at both primary and secondary base stations. We formulate a detailed protocol to determine when secondary transmissions into different beam directions can interfere with primary users at potential locations and create probability-based interference rules. We then analyze the "catastrophic interference" probability and the "missed transmission opportunity" probability, as well as the achievable throughput, as a function of the transmit powers of the primary and secondary base stations and the sensing window of the secondary base station. Results can serve to more realistically assess the spectral efficiency gains in 5G infrastructure-based cognitive systems.
Observation Matrix Design for Densifying MIMO Channel Estimation via 2D Ice Filling
In recent years, densifying multiple-input multiple-output (MIMO) has attracted much attention from the communication community. Thanks to the subwavelength antenna spacing, the strong correlations among densifying antennas provide sufficient prior knowledge about channel state information (CSI). This inspires the careful design of observation matrices (e.g., transmit precoders and receive combiners), that exploits the CSI prior knowledge, to boost channel estimation performance. Aligned with this vision, this work proposes to jointly design the combiners and precoders by maximizing the mutual information between the received pilots and densifying MIMO channels. A two-dimensional ice-filling (2DIF) algorithm is proposed to efficiently accomplish this objective. The algorithm is motivated by the fact that the eigenspace of MIMO channel covariance can be decoupled into two sub-eigenspaces, which are associated with the correlations of transmitter antennas and receiver antennas, respectively. By properly setting the precoder and the combiner as the eigenvectors from these two sub-eigenspaces, the 2DIF promises to generate near-optimal observation matrices. Moreover, we further extend the 2DIF method to the popular hybrid combining systems, where a two-stage 2DIF (TS-2DIF) algorithm is developed to handle the analog combining circuits realized by phase shifters. Simulation results demonstrate that, compared to the state-of-the-art schemes, the proposed 2DIF and TS-2DIF methods can achieve superior channel estimation accuracy.
comment: 17 pages, 8 figures
Computing Safe Control Inputs using Discrete-Time Matrix Control Barrier Functions via Convex Optimization
Control barrier functions (CBFs) have seen widespread success in providing forward invariance and safety guarantees for dynamical control systems. A crucial limitation of discrete-time formulations is that CBFs that are nonconcave in their argument require the solution of nonconvex optimization problems to compute safety-preserving control inputs, which inhibits real-time computation of control inputs guaranteeing forward invariance. This paper presents a novel method for computing safety-preserving control inputs for discrete-time systems with nonconvex safety sets, utilizing convex optimization and the recently developed class of matrix control barrier function techniques. The efficacy of our methods is demonstrated through numerical simulations on a bicopter system.
comment: 17 pages, 8 figures
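The convexity issue raised in this abstract can be made concrete with the standard scalar discrete-time CBF condition. The notation below is generic rather than the paper's, control-affine dynamics are an assumption, and the matrix-valued generalization is not shown.

```latex
\begin{align*}
  % Safe set and (assumed) control-affine discrete-time dynamics
  \mathcal{C} &= \{x : h(x) \ge 0\}, \qquad x_{k+1} = f(x_k) + g(x_k)\,u_k \\
  % Discrete-time CBF condition with decay rate 0 < \gamma \le 1
  h(x_{k+1}) &\ge (1-\gamma)\,h(x_k) \\
  % Substituting the dynamics gives a constraint on the input u_k
  h\bigl(f(x_k) + g(x_k)\,u_k\bigr) &\ge (1-\gamma)\,h(x_k) \\
  % Forward invariance by induction when h(x_0) \ge 0
  h(x_k) &\ge (1-\gamma)^k\,h(x_0) \ge 0
\end{align*}
```

The argument of h in the third line is affine in the input, so the admissible inputs form a superlevel set of h composed with an affine map. That set is convex when h is concave and is generally nonconvex otherwise, which is why nonconcave barrier functions lead to nonconvex programs.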
Cyber-Physical Systems on the Megawatt Scale: The impact of battery control on grid frequency stability
Electric power systems are undergoing fundamental change. The shift to inverter-based generation challenges frequency stability, while growing digitalisation heightens vulnerability to errors and attacks. Here we identify an emerging risk at the intersection of cyber-physical coupling and control system design. We show that grid frequency time series worldwide exhibit a persistent one-minute oscillatory pattern, whose origin has remained largely unexplained. We trace this pattern back to the energy management systems of battery electric storage systems and demonstrate that the pattern amplitude has increased substantially in the Nordic and British grids. We argue that this effect is a potential burden for stability in future grids with low inertia and an increasing penetration with batteries and smart devices, though it can be mitigated by a revision of battery control algorithms.
comment: 19 pages, 23 figures
Latent-Feature-Informed Neural ODE Modeling for Lightweight Stability Evaluation of Black-box Grid-Tied Inverters
Stability evaluation of black-box grid-tied inverters is vital for grid reliability, yet identification techniques are both data-hungry and blocked by proprietary internals. To solve this, this letter proposes a latent-feature-informed neural ordinary differential equation (LFI-NODE) modeling method that can achieve lightweight stability evaluation directly from trajectory data. LFI-NODE parameterizes the entire system ODE with a single continuous-time neural network, allowing each new sample to refine a unified global model. It faithfully captures nonlinear large-signal dynamics to preserve uniform predictive accuracy as the inverter transitions between operating points. Meanwhile, latent perturbation features distilled from every trajectory steer the learning process and concurrently reveal the small-signal eigenstructure essential for rigorous stability analysis. Validated on a grid-forming inverter, the LFI-NODE requires one to two orders of magnitude fewer training samples compared with traditional methods, collected from short time-domain trajectories instead of extensive frequency-domain measurements. Furthermore, the LFI-NODE requires only 48 short transients to achieve a trajectory prediction error at the hundredth level and an eigenvalue estimation error at the tenth level, outperforming benchmark methods by one to two orders of magnitude. This makes LFI-NODE a practical and lightweight approach for achieving high-fidelity stability assessment of complex black-box power-electronic systems.
comment: 6 pages, 8 figures
Designing Control Barrier Functions Using a Dynamic Backup Policy
This paper presents a systematic approach to construct control barrier functions for nonlinear control affine systems subject to arbitrary state and input constraints. Taking inspiration from the reference governor literature, the proposed method defines a family of backup policies, parametrized by the equilibrium manifold of the system. The control barrier function is defined on the augmented state-and-reference space: given a state-reference pair, the approach quantifies the distance to constraint violation at any time in the future, should the current backup policy reference remain constant. Sensitivity analysis is then used to compute the (possibly nonsmooth) Jacobian with respect to the augmented state vector. To showcase its simple yet general nature, the proposed method is applied to an inverted pendulum on cart.
comment: 7 pages, 1 figure
Science ouverte et collaborative pour l'élaboration d'un banc automatisé de caractérisation de pertes en commutation par opposition
The switching losses of power transistors are generally measured using the so-called double pulse method. Measuring the opposition of two switching cells is a complementary method that is more accurate but indirect. However, implementing this method can be more complex and requires calibration steps and comprehensive control, with the added issue of thermal management. In this context, we proposed to address this topic through open and collaborative science, first in the form of a two-day hackathon, followed by monthly open sessions. More than 20 participants contributed to the two-day hackathon, followed by monthly sessions for those wishing to continue working together. This enabled us to set up an automated bench, in open science, including the generation of switching commands, the configuration and control of measuring instruments, and the hardware part. Here we present and share our work and this open approach.
comment: Paper in French, presented at the French national electrical engineering conference SGE 2025
Robustness Analysis for Quantum Systems Controlled by Continuous-Time Pulses
Differential sensitivity techniques originally developed to study the robustness of energy landscape controllers are generalized to the important case of closed quantum systems subject to continuously varying controls. Vanishing sensitivity to parameter variation is shown to coincide with perfect fidelity, as was the case for time-invariant controls. Upper bounds on the magnitude of the differential sensitivity to any parameter variation are derived based simply on knowledge of the system Hamiltonian and the maximum size of the control inputs.
comment: 6 pages, 2 figures
Safe and Optimal N-Spacecraft Swarm Reconfiguration in Non-Keplerian Cislunar Orbits
This paper presents a novel fuel-optimal guidance and control methodology for spacecraft swarm reconfiguration in Restricted Multi-Body Problems (RMBPs) with a guarantee of passive safety, maintaining miss distance even under abrupt loss of control authority. A new set of constraints exploits a quasi-periodic structure of RMBPs to guarantee passive safety. Particularly, the condition for passive safety is expressed as simple geometric constraints by solving optimal control in Local Toroidal Coordinates, which is based on a local eigenspace of a quasi-periodic motion around the corresponding periodic orbit. The proposed formulation enables a significant simplification of problem structure, which is applicable to large-scale swarm reconfiguration in cislunar orbits. The method is demonstrated in the Circular Restricted Three-Body Problem, the Elliptic Restricted Three-Body Problem, and the Bi-Circular Restricted Four-Body Problem. Furthermore, the optimized control profiles are validated in the full-ephemeris dynamics model. By extending and generalizing well-known concepts of relative orbital elements within the restricted two-body problem to the three- and four-body problems, this paper lays the foundation for practical control schemes of relative motion in cislunar space.
comment: 41 pages, 19 figures. Submitted and accepted to Journal of Guidance, Control, and Dynamics
A Digital Twin for Diesel Engines: Operator-infused Physics-Informed Neural Networks with Transfer Learning for Engine Health Monitoring
Improving diesel engine efficiency, reducing emissions, and enabling robust health monitoring have been critical research topics in engine modelling. While recent advancements in the use of neural networks for system monitoring have shown promising results, such methods often focus on component-level analysis and lack generalizability and physical interpretability. In this study, we propose a novel hybrid framework that combines physics-informed neural networks (PINNs) with deep operator networks (DeepONet) to enable accurate and computationally efficient parameter identification in mean-value diesel engine models. Our method leverages physics-based system knowledge in combination with data-driven training of neural networks to enhance model applicability. Incorporating offline-trained DeepONets to predict actuator dynamics significantly lowers the online computation cost when compared to the existing PINN framework. To address the re-training burden typical of PINNs under varying input conditions, we propose two transfer learning (TL) strategies: (i) a multi-stage TL scheme offering better runtime efficiency than full online training of the PINN model and (ii) a few-shot TL scheme that freezes a shared multi-head network body and computes physics-based derivatives required for model training outside the training loop. The second strategy provides a computationally inexpensive and physics-based approach for predicting engine dynamics and parameter identification, offering computational efficiency over the existing PINN framework. Compared to existing health monitoring methods, our framework combines the interpretability of physics-based models with the flexibility of deep learning, offering substantial gains in generalization, accuracy, and deployment efficiency for diesel engine diagnostics.
SwarmGPT: Combining Large Language Models with Safe Motion Planning for Drone Swarm Choreography
Drone swarm performances -- synchronized, expressive aerial displays set to music -- have emerged as a captivating application of modern robotics. Yet designing smooth, safe choreographies remains a complex task requiring expert knowledge. We present SwarmGPT, a language-based choreographer that leverages the reasoning power of large language models (LLMs) to streamline drone performance design. The LLM is augmented by a safety filter that ensures deployability by making minimal corrections when safety or feasibility constraints are violated. By decoupling high-level choreographic design from low-level motion planning, our system enables non-experts to iteratively refine choreographies using natural language without worrying about collisions or actuator limits. We validate our approach through simulations with swarms up to 200 drones and real-world experiments with up to 20 drones performing choreographies to diverse types of songs, demonstrating scalable, synchronized, and safe performances. Beyond entertainment, this work offers a blueprint for integrating foundation models into safety-critical swarm robotics applications.
comment: Accepted at RA-L 2025
Extending First-order Robotic Motion Planners to Second-order Robot Dynamics
This paper extends first-order motion planners to robots governed by second-order dynamics. Two control schemes are proposed based on the knowledge of a scalar function whose negative gradient aligns with a given first-order motion planner. When such a function is known, the first-order motion planner is combined with a damping velocity vector with a dynamic gain to extend the safety and convergence guarantees of the first-order motion planner to second-order systems. If no such function is available, we propose an alternative control scheme ensuring that the error between the robot's velocity and the first-order motion planner converges to zero. The theoretical developments are supported by simulation results demonstrating the effectiveness of the proposed approaches.
comment: 14 pages, 10 figures
Direction Estimation of Sound Sources Using Microphone Arrays and Signal Strength ICSE
Sound-tracking refers to the process of determining the direction from which a sound originates, making it a fundamental component of sound source localization. This capability is essential in a variety of applications, including security systems, acoustic monitoring, and speaker tracking, where accurately identifying the direction of a sound source enables real-time responses, efficient resource allocation, and improved situational awareness. While sound-tracking is closely related to localization, it specifically focuses on identifying the direction of the sound source rather than estimating its exact position in space. Despite its utility, sound-tracking systems face several challenges, such as maintaining directional accuracy and precision, along with the need for sophisticated hardware configurations and complex signal processing algorithms. This paper presents a sound-tracking method using three electret microphones. We estimate the direction of a sound source using a lightweight method that analyzes signals from three strategically placed microphones. By comparing the average power of the received signals, the system infers the most probable direction of the sound. The results indicate that the power level from each microphone effectively determines the sound source direction. Our system employs a straightforward and cost-effective hardware design, ensuring simplicity and affordability in implementation. It achieves a localization error of less than 6 degrees and a precision of 98%. Additionally, its effortless integration with various systems makes it versatile and adaptable. Consequently, this technique presents a robust and reliable solution for sound-tracking and localization, with potential applications spanning diverse domains such as security systems, smart homes, and acoustic monitoring.
comment: Accepted to the 32nd International Conference on Systems Engineering (ICSEng'2025)
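One simple way to turn per-microphone average power into a bearing, in the spirit of the abstract above, is a power-weighted vector sum. This sketch is an assumption throughout: the abstract does not give the microphone geometry or the exact inference rule, so the 0/120/240 degree layout, the name `estimate_direction`, and the weighting scheme are all hypothetical.

```python
import numpy as np

# Assumed layout: three microphones facing 0, 120 and 240 degrees.
MIC_ANGLES_DEG = np.array([0.0, 120.0, 240.0])

def estimate_direction(signals):
    """Estimate a bearing in degrees from a (3, N) array of microphone samples."""
    signals = np.asarray(signals, dtype=float)
    # Average power of each channel over the analysis window.
    power = np.mean(signals ** 2, axis=1)
    angles = np.radians(MIC_ANGLES_DEG)
    # Sum the microphone direction vectors weighted by their average power.
    x = np.sum(power * np.cos(angles))
    y = np.sum(power * np.sin(angles))
    return float(np.degrees(np.arctan2(y, x)) % 360.0)

# Equal power on the first two microphones and silence on the third
# (made-up samples) yields a bearing midway between them.
window = [[1, -1, 1, -1], [1, -1, 1, -1], [0, 0, 0, 0]]
bearing = estimate_direction(window)
```

When all three channels carry equal power the weighted sum vanishes and the bearing is undefined, so a practical system would need a minimum-contrast check before reporting a direction.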
Optimization via a Control-Centric Framework
Optimization plays a central role in intelligent systems and cyber-physical technologies, where speed and reliability of convergence directly impact performance. In control theory, optimization-centric methods are standard: controllers are designed by repeatedly solving optimization problems, as in linear quadratic regulation, $H_\infty$ control, and model predictive control. In contrast, this paper develops a control-centric framework for optimization itself, where algorithms are constructed directly from Lyapunov stability principles rather than being proposed first and analyzed afterward. A key element is the stationarity vector, which encodes first-order optimality conditions and enables Lyapunov-based convergence analysis. By pairing a Lyapunov function with a selectable decay law, we obtain continuous-time dynamics with guaranteed exponential, finite-time, fixed-time, or prescribed-time convergence. Within this framework, we introduce three feedback realizations of increasing restrictiveness: the Hessian-gradient, Newton, and gradient dynamics. Each realization shapes the decay of the stationarity vector to achieve the desired rate. These constructions unify unconstrained optimization, extend naturally to constrained problems via Lyapunov-consistent primal-dual dynamics, and broaden the results for minimax and generalized Nash equilibrium seeking problems beyond exponential stability. The framework provides systematic design tools for optimization algorithms in control and game-theoretic problems.
comment: This work has been submitted to the IEEE for possible publication. 12 pages, 3 figures
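One way to picture the "Lyapunov function plus decay law" construction in the abstract above is the classical continuous-time Newton flow. With V = (1/2) g(x)^2 and g the gradient, choosing x' = -rate * g(x) / H(x) gives g' = -rate * g, so the gradient decays exponentially. The sketch below checks this numerically on a hypothetical scalar objective; the objective, the rate, and the forward-Euler integrator are illustrative assumptions and are not taken from the paper.

```python
import math

def grad(x):
    """Gradient of the hypothetical objective f(x) = x**4 / 4 + x**2 / 2."""
    return x**3 + x

def hess(x):
    """Second derivative of the same objective (always positive)."""
    return 3.0 * x**2 + 1.0

def newton_flow(x0, rate=1.0, t_end=2.0, dt=1e-3):
    """Forward-Euler integration of x' = -rate * grad(x) / hess(x)."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        x -= dt * rate * grad(x) / hess(x)
    return x

x0 = 2.0
x_end = newton_flow(x0, rate=1.0, t_end=2.0)
# Exponential decay law for the gradient: g(t) = g(0) * exp(-rate * t).
predicted = grad(x0) * math.exp(-1.0 * 2.0)
```

Swapping the linear decay law for a finite-time or fixed-time one would change only the right-hand side of the gradient equation, which is the kind of selectable behavior the abstract describes.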
A view on learning robust goal-conditioned value functions: Interplay between RL and MPC
Reinforcement learning (RL) and model predictive control (MPC) offer a wealth of distinct approaches for automatic decision-making under uncertainty. Given the impact both fields have had independently across numerous domains, there is growing interest in combining the general-purpose learning capability of RL with the safety and robustness features of MPC. To this end, this paper presents a tutorial-style treatment of RL and MPC, treating them as alternative approaches to solving Markov decision processes. In our formulation, RL aims to learn a global value function through offline exploration in an uncertain environment, whereas MPC constructs a local value function through online optimization. This local-global perspective suggests new ways to design policies that combine robustness and goal-conditioned learning. Robustness is incorporated into the RL and MPC pipelines through a scenario-based approach. Goal-conditioned learning aims to alleviate the burden of engineering a reward function for RL. Combining the two leads to a single policy that unites a robust, high-level RL terminal value function with short-term, scenario-based MPC planning for reliable constraint satisfaction. This approach leverages the benefits of both RL and MPC, the effectiveness of which is demonstrated on classical control benchmarks.
comment: Postprint; 37 pages
Decentralized CBF-based Safety Filters for Collision Avoidance of Cooperative Missile Systems with Input Constraints
This paper presents a decentralized safety filter for collision avoidance in multi-agent aerospace interception scenarios. The approach leverages robust control barrier functions (RCBFs) to guarantee forward invariance of safety sets under bounded inputs and high-relative-degree dynamics. Each effector executes its nominal cooperative guidance command, while a local quadratic program (QP) modifies the input only when necessary. Event-triggered activation based on range and zero-effort miss (ZEM) criteria ensures scalability by restricting active constraints to relevant neighbors. To resolve feasibility issues from simultaneous constraints, a slack-variable relaxation scheme is introduced that prioritizes critical agents in a Pareto-optimal manner. Simulation results in many-on-many interception scenarios demonstrate that the proposed framework maintains collision-free operation with minimal deviation from nominal guidance, providing a computationally efficient and scalable solution for safety-critical multi-agent aerospace systems.
comment: 7 pages, 5 figures
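The "modify the input only when necessary" behavior of the safety filter above can be sketched for the simplest possible case. With one affine constraint a.u >= b, the QP that minimizes |u - u_nom|^2 has a closed-form solution (a projection onto a half-space). The single-integrator model, the distance-based barrier h, and all parameter names below are assumptions for illustration; the paper's robust CBFs, high-relative-degree dynamics, input bounds, event triggering, and slack variables are not modeled.

```python
def cbf_filter(p, p_other, u_nom, r_safe=1.0, alpha=1.0):
    """Minimally modify u_nom so that h' >= -alpha * h for a single integrator.

    Barrier (assumed): h = |p - p_other|^2 - r_safe^2, with p' = u.
    Constraint:        a . u >= b, where a = 2 (p - p_other), b = -alpha * h.
    Assumes the two agents are not at the same position (a != 0).
    """
    dx, dy = p[0] - p_other[0], p[1] - p_other[1]
    h = dx * dx + dy * dy - r_safe**2
    a = (2.0 * dx, 2.0 * dy)
    b = -alpha * h
    margin = a[0] * u_nom[0] + a[1] * u_nom[1] - b
    if margin >= 0.0:
        return u_nom                      # nominal command is already safe
    scale = -margin / (a[0] ** 2 + a[1] ** 2)
    return (u_nom[0] + scale * a[0], u_nom[1] + scale * a[1])

# Heading straight at the neighbor: the filter slows the approach.
u_safe = cbf_filter(p=(2.0, 0.0), p_other=(0.0, 0.0), u_nom=(-5.0, 0.0))
# Moving away from the neighbor: the command passes through unchanged.
u_free = cbf_filter(p=(2.0, 0.0), p_other=(0.0, 0.0), u_nom=(1.0, 0.0))
```

With several active neighbors the closed form no longer applies and a QP solver is needed, which is where the feasibility issues mentioned in the abstract arise.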
A Control Allocation Algorithm for Hypersonic Glide Vehicles with Input Limitations
Hypersonic glide vehicles (HGVs) operate in challenging flight regimes characterized by strong nonlinearities in actuation and stringent physical constraints. These include state-dependent actuator limitations, asymmetric control bounds, and thermal loads that vary with maneuvering conditions. This paper introduces an iterative control allocation method to address these challenges in real time. The proposed algorithm searches for control inputs that achieve the desired moment commands while respecting constraints on input magnitude and rate. For slender HGV configurations, thermal loads and drag generation are strongly correlated: lower drag typically results in reduced surface heating. By embedding drag-sensitive soft constraints, the method improves energy efficiency and implicitly reduces surface temperatures, lowering the vehicle's infrared signature. These features are particularly advantageous for long-range military operations that require low observability. The approach is demonstrated using the DLR's Generic Hypersonic Glide Vehicle 2 (GHGV-2) simulation model. The results confirm the method's effectiveness in maintaining control authority under realistic, constrained flight conditions.
comment: 38 pages, 20 figures, submitted to the AIAA Journal of Guidance, Control, and Dynamics
Machine Learning Detection of Lithium Plating in Lithium-ion Cells: A Gaussian Process Approach
Lithium plating during fast charging is a critical degradation mechanism that accelerates capacity fade and can trigger catastrophic safety failures. Recent work has identified a distinctive dQ/dV peak above 4.0 V as a reliable signature of plating onset; however, conventional methods for computing dQ/dV rely on finite differencing with filtering, which amplifies sensor noise and introduces bias in peak location. In this paper, we propose a Gaussian Process (GP) framework for lithium plating detection by directly modeling the charge-voltage relationship Q(V) as a stochastic process with calibrated uncertainty. Leveraging the property that derivatives of GPs remain GPs, we infer dQ/dV analytically and probabilistically from the posterior, enabling robust detection without ad hoc smoothing. The framework provides three key benefits: (i) noise-aware inference with hyperparameters learned from data, (ii) closed-form derivatives with credible intervals for uncertainty quantification, and (iii) scalability to online variants suitable for embedded BMS. Experimental validation on Li-ion coin cells across a range of C-rates (0.2C-1C) and temperatures (0-40 °C) demonstrates that the GP-based method reliably detects plating peaks under low-temperature, high-rate charging, while correctly reporting no peaks in baseline cases. The concurrence of GP-identified differential peaks, reduced charge throughput, and capacity fade measured via reference performance tests confirms the method's accuracy and robustness, establishing a practical pathway for real-time lithium plating detection.
comment: Submitted to American Control Conference 2026 - ACC 2026
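The key property used in the lithium-plating abstract above, that the derivative of a GP posterior is available in closed form, can be sketched in a few lines. For a squared-exponential kernel k(v, v') the posterior mean is m(v) = sum_i alpha_i k(v, v_i), so dm/dv = sum_i alpha_i * (-(v - v_i) / length^2) * k(v, v_i). The kernel choice, the fixed hyperparameters, and the synthetic sine data below are assumptions for illustration; the paper learns hyperparameters from cell data and also reports credible intervals, which this sketch omits.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward then back substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(V, Q, noise_var=1e-4, length=1.0):
    """Return alpha = (K + noise_var * I)^{-1} Q for training inputs V."""
    K = [[rbf(a, b, length) + (noise_var if i == j else 0.0)
          for j, b in enumerate(V)] for i, a in enumerate(V)]
    return solve_cholesky(cholesky(K), Q)

def dq_dv_mean(v_star, V, alpha, length=1.0):
    """Closed-form derivative of the GP posterior mean at v_star."""
    return sum(a * (-(v_star - v) / length**2) * rbf(v_star, v, length)
               for a, v in zip(alpha, V))

# Synthetic stand-in for a Q(V) curve (not battery data): Q = sin(V).
V = [0.5 * i for i in range(13)]
Q = [math.sin(v) for v in V]
alpha = fit_gp(V, Q)
slope_at_3 = dq_dv_mean(3.0, V, alpha)   # should be close to cos(3.0)
```

No finite differencing appears anywhere in the derivative estimate, which is the point the abstract makes about avoiding noise amplification.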
Techno-economic analysis of self-sustainable thermophotovoltaic systems for grid-scale energy generation
To facilitate the widespread adoption of renewable energy, dispatchable, zero-emission power sources are essential for grid stability. This work performs a comprehensive techno-economic analysis of a self-sustainable thermophotovoltaic (TPV) system, an architecture that integrates solar charging to function as a standalone power generation asset. Using theory-based models for conventional air-bridge InGaAs and Si diode cells, our analysis reveals that while the system is not currently competitive from a pure levelized cost of storage (LCOS) perspective due to the high capital expenditure for thermal battery materials, its primary value lies in its competitive levelized cost of electricity (LCOE), which is comparable to that of conventional dispatchable generators such as gas turbines. Furthermore, we show that a full Si-based TPV system, utilizing a 50-$\mu$m-thick air-bridge cell for enhanced photon utilization, can also achieve an LCOE that is competitive with such conventional power sources at scales exceeding the gigawatt-hour level, despite its lower conversion efficiency relative to its InGaAs counterpart. This highlights a practical engineering pathway for leveraging the immense manufacturing scalability of Si, offering a lower-risk route to deployment compared to III-V materials. Ultimately, this work establishes the self-sustainable TPV architecture as a compelling pathway toward providing grid-scale, on-demand, zero-emission power.
comment: 27 pages, 6 figures, 1 table
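For readers unfamiliar with the LCOE metric used in the abstract above, the standard textbook definition is discounted lifetime cost divided by discounted lifetime energy. The sketch below implements that generic definition with constant yearly values; the function name, the parameters, and the numbers are hypothetical and are not the paper's TPV cost model.

```python
def lcoe(capex, annual_opex, annual_energy, years, discount_rate):
    """Levelized cost of electricity: discounted costs / discounted energy.

    capex is paid up front (year 0); opex and energy occur in years 1..years.
    Units are whatever the inputs use, e.g. dollars and MWh give $/MWh.
    """
    factors = [(1.0 + discount_rate) ** t for t in range(1, years + 1)]
    total_cost = capex + sum(annual_opex / f for f in factors)
    total_energy = sum(annual_energy / f for f in factors)
    return total_cost / total_energy

# Hypothetical numbers for illustration only.
undiscounted = lcoe(capex=100.0, annual_opex=10.0, annual_energy=20.0,
                    years=10, discount_rate=0.0)
discounted = lcoe(capex=100.0, annual_opex=10.0, annual_energy=20.0,
                  years=10, discount_rate=0.07)
```

A positive discount rate raises the LCOE of capital-intensive systems, because the up-front cost is not discounted while the later energy output is.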
Transferable Parasitic Estimation via Graph Contrastive Learning and Label Rebalancing in AMS Circuits
Graph representation learning on Analog-Mixed Signal (AMS) circuits is crucial for various downstream tasks, e.g., parasitic estimation. However, the scarcity of design data, the unbalanced distribution of labels, and the inherent diversity of circuit implementations pose significant challenges to learning robust and transferable circuit representations. To address these limitations, we propose CircuitGCL, a novel graph contrastive learning framework that integrates representation scattering and label rebalancing to enhance transferability across heterogeneous circuit graphs. CircuitGCL employs a self-supervised strategy to learn topology-invariant node embeddings through hyperspherical representation scattering, eliminating dependency on large-scale data. Simultaneously, balanced mean squared error (BMSE) and balanced softmax cross-entropy (BSCE) losses are introduced to mitigate label distribution disparities between circuits, enabling robust and transferable parasitic estimation. Evaluated on parasitic capacitance estimation (edge-level task) and ground capacitance classification (node-level task) across TSMC 28nm AMS designs, CircuitGCL outperforms all state-of-the-art (SOTA) methods, with an $R^2$ improvement of $33.64\% \sim 44.20\%$ for edge regression and an F1-score gain of $0.9\times \sim 2.1\times$ for node classification. Our code is available at https://github.com/ShenShan123/CircuitGCL.
comment: Final version accepted by the International Conference on Computer-Aided Design (ICCAD) 2025. First two authors have equal contributions
Geometry of Distance Protection
Distance relays detect faults on transmission lines. They face uncertainty from the fault's location and resistance, as well as the current from the line's remote terminal. In this paper, we aggregate this uncertainty with the Minkowski sum. This allows us to explicitly model the power grid surrounding the relay's line, and in turn accommodate any mix of synchronous machines and inverter-based resources. To make the relay's task easier, inverters can inject perturbations, or auxiliary signals, such as negative-sequence current. We use Farkas' lemma to construct an optimization for designing inverter auxiliary signals.
A Real-Time System for Scheduling and Managing UAV Delivery in Urban Areas
As urban logistics demand continues to grow, UAV delivery has become a key solution to improve delivery efficiency, reduce traffic congestion, and lower logistics costs. However, to fully leverage the potential of UAV delivery networks, efficient swarm scheduling and management are crucial. In this paper, we propose a real-time scheduling and management system based on the "Airport-Unloading Station" model, aiming to bridge the gap between high-level scheduling algorithms and low-level execution systems. This system, acting as middleware, accurately translates the requirements from the scheduling layer into specific execution instructions, ensuring that the scheduling algorithms perform effectively in real-world environments. Additionally, we implement three collaborative scheduling schemes involving autonomous ground vehicles (AGVs), unmanned aerial vehicles (UAVs), and ground staff to further optimize overall delivery efficiency. Through extensive experiments, this study demonstrates the rationality and feasibility of the proposed management system, providing a practical solution for the commercial application of UAV delivery in urban areas. Code: https://github.com/chengji253/UAVDeliverySystem
Coordinated Control of Deformation and Flight for Morphing Aircraft via Meta-Learning and Coupled State-Dependent Riccati Equations
In this paper, the coordinated control problem of deformation and flight for morphing aircraft (MA) is studied by using meta-learning (ML) and coupled state-dependent Riccati equations (CSDREs). Our method is built on two principal observations: the dynamic models of MA under varying morphing conditions share a morphing-condition-independent representation function, and the morphing-condition-specific part lies in a set of linear coefficients. To that end, domain adversarially invariant meta-learning (DAIML) is employed to learn the shared representation from offline flight data. Based on the learned representation function, the coordinated control of the deformation and flight for MA is formulated as a non-cooperative differential game. The state-dependent feedback control solutions can be derived by solving a pair of CSDREs. For this purpose, Lyapunov iterations are extended to obtain the positive semidefinite (definite) stabilizing solutions of the CSDREs, and the convergence proof of the proposed algorithm is provided. Finally, a simulation study is carried out to validate the efficacy of the developed coordinated game control strategies.
Modeling and Simulation of an Active Car Suspension with a Robust LQR Controller under Road Disturbance, Parameter Uncertainty and White Noise
Vehicle suspension is important for passenger comfort and for reducing exposure to effects such as vibration and shock. A good suspension system improves road holding, allows vehicles to take turns safely, and reduces the risk of traffic accidents. The passive suspension system is the most widely used suspension system in vehicles due to its simple structure and low cost. Passive suspension systems have no actuator and therefore no controller. Active suspension systems have an actuator and a controller; although their structures are more complex and costly, they are safer. The PID controller is widely used in active suspension systems due to its simple structure, reasonable cost, and easy adjustment of coefficients. In this study, an LQR-controlled active suspension was designed to be more robust than both a passive suspension and a PID-controlled active suspension. Robustness analyses were performed for passive suspension, PID-controlled active suspension, and LQR-controlled active suspension. Suspension travel, sprung mass acceleration, and sprung mass motion simulations were performed for all three suspensions under road disturbance, under simultaneous road disturbance and parameter uncertainty, and under road disturbance with white noise. A comparative analysis was performed using the rise time, overshoot, and settling time of the suspensions under the different conditions. The LQR-controlled active suspension showed the fastest rise time, the least overshoot, and the shortest settling time. These results show that the LQR-controlled active suspension provided a more comfortable and safer ride than the other two suspension systems.
comment: 20 pages, 19 figures
Resilient Multi-Dimensional Consensus and Distributed Optimization against Agent-Based and Denial-of-Service Attacks
In this paper, we consider the resilient multi-dimensional consensus and distributed optimization problems of multi-agent systems (MASs) in the presence of both agent-based and denial-of-service (DoS) attacks. The considered agent-based attacks can cover malicious, Byzantine, and stubborn agents. The links between agents in the network can be blocked by DoS attacks, which may lead the digraph to be time-varying and even disconnected. The objective is to ensure that the remaining benign agents achieve consensus. To this end, an "auxiliary point"-based resilient control algorithm is proposed for MASs. Under the proposed algorithm, each healthy agent constructs a "safe kernel" utilizing the states of its in-neighbors and updates its state toward a specific point within this kernel at each iteration. If an agent cannot receive its neighbors' states owing to DoS attacks, it will use the states received immediately before the DoS period. Moreover, a resilient multi-dimensional distributed optimization (RMDO) algorithm is also proposed. Theoretical proofs and numerical examples are presented to demonstrate the effectiveness of the proposed algorithms.
Learning a Shape-adaptive Assist-as-needed Rehabilitation Policy from Therapist-informed Input
Therapist-in-the-loop robotic rehabilitation has shown great promise in enhancing rehabilitation outcomes by integrating the strengths of therapists and robotic systems. However, its broader adoption remains limited due to insufficient safe interaction and limited adaptation capability. This article proposes a novel telerobotics-mediated framework that enables therapists to intuitively and safely deliver assist-as-needed (AAN) therapy based on two primary contributions. First, our framework encodes the therapist-informed corrective force into via-points in a latent space, allowing the therapist to provide only minimal assistance while encouraging the patient to maintain their own motion preferences. Second, a shape-adaptive AAN rehabilitation policy is learned to partially and progressively deform the reference trajectory for movement therapy based on encoded patient motion preferences and therapist-informed via-points. The effectiveness of the proposed shape-adaptive AAN strategy was validated on a telerobotic rehabilitation system using two representative tasks. The results demonstrate its practicality for remote AAN therapy and its superiority over two state-of-the-art methods in reducing corrective force and improving movement smoothness.
Hierarchical Analysis and Control of Epidemic Spreading over Networks using Dissipativity and Mesh Stability
Analyzing and controlling spreading processes are challenging problems due to the involved non-linear node (subsystem) dynamics, unknown disturbances, complex interconnections, and the large-scale and multi-level nature of the problems. The dissipativity concept provides a practical framework for addressing such concerns, thanks to the energy-based representation it offers for subsystems and the compositional properties it provides for the analysis and control of interconnected (networked) systems comprised of such subsystems. Therefore, in this paper, we utilize the dissipativity concept to analyze and control a spreading process that occurs over a hierarchy of nodes, groups, and a network (i.e., a spreading network). We start by generalizing several existing results on dissipativity-based topology design for networked systems. Next, we model the considered spreading network as a networked system and establish the dissipativity properties of its nodes. The generalized topology design method is then applied at multiple levels of the considered spreading network to formulate its analysis and control problems as Linear Matrix Inequality (LMI) problems. We identify and enforce localized necessary conditions to support the feasibility of the LMI problem solved at each subsequent hierarchical level of the spreading network. Consequently, the proposed method does not involve iterative multi-level optimization stages that are computationally inefficient. The proposed control solution ensures that the spreading network is not only stable but also dissipative and mesh-stable. Compared to conventional methods, such as threshold pruning and high-degree edge removal, our approach offers superior performance in terms of infection containment, control efficiency, and disturbance robustness. Extensive numerical results demonstrate the effectiveness of the proposed technique.
comment: To be submitted to Automatica
Connecting the Equinoctial Elements and Rodrigues Parameters: A New Set of Elements
A geometric interpretation of the equinoctial elements is given with a connection to orthogonal rotations and attitude dynamics in Euclidean 3-space. An identification is made between the equinoctial elements and classic Rodrigues parameters. A new set of equinoctial elements is developed using the modified Rodrigues parameters, thereby removing the coordinate singularity for retrograde equatorial orbits present in previous versions of these elements. A low-thrust trajectory optimization problem is set up using the new elements to numerically verify convergence for the two-point boundary problem, as compared to their predecessors.
comment: formatting corrected for better readability
Optimal Assignment and Motion Control in Two-Class Continuum Swarms
We consider optimal swarm control problems where two different classes of agents are present. Continuum idealizations of large-scale swarms are used where the dynamics describe the evolution of the spatially-distributed densities of each agent class. The problem formulation we adopt is motivated by applications where agents of one class are assigned to agents of the other class, which we refer to as demand and resource agents respectively. Assignments have costs related to the distances between mutually assigned agents, and the overall cost of an assignment is quantified by a Wasserstein distance between the densities of the two agent classes. When agents can move, the assignment cost can decrease at the expense of a physical motion cost, and this tradeoff sets up a nonlinear infinite-dimensional optimal control problem. We show that in one spatial dimension, this problem can be converted to an infinite-dimensional, but decoupled, linear-quadratic (LQ) tracking problem when expressed in terms of the quantile functions of the respective agent densities. Solutions are given in the general one-dimensional case, as well as in the special cases of constant and periodically time-varying demands.
comment: Extended version including periodic-demand case. 13 pages, 7 figures
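The quantile-function reformulation mentioned in the abstract above rests on a standard fact: in one spatial dimension, the Wasserstein distance between two distributions is the L^p distance between their quantile functions. For two equal-size samples this reduces to matching sorted values. The sketch below shows the p = 2 case; the choice of p = 2, the empirical-sample setting, and the agent positions are assumptions for illustration, since the abstract does not specify them.

```python
import math

def wasserstein2_1d(xs, ys):
    """2-Wasserstein distance between two equal-size one-dimensional samples.

    Sorting each sample gives its empirical quantile function, and the optimal
    assignment in 1-D pairs the k-th smallest x with the k-th smallest y.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    qx, qy = sorted(xs), sorted(ys)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(qx, qy)) / len(qx))

# Hypothetical resource and demand agent positions on a line.
resource = [0.0, 1.0, 2.0, 3.0]
demand = [5.0, 3.0, 4.0, 6.0]
cost = wasserstein2_1d(resource, demand)
```

Because the optimal pairing is fixed by rank, each quantile level can be treated separately, which is consistent with the decoupled tracking problem the abstract describes.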
Data-driven Model Predictive Control using MATLAB
This paper presents a comprehensive overview of data-driven model predictive control, highlighting state-of-the-art methodologies and their numerical implementation. The discussion begins with a brief review of conventional model predictive control (MPC), which discusses both linear MPC (LMPC) and nonlinear MPC (NMPC). This is followed by a section on data-driven LMPC, outlining fundamental concepts and the implementation of various approaches, including subspace predictive control and prediction error methods. Subsequently, the focus shifts to data-driven NMPC, emphasizing approaches based on neural network models. The paper concludes with a review of recent advancements in data-driven MPC and explores potential directions for future research.
comment: 22 pages, 8 figures
Multiagent Systems
Opponent Shaping in LLM Agents
Large Language Models (LLMs) are increasingly being deployed as autonomous agents in real-world environments. As these deployments scale, multi-agent interactions become inevitable, making it essential to understand strategic behavior in such systems. A central open question is whether LLM agents, like reinforcement learning agents, can shape the learning dynamics and influence the behavior of others through interaction alone. In this paper, we present the first investigation of opponent shaping (OS) with LLM-based agents. Existing OS algorithms cannot be directly applied to LLMs, as they require higher-order derivatives, face scalability constraints, or depend on architectural components that are absent in transformers. To address this gap, we introduce ShapeLLM, an adaptation of model-free OS methods tailored for transformer-based agents. Using ShapeLLM, we examine whether LLM agents can influence co-players' learning dynamics across diverse game-theoretic environments. We demonstrate that LLM agents can successfully guide opponents toward exploitable equilibria in competitive games (Iterated Prisoner's Dilemma, Matching Pennies, and Chicken) and promote coordination and improve collective welfare in cooperative games (Iterated Stag Hunt and a cooperative version of the Prisoner's Dilemma). Our findings show that LLM agents can both shape and be shaped through interaction, establishing opponent shaping as a key dimension of multi-agent LLM research.
comment: 29 pages, 15 figures, 15 tables
Bayesian Decision Making around Experts
Complex learning agents are increasingly deployed alongside existing experts, such as human operators or previously trained agents. However, it remains unclear how learners should optimally incorporate certain forms of expert data, which may differ in structure from the learner's own action-outcome experiences. We study this problem in the context of Bayesian multi-armed bandits, considering: (i) offline settings, where the learner receives a dataset of outcomes from the expert's optimal policy before interaction, and (ii) simultaneous settings, where the learner must choose at each step whether to update its beliefs based on its own experience, or based on the outcome simultaneously achieved by an expert. We formalize how expert data influences the learner's posterior, and prove that pretraining on expert outcomes tightens information-theoretic regret bounds by the mutual information between the expert data and the optimal action. For the simultaneous setting, we propose an information-directed rule where the learner processes the data source that maximizes its one-step information gain about the optimal action. Finally, we propose strategies for how the learner can infer when to trust the expert and when not to, safeguarding the learner in cases where the expert is ineffective or compromised. By quantifying the value of expert data, our framework provides practical, information-theoretic algorithms for agents to intelligently decide when to learn from others.
Climate Surrogates for Scalable Multi-Agent Reinforcement Learning: A Case Study with CICERO-SCM
Climate policy studies require models that capture the combined effects of multiple greenhouse gases on global temperature, but these models are computationally expensive and difficult to embed in reinforcement learning. We present a multi-agent reinforcement learning (MARL) framework that integrates a high-fidelity, highly efficient climate surrogate directly in the environment loop, enabling regional agents to learn climate policies under multi-gas dynamics. As a proof of concept, we introduce a recurrent neural network architecture pretrained on $20{,}000$ multi-gas emission pathways to surrogate the climate model CICERO-SCM. The surrogate model attains near-simulator accuracy with global-mean temperature RMSE $\approx 0.0004 \mathrm{K}$ and approximately $1000\times$ faster one-step inference. When substituted for the original simulator in a climate-policy MARL setting, it accelerates end-to-end training by $>\!100\times$. We show that the surrogate and simulator converge to the same optimal policies and propose a methodology to assess this property in cases where using the simulator is intractable. Our work makes it possible to bypass the core computational bottleneck without sacrificing policy fidelity, enabling large-scale multi-agent experiments across alternative climate-policy regimes with multi-gas dynamics and high-fidelity climate response.
Network Topology and Information Efficiency of Multi-Agent Systems: Study based on MARL
Multi-agent systems (MAS) solve complex problems through coordinated autonomous entities with individual decision-making capabilities. While Multi-Agent Reinforcement Learning (MARL) enables these agents to learn intelligent strategies, it faces challenges of non-stationarity and partial observability. Communication among agents offers a solution, but questions remain about its optimal structure and evaluation. This paper explores two underexamined aspects: communication topology and information efficiency. We demonstrate that directed and sequential topologies improve performance while reducing communication overhead across both homogeneous and heterogeneous tasks. Additionally, we introduce two metrics -- Information Entropy Efficiency Index (IEI) and Specialization Efficiency Index (SEI) -- to evaluate message compactness and role differentiation. Incorporating these metrics into training objectives improves success rates and convergence speed. Our findings highlight that designing adaptive communication topologies with information-efficient messaging is essential for effective coordination in complex MAS.
Multimodal Safety Evaluation in Generative Agent Social Simulations
Can generative agents be trusted in multimodal environments? Despite advances in large language and vision-language models that enable agents to act autonomously and pursue goals in rich settings, their ability to reason about safety, coherence, and trust across modalities remains limited. We introduce a reproducible simulation framework for evaluating agents along three dimensions: (1) safety improvement over time, including iterative plan revisions in text-visual scenarios; (2) detection of unsafe activities across multiple categories of social situations; and (3) social dynamics, measured as interaction counts and acceptance ratios of social exchanges. Agents are equipped with layered memory, dynamic planning, multimodal perception, and are instrumented with SocialMetrics, a suite of behavioral and structural metrics that quantifies plan revisions, unsafe-to-safe conversions, and information diffusion across networks. Experiments show that while agents can detect direct multimodal contradictions, they often fail to align local revisions with global safety, reaching only a 55 percent success rate in correcting unsafe plans. Across eight simulation runs with three models - Claude, GPT-4o mini, and Qwen-VL - five agents achieved average unsafe-to-safe conversion rates of 75, 55, and 58 percent, respectively. Overall performance ranged from 20 percent in multi-risk scenarios with GPT-4o mini to 98 percent in localized contexts such as fire/heat with Claude. Notably, 45 percent of unsafe actions were accepted when paired with misleading visuals, showing a strong tendency to overtrust images. These findings expose critical limitations in current architectures and provide a reproducible platform for studying multimodal safety, coherence, and social dynamics.
What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment
We introduce the Agent GPA (Goal-Plan-Action) framework: an evaluation paradigm based on an agent's operational loop of setting goals, devising plans, and executing actions. The framework includes five evaluation metrics: Goal Fulfillment, Logical Consistency, Execution Efficiency, Plan Quality, and Plan Adherence. Logical Consistency checks that an agent's actions are consistent with its prior actions. Execution Efficiency checks whether the agent executes in the most efficient way to achieve its goal. Plan Quality checks whether an agent's plans are aligned with its goals; Plan Adherence checks whether an agent's actions are aligned with its plan; and Goal Fulfillment checks that the agent's final outcomes match the stated goals. Our experimental results on two benchmark datasets - the public TRAIL/GAIA dataset and an internal dataset for a production-grade data agent - show that this framework (a) provides a systematic way to cover a broad range of agent failures, including all agent errors on the TRAIL/GAIA benchmark dataset; (b) supports LLM-judges that exhibit strong agreement with human annotation, covering 80% to over 95% of errors; and (c) localizes errors with 86% agreement to enable targeted improvement of agent performance.
A Hybrid Agent-Based and System Dynamics Framework for Modelling Project Execution and Technology Maturity in Early-Stage R&D
This paper presents a hybrid approach to predict the evolution of technological maturity in R&D projects, using the oil and gas sector as an example. Integrating System Dynamics (SD) and Agent-Based Modelling (ABM) allows the proposed multi-level framework to capture uncertainties in work effort, team size, and project duration, which influence technological progress. While AB-SD hybrid models are established in other fields, their use in R&D remains limited. The model combines system-level feedback structures governing work phases, rework cycles, and duration with decentralised agents such as team members, tasks, and controllers, whose interactions generate emergent project dynamics. A base case scenario analysed early-stage innovation projects with 15 parallel tasks over 156 weeks. A comparative sequential scenario showed an 88 percent reduction in rework duration. A second scenario assessed mixed parallel-sequential task structures with varying team sizes. In parallel configurations, increasing team size reduced project duration and improved task completion, with optimal results for teams of four to five members. These findings align with empirical evidence showing that moderate team expansion enhances coordination efficiency without excessive communication overhead. However, larger teams may decrease performance due to communication complexity and management delays. Overall, the model outputs and framework align with expert understanding, supporting their validity as quantitative tools for analysing resource allocation, scheduling efficiency, and technology maturity progression.
What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment
We introduce the Agent GPA (Goal-Plan-Action) framework: an evaluation paradigm based on an agent's operational loop of setting goals, devising plans, and executing actions. The framework includes five evaluation metrics: Goal Fulfillment, Logical Consistency, Execution Efficiency, Plan Quality, and Plan Adherence. Logical Consistency checks that an agent's actions are consistent with its prior actions. Execution Efficiency checks whether the agent executes in the most efficient way to achieve its goal. Plan Quality checks whether an agent's plans are aligned with its goals; Plan Adherence checks whether an agent's actions are aligned with its plan; and Goal Fulfillment checks that an agent's final outcomes match the stated goals. Our experimental results on two benchmark datasets - the public TRAIL/GAIA dataset and an internal dataset for a production-grade data agent - show that this framework (a) provides a systematic way to cover a broad range of agent failures, including all agent errors on the TRAIL/GAIA benchmark dataset; (b) supports LLM judges that exhibit strong agreement with human annotation, covering 80% to over 95% of errors; and (c) localizes errors with 86% agreement to enable targeted improvement of agent performance.
MARLIN: Multi-Agent Reinforcement Learning with Murmuration Intelligence and LLM Guidance for Reservoir Management
As climate change intensifies extreme weather events, water disasters pose growing threats to global communities, making adaptive reservoir management critical for protecting vulnerable populations and ensuring water security. Modern water resource management faces unprecedented challenges from cascading uncertainties propagating through interconnected reservoir networks. These uncertainties, rooted in physical water transfer losses and environmental variability, make precise control difficult. For example, sending 10 tons downstream may yield only 8-12 tons due to evaporation and seepage. Traditional centralized optimization approaches suffer from exponential computational complexity and cannot effectively handle such real-world uncertainties, while existing multi-agent reinforcement learning (MARL) methods fail to achieve effective coordination under uncertainty. To address these challenges, we present MARLIN, a decentralized reservoir management framework inspired by the collective intelligence of starling murmurations. Integrating bio-inspired alignment, separation, and cohesion rules with MARL, MARLIN enables individual reservoirs to make local decisions while achieving emergent global coordination. In addition, an LLM provides real-time reward-shaping signals, guiding agents to adapt to environmental changes and human-defined preferences. Experiments on real-world USGS data show that MARLIN improves uncertainty handling by 23%, cuts computation by 35%, and accelerates flood response by 68%, exhibiting super-linear coordination, with complexity scaling 5.4x from 400 to 10,000 nodes. These results demonstrate MARLIN's potential for disaster prevention and protecting communities through intelligent, scalable water resource management.
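The alignment, separation, and cohesion rules named in this abstract are the classic flocking rules. As a rough illustration only, the Python sketch below applies them to points in a plane. It is not MARLIN's method: how the paper maps these rules onto reservoir states and MARL policies is not stated here, and the names `Boid` and `flocking_step`, the weights, and the radii are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Boid:
    x: float
    y: float
    vx: float
    vy: float


def flocking_step(boids, radius=5.0, min_dist=1.0,
                  w_align=0.05, w_cohere=0.01, w_separate=0.1, dt=1.0):
    """Return a new list of boids after one synchronous flocking update.

    All weights and radii are illustrative assumptions, not values from the paper.
    """
    updated = []
    for b in boids:
        # Each agent only looks at neighbors within a local radius.
        neighbors = [o for o in boids
                     if o is not b
                     and (o.x - b.x) ** 2 + (o.y - b.y) ** 2 <= radius ** 2]
        ax = ay = 0.0
        if neighbors:
            n = len(neighbors)
            # Alignment: steer toward the mean velocity of the neighbors.
            ax += w_align * (sum(o.vx for o in neighbors) / n - b.vx)
            ay += w_align * (sum(o.vy for o in neighbors) / n - b.vy)
            # Cohesion: steer toward the centroid of the neighbors.
            ax += w_cohere * (sum(o.x for o in neighbors) / n - b.x)
            ay += w_cohere * (sum(o.y for o in neighbors) / n - b.y)
            # Separation: push away from neighbors that are too close.
            for o in neighbors:
                if (o.x - b.x) ** 2 + (o.y - b.y) ** 2 < min_dist ** 2:
                    ax += w_separate * (b.x - o.x)
                    ay += w_separate * (b.y - o.y)
        vx, vy = b.vx + ax * dt, b.vy + ay * dt
        updated.append(Boid(b.x + vx * dt, b.y + vy * dt, vx, vy))
    return updated
```

Every agent uses only local neighbor information, which is the property that makes such rules attractive for decentralized coordination.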
Multi-Turn Human-LLM Interaction Through the Lens of a Two-Way Intelligibility Protocol NeurIPS 2025
Our interest is in the design of software systems involving a human-expert interacting -- using natural language -- with a large language model (LLM) on data analysis tasks. For complex problems, it is possible that LLMs can harness human expertise and creativity to find solutions that were otherwise elusive. On one level, this interaction takes place through multiple turns of prompts from the human and responses from the LLM. Here we investigate a more structured approach based on an abstract protocol described in [3] for interaction between agents. The protocol is motivated by a notion of "two-way intelligibility" and is modelled by a pair of communicating finite-state machines. We provide an implementation of the protocol, and provide empirical evidence of using the implementation to mediate interactions between an LLM and a human-agent in two areas of scientific interest (radiology and drug design). We conduct controlled experiments with a human proxy (a database), and uncontrolled experiments with human subjects. The results provide evidence in support of the protocol's capability of capturing one- and two-way intelligibility in human-LLM interaction; and for the utility of two-way intelligibility in the design of human-machine systems. Our code is available at https://github.com/karannb/interact.
comment: Multi-Turn Interactions in Large Language Models (MTI-LLM) Workshop at NeurIPS 2025
Paper2Video: Automatic Video Generation from Scientific Papers
Academic presentation videos have become an essential medium for research communication, yet producing them remains highly labor-intensive, often requiring hours of slide design, recording, and editing for a short 2-to-10-minute video. Unlike natural video, presentation video generation involves distinctive challenges: inputs from research papers, dense multi-modal information (text, figures, tables), and the need to coordinate multiple aligned channels such as slides, subtitles, speech, and a human talker. To address these challenges, we introduce Paper2Video, the first benchmark of 101 research papers paired with author-created presentation videos, slides, and speaker metadata. We further design four tailored evaluation metrics--Meta Similarity, PresentArena, PresentQuiz, and IP Memory--to measure how videos convey the paper's information to the audience. Building on this foundation, we propose PaperTalker, the first multi-agent framework for academic presentation video generation. It integrates slide generation with effective layout refinement by a novel effective tree search visual choice, cursor grounding, subtitling, speech synthesis, and talking-head rendering, while parallelizing slide-wise generation for efficiency. Experiments on Paper2Video demonstrate that the presentation videos produced by our approach are more faithful and informative than existing baselines, establishing a practical step toward automated and ready-to-use academic video generation. Our dataset, agent, and code are available at https://github.com/showlab/Paper2Video.
comment: Project Page: https://showlab.github.io/Paper2Video/
Neuro-Symbolic Agents with Modal Logic for Autonomous Diagnostics
The development of intelligent agents, particularly those powered by language models (LMs), has shown their critical role in various environments that require intelligent and autonomous decision-making. Environments are not passive testing grounds: they represent the data required for agents to learn, and they exhibit very challenging conditions that require an adaptive, complex, and autonomous capacity to make decisions. While the paradigm of scaling models and datasets has led to remarkable emergent capabilities, we argue that scaling the structure, fidelity, and logical consistency of agent reasoning within these environments is a crucial, yet underexplored, dimension of AI research. This paper introduces a neuro-symbolic multi-agent architecture where the belief states of individual agents are formally represented as Kripke models. This foundational choice enables them to reason about known concepts of possibility and necessity using the formal language of modal logic. In this work, we use immutable, domain-specific knowledge, encoded as logical constraints essential for proper diagnosis, to make inferences. In the proposed model, we show that these constraints actively guide the hypothesis generation of LMs, effectively preventing them from reaching physically or logically untenable conclusions. In a high-fidelity simulated particle accelerator environment, our system successfully diagnoses complex, cascading failures by combining the powerful semantic intuition of LMs with the rigorous, verifiable validation of modal logic and a factual world model, showcasing a viable path toward more robust, reliable, and verifiable autonomous agents.
comment: 10 pages, 1 figure, Scaling Environments for Agents (SEA) Workshop at NeurIPS
FG-PE: Factor-graph Approach for Multi-robot Pursuit-Evasion
With the increasing use of robots in daily life, there is a growing need to provide robust collaboration protocols for robots to tackle more complicated and dynamic problems effectively. This paper presents a novel, factor graph-based approach to address the pursuit-evasion problem, enabling accurate estimation, planning, and tracking of an evader by multiple pursuers working together. It is assumed that there are multiple pursuers and only one evader in this scenario. The proposed method significantly improves the accuracy of evader estimation and tracking, allowing pursuers to capture the evader in the shortest possible time and distance compared to existing techniques. In addition to these primary objectives, the proposed approach effectively minimizes uncertainty while remaining robust, even when communication issues lead to some messages being dropped or lost. Through a series of comprehensive experiments, this paper demonstrates that the proposed algorithm consistently outperforms traditional pursuit-evasion methods across several key performance metrics, such as the time required to capture the evader and the average distance traveled by the pursuers. Additionally, the proposed method is tested in real-world hardware experiments, further validating its effectiveness and applicability.
Position Paper: Towards Open Complex Human-AI Agents Collaboration Systems for Problem Solving and Knowledge Management
We propose a technology-agnostic, collaboration-ready stance for Human-AI Agents Collaboration Systems (HAACS) that closes long-standing gaps in prior stages (automation; flexible autonomy; agentic multi-agent collectives). Reading empirical patterns through a seven-dimension collaboration spine and human-agent contrasts, we identify missing pieces: principled budgeting of initiative, instantaneous and auditable reconfiguration, a system-wide knowledge backbone with an epistemic promotion gate, capacity-aware human interfaces; and, as a prerequisite to all of the above, unified definitions of agent and formal collaborative dynamics. We respond with (i) a boundary-centric ontology of agenthood synthesized with cybernetics; (ii) a Petri net family (colored and interpreted) that models ownership, cross-boundary interaction, concurrency, guards, and rates with collaboration transitions; and (iii) a three-level orchestration (meta, agent, execution) that governs behavior families via guard flips. On the knowledge side, we ground collaborative learning in Conversation Theory and SECI with teach-back gates and an evolving backbone; on the problem-solving side, we coordinate routine MEA-style control with practice-guided open-ended discovery. The result is the Hierarchical Exploration-Exploitation Net (HE2-Net): a policy-controlled stance that splits provisional from validated assets, promotes only after tests and peer checks, and budgets concurrent probing while keeping reuse fast and safe. We show interoperability with emerging agent protocols without ad hoc glue and sketch bio-cybernetic extensions (autopoiesis, autogenesis, evolving boundaries, synergetics, etc). Altogether, the framework keeps humans central to setting aims, justifying knowledge, and steering theory-practice dynamics, while scaling agents as reliable collaborators within audited governance.
comment: polish the structures, flows and connections, while complement the diagrams
Using utility graphs to search for Pareto-optimal outcomes in complex, interdependent issue negotiations
This paper studies how utility graph decomposition algorithms can be used to effectively search for Pareto-efficient outcomes in complex automated negotiation. We propose a number of algorithms that can efficiently handle high-dimensional utility graphs, and test them on a variety of utility graph topologies, generated based on state-of-the-art methods for analysing complex graphs. We show that we can achieve exponential speed-up, for many structures, even for very large utility graphs. To our knowledge, our approach can handle the largest utility spaces to date for complex interdependent negotiations, in terms of number of issues. Moreover, we examine the performance of our algorithms across two different types of elicitation queries from the literature: value and comparison queries, thus making a connection between automated negotiation and the preference elicitation literature.
comment: Authors' pre-print (16 pages)
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
Reasoning-based large language models have excelled in mathematics and programming, yet their potential in knowledge-intensive medical question answering remains underexplored and insufficiently validated in clinical contexts. To bridge this gap, we introduce ReasonMed, the largest medical reasoning dataset to date, comprising 370k high-quality examples distilled from 1.75 million initial reasoning paths generated by complementary LLMs and curated through a cost-efficient easy-medium-difficult (EMD) pipeline. ReasonMed is built through a multi-agent generation, verification, and refinement process, in which an Error Refiner improves reasoning paths by correcting error-prone steps identified by a verifier. Using ReasonMed, we investigate effective strategies for training medical reasoning models and find that integrating detailed CoT reasoning with concise answer summaries yields the most robust fine-tuning results. Models trained on ReasonMed set a new benchmark: ReasonMed-7B surpasses the prior best sub-10B models by 4.17% and even exceeds LLaMA3.1-70B on PubMedQA by 4.60%. When scaled to ReasonMed-14B, it remains highly competitive, underscoring consistent scaling potential. The codes and datasets are available at https://github.com/YuSun-Work/ReasonMed.
comment: 28 pages, 6 figures, 7 tables
Multiple Memory Systems for Enhancing the Long-term Memory of Agent
Agents powered by large language models have achieved impressive results, but effectively handling the vast amounts of historical data generated during interactions remains a challenge. The current approach is to design a memory module for the agent to process these data. However, existing methods, such as MemoryBank and A-MEM, store memory content of poor quality, which affects recall performance and response quality. To construct higher-quality long-term memory content, we have designed a multiple memory system (MMS) inspired by cognitive psychology theory. The system processes short-term memory into multiple long-term memory fragments, and constructs retrieval memory units and contextual memory units based on these fragments, with a one-to-one correspondence between the two. During the retrieval phase, MMS matches the most relevant retrieval memory units based on the user's query. Then, the corresponding contextual memory units are obtained as the context for the response stage to enhance knowledge, thereby effectively utilizing historical data. Experiments on the LoCoMo dataset compared our method with three others, proving its effectiveness. Ablation studies confirmed the rationality of our memory units. We also analyzed the robustness regarding the number of selected memory segments and the storage overhead, demonstrating its practical value.
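The one-to-one pairing between retrieval memory units and contextual memory units can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the class `PairedMemory`, the bag-of-words cosine similarity, and the sample texts are hypothetical stand-ins, and the abstract does not say how MMS actually builds or scores its units.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Very crude text representation (assumption: a real system would use embeddings)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class PairedMemory:
    """Stores retrieval units and contextual units at matching indices."""

    def __init__(self):
        self.retrieval_units = []  # compact keys used only for matching
        self.context_units = []    # richer text handed to the response stage

    def add(self, retrieval_text, context_text):
        self.retrieval_units.append(bag_of_words(retrieval_text))
        self.context_units.append(context_text)

    def recall(self, query, k=1):
        """Match the query against retrieval units, return the paired contexts."""
        q = bag_of_words(query)
        ranked = sorted(range(len(self.retrieval_units)),
                        key=lambda i: cosine(q, self.retrieval_units[i]),
                        reverse=True)
        return [self.context_units[i] for i in ranked[:k]]


memory = PairedMemory()
memory.add("user likes hiking mountains",
           "The user said they hike in the mountains every weekend.")
memory.add("user allergic peanuts",
           "The user mentioned a peanut allergy at dinner.")
print(memory.recall("what food is the user allergic to"))
```

Matching is done against the compact retrieval units, while the response stage receives the paired contextual units, which is the separation the abstract describes.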
What Do Agents Think One Another Want? Level-2 Inverse Games for Inferring Agents' Estimates of Others' Objectives
Effectively interpreting strategic interactions among multiple agents requires us to infer each agent's objective from limited information. Existing inverse game-theoretic approaches frame this challenge in terms of a "level-1" inference problem, in which we take the perspective of a third-party observer and assume that individual agents share complete knowledge of one another's objectives. However, this assumption breaks down in decentralized, real-world scenarios like urban driving and bargaining, in which agents may act based on conflicting views of one another's objectives. We demonstrate the necessity of inferring agents' different estimates of each other's objectives through empirical examples, and by theoretically characterizing the prediction error of level-1 inference on fictitious gameplay data from linear-quadratic games. To address this fundamental issue, we propose a framework for level-2 inference to address the question: "What does each agent believe about other agents' objectives?" We prove that the level-2 inference problem is non-convex even in benign settings like linear-quadratic games, and we develop an efficient gradient-based approach for identifying local solutions. Experiments on a synthetic urban driving example show that our approach uncovers nuanced misalignments that level-1 methods miss.
comment: 8 pages + references + appendix + supplement
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
Reinforcement learning (RL) is the dominant paradigm for sharpening strategic tool use capabilities of LLMs on long-horizon, sparsely-rewarded agent tasks, yet it faces a fundamental challenge of exploration-exploitation trade-off. Existing studies stimulate exploration through the lens of policy entropy, but such mechanical entropy maximization is prone to RL training instability due to the multi-turn distribution shifting. In this paper, we target the progressive exploration-exploitation balance under the guidance of the agent's own experiences without succumbing to either entropy collapse or runaway divergence. We propose SPEAR, a curriculum-based self-imitation learning (SIL) recipe for training agentic LLMs. It extends the vanilla SIL framework, where a replay buffer stores self-generated promising trajectories for off-policy update, by gradually steering the policy evolution within a well-balanced range of entropy across stages. Specifically, our approach incorporates a curriculum to manage the exploration process, utilizing intrinsic rewards to foster skill-level exploration and facilitating action-level exploration through SIL. At first, the auxiliary tool call reward plays a critical role in the accumulation of tool-use skills, enabling broad exposure to the unfamiliar distributions of the environment feedback with an upward entropy trend. As training progresses, self-imitation gets strengthened to exploit existing successful patterns from replayed experiences for comparative action-level exploration, accelerating solution iteration without unbounded entropy growth. To further stabilize training, we recalibrate the advantages of experiences in the replay buffer to address the potential policy drift. Regularizations such as the clipping of tokens with high covariance between probability and advantage are introduced to the trajectory-level entropy control to curb over-confidence.
comment: 26 pages, 11 figures
MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling
Despite recent advances, long-sequence video generation frameworks still suffer from significant limitations: poor assistive capability, suboptimal visual quality, and limited expressiveness. To mitigate these limitations, we propose MAViS, a multi-agent collaborative framework designed to assist in long-sequence video storytelling by efficiently translating ideas into visual narratives. MAViS orchestrates specialized agents across multiple stages, including script writing, shot designing, character modeling, keyframe generation, video animation, and audio generation. In each stage, agents operate under the 3E Principle -- Explore, Examine, and Enhance -- to ensure the completeness of intermediate outputs. Considering the capability limitations of current generative models, we propose the Script Writing Guidelines to optimize compatibility between scripts and generative tools. Experimental results demonstrate that MAViS achieves state-of-the-art performance in assistive capability, visual quality, and video expressiveness. Its modular framework further enables scalability with diverse generative models and tools. With just a brief idea description, MAViS enables users to rapidly explore diverse visual storytelling and creative directions for sequential video generation by efficiently producing high-quality, complete long-sequence videos. To the best of our knowledge, MAViS is the only framework that provides multimodal design output -- videos with narratives and background music.
comment: Video Generation Agent
MAHL: Multi-Agent LLM-Guided Hierarchical Chiplet Design with Adaptive Debugging
As program workloads (e.g., AI) increase in size and algorithmic complexity, the primary challenge lies in their high dimensionality, encompassing computing cores, array sizes, and memory hierarchies. To overcome these obstacles, innovative approaches are required. Agile chip design has already benefited from machine learning integration at various stages, including logic synthesis, placement, and routing. With Large Language Models (LLMs) recently demonstrating impressive proficiency in Hardware Description Language (HDL) generation, it is promising to extend their abilities to 2.5D integration, an advanced technique that saves area overhead and development costs. However, LLM-driven chiplet design faces challenges such as flattened designs, high validation cost, and imprecise parameter optimization, which limit its chiplet design capability. To address this, we propose MAHL, a hierarchical LLM-based chiplet design generation framework that features six agents which collaboratively enable AI algorithm-hardware mapping, including hierarchical description generation, retrieval-augmented code generation, diverseflow-based validation, and multi-granularity design space exploration. These components together enhance the efficient generation of chiplet designs with optimized Power, Performance and Area (PPA). Experiments show that MAHL not only significantly improves the generation accuracy of simple RTL designs, but also increases the generation accuracy of real-world chiplet designs, evaluated by Pass@5, from 0 to 0.72 compared to conventional LLMs under the best-case scenario. Compared to state-of-the-art CLARIE (expert-based), MAHL achieves comparable or even superior PPA results under certain optimization objectives.
Systems and Control (CS)
Reliability of Single-Level Equality-Constrained Inverse Optimal Control
Inverse optimal control (IOC) allows the retrieval of optimal cost function weights, or behavioral parameters, from human motion. The literature on IOC uses methods that are either based on a slow bilevel process or a fast but noise-sensitive minimization of optimality condition violation. Assuming equality-constrained optimal control models of human motion, this article presents a faster but robust approach to solving IOC using a single-level reformulation of the bilevel method and yields equivalent results. Through numerical experiments in simulation, we analyze the robustness to noise of the proposed single-level reformulation to the bilevel IOC formulation with a human-like planar reaching task that is used across recent studies. The approach shows resilience to very large levels of noise and reduces the computation time of the IOC on this task by a factor of 15 when compared to a classical bilevel implementation.
comment: 8 pages, 3 figures
Learning to Mitigate Post-Outage Load Surges: A Data-Driven Framework for Electrifying and Decarbonizing Grids
Electrification and decarbonization are transforming power system demand and recovery dynamics, yet their implications for post-outage load surges remain poorly understood. Here we analyze a metropolitan-scale heterogeneous dataset for Indianapolis comprising 30,046 feeder-level outages between 2020 and 2024, linked to smart meters and submetering, to quantify the causal impact of electric vehicles (EVs), heat pumps (HPs) and distributed energy resources (DERs) on restoration surges. Statistical analysis and causal forest inference demonstrate that rising penetrations of all three assets significantly increase surge ratios, with effects strongly modulated by restoration timing, outage duration and weather conditions. We develop a component-aware multi-task Transformer estimator that disaggregates EV, HP and DER contributions, and apply it to project historical outages under counterfactual 2035 adoption pathways. In a policy-aligned pathway, evening restorations emerge as the binding reliability constraint, with exceedance probabilities of 0.057 when 30% of system load is restored within the first 15 minutes. Mitigation measures, probabilistic EV restarts, short thermostat offsets and accelerated DER reconnection, reduce exceedance to 0.019 and eliminate it entirely when 20% or less of system load is restored. These results demonstrate that transition-era surges are asset-driven and causally linked to electrification and decarbonization, but can be effectively managed through integrated operational strategies.
Underground Power Distribution System Restoration Using Inverter Based Resources
Underground power distribution systems (PDSs) are increasingly deployed in urban areas. The integration of smart devices, including smart switchgears, pad-mounted distribution transformers, and inverter-based resources (IBRs), enhances system resilience but also introduces unique challenges. These challenges include inrush currents caused by trapped charges in underground cables, ferroresonance in distribution transformers during energization, and three-phase load imbalance resulting from single-phase underground laterals. To address these issues, this paper proposes an underground PDS restoration framework using IBRs. Firstly, an underground cable energization model is developed to quantify inrush current by analyzing voltage differences across both switchgear terminals. Secondly, a distribution transformer energization model is proposed to evaluate ferroresonance using Q-factor constraints based on underground cable capacitance and damping resistance. Thirdly, a phase-swapping model is proposed to improve load balancing by dynamically reassigning lateral-phase connections through smart switchgears. The proposed models are further integrated into a mixed-integer nonlinear programming (MINLP) formulation to maximize the total weighted restored load while constraining inrush currents, ferroresonance, and phase imbalance. To address the nonlinearity induced by impedance matrix reordering during phase swapping, a permutation-based linearization technique is proposed. Finally, case studies on an underground PDS established based on the IEEE 123-Node Test Feeder validate the effectiveness of the proposed strategy in improving underground PDS restoration performance.
Quantum memory optimisation using finite-horizon, decoherence time and discounted mean-square performance criteria
This paper is concerned with open quantum memory systems for approximately retaining quantum information, such as initial dynamic variables or quantum states to be stored over a bounded time interval. In the Heisenberg picture of quantum dynamics, the deviation of the system variables from their initial values lends itself to closed-form computation in terms of tractable moment dynamics for open quantum harmonic oscillators and finite-level quantum systems governed by linear or quasi-linear Hudson-Parthasarathy quantum stochastic differential equations, respectively. This tractability is used in a recently proposed optimality criterion for varying the system parameters so as to maximise the memory decoherence time when the mean-square deviation achieves a given critical threshold. The memory decoherence time maximisation approach is extended beyond the previously considered low-threshold asymptotic approximation and to Schrödinger type mean-square deviation functionals for the reduced system state governed by the Lindblad master equation. We link this approach with the minimisation of the mean-square deviation functionals at a finite time horizon and with their discounted version which quantifies the averaged performance of the quantum system as a temporary memory under a Poisson flow of storage requests.
comment: 8 pages, 1 figure, submitted to IFAC World Congress 2026
CPU- and GPU-Based Parallelization of the Robust Reference Governor
Constraint management is a central challenge in modern control systems. A solution is the Reference Governor (RG), which is an add-on strategy to pre-stabilized feedback control systems to enforce state and input constraints by shaping the reference command. While robust formulations of RG exist for linear systems, their extension to nonlinear systems is often computationally intractable. This paper develops a scenario-based robust RG formulation for nonlinear systems and investigates its parallel implementation on multi-core CPUs and CUDA-enabled GPUs. We analyze the computational structure of the algorithm, identify parallelization opportunities, and implement the resulting schemes on modern parallel hardware. Benchmarking on a nonlinear hydrogen fuel cell model demonstrates order-of-magnitude speedups (by as much as three orders of magnitude) compared to sequential implementations.
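To make the structure of a scenario-based reference governor more concrete, here is a minimal Python sketch of one update of a scalar reference governor. It is an illustration under assumptions, not the paper's algorithm: the function names, the bisection search (which assumes admissibility is monotone in the step size `kappa`), the finite prediction horizon, and the toy first-order plant are all hypothetical choices for this example.

```python
def simulate_ok(x0, v, w, step, output, y_max, horizon):
    """Predict the closed loop with constant reference v and disturbance w;
    return True if the constrained output stays at or below y_max."""
    x = x0
    for _ in range(horizon):
        if output(x, v) > y_max:
            return False
        x = step(x, v, w)
    return output(x, v) <= y_max


def scenario_rg(x, v_prev, r, scenarios, step, output, y_max,
                horizon=50, iters=20):
    """One governor update: move the applied reference from v_prev toward the
    desired reference r as far as every sampled scenario allows."""

    def admissible(kappa):
        v = v_prev + kappa * (r - v_prev)
        # Scenario checks are independent of each other, so this loop is the
        # natural place for multi-core or GPU parallelism.
        return all(simulate_ok(x, v, w, step, output, y_max, horizon)
                   for w in scenarios)

    if admissible(1.0):
        return r
    if not admissible(0.0):
        return v_prev  # hold the previous reference if nothing is admissible
    lo, hi = 0.0, 1.0
    for _ in range(iters):  # bisection on the step size kappa
        mid = 0.5 * (lo + hi)
        if admissible(mid):
            lo = mid
        else:
            hi = mid
    return v_prev + lo * (r - v_prev)


# Toy pre-stabilized plant (an assumption for this example only).
def plant_step(x, v, w):
    return 0.5 * x + 0.5 * v + w


def plant_output(x, v):
    return x


v_applied = scenario_rg(0.0, 0.0, 2.0, [-0.05, 0.0, 0.05],
                        plant_step, plant_output, y_max=1.0)
print(v_applied)  # applied reference, kept below the desired value of 2.0
```

Each call to `admissible` runs one forward simulation per scenario, and those simulations do not depend on one another, so the cost grows with the number of scenarios while the work stays easy to distribute.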
A Control Allocation Algorithm for Hypersonic Glide Vehicles with Input Limitations
Hypersonic glide vehicles (HGVs) operate in challenging flight regimes characterized by strong nonlinearities in actuation and stringent physical constraints. These include state-dependent actuator limitations, asymmetric control bounds, and thermal loads that vary with maneuvering conditions. This paper introduces an iterative control allocation method to address these challenges in real time. The proposed algorithm searches for control inputs that achieve the desired moment commands while respecting constraints on input magnitude and rate. For slender HGV configurations, thermal loads and drag generation are strongly correlated: lower drag typically results in reduced surface heating. By embedding drag-sensitive soft constraints, the method improves energy efficiency and implicitly reduces surface temperatures, lowering the vehicle's infrared signature. These features are particularly advantageous for long-range military operations that require low observability. The approach is demonstrated using the DLR's Generic Hypersonic Glide Vehicle 2 (GHGV-2) simulation model. The results confirm the method's effectiveness in maintaining control authority under realistic, constrained flight conditions.
comment: 38 pages, 20 figures, submitted to the AIAA Journal of Guidance, Control, and Dynamics
Satellite Navigation and Control using Physics-Informed Artificial Potential Field and Sliding Mode Controller
The increase in the number of space exploration missions has led to the accumulation of space debris, posing a risk of collision with operational satellites. Addressing this challenge is crucial for the sustainability of space operations. To plan a safe trajectory in the presence of moving space debris, an integrated approach combining an artificial potential field (APF) and a sliding mode controller is proposed and implemented in this paper. The relative 6-DOF kinematics and dynamics of the spacecraft are modelled in the framework of geometric mechanics, with the relative configuration expressed through exponential coordinates. Various collision avoidance guidance algorithms have been proposed in the literature; among them, the APF guidance algorithm is computationally efficient and enables real-time path adjustments to avoid collision with obstacles. However, it is prone to issues such as local minima. In the literature, the local minima issue is typically avoided either by redefining the potential function, for example by adding vorticity, or by employing search techniques, which are computationally expensive. To address these challenges, a physics-informed APF is proposed in this paper, where Hamiltonian mechanics is used instead of the traditional Newtonian mechanics-based approach. Instead of relying on attractive and repulsive forces for path planning, the Hamiltonian approach uses the potential field to define a path of minimum potential. Additionally, to track the desired trajectory planned by the guidance algorithm within a fixed time frame, a non-singular fixed-time sliding mode controller (FTSMC) is used. The proposed fixed-time sliding surface not only ensures fixed-time convergence of system states but also guarantees the global stability of the closed-loop system without singularity. The simulation results presented support the claims made.
SecuLEx: a Secure Limit Exchange Market for Dynamic Operating Envelopes
Distributed energy resources (DERs) are transforming power networks, challenging traditional operational methods, and requiring new coordination mechanisms. To address this challenge, this paper introduces SecuLEx (Secure Limit Exchange), a new market-based paradigm to allocate power injection and withdrawal limits that guarantee network security during time periods, called dynamic operating envelopes (DOEs). Under this paradigm, distribution system operators (DSOs) assign initial DOEs to customers. These limits can be exchanged afterward through a market, allowing customers to reallocate them according to their needs while ensuring network operational constraints. We formalize SecuLEx and illustrate DOE allocation and market exchanges on a small-scale low-voltage (LV) network, demonstrating that both procedures are computationally tractable. In this example, SecuLEx reduces renewable curtailment and improves grid utilization and social welfare compared to traditional approaches.
Closed-loop control of sloshing fuel in a spinning spacecraft
New-generation space missions require satellites to carry substantial amounts of liquid propellant, making it essential to analyse the coupled control-structure-propellant dynamics in detail. While Computational Fluid Dynamics (CFD) offers high-fidelity predictions, its computational cost limits its use in iterative design. Equivalent Mechanical Models (EMMs) provide a faster alternative, though their predictive performance, especially in closed-loop scenarios, remains largely unexplored. This work presents a comparative analysis of a spacecraft under feedback control, using both CFD and a reduced-order sloshing model. Results show good agreement, validating the simplified model for the manoeuvre considered. This validation enables efficient sensitivity and stability studies, offering a practical tool for early-stage spacecraft design.
General formulation of an analytic, Lipschitz continuous control allocation for thrust-vectored controlled rigid-bodies
This study introduces a systematic and scalable control allocation method for arbitrary rigid bodies equipped with vectorized thrusters. Two novel solutions are proposed: a closed-form, Lipschitz continuous mapping that ensures smooth actuator orientation references, and a convex optimization formulation capable of handling practical actuator constraints such as thrust saturation and angular rate limits. Both methods leverage the null-space structure of the allocation mapping to perform singularity avoidance while generating sub-optimal yet practical solutions. The effectiveness and generality of the proposed framework are demonstrated through numerical simulations on a 3DOF marine vessel and a 6DOF aerial quadcopter.
Optimizing BCI Rehabilitation Protocols for Stroke: Exploring Task Design and Training Duration
Stroke is a leading cause of long-term disability and the second most common cause of death worldwide. Although acute treatments have advanced, recovery remains challenging and limited. Brain-computer interfaces (BCIs) have emerged as a promising tool for post-stroke rehabilitation by promoting neuroplasticity. However, clinical outcomes remain variable, and optimal protocols have yet to be established. This study explores strategies to optimize BCI-based rehabilitation by comparing motor imagery of affected hand movement versus rest, instead of the conventional left-versus-right motor imagery. This alternative aims to simplify the task and address the weak contralateral activation commonly observed in stroke patients. Two datasets, one from healthy individuals and one from stroke patients, were used to evaluate the proposed approach. The results showed improved performance using both FBCSP and EEGNet. Additionally, we investigated the impact of session duration and found that shorter training sessions produced better BCI performance than longer sessions.
comment: 4 pages, 4 figures, accepted for 8th IEEE ENBENG Conference
A Stable, Accurate and Well-Conditioned Time-Domain PMCHWT Formulation
This paper introduces a new boundary element formulation for transient electromagnetic scattering by homogeneous dielectric objects based on the time-domain PMCHWT equation. To address dense-mesh breakdown, a multiplicative Calderon preconditioner utilizing a modified static electric field integral operator is employed. Large-timestep breakdown and late-time instability are simultaneously resolved by rescaling the Helmholtz components leveraging the quasi-Helmholtz projectors and using temporal differentiation and integration as rescaling operators. This rescaling also balances the loop and star components at large timesteps, improving solution accuracy. The resulting discrete system is solved using a marching-on-in-time scheme and iterative solvers. Numerical experiments for simply- and multiply-connected dielectric scatterers, including highly non-smooth geometries, corroborate the accuracy, stability, and efficiency of the proposed approach.
comment: 12 pages, 5 figures
Multi-level informed optimization via decomposed Kriging for large design problems under uncertainty
Engineering design involves demanding models encompassing many decision variables and uncontrollable parameters. In addition, unavoidable aleatoric and epistemic uncertainties can be very impactful and add further complexity. The state of the art adopts two steps, uncertainty quantification and design optimization, to optimize systems under uncertainty by means of robust or stochastic metrics. However, conventional scenario-based, surrogate-assisted, and mathematical programming methods are not sufficiently scalable to be affordable and precise in large and complex cases. Here, a multi-level approach is proposed to accurately optimize resource-intensive, high-dimensional, and complex engineering problems under uncertainty with minimal resources. A non-intrusive, fast-scaling, Kriging-based surrogate is developed to map the combined design/parameter domain efficiently. Multiple surrogates are adaptively updated by hierarchical and orthogonal decomposition to leverage the fewest and most uncertainty-informed data. The proposed method is statistically compared to the state of the art via an analytical testbed and is shown to be concurrently faster and more accurate by orders of magnitude.
comment: 34 pages, 18 figures
Topology optimization of nonlinear forced response curves via reduction on spectral submanifolds
Forced response curves (FRCs) of nonlinear systems can exhibit complex behaviors, including hardening/softening behavior and bifurcations. Although topology optimization holds great potential for tuning these nonlinear dynamic responses, its use in high-dimensional systems is limited by the high cost of repeated response and sensitivity analyses. To address this challenge, we employ the spectral submanifolds (SSMs) reduction theory, which reformulates the periodic response as the equilibria of an associated reduced-order model (ROM). This enables efficient and analytic evaluation of both response amplitudes and their sensitivities. Based on the SSM-based ROM, we formulate optimization problems that optimize the peak amplitude, the hardening/softening behavior, and the distance between two saddle-node bifurcations for an FRC. The proposed method is applied to the design of nonlinear MEMS devices, achieving targeted performance optimization. This framework provides a practical and efficient strategy for incorporating nonlinear dynamic effects into the topology optimization of structures.
comment: 26 pages, 12 figures. Submitted to Nonlinear Dynamics
Multi-Level Multi-Fidelity Methods for Path Integral and Safe Control
Sampling-based approaches are widely used in systems without analytic models to estimate risk or find optimal control. However, gathering sufficient data in such scenarios can be prohibitively costly. On the other hand, in many situations, low-fidelity models or simulators are available from which samples can be obtained at low cost. In this paper, we propose an efficient approach for risk quantification and path integral control that leverages such data from multiple models with heterogeneous sampling costs. A key technical novelty of our approach is the integration of Multi-level Monte Carlo (MLMC) and Multi-fidelity Monte Carlo (MFMC) that enable data from different time and state representations (system models) to be jointly used to reduce variance and improve sampling efficiency. We also provide theoretical analysis of the proposed method and show that our estimator is unbiased and consistent under mild conditions. Finally, we demonstrate via numerical simulation that the proposed method has improved computation (sampling costs) vs. accuracy trade-offs for risk quantification and path integral control.
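The multi-level idea referenced in this abstract can be sketched as a telescoping sum: a cheap coarse model carries most of the samples, and a few coupled fine/coarse pairs correct its bias. The sketch below is plain MLMC only, not the combined MLMC/MFMC estimator of the paper. The test problem (the mean of a geometric Brownian motion under Euler discretization), the parameters `mu` and `sigma`, and the sample counts are all assumptions chosen for illustration.

```python
import math
import random

def euler_pair(level, rng, mu=0.05, sigma=0.2, T=1.0, x0=1.0):
    """Return (fine, coarse) Euler endpoints driven by the same Brownian increments.

    Level l uses 2**l time steps; the coarse path uses 2**(l-1) steps.
    At level 0 there is no coarse path, so the coarse value is 0.0.
    """
    n_fine = 2 ** level
    dt = T / n_fine
    x_fine = x0
    x_coarse = x0
    dw_pair = 0.0  # running sum of two fine increments for one coarse step
    for step in range(n_fine):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x_fine += mu * x_fine * dt + sigma * x_fine * dw
        dw_pair += dw
        if level > 0 and step % 2 == 1:
            x_coarse += mu * x_coarse * (2 * dt) + sigma * x_coarse * dw_pair
            dw_pair = 0.0
    return x_fine, (x_coarse if level > 0 else 0.0)

def mlmc_mean(samples_per_level, seed=0):
    """Telescoping estimator: E[P_L] = E[P_0] + sum over l of E[P_l - P_(l-1)]."""
    rng = random.Random(seed)
    estimate = 0.0
    for level, n in enumerate(samples_per_level):
        total = 0.0
        for _ in range(n):
            fine, coarse = euler_pair(level, rng)
            total += fine - coarse
        estimate += total / n
    return estimate

# Many cheap coarse samples, progressively fewer expensive fine corrections.
estimate = mlmc_mean([20000, 5000, 2000, 1000])
exact = math.exp(0.05)  # E[X_T] for the assumed drift mu = 0.05 and T = 1
```

Because each fine/coarse pair shares its Brownian increments, the correction terms have small variance, which is why far fewer samples are needed at the expensive levels.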
Space Logistics Analysis and Incentive Design for Commercialization of Orbital Debris Remediation
As orbital debris continues to become a higher priority for the space industry, there is a need to explore how partnerships between the public and private space sector may aid in addressing this issue. This research develops a space logistics framework for planning orbital debris remediation missions, providing a quantitative basis for partnerships that are mutually beneficial between space operators and debris remediators. By integrating network-based space logistics and game theory, we illuminate the high-level costs of remediating orbital debris, and the surplus that stands to be shared as a result. These findings indicate significant progress toward the continued development of a safe, sustainable, and profitable space economy.
comment: 28 pages, 14 figures, Journal of Spacecraft and Rockets (Articles in Advance)
EB-MBD: Emerging-Barrier Model-Based Diffusion for Safe Trajectory Optimization in Highly Constrained Environments
We propose enforcing constraints on Model-Based Diffusion by introducing emerging barrier functions inspired by interior point methods. We show that constraints on Model-Based Diffusion can lead to catastrophic performance degradation, even on simple 2D systems, due to sample inefficiency in the Monte Carlo approximation of the score function. We introduce Emerging-Barrier Model-Based Diffusion (EB-MBD), which uses progressively introduced barrier constraints to avoid these problems, significantly improving solution quality without the need for computationally expensive operations such as projections. We analyze the liveliness of the samples at each iteration to inform the choice of barrier parameter schedule. We demonstrate results for 2D collision avoidance and a 3D underwater manipulator system and show that our method achieves lower-cost solutions than Model-Based Diffusion and requires orders of magnitude less computation time than projection-based methods.
Some Reflections on Sliding Mode Designs in Control Systems: An Example of Adaptive Tracking Control for Simple Mechanical Systems With Friction Without Measurement of Velocity
The objective of this note is to share some reflections of the authors regarding the use of sliding mode designs in control systems. We believe the abundant, and ever increasing, appearance of this kind of work in our scientific publications deserves some critical evaluation of its actual role, relevance and pertinence. First, we discuss the procedure followed by most of these designs -- illustrated with examples from the literature. Second, we bring to the reader's attention several aspects of the control problem, central in classical designs, which are disregarded in the sliding mode literature. Finally, to illustrate our previous considerations with a specific example, we compare the performance of two adaptive tracking controllers for a simple one-degree-of-freedom mechanical system with unknown parameters and static and Coulomb friction -- that do not rely on the measurement of velocity.
Optimal Control with Lyapunov Stability Guarantees for Space Applications
This paper investigates the infinite horizon optimal control problem (OCP) for space applications characterized by nonlinear dynamics. The proposed approach divides the problem into a finite horizon OCP with a regularized terminal cost, guiding the system towards a terminal set, and an infinite horizon linear regulation phase within this set. This strategy guarantees global asymptotic stability under specific assumptions. Our method maintains the system's fully nonlinear dynamics until it reaches the terminal set, where the system dynamics is linearized. As the terminal set converges to the origin, the difference in optimal cost incurred reduces to zero, guaranteeing an efficient and stable solution. The approach is tested through simulations on three problems: spacecraft attitude control, rendezvous maneuver, and soft landing. In spacecraft attitude control, we focus on achieving precise orientation and stabilization. For rendezvous maneuvers, we address the navigation of a chaser to meet a target spacecraft. For the soft landing problem, we ensure a controlled descent and touchdown on a planetary surface. We provide numerical results confirming the effectiveness of the proposed method in managing these nonlinear dynamics problems, offering robust solutions essential for successful space missions.
Joint Detection, Channel Estimation and Interference Nulling for Terrestrial-Satellite Downlink Co-Existence in the Upper Mid-Band
The upper mid-band FR3 spectrum (7-24 GHz) has garnered significant interest for future cellular services. However, utilizing a large portion of this band requires careful interference coordination with incumbent satellite systems. This paper investigates interference from high-power terrestrial base stations (TN-BSs) to satellite downlink receivers. A central challenge is that the victim receivers, i.e., ground-based non-terrestrial user equipment (NTN-UEs) such as satellite customer premises equipment, must first be detected and their channels estimated before the TN-BS can effectively place nulls in their directions. We explore a potential solution where NTN-UEs periodically transmit preambles or beacon signals that TN-BSs can use for detection and channel estimation. The performance of this nulling approach is analyzed in a simplified scenario with a single victim, revealing the interplay between path loss and estimation quality in determining nulling performance. To further validate the method, we conduct a detailed multi-user site-specific ray-tracing (RT) simulation in a rural environment. The results show that the proposed nulling approach is effective under realistic parameters, even with high densities of victim units, although TN-BS may require a substantial number of antennas.
comment: Accepted for publication in the Proceedings of GlobeCom 2025
Whole Body Model Predictive Control for Spin-Aware Quadrupedal Table Tennis ICRA 2026
Developing table tennis robots that mirror human speed, accuracy, and ability to predict and respond to the full range of ball spins remains a significant challenge for legged robots. To demonstrate these capabilities, we present a system for quadrupedal robots to play dynamic table tennis that integrates high-speed perception, trajectory prediction, and agile control. Our system uses external cameras for high-speed ball localization, physical models with learned residuals to infer spin and predict trajectories, and a novel model predictive control (MPC) formulation for agile full-body control. Notably, a continuous set of stroke strategies emerges automatically from different ball return objectives using this control paradigm. We demonstrate our system in the real world on a Spot quadruped, evaluate the accuracy of each system component, and exhibit coordination through the system's ability to aim and return balls with varying spin types. As a further demonstration, the system is able to rally with human players.
comment: Submitted to appear in IEEE ICRA 2026
When to Reason: Semantic Router for vLLM NeurIPS 2025
Large Language Models (LLMs) demonstrate substantial accuracy gains when augmented with reasoning modes such as chain-of-thought and inference-time scaling. However, reasoning also incurs significant costs in inference latency and token usage, with environmental and financial impacts, which are unnecessary for many simple prompts. We present a semantic router that classifies queries based on their reasoning requirements and selectively applies reasoning only when beneficial. Our approach achieves a 10.2 percentage point improvement in accuracy on the MMLU-Pro benchmark while reducing response latency by 47.1% and token consumption by 48.5% compared to direct inference with vLLM. These results demonstrate that semantic routing offers an effective mechanism for striking a balance between accuracy and efficiency in open-source LLM serving systems.
comment: 5 pages, excluding references and appendix. To appear at the Workshop on ML for Systems at NeurIPS 2025, December 6, 2025 https://mlforsystems.org/
Identification and optimal control strategies for the transversal splitting of ultra-cold Bose gases
Splitting a Bose-Einstein condensate (BEC) is a key operation in fundamental physics experiments and emerging quantum technologies, where precise preparation of well-defined initial states requires fast yet coherent control of the condensate's nonlinear dynamics. This work formulates the BEC splitting process as an optimal feedforward control problem based on a physically interpretable, reduced-order model identified from limited experimental data. We introduce a systematic calibration strategy that combines optimal experiment selection and constrained nonlinear parameter estimation, enabling accurate system identification with minimal experimental overhead. Using this calibrated model, we compute energy-optimal trajectories via indirect optimal control to realize shortcuts to adiabaticity (STAs), achieving rapid transitions to the ground state of a double-well potential while suppressing excitations. Experiments confirm that the proposed control framework yields high-fidelity state transfers across multiple configurations, demonstrating its robustness and scalability for quantum control applications.
Resilient Multi-Dimensional Consensus and Distributed Optimization against Agent-Based and Denial-of-Service Attacks
In this paper, we consider the resilient multi-dimensional consensus and distributed optimization problems of multi-agent systems (MASs) in the presence of both agent-based and denial-of-service (DoS) attacks. The considered agent-based attacks can cover malicious, Byzantine, and stubborn agents. The links between agents in the network can be blocked by DoS attacks, which may lead the digraph to be time-varying and even disconnected. The objective is to ensure that the remaining benign agents achieve consensus. To this end, an "auxiliary point"-based resilient control algorithm is proposed for MASs. Under the proposed algorithm, each healthy agent constructs a "safe kernel" utilizing the states of its in-neighbors and updates its state toward a specific point within this kernel at each iteration. If an agent cannot receive its neighbors' states owing to DoS attacks, it will use the states received immediately before the DoS period. Moreover, a resilient multi-dimensional distributed optimization (RMDO) algorithm is also proposed. Theoretical proofs and numerical examples are presented to demonstrate the effectiveness of the proposed algorithms.
Optimization via a Control-Centric Framework
Optimization plays a central role in intelligent systems and cyber-physical technologies, where the speed and reliability of convergence directly impact performance. In control theory, optimization-centric methods are standard: controllers are designed by repeatedly solving optimization problems, as in linear quadratic regulation, $H_\infty$ control, and model predictive control. In contrast, this paper develops a control-centric framework for optimization itself, where algorithms are constructed directly from Lyapunov stability principles rather than being proposed first and analyzed afterward. A key element is the stationarity vector, which encodes first-order optimality conditions and enables Lyapunov-based convergence analysis. By pairing a Lyapunov function with a selectable decay law, we obtain continuous-time dynamics with guaranteed exponential, finite-time, fixed-time, or prescribed-time convergence. Within this framework, we introduce three feedback realizations of increasing restrictiveness: the Hessian-gradient, Newton, and gradient dynamics. Each realization shapes the decay of the stationarity vector to achieve the desired rate. These constructions unify unconstrained optimization, extend naturally to constrained problems via Lyapunov-consistent primal-dual dynamics, and broaden the results for minimax and generalized Nash equilibrium seeking problems beyond exponential stability. The framework provides systematic design tools for optimization algorithms in control and game-theoretic problems.
comment: This work has been submitted to the IEEE for possible publication. 12 pages, 3 figures
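As a worked illustration of pairing a Lyapunov function with a decay law, consider the simplest case. This is a sketch under stated assumptions, not the paper's general construction: an unconstrained, twice-differentiable $f$ with invertible Hessian, the gradient $\nabla f$ playing the role of the stationarity vector, and an exponential decay law $\dot{V} = -2V$. Under those assumptions the Newton dynamics follow directly:

```latex
\begin{aligned}
V(x) &= \tfrac{1}{2}\,\lVert \nabla f(x) \rVert^{2}, \qquad
\dot{x} = -\left[\nabla^{2} f(x)\right]^{-1} \nabla f(x), \\
\frac{d}{dt}\,\nabla f(x(t)) &= \nabla^{2} f(x)\,\dot{x} = -\nabla f(x(t)), \\
\dot{V} &= \nabla f(x)^{\top}\,\frac{d}{dt}\,\nabla f(x) = -\lVert \nabla f(x) \rVert^{2} = -2V
\;\;\Longrightarrow\;\; V(t) = V(0)\,e^{-2t}.
\end{aligned}
```

The gradient therefore decays as $\nabla f(x(t)) = e^{-t}\,\nabla f(x(0))$ regardless of the conditioning of $f$. Replacing the right-hand side of the decay law with a finite-time or fixed-time law would change the convergence type in the same manner.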
Optimal control of continuous-time symmetric systems with unknown dynamics and noisy measurements
An iterative learning algorithm is presented for continuous-time linear-quadratic optimal control problems where the system is externally symmetric with unknown dynamics. Both finite-horizon and infinite-horizon problems are considered. It is shown that the proposed algorithm is globally convergent to the optimal solution and has some advantages over adaptive dynamic programming, including being unbiased under noisy measurements and having a relatively low computational burden. Numerical experiments show the effectiveness of the results.
Product-oriented Product-Process-Resource Asset Network and its Representation in AutomationML for Asset Administration Shell
Current products, especially in the automotive sector, are complex technical systems of a multi-disciplinary mechatronic nature. Industrial standards supporting system engineering and production typically (i) address the production phase only, but do not cover the complete product life cycle, and (ii) focus on production processes and resources rather than the products themselves. The presented approach is motivated by incorporating the impacts of the end-of-life phase of the product life cycle into the engineering phase. This paper proposes a modeling approach derived from the Product-Process-Resource (PPR) modeling paradigm. It combines requirements on (i) respecting the product structure as a basis for the model, and (ii) incorporating repairing, remanufacturing, or upcycling within cyber-physical production systems. The proposed model, called PoPAN, should accompany the product during the entire life cycle as a digital shadow encapsulated within the Asset Administration Shell of a product. To facilitate the adoption of the proposed paradigm, the paper also proposes serialization of the model in the AutomationML data format. The model is demonstrated on a use case for disassembling electric vehicle batteries to support their remanufacturing for stationary battery applications.
comment: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Hierarchical Reinforcement Learning with Low-Level MPC for Multi-Agent Control
Achieving safe and coordinated behavior in dynamic, constraint-rich environments remains a major challenge for learning-based control. Pure end-to-end learning often suffers from poor sample efficiency and limited reliability, while model-based methods depend on predefined references and struggle to generalize. We propose a hierarchical framework that combines tactical decision-making via reinforcement learning (RL) with low-level execution through Model Predictive Control (MPC). For the case of multi-agent systems this means that high-level policies select abstract targets from structured regions of interest (ROIs), while MPC ensures dynamically feasible and safe motion. Tested on a predator-prey benchmark, our approach outperforms end-to-end and shielding-based RL baselines in terms of reward, safety, and consistency, underscoring the benefits of combining structured learning with model-based control.
Revisiting Functional Derivatives in Multi-object Tracking
Probability generating functionals (PGFLs) are efficient and powerful tools for tracking independent objects in clutter. It was shown that PGFLs could be used for the elegant derivation of practical multi-object tracking algorithms, e.g., the probability hypothesis density (PHD) filter. However, derivations using PGFLs use the so-called functional derivatives whose definitions usually appear too complicated or heuristic, involving Dirac delta "functions". This paper begins by comparing different definitions of functional derivatives and exploring their relationships and implications for practical applications. It then proposes a rigorous definition of the functional derivative, utilizing straightforward yet precise mathematics for clarity. Key properties of the functional derivative are revealed and discussed.
comment: submitted to IEEE Transactions on Information Theory
Carleman-Fourier linearization of nonlinear real dynamical systems with quasi-periodic fields
This paper presents Carleman-Fourier linearization for analyzing nonlinear real dynamical systems with periodic vector fields. Using Fourier basis functions, this novel framework transforms such dynamical systems into equivalent infinite-dimensional linear dynamical systems. In this paper, we establish the exponential convergence of the primary block in the finite-section approximation of this linearized system to the state vector of the original nonlinear system. To showcase the efficacy of our approach, we apply it to the Kuramoto model, a prominent model for coupled oscillators. The results demonstrate promising accuracy in approximating the original system's behavior.
comment: Discrete and Continuous Dynamical Systems Series B, accepted
Foundation Models for Structural Health Monitoring
Structural Health Monitoring (SHM) is a critical task for ensuring the safety and reliability of civil infrastructures, typically realized on bridges and viaducts by means of vibration monitoring. In this paper, we propose for the first time the use of Transformer neural networks, with a Masked Auto-Encoder architecture, as Foundation Models for SHM. We demonstrate the ability of these models to learn generalizable representations from multiple large datasets through self-supervised pre-training, which, coupled with task-specific fine-tuning, allows them to outperform state-of-the-art traditional methods on diverse tasks, including Anomaly Detection (AD) and Traffic Load Estimation (TLE). We then extensively explore model size versus accuracy trade-offs and experiment with Knowledge Distillation (KD) to improve the performance of smaller Transformers, enabling their embedding directly into the SHM edge nodes. We showcase the effectiveness of our foundation models using data from three operational viaducts. For AD, we achieve a near-perfect 99.9% accuracy with a monitoring time span of just 15 windows. In contrast, a state-of-the-art method based on Principal Component Analysis (PCA) obtains its first good result (95.03% accuracy) only when considering 120 windows. On two different TLE tasks, our models obtain state-of-the-art performance on multiple evaluation metrics (R$^2$ score, MAE% and MSE%). On the first benchmark, we achieve an R$^2$ score of 0.97 and 0.90 for light and heavy vehicle traffic, respectively, while the best previous approach (a Random Forest) stops at 0.91 and 0.84. On the second one, we achieve an R$^2$ score of 0.54 versus the 0.51 of the best competitor method, a Long Short-Term Memory network.
comment: 17 pages, 6 tables, 9 figures
A Long-Duration Autonomy Approach to Connected and Automated Vehicles
In this article, we present a long-duration autonomy approach for the control of connected and automated vehicles (CAVs) operating in a transportation network. In particular, we focus on the performance of CAVs at traffic bottlenecks, including roundabouts, merging roadways, and intersections. We take a principled approach based on optimal control, and derive a reactive controller with guarantees on safety, performance, and energy efficiency. We guarantee safety through high order control barrier functions (HOCBFs), which we "lift" to first order CBFs using time-optimal motion primitives. This yields a set of first-order CBFs that are compatible with the control bounds. We demonstrate the performance of our approach in simulation and compare it to an optimal control-based approach.
comment: 8 pages, 3 figures
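To make the role of a first-order CBF concrete, here is a minimal sketch on a deliberately simple system, not the vehicle model or the HOCBF lifting used in the article. The assumptions are: a single integrator $\dot{x} = u$, a safe set $x \le x_{\max}$ encoded by $h(x) = x_{\max} - x$, and a linear class-K gain `alpha`. The CBF condition $\dot{h} + \alpha h \ge 0$ then reduces to the input bound $u \le \alpha (x_{\max} - x)$.

```python
def cbf_filter(u_nom, x, x_max, alpha):
    """Minimally modify u_nom so that h(x) = x_max - x satisfies h' + alpha*h >= 0.

    For the assumed dynamics x' = u this is the scalar bound u <= alpha*(x_max - x),
    so the closest safe input is simply the smaller of the two values.
    """
    return min(u_nom, alpha * (x_max - x))

def simulate(x0=0.0, x_max=1.0, u_nom=1.0, alpha=2.0, dt=0.01, steps=1000):
    """Euler-integrate the filtered single integrator and return the trajectory."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x += dt * cbf_filter(u_nom, x, x_max, alpha)
        trajectory.append(x)
    return trajectory

trajectory = simulate()
```

The nominal input is applied unchanged while the state is far from the boundary and is throttled smoothly as the boundary approaches, so the state converges toward `x_max` without crossing it (here `alpha * dt` is kept below 1 so the discrete update preserves the bound).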
MARLIN: Multi-Agent Reinforcement Learning with Murmuration Intelligence and LLM Guidance for Reservoir Management
As climate change intensifies extreme weather events, water disasters pose growing threats to global communities, making adaptive reservoir management critical for protecting vulnerable populations and ensuring water security. Modern water resource management faces unprecedented challenges from cascading uncertainties propagating through interconnected reservoir networks. These uncertainties, rooted in physical water transfer losses and environmental variability, make precise control difficult. For example, sending 10 tons downstream may yield only 8-12 tons due to evaporation and seepage. Traditional centralized optimization approaches suffer from exponential computational complexity and cannot effectively handle such real-world uncertainties, while existing multi-agent reinforcement learning (MARL) methods fail to achieve effective coordination under uncertainty. To address these challenges, we present MARLIN, a decentralized reservoir management framework inspired by the collective intelligence of starling murmurations. Integrating bio-inspired alignment, separation, and cohesion rules with MARL, MARLIN enables individual reservoirs to make local decisions while achieving emergent global coordination. In addition, an LLM provides real-time reward shaping signals, guiding agents to adapt to environmental changes and human-defined preferences. Experiments on real-world USGS data show that MARLIN improves uncertainty handling by 23%, cuts computation by 35%, and accelerates flood response by 68%, exhibiting super-linear coordination, with complexity scaling 5.4x from 400 to 10,000 nodes. These results demonstrate MARLIN's potential for disaster prevention and protecting communities through intelligent, scalable water resource management.
A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis
Foundation models have gained growing interest in the IoT domain due to their reduced reliance on labeled data and strong generalizability across tasks, which address key limitations of traditional machine learning approaches. However, most existing foundation-model-based methods are developed for specific IoT tasks, making it difficult to compare approaches across IoT domains and limiting guidance for applying them to new tasks. This survey aims to bridge this gap by providing a comprehensive overview of current methodologies and organizing them around four performance objectives shared across domains: efficiency, context-awareness, safety, and security & privacy. For each objective, we review representative works and summarize commonly used techniques and evaluation metrics. This objective-centric organization enables meaningful cross-domain comparisons and offers practical insights for selecting and designing foundation-model-based solutions for new IoT tasks. We conclude with key directions for future research to guide both practitioners and researchers in advancing the use of foundation models in IoT applications.
comment: Accepted by CCF Transactions on Pervasive Computing and Interaction (CCF TPCI)
Fast Online Adaptive Neural MPC via Meta-Learning
Data-driven model predictive control (MPC) has demonstrated significant potential for improving robot control performance in the presence of model uncertainties. However, existing approaches often require extensive offline data collection and computationally intensive training, limiting their ability to adapt online. To address these challenges, this paper presents a fast online adaptive MPC framework that leverages neural networks integrated with Model-Agnostic Meta-Learning (MAML). Our approach focuses on few-shot adaptation of residual dynamics - capturing the discrepancy between nominal and true system behavior - using minimal online data and gradient steps. By embedding these meta-learned residual models into a computationally efficient L4CasADi-based MPC pipeline, the proposed method enables rapid model correction, enhances predictive accuracy, and improves real-time control performance. We validate the framework through simulation studies on a Van der Pol oscillator, a Cart-Pole system, and a 2D quadrotor. Results show significant gains in adaptation speed and prediction accuracy over both nominal MPC and nominal MPC augmented with a freshly initialized neural network, underscoring the effectiveness of our approach for real-time adaptive robot control.
Characterizing and Optimizing Real-Time Optimal Control for Embedded SoCs
Resource-limited robots face significant challenges in executing computationally intensive tasks, such as locomotion and manipulation, particularly for real-time optimal control algorithms like Model Predictive Control (MPC). This paper provides a comprehensive design space exploration to identify optimal hardware computation architectures for these demanding model-based control algorithms. We profile and optimize representative architectural designs, including general-purpose scalar CPUs, vector processors, and specialized accelerators. By characterizing kernel-level benchmarks and end-to-end robotic scenarios, including a hardware-in-the-loop evaluation on a fabricated RISC-V multi-core vector SoC, we present a quantitative comparison of performance, area, and utilization across distinct architectural design points. Our findings demonstrate that targeted architectural modifications, coupled with deep software and system optimizations, enable up to 3.71x speedups for MPC, resulting in up to 27% system-level power reductions while completing robotic tasks. Finally, we propose a code generation flow designed to simplify the complex engineering effort required for mapping robotic workloads onto specialized architectures.
Hybrid Feedback Control for Global Navigation with Locally Optimal Obstacle Avoidance in n-Dimensional Spaces
We present a hybrid feedback control framework for autonomous robot navigation in n-dimensional Euclidean spaces cluttered with spherical obstacles. The proposed approach ensures safe and global navigation towards a target location by dynamically switching between two operational modes: motion-to-destination and locally optimal obstacle-avoidance. It produces continuous velocity inputs, ensures collision-free trajectories and generates locally optimal obstacle avoidance maneuvers. Unlike existing methods, the proposed framework is compatible with range sensors, enabling navigation in both a priori known and unknown environments. Extensive simulations in 2D and 3D settings, complemented by experimental validation on a TurtleBot 4 platform, confirm the efficacy and robustness of the approach. Our results demonstrate shorter paths and smoother trajectories compared to state-of-the-art methods, while maintaining computational efficiency and real-world feasibility.
Safe Autonomous Environmental Contact for Soft Robots using Control Barrier Functions
Robots built from soft materials will inherently apply lower environmental forces than their rigid counterparts, and therefore may be more suitable in sensitive settings with unintended contact. However, these robots' applied forces result from both their design and their control system in closed-loop, and therefore, ensuring bounds on these forces requires controller synthesis for safety as well. This article introduces the first feedback controller for a soft manipulator that formally meets a safety specification with respect to environmental contact. In our proof-of-concept setting, the robot's environment has known geometry and is deformable with a known elastic modulus. Our approach maps a bound on applied forces to a safe set of positions of the robot's tip via predicted deformations of the environment. Then, a quadratic program with Control Barrier Functions in its constraints is used to supervise a nominal feedback signal, verifiably maintaining the robot's tip within this safe set. Hardware experiments on a multi-segment soft pneumatic robot demonstrate that the proposed framework successfully maintains a positive safety margin. This framework represents a fundamental shift in perspective on control and safety for soft robots, implementing a formally verifiable logic specification on their pose and contact forces.
comment: 8 pages, 9 figures
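The supervision step described in this abstract, a quadratic program with a Control Barrier Function (CBF) constraint that minimally modifies a nominal command, can be illustrated on a toy problem. The sketch below is not the paper's soft-robot controller: it assumes a hypothetical one-dimensional single integrator (tip position x with velocity command u), a safe set x <= x_max, and the barrier h(x) = x_max - x. All names and gains are illustrative assumptions. In this scalar case the QP has a closed-form solution, so no solver is needed.

```python
def cbf_filter(u_nom, x, x_max, alpha):
    """Closed-form solution of: min (u - u_nom)^2  s.t.  u <= alpha * (x_max - x).

    The constraint is the CBF condition dh/dt >= -alpha * h for h(x) = x_max - x
    and the assumed dynamics dx/dt = u.
    """
    return min(u_nom, alpha * (x_max - x))


def simulate(x0=0.0, x_goal=2.0, x_max=1.0, alpha=5.0, k_p=3.0, dt=0.01, steps=500):
    """Forward-Euler rollout of a proportional controller supervised by the CBF filter."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        u_nom = k_p * (x_goal - x)          # nominal command pulls toward an unsafe goal
        u = cbf_filter(u_nom, x, x_max, alpha)
        x += dt * u                          # safe as long as alpha * dt <= 1
        trajectory.append(x)
    return trajectory


trajectory = simulate()
```

The nominal controller alone would drive x to the goal at 2.0, beyond the assumed limit; with the filter, the trajectory approaches x_max from below and stays inside the safe set.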
Observability for Nonlinear Systems: Connecting Variational Dynamics, Lyapunov Exponents, and Empirical Gramians
Observability quantification is a key problem in dynamic network sciences. While it has been thoroughly studied for linear systems, observability quantification for nonlinear networks is less intuitive and more cumbersome. One common approach to quantify observability for nonlinear systems is via the Empirical Gramian (Empr-Gram) -- a generalized form of the Gramian of linear systems. In this paper, we produce three new results. First, we establish that a variational form of discrete-time autonomous nonlinear systems yields a so-called Variational Gramian (Var-Gram) that is equivalent to the classic Empr-Gram; the former being easier to compute than the latter. Via Lyapunov exponents derived from Lyapunov's direct method, the paper's second result derives connections between existing observability measures and Var-Gram. The third result demonstrates the applicability of these new notions for sensor selection/placement in nonlinear systems. Numerical case studies demonstrate these three developments and their merits.
Systems and Control (EESS)
Reliability of Single-Level Equality-Constrained Inverse Optimal Control
Inverse optimal control (IOC) allows the retrieval of optimal cost function weights, or behavioral parameters, from human motion. The literature on IOC uses methods that are either based on a slow bilevel process or a fast but noise-sensitive minimization of optimality condition violation. Assuming equality-constrained optimal control models of human motion, this article presents a faster yet robust approach to solving IOC that uses a single-level reformulation of the bilevel method and yields equivalent results. Through numerical experiments in simulation, we compare the robustness to noise of the proposed single-level reformulation with that of the bilevel IOC formulation on a human-like planar reaching task that is used across recent studies. The approach shows resilience to very large levels of noise and reduces the computation time of the IOC on this task by a factor of 15 when compared to a classical bilevel implementation.
comment: 8 pages, 3 figures
Learning to Mitigate Post-Outage Load Surges: A Data-Driven Framework for Electrifying and Decarbonizing Grids
Electrification and decarbonization are transforming power system demand and recovery dynamics, yet their implications for post-outage load surges remain poorly understood. Here we analyze a metropolitan-scale heterogeneous dataset for Indianapolis comprising 30,046 feeder-level outages between 2020 and 2024, linked to smart meters and submetering, to quantify the causal impact of electric vehicles (EVs), heat pumps (HPs) and distributed energy resources (DERs) on restoration surges. Statistical analysis and causal forest inference demonstrate that rising penetrations of all three assets significantly increase surge ratios, with effects strongly modulated by restoration timing, outage duration and weather conditions. We develop a component-aware multi-task Transformer estimator that disaggregates EV, HP and DER contributions, and apply it to project historical outages under counterfactual 2035 adoption pathways. In a policy-aligned pathway, evening restorations emerge as the binding reliability constraint, with exceedance probabilities of 0.057 when 30% of system load is restored within the first 15 minutes. Mitigation measures (probabilistic EV restarts, short thermostat offsets and accelerated DER reconnection) reduce exceedance to 0.019 and eliminate it entirely when 20% or less of system load is restored. These results demonstrate that transition-era surges are asset-driven and causally linked to electrification and decarbonization, but can be effectively managed through integrated operational strategies.
Underground Power Distribution System Restoration Using Inverter Based Resources
Underground power distribution systems (PDSs) are increasingly deployed in urban areas. The integration of smart devices, including smart switchgears, pad-mounted distribution transformers and inverter-based resources (IBRs), enhances system resilience but simultaneously introduces unique challenges. These challenges include inrush currents caused by trapped charges in underground cables, ferroresonance in distribution transformers during energization, and three-phase load imbalance resulting from single-phase underground laterals. To address these issues, this paper proposes an underground PDS restoration framework using IBRs. Firstly, an underground cable energization model is developed to quantify inrush current by analyzing voltage differences across both switchgear terminals. Secondly, a distribution transformer energization model is proposed to evaluate ferroresonance using Q-factor constraints based on underground cable capacitance and damping resistance. Thirdly, a phase-swapping model is proposed to improve load balancing by dynamically reassigning lateral-phase connections through smart switchgears. The proposed models are further integrated into a mixed-integer nonlinear programming (MINLP) formulation to maximize the total weighted restored load while constraining inrush currents, ferroresonance, and phase imbalance. To address the nonlinearity induced by impedance matrix reordering during phase swapping, a permutation-based linearization technique is proposed. Finally, case studies on an underground PDS based on the IEEE 123-Node Test Feeder validate the effectiveness of the proposed strategy in improving underground PDS restoration performance.
Quantum memory optimisation using finite-horizon, decoherence time and discounted mean-square performance criteria
This paper is concerned with open quantum memory systems for approximately retaining quantum information, such as initial dynamic variables or quantum states to be stored over a bounded time interval. In the Heisenberg picture of quantum dynamics, the deviation of the system variables from their initial values lends itself to closed-form computation in terms of tractable moment dynamics for open quantum harmonic oscillators and finite-level quantum systems governed by linear or quasi-linear Hudson-Parthasarathy quantum stochastic differential equations, respectively. This tractability is used in a recently proposed optimality criterion for varying the system parameters so as to maximise the memory decoherence time when the mean-square deviation achieves a given critical threshold. The memory decoherence time maximisation approach is extended beyond the previously considered low-threshold asymptotic approximation and to Schrödinger type mean-square deviation functionals for the reduced system state governed by the Lindblad master equation. We link this approach with the minimisation of the mean-square deviation functionals at a finite time horizon and with their discounted version which quantifies the averaged performance of the quantum system as a temporary memory under a Poisson flow of storage requests.
comment: 8 pages, 1 figure, submitted to IFAC World Congress 2026
CPU- and GPU-Based Parallelization of the Robust Reference Governor
Constraint management is a central challenge in modern control systems. A solution is the Reference Governor (RG), which is an add-on strategy to pre-stabilized feedback control systems to enforce state and input constraints by shaping the reference command. While robust formulations of RG exist for linear systems, their extension to nonlinear systems is often computationally intractable. This paper develops a scenario-based robust RG formulation for nonlinear systems and investigates its parallel implementation on multi-core CPUs and CUDA-enabled GPUs. We analyze the computational structure of the algorithm, identify parallelization opportunities, and implement the resulting schemes on modern parallel hardware. Benchmarking on a nonlinear hydrogen fuel cell model demonstrates order-of-magnitude speedups (by as much as three orders of magnitude) compared to sequential implementations.
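The basic idea of a scenario-based Reference Governor can be sketched as follows. This is a minimal illustration, not the paper's formulation: the scalar closed-loop map `step`, the disturbance scenarios, and the bisection on a scalar gain kappa are all assumptions made for the example. The bisection additionally assumes that admissibility is monotone in kappa and that the previous command is itself admissible. The loop over scenarios is independent per scenario, which is the kind of structure that lends itself to CPU or GPU parallelization.

```python
import math


def step(x, v, w):
    """Hypothetical pre-stabilized nonlinear closed loop (illustrative only)."""
    return 0.8 * x + 0.2 * v + 0.05 * math.sin(x) + w


def admissible(x0, v, scenarios, y_max, horizon):
    """True if holding the command v keeps x <= y_max for every disturbance scenario."""
    for w in scenarios:              # each scenario rollout is independent of the others
        x = x0
        for _ in range(horizon):
            x = step(x, v, w)
            if x > y_max:
                return False
    return True


def reference_governor(x0, v_prev, r, scenarios, y_max, horizon=50, iters=30):
    """Return v = v_prev + kappa * (r - v_prev) with the largest admissible kappa in [0, 1]."""
    if admissible(x0, r, scenarios, y_max, horizon):
        return r
    lo, hi = 0.0, 1.0                # assumes kappa = 0 (keep v_prev) is admissible
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if admissible(x0, v_prev + mid * (r - v_prev), scenarios, y_max, horizon):
            lo = mid
        else:
            hi = mid
    return v_prev + lo * (r - v_prev)


scenarios = [-0.02, 0.0, 0.02]
v = reference_governor(x0=0.0, v_prev=0.0, r=2.0, scenarios=scenarios, y_max=1.0)
```

Here the requested reference r = 2.0 would violate the assumed constraint, so the governor returns a smaller command that keeps every scenario rollout within the limit.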
A Control Allocation Algorithm for Hypersonic Glide Vehicles with Input Limitations
Hypersonic glide vehicles (HGVs) operate in challenging flight regimes characterized by strong nonlinearities in actuation and stringent physical constraints. These include state-dependent actuator limitations, asymmetric control bounds, and thermal loads that vary with maneuvering conditions. This paper introduces an iterative control allocation method to address these challenges in real time. The proposed algorithm searches for control inputs that achieve the desired moment commands while respecting constraints on input magnitude and rate. For slender HGV configurations, thermal loads and drag generation are strongly correlated: lower drag typically results in reduced surface heating. By embedding drag-sensitive soft constraints, the method improves energy efficiency and implicitly reduces surface temperatures, lowering the vehicle's infrared signature. These features are particularly advantageous for long-range military operations that require low observability. The approach is demonstrated using the DLR's Generic Hypersonic Glide Vehicle 2 (GHGV-2) simulation model. The results confirm the method's effectiveness in maintaining control authority under realistic, constrained flight conditions.
comment: 38 pages, 20 figures, submitted to the AIAA Journal of Guidance, Control, and Dynamics
Satellite Navigation and Control using Physics-Informed Artificial Potential Field and Sliding Mode Controller
The increasing number of space exploration missions has led to the accumulation of space debris, posing a risk of collision with operational satellites. Addressing this challenge is crucial for the sustainability of space operations. To plan a safe trajectory in the presence of moving space debris, an integrated approach combining an artificial potential field and a sliding mode controller is proposed and implemented in this paper. The relative 6-DOF kinematics and dynamics of the spacecraft are modelled in the framework of geometric mechanics, with the relative configuration expressed through exponential coordinates. Among the collision avoidance guidance algorithms proposed in the literature, the Artificial Potential Field (APF) guidance algorithm is computationally efficient and enables real-time path adjustments to avoid collision with obstacles. However, it is prone to issues such as local minima. In the literature, the local minima issue is typically avoided either by redefining the potential function, for example by adding vorticity, or by employing search techniques, which are computationally expensive. To address these challenges, a physics-informed APF is proposed in this paper, where Hamiltonian mechanics is used instead of the traditional Newtonian mechanics-based approach. In this approach, instead of relying on attractive and repulsive forces for path planning, the Hamiltonian approach uses the potential field to define a path of minimum potential. Additionally, to track the desired trajectory planned by the guidance algorithm within a fixed time frame, a non-singular fixed-time sliding mode controller (FTSMC) is used. The proposed fixed-time sliding surface not only ensures fixed-time convergence of system states but also guarantees the global stability of the closed-loop system without singularity. The simulation results presented support the claims made.
SecuLEx: a Secure Limit Exchange Market for Dynamic Operating Envelopes
Distributed energy resources (DERs) are transforming power networks, challenging traditional operational methods, and requiring new coordination mechanisms. To address this challenge, this paper introduces SecuLEx (Secure Limit Exchange), a new market-based paradigm to allocate power injection and withdrawal limits that guarantee network security during time periods, called dynamic operating envelopes (DOEs). Under this paradigm, distribution system operators (DSOs) assign initial DOEs to customers. These limits can be exchanged afterward through a market, allowing customers to reallocate them according to their needs while ensuring network operational constraints. We formalize SecuLEx and illustrate DOE allocation and market exchanges on a small-scale low-voltage (LV) network, demonstrating that both procedures are computationally tractable. In this example, SecuLEx reduces renewable curtailment and improves grid utilization and social welfare compared to traditional approaches.
Closed-loop control of sloshing fuel in a spinning spacecraft
New-generation space missions require satellites to carry substantial amounts of liquid propellant, making it essential to analyse the coupled control-structure-propellant dynamics in detail. While Computational Fluid Dynamics (CFD) offers high-fidelity predictions, its computational cost limits its use in iterative design. Equivalent Mechanical Models (EMMs) provide a faster alternative, though their predictive performance, especially in closed-loop scenarios, remains largely unexplored. This work presents a comparative analysis of a spacecraft under feedback control, using both CFD and a reduced-order sloshing model. Results show good agreement, validating the simplified model for the manoeuvre considered. This validation enables efficient sensitivity and stability studies, offering a practical tool for early-stage spacecraft design.
General formulation of an analytic, Lipschitz continuous control allocation for thrust-vectored controlled rigid-bodies
This study introduces a systematic and scalable control allocation method for arbitrary rigid bodies equipped with vectorized thrusters. Two novel solutions are proposed: a closed-form, Lipschitz continuous mapping that ensures smooth actuator orientation references, and a convex optimization formulation capable of handling practical actuator constraints such as thrust saturation and angular rate limits. Both methods leverage the null-space structure of the allocation mapping to perform singularity avoidance while generating sub-optimal yet practical solutions. The effectiveness and generality of the proposed framework are demonstrated through numerical simulations on a 3DOF marine vessel and a 6DOF aerial quadcopter.
Optimizing BCI Rehabilitation Protocols for Stroke: Exploring Task Design and Training Duration
Stroke is a leading cause of long-term disability and the second most common cause of death worldwide. Although acute treatments have advanced, recovery remains challenging and limited. Brain-computer interfaces (BCIs) have emerged as a promising tool for post-stroke rehabilitation by promoting neuroplasticity. However, clinical outcomes remain variable, and optimal protocols have yet to be established. This study explores strategies to optimize BCI-based rehabilitation by comparing motor imagery of affected hand movement versus rest, instead of the conventional left-versus-right motor imagery. This alternative aims to simplify the task and address the weak contralateral activation commonly observed in stroke patients. Two datasets, one from healthy individuals and one from stroke patients, were used to evaluate the proposed approach. The results showed improved performance using both FBCSP and EEGNet. Additionally, we investigated the impact of session duration and found that shorter training sessions produced better BCI performance than longer sessions.
comment: 4 pages, 4 figures, accepted for 8th IEEE ENBENG Conference
A Stable, Accurate and Well-Conditioned Time-Domain PMCHWT Formulation
This paper introduces a new boundary element formulation for transient electromagnetic scattering by homogeneous dielectric objects based on the time-domain PMCHWT equation. To address dense-mesh breakdown, a multiplicative Calderon preconditioner utilizing a modified static electric field integral operator is employed. Large-timestep breakdown and late-time instability are simultaneously resolved by rescaling the Helmholtz components leveraging the quasi-Helmholtz projectors and using temporal differentiation and integration as rescaling operators. This rescaling also balances the loop and star components at large timesteps, improving solution accuracy. The resulting discrete system is solved using a marching-on-in-time scheme and iterative solvers. Numerical experiments for simply- and multiply-connected dielectric scatterers, including highly non-smooth geometries, corroborate the accuracy, stability, and efficiency of the proposed approach.
comment: 12 pages, 5 figures
Multi-level informed optimization via decomposed Kriging for large design problems under uncertainty
Engineering design involves demanding models encompassing many decision variables and uncontrollable parameters. In addition, unavoidable aleatoric and epistemic uncertainties can be very impactful and add further complexity. The state-of-the-art adopts two steps, uncertainty quantification and design optimization, to optimize systems under uncertainty by means of robust or stochastic metrics. However, conventional scenario-based, surrogate-assisted, and mathematical programming methods are not sufficiently scalable to be affordable and precise in large and complex cases. Here, a multi-level approach is proposed to accurately optimize resource-intensive, high-dimensional, and complex engineering problems under uncertainty with minimal resources. A non-intrusive, fast-scaling, Kriging-based surrogate is developed to map the combined design/parameter domain efficiently. Multiple surrogates are adaptively updated by hierarchical and orthogonal decomposition to leverage the fewer and most uncertainty-informed data. The proposed method is statistically compared to the state-of-the-art via an analytical testbed and is shown to be concurrently faster and more accurate by orders of magnitude.
comment: 34 pages, 18 figures
Topology optimization of nonlinear forced response curves via reduction on spectral submanifolds
Forced response curves (FRCs) of nonlinear systems can exhibit complex behaviors, including hardening/softening behavior and bifurcations. Although topology optimization holds great potential for tuning these nonlinear dynamic responses, its use in high-dimensional systems is limited by the high cost of repeated response and sensitivity analyses. To address this challenge, we employ the spectral submanifolds (SSMs) reduction theory, which reformulates the periodic response as the equilibria of an associated reduced-order model (ROM). This enables efficient and analytic evaluation of both response amplitudes and their sensitivities. Based on the SSM-based ROM, we formulate optimization problems that optimize the peak amplitude, the hardening/softening behavior, and the distance between two saddle-node bifurcations for an FRC. The proposed method is applied to the design of nonlinear MEMS devices, achieving targeted performance optimization. This framework provides a practical and efficient strategy for incorporating nonlinear dynamic effects into the topology optimization of structures.
comment: 26 pages, 12 figures. Submitted to Nonlinear Dynamics
Multi-Level Multi-Fidelity Methods for Path Integral and Safe Control
Sampling-based approaches are widely used in systems without analytic models to estimate risk or find optimal control. However, gathering sufficient data in such scenarios can be prohibitively costly. On the other hand, in many situations, low-fidelity models or simulators are available from which samples can be obtained at low cost. In this paper, we propose an efficient approach for risk quantification and path integral control that leverages such data from multiple models with heterogeneous sampling costs. A key technical novelty of our approach is the integration of Multi-level Monte Carlo (MLMC) and Multi-fidelity Monte Carlo (MFMC), which enables data from different time and state representations (system models) to be jointly used to reduce variance and improve sampling efficiency. We also provide theoretical analysis of the proposed method and show that our estimator is unbiased and consistent under mild conditions. Finally, we demonstrate via numerical simulation that the proposed method achieves improved trade-offs between computation (sampling cost) and accuracy for risk quantification and path integral control.
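To illustrate why cheap low-fidelity samples can reduce the variance of an expensive estimate, the following is a minimal two-fidelity control-variate sketch. It is not the combined MLMC/MFMC estimator proposed in the paper: the models `f_high` and `f_low`, the uniform input distribution, and the fixed control-variate weight of 1 are all assumptions chosen for the example. With independent sample sets the estimator remains unbiased for the high-fidelity mean.

```python
import math
import random


def f_high(x):
    """Stand-in for an expensive high-fidelity model (assumed for illustration)."""
    return math.exp(x)


def f_low(x):
    """Stand-in for a cheap, correlated low-fidelity model (second-order Taylor expansion)."""
    return 1.0 + x + 0.5 * x * x


def mfmc_estimate(n_high, n_low, seed=0):
    """Estimate E[f_high(X)], X ~ U(0, 1), from few high- and many low-fidelity samples."""
    rng = random.Random(seed)
    xs_high = [rng.random() for _ in range(n_high)]   # inputs evaluated on both models
    xs_low = [rng.random() for _ in range(n_low)]     # inputs evaluated on the cheap model only
    mean_high = sum(f_high(x) for x in xs_high) / n_high
    mean_low_paired = sum(f_low(x) for x in xs_high) / n_high
    mean_low_cheap = sum(f_low(x) for x in xs_low) / n_low
    # Correct the small-sample high-fidelity mean with the low-fidelity discrepancy.
    return mean_high + (mean_low_cheap - mean_low_paired)


estimate = mfmc_estimate(n_high=50, n_low=20000)
exact = math.e - 1.0   # closed-form value of the integral of exp(x) over [0, 1]
```

Because `f_high - f_low` varies much less than `f_high` itself, only a small number of expensive evaluations is needed for the correction term, while the bulk of the sampling is done on the cheap model.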
Space Logistics Analysis and Incentive Design for Commercialization of Orbital Debris Remediation
As orbital debris continues to become a higher priority for the space industry, there is a need to explore how partnerships between the public and private space sector may aid in addressing this issue. This research develops a space logistics framework for planning orbital debris remediation missions, providing a quantitative basis for partnerships that are mutually beneficial between space operators and debris remediators. By integrating network-based space logistics and game theory, we illuminate the high-level costs of remediating orbital debris, and the surplus that stands to be shared as a result. These findings indicate significant progress toward the continued development of a safe, sustainable, and profitable space economy.
comment: 28 pages, 14 figures, Journal of Spacecraft and Rockets (Articles in Advance)
EB-MBD: Emerging-Barrier Model-Based Diffusion for Safe Trajectory Optimization in Highly Constrained Environments
We propose enforcing constraints on Model-Based Diffusion by introducing emerging barrier functions inspired by interior point methods. We show that constraints on Model-Based Diffusion can lead to catastrophic performance degradation, even on simple 2D systems, due to sample inefficiency in the Monte Carlo approximation of the score function. We introduce Emerging-Barrier Model-Based Diffusion (EB-MBD), which uses progressively introduced barrier constraints to avoid these problems, significantly improving solution quality without the need for computationally expensive operations such as projections. We analyze the liveliness of the samples at each iteration to inform the choice of barrier parameter schedule. We demonstrate results for 2D collision avoidance and a 3D underwater manipulator system and show that our method achieves lower-cost solutions than Model-Based Diffusion and requires orders of magnitude less computation time than projection-based methods.
Some Reflections on Sliding Mode Designs in Control Systems: An Example of Adaptive Tracking Control for Simple Mechanical Systems With Friction Without Measurement of Velocity
The objective of this note is to share some reflections of the authors regarding the use of sliding mode designs in control systems. We believe the abundant, and ever-increasing, appearance of this kind of work in our scientific publications deserves some critical evaluation of its actual role, relevance and pertinence. First, we discuss the procedure followed by most of these designs -- illustrated with examples from the literature. Second, we bring to the reader's attention several aspects of the control problem, central in classical designs, which are disregarded in the sliding mode literature. Finally, to illustrate our previous considerations with a specific example, we compare the performance of two adaptive tracking controllers for a simple one-degree-of-freedom mechanical system with unknown parameters and static and Coulomb friction -- controllers that do not rely on the measurement of velocity.
Optimal Control with Lyapunov Stability Guarantees for Space Applications
This paper investigates the infinite horizon optimal control problem (OCP) for space applications characterized by nonlinear dynamics. The proposed approach divides the problem into a finite horizon OCP with a regularized terminal cost, guiding the system towards a terminal set, and an infinite horizon linear regulation phase within this set. This strategy guarantees global asymptotic stability under specific assumptions. Our method maintains the system's fully nonlinear dynamics until it reaches the terminal set, where the system dynamics is linearized. As the terminal set converges to the origin, the difference in optimal cost incurred reduces to zero, guaranteeing an efficient and stable solution. The approach is tested through simulations on three problems: spacecraft attitude control, rendezvous maneuver, and soft landing. In spacecraft attitude control, we focus on achieving precise orientation and stabilization. For rendezvous maneuvers, we address the navigation of a chaser to meet a target spacecraft. For the soft landing problem, we ensure a controlled descent and touchdown on a planetary surface. We provide numerical results confirming the effectiveness of the proposed method in managing these nonlinear dynamics problems, offering robust solutions essential for successful space missions.
Joint Detection, Channel Estimation and Interference Nulling for Terrestrial-Satellite Downlink Co-Existence in the Upper Mid-Band
The upper mid-band FR3 spectrum (7-24 GHz) has garnered significant interest for future cellular services. However, utilizing a large portion of this band requires careful interference coordination with incumbent satellite systems. This paper investigates interference from high-power terrestrial base stations (TN-BSs) to satellite downlink receivers. A central challenge is that the victim receivers, i.e., ground-based non-terrestrial user equipment (NTN-UEs) such as satellite customer premises equipment, must first be detected and their channels estimated before the TN-BS can effectively place nulls in their directions. We explore a potential solution where NTN-UEs periodically transmit preambles or beacon signals that TN-BSs can use for detection and channel estimation. The performance of this nulling approach is analyzed in a simplified scenario with a single victim, revealing the interplay between path loss and estimation quality in determining nulling performance. To further validate the method, we conduct a detailed multi-user site-specific ray-tracing (RT) simulation in a rural environment. The results show that the proposed nulling approach is effective under realistic parameters, even with high densities of victim units, although TN-BS may require a substantial number of antennas.
comment: Accepted for publication in the Proceedings of GlobeCom 2025
Whole Body Model Predictive Control for Spin-Aware Quadrupedal Table Tennis ICRA 2026
Developing table tennis robots that mirror human speed, accuracy, and ability to predict and respond to the full range of ball spins remains a significant challenge for legged robots. To demonstrate these capabilities, we present a system that enables quadrupedal robots to play dynamic table tennis by integrating high-speed perception, trajectory prediction, and agile control. Our system uses external cameras for high-speed ball localization, physical models with learned residuals to infer spin and predict trajectories, and a novel model predictive control (MPC) formulation for agile full-body control. Notably, a continuous set of stroke strategies emerges automatically from different ball return objectives using this control paradigm. We demonstrate our system in the real world on a Spot quadruped, evaluate the accuracy of each system component, and exhibit coordination through the system's ability to aim and return balls with varying spin types. As a further demonstration, the system is able to rally with human players.
comment: Submitted to appear in IEEE ICRA 2026
When to Reason: Semantic Router for vLLM NeurIPS 2025
Large Language Models (LLMs) demonstrate substantial accuracy gains when augmented with reasoning modes such as chain-of-thought and inference-time scaling. However, reasoning also incurs significant costs in inference latency and token usage, with environmental and financial impacts, which are unnecessary for many simple prompts. We present a semantic router that classifies queries based on their reasoning requirements and selectively applies reasoning only when beneficial. Our approach achieves a 10.2 percentage point improvement in accuracy on the MMLU-Pro benchmark while reducing response latency by 47.1% and token consumption by 48.5% compared to direct inference with vLLM. These results demonstrate that semantic routing offers an effective mechanism for striking a balance between accuracy and efficiency in open-source LLM serving systems.
comment: 5 pages, excluding references and appendix. To appear at the Workshop on ML for Systems at NeurIPS 2025, December 6, 2025 https://mlforsystems.org/
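The routing decision described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the classifier, so the keyword scoring below is a deliberately simple stand-in for a learned semantic classifier; the names `REASONING_CUES`, `Route`, and `route_query`, the cue list, and the threshold are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cue phrases. A real router would presumably use a learned
# semantic classifier; keyword matching is only a stand-in for illustration.
REASONING_CUES = ("prove", "derive", "step by step", "calculate", "why")


@dataclass
class Route:
    use_reasoning: bool  # True: send the query to the reasoning mode
    score: float         # crude estimate of reasoning demand in [0, 1]


def route_query(query: str, threshold: float = 0.5) -> Route:
    """Score a query and decide whether to enable the reasoning mode."""
    text = query.lower()
    hits = sum(cue in text for cue in REASONING_CUES)
    score = min(1.0, hits / 2)  # two or more cues saturate the score
    return Route(use_reasoning=score >= threshold, score=score)


if __name__ == "__main__":
    for q in ("What is the capital of France?",
              "Prove that sqrt(2) is irrational, step by step."):
        print(q, "->", route_query(q))
```

Only queries routed with `use_reasoning=True` would pay the extra latency and token cost of a reasoning mode; everything else would go through direct inference.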
Identification and optimal control strategies for the transversal splitting of ultra-cold Bose gases
Splitting a Bose-Einstein condensate (BEC) is a key operation in fundamental physics experiments and emerging quantum technologies, where precise preparation of well-defined initial states requires fast yet coherent control of the condensate's nonlinear dynamics. This work formulates the BEC splitting process as an optimal feedforward control problem based on a physically interpretable, reduced-order model identified from limited experimental data. We introduce a systematic calibration strategy that combines optimal experiment selection and constrained nonlinear parameter estimation, enabling accurate system identification with minimal experimental overhead. Using this calibrated model, we compute energy-optimal trajectories via indirect optimal control to realize shortcuts to adiabaticity (STAs), achieving rapid transitions to the ground state of a double-well potential while suppressing excitations. Experiments confirm that the proposed control framework yields high-fidelity state transfers across multiple configurations, demonstrating its robustness and scalability for quantum control applications.
Resilient Multi-Dimensional Consensus and Distributed Optimization against Agent-Based and Denial-of-Service Attacks
In this paper, we consider the resilient multi-dimensional consensus and distributed optimization problems of multi-agent systems (MASs) in the presence of both agent-based and denial-of-service (DoS) attacks. The considered agent-based attacks can cover malicious, Byzantine, and stubborn agents. The links between agents in the network can be blocked by DoS attacks, which may lead the digraph to be time-varying and even disconnected. The objective is to ensure that the remaining benign agents achieve consensus. To this end, an "auxiliary point"-based resilient control algorithm is proposed for MASs. Under the proposed algorithm, each healthy agent constructs a "safe kernel" utilizing the states of its in-neighbors and updates its state toward a specific point within this kernel at each iteration. If an agent cannot receive its neighbors' states owing to DoS attacks, it will use the states received immediately before the DoS period. Moreover, a resilient multi-dimensional distributed optimization (RMDO) algorithm is also proposed. Theoretical proofs and numerical examples are presented to demonstrate the effectiveness of the proposed algorithms.
Optimization via a Control-Centric Framework
Optimization plays a central role in intelligent systems and cyber-physical technologies, where the speed and reliability of convergence directly impact performance. In control theory, optimization-centric methods are standard: controllers are designed by repeatedly solving optimization problems, as in linear quadratic regulation, $H_\infty$ control, and model predictive control. In contrast, this paper develops a control-centric framework for optimization itself, where algorithms are constructed directly from Lyapunov stability principles rather than being proposed first and analyzed afterward. A key element is the stationarity vector, which encodes first-order optimality conditions and enables Lyapunov-based convergence analysis. By pairing a Lyapunov function with a selectable decay law, we obtain continuous-time dynamics with guaranteed exponential, finite-time, fixed-time, or prescribed-time convergence. Within this framework, we introduce three feedback realizations of increasing restrictiveness: the Hessian-gradient, Newton, and gradient dynamics. Each realization shapes the decay of the stationarity vector to achieve the desired rate. These constructions unify unconstrained optimization, extend naturally to constrained problems via Lyapunov-consistent primal-dual dynamics, and broaden the results for minimax and generalized Nash equilibrium seeking problems beyond exponential stability. The framework provides systematic design tools for optimization algorithms in control and game-theoretic problems.
comment: This work has been submitted to the IEEE for possible publication. 12 pages, 3 figures
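As an illustration of how pairing a Lyapunov function with a decay law shapes convergence, consider the unconstrained case with exponential decay. This is a standard continuous-time Newton argument written under assumed notation ($g$, $V$, and $\alpha$ are not taken from the paper), and it assumes the Hessian is invertible along the trajectory.

```latex
% Assumed notation: g(x) = \nabla f(x) is the stationarity vector,
% V is the Lyapunov function, \alpha > 0 is a chosen decay rate.
\begin{align*}
  V(x) &= \tfrac{1}{2}\,\lVert g(x) \rVert^{2}, \\
  \dot{x} &= -\alpha \,\bigl[\nabla^{2} f(x)\bigr]^{-1} g(x)
    \quad \text{(Newton dynamics)}, \\
  \dot{g} &= \nabla^{2} f(x)\,\dot{x} = -\alpha\, g, \\
  \dot{V} &= g^{\top}\dot{g} = -\alpha\,\lVert g \rVert^{2} = -2\alpha V
    \;\;\Longrightarrow\;\; V(t) = V(0)\,e^{-2\alpha t}.
\end{align*}
```

Replacing the linear decay law $\dot{V} = -2\alpha V$ with a different one changes the convergence type while the construction stays the same.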
Optimal control of continuous-time symmetric systems with unknown dynamics and noisy measurements
An iterative learning algorithm is presented for continuous-time linear-quadratic optimal control problems where the system is externally symmetric with unknown dynamics. Both finite-horizon and infinite-horizon problems are considered. It is shown that the proposed algorithm is globally convergent to the optimal solution and has some advantages over adaptive dynamic programming, including being unbiased under noisy measurements and having a relatively low computational burden. Numerical experiments show the effectiveness of the results.
Product-oriented Product-Process-Resource Asset Network and its Representation in AutomationML for Asset Administration Shell
Current products, especially in the automotive sector, are complex technical systems of a multi-disciplinary mechatronic nature. Industrial standards supporting system engineering and production typically (i) address the production phase only, but do not cover the complete product life cycle, and (ii) focus on production processes and resources rather than the products themselves. The presented approach is motivated by incorporating the impacts of the end-of-life phase of the product life cycle into the engineering phase. This paper proposes a modeling approach derived from the Product-Process-Resource (PPR) modeling paradigm. It combines the requirements of (i) respecting the product structure as a basis for the model, and (ii) incorporating repairing, remanufacturing, or upcycling within cyber-physical production systems. The proposed model, called PoPAN, should accompany the product during the entire life cycle as a digital shadow encapsulated within the Asset Administration Shell of a product. To facilitate the adoption of the proposed paradigm, the paper also proposes serialization of the model in the AutomationML data format. The model is demonstrated on a use case for disassembling electric vehicle batteries to support their remanufacturing for stationary battery applications.
comment: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Hierarchical Reinforcement Learning with Low-Level MPC for Multi-Agent Control
Achieving safe and coordinated behavior in dynamic, constraint-rich environments remains a major challenge for learning-based control. Pure end-to-end learning often suffers from poor sample efficiency and limited reliability, while model-based methods depend on predefined references and struggle to generalize. We propose a hierarchical framework that combines tactical decision-making via reinforcement learning (RL) with low-level execution through Model Predictive Control (MPC). For the case of multi-agent systems this means that high-level policies select abstract targets from structured regions of interest (ROIs), while MPC ensures dynamically feasible and safe motion. Tested on a predator-prey benchmark, our approach outperforms end-to-end and shielding-based RL baselines in terms of reward, safety, and consistency, underscoring the benefits of combining structured learning with model-based control.
Revisiting Functional Derivatives in Multi-object Tracking
Probability generating functionals (PGFLs) are efficient and powerful tools for tracking independent objects in clutter. It was shown that PGFLs could be used for the elegant derivation of practical multi-object tracking algorithms, e.g., the probability hypothesis density (PHD) filter. However, derivations using PGFLs use the so-called functional derivatives whose definitions usually appear too complicated or heuristic, involving Dirac delta "functions". This paper begins by comparing different definitions of functional derivatives and exploring their relationships and implications for practical applications. It then proposes a rigorous definition of the functional derivative, utilizing straightforward yet precise mathematics for clarity. Key properties of the functional derivative are revealed and discussed.
comment: submitted to IEEE Transactions on Information Theory
Carleman-Fourier linearization of nonlinear real dynamical systems with quasi-periodic fields
This paper presents Carleman-Fourier linearization for analyzing nonlinear real dynamical systems with periodic vector fields. Using Fourier basis functions, this novel framework transforms such dynamical systems into equivalent infinite-dimensional linear dynamical systems. In this paper, we establish the exponential convergence of the primary block in the finite-section approximation of this linearized system to the state vector of the original nonlinear system. To showcase the efficacy of our approach, we apply it to the Kuramoto model, a prominent model for coupled oscillators. The results demonstrate promising accuracy in approximating the original system's behavior.
comment: Discrete and Continuous Dynamical Systems Series B, accepted
Foundation Models for Structural Health Monitoring
Structural Health Monitoring (SHM) is a critical task for ensuring the safety and reliability of civil infrastructures, typically realized on bridges and viaducts by means of vibration monitoring. In this paper, we propose for the first time the use of Transformer neural networks, with a Masked Auto-Encoder architecture, as Foundation Models for SHM. We demonstrate the ability of these models to learn generalizable representations from multiple large datasets through self-supervised pre-training, which, coupled with task-specific fine-tuning, allows them to outperform state-of-the-art traditional methods on diverse tasks, including Anomaly Detection (AD) and Traffic Load Estimation (TLE). We then extensively explore model size versus accuracy trade-offs and experiment with Knowledge Distillation (KD) to improve the performance of smaller Transformers, enabling their embedding directly into the SHM edge nodes. We showcase the effectiveness of our foundation models using data from three operational viaducts. For AD, we achieve a near-perfect 99.9% accuracy with a monitoring time span of just 15 windows. In contrast, a state-of-the-art method based on Principal Component Analysis (PCA) obtains its first good result (95.03% accuracy) only when considering 120 windows. On two different TLE tasks, our models obtain state-of-the-art performance on multiple evaluation metrics (R$^2$ score, MAE% and MSE%). On the first benchmark, we achieve an R$^2$ score of 0.97 and 0.90 for light and heavy vehicle traffic, respectively, while the best previous approach (a Random Forest) stops at 0.91 and 0.84. On the second one, we achieve an R$^2$ score of 0.54 versus the 0.51 of the best competitor method, a Long Short-Term Memory network.
comment: 17 pages, 6 tables, 9 figures
A Long-Duration Autonomy Approach to Connected and Automated Vehicles
In this article, we present a long-duration autonomy approach for the control of connected and automated vehicles (CAVs) operating in a transportation network. In particular, we focus on the performance of CAVs at traffic bottlenecks, including roundabouts, merging roadways, and intersections. We take a principled approach based on optimal control, and derive a reactive controller with guarantees on safety, performance, and energy efficiency. We guarantee safety through high-order control barrier functions (HOCBFs), which we "lift" to first-order CBFs using time-optimal motion primitives. This yields a set of first-order CBFs that are compatible with the control bounds. We demonstrate the performance of our approach in simulation and compare it to an optimal control-based approach.
comment: 8 pages, 3 figures
MARLIN: Multi-Agent Reinforcement Learning with Murmuration Intelligence and LLM Guidance for Reservoir Management
As climate change intensifies extreme weather events, water disasters pose growing threats to global communities, making adaptive reservoir management critical for protecting vulnerable populations and ensuring water security. Modern water resource management faces unprecedented challenges from cascading uncertainties propagating through interconnected reservoir networks. These uncertainties, rooted in physical water transfer losses and environmental variability, make precise control difficult. For example, sending 10 tons downstream may yield only 8-12 tons due to evaporation and seepage. Traditional centralized optimization approaches suffer from exponential computational complexity and cannot effectively handle such real-world uncertainties, while existing multi-agent reinforcement learning (MARL) methods fail to achieve effective coordination under uncertainty. To address these challenges, we present MARLIN, a decentralized reservoir management framework inspired by starling murmuration intelligence. Integrating bio-inspired alignment, separation, and cohesion rules with MARL, MARLIN enables individual reservoirs to make local decisions while achieving emergent global coordination. In addition, an LLM provides real-time reward shaping signals, guiding agents to adapt to environmental changes and human-defined preferences. Experiments on real-world USGS data show that MARLIN improves uncertainty handling by 23%, cuts computation by 35%, and accelerates flood response by 68%, exhibiting super-linear coordination, with complexity scaling 5.4x from 400 to 10,000 nodes. These results demonstrate MARLIN's potential for disaster prevention and protecting communities through intelligent, scalable water resource management.
A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis
Foundation models have gained growing interest in the IoT domain due to their reduced reliance on labeled data and strong generalizability across tasks, which address key limitations of traditional machine learning approaches. However, most existing foundation model based methods are developed for specific IoT tasks, making it difficult to compare approaches across IoT domains and limiting guidance for applying them to new tasks. This survey aims to bridge this gap by providing a comprehensive overview of current methodologies and organizing them around four performance objectives shared across domains: efficiency, context-awareness, safety, and security & privacy. For each objective, we review representative works and summarize commonly used techniques and evaluation metrics. This objective-centric organization enables meaningful cross-domain comparisons and offers practical insights for selecting and designing foundation model based solutions for new IoT tasks. We conclude with key directions for future research to guide both practitioners and researchers in advancing the use of foundation models in IoT applications.
comment: Accepted by CCF Transactions on Pervasive Computing and Interaction (CCF TPCI)
Fast Online Adaptive Neural MPC via Meta-Learning
Data-driven model predictive control (MPC) has demonstrated significant potential for improving robot control performance in the presence of model uncertainties. However, existing approaches often require extensive offline data collection and computationally intensive training, limiting their ability to adapt online. To address these challenges, this paper presents a fast online adaptive MPC framework that leverages neural networks integrated with Model-Agnostic Meta-Learning (MAML). Our approach focuses on few-shot adaptation of residual dynamics, which capture the discrepancy between nominal and true system behavior, using minimal online data and gradient steps. By embedding these meta-learned residual models into a computationally efficient L4CasADi-based MPC pipeline, the proposed method enables rapid model correction, enhances predictive accuracy, and improves real-time control performance. We validate the framework through simulation studies on a Van der Pol oscillator, a Cart-Pole system, and a 2D quadrotor. Results show significant gains in adaptation speed and prediction accuracy over both nominal MPC and nominal MPC augmented with a freshly initialized neural network, underscoring the effectiveness of our approach for real-time adaptive robot control.
Characterizing and Optimizing Real-Time Optimal Control for Embedded SoCs
Resource-limited robots face significant challenges in executing computationally intensive tasks, such as locomotion and manipulation, particularly for real-time optimal control algorithms like Model Predictive Control (MPC). This paper provides a comprehensive design space exploration to identify optimal hardware computation architectures for these demanding model-based control algorithms. We profile and optimize representative architectural designs, including general-purpose scalar CPUs, vector processors, and specialized accelerators. By characterizing kernel-level benchmarks and end-to-end robotic scenarios, including a hardware-in-the-loop evaluation on a fabricated RISC-V multi-core vector SoC, we present a quantitative comparison of performance, area, and utilization across distinct architectural design points. Our findings demonstrate that targeted architectural modifications, coupled with deep software and system optimizations, enable up to 3.71x speedups for MPC, resulting in up to 27% system-level power reductions while completing robotic tasks. Finally, we propose a code generation flow designed to simplify the complex engineering effort required for mapping robotic workloads onto specialized architectures.
Hybrid Feedback Control for Global Navigation with Locally Optimal Obstacle Avoidance in n-Dimensional Spaces
We present a hybrid feedback control framework for autonomous robot navigation in n-dimensional Euclidean spaces cluttered with spherical obstacles. The proposed approach ensures safe and global navigation towards a target location by dynamically switching between two operational modes: motion-to-destination and locally optimal obstacle-avoidance. It produces continuous velocity inputs, ensures collision-free trajectories and generates locally optimal obstacle avoidance maneuvers. Unlike existing methods, the proposed framework is compatible with range sensors, enabling navigation in both a priori known and unknown environments. Extensive simulations in 2D and 3D settings, complemented by experimental validation on a TurtleBot 4 platform, confirm the efficacy and robustness of the approach. Our results demonstrate shorter paths and smoother trajectories compared to state-of-the-art methods, while maintaining computational efficiency and real-world feasibility.
Safe Autonomous Environmental Contact for Soft Robots using Control Barrier Functions
Robots built from soft materials will inherently apply lower environmental forces than their rigid counterparts, and therefore may be more suitable in sensitive settings with unintended contact. However, these robots' applied forces result from both their design and their control system in closed-loop, and therefore, ensuring bounds on these forces requires controller synthesis for safety as well. This article introduces the first feedback controller for a soft manipulator that formally meets a safety specification with respect to environmental contact. In our proof-of-concept setting, the robot's environment has known geometry and is deformable with a known elastic modulus. Our approach maps a bound on applied forces to a safe set of positions of the robot's tip via predicted deformations of the environment. Then, a quadratic program with Control Barrier Functions in its constraints is used to supervise a nominal feedback signal, verifiably maintaining the robot's tip within this safe set. Hardware experiments on a multi-segment soft pneumatic robot demonstrate that the proposed framework successfully maintains a positive safety margin. This framework represents a fundamental shift in perspective on control and safety for soft robots, implementing a formally verifiable logic specification on their pose and contact forces.
comment: 8 pages, 9 figures
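The supervision step described in this abstract, a quadratic program with a Control Barrier Function constraint that minimally modifies a nominal signal, can be sketched in one dimension. The model below is an assumption made for illustration, not the paper's soft-robot model: the tip is a single integrator `x' = u`, the safe set is `x <= x_max`, and the barrier is `h(x) = x_max - x`. Under those assumptions the QP has a closed-form solution. The names `cbf_filter` and `simulate` are hypothetical.

```python
def cbf_filter(u_nom: float, x: float, x_max: float, alpha: float) -> float:
    """Closed-form solution of the scalar CBF-QP

        min_u (u - u_nom)^2   s.t.   dh/dt >= -alpha * h(x),

    with h(x) = x_max - x and dynamics x' = u, so dh/dt = -u and the
    constraint reduces to u <= alpha * (x_max - x).
    """
    return min(u_nom, alpha * (x_max - x))


def simulate(x0: float, x_max: float, u_nom: float,
             alpha: float = 2.0, dt: float = 0.01, steps: int = 1000) -> list:
    """Forward-Euler rollout of the filtered single integrator.

    Choosing alpha * dt <= 1 keeps the discrete-time state inside the safe set.
    """
    xs = [x0]
    for _ in range(steps):
        u = cbf_filter(u_nom, xs[-1], x_max, alpha)
        xs.append(xs[-1] + dt * u)
    return xs


if __name__ == "__main__":
    # The nominal command pushes hard toward the boundary; the filter slows it.
    trajectory = simulate(x0=0.0, x_max=1.0, u_nom=5.0)
    print(f"final position: {trajectory[-1]:.6f}, max: {max(trajectory):.6f}")
```

In the setting the abstract describes, `x_max` would come from mapping a force bound through the predicted deformation of the environment; here it is simply a given constant.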
Observability for Nonlinear Systems: Connecting Variational Dynamics, Lyapunov Exponents, and Empirical Gramians
Observability quantification is a key problem in dynamic network sciences. While it has been thoroughly studied for linear systems, observability quantification for nonlinear networks is less intuitive and more cumbersome. One common approach to quantify observability for nonlinear systems is via the Empirical Gramian (Empr-Gram) -- a generalized form of the Gramian of linear systems. In this paper, we produce three new results. First, we establish that a variational form of discrete-time autonomous nonlinear systems yields a so-called Variational Gramian (Var-Gram) that is equivalent to the classic Empr-Gram; the former being easier to compute than the latter. Via Lyapunov exponents derived from Lyapunov's direct method, the paper's second result derives connections between existing observability measures and Var-Gram. The third result demonstrates the applicability of these new notions for sensor selection/placement in nonlinear systems. Numerical case studies demonstrate these three developments and their merits.
Robotics
BLAZER: Bootstrapping LLM-based Manipulation Agents with Zero-Shot Data Generation
Scaling data and models has played a pivotal role in the remarkable progress of computer vision and language. Inspired by these domains, recent efforts in robotics have similarly focused on scaling both data and model size to develop more generalizable and robust policies. However, unlike vision and language, robotics lacks access to internet-scale demonstrations across diverse robotic tasks and environments. As a result, the scale of existing datasets typically suffers from the need for manual data collection and curation. To address this problem, here we propose BLAZER, a framework that learns manipulation policies from automatically generated training data. We build on the zero-shot capabilities of LLM planners and automatically generate demonstrations for diverse manipulation tasks in simulation. Successful examples are then used to finetune an LLM and to improve its planning capabilities without human supervision. Notably, while BLAZER training requires access to the simulator's state, we demonstrate direct transfer of acquired skills to sensor-based manipulation. Through extensive experiments, we show BLAZER to significantly improve zero-shot manipulation in both simulated and real environments. Moreover, BLAZER improves on tasks outside of its training pool and enables downscaling of LLM models. Our code and data will be made publicly available on the project page.
comment: 11 pages, 8 figures
Scalable Offline Metrics for Autonomous Driving IROS 2025
Real-world evaluation of perception-based planning models for robotic systems, such as autonomous vehicles, can be safely and inexpensively conducted offline, i.e., by computing model prediction error over a pre-collected validation dataset with ground-truth annotations. However, extrapolating from offline model performance to online settings remains a challenge. In these settings, seemingly minor errors can compound and result in test-time infractions or collisions. This relationship is understudied, particularly across diverse closed-loop metrics and complex urban maneuvers. In this work, we revisit this undervalued question in policy evaluation through an extensive set of experiments across diverse conditions and metrics. Based on analysis in simulation, we find an even worse correlation between offline and online settings than reported by prior studies, casting doubts on the validity of current evaluation practices and metrics for driving policies. Next, we bridge the gap between offline and online evaluation. We investigate an offline metric based on epistemic uncertainty, which aims to capture events that are likely to cause errors in closed-loop settings. The resulting metric achieves over 13% improvement in correlation compared to previous offline metrics. We further validate the generalization of our findings beyond the simulation environment in real-world settings, where even greater gains are observed.
comment: Accepted at IROS 2025 (IEEE/RSJ International Conference on Intelligent Robots and Systems)
NovaFlow: Zero-Shot Manipulation via Actionable Flow from Generated Videos
Enabling robots to execute novel manipulation tasks zero-shot is a central goal in robotics. Most existing methods assume in-distribution tasks or rely on fine-tuning with embodiment-matched data, limiting transfer across platforms. We present NovaFlow, an autonomous manipulation framework that converts a task description into an actionable plan for a target robot without any demonstrations. Given a task description, NovaFlow synthesizes a video using a video generation model and distills it into 3D actionable object flow using off-the-shelf perception modules. From the object flow, it computes relative poses for rigid objects and realizes them as robot actions via grasp proposals and trajectory optimization. For deformable objects, this flow serves as a tracking objective for model-based planning with a particle-based dynamics model. By decoupling task understanding from low-level control, NovaFlow naturally transfers across embodiments. We validate on rigid, articulated, and deformable object manipulation tasks using a table-top Franka arm and a Spot quadrupedal mobile robot, and achieve effective zero-shot execution without demonstrations or embodiment-specific training. Project website: https://novaflow.lhy.xyz/.
ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving
End-to-end autonomous driving (E2EAD) systems, which learn to predict future trajectories directly from sensor data, are fundamentally challenged by the inherent spatio-temporal imbalance of trajectory data. This imbalance creates a significant optimization burden, causing models to learn spurious correlations instead of causal inference, while also prioritizing uncertain, distant predictions, thereby compromising immediate safety. To address these issues, we propose ResAD, a novel Normalized Residual Trajectory Modeling framework. Instead of predicting the future trajectory directly, our approach reframes the learning task to predict the residual deviation from a deterministic inertial reference. The inertial reference serves as a counterfactual, forcing the model to move beyond simple pattern recognition and instead identify the underlying causal factors (e.g., traffic rules, obstacles) that necessitate deviations from a default, inertially-guided path. To deal with the optimization imbalance caused by uncertain, long-term horizons, ResAD further incorporates Point-wise Normalization of the predicted residual. It re-weights the optimization objective, preventing large-magnitude errors associated with distant, uncertain waypoints from dominating the learning signal. Extensive experiments validate the effectiveness of our framework. On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy with only two denoising steps, demonstrating that our approach significantly simplifies the learning task and improves model performance. The code will be released to facilitate further research.
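The reframing in this abstract, predicting a normalized residual from an inertial reference rather than the trajectory itself, can be sketched as follows. Two assumptions are made because the abstract does not give the details: the inertial reference is taken to be a constant-velocity extrapolation, and point-wise normalization is taken to be per-waypoint standardization with precomputed statistics. The function names and the statistics used in the example are hypothetical.

```python
import numpy as np


def inertial_reference(pos, vel, horizon, dt):
    """Constant-velocity extrapolation (assumed form of the inertial reference).

    Waypoint k (k = 1..horizon) is pos + vel * k * dt; result shape (horizon, 2).
    """
    steps = np.arange(1, horizon + 1).reshape(-1, 1)
    return pos + steps * dt * vel


def normalized_residual(target, reference, mean, std, eps=1e-6):
    """Residual from the reference, standardized independently per waypoint."""
    return (target - reference - mean) / (std + eps)


def denormalize(pred, reference, mean, std, eps=1e-6):
    """Invert normalized_residual to recover a trajectory in metric space."""
    return reference + pred * (std + eps) + mean


if __name__ == "__main__":
    ref = inertial_reference(np.array([0.0, 0.0]), np.array([2.0, 0.0]),
                             horizon=4, dt=0.5)
    # Ground-truth trajectory that gradually drifts left of the reference.
    target = ref + np.array([[0.0, 0.05], [0.0, 0.2], [0.0, 0.45], [0.0, 0.8]])
    mean = np.zeros((4, 2))
    # Hypothetical per-waypoint spread that grows with the horizon.
    std = np.array([[0.1, 0.1], [0.2, 0.2], [0.4, 0.4], [0.8, 0.8]])
    z = normalized_residual(target, ref, mean, std)
    print(np.round(z, 3))
```

Because each waypoint is scaled by its own spread, a large raw error at a distant waypoint no longer outweighs a small error at a near one in the loss, which is the re-weighting effect the abstract attributes to the normalization.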
DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model
Achieving generalized in-hand object rotation remains a significant challenge in robotics, largely due to the difficulty of transferring policies from simulation to the real world. The complex, contact-rich dynamics of dexterous manipulation create a "reality gap" that has limited prior work to constrained scenarios involving simple geometries, limited object sizes and aspect ratios, constrained wrist poses, or customized hands. We address this sim-to-real challenge with a novel framework that enables a single policy, trained in simulation, to generalize to a wide variety of objects and conditions in the real world. The core of our method is a joint-wise dynamics model that learns to bridge the reality gap by effectively fitting a limited amount of data collected in the real world and then adapting the sim policy's actions accordingly. The model is highly data-efficient and generalizable across different whole-hand interaction distributions by factorizing dynamics across joints, compressing system-wide influences into low-dimensional variables, and learning each joint's evolution from its own dynamic profile, implicitly capturing these net effects. We pair this with a fully autonomous data collection strategy that gathers diverse, real-world interaction data with minimal human intervention. Our complete pipeline demonstrates unprecedented generality: a single policy successfully rotates challenging objects with complex shapes (e.g., animals), high aspect ratios (up to 5.33), and small sizes, all while handling diverse wrist orientations and rotation axes. Comprehensive real-world evaluations and a teleoperation application for complex tasks validate the effectiveness and robustness of our approach. Website: https://meowuu7.github.io/DexNDM/
comment: Project Website: https://meowuu7.github.io/DexNDM/ Video: https://youtu.be/tU2Mv8vWftU
Dream to Recall: Imagination-Guided Experience Retrieval for Memory-Persistent Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) requires agents to follow natural language instructions through environments, with memory-persistent variants demanding progressive improvement through accumulated experience. Existing approaches for memory-persistent VLN face critical limitations: they lack effective memory access mechanisms, instead relying on entire memory incorporation or fixed-horizon lookup, and predominantly store only environmental observations while neglecting navigation behavioral patterns that encode valuable decision-making strategies. We present Memoir, which employs imagination as a retrieval mechanism grounded by explicit memory: a world model imagines future navigation states as queries to selectively retrieve relevant environmental observations and behavioral histories. The approach comprises: 1) a language-conditioned world model that imagines future states serving dual purposes: encoding experiences for storage and generating retrieval queries; 2) Hybrid Viewpoint-Level Memory that anchors both observations and behavioral patterns to viewpoints, enabling hybrid retrieval; and 3) an experience-augmented navigation model that integrates retrieved knowledge through specialized encoders. Extensive evaluation across diverse memory-persistent VLN benchmarks with 10 distinctive testing scenarios demonstrates Memoir's effectiveness: significant improvements across all scenarios, with 5.4% SPL gains on IR2R over the best memory-persistent baseline, accompanied by 8.3x training speedup and 74% inference memory reduction. The results validate that predictive retrieval of both environmental and behavioral memories enables more effective navigation, with analysis indicating substantial headroom (73.3% vs 93.4% upper bound) for this imagination-guided paradigm. Code at https://github.com/xyz9911/Memoir.
comment: 14 pages, 6 figures, 13 tables
R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation
Towards the aim of generalized robotic manipulation, spatial generalization is the most fundamental capability, requiring the policy to work robustly under different spatial distributions of objects, the environment, and the agent itself. To achieve this, substantial human demonstrations need to be collected to cover different spatial configurations for training a generalized visuomotor policy via imitation learning. Prior works explore a promising direction that leverages data generation to acquire abundant spatially diverse data from minimal source demonstrations. However, most approaches face a significant sim-to-real gap and are often limited to constrained settings, such as fixed-base scenarios and predefined camera viewpoints. In this paper, we propose a real-to-real 3D data generation framework (R2RGen) that directly augments the pointcloud observation-action pairs to generate real-world data. R2RGen is simulator- and rendering-free, and is thus efficient and plug-and-play. Specifically, given a single source demonstration, we introduce an annotation mechanism for fine-grained parsing of the scene and trajectory. A group-wise augmentation strategy is proposed to handle complex multi-object compositions and diverse task constraints. We further present camera-aware processing to align the distribution of generated data with that of real-world 3D sensors. Empirically, R2RGen substantially enhances data efficiency in extensive experiments and demonstrates strong potential for scaling and application to mobile manipulation.
comment: Project page: https://r2rgen.github.io/
Have We Scene It All? Scene Graph-Aware Deep Point Cloud Compression
Efficient transmission of 3D point cloud data is critical for advanced perception in centralized and decentralized multi-agent robotic systems, especially nowadays with the growing reliance on edge and cloud-based processing. However, the large and complex nature of point clouds creates challenges under bandwidth constraints and intermittent connectivity, often degrading system performance. We propose a deep compression framework based on semantic scene graphs. The method decomposes point clouds into semantically coherent patches and encodes them into compact latent representations with semantic-aware encoders conditioned by Feature-wise Linear Modulation (FiLM). A folding-based decoder, guided by latent features and graph node attributes, enables structurally accurate reconstruction. Experiments on the SemanticKITTI and nuScenes datasets show that the framework achieves state-of-the-art compression rates, reducing data size by up to 98% while preserving both structural and semantic fidelity. In addition, it supports downstream applications such as multi-robot pose graph optimization and map merging, achieving trajectory accuracy and map alignment comparable to those obtained with raw LiDAR scans.
comment: Accepted for publication in IEEE Robotics and Automation Letters (RA-L). 8 pages, 6 figures
DexMan: Learning Bimanual Dexterous Manipulation from Human and Generated Videos
We present DexMan, an automated framework that converts human visual demonstrations into bimanual dexterous manipulation skills for humanoid robots in simulation. Operating directly on third-person videos of humans manipulating rigid objects, DexMan eliminates the need for camera calibration, depth sensors, scanned 3D object assets, or ground-truth hand and object motion annotations. Unlike prior approaches that consider only simplified floating hands, it directly controls a humanoid robot and leverages novel contact-based rewards to improve policy learning from noisy hand-object poses estimated from in-the-wild videos. DexMan achieves state-of-the-art performance in object pose estimation on the TACO benchmark, with absolute gains of 0.08 and 0.12 in ADD-S and VSD. Meanwhile, its reinforcement learning policy surpasses previous methods by 19% in success rate on OakInk-v2. Furthermore, DexMan can generate skills from both real and synthetic videos without manual data collection or costly motion capture, enabling the creation of large-scale, diverse datasets for training generalist dexterous manipulation.
comment: Video results are available at: https://embodiedai-ntu.github.io/dexman/index.html
Don't Run with Scissors: Pruning Breaks VLA Models but They Can Be Recovered
Vision-Language-Action (VLA) models have advanced robotic capabilities but remain challenging to deploy on resource-limited hardware. Pruning has enabled efficient compression of large language models (LLMs), yet it is largely understudied in robotics. Surprisingly, we observe that pruning VLA models leads to drastic degradation and increased safety violations. We introduce GLUESTICK, a post-pruning recovery method that restores much of the original model's functionality while retaining sparsity benefits. Our method performs a one-time interpolation between the dense and pruned models in weight-space to compute a corrective term. This correction is used during inference by each pruned layer to recover lost capabilities with minimal overhead. GLUESTICK requires no additional training, is agnostic to the pruning algorithm, and introduces a single hyperparameter that controls the tradeoff between efficiency and accuracy. Across diverse VLA architectures and tasks in manipulation and navigation, GLUESTICK achieves competitive memory efficiency while substantially recovering success rates and reducing safety violations. Additional material can be found at: https://gluestick-vla.github.io/.
Gaze on the Prize: Shaping Visual Attention with Return-Guided Contrastive Learning
Visual Reinforcement Learning (RL) agents must learn to act based on high-dimensional image data where only a small fraction of the pixels is task-relevant. This forces agents to waste exploration and computational resources on irrelevant features, leading to sample-inefficient and unstable learning. To address this, inspired by human visual foveation, we introduce Gaze on the Prize. This framework augments visual RL with a learnable foveal attention mechanism (Gaze), guided by a self-supervised signal derived from the agent's experience pursuing higher returns (the Prize). Our key insight is that return differences reveal what matters most: If two similar representations produce different outcomes, their distinguishing features are likely task-relevant, and the gaze should focus on them accordingly. This is realized through return-guided contrastive learning that trains the attention to distinguish between the features relevant to success and failure. We group similar visual representations into positives and negatives based on their return differences and use the resulting labels to construct contrastive triplets. These triplets provide the training signal that teaches the attention mechanism to produce distinguishable representations for states associated with different outcomes. Our method achieves up to 2.4x improvement in sample efficiency and can solve tasks that the baseline fails to learn, demonstrated across a suite of manipulation tasks from the ManiSkill3 benchmark, all without modifying the underlying algorithm or hyperparameters.
comment: Project page: https://andrewcwlee.github.io/gaze-on-the-prize
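Illustrative sketch (not taken from the paper): the abstract above describes grouping similar representations into positives and negatives by return difference and forming contrastive triplets. The Python below shows one simple way such triplets could be built. The nearest-neighbour grouping, the `return_gap` threshold, the margin, and all function names are assumptions made for illustration only.

```python
import math

def build_triplets(embeddings, returns, k=3, return_gap=0.5):
    """For each anchor, look at its k nearest neighbours in embedding space.

    Neighbours with a similar return become positives and neighbours with a
    clearly different return become negatives (both thresholds are assumed).
    """
    triplets = []
    for i, anchor in enumerate(embeddings):
        others = [j for j in range(len(embeddings)) if j != i]
        others.sort(key=lambda j: math.dist(anchor, embeddings[j]))
        neighbours = others[:k]
        positives = [j for j in neighbours if abs(returns[j] - returns[i]) < return_gap]
        negatives = [j for j in neighbours if abs(returns[j] - returns[i]) >= return_gap]
        triplets.extend((i, p, n) for p in positives for n in negatives)
    return triplets

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss on Euclidean distances."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)
```

In a training loop, a loss of this kind would be evaluated on the attention-weighted features so that its gradient shapes the gaze. That wiring is omitted here.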
Validation of collision-free spheres of Stewart-Gough platforms for constant orientations using the Application Programming Interface of a CAD software
This paper presents a method of validation of the size of the largest collision-free sphere (CFS) of a 6-6 Stewart-Gough platform manipulator (SGPM) for a given orientation of its moving platform (MP) using the Application Programming Interface (API) of a CAD software. The position of the MP is updated via the API in an automated manner over a set of samples within a shell enclosing the surface of the CFS. For each pose of the manipulator, each pair of legs is investigated for mutual collisions. The CFS is considered safe or validated iff none of the points falling inside the CFS lead to a collision between any pair of legs. This approach can not only validate the safety of a precomputed CFS, but also estimate the same for any spatial parallel manipulator.
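Illustrative sketch (not the authors' CAD-API workflow): the validation loop described above can be mimicked without CAD software by treating each leg as a line segment with a radius and testing every pair of legs at sampled platform positions. The capsule approximation, the random shell sampling, the assumption that the platform anchor offsets are already rotated into the fixed orientation, and all function names are hypothetical.

```python
import itertools
import math
import random

def _sub(a, b):
    return [x - y for x, y in zip(a, b)]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _clamp01(x):
    return max(0.0, min(1.0, x))

def segment_distance(p1, q1, p2, q2, eps=1e-12):
    """Minimum distance between segments p1-q1 and p2-q2."""
    d1, d2, r = _sub(q1, p1), _sub(q2, p2), _sub(p1, p2)
    a, e, f = _dot(d1, d1), _dot(d2, d2), _dot(d2, r)
    if a <= eps and e <= eps:          # both segments are points
        s = t = 0.0
    elif a <= eps:                     # first segment is a point
        s, t = 0.0, _clamp01(f / e)
    else:
        c = _dot(d1, r)
        if e <= eps:                   # second segment is a point
            s, t = _clamp01(-c / a), 0.0
        else:
            b = _dot(d1, d2)
            denom = a * e - b * b      # zero when the segments are parallel
            s = _clamp01((b * f - c * e) / denom) if denom > eps else 0.0
            t = (b * s + f) / e
            if t < 0.0:
                s, t = _clamp01(-c / a), 0.0
            elif t > 1.0:
                s, t = _clamp01((b - c) / a), 1.0
    c1 = [p + s * d for p, d in zip(p1, d1)]
    c2 = [p + t * d for p, d in zip(p2, d2)]
    return math.dist(c1, c2)

def has_leg_collision(base_pts, plat_pts, position, leg_radius):
    """True if any two legs come closer than two leg radii at this position."""
    tops = [[p + o for p, o in zip(position, offset)] for offset in plat_pts]
    legs = list(zip(base_pts, tops))
    return any(
        segment_distance(b1, t1, b2, t2) < 2.0 * leg_radius
        for (b1, t1), (b2, t2) in itertools.combinations(legs, 2)
    )

def validate_cfs(base_pts, plat_pts, center, radius, shell_halfwidth,
                 leg_radius, n_samples=2000, seed=0):
    """Sample a shell around the sphere surface; fail on any collision inside it."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        direction = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(_dot(direction, direction))
        rho = rng.uniform(radius - shell_halfwidth, radius + shell_halfwidth)
        position = [c + rho * d / norm for c, d in zip(center, direction)]
        if rho <= radius and has_leg_collision(base_pts, plat_pts, position, leg_radius):
            return False
    return True
```

A sampled check of this kind can only falsify a candidate sphere. Passing it raises confidence in proportion to the sampling density but is not a proof.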
Reliability of Single-Level Equality-Constrained Inverse Optimal Control
Inverse optimal control (IOC) allows the retrieval of optimal cost function weights, or behavioral parameters, from human motion. The literature on IOC uses methods that are either based on a slow bilevel process or a fast but noise-sensitive minimization of optimality condition violation. Assuming equality-constrained optimal control models of human motion, this article presents a faster yet robust approach to solving IOC that uses a single-level reformulation of the bilevel method and yields equivalent results. Through numerical experiments in simulation, we compare the robustness to noise of the proposed single-level reformulation with that of the bilevel IOC formulation on a human-like planar reaching task that is used across recent studies. The approach shows resilience to very large levels of noise and reduces the computation time of the IOC on this task by a factor of 15 when compared to a classical bilevel implementation.
comment: 8 pages, 3 figures
Airy: Reading Robot Intent through Height and Sky
As industrial robots move into shared human spaces, their opaque decision making threatens safety, trust, and public oversight. This artwork, Airy, asks whether complex multi-agent AI can become intuitively understandable by staging a competition between two reinforcement-trained robot arms that snap a bedsheet skyward. Building on three design principles: competition as a clear metric (who lifts higher), embodied familiarity (audiences recognize fabric snapping), and sensor-to-sense mapping (robot cooperation or rivalry shown through forest and weather projections), the installation gives viewers a visceral way to read machine intent. Observations from five international exhibitions indicate that audiences consistently read the robots' strategies, conflict, and cooperation in real time, with emotional reactions that mirror the system's internal state. The project shows how sensory metaphors can turn a black box into a public interface.
Co-design is powerful and not free
Robotic performance emerges from the coupling of body and controller, yet it remains unclear when morphology-control co-design is necessary. We present a unified framework that embeds morphology and control parameters within a single neural network, enabling end-to-end joint optimization. Through case studies in static-obstacle-constrained reaching, we evaluate trajectory error, success rate, and collision probability. The results show that co-design provides clear benefits when morphology is poorly matched to the task, such as near obstacles or workspace boundaries, where structural adaptation simplifies control. Conversely, when the baseline morphology already affords sufficient capability, control-only optimization often matches or exceeds co-design. By clarifying when control is enough and when it is not, this work advances the understanding of embodied intelligence and offers practical guidance for embodiment-aware robot design.
A Multimodal Depth-Aware Method For Embodied Reference Understanding
Embodied Reference Understanding requires identifying a target object in a visual scene based on both language instructions and pointing cues. While prior works have shown progress in open-vocabulary object detection, they often fail in ambiguous scenarios where multiple candidate objects exist in the scene. To address these challenges, we propose a novel ERU framework that jointly leverages LLM-based data augmentation, depth-map modality, and a depth-aware decision module. This design enables robust integration of linguistic and embodied cues, improving disambiguation in complex or cluttered environments. Experimental results on two datasets demonstrate that our approach significantly outperforms existing baselines, achieving more accurate and reliable referent detection.
Evaluation of a Robust Control System in Real-World Cable-Driven Parallel Robots
This study evaluates the performance of classical and modern control methods for real-world Cable-Driven Parallel Robots (CDPRs), focusing on underconstrained systems with limited time discretization. A comparative analysis is conducted between classical PID controllers and modern reinforcement learning algorithms, including Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), and Trust Region Policy Optimization (TRPO). The results demonstrate that TRPO outperforms other methods, achieving the lowest root mean square (RMS) errors across various trajectories and exhibiting robustness to larger time intervals between control updates. TRPO's ability to balance exploration and exploitation enables stable control in noisy, real-world environments, reducing reliance on high-frequency sensor feedback and computational demands. These findings highlight TRPO's potential as a robust solution for complex robotic control tasks, with implications for dynamic environments and future applications in sensor fusion or hybrid control strategies.
NavSpace: How Navigation Agents Follow Spatial Intelligence Instructions
Instruction-following navigation is a key step toward embodied intelligence. Prior benchmarks mainly focus on semantic understanding but overlook systematically evaluating navigation agents' spatial perception and reasoning capabilities. In this work, we introduce the NavSpace benchmark, which contains six task categories and 1,228 trajectory-instruction pairs designed to probe the spatial intelligence of navigation agents. On this benchmark, we comprehensively evaluate 22 navigation agents, including state-of-the-art navigation models and multimodal large language models. The evaluation results lift the veil on spatial intelligence in embodied navigation. Furthermore, we propose SNav, a new spatially intelligent navigation model. SNav outperforms existing navigation agents on NavSpace and real robot tests, establishing a strong baseline for future work.
Accurate and Noise-Tolerant Extraction of Routine Logs in Robotic Process Automation (Extended Version)
Robotic Process Mining focuses on the identification of the routine types performed by human resources through a User Interface. The ultimate goal is to discover routine-type models to enable robotic process automation. The discovery of routine-type models requires the provision of a routine log. Unfortunately, the vast majority of existing works do not directly focus on enabling the model discovery, limiting themselves to extracting the set of actions that are part of the routines. They were also not evaluated in scenarios characterized by inconsistent routine execution, hereafter referred to as noise, which reflects natural variability and occasional errors in human performance. This paper presents a clustering-based technique that aims to extract routine logs. Experiments were conducted on nine UI logs from the literature with different levels of injected noise. Our technique was compared with existing techniques, most of which are not meant to discover routine logs but were adapted for the purpose. The results were evaluated through standard state-of-the-art metrics, showing that we can extract more accurate routine logs than what the state of the art could, especially in the presence of noise.
comment: 16 pages, 5 figures
Beyond hospital reach: Autonomous lightweight ultrasound robot for liver sonography
Liver disease is a major global health burden. While ultrasound is the first-line diagnostic tool, liver sonography requires locating multiple non-continuous planes from positions where target structures are often not visible, for biometric assessment and lesion detection, requiring significant expertise. However, expert sonographers are severely scarce in resource-limited regions. Here, we develop an autonomous lightweight ultrasound robot comprising an AI agent that integrates multi-modal perception with memory attention for localization of unseen target structures, and a 588-gram 6-degrees-of-freedom cable-driven robot. By mounting on the abdomen, the system enhances robustness against motion. Our robot can autonomously acquire expert-level standard liver ultrasound planes and detect pathology in patients, including two from Xining, a 2261-meter-altitude city with limited medical resources. Our system performs effectively on rapid-motion individuals and in wilderness environments. This work represents the first demonstration of autonomous sonography across multiple challenging scenarios, potentially transforming access to expert-level diagnostics in underserved regions.
Towards Reliable LLM-based Robot Planning via Combined Uncertainty Estimation
Large language models (LLMs) demonstrate advanced reasoning abilities, enabling robots to understand natural language instructions and generate high-level plans with appropriate grounding. However, LLM hallucinations present a significant challenge, often leading to overconfident yet potentially misaligned or unsafe plans. While researchers have explored uncertainty estimation to improve the reliability of LLM-based planning, existing studies have not sufficiently differentiated between epistemic and intrinsic uncertainty, limiting the effectiveness of uncertainty estimation. In this paper, we present Combined Uncertainty estimation for Reliable Embodied planning (CURE), which decomposes the uncertainty into epistemic and intrinsic uncertainty, each estimated separately. Furthermore, epistemic uncertainty is subdivided into task clarity and task familiarity for more accurate evaluation. The overall uncertainty assessments are obtained using random network distillation and multi-layer perceptron regression heads driven by LLM features. We validated our approach in two distinct experimental settings: kitchen manipulation and tabletop rearrangement experiments. The results show that, compared to existing methods, our approach yields uncertainty estimates that are more closely aligned with the actual execution outcomes.
FastUMI-100K: Advancing Data-driven Robotic Manipulation with a Large-scale UMI-style Dataset
Data-driven robotic manipulation learning depends on large-scale, high-quality expert demonstration datasets. However, existing datasets, which primarily rely on human teleoperated robot collection, are limited in terms of scalability, trajectory smoothness, and applicability across different robotic embodiments in real-world environments. In this paper, we present FastUMI-100K, a large-scale UMI-style multimodal demonstration dataset, designed to overcome these limitations and meet the growing complexity of real-world manipulation tasks. Collected by FastUMI, a novel robotic system featuring a modular, hardware-decoupled mechanical design and an integrated lightweight tracking system, FastUMI-100K offers a more scalable, flexible, and adaptable solution to fulfill the diverse requirements of real-world robot demonstration data. Specifically, FastUMI-100K contains over 100K demonstration trajectories collected across representative household environments, covering 54 tasks and hundreds of object types. Our dataset integrates multimodal streams, including end-effector states, multi-view wrist-mounted fisheye images, and textual annotations. Each trajectory has a length ranging from 120 to 500 frames. Experimental results demonstrate that FastUMI-100K enables high policy success rates across various baseline algorithms, confirming its robustness, adaptability, and real-world applicability for solving complex, dynamic manipulation challenges. The source code and dataset will be released at https://github.com/MrKeee/FastUMI-100K.
Orientation Learning and Adaptation towards Simultaneous Incorporation of Multiple Local Constraints
Orientation learning plays a pivotal role in many tasks. However, the rotation group SO(3) is a Riemannian manifold. As a result, the distortion caused by its non-Euclidean geometry makes it difficult to incorporate local constraints, especially multiple local constraints simultaneously. To address this issue, we propose the Angle-Axis Space-based orientation representation method to solve several orientation learning problems, including orientation adaptation and minimization of angular acceleration. Specifically, we propose a weighted average mechanism in SO(3) based on the angle-axis representation method. Our main idea is to generate multiple trajectories by considering different local constraints at different basepoints. These trajectories are then fused into a smooth trajectory by our proposed weighted average mechanism, thereby incorporating multiple local constraints simultaneously. Compared with existing solutions, ours addresses the distortion issue and makes off-the-shelf Euclidean learning algorithms re-applicable in non-Euclidean space. Simulation and experimental evaluations validate that our solution can not only adapt orientations towards arbitrary desired via-points and cope with angular acceleration constraints, but also incorporate multiple local constraints simultaneously to achieve extra benefits, e.g., smaller acceleration costs.
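Illustrative sketch (not the paper's mechanism): one common way to average orientations through an angle-axis representation is to map each rotation into the tangent space at a chosen basepoint, take the weighted mean there, and map the result back. The Python below does this with unit quaternions in (w, x, y, z) order. The single-step tangent-space average, the choice of basepoint, and all function names are assumptions for illustration.

```python
import math

def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
            w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2)

def quat_conj(q):
    """Conjugate, which is the inverse for a unit quaternion."""
    return (q[0], -q[1], -q[2], -q[3])

def log_map(q):
    """Unit quaternion -> angle-axis vector (rotation angle times unit axis)."""
    if q[0] < 0.0:                      # pick the shorter of the two equivalent rotations
        q = tuple(-c for c in q)
    w, x, y, z = q
    s = math.sqrt(x * x + y * y + z * z)
    if s < 1e-12:
        return (0.0, 0.0, 0.0)
    angle = 2.0 * math.atan2(s, w)
    return (angle * x / s, angle * y / s, angle * z / s)

def exp_map(v):
    """Angle-axis vector -> unit quaternion."""
    angle = math.sqrt(sum(c * c for c in v))
    if angle < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)
    s = math.sin(angle / 2.0) / angle
    return (math.cos(angle / 2.0), v[0] * s, v[1] * s, v[2] * s)

def weighted_average(quats, weights, base):
    """Weighted mean of rotations, computed in the tangent space at `base`."""
    total = sum(weights)
    mean = [0.0, 0.0, 0.0]
    for q, w in zip(quats, weights):
        v = log_map(quat_mul(quat_conj(base), q))
        for i in range(3):
            mean[i] += (w / total) * v[i]
    return quat_mul(base, exp_map(mean))
```

For example, averaging the identity and a 90-degree rotation about z with equal weights at the identity basepoint yields a 45-degree rotation about z.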
Executable Analytic Concepts as the Missing Link Between VLM Insight and Precise Manipulation
Enabling robots to perform precise and generalized manipulation in unstructured environments remains a fundamental challenge in embodied AI. While Vision-Language Models (VLMs) have demonstrated remarkable capabilities in semantic reasoning and task planning, a significant gap persists between their high-level understanding and the precise physical execution required for real-world manipulation. To bridge this "semantic-to-physical" gap, we introduce GRACE, a novel framework that grounds VLM-based reasoning through executable analytic concepts (EAC), mathematically defined blueprints that encode object affordances, geometric constraints, and semantics of manipulation. Our approach integrates a structured policy scaffolding pipeline that turns natural language instructions and visual information into an instantiated EAC, from which we derive grasp poses and force directions and plan a physically feasible motion trajectory for robot execution. GRACE thus provides a unified and interpretable interface between high-level instruction understanding and low-level robot control, effectively enabling precise and generalizable manipulation through semantic-physical grounding. Extensive experiments demonstrate that GRACE achieves strong zero-shot generalization across a variety of articulated objects in both simulated and real-world environments, without requiring task-specific training.
Towards Proprioception-Aware Embodied Planning for Dual-Arm Humanoid Robots
In recent years, Multimodal Large Language Models (MLLMs) have demonstrated the ability to serve as high-level planners, enabling robots to follow complex human instructions. However, their effectiveness, especially in long-horizon tasks involving dual-arm humanoid robots, remains limited. This limitation arises from two main challenges: (i) the absence of simulation platforms that systematically support task evaluation and data collection for humanoid robots, and (ii) the insufficient embodiment awareness of current MLLMs, which hinders reasoning about dual-arm selection logic and body positions during planning. To address these issues, we present DualTHOR, a new dual-arm humanoid simulator, with continuous transition and a contingency mechanism. Building on this platform, we propose Proprio-MLLM, a model that enhances embodiment awareness by incorporating proprioceptive information with motion-based position embedding and a cross-spatial encoder. Experiments show that, while existing MLLMs struggle in this environment, Proprio-MLLM achieves an average improvement of 19.75% in planning performance. Our work provides both an essential simulation platform and an effective model to advance embodied intelligence in humanoid robotics. The code is available at https://anonymous.4open.science/r/DualTHOR-5F3B.
Team Xiaomi EV-AD VLA: Learning to Navigate Socially Through Proactive Risk Perception -- Technical Report for IROS 2025 RoboSense Challenge Social Navigation Track
In this report, we describe the technical details of our submission to the IROS 2025 RoboSense Challenge Social Navigation Track. This track focuses on developing RGBD-based perception and navigation systems that enable autonomous agents to navigate safely, efficiently, and socially compliantly in dynamic human-populated indoor environments. The challenge requires agents to operate from an egocentric perspective using only onboard sensors including RGB-D observations and odometry, without access to global maps or privileged information, while maintaining social norm compliance such as safe distances and collision avoidance. Building upon the Falcon model, we introduce a Proactive Risk Perception Module to enhance social navigation performance. Our approach augments Falcon with collision risk understanding that learns to predict distance-based collision risk scores for surrounding humans, which enables the agent to develop more robust spatial awareness and proactive collision avoidance behaviors. The evaluation on the Social-HM3D benchmark demonstrates that our method improves the agent's ability to maintain personal space compliance while navigating toward goals in crowded indoor scenes with dynamic human agents, achieving 2nd place among 16 participating teams in the challenge.
USIM and U0: A Vision-Language-Action Dataset and Model for General Underwater Robots
Underwater environments present unique challenges for robotic operation, including complex hydrodynamics, limited visibility, and constrained communication. Although data-driven approaches have advanced embodied intelligence in terrestrial robots and enabled task-specific autonomous underwater robots, developing underwater intelligence capable of autonomously performing multiple tasks remains highly challenging, as large-scale, high-quality underwater datasets are still scarce. To address these limitations, we introduce USIM, a simulation-based multi-task Vision-Language-Action (VLA) dataset for underwater robots. USIM comprises over 561K frames from 1,852 trajectories, totaling approximately 15.6 hours of BlueROV2 interactions across 20 tasks in 9 diverse scenarios, ranging from visual navigation to mobile manipulation. Building upon this dataset, we propose U0, a VLA model for general underwater robots, which integrates binocular vision and other sensor modalities through multimodal fusion, and further incorporates a convolution-attention-based perception focus enhancement module (CAP) to improve spatial understanding and mobile manipulation. Across tasks such as inspection, obstacle avoidance, scanning, and dynamic tracking, the framework achieves a success rate of 80%, while in challenging mobile manipulation tasks, it reduces the distance to the target by 21.2% compared with baseline methods, demonstrating its effectiveness. USIM and U0 show that VLA models can be effectively applied to underwater robotic applications, providing a foundation for scalable dataset construction, improved task autonomy, and the practical realization of intelligent general underwater robots.
DM1: MeanFlow with Dispersive Regularization for 1-Step Robotic Manipulation
The ability to learn multi-modal action distributions is indispensable for robotic manipulation policies to perform precise and robust control. Flow-based generative models have recently emerged as a promising solution to learning distributions of actions, offering one-step action generation and thus achieving much higher sampling efficiency compared to diffusion-based methods. However, existing flow-based policies suffer from representation collapse, the inability to distinguish similar visual representations, leading to failures in precise manipulation tasks. We propose DM1 (MeanFlow with Dispersive Regularization for One-Step Robotic Manipulation), a novel flow matching framework that integrates dispersive regularization into MeanFlow to prevent collapse while maintaining one-step efficiency. DM1 employs multiple dispersive regularization variants across different intermediate embedding layers, encouraging diverse representations across training batches without introducing additional network modules or specialized training procedures. Experiments on RoboMimic benchmarks show that DM1 achieves 20-40 times faster inference (0.07s vs. 2-3.5s) and improves success rates by 10-20 percentage points, with the Lift task reaching 99% success versus 85% for the baseline. Real-robot deployment on a Franka Panda further validates that DM1 transfers effectively from simulation to the physical world. To the best of our knowledge, this is the first work to leverage representation regularization to enable flow-based policies to achieve strong performance in robotic manipulation, establishing a simple yet powerful approach for efficient and robust manipulation.
comment: Website with code: https://guowei-zou.github.io/dm1/
GM3: A General Physical Model for Micro-Mobility Vehicles
Modeling the dynamics of micro-mobility vehicles (MMV) is becoming increasingly important for training autonomous vehicle systems and building urban traffic simulations. However, mainstream tools rely on variants of the Kinematic Bicycle Model (KBM) or mode-specific physics that miss tire slip, load transfer, and rider/vehicle lean. To our knowledge, no unified, physics-based model captures these dynamics across the full range of common MMVs and wheel layouts. We propose the "Generalized Micro-mobility Model" (GM3), a tire-level formulation based on the tire brush representation that supports arbitrary wheel configurations, including single/double track and multi-wheel platforms. We introduce an interactive, model-agnostic simulation framework that decouples vehicle/layout specification from dynamics to compare the GM3 with the KBM and other models. The framework consists of fixed-step RK4 integration, human-in-the-loop and scripted control, real-time trajectory traces, and logging for analysis. We also empirically validate the GM3 on the Stanford Drone Dataset's deathCircle (roundabout) scene for biker, skater, and cart classes.
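Illustrative sketch (the baseline only, not GM3): the abstract above mentions fixed-step RK4 integration and the Kinematic Bicycle Model as a comparison point. The Python below integrates a textbook rear-axle KBM with classical fourth-order Runge-Kutta. The state layout, constant inputs over a step, and all function and parameter names are assumptions for illustration, and the tire-brush dynamics of GM3 are not represented.

```python
import math

def kbm_derivative(state, v, delta, wheelbase):
    """Kinematic bicycle model, rear-axle reference: state = (x, y, heading)."""
    _, _, theta = state
    return (v * math.cos(theta),
            v * math.sin(theta),
            v / wheelbase * math.tan(delta))

def rk4_step(state, v, delta, wheelbase, dt):
    """Advance the state by one fixed step dt with classical RK4."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = kbm_derivative(state, v, delta, wheelbase)
    k2 = kbm_derivative(shifted(state, k1, dt / 2), v, delta, wheelbase)
    k3 = kbm_derivative(shifted(state, k2, dt / 2), v, delta, wheelbase)
    k4 = kbm_derivative(shifted(state, k3, dt), v, delta, wheelbase)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def simulate(state, v, delta, wheelbase, dt, steps):
    """Roll the model forward with constant speed v and steering angle delta."""
    for _ in range(steps):
        state = rk4_step(state, v, delta, wheelbase, dt)
    return state
```

With constant speed and steering the exact path is a circular arc of radius wheelbase / tan(delta), which gives a convenient check on the integrator.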
IntentionVLA: Generalizable and Efficient Embodied Intention Reasoning for Human-Robot Interaction
Vision-Language-Action (VLA) models leverage pretrained vision-language models (VLMs) to couple perception with robotic control, offering a promising path toward general-purpose embodied intelligence. However, current SOTA VLAs are primarily pretrained on multimodal tasks with limited relevance to embodied scenarios, and then finetuned to map explicit instructions to actions. Consequently, due to the lack of reasoning-intensive pretraining and reasoning-guided manipulation, these models are unable to perform implicit human intention reasoning required for complex, real-world interactions. To overcome these limitations, we propose IntentionVLA, a VLA framework with a curriculum training paradigm and an efficient inference mechanism. Our proposed method first leverages carefully designed reasoning data that combine intention inference, spatial grounding, and compact embodied reasoning, endowing the model with both reasoning and perception capabilities. In the following finetuning stage, IntentionVLA employs the compact reasoning outputs as contextual guidance for action generation, enabling fast inference under indirect instructions. Experimental results show that IntentionVLA substantially outperforms π0, achieving 18% higher success rates with direct instructions and 28% higher than ECoT under intention instructions. On out-of-distribution intention tasks, IntentionVLA achieves over twice the success rate of all baselines, and further enables zero-shot human-robot interaction with 40% success rate. These results highlight IntentionVLA as a promising paradigm for next-generation human-robot interaction (HRI) systems.
Trajectory Conditioned Cross-embodiment Skill Transfer
Learning manipulation skills from human demonstration videos presents a promising yet challenging problem, primarily due to the significant embodiment gap between the human body and robot manipulators. Existing methods rely on paired datasets or hand-crafted rewards, which limit scalability and generalization. We propose TrajSkill, a framework for Trajectory Conditioned Cross-embodiment Skill Transfer, enabling robots to acquire manipulation skills directly from human demonstration videos. Our key insight is to represent human motions as sparse optical flow trajectories, which serve as embodiment-agnostic motion cues by removing morphological variations while preserving essential dynamics. Conditioned on these trajectories together with visual and textual inputs, TrajSkill jointly synthesizes temporally consistent robot manipulation videos and translates them into executable actions, thereby achieving cross-embodiment skill transfer. Experiments on simulation data (MetaWorld) show that TrajSkill reduces FVD by 39.6% and KVD by 36.6% compared with the state-of-the-art, and improves cross-embodiment success rate by up to 16.7%. Real-robot experiments in kitchen manipulation tasks further validate the effectiveness of our approach, demonstrating practical human-to-robot skill transfer across embodiments.
Injecting Hallucinations in Autonomous Vehicles: A Component-Agnostic Safety Evaluation Framework
Perception failures in autonomous vehicles (AV) remain a major safety concern because they are the basis for many accidents. To study how these failures affect safety, researchers typically inject artificial faults into hardware or software components and observe the outcomes. However, existing fault injection studies often target a single sensor or machine perception (MP) module, resulting in siloed frameworks that are difficult to generalize or integrate into unified simulation environments. This work addresses that limitation by reframing perception failures as hallucinations, false perceptions that distort an AV's situational awareness and may trigger unsafe control actions. Since hallucinations describe only observable effects, this abstraction enables analysis independent of specific sensors or algorithms, focusing instead on how their faults manifest along the MP pipeline. Building on this concept, we propose a configurable, component-agnostic hallucination injection framework that induces six plausible hallucination types in an iterative open-source simulator. More than 18,350 simulations were executed in which hallucinations were injected while AVs crossed an unsignalized transverse street with traffic. The results statistically validate the framework and quantify the impact of each hallucination type on collisions and near misses. Certain hallucinations, such as perceptual latency and drift, significantly increase the risk of collision in the scenario tested, confirming that the proposed paradigm can stress AV system safety. The framework offers a scalable, statistically validated, component-agnostic, and fully interoperable toolset that simplifies and accelerates AV safety validations, even those with novel MP architectures and components. It can potentially reduce the time-to-market of AVs and lay the foundation for future research on fault tolerance and resilient AV design.
comment: 22 pages, 15 figures, 21 tables
DEAS: DEtached value learning with Action Sequence for Scalable Offline RL
Offline reinforcement learning (RL) presents an attractive paradigm for training intelligent agents without expensive online interactions. However, current approaches still struggle with complex, long-horizon sequential decision making. In this work, we introduce DEtached value learning with Action Sequence (DEAS), a simple yet effective offline RL framework that leverages action sequences for value learning. These temporally extended actions provide richer information than single-step actions and can be interpreted through the options framework via semi-Markov decision process Q-learning, enabling reduction of the effective planning horizon by considering longer sequences at once. However, directly adopting such sequences in actor-critic algorithms introduces excessive value overestimation, which we address through detached value learning that steers value estimates toward in-distribution actions that achieve high return in the offline dataset. We demonstrate that DEAS consistently outperforms baselines on complex, long-horizon tasks from OGBench and can be applied to enhance the performance of large-scale Vision-Language-Action models that predict action sequences, significantly boosting performance in both RoboCasa Kitchen simulation tasks and real-world manipulation tasks.
comment: Project website: https://changyeon.site/deas
Probabilistically-Safe Bipedal Navigation over Uncertain Terrain via Conformal Prediction and Contraction Analysis
We address the challenge of enabling bipedal robots to traverse rough terrain by developing probabilistically safe planning and control strategies that ensure dynamic feasibility and centroidal robustness under terrain uncertainty. Specifically, we propose a high-level Model Predictive Control (MPC) navigation framework for a bipedal robot with a specified confidence level of safety that (i) enables safe traversal toward a desired goal location across a terrain map with uncertain elevations, and (ii) formally incorporates uncertainty bounds into the centroidal dynamics of locomotion control. To model the rough terrain, we employ Gaussian Process (GP) regression to estimate elevation maps and leverage Conformal Prediction (CP) to construct calibrated confidence intervals that capture the true terrain elevation. Building on this, we formulate contraction-based reachable tubes that explicitly account for terrain uncertainty, ensuring state convergence and tube invariance. In addition, we introduce a contraction-based flywheel torque control law for the reduced-order Linear Inverted Pendulum Model (LIPM), which stabilizes the angular momentum about the center-of-mass (CoM). This formulation provides both probabilistic safety and goal reachability guarantees. For a given confidence level, we establish the forward invariance of the proposed torque control law by demonstrating exponential stabilization of the actual CoM phase-space trajectory and the desired trajectory prescribed by the high-level planner. Finally, we evaluate the effectiveness of our planning framework through physics-based simulations of the Digit bipedal robot in MuJoCo.
comment: 9 pages, 4 figures
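To make the Conformal Prediction step in the abstract above concrete, here is a minimal sketch of split conformal prediction around a point estimate of terrain elevation. It is not the authors' pipeline: it assumes absolute residuals on a held-out calibration set as the nonconformity score and exchangeable calibration and test points, and the function names are hypothetical. In the paper's setting the point estimate would come from GP regression.

```python
import math


def conformal_halfwidth(calibration_residuals, alpha):
    """Half-width q giving marginal coverage >= 1 - alpha (split conformal).

    Uses the finite-sample rank ceil((n + 1) * (1 - alpha)) over the sorted
    absolute residuals; returns infinity when n is too small for that rank.
    """
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return scores[rank - 1]


def elevation_interval(predicted_elevation, halfwidth):
    """Symmetric interval around a predicted elevation (assumed form)."""
    return (predicted_elevation - halfwidth, predicted_elevation + halfwidth)


if __name__ == "__main__":
    # Hypothetical calibration residuals (true minus predicted elevation, m).
    residuals = [-0.03, 0.01, 0.04, -0.01, 0.05, -0.09, 0.02, 0.06, -0.05]
    q = conformal_halfwidth(residuals, alpha=0.5)
    print(elevation_interval(0.20, q))
```

The guarantee here is marginal over calibration and test draws; how such bounds are then propagated into reachable tubes is specific to the paper and is not sketched.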
EB-MBD: Emerging-Barrier Model-Based Diffusion for Safe Trajectory Optimization in Highly Constrained Environments
We propose enforcing constraints on Model-Based Diffusion by introducing emerging barrier functions inspired by interior point methods. We show that constraints on Model-Based Diffusion can lead to catastrophic performance degradation, even on simple 2D systems, due to sample inefficiency in the Monte Carlo approximation of the score function. We introduce Emerging-Barrier Model-Based Diffusion (EB-MBD), which uses progressively introduced barrier constraints to avoid these problems, significantly improving solution quality without the need for computationally expensive operations such as projections. We analyze sampling liveliness at each iteration to inform the choice of barrier parameter schedule. We demonstrate results for 2D collision avoidance and a 3D underwater manipulator system and show that our method achieves lower-cost solutions than Model-Based Diffusion and requires orders of magnitude less computation time than projection-based methods.
Differentiable Particle Optimization for Fast Sequential Manipulation
Sequential robot manipulation tasks require finding collision-free trajectories that satisfy geometric constraints across multiple object interactions in potentially high-dimensional configuration spaces. Solving these problems in real-time and at large scales has remained out of reach due to computational requirements. Recently, GPU-based acceleration has shown promising results, but prior methods achieve limited performance due to CPU-GPU data transfer overhead and complex logic that prevents full hardware utilization. To this end, we present SPaSM (Sampling Particle optimization for Sequential Manipulation), a fully GPU-parallelized framework that compiles constraint evaluation, sampling, and gradient-based optimization into optimized CUDA kernels for end-to-end trajectory optimization without CPU coordination. The method consists of a two-stage particle optimization strategy: first solving placement constraints through massively parallel sampling, then lifting solutions to full trajectory optimization in joint space. Unlike hierarchical approaches, SPaSM jointly optimizes object placements and robot trajectories to handle scenarios where motion feasibility constrains placement options. Experimental evaluation on challenging benchmarks demonstrates solution times in the realm of milliseconds with a 100% success rate, a $4000\times$ speedup compared to existing approaches.
comment: 8 pages, 7 figures, 3 tables. Under review
CDE: Concept-Driven Exploration for Reinforcement Learning
Intelligent exploration remains a critical challenge in reinforcement learning (RL), especially in visual control tasks. Unlike low-dimensional state-based RL, visual RL must extract task-relevant structure from raw pixels, making exploration inefficient. We propose Concept-Driven Exploration (CDE), which leverages a pre-trained vision-language model (VLM) to generate object-centric visual concepts from textual task descriptions as weak, potentially noisy supervisory signals. Rather than directly conditioning on these noisy signals, CDE trains a policy to reconstruct the concepts via an auxiliary objective, using reconstruction accuracy as an intrinsic reward to guide exploration toward task-relevant objects. Because the policy internalizes these concepts, VLM queries are only needed during training, reducing dependence on external models during deployment. Across five challenging simulated visual manipulation tasks, CDE achieves efficient, targeted exploration and remains robust to noisy VLM predictions. Finally, we demonstrate real-world transfer by deploying CDE on a Franka Research 3 arm, attaining an 80% success rate in a real-world manipulation task.
comment: Preprint
Adaptive Science Operations in Deep Space Missions Using Offline Belief State Planning SP
Deep space missions face extreme communication delays and environmental uncertainty that prevent real-time ground operations. To support autonomous science operations in communication-constrained environments, we present a partially observable Markov decision process (POMDP) framework that adaptively sequences spacecraft science instruments. We integrate a Bayesian network into the POMDP observation space to manage the high-dimensional and uncertain measurements typical of astrobiology missions. This network compactly encodes dependencies among measurements and improves the interpretability and computational tractability of science data. Instrument operation policies are computed offline, allowing resource-aware plans to be generated and thoroughly validated prior to launch. We use the Enceladus Orbilander's proposed Life Detection Suite (LDS) as a case study, demonstrating how Bayesian network structure and reward shaping influence system performance. We compare our method against the mission's baseline Concept of Operations (ConOps), evaluating both misclassification rates and performance in off-nominal sample accumulation scenarios. Our approach reduces sample identification errors by nearly 40%.
comment: 7 pages, 4 tables, 5 figures, accepted in IEEE ISPARO 2026
Adaptive Motion Planning via Contact-Based Intent Inference for Human-Robot Collaboration
Human-robot collaboration (HRC) requires robots to adapt their motions to human intent to ensure safe and efficient cooperation in shared spaces. Although large language models (LLMs) provide high-level reasoning for inferring human intent, their application to reliable motion planning in HRC remains challenging. Physical human-robot interaction (pHRI) is intuitive but often relies on continuous kinesthetic guidance, which imposes burdens on operators. To address these challenges, a contact-informed adaptive motion-planning framework is introduced to infer human intent directly from physical contact and employ the inferred intent for online motion correction in HRC. First, an optimization-based force estimation method is proposed to infer human-intended contact forces and locations from joint torque measurements and a robot dynamics model, thereby reducing cost and installation complexity while enabling whole-body sensitivity. Then, a torque-based contact detection mechanism with link-level localization is introduced to reduce the optimization search space and to enable real-time estimation. Subsequently, a contact-informed adaptive motion planner is developed to infer human intent from contacts and to replan robot motion online, while maintaining smoothness and adapting to human corrections. Finally, experiments on a 7-DOF manipulator are conducted to demonstrate the accuracy of the proposed force estimation method and the effectiveness of the contact-informed adaptive motion planner under perception uncertainty in HRC.
Humanoid Everyday: A Comprehensive Robotic Dataset for Open-World Humanoid Manipulation
From locomotion to dextrous manipulation, humanoid robots have made remarkable strides in demonstrating complex full-body capabilities. However, the majority of current robot learning datasets and benchmarks mainly focus on stationary robot arms, and the few existing humanoid datasets are either confined to fixed environments or limited in task diversity, often lacking human-humanoid interaction and lower-body locomotion. Moreover, there are few standardized evaluation platforms for benchmarking learning-based policies on humanoid data. In this work, we present Humanoid Everyday, a large-scale and diverse humanoid manipulation dataset characterized by extensive task variety involving dextrous object manipulation, human-humanoid interaction, locomotion-integrated actions, and more. Leveraging a highly efficient human-supervised teleoperation pipeline, Humanoid Everyday aggregates high-quality multimodal sensory data, including RGB, depth, LiDAR, and tactile inputs, together with natural language annotations, comprising 10.3k trajectories and over 3 million frames of data across 260 tasks in 7 broad categories. In addition, we conduct an analysis of representative policy learning methods on our dataset, providing insights into their strengths and limitations across different task categories. For standardized evaluation, we introduce a cloud-based evaluation platform that allows researchers to seamlessly deploy their policies in our controlled setting and receive performance feedback. By releasing Humanoid Everyday along with our policy learning analysis and a standardized cloud-based evaluation platform, we intend to advance research in general-purpose humanoid manipulation and lay the groundwork for more capable and embodied robotic agents in real-world scenarios. Our dataset, data collection code, and cloud evaluation website are made publicly available on our project website.
Geometry-aware Policy Imitation
We propose a Geometry-aware Policy Imitation (GPI) approach that rethinks imitation learning by treating demonstrations as geometric curves rather than collections of state-action samples. From these curves, GPI derives distance fields that give rise to two complementary control primitives: a progression flow that advances along expert trajectories and an attraction flow that corrects deviations. Their combination defines a controllable, non-parametric vector field that directly guides robot behavior. This formulation decouples metric learning from policy synthesis, enabling modular adaptation across low-dimensional robot states and high-dimensional perceptual inputs. GPI naturally supports multimodality by preserving distinct demonstrations as separate models and allows efficient composition of new demonstrations through simple additions to the distance field. We evaluate GPI in simulation and on real robots across diverse tasks. Experiments show that GPI achieves higher success rates than diffusion-based policies while running 20 times faster, requiring less memory, and remaining robust to perturbations. These results establish GPI as an efficient, interpretable, and scalable alternative to generative approaches for robotic imitation learning. Project website: https://yimingli1998.github.io/projects/GPI/
comment: 21 pages, 13 figures. In submission
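The combination of a progression flow and an attraction flow described in the abstract above can be illustrated with a small sketch. This is an assumption-laden simplification rather than the GPI formulation: the demonstration is a list of sampled points, distance is Euclidean to the nearest sample (not a continuous curve projection or a learned metric), and the function name and the two gain parameters are hypothetical.

```python
import math


def flow_from_demo(point, demo, progress_gain=1.0, attract_gain=1.0):
    """Velocity command from one demonstration given as a list of points.

    progression: unit tangent of the demo at the sample nearest to `point`
    attraction:  vector from `point` back to that nearest sample
    """
    idx = min(range(len(demo)), key=lambda i: math.dist(point, demo[i]))
    nearest = demo[idx]
    following = demo[min(idx + 1, len(demo) - 1)]

    tangent = [b - a for a, b in zip(nearest, following)]
    norm = math.hypot(*tangent)
    if norm > 0.0:
        tangent = [t / norm for t in tangent]  # zero at the final sample

    attraction = [n - p for p, n in zip(point, nearest)]
    return [
        progress_gain * t + attract_gain * a
        for t, a in zip(tangent, attraction)
    ]


if __name__ == "__main__":
    demo = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    # Off the curve: move forward along the demo and back toward it.
    print(flow_from_demo((1.0, 0.5), demo))
```

Adding a demonstration in this sketch would mean evaluating the same function on another point list and picking or blending flows, which loosely mirrors the "simple additions to the distance field" mentioned in the abstract.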
Detecting spills using thermal imaging, pretrained deep learning models, and a robotic platform
This paper presents a real-time spill detection system that utilizes pretrained deep learning models with RGB and thermal imaging to classify spill vs. no-spill scenarios across varied environments. Using a balanced binary dataset (4,000 images), our experiments demonstrate the advantages of thermal imaging in inference speed, accuracy, and model size. We achieve up to 100% accuracy using lightweight models like VGG19 and NasNetMobile, with thermal models performing faster and more robustly across different lighting conditions. Our system runs on consumer-grade hardware (RTX 4080) and achieves inference times as low as 44 ms with model sizes under 350 MB, highlighting its deployability in safety-critical contexts. Results from experiments with a real robot and test datasets indicate that a VGG19 model trained on thermal imaging performs best.
comment: 6 pages
Zero-Shot Policy Transfer in Reinforcement Learning using Buckingham's Pi Theorem
Reinforcement learning (RL) policies often fail to generalize to new robots, tasks, or environments with different physical parameters, a challenge that limits their real-world applicability. This paper presents a simple, zero-shot transfer method based on Buckingham's Pi Theorem to address this limitation. The method adapts a pre-trained policy to new system contexts by scaling its inputs (observations) and outputs (actions) through a dimensionless space, requiring no retraining. The approach is evaluated against a naive transfer baseline across three environments of increasing complexity: a simulated pendulum, a physical pendulum for sim-to-real validation, and the high-dimensional HalfCheetah. Results demonstrate that the scaled transfer exhibits no loss of performance on dynamically similar contexts. Furthermore, on non-similar contexts, the scaled policy consistently outperforms the naive transfer, significantly expanding the volume of contexts where the original policy remains effective. These findings demonstrate that dimensional analysis provides a powerful and practical tool to enhance the robustness and generalization of RL policies.
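The input/output scaling described above can be sketched for a torque-controlled pendulum. This is an illustrative sketch under stated assumptions, not the paper's code: the context is taken to be (mass, length, gravity), angular velocity is made dimensionless with the time scale sqrt(length / gravity), torque with mass * gravity * length, and `PendulumContext` and `transfer_policy` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PendulumContext:
    mass: float     # kg
    length: float   # m
    gravity: float  # m/s^2

    @property
    def time_scale(self):
        return math.sqrt(self.length / self.gravity)

    @property
    def torque_scale(self):
        return self.mass * self.gravity * self.length


def transfer_policy(policy, source, target):
    """Wrap a policy trained in `source` so it can act in `target`.

    Observations go target units -> dimensionless -> source units;
    actions go source units -> dimensionless -> target units.
    """
    def scaled_policy(angle, angular_velocity):
        omega_star = angular_velocity * target.time_scale  # dimensionless
        source_omega = omega_star / source.time_scale
        source_torque = policy(angle, source_omega)        # angle is unitless
        torque_star = source_torque / source.torque_scale  # dimensionless
        return torque_star * target.torque_scale
    return scaled_policy


if __name__ == "__main__":
    # Stand-in for a pre-trained policy: a hand-written linear controller.
    source_policy = lambda angle, omega: -10.0 * angle - 2.0 * omega
    source = PendulumContext(mass=1.0, length=1.0, gravity=9.81)
    target = PendulumContext(mass=2.0, length=4.0, gravity=9.81)
    print(transfer_policy(source_policy, source, target)(0.1, 0.5))
```

Exact equivalence only holds when the remaining dimensionless groups of the two systems also match (for example, a torque limit expressed relative to mass * gravity * length), which corresponds to the "dynamically similar contexts" in the abstract.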
BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities
Embodied capabilities refer to a suite of fundamental abilities for an agent to perceive, comprehend, and interact with the physical world. While multimodal large language models (MLLMs) show promise as embodied agents, a thorough and systematic evaluation of their embodied capabilities remains underexplored, as existing benchmarks primarily focus on specific domains such as planning or spatial understanding. To bridge this gap, we introduce BEAR, a comprehensive and fine-grained benchmark that evaluates MLLMs on atomic embodied capabilities. BEAR comprises 4,469 interleaved image-video-text entries across 14 domains in 6 categories, including tasks from low-level pointing, trajectory understanding, spatial reasoning, to high-level planning. Extensive evaluation results of 20 representative MLLMs reveal their persistent limitations across all domains of embodied capabilities. To tackle the shortfall, we propose BEAR-Agent, a multimodal conversable agent that integrates pretrained vision models to strengthen MLLM perception, 3D understanding, and planning capabilities. It substantially enhances MLLM performance across diverse embodied capabilities on BEAR, yielding a 9.12% absolute gain and a relative improvement of 17.5% on GPT-5. Furthermore, our experiments indicate that improving MLLM embodied capabilities can benefit embodied tasks in simulated environments. Project website: https://bear-official66.github.io/
Whole Body Model Predictive Control for Spin-Aware Quadrupedal Table Tennis ICRA 2026
Developing table tennis robots that mirror human speed, accuracy, and ability to predict and respond to the full range of ball spins remains a significant challenge for legged robots. To demonstrate these capabilities, we present a system that enables quadrupedal robots to play dynamic table tennis by integrating high-speed perception, trajectory prediction, and agile control. Our system uses external cameras for high-speed ball localization, physical models with learned residuals to infer spin and predict trajectories, and a novel model predictive control (MPC) formulation for agile full-body control. Notably, a continuous set of stroke strategies emerges automatically from different ball return objectives using this control paradigm. We demonstrate our system in the real world on a Spot quadruped, evaluate the accuracy of each system component, and exhibit coordination through the system's ability to aim and return balls with varying spin types. As a further demonstration, the system is able to rally with human players.
comment: Submitted to appear in IEEE ICRA 2026
Point and Go: Intuitive Reference Frame Reallocation in Mode Switching for Assistive Robotics
Operating high-degree-of-freedom robots can be difficult for users of wheelchair-mounted robotic manipulators. Mode switching in Cartesian space has several drawbacks, such as unintuitive control reference frames, separate translation and orientation control, and limited movement capabilities, that hinder performance. We propose Point and Go mode switching, which reallocates the Cartesian mode switching reference frames into a more intuitive action space composed of new translation and rotation modes. We use a novel sweeping motion to point the gripper, which defines the new translation axis along the robot base frame's horizontal plane. This creates an intuitive 'point and go' translation mode that allows the user to easily perform complex, human-like movements without switching control modes. The system's rotation mode combines position control with a refined end-effector oriented frame that provides precise and consistent robot actions in various end-effector poses. We verified its effectiveness through initial experiments, followed by a three-task user study that compared our method to Cartesian mode switching and a state-of-the-art learning method. Results show that Point and Go mode switching reduced completion times by 31%, pauses by 41%, and mode switches by 33%, while receiving significantly favorable responses in user surveys.
comment: 7 Pages, 5 figures
Unified World Models: Memory-Augmented Planning and Foresight for Visual Navigation
Enabling embodied agents to effectively imagine future states is critical for robust and generalizable visual navigation. Current state-of-the-art approaches, however, adopt modular architectures that separate navigation planning from visual world modeling, leading to state-action misalignment and limited adaptability in novel or dynamic scenarios. To overcome this fundamental limitation, we propose UniWM, a unified, memory-augmented world model integrating egocentric visual foresight and planning within a single multimodal autoregressive backbone. Unlike modular frameworks, UniWM explicitly grounds action decisions in visually imagined outcomes, ensuring tight alignment between prediction and control. A hierarchical memory mechanism further integrates detailed short-term perceptual cues with longer-term trajectory context, enabling stable, coherent reasoning over extended horizons. Extensive experiments across four challenging benchmarks (Go Stanford, ReCon, SCAND, HuRoN) demonstrate that UniWM substantially improves navigation success rates by up to 30%, significantly reduces trajectory errors compared to strong baselines, and exhibits impressive zero-shot generalization on the unseen TartanDrive dataset. These results highlight UniWM as a principled step toward unified, imagination-driven embodied navigation.
comment: 18 pages, 11 figures, code: https://github.com/F1y1113/UniWM
ConPoSe: LLM-Guided Contact Point Selection for Scalable Cooperative Object Pushing
Object transportation in cluttered environments is a fundamental task in various domains, including domestic service and warehouse logistics. In cooperative object transport, multiple robots must coordinate to move objects that are too large for a single robot. One transport strategy is pushing, which only requires simple robots. However, careful selection of robot-object contact points is necessary to push the object along a preplanned path. Although this selection can be solved analytically, the solution space grows combinatorially with the number of robots and object size, limiting scalability. Inspired by how humans rely on common-sense reasoning for cooperative transport, we propose combining the reasoning capabilities of Large Language Models with local search to select suitable contact points. Our LLM-guided local search method for contact point selection, ConPoSe, successfully selects contact points for a variety of shapes, including cuboids, cylinders, and T-shapes. We demonstrate that ConPoSe scales better with the number of robots and object size than the analytical approach, and also outperforms pure LLM-based selection.
FreeTacMan: Robot-free Visuo-Tactile Data Collection System for Contact-rich Manipulation
Equipping robots with contact-rich manipulation skills remains a pivotal challenge in robot learning, which is substantially hindered by the data collection gap, including its inefficiency and limited sensor setup. While prior work has explored handheld paradigms, their rod-based mechanical structures remain rigid and unintuitive, providing limited tactile feedback and posing challenges for human operators. Motivated by the dexterity and force feedback of human motion, we propose FreeTacMan, a human-centric and robot-free data collection system for accurate and efficient robot manipulation. Concretely, we design a wearable gripper with dual visuo-tactile sensors for data collection, which can be worn on human fingers for intuitive control. A high-precision optical tracking system is introduced to capture end-effector poses while synchronizing visual and tactile feedback. We leverage FreeTacMan to collect a large-scale multimodal dataset, comprising over 3000k paired visual-tactile images with end-effector poses and 10k demonstration trajectories across 50 diverse contact-rich manipulation tasks. FreeTacMan achieves multiple improvements in data collection performance compared to prior works, and enables effective policy learning for contact-rich manipulation tasks with the self-collected dataset. The full suite of hardware specifications and the dataset will be released to facilitate reproducibility and support research in visuo-tactile manipulation.
Uncertainty Comes for Free: Human-in-the-Loop Policies with Diffusion Models
Human-in-the-loop (HitL) robot deployment has gained significant attention in both academia and industry as a semi-autonomous paradigm that enables human operators to intervene and adjust robot behaviors at deployment time, improving success rates. However, continuous human monitoring and intervention can be highly labor-intensive and impractical when deploying a large number of robots. To address this limitation, we propose a method that allows diffusion policies to actively seek human assistance only when necessary, reducing reliance on constant human oversight. To achieve this, we leverage the generative process of diffusion policies to compute an uncertainty-based metric based on which the autonomous agent can decide to request operator assistance at deployment time, without requiring any operator interaction during training. Additionally, we show that the same method can be used for efficient data collection for fine-tuning diffusion policies in order to improve their autonomous performance. Experimental results from simulated and real-world environments demonstrate that our approach enhances policy performance during deployment for a variety of scenarios.
ERR@HRI 2.0 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Conversations
The integration of large language models (LLMs) into conversational robots has made human-robot conversations more dynamic. Yet, LLM-powered conversational robots remain prone to errors, e.g., misunderstanding user intent, prematurely interrupting users, or failing to respond altogether. Detecting and addressing these failures is critical for preventing conversational breakdowns, avoiding task disruptions, and sustaining user trust. To tackle this problem, the ERR@HRI 2.0 Challenge provides a multimodal dataset of LLM-powered conversational robot failures during human-robot conversations and encourages researchers to benchmark machine learning models designed to detect robot failures. The dataset includes 16 hours of dyadic human-robot interactions, incorporating facial, speech, and head movement features. Each interaction is annotated with the presence or absence of robot errors from the system perspective, and perceived user intention to correct for a mismatch between robot behavior and user expectation. Participants are invited to form teams and develop machine learning models that detect these failures using multimodal data. Submissions will be evaluated using various performance metrics, including detection accuracy and false positive rate. This challenge represents another key step toward improving failure detection in human-robot interaction through social signal analysis.
iA*: Imperative Learning-based A* Search for Path Planning
Path planning, which aims to find a collision-free path between two locations, is critical for numerous applications ranging from mobile robots to self-driving vehicles. Traditional search-based methods like A* search guarantee path optimality but are often computationally expensive when handling large-scale maps. While learning-based methods alleviate this issue by incorporating learned constraints into their search procedures, they often face challenges like overfitting and reliance on extensive labeled datasets. To address these limitations, we propose Imperative A* (iA*), a novel self-supervised path planning framework leveraging bilevel optimization (BLO) and imperative learning (IL). The iA* framework integrates a neural network that predicts node costs with a differentiable A* search mechanism, enabling efficient self-supervised training via bilevel optimization. This integration significantly enhances the balance between search efficiency and path optimality while improving generalization to previously unseen maps. Extensive experiments demonstrate that iA* outperforms both classical and supervised learning-based methods, achieving an average reduction of 65.7% in search area and 54.4% in runtime, underscoring its effectiveness in robot path planning tasks.
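For readers unfamiliar with how per-node costs enter A* search, the following is a minimal grid-based sketch. It is plain, non-differentiable A*, not the iA* method: the hand-written `cost_map` merely stands in for what a cost-predicting network might output, every traversable cost is assumed to be at least 1 so the Manhattan heuristic stays admissible, and the number of expanded cells is used as a rough proxy for "search area".

```python
import heapq


def astar_with_costs(cost_map, start, goal):
    """A* on a 4-connected grid.

    cost_map[r][c] is the cost of entering a cell (assumed >= 1), or None
    for an obstacle. Returns (path, path_cost, expanded_cell_count).
    """
    rows, cols = len(cost_map), len(cost_map[0])

    def heuristic(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0.0, start)]
    best_cost = {start: 0.0}
    parent = {start: None}
    expanded = set()

    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell in expanded:
            continue  # stale queue entry
        expanded.add(cell)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1], cost, len(expanded)
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            step = cost_map[nxt[0]][nxt[1]]
            if step is None:
                continue
            new_cost = cost + step
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                parent[nxt] = cell
                heapq.heappush(
                    frontier, (new_cost + heuristic(nxt), new_cost, nxt)
                )
    return None, float("inf"), len(expanded)


if __name__ == "__main__":
    grid = [[1, 1, 1],
            [1, None, 1],
            [1, 5, 1]]
    print(astar_with_costs(grid, (0, 0), (2, 2)))
```

Raising the cost of cells that are unlikely to lie on a good path shrinks the set of expanded cells, which is the trade-off between search efficiency and path optimality that the abstract refers to.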
FG-PE: Factor-graph Approach for Multi-robot Pursuit-Evasion
With the increasing use of robots in daily life, there is a growing need to provide robust collaboration protocols for robots to tackle more complicated and dynamic problems effectively. This paper presents a novel, factor graph-based approach to address the pursuit-evasion problem, enabling accurate estimation, planning, and tracking of an evader by multiple pursuers working together. It is assumed that there are multiple pursuers and only one evader in this scenario. The proposed method significantly improves the accuracy of evader estimation and tracking, allowing pursuers to capture the evader in the shortest possible time and distance compared to existing techniques. In addition to these primary objectives, the proposed approach effectively minimizes uncertainty while remaining robust, even when communication issues lead to some messages being dropped or lost. Through a series of comprehensive experiments, this paper demonstrates that the proposed algorithm consistently outperforms traditional pursuit-evasion methods across several key performance metrics, such as the time required to capture the evader and the average distance traveled by the pursuers. Additionally, the proposed method is tested in real-world hardware experiments, further validating its effectiveness and applicability.
Ultra-Efficient On-Device Object Detection on AI-Integrated Smart Glasses with TinyissimoYOLO ECCV 2024
Smart glasses are rapidly gaining advanced functions thanks to cutting-edge computing technologies, especially accelerated hardware architectures, and tiny Artificial Intelligence (AI) algorithms. However, integrating AI into smart glasses featuring a small form factor and limited battery capacity remains challenging for a satisfactory user experience. To this end, this paper proposes the design of a smart glasses platform for always-on on-device object detection with an all-day battery lifetime. The proposed platform is based on GAP9, a novel multi-core RISC-V processor from Greenwaves Technologies. Additionally, a family of sub-million parameter TinyissimoYOLO networks is proposed. They are benchmarked on established datasets and can differentiate up to 80 classes on MS-COCO. Evaluations on the smart glasses prototype demonstrate an inference latency of only 17 ms and an energy consumption of 1.59 mJ per inference for TinyissimoYOLO. An end-to-end latency of 56 ms is achieved, which is equivalent to 18 frames per second (FPS), with a total power consumption of 62.9 mW. This ensures continuous system runtime of up to 9.3 hours on a 154 mAh battery. These results outperform MCUNet (TinyNAS+TinyEngine), which runs a simpler task (image classification) at just 7.3 FPS, while the 18 FPS achieved in this paper even includes image capture, network inference, and detection post-processing. The algorithm's code is released as open source with this paper and can be found here: https://github.com/ETH-PBL/TinyissimoYOLO
comment: This paper has been accepted for publication at ECCV 2024 Workshops, Milan, 2024
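The frame-rate and runtime figures in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check: the 3.8 V nominal cell voltage is an assumption (the abstract gives the capacity in mAh but no voltage), and the helper names are made up for this example.

```python
def fps_from_latency(latency_ms):
    """Frames per second for a given end-to-end latency in milliseconds."""
    return 1000.0 / latency_ms

def runtime_hours(capacity_mah, voltage_v, power_mw):
    """Runtime in hours: stored energy (mWh) divided by average power (mW)."""
    return capacity_mah * voltage_v / power_mw

def average_power_mw(energy_mj, duration_ms):
    """Average power in mW for an energy in mJ spent over a duration in ms."""
    return energy_mj / duration_ms * 1000.0

fps = fps_from_latency(56.0)                  # 56 ms end-to-end latency
hours = runtime_hours(154.0, 3.8, 62.9)       # 3.8 V is an assumed cell voltage
inference_power = average_power_mw(1.59, 17)  # power during inference alone
```

Under the assumed voltage, the runtime comes out close to the reported 9.3 hours, and 56 ms per frame rounds to the reported 18 FPS.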
TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
Vision-Language Models (VLMs) have shown remarkable capabilities in spatial reasoning, yet they remain fundamentally limited to qualitative precision and lack the computational precision required for real-world robotics. Current approaches fail to leverage metric cues from depth sensors and camera calibration, instead reducing geometric problems to pattern recognition tasks that cannot deliver the centimeter-level accuracy essential for robotic manipulation. We present TIGeR (Tool-Integrated Geometric Reasoning), a novel framework that transforms VLMs from perceptual estimators to geometric computers by enabling them to generate and execute precise geometric computations through external tools. Rather than attempting to internalize complex geometric operations within neural networks, TIGeR empowers models to recognize geometric reasoning requirements, synthesize appropriate computational code, and invoke specialized libraries for exact calculations. To support this paradigm, we introduce TIGeR-300K, a comprehensive tool-invocation-oriented dataset covering point transformations, pose estimation, and spatial compatibility verification, complete with tool invocation sequences and intermediate computations. Through a two-stage training pipeline combining supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT) with our proposed hierarchical reward design, TIGeR achieves SOTA performance on geometric reasoning benchmarks while demonstrating centimeter-level precision in real-world robotic manipulation tasks.
comment: 9 pages, 6 figures
Product-oriented Product-Process-Resource Asset Network and its Representation in AutomationML for Asset Administration Shell
Current products, especially in the automotive sector, are complex technical systems with a multi-disciplinary mechatronic nature. Industrial standards supporting system engineering and production typically (i) address the production phase only, but do not cover the complete product life cycle, and (ii) focus on production processes and resources rather than the products themselves. The presented approach is motivated by the need to incorporate the impacts of the end-of-life phase of the product life cycle into the engineering phase. This paper proposes a modeling approach derived from the Product-Process-Resource (PPR) modeling paradigm. It combines two requirements: (i) respecting the product structure as a basis for the model, and (ii) incorporating repairing, remanufacturing, or upcycling within cyber-physical production systems. The proposed model, called PoPAN, should accompany the product during the entire life cycle as a digital shadow encapsulated within the Asset Administration Shell of a product. To facilitate the adoption of the proposed paradigm, the paper also proposes serialization of the model in the AutomationML data format. The model is demonstrated on a use case for disassembling electric vehicle batteries to support their remanufacturing for stationary battery applications.
comment: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Hierarchical Reinforcement Learning with Low-Level MPC for Multi-Agent Control
Achieving safe and coordinated behavior in dynamic, constraint-rich environments remains a major challenge for learning-based control. Pure end-to-end learning often suffers from poor sample efficiency and limited reliability, while model-based methods depend on predefined references and struggle to generalize. We propose a hierarchical framework that combines tactical decision-making via reinforcement learning (RL) with low-level execution through Model Predictive Control (MPC). For the case of multi-agent systems this means that high-level policies select abstract targets from structured regions of interest (ROIs), while MPC ensures dynamically feasible and safe motion. Tested on a predator-prey benchmark, our approach outperforms end-to-end and shielding-based RL baselines in terms of reward, safety, and consistency, underscoring the benefits of combining structured learning with model-based control.
OmniNav: A Unified Framework for Prospective Exploration and Visual-Language Navigation
Embodied navigation presents a core challenge for intelligent robots, requiring the comprehension of visual environments, natural language instructions, and autonomous exploration. Existing models often fall short in offering a unified solution across diverse navigation paradigms, resulting in low success rates and limited generalization. We introduce OmniNav, a unified framework addressing instruct-goal, object-goal, point-goal navigation, and frontier-based exploration within a single architecture. Our approach features a lightweight, low-latency policy that accurately predicts continuous-space waypoints (coordinates and orientations). This policy surpasses action-chunk methods in precision and supports real-world deployment at control frequencies up to 5 Hz. Architecturally, OmniNav employs a fast-slow system design: a fast module generates waypoints using short-horizon visual context and subtasks, while a slow module performs deliberative planning with long-horizon observations and candidate frontiers to select subsequent subgoals and subtasks. This collaboration enhances path efficiency and maintains trajectory coherence, particularly in exploration and memory-intensive scenarios. Crucially, we identify that the primary bottleneck isn't merely navigation policy learning, but a robust understanding of general instructions and objects. To boost generalization, OmniNav integrates large-scale, general-purpose training datasets, including those for image captioning and visual recognition, into a joint multi-task regimen. This significantly improves success rates and robustness. Extensive experiments confirm OmniNav's state-of-the-art performance across various navigation benchmarks, with real-world deployment further validating its efficacy. OmniNav provides practical insights for embodied navigation, charting a scalable path towards versatile, highly generalizable robotic intelligence.
Beyond Behavior Cloning: Robustness through Interactive Imitation and Contrastive Learning
Behavior cloning (BC) traditionally relies on demonstration data, assuming the demonstrated actions are optimal. This can lead to overfitting under noisy data, particularly when expressive models are used (e.g., the energy-based model in Implicit BC). To address this, we extend behavior cloning into an iterative process of optimal action estimation within the Interactive Imitation Learning framework. Specifically, we introduce Contrastive policy Learning from Interactive Corrections (CLIC). CLIC leverages human corrections to estimate a set of desired actions and optimizes the policy to select actions from this set. Extensive simulation and real-robot experiments validate CLIC's advantages over existing state-of-the-art methods, including stable training of energy-based models, robustness to feedback noise, and adaptability to diverse feedback types beyond demonstrations. Our implementation is publicly available at https://github.com/clic-webpage/CLIC.
ManipGPT: Is Affordance Segmentation by Large Vision Models Enough for Articulated Object Manipulation?
Visual actionable affordance has emerged as a transformative approach in robotics, focusing on perceiving interaction areas prior to manipulation. Traditional methods rely on pixel sampling to identify successful interaction samples or on processing point clouds for affordance mapping. However, these approaches are computationally intensive and struggle to adapt to diverse and dynamic environments. This paper introduces ManipGPT, a framework designed to predict optimal interaction areas for articulated objects using a large pre-trained vision transformer (ViT). We create a dataset of 9.9k simulated and real images to bridge the visual sim-to-real gap and enhance real-world applicability. By fine-tuning the vision transformer on this small dataset, we significantly improve part-level affordance segmentation, adapting the model's in-context segmentation capabilities to robot manipulation scenarios. This enables effective manipulation across simulated and real-world environments by generating part-level affordance masks, paired with an impedance adaptation policy, which is sufficient to eliminate the need for complex datasets or perception systems.
comment: 8 pages, 6 figures
A Long-Duration Autonomy Approach to Connected and Automated Vehicles
In this article, we present a long-duration autonomy approach for the control of connected and automated vehicles (CAVs) operating in a transportation network. In particular, we focus on the performance of CAVs at traffic bottlenecks, including roundabouts, merging roadways, and intersections. We take a principled approach based on optimal control, and derive a reactive controller with guarantees on safety, performance, and energy efficiency. We guarantee safety through high order control barrier functions (HOCBFs), which we "lift" to first order CBFs using time-optimal motion primitives. This yields a set of first-order CBFs that are compatible with the control bounds. We demonstrate the performance of our approach in simulation and compare it to an optimal control-based approach.
comment: 8 pages, 3 figures
Fast Online Adaptive Neural MPC via Meta-Learning
Data-driven model predictive control (MPC) has demonstrated significant potential for improving robot control performance in the presence of model uncertainties. However, existing approaches often require extensive offline data collection and computationally intensive training, limiting their ability to adapt online. To address these challenges, this paper presents a fast online adaptive MPC framework that leverages neural networks integrated with Model-Agnostic Meta-Learning (MAML). Our approach focuses on few-shot adaptation of residual dynamics - capturing the discrepancy between nominal and true system behavior - using minimal online data and gradient steps. By embedding these meta-learned residual models into a computationally efficient L4CasADi-based MPC pipeline, the proposed method enables rapid model correction, enhances predictive accuracy, and improves real-time control performance. We validate the framework through simulation studies on a Van der Pol oscillator, a Cart-Pole system, and a 2D quadrotor. Results show significant gains in adaptation speed and prediction accuracy over both nominal MPC and nominal MPC augmented with a freshly initialized neural network, underscoring the effectiveness of our approach for real-time adaptive robot control.
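To make the idea of few-shot residual adaptation concrete, the sketch below fits a scalar residual model r(x) = w*x + b to a handful of online samples using a few gradient steps from a given initialization. It shows only the inner adaptation loop; the outer meta-training loop of MAML, the neural network, and the L4CasADi-based MPC are all omitted. The toy dynamics, the initialization values, and every function name are assumptions for illustration.

```python
def mse(params, data):
    """Mean squared error of the residual model r(x) = w*x + b."""
    w, b = params
    return sum((w * x + b - r) ** 2 for x, r in data) / len(data)

def adapt_residual(params, data, lr=0.5, steps=5):
    """Few-shot adaptation: a handful of gradient steps on the MSE."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2.0 * (w * x + b - r) * x for x, r in data) / n
        grad_b = sum(2.0 * (w * x + b - r) for x, r in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy setting (hypothetical): nominal model x_next = x + dt*u,
# true system x_next = x + dt*(u - 0.5*x), so the residual is -0.5*dt*x.
dt = 0.1
samples = [-1.0, -0.5, 0.5, 1.0]
data = [(x, -0.5 * dt * x) for x in samples]

init = (-0.03, 0.0)  # stands in for a meta-learned initialization
adapted = adapt_residual(init, data)

def predict(x, u, params):
    """Nominal one-step prediction corrected by the adapted residual."""
    w, b = params
    return x + dt * u + (w * x + b)
```

A good initialization matters because only a few steps and samples are available online; here the starting point is already near the true residual slope, so five steps are enough to close most of the gap.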
Characterizing and Optimizing Real-Time Optimal Control for Embedded SoCs
Resource-limited robots face significant challenges in executing computationally intensive tasks, such as locomotion and manipulation, particularly for real-time optimal control algorithms like Model Predictive Control (MPC). This paper provides a comprehensive design space exploration to identify optimal hardware computation architectures for these demanding model-based control algorithms. We profile and optimize representative architectural designs, including general-purpose scalar CPUs, vector processors, and specialized accelerators. By characterizing kernel-level benchmarks and end-to-end robotic scenarios, including a hardware-in-the-loop evaluation on a fabricated RISC-V multi-core vector SoC, we present a quantitative comparison of performance, area, and utilization across distinct architectural design points. Our findings demonstrate that targeted architectural modifications, coupled with deep software and system optimizations, enable up to 3.71x speedups for MPC, resulting in up to 27% system-level power reductions while completing robotic tasks. Finally, we propose a code generation flow designed to simplify the complex engineering effort required for mapping robotic workloads onto specialized architectures.
Hybrid Feedback Control for Global Navigation with Locally Optimal Obstacle Avoidance in n-Dimensional Spaces
We present a hybrid feedback control framework for autonomous robot navigation in n-dimensional Euclidean spaces cluttered with spherical obstacles. The proposed approach ensures safe and global navigation towards a target location by dynamically switching between two operational modes: motion-to-destination and locally optimal obstacle-avoidance. It produces continuous velocity inputs, ensures collision-free trajectories and generates locally optimal obstacle avoidance maneuvers. Unlike existing methods, the proposed framework is compatible with range sensors, enabling navigation in both a priori known and unknown environments. Extensive simulations in 2D and 3D settings, complemented by experimental validation on a TurtleBot 4 platform, confirm the efficacy and robustness of the approach. Our results demonstrate shorter paths and smoother trajectories compared to state-of-the-art methods, while maintaining computational efficiency and real-world feasibility.
Safe Autonomous Environmental Contact for Soft Robots using Control Barrier Functions
Robots built from soft materials will inherently apply lower environmental forces than their rigid counterparts, and therefore may be more suitable in sensitive settings with unintended contact. However, these robots' applied forces result from both their design and their control system in closed-loop, and therefore, ensuring bounds on these forces requires controller synthesis for safety as well. This article introduces the first feedback controller for a soft manipulator that formally meets a safety specification with respect to environmental contact. In our proof-of-concept setting, the robot's environment has known geometry and is deformable with a known elastic modulus. Our approach maps a bound on applied forces to a safe set of positions of the robot's tip via predicted deformations of the environment. Then, a quadratic program with Control Barrier Functions in its constraints is used to supervise a nominal feedback signal, verifiably maintaining the robot's tip within this safe set. Hardware experiments on a multi-segment soft pneumatic robot demonstrate that the proposed framework successfully maintains a positive safety margin. This framework represents a fundamental shift in perspective on control and safety for soft robots, implementing a formally verifiable logic specification on their pose and contact forces.
comment: 8 pages, 9 figures
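The idea of supervising a nominal signal with a Control Barrier Function constraint can be shown on a deliberately simple system. The sketch below uses a 1D single integrator (tip position x with velocity input u) and the safe set h(x) = x_max - x >= 0; for this case the quadratic program min (u - u_nom)^2 subject to dh/dt >= -alpha*h has the closed-form solution u = min(u_nom, alpha*h). The dynamics, the gain alpha, and all names are assumptions for illustration and do not reproduce the soft-robot model or the deformation-based safe set described in the abstract above.

```python
def cbf_filter(u_nom, x, x_max, alpha=2.0):
    """Closed-form solution of the 1D CBF quadratic program.

    Dynamics: dx/dt = u. Barrier: h(x) = x_max - x.
    Constraint: dh/dt = -u >= -alpha*h, i.e. u <= alpha*h.
    """
    h = x_max - x
    return min(u_nom, alpha * h)

def simulate(x0=0.0, x_max=1.0, u_nom=1.0, dt=0.01, steps=1000):
    """Euler simulation with a constant nominal command pushing toward x_max."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        u = cbf_filter(u_nom, x, x_max)
        x += dt * u
        trajectory.append(x)
    return trajectory

trajectory = simulate()
margin = 1.0 - max(trajectory)  # remaining distance to the boundary
```

The nominal command is passed through unchanged while the tip is far from the boundary and is scaled back smoothly as h shrinks, so the state approaches x_max without crossing it.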
Through the Perspective of LiDAR: A Feature-Enriched and Uncertainty-Aware Annotation Pipeline for Terrestrial Point Cloud Segmentation
Accurate semantic segmentation of terrestrial laser scanning (TLS) point clouds is limited by costly manual annotation. We propose a semi-automated, uncertainty-aware pipeline that integrates spherical projection, feature enrichment, ensemble learning, and targeted annotation to reduce labeling effort, while sustaining high accuracy. Our approach projects 3D points to a 2D spherical grid, enriches pixels with multi-source features, and trains an ensemble of segmentation networks to produce pseudo-labels and uncertainty maps, the latter guiding annotation of ambiguous regions. The 2D outputs are back-projected to 3D, yielding densely annotated point clouds supported by a three-tier visualization suite (2D feature maps, 3D colorized point clouds, and compact virtual spheres) for rapid triage and reviewer guidance. Using this pipeline, we build Mangrove3D, a semantic segmentation TLS dataset for mangrove forests. We further evaluate data efficiency and feature importance to address two key questions: (1) how much annotated data are needed and (2) which features matter most. Results show that performance saturates after ~12 annotated scans, geometric features contribute the most, and compact nine-channel stacks capture nearly all discriminative power, with the mean Intersection over Union (mIoU) plateauing at around 0.76. Finally, we confirm the generalization of our feature-enrichment strategy through cross-dataset tests on ForestSemantic and Semantic3D. Our contributions include: (i) a robust, uncertainty-aware TLS annotation pipeline with visualization tools; (ii) the Mangrove3D dataset; and (iii) empirical guidance on data efficiency and feature importance, thus enabling scalable, high-quality segmentation of TLS point clouds for ecological monitoring and beyond. The dataset and processing scripts are publicly available at https://fz-rit.github.io/through-the-lidars-eye/.
comment: 40 pages (28 main text), 20 figures, 4 supplementary materials; links to 3D point animations are included in the last table
Real-time Human Finger Pointing Recognition and Estimation for Robot Directives Using a Single Web-Camera
Gestures play a pivotal role in human communication, often serving as a preferred or complementary medium to verbal expression due to their superior spatial reference capabilities. A finger-pointing gesture conveys vital information regarding some point of interest in the environment. In Human-Robot Interaction (HRI), users can easily direct robots to target locations, facilitating tasks in diverse domains such as search and rescue or factory assistance. State-of-the-art approaches for visual pointing estimation often rely on depth cameras, are limited to indoor environments, and provide discrete predictions between limited targets. In this paper, we explore the development of models that enable robots to understand pointing directives from humans using a single web camera, even in diverse indoor and outdoor environments. A novel perception framework is proposed which includes a designated data-based model termed PointingNet. PointingNet recognizes the occurrence of pointing through classification and then approximates the position and direction of the index finger with an advanced regression model. The model relies on a novel segmentation model for masking any lifted arm. While state-of-the-art human pose estimation models yield a poor pointing angle estimation error of 28 deg, PointingNet exhibits a mean error of less than 2 deg. With the pointing information, the target location is computed, followed by robot motion planning and execution. The framework is evaluated on two robotic systems, demonstrating accurate target reaching.
IG-MCTS: Human-in-the-Loop Cooperative Navigation under Incomplete Information
Human-robot cooperative navigation is challenging under incomplete information. We introduce CoNav-Maze, a simulated environment where a robot navigates with local perception while a human operator provides guidance based on an inaccurate map. The robot can share its onboard camera views to help the operator refine their understanding of the environment. To enable efficient cooperation, we propose Information Gain Monte Carlo Tree Search (IG-MCTS), an online planning algorithm that jointly optimizes autonomous movement and informative communication. IG-MCTS leverages a learned Neural Human Perception Model (NHPM) -- trained on a crowdsourced mapping dataset -- to predict how the human's internal map evolves as new observations are shared. User studies show that IG-MCTS significantly reduces communication demands and yields eye-tracking metrics indicative of lower cognitive load, while maintaining task performance comparable to teleoperation and instruction-following baselines. Finally, we illustrate generalization beyond discrete mazes through a continuous-space waterway navigation setting, in which NHPM benefits from deeper encoder-decoder architectures and IG-MCTS leverages a dynamically constructed Voronoi-partitioned traversability graph.
HA-VLN 2.0: An Open Benchmark and Leaderboard for Human-Aware Navigation in Discrete and Continuous Environments with Dynamic Multi-Human Interactions
Vision-and-Language Navigation (VLN) has been studied mainly in either discrete or continuous settings, with little attention to dynamic, crowded environments. We present HA-VLN 2.0, a unified benchmark introducing explicit social-awareness constraints. Our contributions are: (i) a standardized task and metrics capturing both goal accuracy and personal-space adherence; (ii) HAPS 2.0 dataset and simulators modeling multi-human interactions, outdoor contexts, and finer language-motion alignment; (iii) benchmarks on 16,844 socially grounded instructions, revealing sharp performance drops of leading agents under human dynamics and partial observability; and (iv) real-world robot experiments validating sim-to-real transfer, with an open leaderboard enabling transparent comparison. Results show that explicit social modeling improves navigation robustness and reduces collisions, underscoring the necessity of human-centric approaches. By releasing datasets, simulators, baselines, and protocols, HA-VLN 2.0 provides a strong foundation for safe, socially responsible navigation research.
comment: 33 pages, 20 figures, website: https://ha-vln-project.vercel.app/
Artists' Views on Robotics Involvement in Painting Productions
As robotic technologies evolve, their potential in artistic creation becomes an increasingly relevant topic of inquiry. This study explores how professional abstract artists perceive and experience co-creative interactions with an autonomous painting robotic arm. Eight artists engaged in six painting sessions -- three with a human partner, followed by three with the robot -- and subsequently participated in semi-structured interviews analyzed through reflexive thematic analysis. Human-human interactions were described as intuitive, dialogic, and emotionally engaging, whereas human-robot sessions felt more playful and reflective, offering greater autonomy and prompting novel strategies to overcome the system's limitations. This work offers one of the first empirical investigations into artists' lived experiences with a robot, highlighting the value of long-term engagement and a multidisciplinary approach to human-robot co-creation.
comment: 10 pages, 9 figures, submitted to RAM special issue: Arts and Robotics
Robotics
WristWorld: Generating Wrist-Views via 4D World Models for Robotic Manipulation
Wrist-view observations are crucial for VLA models as they capture fine-grained hand-object interactions that directly enhance manipulation performance. Yet large-scale datasets rarely include such recordings, resulting in a substantial gap between abundant anchor views and scarce wrist views. Existing world models cannot bridge this gap, as they require a wrist-view first frame and thus fail to generate wrist-view videos from anchor views alone. Amid this gap, recent visual geometry models such as VGGT emerge with geometric and cross-view priors that make it possible to address extreme viewpoint shifts. Inspired by these insights, we propose WristWorld, the first 4D world model that generates wrist-view videos solely from anchor views. WristWorld operates in two stages: (i) Reconstruction, which extends VGGT and incorporates our Spatial Projection Consistency (SPC) Loss to estimate geometrically consistent wrist-view poses and 4D point clouds; (ii) Generation, which employs our video generation model to synthesize temporally coherent wrist-view videos from the reconstructed perspective. Experiments on Droid, Calvin, and Franka Panda demonstrate state-of-the-art video generation with superior spatial consistency, while also improving VLA performance, raising the average task completion length on Calvin by 3.81% and closing 42.4% of the anchor-wrist view gap.
HyPlan: Hybrid Learning-Assisted Planning Under Uncertainty for Safe Autonomous Driving
We present a novel hybrid learning-assisted planning method, named HyPlan, for solving the collision-free navigation problem for self-driving cars in partially observable traffic environments. HyPlan combines methods for multi-agent behavior prediction, deep reinforcement learning with proximal policy optimization, and approximated online POMDP planning with heuristic confidence-based vertical pruning to reduce its execution time without compromising driving safety. Our experimental performance analysis on the CARLA-CTS2 benchmark of critical traffic scenarios with pedestrians revealed that HyPlan may navigate more safely than selected relevant baselines and perform significantly faster than the considered alternative online POMDP planners.
COMPAct: Computational Optimization and Automated Modular design of Planetary Actuators
The optimal design of robotic actuators is a critical area of research, yet limited attention has been given to optimizing gearbox parameters and automating actuator CAD. This paper introduces COMPAct: Computational Optimization and Automated Modular Design of Planetary Actuators, a framework that systematically identifies optimal gearbox parameters for a given motor across four gearbox types: single-stage planetary gearbox (SSPG), compound planetary gearbox (CPG), Wolfrom planetary gearbox (WPG), and double-stage planetary gearbox (DSPG). The framework minimizes mass and actuator width while maximizing efficiency, and further automates actuator CAD generation to enable direct 3D printing without manual redesign. Using this framework, optimal gearbox designs are explored over a wide range of gear ratios, providing insights into the suitability of different gearbox types across various gear ratio ranges. In addition, the framework is used to generate CAD models of all four gearbox types with varying gear ratios and motors. Two actuator types are fabricated and experimentally evaluated through power efficiency, no-load backlash, and transmission stiffness tests. Experimental results indicate that the SSPG actuator achieves a mechanical efficiency of 60-80%, a no-load backlash of 0.59 deg, and a transmission stiffness of 242.7 Nm/rad, while the CPG actuator demonstrates 60% efficiency, 2.6 deg backlash, and a stiffness of 201.6 Nm/rad. Code available at: https://anonymous.4open.science/r/COMPAct-SubNum-3408 Video: https://youtu.be/99zOKgxsDho
comment: 8 pages, 9 Figures, 2 tables, first two authors contributed equally
TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
Vision-Language Models (VLMs) have shown remarkable capabilities in spatial reasoning, yet they remain fundamentally limited to qualitative precision and lack the computational precision required for real-world robotics. Current approaches fail to leverage metric cues from depth sensors and camera calibration, instead reducing geometric problems to pattern recognition tasks that cannot deliver the centimeter-level accuracy essential for robotic manipulation. We present TIGeR (Tool-Integrated Geometric Reasoning), a novel framework that transforms VLMs from perceptual estimators to geometric computers by enabling them to generate and execute precise geometric computations through external tools. Rather than attempting to internalize complex geometric operations within neural networks, TIGeR empowers models to recognize geometric reasoning requirements, synthesize appropriate computational code, and invoke specialized libraries for exact calculations. To support this paradigm, we introduce TIGeR-300K, a comprehensive tool-invocation-oriented dataset covering point transformations, pose estimation, trajectory generation, and spatial compatibility verification, complete with tool invocation sequences and intermediate computations. Through a two-stage training pipeline combining supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT) with our proposed hierarchical reward design, TIGeR achieves SOTA performance on geometric reasoning benchmarks while demonstrating centimeter-level precision in real-world robotic manipulation tasks.
comment: 9 pages, 6 figures
A Narwhal-Inspired Sensing-to-Control Framework for Small Fixed-Wing Aircraft
Fixed-wing unmanned aerial vehicles (UAVs) offer endurance and efficiency but lack low-speed agility due to highly coupled dynamics. We present an end-to-end sensing-to-control pipeline that combines bio-inspired hardware, physics-informed dynamics learning, and convex control allocation. Measuring airflow on a small airframe is difficult because near-body aerodynamics, propeller slipstream, control-surface actuation, and ambient gusts distort pressure signals. Inspired by the narwhal's protruding tusk, we mount in-house multi-hole probes far upstream and complement them with sparse, carefully placed wing pressure sensors for local flow measurement. A data-driven calibration maps probe pressures to airspeed and flow angles. We then learn a control-affine dynamics model using the estimated airspeed/angles and sparse sensors. A soft left/right symmetry regularizer improves identifiability under partial observability and limits confounding between wing pressures and flaperon inputs. Desired wrenches (forces and moments) are realized by a regularized least-squares allocator that yields smooth, trimmed actuation. Wind-tunnel studies across a wide operating range show that adding wing pressures reduces force-estimation error by 25-30%, the proposed model degrades less under distribution shift (about 12% versus 44% for an unstructured baseline), and force tracking improves with smoother inputs, including a 27% reduction in normal-force RMSE versus a plain affine model and 34% versus an unstructured baseline.
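A regularized least-squares allocator of the kind mentioned above can be sketched as a Tikhonov-regularized solve: given an effectiveness matrix B that maps actuator commands u to a wrench, choose u minimizing ||B u - w||^2 + lam*||u||^2, which gives u = (B^T B + lam*I)^(-1) B^T w. The matrix below (two flaperons producing a lift-like and a roll-like component) and the value of lam are hypothetical; the trim and smoothness terms the abstract alludes to are not modeled.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def allocate(B, wrench, lam=1e-2):
    """Return u minimizing ||B u - wrench||^2 + lam * ||u||^2."""
    rows, cols = len(B), len(B[0])
    gram = [[sum(B[k][i] * B[k][j] for k in range(rows)) + (lam if i == j else 0.0)
             for j in range(cols)] for i in range(cols)]
    rhs = [sum(B[k][i] * wrench[k] for k in range(rows)) for i in range(cols)]
    return solve_linear(gram, rhs)

# Hypothetical effectiveness matrix: row 0 is lift-like (symmetric deflection),
# row 1 is roll-like (differential deflection).
B = [[1.0, 1.0],
     [1.0, -1.0]]
u = allocate(B, [2.0, 0.0], lam=0.01)
u_exact = allocate(B, [2.0, 0.0], lam=0.0)
```

The regularization weight trades tracking accuracy for smaller commands: with lam = 0 the wrench is met exactly, while a positive lam shrinks both deflections slightly.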
DPL: Depth-only Perceptive Humanoid Locomotion via Realistic Depth Synthesis and Cross-Attention Terrain Reconstruction
Recent advancements in legged robot perceptive locomotion have shown promising progress. However, terrain-aware humanoid locomotion remains largely constrained to two paradigms: depth image-based end-to-end learning and elevation map-based methods. The former suffers from limited training efficiency and a significant sim-to-real gap in depth perception, while the latter depends heavily on multiple vision sensors and localization systems, resulting in latency and reduced robustness. To overcome these challenges, we propose a novel framework that tightly integrates three key components: (1) Terrain-Aware Locomotion Policy with a Blind Backbone, which leverages pre-trained elevation map-based perception to guide reinforcement learning with minimal visual input; (2) Multi-Modality Cross-Attention Transformer, which reconstructs structured terrain representations from noisy depth images; (3) Realistic Depth Images Synthetic Method, which employs self-occlusion-aware ray casting and noise-aware modeling to synthesize realistic depth observations, achieving over 30% reduction in terrain reconstruction error. This combination enables efficient policy training with limited data and hardware resources, while preserving critical terrain features essential for generalization. We validate our framework on a full-sized humanoid robot, demonstrating agile and adaptive locomotion across diverse and challenging terrains.
ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL
Real-world robotic agents must act under partial observability and long horizons, where key cues may appear long before they affect decision making. However, most modern approaches rely solely on instantaneous information, without incorporating insights from the past. Standard recurrent or transformer models struggle with retaining and leveraging long-term dependencies: context windows truncate history, while naive memory extensions fail under scale and sparsity. We propose ELMUR (External Layer Memory with Update/Rewrite), a transformer architecture with structured external memory. Each layer maintains memory embeddings, interacts with them via bidirectional cross-attention, and updates them through a Least Recently Used (LRU) memory module using replacement or convex blending. ELMUR extends effective horizons up to 100,000 times beyond the attention window and achieves a 100% success rate on a synthetic T-Maze task with corridors up to one million steps. In POPGym, it outperforms baselines on more than half of the tasks. On MIKASA-Robo sparse-reward manipulation tasks with visual observations, it nearly doubles the performance of strong baselines. These results demonstrate that structured, layer-local external memory offers a simple and scalable approach to decision making under partial observability.
comment: 22 pages, 7 figures
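The ELMUR abstract above mentions an LRU memory module that updates slots "using replacement or convex blending". The following is only a minimal sketch of that idea under stated assumptions, not the authors' implementation: the class name `LRUMemory`, the `blend` weight, and the rule "fill empty slots first, otherwise blend into the least recently used slot" are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LRUMemory:
    """Fixed number of memory slots, each holding one embedding vector."""

    num_slots: int
    blend: float = 0.5  # hypothetical convex-blending weight in [0, 1]
    slots: List[Optional[List[float]]] = field(default_factory=list)
    last_used: List[int] = field(default_factory=list)
    clock: int = 0

    def __post_init__(self) -> None:
        # All slots start empty; -1 marks "never written".
        self.slots = [None] * self.num_slots
        self.last_used = [-1] * self.num_slots

    def write(self, embedding: List[float]) -> int:
        """Write an embedding and return the index of the slot that changed."""
        self.clock += 1
        if None in self.slots:
            # Replacement: an empty slot is simply filled.
            idx = self.slots.index(None)
            self.slots[idx] = list(embedding)
        else:
            # Convex blending into the least recently used slot (assumed rule).
            idx = min(range(self.num_slots), key=lambda i: self.last_used[i])
            old = self.slots[idx]
            self.slots[idx] = [
                self.blend * new + (1.0 - self.blend) * prev
                for new, prev in zip(embedding, old)
            ]
        self.last_used[idx] = self.clock
        return idx
```

With `blend=1.0` the second branch degenerates to plain overwriting, so a single weight covers both update styles in this sketch.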
TrackVLA++: Unleashing Reasoning and Memory Capabilities in VLA Models for Embodied Visual Tracking
Embodied Visual Tracking (EVT) is a fundamental ability that underpins practical applications, such as companion robots, guidance robots and service assistants, where continuously following moving targets is essential. Recent advances have enabled language-guided tracking in complex and unstructured scenes. However, existing approaches lack explicit spatial reasoning and effective temporal memory, causing failures under severe occlusions or in the presence of similar-looking distractors. To address these challenges, we present TrackVLA++, a novel Vision-Language-Action (VLA) model that enhances embodied visual tracking with two key modules, a spatial reasoning mechanism and a Target Identification Memory (TIM). The reasoning module introduces a Chain-of-Thought paradigm, termed Polar-CoT, which infers the target's relative position and encodes it as a compact polar-coordinate token for action prediction. Guided by these spatial priors, the TIM employs a gated update strategy to preserve long-horizon target memory, ensuring spatiotemporal consistency and mitigating target loss during extended occlusions. Extensive experiments show that TrackVLA++ achieves state-of-the-art performance on public benchmarks across both egocentric and multi-camera settings. On the challenging EVT-Bench DT split, TrackVLA++ surpasses the previous leading approach by 5.1 and 12, respectively. Furthermore, TrackVLA++ exhibits strong zero-shot generalization, enabling robust real-world tracking in dynamic and occluded scenarios.
comment: Project page: https://pku-epic.github.io/TrackVLA-plus-plus-Web/
A Digital Twin Framework for Metamorphic Testing of Autonomous Driving Systems Using Generative Model
Ensuring the safety of self-driving cars remains a major challenge due to the complexity and unpredictability of real-world driving environments. Traditional testing methods face significant limitations, such as the oracle problem, which makes it difficult to determine whether a system's behavior is correct, and the inability to cover the full range of scenarios an autonomous vehicle may encounter. In this paper, we introduce a digital twin-driven metamorphic testing framework that addresses these challenges by creating a virtual replica of the self-driving system and its operating environment. By combining digital twin technology with AI-based image generative models such as Stable Diffusion, our approach enables the systematic generation of realistic and diverse driving scenes. This includes variations in weather, road topology, and environmental features, all while maintaining the core semantics of the original scenario. The digital twin provides a synchronized simulation environment where changes can be tested in a controlled and repeatable manner. Within this environment, we define three metamorphic relations inspired by real-world traffic rules and vehicle behavior. We validate our framework in the Udacity self-driving simulator and demonstrate that it significantly enhances test coverage and effectiveness. Our method achieves the highest true positive rate (0.719), F1 score (0.689), and precision (0.662) compared to baseline approaches. This paper highlights the value of integrating digital twins with AI-powered scenario generation to create a scalable, automated, and high-fidelity testing solution for autonomous vehicle safety.
Sampling Strategies for Robust Universal Quadrupedal Locomotion Policies
This work focuses on sampling strategies of configuration variations for generating robust universal locomotion policies for quadrupedal robots. We investigate the effects of sampling physical robot parameters and joint proportional-derivative gains to enable training a single reinforcement learning policy that generalizes to multiple parameter configurations. Three fundamental joint gain sampling strategies are compared: parameter sampling with (1) linear and polynomial function mappings of mass-to-gains, (2) performance-based adaptive filtering, and (3) uniform random sampling. We improve the robustness of the policy by biasing the configurations using nominal priors and reference models. All training was conducted on RaiSim, tested in simulation on a range of diverse quadrupeds, and zero-shot deployed onto hardware using the ANYmal quadruped robot. Compared to multiple baseline implementations, our results demonstrate the need for significant joint controller gains randomization for robust closing of the sim-to-real gap.
Generative World Modelling for Humanoids: 1X World Model Challenge Technical Report
World models are a powerful paradigm in AI and robotics, enabling agents to reason about the future by predicting visual observations or compact latent states. The 1X World Model Challenge introduces an open-source benchmark of real-world humanoid interaction, with two complementary tracks: sampling, focused on forecasting future image frames, and compression, focused on predicting future discrete latent codes. For the sampling track, we adapt the video generation foundation model Wan-2.2 TI2V-5B to video-state-conditioned future frame prediction. We condition the video generation on robot states using AdaLN-Zero, and further post-train the model using LoRA. For the compression track, we train a Spatio-Temporal Transformer model from scratch. Our models achieve 23.0 dB PSNR in the sampling task and a Top-500 CE of 6.6386 in the compression task, securing 1st place in both challenges.
comment: 6 pages, 3 figures, 1X world model challenge technical report
Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications
Amid growing efforts to leverage advances in large language models (LLMs) and vision-language models (VLMs) for robotics, Vision-Language-Action (VLA) models have recently gained significant attention. By unifying vision, language, and action data at scale, which have traditionally been studied separately, VLA models aim to learn policies that generalise across diverse tasks, objects, embodiments, and environments. This generalisation capability is expected to enable robots to solve novel downstream tasks with minimal or no additional task-specific data, facilitating more flexible and scalable real-world deployment. Unlike previous surveys that focus narrowly on action representations or high-level model architectures, this work offers a comprehensive, full-stack review, integrating both software and hardware components of VLA systems. In particular, this paper provides a systematic review of VLAs, covering their strategy and architectural transition, architectures and building blocks, modality-specific processing techniques, and learning paradigms. In addition, to support the deployment of VLAs in real-world robotic applications, we also review commonly used robot platforms, data collection strategies, publicly available datasets, data augmentation methods, and evaluation benchmarks. Throughout this comprehensive survey, this paper aims to offer practical guidance for the robotics community in applying VLAs to real-world robotic systems. All references categorized by training approach, evaluation method, modality, and dataset are available in the table on our project website: https://vla-survey.github.io .
comment: Accepted to IEEE Access, website: https://vla-survey.github.io
Bring the Apple, Not the Sofa: Impact of Irrelevant Context in Embodied AI Commands on VLA Models
Vision Language Action (VLA) models are widely used in Embodied AI, enabling robots to interpret and execute language instructions. However, their robustness to natural language variability in real-world scenarios has not been thoroughly investigated. In this work, we present a novel systematic study of the robustness of state-of-the-art VLA models under linguistic perturbations. Specifically, we evaluate model performance under two types of instruction noise: (1) human-generated paraphrasing and (2) the addition of irrelevant context. We further categorize irrelevant contexts into two groups according to their length and their semantic and lexical proximity to robot commands. In this study, we observe consistent performance degradation as context size expands. We also demonstrate that the model can exhibit relative robustness to random context, with a performance drop within 10%, while semantically and lexically similar context of the same length can trigger a quality decline of around 50%. Human paraphrases of instructions lead to a drop of nearly 20%. To mitigate this, we propose an LLM-based filtering framework that extracts core commands from noisy inputs. Incorporating our filtering step allows models to recover up to 98.5% of their original performance under noisy conditions.
Artists' Views on Robotics Involvement in Painting Productions
As robotic technologies evolve, their potential in artistic creation becomes an increasingly relevant topic of inquiry. This study explores how professional abstract artists perceive and experience co-creative interactions with an autonomous painting robotic arm. Eight artists engaged in six painting sessions -- three with a human partner, followed by three with the robot -- and subsequently participated in semi-structured interviews analyzed through reflexive thematic analysis. Human-human interactions were described as intuitive, dialogic, and emotionally engaging, whereas human-robot sessions felt more playful and reflective, offering greater autonomy and prompting novel strategies to overcome the system's limitations. This work offers one of the first empirical investigations into artists' lived experiences with a robot, highlighting the value of long-term engagement and a multidisciplinary approach to human-robot co-creation.
comment: 10 pages, 9 figures, submitted to RAM special issue: Arts and Robotics
Introspection in Learned Semantic Scene Graph Localisation IROS 2025
This work investigates how semantics influence localisation performance and robustness in a learned self-supervised, contrastive semantic localisation framework. After training a localisation network on both original and perturbed maps, we conduct a thorough post-hoc introspection analysis to probe whether the model filters environmental noise and prioritises distinctive landmarks over routine clutter. We validate various interpretability methods and present a comparative reliability analysis. Integrated gradients and attention weights consistently emerge as the most reliable probes of learned behaviour. A semantic class ablation further reveals an implicit weighting in which frequent objects are often down-weighted. Overall, the results indicate that the model learns noise-robust, semantically salient relations about place definition, thereby enabling explainable registration under challenging visual and structural variations.
comment: IEEE IROS 2025 Workshop FAST
Diffusing Trajectory Optimization Problems for Recovery During Multi-Finger Manipulation
Multi-fingered hands are emerging as powerful platforms for performing fine manipulation tasks, including tool use. However, environmental perturbations or execution errors can impede task performance, motivating the use of recovery behaviors that enable normal task execution to resume. In this work, we take advantage of recent advances in diffusion models to construct a framework that autonomously identifies when recovery is necessary and optimizes contact-rich trajectories to recover. We use a diffusion model trained on the task to estimate when states are not conducive to task execution, framed as an out-of-distribution detection problem. We then use diffusion sampling to project these states in-distribution and use trajectory optimization to plan contact-rich recovery trajectories. We also propose a novel diffusion-based approach that distills this process to efficiently diffuse the full parameterization, including constraints, goal state, and initialization, of the recovery trajectory optimization problem, saving time during online execution. We compare our method to a reinforcement learning baseline and other methods that do not explicitly plan contact interactions, including on a hardware screwdriver-turning task where we show that recovering using our method improves task performance by 96% and that ours is the only method evaluated that can attempt recovery without causing catastrophic task failure. Videos can be found at https://dtourrecovery.github.io/.
Temporal-Prior-Guided View Planning for Periodic 3D Plant Reconstruction IROS 2025
Periodic 3D reconstruction is essential for crop monitoring, but costly when each cycle restarts from scratch, wasting resources and ignoring information from previous captures. We propose temporal-prior-guided view planning for periodic plant reconstruction, in which a previously reconstructed model of the same plant is non-rigidly aligned to a new partial observation to form an approximation of the current geometry. To accommodate plant growth, we inflate this approximation and solve a set covering optimization problem to compute a minimal set of views. We integrated this method into a complete pipeline that acquires one additional next-best view before registration for robustness and then plans a globally shortest path to connect the planned set of views and outputs the best view sequence. Experiments on maize and tomato under hemisphere and sphere view spaces show that our system maintains or improves surface coverage while requiring fewer views and comparable movement cost compared to state-of-the-art baselines.
comment: Accepted to the Active Perception Workshop at IROS 2025
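The view-planning abstract above formulates view selection as a set covering problem. As a rough illustration only, the sketch below uses the classic greedy heuristic, which approximates but does not guarantee the minimal set the abstract refers to. The function name `greedy_view_cover` and the representation of visibility as a mapping from view names to sets of surface-patch indices are assumptions, not details from the paper.

```python
from typing import Dict, List, Set


def greedy_view_cover(
    view_to_patches: Dict[str, Set[int]], targets: Set[int]
) -> List[str]:
    """Pick views until every target patch is covered (greedy approximation).

    view_to_patches maps a candidate view to the surface patches it sees.
    """
    uncovered = set(targets)
    chosen: List[str] = []
    while uncovered and view_to_patches:
        # Take the view that sees the most still-uncovered patches.
        best = max(view_to_patches, key=lambda v: len(view_to_patches[v] & uncovered))
        gain = view_to_patches[best] & uncovered
        if not gain:
            break  # remaining patches are not visible from any candidate view
        chosen.append(best)
        uncovered -= gain
    return chosen
```

An exact solver (for example, integer programming) would be needed to obtain a provably minimal view set; the greedy rule is shown here because it is short and easy to follow.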
Tailoring materials into kirigami robots
Kirigami, the traditional paper-cutting craft, holds immense potential for revolutionizing robotics by providing multifunctional, lightweight, and adaptable solutions. Kirigami structures, characterized by their bending-dominated deformation, offer resilience to tensile forces and facilitate shape morphing under small actuation forces. Kirigami components such as actuators, sensors, batteries, controllers, and body structures can be tailored to specific robotic applications by optimizing cut patterns. Actuators based on kirigami principles exhibit complex motions programmable through various energy sources, while kirigami sensors bridge the gap between electrical conductivity and compliance. Kirigami-integrated batteries enable energy storage directly within robot structures, enhancing flexibility and compactness. Kirigami-controlled mechanisms mimic mechanical computations, enabling advanced functionalities such as shape morphing and memory functions. Applications of kirigami-enabled robots include grasping, locomotion, and wearables, showcasing their adaptability to diverse environments and tasks. Despite promising opportunities, challenges remain in the design of cut patterns for a given function and streamlining fabrication techniques.
DecompGAIL: Learning Realistic Traffic Behaviors with Decomposed Multi-Agent Generative Adversarial Imitation Learning
Realistic traffic simulation is critical for the development of autonomous driving systems and urban mobility planning, yet existing imitation learning approaches often fail to model realistic traffic behaviors. Behavior cloning suffers from covariate shift, while Generative Adversarial Imitation Learning (GAIL) is notoriously unstable in multi-agent settings. We identify a key source of this instability: irrelevant interaction misguidance, where a discriminator penalizes an ego vehicle's realistic behavior due to unrealistic interactions among its neighbors. To address this, we propose Decomposed Multi-agent GAIL (DecompGAIL), which explicitly decomposes realism into ego-map and ego-neighbor components, filtering out misleading neighbor-neighbor and neighbor-map interactions. We further introduce a social PPO objective that augments ego rewards with distance-weighted neighborhood rewards, encouraging overall realism across agents. Integrated into a lightweight SMART-based backbone, DecompGAIL achieves state-of-the-art performance on the WOMD Sim Agents 2025 benchmark.
HARP-NeXt: High-Speed and Accurate Range-Point Fusion Network for 3D LiDAR Semantic Segmentation IROS 2025
LiDAR semantic segmentation is crucial for autonomous vehicles and mobile robots, requiring high accuracy and real-time processing, especially on resource-constrained embedded systems. Previous state-of-the-art methods often face a trade-off between accuracy and speed. Point-based and sparse convolution-based methods are accurate but slow due to the complexity of neighbor searching and 3D convolutions. Projection-based methods are faster but lose critical geometric information during the 2D projection. Additionally, many recent methods rely on test-time augmentation (TTA) to improve performance, which further slows the inference. Moreover, the pre-processing phase across all methods increases execution time and is demanding on embedded platforms. Therefore, we introduce HARP-NeXt, a high-speed and accurate LiDAR semantic segmentation network. We first propose a novel pre-processing methodology that significantly reduces computational overhead. Then, we design the Conv-SE-NeXt feature extraction block to efficiently capture representations without deep layer stacking per network stage. We also employ a multi-scale range-point fusion backbone that leverages information at multiple abstraction levels to preserve essential geometric details, thereby enhancing accuracy. Experiments on the nuScenes and SemanticKITTI benchmarks show that HARP-NeXt achieves a superior speed-accuracy trade-off compared to all state-of-the-art methods, and, without relying on ensemble models or TTA, is comparable to the top-ranked PTv3, while running 24× faster. The code is available at https://github.com/SamirAbouHaidar/HARP-NeXt
comment: Accepted at IROS 2025 (IEEE/RSJ International Conference on Intelligent Robots and Systems)
Distributed 3D Source Seeking via SO(3) Geometric Control of Robot Swarms
This paper presents a geometric control framework on the Lie group SO(3) for 3D source-seeking by robots with first-order attitude dynamics and constant translational speed. By working directly on SO(3), the approach avoids Euler-angle singularities and quaternion ambiguities, providing a unique, intrinsic representation of orientation. We design a proportional feed-forward controller that ensures exponential alignment of each agent to an estimated ascending direction toward a 3D scalar field source. The controller adapts to bounded unknown variations and preserves well-posed swarm formations. Numerical simulations demonstrate the effectiveness of the method, with all code provided open source for reproducibility.
comment: 7 pages, 3 figures. Submitted for presentation at the IFAC World Congress 2026
UniFField: A Generalizable Unified Neural Feature Field for Visual, Semantic, and Spatial Uncertainties in Any Scene
Comprehensive visual, geometric, and semantic understanding of a 3D scene is crucial for successful execution of robotic tasks, especially in unstructured and complex environments. Additionally, to make robust decisions, it is necessary for the robot to evaluate the reliability of perceived information. While recent advances in 3D neural feature fields have enabled robots to leverage features from pretrained foundation models for tasks such as language-guided manipulation and navigation, existing methods suffer from two critical limitations: (i) they are typically scene-specific, and (ii) they lack the ability to model uncertainty in their predictions. We present UniFField, a unified uncertainty-aware neural feature field that combines visual, semantic, and geometric features in a single generalizable representation while also predicting uncertainty in each modality. Our approach, which can be applied zero shot to any new environment, incrementally integrates RGB-D images into our voxel-based feature representation as the robot explores the scene, simultaneously updating uncertainty estimation. We evaluate our uncertainty estimations to accurately describe the model prediction errors in scene reconstruction and semantic feature prediction. Furthermore, we successfully leverage our feature predictions and their respective uncertainty for an active object search task using a mobile manipulator robot, demonstrating the capability for robust decision-making.
comment: Project website: https://sites.google.com/view/uniffield
SanDRA: Safe Large-Language-Model-Based Decision Making for Automated Vehicles Using Reachability Analysis
Large language models have been widely applied to knowledge-driven decision-making for automated vehicles due to their strong generalization and reasoning capabilities. However, the safety of the resulting decisions cannot be ensured due to possible hallucinations and the lack of integrated vehicle dynamics. To address this issue, we propose SanDRA, the first safe large-language-model-based decision making framework for automated vehicles using reachability analysis. Our approach starts with a comprehensive description of the driving scenario to prompt large language models to generate and rank feasible driving actions. These actions are translated into temporal logic formulas that incorporate formalized traffic rules, and are subsequently integrated into reachability analysis to eliminate unsafe actions. We validate our approach in both open-loop and closed-loop driving environments using off-the-shelf and finetuned large language models, showing that it can provide provably safe and, where possible, legally compliant driving actions, even under high-density traffic conditions. To ensure transparency and facilitate future research, all code and experimental setups are publicly available at github.com/CommonRoad/SanDRA.
comment: © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training
Recent progress in vision and language foundation models has significantly advanced multimodal understanding, reasoning, and generation, inspiring a surge of interest in extending such capabilities to embodied settings through vision-language-action (VLA) models. Yet, most VLA models are still trained with supervised fine-tuning (SFT), which struggles to generalize under distribution shifts due to error accumulation. Reinforcement learning (RL) offers a promising alternative by directly optimizing task performance through interaction, but existing attempts remain fragmented and lack a unified platform for fair and systematic comparison across model architectures and algorithmic designs. To address this gap, we introduce RLinf-VLA, a unified and efficient framework for scalable RL training of VLA models. The system adopts a highly flexible resource allocation design that addresses the challenge of integrating rendering, training, and inference in RL+VLA training. In particular, for GPU-parallelized simulators, RLinf-VLA implements a novel hybrid fine-grained pipeline allocation mode, achieving a 1.61x-1.88x speedup in training. Through a unified interface, RLinf-VLA seamlessly supports diverse VLA architectures (e.g., OpenVLA, OpenVLA-OFT), multiple RL algorithms (e.g., PPO, GRPO), and various simulators (e.g., ManiSkill, LIBERO). In simulation, a unified model achieves 98.11% across 130 LIBERO tasks and 97.66% across 25 ManiSkill tasks. Beyond empirical performance, our study distills a set of best practices for applying RL to VLA training and sheds light on emerging patterns in this integration. Furthermore, we present preliminary deployment on a real-world Franka robot, where RL-trained policies exhibit stronger generalization than those trained with SFT. We envision RLinf-VLA as a foundation to accelerate and standardize research on embodied intelligence.
comment: This is the technical report of the RLinf Team, focusing on the algorithm side. For the system-level design, please refer to arXiv:2509.15965. The open-sourced code link: https://github.com/RLinf/RLinf
Assist-As-Needed: Adaptive Multimodal Robotic Assistance for Medication Management in Dementia Care
People living with dementia (PLWDs) face progressively declining abilities in medication management, from simple forgetfulness to complete task breakdown, yet most assistive technologies fail to adapt to these changing needs. This one-size-fits-all approach undermines autonomy, accelerates dependence, and increases caregiver burden. Occupational therapy principles emphasize matching assistance levels to individual capabilities: minimal reminders for those who merely forget, spatial guidance for those who misplace items, and comprehensive multimodal support for those requiring step-by-step instruction. However, existing robotic systems lack this adaptive, graduated response framework essential for maintaining PLWD independence. We present an adaptive multimodal robotic framework using the Pepper robot that dynamically adjusts assistance based on real-time assessment of user needs. Our system implements a hierarchical intervention model progressing from (1) simple verbal reminders, to (2) verbal + gestural cues, to (3) full multimodal guidance combining physical navigation to medication locations with step-by-step verbal and gestural instructions. Powered by LLM-driven interaction strategies and multimodal sensing, the system continuously evaluates task states to provide just-enough assistance, preserving autonomy while ensuring medication adherence. We conducted a preliminary study with healthy adults and dementia care stakeholders in a controlled lab setting, evaluating the system's usability, comprehensibility, and appropriateness of adaptive feedback mechanisms. This work contributes: (1) a theoretically grounded adaptive assistance framework translating occupational therapy principles into HRI design, (2) a multimodal robotic implementation that preserves PLWD dignity through graduated support, and (3) empirical insights into stakeholder perceptions of adaptive robotic care.
Through the Perspective of LiDAR: A Feature-Enriched and Uncertainty-Aware Annotation Pipeline for Terrestrial Point Cloud Segmentation
Accurate semantic segmentation of terrestrial laser scanning (TLS) point clouds is limited by costly manual annotation. We propose a semi-automated, uncertainty-aware pipeline that integrates spherical projection, feature enrichment, ensemble learning, and targeted annotation to reduce labeling effort, while sustaining high accuracy. Our approach projects 3D points to a 2D spherical grid, enriches pixels with multi-source features, and trains an ensemble of segmentation networks to produce pseudo-labels and uncertainty maps, the latter guiding annotation of ambiguous regions. The 2D outputs are back-projected to 3D, yielding densely annotated point clouds supported by a three-tier visualization suite (2D feature maps, 3D colorized point clouds, and compact virtual spheres) for rapid triage and reviewer guidance. Using this pipeline, we build Mangrove3D, a semantic segmentation TLS dataset for mangrove forests. We further evaluate data efficiency and feature importance to address two key questions: (1) how much annotated data are needed and (2) which features matter most. Results show that performance saturates after ~12 annotated scans, geometric features contribute the most, and compact nine-channel stacks capture nearly all discriminative power, with the mean Intersection over Union (mIoU) plateauing at around 0.76. Finally, we confirm the generalization of our feature-enrichment strategy through cross-dataset tests on ForestSemantic and Semantic3D. Our contributions include: (i) a robust, uncertainty-aware TLS annotation pipeline with visualization tools; (ii) the Mangrove3D dataset; and (iii) empirical guidance on data efficiency and feature importance, thus enabling scalable, high-quality segmentation of TLS point clouds for ecological monitoring and beyond. The dataset and processing scripts are publicly available at https://fz-rit.github.io/through-the-lidars-eye/.
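The TLS annotation abstract above projects 3D points onto a 2D spherical grid. A minimal sketch of such a projection for a single point follows, assuming an equirectangular grid that spans the full 360° of azimuth and 180° of elevation with the scanner at the origin. The function name `spherical_pixel` and the default grid size are hypothetical, and real scanners usually cover a narrower vertical field of view than assumed here.

```python
import math
from typing import Tuple


def spherical_pixel(
    x: float, y: float, z: float, width: int = 2048, height: int = 1024
) -> Tuple[int, int, float]:
    """Map a scanner-centred 3D point to (row, col, range) on a spherical grid."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("the scanner origin has no defined viewing direction")
    azimuth = math.atan2(y, x)    # in [-pi, pi]
    elevation = math.asin(z / r)  # in [-pi/2, pi/2]
    # Columns sweep azimuth left to right; rows run from zenith (0) to nadir.
    col = int((azimuth + math.pi) / (2.0 * math.pi) * width) % width
    row = min(int((math.pi / 2.0 - elevation) / math.pi * height), height - 1)
    return row, col, r
```

Storing the range `r` per pixel is what allows 2D outputs such as pseudo-labels to be back-projected to the original 3D points.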
Safe Obstacle-Free Guidance of Space Manipulators in Debris Removal Missions via Deep Reinforcement Learning
The objective of this study is to develop a model-free workspace trajectory planner for space manipulators using a Twin Delayed Deep Deterministic Policy Gradient (TD3) agent to enable safe and reliable debris capture. A local control strategy with singularity avoidance and manipulability enhancement is employed to ensure stable execution. The manipulator must simultaneously track a capture point on a non-cooperative target, avoid self-collisions, and prevent unintended contact with the target. To address these challenges, we propose a curriculum-based multi-critic network where one critic emphasizes accurate tracking and the other enforces collision avoidance. A prioritized experience replay buffer is also used to accelerate convergence and improve policy robustness. The framework is evaluated on a simulated seven-degree-of-freedom KUKA LBR iiwa mounted on a free-floating base in Matlab/Simulink, demonstrating safe and adaptive trajectory generation for debris removal missions.
RAISE: A self-driving laboratory for interfacial property formulation discovery
Surface wettability is a critical design parameter for biomedical devices, coatings, and textiles. Contact angle measurements quantify liquid-surface interactions, which depend strongly on liquid formulation. Herein, we present the Robotic Autonomous Imaging Surface Evaluator (RAISE), a closed-loop, self-driving laboratory that is capable of linking liquid formulation optimization with surface wettability assessment. RAISE comprises a full experimental orchestrator capable of mixing liquid ingredients to create varying formulation cocktails, transferring droplets of prepared formulations to a high-throughput stage, and using a pick-and-place camera tool for automated droplet image capture. The system also includes an automated image processing pipeline to measure contact angles. This closed-loop experiment orchestrator is integrated with a Bayesian Optimization (BO) client, which enables iterative exploration of new formulations based on previous contact angle measurements to meet user-defined objectives. The system operates in a high-throughput manner and can achieve a rate of approximately one contact angle measurement per minute. Here we demonstrate that RAISE can be used to explore surfactant wettability and how surfactant combinations create tunable formulations that compensate for purity-related variations. Furthermore, multi-objective BO demonstrates how precise and optimal formulations can be reached based on application-specific goals. The optimization is guided by a desirability score, which prioritizes formulations that are within target contact angle ranges, minimize surfactant usage, and reduce cost. This work demonstrates the capabilities of RAISE to autonomously link liquid formulations to contact angle measurements in a closed-loop system, using multi-objective BO to efficiently identify optimal formulations aligned with researcher-defined criteria.
comment: Mohammad Nazeri, Sheldon Mei, and Jeffrey Watchorn contributed equally to this work. *Corresponding author: Frank Gu (f.gu@utoronto.ca)
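As an illustration of how a desirability score of the kind described in the RAISE abstract could combine a target contact angle range with surfactant usage and cost, here is a minimal sketch. It is not the authors' scoring function: the geometric-mean aggregation, the linear fall-off, and every name and default value (`target`, `tolerance`, `max_conc`, `max_cost`) are assumptions chosen for the example.

```python
def range_desirability(value, low, high, tolerance):
    """1.0 inside [low, high], falling linearly to 0.0 at `tolerance` outside."""
    if low <= value <= high:
        return 1.0
    distance = low - value if value < low else value - high
    return max(0.0, 1.0 - distance / tolerance)


def smaller_is_better(value, worst):
    """1.0 at zero, falling linearly to 0.0 at `worst` (clamped to [0, 1])."""
    return min(1.0, max(0.0, 1.0 - value / worst))


def desirability(contact_angle_deg, surfactant_conc, cost, *,
                 target=(40.0, 50.0), tolerance=20.0,
                 max_conc=1.0, max_cost=10.0):
    """Hypothetical combined score: geometric mean of three sub-scores.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    parts = [
        range_desirability(contact_angle_deg, target[0], target[1], tolerance),
        smaller_is_better(surfactant_conc, max_conc),
        smaller_is_better(cost, max_cost),
    ]
    product = 1.0
    for part in parts:
        product *= part
    # Geometric mean: any sub-score of zero makes the whole score zero.
    return product ** (1.0 / len(parts))
```

With a geometric mean, a formulation that misses the contact angle window entirely scores zero no matter how cheap it is, which is one common way to encode a hard priority in a single scalar objective for an optimizer.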
GATO: GPU-Accelerated and Batched Trajectory Optimization for Scalable Edge Model Predictive Control
While Model Predictive Control (MPC) delivers strong performance across robotics applications, solving the underlying (batches of) nonlinear trajectory optimization (TO) problems online remains computationally demanding. Existing GPU-accelerated approaches typically (i) parallelize a single solve to meet real-time deadlines, (ii) scale to very large batches at slower-than-real-time rates, or (iii) achieve speed by restricting model generality (e.g., point-mass dynamics or a single linearization). This leaves a large gap in solver performance for many state-of-the-art MPC applications that require real-time batches of tens to low-hundreds of solves. As such, we present GATO, an open source, GPU-accelerated, batched TO solver co-designed across algorithm, software, and computational hardware to deliver real-time throughput for these moderate batch size regimes. Our approach leverages a combination of block-, warp-, and thread-level parallelism within and across solves for ultra-high performance. We demonstrate the effectiveness of our approach through a combination of: simulated benchmarks showing speedups of 18-21x over CPU baselines and 1.4-16x over GPU baselines as batch size increases; case studies highlighting improved disturbance rejection and convergence behavior; and finally a validation on hardware using an industrial manipulator. We open source GATO to support reproducibility and adoption.
Inspection Planning Primitives with Implicit Models
The aging and increasing complexity of infrastructures make efficient inspection planning more critical in ensuring safety. Thanks to sampling-based motion planning, many inspection planners are fast. However, they often require huge memory. This is particularly true when the structure under inspection is large and complex, consisting of many struts and pillars of various geometries and sizes. Such structures can be represented efficiently using implicit models, such as neural Signed Distance Functions (SDFs). However, most primitive computations used in sampling-based inspection planners have been designed to work efficiently with explicit environment models, which in turn requires the planner either to use explicit environment models or to perform frequent transformations between implicit and explicit environment models during planning. This paper proposes a set of primitive computations, called Inspection Planning Primitives with Implicit Models (IPIM), that enable sampling-based inspection planners to use neural SDF representations exclusively during planning. Evaluation on three scenarios, including inspection of a complex real-world structure with over 92M triangular mesh faces, indicates that even a rudimentary sampling-based planner with IPIM can generate inspection trajectories of similar quality to those generated by the state-of-the-art planner, while using up to 70x less memory than the state-of-the-art inspection planner.
IGUANA: Immersive Guidance, Navigation, and Control for Consumer UAV
As the markets for unmanned aerial vehicles (UAVs) and mixed reality (MR) headsets continue to grow, recent research has increasingly explored their integration, which enables more intuitive, immersive, and situationally aware control systems. We present IGUANA, an MR-based immersive guidance, navigation, and control system for consumer UAVs. IGUANA introduces three key elements beyond conventional control interfaces: (1) a 3D terrain map interface with draggable waypoint markers and live camera preview for high-level control, (2) a novel spatial control metaphor that uses a virtual ball as a physical analogy for low-level control, and (3) a spatial overlay that helps track the UAV when it is not visible with the naked eye or visual line of sight is interrupted. We conducted a user study to evaluate our design, both quantitatively and qualitatively, and found that (1) the 3D map interface is intuitive and easy to use, relieving users from manual control and suggesting improved accuracy and consistency with lower perceived workload relative to conventional dual-stick controller, (2) the virtual ball interface is intuitive but limited by the lack of physical feedback, and (3) the spatial overlay is very useful in enhancing the users' situational awareness.
comment: This is the author's version of the work. The definitive Version of Record was published in 31st ACM Symposium on Virtual Reality Software and Technology (VRST '25)
AVO: Amortized Value Optimization for Contact Mode Switching in Multi-Finger Manipulation
Dexterous manipulation tasks often require switching between different contact modes, such as rolling, sliding, sticking, or non-contact. When formulating dexterous manipulation tasks as a trajectory optimization problem, a common approach is to decompose these tasks into sub-tasks for each contact mode, which are each solved independently. Optimizing each sub-task independently can limit performance, as optimizing contact points, contact forces, or other variables without information about future sub-tasks can place the system in a state from which it is challenging to make progress on subsequent sub-tasks. Further, optimizing these sub-tasks is very computationally expensive. To address these challenges, we propose Amortized Value Optimization (AVO), which introduces a learned value function that predicts the total future task performance. By incorporating this value function into the cost of the trajectory optimization at each planning step, the value function gradients guide the optimizer toward states that minimize the cost in future sub-tasks. This effectively bridges separately optimized sub-tasks, and accelerates the optimization by reducing the amount of online computation needed. We validate AVO on a screwdriver grasping and turning task in both simulation and real-world experiments, and show improved performance even with 50% less computational budget compared to trajectory optimization without the value function.
HJCD-IK: GPU-Accelerated Inverse Kinematics through Batched Hybrid Jacobian Coordinate Descent
Inverse Kinematics (IK) is a core problem in robotics, in which joint configurations are found to achieve a desired end-effector pose. Although analytical solvers are fast and efficient, they are limited to systems with low degrees-of-freedom and specific topological structures. Numerical optimization-based approaches are more general, but suffer from high computational costs and frequent convergence to spurious local minima. Recent efforts have explored the use of GPUs to combine sampling and optimization to enhance both the accuracy and speed of IK solvers. We build on this recent literature and introduce HJCD-IK, a GPU-accelerated, sampling-based hybrid solver that combines an orientation-aware greedy coordinate descent initialization scheme with a Jacobian-based polishing routine. This design enables our solver to improve both convergence speed and overall accuracy as compared to the state-of-the-art, consistently finding solutions along the accuracy-latency Pareto frontier and often achieving order-of-magnitude gains. In addition, our method produces a broad distribution of high-quality samples, yielding the lowest maximum mean discrepancy. We release our code open-source for the benefit of the community.
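To make the idea of a Jacobian-based polishing routine concrete, the following is a minimal sketch of damped least-squares refinement on a planar two-link arm. It is not HJCD-IK: the arm model, link lengths, damping value, and function names (`forward`, `jacobian`, `polish`) are all assumptions for illustration, and the seed configuration simply stands in for whatever a coarse initialization stage would provide.

```python
import math


def forward(q, l1=1.0, l2=1.0):
    """End-effector position (x, y) of a planar two-link arm."""
    x = l1 * math.cos(q[0]) + l2 * math.cos(q[0] + q[1])
    y = l1 * math.sin(q[0]) + l2 * math.sin(q[0] + q[1])
    return x, y


def jacobian(q, l1=1.0, l2=1.0):
    """2x2 position Jacobian of the same arm."""
    s1, c1 = math.sin(q[0]), math.cos(q[0])
    s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def polish(q, target, damping=0.05, iters=100, tol=1e-9):
    """Refine q with damped least squares: dq = J^T (J J^T + d^2 I)^-1 e."""
    q = list(q)
    for _ in range(iters):
        x, y = forward(q)
        ex, ey = target[0] - x, target[1] - y
        if math.hypot(ex, ey) < tol:
            break
        J = jacobian(q)
        # A = J J^T + damping^2 I (symmetric positive definite 2x2 matrix).
        a = J[0][0] ** 2 + J[0][1] ** 2 + damping ** 2
        b = J[0][0] * J[1][0] + J[0][1] * J[1][1]
        d = J[1][0] ** 2 + J[1][1] ** 2 + damping ** 2
        det = a * d - b * b
        # v = A^-1 e, using the closed-form inverse of a 2x2 matrix.
        vx = (d * ex - b * ey) / det
        vy = (-b * ex + a * ey) / det
        # dq = J^T v
        q[0] += J[0][0] * vx + J[1][0] * vy
        q[1] += J[0][1] * vx + J[1][1] * vy
    return q


# Example: start from a rough seed and polish toward a reachable target.
seed = (0.0, 1.5)
goal = (1.2, 0.8)
solution = polish(seed, goal)
```

The damping term keeps the linear solve well conditioned near singular configurations at the cost of slightly slower final convergence, which is why this kind of step is usually paired with a reasonable initial guess.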
VeMo: A Lightweight Data-Driven Approach to Model Vehicle Dynamics
Developing a dynamic model for a high-performance vehicle is a complex problem that requires extensive structural information about the system under analysis. This information is often unavailable to those who did not design the vehicle and represents a typical issue in autonomous driving applications, which are frequently developed on top of existing vehicles; therefore, vehicle models are developed under conditions of information scarcity. This paper proposes a lightweight encoder-decoder model based on Gated Recurrent Unit layers to correlate the vehicle's future state with its past states, measured onboard, and the control actions the driver performs. The results demonstrate that the model achieves a maximum mean relative error below 2.6% in extreme dynamic conditions. It also shows good robustness when subject to noisy input data across the frequency components of interest. Furthermore, despite being entirely data-driven and free from physical constraints, the model exhibits physical consistency in the output signals, such as longitudinal and lateral accelerations, yaw rate, and the vehicle's longitudinal velocity.
A Rotation-Invariant Embedded Platform for (Neural) Cellular Automata
This paper presents a rotation-invariant embedded platform for simulating (neural) cellular automata (NCA) in modular robotic systems. Inspired by previous work on physical NCA, we introduce key innovations that overcome limitations in prior hardware designs. Our platform features a symmetric, modular structure, enabling seamless connections between cells regardless of orientation. Additionally, each cell is battery-powered, allowing it to operate independently and retain its state even when disconnected from the collective. To demonstrate the platform's applicability, we present a novel rotation-invariant NCA model for isotropic shape classification. The proposed system provides a robust foundation for exploring the physical realization of NCA, with potential applications in distributed robotic systems and self-organizing structures. Our implementation, including hardware, software code, a simulator, and a video, is openly shared at: https://github.com/dwoiwode/embedded_nca
comment: Accepted for ALIFE 2025
FLEET: Formal Language-Grounded Scheduling for Heterogeneous Robot Teams
Coordinating heterogeneous robot teams from free-form natural-language instructions is hard. Language-only planners struggle with long-horizon coordination and hallucination, while purely formal methods require closed-world models. We present FLEET, a hybrid decentralized framework that turns language into optimized multi-robot schedules. An LLM front-end produces (i) a task graph with durations and precedence and (ii) a capability-aware robot-task fitness matrix; a formal back-end solves a makespan-minimization problem while the underlying robots execute their free-form subtasks with agentic closed-loop control. Across multiple free-form language-guided autonomy coordination benchmarks, FLEET improves success over state-of-the-art generative planners on two-agent teams across heterogeneous tasks. Ablations show that mixed integer linear programming (MILP) primarily improves temporal structure, while LLM-derived fitness is decisive for capability-coupled tasks; together they deliver the highest overall performance. We demonstrate the translation to real-world challenges with hardware trials using a pair of quadruped robots with disjoint capabilities.
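The makespan-minimization step can be illustrated with a toy exhaustive search over robot-task assignments. This is only a sketch under stated assumptions: it replaces the MILP back-end with brute-force enumeration, and the task names, durations, fitness values, the `min_fitness` threshold, and the function `best_schedule` are all hypothetical.

```python
from itertools import product


def best_schedule(durations, precedence, fitness, min_fitness=0.5):
    """Return (makespan, assignment) minimizing makespan by enumeration.

    durations:  task -> duration, keys assumed to be in topological order
    precedence: task -> list of tasks that must finish first
    fitness:    robot -> {task -> score}; pairs below min_fitness are excluded
    """
    tasks = list(durations)
    robots = list(fitness)
    best = None
    for assignment in product(robots, repeat=len(tasks)):
        if any(fitness[r][t] < min_fitness for r, t in zip(assignment, tasks)):
            continue  # this robot is not capable enough for this task
        robot_free = {r: 0.0 for r in robots}
        finish = {}
        for t, r in zip(tasks, assignment):
            # A task starts once its robot is free and its predecessors are done.
            start = max([robot_free[r]] + [finish[p] for p in precedence.get(t, [])])
            finish[t] = start + durations[t]
            robot_free[r] = finish[t]
        makespan = max(finish.values())
        if best is None or makespan < best[0]:
            best = (makespan, dict(zip(tasks, assignment)))
    return best


# Hypothetical two-robot example with disjoint and shared capabilities.
durations = {"open_door": 2, "scan_room": 4, "fetch_box": 3}
precedence = {"scan_room": ["open_door"], "fetch_box": ["open_door"]}
fitness = {
    "arm_robot": {"open_door": 0.9, "scan_room": 0.1, "fetch_box": 0.8},
    "camera_robot": {"open_door": 0.0, "scan_room": 0.9, "fetch_box": 0.6},
}
makespan, assignment = best_schedule(durations, precedence, fitness)
```

Enumeration grows exponentially with the number of tasks, so it only serves to show the objective and constraints; a MILP solver would be the practical choice at realistic problem sizes.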
ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning
Humanoid whole-body loco-manipulation promises transformative capabilities for daily service and warehouse tasks. While recent advances in general motion tracking (GMT) have enabled humanoids to reproduce diverse human motions, these policies lack the precision and object awareness required for loco-manipulation. To this end, we introduce ResMimic, a two-stage residual learning framework for precise and expressive humanoid control from human motion data. First, a GMT policy, trained on large-scale human-only motion, serves as a task-agnostic base for generating human-like whole-body movements. An efficient but precise residual policy is then learned to refine the GMT outputs to improve locomotion and incorporate object interaction. To further facilitate efficient training, we design (i) a point-cloud-based object tracking reward for smoother optimization, (ii) a contact reward that encourages accurate humanoid body-object interactions, and (iii) a curriculum-based virtual object controller to stabilize early training. We evaluate ResMimic in both simulation and on a real Unitree G1 humanoid. Results show substantial gains in task success, training efficiency, and robustness over strong baselines. Videos are available at https://resmimic.github.io/ .
comment: 9 pages, 8 figures
Online Hybrid-Belief POMDP with Coupled Semantic-Geometric Models
Robots operating in complex and unknown environments frequently require geometric-semantic representations of the environment to safely perform their tasks. While inferring the environment, they must account for many possible scenarios when planning future actions. Since objects' class types are discrete and the robot's self-pose and the objects' poses are continuous, the environment can be represented by a hybrid discrete-continuous belief which is updated according to models and incoming data. Prior probabilities and observation models representing the environment can be learned from data using deep learning algorithms. Such models often couple environmental semantic and geometric properties. As a result, semantic variables are interconnected, causing semantic state space dimensionality to increase exponentially. In this paper, we consider planning under uncertainty using partially observable Markov decision processes (POMDPs) with hybrid semantic-geometric beliefs. The models and priors consider the coupling between semantic and geometric variables. Within POMDP, we introduce the concept of semantically aware safety. Obtaining representative samples of the theoretical hybrid belief, required for estimating the value function, is very challenging. As a key contribution, we develop a novel form of the hybrid belief and leverage it to sample representative samples. We show that under certain conditions, the value function and probability of safety can be calculated efficiently with an explicit expectation over all possible semantic mappings. Our simulations show that our estimates of the objective function and probability of safety achieve similar levels of accuracy compared to estimators that run exhaustively on the entire semantic state-space using samples from the theoretical hybrid belief. Nevertheless, the complexity of our estimators is polynomial rather than exponential.
comment: 20 pages, 9 figures
BIM-Constrained Optimization for Accurate Localization and Deviation Correction in Construction Monitoring
Augmented reality (AR) applications for construction monitoring rely on real-time environmental tracking to visualize architectural elements. However, construction sites present significant challenges for traditional tracking methods due to featureless surfaces, dynamic changes, and drift accumulation, leading to misalignment between digital models and the physical world. This paper proposes a BIM-aware drift correction method to address these challenges. Instead of relying solely on SLAM-based localization, we align "as-built" detected planes from the real-world environment with "as-planned" architectural planes in BIM. Our method performs robust plane matching and computes a transformation (TF) between SLAM (S) and BIM (B) origin frames using optimization techniques, minimizing drift over time. By incorporating BIM as prior structural knowledge, we can achieve improved long-term localization and enhanced AR visualization accuracy in noisy construction environments. The method is evaluated through real-world experiments, showing significant reductions in drift-induced errors and optimized alignment consistency. On average, our system achieves a reduction of 52.24% in angular deviations and a reduction of 60.8% in the distance error of the matched walls compared to the initial manual alignment by the user.
M^3RS: Multi-robot, Multi-objective, and Multi-mode Routing and Scheduling
Task execution quality significantly impacts multi-robot missions, yet existing task allocation frameworks rarely consider quality of service as a decision variable, despite its importance in applications like robotic disinfection and cleaning. We introduce the multi-robot, multi-objective, and multi-mode routing and scheduling (M3RS) problem, designed for time-constrained missions. In M3RS, each task offers multiple execution modes with varying resource needs, durations, and quality levels, allowing trade-offs across mission objectives. M3RS is modeled as a mixed-integer linear programming (MIP) problem and optimizes task sequencing and execution modes for each agent. We apply M3RS to multi-robot disinfection in healthcare and public spaces, optimizing disinfection quality and task completion rates. Through synthetic case studies, M3RS demonstrates 3-46% performance improvements over the standard task allocation method across various metrics. Further, to improve compute time, we propose a clustering-based column generation algorithm that achieves solutions comparable to or better than the baseline MIP solver while reducing computation time by 60%. We also conduct case studies with simulated and real robots. Experimental videos are available on the project page: https://sites.google.com/view/g-robot/m3rs/.
comment: Under review
BIM Informed Visual SLAM for Construction Monitoring
Simultaneous Localization and Mapping (SLAM) is a key tool for monitoring construction sites, where aligning the evolving as-built state with the as-planned design enables early error detection and reduces costly rework. LiDAR-based SLAM achieves high geometric precision, but its sensors are typically large and power-demanding, limiting their use on portable platforms. Visual SLAM offers a practical alternative with lightweight cameras already embedded in most mobile devices. However, visually mapping construction environments remains challenging: repetitive layouts, occlusions, and incomplete or low-texture structures often cause drift in the trajectory map. To mitigate this, we propose an RGB-D SLAM system that incorporates the Building Information Model (BIM) as structural prior knowledge. Instead of relying solely on visual cues, our system continuously establishes correspondences between detected walls and their BIM counterparts, which are then introduced as constraints in the back-end optimization. The proposed method operates in real time and has been validated on real construction sites, reducing trajectory error by an average of 23.71% and map RMSE by 7.14% compared to visual SLAM baselines. These results demonstrate that BIM constraints enable reliable alignment of the digital plan with the as-built scene, even under partially constructed conditions.
comment: 8 pages, 5 tables, 4 figures
Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions
The rise of foundation models paves the way for generalist robot policies in the physical world. Existing methods relying on text-only instructions often struggle to generalize to unseen scenarios. We argue that interleaved image-text inputs offer richer and less biased context and enable robots to better handle unseen tasks with more versatile human-robot interaction. Building on this insight, we introduce Interleave-VLA, the first robot learning paradigm capable of comprehending interleaved image-text instructions and directly generating continuous action sequences in the physical world. It offers a natural, flexible, and model-agnostic paradigm that extends state-of-the-art vision-language-action (VLA) models with minimal modifications while achieving strong zero-shot generalization. Interleave-VLA also includes an automatic pipeline that converts text instructions from Open X-Embodiment into interleaved image-text instructions, resulting in a large-scale real-world interleaved embodied dataset with 210k episodes. Comprehensive evaluation in simulation and the real world shows that Interleave-VLA offers two major benefits: (1) it improves out-of-domain generalization to unseen objects by 2x compared to text-input baselines, and (2) it supports flexible task interfaces and diverse instructions, such as hand-drawn sketches, in a zero-shot manner. We attribute Interleave-VLA's strong zero-shot capability to the use of instruction images, which effectively mitigate hallucinations, and the inclusion of heterogeneous multimodal datasets, enriched with Internet-sourced images, offering potential for scalability. More information is available at https://interleave-vla.github.io/Interleave-VLA-Anonymous/
Diffusion Trajectory-guided Policy for Long-horizon Robot Manipulation
Recently, Vision-Language-Action models (VLA) have advanced robot imitation learning, but high data collection costs and limited demonstrations hinder generalization and current imitation learning methods struggle in out-of-distribution scenarios, especially for long-horizon tasks. A key challenge is how to mitigate compounding errors in imitation learning, which lead to cascading failures over extended trajectories. To address these challenges, we propose the Diffusion Trajectory-guided Policy (DTP) framework, which generates 2D trajectories through a diffusion model to guide policy learning for long-horizon tasks. By leveraging task-relevant trajectories, DTP provides trajectory-level guidance to reduce error accumulation. Our two-stage approach first trains a generative vision-language model to create diffusion-based trajectories, then refines the imitation policy using them. Experiments on the CALVIN benchmark show that DTP outperforms state-of-the-art baselines by 25% in success rate, starting from scratch without external pretraining. Moreover, DTP significantly improves real-world robot performance.
comment: 8 pages, 5 figures, accepted to IEEE Robotics and Automation Letters (RAL)
Touch Speaks, Sound Feels: A Multimodal Approach to Affective and Social Touch from Robots to Humans
Affective tactile interaction constitutes a fundamental component of human communication. In natural human-human encounters, touch is seldom experienced in isolation; rather, it is inherently multisensory. Individuals not only perceive the physical sensation of touch but also register the accompanying auditory cues generated through contact. The integration of haptic and auditory information forms a rich and nuanced channel for emotional expression. While extensive research has examined how robots convey emotions through facial expressions and speech, their capacity to communicate social gestures and emotions via touch remains largely underexplored. To address this gap, we developed a multimodal interaction system incorporating a 5x5 grid of 25 vibration motors synchronized with audio playback, enabling robots to deliver combined haptic-audio stimuli. In an experiment involving 32 Chinese participants, ten emotions and six social gestures were presented through vibration, sound, or their combination. Participants rated each stimulus on arousal and valence scales. The results revealed that (1) the combined haptic-audio modality significantly enhanced decoding accuracy compared to single modalities; (2) each individual channel (vibration or sound) effectively supported the recognition of certain emotions, with distinct advantages depending on the emotional expression; and (3) gestures alone were generally insufficient for conveying clearly distinguishable emotions. These findings underscore the importance of multisensory integration in affective human-robot interaction and highlight the complementary roles of haptic and auditory cues in enhancing emotional communication.
Learning to Recover: Dynamic Reward Shaping with Wheel-Leg Coordination for Fallen Robots
Adaptive recovery from falls is an essential skill for the practical deployment of wheeled-legged robots, which uniquely combine the agility of legs with the speed of wheels for rapid recovery. However, traditional methods relying on preplanned recovery motions, simplified dynamics, or sparse rewards often fail to produce robust recovery policies. This paper presents a learning-based framework integrating Episode-based Dynamic Reward Shaping and curriculum learning, which dynamically balances exploration of diverse recovery maneuvers with precise posture refinement. An asymmetric actor-critic architecture accelerates training by leveraging privileged information in simulation, while noise-injected observations enhance robustness against uncertainties. We further demonstrate that synergistic wheel-leg coordination reduces joint torque consumption by 15.8% and 26.2% and improves stabilization through energy transfer mechanisms. Extensive evaluations on two distinct quadruped platforms achieve recovery success rates up to 99.1% and 97.8% without platform-specific tuning. The supplementary material is available at https://boyuandeng.github.io/L2R-WheelLegCoordination/
Interpretable Robot Control via Structured Behavior Trees and Large Language Models
As intelligent robots become more integrated into human environments, there is a growing need for intuitive and reliable Human-Robot Interaction (HRI) interfaces that are adaptable and more natural to interact with. Traditional robot control methods often require users to adapt to interfaces or memorize predefined commands, limiting usability in dynamic, unstructured environments. This paper presents a novel framework that bridges natural language understanding and robotic execution by combining Large Language Models (LLMs) with Behavior Trees. This integration enables robots to interpret natural language instructions given by users and translate them into executable actions by activating domain-specific plugins. The system supports scalable and modular integration, with a primary focus on perception-based functionalities, such as person tracking and hand gesture recognition. To evaluate the system, a series of real-world experiments was conducted across diverse environments. Experimental results demonstrate that the proposed approach is practical in real-world scenarios, with an average cognition-to-execution accuracy of approximately 94%, making a significant contribution to HRI systems and robots. The complete source code of the framework is publicly available at https://github.com/snt-arg/robot_suite.
comment: 15 pages, 5 figures, 3 tables
UltraHiT: A Hierarchical Transformer Architecture for Generalizable Internal Carotid Artery Robotic Ultrasonography
Carotid ultrasound is crucial for the assessment of cerebrovascular health, particularly the internal carotid artery (ICA). While previous research has explored automating carotid ultrasound, none has tackled the challenging ICA. This is primarily due to its deep location, tortuous course, and significant individual variations, which greatly increase scanning complexity. To address this, we propose a Hierarchical Transformer-based decision architecture, namely UltraHiT, that integrates high-level variation assessment with low-level action decision. Our motivation stems from conceptualizing individual vascular structures as morphological variations derived from a standard vascular model. The high-level module identifies variation and switches between two low-level modules: an adaptive corrector for variations, or a standard executor for normal cases. Specifically, both the high-level module and the adaptive corrector are implemented as causal transformers that generate predictions based on the historical scanning sequence. To ensure generalizability, we collected the first large-scale ICA scanning dataset comprising 164 trajectories and 72K samples from 28 subjects of both genders. Based on the above innovations, our approach achieves a 95% success rate in locating the ICA on unseen individuals, outperforming baselines and demonstrating its effectiveness. Our code will be released after acceptance.
Automating RT Planning at Scale: High Quality Data For AI Training
Radiotherapy (RT) planning is complex, subjective, and time-intensive. Advances with artificial intelligence (AI) promise to improve its precision and efficiency, but progress is often limited by the scarcity of large, standardized datasets. To address this, we introduce the Automated Iterative RT Planning (AIRTP) system, a scalable solution designed to generate substantial volumes of consistently high-quality treatment plans, overcoming a key obstacle in the advancement of AI-driven RT planning. Our AIRTP pipeline adheres to clinical guidelines and automates essential steps, including organ-at-risk (OAR) contouring, helper structure creation, beam setup, optimization, and plan quality improvement, using AI integrated with RT planning software like Varian Eclipse. Furthermore, we propose a novel approach for determining optimization parameters to reproduce 3D dose distributions, i.e., a method to convert dose predictions into deliverable treatment plans constrained by machine limitations. A comparative analysis of plan quality reveals that our automated pipeline produces treatment plans of quality comparable to those generated manually, which traditionally require several hours of labor per plan. Committed to public research, the first data release of our AIRTP pipeline includes nine cohorts covering head-and-neck and lung cancer sites to support an AAPM 2025 challenge. To the best of our knowledge, this dataset features more than 10 times the number of plans in the largest existing well-curated public dataset. Repo: https://github.com/RiqiangGao/GDP-HMM_AAPMChallenge.
comment: radiotherapy planning, data for AI training
Context Matters! Relaxing Goals with LLMs for Feasible 3D Scene Planning
Embodied agents need to plan and act reliably in real and complex 3D environments. Classical planning (e.g., PDDL) offers structure and guarantees, but in practice it fails under noisy perception and incorrect predicate grounding. On the other hand, Large Language Model (LLM)-based planners leverage commonsense reasoning, yet frequently propose actions that are infeasible or unsafe. Following recent works that combine the two approaches, we introduce ContextMatters, a framework that fuses LLMs and classical planning to perform hierarchical goal relaxation: the LLM helps ground symbols to the scene and, when the target is unreachable, it proposes functionally equivalent goals that progressively relax constraints, adapting the goal to the context of the agent's environment. Operating on 3D Scene Graphs, this mechanism turns many nominally infeasible tasks into tractable plans and enables context-aware partial achievement when full completion is not achievable. Our experimental results show a +52.45% Success Rate improvement over the state-of-the-art LLMs+PDDL baseline, demonstrating the effectiveness of our approach. Moreover, we validate the execution of ContextMatters in a real-world scenario by deploying it on a TIAGo robot. Code, dataset, and supplementary materials are available to the community at https://lab-rococo-sapienza.github.io/context-matters/.
Development of a magnetorheological hand exoskeleton featuring a high force-to-power ratio for enhanced grip endurance
Hand exoskeletons have significant potential in labor-intensive fields by mitigating hand grip fatigue, enhancing hand strength, and preventing injuries. However, most traditional hand exoskeletons are driven by motors, whose output force is limited under constrained installation conditions. They also suffer from high power consumption, complex and bulky assistive systems, and high instability. In this work, we develop a novel hand exoskeleton integrated with innovative magnetorheological (MR) clutches that offers a high force-to-power ratio to improve grip endurance. The clutch features an enhanced structural design, a micro-roller enhancing structure, which can significantly boost output forces. The experimental data demonstrate that, when supplied with 2 V, the clutch can deliver a peak holding force of 381.15 N, 55 times that when no voltage is provided (7 N). In this scenario, it consumes only 1.38 W, yielding a force-to-power ratio of 256.75 N/W, which is 2.35 times higher than the best-reported actuator used for hand exoskeletons. This capability enables the designed MR hand exoskeleton to provide approximately 419.79 N of support force for gripping. The designed MR hand exoskeleton is highly integrated, comprising an exoskeleton frame, MR clutches, a control unit, and a battery. Evaluations through static grip endurance tests and dynamic carrying and lifting tests confirm that the MR hand exoskeleton can effectively reduce muscle fatigue, extend grip endurance, and minimize injuries. These findings highlight its strong potential for practical applications in repetitive tasks such as carrying and lifting in industrial settings.
NAR-*ICP: Neural Execution of Classical ICP-based Pointcloud Registration Algorithms
This study explores the intersection of neural networks and classical robotics algorithms through the Neural Algorithmic Reasoning (NAR) blueprint, enabling the training of neural networks to reason like classical robotics algorithms by learning to execute them. Algorithms are integral to robotics and safety-critical applications due to their predictable and consistent performance through logical and mathematical principles. In contrast, while neural networks are highly adaptable, handling complex, high-dimensional data and generalising across tasks, they often lack interpretability and transparency in their internal computations. To bridge the two, we propose a novel Graph Neural Network (GNN)-based framework, NAR-*ICP, that learns the intermediate computations of classical ICP-based registration algorithms, extending the CLRS Benchmark. We evaluate our approach across real-world and synthetic datasets, demonstrating its flexibility in handling complex inputs, and its potential to be used within larger learning pipelines. Our method achieves superior performance compared to the baselines, even surpassing the algorithms it was trained on, further demonstrating its ability to generalise beyond the capabilities of traditional algorithms.
comment: 19 pages, 16 tables, 7 figures
P2 Explore: Efficient Exploration in Unknown Cluttered Environment with Floor Plan Prediction IROS 2025
Robot exploration aims to reconstruct unknown environments, and it is important to do so with short paths. Traditional methods focus on optimizing the visiting order of frontiers based on current observations, which may lead to locally optimal results. Recent work shows that predicting the structure of the unseen environment can further improve exploration efficiency. However, in a cluttered environment, the randomness of obstacles weakens prediction, and the resulting inaccuracy limits the improvement in exploration. Therefore, we propose FPUNet, which efficiently predicts the layout of noisy indoor environments. We then segment rooms and construct their topological connectivity based on the predicted map. The visiting order of these predicted rooms is optimized to provide high-level guidance for exploration. A comparison of FPUNet with other network architectures demonstrates that it is the SOTA method for this task. Extensive experiments in simulations show that our method can shorten the path length by 2.18% to 34.60% compared to the baselines.
comment: 7 pages, Accepted by IROS 2025, Open-sourced at https://github.com/song-kun/P2Explore
Estimating the Joint Probability of Scenario Parameters with Gaussian Mixture Copula Models
This paper presents the first application of Gaussian Mixture Copula Models to the statistical modeling of driving scenarios for the safety validation of automated driving systems. Knowledge of the joint probability distribution of scenario parameters is essential for scenario-based safety assessment, where risk quantification depends on the likelihood of concrete parameter combinations. Gaussian Mixture Copula Models bring together the multimodal expressivity of Gaussian Mixture Models and the flexibility of copulas, enabling separate modeling of marginal distributions and dependencies. We benchmark Gaussian Mixture Copula Models against previously proposed approaches - Gaussian Mixture Models and Gaussian Copula Models - using real-world driving data drawn from scenarios defined in United Nations Regulation No. 157. Our evaluation across approximately 18 million scenario instances demonstrates that Gaussian Mixture Copula Models consistently surpass Gaussian Copula Models and perform better than, or at least comparably to, Gaussian Mixture Models, as measured by both log-likelihood and Sinkhorn distance. These results are promising for the adoption of Gaussian Mixture Copula Models as a statistical foundation for future scenario-based validation frameworks.
comment: 8 pages, 4 figures; This work has been submitted to the IEEE for possible publication
Gaze Estimation for Human-Robot Interaction: Analysis Using the NICO Platform
This paper evaluates current gaze estimation methods within an HRI context of a shared workspace scenario. We introduce a new, annotated dataset collected with the NICO robotic platform. We evaluate four state-of-the-art gaze estimation models. The evaluation shows that the angular errors are close to those reported on general-purpose benchmarks. However, when expressed in terms of distance in the shared workspace, the best median error is 16.48 cm, quantifying the practical limitations of current methods. We conclude by discussing these limitations and offering recommendations on how to best integrate gaze estimation as a modality in HRI systems.
comment: Code available at http://github.com/kocurvik/nico_gaze
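To illustrate how an angular gaze error turns into a distance in the workspace, here is a minimal geometric sketch. It is not the procedure used in the paper, which may, for example, intersect gaze rays with the table plane. It assumes the true and estimated gaze rays start at the same eye position and are compared at the same distance from it, so the miss distance is the chord between the two points. The function name and the example values (10 degrees, 0.8 m) are hypothetical.

```python
import math


def gaze_miss_distance(angular_error_deg: float, target_distance_m: float) -> float:
    """Chord length between two points at the same distance along two rays
    that share an origin and are separated by the given angle."""
    half_angle = math.radians(angular_error_deg) / 2.0
    return 2.0 * target_distance_m * math.sin(half_angle)


# Hypothetical example: a 10 degree error at 0.8 m from the eye.
example_miss_m = gaze_miss_distance(10.0, 0.8)
print(f"miss distance: {example_miss_m * 100:.1f} cm")
```

Under this simplification, a fixed angular error yields a miss distance that grows in proportion to the distance to the target, which is one reason small angular errors can still be large in a tabletop workspace.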
Control of Humanoid Robots with Parallel Mechanisms using Differential Actuation Models
Several recently released humanoid robots, inspired by the mechanical design of Cassie, employ actuator configurations in which the motors are displaced from the joints to reduce leg inertia. While studies accounting for the full kinematic complexity have demonstrated the benefits of these designs, the associated loop-closure constraints greatly increase computational cost and limit their use in control and learning. As a result, the non-linear transmission is often approximated by a constant reduction ratio, preventing exploitation of the mechanism's full capabilities. This paper introduces a compact analytical formulation for the two standard knee and ankle mechanisms that captures the exact non-linear transmission while remaining computationally efficient. The model is fully differentiable up to second order with a minimal formulation, enabling low-cost evaluation of dynamic derivatives for trajectory optimization and of the apparent transmission impedance for reinforcement learning. We integrate this formulation into trajectory optimization and locomotion policy learning, and compare it against simplified constant-ratio approaches. Hardware experiments demonstrate improved accuracy and robustness, showing that the proposed method provides a practical means to incorporate parallel actuation into modern control algorithms.
Robot Learning from Any Images
We introduce RoLA, a framework that transforms any in-the-wild image into an interactive, physics-enabled robotic environment. Unlike previous methods, RoLA operates directly on a single image without requiring additional hardware or digital assets. Our framework democratizes robotic data generation by producing massive visuomotor robotic demonstrations within minutes from a wide range of image sources, including camera captures, robotic datasets, and Internet images. At its core, our approach combines a novel method for single-view physical scene recovery with an efficient visual blending strategy for photorealistic data collection. We demonstrate RoLA's versatility across applications like scalable robotic data generation and augmentation, robot learning from Internet images, and single-image real-to-sim-to-real systems for manipulators and humanoids. Video results are available at https://sihengz02.github.io/RoLA .
comment: CoRL 2025 camera ready
Mitigating Cross-Modal Distraction and Ensuring Geometric Feasibility via Affordance-Guided and Self-Consistent MLLMs for Task Planning in Instruction-Following Manipulation
We investigate the use of Multimodal Large Language Models (MLLMs) with in-context learning for closed-loop task planning in instruction-following manipulation. We identify four essential requirements for successful task planning: quantity estimation, reachability analysis, relative positioning, and collision avoidance. However, existing benchmarks fail to support holistic evaluation across all these aspects. To address this gap, we introduce QuARC (Quantity, Analysis, Relative positioning, Collision), a new benchmark based on a food preparation scenario that integrates all four challenges. Using QuARC, we reveal two major limitations of current MLLMs: cross-modal distraction and geometric infeasibility. To tackle these, we adapt Chain-of-Thought with Self-Consistency to mitigate reasoning loss from cross-modal distractions and incorporate an affordance predictor to guide planning based on geometric feasibility. Our comprehensive evaluation analyzes performance across multiple baselines and explains sources of improvement. Our method achieves a 76.7% success rate on the benchmark, significantly outperforming the ViLa baseline (36.7%), without requiring additional finetuning. Code and dataset are available at https://hcis-lab.github.io/Affordance-Guided-Self-Consistent-MLLM.
Generating and Optimizing Topologically Distinct Guesses for Mobile Manipulator Path Planning with Path Constraints
Optimal path planning is prone to convergence to local, rather than global, optima. This is often the case for mobile manipulators due to nonconvexities induced by obstacles, robot kinematics and constraints. This paper focuses on planning under end effector path constraints and attempts to circumvent the issue of converging to a local optimum. We propose a pipeline that first discovers multiple homotopically distinct paths, and then optimizes them to obtain multiple distinct local optima. The best out of these distinct local optima is likely to be close to the global optimum. We demonstrate the effectiveness of our pipeline in the optimal path planning of mobile manipulators in the presence of path and obstacle constraints.
EffiTune: Diagnosing and Mitigating Training Inefficiency for Parameter Tuner in Robot Navigation System IROS 2025
Robot navigation systems are critical for various real-world applications such as delivery services, hospital logistics, and warehouse management. Although classical navigation methods provide interpretability, they rely heavily on expert manual tuning, limiting their adaptability. Conversely, purely learning-based methods offer adaptability but often lead to instability and erratic robot behaviors. Recently introduced parameter tuners aim to balance these approaches by integrating data-driven adaptability into classical navigation frameworks. However, the parameter tuning process currently suffers from training inefficiencies and redundant sampling, with critical regions in environment often underrepresented in training data. In this paper, we propose EffiTune, a novel framework designed to diagnose and mitigate training inefficiency for parameter tuners in robot navigation systems. EffiTune first performs robot-behavior-guided diagnostics to pinpoint critical bottlenecks and underrepresented regions. It then employs a targeted up-sampling strategy to enrich the training dataset with critical samples, significantly reducing redundancy and enhancing training efficiency. Our comprehensive evaluation demonstrates that EffiTune achieves more than a 13.5% improvement in navigation performance, enhanced robustness in out-of-distribution scenarios, and a 4x improvement in training efficiency within the same computational budget.
comment: Accepted to IROS 2025
EgoExo++: Integrating On-demand Exocentric Visuals with 2.5D Ground Surface Estimation for Interactive Teleoperation of Subsea ROVs
Underwater ROVs (Remotely Operated Vehicles) are indispensable for subsea exploration and task execution, yet typical teleoperation engines based on egocentric (first-person) video feeds restrict human operators' field-of-view and limit precise maneuvering in complex, unstructured underwater environments. To address this, we propose EgoExo, a geometry-driven solution integrated into a visual SLAM pipeline that synthesizes on-demand exocentric (third-person) views from egocentric camera feeds. Our proposed framework, EgoExo++, extends beyond 2D exocentric view synthesis (EgoExo) to augment a dense 2.5D ground surface estimation on-the-fly. It simultaneously renders the ROV model onto this reconstructed surface, enhancing semantic perception and depth comprehension. The computations involved are closed-form and rely solely on egocentric views and monocular SLAM estimates, which makes it portable across existing teleoperation engines and robust to varying waterbody characteristics. We validate the geometric accuracy of our approach through extensive experiments of 2-DOF indoor navigation and 6-DOF underwater cave exploration in challenging low-light conditions. Quantitative metrics confirm the reliability of the rendered Exo views, while a user study involving 15 operators demonstrates improved situational awareness, navigation safety, and task efficiency during teleoperation. Furthermore, we highlight the role of EgoExo++ augmented visuals in supporting shared autonomy, operator training, and embodied teleoperation. This new interactive approach to ROV teleoperation presents promising opportunities for future research in subsea telerobotics.
comment: EgoExo++ (Journal extension), V5, metadata updated, 12 pages
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8 hours of trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.
comment: Project website: https://omniretarget.github.io
Flight Demonstration and Model Validation of a Prototype Variable-Altitude Venus Aerobot
This paper details a significant milestone towards maturing a buoyant aerial robotic platform, or aerobot, for flight in the Venus clouds. We describe two flights of our subscale altitude-controlled aerobot, fabricated from the materials necessary to survive Venus conditions. During these flights over the Nevada Black Rock desert, the prototype flew at the same atmospheric densities as the 54 to 55 km cloud layer altitudes on Venus. We further describe a first-principles aerobot dynamics model, which we validate against the Nevada flight data and subsequently employ to predict the performance of future aerobots on Venus. The aerobot discussed in this paper is under JPL and Aerostar development for an in-situ mission flying multiple circumnavigations of Venus, sampling the chemical and physical properties of the planet's atmosphere and also remotely sensing surface properties.
comment: Accepted version for AIAA Journal of Aircraft
Preferenced Oracle Guided Multi-mode Policies for Dynamic Bipedal Loco-Manipulation
Dynamic loco-manipulation calls for effective whole-body control and contact-rich interactions with the object and the environment. Existing learning-based control synthesis relies on training low-level skill policies and explicitly switching with a high-level policy or a hand-designed finite state machine, leading to quasi-static behaviors. In contrast, dynamic tasks such as soccer require the robot to run towards the ball, decelerate to an optimal approach to dribble, and eventually kick a goal - a continuum of smooth motion. To this end, we propose Preferenced Oracle Guided Multi-mode Policies (OGMP) to learn a single policy mastering all the required modes and preferred sequence of transitions to solve uni-object loco-manipulation tasks. We design hybrid automatons as oracles to generate references with continuous dynamics and discrete mode jumps to perform a guided policy optimization through bounded exploration. To enforce learning a desired sequence of mode transitions, we present a task-agnostic preference reward that enhances performance. The proposed approach demonstrates successful loco-manipulation for tasks like soccer and moving boxes omnidirectionally through whole-body control. In soccer, a single policy learns to optimally reach the ball, transition to contact-rich dribbling, and execute successful goal kicks and ball stops. Leveraging the oracle's abstraction, we solve each loco-manipulation task on robots with varying morphologies, including HECTOR V1, Berkeley Humanoid, Unitree G1, and H1, using the same reward definition and weights.
comment: 7 pages, 8 figures
BaTCAVe: Trustworthy Explanations for Robot Behaviors
Black box neural networks are an indispensable part of modern robots. Nevertheless, deploying such high-stakes systems in real-world scenarios poses significant challenges when the stakeholders, such as engineers and legislative bodies, lack insights into the neural networks' decision-making process. Presently, explainable AI is primarily tailored to natural language processing and computer vision, falling short in two critical aspects when applied in robots: grounding in decision-making tasks and the ability to assess trustworthiness of their explanations. In this paper, we introduce a trustworthy explainable robotics technique based on human-interpretable, high-level concepts that attribute to the decisions made by the neural network. Our proposed technique provides explanations with associated uncertainty scores for the explanation by matching neural network's activations with human-interpretable visualizations. To validate our approach, we conducted a series of experiments with various simulated and real-world robot decision-making models, demonstrating the effectiveness of the proposed approach as a post-hoc, human-friendly robot diagnostic tool.
comment: 19 pages, 26 figures
SiLVR: Scalable Lidar-Visual Radiance Field Reconstruction with Uncertainty Quantification
We present a neural radiance field (NeRF) based large-scale reconstruction system that fuses lidar and vision data to generate high-quality reconstructions that are geometrically accurate and capture photorealistic texture. Our system adopts the state-of-the-art NeRF representation to incorporate lidar. Adding lidar data adds strong geometric constraints on the depth and surface normals, which is particularly useful when modelling uniform texture surfaces which contain ambiguous visual reconstruction cues. A key contribution of this work is a novel method to quantify the epistemic uncertainty of the lidar-visual NeRF reconstruction by estimating the spatial variance of each point location in the radiance field given the sensor observations from the cameras and lidar. This provides a principled approach to evaluate the contribution of each sensor modality to the final reconstruction. In this way, reconstructions that are uncertain (due to e.g. uniform visual texture, limited observation viewpoints, or little lidar coverage) can be identified and removed. Our system is integrated with a real-time lidar SLAM system which is used to bootstrap a Structure-from-Motion (SfM) reconstruction procedure. It also helps to properly constrain the overall metric scale which is essential for the lidar depth loss. The refined SLAM trajectory can then be divided into submaps using Spectral Clustering to group sets of co-visible images together. This submapping approach is more suitable for visual reconstruction than distance-based partitioning. Our uncertainty estimation is particularly effective when merging submaps as their boundaries often contain artefacts due to limited observations. We demonstrate the reconstruction system using a multi-camera, lidar sensor suite in experiments involving both robot-mounted and handheld scanning. Our test datasets cover a total area of more than 20,000 square metres.
comment: Accepted by T-RO. Webpage: https://dynamic.robots.ox.ac.uk/projects/silvr/
Autonomy Architectures for Safe Planning in Unknown Environments Under Budget Constraints
Mission planning can often be formulated as a constrained control problem under multiple path constraints (i.e., safety constraints) and budget constraints (i.e., resource expenditure constraints). In a priori unknown environments, verifying that an offline solution will satisfy the constraints for all time can be difficult, if not impossible. We present ReRoot, a novel sampling-based framework that enforces safety and budget constraints for nonlinear systems in unknown environments. The main idea is that ReRoot grows multiple reverse RRT* trees online, starting from renewal sets, i.e., sets where the budget constraints are renewed. The dynamically feasible backup trajectories guarantee safety and reduce resource expenditure, which provides a principled backup policy when integrated into the gatekeeper safety verification architecture. We demonstrate our approach in simulation with a fixed-wing UAV in a GNSS-denied environment with a budget constraint on localization error that can be renewed at visual landmarks.
comment: Code: https://github.com/dcherenson/budget-constrained-planning
Estimating Dynamic Soft Continuum Robot States From Boundaries
State estimation is one of the fundamental problems in robotics. For soft continuum robots, this task is particularly challenging because their states (poses, strains, internal wrenches, and velocities) are inherently infinite-dimensional functions due to their continuous deformability. Traditional sensing techniques, however, can only provide discrete measurements. Recently, a dynamic state estimation method known as a boundary observer was introduced, which uses Cosserat rod theory to recover all infinite-dimensional states by measuring only the tip velocity. In this work, we present a dual design that instead relies on measuring the internal wrench at the robot's base. Despite the duality, this new approach offers a key practical advantage: it requires only a force/torque (FT) sensor embedded at the base and eliminates the need for external motion capture systems. Both observer types are inspired by principles of energy dissipation and can be naturally combined to enhance performance. We conduct a Lyapunov-based analysis to study the convergence rate of these boundary observers and reveal a useful property: as the observer gains increase, the convergence rate initially improves and then degrades. This convex trend enables efficient tuning of the observer gains. We also identify special cases where linear and angular states are fully determined by each other, which further relaxes sensing requirements. Experimental studies using a tendon-driven continuum robot validate the convergence of all observer variants under fast dynamic motions, the existence of optimal gains, robustness against unknown external forces, and the algorithm's real-time computational performance.
Multiagent Systems
Multi-Objective Multi-Agent Path Finding with Lexicographic Cost Preferences
Many real-world scenarios require multiple agents to coordinate in shared environments, while balancing trade-offs between multiple, potentially competing objectives. Current multi-objective multi-agent path finding (MO-MAPF) algorithms typically produce conflict-free plans by computing Pareto frontiers. They do not explicitly optimize for user-defined preferences, even when the preferences are available, and scale poorly with the number of objectives. We propose a lexicographic framework for modeling MO-MAPF, along with an algorithm, Lexicographic Conflict-Based Search (LCBS), that directly computes a single solution aligned with a lexicographic preference over objectives. LCBS integrates a priority-aware low-level A* search with conflict-based search, avoiding Pareto frontier construction and enabling efficient planning guided by preference over objectives. We provide insights into optimality and scalability, and empirically demonstrate that LCBS computes optimal solutions while scaling to instances with up to ten objectives, far beyond the limits of existing MO-MAPF methods. Evaluations on standard and randomized MAPF benchmarks show consistently higher success rates against state-of-the-art baselines, especially with an increasing number of objectives.
comment: 8 pages, 7 figures
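The lexicographic preference that LCBS optimizes can be illustrated with a small sketch. This is not the LCBS algorithm itself, only the ordering it relies on: cost vectors are compared on the most preferred objective first, and later objectives only break ties. The helper names, the three objectives, and the example plans are hypothetical.

```python
from typing import Sequence, Tuple

Cost = Tuple[float, ...]


def reorder(cost: Sequence[float], priority: Sequence[int]) -> Cost:
    """Arrange objective values from most to least preferred."""
    return tuple(cost[i] for i in priority)


def lex_best(candidates: Sequence[Cost], priority: Sequence[int]) -> Cost:
    """Pick the candidate whose reordered cost is lexicographically smallest.

    Python compares tuples element by element, which is exactly a
    lexicographic order over the reordered objectives.
    """
    return min(candidates, key=lambda c: reorder(c, priority))


# Hypothetical plans with costs (path length, energy, risk).
plans = [(10.0, 5.0, 0.2), (10.0, 4.0, 0.9), (11.0, 1.0, 0.0)]

best_len_first = lex_best(plans, priority=[0, 1, 2])   # length, then energy, then risk
best_risk_first = lex_best(plans, priority=[2, 0, 1])  # risk, then length, then energy
print(best_len_first, best_risk_first)
```

A single ordered comparison returns one preferred solution per preference ordering, whereas a Pareto-based method would have to keep all three of these plans because none dominates another.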
A Multi-Agent Framework for Stateful Inference-Time Search
Recent work explores agentic inference-time techniques to perform structured, multi-step reasoning. However, stateless inference often struggles on multi-step tasks due to the absence of persistent state. Moreover, task-specific fine-tuning or instruction-tuning often achieves surface-level code generation but remains brittle on tasks requiring deeper reasoning and long-horizon dependencies. To address these limitations, we propose stateful multi-agent evolutionary search, a training-free framework that departs from prior stateless approaches by combining (i) persistent inference-time state, (ii) adversarial mutation, and (iii) evolutionary preservation. We demonstrate its effectiveness in automated unit test generation through the generation of edge cases. We generate robust edge cases using an evolutionary search process, where specialized agents sequentially propose, mutate, and score candidates. A controller maintains persistent state across generations, while evolutionary preservation ensures diversity and exploration across all possible cases. This yields a generalist agent capable of discovering robust, high-coverage edge cases across unseen codebases. Experiments show our stateful multi-agent inference framework achieves substantial gains in coverage over stateless single-step baselines, evaluated on prevalent unit-testing benchmarks such as HumanEval and TestGenEvalMini and using three diverse LLM families - Llama, Gemma, and GPT. These results indicate that combining persistent inference-time state with evolutionary search materially improves unit-test generation.
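The propose, mutate, and score loop with a persistent controller state can be sketched with a toy example. This is only an illustration under strong assumptions: the LLM agents are replaced by simple random functions, the "codebase" is a single hypothetical function `classify`, and coverage is approximated by the set of distinct return labels reached. None of these names come from the paper.

```python
import random
from dataclasses import dataclass, field
from typing import List, Set


def classify(x: int) -> str:
    """Toy function under test with several branches."""
    if x < 0:
        return "negative"
    if x == 0:
        return "zero"
    if x > 1000:
        return "large"
    return "small"


@dataclass
class SearchState:
    """Persistent state the controller carries across generations."""
    archive: List[int] = field(default_factory=list)  # preserved edge cases
    covered: Set[str] = field(default_factory=set)    # branches reached so far


def propose(state: SearchState, rng: random.Random) -> List[int]:
    """Stand-in proposer: fresh random inputs plus the preserved archive."""
    return [rng.randint(-5, 5) for _ in range(4)] + list(state.archive)


def mutate(candidates: List[int], rng: random.Random) -> List[int]:
    """Stand-in mutator: push candidates toward extreme values."""
    return candidates + [c * rng.choice([-1000, -1, 10, 1000]) for c in candidates]


def score(candidate: int, state: SearchState) -> int:
    """Reward a candidate only if it reaches a branch not covered yet."""
    return 0 if classify(candidate) in state.covered else 1


def run_search(generations: int, seed: int) -> SearchState:
    rng = random.Random(seed)
    state = SearchState()
    for _ in range(generations):
        for cand in mutate(propose(state, rng), rng):
            if score(cand, state) > 0:
                state.archive.append(cand)          # preservation step
                state.covered.add(classify(cand))   # update persistent state
    return state


final_state = run_search(generations=5, seed=0)
print(sorted(final_state.covered), final_state.archive)
```

Because the state persists, later generations are scored against everything found so far, so the archive holds one representative input per newly reached branch instead of rediscovering the same cases.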
FURINA: A Fully Customizable Role-Playing Benchmark via Scalable Multi-Agent Collaboration Pipeline
As large language models (LLMs) advance in role-playing (RP) tasks, existing benchmarks quickly become obsolete due to their narrow scope, outdated interaction paradigms, and limited adaptability across diverse application scenarios. To address this gap, we introduce FURINA-Builder, a novel multi-agent collaboration pipeline that automatically constructs fully customizable RP benchmarks at any scale. As the first benchmark builder in the RP area for adaptable assessment, it enables evaluation of arbitrary characters across diverse scenarios and prompt formats. FURINA-Builder simulates dialogues between a test character and other characters drawn from a well-constructed character-scene pool, while an LLM judge selects fine-grained evaluation dimensions and adjusts the test character's responses into final test utterances. Using this pipeline, we build FURINA-Bench, a new comprehensive role-playing benchmark featuring both established and synthesized test characters, each assessed with dimension-specific evaluation criteria. Human evaluation and preliminary separability analysis justify our pipeline and benchmark design. We conduct extensive evaluations of cutting-edge LLMs and find that o3 and DeepSeek-R1 achieve the best performance on English and Chinese RP tasks, respectively. Across all models, established characters consistently outperform synthesized ones, with reasoning capabilities further amplifying this disparity. Interestingly, we observe that model scale does not monotonically reduce hallucinations. More critically, for reasoning LLMs, we uncover a novel trade-off: reasoning improves RP performance but simultaneously increases RP hallucinations. This trade-off extends to a broader Pareto frontier between RP performance and reliability for all LLMs. These findings demonstrate the effectiveness of FURINA-Builder and the challenge posed by FURINA-Bench.
Measuring and Mitigating Identity Bias in Multi-Agent Debate via Anonymization
Multi-agent debate (MAD) aims to improve large language model (LLM) reasoning by letting multiple agents exchange answers and then aggregate their opinions. Yet recent studies reveal that agents are not neutral: they are prone to identity-driven sycophancy and self-bias, uncritically adopting a peer's view or stubbornly adhering to their own prior output, undermining the reliability of debate. In this work, we present the first principled framework that joins sycophancy and self-bias to mitigate and quantify identity bias in MAD. First, we formalize the debate dynamics as an identity-weighted Bayesian update process. Second, we propose response anonymization: by removing identity markers from prompts, agents cannot distinguish "self" from "peer", which forces equal weights on agent identity, thereby reducing bias. Third, we define the Identity Bias Coefficient (IBC), a principled metric that measures how often an agent follows a peer versus itself. Empirical studies across multiple models, datasets and debate rounds confirm that identity bias is widespread, with sycophancy far more common than self-bias. Our findings highlight the need to "mask" identity to ensure that MAD systems reason based on content rather than source identity. Code is released in https://github.com/deeplearning-wisc/MAD-identity-bias.
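Response anonymization and a follow-rate style measurement can be sketched as below. This is a minimal illustration, not the paper's implementation: the prompt wording, the function names, and the gap statistic are assumptions, and the statistic is a simple stand-in rather than the paper's definition of the Identity Bias Coefficient.

```python
import random
from typing import List, Sequence, Tuple


def build_prompt(question: str, own: str, peers: Sequence[str],
                 anonymize: bool, rng: random.Random) -> str:
    """Build a debate-round prompt, with or without identity markers."""
    if anonymize:
        # Drop "self"/"peer" labels and shuffle so position carries no identity.
        responses = [own] + list(peers)
        rng.shuffle(responses)
        lines = [f"Response: {r}" for r in responses]
    else:
        lines = [f"Your previous answer: {own}"]
        lines += [f"Agent {i + 1}'s answer: {p}" for i, p in enumerate(peers)]
    return question + "\n" + "\n".join(lines)


def follow_rate_gap(records: List[Tuple[str, str, str]]) -> float:
    """Hypothetical stand-in statistic: among rounds where the agent and a peer
    disagreed, the rate of switching to the peer minus the rate of keeping its
    own answer. Each record is (own_previous, peer_previous, new_answer)."""
    disagreements = [r for r in records if r[0] != r[1]]
    if not disagreements:
        return 0.0
    follow_peer = sum(1 for own, peer, new in disagreements if new == peer)
    keep_self = sum(1 for own, peer, new in disagreements if new == own)
    return (follow_peer - keep_self) / len(disagreements)


rng = random.Random(0)
named_prompt = build_prompt("What is 2 + 2?", "4", ["5", "4"], False, rng)
anon_prompt = build_prompt("What is 2 + 2?", "4", ["5", "4"], True, rng)
gap = follow_rate_gap([("A", "B", "B"), ("A", "B", "A"), ("A", "B", "B"), ("A", "A", "A")])
print(anon_prompt)
print(round(gap, 3))
```

In this sketch a positive gap indicates a tendency toward following peers (sycophancy) and a negative gap a tendency toward self-bias; with anonymized prompts the agent has no label to condition on.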
Code Like Humans: A Multi-Agent Solution for Medical Coding EMNLP
In medical coding, experts map unstructured clinical notes to alphanumeric codes for diagnoses and procedures. We introduce Code Like Humans: a new agentic framework for medical coding with large language models. It implements official coding guidelines for human experts, and it is the first solution that can support the full ICD-10 coding system (+70K labels). It achieves the best performance to date on rare diagnosis codes (fine-tuned discriminative classifiers retain an advantage for high-frequency codes, to which they are limited). Towards future work, we also contribute an analysis of system performance and identify its 'blind spots' (codes that are systematically undercoded).
comment: EMNLP Findings 2025
GPS-MTM: Capturing Pattern of Normalcy in GPS-Trajectories with self-supervised learning
Foundation models have driven remarkable progress in text, vision, and video understanding, and are now poised to unlock similar breakthroughs in trajectory modeling. We introduce the GPS-Masked Trajectory Transformer (GPS-MTM), a foundation model for large-scale mobility data that captures patterns of normalcy in human movement. Unlike prior approaches that flatten trajectories into coordinate streams, GPS-MTM decomposes mobility into two complementary modalities: states (point-of-interest categories) and actions (agent transitions). Leveraging a bi-directional Transformer with a self-supervised masked modeling objective, the model reconstructs missing segments across modalities, enabling it to learn rich semantic correlations without manual labels. Across benchmark datasets, including Numosim-LA, Urban Anomalies, and Geolife, GPS-MTM consistently outperforms baselines on downstream tasks such as trajectory infilling and next-stop prediction. Its advantages are most pronounced in dynamic tasks (inverse and forward dynamics), where contextual reasoning is critical. These results establish GPS-MTM as a robust foundation model for trajectory analytics, positioning mobility data as a first-class modality for large-scale representation learning. Code is released for further reference.
comment: 4 pages, 2 figures
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science
Large Language Model (LLM) agents have shown great potential in addressing real-world data science problems. LLM-driven data science agents promise to automate the entire machine learning pipeline, yet their real-world effectiveness remains limited. Existing frameworks depend on rigid, pre-defined workflows and inflexible coding strategies; consequently, they excel only on relatively simple, classical problems and fail to capture the empirical expertise that human practitioners bring to complex, innovative tasks. In this work, we introduce AutoMind, an adaptive, knowledgeable LLM-agent framework that overcomes these deficiencies through three key advances: (1) a curated expert knowledge base that grounds the agent in domain expert knowledge, (2) an agentic knowledgeable tree search algorithm that strategically explores possible solutions, and (3) a self-adaptive coding strategy that dynamically tailors code generation to task complexity. Evaluations on two automated data science benchmarks demonstrate that AutoMind delivers superior performance versus state-of-the-art baselines. Additional analyses confirm favorable effectiveness, efficiency, and qualitative solution quality, highlighting AutoMind as an efficient and robust step toward fully automated data science. Code is at https://github.com/innovatingAI/AutoMind.
comment: Ongoing work
KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality
Large Language Models (LLMs), particularly slow-thinking models, often exhibit severe hallucination, outputting incorrect content due to an inability to accurately recognize knowledge boundaries during reasoning. While Reinforcement Learning (RL) can enhance complex reasoning abilities, its outcome-oriented reward mechanism often lacks factual supervision over the thinking process, further exacerbating the hallucination problem. To address the high hallucination in slow-thinking models, we propose Knowledge-enhanced RL, KnowRL. KnowRL guides models to perform fact-based slow thinking by integrating a factuality reward, based on knowledge verification, into the RL training process, helping them recognize their knowledge boundaries. This targeted factual input during RL training enables the model to learn and internalize fact-based reasoning strategies. By directly rewarding adherence to facts within the reasoning steps, KnowRL fosters a more reliable thinking process. Experimental results on three hallucination evaluation datasets and two reasoning evaluation datasets demonstrate that KnowRL effectively mitigates hallucinations in slow-thinking models while maintaining their original strong reasoning capabilities. Our code is available at https://github.com/zjunlp/KnowRL.
comment: Work in progress
Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking a learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns structured operations, including ADD, UPDATE, DELETE, and NOOP; and an Answer Agent that pre-selects and reasons over relevant entries. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management with minimal supervision. With only 152 training QA pairs, Memory-R1 outperforms strong baselines and generalizes across diverse question types, three benchmarks (LoCoMo, MSC, LongMemEval), and multiple model scales (3B-14B).
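A minimal sketch of the four memory operations named in the abstract (ADD, UPDATE, DELETE, NOOP), applied to a toy in-memory store. The `MemoryBank` class, its fields, and the example entries are hypothetical illustrations; the abstract does not describe the actual data structures, and in Memory-R1 the choice of operation is learned by the Memory Manager rather than hard-coded as it is here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MemoryBank:
    """Toy external memory: maps an integer entry id to a text string."""
    entries: Dict[int, str] = field(default_factory=dict)
    next_id: int = 0

    def apply(self, op: str, text: Optional[str] = None,
              entry_id: Optional[int] = None) -> None:
        """Apply one of the four operations listed in the abstract."""
        if op == "ADD":
            # Store a new entry under a fresh id.
            self.entries[self.next_id] = text
            self.next_id += 1
        elif op == "UPDATE":
            # Overwrite an existing entry; refuse to update a missing one.
            if entry_id not in self.entries:
                raise KeyError(entry_id)
            self.entries[entry_id] = text
        elif op == "DELETE":
            # Remove an entry if it exists.
            self.entries.pop(entry_id, None)
        elif op == "NOOP":
            # Leave the memory unchanged.
            pass
        else:
            raise ValueError(f"unknown operation: {op}")


# Hypothetical sequence of decisions a memory manager might emit.
bank = MemoryBank()
bank.apply("ADD", text="User adopted a dog named Buddy.")
bank.apply("ADD", text="User lives in Lisbon.")
bank.apply("UPDATE", text="User adopted two dogs, Buddy and Scout.", entry_id=0)
bank.apply("DELETE", entry_id=1)
bank.apply("NOOP")
```

After this sequence the store holds a single, updated entry under id 0.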
Behavioral alignment in social networks
The orderly behaviors observed in large-scale groups, such as fish schooling and the organized movement of crowds, are both ubiquitous and essential for the survival and stability of these systems. Understanding how such complex collective behaviors emerge from simple local interactions and behavioral adjustments is a significant scientific challenge. Historically, research has predominantly focused on imitation and social learning, where individuals adopt the strategies of more successful peers to refine their behavior. However, in recent years, an alternative learning approach based on self-exploration and introspective learning has garnered increasing attention. In this paradigm, individuals assess their own circumstances and select strategies that best align with their specific conditions. Two examples are coordination and anti-coordination, where individuals align with and diverge from the local majority, respectively. In this study, we analyze networked systems of coordinating and anti-coordinating individuals, exploring the combined effects of system dynamics, network structure, and behavioral patterns. We address several practical questions, including the number of equilibria, their characteristics, the equilibrium time, and the resilience of the system. We find that the number of equilibrium states can be extremely large, even increasing exponentially with minor alterations to the network structure. Moreover, the network structure has a significant impact on the average equilibrium time. Despite the complexity of these findings, we find that variations can be captured by a single, simple network characteristic (the average path length), which we illustrate in both synthetic and empirical networks.
CodeCureAgent: Automatic Classification and Repair of Static Analysis Warnings
Static analysis tools are widely used to detect bugs, vulnerabilities, and code smells. Traditionally, developers must resolve these warnings manually. Because this process is tedious, developers sometimes ignore warnings, leading to an accumulation of warnings and a degradation of code quality. This paper presents CodeCureAgent, an approach that harnesses LLM-based agents to automatically analyze, classify, and repair static analysis warnings. Unlike previous work, our method does not follow a predetermined algorithm. Instead, we adopt an agentic framework that iteratively invokes tools to gather additional information from the codebase (e.g., via code search) and edit the codebase to resolve the warning. CodeCureAgent detects and suppresses false positives, while fixing true positives when identified. We equip CodeCureAgent with a three-step heuristic to approve patches: (1) build the project, (2) verify that the warning disappears without introducing new warnings, and (3) run the test suite. We evaluate CodeCureAgent on a dataset of 1,000 SonarQube warnings found in 106 Java projects and covering 291 distinct rules. Our approach produces plausible fixes for 96.8% of the warnings, outperforming state-of-the-art baseline approaches by 30.7% and 29.2% in plausible-fix rate, respectively. Manual inspection of 291 cases reveals a correct-fix rate of 86.3%, showing that CodeCureAgent can reliably repair static analysis warnings. The approach incurs LLM costs of about 2.9 cents (USD) and an end-to-end processing time of about four minutes per warning. We envision CodeCureAgent helping to clean existing codebases and being integrated into CI/CD pipelines to prevent the accumulation of static analysis warnings.
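The three-step patch approval check described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: `approve_patch`, its callable parameters, and the string warning identifiers are all hypothetical, and a real pipeline would invoke the build system, the static analyzer, and the test runner where this sketch takes plain callables.

```python
from typing import Callable, Set


def approve_patch(build_ok: Callable[[], bool],
                  warnings_after: Callable[[], Set[str]],
                  tests_ok: Callable[[], bool],
                  warnings_before: Set[str],
                  target: str) -> bool:
    """Return True only if the patch passes all three checks in order."""
    # Step 1: the project must still build.
    if not build_ok():
        return False
    # Step 2: the target warning must be gone and no new warning may appear.
    after = warnings_after()
    if target in after or not after <= warnings_before:
        return False
    # Step 3: the test suite must pass.
    return tests_ok()


# Hypothetical warning identifiers, for illustration only.
before = {"W1", "W2"}
accepted = approve_patch(lambda: True, lambda: {"W2"}, lambda: True, before, "W1")
rejected = approve_patch(lambda: True, lambda: {"W2", "W3"}, lambda: True, before, "W1")
```

Ordering the checks from cheapest to most expensive means a patch that breaks the build is rejected before the analyzer or the tests are run.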
Consistent Opponent Modeling of Static Opponents in Imperfect-Information Games
The goal of agents in multi-agent environments is to maximize total reward against the opposing agents that are encountered. Following a game-theoretic solution concept, such as Nash equilibrium, may obtain a strong performance in some settings; however, such approaches fail to capitalize on historical and observed data from repeated interactions against our opponents. Opponent modeling algorithms integrate machine learning techniques to exploit suboptimal opponents utilizing available data; however, the effectiveness of such approaches in imperfect-information games to date is quite limited. We show that existing opponent modeling approaches fail to satisfy a simple desirable property even against static opponents drawn from a known prior distribution; namely, they do not guarantee that the model approaches the opponent's true strategy even in the limit as the number of game iterations approaches infinity. We develop a new algorithm that is able to achieve this property and runs efficiently by solving a convex minimization problem based on the sequence-form game representation using projected gradient descent. The algorithm is guaranteed to efficiently converge to the opponent's true strategy given observations from gameplay and possibly additional historical data if it is available.
The Price of Uncertainty for Social Consensus
How hard is it to achieve consensus in a social network under uncertainty? In this paper we model this problem as a social graph of agents where each vertex is initially colored red or blue. The goal of the agents is to achieve consensus, which is when the colors of all agents align. Agents attempt to do this locally through steps in which an agent changes their color to the color of the majority of their neighbors. In real life, agents may not know exactly how many of their neighbors are red or blue, which introduces uncertainty into this process. Modeling uncertainty as perturbations of relative magnitude $1+\varepsilon$ to these color neighbor counts, we show that even small values of $\varepsilon$ greatly hinder the ability to achieve consensus in a social network. We prove theoretically tight upper and lower bounds on the price of uncertainty, a metric defined by Balcan et al. to quantify the effect of uncertainty in network games.
comment: 17 pages
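To make the effect of uncertainty concrete, the sketch below shows a single majority update and a check for whether perturbed neighbor counts could flip it. The perturbation model is an assumption made for illustration: each perceived count may be scaled by any factor in $[1/(1+\varepsilon),\, 1+\varepsilon]$. The abstract does not spell out the exact model, and the function names are hypothetical.

```python
def majority_color(red: int, blue: int, current: str) -> str:
    """Exact majority update: adopt the majority color, keep current on a tie."""
    if red > blue:
        return "red"
    if blue > red:
        return "blue"
    return current


def perturbation_can_flip(red: int, blue: int, eps: float) -> bool:
    """Assumed model: each count may be scaled by a factor in [1/(1+eps), 1+eps].

    The perceived majority can differ from the true one when the inflated
    minority count exceeds the deflated majority count.
    """
    minority, majority = min(red, blue), max(red, blue)
    return minority * (1 + eps) > majority / (1 + eps)


# With 10 red and 11 blue neighbors, a 10% perturbation is enough to flip
# the perceived majority; with 5 red and 10 blue it is not.
close_call = perturbation_can_flip(10, 11, 0.1)
clear_call = perturbation_can_flip(5, 10, 0.1)
```

Under this assumed model, only agents whose neighborhoods are nearly balanced are vulnerable, which is one way to see why even small $\varepsilon$ can matter in graphs with many near-ties.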
$\textit{Agents Under Siege}$: Breaking Pragmatic Multi-Agent LLM Systems with Optimized Prompt Attacks
Most discussions about Large Language Model (LLM) safety have focused on single-agent settings, but multi-agent LLM systems now create novel adversarial risks because their behavior depends on communication between agents and decentralized reasoning. In this work, we focus on attacking pragmatic systems that have constraints such as limited token bandwidth, latency between message delivery, and defense mechanisms. We design a $\textit{permutation-invariant adversarial attack}$ that optimizes prompt distribution across latency- and bandwidth-constrained network topologies to bypass distributed safety mechanisms within the system. Formulating the attack path as a $\textit{maximum-flow minimum-cost}$ problem, coupled with the novel $\textit{Permutation-Invariant Evasion Loss (PIEL)}$, we leverage graph-based optimization to maximize attack success rate while minimizing detection risk. Evaluating across models including $\texttt{Llama}$, $\texttt{Mistral}$, $\texttt{Gemma}$, $\texttt{DeepSeek}$ and other variants on various datasets like $\texttt{JailBreakBench}$ and $\texttt{AdversarialBench}$, our method outperforms conventional attacks by up to $7\times$, exposing critical vulnerabilities in multi-agent systems. Moreover, we demonstrate that existing defenses, including variants of $\texttt{Llama-Guard}$ and $\texttt{PromptGuard}$, fail to prevent our attack, emphasizing the urgent need for multi-agent specific safety mechanisms.
Chisme: Fully Decentralized Differentiated Deep Learning for IoT Intelligence
As end-user device capability increases and demand for intelligent services at the Internet's edge rises, distributed learning has emerged as a key enabling technology. Existing approaches like federated learning (FL) and decentralized FL (DFL) enable distributed learning among clients, while gossip learning (GL) approaches have emerged to address the potential challenges in resource-constrained, connectivity-challenged, infrastructure-less environments. However, most distributed learning approaches assume largely homogeneous data distributions and may not consider or exploit the heterogeneity of clients and their underlying data distributions. This paper introduces Chisme, a novel fully decentralized distributed learning algorithm designed to address the challenges of implementing robust intelligence in network edge contexts characterized by heterogeneous data distributions, episodic connectivity, and sparse network infrastructure. Chisme leverages cosine similarity-based data affinity heuristics calculated from received model exchanges to inform how much influence received models have when merging into the local model. By doing so, it facilitates stronger merging influence between clients with more similar model learning progressions, enabling clients to strategically balance between broader collaboration to build more general knowledge and more selective collaboration to build specific knowledge. We evaluate Chisme against contemporary approaches using image recognition and time-series prediction scenarios while considering different network connectivity conditions, representative of real-world distributed intelligent systems. Our experiments demonstrate that Chisme outperforms state-of-the-art edge intelligence approaches in almost every case: clients using Chisme exhibit faster training convergence, lower final loss after training, and lower performance disparity between clients.
comment: This work has been submitted to the IEEE PerCom 2026 for potential publication
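A rough sketch of similarity-weighted model merging in the spirit of the Chisme abstract. Everything specific here is an assumption: models are flat parameter lists, the affinity is the cosine similarity between the local and received parameters clipped at zero, and the merge is a weighted average. The abstract states only that cosine-similarity-based heuristics set the merging influence, so the exact quantities compared and the merge rule in the paper may differ.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge(local: Sequence[float],
          received: Sequence[Sequence[float]]) -> List[float]:
    """Weighted average of the local model (weight 1) and received models.

    Assumed heuristic: each received model is weighted by its cosine
    similarity to the local model, clipped at zero so that dissimilar
    models have no influence.
    """
    weights = [max(0.0, cosine(local, r)) for r in received]
    total = 1.0 + sum(weights)
    merged = []
    for i, x in enumerate(local):
        acc = x + sum(w * r[i] for w, r in zip(weights, received))
        merged.append(acc / total)
    return merged


# A similar neighbor pulls the local model toward it; an opposed one is ignored.
pulled = merge([1.0, 0.0], [[2.0, 0.0]])
ignored = merge([1.0, 0.0], [[-1.0, 0.0]])
```

Clipping negative similarities is one simple way to let a client collaborate selectively with peers whose models are progressing in a similar direction.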
Systems and Control (CS)
A Genetic Algorithm Approach to Anti-Jamming UAV Swarm Behavior
In recent years, Unmanned Aerial Vehicles (UAVs) have brought a true revolution to military tactics. While UAVs already constitute an advantage when operating alone, multi-UAV swarms expand the available possibilities, allowing the UAVs to collaborate and support each other as a team to carry out a given task. This entails the capability to exchange information related to situation awareness and action coordination by means of a suitable wireless communication technology. In such a scenario, the adversary is expected to disrupt communications by jamming the communication channel. The latter becomes the Achilles heel of the swarm. While anti-jamming techniques constitute a well-covered topic in the literature, the use of intelligent swarm behaviors to leverage those techniques is still an open research issue. This paper explores the use of Genetic Algorithms (GAs) to jointly optimize UAV swarm formation, beam-steering antennas and traffic routing in order to mitigate the effect of jamming in the main coordination channel, under the assumption that a more robust and low data rate channel is used for formation management signaling. Simulation results show the effectiveness of the proposed approach. However, the significant computational cost paves the way for further research.
comment: 8 pages, conference paper
Stability Preserving Safe Control of a Bicopter
This paper presents a control law for stabilization and trajectory tracking of a multicopter subject to safety constraints. The proposed approach guarantees forward invariance of a prescribed safety set while ensuring smooth tracking performance. Unlike conventional control barrier function methods, the constrained control problem is transformed into an unconstrained one using state-dependent mappings together with carefully constructed Lyapunov functions. This approach enables explicit synthesis of the control law, instead of requiring a solution of constrained optimization at each step. The transformation also enables the controller to enforce safety without sacrificing stability or performance. Simulation results for a polytopic reference trajectory confined within a designated safe region demonstrate the effectiveness of the proposed method.
From Neural Sensing to Stimulation: An Interdisciplinary Roadmap for Neurotechnology
Neurotechnologies are transforming how we measure, interpret, and modulate brain-body interactions, integrating real-time sensing, computation, and stimulation to enable precise physiological control. They hold transformative potential across clinical and non-clinical domains, from treating disorders to enhancing cognition and performance. Realizing this potential requires navigating complex, interdisciplinary challenges spanning neuroscience, materials science, device engineering, signal processing, computational modelling, and regulatory and ethical frameworks. This Perspective presents a strategic roadmap for neurotechnology development, created by early-career researchers, highlighting their role at the intersection of disciplines and their capacity to bridge traditional silos. We identify five cross-cutting trade-offs that constrain progress across functionality, scalability, adaptability, and translatability, and illustrate how technical domains influence their resolution. Rather than a domain-specific review, we focus on shared challenges and strategic opportunities that transcend disciplines. We propose a unified framework for collaborative innovation and education, highlight ethical and regulatory priorities, and outline a timeline for overcoming key bottlenecks. By aligning technical development with translational and societal needs, this roadmap aims to accelerate equitable, effective, and future-ready adaptive neurotechnologies, guiding coordinated efforts across the global research and innovation community.
Identification and optimal control strategies for the transversal splitting of ultra-cold Bose gases
Splitting a Bose-Einstein condensate (BEC) is a key operation in fundamental physics experiments and emerging quantum technologies, where precise preparation of well-defined initial states requires fast yet coherent control of the condensate's nonlinear dynamics. This work formulates the BEC splitting process as an optimal feedforward control problem based on a physically interpretable, reduced-order model identified from limited experimental data. We introduce a systematic calibration strategy that combines optimal experiment selection and constrained nonlinear parameter estimation, enabling accurate system identification with minimal experimental overhead. Using this calibrated model, we compute energy-optimal trajectories via indirect optimal control to realize shortcuts to adiabaticity (STAs), achieving rapid transitions to the ground state of a double-well potential while suppressing excitations. Experiments confirm that the proposed control framework yields high-fidelity state transfers across multiple configurations, demonstrating its robustness and scalability for quantum control applications.
comment: To be published in IEEE Transactions on Control Systems Technology
Mitigating Increase-Decrease Gaming with Alternative Connection Agreements: A Defender-Attacker-Defender Game
Redispatch markets are widely used by system operators to manage network congestion. A well-known drawback, however, is that Flexibility Service Providers (FSPs) may strategically adjust their baselines in anticipation of redispatch actions, thereby aggravating congestion and raising system costs. To address this increase-decrease gaming, Distribution System Operators (DSOs) could use Alternative Connection Agreements (ACAs) to conditionally limit the available connection capacity of market participants in the day-ahead stage. In this paper, we present a novel Defender-Attacker-Defender game to investigate the potential of this approach in distribution networks under load and price uncertainty. We solve the resulting trilevel optimization model using a custom branch-and-bound algorithm, and we demonstrate that it efficiently solves the problem without exploring many nodes in the branch-and-bound search tree for most simulated scenarios. The case study demonstrates that applying ACAs can substantially lower redispatch costs (e.g. by 25%) for the DSO with only a limited impact on FSP profits. The effectiveness of the approach critically depends on how often the DSO can invoke ACAs and on the extent to which the DSO can anticipate strategic bidding behavior of the FSP.
Falsification-Driven Reinforcement Learning for Maritime Motion Planning
Compliance with maritime traffic rules is essential for the safe operation of autonomous vessels, yet training reinforcement learning (RL) agents to adhere to them is challenging. The behavior of RL agents is shaped by the training scenarios they encounter, but creating scenarios that capture the complexity of maritime navigation is non-trivial, and real-world data alone is insufficient. To address this, we propose a falsification-driven RL approach that generates adversarial training scenarios in which the vessel under test violates maritime traffic rules, which are expressed as signal temporal logic specifications. Our experiments on open-sea navigation with two vessels demonstrate that the proposed approach provides more relevant training scenarios and achieves more consistent rule compliance.
Decentralized CBF-based Safety Filters for Collision Avoidance of Cooperative Missile Systems with Input Constraints
This paper presents a decentralized safety filter for collision avoidance in multi-agent aerospace interception scenarios. The approach leverages robust control barrier functions (RCBFs) to guarantee forward invariance of safety sets under bounded inputs and high-relative-degree dynamics. Each effector executes its nominal cooperative guidance command, while a local quadratic program (QP) modifies the input only when necessary. Event-triggered activation based on range and zero-effort miss (ZEM) criteria ensures scalability by restricting active constraints to relevant neighbors. To resolve feasibility issues from simultaneous constraints, a slack-variable relaxation scheme is introduced that prioritizes critical agents in a Pareto-optimal manner. Simulation results in many-on-many interception scenarios demonstrate that the proposed framework maintains collision-free operation with minimal deviation from nominal guidance, providing a computationally efficient and scalable solution for safety-critical multi-agent aerospace systems.
comment: 7 pages, 5 figures
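The idea that a local QP "modifies the input only when necessary" can be illustrated with the simplest possible case: one affine safety constraint $a^\top u \ge b$, for which the minimum-deviation QP has a closed-form solution. This is a sketch under stated assumptions, not the paper's filter: it ignores input bounds, the slack-variable relaxation, event triggering, and multiple neighbors, and `a` and `b` stand in for whatever the barrier condition evaluates to at the current state.

```python
from typing import List, Sequence


def safety_filter(u_nom: Sequence[float],
                  a: Sequence[float],
                  b: float) -> List[float]:
    """Solve min ||u - u_nom||^2 subject to a . u >= b in closed form.

    If the nominal input already satisfies the constraint it is returned
    unchanged; otherwise it is projected onto the constraint boundary.
    """
    margin = sum(ai * ui for ai, ui in zip(a, u_nom)) - b
    if margin >= 0.0:
        return list(u_nom)  # nominal command is already safe
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("constraint cannot be satisfied: a is zero and b > 0")
    scale = -margin / norm_sq
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]


# Hypothetical numbers: the constraint requires the second input to be >= 0.5.
corrected = safety_filter([1.0, 0.0], a=[0.0, 1.0], b=0.5)
untouched = safety_filter([1.0, 1.0], a=[0.0, 1.0], b=0.5)
```

With several simultaneous constraints or bounded inputs the projection no longer has this one-line form, which is where a numerical QP solver and a relaxation scheme such as the one described in the abstract become necessary.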
Resilient Multi-Dimensional Consensus and Distributed Optimization against Agent-Based and Denial-of-Service Attacks
In this paper, we consider the resilient multi-dimensional consensus and distributed optimization problems of multi-agent systems (MASs) in the presence of both agent-based and denial-of-service (DoS) attacks. The considered agent-based attacks can cover malicious, Byzantine, and stubborn agents. The links between agents in the network can be blocked by DoS attacks, which may lead the digraph to be time-varying and even disconnected. The objective is to ensure that the remaining benign agents achieve consensus. To this end, an "auxiliary point"-based resilient control algorithm is proposed for MASs. Under the proposed algorithm, each healthy agent constructs a "safe kernel" utilizing the states of its in-neighbors and updates its state toward a specific point within this kernel at each iteration. If an agent cannot receive its neighbors' states owing to DoS attacks, it will use the states received immediately before the DoS period. Moreover, a resilient multi-dimensional distributed optimization (RMDO) algorithm is also proposed. Theoretical proofs and numerical examples are presented to demonstrate the effectiveness of the proposed algorithms.
Delay Independent Safe Control with Neural Networks: Positive Lur'e Certificates for Risk Aware Autonomy
We present a risk-aware safety certification method for autonomous, learning-enabled control systems. Focusing on two realistic risks, state/input delays and interval matrix uncertainty, we model the neural network (NN) controller with local sector bounds and exploit positivity structure to derive linear, delay-independent certificates that guarantee local exponential stability across admissible uncertainties. To benchmark performance, we adopt and implement a state-of-the-art IQC NN verification pipeline. On representative cases, our positivity-based tests run orders of magnitude faster than SDP-based IQC while certifying regimes the latter cannot, providing scalable safety guarantees that complement risk-aware control.
comment: Submitted to 2026 American Control Conference (ACC), New Orleans, LA
A Cascade of Systems and the Product of Their $θ$-Symmetric Scaled Relative Graphs
In this paper, we utilize a variant of the scaled relative graph (SRG), referred to as the $\theta$-symmetric SRG, to develop a graphical stability criterion for the feedback interconnection of a cascade of systems. A crucial submultiplicative property of $\theta$-symmetric SRG is established, enabling it to handle cyclic interconnections for which conventional graph separation methods are not applicable. By integrating both gain and refined phase information, the $\theta$-symmetric SRG provides a unified graphical characterization of the system, which better captures system properties and yields less conservative results. In the scalar case, the $\theta$-symmetric SRG can be reduced exactly to the scalar itself, whereas the standard SRG appears to be a conjugate pair. Consequently, the frequency-wise $\theta$-symmetric SRG is more suitable than the standard SRG as a multi-input multi-output extension of the classical Nyquist plot. Illustrative examples are included to demonstrate the effectiveness of the $\theta$-symmetric SRG.
comment: 9 pages, 4 figures
Safe Stabilization of the Stefan Problem with a High-Order Moving Boundary Dynamics by PDE Backstepping
This paper presents a safe stabilization of the Stefan PDE model with a moving boundary governed by high-order dynamics. We consider a parabolic PDE with a time-varying domain governed by a second-order response with respect to the Neumann boundary value of the PDE state at the moving boundary. The objective is to design a boundary heat flux control that stabilizes the moving boundary at a desired setpoint while satisfying the model's required conditions on the PDE state and the moving boundary. We apply a PDE backstepping method for the control design while accounting for a constraint on the control law. The PDE and moving boundary constraints are shown to be satisfied by applying the maximum principle for parabolic PDEs. Then the closed-loop system is shown to be globally exponentially stable by performing Lyapunov analysis. The proposed control is implemented in numerical simulation, which illustrates the desired performance in safety and stability. An outline of the extension to third-order moving boundary dynamics is also presented. Code is released at https://github.com/shumon0423/HighOrderStefan_CDC2025.git.
comment: 6 pages, 4 figures, 64th IEEE Conference on Decision and Control (CDC) 2025
Model Predictive Path Integral Control for Roll-to-Roll Manufacturing
Roll-to-roll (R2R) manufacturing is a continuous processing technology essential for scalable production of thin-film materials and printed electronics, but precise control remains challenging due to subsystem interactions, nonlinearities, and process disturbances. This paper proposes a Model Predictive Path Integral (MPPI) control formulation for R2R systems, leveraging a GPU-based Monte-Carlo sampling approach to efficiently approximate optimal controls online. Crucially, MPPI easily handles non-differentiable cost functions, enabling the incorporation of complex performance criteria relevant to advanced manufacturing processes. A case study is presented that demonstrates that MPPI significantly improves tension regulation performance compared to conventional model predictive control (MPC), highlighting its suitability for real-time control in advanced manufacturing.
comment: 6 pages, 4 figures
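A minimal, CPU-only sketch of one sampling-based MPPI update on a toy scalar integrator, included to show why non-differentiable costs pose no difficulty: the update only evaluates the cost along sampled rollouts and never differentiates it. The dynamics, the cost terms, and all parameter values are hypothetical stand-ins, not the roll-to-roll model or the GPU implementation from the paper, and the usual control-cost term is omitted for brevity.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def rollout_cost(x0: float, controls: Sequence[float],
                 target: float, dt: float = 0.1) -> float:
    """Cost of one rollout of the toy dynamics x <- x + u * dt."""
    x, cost = x0, 0.0
    for u in controls:
        x += u * dt
        cost += abs(x - target)  # non-differentiable at x == target
        cost += 10.0 if abs(x - target) > 1.0 else 0.0  # discontinuous band penalty
    return cost


def mppi_step(x0: float, nominal: Sequence[float], target: float,
              num_samples: int = 256, sigma: float = 1.0, lam: float = 1.0,
              rng: Optional[random.Random] = None
              ) -> Tuple[List[float], List[float]]:
    """One MPPI update: sample perturbations, weight them by exp(-cost / lam)."""
    rng = rng or random.Random(0)
    noises = [[rng.gauss(0.0, sigma) for _ in nominal]
              for _ in range(num_samples)]
    costs = [rollout_cost(x0, [u + e for u, e in zip(nominal, eps)], target)
             for eps in noises]
    best = min(costs)  # subtract the minimum cost for numerical stability
    raw = [math.exp(-(c - best) / lam) for c in costs]
    total = sum(raw)
    weights = [r / total for r in raw]
    # The new nominal sequence is the old one plus the weighted mean perturbation.
    updated = [u + sum(w * eps[t] for w, eps in zip(weights, noises))
               for t, u in enumerate(nominal)]
    return updated, weights


updated, weights = mppi_step(x0=0.0, nominal=[0.0] * 10, target=2.0)
```

In a receding-horizon loop, the first element of `updated` would be applied, the sequence shifted by one step, and the update repeated from the new state. The rollouts are independent of each other, which is what makes the method a natural fit for GPU parallelism.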
GATO: GPU-Accelerated and Batched Trajectory Optimization for Scalable Edge Model Predictive Control
While Model Predictive Control (MPC) delivers strong performance across robotics applications, solving the underlying (batches of) nonlinear trajectory optimization (TO) problems online remains computationally demanding. Existing GPU-accelerated approaches typically (i) parallelize a single solve to meet real-time deadlines, (ii) scale to very large batches at slower-than-real-time rates, or (iii) achieve speed by restricting model generality (e.g., point-mass dynamics or a single linearization). This leaves a large gap in solver performance for many state-of-the-art MPC applications that require real-time batches of tens to low-hundreds of solves. As such, we present GATO, an open source, GPU-accelerated, batched TO solver co-designed across algorithm, software, and computational hardware to deliver real-time throughput for these moderate batch size regimes. Our approach leverages a combination of block-, warp-, and thread-level parallelism within and across solves for ultra-high performance. We demonstrate the effectiveness of our approach through a combination of: simulated benchmarks showing speedups of 18-21x over CPU baselines and 1.4-16x over GPU baselines as batch size increases; case studies highlighting improved disturbance rejection and convergence behavior; and finally a validation on hardware using an industrial manipulator. We open source GATO to support reproducibility and adoption.
Accuracy, Memory Efficiency and Generalization: A Comparative Study on Liquid Neural Networks and Recurrent Neural Networks
This review aims to conduct a comparative analysis of liquid neural networks (LNNs) and traditional recurrent neural networks (RNNs) and their variants, such as long short-term memory networks (LSTMs) and gated recurrent units (GRUs). The core dimensions of the analysis include model accuracy, memory efficiency, and generalization ability. By systematically reviewing existing research, this paper explores the basic principles, mathematical models, key characteristics, and inherent challenges of these neural network architectures in processing sequential data. Research findings reveal that LNNs, as emerging, biologically inspired, continuous-time dynamic neural networks, demonstrate significant potential in handling noisy, non-stationary data and achieving out-of-distribution (OOD) generalization. Additionally, some LNN variants outperform traditional RNNs in terms of parameter efficiency and computational speed. However, RNNs remain a cornerstone in sequence modeling due to their mature ecosystem and successful applications across various tasks. This review identifies the commonalities and differences between LNNs and RNNs, summarizes their respective shortcomings and challenges, and points out valuable directions for future research, particularly emphasizing the importance of improving the scalability of LNNs to promote their application in broader and more complex scenarios.
comment: 13 pages, 12 figures. Submitted to IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
Adaptive Control Allocation for Underactuated Time-Scale Separated Non-Affine Systems
Many robotic systems are underactuated, meaning not all degrees of freedom can be directly controlled due to lack of actuators, input constraints, or state-dependent actuation. This property, compounded by modeling uncertainties and disturbances, complicates the control design process for trajectory tracking. In this work, we propose an adaptive control architecture for uncertain, nonlinear, underactuated systems with input constraints. Leveraging time-scale separation, we construct a reduced-order model where fast dynamics provide virtual inputs to the slower subsystem and use dynamic control allocation to select the optimal control inputs given the non-affine dynamics. To handle uncertainty, we introduce a state predictor-based adaptive law, and through singular perturbation theory and Lyapunov analysis, we prove stability and bounded tracking of reference trajectories. The proposed method is validated on a VTOL quadplane with nonlinear, state-dependent actuation, demonstrating its utility as a unified controller across various flight regimes, including cruise, landing transition, and hover.
comment: Code https://github.com/dcherenson/adaptive-control-underactuated
Optimal Batched Scheduling of Stochastic Processing Networks Using Atomic Action Decomposition
Stochastic processing networks (SPNs) have broad applications in healthcare, transportation, and communication networks. The control of SPN is to dynamically assign servers in batches under uncertainty to optimize long-run performance. This problem is challenging as the policy dimension grows exponentially with the number of servers, making standard reinforcement learning and policy optimization methods intractable at scale. We propose an atomic action decomposition framework that addresses this scalability challenge by breaking joint assignments into sequential single-server assignments. This yields policies with constant dimension, independent of the number of servers. We study two classes of atomic policies, the step-dependent and step-independent atomic policies, and prove that both achieve the same optimal long-run average reward as the original joint policies. These results establish that computing the optimal SPN control can be made scalable without loss of optimality using the atomic framework. Our results offer theoretical justification for the strong empirical success of the atomic framework in large-scale applications reported in previous articles.
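The scaling argument in the abstract above can be illustrated with a minimal sketch. It assumes a hypothetical setting in which each of `num_servers` servers is assigned one of `num_classes` job classes; the function names and the toy `round_robin` policy are illustrative and are not taken from the paper.

```python
from typing import Callable, List, Sequence


def joint_action_count(num_servers: int, num_classes: int) -> int:
    """Number of joint assignments when every server picks one class."""
    return num_classes ** num_servers


def atomic_rollout(
    num_servers: int,
    atomic_policy: Callable[[Sequence[int], int], int],
) -> List[int]:
    """Build a joint assignment one server at a time.

    atomic_policy receives the assignments made so far and the current
    step index, and returns the class for the next server. Its output
    space has a fixed size regardless of num_servers.
    """
    assignment: List[int] = []
    for step in range(num_servers):
        assignment.append(atomic_policy(assignment, step))
    return assignment


def round_robin(partial: Sequence[int], step: int, num_classes: int = 3) -> int:
    """Toy step-dependent atomic policy (hypothetical): cycle through classes."""
    return step % num_classes


if __name__ == "__main__":
    print(joint_action_count(10, 3))       # size of the joint action space
    print(atomic_rollout(5, round_robin))  # joint assignment built atomically
```

The joint action space grows as `num_classes ** num_servers`, while each atomic decision chooses among only `num_classes` options. The sketch shows the decomposition only; it does not reproduce the optimality result stated in the abstract.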
Optimization via a Control-Centric Framework
Optimization plays a central role in intelligent systems and cyber-physical technologies, where the speed and reliability of convergence directly impact performance. In control theory, optimization-centric methods are standard: controllers are designed by repeatedly solving optimization problems, as in linear quadratic regulation, $H_\infty$ control, and model predictive control. In contrast, this paper develops a control-centric framework for optimization itself, where algorithms are constructed directly from Lyapunov stability principles rather than being proposed first and analyzed afterward. A key element is the stationarity vector, which encodes first-order optimality conditions and enables Lyapunov-based convergence analysis. By pairing a Lyapunov function with a selectable decay law, we obtain continuous-time dynamics with guaranteed exponential, finite-time, fixed-time, or prescribed-time convergence. Within this framework, we introduce three feedback realizations of increasing restrictiveness: the Hessian-gradient, Newton, and gradient dynamics. Each realization shapes the decay of the stationarity vector to achieve the desired rate. These constructions unify unconstrained optimization, extend naturally to constrained problems via Lyapunov-consistent primal-dual dynamics, and broaden the results for minimax and generalized Nash equilibrium seeking problems beyond exponential stability. The framework provides systematic design tools for optimization algorithms in control and game-theoretic problems.
comment: This work has been submitted to the IEEE for possible publication. 12 pages, 3 figures
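As a worked illustration of pairing a Lyapunov function with a decay law, consider the unconstrained case with an exponential rate. The notation here ($g$ for the stationarity vector, $H$ for the Hessian, $\alpha > 0$ for the rate, $V$ for the Lyapunov function) is chosen for this sketch and may differ from the paper's; the Hessian is assumed invertible along the trajectory.

```latex
\begin{aligned}
g(x) &= \nabla f(x), \qquad V(x) = \tfrac{1}{2}\,\|g(x)\|^{2}, \\
\dot{x} &= -\alpha\, H(x)^{-1} g(x), \qquad H(x) = \nabla^{2} f(x), \\
\dot{g} &= H(x)\,\dot{x} = -\alpha\, g, \\
\dot{V} &= g^{\top}\dot{g} = -\alpha\,\|g\|^{2} = -2\alpha V
\;\;\Longrightarrow\;\; V(t) = V(0)\,e^{-2\alpha t}.
\end{aligned}
```

Under these assumptions the Newton-type feedback makes the stationarity vector decay exponentially at rate $\alpha$. Replacing the right-hand side of $\dot{V}$ with a different decay law would change the convergence type.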
Probabilistic Simulation of Aircraft Descent via a Physics-Informed Machine Learning Approach
This paper presents a method for generating probabilistic descent trajectories in simulations of real-world airspace. A dataset of 116,066 trajectories harvested from Mode S radar returns in UK airspace was used to train and test the model. Thirteen aircraft types with varying performance characteristics were investigated. It was found that the error in the mean prediction of time to reach the bottom of descent for the proposed method was less than that of the Base of Aircraft Data (BADA) model by a factor of 10. Furthermore, the method was capable of generating a range of trajectories that were similar to the held-out test dataset when analysed in distribution. The proposed method is hybrid, with aircraft drag and calibrated airspeed functions generated probabilistically to parameterise the BADA equations, ensuring the physical plausibility of generated trajectories.
Data-Driven Adaptive PID Control Based on Physics-Informed Neural Networks
This article proposes a data-driven PID controller design based on the principle of adaptive gain optimization, leveraging Physics-Informed Neural Networks (PINNs) generated for predictive modeling purposes. The proposed control design method utilizes gradients of the PID gain optimization, achieved through the automatic differentiation of PINNs, to apply model predictive control using a cost function based on tracking error and control inputs. By optimizing PINNs-based PID gains, the method achieves adaptive gain tuning that ensures stability while accounting for system nonlinearities. The proposed method features a systematic framework for integrating PINNs-based models of dynamical control systems into closed-loop control systems, enabling direct application to PID control design. A series of numerical experiments is conducted to demonstrate the effectiveness of the proposed method from the control perspectives based on both time and frequency domains.
comment: This work has been submitted to the IEEE Transactions on Control Systems Technology for possible publication
Robust Sensor Placement for Poisson Arrivals with False Alarm Aware Spatiotemporal Sensing
This paper studies sensor placement when detection performance varies stochastically due to environmental factors over space and time and false alarms are present, but a filter is used to attenuate the effect. We introduce a unified model that couples detection and false alarms through an availability function, which captures how false alarms reduce effective sensing and filtering responses to the disturbance. Building on this model, we give a sufficient condition under which filtering improves detection. In addition, we derive a coverage-based lower bound on the void probability. Furthermore, we prove robustness guarantees showing that performance remains stable when detection probabilities are learned from limited data. We validate the approach with numerical studies using AIS vessel-traffic data and synthetic maritime scenarios. Together, these results provide theory and practical guidance for deploying sensors in dynamic, uncertain environments.
comment: Submitted to IEEE ACC
Regular Pairings for Non-quadratic Lyapunov Functions and Contraction Analysis
Recent studies on stability and contractivity have highlighted the importance of semi-inner products, which we refer to as pairings, associated with general norms. A pairing is a binary operation that relates the derivative of a curve's norm to the radius-vector of the curve and its tangent. This relationship, known as the curve norm derivative formula, is crucial when using the norm as a Lyapunov function. Another important property of the pairing, used in stability and contraction criteria, is the so-called Lumer inequality, which relates the pairing to the induced logarithmic norm. We prove that the curve norm derivative formula and Lumer's inequality are, in fact, equivalent to each other and to several simpler properties. We then introduce and characterize regular pairings that satisfy all of these properties. Our results unify several independent theories of pairings (semi-inner products) developed in previous work on functional analysis and control theory. Additionally, we introduce the polyhedral max pairing and develop computational tools for polyhedral norms, advancing contraction theory in non-Euclidean spaces.
Reconfigurable Intelligent Surface-Assisted Cross-Layer Authentication for Secure and Efficient Vehicular Communications
Intelligent transportation systems increasingly depend on wireless communication for broadcasting traffic messages and facilitating real-time vehicular communication. In this context, message authentication is crucial for establishing secure and reliable communication. However, security solutions must consider the dynamic nature of vehicular communication links, which fluctuate between line-of-sight (LoS) and non-line-of-sight (NLoS) due to obstructions. This paper proposes a lightweight cross-layer authentication scheme that employs public-key infrastructure (PKI)-based authentication for initial legitimacy detection/handshaking while using key-based physical-layer re-authentication for message verification. This approach reduces signature generation and signaling overheads associated with each transmission, thereby enhancing network scalability. However, the receiver operating characteristic (ROC; Pd: detection vs. PFA: false alarm probabilities) of the latter decreases with lower signal-to-noise ratio (SNR). To address this, we investigate the use of reconfigurable intelligent surfaces (RISs) to strengthen the SNR directed toward the designated vehicle in shadowed areas (i.e., NLoS scenarios), thereby improving the ROC. Theoretical analysis and practical implementation are conducted using a 1-bit RIS consisting of 64 x 64 reflective meta-surfaces. Experimental results show a significant improvement in Pd, increasing from 0.82 to 0.96 at SNR = -6 dB for an orthogonal frequency-division multiplexing (OFDM) system with 128 subcarriers. We also conducted informal and formal security analyses using Burrows-Abadi-Needham (BAN) logic to prove the scheme's ability to resist passive and active attacks.
comment: 18 pages, 14 figures and 6 tables
Development of a magnetorheological hand exoskeleton featuring a high force-to-power ratio for enhanced grip endurance
Hand exoskeletons have significant potential in labor-intensive fields by mitigating hand grip fatigue, enhancing hand strength, and preventing injuries. However, most traditional hand exoskeletons are driven by motors, whose output force is limited under constrained installation conditions. They also suffer from high power consumption, complex and bulky assistive systems, and high instability. In this work, we develop a novel hand exoskeleton integrated with innovative magnetorheological (MR) clutches that offers a high force-to-power ratio to improve grip endurance. The clutch features an enhanced structure design, a micro roller enhancing structure, which can significantly boost output forces. The experimental data demonstrate that, when supplied with 2 V, the clutch can deliver a peak holding force of 381.15 N, 55 times that when no voltage is provided (7 N). In this scenario, it consumes only 1.38 W, yielding a force-to-power ratio of 256.75 N/W, which is 2.35 times higher than the best-reported actuator used for hand exoskeletons. This capability enables the designed MR hand exoskeleton to provide approximately 419.79 N of support force for gripping. The designed MR hand exoskeleton is highly integrated, comprising an exoskeleton frame, MR clutches, a control unit, and a battery. Evaluations through static grip endurance tests and dynamic carrying and lifting tests confirm that the MR hand exoskeleton can effectively reduce muscle fatigue, extend grip endurance, and minimize injuries. These findings highlight its strong potential for practical applications in repetitive tasks such as carrying and lifting in industrial settings.
HBS -- Hardware Build System: A Tcl-based, minimal common abstraction approach for build system for hardware designs
Build systems have become an indispensable part of the software implementation and deployment process. New programming languages are released with the build system integrated into the language tools, for example, Go, Rust, or Zig. However, in the hardware description domain, no official build systems have been released with the predominant Hardware Description Languages (HDL) such as VHDL or SystemVerilog. Moreover, hardware design projects are often multilanguage. The paper proposes a new build system for the hardware description domain. The system is called the Hardware Build System (HBS). The main goals of the system include simplicity, readability, a minimal number of dependencies, and ease of integration with the existing Electronic Design Automation (EDA) tools. The system proposes a novel, minimal common abstraction approach, whose particular implications are described in the article. All the core functionalities are implemented in Tcl. Only the EDA tool-independent features, such as dependency graph generation, are implemented in a Python wrapper.
Distributionally Robust System Level Synthesis With Output Feedback Affine Control Policy
This paper studies the finite-horizon robust optimal control of constrained linear systems subject to model mismatch and additive stochastic disturbances. Utilizing the system level synthesis (SLS) parameterization, we propose a novel SLS design using an output-feedback affine control policy and extend it to a distributionally robust setting to improve system resilience by minimizing the cost function while ensuring constraint satisfaction against the worst-case uncertainty distribution. The scopes of model mismatch and stochastic disturbances are quantified using the 1-norm and a Wasserstein metric-based ambiguity set, respectively. For the closed-loop dynamics, we analyze the distributional shift between the predicted output-input response -- computed using nominal parameters and empirical disturbance samples -- and the actual closed-loop distribution, highlighting its dependence on model mismatch and SLS parameterization. Assuming convex and Lipschitz continuous cost functions and constraints, we derive a tractable reformulation of the distributionally robust SLS (DR-SLS) problem by leveraging tools from robust control and distributionally robust optimization (DRO). Numerical experiments validate the performance and robustness of the proposed approach.
Control of Humanoid Robots with Parallel Mechanisms using Differential Actuation Models
Several recently released humanoid robots, inspired by the mechanical design of Cassie, employ actuator configurations in which the motors are displaced from the joints to reduce leg inertia. While studies accounting for the full kinematic complexity have demonstrated the benefits of these designs, the associated loop-closure constraints greatly increase computational cost and limit their use in control and learning. As a result, the non-linear transmission is often approximated by a constant reduction ratio, preventing exploitation of the mechanism's full capabilities. This paper introduces a compact analytical formulation for the two standard knee and ankle mechanisms that captures the exact non-linear transmission while remaining computationally efficient. The model is fully differentiable up to second order with a minimal formulation, enabling low-cost evaluation of dynamic derivatives for trajectory optimization and of the apparent transmission impedance for reinforcement learning. We integrate this formulation into trajectory optimization and locomotion policy learning, and compare it against simplified constant-ratio approaches. Hardware experiments demonstrate improved accuracy and robustness, showing that the proposed method provides a practical means to incorporate parallel actuation into modern control algorithms.
Sparse dynamic network reconstruction through L1-regularization of a Lyapunov equation
An important problem in many areas of science is that of recovering interaction networks from simultaneous time-series of many interacting dynamical processes. A common approach is to use the elements of the correlation matrix or its inverse as proxies of the interaction strengths, but the reconstructed networks are necessarily undirected. Transfer entropy methods have been proposed to reconstruct directed networks but the reconstructed network lacks information about interaction strengths. We propose a network reconstruction method that inherits the best of the two approaches by reconstructing a directed weighted network from noisy data under the assumption that the network is sparse and the dynamics are governed by a linear (or weakly-nonlinear) stochastic dynamical system. The two steps of our method are i) constructing an (infinite) family of candidate networks by solving the covariance matrix Lyapunov equation for the state matrix and ii) using L1-regularization to select a sparse solution. We further show how to use prior information on the (non)existence of a few directed edges to drastically improve the quality of the reconstruction.
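Step i) of the abstract above, the infinite family of candidate networks, can be checked numerically with a minimal sketch. It assumes the covariance Lyapunov equation takes the standard stationary form `A @ Sigma + Sigma @ A.T + Q = 0` with a known noise covariance `Q`; under that assumption, every skew-symmetric `S` gives a solution `A = (-Q/2 + S) @ inv(Sigma)`. The function names and the test data are hypothetical, and the L1 selection of step ii) is only hinted at by comparing entrywise L1 norms.

```python
import numpy as np


def candidate_state_matrix(sigma, q, s):
    """Return A = (-Q/2 + S) Sigma^{-1} for a skew-symmetric S."""
    return (-0.5 * q + s) @ np.linalg.inv(sigma)


def lyapunov_residual(a, sigma, q):
    """Residual of the assumed equation A Sigma + Sigma A^T + Q = 0."""
    return a @ sigma + sigma @ a.T + q


rng = np.random.default_rng(0)
n = 4
m = rng.standard_normal((n, n))
sigma = m @ m.T + n * np.eye(n)  # symmetric positive definite covariance
q = np.eye(n)                    # assumed noise covariance

k = rng.standard_normal((n, n))
s = k - k.T                      # arbitrary skew-symmetric matrix

a0 = candidate_state_matrix(sigma, q, np.zeros((n, n)))
a1 = candidate_state_matrix(sigma, q, s)

# Both candidates reproduce the same covariance but are different networks;
# an L1 criterion would prefer the candidate with the smaller entrywise norm.
l1_norms = [float(np.abs(a).sum()) for a in (a0, a1)]
```

Because the covariance alone cannot distinguish `a0` from `a1`, some additional criterion is needed. The abstract uses sparsity via L1-regularization for this, which in practice would mean optimizing over `S` rather than comparing two fixed candidates.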
Scalable analysis of stop-and-go waves: Representation, measurements and insights
Analyzing stop-and-go waves at the scale of miles and hours of data is an emerging challenge in traffic research. The past 5 years have seen an explosion in the availability of large-scale traffic data containing traffic waves and complex congestion patterns, making existing approaches unsuitable for repeatable and scalable analysis of traffic waves in these data. This paper makes a first step towards addressing this challenge by introducing an automatic and scalable stop-and-go wave identification method capable of capturing wave generation, propagation, dissipation, as well as bifurcation and merging, which have previously been observed only very rarely. Using a concise and simple critical-speed based definition of a stop-and-go wave, the proposed method identifies all wave boundaries that encompass spatio-temporal points where vehicle speed is below a chosen critical speed. The method is built upon a graph representation of the spatio-temporal points associated with stop-and-go waves, specifically wave front (start) points and wave tail (end) points, and approaches the solution as a graph component identification problem. It enables the measurement of wave properties at scale. The method is implemented in Python and demonstrated on a large-scale dataset, I-24 MOTION INCEPTION. Our results provide insights into the complexity of traffic waves. Traffic waves can bifurcate and merge at a scale that has never been observed or described before. The clustering analysis of all the identified wave components reveals the different topological structures of traffic waves. We find that the wave merge or bifurcation points can be explained by spatial features. The gallery of all the identified wave topologies is available at https://trafficwaves.github.io/.
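The critical-speed definition and the component-identification idea can be sketched in simplified form. This is not the paper's graph of wave front and tail points: it assumes a hypothetical speed grid indexed by (time, position) and groups below-critical cells into 4-connected components with a breadth-first search. The function name and grid layout are illustrative.

```python
from collections import deque
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (time index, position index)


def wave_components(speed: List[List[float]], critical_speed: float) -> List[Set[Cell]]:
    """Group cells with speed below critical_speed into 4-connected components."""
    rows = len(speed)
    cols = len(speed[0]) if rows else 0
    seen: Set[Cell] = set()
    components: List[Set[Cell]] = []
    for t in range(rows):
        for x in range(cols):
            if (t, x) in seen or speed[t][x] >= critical_speed:
                continue
            # Breadth-first search over neighbouring slow cells.
            component: Set[Cell] = set()
            queue = deque([(t, x)])
            seen.add((t, x))
            while queue:
                ct, cx = queue.popleft()
                component.add((ct, cx))
                for nt, nx in ((ct - 1, cx), (ct + 1, cx), (ct, cx - 1), (ct, cx + 1)):
                    if (
                        0 <= nt < rows
                        and 0 <= nx < cols
                        and (nt, nx) not in seen
                        and speed[nt][nx] < critical_speed
                    ):
                        seen.add((nt, nx))
                        queue.append((nt, nx))
            components.append(component)
    return components


if __name__ == "__main__":
    grid = [
        [60.0, 10.0, 60.0, 60.0],
        [60.0, 12.0, 60.0, 8.0],
        [60.0, 60.0, 60.0, 9.0],
    ]
    for comp in wave_components(grid, critical_speed=20.0):
        print(sorted(comp))
```

In this simplified view, a wave that merges or bifurcates shows up as a single component whose slow region branches in the time-space plane. Per-component properties such as duration or spatial extent can then be measured from the cell sets.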
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8 hours of trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.
comment: Project website: https://omniretarget.github.io
Conformal Robust Control of Linear Systems
End-to-end engineering design pipelines, in which designs are evaluated using concurrently defined optimal controllers, are becoming increasingly common in practice. To discover designs that perform well even under the misspecification of system dynamics, such end-to-end pipelines have now begun evaluating designs with a robust control objective in place of the nominal optimal control setup. Current approaches of specifying such robust control subproblems, however, rely on hand specification of perturbations anticipated to be present upon deployment or margin methods that ignore problem structure, resulting in a lack of theoretical guarantees and overly conservative empirical performance. We, instead, propose a novel methodology for LQR systems that leverages conformal prediction to specify such uncertainty regions in a data-driven fashion. Such regions have distribution-free coverage guarantees on the true system dynamics, in turn allowing for a probabilistic characterization of the regret of the resulting robust controller. We then demonstrate that such a controller can be efficiently produced via a novel policy gradient method that has convergence guarantees. We finally demonstrate the superior empirical performance of our method over alternate robust control specifications, such as $H_{\infty}$ and LQR with multiplicative noise, across a collection of engineering control systems.
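The data-driven uncertainty region mentioned above rests on a conformal quantile, which can be sketched generically. This is the standard split conformal calibration rule, not the paper's specific construction. The scores are assumed to be nonconformity values on a held-out calibration set (for instance, a norm of the gap between predicted and true dynamics matrices), and `conformal_radius` is a hypothetical name.

```python
import math
from typing import Sequence


def conformal_radius(scores: Sequence[float], alpha: float) -> float:
    """Split conformal threshold for miscoverage level alpha.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score,
    or infinity when the calibration set is too small for that rank.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


if __name__ == "__main__":
    calibration_scores = [3.0, 1.0, 2.0, 5.0, 4.0, 9.0, 7.0, 8.0, 6.0]
    print(conformal_radius(calibration_scores, alpha=0.5))
```

Under exchangeability of calibration and test scores, a region of this radius around the prediction covers the truth with probability at least `1 - alpha`. That is the sense in which the guarantee is distribution-free.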
Learn to Bid as a Price-Maker Wind Power Producer
Wind power producers (WPPs) participating in short-term power markets face significant imbalance costs due to their non-dispatchable and variable production. While some WPPs have a large enough market share to influence prices with their bidding decisions, existing optimal bidding methods rarely account for this aspect. Price-maker approaches typically model bidding as a bilevel optimization problem, but these methods require complex market models, estimating other participants' actions, and are computationally demanding. To address these challenges, we propose an online learning algorithm that leverages contextual information to optimize WPP bids in the price-maker setting. We formulate the strategic bidding problem as a contextual multi-armed bandit, ensuring provable regret minimization. The algorithm's performance is evaluated against various benchmark strategies using a numerical simulation of the German day-ahead and real-time markets.
Autonomy Architectures for Safe Planning in Unknown Environments Under Budget Constraints
Mission planning can often be formulated as a constrained control problem under multiple path constraints (i.e., safety constraints) and budget constraints (i.e., resource expenditure constraints). In a priori unknown environments, verifying that an offline solution will satisfy the constraints for all time can be difficult, if not impossible. We present ReRoot, a novel sampling-based framework that enforces safety and budget constraints for nonlinear systems in unknown environments. The main idea is that ReRoot grows multiple reverse RRT* trees online, starting from renewal sets, i.e., sets where the budget constraints are renewed. The dynamically feasible backup trajectories guarantee safety and reduce resource expenditure, which provides a principled backup policy when integrated into the gatekeeper safety verification architecture. We demonstrate our approach in simulation with a fixed-wing UAV in a GNSS-denied environment with a budget constraint on localization error that can be renewed at visual landmarks.
comment: Code: https://github.com/dcherenson/budget-constrained-planning
Systems and Control (EESS)
A Genetic Algorithm Approach to Anti-Jamming UAV Swarm Behavior
In recent years, Unmanned Aerial Vehicles (UAVs) have brought a true revolution to military tactics. While UAVs already constitute an advantage when operating alone, multi-UAV swarms expand the available possibilities, allowing the UAVs to collaborate and support each other as a team to carry out a given task. This entails the capability to exchange information related to situation awareness and action coordination by means of a suitable wireless communication technology. In such a scenario, the adversary is expected to disrupt communications by jamming the communication channel. The latter becomes the Achilles heel of the swarm. While anti-jamming techniques constitute a well-covered topic in the literature, the use of intelligent swarm behaviors to leverage those techniques is still an open research issue. This paper explores the use of Genetic Algorithms (GAs) to jointly optimize UAV swarm formation, beam-steering antennas and traffic routing in order to mitigate the effect of jamming in the main coordination channel, under the assumption that a more robust and low data rate channel is used for formation management signaling. Simulation results show the effectiveness of the proposed approach. However, its significant computational cost motivates further research.
comment: 8 pages, conference paper
Stability Preserving Safe Control of a Bicopter
This paper presents a control law for stabilization and trajectory tracking of a multicopter subject to safety constraints. The proposed approach guarantees forward invariance of a prescribed safety set while ensuring smooth tracking performance. Unlike conventional control barrier function methods, the constrained control problem is transformed into an unconstrained one using state-dependent mappings together with carefully constructed Lyapunov functions. This approach enables explicit synthesis of the control law, instead of requiring a solution of constrained optimization at each step. The transformation also enables the controller to enforce safety without sacrificing stability or performance. Simulation results for a polytopic reference trajectory confined within a designated safe region demonstrate the effectiveness of the proposed method.
From Neural Sensing to Stimulation: An Interdisciplinary Roadmap for Neurotechnology
Neurotechnologies are transforming how we measure, interpret, and modulate brain-body interactions, integrating real-time sensing, computation, and stimulation to enable precise physiological control. They hold transformative potential across clinical and non-clinical domains, from treating disorders to enhancing cognition and performance. Realizing this potential requires navigating complex, interdisciplinary challenges spanning neuroscience, materials science, device engineering, signal processing, computational modelling, and regulatory and ethical frameworks. This Perspective presents a strategic roadmap for neurotechnology development, created by early-career researchers, highlighting their role at the intersection of disciplines and their capacity to bridge traditional silos. We identify five cross-cutting trade-offs that constrain progress across functionality, scalability, adaptability, and translatability, and illustrate how technical domains influence their resolution. Rather than a domain-specific review, we focus on shared challenges and strategic opportunities that transcend disciplines. We propose a unified framework for collaborative innovation and education, highlight ethical and regulatory priorities, and outline a timeline for overcoming key bottlenecks. By aligning technical development with translational and societal needs, this roadmap aims to accelerate equitable, effective, and future-ready adaptive neurotechnologies, guiding coordinated efforts across the global research and innovation community.
Identification and optimal control strategies for the transversal splitting of ultra-cold Bose gases
Splitting a Bose-Einstein condensate (BEC) is a key operation in fundamental physics experiments and emerging quantum technologies, where precise preparation of well-defined initial states requires fast yet coherent control of the condensate's nonlinear dynamics. This work formulates the BEC splitting process as an optimal feedforward control problem based on a physically interpretable, reduced-order model identified from limited experimental data. We introduce a systematic calibration strategy that combines optimal experiment selection and constrained nonlinear parameter estimation, enabling accurate system identification with minimal experimental overhead. Using this calibrated model, we compute energy-optimal trajectories via indirect optimal control to realize shortcuts to adiabaticity (STAs), achieving rapid transitions to the ground state of a double-well potential while suppressing excitations. Experiments confirm that the proposed control framework yields high-fidelity state transfers across multiple configurations, demonstrating its robustness and scalability for quantum control applications.
comment: To be published in IEEE Transactions on Control Systems Technology
Mitigating Increase-Decrease Gaming with Alternative Connection Agreements: A Defender-Attacker-Defender Game
Redispatch markets are widely used by system operators to manage network congestion. A well-known drawback, however, is that Flexibility Service Providers (FSPs) may strategically adjust their baselines in anticipation of redispatch actions, thereby aggravating congestion and raising system costs. To address this increase-decrease gaming, Distribution System Operators (DSOs) could use Alternative Connection Agreements (ACAs) to conditionally limit the available connection capacity of market participants in the day-ahead stage. In this paper, we present a novel Defender-Attacker-Defender game to investigate the potential of this approach in distribution networks under load and price uncertainty. We solve the resulting trilevel optimization model using a custom branch-and-bound algorithm, and we demonstrate that it efficiently solves the problem without exploring many nodes in the branch-and-bound search tree for most simulated scenarios. The case study demonstrates that applying ACAs can substantially lower redispatch costs (e.g. by 25%) for the DSO with only a limited impact on FSP profits. The effectiveness of the approach critically depends on how often the DSO can invoke ACAs and on the extent to which the DSO can anticipate strategic bidding behavior of the FSP.
Falsification-Driven Reinforcement Learning for Maritime Motion Planning
Compliance with maritime traffic rules is essential for the safe operation of autonomous vessels, yet training reinforcement learning (RL) agents to adhere to them is challenging. The behavior of RL agents is shaped by the training scenarios they encounter, but creating scenarios that capture the complexity of maritime navigation is non-trivial, and real-world data alone is insufficient. To address this, we propose a falsification-driven RL approach that generates adversarial training scenarios in which the vessel under test violates maritime traffic rules, which are expressed as signal temporal logic specifications. Our experiments on open-sea navigation with two vessels demonstrate that the proposed approach provides more relevant training scenarios and achieves more consistent rule compliance.
Decentralized CBF-based Safety Filters for Collision Avoidance of Cooperative Missile Systems with Input Constraints
This paper presents a decentralized safety filter for collision avoidance in multi-agent aerospace interception scenarios. The approach leverages robust control barrier functions (RCBFs) to guarantee forward invariance of safety sets under bounded inputs and high-relative-degree dynamics. Each effector executes its nominal cooperative guidance command, while a local quadratic program (QP) modifies the input only when necessary. Event-triggered activation based on range and zero-effort miss (ZEM) criteria ensures scalability by restricting active constraints to relevant neighbors. To resolve feasibility issues from simultaneous constraints, a slack-variable relaxation scheme is introduced that prioritizes critical agents in a Pareto-optimal manner. Simulation results in many-on-many interception scenarios demonstrate that the proposed framework maintains collision-free operation with minimal deviation from nominal guidance, providing a computationally efficient and scalable solution for safety-critical multi-agent aerospace systems.
comment: 7 pages, 5 figures
Resilient Multi-Dimensional Consensus and Distributed Optimization against Agent-Based and Denial-of-Service Attacks
In this paper, we consider the resilient multi-dimensional consensus and distributed optimization problems of multi-agent systems (MASs) in the presence of both agent-based and denial-of-service (DoS) attacks. The considered agent-based attacks can cover malicious, Byzantine, and stubborn agents. The links between agents in the network can be blocked by DoS attacks, which may lead the digraph to be time-varying and even disconnected. The objective is to ensure that the remaining benign agents achieve consensus. To this end, an "auxiliary point"-based resilient control algorithm is proposed for MASs. Under the proposed algorithm, each healthy agent constructs a "safe kernel" utilizing the states of its in-neighbors and updates its state toward a specific point within this kernel at each iteration. If an agent cannot receive its neighbors' states owing to DoS attacks, it will use the states received immediately before the DoS period. Moreover, a resilient multi-dimensional distributed optimization (RMDO) algorithm is also proposed. Theoretical proofs and numerical examples are presented to demonstrate the effectiveness of the proposed algorithms.
Delay Independent Safe Control with Neural Networks: Positive Lur'e Certificates for Risk Aware Autonomy
We present a risk-aware safety certification method for autonomous, learning-enabled control systems. Focusing on two realistic risks, state/input delays and interval matrix uncertainty, we model the neural network (NN) controller with local sector bounds and exploit positivity structure to derive linear, delay-independent certificates that guarantee local exponential stability across admissible uncertainties. To benchmark performance, we adopt and implement a state-of-the-art IQC NN verification pipeline. On representative cases, our positivity-based tests run orders of magnitude faster than SDP-based IQC while certifying regimes the latter cannot, providing scalable safety guarantees that complement risk-aware control.
comment: Submitted to 2026 American Control Conference (ACC), New Orleans, LA
A Cascade of Systems and the Product of Their $θ$-Symmetric Scaled Relative Graphs
In this paper, we utilize a variant of the scaled relative graph (SRG), referred to as the $\theta$-symmetric SRG, to develop a graphical stability criterion for the feedback interconnection of a cascade of systems. A crucial submultiplicative property of $\theta$-symmetric SRG is established, enabling it to handle cyclic interconnections for which conventional graph separation methods are not applicable. By integrating both gain and refined phase information, the $\theta$-symmetric SRG provides a unified graphical characterization of the system, which better captures system properties and yields less conservative results. In the scalar case, the $\theta$-symmetric SRG can be reduced exactly to the scalar itself, whereas the standard SRG appears to be a conjugate pair. Consequently, the frequency-wise $\theta$-symmetric SRG is more suitable than the standard SRG as a multi-input multi-output extension of the classical Nyquist plot. Illustrative examples are included to demonstrate the effectiveness of the $\theta$-symmetric SRG.
comment: 9 pages, 4 figures
Safe Stabilization of the Stefan Problem with a High-Order Moving Boundary Dynamics by PDE Backstepping
This paper presents a safe stabilization of the Stefan PDE model with a moving boundary governed by high-order dynamics. We consider a parabolic PDE with a time-varying domain governed by a second-order response with respect to the Neumann boundary value of the PDE state at the moving boundary. The objective is to design a boundary heat flux control that stabilizes the moving boundary at a desired setpoint while satisfying the conditions the model requires of the PDE state and the moving boundary. We apply a PDE backstepping method for the control design while imposing a constraint on the control law. The PDE and moving boundary constraints are shown to be satisfied by applying the maximum principle for parabolic PDEs. The closed-loop system is then shown to be globally exponentially stable by Lyapunov analysis. The proposed control is implemented in numerical simulation, which illustrates the desired performance in safety and stability. An outline of the extension to third-order moving boundary dynamics is also presented. Code is released at https://github.com/shumon0423/HighOrderStefan_CDC2025.git.
comment: 6 pages, 4 figures, 64th IEEE Conference on Decision and Control (CDC) 2025
Model Predictive Path Integral Control for Roll-to-Roll Manufacturing
Roll-to-roll (R2R) manufacturing is a continuous processing technology essential for scalable production of thin-film materials and printed electronics, but precise control remains challenging due to subsystem interactions, nonlinearities, and process disturbances. This paper proposes a Model Predictive Path Integral (MPPI) control formulation for R2R systems, leveraging a GPU-based Monte-Carlo sampling approach to efficiently approximate optimal controls online. Crucially, MPPI easily handles non-differentiable cost functions, enabling the incorporation of complex performance criteria relevant to advanced manufacturing processes. A case study is presented that demonstrates that MPPI significantly improves tension regulation performance compared to conventional model predictive control (MPC), highlighting its suitability for real-time control in advanced manufacturing.
comment: 6 pages, 4 figures
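The sampling-based update that the MPPI abstract above refers to can be sketched as follows. This is a minimal, CPU-only illustration under stated assumptions, not the paper's GPU formulation: the function names (`mppi_step`, `toy_dynamics`, `toy_cost`), the scalar first-order plant, and all parameter values are hypothetical. The cost deliberately contains an absolute value and a step penalty to show that the update needs only cost evaluations, not gradients.

```python
import math
import random


def mppi_step(x0, u_nominal, dynamics, cost, n_samples=256, sigma=0.5, lam=1.0, rng=None):
    """One MPPI update: perturb the nominal inputs, roll out, and reweight by cost."""
    rng = rng or random.Random(0)
    horizon = len(u_nominal)
    noises, costs = [], []
    for _ in range(n_samples):
        # Sample one Gaussian perturbation per time step of the horizon.
        eps = [rng.gauss(0.0, sigma) for _ in range(horizon)]
        x, total = x0, 0.0
        for t in range(horizon):
            u = u_nominal[t] + eps[t]
            x = dynamics(x, u)
            total += cost(x, u)
        noises.append(eps)
        costs.append(total)
    # Softmin weights; subtracting the minimum cost keeps exp() well conditioned.
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    w_sum = sum(weights)
    # The new nominal input is the old one plus the cost-weighted average perturbation.
    return [
        u_nominal[t] + sum(w * eps[t] for w, eps in zip(weights, noises)) / w_sum
        for t in range(horizon)
    ]


def toy_dynamics(x, u, dt=0.1):
    # Hypothetical first-order plant, standing in for a real process model.
    return x + dt * (u - x)


def toy_cost(x, u, target=1.0):
    # Absolute error plus a step penalty outside a tolerance band (non-differentiable).
    error = abs(x - target)
    return error + (5.0 if error > 0.5 else 0.0)


if __name__ == "__main__":
    u_seq = mppi_step(0.0, [0.0] * 15, toy_dynamics, toy_cost)
    print("first input of the updated sequence:", u_seq[0])
```

In a receding-horizon loop one would apply the first input, shift the sequence by one step, and call `mppi_step` again from the new state. Each rollout is independent of the others, which is why the sampling step parallelizes naturally.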
GATO: GPU-Accelerated and Batched Trajectory Optimization for Scalable Edge Model Predictive Control
While Model Predictive Control (MPC) delivers strong performance across robotics applications, solving the underlying (batches of) nonlinear trajectory optimization (TO) problems online remains computationally demanding. Existing GPU-accelerated approaches typically (i) parallelize a single solve to meet real-time deadlines, (ii) scale to very large batches at slower-than-real-time rates, or (iii) achieve speed by restricting model generality (e.g., point-mass dynamics or a single linearization). This leaves a large gap in solver performance for many state-of-the-art MPC applications that require real-time batches of tens to low-hundreds of solves. As such, we present GATO, an open source, GPU-accelerated, batched TO solver co-designed across algorithm, software, and computational hardware to deliver real-time throughput for these moderate batch size regimes. Our approach leverages a combination of block-, warp-, and thread-level parallelism within and across solves for ultra-high performance. We demonstrate the effectiveness of our approach through a combination of: simulated benchmarks showing speedups of 18-21x over CPU baselines and 1.4-16x over GPU baselines as batch size increases; case studies highlighting improved disturbance rejection and convergence behavior; and finally a validation on hardware using an industrial manipulator. We open source GATO to support reproducibility and adoption.
Accuracy, Memory Efficiency and Generalization: A Comparative Study on Liquid Neural Networks and Recurrent Neural Networks
This review aims to conduct a comparative analysis of liquid neural networks (LNNs) and traditional recurrent neural networks (RNNs) and their variants, such as long short-term memory networks (LSTMs) and gated recurrent units (GRUs). The core dimensions of the analysis include model accuracy, memory efficiency, and generalization ability. By systematically reviewing existing research, this paper explores the basic principles, mathematical models, key characteristics, and inherent challenges of these neural network architectures in processing sequential data. Research findings reveal that LNN, as an emerging, biologically inspired, continuous-time dynamic neural network, demonstrates significant potential in handling noisy, non-stationary data, and achieving out-of-distribution (OOD) generalization. Additionally, some LNN variants outperform traditional RNN in terms of parameter efficiency and computational speed. However, RNN remains a cornerstone in sequence modeling due to its mature ecosystem and successful applications across various tasks. This review identifies the commonalities and differences between LNNs and RNNs, summarizes their respective shortcomings and challenges, and points out valuable directions for future research, particularly emphasizing the importance of improving the scalability of LNNs to promote their application in broader and more complex scenarios.
comment: 13 pages, 12 figures. Submitted to IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
Adaptive Control Allocation for Underactuated Time-Scale Separated Non-Affine Systems
Many robotic systems are underactuated, meaning not all degrees of freedom can be directly controlled due to lack of actuators, input constraints, or state-dependent actuation. This property, compounded by modeling uncertainties and disturbances, complicates the control design process for trajectory tracking. In this work, we propose an adaptive control architecture for uncertain, nonlinear, underactuated systems with input constraints. Leveraging time-scale separation, we construct a reduced-order model where fast dynamics provide virtual inputs to the slower subsystem and use dynamic control allocation to select the optimal control inputs given the non-affine dynamics. To handle uncertainty, we introduce a state predictor-based adaptive law, and through singular perturbation theory and Lyapunov analysis, we prove stability and bounded tracking of reference trajectories. The proposed method is validated on a VTOL quadplane with nonlinear, state-dependent actuation, demonstrating its utility as a unified controller across various flight regimes, including cruise, landing transition, and hover.
comment: Code https://github.com/dcherenson/adaptive-control-underactuated
Optimal Batched Scheduling of Stochastic Processing Networks Using Atomic Action Decomposition
Stochastic processing networks (SPNs) have broad applications in healthcare, transportation, and communication networks. The SPN control problem is to dynamically assign servers in batches under uncertainty to optimize long-run performance. This problem is challenging as the policy dimension grows exponentially with the number of servers, making standard reinforcement learning and policy optimization methods intractable at scale. We propose an atomic action decomposition framework that addresses this scalability challenge by breaking joint assignments into sequential single-server assignments. This yields policies with constant dimension, independent of the number of servers. We study two classes of atomic policies, the step-dependent and step-independent atomic policies, and prove that both achieve the same optimal long-run average reward as the original joint policies. These results establish that computing the optimal SPN control can be made scalable without loss of optimality using the atomic framework. Our results offer theoretical justification for the strong empirical success of the atomic framework in large-scale applications reported in previous articles.
Optimization via a Control-Centric Framework
Optimization plays a central role in intelligent systems and cyber-physical technologies, where the speed and reliability of convergence directly impact performance. In control theory, optimization-centric methods are standard: controllers are designed by repeatedly solving optimization problems, as in linear quadratic regulation, $H_\infty$ control, and model predictive control. In contrast, this paper develops a control-centric framework for optimization itself, where algorithms are constructed directly from Lyapunov stability principles rather than being proposed first and analyzed afterward. A key element is the stationarity vector, which encodes first-order optimality conditions and enables Lyapunov-based convergence analysis. By pairing a Lyapunov function with a selectable decay law, we obtain continuous-time dynamics with guaranteed exponential, finite-time, fixed-time, or prescribed-time convergence. Within this framework, we introduce three feedback realizations of increasing restrictiveness: the Hessian-gradient, Newton, and gradient dynamics. Each realization shapes the decay of the stationarity vector to achieve the desired rate. These constructions unify unconstrained optimization, extend naturally to constrained problems via Lyapunov-consistent primal-dual dynamics, and broaden the results for minimax and generalized Nash equilibrium seeking problems beyond exponential stability. The framework provides systematic design tools for optimization algorithms in control and game-theoretic problems.
comment: This work has been submitted to the IEEE for possible publication. 12 pages, 3 figures
Probabilistic Simulation of Aircraft Descent via a Physics-Informed Machine Learning Approach
This paper presents a method for generating probabilistic descent trajectories in simulations of real-world airspace. A dataset of 116,066 trajectories harvested from Mode S radar returns in UK airspace was used to train and test the model. Thirteen aircraft types with varying performance characteristics were investigated. It was found that the error in the mean prediction of time to reach the bottom of descent for the proposed method was less than that of the Base of Aircraft Data (BADA) model by a factor of 10. Furthermore, the method was capable of generating a range of trajectories that were similar to the held-out test dataset when analysed in distribution. The proposed method is hybrid, with aircraft drag and calibrated airspeed functions generated probabilistically to parameterise the BADA equations, ensuring the physical plausibility of generated trajectories.
Data-Driven Adaptive PID Control Based on Physics-Informed Neural Networks
This article proposes a data-driven PID controller design based on the principle of adaptive gain optimization, leveraging Physics-Informed Neural Networks (PINNs) generated for predictive modeling purposes. The proposed control design method utilizes gradients of the PID gain optimization, achieved through the automatic differentiation of PINNs, to apply model predictive control using a cost function based on tracking error and control inputs. By optimizing PINNs-based PID gains, the method achieves adaptive gain tuning that ensures stability while accounting for system nonlinearities. The proposed method features a systematic framework for integrating PINNs-based models of dynamical control systems into closed-loop control systems, enabling direct application to PID control design. A series of numerical experiments is conducted to demonstrate the effectiveness of the proposed method from the control perspectives based on both time and frequency domains.
comment: This work has been submitted to the IEEE Transactions on Control Systems Technology for possible publication
Robust Sensor Placement for Poisson Arrivals with False Alarm Aware Spatiotemporal Sensing
This paper studies sensor placement when detection performance varies stochastically due to environmental factors over space and time and false alarms are present, but a filter is used to attenuate the effect. We introduce a unified model that couples detection and false alarms through an availability function, which captures how false alarms reduce effective sensing and filtering responses to the disturbance. Building on this model, we give a sufficient condition under which filtering improves detection. In addition, we derive a coverage-based lower bound on the void probability. Furthermore, we prove robustness guarantees showing that performance remains stable when detection probabilities are learned from limited data. We validate the approach with numerical studies using AIS vessel-traffic data and synthetic maritime scenarios. Together, these results provide theory and practical guidance for deploying sensors in dynamic, uncertain environments.
comment: Submitted to IEEE ACC
Regular Pairings for Non-quadratic Lyapunov Functions and Contraction Analysis
Recent studies on stability and contractivity have highlighted the importance of semi-inner products, which we refer to as pairings, associated with general norms. A pairing is a binary operation that relates the derivative of a curve's norm to the radius-vector of the curve and its tangent. This relationship, known as the curve norm derivative formula, is crucial when using the norm as a Lyapunov function. Another important property of the pairing, used in stability and contraction criteria, is the so-called Lumer inequality, which relates the pairing to the induced logarithmic norm. We prove that the curve norm derivative formula and Lumer's inequality are, in fact, equivalent to each other and to several simpler properties. We then introduce and characterize regular pairings that satisfy all of these properties. Our results unify several independent theories of pairings (semi-inner products) developed in previous work on functional analysis and control theory. Additionally, we introduce the polyhedral max pairing and develop computational tools for polyhedral norms, advancing contraction theory in non-Euclidean spaces.
Reconfigurable Intelligent Surface-Assisted Cross-Layer Authentication for Secure and Efficient Vehicular Communications
Intelligent transportation systems increasingly depend on wireless communication for broadcasting traffic messages and facilitating real-time vehicular communication. In this context, message authentication is crucial for establishing secure and reliable communication. However, security solutions must consider the dynamic nature of vehicular communication links, which fluctuate between line-of-sight (LoS) and non-line-of-sight (NLoS) due to obstructions. This paper proposes a lightweight cross-layer authentication scheme that employs public-key infrastructure (PKI)-based authentication for initial legitimacy detection/handshaking while using key-based physical-layer re-authentication for message verification. This approach reduces signature generation and signaling overheads associated with each transmission, thereby enhancing network scalability. However, the receiver operating characteristic (ROC; Pd: detection vs. PFA: false alarm probabilities) of the latter decreases with lower signal-to-noise ratio (SNR). To address this, we investigate the use of reconfigurable intelligent surfaces (RISs) to strengthen the SNR directed toward the designated vehicle in shadowed areas (i.e., NLoS scenarios), thereby improving the ROC. Theoretical analysis and practical implementation are conducted using a 1-bit RIS consisting of 64 x 64 reflective meta-surfaces. Experimental results show a significant improvement in Pd, increasing from 0.82 to 0.96 at SNR = -6 dB for an orthogonal frequency-division multiplexing (OFDM) system with 128 subcarriers. We also conducted informal and formal security analyses using Burrows-Abadi-Needham (BAN) logic to prove the scheme's ability to resist passive and active attacks.
comment: 18 pages, 14 figures and 6 tables
Development of a magnetorheological hand exoskeleton featuring a high force-to-power ratio for enhanced grip endurance
Hand exoskeletons have significant potential in labor-intensive fields by mitigating hand grip fatigue, enhancing hand strength, and preventing injuries. However, most traditional hand exoskeletons are driven by motors, whose output force is limited under constrained installation conditions. They also suffer from high power consumption, complex and bulky assistive systems, and high instability. In this work, we develop a novel hand exoskeleton integrated with innovative magnetorheological (MR) clutches that offers a high force-to-power ratio to improve grip endurance. The clutch features an enhanced structure design, a micro roller enhancing structure, which can significantly boost output forces. The experimental data demonstrate that, when supplied with 2 V, the clutch can deliver a peak holding force of 381.15 N, 55 times that when no voltage is provided (7 N). In this scenario, it consumes only 1.38 W, yielding a force-to-power ratio of 256.75 N/W, which is 2.35 times higher than the best-reported actuator used for hand exoskeletons. This capability enables the designed MR hand exoskeleton (MRHE) to provide approximately 419.79 N support force for gripping. The designed MR hand exoskeleton is highly integrated, comprising an exoskeleton frame, MR clutches, a control unit, and a battery. Evaluations through static grip endurance tests and dynamic carrying and lifting tests confirm that the MR hand exoskeleton can effectively reduce muscle fatigue, extend grip endurance, and minimize injuries. These findings highlight its strong potential for practical applications in repetitive tasks such as carrying and lifting in industrial settings.
HBS -- Hardware Build System: A Tcl-based, minimal common abstraction approach for build system for hardware designs
Build systems have become an indispensable part of the software implementation and deployment process. New programming languages are released with the build system integrated into the language tools, for example, Go, Rust, or Zig. However, in the hardware description domain, no official build systems have been released with the predominant Hardware Description Languages (HDL) such as VHDL or SystemVerilog. Moreover, hardware design projects are often multilanguage. The paper proposes a new build system for the hardware description domain. The system is called the Hardware Build System (HBS). The main goals of the system include simplicity, readability, a minimal number of dependencies, and ease of integration with the existing Electronic Design Automation (EDA) tools. The system proposes a novel, minimal common abstraction approach, whose particular implications are described in the article. All the core functionalities are implemented in Tcl. Only the EDA tool-independent features, such as dependency graph generation, are implemented in a Python wrapper.
Distributionally Robust System Level Synthesis With Output Feedback Affine Control Policy
This paper studies the finite-horizon robust optimal control of constrained linear systems subject to model mismatch and additive stochastic disturbances. Utilizing the system level synthesis (SLS) parameterization, we propose a novel SLS design using an output-feedback affine control policy and extend it to a distributionally robust setting to improve system resilience by minimizing the cost function while ensuring constraint satisfaction against the worst-case uncertainty distribution. The scopes of model mismatch and stochastic disturbances are quantified using the 1-norm and a Wasserstein metric-based ambiguity set, respectively. For the closed-loop dynamics, we analyze the distributional shift between the predicted output-input response -- computed using nominal parameters and empirical disturbance samples -- and the actual closed-loop distribution, highlighting its dependence on model mismatch and SLS parameterization. Assuming convex and Lipschitz continuous cost functions and constraints, we derive a tractable reformulation of the distributionally robust SLS (DR-SLS) problem by leveraging tools from robust control and distributionally robust optimization (DRO). Numerical experiments validate the performance and robustness of the proposed approach.
Control of Humanoid Robots with Parallel Mechanisms using Differential Actuation Models
Several recently released humanoid robots, inspired by the mechanical design of Cassie, employ actuator configurations in which the motors are displaced from the joints to reduce leg inertia. While studies accounting for the full kinematic complexity have demonstrated the benefits of these designs, the associated loop-closure constraints greatly increase computational cost and limit their use in control and learning. As a result, the non-linear transmission is often approximated by a constant reduction ratio, preventing exploitation of the mechanism's full capabilities. This paper introduces a compact analytical formulation for the two standard knee and ankle mechanisms that captures the exact non-linear transmission while remaining computationally efficient. The model is fully differentiable up to second order with a minimal formulation, enabling low-cost evaluation of dynamic derivatives for trajectory optimization and of the apparent transmission impedance for reinforcement learning. We integrate this formulation into trajectory optimization and locomotion policy learning, and compare it against simplified constant-ratio approaches. Hardware experiments demonstrate improved accuracy and robustness, showing that the proposed method provides a practical means to incorporate parallel actuation into modern control algorithms.
Sparse dynamic network reconstruction through L1-regularization of a Lyapunov equation
An important problem in many areas of science is that of recovering interaction networks from simultaneous time-series of many interacting dynamical processes. A common approach is to use the elements of the correlation matrix or its inverse as proxies of the interaction strengths, but the reconstructed networks are necessarily undirected. Transfer entropy methods have been proposed to reconstruct directed networks but the reconstructed network lacks information about interaction strengths. We propose a network reconstruction method that inherits the best of the two approaches by reconstructing a directed weighted network from noisy data under the assumption that the network is sparse and the dynamics are governed by a linear (or weakly-nonlinear) stochastic dynamical system. The two steps of our method are i) constructing an (infinite) family of candidate networks by solving the covariance matrix Lyapunov equation for the state matrix and ii) using L1-regularization to select a sparse solution. We further show how to use prior information on the (non)existence of a few directed edges to drastically improve the quality of the reconstruction.
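The two steps named in the abstract above can be written out as a short worked equation. This is a sketch under assumptions, not the paper's own formulation: it assumes a continuous-time linear model $dx = A x\,dt + B\,dw$ with noise covariance $Q = BB^\top$ and an invertible stationary covariance $\Sigma$, and the symbols $S$, $Q$, and $\hat{A}$ are introduced here for illustration only. Under these assumptions, every state matrix consistent with $\Sigma$ is indexed by a skew-symmetric matrix $S$, and the L1 step picks the sparsest member of that family.

```latex
% Assumed model (illustrative notation): dx = A x\,dt + B\,dw, with Q = B B^\top
\begin{aligned}
A\Sigma + \Sigma A^{\top} + Q &= 0, \\
A(S) &= \left(S - \tfrac{1}{2} Q\right)\Sigma^{-1}, \qquad S^{\top} = -S, \\
\hat{A} &= A(S^{\star}), \qquad S^{\star} \in \arg\min_{S^{\top} = -S}
  \left\lVert \left(S - \tfrac{1}{2} Q\right)\Sigma^{-1} \right\rVert_{1}.
\end{aligned}
```

Substituting the second line into the first gives $(S - \tfrac{1}{2}Q) + (S^{\top} - \tfrac{1}{2}Q) + Q = 0$, which confirms that each $A(S)$ reproduces the same covariance. The covariance alone therefore cannot identify the network, and an additional sparsity criterion is needed to select one.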
Scalable analysis of stop-and-go waves: Representation, measurements and insights
Analyzing stop-and-go waves at the scale of miles and hours of data is an emerging challenge in traffic research. The past 5 years have seen an explosion in the availability of large-scale traffic data containing traffic waves and complex congestion patterns, making existing approaches unsuitable for repeatable and scalable analysis of traffic waves in these data. This paper makes a first step towards addressing this challenge by introducing an automatic and scalable stop-and-go wave identification method capable of capturing wave generation, propagation, dissipation, as well as bifurcation and merging, which have previously been observed only very rarely. Using a concise and simple critical-speed based definition of a stop-and-go wave, the proposed method identifies all wave boundaries that encompass spatio-temporal points where vehicle speed is below a chosen critical speed. The method is built upon a graph representation of the spatio-temporal points associated with stop-and-go waves, specifically wave front (start) points and wave tail (end) points, and approaches the solution as a graph component identification problem. It enables the measurement of wave properties at scale. The method is implemented in Python and demonstrated on a large-scale dataset, I-24 MOTION INCEPTION. Our results offer insights into the complexity of traffic waves. Traffic waves can bifurcate and merge at a scale that has never been observed or described before. The clustering analysis of all the identified wave components reveals the different topological structures of traffic waves. We find that wave merge and bifurcation points can be explained by spatial features. The gallery of all the identified wave topologies is available at https://trafficwaves.github.io/.
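The critical-speed definition and the graph-component view described above can be illustrated with a much simplified sketch. It assumes speeds have already been resampled onto a regular time-by-space grid, and it groups below-critical cells by 4-connectivity. The paper's graph is built from wave front and tail points instead, so this is a stand-in for the idea, not the authors' method. The function name `find_wave_components` and the sample grid are hypothetical.

```python
from collections import deque


def find_wave_components(speed, critical_speed):
    """Group 4-connected space-time cells whose speed is below critical_speed."""
    n_t, n_x = len(speed), len(speed[0])
    label = [[-1] * n_x for _ in range(n_t)]  # -1 means "not yet assigned"
    components = []
    for t0 in range(n_t):
        for x0 in range(n_x):
            if speed[t0][x0] >= critical_speed or label[t0][x0] != -1:
                continue
            # Breadth-first search from an unlabeled below-critical cell.
            comp_id = len(components)
            label[t0][x0] = comp_id
            queue, cells = deque([(t0, x0)]), []
            while queue:
                t, x = queue.popleft()
                cells.append((t, x))
                for dt, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nt, nx = t + dt, x + dx
                    if (0 <= nt < n_t and 0 <= nx < n_x
                            and label[nt][nx] == -1
                            and speed[nt][nx] < critical_speed):
                        label[nt][nx] = comp_id
                        queue.append((nt, nx))
            components.append(cells)
    return components


# Rows are time steps, columns are road positions; values are illustrative speeds.
sample_speed = [
    [30, 5, 30, 30],
    [30, 5, 5, 30],
    [30, 30, 30, 4],
    [30, 30, 30, 3],
]
waves = find_wave_components(sample_speed, critical_speed=10)
```

Each returned component is a list of `(time_index, position_index)` cells, from which properties such as duration or spatial extent could be measured per wave.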
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8 hours of trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.
comment: Project website: https://omniretarget.github.io
Conformal Robust Control of Linear Systems
End-to-end engineering design pipelines, in which designs are evaluated using concurrently defined optimal controllers, are becoming increasingly common in practice. To discover designs that perform well even under the misspecification of system dynamics, such end-to-end pipelines have now begun evaluating designs with a robust control objective in place of the nominal optimal control setup. Current approaches of specifying such robust control subproblems, however, rely on hand specification of perturbations anticipated to be present upon deployment or margin methods that ignore problem structure, resulting in a lack of theoretical guarantees and overly conservative empirical performance. We, instead, propose a novel methodology for LQR systems that leverages conformal prediction to specify such uncertainty regions in a data-driven fashion. Such regions have distribution-free coverage guarantees on the true system dynamics, in turn allowing for a probabilistic characterization of the regret of the resulting robust controller. We then demonstrate that such a controller can be efficiently produced via a novel policy gradient method that has convergence guarantees. We finally demonstrate the superior empirical performance of our method over alternate robust control specifications, such as $H_{\infty}$ and LQR with multiplicative noise, across a collection of engineering control systems.
Learn to Bid as a Price-Maker Wind Power Producer
Wind power producers (WPPs) participating in short-term power markets face significant imbalance costs due to their non-dispatchable and variable production. While some WPPs have a large enough market share to influence prices with their bidding decisions, existing optimal bidding methods rarely account for this aspect. Price-maker approaches typically model bidding as a bilevel optimization problem, but these methods require complex market models, estimating other participants' actions, and are computationally demanding. To address these challenges, we propose an online learning algorithm that leverages contextual information to optimize WPP bids in the price-maker setting. We formulate the strategic bidding problem as a contextual multi-armed bandit, ensuring provable regret minimization. The algorithm's performance is evaluated against various benchmark strategies using a numerical simulation of the German day-ahead and real-time markets.
Autonomy Architectures for Safe Planning in Unknown Environments Under Budget Constraints
Mission planning can often be formulated as a constrained control problem under multiple path constraints (i.e., safety constraints) and budget constraints (i.e., resource expenditure constraints). In a priori unknown environments, verifying that an offline solution will satisfy the constraints for all time can be difficult, if not impossible. We present ReRoot, a novel sampling-based framework that enforces safety and budget constraints for nonlinear systems in unknown environments. The main idea is that ReRoot grows multiple reverse RRT* trees online, starting from renewal sets, i.e., sets where the budget constraints are renewed. The dynamically feasible backup trajectories guarantee safety and reduce resource expenditure, which provides a principled backup policy when integrated into the gatekeeper safety verification architecture. We demonstrate our approach in simulation with a fixed-wing UAV in a GNSS-denied environment with a budget constraint on localization error that can be renewed at visual landmarks.
comment: Code: https://github.com/dcherenson/budget-constrained-planning
Systems and Control (CS)
Multi-Segment Photonic Power Converters for Energy Harvesting and High-Speed Optical Wireless Communication
The demand for energy-efficient high-speed wireless communication, coupled with the rapid rise of IoT devices, requires systems that integrate power harvesting with optical data reception to eliminate the need for charging or battery replacements. Recent advances have explored the use of solar cells as optical receivers for high-speed data detection alongside power harvesting. GaAs-based photonic power converters (PPCs) provide six times greater electron mobility than silicon- or cadmium telluride-based cells, enabling faster data detection and improved power efficiency. However, their bandwidth is constrained by junction capacitance, which increases with active area, creating a trade-off between power output and data rate. To address this, we propose and test multi-segment GaAs-based PPCs that serve as both energy harvesters and data detectors. By segmenting the active area into 2, 4, or 6 subcells, forming circular areas with diameters of 1, 1.5, or 2.08 mm, we reduce capacitance and boost bandwidth while preserving light collection. Fabricated on a semi-insulating GaAs substrate with etched trenches for electrical isolation, the series-connected subcells optimize absorption and minimize parasitic effects. The PPCs were used for an eye-safe 1.5 m optical wireless link, employing orthogonal frequency-division multiplexing (OFDM) with adaptive bit and power loading. The system achieved a world record data rate of 3.8 Gbps, which is four times higher than prior works. The system converts 39.7% of optical power from a beam of 2.3 mW, although the segmentation increases the sensitivity of the alignment. These findings provide new solutions for off-grid backhaul for future communication networks, such as 6th generation (6G) cellular.
Differentiable Model Predictive Control on the GPU
Differentiable model predictive control (MPC) offers a powerful framework for combining learning and control. However, its adoption has been limited by the inherently sequential nature of traditional optimization algorithms, which are challenging to parallelize on modern computing hardware like GPUs. In this work, we tackle this bottleneck by introducing a GPU-accelerated differentiable optimization tool for MPC. This solver leverages sequential quadratic programming and a custom preconditioned conjugate gradient (PCG) routine with tridiagonal preconditioning to exploit the problem's structure and enable efficient parallelization. We demonstrate substantial speedups over CPU- and GPU-based baselines, significantly improving upon state-of-the-art training times on benchmark reinforcement learning and imitation learning tasks. Finally, we showcase the method on the challenging task of reinforcement learning for driving at the limits of handling, where it enables robust drifting of a Toyota Supra through water puddles.
Learning Mixtures of Linear Dynamical Systems (MoLDS) via Hybrid Tensor-EM Method
Mixtures of linear dynamical systems (MoLDS) provide a path to model time-series data that exhibit diverse temporal dynamics across trajectories. However, its application remains challenging in complex and noisy settings, limiting its effectiveness for neural data analysis. Tensor-based moment methods can provide global identifiability guarantees for MoLDS, but their performance degrades under noise and complexity. Commonly used expectation-maximization (EM) methods offer flexibility in fitting latent models but are highly sensitive to initialization and prone to poor local minima. Here, we propose a tensor-based method that provides identifiability guarantees for learning MoLDS, which is followed by EM updates to combine the strengths of both approaches. The novelty in our approach lies in the construction of moment tensors using the input-output data to recover globally consistent estimates of mixture weights and system parameters. These estimates can then be refined through a Kalman EM algorithm, with closed-form updates for all LDS parameters. We validate our framework on synthetic benchmarks and real-world datasets. On synthetic data, the proposed Tensor-EM method achieves more reliable recovery and improved robustness compared to either pure tensor or randomly initialized EM methods. We then analyze neural recordings from the primate somatosensory cortex while a non-human primate performs reaches in different directions. Our method successfully models and clusters different conditions as separate subsystems, consistent with supervised single-LDS fits for each condition. Finally, we apply this approach to another neural dataset where monkeys perform a sequential reaching task. These results demonstrate that MoLDS provides an effective framework for modeling complex neural data, and that Tensor-EM is a reliable approach to MoLDS learning for these applications.
comment: 20 pages, 7 figures
Toward Model Matching for Remotely Controlled Differential Drive Robotic Vehicles
The problem of regulation of the orientation angle of a remotely controlled differential-drive mobile robot with actuator dynamics and network-induced delays is studied. Using a preinstalled two-layer nonlinear control scheme that decouples linear and angular velocities and regulates heading, a third, delay-dependent layer that achieves exact model matching from the orientation angle command to the orientation angle is introduced. The proposed outer loop controller is a delay dependent dynamic measurable output-feedback controller with dynamic proper precompensator. Parameterization yields a simple characteristic quasi-polynomial with coefficients constrained to satisfy stability for all delays up to a computable bound. Computational experiments confirm accurate tracking, fast settling and bounded internal signals and control voltages. The approach offers an analytic design alternative to AI-based tuning for delayed robotic systems.
Optimal Batched Scheduling of Stochastic Processing Networks Using Atomic Action Decomposition
Stochastic processing networks (SPNs) have broad applications in healthcare, transportation, and communication networks. The control of SPN is to dynamically assign servers in batches under uncertainty to optimize long-run performance. This problem is challenging as the policy dimension grows exponentially with the number of servers, making standard reinforcement learning and policy optimization methods intractable at scale. We propose an atomic action decomposition framework that addresses this scalability challenge by breaking joint assignments into sequential single-server assignments. This yields policies with constant dimension, independent of the number of servers. We study two classes of atomic policies, the step-dependent and step-independent atomic policies, and prove that both achieve the same optimal long-run average reward as the original joint policies. These results establish that computing the optimal SPN control can be made scalable without loss of optimality using the atomic framework. Our results offer theoretical justification for the strong empirical success of the atomic framework in large-scale applications reported in previous articles.
Hybrid Quantum-Classical Policy Gradient for Adaptive Control of Cyber-Physical Systems: A Comparative Study of VQC vs. MLP
The comparative evaluation between classical and quantum reinforcement learning (QRL) paradigms was conducted to investigate their convergence behavior, robustness under observational noise, and computational efficiency in a benchmark control environment. The study employed a multilayer perceptron (MLP) agent as a classical baseline and a parameterized variational quantum circuit (VQC) as a quantum counterpart, both trained on the CartPole-v1 environment over 500 episodes. Empirical results demonstrated that the classical MLP achieved near-optimal policy convergence with a mean return of 498.7 +/- 3.2, maintaining stable equilibrium throughout training. In contrast, the VQC exhibited limited learning capability, with an average return of 14.6 +/- 4.8, primarily constrained by circuit depth and qubit connectivity. Noise robustness analysis further revealed that the MLP policy deteriorated gracefully under Gaussian perturbations, while the VQC displayed higher sensitivity at equivalent noise levels. Despite the lower asymptotic performance, the VQC exhibited significantly lower parameter count and marginally increased training time, highlighting its potential scalability for low-resource quantum processors. The results suggest that while classical neural policies remain dominant in current control benchmarks, quantum-enhanced architectures could offer promising efficiency advantages once hardware noise and expressivity limitations are mitigated.
comment: 6 pages, 5 figures, 2 tables, 17 equations, 1 algorithm
Distributed Platoon Control Under Quantization: Stability Analysis and Privacy Preservation
Distributed control of connected and automated vehicles has attracted considerable interest for its potential to improve traffic efficiency and safety. However, such control schemes require sharing privacy-sensitive vehicle data, which introduces risks of information leakage and potential malicious activities. This paper investigates the stability and privacy-preserving properties of distributed platoon control under two types of quantizers: deterministic and probabilistic. For deterministic quantization, we show that the resulting control strategy ensures the system errors remain uniformly ultimately bounded. Moreover, in the absence of auxiliary information, an eavesdropper cannot uniquely infer sensitive vehicle states. In contrast, the use of probabilistic quantization enables asymptotic convergence of the vehicle platoon in expectation with bounded variance. Importantly, probabilistic quantizers can satisfy differential privacy guarantees, thereby preserving privacy even when the eavesdropper possesses arbitrary auxiliary information. We further analyze the trade-off between control performance and privacy by formulating an optimization problem that characterizes the impact of the quantization step on both metrics. Numerical simulations are provided to illustrate the performance differences between the two quantization strategies.
comment: 12 pages, 6 figures
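To make the distinction between the two quantizer types concrete, here is a minimal Python sketch contrasting deterministic rounding with an unbiased probabilistic (stochastic-rounding) quantizer. The function names, the step size, and the specific randomized rule are assumptions for illustration; the paper's quantizers and its differential-privacy mechanism may differ.

```python
import math
import random


def deterministic_quantize(x, step):
    """Round x to the nearest multiple of step (ties round up)."""
    return math.floor(x / step + 0.5) * step


def probabilistic_quantize(x, step, rng=random):
    """Round x to one of its two neighbouring grid points at random.

    The upper point is chosen with probability proportional to how close
    x is to it, so the output equals x in expectation.
    """
    lower = math.floor(x / step) * step
    p_up = (x - lower) / step
    return lower + step if rng.random() < p_up else lower


rng = random.Random(0)  # fixed seed so the illustration is repeatable
samples = [probabilistic_quantize(0.3, 1.0, rng) for _ in range(20000)]
print(deterministic_quantize(0.3, 1.0), sum(samples) / len(samples))
```

The deterministic output always carries the same rounding error, whereas the probabilistic output is noisy but averages to the true value, which is consistent with the convergence-in-expectation result described above.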
Safe Landing on Small Celestial Bodies with Gravitational Uncertainty Using Disturbance Estimation and Control Barrier Functions
Soft landing on small celestial bodies (SCBs) poses unique challenges, as uncertainties in gravitational models and poorly characterized, dynamic environments require a high level of autonomy. Existing control approaches lack formal guarantees for safety constraint satisfaction, necessary to ensure the safe execution of the maneuvers. This paper introduces a control scheme that addresses this limitation by integrating trajectory tracking, disturbance estimation, and safety enforcement. An extended high-gain observer is employed to estimate disturbances resulting from gravitational model uncertainties. We then apply a feedback-linearizing and disturbance-canceling controller that achieves exponential tracking of reference trajectories. Finally, we use a control barrier function-based minimum-intervention controller to enforce state and input constraints throughout the maneuver execution. This control scheme combines trajectory tracking of offline generated reference trajectories with formal guarantees of safety, which follows common guidance and control architectures for spacecraft and allows aggressive maneuvers to be executed without compromising safety. Numerical simulations using fuel-optimal trajectories demonstrate the effectiveness of the controller in achieving precise and safe soft-landing, highlighting its potential for autonomous SCB missions.
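The minimum-intervention idea can be sketched for the simplest case of a single affine safety constraint, where the underlying quadratic program has a closed-form solution. The Python below is an illustrative assumption, not the paper's controller: the function name `min_intervention`, the single-integrator example, and the gain `alpha` are hypothetical, and input constraints and the disturbance estimate are omitted.

```python
def min_intervention(u_nom, a, b):
    """Solve min ||u - u_nom||^2 subject to a . u >= b in closed form.

    u_nom: nominal (tracking) input, as a list of floats.
    a, b:  coefficients of one affine constraint on u; a is assumed
           to be nonzero.
    """
    slack = b - sum(ai * ui for ai, ui in zip(a, u_nom))
    if slack <= 0.0:
        # The nominal input already satisfies the constraint.
        return list(u_nom)
    norm_sq = sum(ai * ai for ai in a)
    # Project onto the constraint boundary along the direction a.
    return [ui + slack * ai / norm_sq for ui, ai in zip(u_nom, a)]


# Hypothetical single integrator x' = u with barrier h(x) = x - x_min.
# The barrier condition h' >= -alpha * h becomes u >= -alpha * h.
alpha, h = 2.0, 1.0
print(min_intervention([-5.0], [1.0], -alpha * h))  # nominal input is modified
print(min_intervention([1.0], [1.0], -alpha * h))   # nominal input is kept
```

The filter changes the nominal command only when it would violate the barrier condition, which is what allows aggressive reference trajectories to be tracked unmodified for most of the maneuver.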
Human-in-the-loop Optimisation in Robot-assisted Gait Training
Wearable robots offer a promising solution for quantitatively monitoring gait and providing systematic, adaptive assistance to promote patient independence and improve gait. However, due to significant interpersonal and intrapersonal variability in walking patterns, it is important to design robot controllers that can adapt to the unique characteristics of each individual. This paper investigates the potential of human-in-the-loop optimisation (HILO) to deliver personalised assistance in gait training. The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) was employed to continuously optimise an assist-as-needed controller of a lower-limb exoskeleton. Six healthy individuals participated over a two-day experiment. Our results suggest that while the CMA-ES appears to converge to a unique set of stiffnesses for each individual, no measurable impact on the subjects' performance was observed during the validation trials. These findings highlight the impact of human-robot co-adaptation and human behaviour variability, whose effect may be greater than potential benefits of personalising rule-based assistive controllers. Our work contributes to understanding the limitations of current personalisation approaches in exoskeleton-assisted gait rehabilitation and identifies key challenges for effective implementation of human-in-the-loop optimisation in this domain.
Federated Split Learning for Resource-Constrained Robots in Industrial IoT: Framework Comparison, Optimization Strategies, and Future Directions
Federated split learning (FedSL) has emerged as a promising paradigm for enabling collaborative intelligence in industrial Internet of Things (IoT) systems, particularly in smart factories where data privacy, communication efficiency, and device heterogeneity are critical concerns. In this article, we present a comprehensive study of FedSL frameworks tailored for resource-constrained robots in industrial scenarios. We compare synchronous, asynchronous, hierarchical, and heterogeneous FedSL frameworks in terms of workflow, scalability, adaptability, and limitations under dynamic industrial conditions. Furthermore, we systematically categorize token fusion strategies into three paradigms: input-level (pre-fusion), intermediate-level (intra-fusion), and output-level (post-fusion), and summarize their respective strengths in industrial applications. We also provide adaptive optimization techniques to enhance the efficiency and feasibility of FedSL implementation, including model compression, split layer selection, computing frequency allocation, and wireless resource management. Simulation results validate the performance of these frameworks under industrial detection scenarios. Finally, we outline open issues and research directions of FedSL in future smart manufacturing systems.
comment: 9 pages, 5 figures, submitted to the IEEE magazine
Sample-Efficient and Smooth Cross-Entropy Method Model Predictive Control Using Deterministic Samples
Cross-entropy method model predictive control (CEM--MPC) is a powerful gradient-free technique for nonlinear optimal control, but its performance is often limited by the reliance on random sampling. This conventional approach can lead to inefficient exploration of the solution space and non-smooth control inputs, requiring a large number of samples to achieve satisfactory results. To address these limitations, we propose deterministic sampling CEM (dsCEM), a novel framework that replaces the random sampling step with deterministic samples derived from localized cumulative distributions (LCDs). Our approach introduces modular schemes to generate and adapt these sample sets, incorporating temporal correlations to ensure smooth control trajectories. This method can be used as a drop-in replacement for the sampling step in existing CEM-based controllers. Experimental evaluations on two nonlinear control tasks demonstrate that dsCEM consistently outperforms state-of-the-art iCEM in terms of cumulative cost and control input smoothness, particularly in the critical low-sample regime.
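For context, the conventional random-sampling loop that dsCEM aims to replace can be sketched as below. This is a generic one-dimensional cross-entropy method in Python, written under simplifying assumptions (scalar decision variable, Gaussian sampling distribution, hypothetical parameter names and defaults); it does not implement the LCD-based deterministic samples from the paper.

```python
import random
import statistics


def cem_minimize(cost, mean, std, n_samples=64, n_elite=8, n_iters=20, rng=None):
    """Basic cross-entropy method for a scalar decision variable."""
    rng = rng or random.Random(0)
    for _ in range(n_iters):
        # Random sampling step: the part a deterministic sample set
        # would replace in a dsCEM-style scheme.
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        # Keep the lowest-cost samples and refit the Gaussian to them.
        elites = sorted(samples, key=cost)[:n_elite]
        mean = statistics.mean(elites)
        std = statistics.pstdev(elites) + 1e-6  # floor avoids a zero std
    return mean


# Hypothetical quadratic cost with its minimum at u = 2.
best = cem_minimize(lambda u: (u - 2.0) ** 2, mean=0.0, std=2.0)
print(best)
```

In an MPC setting the decision variable would be a whole control sequence and the cost a simulated rollout, so the number of samples per iteration directly drives the computational load.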
Generative AI-Driven Hierarchical Multi-Agent Framework for Zero-Touch Optical Networks
The rapid development of Generative Artificial Intelligence (GenAI) has catalyzed a transformative technological revolution across all walks of life. As the backbone of wideband communication, optical networks are expecting high-level autonomous operation and zero-touch management to accommodate their expanding network scales and escalating transmission bandwidth. The integration of GenAI is deemed the pivotal solution for realizing zero-touch optical networks. However, the lifecycle management of optical networks involves a multitude of tasks and necessitates seamless collaboration across multiple layers, which poses significant challenges to the existing single-agent GenAI systems. In this paper, we propose a GenAI-driven hierarchical multi-agent framework designed to streamline multi-task autonomous execution for zero-touch optical networks. We present the architecture, implementation, and applications of this framework. A field-deployed mesh network is utilized to demonstrate three typical scenarios throughout the lifecycle of an optical network: quality of transmission estimation in the planning stage, dynamic channel adding/dropping in the operation stage, and system capacity increase in the upgrade stage. The case studies illustrate the capabilities of the multi-agent framework in multi-task allocation, coordination, execution, evaluation, and summarization. This work provides a promising approach for the future development of intelligent, efficient, and collaborative network management solutions, paving the way for more specialized and adaptive zero-touch optical networks.
comment: 7 pages, 6 figures, accepted by IEEE Communications Magazine, Open call
GO-Flock: Goal-Oriented Flocking in 3D Unknown Environments with Depth Maps
Artificial Potential Field (APF) methods are widely used for reactive flocking control, but they often suffer from challenges such as deadlocks and local minima, especially in the presence of obstacles. Existing solutions to address these issues are typically passive, leading to slow and inefficient collective navigation. As a result, many APF approaches have only been validated in obstacle-free environments or simplified, pseudo-3D simulations. This paper presents GO-Flock, a hybrid flocking framework that integrates planning with reactive APF-based control. GO-Flock consists of an upstream Perception Module, which processes depth maps to extract waypoints and virtual agents for obstacle avoidance, and a downstream Collective Navigation Module, which applies a novel APF strategy to achieve effective flocking behavior in cluttered environments. We evaluate GO-Flock against passive APF-based approaches to demonstrate their respective merits, such as their flocking behavior and the ability to overcome local minima. Finally, we validate GO-Flock in obstacle-filled environments and in hardware-in-the-loop experiments, where we successfully flocked a team of nine drones, six physical and three virtual, in a forest environment.
Real-Time Glass Detection and Reprojection using Sensor Fusion Onboard Aerial Robots ICRA 2026
Autonomous aerial robots are increasingly being deployed in real-world scenarios, where transparent obstacles present significant challenges to reliable navigation and mapping. These materials pose a unique problem for traditional perception systems because they lack discernible features and can cause conventional depth sensors to fail, leading to inaccurate maps and potential collisions. To ensure safe navigation, robots must be able to accurately detect and map these transparent obstacles. Existing methods often rely on large, expensive sensors or algorithms that impose high computational burdens, making them unsuitable for low Size, Weight, and Power (SWaP) robots. In this work, we propose a novel and computationally efficient framework for detecting and mapping transparent obstacles onboard a sub-300g quadrotor. Our method fuses data from a Time-of-Flight (ToF) camera and an ultrasonic sensor with a custom, lightweight 2D convolution model. This specialized approach accurately detects specular reflections and propagates their depth into corresponding empty regions of the depth map, effectively rendering transparent obstacles visible. The entire pipeline operates in real-time, utilizing only a small fraction of a CPU core on an embedded processor. We validate our system through a series of experiments in both controlled and real-world environments, demonstrating the utility of our method through experiments where the robot maps indoor environments containing glass. Our work is, to our knowledge, the first of its kind to demonstrate a real-time, onboard transparent obstacle mapping system on a low-SWaP quadrotor using only the CPU.
comment: 8 pages, 8 figures, submitted to ICRA 2026
Terrain-Aided Navigation Using a Point Cloud Measurement Sensor
We investigate the use of a point cloud measurement in terrain-aided navigation. Our goal is to aid an inertial navigation system by exploring ways to generate a useful measurement innovation error for effective nonlinear state estimation. We compare two such measurement models that involve the scanning of a digital terrain elevation model: a) one that is based on typical ray-casting from a given pose, which returns the predicted point cloud measurement from that pose, and b) a computationally less intensive one that does not require ray-casting, which we refer to herein as a sliding grid. Besides requiring a pose, the sliding grid requires the pattern of the point cloud measurement itself and returns a predicted point cloud measurement. We further investigate the observability properties of the altitude for both measurement models. As a baseline, we compare the performance of a point cloud measurement to that of a radar altimeter and show the gains in accuracy. We conclude by showing that a point cloud measurement outperforms the use of a radar altimeter, and the point cloud measurement model to use depends on the computational resources.
Three-dimensional Integrated Guidance and Control for Leader-Follower Flexible Formation of Fixed Wing UAVs
This paper presents a nonlinear integrated guidance and control (IGC) approach for flexible leader-follower formation flight of fixed-wing unmanned aerial vehicles (UAVs) while accounting for high-fidelity aerodynamics and thrust dynamics. Unlike conventional leader-follower schemes that fix the follower's position relative to the leader, the follower is steered to maintain range and bearing angles (which is the angle between its velocity vector and its line-of-sight (LOS) with respect to the leader) arbitrarily close to the prescribed values, enabling the follower to maintain formation on a hemispherical region behind the leader. The proposed IGC framework directly maps leader-follower relative range dynamics to throttle commands, and the follower's velocity orientation relative to the LOS to aerodynamic control surface deflections. This enables synergism between guidance and control subsystems. The control design uses a dynamic surface control-based backstepping approach to achieve convergence to the desired formation set, where Lyapunov barrier functions are incorporated to ensure the follower's bearing angle is constrained within specified bounds. Rigorous stability analysis guarantees uniform ultimate boundedness of all error states and strict constraint satisfaction in the presence of aerodynamic nonlinearities. The proposed flexible formation scheme allows the follower to have an orientation mismatch relative to the leader to execute anticipatory reconfiguration by transitioning between the relative positions in the admissible formation set when the leader aggressively maneuvers. The proposed IGC law relies only on relative information and onboard sensors without the information about the leader's maneuver, making it suitable for GPS-denied or non-cooperative scenarios. Finally, we present simulation results to vindicate the effectiveness and robustness of our approach.
Comparing Normal Form Representations for Station-Keeping near Cislunar Libration Points
The normal forms provide useful approximations for many trajectories of interest within the circular restricted three-body problem. This paper aims to thoroughly compare two of these forms: the Birkhoff normal form and the resonant normal form, highlighting the strengths of each for the representation of center manifold trajectories. A method of station-keeping is introduced, analogous to Floquet modes, in which the unstable component is minimized at specific points along a trajectory through impulsive maneuvers. Three different formulations of the same station-keeping approach are posed, collectively spanning Lyapunov, vertical, and halo orbits, as well as Lissajous and quasihalo trajectories.
comment: 2025 AAS/AIAA Space Flight Mechanics Meeting
Time-causal and time-recursive wavelets
When applying wavelet analysis to real-time temporal signals, where the future cannot be accessed, it is essential to base all the steps in the signal processing pipeline on computational mechanisms that are truly time-causal. This paper describes how a time-causal wavelet analysis can be performed based on concepts developed in the area of temporal scale-space theory, originating from a complete classification of temporal smoothing kernels that guarantee non-creation of new structures from finer to coarser temporal scale levels. By necessity, convolution with truncated exponential kernels in cascade constitutes the only permissible class of kernels, as well as their temporal derivatives as a natural complement to fulfil the admissibility conditions of wavelet representations. For a particular way of choosing the time constants in the resulting infinite convolution of truncated exponential kernels, to ensure temporal scale covariance and thus self-similarity over temporal scales, we describe how mother wavelets can be chosen as temporal derivatives of the resulting time-causal limit kernel. By developing connections between wavelet theory and scale-space theory, we characterize and quantify how the continuous scaling properties transfer to the discrete implementation, demonstrating how the proposed time-causal wavelet representation can reflect the duration of locally dominant temporal structures in the input signals. We propose that this notion of time-causal wavelet analysis could be a valuable tool for signal processing tasks, where streams of signals are to be processed in real time, specifically for signals that may contain local variations over a rich span of temporal scales, or more generally for analysing physical or biophysical temporal phenomena, where a fully time-causal analysis is called for to be physically realistic.
comment: 23 pages, 8 figures
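A cascade of truncated exponential kernels has a simple discrete counterpart: a chain of first-order recursive filters, each of which needs only the current input and its own previous output. The Python sketch below illustrates that time-causal, time-recursive structure. The update rule, the function names, and the time constants `mus` are assumptions chosen for illustration; the paper's specific choice of time constants for the limit kernel is not reproduced here.

```python
def recursive_smooth(signal, mu):
    """First-order recursive filter with time constant mu and unit DC gain."""
    out, prev = [], 0.0
    for x in signal:
        # Only the current input and the previous output are used,
        # so no future samples are accessed.
        prev = prev + (x - prev) / (1.0 + mu)
        out.append(prev)
    return out


def cascade(signal, mus):
    """Apply several first-order recursive filters in series."""
    for mu in mus:
        signal = recursive_smooth(signal, mu)
    return signal


def temporal_difference(signal):
    """Backward difference, a causal stand-in for a temporal derivative."""
    return [b - a for a, b in zip([0.0] + signal[:-1], signal)]


impulse = [0.0] * 5 + [1.0] + [0.0] * 200
kernel = cascade(impulse, [1.0, 2.0, 4.0])  # hypothetical time constants
wavelet_like = temporal_difference(kernel)
print(sum(kernel), sum(wavelet_like))
```

The smoothed impulse response is zero before the impulse arrives and sums to approximately one, while its backward difference sums to approximately zero, the discrete analogue of the zero-mean property expected of a wavelet.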
Techno-economic analysis of self-sustainable thermophotovoltaic systems for grid-scale energy generation
To facilitate the widespread adoption of renewable energy, dispatchable, zero-emission power sources are essential for grid stability. This work performs a comprehensive techno-economic analysis of a self-sustainable thermophotovoltaic (TPV) system, an architecture that integrates solar charging to function as a standalone power generation asset. Using theory-based models for air-bridge InGaAs and Si diode cells, our analysis reveals that while the system is not currently competitive from a pure levelized cost of storage (LCOS) perspective due to the high capital expenditure for thermal battery materials, its primary value lies in its competitive levelized cost of electricity (LCOE). The results demonstrate that the LCOE of this self-sustaining system can be competitive with conventional dispatchable generators, such as gas turbines. Furthermore, at scales exceeding the gigawatt-hour level, a Si-based system can also achieve an LCOE comparable to that of traditional gas-turbine power plants, despite having a lower conversion efficiency than its InGaAs counterpart. This highlights a practical engineering pathway for leveraging silicon's immense manufacturing scalability, offering a lower-risk route to deployment compared to III-V materials. Ultimately, this work establishes the self-sustainable TPV architecture as a compelling pathway toward providing grid-scale, on-demand, zero-emission power.
comment: 27 pages, 6 figures, 1 table
Nonlinear System Identification for Model-Based Control of Waked Wind Turbines
This work presents a nonlinear system identification framework for modeling the power extraction dynamics of wind turbines, including both freestream and waked conditions. The approach models turbine dynamics using data-driven power coefficient maps expressed as combinations of compact radial basis functions and polynomial bases, parameterized in terms of tip-speed ratio and upstream conditions. These surrogate models are embedded in a first-order dynamic system suitable for model-based control. Experimental validation is carried out in two wind tunnel configurations: a low-turbulence tandem setup and a high-turbulence wind farm scenario. In the tandem case, the identified model is integrated into an adapted $K\omega^2$ controller, resulting in improved tip-speed ratio tracking and power stability compared to BEM-based and steady-state models. In the wind farm scenario, the model captures the statistical behavior of the turbines despite unresolved turbulence. The proposed method enables interpretable, adaptive control across a range of operating conditions without relying on black-box learning strategies.
comment: Submitted to: Data-Centric Engineering Journal; Length: 27 pages (including references); Figures: 14 numbered figures (Fig. 1 to Fig. 14); Keywords: Wind Turbines, Wake Interaction, Nonlinear System Identification, Adaptive Control, RBF Regression, Model-Based Control, Wind Tunnel Experiments
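For readers unfamiliar with the baseline, the standard textbook form of the $K\omega^2$ torque law is given below. The symbols are assumptions introduced here for illustration ($\tau_g$ generator torque, $\omega$ rotor speed, $\rho$ air density, $R$ rotor radius, $C_{p,\max}$ peak power coefficient, $\lambda_{\mathrm{opt}}$ the tip-speed ratio at which that peak occurs); how the adapted controller in this work modifies the gain is not stated in the abstract.

```latex
\tau_g = K\,\omega^{2},
\qquad
K = \frac{1}{2}\,\rho\,\pi R^{5}\,\frac{C_{p,\max}}{\lambda_{\mathrm{opt}}^{3}}
```

With this gain, the steady-state operating point settles at $\lambda_{\mathrm{opt}}$, so the accuracy of the power coefficient map directly determines how well the turbine tracks its optimal tip-speed ratio.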
Electrical System Architecture for Aviation Electrification
The electrification of aircraft is reshaping the foundations of aerospace design by positioning electrical systems at the center of propulsion, control, and onboard functionality. This chapter provides an overview of electrical system architectures for electric and hybrid electric aircraft, highlighting both established principles and emerging design strategies. The discussion begins with the motivations for electrification, including reducing environmental impact, improving operational efficiency, and replacing complex pneumatic and hydraulic subsystems with lighter and more reliable electrical alternatives. Aircraft electrical architectures are classified into four major categories: conventional, more electric, all electric, and hybrid electric. A range of system topologies is examined, including direct current (DC), alternating current (AC), hybrid, and distributed configurations. Each is considered in terms of its effectiveness in delivering power, enabling redundancy, supporting fault isolation, and managing thermal performance. Real-world examples are presented to demonstrate practical applications, with case studies drawn from the Boeing 787 Dreamliner, the Eviation Alice commuter aircraft, and the NASA X-57 Maxwell demonstrator. These examples illustrate the ongoing transition from incremental subsystem electrification toward fully integrated architectures that promise higher efficiency and greater sustainability.
Neural Network-based Co-design of Output-Feedback Control Barrier Function and Observer
Control Barrier Functions (CBFs) provide a powerful framework for ensuring safety in dynamical systems. However, their application typically relies on full state information, an assumption that is often violated in real-world scenarios where only partial state information is available. In this work, we propose a neural network-based framework for the co-design of a safety controller, observer, and CBF for partially observed continuous-time systems. By formulating barrier conditions over an augmented state space, our approach ensures safety without requiring bounded estimation errors or handcrafted barrier functions. All components are jointly trained by formulating appropriate loss functions, and we introduce a validity condition to provide formal safety guarantees beyond the training data. Finally, we demonstrate the effectiveness of the proposed approach through several case studies.
comment: There were errors in paper (introduction section and notations)
Efficient MPC-Based Energy Management System for Secure and Cost-Effective Microgrid Operations
Model predictive control (MPC)-based energy management systems (EMS) are essential for ensuring optimal, secure, and stable operation in microgrids with high penetrations of distributed energy resources. However, due to the high computational cost for the decision-making, the conventional MPC-based EMS typically adopts a simplified integrated-bus power balance model. While this simplification is effective for small networks, large-scale systems require a more detailed branch flow model to account for the increased impact of grid power losses and security constraints. This work proposes an efficient and reliable MPC-based EMS that incorporates power-loss effects and grid-security constraints. It enhances system reliability, reduces operational costs, and shows strong potential for online implementation due to its reduced computational effort. Specifically, a second-order cone program (SOCP) branch flow relaxation is integrated into the constraint set, yielding a convex formulation that guarantees globally optimal solutions with high computational efficiency. Owing to the radial topology of the microgrid, this relaxation is practically tight, ensuring equivalence to the original problem. Building on this foundation, an online demand response (DR) module is designed to further reduce the operation cost through peak shaving. To the best of our knowledge, no prior MPC-EMS framework has simultaneously modeled losses and security constraints while coordinating flexible loads within a unified architecture. The developed framework enables secure operation with effective peak shaving and reduced total cost. The effectiveness of the proposed method is validated on 10-bus, 18-bus, and 33-bus systems.
Equivariant Filter for Relative Attitude and Target's Angular Velocity Estimation
Accurate estimation of the relative attitude and angular velocity between two rigid bodies is fundamental in aerospace applications such as spacecraft rendezvous and docking. In these scenarios, a chaser vehicle must determine the orientation and angular velocity of a target object using onboard sensors. This work addresses the challenge of designing an Equivariant Filter (EqF) that can reliably estimate both the relative attitude and the target angular velocity using noisy observations of two known, non-collinear vectors fixed in the target frame. To derive the EqF, a symmetry for the system is proposed and an equivariant lift onto the symmetry group is calculated. Observability and convergence properties are analyzed. Simulations demonstrate the filter's performance, with Monte Carlo runs yielding statistically significant results. The impact of low-rate measurements is also examined and a strategy to mitigate this effect is proposed. Experimental results, using fiducial markers and both conventional and event cameras for measurement acquisition, further validate the approach, confirming its effectiveness in a realistic setting.
comment: This work has been submitted to the IEEE for possible publication
Optimal Duration of Reserve Capacity Ancillary Services for Distributed Energy Resources
The increasing integration of distributed energy resources (DERs) into power systems presents opportunities and challenges for ancillary services (AS) provision. Technical requirements of existing AS (i.e., duration, reliability, ramp rate, and lead time) have been designed for traditional generating units, making their provision by DER aggregates particularly challenging. This paper proposes a method to design the duration of reserve capacity AS products considering the operational constraints of DERs and the temporal dynamics of system imbalances. The optimal product duration is determined by maximizing product availability and aligning the supply profile with the system's balancing needs. We apply the methodology to a realistic Swiss low-voltage network with a diverse DER portfolio. The results reveal that (i) shorter product durations maximize average availability and (ii) long product durations improve the alignment with system balancing needs. This paper offers valuable insights for system operators to design AS products tailored for DER participation.
Image-Based Visual Servoing for Enhanced Cooperation of Dual-Arm Manipulation
The cooperation of a pair of robot manipulators is required to manipulate a target object without any fixtures. The conventional control methods coordinate the end-effector pose of each manipulator with that of the other using their kinematics and joint coordinate measurements. Yet, the manipulators' inaccurate kinematics and joint coordinate measurements can cause significant pose synchronization errors in practice. This paper thus proposes an image-based visual servoing approach for enhancing the cooperation of a dual-arm manipulation system. On top of the classical control, the visual servoing controller lets each manipulator use its carried camera to measure the image features of the other's marker and adapt its end-effector pose with the counterpart on the move. Because visual measurements are robust to kinematic errors, the proposed control can reduce the end-effector pose synchronization errors and the fluctuations of the interaction forces of the pair of manipulators on the move. Theoretical analyses have rigorously proven the stability of the closed-loop system. Comparative experiments on real robots have substantiated the effectiveness of the proposed control.
comment: 8 pages, 7 figures. Project website: https://zizhe.io/ral-ibvs-enhanced/. This work has been accepted to the IEEE Robotics and Automation Letters in Feb 2025
Generalizable Physics-Informed Learning for Stochastic Safety-Critical Systems
Accurate estimation of long-term risk is essential for the design and analysis of stochastic dynamical systems. Existing risk quantification methods typically rely on extensive datasets involving risk events observed over extended time horizons, which can be prohibitively expensive to acquire. Motivated by this gap, we propose an efficient method for learning long-term risk probabilities using short-term samples with limited occurrence of risk events. Specifically, we establish that four distinct classes of long-term risk probabilities are characterized by specific partial differential equations (PDEs). Using this characterization, we introduce a physics-informed learning framework that combines empirical data with physics information to infer risk probabilities. We then analyze the theoretical properties of this framework in terms of generalization and convergence. Through numerical experiments, we demonstrate that our framework not only generalizes effectively beyond the sampled states and time horizons but also offers additional benefits such as improved sample efficiency, rapid online inference capabilities under changing system dynamics, and stable computation of probability gradients. These results highlight how embedding PDE constraints, which contain explicit gradient terms and inform how risk probabilities depend on state, time horizon, and system parameters, improves interpolation and generalization between/beyond the available data.
Multi-Agent Stage-wise Conservative Linear Bandits
In many real-world applications such as recommendation systems, multiple learning agents must balance exploration and exploitation while maintaining safety guarantees to avoid catastrophic failures. We study the stochastic linear bandit problem in a multi-agent networked setting where agents must satisfy stage-wise conservative constraints. A network of $N$ agents collaboratively maximizes cumulative reward while ensuring that the expected reward at every round is no less than $(1-\alpha)$ times that of a baseline policy. Each agent observes local rewards with unknown parameters, but the network optimizes for the global parameter (average of local parameters). Agents communicate only with immediate neighbors, and each communication round incurs additional regret. We propose MA-SCLUCB (Multi-Agent Stage-wise Conservative Linear UCB), an episodic algorithm alternating between action selection and consensus-building phases. We prove that MA-SCLUCB achieves regret $\tilde{O}\left(\frac{d}{\sqrt{N}}\sqrt{T}\cdot\frac{\log(NT)}{\sqrt{\log(1/|\lambda_2|)}}\right)$ with high probability, where $d$ is the dimension, $T$ is the horizon, and $|\lambda_2|$ is the network's second largest eigenvalue magnitude. Our analysis shows: (i) collaboration yields $\frac{1}{\sqrt{N}}$ improvement despite local communication, (ii) communication overhead grows only logarithmically for well-connected networks, and (iii) stage-wise safety adds only lower-order regret. Thus, distributed learning with safety guarantees achieves near-optimal performance in reasonably connected networks.
Systems and Control (EESS)
Multi-Segment Photonic Power Converters for Energy Harvesting and High-Speed Optical Wireless Communication
The demand for energy-efficient high-speed wireless communication, coupled with the rapid rise of IoT devices, requires systems that integrate power harvesting with optical data reception to eliminate the need for charging or battery replacements. Recent advances have explored the use of solar cells as optical receivers for high-speed data detection alongside power harvesting. Gallium arsenide (GaAs)-based photonic power converters (PPCs) provide six times greater electron mobility than silicon- or cadmium telluride-based cells, enabling faster data detection and improved power efficiency. However, their bandwidth is constrained by junction capacitance, which increases with active area, creating a trade-off between power output and data rate. To address this, we propose and test multi-segment GaAs-based PPCs that serve as both energy harvesters and data detectors. By segmenting the active area into 2, 4, or 6 subcells, forming circular areas with diameters of 1, 1.5, or 2.08 mm, we reduce capacitance and boost bandwidth while preserving light collection. Fabricated on a semi-insulating GaAs substrate with etched trenches for electrical isolation, the series-connected subcells optimize absorption and minimize parasitic effects. The PPCs were used for an eye-safe 1.5 m optical wireless link, employing orthogonal frequency-division multiplexing (OFDM) with adaptive bit and power loading. The system achieved a world record data rate of 3.8 Gbps, which is four times higher than prior works. The system converts 39.7% of optical power from a beam of 2.3 mW, although the segmentation increases the sensitivity of the alignment. These findings provide new solutions for off-grid backhaul for future communication networks, such as 6th generation (6G) cellular.
Differentiable Model Predictive Control on the GPU
Differentiable model predictive control (MPC) offers a powerful framework for combining learning and control. However, its adoption has been limited by the inherently sequential nature of traditional optimization algorithms, which are challenging to parallelize on modern computing hardware like GPUs. In this work, we tackle this bottleneck by introducing a GPU-accelerated differentiable optimization tool for MPC. This solver leverages sequential quadratic programming and a custom preconditioned conjugate gradient (PCG) routine with tridiagonal preconditioning to exploit the problem's structure and enable efficient parallelization. We demonstrate substantial speedups over CPU- and GPU-based baselines, significantly improving upon state-of-the-art training times on benchmark reinforcement learning and imitation learning tasks. Finally, we showcase the method on the challenging task of reinforcement learning for driving at the limits of handling, where it enables robust drifting of a Toyota Supra through water puddles.
Learning Mixtures of Linear Dynamical Systems (MoLDS) via Hybrid Tensor-EM Method
Mixtures of linear dynamical systems (MoLDS) provide a path to model time-series data that exhibit diverse temporal dynamics across trajectories. However, its application remains challenging in complex and noisy settings, limiting its effectiveness for neural data analysis. Tensor-based moment methods can provide global identifiability guarantees for MoLDS, but their performance degrades under noise and complexity. Commonly used expectation-maximization (EM) methods offer flexibility in fitting latent models but are highly sensitive to initialization and prone to poor local minima. Here, we propose a tensor-based method that provides identifiability guarantees for learning MoLDS, which is followed by EM updates to combine the strengths of both approaches. The novelty in our approach lies in the construction of moment tensors using the input-output data to recover globally consistent estimates of mixture weights and system parameters. These estimates can then be refined through a Kalman EM algorithm, with closed-form updates for all LDS parameters. We validate our framework on synthetic benchmarks and real-world datasets. On synthetic data, the proposed Tensor-EM method achieves more reliable recovery and improved robustness compared to either pure tensor or randomly initialized EM methods. We then analyze neural recordings from the primate somatosensory cortex while a non-human primate performs reaches in different directions. Our method successfully models and clusters different conditions as separate subsystems, consistent with supervised single-LDS fits for each condition. Finally, we apply this approach to another neural dataset where monkeys perform a sequential reaching task. These results demonstrate that MoLDS provides an effective framework for modeling complex neural data, and that Tensor-EM is a reliable approach to MoLDS learning for these applications.
comment: 20 pages, 7 figures
Toward Model Matching for Remotely Controlled Differential Drive Robotic Vehicles
The problem of regulating the orientation angle of a remotely controlled differential-drive mobile robot with actuator dynamics and network-induced delays is studied. Using a preinstalled two-layer nonlinear control scheme that decouples linear and angular velocities and regulates heading, a third, delay-dependent layer that achieves exact model matching from the orientation angle command to the orientation angle is introduced. The proposed outer-loop controller is a delay-dependent dynamic measurable output-feedback controller with a dynamic proper precompensator. Parameterization yields a simple characteristic quasi-polynomial with coefficients constrained to satisfy stability for all delays up to a computable bound. Computational experiments confirm accurate tracking, fast settling, and bounded internal signals and control voltages. The approach offers an analytic design alternative to AI-based tuning for delayed robotic systems.
Optimal Batched Scheduling of Stochastic Processing Networks Using Atomic Action Decomposition
Stochastic processing networks (SPNs) have broad applications in healthcare, transportation, and communication networks. Controlling an SPN means dynamically assigning servers in batches under uncertainty to optimize long-run performance. This problem is challenging as the policy dimension grows exponentially with the number of servers, making standard reinforcement learning and policy optimization methods intractable at scale. We propose an atomic action decomposition framework that addresses this scalability challenge by breaking joint assignments into sequential single-server assignments. This yields policies with constant dimension, independent of the number of servers. We study two classes of atomic policies, the step-dependent and step-independent atomic policies, and prove that both achieve the same optimal long-run average reward as the original joint policies. These results establish that computing the optimal SPN control can be made scalable without loss of optimality using the atomic framework. Our results offer theoretical justification for the strong empirical success of the atomic framework in large-scale applications reported in previous articles.
Hybrid Quantum-Classical Policy Gradient for Adaptive Control of Cyber-Physical Systems: A Comparative Study of VQC vs. MLP
The comparative evaluation between classical and quantum reinforcement learning (QRL) paradigms was conducted to investigate their convergence behavior, robustness under observational noise, and computational efficiency in a benchmark control environment. The study employed a multilayer perceptron (MLP) agent as a classical baseline and a parameterized variational quantum circuit (VQC) as a quantum counterpart, both trained on the CartPole-v1 environment over 500 episodes. Empirical results demonstrated that the classical MLP achieved near-optimal policy convergence with a mean return of 498.7 +/- 3.2, maintaining stable equilibrium throughout training. In contrast, the VQC exhibited limited learning capability, with an average return of 14.6 +/- 4.8, primarily constrained by circuit depth and qubit connectivity. Noise robustness analysis further revealed that the MLP policy deteriorated gracefully under Gaussian perturbations, while the VQC displayed higher sensitivity at equivalent noise levels. Despite the lower asymptotic performance, the VQC exhibited significantly lower parameter count and marginally increased training time, highlighting its potential scalability for low-resource quantum processors. The results suggest that while classical neural policies remain dominant in current control benchmarks, quantum-enhanced architectures could offer promising efficiency advantages once hardware noise and expressivity limitations are mitigated.
comment: 6 pages, 5 figures, 2 tables, 17 equations, 1 algorithm
Distributed Platoon Control Under Quantization: Stability Analysis and Privacy Preservation
Distributed control of connected and automated vehicles has attracted considerable interest for its potential to improve traffic efficiency and safety. However, such control schemes require sharing privacy-sensitive vehicle data, which introduces risks of information leakage and potential malicious activities. This paper investigates the stability and privacy-preserving properties of distributed platoon control under two types of quantizers: deterministic and probabilistic. For deterministic quantization, we show that the resulting control strategy ensures the system errors remain uniformly ultimately bounded. Moreover, in the absence of auxiliary information, an eavesdropper cannot uniquely infer sensitive vehicle states. In contrast, the use of probabilistic quantization enables asymptotic convergence of the vehicle platoon in expectation with bounded variance. Importantly, probabilistic quantizers can satisfy differential privacy guarantees, thereby preserving privacy even when the eavesdropper possesses arbitrary auxiliary information. We further analyze the trade-off between control performance and privacy by formulating an optimization problem that characterizes the impact of the quantization step on both metrics. Numerical simulations are provided to illustrate the performance differences between the two quantization strategies.
comment: 12 pages, 6 figures
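To make the distinction between the two quantizer types in the abstract above concrete, here is a minimal sketch. The abstract does not specify the paper's exact quantizers, so this assumes a common form: deterministic rounding to the nearest grid point, and unbiased stochastic rounding for the probabilistic case. The function names and the step value are illustrative, and the sketch makes no claim about the differential privacy guarantees analyzed in the paper.

```python
import math
import random

def deterministic_quantize(x, step):
    """Round x to the nearest multiple of step (error bounded by step / 2)."""
    return step * math.floor(x / step + 0.5)

def probabilistic_quantize(x, step, rng=random):
    """Round x up or down to a neighboring grid point at random.

    The probability of rounding up equals the fractional position of x
    between the two grid points, so the output equals x in expectation.
    """
    lower = step * math.floor(x / step)
    p_up = (x - lower) / step
    return lower + step if rng.random() < p_up else lower

# Empirical check of unbiasedness with a seeded generator (illustrative values).
rng = random.Random(0)
samples = [probabilistic_quantize(0.3, 1.0, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
```

The deterministic quantizer leaves a persistent rounding error, which fits the bounded (rather than vanishing) errors described for that case, while the randomized one averages out to the true value at the cost of added variance.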
Safe Landing on Small Celestial Bodies with Gravitational Uncertainty Using Disturbance Estimation and Control Barrier Functions
Soft landing on small celestial bodies (SCBs) poses unique challenges, as uncertainties in gravitational models and poorly characterized, dynamic environments require a high level of autonomy. Existing control approaches lack formal guarantees for safety constraint satisfaction, necessary to ensure the safe execution of the maneuvers. This paper introduces a control scheme that addresses this limitation by integrating trajectory tracking, disturbance estimation, and safety enforcement. An extended high-gain observer is employed to estimate disturbances resulting from gravitational model uncertainties. We then apply a feedback-linearizing and disturbance-canceling controller that achieves exponential tracking of reference trajectories. Finally, we use a control-barrier-function-based minimum-intervention controller to enforce state and input constraints throughout the maneuver execution. This scheme combines trajectory tracking of offline generated reference trajectories with formal guarantees of safety, which follows common guidance and control architectures for spacecraft and allows aggressive maneuvers to be executed without compromising safety. Numerical simulations using fuel-optimal trajectories demonstrate the effectiveness of the controller in achieving precise and safe soft-landing, highlighting its potential for autonomous SCB missions.
Human-in-the-loop Optimisation in Robot-assisted Gait Training
Wearable robots offer a promising solution for quantitatively monitoring gait and providing systematic, adaptive assistance to promote patient independence and improve gait. However, due to significant interpersonal and intrapersonal variability in walking patterns, it is important to design robot controllers that can adapt to the unique characteristics of each individual. This paper investigates the potential of human-in-the-loop optimisation (HILO) to deliver personalised assistance in gait training. The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) was employed to continuously optimise an assist-as-needed controller of a lower-limb exoskeleton. Six healthy individuals participated over a two-day experiment. Our results suggest that while the CMA-ES appears to converge to a unique set of stiffnesses for each individual, no measurable impact on the subjects' performance was observed during the validation trials. These findings highlight the impact of human-robot co-adaptation and human behaviour variability, whose effect may be greater than potential benefits of personalising rule-based assistive controllers. Our work contributes to understanding the limitations of current personalisation approaches in exoskeleton-assisted gait rehabilitation and identifies key challenges for effective implementation of human-in-the-loop optimisation in this domain.
Federated Split Learning for Resource-Constrained Robots in Industrial IoT: Framework Comparison, Optimization Strategies, and Future Directions
Federated split learning (FedSL) has emerged as a promising paradigm for enabling collaborative intelligence in industrial Internet of Things (IoT) systems, particularly in smart factories where data privacy, communication efficiency, and device heterogeneity are critical concerns. In this article, we present a comprehensive study of FedSL frameworks tailored for resource-constrained robots in industrial scenarios. We compare synchronous, asynchronous, hierarchical, and heterogeneous FedSL frameworks in terms of workflow, scalability, adaptability, and limitations under dynamic industrial conditions. Furthermore, we systematically categorize token fusion strategies into three paradigms: input-level (pre-fusion), intermediate-level (intra-fusion), and output-level (post-fusion), and summarize their respective strengths in industrial applications. We also provide adaptive optimization techniques to enhance the efficiency and feasibility of FedSL implementation, including model compression, split layer selection, computing frequency allocation, and wireless resource management. Simulation results validate the performance of these frameworks under industrial detection scenarios. Finally, we outline open issues and research directions of FedSL in future smart manufacturing systems.
comment: 9 pages, 5 figures, submitted to the IEEE magazine
Sample-Efficient and Smooth Cross-Entropy Method Model Predictive Control Using Deterministic Samples
Cross-entropy method model predictive control (CEM-MPC) is a powerful gradient-free technique for nonlinear optimal control, but its performance is often limited by the reliance on random sampling. This conventional approach can lead to inefficient exploration of the solution space and non-smooth control inputs, requiring a large number of samples to achieve satisfactory results. To address these limitations, we propose deterministic sampling CEM (dsCEM), a novel framework that replaces the random sampling step with deterministic samples derived from localized cumulative distributions (LCDs). Our approach introduces modular schemes to generate and adapt these sample sets, incorporating temporal correlations to ensure smooth control trajectories. This method can be used as a drop-in replacement for the sampling step in existing CEM-based controllers. Experimental evaluations on two nonlinear control tasks demonstrate that dsCEM consistently outperforms state-of-the-art iCEM in terms of cumulative cost and control input smoothness, particularly in the critical low-sample regime.
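As background for the abstract above, the following is a minimal sketch of the conventional random-sampling CEM loop that dsCEM is described as modifying. It optimizes a single scalar input rather than a full control sequence, and it does not implement the deterministic LCD-based sampling from the paper. The function name, parameters, and the quadratic cost are illustrative assumptions.

```python
import random
import statistics

def cem_minimize(cost, mean, std, n_samples=50, n_elite=10, n_iters=20, rng=None):
    """Conventional CEM: sample from a Gaussian, keep the elites, refit, repeat."""
    rng = rng or random.Random()
    for _ in range(n_iters):
        # Random sampling step (the step that dsCEM is said to replace).
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        # Keep the lowest-cost candidates.
        elites = sorted(samples, key=cost)[:n_elite]
        # Refit the sampling distribution to the elites.
        mean = statistics.fmean(elites)
        std = statistics.pstdev(elites) + 1e-6  # small floor avoids a zero std
    return mean

# Illustrative use: a quadratic cost with its minimum at u = 2.
best_u = cem_minimize(lambda u: (u - 2.0) ** 2, mean=0.0, std=1.0,
                      rng=random.Random(0))
```

In an MPC setting the same loop would run at every control step over a sequence of inputs, with the cost evaluated by rolling out a model, which is where sample count and input smoothness become practical concerns.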
Generative AI-Driven Hierarchical Multi-Agent Framework for Zero-Touch Optical Networks
The rapid development of Generative Artificial Intelligence (GenAI) has catalyzed a transformative technological revolution across all walks of life. As the backbone of wideband communication, optical networks are expected to support high-level autonomous operation and zero-touch management to accommodate their expanding network scales and escalating transmission bandwidth. The integration of GenAI is deemed a pivotal solution for realizing zero-touch optical networks. However, the lifecycle management of optical networks involves a multitude of tasks and necessitates seamless collaboration across multiple layers, which poses significant challenges to the existing single-agent GenAI systems. In this paper, we propose a GenAI-driven hierarchical multi-agent framework designed to streamline multi-task autonomous execution for zero-touch optical networks. We present the architecture, implementation, and applications of this framework. A field-deployed mesh network is utilized to demonstrate three typical scenarios throughout the lifecycle of an optical network: quality of transmission estimation in the planning stage, dynamic channel adding/dropping in the operation stage, and system capacity increase in the upgrade stage. The case studies illustrate the capabilities of the multi-agent framework in multi-task allocation, coordination, execution, evaluation, and summarization. This work provides a promising approach for the future development of intelligent, efficient, and collaborative network management solutions, paving the way for more specialized and adaptive zero-touch optical networks.
comment: 7 pages, 6 figures, Accepted by IEEE Communications Magazine, Open call
GO-Flock: Goal-Oriented Flocking in 3D Unknown Environments with Depth Maps
Artificial Potential Field (APF) methods are widely used for reactive flocking control, but they often suffer from challenges such as deadlocks and local minima, especially in the presence of obstacles. Existing solutions to address these issues are typically passive, leading to slow and inefficient collective navigation. As a result, many APF approaches have only been validated in obstacle-free environments or simplified, pseudo 3D simulations. This paper presents GO-Flock, a hybrid flocking framework that integrates planning with reactive APF-based control. GO-Flock consists of an upstream Perception Module, which processes depth maps to extract waypoints and virtual agents for obstacle avoidance, and a downstream Collective Navigation Module, which applies a novel APF strategy to achieve effective flocking behavior in cluttered environments. We evaluate GO-Flock against passive APF-based approaches to demonstrate their respective merits, such as their flocking behavior and the ability to overcome local minima. Finally, we validate GO-Flock in obstacle-filled environments and in hardware-in-the-loop experiments, in which we successfully flocked a team of nine drones, six physical and three virtual, in a forest environment.
Real-Time Glass Detection and Reprojection using Sensor Fusion Onboard Aerial Robots ICRA 2026
Autonomous aerial robots are increasingly being deployed in real-world scenarios, where transparent obstacles present significant challenges to reliable navigation and mapping. These materials pose a unique problem for traditional perception systems because they lack discernible features and can cause conventional depth sensors to fail, leading to inaccurate maps and potential collisions. To ensure safe navigation, robots must be able to accurately detect and map these transparent obstacles. Existing methods often rely on large, expensive sensors or algorithms that impose high computational burdens, making them unsuitable for low Size, Weight, and Power (SWaP) robots. In this work, we propose a novel and computationally efficient framework for detecting and mapping transparent obstacles onboard a sub-300g quadrotor. Our method fuses data from a Time-of-Flight (ToF) camera and an ultrasonic sensor with a custom, lightweight 2D convolution model. This specialized approach accurately detects specular reflections and propagates their depth into corresponding empty regions of the depth map, effectively rendering transparent obstacles visible. The entire pipeline operates in real-time, utilizing only a small fraction of a CPU core on an embedded processor. We validate our system through a series of experiments in both controlled and real-world environments, demonstrating the utility of our method through experiments where the robot maps indoor environments containing glass. Our work is, to our knowledge, the first of its kind to demonstrate a real-time, onboard transparent obstacle mapping system on a low-SWaP quadrotor using only the CPU.
comment: 8 pages, 8 figures, submitted to ICRA 2026
Terrain-Aided Navigation Using a Point Cloud Measurement Sensor
We investigate the use of a point cloud measurement in terrain-aided navigation. Our goal is to aid an inertial navigation system by exploring ways to generate a useful measurement innovation error for effective nonlinear state estimation. We compare two such measurement models that involve the scanning of a digital terrain elevation model: a) one based on typical ray casting from a given pose, which returns the predicted point cloud measurement from that pose, and b) a computationally less intensive one that does not require ray casting, which we refer to herein as a sliding grid. Besides a pose, the sliding grid requires the pattern of the point cloud measurement itself and returns a predicted point cloud measurement. We further investigate the observability properties of the altitude for both measurement models. As a baseline, we compare the performance of the point cloud measurement with that of a radar altimeter and show the gains in accuracy. We conclude by showing that a point cloud measurement outperforms a radar altimeter, and that the point cloud measurement model to use depends on the computational resources.
Three-dimensional Integrated Guidance and Control for Leader-Follower Flexible Formation of Fixed Wing UAVs
This paper presents a nonlinear integrated guidance and control (IGC) approach for flexible leader-follower formation flight of fixed-wing unmanned aerial vehicles (UAVs) while accounting for high-fidelity aerodynamics and thrust dynamics. Unlike conventional leader-follower schemes that fix the follower's position relative to the leader, the follower is steered to maintain range and bearing angles (the angles between its velocity vector and its line-of-sight (LOS) with respect to the leader) arbitrarily close to the prescribed values, enabling the follower to maintain formation on a hemispherical region behind the leader. The proposed IGC framework directly maps leader-follower relative range dynamics to throttle commands, and the follower's velocity orientation relative to the LOS to aerodynamic control surface deflections. This enables synergism between guidance and control subsystems. The control design uses a dynamic surface control-based backstepping approach to achieve convergence to the desired formation set, where Lyapunov barrier functions are incorporated to ensure the follower's bearing angle is constrained within specified bounds. Rigorous stability analysis guarantees uniform ultimate boundedness of all error states and strict constraint satisfaction in the presence of aerodynamic nonlinearities. The proposed flexible formation scheme allows the follower to have an orientation mismatch relative to the leader to execute anticipatory reconfiguration by transitioning between the relative positions in the admissible formation set when the leader aggressively maneuvers. The proposed IGC law relies only on relative information and onboard sensors without the information about the leader's maneuver, making it suitable for GPS-denied or non-cooperative scenarios. Finally, we present simulation results to demonstrate the effectiveness and robustness of our approach.
Comparing Normal Form Representations for Station-Keeping near Cislunar Libration Points
The normal forms provide useful approximations for many trajectories of interest within the circular restricted three-body problem. This paper aims to thoroughly compare two of these forms: the Birkhoff normal form and the resonant normal form, highlighting the strengths of each for the representation of center manifold trajectories. A method of station-keeping is introduced, analogous to Floquet modes, in which the unstable component is minimized at specific points along a trajectory through impulsive maneuvers. Three different formulations of the same station-keeping approach are posed, collectively spanning Lyapunov, vertical, and halo orbits, as well as Lissajous and quasihalo trajectories.
comment: 2025 AAS/AIAA Space Flight Mechanics Meeting
Time-causal and time-recursive wavelets
When applying wavelet analysis to real-time temporal signals, where the future cannot be accessed, it is essential to base all the steps in the signal processing pipeline on computational mechanisms that are truly time-causal. This paper describes how a time-causal wavelet analysis can be performed based on concepts developed in the area of temporal scale-space theory, originating from a complete classification of temporal smoothing kernels that guarantee non-creation of new structures from finer to coarser temporal scale levels. By necessity, convolution with truncated exponential kernels in cascade constitutes the only permissible class of kernels, as well as their temporal derivatives as a natural complement to fulfil the admissibility conditions of wavelet representations. For a particular way of choosing the time constants in the resulting infinite convolution of truncated exponential kernels, to ensure temporal scale covariance and thus self-similarity over temporal scales, we describe how mother wavelets can be chosen as temporal derivatives of the resulting time-causal limit kernel. By developing connections between wavelet theory and scale-space theory, we characterize and quantify how the continuous scaling properties transfer to the discrete implementation, demonstrating how the proposed time-causal wavelet representation can reflect the duration of locally dominant temporal structures in the input signals. We propose that this notion of time-causal wavelet analysis could be a valuable tool for signal processing tasks, where streams of signals are to be processed in real time, specifically for signals that may contain local variations over a rich span of temporal scales, or more generally for analysing physical or biophysical temporal phenomena, where a fully time-causal analysis is called for to be physically realistic.
comment: 23 pages, 8 figures
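The cascade of truncated exponential kernels described in the abstract above can be illustrated with a minimal sketch. It assumes the common discrete analogue of such a kernel, a first-order recursive filter, and uses hypothetical time constants (1, 2, 4) rather than the specific choice that yields the paper's time-causal limit kernel. The backward difference stands in for the temporal derivative; all function names are illustrative only.

```python
def recursive_smooth(signal, mu):
    """First-order recursive filter with time constant mu.

    Each output depends only on the current and past inputs, so the
    filter is time-causal and needs a single stored value (time-recursive).
    """
    out = []
    prev = 0.0
    for x in signal:
        prev = prev + (x - prev) / (1.0 + mu)
        out.append(prev)
    return out


def cascade_smooth(signal, time_constants):
    """Apply several first-order recursive filters in cascade."""
    for mu in time_constants:
        signal = recursive_smooth(signal, mu)
    return signal


def backward_difference(signal):
    """Causal discrete approximation of a first-order temporal derivative."""
    out = []
    prev = 0.0
    for x in signal:
        out.append(x - prev)
        prev = x
    return out


# Impulse at sample 5: the response shows the equivalent smoothing kernel.
impulse = [0.0] * 5 + [1.0] + [0.0] * 200
smoothed = cascade_smooth(impulse, [1.0, 2.0, 4.0])  # hypothetical time constants
wavelet_like = backward_difference(smoothed)
```

In this sketch the smoothed response is zero before the impulse arrives and sums to approximately one, while its backward difference sums to approximately zero, which is the zero-mean property expected of a wavelet-like kernel.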
Techno-economic analysis of self-sustainable thermophotovoltaic systems for grid-scale energy generation
To facilitate the widespread adoption of renewable energy, dispatchable, zero-emission power sources are essential for grid stability. This work performs a comprehensive techno-economic analysis of a self-sustainable thermophotovoltaic (TPV) system, an architecture that integrates solar charging to function as a standalone power generation asset. Using theory-based models for air-bridge InGaAs and Si diode cells, our analysis reveals that while the system is not currently competitive from a pure levelized cost of storage (LCOS) perspective due to the high capital expenditure for thermal battery materials, its primary value lies in its competitive levelized cost of electricity (LCOE). The results demonstrate that the LCOE of this self-sustaining system can be competitive with conventional dispatchable generators, such as gas turbines. Furthermore, at scales exceeding the gigawatt-hour level, a Si-based system can also achieve an LCOE comparable to that of traditional gas-turbine power plants, despite having a lower conversion efficiency than its InGaAs counterpart. This highlights a practical engineering pathway for leveraging silicon's immense manufacturing scalability, offering a lower-risk route to deployment compared to III-V materials. Ultimately, this work establishes the self-sustainable TPV architecture as a compelling pathway toward providing grid-scale, on-demand, zero-emission power.
comment: 27 pages, 6 figures, 1 table
Nonlinear System Identification for Model-Based Control of Waked Wind Turbines
This work presents a nonlinear system identification framework for modeling the power extraction dynamics of wind turbines, including both freestream and waked conditions. The approach models turbine dynamics using data-driven power coefficient maps expressed as combinations of compact radial basis functions and polynomial bases, parameterized in terms of tip-speed ratio and upstream conditions. These surrogate models are embedded in a first-order dynamic system suitable for model-based control. Experimental validation is carried out in two wind tunnel configurations: a low-turbulence tandem setup and a high-turbulence wind farm scenario. In the tandem case, the identified model is integrated into an adapted $K\omega^2$ controller, resulting in improved tip-speed ratio tracking and power stability compared to BEM-based and steady-state models. In the wind farm scenario, the model captures the statistical behavior of the turbines despite unresolved turbulence. The proposed method enables interpretable, adaptive control across a range of operating conditions without relying on black-box learning strategies.
comment: Submitted to: Data-Centric Engineering Journal. Length: 27 pages (including references). Figures: 14 numbered figures (Fig. 1 to Fig. 14). Keywords: Wind Turbines, Wake Interaction, Nonlinear System Identification, Adaptive Control, RBF Regression, Model-Based Control, Wind Tunnel Experiments
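For context on the $K\omega^2$ controller mentioned in the abstract above, the following is a minimal sketch of the textbook law, not the adapted version the paper identifies from data. It assumes a constant peak power coefficient and optimal tip-speed ratio; the numerical values and function names are hypothetical and chosen only for illustration.

```python
import math


def optimal_gain(rho, rotor_radius, cp_max, tsr_opt):
    """Textbook gain K = 0.5 * rho * pi * R^5 * Cp_max / tsr_opt^3."""
    return 0.5 * rho * math.pi * rotor_radius ** 5 * cp_max / tsr_opt ** 3


def generator_torque(k_gain, omega):
    """Torque command of the K*omega^2 law for rotor speed omega (rad/s)."""
    return k_gain * omega ** 2


def tip_speed_ratio(omega, rotor_radius, wind_speed):
    """Tip-speed ratio: blade tip speed divided by wind speed."""
    return omega * rotor_radius / wind_speed


def aerodynamic_torque(rho, rotor_radius, cp, wind_speed, omega):
    """Aerodynamic torque = extracted power / rotor speed."""
    power = 0.5 * rho * math.pi * rotor_radius ** 2 * wind_speed ** 3 * cp
    return power / omega


# Hypothetical small wind-tunnel rotor, for illustration only.
RHO, RADIUS, CP_MAX, TSR_OPT = 1.225, 0.5, 0.4, 6.0
K = optimal_gain(RHO, RADIUS, CP_MAX, TSR_OPT)

omega = 10.0                           # rotor speed in rad/s
wind_speed = omega * RADIUS / TSR_OPT  # wind speed that puts the rotor at tsr_opt
torque_cmd = generator_torque(K, omega)
torque_aero = aerodynamic_torque(RHO, RADIUS, CP_MAX, wind_speed, omega)
```

At the optimal tip-speed ratio the commanded torque balances the aerodynamic torque, which is why this law steers the rotor toward peak power extraction when the power coefficient map is accurate. A data-driven map, as in the abstract, would replace the constant `CP_MAX` and `TSR_OPT` assumed here.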
Electrical System Architecture for Aviation Electrification
The electrification of aircraft is reshaping the foundations of aerospace design by positioning electrical systems at the center of propulsion, control, and onboard functionality. This chapter provides an overview of electrical system architectures for electric and hybrid electric aircraft, highlighting both established principles and emerging design strategies. The discussion begins with the motivations for electrification, including reducing environmental impact, improving operational efficiency, and replacing complex pneumatic and hydraulic subsystems with lighter and more reliable electrical alternatives. Aircraft electrical architectures are classified into four major categories: conventional, more electric, all electric, and hybrid electric. A range of system topologies is examined, including direct current (DC), alternating current (AC), hybrid, and distributed configurations. Each is considered in terms of its effectiveness in delivering power, enabling redundancy, supporting fault isolation, and managing thermal performance. Real-world examples are presented to demonstrate practical applications, with case studies drawn from the Boeing 787 Dreamliner, the Eviation Alice commuter aircraft, and the NASA X-57 Maxwell demonstrator. These examples illustrate the ongoing transition from incremental subsystem electrification toward fully integrated architectures that promise higher efficiency and greater sustainability.
Neural Network-based Co-design of Output-Feedback Control Barrier Function and Observer
Control Barrier Functions (CBFs) provide a powerful framework for ensuring safety in dynamical systems. However, their application typically relies on full state information, an assumption that is often violated in real-world scenarios where only partial state information is available. In this work, we propose a neural network-based framework for the co-design of a safety controller, observer, and CBF for partially observed continuous-time systems. By formulating barrier conditions over an augmented state space, our approach ensures safety without requiring bounded estimation errors or handcrafted barrier functions. All components are jointly trained by formulating appropriate loss functions, and we introduce a validity condition to provide formal safety guarantees beyond the training data. Finally, we demonstrate the effectiveness of the proposed approach through several case studies.
comment: There were errors in paper (introduction section and notations)
Efficient MPC-Based Energy Management System for Secure and Cost-Effective Microgrid Operations
Model predictive control (MPC)-based energy management systems (EMS) are essential for ensuring optimal, secure, and stable operation in microgrids with high penetrations of distributed energy resources. However, due to the high computational cost for the decision-making, the conventional MPC-based EMS typically adopts a simplified integrated-bus power balance model. While this simplification is effective for small networks, large-scale systems require a more detailed branch flow model to account for the increased impact of grid power losses and security constraints. This work proposes an efficient and reliable MPC-based EMS that incorporates power-loss effects and grid-security constraints. It enhances system reliability, reduces operational costs, and shows strong potential for online implementation due to its reduced computational effort. Specifically, a second-order cone program (SOCP) branch flow relaxation is integrated into the constraint set, yielding a convex formulation that guarantees globally optimal solutions with high computational efficiency. Owing to the radial topology of the microgrid, this relaxation is practically tight, ensuring equivalence to the original problem. Building on this foundation, an online demand response (DR) module is designed to further reduce the operation cost through peak shaving. To the best of our knowledge, no prior MPC-EMS framework has simultaneously modeled losses and security constraints while coordinating flexible loads within a unified architecture. The developed framework enables secure operation with effective peak shaving and reduced total cost. The effectiveness of the proposed method is validated on 10-bus, 18-bus, and 33-bus systems.
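The SOCP branch flow relaxation named in the abstract above is commonly written as follows. This is the standard form for a radial network, not necessarily the exact constraint set used in the paper. The symbols are assumptions introduced here: for a line from bus $i$ to bus $j$, $P_{ij}$ and $Q_{ij}$ are the sending-end active and reactive power flows, $v_i$ is the squared voltage magnitude, $\ell_{ij}$ is the squared current magnitude, and $r_{ij}$, $x_{ij}$ are the line resistance and reactance.

```latex
% Voltage drop along line (i,j) in the branch flow model
v_j = v_i - 2\left(r_{ij} P_{ij} + x_{ij} Q_{ij}\right)
      + \left(r_{ij}^2 + x_{ij}^2\right)\ell_{ij}

% Nonconvex branch equality
P_{ij}^2 + Q_{ij}^2 = v_i\,\ell_{ij}

% Relaxed to an inequality, written as a second-order cone constraint
\left\| \begin{pmatrix} 2P_{ij} \\ 2Q_{ij} \\ \ell_{ij} - v_i \end{pmatrix} \right\|_2
\le \ell_{ij} + v_i
```

Squaring both sides of the cone constraint gives $4P_{ij}^2 + 4Q_{ij}^2 \le 4 v_i \ell_{ij}$, so it is the equality with "=" replaced by "$\le$". The relaxation is called tight when the optimum satisfies the inequality with equality.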
Equivariant Filter for Relative Attitude and Target's Angular Velocity Estimation
Accurate estimation of the relative attitude and angular velocity between two rigid bodies is fundamental in aerospace applications such as spacecraft rendezvous and docking. In these scenarios, a chaser vehicle must determine the orientation and angular velocity of a target object using onboard sensors. This work addresses the challenge of designing an Equivariant Filter (EqF) that can reliably estimate both the relative attitude and the target angular velocity using noisy observations of two known, non-collinear vectors fixed in the target frame. To derive the EqF, a symmetry for the system is proposed and an equivariant lift onto the symmetry group is calculated. Observability and convergence properties are analyzed. Simulations demonstrate the filter's performance, with Monte Carlo runs yielding statistically significant results. The impact of low-rate measurements is also examined and a strategy to mitigate this effect is proposed. Experimental results, using fiducial markers and both conventional and event cameras for measurement acquisition, further validate the approach, confirming its effectiveness in a realistic setting.
comment: This work has been submitted to the IEEE for possible publication
Optimal Duration of Reserve Capacity Ancillary Services for Distributed Energy Resources
The increasing integration of distributed energy resources (DERs) into power systems presents opportunities and challenges for ancillary services (AS) provision. Technical requirements of existing AS (i.e., duration, reliability, ramp rate, and lead time) have been designed for traditional generating units, making their provision by DER aggregates particularly challenging. This paper proposes a method to design the duration of reserve capacity AS products considering the operational constraints of DERs and the temporal dynamics of system imbalances. The optimal product duration is determined by maximizing product availability and aligning the supply profile with the system's balancing needs. We apply the methodology to a realistic Swiss low-voltage network with a diverse DER portfolio. The results reveal that (i) shorter product durations maximize average availability and (ii) long product durations improve the alignment with system balancing needs. This paper offers valuable insights for system operators to design AS products tailored for DER participation.
Image-Based Visual Servoing for Enhanced Cooperation of Dual-Arm Manipulation
The cooperation of a pair of robot manipulators is required to manipulate a target object without any fixtures. The conventional control methods coordinate the end-effector pose of each manipulator with that of the other using their kinematics and joint coordinate measurements. Yet, the manipulators' inaccurate kinematics and joint coordinate measurements can cause significant pose synchronization errors in practice. This paper thus proposes an image-based visual servoing approach for enhancing the cooperation of a dual-arm manipulation system. On top of the classical control, the visual servoing controller lets each manipulator use its carried camera to measure the image features of the other's marker and adapt its end-effector pose with the counterpart on the move. Because visual measurements are robust to kinematic errors, the proposed control can reduce the end-effector pose synchronization errors and the fluctuations of the interaction forces of the pair of manipulators on the move. Theoretical analyses have rigorously proven the stability of the closed-loop system. Comparative experiments on real robots have substantiated the effectiveness of the proposed control.
comment: 8 pages, 7 figures. Project website: https://zizhe.io/ral-ibvs-enhanced/. This work has been accepted to the IEEE Robotics and Automation Letters in Feb 2025
Generalizable Physics-Informed Learning for Stochastic Safety-Critical Systems
Accurate estimation of long-term risk is essential for the design and analysis of stochastic dynamical systems. Existing risk quantification methods typically rely on extensive datasets involving risk events observed over extended time horizons, which can be prohibitively expensive to acquire. Motivated by this gap, we propose an efficient method for learning long-term risk probabilities using short-term samples with limited occurrence of risk events. Specifically, we establish that four distinct classes of long-term risk probabilities are characterized by specific partial differential equations (PDEs). Using this characterization, we introduce a physics-informed learning framework that combines empirical data with physics information to infer risk probabilities. We then analyze the theoretical properties of this framework in terms of generalization and convergence. Through numerical experiments, we demonstrate that our framework not only generalizes effectively beyond the sampled states and time horizons but also offers additional benefits such as improved sample efficiency, rapid online inference capabilities under changing system dynamics, and stable computation of probability gradients. These results highlight how embedding PDE constraints, which contain explicit gradient terms and inform how risk probabilities depend on state, time horizon, and system parameters, improves interpolation and generalization between/beyond the available data.
Multi-Agent Stage-wise Conservative Linear Bandits
In many real-world applications such as recommendation systems, multiple learning agents must balance exploration and exploitation while maintaining safety guarantees to avoid catastrophic failures. We study the stochastic linear bandit problem in a multi-agent networked setting where agents must satisfy stage-wise conservative constraints. A network of $N$ agents collaboratively maximizes cumulative reward while ensuring that the expected reward at every round is no less than $(1-\alpha)$ times that of a baseline policy. Each agent observes local rewards with unknown parameters, but the network optimizes for the global parameter (average of local parameters). Agents communicate only with immediate neighbors, and each communication round incurs additional regret. We propose MA-SCLUCB (Multi-Agent Stage-wise Conservative Linear UCB), an episodic algorithm alternating between action selection and consensus-building phases. We prove that MA-SCLUCB achieves regret $\tilde{O}\left(\frac{d}{\sqrt{N}}\sqrt{T}\cdot\frac{\log(NT)}{\sqrt{\log(1/|\lambda_2|)}}\right)$ with high probability, where $d$ is the dimension, $T$ is the horizon, and $|\lambda_2|$ is the network's second largest eigenvalue magnitude. Our analysis shows: (i) collaboration yields $\frac{1}{\sqrt{N}}$ improvement despite local communication, (ii) communication overhead grows only logarithmically for well-connected networks, and (iii) stage-wise safety adds only lower-order regret. Thus, distributed learning with safety guarantees achieves near-optimal performance in reasonably connected networks.
Robotics
Dropping the D: RGB-D SLAM Without the Depth Sensor
We present DropD-SLAM, a real-time monocular SLAM system that achieves RGB-D-level accuracy without relying on depth sensors. The system replaces active depth input with three pretrained vision modules: a monocular metric depth estimator, a learned keypoint detector, and an instance segmentation network. Dynamic objects are suppressed using dilated instance masks, while static keypoints are assigned predicted depth values and backprojected into 3D to form metrically scaled features. These are processed by an unmodified RGB-D SLAM back end for tracking and mapping. On the TUM RGB-D benchmark, DropD-SLAM attains 7.4 cm mean ATE on static sequences and 1.8 cm on dynamic sequences, matching or surpassing state-of-the-art RGB-D methods while operating at 22 FPS on a single GPU. These results suggest that modern pretrained vision models can replace active depth sensors as reliable, real-time sources of metric scale, marking a step toward simpler and more cost-effective SLAM systems.
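The backprojection step described in the abstract above can be sketched with a pinhole camera model. This is a minimal illustration, not the authors' implementation: it assumes plain nested lists for the predicted depth map and the dynamic-object mask, and the intrinsics (`fx`, `fy`, `cx`, `cy`) and toy values are hypothetical.

```python
def backproject_static_keypoints(keypoints, depth, dynamic_mask, fx, fy, cx, cy):
    """Lift 2D keypoints to 3D camera-frame points using predicted depth.

    keypoints    : list of (u, v) integer pixel coordinates (column, row)
    depth        : depth[v][u] is the predicted metric depth in metres
    dynamic_mask : dynamic_mask[v][u] is True inside (dilated) instance masks
    """
    points = []
    for u, v in keypoints:
        if dynamic_mask[v][u]:
            continue  # suppress keypoints on potentially moving objects
        z = depth[v][u]
        if z <= 0.0:
            continue  # skip pixels without a usable depth prediction
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points.append((x, y, z))
    return points


# Toy 3x3 image with hypothetical intrinsics.
depth = [[2.0, 2.0, 2.0],
         [2.0, 2.0, 2.0],
         [2.0, 2.0, 0.0]]
dynamic_mask = [[False, False, True],
                [False, False, False],
                [False, False, False]]
keypoints = [(1, 1), (2, 0), (2, 2), (0, 1)]
points = backproject_static_keypoints(
    keypoints, depth, dynamic_mask, fx=1.0, fy=1.0, cx=1.0, cy=1.0
)
```

Here the keypoint at (2, 0) is dropped because it lies in the dynamic mask and the one at (2, 2) because it has no valid depth. The two remaining points carry metric scale and could be passed to an RGB-D back end in place of sensor depth.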
EmbodiedCoder: Parameterized Embodied Mobile Manipulation via Modern Coding Model
Recent advances in robot control methods, from end-to-end vision-language-action frameworks to modular systems with predefined primitives, have advanced robots' ability to follow natural language instructions. Nonetheless, many approaches still struggle to scale to diverse environments, as they often rely on large annotated datasets and offer limited interpretability. In this work, we introduce EmbodiedCoder, a training-free framework for open-world mobile robot manipulation that leverages coding models to directly generate executable robot trajectories. By grounding high-level instructions in code, EmbodiedCoder enables flexible object geometry parameterization and manipulation trajectory synthesis without additional data collection or fine-tuning. This coding-based paradigm provides a transparent and generalizable way to connect perception with manipulation. Experiments on real mobile robots show that EmbodiedCoder achieves robust performance across diverse long-term tasks and generalizes effectively to novel objects and environments. Our results demonstrate an interpretable approach for bridging high-level reasoning and low-level control, moving beyond fixed primitives toward versatile robot intelligence. See the project page at: https://anonymous.4open.science/w/Embodied-Coder/
comment: Demo Page: https://anonymous.4open.science/w/Embodied-Coder/
DYMO-Hair: Generalizable Volumetric Dynamics Modeling for Robot Hair Manipulation
Hair care is an essential daily activity, yet it remains inaccessible to individuals with limited mobility and challenging for autonomous robot systems due to the fine-grained physical structure and complex dynamics of hair. In this work, we present DYMO-Hair, a model-based robot hair care system. We introduce a novel dynamics learning paradigm that is suited for volumetric quantities such as hair, relying on an action-conditioned latent state editing mechanism, coupled with a compact 3D latent space of diverse hairstyles to improve generalizability. This latent space is pre-trained at scale using a novel hair physics simulator, enabling generalization across previously unseen hairstyles. Using the dynamics model with a Model Predictive Path Integral (MPPI) planner, DYMO-Hair is able to perform visual goal-conditioned hair styling. Experiments in simulation demonstrate that DYMO-Hair's dynamics model outperforms baselines on capturing local deformation for diverse, unseen hairstyles. DYMO-Hair further outperforms baselines in closed-loop hair styling tasks on unseen hairstyles, with an average of 22% lower final geometric error and 42% higher success rate than the state-of-the-art system. Real-world experiments exhibit zero-shot transferability of our system to wigs, achieving consistent success on challenging unseen hairstyles where the state-of-the-art system fails. Together, these results introduce a foundation for model-based robot hair care, advancing toward more generalizable, flexible, and accessible robot hair styling in unconstrained physical environments. More details are available on our project page: https://chengyzhao.github.io/DYMOHair-web/.
comment: Project page: https://chengyzhao.github.io/DYMOHair-web/
A Preview of HoloOcean 2.0 ICRA 2025
Marine robotics simulators play a fundamental role in the development of marine robotic systems. With increased focus on the marine robotics field in recent years, there has been significant interest in developing higher fidelity simulation of marine sensors, physics, and visual rendering capabilities to support autonomous marine robot development and validation. HoloOcean 2.0, the next major release of HoloOcean, brings state-of-the-art features under a general marine simulator capable of supporting a variety of tasks. New features in HoloOcean 2.0 include migration to Unreal Engine (UE) 5.3, advanced vehicle dynamics using models from Fossen, and support for ROS2 using a custom bridge. Additional features are currently in development, including significantly more efficient ray tracing-based sidescan, forward-looking, and bathymetric sonar implementations; semantic sensors; environment generation tools; volumetric environmental effects; and realistic waves.
comment: 5 pages, 9 figures, submitted to the ICRA 2025 aq2uasim workshop
Vision-Guided Targeted Grasping and Vibration for Robotic Pollination in Controlled Environments
Robotic pollination offers a promising alternative to manual labor and bumblebee-assisted methods in controlled agriculture, where wind-driven pollination is absent and regulatory restrictions limit the use of commercial pollinators. In this work, we present and validate a vision-guided robotic framework that uses data from an end-effector-mounted RGB-D sensor and combines 3D plant reconstruction, targeted grasp planning, and physics-based vibration modeling to enable precise pollination. First, the plant is reconstructed in 3D and registered to the robot coordinate frame to identify obstacle-free grasp poses along the main stem. Second, a discrete elastic rod model predicts the relationship between actuation parameters and flower dynamics, guiding the selection of optimal pollination strategies. Finally, a manipulator with soft grippers grasps the stem and applies controlled vibrations to induce pollen release. End-to-end experiments demonstrate a 92.5% main-stem grasping success rate, and simulation-guided optimization of vibration parameters further validates the feasibility of our approach, ensuring that the robot can safely and effectively perform pollination without damaging the flower. To our knowledge, this is the first robotic system to jointly integrate vision-based grasping and vibration modeling for automated precision pollination.
Towards Autonomous Tape Handling for Robotic Wound Redressing
Chronic wounds, such as diabetic, pressure, and venous ulcers, affect over 6.5 million patients in the United States alone and generate an annual cost exceeding \$25 billion. Despite this burden, chronic wound care remains a routine yet manual process performed exclusively by trained clinicians due to its critical safety demands. We envision a future in which robotics and automation support wound care to lower costs and enhance patient outcomes. This paper introduces an autonomous framework for one of the most fundamental yet challenging subtasks in wound redressing: adhesive tape manipulation. Specifically, we address two critical capabilities: tape initial detachment (TID) and secure tape placement. To handle the complex adhesive dynamics of detachment, we propose a force-feedback imitation learning approach trained from human teleoperation demonstrations. For tape placement, we develop a numerical trajectory optimization method to ensure smooth adhesion and wrinkle-free application across diverse anatomical surfaces. We validate these methods through extensive experiments, demonstrating reliable performance in both quantitative evaluations and integrated wound redressing pipelines. Our results establish tape manipulation as an essential step toward practical robotic wound care automation.
Multi-Robot Distributed Optimization for Exploration and Mapping of Unknown Environments using Bioinspired Tactile-Sensor
This project proposes a bioinspired multi-robot system using Distributed Optimization for efficient exploration and mapping of unknown environments. Each robot explores its environment and creates a map, and these maps are afterwards merged to form a global 2D map of the environment. Inspired by wall-following behaviors, each robot autonomously explores its neighborhood based on a tactile sensor, similar to the antenna of a cockroach, mounted on the surface of the robot. Instead of avoiding obstacles, robots log collision points when they touch obstacles. This decentralized control strategy ensures effective task allocation and efficient exploration of unknown terrains, with applications in search and rescue, industrial inspection, and environmental monitoring. The approach was validated through experiments using e-puck robots in a simulated 1.5 x 1.5 m environment with three obstacles. The results demonstrated the system's effectiveness in achieving high coverage, minimizing collisions, and constructing accurate 2D maps.
Cross-Embodiment Dexterous Hand Articulation Generation via Morphology-Aware Learning
Dexterous grasping with multi-fingered hands remains challenging due to high-dimensional articulations and the cost of optimization-based pipelines. Existing end-to-end methods require training on large-scale datasets for specific hands, limiting their ability to generalize across different embodiments. We propose an eigengrasp-based, end-to-end framework for cross-embodiment grasp generation. From a hand's morphology description, we derive a morphology embedding and an eigengrasp set. Conditioned on these, together with the object point cloud and wrist pose, an amplitude predictor regresses articulation coefficients in a low-dimensional space, which are decoded into full joint articulations. Articulation learning is supervised with a Kinematic-Aware Articulation Loss (KAL) that emphasizes fingertip-relevant motions and injects morphology-specific structure. In simulation on unseen objects across three dexterous hands, our model attains a 91.9% average grasp success rate with less than 0.4 seconds inference per grasp. With few-shot adaptation to an unseen hand, it achieves 85.6% success on unseen objects in simulation, and real-world experiments on this few-shot generalized hand achieve an 87% success rate. The code and additional materials will be made available upon publication on our project website https://connor-zh.github.io/cross_embodiment_dexterous_grasping.
Hybrid Quantum-Classical Policy Gradient for Adaptive Control of Cyber-Physical Systems: A Comparative Study of VQC vs. MLP
The comparative evaluation between classical and quantum reinforcement learning (QRL) paradigms was conducted to investigate their convergence behavior, robustness under observational noise, and computational efficiency in a benchmark control environment. The study employed a multilayer perceptron (MLP) agent as a classical baseline and a parameterized variational quantum circuit (VQC) as a quantum counterpart, both trained on the CartPole-v1 environment over 500 episodes. Empirical results demonstrated that the classical MLP achieved near-optimal policy convergence with a mean return of 498.7 +/- 3.2, maintaining stable equilibrium throughout training. In contrast, the VQC exhibited limited learning capability, with an average return of 14.6 +/- 4.8, primarily constrained by circuit depth and qubit connectivity. Noise robustness analysis further revealed that the MLP policy deteriorated gracefully under Gaussian perturbations, while the VQC displayed higher sensitivity at equivalent noise levels. Despite the lower asymptotic performance, the VQC exhibited significantly lower parameter count and marginally increased training time, highlighting its potential scalability for low-resource quantum processors. The results suggest that while classical neural policies remain dominant in current control benchmarks, quantum-enhanced architectures could offer promising efficiency advantages once hardware noise and expressivity limitations are mitigated.
comment: 6 pages, 5 figures, 2 tables, 17 equations, 1 algorithm
Information-Theoretic Policy Pre-Training with Empowerment
Empowerment, an information-theoretic measure of an agent's potential influence on its environment, has emerged as a powerful intrinsic motivation and exploration framework for reinforcement learning (RL). Beyond unsupervised RL and skill-learning algorithms, the specific use of empowerment as a pre-training signal has received limited attention in the literature. We show that empowerment can be used as a pre-training signal for data-efficient downstream task adaptation. To this end, we extend the traditional notion of empowerment by introducing discounted empowerment, which balances the agent's control over the environment across short- and long-term horizons. Leveraging this formulation, we propose a novel pre-training paradigm that initializes policies to maximize discounted empowerment, enabling agents to acquire a robust understanding of environmental dynamics. We analyze empowerment-based pre-training for various existing RL algorithms and empirically demonstrate its potential as a general-purpose initialization strategy: empowerment-maximizing policies with long horizons are data-efficient and effective, leading to improved adaptability in downstream tasks. Our findings pave the way for future research to scale this framework to high-dimensional and complex tasks, further advancing the field of RL.
Coordinate-Consistent Localization via Continuous-Time Calibration and Fusion of UWB and SLAM Observations
Onboard simultaneous localization and mapping (SLAM) methods are commonly used to provide accurate localization information for autonomous robots. However, the coordinate origin of the SLAM estimate is often reset on each run. UWB-based localization with fixed anchors, on the other hand, can ensure a consistent coordinate reference across sessions, but it requires an accurate assignment of the anchor nodes' coordinates. To this end, we propose a two-stage approach that calibrates and fuses UWB data and SLAM data to achieve coordinate-wise consistent and accurate localization in the same environment. In the first stage, we solve a continuous-time batch optimization problem by using the range and odometry data from one full run, incorporating height priors and anchor-to-anchor distance factors to recover the anchors' 3D positions. For the subsequent runs in the second stage, a sliding-window optimization scheme fuses the UWB and SLAM data, which facilitates accurate localization in the same coordinate system. Experiments are carried out on the NTU VIRAL dataset with six scenarios of UAV flight, and we show that calibration using data in one run is sufficient to enable accurate localization in the remaining runs. We release our source code to benefit the community at https://github.com/ntdathp/slam-uwb-calibration.
AI-Enabled Capabilities to Facilitate Next-Generation Rover Surface Operations
Current planetary rovers operate at traverse speeds of approximately 10 cm/s, fundamentally limiting exploration efficiency. This work presents integrated AI systems which significantly improve autonomy through three components: (i) the FASTNAV Far Obstacle Detector (FOD), capable of facilitating sustained 1.0 m/s speeds via computer vision-based obstacle detection; (ii) CISRU, a multi-robot coordination framework enabling human-robot collaboration for in-situ resource utilisation; and (iii) the ViBEKO and AIAXR deep learning-based terrain classification studies. Field validation in Mars analogue environments demonstrated these systems at Technology Readiness Level 4, providing measurable improvements in traverse speed, classification accuracy, and operational safety for next-generation planetary missions.
comment: Paper for 18th Symposium on Advanced Space Technologies in Robotics and Automation (ASTRA), presented on October 7th at Leiden, Netherlands
The DISTANT Design for Remote Transmission and Steering Systems for Planetary Robotics
Planetary exploration missions require robust locomotion systems capable of operating in extreme environments over extended periods. This paper presents the DISTANT (Distant Transmission and Steering Systems) design, a novel approach for relocating rover traction and steering actuators from wheel-mounted positions to a thermally protected warm box within the rover body. The design addresses critical challenges in long-distance traversal missions by protecting sensitive components from thermal cycling, dust contamination, and mechanical wear. A double wishbone suspension configuration with cardan joints and capstan drive steering has been selected as the optimal architecture following comprehensive trade-off analysis. The system enables independent wheel traction, steering control, and suspension management whilst maintaining all motorisation within the protected environment. The design meets a 50 km traverse requirement without performance degradation, with integrated dust protection mechanisms and thermal management solutions. Testing and validation activities are planned for Q1 2026 following breadboard manufacturing at 1:3 scale.
comment: Paper for 18th Symposium on Advanced Space Technologies in Robotics and Automation (ASTRA), presented on October 7th at Leiden, Netherlands
Learning to Crawl: Latent Model-Based Reinforcement Learning for Soft Robotic Adaptive Locomotion
Soft robotic crawlers are mobile robots that utilize soft body deformability and compliance to achieve locomotion through surface contact. Designing control strategies for such systems is challenging due to model inaccuracies, sensor noise, and the need to discover locomotor gaits. In this work, we present a model-based reinforcement learning (MB-RL) framework in which latent dynamics inferred from onboard sensors serve as a predictive model that guides an actor-critic algorithm to optimize locomotor policies. We evaluate the framework on a minimal crawler model in simulation using inertial measurement units and time-of-flight sensors as observations. The learned latent dynamics enable short-horizon motion prediction while the actor-critic discovers effective locomotor policies. This approach highlights the potential of latent-dynamics MB-RL for enabling embodied soft robotic adaptive locomotion based solely on noisy sensor feedback.
A Co-Design Framework for Energy-Aware Monoped Jumping with Detailed Actuator Modeling
A monoped's jump height and energy consumption depend on both its mechanical design and its control strategy. Existing co-design frameworks typically optimize for either maximum height or minimum energy, neglecting their trade-off. They also often omit gearbox parameter optimization and use oversimplified actuator mass models, producing designs that are difficult to replicate in practice. In this work, we introduce a novel three-stage co-design optimization framework that jointly maximizes jump height while minimizing mechanical energy consumption of a monoped. The proposed method explicitly incorporates realistic actuator mass models and optimizes mechanical design (including gearbox) and control parameters within a unified framework. The resulting design outputs are then used to automatically generate a parameterized CAD model suitable for direct fabrication, significantly reducing manual design iterations. Our experimental evaluations demonstrate a 50 percent reduction in mechanical energy consumption compared to the baseline design, while achieving a jump height of 0.8 m. A video presentation is available at http://y2u.be/XW8IFRCcPgM
comment: 7 pages, 8 figures, 1 table, Accepted at IEEE-RAS 24th International Conference on Humanoid Robots (Humanoids) 2025, Aman Singh, Aastha Mishra - Authors contributed equally
The Safety Challenge of World Models for Embodied AI Agents: A Review
The rapid progress in embodied artificial intelligence has highlighted the necessity for more advanced and integrated models that can perceive, interpret, and predict environmental dynamics. In this context, World Models (WMs) have been introduced to provide embodied agents with the abilities to anticipate future environmental states and fill in knowledge gaps, thereby enhancing agents' ability to plan and execute actions. However, when dealing with embodied agents it is fundamental to ensure that predictions are safe for both the agent and the environment. In this article, we conduct a comprehensive literature review of World Models in the domains of autonomous driving and robotics, with a specific focus on the safety implications of scene and control generation tasks. Our review is complemented by an empirical analysis, wherein we collect and examine predictions from state-of-the-art models, identify and categorize common faults (herein referred to as pathologies), and provide a quantitative evaluation of the results.
VCoT-Grasp: Grasp Foundation Models with Visual Chain-of-Thought Reasoning for Language-driven Grasp Generation
Robotic grasping is one of the most fundamental tasks in robotic manipulation, and grasp detection/generation has long been the subject of extensive research. Recently, language-driven grasp generation has emerged as a promising direction due to its practical interaction capabilities. However, most existing approaches either lack sufficient reasoning and generalization capabilities or depend on complex modular pipelines. Moreover, current grasp foundation models tend to overemphasize dialog and object semantics, resulting in inferior performance and restriction to single-object grasping. To maintain strong reasoning ability and generalization in cluttered environments, we propose VCoT-Grasp, an end-to-end grasp foundation model that incorporates visual chain-of-thought reasoning to enhance visual understanding for grasp generation. VCoT-Grasp adopts a multi-turn processing paradigm that dynamically focuses on visual inputs while providing interpretable reasoning traces. For training, we refine and introduce a large-scale dataset, VCoT-GraspSet, comprising 167K synthetic images with over 1.36M grasps, as well as 400+ real-world images with more than 1.2K grasps, annotated with intermediate bounding boxes. Extensive experiments on both VCoT-GraspSet and real robot demonstrate that our method significantly improves grasp success rates and generalizes effectively to unseen objects, backgrounds, and distractors. More details can be found at https://zhanghr2001.github.io/VCoT-Grasp.github.io.
Human-in-the-loop Optimisation in Robot-assisted Gait Training
Wearable robots offer a promising solution for quantitatively monitoring gait and providing systematic, adaptive assistance to promote patient independence and improve gait. However, due to significant interpersonal and intrapersonal variability in walking patterns, it is important to design robot controllers that can adapt to the unique characteristics of each individual. This paper investigates the potential of human-in-the-loop optimisation (HILO) to deliver personalised assistance in gait training. The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) was employed to continuously optimise an assist-as-needed controller of a lower-limb exoskeleton. Six healthy individuals participated over a two-day experiment. Our results suggest that while the CMA-ES appears to converge to a unique set of stiffnesses for each individual, no measurable impact on the subjects' performance was observed during the validation trials. These findings highlight the impact of human-robot co-adaptation and human behaviour variability, whose effect may be greater than potential benefits of personalising rule-based assistive controllers. Our work contributes to understanding the limitations of current personalisation approaches in exoskeleton-assisted gait rehabilitation and identifies key challenges for effective implementation of human-in-the-loop optimisation in this domain.
Precise and Efficient Collision Prediction under Uncertainty in Autonomous Driving ICRA 2026
This research introduces two efficient methods to estimate the collision risk of planned trajectories in autonomous driving under uncertain driving conditions. Deterministic collision checks of planned trajectories are often inaccurate or overly conservative, as noisy perception, localization errors, and uncertain predictions of other traffic participants introduce significant uncertainty into the planning process. This paper presents two semi-analytic methods to compute the collision probability of planned trajectories with arbitrary convex obstacles. The first approach evaluates the probability of spatial overlap between an autonomous vehicle and surrounding obstacles, while the second estimates the collision probability based on stochastic boundary crossings. Both formulations incorporate full state uncertainties, including position, orientation, and velocity, and achieve high accuracy at computational costs suitable for real-time planning. Simulation studies verify that the proposed methods closely match Monte Carlo results while providing significant runtime advantages, enabling their use in risk-aware trajectory planning. The collision estimation methods are available as open-source software: https://github.com/TUM-AVS/Collision-Probability-Estimation
comment: 8 pages, submitted to the IEEE ICRA 2026, Vienna, Austria
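The collision-prediction abstract above reports that its semi-analytic methods closely match Monte Carlo results. For context, here is a minimal sketch of what such a Monte Carlo reference estimate can look like. It assumes a heavily simplified setting that is not the paper's: only the vehicle position is uncertain (independent Gaussian noise per axis), and both vehicle and obstacle are circles, so overlap reduces to a distance check. All names and parameters are hypothetical.

```python
import random


def mc_collision_probability(mean_xy, sigma_xy, obstacle_xy, combined_radius,
                             n_samples=20000, seed=0):
    """Estimate P(collision) by sampling the uncertain vehicle position.

    A sample counts as a collision when the vehicle centre lies within
    combined_radius (vehicle radius + obstacle radius) of the obstacle centre.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    hits = 0
    for _ in range(n_samples):
        x = rng.gauss(mean_xy[0], sigma_xy[0])
        y = rng.gauss(mean_xy[1], sigma_xy[1])
        dx = x - obstacle_xy[0]
        dy = y - obstacle_xy[1]
        if dx * dx + dy * dy <= combined_radius ** 2:
            hits += 1
    return hits / n_samples


# Vehicle expected 1.5 m from the obstacle centre, 0.5 m position std-dev.
p_near = mc_collision_probability((1.5, 0.0), (0.5, 0.5), (0.0, 0.0), 1.0)
# Vehicle far away: essentially no sample can reach the obstacle.
p_far = mc_collision_probability((100.0, 0.0), (0.1, 0.1), (0.0, 0.0), 1.0)
```

The sampling loop is what makes Monte Carlo expensive inside a planner: accuracy improves only with the square root of the sample count, which is the runtime gap that closed-form or semi-analytic estimates aim to remove.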
Federated Split Learning for Resource-Constrained Robots in Industrial IoT: Framework Comparison, Optimization Strategies, and Future Directions
Federated split learning (FedSL) has emerged as a promising paradigm for enabling collaborative intelligence in industrial Internet of Things (IoT) systems, particularly in smart factories where data privacy, communication efficiency, and device heterogeneity are critical concerns. In this article, we present a comprehensive study of FedSL frameworks tailored for resource-constrained robots in industrial scenarios. We compare synchronous, asynchronous, hierarchical, and heterogeneous FedSL frameworks in terms of workflow, scalability, adaptability, and limitations under dynamic industrial conditions. Furthermore, we systematically categorize token fusion strategies into three paradigms: input-level (pre-fusion), intermediate-level (intra-fusion), and output-level (post-fusion), and summarize their respective strengths in industrial applications. We also provide adaptive optimization techniques to enhance the efficiency and feasibility of FedSL implementation, including model compression, split layer selection, computing frequency allocation, and wireless resource management. Simulation results validate the performance of these frameworks under industrial detection scenarios. Finally, we outline open issues and research directions of FedSL in future smart manufacturing systems.
comment: 9 pages, 5 figures, submitted to the IEEE magazine
Stable Robot Motions on Manifolds: Learning Lyapunov-Constrained Neural Manifold ODEs
Learning stable dynamical systems from data is crucial for safe and reliable robot motion planning and control. However, extending stability guarantees to trajectories defined on Riemannian manifolds poses significant challenges due to the manifold's geometric constraints. To address this, we propose a general framework for learning stable dynamical systems on Riemannian manifolds using neural ordinary differential equations. Our method guarantees stability by projecting the neural vector field evolving on the manifold so that it strictly satisfies the Lyapunov stability criterion, ensuring stability at every system state. By leveraging a flexible neural parameterisation for both the base vector field and the Lyapunov function, our framework can accurately represent complex trajectories while respecting manifold constraints by evolving solutions directly on the manifold. We provide an efficient training strategy for applying our framework and demonstrate its utility by solving Riemannian LASA datasets on the unit quaternion (S^3) and symmetric positive-definite matrix manifolds, as well as robotic motions evolving on \mathbb{R}^3 \times S^3. We demonstrate the performance, scalability, and practical applicability of our approach through extensive simulations and by learning robot motions in a real-world experiment.
comment: 12 pages, 6 figures
Oracle-Guided Masked Contrastive Reinforcement Learning for Visuomotor Policies
A prevailing approach for learning visuomotor policies is to employ reinforcement learning to map high-dimensional visual observations directly to action commands. However, the combination of high-dimensional visual inputs and agile maneuver outputs leads to long-standing challenges, including low sample efficiency and significant sim-to-real gaps. To address these issues, we propose Oracle-Guided Masked Contrastive Reinforcement Learning (OMC-RL), a novel framework designed to improve the sample efficiency and asymptotic performance of visuomotor policy learning. OMC-RL explicitly decouples the learning process into two stages: an upstream representation learning stage and a downstream policy learning stage. In the upstream stage, a masked Transformer module is trained with temporal modeling and contrastive learning to extract temporally-aware and task-relevant representations from sequential visual inputs. After training, the learned encoder is frozen and used to extract visual representations from consecutive frames, while the Transformer module is discarded. In the downstream stage, an oracle teacher policy with privileged access to global state information supervises the agent during early training to provide informative guidance and accelerate early policy learning. This guidance is gradually reduced to allow independent exploration as training progresses. Extensive experiments in simulated and real-world environments demonstrate that OMC-RL achieves superior sample efficiency and asymptotic policy performance, while also improving generalization across diverse and perceptually complex scenarios.
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
Large language models leverage internet-scale text data, yet embodied AI remains constrained by the prohibitive costs of physical trajectory collection. Desktop environments -- particularly gaming -- offer a compelling alternative: they provide rich sensorimotor interactions at scale while maintaining the structured observation-action coupling essential for embodied learning. We present D2E (Desktop to Embodied AI), a framework that demonstrates desktop interactions can serve as an effective pretraining substrate for robotic embodied AI tasks. Unlike prior work that remained domain-specific (e.g., VPT for Minecraft) or kept data proprietary (e.g., SIMA), D2E establishes a complete pipeline from scalable desktop data collection to verified transfer in embodied domains. Our framework comprises three components: (1) the OWA Toolkit that unifies diverse desktop interactions into a standardized format with 152x compression, (2) the Generalist-IDM that achieves strong zero-shot generalization across unseen games through timestamp-based event prediction, enabling internet-scale pseudo-labeling, and (3) VAPT that transfers desktop-pretrained representations to physical manipulation and navigation. Using 1.3K+ hours of data (259 hours of human demonstrations and 1K+ hours of pseudo-labeled gameplay), we achieve a total success rate of 96.6% on LIBERO manipulation and 83.3% on CANVAS navigation benchmarks. This validates that sensorimotor primitives in digital interactions exhibit sufficient invariance to transfer meaningfully to physical embodied tasks, establishing desktop pretraining as a practical paradigm for robotics. We will make all our work public, including the OWA Toolkit, the human-collected and pseudo-labeled datasets, and VAPT-trained models, at https://worv-ai.github.io/d2e/
Verifier-free Test-Time Sampling for Vision Language Action Models
Vision-Language-Action models (VLAs) have demonstrated remarkable performance in robot control. However, they remain fundamentally limited in tasks that require high precision due to their single-inference paradigm. While test-time scaling approaches using external verifiers have shown promise, they require additional training and fail to generalize to unseen conditions. We propose Masking Distribution Guided Selection (MG-Select), a novel test-time scaling framework for VLAs that leverages the model's internal properties without requiring additional training or external modules. Our approach utilizes KL divergence from a reference action token distribution as a confidence metric for selecting the optimal action from multiple candidates. We introduce a reference distribution generated by the same VLA but with randomly masked states and language conditions as inputs, ensuring maximum uncertainty while remaining aligned with the target task distribution. Additionally, we propose a joint training strategy that enables the model to learn both conditional and unconditional distributions by applying dropout to state and language conditions, thereby further improving the quality of the reference distribution. Our experiments demonstrate that MG-Select achieves significant performance improvements, including a 28%/35% improvement in real-world in-distribution/out-of-distribution tasks, along with a 168% relative gain on RoboCasa pick-and-place tasks trained with 30 demonstrations.
comment: 14 pages; 3 figures
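The selection rule described in the MG-Select abstract above (KL divergence from a reference action-token distribution used as a confidence score over multiple candidates) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: distributions are plain probability lists, the divergence is taken as KL(candidate || reference), and the candidate with the largest divergence from the high-uncertainty reference is treated as the most confident. The function names and toy numbers are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same tokens."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0.0)


def select_candidate(candidate_dists, reference_dist):
    """Return (index of the highest-scoring candidate, all scores)."""
    scores = [kl_divergence(p, reference_dist) for p in candidate_dists]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Reference: what the model predicts with masked conditions (here, uniform).
reference = [0.25, 0.25, 0.25, 0.25]
# Token distributions behind three sampled candidate actions.
candidates = [
    [0.25, 0.25, 0.25, 0.25],  # as uncertain as the reference
    [0.70, 0.10, 0.10, 0.10],  # moderately peaked
    [0.97, 0.01, 0.01, 0.01],  # sharply peaked
]
best_index, scores = select_candidate(candidates, reference)
```

Because the score uses only the model's own output distributions, no external verifier has to be trained or queried at test time; the cost is one extra forward pass for the reference plus one pass per candidate.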
DeLTa: Demonstration and Language-Guided Novel Transparent Object Manipulation
Despite the prevalence of transparent object interactions in everyday human life, transparent robotic manipulation research remains limited to short-horizon tasks and basic grasping capabilities. Although some methods have partially addressed these issues, most of them generalize poorly to novel objects and are insufficient for precise long-horizon robot manipulation. To address this limitation, we propose DeLTa (Demonstration and Language-Guided Novel Transparent Object Manipulation), a novel framework that integrates depth estimation, 6D pose estimation, and vision-language planning for precise long-horizon manipulation of transparent objects guided by natural task instructions. A key advantage of our method is its single-demonstration approach, which generalizes 6D trajectories to novel transparent objects without requiring category-level priors or additional training. Additionally, we present a task planner that refines the VLM-generated plan to account for the constraints of a single-arm, eye-in-hand robot for long-horizon object manipulation tasks. Through comprehensive evaluation, we demonstrate that our method significantly outperforms existing transparent object manipulation approaches, particularly in long-horizon scenarios requiring precise manipulation capabilities. Project page: https://sites.google.com/view/DeLTa25/
comment: Project page: https://sites.google.com/view/DeLTa25/
MetaVLA: Unified Meta Co-training For Efficient Embodied Adaption
Vision-Language-Action (VLA) models show promise in embodied reasoning, yet remain far from true generalists: they often require task-specific fine-tuning and generalize poorly to unseen tasks. We propose MetaVLA, a unified, backbone-agnostic post-training framework for efficient and scalable alignment. MetaVLA introduces Context-Aware Meta Co-Training, which consolidates diverse target tasks into a single fine-tuning stage while leveraging structurally diverse auxiliary tasks to improve in-domain generalization. Unlike naive multi-task SFT, MetaVLA integrates a lightweight meta-learning mechanism, derived from Attentive Neural Processes, to enable rapid adaptation from diverse contexts with minimal architectural change or inference overhead. On the LIBERO benchmark, MetaVLA with six auxiliary tasks outperforms OpenVLA by up to 8.0% on long-horizon tasks, reduces training steps from 240K to 75K, and cuts GPU time by ~76%. These results show that scalable, low-resource post-training is achievable, paving the way toward general-purpose embodied agents. Code will be available.
GO-Flock: Goal-Oriented Flocking in 3D Unknown Environments with Depth Maps
Artificial Potential Field (APF) methods are widely used for reactive flocking control, but they often suffer from challenges such as deadlocks and local minima, especially in the presence of obstacles. Existing solutions to address these issues are typically passive, leading to slow and inefficient collective navigation. As a result, many APF approaches have only been validated in obstacle-free environments or simplified, pseudo-3D simulations. This paper presents GO-Flock, a hybrid flocking framework that integrates planning with reactive APF-based control. GO-Flock consists of an upstream Perception Module, which processes depth maps to extract waypoints and virtual agents for obstacle avoidance, and a downstream Collective Navigation Module, which applies a novel APF strategy to achieve effective flocking behavior in cluttered environments. We evaluate GO-Flock against passive APF-based approaches to demonstrate their respective merits, such as their flocking behavior and the ability to overcome local minima. Finally, we validate GO-Flock in an obstacle-filled environment and in hardware-in-the-loop experiments, where we successfully flocked a team of nine drones (six physical and three virtual) in a forest environment.
ARRC: Advanced Reasoning Robot Control - Knowledge-Driven Autonomous Manipulation Using Retrieval-Augmented Generation
We present ARRC (Advanced Reasoning Robot Control), a practical system that connects natural-language instructions to safe local robotic control by combining Retrieval-Augmented Generation (RAG) with RGB-D perception and guarded execution on an affordable robot arm. The system indexes curated robot knowledge (movement patterns, task templates, and safety heuristics) in a vector database, retrieves task-relevant context for each instruction, and conditions a large language model (LLM) to produce JSON-structured action plans. Plans are executed on a UFactory xArm 850 fitted with a Dynamixel-driven parallel gripper and an Intel RealSense D435 camera. Perception uses AprilTag detections fused with depth to produce object-centric metric poses. Execution is enforced via software safety gates: workspace bounds, speed and force caps, timeouts, and bounded retries. We describe the architecture, knowledge design, integration choices, and a reproducible evaluation protocol for tabletop scan, approach, and pick-place tasks. Experimental results demonstrate the efficacy of the proposed approach. Our design shows that RAG-based planning can substantially improve plan validity and adaptability while keeping perception and low-level control local to the robot.
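The guarded-execution idea in the ARRC abstract above (JSON-structured plans passed through software safety gates such as workspace bounds and speed caps) can be sketched as follows. This is a minimal illustration only: the JSON schema, the field names, the workspace limits, and the speed cap are all hypothetical and are not taken from the ARRC system or any robot SDK.

```python
import json

# Hypothetical limits, in metres and metres per second.
WORKSPACE = {"x": (0.15, 0.60), "y": (-0.30, 0.30), "z": (0.02, 0.45)}
MAX_SPEED = 0.10


def gate_step(step):
    """Return a speed-capped copy of the step, or None if it leaves the workspace."""
    for axis, (low, high) in WORKSPACE.items():
        if not low <= step["target"][axis] <= high:
            return None  # reject: target outside the allowed workspace
    gated = dict(step)
    gated["speed"] = min(step["speed"], MAX_SPEED)  # cap, never raise, the speed
    return gated


# A plan as an LLM might emit it (hypothetical schema).
plan_json = """[
  {"action": "move", "target": {"x": 0.30, "y": 0.00, "z": 0.20}, "speed": 0.50},
  {"action": "move", "target": {"x": 0.90, "y": 0.00, "z": 0.20}, "speed": 0.05}
]"""
plan = json.loads(plan_json)
gated_plan = [gate_step(step) for step in plan]
```

Keeping the gate as a pure function between the planner and the controller means a malformed or unsafe plan is filtered before any motion command is issued, regardless of what the language model produced.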
Correlation-Aware Dual-View Pose and Velocity Estimation for Dynamic Robotic Manipulation
Accurate pose and velocity estimation is essential for effective spatial task planning in robotic manipulators. While centralized sensor fusion has traditionally been used to improve pose estimation accuracy, this paper presents a novel decentralized fusion approach to estimate both pose and velocity. We use dual-view measurements from an eye-in-hand and an eye-to-hand vision sensor configuration mounted on a manipulator to track a target object whose motion is modeled as a random walk (stochastic acceleration model). The robot runs two independent adaptive extended Kalman filters formulated on a matrix Lie group, developed as part of this work. These filters predict poses and velocities on the manifold $\mathbb{SE}(3) \times \mathbb{R}^3 \times \mathbb{R}^3$ and update the state on the manifold $\mathbb{SE}(3)$. The final fused state, comprising the fused pose and velocities of the target, is obtained using a correlation-aware fusion rule on Lie groups. The proposed method is evaluated on a UFactory xArm 850 equipped with Intel RealSense cameras, tracking a moving target. Experimental results validate the effectiveness and robustness of the proposed decentralized dual-view estimation framework, showing consistent improvements over state-of-the-art methods.
Real-Time Glass Detection and Reprojection using Sensor Fusion Onboard Aerial Robots ICRA 2026
Autonomous aerial robots are increasingly being deployed in real-world scenarios, where transparent obstacles present significant challenges to reliable navigation and mapping. These materials pose a unique problem for traditional perception systems because they lack discernible features and can cause conventional depth sensors to fail, leading to inaccurate maps and potential collisions. To ensure safe navigation, robots must be able to accurately detect and map these transparent obstacles. Existing methods often rely on large, expensive sensors or algorithms that impose high computational burdens, making them unsuitable for low Size, Weight, and Power (SWaP) robots. In this work, we propose a novel and computationally efficient framework for detecting and mapping transparent obstacles onboard a sub-300g quadrotor. Our method fuses data from a Time-of-Flight (ToF) camera and an ultrasonic sensor with a custom, lightweight 2D convolution model. This specialized approach accurately detects specular reflections and propagates their depth into corresponding empty regions of the depth map, effectively rendering transparent obstacles visible. The entire pipeline operates in real-time, utilizing only a small fraction of a CPU core on an embedded processor. We validate our system through a series of experiments in both controlled and real-world environments, demonstrating the utility of our method through experiments where the robot maps indoor environments containing glass. Our work is, to our knowledge, the first of its kind to demonstrate a real-time, onboard transparent obstacle mapping system on a low-SWaP quadrotor using only the CPU.
comment: 8 pages, 8 figures, submitted to ICRA 2026
What You Don't Know Can Hurt You: How Well do Latent Safety Filters Understand Partially Observable Safety Constraints?
Safe control techniques, such as Hamilton-Jacobi reachability, provide principled methods for synthesizing safety-preserving robot policies but typically assume hand-designed state spaces and full observability. Recent work has relaxed these assumptions via latent-space safe control, where state representations and dynamics are learned jointly through world models that reconstruct future high-dimensional observations (e.g., RGB images) from current observations and actions. This enables safety constraints that are difficult to specify analytically (e.g., spilling) to be framed as classification problems in latent space, allowing controllers to operate directly from raw observations. However, these methods assume that safety-critical features are observable in the learned latent state. We ask: when are latent state spaces sufficient for safe control? To study this, we examine temperature-based failures, comparable to overheating in cooking or manufacturing tasks, and find that RGB-only observations can produce myopic safety behaviors, e.g., avoiding seeing failure states rather than preventing failure itself. To predict such behaviors, we introduce a mutual information-based measure that identifies when observations fail to capture safety-relevant features. Finally, we propose a multimodal-supervised training strategy that shapes the latent state with additional sensory inputs during training, but requires no extra modalities at deployment, and validate our approach in simulation and on hardware with a Franka Research 3 manipulator preventing a pot of wax from overheating.
comment: 8 tables, 6 figures
Active Next-Best-View Optimization for Risk-Averse Path Planning
Safe navigation in uncertain environments requires planning methods that integrate risk aversion with active perception. In this work, we present a unified framework that refines a coarse reference path by constructing tail-sensitive risk maps from Average Value-at-Risk statistics on an online-updated 3D Gaussian-splat Radiance Field. These maps enable the generation of locally safe and feasible trajectories. In parallel, we formulate Next-Best-View (NBV) selection as an optimization problem on the SE(3) pose manifold, where Riemannian gradient descent maximizes an expected information gain objective to reduce uncertainty most critical for imminent motion. Our approach advances the state-of-the-art by coupling risk-averse path refinement with NBV planning, while introducing scalable gradient decompositions that support efficient online updates in complex environments. We demonstrate the effectiveness of the proposed framework through extensive computational studies.
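The tail-sensitive statistic named in the abstract above, Average Value-at-Risk (also known as Conditional Value-at-Risk), can be illustrated with a simple empirical estimator: the mean of the worst (1 - alpha) fraction of sampled costs. This is a generic sketch of the statistic itself, not of the paper's risk-map construction; the function name and sample values are hypothetical, and the estimator ignores fractional tail weights.

```python
import math


def average_value_at_risk(costs, alpha):
    """Empirical AVaR at level alpha: mean of the worst (1 - alpha) share of costs."""
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(costs, reverse=True)  # worst (largest) costs first
    # round() guards against float noise such as (1 - 0.7) * 10 = 3.0000000000000004
    tail_size = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    return sum(ordered[:tail_size]) / tail_size


# Ten sampled collision costs for one map cell (hypothetical values).
samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
mean_cost = average_value_at_risk(samples, 0.0)  # alpha = 0 is the plain mean
tail_cost = average_value_at_risk(samples, 0.8)  # mean of the worst 20%
```

A planner that ranks cells by the tail mean rather than the plain mean penalizes rare but severe outcomes, which is what makes the resulting map risk-averse.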
Terrain-Aided Navigation Using a Point Cloud Measurement Sensor
We investigate the use of a point cloud measurement in terrain-aided navigation. Our goal is to aid an inertial navigation system by exploring ways to generate a useful measurement innovation error for effective nonlinear state estimation. We compare two such measurement models that involve the scanning of a digital terrain elevation model: a) one that is based on typical ray-casting from a given pose and returns the predicted point cloud measurement from that pose, and b) a computationally less intensive one that does not require ray-casting, which we refer to herein as a sliding grid. Besides requiring a pose, the sliding grid requires the pattern of the point cloud measurement itself and returns a predicted point cloud measurement. We further investigate the observability properties of the altitude for both measurement models. As a baseline, we compare the performance of a point cloud measurement to that of a radar altimeter and show the gains in accuracy. We conclude by showing that a point cloud measurement outperforms a radar altimeter, and that the point cloud measurement model to use depends on the computational resources.
Three-dimensional Integrated Guidance and Control for Leader-Follower Flexible Formation of Fixed Wing UAVs
This paper presents a nonlinear integrated guidance and control (IGC) approach for flexible leader-follower formation flight of fixed-wing unmanned aerial vehicles (UAVs) while accounting for high-fidelity aerodynamics and thrust dynamics. Unlike conventional leader-follower schemes that fix the follower's position relative to the leader, the follower is steered to maintain range and bearing angles (the angles between its velocity vector and its line-of-sight (LOS) with respect to the leader) arbitrarily close to the prescribed values, enabling the follower to maintain formation on a hemispherical region behind the leader. The proposed IGC framework directly maps leader-follower relative range dynamics to throttle commands, and the follower's velocity orientation relative to the LOS to aerodynamic control surface deflections. This enables synergism between guidance and control subsystems. The control design uses a dynamic surface control-based backstepping approach to achieve convergence to the desired formation set, where Lyapunov barrier functions are incorporated to ensure the follower's bearing angle is constrained within specified bounds. Rigorous stability analysis guarantees uniform ultimate boundedness of all error states and strict constraint satisfaction in the presence of aerodynamic nonlinearities. The proposed flexible formation scheme allows the follower to have an orientation mismatch relative to the leader to execute anticipatory reconfiguration by transitioning between the relative positions in the admissible formation set when the leader aggressively maneuvers. The proposed IGC law relies only on relative information and onboard sensors without the information about the leader's maneuver, making it suitable for GPS-denied or non-cooperative scenarios. Finally, we present simulation results to demonstrate the effectiveness and robustness of our approach.
Constrained Natural Language Action Planning for Resilient Embodied Systems
Replicating human-level intelligence in the execution of embodied tasks remains challenging due to the unconstrained nature of real-world environments. Novel use of large language models (LLMs) for task planning seeks to address the previously intractable state/action space of complex planning tasks, but hallucinations limit their reliability, and thus, viability beyond a research context. Additionally, the prompt engineering required to achieve adequate system performance lacks transparency, and thus, repeatability. In contrast to LLM planning, symbolic planning methods offer strong reliability and repeatability guarantees, but struggle to scale to the complexity and ambiguity of real-world tasks. We introduce a new robotic planning method that augments LLM planners with symbolic planning oversight to improve reliability and repeatability, and provide a transparent approach to defining hard constraints with considerably stronger clarity than traditional prompt engineering. Importantly, these augmentations preserve the reasoning capabilities of LLMs and retain impressive generalization in open-world environments. We demonstrate our approach in simulated and real-world environments. On the ALFWorld planning benchmark, our approach outperforms current state-of-the-art methods, achieving a near-perfect 99% success rate. Deployment of our method to a real-world quadruped robot resulted in 100% task success compared to 50% and 30% for pure LLM and symbolic planners across embodied pick and place tasks. Our approach presents an effective strategy to enhance the reliability, repeatability and transparency of LLM-based robot planners while retaining their key strengths: flexibility and generalizability to complex real-world environments. We hope that this work will contribute to the broad goal of building resilient embodied intelligent systems.
A Formal gatekeeper Framework for Safe Dual Control with Active Exploration
Planning safe trajectories under model uncertainty is a fundamental challenge. Robust planning ensures safety by considering worst-case realizations, yet ignores uncertainty reduction and leads to overly conservative behavior. Actively reducing uncertainty on-the-fly during a nominal mission defines the dual control problem. Most approaches address this by adding a weighted exploration term to the cost, tuned to trade off the nominal objective and uncertainty reduction, but without formal consideration of when exploration is beneficial. Moreover, safety is enforced in some methods but not in others. We propose a framework that integrates robust planning with active exploration under formal guarantees. The key contribution is that exploration is pursued only when it provides a verifiable improvement without compromising safety. To achieve this, we utilize our earlier work on gatekeeper as an architecture for safety verification, and extend it so that it generates both safe and informative trajectories that reduce uncertainty and the cost of the mission, or keep it within a user-defined budget. The methodology is evaluated via simulation case studies on the online dual control of a quadrotor under parametric uncertainty.
comment: Submitted to American Control Conference (ACC) 2026
Vi-TacMan: Articulated Object Manipulation via Vision and Touch
Autonomous manipulation of articulated objects remains a fundamental challenge for robots in human environments. Vision-based methods can infer hidden kinematics but can yield imprecise estimates on unfamiliar objects. Tactile approaches achieve robust control through contact feedback but require accurate initialization. This suggests a natural synergy: vision for global guidance, touch for local precision. Yet no framework systematically exploits this complementarity for generalized articulated manipulation. Here we present Vi-TacMan, which uses vision to propose grasps and coarse directions that seed a tactile controller for precise execution. By incorporating surface normals as geometric priors and modeling directions via von Mises-Fisher distributions, our approach achieves significant gains over baselines (all p<0.0001). Critically, manipulation succeeds without explicit kinematic models -- the tactile controller refines coarse visual estimates through real-time contact regulation. Tests on more than 50,000 simulated and diverse real-world objects confirm robust cross-category generalization. This work establishes that coarse visual cues suffice for reliable manipulation when coupled with tactile feedback, offering a scalable paradigm for autonomous systems in unstructured environments.
Bioinspired Tapered-Spring Turbulence Sensor for Underwater Flow Detection
This paper presents a bio-inspired underwater whisker sensor for robust hydrodynamic disturbance detection and efficient signal analysis based on Physical Reservoir Computing (PRC). The design uses a tapered nylon spring with embedded accelerometers to achieve spatially distributed vibration sensing and frequency separation along the whisker. Towing-tank experiments and computational fluid dynamics simulations confirmed that the whisker effectively distinguishes vortex regimes across different fin angles and maintains Strouhal scaling with flow velocity, where higher speeds increase vibration intensity without affecting the dominant frequencies. Frequency-domain analysis, Shannon entropy, and machine learning further validated the sensing performance: vortex shedding frequencies were identified with less than 10% error, entropy captured the transition from coherent vortex streets to turbulence, and logistic regression achieved 86.0% classification accuracy with millisecond-level inference. These results demonstrate that structurally encoded whisker sensing provides a scalable and real-time solution for underwater perception, wake tracking, and turbulence-aware navigation in autonomous marine robots.
comment: 9 pages, 9 figures
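To make the entropy criterion in the whisker-sensor abstract above more concrete, here is a minimal sketch of Shannon entropy computed over a normalized power spectrum. It assumes the spectrum is already available as a list of non-negative bin powers; the function name and the toy spectra are hypothetical and are not taken from the paper.

```python
import math

def spectral_entropy(power_bins):
    """Shannon entropy (in bits) of a power spectrum treated as a distribution.

    A single sharp peak (coherent vortex shedding) gives entropy near 0;
    power spread evenly over many bins (broadband turbulence) gives high entropy.
    """
    total = sum(power_bins)
    if total <= 0:
        raise ValueError("spectrum must contain positive power")
    probs = [p / total for p in power_bins if p > 0]  # skip empty bins: 0*log(0) := 0
    return -sum(p * math.log2(p) for p in probs)

if __name__ == "__main__":
    coherent = [0.0, 0.0, 5.0, 0.0]   # hypothetical: all power in one bin
    broadband = [1.0, 1.0, 1.0, 1.0]  # hypothetical: power spread evenly
    print(spectral_entropy(coherent), spectral_entropy(broadband))
```

A threshold on this value is one simple way to flag the transition from a coherent vortex street to turbulence, though the abstract does not state how the authors set such a threshold.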
pRRTC: GPU-Parallel RRT-Connect for Fast, Consistent, and Low-Cost Motion Planning
Sampling-based motion planning algorithms, like the Rapidly-Exploring Random Tree (RRT) and its widely used variant, RRT-Connect, provide efficient solutions for high-dimensional planning problems faced by real-world robots. However, these methods remain computationally intensive, particularly in complex environments that require many collision checks. To improve performance, recent efforts have explored parallelizing specific components of RRT such as collision checking, or running multiple planners independently. However, little has been done to develop an integrated parallelism approach, co-designed for large-scale parallelism. In this work we present pRRTC, an RRT-Connect-based planner co-designed for GPU acceleration across the entire algorithm through parallel expansion and SIMT-optimized collision checking. We evaluate the effectiveness of pRRTC on the MotionBenchMaker dataset using robots with 7, 8, and 14 degrees of freedom (DoF). Compared to the state-of-the-art, pRRTC achieves as much as a 10x speedup on constrained reaching tasks with a 5.4x reduction in standard deviation. pRRTC also achieves a 1.4x reduction in average initial path cost. Finally, we deploy pRRTC on a 14-DoF dual Franka Panda arm setup and demonstrate real-time, collision-free motion planning with dynamic obstacles. We open-source our planner to support the wider community.
comment: 7 pages, 7 figures, 1 table. Submitted to IEEE International Conference on Robotics and Automation 2026
BC-ADMM: An Efficient Non-convex Constrained Optimizer with Robotic Applications
Non-convex constrained optimizations are ubiquitous in robotic applications such as multi-agent navigation, UAV trajectory optimization, and soft robot simulation. For this problem class, conventional optimizers suffer from small step sizes and slow convergence. We propose BC-ADMM, a variant of the Alternating Direction Method of Multipliers (ADMM), that can solve a class of non-convex constrained optimizations with biconvex constraint relaxation. Our algorithm allows larger step sizes by breaking the problem into small-scale sub-problems that can be easily solved in parallel. We show that our method has both theoretical convergence speed guarantees and practical convergence guarantees in the asymptotic sense. Through numerical experiments on four robotic applications, we show that BC-ADMM has faster convergence than conventional gradient descent and Newton's method in terms of wall clock time.
Toward Dynamic Control of Tendon-driven Continuum Robots using Clarke Transform IROS 2025
In this paper, we propose a dynamic model and control framework for tendon-driven continuum robots (TDCRs) with multiple segments and an arbitrary number of tendons per segment. Our approach leverages the Clarke transform, the Euler-Lagrange formalism, and the piecewise constant curvature assumption to formulate a dynamic model on a two-dimensional manifold embedded in the joint space that inherently satisfies tendon constraints. We present linear and constraint-informed controllers that operate directly on this manifold, along with practical methods for preventing negative tendon forces without compromising control fidelity. This opens up new design possibilities for overactuated TDCRs with improved force distribution and stiffness without increasing controller complexity. We validate these approaches in simulation and on a physical prototype with one segment and five tendons, demonstrating accurate dynamic behavior and robust trajectory tracking under real-time conditions.
comment: Accepted for publication at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025), 8 pages, and 8 figures
CottonSim: A vision-guided autonomous robotic system for cotton harvesting in Gazebo simulation
Cotton is a major cash crop in the United States, with the country being a leading global producer and exporter. Nearly all U.S. cotton is grown in the Cotton Belt, spanning 17 states in the southern region. Harvesting remains a critical yet challenging stage, impacted by the use of costly, environmentally harmful defoliants and heavy, expensive cotton pickers. These factors contribute to yield loss, reduced fiber quality, and soil compaction, which collectively threaten long-term sustainability. To address these issues, this study proposes a lightweight, small-scale, vision-guided autonomous robotic cotton picker as an alternative. An autonomous system, built on Clearpath's Husky platform and integrated with the CottonEye perception system, was developed and tested in the Gazebo simulation environment. A virtual cotton field was designed to facilitate autonomous navigation testing. The navigation system used Global Positioning System (GPS) and map-based guidance, assisted by an RGB-depth camera and a YOLOv8n-seg instance segmentation model. The model achieved a mean Average Precision (mAP) of 85.2%, a recall of 88.9%, and a precision of 93.0%. The GPS-based approach reached a 100% completion rate (CR) within a $(5 \times 10^{-6})^{\circ}$ threshold, while the map-based method achieved a 96.7% CR within a 0.25 m threshold. The developed Robot Operating System (ROS) packages enable robust simulation of autonomous cotton picking, offering a scalable baseline for future agricultural robotics. CottonSim code and datasets are publicly available on GitHub: https://github.com/imtheva/CottonSim
comment: 16 pages, 15 figures, 4 tables
Capturing a Moving Target by Two Robots in the F2F Model
We study a search problem on capturing a moving target on an infinite real line. Two autonomous mobile robots (which can move with a maximum speed of 1) are initially placed at the origin, while an oblivious moving target is initially placed at a distance $d$ away from the origin. The robots can move along the line in any direction, but the target is oblivious, cannot change direction, and moves either away from or toward the origin at a constant speed $v$. Our aim is to design efficient algorithms for the two robots to capture the target. The target is captured only when both robots are co-located with it. The robots communicate with each other only face-to-face (F2F), meaning they can exchange information only when co-located, while the target remains oblivious and has no communication capabilities. We design algorithms under various knowledge scenarios, which take into account the prior knowledge the robots have about the starting distance $d$, the direction of movement (either toward or away from the origin), and the speed $v$ of the target. As a measure of the efficiency of the algorithms, we use the competitive ratio, which is the ratio of the capture time of an algorithm with limited knowledge to the capture time in the full-knowledge model. In our analysis, we are mindful of the cost of changing direction of movement, and show how to accomplish the capture of the target with at most three direction changes (turns).
Emergent interactions lead to collective frustration in robotic matter
Current artificial intelligence systems show near-human-level capabilities when deployed in isolation. Systems of a few collaborating intelligent agents are being engineered to perform tasks collectively. This raises the question of whether robotic matter, where many learning and intelligent agents interact, shows emergence of collective behaviour. And if so, which kind of phenomena would such systems exhibit? Here, we study a paradigmatic model for robotic matter: a stochastic many-particle system in which each particle is endowed with a deep neural network that predicts its transitions based on the particles' environments. For a one-dimensional model, we show that robotic matter exhibits complex emergent phenomena, including transitions between long-lived learning regimes, the emergence of particle species, and frustration. We also find a density-dependent phase transition with signatures of criticality. Using active matter theory, we show that this phase transition is a consequence of self-organisation mediated by emergent inter-particle interactions. Our simple model captures key features of more complex forms of robotic systems.
mindmap: Spatial Memory in Deep Feature Maps for 3D Action Policies
End-to-end learning of robot control policies, structured as neural networks, has emerged as a promising approach to robotic manipulation. To complete many common tasks, relevant objects are required to pass in and out of a robot's field of view. In these settings, spatial memory - the ability to remember the spatial composition of the scene - is an important competency. However, building such mechanisms into robot learning systems remains an open research problem. We introduce mindmap (Spatial Memory in Deep Feature Maps for 3D Action Policies), a 3D diffusion policy that generates robot trajectories based on a semantic 3D reconstruction of the environment. We show in simulation experiments that our approach is effective at solving tasks where state-of-the-art approaches without memory mechanisms struggle. We release our reconstruction system, training code, and evaluation tasks to spur research in this direction.
comment: Accepted to CoRL 2025 Workshop RemembeRL
FlowVLA: Visual Chain of Thought-based Motion Reasoning for Vision-Language-Action Models
Many Vision-Language-Action (VLA) models are built upon an internal world model trained via next-frame prediction "$v_t \rightarrow v_{t+1}$". However, this paradigm attempts to predict the future frame's appearance directly, without explicitly reasoning about the underlying dynamics. This lack of an explicit motion reasoning step often leads to physically implausible visual forecasts and inefficient policy learning. To address this limitation, we introduce the Visual Chain of Thought (Visual CoT), a paradigm that compels the model to first reason about motion dynamics before generating the future frame. We instantiate this paradigm by proposing FlowVLA, an autoregressive Transformer that explicitly materializes this reasoning process as "$v_t \rightarrow f_t \rightarrow v_{t+1}$", where $f_t$ is an intermediate optical flow prediction that inherently encodes motion. By forcing the model to first follow the motion plan encoded by $f_t$, this process inherently aligns the pre-training objective of dynamics prediction with the downstream task of action generation. We conduct experiments on challenging robotics manipulation benchmarks, as well as real-robot evaluations. Our FlowVLA not only generates more coherent and physically plausible visual predictions, but also achieves state-of-the-art policy performance with substantially improved sample efficiency, pointing toward a more principled foundation for world modeling in VLAs. Project page: https://irpn-lab.github.io/FlowVLA/
Equivariant Filter for Relative Attitude and Target's Angular Velocity Estimation
Accurate estimation of the relative attitude and angular velocity between two rigid bodies is fundamental in aerospace applications such as spacecraft rendezvous and docking. In these scenarios, a chaser vehicle must determine the orientation and angular velocity of a target object using onboard sensors. This work addresses the challenge of designing an Equivariant Filter (EqF) that can reliably estimate both the relative attitude and the target angular velocity using noisy observations of two known, non-collinear vectors fixed in the target frame. To derive the EqF, a symmetry for the system is proposed and an equivariant lift onto the symmetry group is calculated. Observability and convergence properties are analyzed. Simulations demonstrate the filter's performance, with Monte Carlo runs yielding statistically significant results. The impact of low-rate measurements is also examined and a strategy to mitigate this effect is proposed. Experimental results, using fiducial markers and both conventional and event cameras for measurement acquisition, further validate the approach, confirming its effectiveness in a realistic setting.
comment: This work has been submitted to the IEEE for possible publication
Identifying Uncertainty in Self-Adaptive Robotics with Large Language Models
Future self-adaptive robots are expected to operate in highly dynamic environments while effectively managing uncertainties. However, identifying the sources and impacts of uncertainties in such robotic systems and defining appropriate mitigation strategies is challenging due to the inherent complexity of self-adaptive robots and the lack of comprehensive knowledge about the various factors influencing uncertainty. Hence, practitioners often rely on intuition and past experiences from similar systems to address uncertainties. In this article, we evaluate the potential of large language models (LLMs) in enabling a systematic and automated approach to identify uncertainties in self-adaptive robotics throughout the software engineering lifecycle. For this evaluation, we analyzed 10 advanced LLMs with varying capabilities across four industrial-sized robotics case studies, gathering the practitioners' perspectives on the LLM-generated responses related to uncertainties. Results showed that practitioners agreed with 63-88% of the LLM responses and expressed strong interest in the practicality of LLMs for this purpose.
Image-Based Visual Servoing for Enhanced Cooperation of Dual-Arm Manipulation
The cooperation of a pair of robot manipulators is required to manipulate a target object without any fixtures. The conventional control methods coordinate the end-effector pose of each manipulator with that of the other using their kinematics and joint coordinate measurements. Yet, the manipulators' inaccurate kinematics and joint coordinate measurements can cause significant pose synchronization errors in practice. This paper thus proposes an image-based visual servoing approach for enhancing the cooperation of a dual-arm manipulation system. On top of the classical control, the visual servoing controller lets each manipulator use its carried camera to measure the image features of the other's marker and adapt its end-effector pose with the counterpart on the move. Because visual measurements are robust to kinematic errors, the proposed control can reduce the end-effector pose synchronization errors and the fluctuations of the interaction forces of the pair of manipulators on the move. Theoretical analyses have rigorously proven the stability of the closed-loop system. Comparative experiments on real robots have substantiated the effectiveness of the proposed control.
comment: 8 pages, 7 figures. Project website: https://zizhe.io/ral-ibvs-enhanced/. This work has been accepted to the IEEE Robotics and Automation Letters in Feb 2025
Interpreting Behaviors and Geometric Constraints as Knowledge Graphs for Robot Manipulation Control
In this paper, we investigate the feasibility of using knowledge graphs to interpret actions and behaviors for robot manipulation control. Equipped with an uncalibrated visual servoing controller, we propose to use robot knowledge graphs to unify behavior trees and geometric constraints, conceptualizing robot manipulation control as semantic events. The robot knowledge graphs not only preserve the advantages of behavior trees in scripting actions and behaviors, but also offer additional benefits of mapping natural interactions between concepts and events, which enable knowledgeable explanations of the manipulation contexts. Through real-world evaluations, we demonstrate the flexibility of the robot knowledge graphs to support explainable robot manipulation control.
RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Interactive Environmental Learning in Physical Embodied Systems
Embodied agents face persistent challenges in real-world environments, including partial observability, limited spatial reasoning, and high-latency multi-memory integration. We present RoboMemory, a brain-inspired framework that unifies Spatial, Temporal, Episodic, and Semantic memory under a parallelized architecture for efficient long-horizon planning and interactive environmental learning. A dynamic spatial knowledge graph (KG) ensures scalable and consistent memory updates, while a closed-loop planner with a critic module supports adaptive decision-making in dynamic settings. Experiments on EmbodiedBench show that RoboMemory, built on Qwen2.5-VL-72B-Ins, improves average success rates by 25% over its baseline and exceeds the closed-source state-of-the-art (SOTA) Gemini-1.5-Pro by 3%. Real-world trials further confirm its capacity for cumulative learning, with performance improving across repeated tasks. These results highlight RoboMemory as a scalable foundation for memory-augmented embodied intelligence, bridging the gap between cognitive neuroscience and robotic autonomy.
Self-Supervised Representation Learning with Joint Embedding Predictive Architecture for Automotive LiDAR Object Detection
Recently, self-supervised representation learning relying on vast amounts of unlabeled data has been explored as a pre-training method for autonomous driving. However, directly applying popular contrastive or generative methods to this problem is insufficient and may even lead to negative transfer. In this paper, we present AD-L-JEPA, a novel self-supervised pre-training framework with a joint embedding predictive architecture (JEPA) for automotive LiDAR object detection. Unlike existing methods, AD-L-JEPA is neither generative nor contrastive. Instead of explicitly generating masked regions, our method predicts Bird's-Eye-View embeddings to capture the diverse nature of driving scenes. Furthermore, our approach eliminates the need to manually form contrastive pairs by employing explicit variance regularization to avoid representation collapse. Experimental results demonstrate consistent improvements on the LiDAR 3D object detection downstream task across the KITTI3D, Waymo, and ONCE datasets, while reducing GPU hours by 1.9x-2.7x and GPU memory by 2.8x-4x compared with the state-of-the-art method Occupancy-MAE. Notably, on the largest ONCE dataset, pre-training on 100K frames yields a 1.61 mAP gain, better than all other methods pre-trained on either 100K or 500K frames, and pre-training on 500K frames yields a 2.98 mAP gain, better than all other methods pre-trained on either 500K or 1M frames. AD-L-JEPA constitutes the first JEPA-based pre-training method for autonomous driving. It offers better quality, faster, and more GPU-memory-efficient self-supervised representation learning. The source code of AD-L-JEPA is ready to be released.
Multiagent Systems
Improved High-probability Convergence Guarantees of Decentralized SGD
Convergence in high probability (HP) has been receiving increasing interest, due to its attractive properties, such as exponentially decaying tail bounds and strong guarantees for each individual run of an algorithm. While HP guarantees are extensively studied in centralized settings, much less is understood in the decentralized, networked setup. Existing HP studies in decentralized settings impose strong assumptions, like uniformly bounded gradients, or asymptotically vanishing noise, resulting in a significant gap between assumptions used to establish convergence in the HP and the mean-squared error (MSE) sense, even for the vanilla Decentralized Stochastic Gradient Descent ($\mathtt{DSGD}$) algorithm. This is contrary to centralized settings, where it is known that $\mathtt{SGD}$ converges in HP under the same conditions on the cost function as needed to guarantee MSE convergence. Motivated by this observation, we revisit HP guarantees for $\mathtt{DSGD}$ in the presence of light-tailed noise. We show that $\mathtt{DSGD}$ converges in HP under the same conditions on the cost as in the MSE sense, removing uniformly bounded gradients and other restrictive assumptions, while simultaneously achieving order-optimal rates for both non-convex and strongly convex costs. Moreover, our improved analysis yields linear speed-up in the number of users, demonstrating that $\mathtt{DSGD}$ maintains strong performance in the HP sense and matches existing MSE guarantees. Our improved results stem from a careful analysis of the MGF of quantities of interest (norm-squared of gradient or optimality gap) and the MGF of the consensus gap between users' models. To achieve linear speed-up, we provide a novel result on the variance-reduction effect of decentralized methods in the HP sense and more fine-grained bounds on the MGF for strongly convex costs, which are both of independent interest.
comment: 39 pages
A Timed Obstruction Logic for Dynamic Game Models
Real-time cybersecurity and privacy applications require reliable verification methods and system design tools to ensure their correctness. Many of these reactive real-time applications embedded in various infrastructures, such as airports, hospitals, and oil pipelines, are potentially vulnerable to malicious cyber-attacks. Recently, a growing literature has recognized Timed Game Theory as a sound theoretical foundation for modeling strategic interactions between attackers and defenders. This paper proposes Timed Obstruction Logic (TOL), an extension of Obstruction Logic (OL), a formalism for verifying specific timed games with real-time objectives unfolding in dynamic models. These timed games involve players whose discrete and continuous actions can impact the underlying timed game model. We show that TOL can be used to describe important timed properties of real-time cybersecurity games. Finally, in addition to introducing our new logic and adapting it to specify properties in the context of cybersecurity, we provide a verification procedure for TOL and show that its complexity is PSPACE-complete, meaning that it is not higher than that of classical timed temporal logics like TCTL. Thus, we increase the expressiveness of properties without incurring any cost in terms of complexity.
Agent+P: Guiding UI Agents via Symbolic Planning
Large Language Model (LLM)-based UI agents show great promise for UI automation but often hallucinate in long-horizon tasks due to their lack of understanding of the global UI transition structure. To address this, we introduce AGENT+P, a novel framework that leverages symbolic planning to guide LLM-based UI agents. Specifically, we model an app's UI transition structure as a UI Transition Graph (UTG), which allows us to reformulate the UI automation task as a pathfinding problem on the UTG. This further enables an off-the-shelf symbolic planner to generate a provably correct and optimal high-level plan, preventing the agent from redundant exploration and guiding the agent to achieve the automation goals. AGENT+P is designed as a plug-and-play framework to enhance existing UI agents. Evaluation on the AndroidWorld benchmark demonstrates that AGENT+P improves the success rates of state-of-the-art UI agents by up to 14% and reduces the action steps by 37.7%.
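The reformulation of UI automation as pathfinding on a UI Transition Graph can be sketched with a plain breadth-first search. This is a stand-in only: the abstract says AGENT+P uses an off-the-shelf symbolic planner, and the graph encoding, screen names, and action names below are hypothetical.

```python
from collections import deque

def shortest_ui_path(utg, start, goal):
    """Return the shortest list of actions from start to goal, or None.

    utg maps each screen to a dict {action: next_screen} (assumed encoding).
    """
    parent = {start: None}  # screen -> (previous screen, action taken)
    queue = deque([start])
    while queue:
        screen = queue.popleft()
        if screen == goal:
            actions = []
            while parent[screen] is not None:
                screen, action = parent[screen]
                actions.append(action)
            return actions[::-1]  # reverse to get start-to-goal order
        for action, nxt in utg.get(screen, {}).items():
            if nxt not in parent:  # first visit is the shortest in BFS
                parent[nxt] = (screen, action)
                queue.append(nxt)
    return None

if __name__ == "__main__":
    # Hypothetical transition graph for illustration only.
    utg = {
        "home": {"tap_profile": "profile", "tap_settings": "settings"},
        "profile": {"tap_settings": "settings"},
        "settings": {"tap_wifi": "wifi"},
    }
    print(shortest_ui_path(utg, "home", "wifi"))
```

Because every transition has unit cost here, breadth-first search returns a plan with the fewest actions. A symbolic planner would additionally handle action preconditions and effects, which this sketch omits.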
Emergent Directedness in Social Contagion
An enduring challenge in contagion theory is that the pathways contagions follow through social networks exhibit emergent complexities that are difficult to predict using network structure. Here, we address this challenge by developing a causal modeling framework that (i) simulates the possible network pathways that emerge as contagions spread and (ii) identifies which edges and nodes are most impactful on diffusion across these possible pathways. This yields a surprising discovery. If people require exposure to multiple peers to adopt a contagion (a.k.a., 'complex contagions'), the pathways that emerge often only work in one direction. In fact, the more complex a contagion is, the more asymmetric its paths become. This emergent directedness problematizes canonical theories of how networks mediate contagion. Weak ties spanning network regions - widely thought to facilitate mutual influence and integration - prove to privilege the spread of contagions from one community to the other. Emergent directedness also disproportionately channels complex contagions from the network periphery to the core, inverting standard centrality models. We demonstrate two practical applications. First, we show that emergent directedness accounts for unexplained nonlinearity in the effects of tie strength in a recent study of job diffusion over LinkedIn. Second, we show that network evolution is biased toward growing directed paths, but that cultural factors (e.g., triadic closure) can curtail this bias, with strategic implications for network building and behavioral interventions.
comment: 36 pages, 6 figures, plus 30-page appendix with 15 figures
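The one-directional pathways described above can be illustrated with a toy threshold model. The six-node graph, the node names, and the absolute adoption threshold below are assumptions chosen for illustration. This is not the authors' causal modeling framework.

```python
def build_graph(edges):
    """Build an undirected adjacency map: node -> set of neighbors."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors


def spread(neighbors, seeds, threshold):
    """Run a threshold contagion to its fixed point and return the adopters.

    A node adopts once at least `threshold` of its neighbors have adopted.
    """
    adopted = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in neighbors.items():
            if node not in adopted and len(adopted & nbrs) >= threshold:
                adopted.add(node)
                changed = True
    return adopted


# Two hypothetical communities, each a triangle, joined by three undirected ties.
A = {"a1", "a2", "a3"}
B = {"b1", "b2", "b3"}
EDGES = [
    ("a1", "a2"), ("a1", "a3"), ("a2", "a3"),  # community A
    ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),  # community B
    ("a1", "b1"), ("a2", "b1"), ("a3", "b2"),  # bridging ties
]
GRAPH = build_graph(EDGES)

if __name__ == "__main__":
    print(sorted(spread(GRAPH, A, threshold=2)))  # complex contagion seeded in A
    print(sorted(spread(GRAPH, B, threshold=2)))  # complex contagion seeded in B
```

In this toy graph every tie is undirected, yet with a threshold of 2 the contagion crosses from A to B and not from B to A, because `b1` has two neighbors in A while no node in A has more than one neighbor in B. With a threshold of 1 (a simple contagion) it crosses in both directions.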
RareAgent: Self-Evolving Reasoning for Drug Repurposing in Rare Diseases
Computational drug repurposing for rare diseases is especially challenging when no prior associations exist between drugs and target diseases. Therefore, knowledge graph completion and message-passing GNNs have little reliable signal to learn and propagate, resulting in poor performance. We present RareAgent, a self-evolving multi-agent system that reframes this task from passive pattern recognition to active evidence-seeking reasoning. RareAgent organizes task-specific adversarial debates in which agents dynamically construct evidence graphs from diverse perspectives to support, refute, or entail hypotheses. The reasoning strategies are analyzed post hoc in a self-evolutionary loop, producing textual feedback that refines agent policies, while successful reasoning paths are distilled into transferable heuristics to accelerate future investigations. Comprehensive evaluations reveal that RareAgent improves the indication AUPRC by 18.1% over reasoning baselines and provides a transparent reasoning chain consistent with clinical evidence.
Federated Split Learning for Resource-Constrained Robots in Industrial IoT: Framework Comparison, Optimization Strategies, and Future Directions
Federated split learning (FedSL) has emerged as a promising paradigm for enabling collaborative intelligence in industrial Internet of Things (IoT) systems, particularly in smart factories where data privacy, communication efficiency, and device heterogeneity are critical concerns. In this article, we present a comprehensive study of FedSL frameworks tailored for resource-constrained robots in industrial scenarios. We compare synchronous, asynchronous, hierarchical, and heterogeneous FedSL frameworks in terms of workflow, scalability, adaptability, and limitations under dynamic industrial conditions. Furthermore, we systematically categorize token fusion strategies into three paradigms: input-level (pre-fusion), intermediate-level (intra-fusion), and output-level (post-fusion), and summarize their respective strengths in industrial applications. We also provide adaptive optimization techniques to enhance the efficiency and feasibility of FedSL implementation, including model compression, split layer selection, computing frequency allocation, and wireless resource management. Simulation results validate the performance of these frameworks under industrial detection scenarios. Finally, we outline open issues and research directions of FedSL in future smart manufacturing systems.
comment: 9 pages, 5 figures, submitted to the IEEE magazine
Generative AI-Driven Hierarchical Multi-Agent Framework for Zero-Touch Optical Networks
The rapid development of Generative Artificial Intelligence (GenAI) has catalyzed a transformative technological revolution across all walks of life. As the backbone of wideband communication, optical networks are expected to achieve high-level autonomous operation and zero-touch management to accommodate their expanding network scales and escalating transmission bandwidth. The integration of GenAI is deemed the pivotal solution for realizing zero-touch optical networks. However, the lifecycle management of optical networks involves a multitude of tasks and necessitates seamless collaboration across multiple layers, which poses significant challenges to existing single-agent GenAI systems. In this paper, we propose a GenAI-driven hierarchical multi-agent framework designed to streamline multi-task autonomous execution for zero-touch optical networks. We present the architecture, implementation, and applications of this framework. A field-deployed mesh network is utilized to demonstrate three typical scenarios throughout the lifecycle of an optical network: quality of transmission estimation in the planning stage, dynamic channel adding/dropping in the operation stage, and system capacity increase in the upgrade stage. The case studies illustrate the capabilities of the multi-agent framework in multi-task allocation, coordination, execution, evaluation, and summarization. This work provides a promising approach for the future development of intelligent, efficient, and collaborative network management solutions, paving the way for more specialized and adaptive zero-touch optical networks.
comment: 7 pages, 6 figures, accepted by IEEE Communications Magazine, Open call
Decoupling Correctness from Policy: A Deterministic Causal Structure for Multi-Agent Systems
In distributed multi-agent systems, correctness is often entangled with operational policies such as scheduling, batching, or routing, which makes systems brittle since performance-driven policy evolution may break integrity guarantees. This paper introduces the Deterministic Causal Structure (DCS), a formal foundation that decouples correctness from policy. We develop a minimal axiomatic theory and prove four results: existence and uniqueness, policy-agnostic invariance, observational equivalence, and axiom minimality. These results show that DCS resolves causal ambiguities that value-centric convergence models such as CRDTs cannot address, and that removing any axiom collapses determinism into ambiguity. DCS thus emerges as a boundary principle of asynchronous computation, analogous to CAP and FLP: correctness is preserved only within the expressive power of a join-semilattice. All guarantees are established by axioms and proofs, with only minimal illustrative constructions included to aid intuition. This work establishes correctness as a fixed, policy-agnostic substrate, a Correctness-as-a-Chassis paradigm, on which distributed intelligent systems can be built modularly, safely, and evolvably.
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Outcome-driven reinforcement learning has advanced reasoning in large language models (LLMs), but prevailing tool-augmented approaches train a single, monolithic policy that interleaves thoughts and tool calls under full context; this scales poorly with long horizons and diverse tools and generalizes weakly to new scenarios. Agentic systems offer a promising alternative by decomposing work across specialized modules, yet most remain training-free or rely on offline training decoupled from the live dynamics of multi-turn interaction. We introduce AgentFlow, a trainable, in-the-flow agentic framework that coordinates four modules (planner, executor, verifier, generator) through an evolving memory and directly optimizes its planner inside the multi-turn loop. To train on-policy in live environments, we propose Flow-based Group Refined Policy Optimization (Flow-GRPO), which tackles long-horizon, sparse-reward credit assignment by converting multi-turn optimization into a sequence of tractable single-turn policy updates. It broadcasts a single, verifiable trajectory-level outcome to every turn to align local planner decisions with global success and stabilizes learning with group-normalized advantages. Across ten benchmarks, AgentFlow with a 7B-scale backbone outperforms top-performing baselines with average accuracy gains of 14.9% on search, 14.0% on agentic, 14.5% on mathematical, and 4.1% on scientific tasks, even surpassing larger proprietary models like GPT-4o. Further analyses confirm the benefits of in-the-flow optimization, showing improved planning, enhanced tool-calling reliability, and positive scaling with model size and reasoning turns.
comment: 45 pages, 12 figures. Project website: https://agentflow.stanford.edu/
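The credit-assignment idea in the abstract, broadcasting one trajectory-level outcome to every turn and normalizing within a group of rollouts, can be sketched as follows. The function name, the use of the population standard deviation, and the `eps` stabilizer are assumptions. The actual Flow-GRPO objective may differ in its details.

```python
import statistics


def broadcast_group_advantages(outcomes, turns_per_rollout, eps=1e-8):
    """Return one advantage per turn for each rollout in a group.

    outcomes:          final trajectory-level reward of each rollout in the group
    turns_per_rollout: number of planner turns in each rollout
    eps:               small constant (assumed) to avoid division by zero
    """
    mean = statistics.fmean(outcomes)
    std = statistics.pstdev(outcomes)
    # Group-normalized advantage: how much better a rollout is than its group.
    per_rollout = [(r - mean) / (std + eps) for r in outcomes]
    # Broadcast: every turn of a rollout receives that rollout's advantage.
    return [[adv] * n for adv, n in zip(per_rollout, turns_per_rollout)]


if __name__ == "__main__":
    # Four hypothetical rollouts of the same task: two succeed, two fail.
    advantages = broadcast_group_advantages([1.0, 0.0, 0.0, 1.0], [3, 2, 4, 1])
    for turn_advantages in advantages:
        print(turn_advantages)
```

Under this scheme each turn can be treated as a single-turn policy update weighted by a shared advantage, which is one way to read the abstract's "sequence of tractable single-turn policy updates".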
R3R: Decentralized Multi-Agent Collision Avoidance with Infinite-Horizon Safety
Existing decentralized methods for multi-agent motion planning lack formal, infinite-horizon safety guarantees, especially for communication-constrained systems. We present R3R, to our knowledge the first decentralized and asynchronous framework for multi-agent motion planning under distance-based communication constraints with infinite-horizon safety guarantees for systems of nonlinear agents. R3R's novelty lies in combining our gatekeeper safety framework with a geometric constraint called R-Boundedness, which together establish a formal link between an agent's communication radius and its ability to plan safely. We constrain trajectories to within a fixed planning radius that is a function of the agent's communication radius, which enables trajectories to be shown provably safe for all time, using only local information. Our algorithm is fully asynchronous and ensures the forward invariance of these guarantees even in time-varying networks where agents asynchronously join, leave, and replan. We validate our approach in simulations of up to 128 Dubins vehicles, demonstrating 100% safety in dense, obstacle-rich scenarios. Our results demonstrate that R3R's performance scales with agent density rather than problem size, providing a practical solution for scalable and provably safe multi-agent systems.
comment: 8 pages, LaTeX; submitted to the American Control Conference (ACC) 2026
Three-dimensional Integrated Guidance and Control for Leader-Follower Flexible Formation of Fixed Wing UAVs
This paper presents a nonlinear integrated guidance and control (IGC) approach for flexible leader-follower formation flight of fixed-wing unmanned aerial vehicles (UAVs) while accounting for high-fidelity aerodynamics and thrust dynamics. Unlike conventional leader-follower schemes that fix the follower's position relative to the leader, the follower is steered to maintain its range and bearing angle (the angle between its velocity vector and its line of sight (LOS) to the leader) arbitrarily close to the prescribed values, enabling the follower to maintain formation on a hemispherical region behind the leader. The proposed IGC framework directly maps leader-follower relative range dynamics to throttle commands, and the follower's velocity orientation relative to the LOS to aerodynamic control surface deflections. This enables synergism between guidance and control subsystems. The control design uses a dynamic surface control-based backstepping approach to achieve convergence to the desired formation set, where Lyapunov barrier functions are incorporated to ensure the follower's bearing angle is constrained within specified bounds. Rigorous stability analysis guarantees uniform ultimate boundedness of all error states and strict constraint satisfaction in the presence of aerodynamic nonlinearities. The proposed flexible formation scheme allows the follower to have an orientation mismatch relative to the leader, so that it can execute anticipatory reconfiguration by transitioning between relative positions in the admissible formation set when the leader maneuvers aggressively. The proposed IGC law relies only on relative information and onboard sensors, without information about the leader's maneuver, making it suitable for GPS-denied or non-cooperative scenarios. Finally, we present simulation results to demonstrate the effectiveness and robustness of our approach.
Flexible Swarm Learning May Outpace Foundation Models in Essential Tasks
Foundation models have rapidly advanced AI, raising the question of whether their decisions will ultimately surpass human strategies in real-world domains. The exponential, and possibly super-exponential, pace of AI development makes such analysis elusive. Nevertheless, many application areas that matter for daily life and society show only modest gains so far; a prominent case is diagnosing and treating dynamically evolving disease in intensive care. The common challenge is adapting complex systems to dynamic environments. Effective strategies must optimize outcomes in systems composed of strongly interacting functions while avoiding shared side effects; this requires reliable, self-adaptive modeling. These tasks align with building digital twins of highly complex systems whose mechanisms are not fully or quantitatively understood. It is therefore essential to develop methods for self-adapting AI models with minimal data and limited mechanistic knowledge. As this challenge extends beyond medicine, AI should demonstrate clear superiority in these settings before assuming broader decision-making roles. We identify the curse of dimensionality as a fundamental barrier to efficient self-adaptation and argue that monolithic foundation models face conceptual limits in overcoming it. As an alternative, we propose a decentralized architecture of interacting small agent networks (SANs). We focus on agents representing the specialized substructure of the system, where each agent covers only a subset of the full system functions. Drawing on mathematical results on the learning behavior of SANs and evidence from existing applications, we argue that swarm-learning in diverse swarms can enable self-adaptive SANs to deliver superior decision-making in dynamic environments compared with monolithic foundation models, though at the cost of reduced reproducibility in detail.
GRPO-GCC: Enhancing Cooperation in Spatial Public Goods Games via Group Relative Policy Optimization with Global Cooperation Constraint
Inspired by the principle of self-regulating cooperation in collective institutions, we propose the Group Relative Policy Optimization with Global Cooperation Constraint (GRPO-GCC) framework. This work is the first to introduce GRPO into spatial public goods games, establishing a new deep reinforcement learning baseline for structured populations. GRPO-GCC integrates group relative policy optimization with a global cooperation constraint that strengthens incentives at intermediate cooperation levels while weakening them at extremes. This mechanism aligns local decision making with sustainable collective outcomes and prevents collapse into either universal defection or unconditional cooperation. The framework advances beyond existing approaches by combining group-normalized advantage estimation, a reference-anchored KL penalty, and a global incentive term that dynamically adjusts cooperative payoffs. As a result, it achieves accelerated cooperation onset, stabilized policy adaptation, and long-term sustainability. GRPO-GCC demonstrates how a simple yet global signal can reshape incentives toward resilient cooperation, and provides a new paradigm for multi-agent reinforcement learning in socio-technical systems.
Mnemosyne: An Unsupervised, Human-Inspired Long-Term Memory Architecture for Edge-Based LLMs
Long-term memory is essential for natural, realistic dialogue. However, current large language model (LLM) memory systems rely on either brute-force context expansion or static retrieval pipelines that fail on edge-constrained devices. We introduce Mnemosyne, an unsupervised, human-inspired long-term memory architecture designed for edge-based LLMs. Our approach uses graph-structured storage, modular substance and redundancy filters, memory committing and pruning mechanisms, and probabilistic recall with temporal decay and refresh processes modeled after human memory. Mnemosyne also introduces a concentrated "core summary" efficiently derived from a fixed-length subset of the memory graph to capture the user's personality and other domain-specific long-term details such as, in a healthcare application, post-recovery ambitions and attitudes towards care. Unlike existing retrieval-augmented methods, Mnemosyne is designed for use in longitudinal healthcare assistants, where repetitive and semantically similar but temporally distinct conversations are limited by naive retrieval. In experiments with longitudinal healthcare dialogues, Mnemosyne demonstrates the highest win rate of 65.8% in blind human evaluations of realism and long-term memory capability, compared to a baseline RAG win rate of 31.1%. Mnemosyne also achieves the current highest LoCoMo benchmark scores in temporal reasoning and single-hop retrieval compared to other techniques using the same backbone. Further, its average overall score of 54.6% was the second highest across all methods, beating the commonly used Mem0 and OpenAI baselines, among others. This demonstrates that improved factual recall, enhanced temporal reasoning, and much more natural user-facing responses can be feasible with an edge-compatible and easily transferable unsupervised memory architecture.
comment: 12 pages, 4 figures
QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning?
Credit assignment has remained a fundamental challenge in multi-agent reinforcement learning (MARL). Previous studies have primarily addressed this issue through value decomposition methods under the centralized training with decentralized execution paradigm, where neural networks are utilized to approximate the nonlinear relationship between individual Q-values and the global Q-value. Although these approaches have achieved considerable success in various benchmark tasks, they still suffer from several limitations, including imprecise attribution of contributions, limited interpretability, and poor scalability in high-dimensional state spaces. To address these challenges, we propose a novel algorithm, QLLM, which facilitates the automatic construction of credit assignment functions using large language models (LLMs). Specifically, the concept of TFCAF is introduced, wherein the credit allocation process is represented as a direct and expressive nonlinear functional formulation. A custom-designed coder-evaluator framework is further employed to guide the generation, verification, and refinement of executable code by LLMs, significantly mitigating issues such as hallucination and shallow reasoning during inference. Extensive experiments conducted on several standard MARL benchmarks demonstrate that the proposed method consistently outperforms existing state-of-the-art baselines. Moreover, QLLM exhibits strong generalization capability and maintains compatibility with a wide range of MARL algorithms that utilize mixing networks, positioning it as a promising and versatile solution for complex multi-agent scenarios.
comment: We are withdrawing this manuscript due to experimental errors and mistakes in data preprocessing. These issues materially affect the results and could mislead subsequent studies
Decentralized Collective World Model for Emergent Communication and Coordination
We propose a fully decentralized multi-agent world model that enables both symbol emergence for communication and coordinated behavior through temporal extension of collective predictive coding. Unlike previous research that focuses on either communication or coordination separately, our approach achieves both simultaneously. Our method integrates world models with communication channels, enabling agents to predict environmental dynamics, estimate states from partial observations, and share critical information through bidirectional message exchange with contrastive learning for message alignment. Using a two-agent trajectory drawing task, we demonstrate that our communication-based approach outperforms non-communicative models when agents have divergent perceptual capabilities, achieving the second-best coordination after centralized models. Importantly, our decentralized approach with constraints preventing direct access to other agents' internal states facilitates the emergence of more meaningful symbol systems that accurately reflect environmental states. These findings demonstrate the effectiveness of decentralized communication for supporting coordination while developing shared representations of the environment.
comment: Accepted at IEEE ICDL 2025
Dynamic Strategy Adaptation in Multi-Agent Environments with Large Language Models
Large language models (LLMs) demonstrate strong reasoning abilities across mathematical, strategic, and linguistic tasks, yet little is known about how well they reason in dynamic, real-time, multi-agent scenarios, such as collaborative environments in which agents continuously adapt to each other's behavior, as in cooperative gameplay settings. In this paper, we bridge this gap by combining LLM-driven agents with strategic reasoning and real-time adaptation in cooperative, multi-agent environments grounded in game-theoretic principles such as belief consistency and Nash equilibrium. The proposed framework applies broadly to dynamic scenarios in which agents coordinate, communicate, and make decisions in response to continuously changing conditions. We provide real-time strategy refinement and adaptive feedback mechanisms that enable agents to dynamically adjust policies based on immediate contextual interactions, in contrast to previous efforts that evaluate LLM capabilities in static or turn-based settings. Empirical results show that our method achieves up to a 26% improvement in return over PPO baselines in high-noise environments, while maintaining real-time latency under 1.05 milliseconds. Our approach improves collaboration efficiency, task completion rates, and flexibility, illustrating that game-theoretic guidance integrated with real-time feedback enhances LLM performance, ultimately fostering more resilient and flexible strategic multi-agent systems.
Generalizing Liquid Democracy to multi-agent delegation: A Voting Weight Measure and Equilibrium Analysis
In this study, we propose a generalization of the classic model of liquid democracy that allows fractional delegation of voting weight, while simultaneously allowing for the existence of equilibrium states. Our approach empowers agents to partition and delegate their votes to multiple representatives, all while retaining a fraction of the voting power for themselves. We introduce a penalty mechanism for the length of delegation chains. We discuss the desirable properties of a reasonable generalization of the classic model, and prove that smaller penalty factors bring the model closer to satisfying these properties. In the subsequent section, we explore the presence of equilibrium states in a general delegation game utilizing the proposed voting measure. In contrast to the classical model, we demonstrate that this game exhibits pure strategy Nash equilibria, contingent upon the imposition of a penalty on the length of delegation chains.
KramaBench: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes
Constructing real-world data-to-insight pipelines often involves data extraction from data lakes, data integration across heterogeneous data sources, and diverse operations from data cleaning to analysis. The design and implementation of data science pipelines require domain knowledge, technical expertise, and even project-specific insights. AI systems have shown remarkable reasoning, coding, and understanding capabilities. However, it remains unclear to what extent these capabilities translate into successful design and execution of such complex pipelines. We introduce KRAMABENCH: a benchmark composed of 104 manually-curated real-world data science pipelines spanning 1700 data files from 24 data sources in 6 different domains. We show that these pipelines test the end-to-end capabilities of AI systems on data processing, requiring data discovery, wrangling and cleaning, efficient processing, statistical reasoning, and orchestrating data processing steps given a high-level task. Our evaluation tests 5 general models and 3 code generation models using our reference framework, DS-GURU, which instructs the AI model to decompose a question into a sequence of subtasks, reason through each step, and synthesize Python code that implements the proposed design. Our results on KRAMABENCH show that, although the models are sufficiently capable of solving well-specified data science code generation tasks, when extensive data processing and domain knowledge are required to construct real-world data science pipelines, existing out-of-box models fall short. Progress on KramaBench represents crucial steps towards developing autonomous data science agents for real-world applications. Our code, reference framework, and data are available at https://github.com/mitdbg/KramaBench.